I need a research assistant
Wednesday, August 31st, 2005 03:46 am
Some years ago, when we were the rector's pet project, the provost (who was standing in for him while he was in hospital) asked us if there was anything we needed. Somewhat to his surprise, we came out with a list of office improvements (new carpets, paint, pictures on the walls, no rats) rather than the grad assistants he was thinking of giving us. We declined the offer of slaves (sorry, assistants) because we couldn't think of anything for them to do (other than the traditional roles of washing coffee cups and supplying sexual favours, neither of which one is supposed to mention at such meetings).
Now, after a day of scanning, I really wish we'd accepted the offer. In my previous courses, I'd managed to compile my course book largely from online sources, thus minimising the amount of scanning and typing I had to do. My Matrix course book was, not surprisingly, compiled entirely from online sources (with the exception of a page from Baudrillard I typed out). My warriors course only required one chapter of a book to be scanned, and since it was an expensive book with clean white pages, it wasn't too hard. But here I am preparing the book for "Monsters Among Us" with a load of good material that is unfortunately trapped in paper. It is very frustrating to scan a chapter of a book, getting cramp from holding the thing flat on the screen, then find it turns out like this:
as the real seducer/predator lurks behind the scenes. Dracula emerges to
away his wives and'insist that the victim belongs to him.
.\ second key scene of transgressive eroticism is Dracula)s rape of Lucy on a
by the sea in Whitby; this occurs rougWy one-third of the way through ilie
1, in chapter.B. This scene) told from Mina)s point of view) first registers her
eness of Dracula)s presence.l3 Awakened in the night) Mina.has the strong
that some~ing is wrong. Suspense builds and time is almost halted as Mina
tempts to rush t4rough the dark landscape) until finally she glimpses an ap+
no subject
Date: 2005-08-31 01:11 am (UTC)
Seems like there should be an automatic way to clean this up, for example a mixed word-backoff and spelling-correction system. Surely the correction from t4rough to through should be automatable...
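A minimal sketch of just the spelling-correction half, assuming a small hand-made lexicon (`LEXICON`, `edit_distance` and `correct_token` are hypothetical names for illustration): each token not in the lexicon is mapped to the nearest lexicon word by plain Levenshtein distance, and left alone if nothing lies within `max_dist` edits.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insert, delete and substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def correct_token(token: str, lexicon: set[str], max_dist: int = 2) -> str:
    """Replace an out-of-lexicon token with its closest lexicon word.

    Tokens already in the lexicon, or with no lexicon word within
    max_dist edits, are returned unchanged.
    """
    if token.lower() in lexicon:
        return token
    best, best_dist = token, max_dist + 1
    for word in sorted(lexicon):  # sorted, so ties break alphabetically
        d = edit_distance(token.lower(), word)
        if d < best_dist:
            best, best_dist = word, d
    return best


# Toy lexicon (assumption); a real one would be a full word list.
LEXICON = {"the", "that", "is", "dark", "landscape", "something", "through", "wrong"}

for token in ["t4rough", "some~ing", "dark"]:
    print(token, "->", correct_token(token, LEXICON))
# t4rough -> through
# some~ing -> something
# dark -> dark
```

This looks at each word in isolation, so a real-word error (a correctly spelled but wrong word) passes straight through, because the token is already in the lexicon.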
no subject
Date: 2005-08-31 01:23 am (UTC)
I'm tempted to build something for you while I'm bored at my internship.
no subject
Date: 2005-08-31 02:33 pm (UTC)
For instance, take this OCR'd article about the Marvel Comics History of Atlantis, which looks fine at first glance but includes correctly spelled but incorrect information, such as that Atlantis was damaged by "explosive debt charges". Obviously the Atlanteans need to transfer their balances.
Also I must say I found the moment when Mina glimpsed her first ap+ to be extremely transgressive.
no subject
Date: 2005-08-31 03:22 pm (UTC)
But even using a word-processor spellchecker (and I wouldn't recommend that), the text would be cleaner than what
I was suggesting something more sophisticated: using the context in a statistical chooser environment. Train a backed-off n-gram window over a megaword or so and then I'm sure that "roughly" will get *much* better probability measures than "rougWy". One might set up a blended model that combines (weighted?) edit distance with word probability and see how small you could set the threshold for improving the overall model probability. If the threshold was high enough, you'd prevent any "false corrections" and still get a much better transcript in the end.
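A toy sketch of that blended model, under loudly stated assumptions: a bigram instead of a longer n-gram window, a "stupid backoff"-style fixed weight `alpha` down to an add-one unigram instead of a properly normalized Katz backoff, unweighted Levenshtein distance, and a four-sentence corpus standing in for the megaword. `BackoffBigram`, `correct`, `lam`, `threshold` and the corpus are all hypothetical. Each candidate is scored as its context log-score minus `lam` times its edit distance from the OCR token, and the token is replaced only if the best candidate beats keeping it by more than `threshold`.

```python
import math
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insert, delete and substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


class BackoffBigram:
    """Bigram scorer that backs off to an add-one unigram for unseen bigrams.

    The backoff weight is a fixed constant (stupid-backoff style), so the
    scores are NOT a normalized probability distribution.
    """

    def __init__(self, tokens: list[str], alpha: float = 0.4):
        self.alpha = alpha
        self.unigrams = Counter(tokens)
        self.bigrams = Counter(zip(tokens, tokens[1:]))
        self.total = len(tokens)
        self.vocab_size = len(self.unigrams)

    def log_score(self, prev: str, word: str) -> float:
        pair = self.bigrams[(prev, word)]
        if pair > 0:
            return math.log(pair / self.unigrams[prev])
        # Unseen bigram: add-one unigram, so even an unseen OCR token scores > 0.
        unigram = (self.unigrams[word] + 1) / (self.total + self.vocab_size)
        return math.log(self.alpha * unigram)


def correct(prev: str, observed: str, lm: BackoffBigram,
            lam: float = 1.0, max_dist: int = 2, threshold: float = 1.0) -> str:
    """score(w) = log_score(prev, w) - lam * edit_distance(observed, w).

    Replace `observed` only if the best candidate improves on keeping it
    by more than `threshold` (higher threshold = fewer false corrections).
    """
    prev, obs = prev.lower(), observed.lower()
    keep_score = lm.log_score(prev, obs)
    best, best_score = observed, keep_score
    for word in sorted(lm.unigrams):
        d = edit_distance(obs, word)
        if 0 < d <= max_dist:
            score = lm.log_score(prev, word) - lam * d
            if score > best_score:
                best, best_score = word, score
    return best if best_score - keep_score > threshold else observed


# Toy corpus standing in for ~1M words of training text (assumption).
corpus = ("this occurs roughly one third of the way through the book . "
          "she rushes through the dark landscape . "
          "the road is rough and the night is dark . "
          "something is wrong .").split()
lm = BackoffBigram(corpus)

print(correct("occurs", "rougWy", lm))   # roughly
print(correct("rushes", "t4rough", lm))  # through
print(correct("in", "Whitby", lm))       # Whitby (no candidate within 2 edits)
```

Both "roughly" and "rough" are two edits from "rougWy", so edit distance alone can't choose; the bigram after "occurs" is what tips it. Raising `threshold` makes the corrector more conservative: with `threshold=10.0` the "rougWy" call returns the token unchanged, since the best candidate only improves the score by about 2.9.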
no subject
Date: 2005-08-31 10:34 pm (UTC)
no subject
Date: 2005-09-01 10:40 pm (UTC)
"modem man had been hunting on his own for Some 40,000 years or so"
Yeah - if he'd had ADSL, it wouldn't have taken half as long.
no subject
Date: 2005-08-31 05:48 pm (UTC)
We stayed in Whitby last week. Didn't see a mirror the whole time we were there.