robinturner: (Default)
[personal profile] robinturner
Well, this is puzzling. I'm working on a quick and dirty Perl hack for downloading my journal (comments and all). It goes like this:

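#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;

# Base URL for one entry plus its comments; the itemid is appended below.
my $head = "http://www.livejournal.com/talkread.bml?journal=solri&itemid=";
for my $count (579 .. 599) {
    my $url = $head . $count;
    my $content = get($url);         # get() returns undef if the request fails
    next unless defined $content;
    print "$content\n";
}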

Of course the last bit will be changed to append to a file, rather than fill the terminal with HTML. The problem with this method is that most itemids aren't used (so you download zillions of error pages), and I can't see a pattern in the ones that are used. I mean, can anyone see anything meaningful in this sequence?

76946
77116
77555
77741

OK, the numbers get bigger, but that's not much help. Of course I could include a search string for "No such entry" and not print that to the file, but I'd still waste time downloading a few hundred error messages for each journal entry.
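
The filtering idea above could be sketched roughly as follows. This is only a minimal sketch under some assumptions: the helper names is_error_page and append_entry are made up for illustration, the script name getjournal.pl in the comment is hypothetical, the output file is taken from the command line, and the error page is assumed to contain the literal text "No such entry". It still downloads every error page; it just keeps them out of the file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(get);

# True if the page looks like the error page for an unused itemid.
# Assumption: that page contains the literal text "No such entry".
sub is_error_page {
    my ($html) = @_;
    return $html =~ /No such entry/ ? 1 : 0;
}

# Append one downloaded page to the output file (hypothetical helper).
sub append_entry {
    my ($file, $html) = @_;
    open(my $out, '>>', $file) or die "Cannot open $file: $!";
    print {$out} $html, "\n";
    close($out) or die "Cannot close $file: $!";
}

# Only run the download loop when an output file is given,
# e.g.  perl getjournal.pl journal.html
if (@ARGV) {
    my $file = $ARGV[0];
    my $head = "http://www.livejournal.com/talkread.bml?journal=solri&itemid=";
    for my $count (579 .. 599) {
        my $content = get($head . $count);
        next unless defined $content;       # request failed
        next if is_error_page($content);    # unused itemid, skip it
        append_entry($file, $content);
    }
}
```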

Date: 2002-12-06 04:24 pm (UTC)
From: [identity profile] solri.livejournal.com
Oh yes, so it does. This is worth studying, since I'm largely doing this exercise to improve my Perl skills, which were never up to much and are now rustier than my car.
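
For anyone studying the same sequence: one regularity that can be checked directly is what happens when each itemid is divided by 256. The divisor 256 is an assumption here, not something stated in the post, and the helper name split_itemid is made up for illustration. With these four numbers the whole parts come out consecutive, which would be consistent with a sequential entry number times 256 plus a small extra value, though four data points prove nothing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split an itemid into its whole part and remainder when divided by 256.
# Assumption: 256 is the interesting divisor; the post does not say so.
sub split_itemid {
    my ($id) = @_;
    return (int($id / 256), $id % 256);
}

# The four itemids quoted in the post.
for my $id (76946, 77116, 77555, 77741) {
    my ($whole, $rest) = split_itemid($id);
    print "$id = $whole * 256 + $rest\n";
}
# prints:
# 76946 = 300 * 256 + 146
# 77116 = 301 * 256 + 60
# 77555 = 302 * 256 + 243
# 77741 = 303 * 256 + 173
```

If that holds in general, the loop would only need to try 256 candidates per entry rather than guessing blindly, which is still wasteful but at least bounded.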

If I can get this written, then I can progress to the fun bit, which is to convert the downloaded entries to LaTeX. Of course I could just use latex2html, but then I'd still have to write something to strip out the stuff I don't want, and besides, the HTML involved is so basic it shouldn't be too hard to convert.
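
The conversion step might start out something like this. It is a rough sketch, not a real HTML parser: the function names html_to_latex and escape_text are invented for illustration, and it assumes the entries only use a handful of simple tags (b, i, p, br) and the four common entities. Anything else is silently dropped or left alone.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Decode a few common entities, then escape characters LaTeX treats specially.
sub escape_text {
    my ($text) = @_;
    my %entity  = (amp => '&', lt => '<', gt => '>', quot => '"');
    my %special = (
        '\\' => '\textbackslash{}', '&' => '\&', '%' => '\%', '$' => '\$',
        '#'  => '\#', '_' => '\_', '{' => '\{', '}' => '\}',
        '~'  => '\textasciitilde{}', '^' => '\textasciicircum{}',
    );
    $text =~ s/&(amp|lt|gt|quot);/$entity{$1}/g;
    $text =~ s/([\\&%\$#_{}~^])/$special{$1}/g;   # one pass, so nothing is escaped twice
    return $text;
}

# Convert very basic HTML to LaTeX; tags not listed below are dropped.
sub html_to_latex {
    my ($html) = @_;
    my %tag = (
        'b'  => '\textbf{', '/b' => '}',
        'i'  => '\textit{', '/i' => '}',
        'p'  => "\n\n",     '/p' => '',
        'br' => "\\\\\n",
    );
    my $latex = '';
    # The capturing group makes split return the tags as well as the text.
    for my $piece (split /(<[^>]*>)/, $html) {
        if ($piece =~ /^</) {
            my ($name) = $piece =~ m{^<\s*(/?\w+)};
            next unless defined $name;
            $name = lc $name;
            $latex .= $tag{$name} if exists $tag{$name};
        } else {
            $latex .= escape_text($piece);
        }
    }
    return $latex;
}

print html_to_latex('<p>Fish &amp; <b>chips</b> 100%</p>'), "\n";
```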

Nice userpic, BTW.

Profile

Robin Turner
