] Heritrix is the Internet Archive's open-source,
] extensible, web-scale, archival-quality web crawler.
] Heritrix (sometimes spelled heretrix, or misspelled or
] missaid as heratrix / heritix / heretix / heratix) is an
] archaic word for inheritess. Since our crawler seeks to
] collect the digital artifacts of our culture for the
] benefit of future researchers and generations, this name
] seemed apt.
The odds just went up greatly that MemeStreams will keep a cache and revision record of every page that gets meme'd. (Add that to the list of everything else we have promised.)
I have not had a chance to look at this in depth yet; it just hit my radar. (via BoingBoing)
OSS'ing this was a great move. I was thinking about trying to get a part-time job working down at the Internet Archive. I'm a big supporter of everything they are doing over there.
Heritrix - Home Page - Archive.org open sources crawler