Here’s something cool in a geeky sort of way. A friend’s project just got /.ed (slashdotted). For those of you who don’t know what that means, an article about it was posted on slashdot.org, and the huge number of people who flocked to his site as a result overloaded the server.
He did, however, have some corrections to make to the story:
And of course they got it all wrong. Heritrix != WayBackMachine.
Heritrix gathers (harvests) web pages.
The WayBackMachine gives access to harvested material.
Also, Heritrix is a new web crawler meant to replace the one that IA (the Internet Archive) has been using, which is owned by Alexa Internet.
Both the /. post and the linked article say that it’s the crawler code that’s being released. The Wayback Machine is a separate part, and its code is not being released at this time.
Heh, since I started looking at the article, the title has changed. “Internet Archive Opens Crawler Code Under LGPL” is what it’s called now, which is accurate.