Here’s something cool in a geeky sort of way. A friend’s project just got /.ed. For those of you who don’t know what that means, an article about it was posted on slashdot.org. As a result, a huge number of people flocked to his site, overloading the server.
The Slashdot headline read: “Internet Archive Opens Wayback Code Under LGPL”
However, he had some corrections to make.
And of course they got it all wrong. Heritrix != WayBackMachine.
Heritrix gathers web pages (harvests).
The WayBackMachine gives access to harvested material.
Also, Heritrix is a new web crawler meant to replace the one that IA has been using (which is owned by Alexa Internet).
Both the /. post and the linked article say that it’s the crawler code that’s being released. The Wayback Machine is a separate part, and its code is not being released at this time.
Heh, since I started looking at the article, the title has changed. “Internet Archive Opens Crawler Code Under LGPL” is what it’s called now, which is accurate.