Here’s something cool in a geeky sort of way. A friend’s project just got /.ed. For those of you who don’t know what that means, an article about it was posted on Slashdot. As a result, a huge number of people flocked to his site, overloading the server.

Internet Archive Opens Wayback Code Under LGPL

However, he had some corrections to make.

And of course they got it all wrong. Heritrix != WayBackMachine.

Heritrix gathers web pages (harvests)
The WayBackMachine gives access to harvested material.

Also Heritrix is a new web crawler meant to replace the one that IA has been using (which is owned by Alexa Internet).

Both the /. post and the linked article say that it’s actually the crawler code that’s being released. The Wayback Machine is actually a separate part, and its code is not being released at this time.

Heh, since I started looking at the article, the title has changed. “Internet Archive Opens Crawler Code Under LGPL” is what it’s called now, which is accurate.
