The Why and What of Web Archives

For someone who thinks about web archiving almost every day, it’s sometimes hard to explain to people outside the digital library community why archiving websites is worth doing. “They archive themselves,” some say. “Why would you want to save what’s on the Internet?” they wonder. Instead of launching into explanations about cultural heritage, dynamic publishing streams and comprehensive collection policies, I can now point to recent and fun examples of why we should be archiving the web and what it looks like to archive the web.


NPR’s Weekend Edition Sunday ran a story about a project that is a perfect example of why preserving websites is important. URLs are often given in citations and bibliographies to direct readers and researchers to source materials. The project addresses the problem of “link rot”: broken links that show up in the citations of legal articles and arguments.

We’ve all come across a 404 error or a URL that no longer exists. The people behind the project studied the problem and found that more than 70% of the links cited in a sample of legal journals (published between 1999 and 2011), and 50% of the links cited in Supreme Court opinions, are now dead or lead to the wrong place. This link rot puts the basic information supporting our legal system at risk.

To address this problem, the project works with law libraries and legal authors to build a system in which authors can create links to archived copies of the sources they cite. A number of other projects and services are working on this problem as well. This project is a recent addition to the scene, and it provides clear evidence that the web does not archive itself. Librarians, archivists and researchers need to take action to ensure these resources remain fully available in the future.


Space Jam website

The NBA playoffs are a good excuse to bring in this next example, which originally went viral in late 2010. Space Jam is a 1996 Warner Brothers movie starring cartoon characters and 1990s basketball stars like Michael Jordan and Charles Barkley. In a conversation about web archiving, a friend mentioned that this movie’s website is still in its “original format.” Indeed, the screen capture here is from the live website for the movie and is identical to the version the Internet Archive captured in November 2003, when the site was first saved.

I can’t say why this website continues to exist, but it is unique. Other popular movies from 1996, such as Independence Day, Scream and Fargo, have no trace of a website; one may never have existed. The live website for Space Jam is not itself a web archive, but it’s a good example of what web archives are filled with.

Other sites from this era survive only in web archives, and this site’s surprising persistence shows how often digital content created in new forms and formats is at risk.

One Comment

  1. Greg Bem
    May 1, 2014 at 3:32 am

    Space Jam, not Space Jams.

    Good post, though. Nostalgia exists even in our age of the instantaneous and overload. It just takes a little more effort to find.
