Happy Birthday, Web!

This is a guest post by Abbie Grotke, Library of Congress Web Archiving Team Lead and Co-Chair of the National Digital Stewardship Alliance Content Working Group

Yesterday we celebrated the 25th anniversary of the creation of the World Wide Web.

How many of you can remember the first time you saw a website, clicked on a hyperlink, or actually edited an HTML page? My “first web” story is still pretty fresh in my mind: It was probably around October 1993, in D.C. My brother and his friends were fairly tech savvy (they’d already set me up with an email account). We went over to his friend Miles’s house in Dupont Circle to visit, and while there he excitedly showed us this thing called Mosaic. I remember the gray screen and the strange concept of hyperlinks; my brother remembers seeing a short quicktime movie of a dolphin doing a flip.

We were all really excited.

Screenshot from the Election 2000 Web Archive of gorelieberman.com, captured October 23, 2000.

Screenshot from the Election 2000 Web Archive of gorelieberman.com, captured October 23, 2000.

Flash forward to 2014: Although I vaguely remember life without the web (however did we find out “who that actor is that looks so familiar on the TV right now and what she role she played in that movie that is on the tip of my tongue”?), I certainly can’t imagine a life without it in the future.  I’m in a job, preserving parts of the Internet, which would not exist had it not been for Tim Berners-Lee 25 years ago. For more on the 25th anniversary of the World Wide Web, check out Pew Research Internet Project’s “The Web at 25” .

As evidenced by Pew’s handy timeline of the Web, you can see a lot has changed since the Internet Archive (followed by national libraries), began preserving the web in 1996. If you haven’t seen this other Web Archives timeline, I encourage you to check it out. Since those early days, the number of organizations archiving the web has grown.

The Library of Congress started its own adventure preserving web content in 2000. For an institution that began in 1800, it certainly counts as a small amount of “Library” time. Although we’re not quite sure what our first archived website was (following the lead of our friends at the British Library) the first websites we crawled are from our Election 2000 Web Archive, and include campaign websites from both George E. Bush and Al Gore, among others.

As you can see by the screenshots, and if you click off to those archived pages, certain things didn’t archive very well. Something as simple as images weren’t always captured comprehensively, and the full sites certainly weren’t archived. We’ve spent years, with our partners around the globe, working to make “archival-quality” web archives that include much more than just the text of a site.

We’re all preserving more content than ever, but there are still challenges for those charged with preserving this content to keep up with not only the scale of content being generated, legal issues surrounding preservation of websites, and keeping up with the technologies used on the web (even if we want to preserve it, can we?), as has been discussed before on this blog. We’ve still got a lot of work to do.

Screenshot from the Election 2000 Web Archive of georgewbush.com, captured August 3, 2000.

Screenshot from the Election 2000 Web Archive of georgewbush.com, captured August 3, 2000.

It’s also unclear what researchers of the future will want, how they want to use our archives and access the data that we’ve preserved. More researchers have interest in access to the raw data for data mining projects than we  ever envisioned when we first started out. The International Internet Preservation Consortium has been reaching out at the last few General Assembly sessions to engage researchers during their “open days,” which have been incredibly interesting as we learn more about research use of our archives.

Twenty five years in is as good a time as any to reflect on things, whether it’s the founding of the Web or the efforts to preserve the future web. Please feel free to share your stories and thoughts in the comments.

Add a Comment

This blog is governed by the general rules of respectful civil discourse. You are fully responsible for everything that you post. The content of all comments is released into the public domain unless clearly stated otherwise. The Library of Congress does not control the content posted. Nevertheless, the Library of Congress may monitor any user-generated content as it chooses and reserves the right to remove content for any reason whatever, without consent. Gratuitous links to sites are viewed as spam and may result in removed comments. We further reserve the right, in our sole discretion, to remove a user's privilege to post content on the Library site. Read our Comment and Posting Policy.

Required fields are indicated with an * asterisk.