A Tale of a Disappearing Website

This is a guest post by Abbie Grotke, Web Archiving Team Lead at the Library of Congress.

Some of you may have heard the recent news that the National Biological Information Infrastructure (NBII) program, including its website, was to be terminated on January 15 due to a loss of funding.

This impending loss grabbed the attention of not only researchers and science and biology librarians, but web archivists as well. We don’t often get outright announcements that sites are going to be shut down (they tend to just disappear one day); sometimes a site threatens to disappear after the company shuts down but remains as an odd ghost of times past.

[Image: Announcement of NBII shutdown]

In the case of NBII.gov, the program had announced on its blog in June 2011 that the site would be discontinued in 2012. It was only in recent weeks that some of our partners and staff at the Library of Congress got in touch. Questions such as “Did anyone archive this site? Should we archive this site?” started hitting my inbox. Usually when we get these sorts of questions, we look to see if we are already archiving the site in our own collections. If we aren’t here at LC, we cast a wider net to see if other partners are. This goes for entire collections too. We might ask: “Is anyone archiving sites regarding the Japanese Earthquake?”

On this blog, I won’t go into questions of value or whether we should archive the site; I’ll leave that to the subject experts. For me, the more interesting question (after determining that the site had been archived) was whether the archived version, given our current crawling technology, would satisfy the researchers and librarians who did find value in NBII.gov.

So did anyone archive it?

Well, yes. As best the crawler was able to do.

We’re in the midst of planning for “End of Term” 2012-2013, and had run a related crawl this fall. Someone on our team had learned in October that the site was going away, and we saw that it was included in that crawl’s list and had been archived. Digging around more recently in our previous “End of Term” archive (we’re working on the public access piece and hope to have it launched soon), I saw that we had also archived it in 2008.

A few weeks ago, I learned that talk of this site’s shutdown was going around a number of listservs that our subject specialists were following. Our colleagues asked if we’d archived it, and as we relayed the above information, we decided to go ahead and archive it locally just to have another copy before it disappeared. We also archived the site’s blog in our crawl this week, which was not included in the recent End of Term crawl since it is on a blogspot URL rather than a .gov URL.

In an “End of Term” project meeting last week, we were also made aware that with the recent news, some of our partners have harvested the site for their own archives. Internet Archive also has copies in their general archive, and more recent captures have been done but aren’t viewable yet.

Will the Researchers and Librarians be Satisfied?

Not so sure. Experiencing the site in an archive will not be the same as the site once was. As we describe in our FAQ, we try to get as much of a site as we can, and retain functionality as much as possible, but many things do not function the same as on a live site. Anything requiring input to bring up content won’t function in the archive. In an example from an earlier Internet Archive capture, the crawler couldn’t get beyond input boxes, so anything that requires a search to retrieve information won’t work in the archive. Interactive maps also won’t function in the archive. Some of the content on other sites sponsored by NBII funding may be disappearing as well; however, many of those are .edu or .org sites, and those were out of scope for our .gov collection.

[Image: National Biological Information Infrastructure Web site]

Whether or not any of the archived copies will satisfy those who once had access to nbii.gov is ultimately up to them to answer, I suppose. But the site’s disappearance has raised awareness about the fleeting nature of the web and brought attention to web archiving initiatives, which can only be good for our digital preservation community.

Knowing what we do about crawling technology, do you think an archive copy of your favorite website would serve as a “good enough” replacement?

One Comment

  1. Scott Sheppard
    January 26, 2012 at 3:36 pm

    Honestly, nothing is as good as having the original up and running. That said though, I feel that so long as the information is all saved for later reference, then it’s met the basic requirements of archiving.

    It will not be the same, but at least the information is safe.
