This is a guest post by Abbie Grotke, Web Archiving Team Lead at the Library of Congress.
Some of you may have heard the recent news that the National Biological Information Infrastructure (NBII) program, including its website, was to be terminated on January 15 due to a loss of funding.
This impending loss grabbed the attention of not only researchers and science and biology librarians, but web archivists as well. We don’t often get outright announcements that sites are going to be shut down (they tend to just disappear one day); sometimes a site threatens to disappear after the company shuts down but remains as an odd ghost of times past.
In the case of NBII.gov, the program had announced on its blog in June 2011 that it would be discontinued in 2012. It was only in recent weeks that some of our partners and staff at the Library of Congress got in touch. “Did anyone archive this site? Should we archive this site?” questions started hitting my inbox. Usually when we get these sorts of questions, we look to see if we are already archiving the site in our own collections. If we aren’t here at LC, we cast a wider net to see if other partners are. This goes for entire collections too. We might ask: “Is anyone archiving sites regarding the Japanese Earthquake?”
On this blog, I won’t go into questions of value or whether we should archive the site; I’ll leave that to the subject experts. For me, the more interesting question (after determining that it had been archived) was whether the archived version, given our current crawling technology, would satisfy the researchers and librarians who did find value in NBII.gov.
So did anyone archive it?
Well, yes. As best the crawler was able to do.
We’re in the midst of planning for “End of Term” 2012-2013, and had run a related crawl this Fall. Someone on our team had learned in October that the site was going away, and we confirmed that it was included in that crawl’s list and had been archived. Digging around more recently in our previous “End of Term” archive (we’re working on the public access piece and hope to have it launched soon), I saw that we had also archived it in 2008.
A few weeks ago, I learned that talk of this site’s shutdown was going around a number of listservs that our subject specialists were following. Our colleagues asked if we’d archived it, and as we relayed the above information, we decided to go ahead and archive it locally just to have another copy before it disappeared. We also archived the site’s blog in our crawl this week, which was not included in the recent End of Term crawl since it is on a blogspot URL rather than a .gov URL.
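For readers curious how a domain-based scope rule like this leaves a blog behind, here is a minimal sketch in Python. It is only an illustration of the idea, not the configuration used for the End of Term crawl; the function name `in_gov_scope` and the blogspot address are hypothetical.

```python
from urllib.parse import urlsplit


def in_gov_scope(url: str) -> bool:
    """Return True if the URL's host sits under the .gov top-level domain.

    Hypothetical helper: real crawl scoping rules are usually richer than this.
    """
    host = (urlsplit(url).hostname or "").lower()
    return host == "gov" or host.endswith(".gov")


# Candidate seeds: the second address is a made-up stand-in for a blog
# hosted on blogspot rather than on a .gov domain.
seeds = [
    "http://www.nbii.gov/",
    "http://example.blogspot.com/",
]

# Only hosts ending in .gov survive the scope check.
in_scope = [url for url in seeds if in_gov_scope(url)]
```

Under a rule like this, a blog hosted outside the .gov domain is skipped unless someone adds it to a separate crawl, which is what we did this week.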
In an “End of Term” project meeting last week, we were also made aware that with the recent news, some of our partners have harvested the site for their own archives. Internet Archive also has copies in their general archive, and more recent captures have been done but aren’t viewable yet.
Will the Researchers and Librarians be Satisfied?
Not so sure. Experiencing the site in an archive will not be the same as experiencing the live site once was. As we describe in our FAQ, we try to get as much of a site as we can, and retain functionality as much as possible, but many things do not function the same as on a live site. Anything requiring input to bring up content won’t function in the archive. Using an example from an earlier Internet Archive capture, the crawler can’t get beyond input boxes, so things like this requiring a search to retrieve information won’t work in the archive. Interactive maps also won’t function in the archive. Some of the content on other sites sponsored by NBII funding may be disappearing as well; however, many of those are .edu or .org, and those were out of scope for our .gov collection.
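To make the input-box limitation concrete, here is a simplified sketch in Python of how a link-following crawler sees a page. It is an assumption-laden illustration, not the crawler we actually run: the class name `LinkAndFormParser` and the sample page are invented. The parser collects ordinary links it could follow, and merely notes forms, because it has no way of knowing what a person would type into them.

```python
from html.parser import HTMLParser


class LinkAndFormParser(HTMLParser):
    """Collect followable links and record forms the crawler cannot fill in."""

    def __init__(self):
        super().__init__()
        self.links = []  # href values a crawler could queue and fetch
        self.forms = []  # form actions that need human input to yield content

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])
        elif tag == "form":
            self.forms.append(attributes.get("action", ""))


# Invented sample page: one plain link and one search box.
page = """
<a href="/about.html">About</a>
<form action="/search" method="get">
  <input type="text" name="q">
  <input type="submit" value="Search">
</form>
"""

parser = LinkAndFormParser()
parser.feed(page)
```

After parsing, `parser.links` holds the one page the crawler can go on to capture, while `parser.forms` holds a search endpoint whose results never get requested, so they never make it into the archive.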
Whether or not any of the archived copies will satisfy those who once had access to nbii.gov is ultimately up to them to answer, I suppose. But the site’s shutdown has raised awareness about the fleeting nature of the web and brought attention to web archiving initiatives, which can only be good for our digital preservation community.
Knowing what we do about crawling technology, do you think an archive copy of your favorite website would serve as a “good enough” replacement?