Five Questions for the Smithsonian Institution Archives’ Lynda Schmitz Fuhrig

The following is a guest post from Michael Neubert, a supervisory digital projects specialist at the Library of Congress.

In February of this year I wrote a post here about a collaborative effort by representatives of the National Archives and Records Administration (NARA), the Government Publishing Office (GPO), and the Library of Congress to work together in various ways on archiving federal government agency websites – Introducing the Federal Web Archiving Working Group.

Smithsonian Institution Building, 1000 Jefferson Drive, between Ninth & Twelfth Streets, Southwest, Washington, District of Columbia, DC. 1968. Library of Congress.

Since then, we have expanded participation beyond NARA, GPO, and the Library to include additional federal agencies that focus more on harvesting their own agency sites than on harvesting the sites of other agencies: the National Library of Medicine, the Smithsonian Institution, and the Department of Health and Human Services. We plan to reach out to more agencies soon. Because web archiving is still a relatively new activity, and the community of staff and managers at federal agencies who share this interest is small, we have realized that we have much to learn from one another about archiving federal sites.

Lynda Schmitz Fuhrig, electronic records archivist, is the representative to the Federal Web Archiving Working Group from the Smithsonian Institution Archives (SIA). SIA “captures, preserves, and makes available to the public the history of this extraordinary Institution. From its inception in 1846 to the present, the records of the history of the Institution—its people, its programs, its research, and its stories—have been gathered, organized, and disseminated so that everyone can learn about the Smithsonian. The history of the Smithsonian is a vital part of American history, of scientific exploration, and of international cultural understanding.” Since the late 1990s, this work has included archiving the websites and social media presence of the Smithsonian’s various museums, research centers, and offices.

Michael: Why does the Smithsonian Institution archive its own sites? What is your process?

Lynda: As the official recordkeeper of the Smithsonian, we document what the Institution does in terms of exhibit and program planning, construction of buildings, and many other aspects. Our websites and social media accounts also serve as the public face of the Smithsonian. Many of them contain significant content of historical and research value that is now found nowhere else. These are considered records of the Institution. It is also interesting to see how websites evolve over time. It would be irresponsible of us as an archives to rely only upon other organizations to archive our websites.

We use the web crawling service from Archive-It to capture most of these sites. In addition to Archive-It hosting our web archives, we also retain copies of the files in our collections. We use other tools to capture specific tweets, hashtags, or sites that are more challenging because of how they are constructed or because of the dynamic nature of social media content.

In terms of public-facing websites, we try to capture them every 12 to 18 months. We capture them more often when a redesign is under way, archiving the site both before and after the update or refresh. In some instances, an archivist appraises the content on social media sites to determine whether it has been replicated and captured elsewhere. For example, a museum’s postings on Facebook and Twitter may be similar and so do not require frequent captures. Across the Institution, we now have more than 400 websites and blogs and more than 600 social media accounts, including Twitter, Facebook, Instagram, and YouTube.

Michael: You’ve been participating in the Federal Web Archiving Working Group since June 2015. What did you hope to learn or accomplish with this group and how is it going so far?

Lynda: I am hoping to learn from my colleagues about their experiences and challenges, as well as other tools or approaches they are implementing at their agencies regarding web archiving. It has been interesting to hear about the various collecting missions or directives at other government agencies.

Michael: When you talk to colleagues or managers at the Smithsonian Institution about web archiving, what is the reaction? How do they see the benefit of this activity?

Lynda: Many do understand its value, since we reach more people around the world via the web than through visitors who come to our museums in person. Our websites and social media accounts do indeed document the history of the Institution. Many webmasters know it is important to contact us when they are getting ready to retire a website so we can make a capture and/or retrieve the actual files from a content management system. We have also made various presentations about web archiving at the Institution.

Michael: I can imagine someone suggesting that, since the Smithsonian must “back up” its web servers, archiving the websites is redundant. How would you explain the difference?

Image of the Smithsonian Institution’s 1995 homepage. Credit: Smithsonian Institution Archives

Lynda: It is true that we back up our network servers at the Smithsonian, but backing up is not the same as archiving. By crawling the sites we deem appropriate, we have a snapshot in time of the look and feel of a website. Backups serve the purpose of having duplicate files to rely upon in case of disaster or failure. Backups typically are kept only for a certain period of time, whereas the website archiving we do is kept permanently. If I wanted to see a site as it appeared on Oct. 9, 2012, there is a good chance the backup tape no longer exists, but if I crawled that site on that day, I will have those files.

Michael: We have talked about my next question before: what is your view on whether it makes sense to use web archiving to make complete copies of cultural heritage presentation sites, including the records and displays of digitized collection items?

Lynda: In keeping with Smithsonian Institution Archives policy, our approach has been to exclude as many collection objects and images as possible from our crawls of the museum websites. Of course, some items do get crawled because of the nature of the sites, and we usually have the main collections page. Physical collection items fall under the unit responsible for them, and they are something we would never accession into the Archives.

Personally, I have mixed feelings about this since it is not a “complete” website capture then, especially since the images themselves are only representations online and not the actual object.

We do crawl exhibit websites that contain collection objects though.

This is something researchers need to be aware of when using web archives. Many website captures will not include everything, whether because of excluded content, blocked content, or dynamic content such as Flash elements or calendars generated from databases. Capturing the web is not perfect.

Viewshare Supports Critical Thinking in the Classroom

This year I had the pleasure of meeting Dr. Peggy Spitzer Christoff, lecturer in Asian and Asian American Studies at Stony Brook University. She shared with me how she’s using the Library of Congress’ Viewshare tool to engage her students in an introduction to Asia Studies course. Peg talked about using digital platforms as a way to improve writing, […]

The Personal Digital Archiving 2015 Conference

The annual Personal Digital Archiving conference is about preserving any digital collection that falls outside the purview of large cultural institutions. Considering the expanding range of interests at each subsequent PDA conference, the meaning of the word “personal” has become thinly stretched to cover topics such as family history, community history, genealogy and digital humanities. New York […]

How to Participate in the September 2015 NDSA New England Regional Meeting

The following is a guest post by Kevin Powell, digital preservation librarian at Brown University. On September 25th, UMass Dartmouth will host the National Digital Stewardship Alliance New England Regional Meeting with Brown University. We enthusiastically encourage librarians, archivists, preservation specialists, knowledge managers, and anyone else with an interest in digital stewardship and preservation to […]

We Welcome Our Email Overlords: Highlights from the Archiving Email Symposium

This post is co-authored with Erin Engle, a Digital Archivist in the Office of Strategic Initiatives. Despite the occasional death knell claims, email is alive, well and exponentially thriving in many organizations. It’s become an increasingly complex challenge for collecting and memory institutions as we struggle with the same issues: How is email processed differently […]

Dodge that Memory Hole: Saving Digital News

Newspapers are some of the most-used collections at libraries. They have been carefully selected and preserved and represent what is often referred to as “the first draft of history.” Digitized historical newspapers provide broad and rich access to a community’s past, enabling new kinds of inquiry and research. However, these kinds of resources are at […]

Checking in with NGAC and the National Spatial Data Infrastructure

Several times a year I attend meetings of the National Geospatial Advisory Committee, a federal advisory committee that reports to the chair of the Federal Geographic Data Committee. The NGAC pulls together participants from across academia, the private sector and all levels of government to advise the Federal government on geospatial policy and ways to […]

Digital Preservation in Mid-Michigan: An Interview with Ed Busch

Conferences, meetings and meet-ups are important networking and collaboration events that allow librarians and archivists to share digital stewardship experiences. While national conferences and meetings offer strong professional development opportunities, regional and local meetings offer opportunities for practitioners to connect and network with a local community of practice. In a previous blog post, Kim Schroeder, […]

Dodging the Memory Hole: Collaborations to Save the News

The news is often called the “first draft of history” and preserved newspapers are some of the most used collections in libraries. The Internet and other digital technologies have altered the news landscape. There have been numerous stories about the demise of the newspaper and disruption at traditional media outlets. We’ve seen more than a […]

NDSA New England Regional Meeting Recap

The following is a guest post by Meghan Banach Bergin, Bibliographic Access and Metadata Coordinator, University of Massachusetts Amherst Libraries. On October 30th, the second New England Regional National Digital Stewardship Alliance (NE NDSA) meeting was held at the University of Massachusetts Amherst Libraries.  The meeting was generously sponsored by the Five Colleges Digital Preservation […]