The following is a guest post by Jefferson Bailey, Fellow at the Library of Congress’s Office of Strategic Initiatives.
The Insights Interview series is an occasional feature sharing interviews and conversations between National Digital Stewardship Alliance Innovation Working Group members and individuals involved in projects related to preservation, access, and stewardship of digital information. In this installment, Gary Wright of NDSA member Corporation of the President of The Church of Jesus Christ of Latter-day Saints interviews Jason Pierson, Senior Product Manager of Digital Preservation for FamilySearch International.
Jason, what is FamilySearch all about?
Since 1894, we have been working in close partnership with records custodians around the world to preserve the heritage of mankind. Today, FamilySearch International is the world’s largest family history organization. You can learn more about us at http://www.familysearch.org
That sounds very ambitious. Can you tell us more?
Operated by the Family History Department of The Church of Jesus Christ of Latter-day Saints, FamilySearch provides family history services free of charge to the public through www.familysearch.org or in one of our 4,600 family history centers around the world. Millions of people use FamilySearch records, resources, and services to learn about their family history.
Comprising more than 2.4 million rolls, the FamilySearch collection of microfilmed records is the largest collection of family history records in the world. The collection contains more than 13.1 billion images of historical and vital records collected in more than 100 countries.
Since 1999, FamilySearch has been digitizing this enormous microfilm collection in order to make the records more accessible via the Internet. Also, tens of millions of additional records are being photographed around the world with digital cameras every year. If authorized, these records (both digitized and digitally photographed) are published on the FamilySearch website as they become available.
Because of the microfilm digitization pipeline and ongoing digital capture of additional family history records, FamilySearch is generating multiple petabytes of data each year. All this digital information must be preserved for future generations because of its priceless and enduring value. Within ten years, FamilySearch expects that it will have generated a cumulative archival capacity of more than 100 petabytes.
What implications does the FamilySearch purpose have for preservation, access, and discovery of digital objects?
Our digital preservation system is focused on each of these three areas. We are looking for cutting-edge thinking, as well as emerging best practices around preservation of and access to digital objects that have complex relationships. Our system is one of the few in the world that preserves petabytes of information on magnetic tape, and it is designed to take advantage of new storage technologies as major advances change the cost/density/value curve. We attend conferences worldwide and share our architecture, design considerations, and technology choices in an effort to advance the entire field.
What is the most interesting aspect of your digital preservation work?
People are often astonished at the volume of information we process—10 to 15 terabytes of record images every day as we scan our microfilm collection and digitally capture and transcribe new records around the world. We currently have 4 petabytes of high resolution images stored in automated tape libraries.
What are your most pressing challenges?
Thinking ahead 50 to 100 years and trying to ensure that we hand this precious archive over to the next generation in pristine condition. We want those not yet born to have free and easy access to their family history information.
How do you think your challenges might change in the future?
The core challenges will not fundamentally change. However, the pace of change will likely increase, as will the volume of information being captured and preserved. The scalability solutions we are building today may not work in 10, 20, or 50 years. Cost is another challenge that will continue to be an issue. As more data is preserved, the cost to maintain that data, to transform file formats, and to refresh storage media will only go up. Future generations will have to make decisions about what to preserve and how to preserve it.
Do you use cloud services for digital preservation?
We are building internally managed cloud services to support our computing environments.
What are the most important things we can learn from FamilySearch?
FamilySearch is pioneering massive scalability using nearline storage and system components that are designed to be upgraded over time as technology evolves. Our digital preservation system will provide a path for others to follow, and from which they can learn as their own digital preservation demands increase. We hope to help them better understand scalability and longevity considerations.
How does FamilySearch contribute to other initiatives besides digital preservation?
We are heavily involved in imaging standards and quality, and in genealogy-related technology and concepts. Working closely with our system providers, we are influencing their plans for addressing our scalability needs.
How does FamilySearch contribute to digital preservation innovation?
We stay active on many fronts to drive innovation. We have internal architects and engineers who tackle many challenges. We partner with commercial vendors and ask them to help us solve problems. We also work in the open source space and with community organizations to share information and innovations.
Based on your experience, what kinds of work would you like to see other organizations tackle?
“Sheer preservation” is a term that refers to implementing nearly invisible tactics throughout the lifecycle of information. We believe it is important for all organizations involved with information to add features to their products and processes that take digital preservation into consideration—especially at information creation. Digital preservation needs to be brought to light in all industries, all technologies, and in all relevant communities so that adequate considerations can be inserted at the point of least cost. Digital preservation must become the concern of all—not just the archivists at the end of the information food-chain.