The following is a guest post by Jefferson Bailey, Fellow at the Library of Congress’s Office of Strategic Initiatives.
The Insights Interview series is an occasional feature sharing interviews and conversations between National Digital Stewardship Alliance Innovation Working Group members and individuals involved in projects related to preservation, access, and stewardship of digital information. In this installment, Gary Wright of NDSA member Corporation of the President of The Church of Jesus Christ of Latter-day Saints interviews Jason Pierson, Senior Product Manager of Digital Preservation for FamilySearch International.
Jason, what is FamilySearch all about?
Since 1894, we have been working in close partnership with records custodians around the world to preserve the heritage of mankind. Today, FamilySearch International is the world’s largest family history organization. You can learn more about us at http://www.familysearch.org
That sounds very ambitious. Can you tell us more?
Operated by the Family History Department of The Church of Jesus Christ of Latter-day Saints, FamilySearch provides family history services free of charge to the public through www.familysearch.org or in one of our 4,600 family history centers around the world. Millions of people use FamilySearch records, resources, and services to learn about their family history.
Comprising more than 2.4 million rolls, the FamilySearch collection of microfilmed records is the largest collection of family history records in the world. The collection contains more than 13.1 billion images of historical and vital records collected in more than 100 countries.
Since 1999, FamilySearch has been digitizing this enormous microfilm collection in order to make the records more accessible via the Internet. Also, tens of millions of additional records are being photographed around the world with digital cameras every year. If authorized, these records (both digitized and digitally photographed) are published on the FamilySearch website as they become available.
Because of the microfilm digitization pipeline and ongoing digital capture of additional family history records, FamilySearch is generating multiple petabytes of data each year. All this digital information must be preserved for future generations because of its priceless and enduring value. Within ten years, FamilySearch expects that it will have generated a cumulative archival capacity of more than 100 petabytes.
What implications does the FamilySearch purpose have for preservation, access, and discovery of digital objects?
Our digital preservation system is focused on each of these three areas. We are looking for cutting-edge thinking, as well as emerging best practices around preservation of and access to digital objects that have complex relationships. Our system is one of the few in the world that is preserving petabytes of information on magnetic tape, and it is designed to take advantage of new storage technologies as major advances change the cost/density/value curve. We attend conferences worldwide and share our architecture, design considerations, and technology choices in an effort to advance the entire field.
What is the most interesting aspect of your digital preservation work?
People are often astonished at the volume of information we process—10 to 15 terabytes of record images every day as we scan our microfilm collection and digitally capture and transcribe new records around the world. We currently have 4 petabytes of high resolution images stored in automated tape libraries.
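As a rough check on these figures, the short sketch below converts the quoted daily volume into a yearly total. It assumes decimal units (1 PB = 1,000 TB) and that the 10 to 15 TB rate holds every day of the year; neither assumption comes from the interview, and the function name is only illustrative.

```python
# Back-of-the-envelope conversion of a daily capture rate into a yearly total.
# Assumptions (not stated in the interview): decimal units and a constant
# rate sustained on all 365 days of the year.
TB_PER_PB = 1000


def yearly_petabytes(tb_per_day, days_per_year=365):
    """Return petabytes per year for a given terabytes-per-day rate."""
    return tb_per_day * days_per_year / TB_PER_PB


low = yearly_petabytes(10)   # lower bound quoted in the interview
high = yearly_petabytes(15)  # upper bound quoted in the interview
print(f"Roughly {low} to {high} PB per year")
```

Under those assumptions the daily rate works out to roughly 3.7 to 5.5 petabytes per year, which fits the description of "multiple petabytes of data each year."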
What are your most pressing challenges?
Thinking ahead 50 to 100 years and trying to ensure that we hand this precious archive over to the next generation in pristine condition. We want those not yet born to have free and easy access to their family history information.
How do you think your challenges might change in the future?
The core challenges will not fundamentally change. However, the pace of change will likely increase, as will the volume of information being captured and preserved. The scalability solutions we are building today may not work in 10, 20, or 50 years. Cost is another challenge that will continue to be an issue. As more data is preserved, the cost to maintain that data, to transform file formats, and to refresh storage media will only go up. Future generations will have to make decisions about what to preserve and how to preserve it.
Do you use cloud services for digital preservation?
We are building internally managed cloud services to support our computing environments.
What are the most important things we can learn from FamilySearch?
FamilySearch is pioneering massive scalability using nearline storage and system components that are designed to be upgraded over time as technology evolves. Our digital preservation system will provide a path for others to follow, and from which they can learn as their own digital preservation demands increase. We hope to help them better understand scalability and longevity considerations.
How does FamilySearch contribute to other initiatives besides digital preservation?
We are heavily involved in imaging standards and quality, and in genealogy-related technology and concepts. Working closely with our system providers, we are influencing their plans for addressing our scalability needs.
How does FamilySearch contribute to digital preservation innovation?
We stay active on many fronts to drive innovation. We have internal architects and engineers who tackle many challenges. We partner with commercial vendors and ask them to help us solve problems. We also work in the open source space and with community organizations to share information and innovations.
Based on your experience, what kinds of work would you like to see other organizations tackle?
“Sheer preservation” is a term that refers to implementing nearly invisible tactics throughout the lifecycle of information. We believe it is important for all organizations involved with information to add features to their products and processes that take digital preservation into consideration—especially at information creation. Digital preservation needs to be brought to light in all industries, all technologies, and in all relevant communities so that adequate considerations can be inserted at the point of least cost. Digital preservation must become the concern of all—not just the archivists at the end of the information food-chain.
Comments (5)
Want to know more, but link to familysearch.org in this article gives me an error.
Thanks.
Carol
Hi Carol,
Thanks. We just corrected the bad link; it should be working now.
Now this is most intriguing… I am curious though, what sort of business model do you use? This sounds like it's a very costly exercise; just how do you fund your activities?
The eternal nature of families is core to our doctrinal beliefs and underpins our desire to learn about our ancestors. FamilySearch is funded through donations by members of The Church of Jesus Christ of Latter-day Saints. Our family history resources are available to all, regardless of religious affiliation, made possible by our long-standing relationships with archivists, governments, churches, parishes and other record custodians around the world. It is a labor of love, which extends far beyond our own members, and includes volunteers worldwide who index image records to make them searchable and accessible.
What an amazing platform and great mission for a non-profit organization. The vision of an interconnected world family tree akin to other massively distributed knowledge projects (think Wikipedia) really is a world changing and noble undertaking in my view.
For a number of years I had been working on private family tree record keeping, and as soon as I discovered FamilySearch.org, I switched to using it only – the benefits of working with other distant relatives became immediately obvious.
My ONLY persistent concern with this approach is discussed to some degree in this article, and I would be very interested in learning more, namely the “longevity plan” of this archive, public access, etc. Are there any commitments, contracts, governing documents of the Family Search organisation which we can rely on even in the most extreme circumstances to trust that this collection will remain intact and available to the general public? Equally important, is there any plan to allow users to generate completely offline “archives” of the records and materials they themselves have added to the system, or more ideally that ANYONE has contributed to records on their own trees as a failsafe / assurance?
I can say for myself, that my efforts in contributing to the platform would be redoubled if I had the ability to take a full export of records on my own tree to offline media to know that my hard work is safe and under my own control should circumstances change.