B is for Bit Preservation

Continuing the alphabetical series of digital preservation topics.

Recently one of our California partners remarked on the irony of earthquakes striking Colorado and Maryland, home to two of our partner sites in areas not prone to earthquakes. Keeping copies of digital collections in geographically diverse areas is one technique used for bit preservation. Fortunately, no data was lost during those events.

Bits are the basic unit of digital information. Most organizations preparing for the long-term care of digital content distinguish a basic level of methods and services, known as bit preservation, from a more complex set of services that support the display, context, and interpretation of digital objects, sometimes called functional preservation. For example, bit preservation of a collection of digital photographs covers secure, well-managed storage of the digital files, including monitoring them for changes over time.

Bit preservation does not address the very real need for appropriate software to display and use the photographs, nor for descriptions that help users understand when, where, and how the photographs were taken and, at an even more complex level, their subjects and their context within larger events. However, without attention to bit preservation, there will be nothing left to display or use in the long term.

Several common practices for handling and storing digital content aim to mitigate the risk of damage to, and loss of, data.

Multiple Copies. The first practice is to create more than one complete copy of the digital information. Recommendations for the number of copies range from two to six, and in practice the number is often determined by the resources available to the organization managing the content. These copies are distinct from the backups made as part of routine data center operations; they are kept so that damaged or lost files can be replaced.
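To make the idea of copies replacing damaged files concrete, here is a minimal Python sketch, assuming every copy of a file is reachable as a local path (for example, through network mounts at each site). It hashes each copy with SHA-256, treats the value that a strict majority of copies agree on as correct, and overwrites any missing or disagreeing copy from a good one. The function names and the majority-vote rule are illustrative choices, not a description of any particular preservation system.

```python
import hashlib
import shutil
from collections import Counter
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute a SHA-256 checksum, reading the file in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def repair_from_copies(copies: list[Path]) -> list[Path]:
    """Restore missing or damaged copies of one file from the majority version.

    Returns the copies that were overwritten. Raises ValueError when no
    strict majority of all copies agrees, so a person can review the case.
    """
    hashes = {p: sha256_of(p) for p in copies if p.exists()}
    if not hashes:
        raise ValueError("no surviving copy")
    winner, votes = Counter(hashes.values()).most_common(1)[0]
    if votes * 2 <= len(copies):
        raise ValueError("no majority among copies; manual review needed")
    good = next(p for p, h in hashes.items() if h == winner)
    repaired = []
    for p in copies:
        if hashes.get(p) != winner:  # missing (None) or different checksum
            shutil.copy2(good, p)
            repaired.append(p)
    return repaired
```

Requiring a strict majority is a deliberate choice: with only two copies that disagree, the sketch cannot tell which one is damaged and stops for human review, which is one argument for keeping three or more copies.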

Diverse Environments. The principles for storing these copies call for diversity in systems, organizations, and geography. For the long term, the best approach is to keep multiple copies on hardware from different manufacturers, running different operating systems, in different locations. Heterogeneous systems hedge against flaws in any particular hardware or operating system, and geographic diversity guards against destructive events, whether natural or political. Ideally, the stored copies would also be managed by different organizations, benefiting from a wider range of policies and practices. This approach has led to the development of cooperative partnerships for the distributed storage and management of digital collections.
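One way to apply these principles is as a checklist over an inventory of copies. The Python sketch below assumes each copy is described by four hypothetical fields (location, managing organization, hardware manufacturer, operating system) and reports any dimension with fewer than a chosen number of distinct values. The field names, the threshold, and the sample sites and vendors are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """One stored copy of a collection (fields are illustrative)."""
    location: str      # geographic region holding the copy
    organization: str  # institution managing the copy
    hardware: str      # storage hardware manufacturer
    os: str            # operating system of the storage host


DIMENSIONS = ("location", "organization", "hardware", "os")


def diversity_gaps(replicas: list[Replica], minimum: int = 2) -> list[str]:
    """Return the dimensions that have fewer than `minimum` distinct values."""
    gaps = []
    for dim in DIMENSIONS:
        distinct = {getattr(r, dim) for r in replicas}
        if len(distinct) < minimum:
            gaps.append(dim)
    return gaps


# Hypothetical sites and vendors, for illustration only.
replicas = [
    Replica("California", "Library A", "VendorX", "Linux"),
    Replica("Colorado", "Library A", "VendorY", "Linux"),
    Replica("Maryland", "Partner B", "VendorX", "FreeBSD"),
]
print(diversity_gaps(replicas))             # → []
print(diversity_gaps(replicas, minimum=3))  # → ['organization', 'hardware', 'os']
```

Raising `minimum` shows how quickly full diversity becomes expensive: three geographically separate sites are not enough on their own if two of them share an organization, a vendor, and an operating system.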

Cooperative Partnerships. Very few organizations can afford such a complex array of machines and configurations on their own, so they have explored a variety of relationships to achieve redundant storage. Some are forming cooperative alliances that use systems designed to distribute and audit multiple copies across diverse environments. Others partner with one or two organizations, either under contract or through a mutual services agreement. More recently, organizations have begun looking at cloud services for at least one of their copies in an effort to reduce costs. Each of these approaches has benefits and disadvantages. Every organization embarking on digital preservation must have a firm grasp of its capacity to leverage its own systems and its relationships with other organizations to get the job done.

Audit and Inventory. Two components of good management are auditing the data at regular intervals, especially when it is moved or migrated to new hardware, and maintaining an accurate inventory of where and when the files were stored. Auditing is most often done with tools that compute a hash value, or checksum, from a file's contents. If any part of the file changes, a checksum computed later will not match the recorded one, flagging an error. This approach is especially useful when moving content from one device to another, or to another organization. The inventory is usually managed with a database, a locally developed tool, or a set of repository services.
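The audit-and-inventory routine described above can be sketched in Python with only the standard library. This is a minimal sketch, assuming files live on a locally accessible filesystem, SHA-256 as the checksum algorithm, and an SQLite table as the inventory; the function names (`verified_copy`, `record`, `audit`) and the table layout are hypothetical, not the schema of any particular repository system.

```python
import hashlib
import shutil
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute a SHA-256 checksum, reading the file in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verified_copy(src: Path, dst: Path) -> str:
    """Copy src to dst and confirm the checksums match before trusting the copy."""
    expected = sha256_of(src)
    shutil.copy2(src, dst)
    if sha256_of(dst) != expected:
        raise OSError(f"checksum mismatch after copying {src} to {dst}")
    return expected


def open_inventory(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a minimal inventory of where and when files were stored."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS inventory ("
        " path TEXT PRIMARY KEY,"    # where the file is stored
        " sha256 TEXT NOT NULL,"     # checksum recorded at storage time
        " stored_at TEXT NOT NULL)"  # when it was stored (UTC, ISO 8601)
    )
    return conn


def record(conn: sqlite3.Connection, path: Path) -> None:
    """Add or update a file's inventory entry with its current checksum."""
    conn.execute(
        "INSERT OR REPLACE INTO inventory (path, sha256, stored_at) VALUES (?, ?, ?)",
        (str(path), sha256_of(path), datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


def audit(conn: sqlite3.Connection) -> list[str]:
    """Re-hash every inventoried file; return paths that are missing or changed."""
    failures = []
    rows = conn.execute("SELECT path, sha256 FROM inventory ORDER BY path").fetchall()
    for path, expected in rows:
        p = Path(path)
        if not p.exists() or sha256_of(p) != expected:
            failures.append(path)
    return failures
```

In use, `verified_copy` would run at transfer time with `record` immediately afterward, while `audit` would be scheduled at regular intervals; a non-empty result is the signal to restore the affected files from another copy.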

Even basic digital preservation requires significant resources and a variety of skills and knowledge. Bit preservation is a maturing area of practice, thanks to a growing community that shares its expertise. Tools and protocols to support the transfer and auditing of bits are available, and work to improve storage systems continues. Bit preservation is the building block for the more complete set of practices and processes that ensure the survival of digital content over time. As digital content managers and stewards, we can take practical steps now to keep digital content viable.

To read more about the thinking behind the bit preservation concept and its challenges, see A Framework for Object Preservation in Digital Repositories and Bit Preservation: A Solved Problem?

