From a preservation standpoint, some digital file formats are better than others. The basic issue is how readable a format remains over time and through successive waves of technological change. The ideal format will convey its content accurately regardless of advances in hardware, software and other aspects of information technology.
Over the last several years, the Library has developed a web resource to help guide the selection of file formats for preservation. Sustainability of Digital Formats: Planning for Library of Congress Collections outlines a number of sustainability factors that bear on how well formats are expected to hold up for long-term preservation.
The factors are listed below, in brief.
- Disclosure. Degree to which complete specifications and tools for validating technical integrity exist and are accessible to those creating and sustaining digital content.
- Adoption. Extent of acceptance by the primary creators, disseminators or users of information resources.
- Transparency. Openness to direct analysis with basic, non-proprietary tools.
- Self-documentation. Inclusion of metadata needed to render the data as usable information or understand its context.
- External dependencies. Degree to which a particular format depends on particular hardware, operating system, or software for rendering or use and the predicted complexity of dealing with those dependencies in future technical environments.
- Impact of patents. Extent to which patents and their licensing terms may inhibit the ability of archival institutions to sustain content.
- Technical protection mechanisms. Embedded capabilities to restrict use in order to protect intellectual property.
Applying these factors to current format options has identified several flavors of TIFF and JPEG 2000 as preferred formats for scanned digital images. PDF/A-1, PDF for Long-term Preservation, is also in the mix.
The Library is also working with the Federal Agencies Digitization Guidelines Initiative to define common guidelines, methods and practices for digitizing historical content in a sustainable manner. The Federal Agencies Still Image Digitization Working Group, a subgroup of the larger initiative, is concentrating its efforts on image content such as books, manuscripts, maps, and photographic prints and negatives.