Note: We will occasionally post updated material to The Signal that was previously published only on our website. The following is an article from our “Meeting the Challenge” series, October 2010.
Behind every digital object, there is usually metadata with descriptive information about the object. But the library world is all too aware that metadata for access and discovery is no longer enough. Digital library professionals are now looking to the future with an eye toward preservation: they need to preserve not only the digital objects themselves but also the valuable metadata that goes along with them.
Enter PREMIS, which stands for Preservation Metadata: Implementation Strategies. According to the publication Understanding PREMIS written by Priscilla Caplan and issued by the Library of Congress, preservation metadata “supports activities intended to ensure the long-term usability of a digital resource.”
The Library of Congress sponsors the PREMIS Maintenance Activity and thereby promotes the use and development of this preservation metadata standard as a regular part of the digital library process.
The motivation for PREMIS comes from the practical needs of a digital preservation repository, which must keep important information about its digital objects to enable long-term management. As stated in Understanding PREMIS, “the primary uses of PREMIS are for repository design, repository evaluation and exchange of archived information packages among preservation repositories.”
So why is this important? Rebecca Guenther, senior networking and standards specialist at the Library of Congress, illustrates this by the following comparison: “In addition to being able to find books, you need to be able to bind the books so they don’t fall apart and perform other preservation actions that keeps their pages readable and intact.” PREMIS provides the information to ensure that the object can be preserved – as a sort of digital “binding” – to keep the items, through the metadata, useable over time.
PREMIS grew out of an effort by the cultural heritage community to build on the Open Archival Information System repository model for digital resources. Discussions about the need for preservation metadata led to the formation of a PREMIS Working Group in 2003, an international collaboration of experts involved in digital preservation activities, jointly sponsored by OCLC and RLG. The initial result was a tool that has been in increasing use ever since – the PREMIS Data Dictionary for Preservation Metadata.
The Data Dictionary is a comprehensive resource for the implementation of preservation metadata in digital library systems. It consists of a core set of standardized data elements that repositories are recommended to record in order to manage and perform the preservation function. That function includes actions to keep digital objects useable over time, keeping them viable, readable, displayable and intact, all for the purpose of future access.
The digital preservation community recognized the importance of the Data Dictionary from the start: in 2005 the Working Group received the prestigious Digital Preservation Award from the Digital Preservation Coalition in the UK, and a year later it was given the Preservation Publication Award by the Society of American Archivists. The judges for this last award noted, “The work is intellectually sophisticated, groundbreaking, truly collaborative and international in scope, and is of great significance for the archival preservation community.”
To illustrate the general need for preservation metadata, consider that certain file formats can become obsolete and unreadable by current applications. Keeping such files accessible would require either transforming older formats into newer ones (migration) or reproducing the original experience with newer technology (emulation). To succeed, both strategies require technical metadata about the original files, information about the older hardware and software they ran on, and a record of what actions have been performed on them: in other words, preservation metadata.
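The migration scenario above can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the class and field names are hypothetical and are not PREMIS element names, and a real repository would follow the Data Dictionary itself. It shows a record that carries a file's format, the software it originally depended on and a history of actions, plus a migration step that logs what was done.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class PreservationEvent:
    """An action performed on a file, such as a format migration."""
    event_type: str
    event_date: date
    outcome: str


@dataclass
class PreservedFile:
    """Illustrative record; field names are hypothetical, not PREMIS elements."""
    identifier: str
    file_format: str   # technical metadata about the file
    created_with: str  # software/hardware the original depended on
    events: list[PreservationEvent] = field(default_factory=list)


def needs_migration(record: PreservedFile, obsolete_formats: set[str]) -> bool:
    """Flag a file whose format is on the repository's obsolete list."""
    return record.file_format in obsolete_formats


def migrate(record: PreservedFile, new_format: str, when: date) -> PreservedFile:
    """Return a record for the migrated copy, with the action logged."""
    event = PreservationEvent(
        event_type="migration",
        event_date=when,
        outcome=f"{record.file_format} -> {new_format}",
    )
    return PreservedFile(
        identifier=record.identifier + "-migrated",
        file_format=new_format,
        created_with=record.created_with,
        events=record.events + [event],  # original record is left unchanged
    )


if __name__ == "__main__":
    original = PreservedFile("obj-001", "WordStar 4.0", "WordStar on MS-DOS")
    if needs_migration(original, {"WordStar 4.0"}):
        copy = migrate(original, "PDF/A-1b", date(2010, 10, 1))
        print(copy.file_format, [e.outcome for e in copy.events])
```

The sketch only records that a migration happened; the actual format conversion is out of scope. The point is that without the format, environment and event history kept alongside the object, a repository could not decide when to act or show what has already been done.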
The implementation of PREMIS has matured since the Data Dictionary was first issued, and PREMIS is increasingly seen as a key tool for developing a preservation infrastructure. Even where institutions are only beginning to adopt it, PREMIS has had an impact on their overall preservation strategies.
Government agencies such as the Government Printing Office and the National Archives and Records Administration have adopted or are planning to adopt PREMIS. Ex Libris, a library information technology company, has also integrated PREMIS into its preservation product. The translation of Understanding PREMIS into Spanish, Italian and German, and of the entire Data Dictionary into Japanese, demonstrates broad international appeal. In some countries, use of the Data Dictionary is mandated for digital objects in cultural heritage digital repositories.
The Library’s Network Development and Standards Office hosts the PREMIS website, which provides documentation and discussion lists, and serves as a central information point for all things PREMIS. The initial working group has now been replaced by the ongoing PREMIS Maintenance Activity, which includes a PREMIS Editorial Committee. This activity supports the maintenance of the Data Dictionary as well as the XML schema, maintains centralized discussion groups and forums, and provides tutorials and workshops on PREMIS. The PREMIS folks are not without a sense of humor, either – part of this ongoing activity is the PREMIS Implementers Group (the “PIG”), which hosts a wiki called “The PigPen.”
The Library of Congress contracted with the Florida Center for Library Automation to build a tool that supports PREMIS implementation by automating the preservation metadata creation process. The result was the PREMIS in METS Toolbox, a free tool that is available on SourceForge.
According to Rebecca Guenther, there are many near-term goals for the overall PREMIS activity, including revisions of the Data Dictionary based on user experiences, explorations of changes to the underlying data model, experimentation with the exchange of objects between repositories, and implementation in some digital projects at the Library of Congress. Guenther says, “A lot of these data elements are already available from the objects, but the challenge is capturing them and putting them in one place to be used for the preservation process.”
(UPDATE: Since this article was originally published, an updated version of the Data Dictionary has become available, as well as a new OWL ontology. Rebecca Guenther has left the Library and is now serving as an independent consultant to the project. See the PREMIS page for the latest information.)