This is a guest post by Kate Murray, IT Specialist in the Library of Congress’s Digital Collections and Management Services.
The Library of Congress’s Sustainability of Digital Formats Web site (known informally as “Formats”) details and analyzes the technical aspects of digital formats, with a focus on supporting strategic planning for digital content, especially collection policies. Launched in 2004, Formats provides in-depth descriptions of over 400 formats sorted into content categories: still image, sound, textual, moving image, Web archive, datasets, geospatial and generic formats, with more to come. Other publicly available format assessment resources exist in the community at large, including the British Library Format Assessments (via the DPC wiki) and Harvard Library’s Digital Preservation Format Assessments, to name just a few (see the iPRES 2016 workshop on Sharing, Using and Re-using Format Assessments for more examples). What makes the LC Formats resource unique, in part, is that we document relationships between formats (subtypes and the like), especially the way wrappers and encodings interact when used together, which we call a “combo pack.”
Formats is also well known for the criteria it uses to evaluate formats, including the seven sustainability factors and the quality and functionality factors, which vary by content category.
Not ones to rest on our laurels, we are excited to announce recent updates and improvements to Formats. First, the site has moved from digitalpreservation.gov/formats to a new URL, loc.gov/preservation/digital/formats. Each page has a page-level redirect that takes users to the correct location. Content at the old URL is no longer being revised, so be sure to update your bookmarks to get the most current information.
One of the new additions to Formats is the PRONOM Persistent Unique Identifier (PUID) and Wikidata Title ID information for each format, which helps establish the correct relationships to these other format assessment resources. The open-source format identification tool Siegfried, for example, includes both LC’s format document descriptions and PRONOM information in its results. It’s important to recognize that there isn’t always a perfect match across resources, for a variety of reasons (the versions may not be described consistently, for example), but when there is a good match, we’ll include it. It’s more complicated than just looking for matching file extensions like .tif or .wav. Correctly pairing like with like takes intellectual research, so it takes a bit of time. We’re working our way through the list of format document descriptions and adding identifiers as we can; it’s an ongoing project.
In addition, we’ve added links to formats listed in the Recommended Formats Statement to better connect these related resources.
As a reminder, all format document descriptions are available for download in XML, either as individual pages or as the entire set in a single zip file.
Formats continues to evolve to meet the changing needs of the Library and the digital preservation community. Stay tuned for announcements about new format descriptions; we have much more to come.