Even More Fun with File Formats!

Today’s guest post is from Kate Murray, Marcus Nappier, and Liz Holdzkom of the Digital Collections Management & Services Division at the Library of Congress.

Fun with File Formats is back with another installment! Our first two blog posts from December 2021 and June 2022 were very popular with readers of The Signal. No surprise that there are lots of file format fans out there and we are here for it. Let’s catch all y’all up on what we’ve been up to over the last six months or so.

New Format Descriptions and Analysis

As usual, we’ve been hard at work on our file format descriptions (or FDDs), which involve many hours of technical research, fact checking and generally nerdy deep dives into format specifications and standards. For the first time, we’ve decided to publish our 2022-2023 workplan, which lists format descriptions that are expected to be added to the site in the coming months. It is not definitive, as priorities sometimes change; instead, it is an overall indication of planned work.

Because our work is driven by Library of Congress collections, projects and interests, we are working to fill a few gaps from the Recommended Formats Statement in two major areas: Geospatial and, separately, 3D, Virtual Reality and related design formats. Accessibility features for supporting audio players and screen readers as well as audiovisual captioning and subtitle formats are additional areas of focus and touch on related projects with the Federal Agencies Digital Guidelines Initiative’s Accessibility Features for Digital Audiovisual Collections Content. Then there’s the catch-all “miscellaneous” group that covers a variety of topics including video formats (WebP, VP8, VP9 are all coming soon), as well as WACZ for web archiving and many others.

Since our last update in June 2022, we’ve published nine new FDDs, including KML (formerly Keyhole Markup Language) Zipped (FDD 547); Well-known Text (FDD 548); BRF or Braille Ready Format (FDD 551), which, in case you were wondering, is pronounced “brif” with a short i sound, rhyming with cliff; HBL or BrailleSense File Format (FDD 553); JPEG XS Encoding (FDD 545) and JPEG XS File Format (FDD 546); and HTJ2K (High-Throughput JPEG 2000) Encoding (FDD 565) and HTJ2K File Format (FDD 566).

Major contributors to our FDD work in the last few months are contractors Joel Lawhead, Allyson Caridad and Jessica Herr from NVision Solutions. This crackerjack team is working with us this year to provide technical research for new and existing format descriptions.

To help everyone (including us) keep track of when new FDDs hit the site, we’re introducing a “publication log” with basic FDD information like short and long name, number, HTML and XML URLs, along with the initial publication date. This isn’t exactly the same as a change log: we won’t register every edit to an existing FDD because we make small adjustments all the time. If there’s a major change, we update the “Last significant FDD update” field in the upper right corner. The Publication Log will just record when we add brand new FDDs to the website, starting from June 2022.

Figure 1: How to tell when an FDD was updated with significant changes.
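
For readers who like to see things in code, here is a minimal sketch of how one row of the publication log described above could be modeled. It is an illustration only, not the Library's actual data format: the class name, field names, the zero-padded identifier style, the example.org URLs, the short name and both dates are hypothetical placeholders. Only the FDD numbers and long names come from this post.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PublicationLogEntry:
    """One row of a hypothetical publication log (field names are assumptions)."""
    fdd_number: int   # e.g. 547 for "FDD 547"
    short_name: str
    long_name: str
    html_url: str
    xml_url: str
    published: date   # initial publication date only; later edits are not logged

    @property
    def fdd_id(self) -> str:
        # Zero-padded identifier; this naming convention is an assumption.
        return f"fdd{self.fdd_number:06d}"


def published_since(entries, cutoff):
    """Return entries first published on or after `cutoff`, oldest first."""
    return sorted(
        (e for e in entries if e.published >= cutoff),
        key=lambda e: e.published,
    )


# Placeholder rows: the URLs, the short name "KMZ" and the dates are made up.
log = [
    PublicationLogEntry(
        547, "KMZ", "KML (formerly Keyhole Markup Language) Zipped",
        "https://example.org/fdd000547.html",
        "https://example.org/fdd000547.xml",
        date(2022, 7, 1),
    ),
    PublicationLogEntry(
        552, "WordStar", "WordStar",
        "https://example.org/fdd000552.html",
        "https://example.org/fdd000552.xml",
        date(2022, 9, 1),
    ),
]

recent = published_since(log, date(2022, 8, 1))
```

Because the log records only the initial publication date, a query like `published_since` answers "what is new on the site?" but says nothing about later edits to an existing FDD.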

Speaking of new FDDs, one of our favorites is WordStar (FDD 552) authored by our outstanding 2022 Junior Fellows Dan Hockstein and Mari Allison! (We are waving to you through the interwebs!) While their internship has ended and they are both off on exciting new adventures, you can read all about some of their research work in “Wow, it’s WordStar!” Exploring a Beloved Early Word Processor and its Many Formats.

Strategic Planning and Site Improvements

In addition to their file format research, Mari and Dan focused on strategic planning for Sustainability of Digital Formats.

As part of their work, Mari and Dan created user personas based on user groups identified in the Library’s Strategic Plan and developed a questionnaire for external users. The questionnaire was distributed through listservs and the Library’s social media, with a goal of gathering information on user demographics, usage of the Formats site, preferences for functionality, and other feedback that users chose to share. In addition to the questionnaire, our Junior Fellows conducted more in-depth informational interviews with about ten power users of the Sustainability of Digital Formats site to help us round out our understanding of our diverse users. We received a lot of positive feedback, including one user advocating for a permanent way to show support: “I love your site! Its detailed research and links to resources makes it the best site for performing format research. I tell my students to tattoo the URL.”

While we do not suggest tattoos in the name of formats (URLs do change, you know?), we were happy to hear about what’s currently working with the site. But what we were really interested in was the feedback on what could work better.

Thanks to Mari and Dan and their interviews, we now have a good list of some short and long-term updates that are recommended for the site.

For the short-term we plan to:

  • Review the homepage to make some layout changes that would prioritize the most important and useful information.
  • Make sure our contact information is easier to find.
  • Establish consistency and improved readability with more subheadings in the FDD descriptions.
  • Make the XML versions of the FDDs more visible.
  • Improve access to our FDD “Explanation of Terms” page by adding more links from the homepage and other pages.
  • Post our planned areas of research and specific upcoming FDDs.
Figure 2: The Explanation of Terms page is a key feature to understanding our FDDs.

We also have some long-term ideas that will take more time and consideration to implement such as making our data more actionable through APIs or linked data.

iPres 2022 Format Registries Workshop

Figure 3: Swag bags from iPres 2022 organized by the Digital Preservation Coalition. Credit: Digital Preservation Coalition on Flickr, CC BY-NC-SA 2.0.

Another recent fun project was participating in a file format research-focused workshop at iPres 2022 in Glasgow, Scotland with other leaders of the international file format community. Registering our preservation intentions: A collaborative workshop on digital preservation registries (see iPres proceedings, pp. 503-504) brought together colleagues from the Digital Preservation Coalition, National Archives UK’s PRONOM, National Archives and Records Administration (NARA), The British Library, Yale University Library, Ravensburger AG and many more “to provide a space for discussion on the future of the preservation registries landscape, identifying gaps in provision, understanding changing user needs, and exploring opportunities for collaboration.” We had some theme-setting presentations, but the most valuable part was the informal conversation and community building. There’s always plenty of work to do with file formats, so we want to broaden participation and lower barriers for knowledge sharing and contributions.

Wrap Up

As always, we welcome your file format related questions and comments. Leave a comment here or send us a note at [email protected]. Until then, #fileformats4eva (which, honestly, would make a pretty good tattoo …).

Comments (2)

  1. Great post – always impressive to hear about/see all your updates. Just want to take a moment to say a big “THANK YOU” for all the work and effort the LoC-Team is putting into the FDDs! They’re such an important and valuable resource for the digital preservation community worldwide!

  2. Great to see the ongoing activity with this resource. It got rolling in 2004 — hard not to wonder how many of the 500-odd formats (and “sub-formats”) are still rolling too. Indeo? Flash? They and some others may no longer be alive and kicking but — of course — that’s why the FDDs have value for preservation specialists. Keep up the good work!
