Even More Fun with File Formats!

Today’s guest post is from Kate Murray, Marcus Nappier, and Liz Holdzkom of the Digital Collections Management & Services Division at the Library of Congress.


Fun with File Formats is back with another installment! Our first two blog posts from December 2021 and June 2022 were very popular with readers of The Signal. No surprise that there are lots of file format fans out there and we are here for it. Let’s catch all y’all up on what we’ve been up to over the last six months or so.

New Format Descriptions and Analysis

As usual, we’ve been hard at work on our file format descriptions (or FDDs), which involve many hours of technical research, fact checking and generally nerdy deep dives into format specifications and standards. For the first time, we’ve decided to publish our 2022-2023 workplan, which lists format descriptions that are expected to be added to the site in the coming months. It is not definitive, as priorities sometimes change; instead, it is an overall indication of planned work.

Because our work is driven by Library of Congress collections, projects and interests, we are working to fill a few gaps from the Recommended Formats Statement in two major areas: Geospatial and, separately, 3D, Virtual Reality and related design formats. Accessibility features for supporting audio players and screen readers as well as audiovisual captioning and subtitle formats are additional areas of focus and touch on related projects with the Federal Agencies Digital Guidelines Initiative’s Accessibility Features for Digital Audiovisual Collections Content. Then there’s the catch-all “miscellaneous” group that covers a variety of topics including video formats (WebP, VP8, VP9 are all coming soon), as well as WACZ for web archiving and many others.

Since our last update in June 2022, we’ve published nine new FDDs, including KML (formerly Keyhole Markup Language) Zipped (FDD 547); Well-known Text (FDD 548); BRF or Braille Ready Format (FDD 551), which, in case you were wondering, is pronounced “brif” with a short i sound, rhyming with cliff; HBL or BrailleSense File Format (FDD 553); JPEG XS Encoding (FDD 545) and JPEG XS File Format (FDD 546); and HTJ2K (High-Throughput JPEG 2000) Encoding (FDD 565) and HTJ2K File Format (FDD 566).

Major contributors to our FDD work in the last few months are contractors Joel Lawhead, Allyson Caridad and Jessica Herr from NVision Solutions. This crackerjack team is working with us this year to provide technical research for new and existing format descriptions.

To help everyone (including us) keep track of when new FDDs hit the site, we’re introducing a “publication log” with basic FDD information like short and long name, number, and HTML and XML URLs, along with the initial publication date. This isn’t exactly the same as a change log: we won’t register every edit to an existing FDD, since we make small adjustments all the time. If there’s a major change, we update the “Last significant FDD update” field in the upper right corner. The Publication Log will just record when we add brand new FDDs to the website, starting from June 2022.
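For readers who like to see things as data, here is a minimal sketch of what one publication log entry might look like and how you could pull out the FDDs added since June 2022. This is an illustration, not the actual format of the log: the class name, field names and helper function are hypothetical, and the short names, URLs and dates are made-up placeholders. Only the long names and FDD numbers come from this post.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PublicationLogEntry:
    """One row of a hypothetical publication log (field names are assumptions)."""
    short_name: str
    long_name: str
    fdd_number: int
    html_url: str
    xml_url: str
    first_published: date


def published_since(entries, cutoff):
    """Return entries first published on or after cutoff, oldest first."""
    recent = [e for e in entries if e.first_published >= cutoff]
    return sorted(recent, key=lambda e: e.first_published)


# Placeholder data: short names, URLs and dates are invented for illustration.
log = [
    PublicationLogEntry(
        "WKT", "Well-known Text", 548,
        "https://example.org/fdd548.html", "https://example.org/fdd548.xml",
        date(2022, 8, 1),
    ),
    PublicationLogEntry(
        "OLD", "An older, made-up format", 1,
        "https://example.org/fdd001.html", "https://example.org/fdd001.xml",
        date(2021, 1, 1),
    ),
    PublicationLogEntry(
        "KMZ", "KML Zipped", 547,
        "https://example.org/fdd547.html", "https://example.org/fdd547.xml",
        date(2022, 7, 1),
    ),
]

# List only the entries added since the log's starting point of June 2022.
for entry in published_since(log, date(2022, 6, 1)):
    print(f"FDD {entry.fdd_number}: {entry.long_name}")
```

Because the log only records initial publication, a single date per entry is enough here. Tracking later edits would need a separate change history, which is exactly what the log is not meant to be.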

Figure 1: How to tell when an FDD was updated with significant changes.

Speaking of new FDDs, one of our favorites is WordStar (FDD 552) authored by our outstanding 2022 Junior Fellows Dan Hockstein and Mari Allison! (We are waving to you through the interwebs!) While their internship has ended and they are both off on exciting new adventures, you can read all about some of their research work in “Wow, it’s WordStar!” Exploring a Beloved Early Word Processor and its Many Formats.

Strategic Planning and Site Improvements

In addition to their file format research, Mari and Dan focused on strategic planning for Sustainability of Digital Formats.

As part of their work, Mari and Dan created user personas based on user groups identified in the Library’s Strategic Plan and developed a questionnaire for external users. The questionnaire was distributed through listservs and the Library’s social media, with a goal of gathering information on user demographics, usage of the Formats site, preferences for functionality, and other feedback that users chose to share. In addition to the questionnaire, our Junior Fellows conducted more in-depth informational interviews with about ten power users of the Sustainability of Digital Formats site to help us round out our understanding of our diverse users. We received a lot of positive feedback, including one user advocating for a permanent way to show support: “I love your site! Its detailed research and links to resources makes it the best site for performing format research. I tell my students to tattoo the URL.”

While we do not suggest tattoos in the name of formats (URLs do change, you know?), we were happy to hear about what’s currently working with the site. But what we were really interested in was the feedback on what could work better.

Thanks to Mari and Dan and their interviews, we now have a good list of some short and long-term updates that are recommended for the site.

In the short term, we plan to:

  • Review the homepage to make some layout changes that would prioritize the most important and useful information.
  • Make sure our contact information is easier to find.
  • Establish consistency and improved readability with more subheadings in the FDD descriptions.
  • Make the XML versions of the FDDs more visible.
  • Improve access to our FDD “Explanation of Terms” page by adding more links from the homepage and other pages.
  • Post our planned areas of research and specific upcoming FDDs.

Figure 2: The Explanation of Terms page is a key feature to understanding our FDDs.

We also have some long-term ideas that will take more time and consideration to implement, such as making our data more actionable through APIs or linked data.

iPres 2022 Format Registries Workshop

Figure 3: Swag bags from iPres 2022 organized by the Digital Preservation Coalition. Flickr: https://www.flickr.com/photos/dpconflickr/52444030255/. Credit: Digital Preservation Coalition, CC BY-NC-SA 2.0.

Another recent fun project was participating in a workshop focused on file format research at iPres 2022 in Glasgow, Scotland, with other leaders of the international file format community. “Registering our preservation intentions: A collaborative workshop on digital preservation registries” (see iPres proceedings p. 503-504) brought together colleagues from the Digital Preservation Coalition, National Archives UK’s PRONOM, National Archives and Records Administration (NARA), The British Library, Yale University Library, Ravensburger AG and many more “to provide a space for discussion on the future of the preservation registries landscape, identifying gaps in provision, understanding changing user needs, and exploring opportunities for collaboration.” We had some theme-setting presentations, but the most valuable part was the informal conversation and community building. There’s always plenty of work to do with file formats, so we want to broaden participation and lower barriers for knowledge sharing and contributions.

Wrap Up

As always, we welcome your file format related questions and comments. Leave a comment here or send us a note at [email protected]. Until then, #fileformats4eva (which, honestly, would make a pretty good tattoo …).
