Today’s guest post is from Kate Murray, Marcus Nappier, Liz Holdzkom and Genevieve Havemeyer-King of the Digital Collections Management & Services Division at the Library of Congress.
It’s hard to believe that this is our sixth installment of file format fandom blog posts! The traditional sixth anniversary gift is something made from iron but that seems like a pain to giftwrap. So just send us good vibes, comments and questions instead.
Way back in 2021 with our inaugural post, Fun with File Formats, we had only eight content categories with approximately 480 FDDs or format description documents. Three years later, we have 12 content categories and 568 FDDs. Any way you slice it, we have been busy. If you are imagining that meme with the cat frantically typing on a laptop, that is an accurate representation of your favorite Formats team at work.
New Format Descriptions Galore
Without a doubt, the main contributor to our impressive FDD output this year was the now-concluded contract with Myriad Consulting (Ashley Blewer, Frances Harrell and Abi Simkovic). Their good work contributed 32 new entries to the Sustainability of Digital Formats, alongside seven new entries authored by Library of Congress staff.
As Marcus explained in How to Write a FDD in 149 Easy Steps: Learning to Evaluate Digital File Formats, researching and writing an FDD is a collaborative and labor-intensive task so there are lots of fingers in all these FDD pies. It is the ultimate group project.
Since our last blog entry in December 2023, we posted the following new FDDs in specific areas of focus:
- Mobile device support: APK, Android Package (FDD 592); IPA, iOS App Store Package (FDD 593); Apple_ProRAW (FDD 594); and XAP, Silverlight Application Package (FDD 595).
- Packaging, software and installation support: LNK, Microsoft Windows Shortcut File (FDD 596); DS_Store, Desktop Services Store (FDD 597); class, Java Virtual Machine Class File Format (FDD 598); gzip (FDD 599); and bzip2 (FDD 600).
- Email and personal information management (PIM): TNEF, Transport Neutral Encapsulation Format (FDD 485); MLM, GroupWise Email Format (FDD 614); EMLX, Apple Mail Email Format (FDD 615); and vCard, Virtual Card Format (FDD 616).
- Audiovisual: CAF, Apple Core Audio Format (FDD 591); USAC, Unified Speech and Audio Coding (FDD 606); ADM, Audio Definition Model (FDD 607); NSV, Nullsoft Streaming Video (FDD 608); SIB, Sibelius Music Notation Format (FDD 609); IMF_Package, Interoperable Master Format (FDD 535); AV1, AOMedia Video 1 Video Encoding (FDD 541); MXF_RDD48, MXF Archive and Preservation Format Registered Disclosure Document (SMPTE RDD 48) (FDD 543); MXF_GC_FFV1, MXF Generic Container Mapped to FFV1 Encoding (SMPTE RDD 48 Amd 1) (FDD 544); and SWF_Family, Flash SWF File Format Family (FDD 629).
- 3D, VR and animation: 3DM, 3D Model File Format Family (FDD 601); VRML, Virtual Reality Modeling Language Family (FDD 602); XYZ, XYZ Point Cloud (FDD 617); DGN, MicroStation DGN Family (FDD 603); MA, Maya ASCII Scene File Format (FDD 604); and MB, Maya Binary Scene File Format (FDD 605).
- Imaging: DNG_1_6, Adobe Digital Negative (DNG) Version 1.6 (FDD 628) and AVIF, AV1 Image File Format (FDD 540).
- Forensics and disc imaging: Stream, KryoFlux Stream File (FDD 610); DFXML, Digital Forensics XML (FDD 611); MOOF, MOOF Disk Image (FDD 612); and HFE, HxC Floppy Emulator File Format (FDD 613).
- Miscellaneous: Apple_Fork, AppleDouble Resource Fork (FDD 625); Lyrx, ArcGIS Layer File (FDD 626); PEF, Portable Embosser Format (FDD 624); and CDX_Index, CDX Internet Archive Index File (FDD 590).
You can follow along at home with our progress on our workplan page as well as the regularly updated publication log. We’ve also published the draft workplan for the coming year if you want a sneak peek. This one is still very much in flux because we have no external contract support so it’s just us LC chickens on FDD writing and updating duty again.
If you are curious about how and why we research the formats we do, read a refresher on our first post, Fun with File Formats. There’s a method to the madness but the gist is that we focus on formats that are of interest to the Library of Congress because we have them in our collections, such as those listed in the Recommended Formats Statement (more on the RFS below), or we will be adding them to our collections. Another path is that we encounter a format in the wild and want to learn about it in preparation for seeing it in our collections.
Documenting Digital Accessibility Features
Alongside these new format entries, we’ve also started a project, tied to the yearly update of the Recommended Formats Statement (RFS), to document digital accessibility features. The goal is to help RFS Content Teams determine whether a format is preferred or acceptable under the RFS guidance.
The key questions we sought to answer about digital accessibility include:
- Does this format support digital accessibility features such as those described in the W3C Accessibility Principles? For example:
  - Text alternatives for non-text content (such as alt, or alternative, text)
  - Captions and other alternatives for multimedia (and subtitles)
  - Can text content be structured (as in XML) or tagged (as in PDF) for screen readers?
  - Are dataset formats well-structured with page regions and headings identified, tagged or marked up content permitted, tables navigable for a screen reader and forms that can validate entries?
- In what way are accessibility features implemented in the format? Such as:
  - Are there specific metadata tags to indicate accessibility features such as alt text, captions, transcripts and the like?
  - Are embedded closed captions supported?
  - Does the file rely on external data, such as a WebVTT file for caption data?
See the entry for WAVE Audio File Format (FDD001 – the very first FDD ever written!) which states:
Accessibility Features
WAVE files have moderate support for accessibility features. Closed captions and transcriptions can be embedded within the Labeled Text chunk (ltxt) [within the Associated Data Chunk] and identified as such with the ‘Purpose’ label. The specification includes the suggested Purpose value of ‘capt’ for closed-caption text and FADGI defines, in Guidelines for Embedding Metadata in Broadcast WAVE Files, the Purpose value of ‘tran’ for transcription. Overall, the optional Associated Data Chunk field allows a mechanism to provide context for the audio data along the timeline which is helpful, but it is not expressly designed for accessibility impacts.
In common practice, typically WAVE and other audio file content is supported by external caption and subtitle formats such as WebVTT. See W3C’s Making Audio and Video Media Accessible for more general information about accessible sound and moving image media.
This information is reported as part of Self Documentation, one of the seven sustainability factors. Each entry is prefaced with the Accessibility Features header in bold to make it easier to identify the information consistently on the page. It’s important to note that the RFS does not require these accessibility features to be enabled for a format, but our additions provide information on the capacity for the format to support these features.
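For readers who want to see what the WAVE example quoted above looks like at the byte level, here is a minimal Python sketch that looks for Labeled Text (ltxt) chunks inside the Associated Data (adtl) list and reports their Purpose values, such as ‘capt’ or ‘tran’. It is an illustration only, not Library of Congress tooling. It assumes the usual RIFF layout (little-endian chunk sizes, chunks padded to an even length, and a 20-byte fixed header at the start of each ltxt chunk), and the names `iter_chunks`, `find_labeled_text` and `LabeledText` are hypothetical.

```python
import struct
from typing import Iterator, List, NamedTuple, Tuple


class LabeledText(NamedTuple):
    """One ltxt entry (hypothetical container for this sketch)."""
    cue_point_id: int
    sample_length: int
    purpose: str  # four-character code, e.g. "capt" or "tran"
    text: bytes   # raw bytes; the encoding depends on the chunk's code page


def iter_chunks(data: bytes, start: int, end: int) -> Iterator[Tuple[bytes, int, int]]:
    """Yield (chunk_id, body_start, body_end) for RIFF chunks in data[start:end]."""
    pos = start
    while pos + 8 <= end:
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack_from("<I", data, pos + 4)
        body_start = pos + 8
        yield chunk_id, body_start, min(body_start + size, end)
        # Assumption: chunk bodies are padded to an even number of bytes.
        pos = body_start + size + (size & 1)


def find_labeled_text(data: bytes) -> List[LabeledText]:
    """Collect ltxt entries from every LIST chunk of type 'adtl'."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    found = []
    for chunk_id, body_start, body_end in iter_chunks(data, 12, len(data)):
        if chunk_id != b"LIST" or data[body_start:body_start + 4] != b"adtl":
            continue
        for sub_id, sub_start, sub_end in iter_chunks(data, body_start + 4, body_end):
            if sub_id != b"ltxt" or sub_end - sub_start < 20:
                continue
            # Fixed header: cue point ID, sample length, purpose, then four
            # 16-bit fields (country, language, dialect, code page).
            cue_id, length, purpose = struct.unpack_from("<II4s", data, sub_start)
            text = data[sub_start + 20:sub_end].rstrip(b"\x00")
            found.append(
                LabeledText(cue_id, length, purpose.decode("ascii", "replace"), text)
            )
    return found
```

Reading a file with `open(path, "rb").read()` and passing the bytes to `find_labeled_text` returns one entry per ltxt chunk. Filtering on `purpose == "capt"` or `purpose == "tran"` then shows whether embedded captions or transcriptions are present.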
We’ve defined the following levels of accessibility support for a format:
- Unknown or not applicable: This term is used if there is no support identified in the specifications/resources or if unknown.
- Limited: This term is used if there might be some very basic capability.
- Moderate: This term is used if there is some support but perhaps not explicitly designed for accessibility or not actually used.
- Good: This term is used if there are dedicated components in the file that support accessibility features, such as embedded metadata to label content, tagged text for reading order and structure for screen readers (such as in text files) or caption/subtitle/timecode/transcription options in AV wrappers.
The RFS Content Teams use the FDDs in their deliberations about preferences for the format as either “preferred” or “acceptable.” There are over 70 preferred formats listed in the RFS across all content categories, so we are tackling those first and we’ll get to the acceptable formats later. We also have plans to get wider community input on what information we’ve documented. While there’s a ways to go to organize and prettify this data as a compiled draft set, we’ve compiled it into an XLSX file in the interim so we can start to share and get feedback as we continue to refine our processes. Let us know what would be helpful to the community for this effort.
What’s Up Next
Your favorite Formats fam has lots in store for the next six months. We have a paper at iPres 2024 with colleagues from the US National Archives and Records Administration (NARA) about File Format Risk Assessment in Two U.S. Government Contexts, and we have lots more FDDs to write. We’re also thinking of updating our XML schema to accommodate some recent changes in how and what data we document (reminder that you can download all or part of our FDDs in XML!) and, of course, we’ll continue to refine our digital accessibility features work.
We also love to hear from our users: all 170,272 unique visitors across six continents (come on Antarctica folks – complete the map and make some format nerds very happy!) since our last post in December 2023. As we say in many of our FDDs, comments welcome! Please drop us a note here or to [email protected].
Comments (3)
Great to see the continued evolution of this resource, evidence of its continuing value to the community. I’ll hope that funds in the next budget cycle will support engaging some consultants-in-poultry (pun intended) to join the LC format-chickens in updating and expanding the offering.
No questions from me, but sending plenty of good vibes and the comment “THANK YOU” for setting up, maintaining and continuously extending this resource used by digital preservationists and file format fans around the world.
Thank you for your great work on this important set of documents!