How to Write a FDD in 149 Easy Steps: Learning to Evaluate Digital File Formats

Today’s guest post is from Marcus Nappier, who is a Digital Collections Specialist in the Digital Content Management Section at the Library of Congress.

The Library of Congress maintains over 470 format description documents (FDDs) on the Sustainability of Digital Formats website that provide information about file formats, bit stream structures and encodings, and their usage in various applications. This site, started in 2004, is one of the world’s most popular resources for technical information about file formats and receives approximately 40,000 visitors a month. FDDs highlight the relationships between formats as well as the Library’s evaluation criteria. These include sustainability factors (such as adoption, disclosure, and impact of patents, which are applicable to all formats) and quality and functionality factors, which vary across format categories. How exactly do we get here, though? How do we contextualize and organize all of the information out there to create these FDDs?

Research, Research, and More Research

Our work to write FDDs typically starts with research, more research, and when we think we’re done, there is likely more research to do after that. My first experience writing an FDD concluded with the recently published entry for CGM, Computer Graphics Metafile. To start, my colleague Laurel Gassie (who joins us on an informal detail during COVID telework) was instrumental to the research process, as she provided a significant portion of the background information about the file format from a variety of online resources. This information forms part of the backbone of the FDD, as much of it characterizes and contextualizes the description of the format and charts its developmental history. While this initial research effort is atypical compared with previous FDD writing projects, it provided an opportunity for me to familiarize myself with the format. Future FDD writing efforts will require a comprehensive approach to identify a variety of sources and citations that provide sufficient information about the format. This may involve identifying sites with an overview or generalized level of information about the format, while following specific links or citations that provide more granular information about the format’s specifications.

Organization

One of the other challenges of writing FDDs, after research, is organizing all of the information into the appropriate, meaningful categories that will make the FDD useful to the Library’s user communities. As this was my first foray into writing a format description, no source was more instrumental in this process than the “Format Descriptions: Explanation of Terms” page. This page provides details for each category of information we cover, including sustainability and functionality factors, as well as the types of information that reflect adequate data entry for each FDD term.

One of the best pieces of advice I received when drafting the CGM FDD came from Caroline Arms, one of the founders of the Sustainability of Digital Formats site and author of many of the format descriptions in her role as a longtime consultant. She suggested that it’s best to fill in the concrete factual information areas such as Format Specifications, File Signifiers, and History before writing the Description. It has been helpful to think of the Description as a summation of the significant characteristics of the file format compiled from other sections of the FDD.

Organizing the content into a meaningful and coherent document also forced me to constantly revisit the need for more research. In the case of the CGM FDD, the Adoption section required an understanding of the industry profiles that utilized the file format as well as specific software programs that offer the functionality to open and manipulate the file format. Additional research may shed more light on the usage of the format as well as detailed software capabilities.

Writing and Formatting

Regarding the writing of the FDD, I found it easiest to draft, edit text, and compile the necessary hyperlinks in Microsoft Word before moving the text into an XML editing tool. All format description documents, either the individual pages or a zip file of the entire site, are available for download in XML.
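For readers who download the zip file and want a quick sanity check, here is a minimal sketch, using only the Python standard library, that tests whether each XML file in an archive is well-formed. It is an illustration, not part of the Library’s workflow: the function name `check_xml_members`, the member names, and the `<fdd>` and `<title>` elements in the demo are hypothetical placeholders rather than the actual FDD schema.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def check_xml_members(zip_source):
    """Return {member name: root tag or error message} for each .xml member.

    zip_source may be a path or a file-like object opened in binary mode.
    """
    results = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".xml"):
                continue  # skip anything that is not an XML file
            try:
                root = ET.fromstring(archive.read(name))
                results[name] = root.tag
            except ET.ParseError as err:
                results[name] = f"not well-formed: {err}"
    return results


# Demo on an in-memory archive; the contents below are made up for illustration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("good.xml", "<fdd><title>CGM</title></fdd>")
    archive.writestr("bad.xml", "<fdd><title>CGM</fdd>")  # mismatched tags
    archive.writestr("readme.txt", "not XML")
buffer.seek(0)

report = check_xml_members(buffer)
for member, outcome in sorted(report.items()):
    print(member, "->", outcome)
```

To run it against a real download, pass the path of the saved zip file to `check_xml_members` instead of the in-memory buffer.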

Screenshot of raw XML formatting in an XML editor. This is the tool used to “write” the FDD and ensure that formatting for text and added links are appropriate before publishing.

Formatting the content in XML is not particularly difficult, but there are intricacies of XML for things such as hyperlinks, quotations, or parentheses that can be tricky. Our XML editor presents a variety of views of the XML, including one that integrates the text from the Explanation of Terms. The XML formatting process is another opportunity for collaboration with colleagues, including Kate Murray, who manages the Sustainability of Digital Formats site and provides excellent edits and guidance on XML formatting conundrums.
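To illustrate why hyperlinks and punctuation can be tricky, here is a small sketch in Python of how characters such as `&` and `<` must be escaped when text and link addresses are placed into XML. This is not the Library’s editor or the FDD schema: the `<note>` and `<a>` element names, the `build_note` function, and the example URL are assumptions made purely for the demonstration. The standard library’s ElementTree module handles the escaping when it serializes the element.

```python
import xml.etree.ElementTree as ET


def build_note(text, url, link_text):
    """Serialize a hypothetical <note> element holding text and one link."""
    note = ET.Element("note")          # hypothetical element name
    note.text = text
    link = ET.SubElement(note, "a", {"href": url})  # hypothetical link element
    link.text = link_text
    # ElementTree escapes &, < and > in text, and also quotes in attributes.
    return ET.tostring(note, encoding="unicode")


text = 'CGM uses "quotes", (parentheses) & <angle brackets>; see '
url = "https://example.org/spec?a=1&b=2"  # placeholder URL with an ampersand

xml_text = build_note(text, url, "the specification")
print(xml_text)

# Parsing the result back recovers the original, unescaped values.
parsed = ET.fromstring(xml_text)
print(parsed.text)
print(parsed.find("a").get("href"))
```

Parentheses and quotation marks in running text pass through unchanged, while the ampersand in the link address is written out as `&amp;`, which is the kind of detail that is easy to miss when pasting links from a word processor.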

The completed CGM FDD published on the Library’s Sustainability of Digital Formats site.

The process of writing my first FDD has proved to be a great learning experience and instilled an appreciation of the detailed research and writing processes involved. This first FDD writing experience will serve as a building block for the future as I look forward to writing more and collaborating with my colleagues to enhance the vast wealth of information that the Library’s Sustainability of Digital Formats site provides. Stay tuned!

One Comment

  1. Carl Fleischhauer
    September 17, 2020 at 3:42 pm

    Marcus: bravo!! Tremendous patience _is_ required by this process: 149 steps indeed. A great service, much appreciated by the community.
