How to Write an FDD in 149 Easy Steps: Learning to Evaluate Digital File Formats

Today’s guest post is from Marcus Nappier, who is a Digital Collections Specialist in the Digital Content Management Section at the Library of Congress.

The Library of Congress maintains over 470 format description documents (FDDs) on the Sustainability of Digital Formats website that provide information about file formats, bit stream structures and encodings, and their usage in various applications. This site, started in 2004, is one of the world’s most popular resources for technical information about file formats and receives approximately 40,000 visitors a month. FDDs highlight the relationships between formats as well as the Library’s evaluation criteria. These include sustainability factors (such as adoption, disclosure, and impact of patents, which are applicable to all formats) and quality and functionality factors, which vary across format categories. How exactly do we get here, though? How do we contextualize and organize all of the information out there to create these FDDs?

Research, Research, and More Research

Our work to write FDDs typically starts with research, more research, and, when we think we’re done, likely still more research after that. My first experience writing an FDD concluded with the recently published entry for CGM, Computer Graphics Metafile. My colleague Laurel Gassie (who joins us on an informal detail during COVID telework) was instrumental to the research process: she provided a significant portion of the background information about the file format from a variety of online resources. This information forms part of the backbone of the FDD, as much of it characterizes and contextualizes the description of the format and charts its developmental history. While this initial research effort is atypical of FDD writing projects, it provided an opportunity for me to familiarize myself with the format. Future FDD writing efforts will require a comprehensive approach to identify a variety of sources and citations that provide sufficient information about the format. This may involve identifying sites with an overview or generalized level of information about the format, then following specific links or citations that provide more granular information about the format’s specifications.

Organization

One of the other challenges of writing FDDs, after research, is organizing all of the information into the appropriate, meaningful categories that will make the FDD useful to the Library’s user communities. As this was my first foray into writing a format description, no source was more instrumental in this process than the “Format Descriptions: Explanation of Terms.” This page provides details for each category of information we cover, including sustainability and functionality factors, as well as the types of information that reflect adequate data entry for each FDD term.

One of the best pieces of advice I received when drafting the CGM FDD came from Caroline Arms, one of the founders of the Sustainability of Digital Formats site and author of many of the format descriptions in her role as a longtime consultant. She suggested that it’s best to fill in the concrete factual information areas such as Format Specifications, File Signifiers, and History before writing the Description. It has been helpful to think of the Description as a summation of the significant characteristics of the file format compiled from other sections of the FDD.

Organizing the content into a meaningful and coherent document also forced me to constantly revisit the need for more research. In the case of the CGM FDD, the Adoption section required an understanding of the industry profiles that utilized the file format as well as the specific software programs that can open and manipulate it. Additional research may shed more light on the usage of the format as well as detailed software capabilities.

Writing and Formatting

When writing the FDD, I found it easiest to draft and edit the text and compile the necessary hyperlinks in Microsoft Word before moving the content into an XML editing tool. All format description documents, either as individual pages or as a zip file of the entire site, are available for download in XML.

Screenshot of raw XML formatting in an XML editor. This is the tool used to “write” the FDD and ensure that formatting for text and added links is appropriate before publishing.


Formatting the content in XML is not particularly difficult, but the intricacies of XML for things such as hyperlinks, quotations, or parentheses can be tricky. Our XML editor presents a variety of available views of the XML, including one that integrates the text from the Explanation of Terms. The XML formatting process is another opportunity for collaboration with colleagues, including Kate Murray, who manages the Sustainability of Digital Formats site and provides excellent edits and guidance on XML formatting conundrums.
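
To give a sense of why hyperlinks and quotations can be tricky in XML, here is a minimal sketch in Python using only the standard library. It is not the Library's tooling or the FDD schema: the element names (`note`, `a`), the `build_note` helper, and the example.org URL are all hypothetical, chosen only to illustrate escaping. The point is that characters such as `&` must be escaped in both text and link addresses, or the document will not parse.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def build_note(text, link_text, url):
    """Return a small XML fragment containing text and a hyperlink.

    The element names 'note' and 'a' are hypothetical and do not
    reflect the actual FDD XML schema.
    """
    body = escape(text)        # escapes &, <, and > in element text
    href = quoteattr(url)      # adds surrounding quotes and escapes &
    return f"<note>{body} <a href={href}>{escape(link_text)}</a></note>"


# Sample text with quotation marks, parentheses, and an ampersand.
fragment = build_note(
    'CGM (Computer Graphics Metafile) supports "vector & raster" content.',
    "ISO/IEC 8632",
    "https://example.org/spec?part=1&edition=2",  # placeholder URL
)

# Parsing the fragment confirms it is well-formed XML.
parsed = ET.fromstring(fragment)

print(fragment)
print(parsed.find("a").get("href"))
```

Parentheses and quotation marks pass through element text unchanged, while the ampersand is written as `&amp;` in the raw XML and restored when the fragment is parsed. An XML editor typically handles this behind the scenes, which is one reason text pasted in from a word processor deserves a second look.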

The completed CGM FDD published on the Library’s Sustainability of Digital Formats site.


The process of writing my first FDD has proved to be a great learning experience and instilled an appreciation of the detailed research and writing processes involved. This first FDD writing experience will serve as a building block for the future as I look forward to writing more and collaborating with my colleagues to enhance the vast wealth of information that the Library’s Sustainability of Digital Formats site provides. Stay tuned!
