The following is a guest post by Carl Fleischhauer, a Digital Initiatives Project Manager in the Office of Strategic Initiatives.
We are pleased to announce the publication of nine new format descriptions on the Library’s Format Sustainability Web site. This is a closely related set, each of which pertains to a member of the Office Open XML (OOXML) family.
Readers should focus on the word Office, because these are the most recent expression of the formats associated with Microsoft’s family of “Office” desktop applications, including Word, PowerPoint and Excel. Formerly, these applications produced files in proprietary, binary formats that carried the filename extensions doc, ppt, and xls. The current versions employ an XML structure for the data and an x has been added to the extensions: docx, pptx, and xlsx.
In addition to giving the formats an XML expression, Microsoft also decided to move the formats out of proprietary status and into a standardized form (now focus on the word Open in the name.) Three international organizations cooperated to standardize OOXML. Ecma International, an international, membership-based organization, published first in 2006. At that time, Caroline Arms (co-compiler of the Library’s Format Sustainability Web site) served on the ECMA work group, which meant that she was ideally situated to draft these descriptions.
In 2008, a modified version was approved as a standard by two bodies who work together on information technology standards through a Joint Technical Committee (JTC 1): International Organization for Standardization and International Electrotechnical Commission. These standards appear in a series with identifiers that lead off with ISO/IEC 29500. Subsequent to the initial publication by ISO/IEC, ECMA produced a second edition with identical text. Clarifications and corrections were incorporated into editions published by this trio in 2011 and 2012.
Here’s a list of the nine:
- OOXML_Family, OOXML Format Family — ISO/IEC 29500 and ECMA 376
- OPC/OOXML_2012, Open Packaging Conventions (Office Open XML), ISO 29500-2:2008-2012
- DOCX/OOXML_2012, DOCX Transitional (Office Open XML), ISO 29500:2008-2012; ECMA-376, Editions 1-4
- DOCX/OOXML_Strict_2012, DOCX Strict (Office Open XML), ISO 29500:2008-2012; ECMA-376, Editions 2-4
- PPTX/OOXML_2012, PPTX Transitional (Office Open XML), ISO 29500:2008-2012; ECMA-376, Editions 1-4
- PPTX/OOXML_Strict_2012, PPTX Strict (Office Open XML), ISO 29500:2008-2012; ECMA-376, Editions 2-4
- XLSX/OOXML_2012, XLSX Transitional (Office Open XML), ISO 29500:2008-2012; ECMA-376, Editions 1-4
- XLSX/OOXML_Strict_2012, XLSX Strict (Office Open XML), ISO 29500:2008-2012; ECMA-376, Editions 2-4
- MCE/OOXML_2012, Markup Compatibility and Extensibility (Office Open XML), ISO 29500-3:2008-2012, ECMA-376, Editions 1-4
Microsoft is not the only corporate entity to move formerly proprietary specifications into the realm of public standards. Over the last several years, Adobe has done the same thing with the PDF family. There seems to be a new business model here: Microsoft and Adobe are proud of the capabilities of their application software–that is where they can make money–and they feel that wider implementation of these data formats will help their business rather than hinder it.
Although an aside in this blog, it is worth noting that Microsoft and Adobe also provide open access to format specifications that are, in a strict sense, still proprietary. Microsoft now permits the dissemination of its specifications for binary doc, ppt, and xls, and copies have been posted for download at the Library’s Format Sustainability site. Meanwhile, Adobe makes its DNG photo file format specification freely available, as well as its older TIFF format specification.
Both developments–standardization for Office XML and PDF and open dissemination for Office, DNG and TIFF–are good news for digital-content preservation. Disclosure is one of our sustainability factors and these actions raise the disclosure levels for all of these formats, a good thing.
Meanwhile, readers should remember that the Format Sustainability Web site is not limited to formats that we consider desirable. We list as many formats (and subformats) as we can, as objectively as we can, so that others can choose the ones they prefer for a particular body of content and for particular use cases.
The Library of Congress, for example, has recently posted its preference statements for newly acquired content. The acceptable category for textual content on that list includes the OOXML family as well as OpenDocument (aka Open Document Format or ODF), another XML-formatted office suite. ODF was developed by the Organization for the Advancement of Structured Information Standards, an industry consortium. ODF’s standardization as ISO/IEC 23600 in 2006 predates ISO/IEC’s standardization of OOXML. The Format Sustainability team plans to draft descriptions for ODF very soon.