Office Opens up with OOXML

The following is a guest post by Carl Fleischhauer, a Digital Initiatives Project Manager in the Office of Strategic Initiatives.

Office1

Before VisiCalc, Lotus 1-2-3, and Microsoft Excel, spreadsheets were manual although their compilers took advantage of adding machines. And there were contests, natch. This 1937 photograph from the Library’s Harris & Ewing collection portrays William A. Offutt of the Washington Loan and Trust Company. It was produced on the occasion of Offutt’s victory over 29 competitors in a speed and accuracy contest for adding machine operators sponsored by the Washington Chapter, American Institute of Banking.

We are pleased to announce the publication of nine new format descriptions on the Library’s Format Sustainability Web site. This is a closely related set, each of which pertains to a member of the Office Open XML (OOXML) family.

Readers should focus on the word Office, because these are the most recent expression of the formats associated with Microsoft’s family of “Office” desktop applications, including Word, PowerPoint and Excel. Formerly, these applications produced files in proprietary, binary formats that carried the filename extensions doc, ppt, and xls. The current versions employ an XML structure for the data and an x has been added to the extensions: docx, pptx, and xlsx.

In addition to giving the formats an XML expression, Microsoft also decided to move the formats out of proprietary status and into a standardized form (now focus on the word Open in the name.) Three international organizations cooperated to standardize OOXML. Ecma International, an international, membership-based organization, published first in 2006. At that time, Caroline Arms (co-compiler of the Library’s Format Sustainability Web site) served on the ECMA work group, which meant that she was ideally situated to draft these descriptions.

In 2008, a modified version was approved as a standard by two bodies who work together on information technology standards through a Joint Technical Committee (JTC 1): International Organization for Standardization and International Electrotechnical Commission. These standards appear in a series with identifiers that lead off with ISO/IEC 29500. Subsequent to the initial publication by ISO/IEC, ECMA produced a second edition with identical text. Clarifications and corrections were incorporated into editions published by this trio in 2011 and 2012.

Here’s a list of the nine:

  • OOXML_Family, OOXML Format Family — ISO/IEC 29500 and ECMA 376
  • OPC/OOXML_2012, Open Packaging Conventions (Office Open XML), ISO 29500-2:2008-2012
  • DOCX/OOXML_2012, DOCX Transitional (Office Open XML), ISO 29500:2008-2012; ECMA-376, Editions 1-4
  • DOCX/OOXML_Strict_2012, DOCX Strict (Office Open XML), ISO 29500:2008-2012; ECMA-376, Editions 2-4
  • PPTX/OOXML_2012, PPTX Transitional (Office Open XML), ISO 29500:2008-2012; ECMA-376, Editions 1-4
  • PPTX/OOXML_Strict_2012, PPTX Strict (Office Open XML), ISO 29500:2008-2012; ECMA-376, Editions 2-4
  • XLSX/OOXML_2012, XLSX Transitional (Office Open XML), ISO 29500:2008-2012; ECMA-376, Editions 1-4
  • XLSX/OOXML_Strict_2012, XLSX Strict (Office Open XML), ISO 29500:2008-2012; ECMA-376, Editions 2-4
  • MCE/OOXML_2012, Markup Compatibility and Extensibility (Office Open XML), ISO 29500-3:2008-2012, ECMA-376, Editions 1-4

Microsoft is not the only corporate entity to move formerly proprietary specifications into the realm of public standards. Over the last several years, Adobe has done the same thing with the PDF family. There seems to be a new business model here: Microsoft and Adobe are proud of the capabilities of their application software–that is where they can make money–and they feel that wider implementation of these data formats will help their business rather than hinder it.

Office2

Office work in the days before computer support. This photograph of the U.S. Copyright Office (part of the Library of Congress) was made in about 1920 by an unknown photographer. Staff members are using typewriters and a card file to track and manage copyright information. The original photograph is held in the Geographical File in the Library’s Prints and Photographs Division.

Although an aside in this blog, it is worth noting that Microsoft and Adobe also provide open access to format specifications that are, in a strict sense, still proprietary. Microsoft now permits the dissemination of its specifications for binary doc, ppt, and xls, and copies have been posted for download at the Library’s Format Sustainability site. Meanwhile, Adobe makes its DNG photo file format specification freely available, as well as its older TIFF format specification.

Both developments–standardization for Office XML and PDF and open dissemination for Office, DNG and TIFF–are good news for digital-content preservation. Disclosure is one of our sustainability factors and these actions raise the disclosure levels for all of these formats, a good thing.

Meanwhile, readers should remember that the Format Sustainability Web site is not limited to formats that we consider desirable. We list as many formats (and subformats) as we can, as objectively as we can, so that others can choose the ones they prefer for a particular body of content and for particular use cases.

The Library of Congress, for example, has recently posted its preference statements for newly acquired content. The acceptable category for textual content on that list includes the OOXML family as well as OpenDocument (aka Open Document Format or ODF), another XML-formatted office suite. ODF was developed by the Organization for the Advancement of Structured Information Standards, an industry consortium. ODF’s standardization as ISO/IEC 23600 in 2006 predates ISO/IEC’s standardization of OOXML. The Format Sustainability team plans to draft descriptions for ODF very soon.

All in the (Apple ProRes 422 Video Codec) Family

We’ve spent a lot of time recently thinking about digital video issues. As mentioned in a previous blog post, the Federal Agencies Digitization Guidelines Initiative published several reports on this topic including “Creating and Archiving Born Digital Video.” Work on the “Eight Federal Case Histories” (PDF) report nudged us to add the Apple ProRes 422 […]

Report Available for the 2014 DPOE Training Needs Assessment Survey

The following is a guest post by Barrie Howard, IT Project Manager at the Library of Congress. In September, the Digital Preservation Outreach and Education (DPOE) program wrapped up the “2014 DPOE Training Needs Assessment Survey” in an effort to get a sense of current digital preservation practice, a better understanding about what capacity exists […]

Unlocking the Imagery of 500 Years of Books

The following is a guest post by Kalev H. Leetaru of Georgetown University (Former), Robert Miller of Internet Archive and David A. Shamma from Yahoo Labs/Flickr. In 1994, linguist Geoff Nunberg stated, in an article in the journal “Representations,” “reading what people have had to say about the future of knowledge in an electronic world, […]

New FADGI Report: Creating and Archiving Born Digital Video

As part of a larger effort to explore file formats, the Born Digital Video subgroup of the Federal Agencies Digitization Guidelines Initiative Audio-Visual Working Group is pleased to announce the release of a new four-part report, “Creating and Archiving Born Digital Video.” This report has already undergone review by FADGI members and invited colleagues including […]

Comparing Formats for Video Digitization

The following is a guest post by Carl Fleischhauer, a Digital Initiatives Project Manager in the Office of Strategic Initiatives. FADGI format comparison projects. The Audio-Visual Working Group within the Federal Agencies Digitization Guidelines Initiative recently posted a comparison of a few selected digital file formats for consideration when reformatting videotapes. We sometimes call these […]

Convergence of Audiovisual Archivists in the ‘Fairest Cape': A Report of the 2014 IASA Conference

Upon seeing the Cape of Good Hope near Cape Town, South Africa, for the first time in 1580, Sir Francis Drake wrote in his diary that “this cape is the most stately thing and the fairest cape we saw in the whole circumference of the earth” And I have to say that I agree. In […]

Data Infrastructure, Education & Sustainability: Notes from the Symposium on the Interagency Strategic Plan for Big Data

Last week, theĀ  National Academies Board on Research Data and Information hosted a Symposium on the Interagency Strategic Plan for Big Data. Staff from the National Institutes of Health, the National Science Foundation, the U.S. Geological Survey and the National Institute for Standards and Technology presented on ongoing work to establish an interagency strategic plan […]