The following is a guest post by Carl Fleischhauer, who organized the FADGI Audio-Visual Working Group in 2007. Fleischhauer recently retired from the Library of Congress.
The Federal Agencies Digitization Guidelines Initiative Audio-Visual Working Group is pleased to announce a milestone in the development of the AS-07 MXF video-preservation format specification. AS-07 has taken shape under the auspices of a not-for-profit trade group: the Advanced Media Workflow Association. AS-07 is now an official AMWA Proposed Specification, and the current version (CC BY-SA Creative Commons license and all) has been posted at the AMWA website. Although this writer retired from the Library in April, he helped shepherd the specification through this phase.
AS-07 is one of three new AMWA specifications announced in June. Another one is the organization’s new process rule book. The new AMWA process is patterned on the Requests for Comment approach used by the Internet Engineering Task Force. In the new AMWA scheme, there are three levels of maturity:
- Work in Progress
- Proposed Specification
- Specification
Two earlier versions of AS-07 were exposed for community comment at the AMWA website, beginning in September 2014, and this met the requirements for a Work in Progress. For more information about the history of AS-07, refer to the FADGI website.
AS-07 is a standards-based specification. For the most part it is a cookbook recipe for a particular subtype of the MXF standard. MXF stands for Material eXchange Format, and that format’s complex and lengthy set of rules and options is spelled out in more than thirty standards from the Society of Motion Picture and Television Engineers. AS-07 also enumerates a number of permitted encodings and other components, each of which is based on other standards from SMPTE, the International Organization for Standardization and International Electrotechnical Commission, the European Broadcasting Union, and special White Paper documents from the British Broadcasting Corporation. It is no wonder that a cookbook recipe is called for!
Why the emphasis on standards? The short answer is that standards underpin interoperability, in the digital world just as surely as they have for, say, the dimensions of railroad tracks, so my boxcar will roll down your rail line. It is worth saying that, in our preservation context, interoperability has both current and future dimensions. Today, cooperating archives may exchange preservation master files and these must be readable by both parties. More important, however, is temporal interoperability: today’s content must be readable by the archive of tomorrow. AS-07’s extensive use of standards-based design supports both types of interoperability.
At a high level, the objectives for video archival master files (aka preservation masters) are like those for the digital preservation reformatting of other categories of content. Archives want their masters to reproduce picture and sound at very high levels of quality. In addition, the preservation masters should be complete and authentic copies of the originals, i.e., in the case of video, they should retain components like multiple timecodes, closed captions and multiple soundtracks. And–back to temporal interoperability–the files must support access by future users.
What are some of the features of AS-07? The specification emphasizes encodings that ensure the highest possible quality of picture and sound, including requirements for declaring the correct aspect ratio and handling the intricacies of interlaced picture, a characteristic of pre-digital video. Beyond those elements, AS-07 also specifies options for the following:
- Captions and Subtitles
  - retain and provide carriage for captions and subtitles
  - translate binary-format captions and subtitles to XML Timed Text
- Audio Track Layout and Labeling
  - provide options for audio track layout and labeling
- Content Integrity
  - provide support for within-file content integrity data
- Timecode
  - provide coherent master timecode
  - retain legacy timecode
  - label multiple timecodes
- Embedding Text-Based and Binary Data
  - provide carriage of supplementary metadata (text-based data)
  - provide carriage of captions and subtitles in the form of Timed Text (text-based data)
  - provide carriage of a manifest (text-based data)
  - provide carriage of still images, documents, EBU STL, etc. (binary data)
- Language Tagging
  - provide a means to tag Timed Text languages
  - retain language tagging associated with legacy binary caption or subtitle data
  - provide a means to tag soundtrack languages
- Segmentation
  - provide support for segmented content
AS-07 has not been exclusively developed in writing (“on paper,” in oldspeak). The format is based on pioneering work done by Jim Lindner in the early 2000s, when he developed a system called SAMMA (System for the Automated Migration of Media Archives). SAMMA produces MXF files for which the picture data is encoded as lossless JPEG 2000 frame images. It also operates in a robotic mode, to support high-volume reformatting.
Jim’s design for SAMMA was motivated by the forecasts for high-volume reformatting at the Library’s audio-visual center in Culpeper, Virginia (today’s Packard Campus for Audio-Visual Conservation), which was then in its planning phase. The Packard Campus began operation in 2007 and, since then, more than 160,000 videotapes have been reformatted using the SAMMA system. AS-07 is very much a refinement and elaboration of the SAMMA format. In order to get a better look at those refinements, in 2015, the AS-07 team commissioned the production of custom-made sample files.
What next? The interesting — and I think proper — feature of the new AMWA process concerns the movement from Proposed Specification to Specification. The rulebook lists several bullets as requirements but the gist is this: you gotta have implementation and adoption. AS-07 at this time is, metaphorically, a recipe ready to test in the kitchen. Now it is time to cook and taste the pudding. After there are instances of implementation and adoption, these will be reported to the AMWA board with a request to advance AS-07 to the level of [approved] Specification. (Of course, if the process reveals problems, the specification will be modified.)
The first steps toward implementation are under way. On FADGI’s behalf, the Library has contracted with Audiovisual Preservation Solutions and EVS to assemble additional test files, and to have them reviewed by an outside expert. At the same time, James Snyder, the Senior Systems Administrator at the Packard Campus, is working with vendors to do some actual workups. (James oversees the campus’s use of SAMMA and has been an active AS-07 team member.) We trust that these implementation efforts will bear fruit during the remaining months of 2016.
I was wondering if you could give me more details on the labeling of interlaced frame images with JPEG 2000. I went through the AS-07 Proposed Specification and did not find (apart from page 19 of 114) enough information on how to implement MXF/JPEG 2000 when it comes to digitizing interleaving videos.
Thank you for your answer. Kind regards,
Thanks for the question, Julien. Forgive my very lengthy comment but this may be of interest to others as well, and I have tried to provide a thorough response. You used two different terms–interlacing and interleaving–with rather different meanings and different implications for MXF and JPEG 2000 picture. So I will say something about both.
INTERLACING, MXF, AND JPEG 2000. The SMPTE MXF format is specified in a series of thirty-odd standards documents (we have a list here — but I fear that omits a few recent additions). One key feature of the format is this: although MXF is a “wrapper,” the standards require a special cookbook for each video and sound essence. The cookbook spells out how a given essence is to be “mapped” to what is called the MXF Generic Container (GC). These mappings are provided in separate standards documents, thereby making the format extensible (a good thing!). For example, just now a mapping is being worked out for Apple ProRes video, which will permit the creation of MXF files that carry ProRes and–in the course of the process at SMPTE–will also pin down the ProRes format itself (another good thing).
JPEG 2000 is mapped to MXF in SMPTE standard ST 422. The first version of ST 422 was published in 2006, and it resulted from work by the specialists who were developing the (separate) digital cinema family of specifications. The picture essence in digital cinema takes the form of JPEG 2000 frame images wrapped in MXF. Digital cinema picture data is progressively scanned, not interlaced. The focus on digital cinema requirements meant that the mapping of interlaced picture got a bit of short shrift.
In the years following the 2006 publication, two or three vendors developed systems unrelated to digital cinema that wrapped interlaced video as JPEG 2000. Alas, since the recipe in the standard did not spell things out clearly, these vendors’ wrappings differed, and the resulting files were not interoperable.
In 2012, as we were developing AS-07, we joined a number of others who sought improvements in the specification for interlaced video as JPEG 2000 in MXF. The relevant SMPTE standards committee made a revision, published in 2014. Members of the committee–and this is perfectly normal–included stakeholders from companies that had already invested in one or another wrapping method. Thus the final 2014 version of ST 422 features three options for interlaced wrapping (and two for progressive):
• “I1” Interlaced Frame Wrapping, 1 field per KLV Element (details spelled out in section 5.4 in the standard)
• “I2” Interlaced Frame Wrapping, 2 fields per KLV Element (spelled out in 5.5)
• “F1” Field Wrapping, 1 field per KLV Element (spelled out in 5.6)
As we completed drafting AS-07, we wanted to constrain this a bit more. You will find our informative and normative statements on this topic in section 188.8.131.52 (with several subsections) on pages 26-27 of the document posted on the Advanced Media Workflow Association website.
Here’s an excerpt from the normative statement: “AS-07 encoders shall place JPEG 2000 picture essences in a SMPTE ST 422-compliant GC Element. . . . Interlaced picture data in JPEG 2000 encodings shall be formatted in accordance with case I1 or case I2 as specified in SMPTE ST 422:2014, section 6.3, and labeled 03h or 04h respectively as specified in section 6.4 table 2.” The labeling refers to the SMPTE Universal Labels required by the referenced table in standard ST 422. This is, of course, metadata embedded in the file that tells the decoder which wrapping was used in a given file (a very good thing).
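To make the constraint in that excerpt concrete, here is a minimal Python sketch that records the three interlaced wrapping options and checks a label byte against the two that AS-07 permits. The names (`InterlaceCase`, `case_for_label`, `as07_permits`) are hypothetical and do not come from any MXF toolkit. Only the 03h and 04h values are taken from the excerpt; the F1 label is not quoted in this post, so it is left unset, and the position of the byte within the Universal Label is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InterlaceCase:
    code: str               # short name used for the wrapping option
    fields_per_klv: int     # fields carried in each KLV Element
    label: Optional[int]    # label byte, where the excerpt gives one


# I1 and I2 labels come from the AS-07 excerpt (03h and 04h).
# The F1 label is not quoted in this post, so it stays None rather than guessed.
CASES = {
    "I1": InterlaceCase("I1", 1, 0x03),
    "I2": InterlaceCase("I2", 2, 0x04),
    "F1": InterlaceCase("F1", 1, None),
}

# AS-07 allows only these two cases for interlaced JPEG 2000 picture.
AS07_PERMITTED = {"I1", "I2"}


def case_for_label(label: int) -> Optional[InterlaceCase]:
    """Return the interlaced case whose label byte matches, if known here."""
    for case in CASES.values():
        if case.label == label:
            return case
    return None


def as07_permits(label: int) -> bool:
    """True if the label byte identifies a wrapping AS-07 allows."""
    case = case_for_label(label)
    return case is not None and case.code in AS07_PERMITTED
```

A decoder-side check along these lines would accept a file labeled 03h or 04h and flag anything else for review.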
INTERLEAVING AND MXF. This has to do with the way a file stores the essences, and is an independent variable from interlacing. The following is somewhat simplified for this blog. Roughly speaking, in an interleaved video file, segments (usually frames) of the picture and audio are stored together and when the file is played, the player decodes the segments “together” and presents them in an efficient way. The main alternative is for a file to store separate picture and audio as whole-item chunks, which requires the player to decode and stitch them together at play time. This is less efficient, but in some situations it serves other purposes. BTW: one common instance of interleaving (in a broad sense) is the case of stereo audio, as on a CD or in an audio file, where a tiny segment of “left” is stored next to a tiny segment of “right” and the decoder puts them together at play time. This is also very efficient.
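The stereo case can be sketched in a few lines of Python. This is only an illustration of the storage order, using plain integers as stand-ins for audio samples; the function names are made up for the example and no real audio file format is being parsed.

```python
from typing import List, Sequence, Tuple


def interleave(left: Sequence[int], right: Sequence[int]) -> List[int]:
    """Store samples as L0, R0, L1, R1, ... so each pair sits together."""
    if len(left) != len(right):
        raise ValueError("channels must have the same number of samples")
    out: List[int] = []
    for l_sample, r_sample in zip(left, right):
        out.extend((l_sample, r_sample))
    return out


def deinterleave(samples: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Recover the separate left and right channels from the stream."""
    if len(samples) % 2 != 0:
        raise ValueError("expected an even number of samples")
    return list(samples[0::2]), list(samples[1::2])


left = [10, 11, 12]
right = [20, 21, 22]
stream = interleave(left, right)
print(stream)  # [10, 20, 11, 21, 12, 22]
```

Because each left sample sits next to its matching right sample, a player can read the stream front to back and never has to seek between two distant regions of the file.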
The MXF standard offers more than one spin on this, but the main one of interest here is what is called frame wrapping, as compared to clip wrapping. AS-07 simply embraces the existing SMPTE standards, starting with ST 377-1:2011. In frame wrapping–referenced in AS-07 in section 6.2.2 and subsections (pages 19-20)–you get what MXF calls content packages that carry both picture and sound together. We prefer the interleaving offered by frame wrapping, although AS-07 permits clip wrapping as noted in section 184.108.40.206.1 (page 19).
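The difference in storage order between the two wrappings can be pictured with a toy model in Python. It is a deliberately simplified sketch: it ignores KLV coding, index tables and everything else in a real MXF file, and the function names and string placeholders are illustrative only.

```python
from typing import List, Sequence


def frame_wrapped(picture: Sequence[str], sound: Sequence[str]) -> List[List[str]]:
    """One content package per frame, holding that frame's picture and sound."""
    if len(picture) != len(sound):
        raise ValueError("picture and sound must cover the same frames")
    return [[p, s] for p, s in zip(picture, sound)]


def clip_wrapped(picture: Sequence[str], sound: Sequence[str]) -> List[List[str]]:
    """Each essence stored as one whole-clip chunk, picture first, then sound."""
    return [list(picture), list(sound)]


picture = ["P0", "P1", "P2"]
sound = ["S0", "S1", "S2"]

print(frame_wrapped(picture, sound))  # [['P0', 'S0'], ['P1', 'S1'], ['P2', 'S2']]
print(clip_wrapped(picture, sound))   # [['P0', 'P1', 'P2'], ['S0', 'S1', 'S2']]
```

In the frame-wrapped layout everything needed to present frame 1 sits in one place, which is the interleaving described above; in the clip-wrapped layout the player has to pull from two separate chunks.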
AMWA offers a nice primer on MXF, and pages 5-7 (with an illustration) describe the structure of the MXF content package, with interleaved picture and sound.
IMPLEMENTATIONS AND EXAMPLES? At this writing, we are not well informed about systems that may have been deployed to conform to the 2014 SMPTE ST 422 standard pertaining to interlacing in JPEG 2000. Meanwhile, however, the Library has engaged a contractor to prepare some AS-07 sample files, including some that carry JPEG 2000 interlaced picture. We hope that these will be ready in the next few weeks and we will post them at the FADGI site in hopes that they will be of assistance to those with an interest in AS-07.
Thanks to all for your patience with this lengthy response!
Thank you very much for your answer. I indeed mistakenly thought that interlacing and interleaving referred to the same notion. I will have a look at AMWA’s white paper on the structure of an MXF file.