Collaborations with Embedded Audio Metadata: Reusing Cue Chunk Data for IIIF Web Annotations

This guest post details a research and preservation collaboration and is co-written by Tanya Clement at the University of Texas at Austin; Sara Brumfield and Ben Brumfield at Brumfield Labs; Charles Hosale at the American Folklife Center at the Library of Congress; Dave Walker at the Smithsonian Center for Folklife and Cultural Heritage; Meghan Ferriter of LC Labs; and Kate Murray of Digital Collections Management and Services at the Library of Congress.

In 2020-2021, FADGI (Federal Agencies Digital Guidelines Initiative), a collaborative group of 20 US federal agencies led by the Library of Congress, updated their well-known Guidelines for Embedding Metadata in Broadcast WAVE Files to, among other things, add the option to insert ‘Cue points’ in Broadcast WAVE files along with contextual embedded metadata. The guidelines can be implemented with BWF MetaEdit, an open source application that the Library of Congress and FADGI originally funded in 2010 to support the first version of the FADGI guidelines. BWF MetaEdit is developed and maintained by MediaArea.

As the FADGI group was defining these metadata structure guidelines, we noted that some of this contextual information in the metadata is much the same type of information included in IIIF Annotation Layers. We first saw the link when looking for models to “code” the information in the BWF MetaEdit element ‘ltxt’ (described below) and came across the SENT metadata structure (speaker, environment, note, transcription) for IIIF Annotation Layers which was developed by Kylie Warkentin at the University of Texas at Austin in the AudiAnnotate project. FADGI extended this model to include a fifth code for “other.” With the four-character limit defined by the format, the codes FADGI uses are:

  • spea = speaker; to indicate a specific speaker name  
  • envi = environment noises like mic feedback, laughter, paper rustling, echoes, etc.
  • note = notes about the recording (ex: “possible cut in recording”) 
  • tran = transcription 
  • othr = other
Screenshot of BWF MetaEdit Cue Editor software featuring rows of cue chunks and columns with metadata about start and stop times and description of content

Image 1: Sample adtl data embedded via BWF MetaEdit for “Man-on-the-Street,” New York, New York, December 8, 1941. Identifier: AFC 1941/004: AFS 6362. Note the “tran” code in the PurposeID field to indicate that this is transcription data.

Additional sample files provided by the Library of Congress and the Smithsonian Center for Folklife and Cultural Heritage are available on FADGI Test Sample Files for IIIF Web Annotations Using AudiAnnotate.

As we looked beyond the SENT model, we discovered that we had more in common with the IIIF Web Annotations. Thanks to BWF MetaEdit, we could export the ‘adtl’ chunk contextual data that is created for preservation purposes. What if this same data could be reused for access by researchers?  It’s at this point that we started to collaborate with the AudiAnnotate project. 

Collaborative Editing and More with AudiAnnotate Audiovisual Extensible Workflow

In response to the need for a workflow that supports IIIF manifest creation, collaborative editing, flexible modes of presentation, and permissions control, the AudiAnnotate project developed the AudiAnnotate Audiovisual Extensible Workflow (AWE), a documented workflow using the recently adopted IIIF standard for AV materials to help libraries, archives, and museums (LAMs), scholars, and the public access and use AV cultural heritage items. AWE connects existing best-of-breed, open source tools for AV management (Aviary), annotation (such as Audacity and OHMS), public code and document repositories (GitHub), and the AudiAnnotate web application for creating and sharing IIIF manifests and annotations. Users can use AWE as a complete sequence of tools and transformations for accessing, identifying, annotating, and sharing AWE “projects” such as single pages or multi-page exhibits or editions with AV materials. Some examples include annotations of recordings like Zora Neale Hurston’s WPA field recordings in Jacksonville, FL (1939) available from the Library of Congress, a lesson plan that uses the audio recording “‘Criminal Syndicalism’ case, McComb, Mississippi,” from the Harry Ransom Center’s John Beecher Sound Recordings Collection, and annotations for Camille, from the Internet Archive. AWE is built on W3C web standards in IIIF for sharing online scholarship, and generates static web pages through GitHub that are lightweight and easy to preserve and harvest. AWE represents a new kind of AV ecosystem where the exchange is opened between institutional repositories, annotation software, online repositories and publication platforms, and all kinds of users.

Primarily used for preservation purposes, Broadcast WAVE files embed “chunks,” each comprising a four-character chunk identifier, the chunk size, and the chunk data. Each file starts with a RIFF header and a WAVE data type identifier, followed by a series of chunks. Every file must include a

  • Broadcast Audio Extension (‘bext’) chunk, containing metadata required for the exchange of information between broadcasters
  • Format chunk, which describes the format of the audio data, and
  • Data chunk, containing the audio data itself.
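The chunk layout described above can be sketched in a few lines of Python. This is a minimal illustration, not part of BWF MetaEdit or the FADGI guidelines: it assumes a plain RIFF/WAVE stream (not the RF64 variant used for very large files), the helper names `make_chunk` and `iter_chunks` are hypothetical, and the in-memory file is filled with placeholder bytes rather than real audio or real bext metadata.

```python
import io
import struct


def make_chunk(chunk_id: bytes, data: bytes) -> bytes:
    """Serialize one chunk: 4-byte ID, little-endian size, data, pad byte if odd."""
    pad = b"\x00" if len(data) % 2 else b""
    return chunk_id + struct.pack("<I", len(data)) + data + pad


def iter_chunks(stream):
    """Yield (chunk_id, data) for each top-level chunk of a RIFF/WAVE stream."""
    riff, _riff_size, form = struct.unpack("<4sI4s", stream.read(12))
    if riff != b"RIFF" or form != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    while True:
        header = stream.read(8)
        if len(header) < 8:
            break  # end of stream
        chunk_id, size = struct.unpack("<4sI", header)
        data = stream.read(size)
        if size % 2:
            stream.read(1)  # skip the pad byte that follows odd-sized chunks
        yield chunk_id.decode("ascii"), data


# Build a tiny in-memory file for illustration (placeholder contents only).
fmt = struct.pack("<HHIIHH", 1, 1, 48000, 96000, 2, 16)  # PCM, mono, 48 kHz, 16-bit
body = (
    make_chunk(b"bext", bytes(602))  # zero-filled placeholder bext chunk
    + make_chunk(b"fmt ", fmt)
    + make_chunk(b"data", bytes(8))  # four silent 16-bit samples
)
wav_bytes = b"RIFF" + struct.pack("<I", 4 + len(body)) + b"WAVE" + body

chunk_sizes = {cid: len(data) for cid, data in iter_chunks(io.BytesIO(wav_bytes))}
print(chunk_sizes)
```

To inspect a real file, the same `iter_chunks` function could be given a file opened in binary mode instead of the `io.BytesIO` wrapper.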

The Cue chunk (‘cue’) is an optional, non-repeatable chunk in WAVE files that contains any number of Cue Points. A Cue Point marks a point of special interest in the audio waveform data, such as a change in speaker or the start of a speech or vocal arrangement. Cue Points are sometimes referred to as flags or markers in digital audio applications. The context for an individual Cue Point is defined not in the Cue chunk but in the Associated Data List chunk (‘adtl’) and its subchunks Label (‘labl’), Note (‘note’), and Labeled Text (‘ltxt’).

Before publishing the recent guideline updates, FADGI collaborators noted that while many digital audio applications used for preservation support the creation of Cue Points, the implementation methods were far from standard. Few commercial software tools take advantage of subchunks. To maximize the potential for use in preservation and access contexts, the guidelines establish how the ‘adtl’ and associated subchunks can and should be used.

According to the FADGI guidelines, the ‘labl’ element is the primary label of the specific Cue Point, and this information may be displayed next to markers, flags, or cues in digital audio editors. The ‘note’ text element associates a comment with a specific Cue Point, either further explaining the ‘labl’ text or otherwise providing additional context. In the ‘ltxt’ chunk, the ‘purpose’ element works in concert with the text element: the purpose defines the context of the information that the text carries.
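To make the relationship between Cue Points and their ‘adtl’ subchunks concrete, here is a hedged Python sketch that reads the raw bytes of a ‘cue’ chunk and an ‘adtl’ list. It follows the commonly documented RIFF layout (24-byte Cue Point records; ‘labl’ and ‘note’ as a Cue Point ID plus zero-terminated text; ‘ltxt’ with a sample length, a four-character purpose code, and four 16-bit locale fields before the text). The function names are hypothetical, UTF-8 text decoding is an assumption, and the sample label and transcript text are invented for the demonstration.

```python
import struct


def parse_cue(data: bytes) -> dict:
    """Map each Cue Point ID to its sample offset (body of a 'cue ' chunk)."""
    (count,) = struct.unpack_from("<I", data, 0)
    cues = {}
    for i in range(count):
        # Each record is 24 bytes: ID, position, data chunk ID,
        # chunk start, block start, sample offset.
        cue_id, _pos, _fcc, _chunk_start, _block_start, offset = struct.unpack_from(
            "<II4sIII", data, 4 + 24 * i
        )
        cues[cue_id] = offset
    return cues


def _text(raw: bytes) -> str:
    """Decode a zero-terminated string (UTF-8 is an assumption)."""
    return raw.split(b"\x00", 1)[0].decode("utf-8", errors="replace")


def parse_adtl(data: bytes) -> list:
    """Parse the body of a LIST chunk of type 'adtl' into a list of dicts."""
    if data[:4] != b"adtl":
        raise ValueError("not an adtl list")
    entries, pos = [], 4
    while pos + 8 <= len(data):
        sub_id, size = struct.unpack_from("<4sI", data, pos)
        body = data[pos + 8 : pos + 8 + size]
        pos += 8 + size + (size % 2)  # subchunks are padded to even length
        if sub_id not in (b"labl", b"note", b"ltxt"):
            continue  # ignore subchunks this sketch does not handle
        (cue_id,) = struct.unpack_from("<I", body, 0)
        entry = {"kind": sub_id.decode("ascii"), "cue_id": cue_id}
        if sub_id == b"ltxt":
            length, purpose = struct.unpack_from("<I4s", body, 4)
            entry["sample_length"] = length
            entry["purpose"] = purpose.decode("ascii")
            entry["text"] = _text(body[20:])  # text follows the 20-byte header
        else:
            entry["text"] = _text(body[4:])
        entries.append(entry)
    return entries


def sub(sub_id: bytes, body: bytes) -> bytes:
    """Serialize one subchunk for the demonstration data below."""
    pad = b"\x00" if len(body) % 2 else b""
    return sub_id + struct.pack("<I", len(body)) + body + pad


# Invented demonstration data: one Cue Point at sample 48000.
cue_body = struct.pack("<I", 1) + struct.pack("<II4sIII", 1, 48000, b"data", 0, 0, 48000)
adtl_body = (
    b"adtl"
    + sub(b"labl", struct.pack("<I", 1) + b"Interview begins\x00")
    + sub(b"ltxt", struct.pack("<II4sHHHH", 1, 96000, b"tran", 0, 0, 0, 0) + b"Good afternoon.\x00")
)

cues = parse_cue(cue_body)
entries = parse_adtl(adtl_body)
print(cues)
print(entries)
```

The Cue Point ID is what ties the pieces together: each ‘labl’, ‘note’, or ‘ltxt’ entry names the Cue Point it describes, and the Cue chunk supplies that point's position in samples.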

As FADGI’s model is to develop guidelines first, then develop or support open source tools to implement the guidelines, the new information about the cue and adtl chunks was added to BWF MetaEdit starting with v 21.07. BWF MetaEdit already supported importing, editing, embedding, and exporting specified metadata elements in WAVE audio files, including bext and INFO chunks. With this release, the ‘cue’ and ‘adtl’ chunks were added.

Exploring a Use Case with BWF MetaEdit and American Folklife Center collections

As the AWE team began exploring use cases, LC Labs participated as a grant partner in surfacing potential applications. The potential research reuse enabled by BWF MetaEdit made a strong use case at the Library of Congress. After discussions with AudiAnnotate, the American Folklife Center collaborated with FADGI to develop two simple sample files to be used in experiments.

AFC selected two recordings that were already available on their LOC.gov digital collections. AFC took metadata that had been stored separately from the files and inserted it into the preservation wavs. One of the files – “My dear mother, don’t you cry; Soldier’s lament; Bandit song” – contains recording quality information and track start/stop points. “Man-on-the-Street,” New York, New York, December 8, 1941 contains a text transcript of the recording. The files show the breadth of data that can be embedded via the Cue chunk, the flexibility the “SENTO” model provides, and the value of embedding data for preservation. Embedding contextual data in preservation files helps ensure that information is perpetuated.

However, that data is most valuable to users only when systems actually show it to them. Common AV players and computer operating systems don’t do a great job of showing embedded data to users, so the sample files also highlight the need for a presentation platform that surfaces Cue chunk data, a gap that AudiAnnotate can fill.

Creating an Ingest Workflow for BWF MetaEdit

Working with the FADGI team, the AudiAnnotate project took an export of BWF MetaEdit Cue chunks in XML format and transformed the time stamps and content of the Cue chunks into W3C Web Annotations. With that accomplished, AudiAnnotate’s existing exhibit interface, which is driven by web annotations, displays the Cue chunks, makes them searchable, and plays back the audio at the specific time of each Cue chunk.
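As a rough illustration of that transformation, the Python sketch below turns one Cue Point (already read into a dictionary) into a W3C Web Annotation whose target is a time range on a IIIF canvas. It is not AudiAnnotate's actual code: the dictionary keys, the `cue_to_annotation` function, the `example.org` URIs, the sample text, and the choice to carry the FADGI purpose code as a second “tagging” body are all assumptions made for the example. The one fixed piece of arithmetic is that sample positions are divided by the sample rate to get seconds.

```python
import json

# FADGI purpose codes from the guidelines, mapped to readable labels.
PURPOSE_LABELS = {
    "spea": "speaker",
    "envi": "environment",
    "note": "note",
    "tran": "transcription",
    "othr": "other",
}


def cue_to_annotation(cue: dict, sample_rate: int, canvas_id: str, anno_id: str) -> dict:
    """Build a W3C Web Annotation for one Cue Point (hypothetical input shape)."""
    # Cue positions are stored in samples; annotations use seconds.
    start = cue["sample_offset"] / sample_rate
    end = start + cue["sample_length"] / sample_rate
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,
        "type": "Annotation",
        "motivation": "commenting",
        "body": [
            {"type": "TextualBody", "value": cue["text"], "format": "text/plain"},
            # Assumption: expose the purpose code as a tag on the annotation.
            {
                "type": "TextualBody",
                "value": PURPOSE_LABELS.get(cue["purpose"], "other"),
                "purpose": "tagging",
            },
        ],
        # Media fragment selecting the time range on the canvas.
        "target": f"{canvas_id}#t={start:.3f},{end:.3f}",
    }


# Invented sample values; the URIs are placeholders.
cue = {
    "sample_offset": 48000,
    "sample_length": 96000,
    "purpose": "tran",
    "text": "Good afternoon.",
}
anno = cue_to_annotation(
    cue,
    sample_rate=48000,
    canvas_id="https://example.org/canvas/1",
    anno_id="https://example.org/annotations/1",
)
print(json.dumps(anno, indent=2))
```

With a 48 kHz sample rate, a Cue Point at sample 48000 with a length of 96000 samples becomes the time range from 1 to 3 seconds.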

Screenshot of the AudiAnnotate interface that displays the audio file player and rows of cue chunks from BWF MetaEdit to demonstrate the start and stop of metadata annotations

Image 2: IIIF Web Annotation metadata created from the extracted adtl metadata from the sample in Image 1.

 

WNYC’s use of BWF MetaEdit Cue chunks to note issues during transfers of its digital collections presents a case for the AudiAnnotate collaboration. WNYC Radio Archives Manager and frequent FADGI collaborator Marcos Sueiro Bal explains that their reformatting vendor commonly uses the Cue chunk to include edits after a second pass, error flags from digital decks, or generally problematic sections. WNYC also recently used BWF MetaEdit to embed a transcript. They transformed the original vendor-supplied text (with timecodes) into a cue.xml file that they then imported into the audio file using BWF MetaEdit.

Cue chunk data has a lot of interesting potential for AFC workflows, but also has some limitations. Having access to a pipeline like the one modeled in this project, one that allows for collaborative editing and easier public presentation, would expand use cases beyond those focused on preservation. We look forward to future collaboration on Cue chunk data and IIIF annotations!

Through this blog post, we hope to have highlighted potential ways that the updated guidelines and added support in BWF MetaEdit for under-used elements in WAVE files can yield more dynamic presentations and enhanced manipulation of digitally preserved assets. We encourage cultural stewards to consider other ways that Cue chunks and contextual subchunks can be folded into alternative preservation and access workflows and build upon these pilot projects.
