Collaborations with Embedded Audio Metadata: Reusing Cue Chunk Data for IIIF Web Annotations

This guest post details research and preservation collaboration and is co-written by Tanya Clement at University of Texas at Austin; Sara Brumfield and Ben Brumfield at Brumfield Labs; Charles Hosale at American Folklife Center at the Library of Congress; Dave Walker at Smithsonian Center for Folklife and Cultural Heritage; Meghan Ferriter of LC Labs; and Kate Murray of Digital Collections Management and Services at the Library of Congress. 

In 2020-2021, FADGI (Federal Agencies Digital Guidelines Initiative) – a collaborative group of 20 US federal agencies led by the Library of Congress – updated its well-known Guidelines for Embedding Metadata in Broadcast WAVE Files to, among other things, add the option to insert ‘Cue points’ in Broadcast WAVE files along with contextual embedded metadata. The guidelines can be implemented with BWF MetaEdit, an open source application that was originally funded by the Library of Congress and FADGI in 2010 to support the first version of the FADGI guidelines. BWF MetaEdit is developed and maintained by MediaArea.

As the FADGI group was defining these metadata structure guidelines, we noted that some of the contextual information in the metadata is much the same type of information included in IIIF Annotation Layers. We first saw the link when looking for models to “code” the information in the BWF MetaEdit element ‘ltxt’ (described below) and came across the SENT metadata structure (speaker, environment, note, transcription) for IIIF Annotation Layers, which was developed by Kylie Warkentin at the University of Texas at Austin in the AudiAnnotate project. FADGI extended this model to include a fifth code for “other.” Given the four-character limit defined by the format, the codes FADGI uses are:

  • spea = speaker; to indicate a specific speaker name  
  • envi = environment noises like mic feedback, laughter, paper rustling, echoes, etc.
  • note = notes about the recording (ex: “possible cut in recording”) 
  • tran = transcription 
  • othr = other

Screenshot of BWF MetaEdit Cue Editor software featuring rows of cue chunks and columns with metadata about start and stop times and description of content
Image 1: Sample adtl data embedded via BWF MetaEdit for “Man-on-the-Street,” New York, New York, December 8, 1941 (Identifier: AFC 1941/004: AFS 6362). Note the “tran” code in the PurposeID field, which indicates that this is transcription data.

Additional sample files provided by the Library of Congress and the Smithsonian Center for Folklife and Cultural Heritage are available on FADGI Test Sample Files for IIIF Web Annotations Using AudiAnnotate.

As we looked beyond the SENT model, we discovered that we had even more in common with IIIF Web Annotations. Thanks to BWF MetaEdit, we could export the ‘adtl’ chunk contextual data that is created for preservation purposes. What if this same data could be reused for access by researchers? It was at this point that we started to collaborate with the AudiAnnotate project.

Collaborative Editing and More with AudiAnnotate Audiovisual Extensible Workflow

In response to the need for a workflow that supports IIIF manifest creation, collaborative editing, flexible modes of presentation, and permissions control, the AudiAnnotate project developed the AudiAnnotate Audiovisual Extensible Workflow (AWE), a documented workflow using the recently adopted IIIF standard for AV materials to help libraries, archives, and museums (LAMs), scholars, and the public access and use AV cultural heritage items. AWE connects existing best-of-breed, open source tools for AV management (Aviary), annotation (such as Audacity and OHMS), public code and document repositories (GitHub), and the AudiAnnotate web application for creating and sharing IIIF manifests and annotations. Users can use AWE as a complete sequence of tools and transformations for accessing, identifying, annotating, and sharing AWE “projects” such as singular pages or multi-page exhibits or editions with AV materials. Some examples include annotations of recordings like Zora Neale Hurston’s WPA field recordings in Jacksonville, FL (1939) available from the Library of Congress, a lesson plan that uses the audio recording “‘Criminal Syndicalism’ case, McComb, Mississippi,” from the Harry Ransom Center’s John Beecher Sound Recordings Collection, and annotations for Camille, from the Internet Archive. AWE is built on W3C web standards in IIIF for sharing online scholarship, and generates static web pages through GitHub that are lightweight and easy to preserve and harvest. AWE represents a new kind of AV ecosystem where the exchange is opened between institutional repositories, annotation software, online repositories and publication platforms, and all kinds of users.

Primarily used for preservation purposes, Broadcast WAVE files embed “chunks,” each comprising a four-character code chunk identifier, the chunk size, and the chunk data. Each file starts with a RIFF header and a WAVE data type identifier, followed by a series of chunks. Every file must include a

  • Broadcast Audio Extension (‘bext’) chunk, containing metadata required for the exchange of information between broadcasters
  • Format chunk, which describes the format of the audio data, and
  • Data chunk, containing the audio data itself.
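
The chunk layout described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not BWF MetaEdit's implementation: the names `build_chunk` and `iter_chunks` are hypothetical, the sample bytes are synthetic (a zero-filled placeholder ‘bext’ body, a 16-byte PCM format chunk, and a few silent samples), and large-file variants such as RF64 are ignored.

```python
import io
import struct


def build_chunk(chunk_id: bytes, data: bytes) -> bytes:
    """Serialize one chunk: 4-byte ID, little-endian size, data, optional pad byte."""
    pad = b"\x00" if len(data) % 2 else b""
    return struct.pack("<4sI", chunk_id, len(data)) + data + pad


def iter_chunks(stream):
    """Yield (chunk_id, data) for each top-level chunk in a RIFF/WAVE stream."""
    riff, _riff_size, form = struct.unpack("<4sI4s", stream.read(12))
    if riff != b"RIFF" or form != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    while True:
        header = stream.read(8)
        if len(header) < 8:
            break  # end of stream
        chunk_id, size = struct.unpack("<4sI", header)
        data = stream.read(size)
        if size % 2:
            stream.read(1)  # skip the pad byte that follows odd-sized chunks
        yield chunk_id.decode("ascii"), data


# Synthetic example file (not a real recording).
fmt = struct.pack("<HHIIHH", 1, 1, 48000, 96000, 2, 16)  # PCM, mono, 48 kHz, 16-bit
body = (
    b"WAVE"
    + build_chunk(b"bext", bytes(602))  # placeholder bext body, all zeros
    + build_chunk(b"fmt ", fmt)
    + build_chunk(b"data", bytes(8))
)
sample = b"RIFF" + struct.pack("<I", len(body)) + body

chunk_sizes = [(cid, len(data)) for cid, data in iter_chunks(io.BytesIO(sample))]
print(chunk_sizes)
```

Because every chunk announces its own size, a reader that does not understand a given chunk can skip it, which is what lets optional chunks coexist with the required ones.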

The Cue chunk (Cue) is an optional, non-repeatable chunk in WAVE files that contains any number of Cue Points. A Cue Point is a specific point of special interest in the audio waveform data, such as a change in speaker or the start of a speech or vocal arrangement, to name just a few examples. Cue Points are sometimes referred to as flags or markers in digital audio applications. The context for individual Cue Points is defined not in the Cue chunk but in the Associated Data List chunk (adtl) and its subchunks Label (labl), Note (note), and Labeled Text (ltxt).
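
To make the relationship between the Cue chunk and the ‘adtl’ subchunks concrete, here is a hedged Python sketch that reads both from raw chunk bodies. It assumes the commonly documented RIFF layouts (24-byte Cue Point records; ‘labl’ and ‘note’ as a Cue Point ID followed by null-terminated text; ‘ltxt’ as a 20-byte header followed by text). The function names and sample bytes are hypothetical, and decoding text as UTF-8 is an assumption of this sketch rather than something the guidelines are quoted on here.

```python
import struct


def _text(raw: bytes) -> str:
    """Decode a null-terminated string (UTF-8 is an assumption of this sketch)."""
    return raw.split(b"\x00", 1)[0].decode("utf-8", errors="replace")


def parse_cue(data: bytes) -> dict:
    """Map each Cue Point ID to its sample offset, from the body of a Cue chunk."""
    (count,) = struct.unpack_from("<I", data, 0)
    points = {}
    for i in range(count):
        # Fields: ID, position, data chunk ID, chunk start, block start, sample offset.
        cue_id, _pos, _fcc, _chunk, _block, offset = struct.unpack_from(
            "<II4sIII", data, 4 + 24 * i
        )
        points[cue_id] = offset
    return points


def parse_adtl(data: bytes) -> dict:
    """Collect labl/note/ltxt subchunks per Cue Point ID from an 'adtl' list body."""
    if data[:4] != b"adtl":
        raise ValueError("not an 'adtl' list")
    entries = {}
    pos = 4
    while pos + 8 <= len(data):
        sub_id, size = struct.unpack_from("<4sI", data, pos)
        body = data[pos + 8 : pos + 8 + size]
        pos += 8 + size + (size % 2)  # subchunks are padded to an even length
        if sub_id not in (b"labl", b"note", b"ltxt"):
            continue  # ignore subchunks this sketch does not know about
        (cue_id,) = struct.unpack_from("<I", body, 0)
        entry = entries.setdefault(cue_id, {})
        if sub_id == b"ltxt":
            # After the ID: sample length, purpose code, then four 16-bit
            # country/language/dialect/code-page fields before the text.
            length, purpose = struct.unpack_from("<I4s", body, 4)
            entry["ltxt"] = {
                "sample_length": length,
                "purpose": purpose.decode("ascii"),
                "text": _text(body[20:]),
            }
        else:
            entry[sub_id.decode("ascii")] = _text(body[4:])
    return entries


def sub(sub_id: bytes, body: bytes) -> bytes:
    """Helper for the synthetic example: wrap a body as a padded subchunk."""
    return struct.pack("<4sI", sub_id, len(body)) + body + (b"\x00" if len(body) % 2 else b"")


# Synthetic data: one Cue Point at sample 48000 with a label and a transcription.
cue_body = struct.pack("<I", 1) + struct.pack("<II4sIII", 1, 48000, b"data", 0, 0, 48000)
adtl_body = (
    b"adtl"
    + sub(b"labl", struct.pack("<I", 1) + b"Intro\x00")
    + sub(b"ltxt", struct.pack("<II4sHHHH", 1, 96000, b"tran", 0, 0, 0, 0) + b"Hello\x00")
)
cue_points = parse_cue(cue_body)
context = parse_adtl(adtl_body)
print(cue_points, context)
```

The Cue Point ID is the only link between the two structures: the Cue chunk says where a point is, and the ‘adtl’ subchunks that carry the same ID say what it means.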

Before publishing the recent guideline updates, FADGI collaborators noted that while many digital audio applications used for preservation support the creation of Cue Points, the implementation methods were far from standard. Few commercial software tools take advantage of subchunks. To maximize the potential for use in preservation and access contexts, the guidelines establish how the ‘adtl’ and associated subchunks can and should be used.

According to the FADGI guidelines, the ‘labl’ element is the primary label of the specific Cue Point, and this information may be displayed next to markers, flags, or cues in digital audio editors. The note ‘text’ element associates a comment with a specific Cue Point, either further explaining the ‘labl’ text label or otherwise providing additional context. The ‘purpose’ element in the ‘ltxt’ chunk works in concert with the ‘ltxt’ text element: the purpose element defines the context of the information in the text element.

As FADGI’s model is to develop guidelines first, then develop or support open source tools to implement the guidelines, the new information about the cue and adtl chunks was added to BWF MetaEdit starting with v 21.07. BWF MetaEdit already supported importing, editing, embedding, and exporting specified metadata elements in WAVE audio files, including bext and INFO chunks. With this release, support for the ‘cue’ and ‘adtl’ chunks was added.

Exploring a Use Case with BWF MetaEdit and American Folklife Center collections

As the AWE team began exploring use cases, LC Labs participated as a grant partner in surfacing potential applications. The potential research reuse enabled by BWF MetaEdit made a strong use case at the Library of Congress. After discussions with AudiAnnotate, the American Folklife Center collaborated with FADGI to develop two simple sample files to be used in experiments.

AFC selected two recordings that were already available in its LOC.gov digital collections. AFC took metadata that had been stored separately from the files and embedded it in the preservation WAVE files. One of the files – “My dear mother, don’t you cry; Soldier’s lament; Bandit song” – contains recording quality information and track start/stop points. The other, “Man-on-the-Street,” New York, New York, December 8, 1941, contains a text transcript of the recording. The files show the breadth of data that can be embedded via the Cue chunk, the flexibility the “SENTO” model provides, and the value of embedding data for preservation. Embedding contextual data in preservation files helps ensure that information is perpetuated.

However, that data is valuable to users only when systems actually show it to them. Common AV players and computer operating systems don’t do a great job of showing this embedded data to users, so the sample files also highlight the need for a presentation platform that surfaces Cue chunk data – a gap that AudiAnnotate can fill.

Creating an Ingest Workflow for BWF MetaEdit

Working with the FADGI team, the AudiAnnotate project took an export of BWF MetaEdit Cue chunks in XML format and transformed the time stamps and content of the Cue chunks into W3C Web Annotations. With that accomplished, AudiAnnotate’s existing web-annotation-driven exhibit interface displays the Cue chunk data, makes it searchable, and plays the audio back from the specific time of a Cue chunk.
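
The transformation described above can be sketched as follows. This is not the AudiAnnotate code: the input dictionaries stand in for values already read from a BWF MetaEdit export (the XML schema itself is not reproduced here), and the function names, the example.org identifiers, the 48 kHz sample rate, and the choice between the IIIF `supplementing` motivation and the W3C `commenting` motivation are all assumptions made for illustration.

```python
import json

# FADGI's four-character purpose codes and the annotation layers they name.
PURPOSE_LAYERS = {
    "spea": "speaker",
    "envi": "environment",
    "note": "note",
    "tran": "transcription",
    "othr": "other",
}


def cue_to_annotation(cue: dict, sample_rate: int, canvas_id: str, anno_id: str) -> dict:
    """Build one Web Annotation that targets a time range on an IIIF canvas."""
    start = cue["sample_offset"] / sample_rate
    end = start + cue["sample_length"] / sample_rate
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,
        "type": "Annotation",
        # Assumed mapping: transcriptions supplement the audio, the rest comment on it.
        "motivation": "supplementing" if cue["purpose"] == "tran" else "commenting",
        "body": {"type": "TextualBody", "value": cue["text"], "format": "text/plain"},
        # Media fragment giving start and end times in seconds.
        "target": f"{canvas_id}#t={start:.3f},{end:.3f}",
    }


def group_by_layer(cues, sample_rate: int, canvas_id: str) -> dict:
    """Group the annotations into layers named after the purpose codes."""
    layers = {}
    for n, cue in enumerate(cues, start=1):
        layer = PURPOSE_LAYERS.get(cue["purpose"], "other")
        anno = cue_to_annotation(cue, sample_rate, canvas_id, f"{canvas_id}/annotation/{n}")
        layers.setdefault(layer, []).append(anno)
    return layers


# Hypothetical cue data, as it might look after reading an export.
cues = [
    {"sample_offset": 48000, "sample_length": 96000, "purpose": "tran", "text": "Hello"},
    {"sample_offset": 240000, "sample_length": 48000, "purpose": "envi", "text": "laughter"},
]
layers = group_by_layer(cues, 48000, "https://example.org/iiif/canvas/1")
print(json.dumps(layers["transcription"][0], indent=2))
```

The key step is the unit conversion: Cue Points are expressed in samples, while a IIIF canvas is addressed in seconds, so each offset and length is divided by the sample rate before it becomes a `#t=start,end` fragment.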

Screenshot of the AudiAnnotate interface that displays the audio file player and rows of cue chunks from BWF MetaEdit to demonstrate the start and stop of metadata annotations
Image 2: IIIF Web Annotation metadata created from the extracted adtl metadata from the sample in Image 1.

WNYC’s use of BWF MetaEdit Cue chunks in its digital collections to note issues during transfers presents another case for the AudiAnnotate collaboration. WNYC Radio Archives Manager and frequent FADGI collaborator Marcos Sueiro Bal explains that their reformatting vendor commonly uses the Cue chunk to record edits after a second pass, error flags from digital decks, or generally problematic sections. WNYC also recently used BWF MetaEdit to embed a transcript: they transformed the original vendor-supplied text (with timecodes) into a cue.xml file that they then imported into the audio file using BWF MetaEdit.

Cue chunk data has a lot of interesting potential for AFC workflows, but it also has some limitations. Having access to a pipeline like the one modeled in this project – one that allows for collaborative editing and easier public presentation – would expand use cases beyond those focusing on preservation. We look forward to future collaboration on Cue chunk data and IIIF annotations!

Through this blog post, we hope to have highlighted potential ways that the updated guidelines and added support in BWF MetaEdit for under-used elements in WAVE files can yield more dynamic presentations and enhanced manipulation of digitally preserved assets. We encourage cultural stewards to consider other ways that Cue chunks and contextual subchunks can be folded into alternative preservation and access workflows and build upon these pilot projects.

Comments (2)

  1. Thanks for this report about a VERY USEFUL approach to packaging together recorded sound and certain types of, um, navigation-and-description-and-explanation-of-clarity-issues. We all understand this to be “lower-case-s” standardization but — as always — community consensus is a powerful force, paving the way for widespread adoption and (possibly) formal standardization. Bravo and onward!

  2. Thank you for the post! In case anyone is curious, here are screenshots of two DAWs that seem to adhere to the cue/adtl standard reasonably well. They both allow to search for text natively, and take you to the corresponding audio.

    https://github.com/MarcosSueiro/nypr-archives-ingest-scripts/blob/master/additionalDocs/auditionMarkers.png

    https://github.com/MarcosSueiro/nypr-archives-ingest-scripts/blob/master/additionalDocs/iZotopeMarkers.png

    Marcos Sueiro
    WNYC Archives
