Collaborations with Embedded Audio Metadata: Reusing Cue Chunk Data for IIIF Web Annotations

This guest post details a research and preservation collaboration and is co-written by Tanya Clement at the University of Texas at Austin; Sara Brumfield and Ben Brumfield at Brumfield Labs; Charles Hosale at the American Folklife Center at the Library of Congress; Dave Walker at the Smithsonian Center for Folklife and Cultural Heritage; Meghan Ferriter of LC Labs; and Kate Murray of Digital Collections Management and Services at the Library of Congress.

In 2020-2021, FADGI (Federal Agencies Digital Guidelines Initiative) – a collaborative group of 20 US federal agencies led by the Library of Congress – updated its well-known Guidelines for Embedding Metadata in Broadcast WAVE Files to, among other things, add the option to insert ‘Cue points’ in Broadcast WAVE files along with contextual embedded metadata. Implementation of these guidelines is supported by BWF MetaEdit, an open source application originally funded by the Library of Congress and FADGI in 2010 to support the first version of the FADGI guidelines. BWF MetaEdit is developed and maintained by MediaArea.

As the FADGI group was defining these metadata structure guidelines, we noted that some of this contextual information in the metadata is much the same type of information included in IIIF Annotation Layers. We first saw the link when looking for models to “code” the information in the BWF MetaEdit element ‘ltxt’ (described below) and came across the SENT metadata structure (speaker, environment, note, transcription) for IIIF Annotation Layers which was developed by Kylie Warkentin at the University of Texas at Austin in the AudiAnnotate project. FADGI extended this model to include a fifth code for “other.” With the four-character limit defined by the format, the codes FADGI uses are:

  • spea = speaker; to indicate a specific speaker name  
  • envi = environment noises like mic feedback, laughter, paper rustling, echoes, etc.
  • note = notes about the recording (ex: “possible cut in recording”) 
  • tran = transcription 
  • othr = other
Screenshot of BWF MetaEdit Cue Editor software featuring rows of cue chunks and columns with metadata about start and stop times and description of content

Image 1: Sample adtl data embedded via BWF MetaEdit for “Man-on-the-Street,” New York, New York, December 8, 1941. Identifier: AFC 1941/004: AFS 6362. Note the “tran” code in the PurposeID field to indicate that this is transcription data.

Additional sample files provided by the Library of Congress and the Smithsonian Center for Folklife and Cultural Heritage are available on FADGI Test Sample Files for IIIF Web Annotations Using AudiAnnotate.

As we looked beyond the SENT model, we discovered that we had more in common with IIIF Web Annotations. Thanks to BWF MetaEdit, we could export the ‘adtl’ chunk contextual data that is created for preservation purposes. What if this same data could be reused for access by researchers? It was at this point that we started to collaborate with the AudiAnnotate project.

Collaborative Editing and More with AudiAnnotate Audiovisual Extensible Workflow

In response to the need for a workflow that supports IIIF manifest creation, collaborative editing, flexible modes of presentation, and permissions control, the AudiAnnotate project developed the AudiAnnotate Audiovisual Extensible Workflow (AWE), a documented workflow using the recently adopted IIIF standard for AV materials to help libraries, archives, and museums (LAMs), scholars, and the public access and use AV cultural heritage items. AWE connects existing best-of-breed, open source tools for AV management (Aviary), annotation (such as Audacity and OHMS), public code and document repositories (GitHub), and the AudiAnnotate web application for creating and sharing IIIF manifests and annotations. Users can use AWE as a complete sequence of tools and transformations for accessing, identifying, annotating, and sharing AWE “projects” such as single pages or multi-page exhibits or editions with AV materials. Some examples include annotations of recordings like Zora Neale Hurston’s WPA field recordings in Jacksonville, FL (1939) available from the Library of Congress, a lesson plan that uses the audio recording “‘Criminal Syndicalism’ case, McComb, Mississippi,” from the Harry Ransom Center’s John Beecher Sound Recordings Collection, and annotations for Camille, from the Internet Archive. AWE is built on W3C web standards in IIIF for sharing online scholarship, and generates static web pages through GitHub that are lightweight and easy to preserve and harvest. AWE represents a new kind of AV ecosystem that opens up exchange between institutional repositories, annotation software, online repositories and publication platforms, and all kinds of users.

Primarily used for preservation purposes, Broadcast WAVE files embed “chunks,” each comprising a four-character chunk identifier, the chunk size, and the chunk data. Each file starts with a RIFF header and a WAVE data type identifier, followed by a series of chunks. Every file must include the following chunks:

  • Broadcast Audio Extension (‘bext’) chunk, containing metadata required for the exchange of information between broadcasters
  • Format chunk, which describes the format of the audio data, and
  • Data chunk, containing the audio data itself.
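
The chunk layout described above can be illustrated with a short Python sketch. This is a minimal illustration rather than a preservation-grade parser: it assumes a plain RIFF/WAVE stream (not RF64), and the function names and the tiny synthetic sample are invented for this example.

```python
import io
import struct


def iter_chunks(stream):
    """Yield (chunk_id, data) for each top-level chunk in a RIFF/WAVE stream."""
    # RIFF header: "RIFF", total size (little-endian uint32), form type "WAVE".
    riff, _riff_size, form = struct.unpack("<4sI4s", stream.read(12))
    if riff != b"RIFF" or form != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    while True:
        header = stream.read(8)
        if len(header) < 8:
            break  # no more complete chunk headers
        # Each chunk: four-character identifier, then the size of its data.
        chunk_id, size = struct.unpack("<4sI", header)
        data = stream.read(size)
        if size % 2:
            stream.read(1)  # chunk data is padded to an even byte count
        yield chunk_id.decode("ascii"), data


def build_chunk(chunk_id, data):
    """Helper for the synthetic sample below (hypothetical, for illustration)."""
    pad = b"\x00" if len(data) % 2 else b""
    return struct.pack("<4sI", chunk_id, len(data)) + data + pad


# Synthetic stand-in for a Broadcast WAVE file: bext, format, and data chunks.
fmt_data = struct.pack("<HHIIHH", 1, 1, 48000, 96000, 2, 16)  # PCM, mono, 48 kHz, 16-bit
body = (
    build_chunk(b"bext", b"\x00" * 602)
    + build_chunk(b"fmt ", fmt_data)
    + build_chunk(b"data", b"\x00\x00")
)
sample = b"RIFF" + struct.pack("<I", 4 + len(body)) + b"WAVE" + body

chunk_ids = [chunk_id for chunk_id, _ in iter_chunks(io.BytesIO(sample))]
print(chunk_ids)
```

Note that the format chunk identifier is ‘fmt ’ with a trailing space, because every identifier occupies exactly four characters.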

The Cue chunk (Cue) is an optional, non-repeatable chunk in WAVE files that contains any number of Cue Points. A Cue Point is a specific point of special interest in the audio waveform data, such as a change in speaker or the start of a speech or vocal arrangement, to name a few examples. Cue Points are sometimes referred to as flags or markers in digital audio applications. The context for individual Cue Points is defined not in the Cue chunk but in the Associated Data List Chunk (adtl) and its subchunks Label (labl), Note (note), and Labeled Text (ltxt).
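
As a rough sketch of what the Cue chunk holds, the Python below decodes the data of a ‘cue ’ chunk following the standard RIFF WAVE layout: a count, then one 24-byte record per Cue Point. The class name, function name, sample bytes, and the 48 kHz sample rate are assumptions made for this illustration.

```python
import struct
from typing import NamedTuple


class CuePoint(NamedTuple):
    cue_id: int         # identifier that adtl subchunks refer back to
    position: int       # play-order position
    data_chunk_id: str  # chunk holding the audio, normally "data"
    chunk_start: int
    block_start: int
    sample_offset: int  # offset of the Cue Point, counted in samples


def parse_cue_chunk(data):
    """Decode the data portion of a 'cue ' chunk into CuePoint records."""
    (count,) = struct.unpack_from("<I", data, 0)
    points = []
    for i in range(count):
        fields = struct.unpack_from("<II4sIII", data, 4 + 24 * i)
        cue_id, position, fcc, chunk_start, block_start, sample_offset = fields
        points.append(
            CuePoint(cue_id, position, fcc.decode("ascii"),
                     chunk_start, block_start, sample_offset)
        )
    return points


# Synthetic chunk data with two Cue Points (values invented for the example).
raw = struct.pack("<I", 2)
raw += struct.pack("<II4sIII", 1, 0, b"data", 0, 0, 0)
raw += struct.pack("<II4sIII", 2, 96000, b"data", 0, 0, 96000)

cue_points = parse_cue_chunk(raw)

# Converting to seconds needs the sample rate from the format chunk;
# 48000 Hz is assumed here.
SAMPLE_RATE = 48000
start_seconds = [p.sample_offset / SAMPLE_RATE for p in cue_points]
print(start_seconds)
```

The Cue chunk itself carries only positions. The descriptive text lives in the adtl subchunks, which point back to a Cue Point through the shared identifier.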

Before publishing the recent guideline updates, FADGI collaborators noted that while many digital audio applications used for preservation support the creation of Cue Points, the implementation methods were far from standard. Few commercial software tools take advantage of subchunks. To maximize the potential for use in preservation and access contexts, the guidelines establish how the ‘adtl’ and associated subchunks can and should be used.

According to the FADGI guidelines, the ‘labl’ element is the primary label of the specific Cue Point and this information may be displayed next to markers, flags or cues in digital audio editors. The note ‘text’ element associates a comment with a specific Cue Point, either further explaining the labl text label or otherwise providing additional context. In the ‘ltxt’ chunk, the ‘purpose’ element works in concert with the ‘text’ element: the purpose code defines the context of the information that the text carries.
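
To make the relationship between these subchunks concrete, here is a hedged Python sketch that reads the body of an ‘adtl’ list under the standard RIFF layout. In that layout, labl and note hold a Cue Point identifier followed by null-terminated text, while ltxt adds a sample length, a four-character purpose code, and country, language, dialect, and code page fields before the text. The function names and sample values are hypothetical, and decoding the text as UTF-8 is a simplifying assumption that ignores the code page field.

```python
import struct


def parse_adtl(list_data):
    """Parse the body of a LIST chunk whose list type is 'adtl'."""
    if list_data[:4] != b"adtl":
        raise ValueError("not an adtl list")
    entries = []
    pos = 4
    while pos + 8 <= len(list_data):
        sub_id, size = struct.unpack_from("<4sI", list_data, pos)
        body = list_data[pos + 8 : pos + 8 + size]
        pos += 8 + size + (size % 2)  # subchunks are padded to even length
        if sub_id not in (b"labl", b"note", b"ltxt"):
            continue  # skip subchunk types this sketch does not handle
        (cue_id,) = struct.unpack_from("<I", body, 0)
        entry = {"type": sub_id.decode("ascii"), "cue_id": cue_id}
        if sub_id == b"ltxt":
            sample_length, purpose = struct.unpack_from("<I4s", body, 4)
            entry["sample_length"] = sample_length
            entry["purpose"] = purpose.decode("ascii")
            text = body[20:]  # text follows the 20-byte fixed header
        else:
            text = body[4:]
        # Assumption: text is treated as UTF-8, ignoring the code page field.
        entry["text"] = text.split(b"\x00", 1)[0].decode("utf-8", errors="replace")
        entries.append(entry)
    return entries


def build_subchunk(sub_id, body):
    """Helper for the synthetic sample below (hypothetical)."""
    pad = b"\x00" if len(body) % 2 else b""
    return struct.pack("<4sI", sub_id, len(body)) + body + pad


# Synthetic adtl data: one label and one labeled-text entry for Cue Point 1.
labl = build_subchunk(b"labl", struct.pack("<I", 1) + b"Speaker change\x00")
ltxt = build_subchunk(
    b"ltxt",
    struct.pack("<II4sHHHH", 1, 96000, b"tran", 0, 0, 0, 0) + b"Hello\x00",
)
entries = parse_adtl(b"adtl" + labl + ltxt)
print(entries)
```

Here the ‘tran’ purpose code marks the ltxt text as transcription, and the sample length gives the span of audio it covers.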

As FADGI’s model is to develop guidelines first, then develop or support open source tools to implement the guidelines, the new information about the cue and adtl chunks was added to BWF MetaEdit starting with v 21.07. BWF MetaEdit already supported importing, editing, embedding, and exporting specified metadata elements in WAVE audio files, including bext and INFO chunks. With this release, support for the ‘cue’ and ‘adtl’ chunks was added.

Exploring a Use Case with BWF MetaEdit and American Folklife Center collections

As the AWE team began exploring use cases, LC Labs participated as a grant partner in surfacing potential applications. The potential research reuse enabled by BWF MetaEdit made a strong use case at the Library of Congress. After discussions with AudiAnnotate, the American Folklife Center collaborated with FADGI to develop two simple sample files to be used in experiments.

AFC selected two recordings that were already available in their LOC.gov digital collections. AFC took metadata that had been stored separately from the files and inserted it into the preservation WAVE files. One of the files – “My dear mother, don’t you cry; Soldier’s lament; Bandit song” – contains recording quality information and track start/stop points. The other, “Man-on-the-Street,” New York, New York, December 8, 1941, contains a text transcript of the recording. The files show the breadth of data that can be embedded via the Cue chunk, the flexibility the “SENTO” model provides, and the value of embedding data for preservation. Embedding contextual data in preservation files helps ensure that information is perpetuated.

However, that data is valuable to users only when systems actually show it to them. Common AV players and computer operating systems don’t do a great job of showing this embedded data to users, so the sample files also highlight the need for a presentation platform that surfaces Cue chunk data – a gap that AudiAnnotate can fill.

Creating an Ingest Workflow for BWF MetaEdit

Working with the FADGI team, the AudiAnnotate project took an export of BWF MetaEdit Cue chunks in XML format and then transformed the time stamps and content of the Cue chunks into W3C Web Annotations. With that accomplished, AudiAnnotate’s existing web-annotation-driven exhibit interface displays the Cue chunks, enables searching them, and plays back the audio at the specific time of each Cue chunk.
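
The kind of transformation described here can be sketched in Python. This is not AudiAnnotate’s actual implementation: the input dictionary stands in for one Cue Point already read from the export (the XML schema is not reproduced here), and the canvas URI, annotation identifier, sample rate, and the choice to carry the purpose code as a tagging body are all assumptions made for illustration. The output follows the W3C Web Annotation Data Model, using a media fragment to express the time range.

```python
import json

# FADGI four-character purpose codes and their meanings, as listed earlier.
PURPOSE_LABELS = {
    "spea": "speaker",
    "envi": "environment",
    "note": "note",
    "tran": "transcription",
    "othr": "other",
}


def cue_to_annotation(cue, sample_rate, canvas_uri, annotation_id):
    """Build a W3C Web Annotation (as a dict) from one Cue Point's data."""
    # Cue positions are counted in samples; annotations need seconds.
    start = cue["sample_offset"] / sample_rate
    end = start + cue["sample_length"] / sample_rate
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": annotation_id,
        "type": "Annotation",
        "motivation": "commenting",
        "body": [
            {"type": "TextualBody", "value": cue["text"], "format": "text/plain"},
            # Assumption: the purpose code is carried as a tag on the annotation.
            {
                "type": "TextualBody",
                "value": PURPOSE_LABELS.get(cue["purpose"], "other"),
                "purpose": "tagging",
            },
        ],
        "target": {
            "source": canvas_uri,
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": "http://www.w3.org/TR/media-frags/",
                "value": f"t={start:.3f},{end:.3f}",
            },
        },
    }


# Hypothetical Cue Point: starts 2 seconds in and lasts 2.5 seconds at 48 kHz.
example_cue = {
    "sample_offset": 96000,
    "sample_length": 120000,
    "purpose": "tran",
    "text": "Example transcription text",
}
annotation = cue_to_annotation(
    example_cue,
    sample_rate=48000,
    canvas_uri="https://example.org/iiif/canvas/1",       # placeholder URI
    annotation_id="https://example.org/iiif/annotation/1",  # placeholder URI
)
print(json.dumps(annotation, indent=2))
```

Because the time range is derived from sample counts, the sample rate in the file’s format chunk must be known before the annotations can be generated.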

Screenshot of the AudiAnnotate interface that displays the audio file player and rows of cue chunks from BWF MetaEdit to demonstrate the start and stop of metadata annotations

Image 2: IIIF Web Annotation metadata created from the extracted adtl metadata from the sample in Image 1.

The WNYC digital collections’ use of BWF MetaEdit Cue chunks to note issues during transfers presents a case for the AudiAnnotate collaboration. WNYC Radio Archives Manager and frequent FADGI collaborator Marcos Sueiro Bal explains that their reformatting vendor commonly uses the Cue chunk to include edits after a second pass, error flags from digital decks, or generally problematic sections. WNYC also recently used BWF MetaEdit to embed a transcript. They transformed the original vendor-supplied text (with timecodes) to create a cue.xml file that they then imported into the file using BWF MetaEdit.

Cue chunk data has a lot of interesting potential for AFC workflows, but also has some limitations. Having access to a pipeline modeled in this project – one that allows for collaborative editing and easier public presentation – would expand use cases beyond those focusing on preservation. We look forward to future collaboration on Cue chunk data and IIIF annotations!

Through this blog post, we hope to have highlighted potential ways that the updated guidelines and added support in BWF MetaEdit for under-used elements in WAVE files can yield more dynamic presentations and enhanced manipulation of digitally preserved assets. We encourage cultural stewards to consider other ways that Cue chunks and contextual subchunks can be folded into alternative preservation and access workflows and build upon these pilot projects.
