Bagger’s Enhancements for Digital Accessions

This is a guest post by John Scancella, Information Technology Specialist with the Library of Congress, and Tibaut Houzanme, Digital Archivist with the Indiana Archives and Records Administration. BagIt is an internationally accepted method of transferring files via digital containers. If you are new to BagIt, please watch our introductory video.

John Scancella. Photo by Mike Ashenfelder.

Bagger is a digital records packaging and validation tool based on the BagIt Specification. This BagIt-compliant software allows creators and recipients of BagIt packages to verify that the files in the bag are complete and valid. This is done by creating manifests of the files that exist in the bag and their corresponding checksum values.
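
To make the manifest idea concrete, here is a minimal Python sketch of building and re-checking a checksum manifest for the files under a bag's "data" directory. It is not Bagger's own code: the function names, the sample file and the choice of SHA-256 are assumptions made for illustration. In a real bag, the same pairs are stored as lines of checksum and file path in a file such as manifest-sha256.txt.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "sha256") -> str:
    """Hash a file in chunks so large payload files need not fit in memory."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(bag_dir: Path, algorithm: str = "sha256") -> dict:
    """Map each payload file path (relative to the bag) to its checksum."""
    data_dir = bag_dir / "data"
    return {
        path.relative_to(bag_dir).as_posix(): file_checksum(path, algorithm)
        for path in sorted(data_dir.rglob("*"))
        if path.is_file()
    }


def verify_manifest(bag_dir: Path, manifest: dict, algorithm: str = "sha256") -> list:
    """Return the listed paths that are missing or whose checksum changed."""
    problems = []
    for relative_path, expected in manifest.items():
        target = bag_dir / relative_path
        if not target.is_file() or file_checksum(target, algorithm) != expected:
            problems.append(relative_path)
    return problems


def demo() -> tuple:
    """Build a tiny throwaway bag, then alter a file to show detection."""
    with tempfile.TemporaryDirectory() as tmp:
        bag = Path(tmp)
        (bag / "data").mkdir()
        sample = bag / "data" / "report.txt"  # hypothetical sample record
        sample.write_text("sample record\n", encoding="utf-8")
        manifest = build_manifest(bag)
        before = verify_manifest(bag, manifest)
        sample.write_text("altered record\n", encoding="utf-8")
        after = verify_manifest(bag, manifest)
        return before, after
```

Calling demo() returns the problem lists from before and after the file is altered. This sketch only checks files that are listed in the manifest; a complete validator would also report files present in the bag but absent from the manifest.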

Bagger, built in Java, runs in a variety of computing environments, including Windows, Linux and Mac. As a graphical user interface application, Bagger is simpler for the average computer user than the text-only command-line implementations of BagIt.

Many improvements were made to Bagger recently:

  • Added more profiles to give the user and archival communities more options. Users can select from various profiles and fields to decide on their own requirements.
  • Bagger’s build system was switched to Gradle. Gradle is quickly becoming the standard build system for Java applications, and its domain-specific language leads to concise, maintainable and comprehensible builds, which helps future-proof Bagger’s improvements.
  • Bagger now requires Java 1.7 or later. Running on at least Java 1.7 improves security and brings a host of new language features that allow for easier maintenance and better performance.
  • General code cleanup was performed for easier maintenance.
  • Long-standing bugs and issues were fixed.

The Indiana Archives and Records Administration prepared a relatively detailed accession profile that is included with Bagger 2.5.0. A generic version of this profile is also available, where metadata fields are all optional.

These profiles were designed to help facilitate the accessioning of digital records, with preservation actions and management in mind. Overall, intellectual and physical components of digital records’ metadata were targeted. The justifications behind the metadata fields in these new profiles are:

  1. Consistent metadata fields with simple descriptors. The metadata field names use clear and simple terms. The consistency in the order of the fields on the display screen and in the metadata text file (part of the recent improvements) is also a benefit to data entry and review. The profiles use pre-identified values in drop-down menus that will help reduce typing mistakes and enforce cleaner metadata collection. The Indiana profile also uses pre-populated field entries, such as names and addresses, which help reduce repetitive data entry and save time during accessioning.
  2. Adaptable to various institutional contexts and practices. IARA requires the collection of metadata that it deems essential for digital records; these are represented in its profile. To make the profile adaptable across institutions, the generic version uses optional fields only. Individual users can edit the metadata fields, delete them or change their optional/required status. Switching from “Required: false” to “Required: true” in the local JSON file is enough to achieve the level of enforcement appropriate for each institution. Additional fields drawn from the BagIt specification can be added from the main menu. Custom metadata fields can also be created or added on the fly.
  3. Collection of data points that matter for preservation decisions and actions. Some of the metadata fields added to the standard accession fields help identify records that are available only in digital formats so they can be treated accordingly; others help locate records in proprietary digital formats that need migration to open-standard formats. Information about sensitive records can also be captured to assist with prioritization and access management.
  4. Make automation possible through field mapping. By using consistent and orderly metadata fields in a profile, you will create bags with well-structured and predictable metadata sequences and values. This makes it easier to map the bag’s fields, values or collected information to a preservation system’s database fields. Investing in this automation opportunity will likely reduce data entry time when importing bags into a preservation system. This assumes that the preservation system is either already BagIt-compliant (an interoperability benefit) or will be made to know what to do with each part of the bag, each metadata field and the captured values (to be achieved through integration).
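
As a rough illustration of the required/optional switch described in item 2, the Python sketch below flips one field's flag in a profile loaded as JSON data. The profile layout, the field names and the "Required" key are hypothetical and simply mirror the wording used in this post; the actual key names in a Bagger profile file may differ, so check the Bagger User Guide before editing a real profile.

```python
import json


def set_required(profile: dict, field_name: str, required: bool) -> dict:
    """Return a copy of the profile with one field's required flag changed."""
    if field_name not in profile:
        raise KeyError(f"No such field in profile: {field_name}")
    updated = json.loads(json.dumps(profile))  # simple deep copy of JSON data
    updated[field_name]["Required"] = required
    return updated


# Hypothetical profile content; real profile files may use different keys.
generic_profile = {
    "Accession-Number": {"Required": False},
    "Records-Medium-Carrier": {"Required": False},
}

# Make one field mandatory for a local, institution-specific profile.
local_profile = set_required(generic_profile, "Accession-Number", True)
print(json.dumps(local_profile, indent=2))
```

Working on a copy leaves the generic profile untouched, so an institution could keep the distributed version alongside its stricter local one.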

Following are two screenshots of Bagger with the full list of metadata fields for a sample accession:

Screenshot of Bagger with metadata fields filled in.
Figure 1: IARA Profile with Sample Accession Screen 1 of 2

Screenshot of Bagger tool.
Figure 2: IARA Profile with Sample Accession

In both screenshots, the letter “R” next to a metadata field means that you must enter or select an acceptable value before the bag can be finalized. A drop-down field showing “???” indicates that a value can be selected with a click. The question marks “???”, or another marker used in their place, can also serve as a placeholder value to be found and replaced later with the correct one. In IARA’s experience, a single accession may arrive on multiple storage media or carriers. For that reason, the “records/medium carrier” field is repeated five times (an arbitrary number) to allow for multiple choices and entries; it can be expanded further. Entering the number of media received consistently makes media counts and inventories easier.

When all the required metadata has been entered and the files added, the bag can be completed. Bagger then also records the size of the bag, in bytes and in megabytes, in the “bag-info.txt” metadata file. A successful bagging session displays the message “Bag Saved Successfully.”

The metadata values in the first two figures are fictitious and for demonstration only. The figure below shows them as saved in the bag, along with additional metadata such as hash value and file size:

Screenshot of Bagger tool output.
Figure 3: Metadata Fields and Values in the bag-info.txt File after Bag Creation
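
Because bag-info.txt stores metadata as ordered "Label: value" lines, mapping it to a preservation system's fields can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an existing integration: the column names in FIELD_MAP, the "Records-Medium-Carrier" and "Unmapped-Note" labels and all sample values are hypothetical. Only "Source-Organization" and "Payload-Oxum" are labels taken from the BagIt specification.

```python
def parse_bag_info(text: str) -> list:
    """Parse 'Label: value' lines into ordered (label, value) pairs."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and pairs:
            # Indented continuation line: append it to the previous value.
            label, value = pairs[-1]
            pairs[-1] = (label, f"{value} {line.strip()}")
            continue
        label, _, value = line.partition(":")
        pairs.append((label.strip(), value.strip()))
    return pairs


def to_record(pairs: list, field_map: dict) -> dict:
    """Group values under database column names; skip unmapped labels."""
    record = {}
    for label, value in pairs:
        column = field_map.get(label)
        if column is None:
            continue
        # A label may repeat (e.g. several carriers), so keep a list.
        record.setdefault(column, []).append(value)
    return record


# Hypothetical mapping from bag-info.txt labels to database columns.
FIELD_MAP = {
    "Source-Organization": "source_org",
    "Records-Medium-Carrier": "media_carriers",
    "Payload-Oxum": "payload_oxum",
}

# Fictitious sample content, for demonstration only.
SAMPLE = """Source-Organization: Example State Archives
Records-Medium-Carrier: CD
Records-Medium-Carrier: USB flash drive
Payload-Oxum: 2048.3
Unmapped-Note: not imported
"""

record = to_record(parse_bag_info(SAMPLE), FIELD_MAP)
print(record)
```

Keeping repeated labels as lists preserves every entry of a repeated field, and ignoring unmapped labels lets the import tolerate profile-specific fields that the target system does not yet know about.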

This test accession used random files freely accessible from the Digital Corpora and Open Preservation websites.

IARA’s accession profile, the generic version or any other profile available in Bagger can be used as is if it meets the user’s requirements. Profiles can also be customized to fit institutional needs, for example by enforcing certain metadata, modifying field names, adding fields or drop-down values, or supporting other document forms (e.g. audiovisual metadata fields such as linear duration of content). Because Bagger’s metadata remain extensible, a profile can be created to fit almost any project. The more profiles are available directly in Bagger, the more choices the archival community will have.

To use IARA’s profile, its generic version or any other profile in Bagger, download the latest version (2.5.0 as of this writing). To start an accession, select the appropriate profile from the drop-down list. This will populate the screen with profile-specific metadata fields. Select files or folders, enter values and save the bag.
For detailed instructions on how to edit metadata fields and their obligation level, create a new profile, or change an existing profile to meet the project/institution’s requirements, please refer to the Bagger User Guide in the “doc” folder inside the downloaded file.

BagIt has been adopted for digital preservation by the Library of Congress, the Dryad Data Repository, the National Science Foundation DataONE and the Rockefeller Archive Center. BagIt is also used at Cornell, Purdue, Stanford, Ghent, New York and the University of California. BagIt has been implemented in Python, Ruby, Java, Perl, PHP and other programming languages.

We encourage feedback for BagIt. Here are some ways to contribute:

The generic profile has benefited greatly from the overall framework developed by the InterPARES Trust’s prior studies on authenticity and metadata profile in applications. Our sincere thanks to Prof. Luciana Duranti, Corinne Rogers, Joseph T. Tennis (UW), Lyse Rowledge, Kat Timms (LAC), Scott Owens, and the InterPARES Trust Team for permission to use part of their content, and for the additional resources and useful comments they shared. Their contribution is primarily conceptual in nature.

The following colleagues helped improve the profiles from a practitioner’s perspective: Carol Kussmann (Digital Preservation Analyst at the University of Minnesota Libraries), Sarah Barsness (Digital Collections Assistant, Minnesota Historical Society), and Nick Connizzo (Digital Archivist, Vermont State Archives).
