Bagger’s Enhancements for Digital Accessions

This is a guest post by John Scancella, Information Technology Specialist with the Library of Congress, and Tibaut Houzanme, Digital Archivist with the Indiana Archives and Records Administration. BagIt is an internationally accepted method of transferring files via digital containers. If you are new to BagIt, please watch our introductory video.

John Scancella. Photo by Mike Ashenfelder.

John Scancella. Photo by Mike Ashenfelder.

Bagger is a digital records packaging and validation tool based on the BagIt Specification. This BagIt-compliant software allows creators and recipients of BagIt packages to verify that the files in the bag are complete and valid. This is done by creating manifests of the files that exist in the bag and their corresponding checksum values.

Bagger, built in Java, works in a variety of computing environments such as Windows, Linux and Mac. As a graphic user interface application, Bagger is a simpler tool for the average computer user than the text-only command-line interface implementation of BagIt.

Many improvements were made to Bagger recently:

  • Added more profiles to give the user and archival communities more options. Users can select from various profiles and fields to decide on their own requirements.
  • Bagger’s build system was switched to Gradle. Gradle is quickly becoming the standard build system for Java applications, and its use contributes to future-proofing Bagger’s improvements by giving Bagger the advantage of having a domain-specific language that leads to concise, maintainable and comprehensible builds.
  • The lowest compatible version of Java that Bagger can run with now is 1.7. Running Bagger with at least Java 1.7 helps with security and brings a host of new programming language features that allow for easier maintenance and performance improvement.
  • General code cleanup was performed for easier maintenance.
  • Long standing bugs and issues were fixed.

The Indiana Archives and Records Administration prepared a relatively detailed accession profile that is included with Bagger 2.5.0. A generic version of this profile is also available, where metadata fields are all optional.

These profiles were designed to help facilitate the accessioning of digital records, with preservation actions and management in mind. Overall, intellectual and physical components of digital records’ metadata were targeted. The justifications behind the metadata fields in these new profiles are:

  1. Consistent metadata fields with simple descriptors. The metadata field names use clear and simple terms. The consistency in the order of the fields on the display screen and in the metadata text file (part of the recent improvements) is also a benefit to data entry and review. The profiles use pre-identified values in drop-down menus that will help reduce typing mistakes and enforce cleaner metadata collection. The Indiana profile also uses pre-populated field entries, such as names and addresses, which help reduce repetitive data entry and save time during accessioning.
  2. Adaptable to various institutional contexts and practices. IARA requires the collection of metadata that it deems essential for digital records; these are represented in its profile. To make the profile adaptable across institutions, the generic version uses optional fields only. Individual users can edit the metadata fields, delete them or change their optional/required status. Switching between “Required: false” to “Required: true” in the local JSON file will be sufficient to help achieve the desired level of enforcement appropriate for each institution. Additional fields from the main menu can be added that draw from the BagIt specification. Also, custom metadata fields can be created or added on the fly.
  3. Collection of data points that matter for preservation decisions and actions. Some of the metadata fields added to standard accession fields help to identify records that are available only in digital formats so they can be treated accordingly; others assist with being able to locate records in proprietary digital formats that need migration to open standards formats. Information about sensitive records can also be captured to assist with prioritization and access management.
  4. Make automation possible through fields mapping. By using consistent and orderly metadata fields in a profile, you will create bags with a well-structured and predictable metadata sequence and value. This makes it easier to map the bag’s fields, values or collected information to a preservation system’s database fields. Investing in this automation opportunity will likely reduce the data entry time when importing bags into a preservation system. This assumes that the preservation system is either BagIt-compliant already (interoperability benefit) or will be made to effectively know what to do with each part of the bag, each metadata field and the captured values (to be achieved through integration).

Following are two screenshots of Bagger with the full list of metadata fields for a sample accession:
Screen shot of Bagger with metadata fields filled in.
Figure 1: IARA Profile with Sample Accession Screen 1 of 2 [ENLARGE]

Screenshot of Bagger tool.
Figure 2: IARA Profile with Sample Accession[ENLARGE]

In both screenshots, the letter “R” next to a metadata field means that you must enter or select a value, or the right value, before the bag can be finalized. The drop-down selection marked with “???” indicates that a value can be selected through clicks. Also question marks “???” as a value, or a different value in their place, can be used as a placeholder that may be found and replaced later with the correct value. In IARA’s experience, a single accession may come on multiple storage media/carriers. For that reason, the “records/medium carrier” field has been repeated five times (arbitrarily) to allow for multiple choices and entries; it can be further expanded. The number of media received, when entered with consistency, can help with easier media count and inventories.

Once completed, Bagger also adds, in the “bag-info.txt” metadata file, the size of the bag in Bytes and in Megabytes. When all the required metadata is entered and the files added, the bag can be completed. A successful bagging session process will see this message displayed: “Bag Saved Successfully.”

The fictitious metadata values in the first two figures are for demonstration and include additional metadata such as hash value and file size in the figure below:

Screenshot of Bagger tool outputFigure 3: Metadata Fields and Values in the bag-info.txt File after Bag Creation [ENLARGE]

This test accession used random files freely accessible from the Digital Corpora and Open Preservation websites.

IARA’s accession profile, the generic version or any profile available in Bagger, can be used as is if it meets the user’s requirements. Or they can be customized to fit institutional needs, such as enforcing certain metadata, field-name modifications, additional fields or drop-down values, and to support other document forms (e.g. audiovisual metadata fields such as linear duration of content). As Bagger’s metadata remain extensible, a profile can be created to fit almost any project. And the more profiles are available directly in Bagger, the better for the archival community who will have choices.

To use the IARA’s profile, its generic version or any other profile in Bagger, download the latest version (as of this writing 2.5.0). To start an accession, select the appropriate profile from the drop-down list. This will populate the screen with profile-specific metadata fields. Select files or folders, enter values and save the bag.
For detailed instructions on how to edit metadata fields and their obligation level, create a new profile, or change an existing profile to meet the project/institution’s requirements, please refer to the Bagger User Guide in the “doc” folder inside the downloaded Bagger.zip file.

BagIt has been adopted for digital preservation by The Library of Congress, the Dryad Data Repository, the National Science Foundation DataONE and the Rockefeller Archive Center. BagIt is also used at Cornell, Purdue, Stanford, Ghent, New York and the University of California. BagIt has been implemented in Python, Ruby, Java, Perl, PHP, and in other programming languages.

We encourage feedback for BagIt. Here are some ways to contribute:

The generic profile has benefited greatly from the overall framework developed by the InterPARES Trust’s prior studies on authenticity and metadata profile in applications. Our sincere thanks to Prof. Luciana Duranti, Corinne Rogers, Joseph T. Tennis (UW), Lyse Rowledge, Kat Timms (LAC), Scott Owens, and the InterPARES Trust Team for the permission to use part of their content, additional resources and useful comments they shared. Their contribution is primarily of conceptual nature.

The following colleagues helped improve the profiles from practitioner’s perspective: Carol Kussmann (Digital Preservation Analyst at the University of Minnesota Libraries), Sarah Barsness (Digital Collections Assistant, Minnesota Historical Society), and Nick Connizzo (Digital Archivist, Vermont State Archives).

Add a Comment

This blog is governed by the general rules of respectful civil discourse. You are fully responsible for everything that you post. The content of all comments is released into the public domain unless clearly stated otherwise. The Library of Congress does not control the content posted. Nevertheless, the Library of Congress may monitor any user-generated content as it chooses and reserves the right to remove content for any reason whatever, without consent. Gratuitous links to sites are viewed as spam and may result in removed comments. We further reserve the right, in our sole discretion, to remove a user's privilege to post content on the Library site. Read our Comment and Posting Policy.

Required fields are indicated with an * asterisk.