Creating Workflows for Born-Digital Collections: An NDSR Project Update

The following is a guest post by Julia Kim, National Digital Stewardship Resident at New York University Libraries.

Julia Kim analyzing Jeremy Blake’s digital artwork. Photo by Elena Olivo.

I’m now into the last leg of my nine-month residency, and I’m amazed both by what has been accomplished and by the major steps still ahead of me. In this post, I’ll give a project update on my primary task: to create, test and implement access-driven workflows for born-digital collections at New York University Libraries.

My residency is very broad; I am tasked with investigating and implementing workflows that encompass the entirety of the born-digital process, from accession to access (project overview). This means that while I spent a month learning digital forensics techniques, I have also researched and implemented workflow steps that occur before acquisition and after ingest. Rather than signing off once the bits have been checked, duplicated and dispersed to long-term storage in multiple locations, I’ve also focused on access. In the past five months, I’ve worked on many collections. Such depth and breadth have been crucial: time and again, I’ve been challenged to revise and refine my sense of the workflow.

The ingestion of incoming born-digital material is time-consuming. In many cases, I only create a bit-exact disk image or copy of the content for ingest, with minimal metadata from my end. NYU’s three archives (and now Abu Dhabi) collect actively. Imaging or copying files, validating, bagging and ingesting such increasingly large collections tie up our dedicated imaging station and localized storage. This past week, for example, I finished ingesting into the repository a collection comprising 2 TB, 5 TB and 3 TB hard drives. It took the full weekend to create the initial image of the 2 TB hard drive and validate it with checksums, and approximately the same amount of time to ingest it into the repository. The Digital Forensics Lab, however, contains a number of other computers at my disposal in addition to the imaging desktop, which is especially helpful with collections that rely on other operating systems.
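For readers curious what the verification and packaging steps look like in practice, here is a minimal sketch in Python. It assumes the disk image has already been created on the imaging station, and it uses the Library of Congress’s bagit-python library; the paths, checksum file and bag metadata are hypothetical stand-ins rather than our production workflow.

```python
import hashlib

import bagit  # pip install bagit (Library of Congress bagit-python)


def sha256sum(path, chunk_size=1 << 20):
    """Stream the file in 1 MB chunks; a 2 TB image can't be read into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


# Hypothetical paths: a disk image plus the checksum recorded at imaging time.
image_path = "accession_2015_001/drive_2tb.dd"
with open("accession_2015_001/drive_2tb.dd.sha256") as f:
    recorded = f.read().split()[0]

if sha256sum(image_path) != recorded:
    raise SystemExit("Checksum mismatch -- do not ingest this image.")

# Package the accession directory as a BagIt bag. make_bag() moves the
# payload into a data/ subdirectory and writes checksum manifests that
# later fixity checks can validate against.
bag = bagit.make_bag("accession_2015_001", {"Source-Organization": "NYU Libraries"})
bag.validate()  # confirm the bag is complete and intact before transfer
```

Streaming the checksum rather than reading the whole file at once is what makes weekend-long validation of multi-terabyte images feasible at all.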

NYU’s Digital Forensics Laboratory.

Over the course of my residency I’ve also worked with the digital counterparts of previously published hybrid collections, including the Exit Art Archive (a 2 TB organizational RAID) and the Robert Fitch Papers (several floppy disks with easily renderable text files and no researcher restrictions). The collection I’ve spent the most time with is the Jeremy Blake Papers, acquired in 2007. These “papers” include files copied on-site at the donor’s house from Blake’s MacBook Pro, an external hard drive and a flash drive. NYU also acquired several hundred optical disks, three additional hard drives, and dozens of Zip disks and digital linear tapes. The Blake Papers present many of the challenges that hinder access: sheer data size, a variety of media format types, a prevalence of incompletely documented or misunderstood proprietary file formats, and complicated rights and privacy restrictions.

Jeremy Blake’s PSD files, accessed with a PowerPC.

The bulk of the Blake Papers consists of Photoshop (PSD) files spanning the late 1990s to 2007. To create his work, Blake would collage different sources together in Photoshop. These sources would be layered and further processed to create the dense, dreamlike imagery characteristic of his final moving-image work. Blake would share these layered PSD files with close collaborators, who animated his still images and composed the soundtracks under his close supervision.

PSD file format normalization was not a viable preservation solution. Normalization would render a file with fifty layers, turned on and off in different ways, into a single flat image. Any normalization process would lose Blake’s working process, the area in which we thought his archive could be most valuable to future researchers. We cannot simply migrate the files to TIFF 6.0: a valid TIFF has no notion of Photoshop-style layers, so migration would mean flattening. Paradoxically, any TIFF that did encompass layers would no longer be a true TIFF.
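To make concrete what flattening discards, here is a small illustrative sketch using the open-source psd-tools Python library. This is an assumption for demonstration purposes, not part of NYU’s workflow (and third-party renderers may not composite complex PSDs exactly as Photoshop would); the file names are hypothetical.

```python
from psd_tools import PSDImage  # pip install psd-tools

psd = PSDImage.open("blake_working_file.psd")  # hypothetical file name

# The layer structure is precisely the evidence of working process
# that normalization would destroy.
for layer in psd:
    print(f"{layer.name!r:<40} visible={layer.visible}")

# Compositing collapses every visible layer into one raster image.
# The result can be saved as a TIFF, but the layers are gone for good.
flat = psd.composite()  # returns a PIL.Image
flat.save("blake_flattened.tiff")
```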

While Photoshop has retained robust backward and forward compatibility across its files and software versions, Blake’s working methods are very much a product of the intersection of developing technologies and art-making practices of his time. His methods were cutting-edge at the time, but they seem unimaginably labor-intensive today. For these reasons, his works will be migrated through Photoshop software to the current version of Photoshop, but they will also be migrated and made accessible through emulations of the approximate software versions and operating systems used. Much of my recent focus has been on creating these emulations.
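As a rough sketch of the general approach, the snippet below launches an emulated PowerPC Macintosh with QEMU, wrapped in Python. Everything here is an illustrative assumption rather than a description of our actual setup: the emulator choice, machine type, memory size and disk image (a period Mac OS install with the approximate Photoshop version) are all hypothetical, and period Mac emulation is also often done with tools like SheepShaver or Basilisk II.

```python
import subprocess

# Hypothetical: a disk image containing a period Mac OS install plus
# the approximate Photoshop version used to create the files.
cmd = [
    "qemu-system-ppc",              # QEMU's PowerPC system emulator
    "-M", "mac99",                  # an emulated Power Mac machine type
    "-m", "512",                    # 512 MB of guest RAM
    "-hda", "macos_photoshop.img",  # guest hard-disk image (hypothetical)
    "-nic", "none",                 # keep the emulated machine non-networked
]
subprocess.run(cmd, check=True)
```

Keeping the emulated machine non-networked mirrors the reading-room access described later in this post.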

Emulated access to Blake’s artwork.

Next month, I will design and lead a usability test of representative portions of the Jeremy Blake Papers and the Exit Art Collection with a small, representative group of researchers from NYU’s Fales Library & Special Collections. This will serve as a pilot for providing emulated access to complex media. It will also be an opportunity to test my documentation as I explain these concepts and strategies to researchers unused to the idea of archival research done with only a (non-networked) laptop.

A secondary purpose will be to note which qualities interest researchers. This may seem an odd question to pose, but given the still-enormous effort needed to stabilize this type of work and make it accessible, it is worth learning which qualities researchers actually value. Their subjects of research, and even their definitions of “content,” may differ. A digital humanist may be more interested in the timestamps across a large digital collection than in any of the text and image “content” in the files themselves. Some researchers may be well versed in Photoshop’s changes over the years, while others may only be interested in the finalized moving images. Through these pilot studies, I hope to answer some of these questions while creating a template for other archivists interested in replicating and adding to the data gathered from this study.
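As a small example of the timestamp-centered view, a simple inventory script can summarize modification times across a whole collection. The sketch below is hypothetical: the mount point is a stand-in, and in practice the timestamps should be read from the forensic image or a working copy that preserves them, since ordinary copying can alter file dates.

```python
import csv
import os
from datetime import datetime, timezone

# Hypothetical mount point for a working copy of the collection.
root_dir = "/mnt/collection_working_copy"

with open("timestamp_inventory.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["path", "modified_utc", "size_bytes"])
    for dirpath, _dirs, files in os.walk(root_dir):
        for name in files:
            full = os.path.join(dirpath, name)
            info = os.stat(full)
            modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            writer.writerow([os.path.relpath(full, root_dir),
                             modified.isoformat(), info.st_size])
```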

In addition to this technical work, I’m also coordinating a born-digital workflows CURATEcamp (April 23), which will be hosted at the beautiful, landmark Brooklyn Historical Society in Brooklyn Heights. This un-conference will bring together digital archivists, stewards, repository managers and staff involved in managing born-digital collections for discussions, presentations and demonstrations. Alongside two streams of small groups tackling issues like the Forensic Toolkit’s integration into workflows, we will have a larger stream of demonstrations and workshops highlighting developments with, for example, BitCurator Access.

In addition to CURATEcamp, I will be sharing updates on my work at the American Institute for Conservation conference (May 2015), as well as at the Society of American Archivists (August 2015). It’s been especially gratifying to be able to learn from different intersecting worlds and competencies, whether moving images, digital curation, fine art or archiving.

The activities and tasks mentioned in this post should keep me busy for the next two months. As someone who loves investigation and research with tangible, hands-on components and outputs, this has been a great experience for me. I’d like to note that without the administrative and technical support from my mentors, Don Mennerich and Lisa Darms, this work would not have been possible at all. I have been able to explore very interesting questions with not only exceptional collections, but also exceptional mentors.

Comments (2)

  1. Julia,

    A very nicely written and comprehensive blog post. Thank you.

    I have a question for you. Or, rather, I’d like to ask you for some clarification about accessibility.

    You noted that old formats continue to be a challenge to access. And some of the collections you worked with and ingested into the repository were huge — terabytes in size. Which implies that there must be many, many files that digital curators everywhere don’t have time to access and analyze. The volume is overwhelming. So one of the main goals of a digital curator would be to rescue the files, to just get the files safely off the old storage media on which they reside and into a reliable digital repository.

    There, the files could be checksummed — checked for fixity — periodically to make sure they’re still intact; that could be an automated process that didn’t involve a human. And, ideally, some researcher(s) someday would work with the digital curator to deal with the access part as needed.

    Do I have that process correct? It seems to be a reality of digital curation.

    Thanks again,

    Mike

  2. Hi Mike –

    Yes, the volume can be overwhelming! While we need to stabilize and preserve the bits, the task of making terabytes of data accessible is another struggle. We have automation for fixity, but finding ways to meaningfully arrange and present such material to researchers is a challenge, and one that I’ve tackled to varying degrees with the Blake and Exit Art collections.
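    To give a concrete, if simplified, picture of what fixity automation can look like, here is an illustrative sketch. The repository layout (one BagIt bag per accession) and the use of the bagit-python library are assumptions for the example, not a description of our production system:

    ```python
    import pathlib

    import bagit  # pip install bagit (Library of Congress bagit-python)

    # Hypothetical layout: each accession stored as its own BagIt bag.
    repo = pathlib.Path("/repository/bags")

    for bag_dir in sorted(p for p in repo.iterdir() if p.is_dir()):
        try:
            # validate() re-hashes every payload file and compares the
            # results against the checksum manifests written at ingest.
            bagit.Bag(str(bag_dir)).validate(processes=4)
            print(f"OK      {bag_dir.name}")
        except bagit.BagError as err:
            print(f"FAILED  {bag_dir.name}: {err}")
    ```

    Run on a schedule (with cron, for instance), a check like this is exactly the kind of no-human-in-the-loop process you describe.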

    I didn’t want to bring up too much in the post and didn’t have a chance to discuss it, but NYU uses the Forensic Toolkit for arrangement. Sibyl Schaeffer touched on it in her previous Signal blog post (“We’re All Digital Archivists Now”) and Peter Chan’s video is a great overview of the software program’s functionality. As Sibyl’s post notes, more and more archivists will be “digital archivists” (whether processing paper or processing files).

    One point that I’d like to make is that through creating emulations, I was forced to go back and reevaluate our workflows. That is, creating these “test cases” for born-digital access impacted how we process collections and is something repositories should think about sooner rather than later.

    Thanks for giving me the opportunity to expand a little! Let me know if I misunderstood something or if you have another point for clarification.
