Preserving and managing research data is a significant concern for scientists and staff at research libraries. With that noted, many likely don’t realize the length of time in which valuable scientific data has accrued on a range of media in research settings. That is, data management often needs to be both backward- and forward-looking, considering a range of legacy media and formats as well as contemporary practice. To that end, I am excited to interview Emily Frieda Shaw, Head of Preservation and Reformatting at Ohio State University (prior to August 2014 she was the Digital Preservation Librarian at the University of Iowa Libraries). Emily talked about her work on James Van Allen’s data from the Explorer satellites launched in the 1950s at the Digital Preservation 2014 conference and I am excited to explore some of the issues that work raises.
Trevor: Could you tell us a bit about the context of the data you are working with? Who created it, how was it created, what kind of media is it on?
Emily: The data we’re working with was captured on reel-to-reel audio tapes at receiving stations around the globe as Explorer 1 passed overhead in orbit around Earth in the early months of 1958. Explorer predated the founding of NASA and was sent into orbit by a research team led by Dr. James Van Allen, then a Professor of Physics at the University of Iowa, to observe cosmic radiation. Each reel-to-reel Ampex tape contains up to 15 minutes of data on 7 tracks, including time stamps, station identifications and weather reports from station operators, and the “payload” data consisting of clicks, beeps and squeals generated by on-board instrumentation measuring radiation, temperature and micrometeorite impacts.
Once each tape was recorded, it was mailed to Iowa for analysis by a group of graduate students. A curious anomaly quickly emerged: At certain altitudes, the radiation data disappeared. More sensitive instruments sent into orbit by Dr. Van Allen’s team soon after Explorer 1 confirmed what this anomaly suggested: the Earth is surrounded by belts of intense radiation, dubbed soon thereafter as the Van Allen Radiation Belts. When the Geiger counter on board Explorer 1 registered no radiation at all, it was, in fact, actually overwhelmed by extremely high radiation.
We believe these tapes represent the first data set ever transmitted from outside Earth’s atmosphere. Thanks to the hard work and ingenuity of our friends at The MediaPreserve, and some generous funding from the Carver Foundation, we now have about 2 TB of .wav files converted from the Explorer 1 tapes, as well as digitized lab notebooks and personal journals of Drs. Van Allen and Ludwig, along with graphs, correspondence, photos, films and audio recordings.
In our work with this collection, the biggest discovery was a 700-page report from Goddard comprised almost entirely of data tables that represent the orbital ephemeris data set from Explorer 1. This 1959 report was digitized a few years back from the collections at the University of Illinois at Urbana-Champaign as part of the Google Books project and is being preserved in the Hathi Trust. This data set holds the key to interpreting the signals we hear on the tapes. There are some fascinating interplays between analog and digital, past and present, near and far in this project, and I feel very lucky to have landed in Iowa when I did.
Trevor: What challenges does this data represent for getting it off of it’s original media and into a format that is usable?
Emily: When my colleagues were first made aware of the Explorer mission tapes in 2009, they had been sitting in the basement of a building on the University of Iowa’s campus for decades. There was significant mold growth on the boxes and the tapes themselves, and my colleagues secured an emergency grant from the state to clean, move and temporarily rehouse the tapes. Three tapes were then sent to The MediaPreserve to see if they could figure out how to digitize the audio signals. Bob Strauss and Heath Condiotte hunted down a huge, of-the-era machine that could play back all of the discrete tracks on these tapes. As I understand it, Heath had to basically disassemble the entire thing and replace all of the transistors before he got it to work properly. Fortunately, we were able to play some of the digitized audio tracks from these test reels for Dr. George Ludwig, one of the key researchers on Dr. Van Allen’s team, before he passed away in 2012. Dr. Ludwig confirmed that they sounded — at least to his naked ear — as they should, so we felt confident proceeding with the digitization.
So, soon after I was hired in 2012, we secured funding from a private foundation to digitize the Explorer 1 tapes and proceeded to courier all 700 tapes to The MediaPreserve for thorough cleaning, rehousing and digital conversion. The grant is also funding the development and design of a web interface to the data and accompanying archival materials, which we [Iowa] hope to launch (pun definitely intended) some time this fall.
Trevor: What stakeholders are involved in the project? Specifically, I would be interested to hear how you are working with scientists to identify what the significant properties of these particular tapes are.
Emily: No one on the project team we assembled within the Libraries has any particular background in near-Earth physics. So we reached out to our colleagues in the University of Iowa Department of Physics, and they have been tremendously helpful and enthusiastic. After all, this data represents the legacy of their profession in a big picture sense, but also, more intimately, the history of their own department (their offices are in Van Allen Hall). Our colleagues in Physics have helped us understand how the audio signals were converted into usable data, what metadata might be needed in order to analyze the data set using contemporary tools and methods, how to package the data for such analysis, and how to deliver it to scientists where they will actually find and be able to use it.
We’re also working with a journalism professor from Northwestern University, who was Dr. Van Allen’s biographer, to weave an engaging (and historically accurate) narrative to tell the Explorer story to the general public.
Trevor: How are you imagining use and access to the resulting data set?
Emily: Unlike the digitized photos, books, manuscripts, music recordings and films we in libraries and archives have become accustomed to working with, we’re not sure how contemporary scientists (or non-scientists) might use a historic data set like this. Our colleagues in Physics have assured us that once we get this data (and accompanying metadata) packaged into the Common Data Format and archived with the National Space Science Data Center, analysis of the data set will be pretty trivial. They’re excited about this and grateful for the work we’re doing to preserve and provide access to early space data, and believe that almost as quickly as we are able to prepare the data set to be shared with the physics community, someone will pick it up and analyze it.
As the earliest known orbital data set, we know that this holds great historical significance. But the more we learn about Explorer 1, the less confident we are that the data from this first mission is/was scientifically significant. The Explorer I data — or rather, the points in its orbit during which the instruments recorded no data at all — hinted at a big scientific discovery. But it was really Explorer III, sent into orbit in the summer of 1958 with more sophisticated instrumentation, that produced that data that led to the big “ah-hah” moment. So, we’re hoping to secure funding to digitize the tapes from that mission, which are currently in storage.
I also think there might be some interesting, as-yet-unimagined artistic applications for this data. Some of the audio is really pretty eerie and cool space noise.
Trevor: More broadly, how will this research data fit into the context of managing research data at the university? Is data management something that the libraries are getting significantly involved in? If so could you tell us a bit about your approach.
Emily: The University of Iowa, like all of our peers, is thinking and talking a lot about research data management. The Libraries are certainly involved in these discussions, but as far as I can tell, the focus is, understandably, on active research and is motivated primarily by the need to comply with funding agency requirements. In libraries, archives and museums, many of us are motivated by a moral imperative to preserve historically significant information. However, this ethos does not typically pervade in the realm of active, data-intensive research. Once the big discovery has been made and the papers have been published, archiving the data set is often an afterthought, if not a burden. The fate of the Explorer tapes, left to languish in a damp basement for decades, is a case in point. Time will not be so kind to digital data sets, so we have to keep up the hard work of advocating, educating and partnering with our research colleagues, and building up the infrastructure and services they need to lower the barriers to data archiving and sharing.
Trevor: Backing up out of this particular project, I don’t think I have spoken with many folks with the title “Digital Preservation Librarian.” Other than this, what kinds of projects are you working on and what sort of background did you have to be able to do this sort of work? Could you tell us a bit about what that role means in your case? Is it something you are seeing crop up in many research libraries?
Emily: My professional focus is on the preservation of collections, whether they are manifest in physical or digital form, or both. I’ve always been particularly interested in the overlaps, intersections, and interdependencies of physical/analog and digital information, and motivated to play an active role in the sociotechnical systems that support its creation, use and preservation. In graduate school at the University of Illinois, I worked both as a research assistant with an NSF-funded interdisciplinary research group focused on information technology infrastructure, and in the Library’s Conservation Lab, making enclosures, repairing broken books, and learning the ins and outs of a robust research library preservation program. After completing my MLIS, I pursued a Certificate of Advanced Study in Digital Libraries while working full-time in Preservation & Conservation, managing multi-stream workflows in support of UIUC’s scanning partnership with Google Books.
I came to Iowa at the beginning of 2012 into the newly-created position of Digital Preservation Librarian. My role here has shifted with the needs and readiness of the organization, and has included the creation and management of preservation-minded workflows for digitizing collections of all sorts, the day-to-day administration of digital content in our redundant storage servers, researching and implementing tools and processes for improved curation of digital content, piloting workflows for born-digital archiving, and advocating for ever-more resources to store and manage all of this digital digital stuff. Also, outreach and inreach have both been essential components of my work. As a profession, we’ve made good progress toward raising awareness of digital stewardship, and many of us have begun making progress toward actually doing something about it, but we still have a long way to go.
And actually, I will be leaving my current position at Iowa at the end of this month to take on a new role as the Head of Preservation and Reformatting for The Ohio State University Libraries. My experience as a hybrid preservationist with understanding and appreciation of both the physical and digital collections will give me a broad lens through which to view the challenges and opportunities for long-term preservation and access to research collections. So, there may be a vacancy for a digital preservationist at Iowa in the near future