Preserving and Curating Research Data: Panel Preview for DP2014

Continuing with our series of blog posts devoted to the upcoming Digital Preservation 2014 conference, the following interview features a preview of the panel session entitled “Research Data and Curation” with panel members Inna Kouper (Data to Insight Center at Indiana University), Elizabeth Yakel (University of Michigan School of Information) and Ixchel Faniel (OCLC Research).

Inna Kouper

Susan: Could you each provide a short overview of what will be covered for this panel session?

Inna: The panel is titled “Research Data and Curation” and it will address some of the challenges of preserving and curating research data, focusing particularly on data re-use. We plan to discuss the stages that research data go through as they are collected, analyzed and used, and how we can help make sure that the data can be used and understood by others in the future. The problem is large and quite complex, so each of our talks will cover a certain aspect of it.

Dharma Akmon and I will talk about the complexity and heterogeneity of data products and their implications for preservation. We approach the problem from a systems perspective and conceptualize heterogeneous bundles of data as research objects that transition from a live stage into a curation stage and then into a publication stage. Each stage is characterized by varying degrees of mutability. This model allows us to formalize two cases of data reuse (revisions and derivations) and use those formalizations to track the provenance of data. We will use examples from the SEAD project to demonstrate how the model works.
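For readers who think in code, the staged model described above might be sketched roughly as follows. This is a minimal illustration and not the SEAD implementation: the class and function names, the relation labels "revision_of" and "derived_from", and the rule that only live objects can be modified are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Tuple


class Stage(Enum):
    # Stage names follow the interview; their order here is the transition order.
    LIVE = "live"
    CURATION = "curation"
    PUBLICATION = "publication"


@dataclass
class ResearchObject:
    # Hypothetical structure: a bundle of named resources plus one provenance link.
    identifier: str
    resources: Tuple[str, ...]
    stage: Stage = Stage.LIVE
    source: Optional[Tuple[str, str]] = None  # (relation, parent identifier)

    def add_resource(self, name: str) -> None:
        # Assumption: only live objects are mutable.
        if self.stage is not Stage.LIVE:
            raise ValueError(f"{self.identifier} is no longer live")
        self.resources = self.resources + (name,)

    def advance(self) -> None:
        # Move to the next stage: live -> curation -> publication.
        order = list(Stage)
        position = order.index(self.stage)
        if position == len(order) - 1:
            raise ValueError(f"{self.identifier} is already published")
        self.stage = order[position + 1]


def revise(parent: ResearchObject, new_id: str, resources: List[str]) -> ResearchObject:
    # A revision: a new version of the same object, linked back to its parent.
    return ResearchObject(new_id, tuple(resources), Stage.LIVE,
                          ("revision_of", parent.identifier))


def derive(parent: ResearchObject, new_id: str, resources: List[str]) -> ResearchObject:
    # A derivation: a different object built from the parent's data.
    return ResearchObject(new_id, tuple(resources), Stage.LIVE,
                          ("derived_from", parent.identifier))


def lineage(obj: ResearchObject,
            registry: Dict[str, ResearchObject]) -> List[Tuple[str, str, str]]:
    # Walk the provenance links back to the original object.
    chain = []
    current = obj
    while current.source is not None:
        relation, parent_id = current.source
        chain.append((current.identifier, relation, parent_id))
        current = registry[parent_id]
    return chain
```

In this sketch, a published object is never edited in place. Each reuse produces a new object that records its parent, so walking the links with lineage reconstructs the provenance chain.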

Elizabeth Yakel and Ixchel Faniel approach the complexity of data curation from the perspective of multiple actors that are involved in data re-use and expand the notion of preservation from preserving the bits into capturing the meanings. We hope that between our two talks we will generate a rich discussion of how to capture context and content and what can or cannot be formalized in data curation.

Beth Yakel

Beth and Ixchel: Our presentation, “Three Perspectives on Data Reuse: Producers, Curators, and Reusers,” presents a real-life instance of data sharing, data curation and data reuse. Unlike previous studies that concentrate on one perspective (that of data producers, repository staff or data reusers), our case study follows the data from sharing to reuse and captures the different perspectives of participants along the way.

Susan: Concerning the issues surrounding data reuse for archaeologists, why is this becoming so important in this field?

Beth and Ixchel: In the past, archaeologists focused on a single site, building a deep understanding of the culture, economics, and social structures within one locality. However, the nature of research questions has changed; archaeologists now want to examine larger social, economic and cultural transitions between ancient civilizations. No one archaeologist could possibly survey or excavate the number of sites needed for these broader research questions. This creates an imperative for data sharing and new opportunities for collaboration. As a result, data reuse has become increasingly important, although still not the norm. Disciplinary culture, logistics and legal questions surrounding data ownership are all factors impeding a more open data ethos.

Susan: You cite the need to capture transformations across the entire lifecycle of digital data.  Could you define transformation in this context?

Inna: We see transformations as changes in states that data entities go through. Following a model proposed by D. DeRoure, C. Goble and others, we consider research data as bundles of resources that can be “live” and modifiable at some point and then “fixed” and immutable at other points in time. Such bundles, or research objects, go through the processes of collection, compilation, cleaning, re-arrangement, computation, aggregation and so on. They receive additional descriptions and re-arrangements during the publication stage. A bundle can later be downloaded and used by a researcher from the same or a different field, who will then generate a new bundle of resources that is related to, but different from, the original. All these changes in the state and content of resources need to be identified, captured and tracked.

Susan: Why is it important to capture and preserve the data throughout its lifecycle? Is this particularly important in scientific research, more than other fields?

Inna: Curating research data throughout its lifecycle means that we capture information about who created the data and how, what was excluded or included, how the instruments were developed and calibrated and so on. It is particularly important in science, because it helps to establish trust and authority, ensure data quality and interpretability and realize the data's cumulative potential. But it can be equally important in such fields as journalism for the same reasons. Capturing as much as possible about processes and contexts of data collection from the beginning rather than at the end can also help us to avoid duplicating efforts and repeating the same mistakes. It is a tough challenge though.

Ixchel Faniel

Beth and Ixchel: There is a symbiotic relationship between different phases of the data lifecycle. Decisions made during collection and initial documentation can affect how easy or hard it is to share data; the condition of the data and the documentation affects the time it takes repository staff to process data; and that in turn influences data reusers' ability to reuse data or even their decision to expend the effort to try. The list of contextual elements that are important to capture is long, but some of the most important are data descriptive information, research design/methods and relationships among data.

Capturing and preserving information (dare we say metadata) on the context of different stages of the data lifecycle is important in documenting any type of data intended for reuse. That includes administrative data generated by government agencies, qualitative interview or observational data and scientific data. This is simply part of the process by which the meaning of data is transmitted over time. Preservation of the meaning is as important as preservation of the bits. Preserved bits are useless if the context for interpretation is not preserved.

Susan: What will the audience discussion be focused on?

Beth and Ixchel: Specifically, we would like to talk about what we can and should expect from data producers/sharers, repository staff and data reusers. We would also like to discuss what type of education each group needs to curate data at its point in the lifecycle. And finally, how can we align incentives around a common goal of sharing, preserving, and reusing high quality data and documentation?

Inna:  I’d like the audience to help us think about the gaps in our approaches to preservation of research data. Is it effective to apply the existing preservation frameworks to digital data? What are we missing, especially when we’re trying to develop tools to support and automate data preservation? How are curation and preservation connected to data publication? Is it useful to distinguish between published and preserved/archived data objects or should we change our concepts and metaphors in the age of digital fluidity? What does reuse mean for the research data lifecycle? These and many other practical and conceptual considerations can become the focus of our discussion.

I’m really looking forward to the discussion and would welcome any contributions that would add more details and nuances to the picture of data curation and help this area move forward.
