Open Data and Preservation

Yesterday, May 9, 2013, the U.S. government issued an executive order and an open data policy mandating that federal agencies collect and publish new datasets in open, machine-readable, and, whenever possible, non-proprietary formats.  The new policy gives agencies six months to create an inventory of all the government-produced datasets they collect and maintain; a list of datasets that are publicly accessible; and an online system to collect feedback from the public on how they would like to use the data.  The goals are twofold: greater access to government data for the public, and the availability of data in forms that businesses and researchers can better use.  This builds on the earlier White House Memorandum on Transparency and Open Government.
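
To make the inventory requirement a little more concrete, here is a minimal Python sketch of how an agency might represent dataset records and derive both the full inventory and the list of publicly accessible datasets as machine-readable JSON. The `DatasetRecord` class, its field names, and the `build_inventory` function are hypothetical, chosen only for illustration; they are not the official Project Open Data metadata schema.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List, Tuple


@dataclass
class DatasetRecord:
    # Hypothetical fields for illustration only; NOT the official
    # Project Open Data metadata schema.
    title: str
    description: str
    publisher: str
    contact_email: str   # the designated point of contact for the dataset
    access_level: str    # assumed values: "public" or "restricted"
    formats: List[str] = field(default_factory=list)  # e.g. ["text/csv"]


def build_inventory(records: List[DatasetRecord]) -> Tuple[str, str]:
    """Return (full inventory, public-only list), each serialized as JSON."""
    full = [asdict(r) for r in records]
    # The publicly accessible list is simply the subset marked "public".
    public = [r for r in full if r["access_level"] == "public"]
    return json.dumps(full, indent=2), json.dumps(public, indent=2)


if __name__ == "__main__":
    full_json, public_json = build_inventory([
        DatasetRecord("Air quality readings", "Hourly sensor data",
                      "Agency X", "data@example.gov", "public", ["text/csv"]),
        DatasetRecord("Internal audit logs", "Restricted records",
                      "Agency X", "audit@example.gov", "restricted",
                      ["application/json"]),
    ])
    print(public_json)
```

Because JSON is itself an open, non-proprietary format, both outputs could be published as-is; a real inventory would follow whatever schema the policy guidance actually specifies.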

These documents were accompanied by a link to something that caught my fancy even more: a greatly expanded Project Open Data GitHub repository of guidelines, use cases, and tools.  This, alongside the ever-growing (and soon to be extensively updated) data.gov, is evidence of real efforts to release more data and make it truly useful and usable.

The documents provide guidance on open licensing, metadata, and standards, as well as lifecycle-based information stewardship. But what I personally keep struggling with are two questions: What IS open data? And how is it being preserved?

The project has some defining principles for open data that I think can inform any dataset preservation project.  While reading through some of the documents, I came across this bullet point:

  • Managed Post-Release. A point of contact must be designated to assist with data use and to respond to complaints about adherence to these open data requirements.

I am thrilled to see guidance about active management of datasets and supporting users in their work with the data.  But what this and all open dataset projects could use is more attention to dataset preservation.  Here are a few great resources on this topic:

Do public sector datasets present different preservation issues from other datasets?  Not really.  They may face a much higher level of public scrutiny and use.  But they involve the same investment of time and money in their creation, serve research and the public good, and present the same format preservation issues as other research data.
