This is a guest post by George Alter, Director of the Inter-university Consortium for Political and Social Research.
Research data produced by federally funded scientific projects should be freely available to the wider research community and the public at large.
That simple statement should not be controversial, especially as federal research funding agencies increasingly require that grant proposals include data management plans detailing how research data will be shared. Public data should be a public resource, and data sharing supports core scientific values like openness, transparency, and replication. But a lack of resources for curating scientific data and a lingering tradition of data hoarding create resistance to open access to research data.
In 2009, the Interagency Working Group on Digital Data released Harnessing the Power of Digital Data for Science and Society, which recommended a strategic policy for access to and preservation of scientific information in digital formats. Recently this group solicited advice on approaches for ensuring long-term stewardship and encouraging broad public access to unclassified digital data that result from federally funded scientific research.
As director of the world's largest repository of social science data, the Inter-university Consortium for Political and Social Research, housed at the University of Michigan, I applaud the attention being drawn to this issue. And we look forward to the working group's continued progress on developing federal-wide data sharing standards.
The most important step that the working group could take to foster data sharing would be to recommend a policy of mandatory deposit of federally funded research data into long-term, publicly accessible data repositories. Such a requirement would promote re-use of scientific data, maximize the return on investments in data collection, and prevent the loss of thousands of potentially valuable datasets.
ICPSR advocates a network of domain-specific repositories to accommodate the unique needs of various scientific fields. These repositories would create the communities of practice that the working group called for in its 2009 report and act as liaisons between specific disciplines and the complex and rapidly changing world of digital preservation.
Creating new repositories or strengthening existing ones, along with a mandate for data sharing, would shift the burden of data management away from a patchwork of disparate entities and onto an organized system of archives with expertise in data management. Discipline-specific repositories can develop solutions to problems such as protecting confidential information and creating metadata standards. They will work in partnership with the growing network of institutional repositories, most of which were developed to manage digitized text and images held by libraries.
In the absence of appropriate repositories, responsibility for data sharing has fallen to journals, and much important research data is provided as supplements to published articles. Journal publishers, however, have neither the expertise nor the financial incentives to redistribute data in formats that are useful for the research community, and publishers have no obligation to preserve data for the long term. The increasing volume of research data threatens to create an unsustainable burden on this system.
The central problem at present is that preservation requires a long-term commitment, while most federal funding agencies provide only short-term funding. The solution should be a combination of support for infrastructure and payments for long-term preservation. The latter could involve a single payment for the estimated present value of future distribution and preservation, which repositories could annuitize in some way. This type of long-term commitment to digital repositories is being made by other countries, where data archiving is considered an essential aspect of the research infrastructure.
The federal momentum behind data sharing and data management is truly heartening. Evidence of the recognition of these issues at the highest levels was apparent in President Barack Obama's State of the Union address, when he cited data management as a job of the future. But it will take a targeted allocation of resources and intelligent guidelines to continue advancing this cause in the coming years. The costs of these efforts should be weighed against the research opportunities lost when expensive digital resources are allowed to decay and disappear.