I had the opportunity today to talk at the Big Data and Big Challenges for Law and Legal Information symposium at the Georgetown University Law Center. The event marked the 125th anniversary of the University Law Library.
My panel was on Big Data Applications in Scholarship and Policy, and I was pleased to present with a distinguished group of scholars and jurists. I presented my personal overview of submissions in response to the November 2011 White House Request for Information: Public Access to Digital Data Resulting From Federally Funded Scientific Research.
The RFI is described as an "Opportunity for interested individuals and organizations to provide recommendations on approaches for ensuring long-term stewardship and encouraging broad public access to unclassified digital data that result from federally funded scientific research. The public input provided through this Notice will inform deliberations… [about] steps that can be taken by Federal agencies to encourage and coordinate the development of agency policies and standards to promote long-term preservation of and access to digital data resulting from federally funded scientific research."
The responses are posted on the Office of Science and Technology Policy website and provide a snapshot of current thinking on data stewardship from the perspective of different stakeholders. The 118 individual responses, which total over 600 pages, make up an excellent (if unstructured) data set. Respondents were diverse:
- 50 percent from academic research departments and professional organizations
- 35 percent from libraries, repositories and allied organizations
- 10 percent from publishers and commercial organizations
- 5 percent other
I was struck by the degree to which the respondents were in agreement on recommendations in support of:
- Agency data preservation mandates
- Requirements to promote secondary data use
- Resources to support data management across the data life cycle
- Efforts to extend a collaborative national data management infrastructure
Most commenters were enthusiastic in calling for what I characterize as “respect for data.” There were common calls for giving publication of data sets the same credit as publication of summary research findings, as well as for developing robust metrics to track data publication and use. A number of submissions suggested that criteria for evaluating grant applications should explicitly include data management, both during and after proposed projects.
There were other areas in which opinions diverged markedly, including thoughts on the role of guidelines for intellectual property and personal privacy.
But it is fair to say the submissions reflect a degree of consensus among data producers, users and keepers about what is needed to address big data stewardship going forward.