Not that repositories ever really were only about published scholarly output, but for some organizations that was the easiest first bar to reach. But at Open Repositories 2012, it was clear that the bar has been raised.
OR2012, held at the University of Edinburgh from July 9-13, 2012, had over 480 registered attendees from over 40 countries. The participants included developers, librarians, library administrators and service providers. The topics were, as usual, quite wide-ranging, including calls for increased open access to scholarly publications, introductions to core repository technologies, presentations on new types of repository services, the need for name identifiers/disambiguation and the ever-popular developer challenge. The entire conference was live-blogged, which provided some remarkable coverage. And there is a tweet archive for the tag #or2012. Oh, and there is a Flickr pool.
But if there was one word that was woven into almost every presentation, it was this: DATA.
Which made me very happy. Because anyone who reads my posts here or has seen me speak anywhere in the past 9 months knows that I go on and on about two things: Library, Archive, and Museum collections are now being mined as data by researchers, which requires new management strategies and new self-serve services. And, consequently, cultural institutions all have big data and need different IT infrastructures for the processing and serving of these collections. And I said as much in my presentation at OR2012 (check out the live blog post from the session I spoke in).
Just about everyone was discussing RDM, or Research Data Management. It has become clear that institutional repositories must manage not only scholarly publications but also the data that was created through observation and experimentation or collected and published, in order to support the “re-” activities: review, reuse, replicability and reproducibility. RDM platforms are needed to help researchers capture, share and publish their datasets. The public-facing discovery infrastructure is but a small part of this effort: the greater need and effort is in capturing data from the original instruments and formats, and in transferring and documenting datasets reliably enough to support a forensic level of authenticity for future researchers. The Digital Curation Centre has a great blog post reviewing some of the sessions on this topic.
Another word which was everywhere was “identifiers.” Disambiguation of researchers/authors has been a known issue since the earliest Institutional Repositories, where one publication might be by “Leslie Johnston,” and another by “L. Johnston,” and another by “Leslie L. Johnston,” depending upon the publication’s stylebook. Is that the same person to someone searching for all my publications? ORCID is the more mature service in the assignment and resolution of identifiers, but the status of ISNI, aka ISO 27729, was also presented. There will likely never be a single unique identifier, as there are these two international services, national services, and institutional services. The catch will be crosswalking between all the identifiers. The same can be said for article or item identifiers, such as the DOI, which has a high level of buy-in in the publishing realm, but uneven adoption for other types of objects.
There was also a lot of discussion about scale. Not a lot of solutions yet, but a lot of discussion. I heard some very promising presentations about optimization for Solr and the use of NoSQL databases.
Linked Open Data was, unsurprisingly, a topic of discussion. The opening plenary by Cameron Neylon from the Public Library of Science very much emphasized this point: “It’s about links; it’s about connectedness.” And it’s not just linking between objects and repositories, but synchronization between them. Some very interesting early work was presented on the Webtracks and ResourceSync projects.
I can never say enough about the Developer’s Challenge at OR, where developers pitch ideas, refine the ideas, and often develop code in but a few hours. DevCSI, which sponsored the challenge, covered the event and the winners.
I would encourage anyone interested in any of these topics to read through the comprehensive session live-blogging, and to check out videos on the OR2012 YouTube channel. My own session is apparently missing, due to unrecoverable file corruption (really).