The following is a guest post from Trevor Owens, Digital Archivist with the Office of Strategic Initiatives.
I’m excited to share this third interview for Insights, an occasional feature of The Signal sharing interviews and conversations between National Digital Stewardship Alliance Innovation working group members and individuals working on projects related to preservation, access and stewardship of digital information.
For our third interview, I am thrilled to have a chance to chat with Brett Bobley, the CIO and Director of the Office of Digital Humanities at the National Endowment for the Humanities. I wanted to catch up with him on how some of the work NEH is supporting under the Digging into Data grants might connect with issues around the preservation of and access to digital content.
Trevor: One of the repeated themes at the recent National Digital Stewardship Alliance meeting was the idea that digital content must be used to be preserved. In one strand of this thinking, Helen Hockx-Yu of the British Library stressed a need for librarians, archivists and curators working with web archiving to move away from a document-centric approach to web archives and start approaching them as corpora. As an example of the implication of this line of thinking, the British Library launched an n-gram viewer that actually acts as a search interface for the content of their web archives. It seems to me that Helen’s point about web archives has much broader implications for the future of digital preservation and access. As libraries, archives, and museums increasingly gather large sets of information, all of those sets of content can be thought of both as individual objects and as corpora. Given this context, could you give us some examples of projects from the first round of Digging into Data (or other related work you have seen come through the Office of Digital Humanities) that have approached bodies of digital content, which we might otherwise think of as individual objects of study in a collection, in ways that offer interesting insights for institutions stewarding digital collections?
Brett: Sure. I think this is an important point that really gets at the heart of the Digging into Data Challenge. At our recent conference in June, I kicked things off by suggesting to the audience that “our ability to digitize materials has outstripped our ability to analyze them.” We’ve gotten quite good at scanning stuff, and we can build big collections. But we haven’t changed the way we do research accordingly. We still tend to take the document-centric approach, except we now have far more documents and no great way to find the ones of interest.
One Digging project that tackled this head-on is the Criminal Intent Project, which is a US-UK-Canadian team working with the Proceedings of the Old Bailey collection. The Old Bailey site collects 197,745 criminal trials held at London’s central criminal court between 1674 and 1913. It is a remarkable resource for historians. The Criminal Intent team wanted to come up with a new way for users to view this collection, not only to help them drill down to cases of interest but also to see trends across time and across cases. They built a new API for the website that allows the user to take advantage of sophisticated tools like Voyeur, TAPoR, and Zotero and, in my opinion, makes for a much more powerful environment for using the collection, both as a corpus and as a group of documents.
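To make the corpus view concrete, here is a minimal sketch of the kind of count that sits behind an n-gram viewer like the one mentioned earlier: how often a phrase appears in each year of a collection. It is not the Criminal Intent team's code or the Old Bailey API. The function names and the three toy records are assumptions made up for illustration.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return each run of n consecutive tokens as a tuple."""
    return zip(*(tokens[i:] for i in range(n)))


def ngram_counts_by_year(documents, n=2):
    """Map each year to a Counter of n-grams across that year's documents."""
    counts = defaultdict(Counter)
    for year, text in documents:
        # Naive tokenization: lowercase and split on whitespace.
        tokens = text.lower().split()
        counts[year].update(ngrams(tokens, n))
    return counts


# Hypothetical toy records (year, text); not real trial transcripts.
documents = [
    (1700, "the prisoner was found guilty"),
    (1700, "the prisoner was acquitted"),
    (1800, "the jury found the prisoner guilty"),
]

counts = ngram_counts_by_year(documents, n=2)

# Frequency of one phrase per year: the corpus-level trend.
trend = {year: c[("the", "prisoner")] for year, c in sorted(counts.items())}
print(trend)
```

The same records still work as individual documents to read one at a time. The per-year counts add the view across the whole collection.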
Trevor: Do you have any advice for stewards of digital collections who would like to see their collections more actively used by researchers? There has recently been a lot of discussion about linked open data, many institutions already support the open archives public harvesting methods, and in the same spirit meetings like the NEH-funded digital humanities API workshop have suggested approaches to providing cultural heritage collections through other methods, like REST APIs. As all of the Digging into Data grants involve international collaborations between multiple institutions, I would be interested to know if and how any of these approaches played a role in those collaborations. More specifically, what kinds of approaches did grant winners take to enable collaborative work with their collections, and what implications do you see in their approaches for stewards of digital collections who want to have their collections used for this kind of research?
Brett: I’m glad to see so much interest in this area — linked open data, APIs, etc. If the past ten years was the decade of digitization, the next ten will be the decade of making collections more usable. I think libraries and archives are so important right now — they’ve got to continue to be leaders in making large digital collections usable, interoperable, and sharable.
If you look at the Digging into Data projects from 2009, you’ll see many institutions collaborating in different ways. In many of these cases, the projects were very much about using these sharing, analysis, and visualization techniques to make large collections easier to navigate and understand. Another good example is the Digging into the Enlightenment project, which is about visualizing the correspondence of key Enlightenment figures like Locke, Voltaire, and Bentham, who wrote to each other and exchanged ideas using the social networking platform of their day. There is a great New York Times piece that describes the project.
One general piece of advice I’d have to collection holders is to try to be as open as you can with regard to intellectual property rights restrictions. Often, the real value of these collections will only be realized when you allow your data to be harvested and mixed with other collections. If you hold a large digital collection, it can be helpful to have well-defined methods for researchers to use to get at your data in different ways.
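As one illustration of a well-defined harvesting method, here is a minimal sketch of building requests for the Open Archives harvesting protocol (OAI-PMH) mentioned in the question above. The `ListRecords` verb and the `metadataPrefix` and `resumptionToken` arguments come from that protocol. The base URL, the helper names, and the sample response are hypothetical, and the sketch makes no network calls.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for a repository's base URL."""
    if resumption_token:
        # A resumptionToken is sent on its own, without metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def next_token(response_xml):
    """Return the resumption token in a response, or None on the last page."""
    root = ET.fromstring(response_xml)
    element = root.find(".//oai:resumptionToken", OAI_NS)
    return element.text if element is not None and element.text else None


# Hypothetical repository address and a trimmed, made-up response.
BASE_URL = "https://example.org/oai"
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url(BASE_URL)
token = next_token(SAMPLE_RESPONSE)
second_url = list_records_url(BASE_URL, resumption_token=token)
print(first_url)
print(second_url)
```

A harvester would fetch each URL in turn and stop once a response carries no token.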
Trevor: The sessions from the Digging into Data conference earlier this year sounded fascinating. I am glad to see that the proceedings from the conference are all online and open access. If you were to suggest three must-read papers from the conference for individuals working on collecting, preserving and providing access to cultural heritage content, what would they be, and why do you think they are must-reads for this audience?
Trevor: I would be interested to know if you think some of the Digging into Data work might feed into new modes of access and discovery for digital collections. For example, returning to the UK Web Archive, they implemented the n-gram viewer, originally created as a research tool, as a new search interface to their collection. Do you see potential for similar cross-pollination of research tools into new modes of discovery and access? If so, do you have any thoughts on how some of the initial projects might serve as models?
Brett: Most definitely. I very much hope that some of the research that comes out of Digging into Data will ultimately work its way into production use on collections. In fact, the API that the Criminal Intent team developed is scheduled to go onto the production version of the Old Bailey site soon (you can use it now, at this web address).
My thanks to you and your colleagues at the Library of Congress for your great work. Also, let me thank my funding partners for Digging into Data: NSF, IMLS, JISC, SSHRC, AHRC, ESRC, and NWO. Having eight funders work together on one grant program demonstrates, in my mind, how important this topic is.