Recommendations for Enabling Digital Scholarship

Mass digitization, coupled with new media, technology and distribution networks, has transformed what’s possible for libraries and their users. The Library of Congress makes millions of items freely available on loc.gov and other public sites like HathiTrust and DPLA. Incredible resources (like digitized historic newspapers from across the United States, the personal papers of Rosa Parks and Sigmund Freud, and archived web sites of U.S. election candidates) can be accessed anytime and anywhere by researchers, Congress and the general public.

The National Digital Initiatives (NDI) division of the Library of Congress seeks to facilitate even more use of the Library’s digital collections. Emerging disciplines that take advantage of new computing tools and infrastructure, like data science, data journalism and digital humanities, provide a model for creating new levels of access to library collections. Visualizing historical events and relationships on maps and in network diagrams, and analyzing thousands of texts for the occurrence of words and phrases, are a few examples of what’s possible. NDI is actively exploring how to support these and other kinds of interactions with the Library’s vast digital holdings.
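To make the text-analysis example concrete, here is a minimal sketch in Python of counting how often a word or phrase occurs in each text of a collection. The function name `count_phrase` and the two-page sample corpus are hypothetical, invented for illustration; this is not a Library of Congress tool or dataset.

```python
import re
from collections import Counter


def count_phrase(texts, phrase):
    """Count case-insensitive whole-word occurrences of a phrase in each text."""
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    return Counter({name: len(pattern.findall(body)) for name, body in texts.items()})


# Hypothetical miniature corpus, invented for illustration only.
texts = {
    "page_1": "The railroad reached town. Railroad fares fell.",
    "page_2": "No news of the canal or the railroads today.",
}

counts = count_phrase(texts, "railroad")
```

Because the pattern matches whole words only, the plural "railroads" on the second page is not counted. A real project would need to decide how to handle plurals, spelling variants and OCR errors.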

A visualization of links between web sites extracted from an October 2015 Library of Congress crawl of news site feeds. This diagram was created as part of the demonstration pilot for the Library of Congress Lab report.

Michelle Gallinger and Daniel Chudnov were asked by NDI to study how libraries and other research centers have developed services that use computational analysis, design and engagement to enable new kinds of discovery and outreach. Their report, Library of Congress Lab (PDF), was just released. For the report, they interviewed researchers and managers of digital scholarship labs and worked with Library staff on a pilot project that demonstrated how the collections could be used in data analysis. This work resulted in concrete recommendations to the Library on how to approach setting up a Lab at the Library of Congress. These recommendations could also be helpful to other organizations that may be thinking of establishing their own centers for digital scholarship and engagement.

Michelle, Dan, thanks for the report, and thank you for talking with me more about it. How do you think digital labs are addressing a need or a gap in how digital collections are served by libraries and archives?

Michelle Gallinger

Michelle: The value proposition for digital collections has always been their usefulness to researchers, scholars, scientists, artists and others. However, use was limited in the past because substantial computational analysis required a great deal of specialized knowledge. That’s changing now. Tools have become more ubiquitous and labs have been established to support users in their analysis of digital collections. Where labs are supporting users as they delve deeply into the digital collections, we’re seeing computational analysis being used as another tool in areas of scholarship that haven’t benefited from it in the past. We are seeing that the support labs provide helps address the pent-up demand in a wide variety of fields to use digital content in meaningful ways. And as this computational work is published, it’s creating new demand for additional support.

Dan: We were particularly impressed by the breadth of answers to this question shared by the colleagues we interviewed who lead and support digital scholarship services in Europe, Canada and the U.S. They have each molded their skills and services to fit the new and unique combinations of service demands coming from their own communities. In university settings, labs fill a growing role supporting teaching and learning with workshops and consultations for younger students, graduate students and early-career researchers alike. In labs connected with large collections, they are enabling advanced researchers to apply large-scale computational techniques and finding ways, based on the services they are providing to scholars, to rethink and revise institutional workflows to enable more innovative uses of collections. Each of these success stories represents a need met or a service gap filled, and presents an opportunity to consider doing more at our respective institutions.

Why do you think this is a good time for libraries to consider establishing a Lab?

Michelle: It’s a great time to be engaged in addressing the needs of scholars to work with digital collections. As I mentioned before, there really is a demand from users for support in performing digital scholarship. The Library of Congress receives regular requests for this support and it’s my opinion the number of those requests will continue to grow. Concepts of “big data” and data analytics have permeated society. Everyone knows about them, and everyone wants to be working with digital scholarship techniques and tools. A Lab is an opportunity for the Library of Congress to start addressing these requests for support with routine workflows, regular access permissions, consistent legal counsel and predictable guidelines. This support not only helps further the transformative influence of digital scholarship, it also makes the Library of Congress more efficient and better able to respond to and serve the needs of its 21st-century scholars.

Dan Chudnov

Dan: As Michelle highlights, better tools and increased demand to work with much greater volumes of materials have changed the equation. The pilot project we performed, working with Library of Congress Web Archive collections not directly available to the public, demonstrated this well. We used a third-party cloud services platform to securely transfer and process several terabytes of data from the Library to the cloud. Using tools included in the cloud services platform for cluster computing, we defined access controls for this data where it was stored, then automated file format transformations, extracted focused derivative data, and ran parallel algorithms on a cluster with two dozen virtual machines performing network analysis on a quarter of a billion web links. Once the extracted data was ready, it took less than five minutes to run a half-dozen of these queries over the entire dataset. After just a few more minutes spent verifying the results, we shut the cluster down, having spent no more than a few dollars to rent that computing power for under an hour. Back in the early 2000s, I worked in a medical informatics research center and helped to support cluster computing there with expensive, custom-designed racks full of fickle servers that gobbled up power and taxed our building cooling systems beyond reason. Today, any ambitious high school student or not-yet-funded junior researcher can perform that same scale of computation and more, much more easily, all for the price of a cup of coffee. To do this, they need the kinds of support Michelle describes: tool training, a solid legal framework with reasonable guidelines and routine workflows for enabling access, all of which the Library of Congress is ideally suited to develop and deliver right now.
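For readers curious what a link query of this kind might look like at toy scale, here is a minimal single-machine sketch in Python. It is not the pilot's actual pipeline, which ran on a cloud cluster. The function names and the sample URLs are hypothetical, and it assumes the links have already been extracted from the archive as (source URL, target URL) pairs.

```python
from collections import Counter
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the lowercased host of a URL, or '' if it has none."""
    return (urlsplit(url).hostname or "").lower()


def count_site_links(links):
    """Count links between distinct sites from (source_url, target_url) pairs."""
    edges = Counter()
    for source_url, target_url in links:
        source, target = host_of(source_url), host_of(target_url)
        # Skip malformed URLs and links that stay within one site.
        if source and target and source != target:
            edges[(source, target)] += 1
    return edges


def most_linked_sites(edges, n=3):
    """Rank target sites by their total number of inbound cross-site links."""
    inbound = Counter()
    for (_, target), count in edges.items():
        inbound[target] += count
    return inbound.most_common(n)


# Hypothetical sample data, invented for illustration only.
sample_links = [
    ("http://news-a.example/story1", "http://news-b.example/report"),
    ("http://news-a.example/story2", "http://news-b.example/other"),
    ("http://news-c.example/feed", "http://news-b.example/report"),
    ("http://news-b.example/report", "http://news-a.example/story1"),
    ("http://news-a.example/story1", "http://news-a.example/about"),
]

edges = count_site_links(sample_links)
top = most_linked_sites(edges)
```

The weighted site-to-site edges in `edges` are the kind of derivative data a network diagram can be drawn from. At a quarter of a billion links, the same counting step would be distributed across a cluster rather than run in a single loop.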

How could a Lab help to serve audiences beyond the typical scholarly or academic user?

Michelle: I loved [the new Librarian of Congress] Dr. Hayden’s quote in the recent New Yorker article when she asked herself: “How can I make this library that relevant, and that immediate?” I think a Lab supporting digital scholarship will help her achieve that vision of increasing the relevance and immediacy of the Library of Congress. The Lab offers a new way for users to access and get support in analyzing the Library’s digital collections. But it is also an opportunity for the Library to reach out to underrepresented groups and engage with those groups in new ways — coding, analytics, scholarly networks, and more. Unique perspectives help the Lab in its efforts to transform how the Library’s digital collections are used. The Lab becomes a controlled access point for users that might not be able to get to the Library in person.

One of the reasons Dan and I think that the Lab should have an open-ended name (rather than something more specific like “Digital Scholars Lab” or “Digital Research Lab”) is that we both feel strongly that the Lab should be as inclusive as possible. A specific name encourages only the small group of people who identify with that name to come. Researchers look to a research lab. Scholars look to a scholarly lab. But a really transformative Lab environment gives anyone the tools to use digital collections for their work, whether that’s scholarship, research, data analytics, art, history, social science, creative expression, or anything else they can imagine. We think that there is significant value in making the Lab a space where anyone can imagine working, even if they aren’t a typical Library of Congress researcher. Everyone should be able to see themselves at the Lab, engaging with the Library of Congress digital collections in a myriad of ways.

Dan: I agree on all counts. That focus from Dr. Hayden resonates with something we heard from a scholar at the Collections as Data event last fall: the sheer size of Library of Congress collections can sometimes overwhelm. Anyone approaching LC collections for the first time should be able to find and work with material at a scale that meets their needs and abilities. It is most important to provide access to collections and services at a ‘human scale’, whether that means one item at a time, or millions of items at a time, or some scale in between which best fits the needs of the individual coming to the Library. For example, UCLA’s Miriam Posner engages humanities students with collections at the scale of a few thousand items, which challenges them to use automated tools and techniques but is still small enough that they can “get to know” the materials over the course of a project. Another critical aspect of this focus is representation. To make the Library relevant and immediate, anyone visiting its collections should be able to see themselves and to recognize stories of people like them reflected and featured among digital collections, at every scale. The breadth and variety of collections at the Library of Congress reflect our wonderfully diverse culture, and that means all of us and all of our histories.

What other opportunities do you see in establishing a Lab at the Library of Congress?  

Michelle: The Library of Congress is a powerful convener. It has always been able to get people to come together around a table and talk through controversial or challenging topics — from copyright restrictions to stewardship responsibilities and many others. The Lab community is still emerging. There are some extraordinarily strong players that have a lot to share and there are a lot of opportunities for labs that haven’t yet been developed. The Library of Congress could provide valuable leadership by convening the full spectrum of this community to make sure that emerging successes are circulated and pitfalls are documented. It could really help move things to another level.

Dan: I agree. The possibilities of building communities around opening up access to digital collections, connecting students with collections and subject expertise across institutions, and convening practitioners to share what works by building networks of potential collaborators across disciplines and distances are compelling. We heard from many people that public goodwill toward the Library of Congress is strong, which gives it the ability to draw people with mutual interests together. When the Library puts an event together, people will travel great distances and tune in from all over the net, as the recent #asdata event demonstrated. Similarly, when Library staff show up and participate in community initiatives and events, people take notice and take their contributions to heart. A Lab at the Library of Congress could be a valuable new conduit for this kind of leadership, amplifying the service innovations of many strong peer institutions while assembling a mix of services that fit the unique possibilities and constraints at LC.

Thank you both again for the time and effort you put into the report (PDF). NDI is excited to work toward establishing a Library of Congress Lab in the coming year, and we’ll keep you all posted on our progress.
