Here’s the text of the talk I gave last week at the Collections as Data event my group hosted on September 27, 2016. If you would like to watch it, the talk starts at about minute 54 of the video of the event.
Welcome to Collections as Data! I’m excited to tell you about our group, National Digital Initiatives, which is hosting today’s event.
Meet our team: Abbey Potter, Mike Ashenfelder and Jaime Mears. They spent a lot of time putting this together and I think they did a great job. In addition to Jane McAuliffe, who you just heard from, Eugene Flanagan from our executive management team is here too, so if you like what we’re doing, be sure to stop them and thank them.
I’d like to take this opportunity to talk to you all a bit about this new group we’ve started, National Digital Initiatives, what we hope to accomplish, and a little more about what you can expect today. But first, a short story that I think helps illustrate the Library of Congress’ long history of technological innovation. I love to tell the story of Henriette Avram, whose work here at the Library of Congress replaced ink-on-paper card catalogs and revolutionized cataloging systems at libraries worldwide.
Henriette Avram was born in New York in 1919. She took two years of pre-med courses at Hunter College, then left to start a family. She was in her 30s when she started learning programming. (I love this story because there are people who will insist that if you haven’t been coding since you were crawling, you’ll never make it as a programmer. They’re wrong. She did, and she changed the world.)
As some of you know, Henriette and her team created the MARC format, a structure to contain bibliographic information for all forms of materials. This format was the keystone that made a revolution in information science possible. Because of her work, we can, with a few keystrokes, search the treasures of a library on the other side of the earth.
Ms. Avram is a bit of a hidden hero in computer science. She did all of this before the first relational database, before character encoding was mature and before computers were networked. It’s amazing to me that we don’t hear her mentioned more often.
I like this story because it shows the power of cross-discipline fertilization. Ms. Avram herself combined two complex fields, computer programming and intricate cataloging practices, to create a sea-change in how the public and scholars access library collections. Many of you who sit here may not remember using card catalogs, but this work was transformational. It made finding things much, much easier and allowed remote access to resources. I miss the smell of a card catalog, don’t you? But I don’t miss much else.
Ms. Avram’s work started the digital revolution in information science. We (both LC and the field as a whole) carry that spirit forward today in our contributions to standards work and open source tools. But innovation is not always in the big bang; it just looks like that in hindsight. People say MARC was released in 1968, but really it was released in stages over about a decade and continues to be refined to this day. The MARC of today is an international standard that covers character sets for all writing systems currently in use. Innovation often looks less like a dot on a timeline than a series of continuous improvements.
With that in mind, I’d like to talk a little about loc.gov, where what looks like one dot on a timeline represents a lot of invisible labor and exciting treasures. Each new collection on that list contains work from people throughout the Library, especially in our CIO’s office and in Library Services.
So far this year, we have released a lot of new collections, including the diaries of George Patton, the Chicago Ethnic Arts Project survey and Walt Whitman’s papers. These new collections represent the opportunity for remote access to resources that previously required a plane trip to DC for most of America.
I just wanted to take a minute to zoom down one more level and talk about one collection, Rosa Parks. Just as an example of the kind of digital work LC does. I picked this one because I think it’s really exciting. But we could just as easily be taking a close look at any of those other collections today, like ballroom dancing instruction manuals or web archives from the 2014 election.
Rosa Parks’ collection of personal correspondence and photographs is on a ten-year loan to LC from the Howard G. Buffett Foundation. The collection contains about 7,500 items in the Manuscript Division and 2,500 photographs in the Prints and Photographs Division, documenting Mrs. Parks’ private life and public activism on behalf of civil rights for African Americans. The material was assessed and stabilized by LC’s Conservation Division staff, cataloged and described by archivists and librarians and then digitized. Files were moved around by people and software, assessed and validated and then prepared for access. Metadata was added and transformed. Webpages were made. Search indexes populated. Rights assessed. A few years’ worth of work in less than 140 characters.
For those of you who are librarians, you know that this is just the job. A bunch of invisible work to make information usable. But I want to make a big deal out of it. Because it’s important. And because new, cool websites get a lot of attention but the usual production work of adding vast resources to the Nation’s collective is easy to overlook.
As I think about NDI, my new team, and what it’s tasked with and what I want to accomplish, I often think about the tension between innovation and sustainability. The sustained effort it takes to ready new collections for the web. The enduring power of the MARC format and bibliographic standards in general. Because I think groups like ours can become the “cool new thing” group, and I don’t want to be that. Henriette Avram is a shining beacon for me. Like me she comes to libraries from software. And like me she was excited about what infrastructure could do. I want to carry forward that enthusiasm. And I want us to try new things, with a vision for the future.
So with that in mind, I want to talk a little bit about our team at NDI. What do we plan on doing?
We want to maximize the benefit of the Library’s digital collection to the American public and the world.
We have a lot of stuff here and a lot of it is publicly available and digital. For example, right now on our websites there are more than 10 million pages of historic newspapers, 1.2 million prints and photographs, and one-of-a-kind collections such as the papers of George Washington, Abraham Lincoln, Carl Sagan, Jackie Robinson and many more books, maps and archived websites.
And we know they’re being used by students, scholars and researchers. But how can we expand that reach even further? How do we reach more life-long learners? How do we encourage a new generation of journalists and writers to turn to LC for reference help and for resources?
I mentioned students, and there is a division at the Library that is dedicated to reaching them. The Educational Outreach team develops innovative resources for K-12 classroom teachers and it does make a difference. How do we apply those same ideas to advanced scholars, researchers and the curious to help maximize the benefit of the collections to the American public?
And we’d like to see more creative re-use of collections materials.
It’s fun but it can serve a scholarly and important purpose too. I talked earlier about the Rosa Parks papers, which contain a handwritten note…
…in which she says “I had been pushed around all my life and felt at this moment that I couldn’t take it anymore.”
The papers give us insight into Parks as an American hero in that moment, but other pieces help give us a fuller picture, to humanize her.
That’s why I love her pancake recipe, which got picked up by the press and food bloggers. From what I hear it makes good pancakes, but it helps us remember that she was a real human, not just an icon frozen in time. And hopefully it leads people to want to learn more from the source herself.
NDI has a tiny, tiny staff, which I like, actually. It helps us keep our focus and makes us very agile. We can try things, like a hackathon, without there being an impact on the critical production work of the Library of Congress. We can help make connections between staff members working on similar projects. I think of this work as being like an interface. I used to say semi-permeable membrane but that got a lot of weird looks. It’s the work of finding useful and interesting stuff in our community and making sure the people in LC who should know about it are connected. And it’s the work of letting the world know the cool and useful stuff that is going on here too.
I think we can also provide a useful service in catalyzing innovative projects. You might remember from chemistry that catalysts lower the activation energy of reactions. In our world, connections, advice and support can make new ideas possible.
I often say that the library profession is poor (I mean, we’re not Silicon Valley) but we’re scrappy as heck and we like each other. The power of indefatigable people working together is like the stream over the stone – our problems may be immovable objects but we are focused, we are many and we have a very long memory.
That is why one of NDI’s focuses will be building relationships that can lead to successful partnerships, like our co-hosting of DPLAFest and the Archives Unleashed Datathon.
Matt Weber is up next, so in the interest of no spoilers, I won’t say much about the Hackathon that NDI co-hosted with other groups. But I will say that we had a great experience hosting coders, academics and librarians working together. In addition to making some new discoveries using data analysis on digital collections, the team collaborations showed just how important librarians are when acting as experts on the data, highlighting the evolving shape of reference in this new century.
I wonder if I could tell you a quick story about that. One of the groups was looking at a text analysis from a set of Supreme Court nomination websites and the results were looking a little funny. Some words didn’t make sense in context to the scholars but luckily a Law Librarian was sitting at their table. He explained a little bit about the contours of the data set and why they might be getting some of the artifacts they were seeing. And he suggested ways to refine that query to improve the quality of the results. It’s a great example of the unique service we provide in libraries.
We’re thinking about ways to better support scholars and researchers working with digital material. The John W. Kluge Center hosts scholars from around the world to conduct research here at the Library of Congress. We’re working with the staff of the Kluge Center to determine how we can help scholars coming here to do computer-assisted research with the digital collections, such as visualization and network analysis.
We’ve asked two outside experts, Dan Chudnov and Michelle Gallinger, to do a proof-of-concept for a digital scholars lab. Our goal in this pilot is to demonstrate what a lightweight implementation could look like and I’m really excited about it.
I’d like to introduce Tong Wang and Chris Adams. They’re sharing NDI’s inaugural fellowship that seeks to sponsor work that demonstrates an innovative use of LC digital collections materials. They’re well on their way, exploring collections and some open source tools, and we hope they’ll have something to share with you all soon. We’re working to expand this program in future years so we can welcome a wider range of applicants. Don’t read too much into the bike gear – it’s not a requirement.
As you can probably tell from our projects, our focus this inaugural year was on exploring Collections as Data, on seeking opportunities to get more value out of our own collections and on developing partnerships and friendships that can help advance the field. This summit is the public face of that work and one that we hope will grow into a sustainable program. And we’re exploring other ideas, like how we can enable more contributions to open source projects, what we can do to improve technical skill-building in libraries and how we improve shared infrastructure. All things that raise the tide, we hope.
Contact us! We want to work with you and we want to hear from you. Please drop us a line or read our blog, which we’ve recently relaunched with an expanded scope.
Like I said, we’re new and we are small, but I’m proud of what this team has been able to accomplish so quickly. I am particularly proud of holding this Collections as Data event, in which we invited librarians, programmers, archivists, researchers, artists, data journalists, thinkers and partners to help us explore this topic together. I hope the story about Henriette Avram inspires you to think about the power that an indefatigable and determined individual can have on the course of human knowledge. And looking around this room, I see hundreds of you. I see visionaries and even more important, a network of collaborators that can make those ideas happen.
It’s an exciting time to be in this field. Let’s think big and make friends.