NDI Talk at Collections as Data

Here’s the text of the talk I gave last week at the Collections as Data event my group hosted on September 27, 2016. If you would like to watch it, the talk starts at about minute 54 of the video of the event.

Welcome to Collections as Data! I’m excited to tell you about our group, National Digital Initiatives, which is hosting today’s event.

Meet our team: Abbey Potter, Mike Ashenfelder and Jaime Mears. They spent a lot of time putting this together and I think they did a great job. In addition to Jane McAuliffe, who you just heard from, Eugene Flanagan from our executive management team is here too, so if you like what we’re doing, be sure to stop them and thank them.

Text: "Hello and welcome from National Digital Initiatives," photos of staff

I’d like to take this opportunity to talk to you all a bit about this new group we’ve started, National Digital Initiatives, what we hope to accomplish, and a little more about what you can expect today. But first, a short story that I think helps illustrate the Library of Congress’ long history of technological innovation. I love to tell the story of Henriette Avram, whose work here at the Library of Congress replaced ink-on-paper card catalogs and revolutionized cataloging systems at libraries worldwide.

Photo of Henriette Avram with a quote of hers that reads "When I speak of and refer to it as ‘the Great Library,’ I do so with sincerity and appreciation for everything that I learned within those walls."
Henriette Avram. Photo by Reid Baker, https://www.loc.gov/loc/lcib/9708/ala.html.

Henriette Avram was born in New York in 1919. She took two years of pre-med courses at Hunter College, then left to start a family. She was in her 30s when she started learning programming. (I love this story because there are people who will insist that if you haven’t been coding since you were crawling, you’ll never make it as a programmer. They’re wrong. She did, and she changed the world.)

Photo of a person using a card catalog
Woman at Main Reading Room card catalog in the Library of Congress. Photo by Jack Delano https://lccn.loc.gov/90712184.

As some of you know, Henriette and her team created the MARC format, a structure to contain bibliographic information for all forms of materials. This format was the keystone that made a revolution in information science possible. Because of her work, we can, with a few keystrokes, search the treasures of a library on the other side of the earth.

Ms. Avram is a bit of a hidden hero in computer science. She did all of this before the first relational database, before character encoding was mature and before computers were networked. It’s amazing to me that we don’t hear her mentioned more often.

Image of book stacks with text: "I am not a librarian by training but a brainwashed computer systems analyst." -Avram
Library of Congress, Jefferson Building, bookstacks area. Photo. https://www.loc.gov/item/91719273/.

I like this story because it shows the power of cross-discipline fertilization. Ms. Avram herself combined two complex fields, computer programming and intricate cataloging practices, to create a sea-change in how the public and scholars access library collections. Many of you who sit here may not remember using card catalogs, but this work was transformational. It made finding things much, much easier and allowed remote access to resources. I miss the smell of a card catalog, don’t you? But I don’t miss much else.

Ms. Avram’s work started the digital revolution in information science. We (both LC and the field as a whole) carry that spirit forward today in our contributions to standards work and open source tools. But innovation is not always in the big bang; it just looks like that in hindsight. People say MARC was released in 1968, but really it was released in stages over about a decade and continues to be refined to this day. The MARC of today is an international standard that covers character sets for all writing systems currently in use. Innovation often looks less like a dot on a timeline than a series of continuous improvements.

Representative images from collections released on loc.gov in 2016: a portrait of Walt Whitman, an image of lace, ballroom dancing manual cover....
William T. Sherman Papers: Certificate of thanks signed by Abraham Lincoln, http://hdl.loc.gov/loc.mss/ms009309.mss39800.0155
Needlework display at St. Nicholas Greek Orthodox Church Photo by Jonas Dovydenas. https://www.loc.gov/item/afc1981004.018/
Walt Whitman Papers: Photograph of Whitman. Photo by Frank Pearsall, http://hdl.loc.gov/loc.mss/ms005001.mss77909.019
How to dance A complete ball-room and party guide. Published by Tousey & Small. https://www.loc.gov/item/musdi.098/
Walt Whitman Papers: Cardboard Butterfly and Notebooks, https://www.loc.gov/item/mss454430216/
Nathan W. Daniels Diary: Diary; Vol. I, 1861 http://hdl.loc.gov/loc.mss/ms014083.mss84934.01.

With that in mind, I’d like to talk a little about loc.gov, where one dot on a timeline represents a lot of invisible labor and exciting treasures. Each new collection on that list contains work from people throughout the Library, especially in our CIO’s office and in Library Services.

So far this year, we have released a lot of new collections including the diaries of George Patton, The Chicago Ethnic Arts Project survey and Walt Whitman’s papers. These new collections represent the opportunity for remote access to resources that previously required a plane trip to DC for most of America.

I just wanted to take a minute to zoom down one more level and talk about one collection, Rosa Parks. Just as an example of the kind of digital work LC does. I picked this one because I think it’s really exciting. But we could just as easily be taking a close look at any of those other collections today, like ballroom dancing instruction manuals or web archives from the 2014 election.

Rosa Parks and Honorable Congresswoman Shirley Chisholm. [ca. 1968] Image. http://hdl.loc.gov/loc.pnp/ppmsca.38704.

Rosa Parks’ collection of personal correspondence and photographs is on a ten-year loan to LC from the Howard G. Buffett Foundation. The collection contains about 7,500 items in the Manuscript Division and 2,500 photographs in the Prints and Photographs Division, documenting Mrs. Parks’ private life and public activism on behalf of civil rights for African Americans. The material was assessed and stabilized by LC’s Conservation Division staff, cataloged and described by archivists and librarians, and then digitized. Files were moved around by people and software, assessed and validated, and then prepared for access. Metadata was added and transformed. Webpages were made. Search indexes populated. Rights assessed. A few years’ worth of work in less than 140 characters.

Image of a tweet from @libraryofcongress that reads: "Rosa Parks Collection Now Online..."

For those of you who are librarians, you know that this is just the job. A bunch of invisible work to make information usable. But I want to make a big deal out of it. Because it’s important. And because new, cool websites get a lot of attention but the usual production work of adding vast resources to the Nation’s collective is easy to overlook.

As I think about NDI, my new team, and what it’s tasked with and what I want to accomplish, I often think about the tension between innovation and sustainability. The sustained effort it takes to ready new collections for the web. The enduring power of the MARC format and bibliographic standards in general. Because I think groups like ours can become the “cool new thing” group, and I don’t want to be that. Henriette Avram is a shining beacon for me. Like me she comes to libraries from software. And like me she was excited about what infrastructure could do. I want to carry forward that enthusiasm. And I want us to try new things, with a vision for the future.

So with that in mind, I want to talk a little bit about our team at NDI. What do we plan on doing?

Photo of a woman operating a hand drill from the U.S. Office of War Information, 1944. Text reads: "Maximize the benefit of the digital collection"
Operating a hand drill at the North American Aviation, Inc., Photo by Alfred T. Palmer, http://hdl.loc.gov/loc.pnp/pp.fsac.

We want to maximize the benefit of the Library’s digital collection to the American public and the world.

We have a lot of stuff here and a lot of it is publicly available and digital. For example, right now on our websites there are more than 10 million pages of historic newspapers, 1.2 million prints and photographs, and one-of-a-kind collections such as the papers of George Washington, Abraham Lincoln, Carl Sagan, Jackie Robinson and many more books, maps and archived websites.

And we know they’re being used by students, scholars and researchers. But how can we expand that reach even further? How do we reach more life-long learners? How do we encourage a new generation of journalists and writers to turn to LC for reference help and for resources?

I mentioned students, and there is a division at the Library that is dedicated to reaching them. The Educational Outreach team develops innovative resources for K-12 classroom teachers and it does make a difference. How do we apply those same ideas to advanced scholars, researchers and the curious to help maximize the benefit of the collections to the American public?

And we’d like to see more creative re-use of collections materials.

Screenshot of the Flickr commons showing photos with the tag "greatmustachesoftheloc"
Great Mustaches of the LOC. https://www.flickr.com/commons/tags/greatmustachesoftheloc/.

It’s fun but it can serve a scholarly and important purpose too. I talked earlier about the Rosa Parks papers, which contain a handwritten note…

Note handwritten by Rosa Parks
Rosa Parks Papers: Accounts of her arrest, https://www.loc.gov/resource/mss85943.001810.

…in which she says “I had been pushed around all my life and felt at this moment that I couldn’t take it anymore.”

The papers give us insight into Parks as an American hero in that moment, but other pieces help give us a fuller picture, to humanize her.

Photo of Rosa Parks pancake recipe written on a bank deposit envelope
Rosa Parks Papers: Recipe for featherlite pancakes, https://www.loc.gov/resource/mss85943.002606/.

That’s why I love her pancake recipe, which got picked up by the press and food bloggers. From what I hear it makes good pancakes, but it helps us remember that she was a real human, not just an icon frozen in time. And hopefully it leads people to want to learn more from the source herself.

Photo of Martha Graham in dance with text: "Incubate, encourage, and promote digital innovation"
Ekstasis, No. 2. Library of Congress, Music Division. https://www.loc.gov/item/ihas.200154181. Reproduced with permission of Martha Graham Resources, a division of The Martha Graham Center of Contemporary Dance, www.marthagraham.org.

NDI has a tiny, tiny staff, which I like, actually. It helps us keep our focus and makes us very agile. We can try things, like a hackathon, without there being an impact on the critical production work of the Library of Congress. We can help make connections between staff members working on similar projects. I think of this work as being like an interface. I used to say semi-permeable membrane but that got a lot of weird looks. It’s the work of finding useful and interesting stuff in our community and making sure the people in LC who should know about it are connected. And it’s the work of letting the world know the cool and useful stuff that is going on here too.

Energy diagram of an exothermic reaction
Lowering the Activation Energy of a Reaction by a Catalyst http://2012books.lardbucket.org/books/principles-of-general-chemistry-v1.0/s18-08-catalysis.html.

I think we can also provide a useful service in catalyzing innovative projects. You might remember from chemistry that catalysts lower the activation energy of reactions. In our world, connections, advice and support can make new ideas possible.

I often say that the library profession is poor (I mean, we’re not Silicon Valley) but we’re scrappy as heck and we like each other. The power of indefatigable people working together is like the stream over the stone – our problems may be immovable objects but we are focused, we are many and we have a very long memory.

Crowd shots of DPLAFest and the Archives Unleashed datathon
DPLAfest 2016. Photo by Jason Dixon https://www.flickr.com/photos/dpla/26587463152/
Archives Unleashed Hackathon. Photo by Jaime Mears https://blogs.loc.gov/thesignal/2016/07/.

That is why one of the focuses of NDI will be on building relationships that can lead to successful partnerships, like our co-hosting of DPLAFest and the Archives Unleashed Datathon.

Matt Weber is up next, so in the interest of no spoilers, I won’t say much about the Hackathon that NDI co-hosted with other groups. But I will say that we had a great experience hosting coders, academics and librarians working together. In addition to making some new discoveries using data analysis on digital collections, the team collaborations showed just how important librarians are when acting as experts on the data, highlighting the evolving shape of reference in this new century.

I wonder if I could tell you a quick story about that. One of the groups was looking at a text analysis from a set of Supreme Court nomination websites and the results were looking a little funny. Some words didn’t make sense in context to the scholars but luckily a Law Librarian was sitting at their table. He explained a little bit about the contours of the data set and why they might be getting some of the artifacts they were seeing. And he suggested ways to refine that query to improve the quality of the results. It’s a great example of the unique service we provide in libraries.

Photo of a Scientific Laboratory
Scientific Laboratory. Photo by Prokudin-Gorskiĭ, Sergeĭ Mikhaĭlovich, https://www.loc.gov/item/prk2000002511/.

We’re thinking about ways to better support scholars and researchers working with digital material. The John W. Kluge Center hosts scholars from around the world to conduct research here at the Library of Congress. We’re working with the staff of the Kluge Center to determine how we can help scholars coming here to do computer-assisted research with the digital collections, such as visualization and network analysis.

We’ve asked two outside experts, Dan Chudnov and Michelle Gallinger, to do a proof-of-concept for a digital scholars lab. Our goal in this pilot is to demonstrate what a lightweight implementation could look like and I’m really excited about it.

Photos and description from the Github accounts of Chris Adams and Tong Wang
Photos from the Github accounts of Chris Adams and Tong Wang.

I’d like to introduce Tong Wang and Chris Adams. They’re sharing NDI’s inaugural fellowship that seeks to sponsor work that demonstrates an innovative use of LC digital collections materials. They’re well on their way, exploring collections and some open source tools, and we hope they’ll have something to share with you all soon. We’re working to expand this program in future years so we can welcome a wider range of applicants. Don’t read too much into the bike gear – it’s not a requirement.

Illustration of spirals with the text "Collections as Data September 27th 2016 Library of Congress Open to the Public"
Art created by the User Experience Team of The Library of Congress, a team of professionals that are committed to making the Library’s collections more available and accessible to the American people.

As you can probably tell from our projects, our focus this inaugural year was on exploring Collections as Data, on seeking opportunities to get more value out of our own collections and on developing partnerships and friendships that can help advance the field. This summit is the public face to that work and one that we hope will grow into a sustainable program. And we’re exploring other ideas, like how we can enable more contributions to open source projects, what we can do to improve technical skill-building in libraries and how we improve shared infrastructure. All things that raise the tide, we hope.

Photo of Trees and a Bridge with the text "Thank you"
Theodore Roosevelt Island. Photo by Carol Highsmith, https://www.loc.gov/item/2010630948/.

Contact us! We want to work with you and we want to hear from you. Please drop us a line or read our blog, which we’ve recently relaunched with an expanded scope.

Like I said, we’re new and we are small, but I’m proud of what this team has been able to accomplish so quickly. I am particularly proud of holding this Collections As Data event, in which we invited librarians, programmers, archivists, researchers, artists, data journalists, thinkers and partners to help us explore this topic together. I hope the story about Henriette Avram inspires you to think about the power that an indefatigable and determined individual can have on the course of human knowledge. And looking around this room, I see hundreds of you. I see visionaries and even more important, a network of collaborators that can make those ideas happen.

It’s an exciting time to be in this field. Let’s think big and make friends.
