Below, Eileen is in conversation with Dr. Elizabeth Lorang, Dr. Leen-Kiat Soh, and doctoral candidates Mike Pack and Yi Liu. They are members of a research team from the University of Nebraska-Lincoln collaborating with the Library of Congress on applying machine learning algorithms to Library collections for processing, metadata generation, and enhancing discoverability.
What, in your opinion, was the most promising outcome of the five machine learning projects you worked on this summer?
Soh: To me, [it] was the explorative nature of the five projects informed by insights from analyzing the data and by hands-on practical concerns from the Library.
Pack: I was excited about the fact that a set of features extracted by a deep learning model could deal with several tasks, such as classification and segmentation. Also, the fact that transfer learning (i.e., knowledge transfer) reduces training time makes it worthwhile to delve further into what deep representation can do, such as clustering document images.
Liu: The most fruitful part to me was also the exploration of transfer learning. The transferred knowledge could help us train and find performance sweet spots much faster than training from scratch. The best example is the project for digitization type differentiation, for which the training process took only three iterations to reach 90% accuracy!
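One common form of transfer learning keeps a pretrained network's feature layers fixed and trains only a small new output layer. The interview does not say which variant the team used, so the following is only an illustrative sketch: the layer sizes are made up, and the three output classes are borrowed from the document classification task described later in the conversation. It counts how many parameters would need training in each case, which is one reason a transferred model can converge in fewer iterations.

```python
def conv_params(in_ch, out_ch, k):
    """Weights plus biases of a k x k convolution layer."""
    return in_ch * out_ch * k * k + out_ch


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


# Hypothetical backbone: three 3x3 convolutions (channel counts are assumptions).
backbone = [
    conv_params(3, 32, 3),
    conv_params(32, 64, 3),
    conv_params(64, 128, 3),
]

# New classification head: 128 pooled features -> 3 document classes.
head = dense_params(128, 3)


def trainable_params(freeze_backbone):
    """Parameters updated during training, with or without a frozen backbone."""
    return head if freeze_backbone else sum(backbone) + head


from_scratch = trainable_params(freeze_backbone=False)
transferred = trainable_params(freeze_backbone=True)
print(f"training everything:    {from_scratch} parameters")
print(f"training only the head: {transferred} parameters")
```

Fine-tuning all layers from pretrained weights is also transfer learning. In that case the speed-up comes from starting near a good solution, not from training fewer parameters.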
Lorang: The further we made it into the summer, the more excited I got about the potential for our experiments to inform thinking about the collections within the Library of Congress, to help shape internal processes and decision-making, and to give people an opportunity to look at collections with fresh eyes, based on the results of our experiments. I had more typically been thinking about the implications for end users, but it was exciting to spend some time thinking about the potential of some of the experiments for staff within the Library as well.
You mentioned that you had to make a lot of important decisions about which machine learning models to use. Could you tell us more about which factors you considered during this decision-making process?
Soh: Factors included the size of the data available, diversity of the characteristics or properties of the data, the level of difficulty of the tasks or applications (e.g., classification, identification; how accurate or precise must the final system be?), and availability of labeled (ground truth) data for training.
Pack: I have mainly considered the following two points: (1) type and (2) difficulty of the task.
If our objective is object recognition or segmentation, we should consider using a model having an up-sampling process so that it can preserve spatial information (e.g., fully convolutional network, U-net, etc.). On the other hand, if our objective is just a simple classification, then we can use a basic deep learning model (e.g., AlexNet, LeNet, VGG, ResNet, InceptionNet, etc.).
Secondly, the more difficult the task, the more complicated the model that is required. For example, in the ImageNet challenge, where the objective is to classify a natural scene image into one of 1000 classes, the most advanced models have 101 layers. However, our classification task aims to classify a document image into one of three classes, which is simpler than the ImageNet challenge.
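To illustrate the point about up-sampling, here is a rough sketch of how the spatial size of a feature map changes inside an encoder-decoder network such as a U-net. The page size and the number of pooling stages are assumptions chosen for the example, not details from the projects.

```python
def encoder_shape(height, width, num_pools):
    """Spatial size after num_pools 2x2 poolings (each halves both sides)."""
    for _ in range(num_pools):
        height, width = height // 2, width // 2
    return height, width


def decoder_shape(height, width, num_upsamples):
    """Spatial size after num_upsamples 2x up-sampling steps."""
    for _ in range(num_upsamples):
        height, width = height * 2, width * 2
    return height, width


page = (512, 384)  # hypothetical document image, height x width in pixels

# A plain classifier only needs the shrunken feature map, which it
# flattens into a handful of class scores (three in the document task).
bottleneck = encoder_shape(*page, num_pools=4)

# A segmentation model up-samples back to the input resolution so that
# every pixel of the page receives its own label.
seg_output = decoder_shape(*bottleneck, num_upsamples=4)

print("bottleneck:", bottleneck)
print("segmentation output:", seg_output)
```

The classifier can discard position once it has its class scores, while the segmentation output has to line up pixel for pixel with the original page.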
Liu: Like Mike said, choosing a model includes two factors: (1) datasets at hand, and (2) the nature of the task.
What is the biggest benefit of this work to end users?
Soh: The biggest benefit of this work is providing better library experiences for end users that could revolutionize or revitalize how we process, index, store, retrieve, and analyze document images in the Library of Congress.
Pack: First of all, we can expect the outcome of each project (e.g., segmentation results, classification results, image quality) to be used as a resource for generating more comprehensive metadata. This could benefit both researchers and the public by improving discoverability and searchability. Second, based on what we have found about the ground-truth quality of collections (e.g., inconsistent region annotation in Beyond Words), we can guide people who are participating in the crowd-sourcing project to annotate data in a way that turns the collections into more valuable training datasets for researchers in the machine learning field.
Liu: The biggest benefit to end users is better accessibility. With this item-level information, end users would be able to find material more effectively and efficiently.
Lorang: I’m thinking about the potential for expanding the types of questions users might come to the Library of Congress digital collections to explore. What becomes possible when there are more criteria or qualities around which a user might bring materials together, beyond being housed in a similar collection, associated with a particular individual, and so on? What if users could access content by “difficulty” of the material according to various criteria (quality of digitization, density of the page, heterogeneity of content, and so on)? Or if they were able to access materials that share or appear to share some common feature—which may have to do with their materiality and material history, what they are “about,” what they look like; and much more. Certainly, machine learning and applications of machine learning are not the only ways to arrive at some of these more expanded points of access, but they are one potential path that needs further, critical exploration before wholesale adoption.
Could you give us more information about the computing power necessary to train your models?
Pack: Please note that the hardware requirements will differ from task to task depending on several factors, such as network architecture (e.g., model size) or training configuration (e.g., batch size, input size). The following is the specification of the system that has been used for our projects:
Table 1. Specification of the deep learning system.
| Component | Model | Notes |
|---|---|---|
| GPU | Nvidia Tesla V100-DGXS (16GB) | The heart of deep learning; the choice of GPU is probably the most critical choice for a deep learning system. General recommendation: GTX 1070 or GTX 1080, at least 8GB. |
| RAM | | RAM size does not affect deep learning performance; it mostly affects pre-processing performance. Have at least as much RAM as the memory of the largest GPU. More RAM can be useful if you frequently work with large datasets. |
| CPU | Intel Xeon E5-2698 v4 @ 2.20GHz | The CPU contributes little to the training process itself; it mainly handles data pre-processing, i.e., loading and pre-processing each mini-batch. No significant performance gains from more cores. |
Liu: The computing power needed depends on the stage of the task. The training stage requires much more computing power than the deployment stage. While the speed of the CPU/GPU determines the speed of training, the capacity of memory determines what can be trained at all. For example, in the segmentation project, training on images that average 720,000 pixels each requires a minimum of 16 GB of RAM.
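A back-of-envelope calculation shows why memory capacity limits what can be trained. The sketch below estimates the memory taken by the feature maps of a single full-resolution layer for one batch. The channel count and batch size are assumptions, and the result is not meant to reproduce the 16 GB figure above. During training, the activations of many layers are kept for backpropagation, so the total grows well beyond this single-layer number.

```python
def activation_bytes(pixels, channels, batch_size, bytes_per_value=4):
    """Bytes for one layer's full-resolution float32 feature maps for a batch."""
    return pixels * channels * batch_size * bytes_per_value


pixels = 720_000  # average image size mentioned in the interview

# Hypothetical settings: 64 feature channels, 8 images per mini-batch.
one_layer = activation_bytes(pixels, channels=64, batch_size=8)

print(f"one layer, one batch: {one_layer / 2**30:.2f} GiB")
```

Halving the batch size or the input resolution shrinks this estimate proportionally, which is why batch size and input size appear among the factors that set the hardware requirements.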
Lorang: You asked about computing power, but I’m interested in talking about human power and labor. It’s important to know that the type of training we did requires human intervention and human knowledge—whether from the crowd, through data and information generated through By the People, or from the expert metadata generated over decades by professional staff at the Library of Congress, or through members of our team, including a graduate student at UNL, Ashlyn Stewart, who did important “ground truth” work for us. When we talk about machine learning, we need to be sure to consider the human power still necessary.
Can you outline some of the advantages of using the Library’s collections for this kind of work? Which collections did you end up working with and why did you choose them?
Soh: Advantages include the availability of metadata and ancillary data for each collection, each collection’s large number of document images with diverse characteristics, several collections’ ground-truth (labeled) data for machine learning, and the availability of use cases.
Pack: The collection that I used for the image classification task, which does not require ground truth for regions, was the Suffrage collection from By the People, which has a fair range of intra-class variance.
Liu: The data from the Library’s collections is the best choice, considering the significantly wide range of the collections. The collections on my list are the reviewed datasets on Beyond Words and the Civil War collection on By the People.
What is the most exciting aspect of this collaboration between the University of Nebraska-Lincoln and the Library of Congress from your perspective?
Soh: From my perspective, the most exciting aspect of this collaboration is the exchange of ideas between the practitioners and researchers; the day-to-day, week-to-week insights or inspirations as a result of the analyses.
Pack: It was so exciting to play with real-world data collections, and it was a valuable opportunity to substantiate a practical usage of what we researchers have been investigating in the academy.
Liu: The most exciting aspect to me was the collaboration between the practice and research and the opportunity to understand where gaps exist between the two.
Lorang: The collaboration helped me pretty radically expand my understanding of the ways that researchers might partner with the Library of Congress, and it was a remarkable opportunity to get a look at the wide range of questions and concerns, and really creative ideas, that the staff at the Library of Congress are thinking about. The collaboration was also exciting because of the short path from considering a question or experiment to the opportunity to put it into practice. One of our other major activities right now on the Aida team is a three-year research project, which we hope will ultimately affect practice. This collaboration and the amazing team at LC Labs and the LC more broadly offered us an opportunity to affect digital library development more immediately, and the lessons we learned will impact our larger project in pretty profound ways, too.
This post has been lightly edited for clarity.