Digitizing 2.5 Million Maps: An Analyst’s View

Today’s guest post is from Mike Schoelen, a Post-Graduate GIS Research Fellow in the Geography and Map Division. This post was inspired by work on the Geographic Hotspot Dynamic Indexing Project and collaboration with Amanda Brioche, Erin Kelly, and Evan Neuwirth. Mike is a born-and-raised Marylander. After completing his undergraduate degree at Frostburg State University, he moved to central Maryland to pursue a Master’s Degree in Geography and Environmental Planning at Towson University. An excerpt from his thesis on the use of GIS to model population distribution is to be published in the journal Applied Geography in November.

A success rate of 99% would be hailed in most professions as a grand accomplishment, if not a miracle. Just imagine: a restaurant with only 1 complaint out of 100 customers served that day, a drug with a side effect in just 1 of 100 test subjects, or a batter who steps up to the plate and hits 99 out of 100 pitches. The Dynamic Indexing Project might not put lives (or a professional sports career) on the line, but we know one thing for sure: 99% doesn’t work here. It all comes down to a simple matter of multiplication.

For the Dynamic Indexing Project, we have been building a digital singularity: a single place where the entire set map collection can live on a geographic information system (GIS) platform and be accessed through one interface. For a collection of a few thousand maps, this could be done in a week or two, with a few long nights of scanning and some simple quality assurance if any issues arise. We, however, are not talking about a few thousand maps, or a few tens of thousands, or even one million. The collection slated for digitization consists of over 2.5 million map sheets. Stacked into a single pile, it would tower over the Washington Monument by more than 300 feet. And that doesn’t even include the single-sheet collection housed in the other half of the Geography & Map Division!

A small sample of the 2.5 million maps slated for digitization. Geography & Map Division, Library of Congress.

For a temporal example, suppose one unnecessary step crept into our digitization workflow, something as simple as a needless process running on a computer that slows the scanner by one second per map sheet. Unless you sit down with a stopwatch, you would not notice the delay. Fortunately for the Library (and the taxpayers), we brought a stopwatch to work and did just that. This one-second delay, caused by a switch toggled deep within the scanner’s preferences, would have accrued as the entire collection was fed through, amounting to more than 650 hours, or over 80 work days, lost over the course of the project. Sorry, Summer Interns of 2016, you’ll have to find another task to do.
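The arithmetic behind that estimate is a single multiplication. The sketch below reproduces it, assuming exactly 2.5 million sheets and an 8-hour work day (the post does not state the length of a work day; that figure is an assumption made here for illustration).

```python
# Back-of-the-envelope cost of a fixed per-sheet scanning delay.
SHEETS = 2_500_000        # "over 2.5 million map sheets" (lower bound)
DELAY_SECONDS = 1         # the one-second delay found with the stopwatch
HOURS_PER_WORKDAY = 8     # assumption: an 8-hour work day


def delay_cost(sheets: int, delay_s: float,
               hours_per_day: float = HOURS_PER_WORKDAY) -> tuple[float, float]:
    """Return (hours lost, work days lost) for a delay applied to every sheet."""
    hours = sheets * delay_s / 3600  # total seconds converted to hours
    return hours, hours / hours_per_day


hours, days = delay_cost(SHEETS, DELAY_SECONDS)
print(f"{hours:.0f} hours, {days:.1f} work days")  # 694 hours, 86.8 work days
```

Because the cost scales linearly with the number of sheets, even a delay too small to notice on one map becomes months of staff time across the whole collection.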

Now let’s look at a qualitative example and say we manually entered data for these map sheets. For every 100 map sheets, a single digit was transcribed incorrectly, leaving one in 100 images with a bad value attached to it. Small errors are a daily occurrence in our field; I recall using some GIS stream data for a side project and finding a small error that showed a stream running uphill! But once the collection’s sheer size is accounted for, we would see that over 25,000 map sheets would be poorly transcribed. To combat this, we have built automated procedures that limit human interaction and the errors that come with it.
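The same multiplication governs transcription errors. This sketch computes the expected number of bad records at a few accuracy levels, assuming errors occur independently at a fixed per-sheet rate; the 99.9% and 99.99% rows are illustrative additions, not figures from the project.

```python
# Expected number of poorly transcribed sheets at a given per-sheet accuracy.
SHEETS = 2_500_000  # "over 2.5 million map sheets" (lower bound)


def expected_errors(sheets: int, accuracy: float) -> float:
    """Expected count of sheets with a bad value, assuming a fixed error rate."""
    return sheets * (1 - accuracy)


for accuracy in (0.99, 0.999, 0.9999):
    bad = expected_errors(SHEETS, accuracy)
    print(f"{accuracy:.2%} accurate -> {bad:,.0f} bad records")
# 99.00% accurate -> 25,000 bad records
# 99.90% accurate -> 2,500 bad records
# 99.99% accurate -> 250 bad records
```

Each additional "nine" of accuracy cuts the number of bad records by a factor of ten, which is why shaving human steps out of the workflow pays off at this scale.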

Every day is a challenge to beat 99%, because anything less would be a failure. It’s the challenge an analyst signs up for, and we love every day of it.

An index of Syria produced by the Geographic Hotspot Dynamic Indexing Project. Geography & Map Division, Library of Congress.