
Digitizing 2.5 Million Maps: An Analyst’s View

Today’s guest post is from Mike Schoelen, a Post-Graduate GIS Research Fellow in the Geography and Map Division. This post was inspired by work on the Geographic Hotspot Dynamic Indexing Project and collaboration with Amanda Brioche, Erin Kelly, and Evan Neuwirth. Mike is a born and raised Marylander. After completing his undergraduate degree at Frostburg State University, he moved to central Maryland to pursue a Master’s Degree in Geography and Environmental Planning at Towson University. An excerpt from his thesis on the use of GIS to model population distribution is to be published by Applied Geography Magazine in November.

A success rate of 99% would be hailed in most professions as a grand accomplishment, if not a miracle. Just imagine: a restaurant that has only 1 complaint out of 100 customers served that day, a drug with one side effect in just 1 of 100 test subjects, or even a batter who steps up to the plate and manages to hit 99 out of 100 pitches. While the results of the Dynamic Indexing Project might not have lives on the line (or a professional sports career), we know one thing for sure: 99% doesn’t work here. And it all comes down to a simple matter of multiplication.

For the Dynamic Indexing Project, we have been building a digital singularity where the entire set map collection can live on a geographic information system (GIS) platform and be accessed through an interface. In the case of a few thousand maps, this could be done in a week or two, involving a few long nights of scanning and some simple quality assurance if any issues arise. We, however, are not talking about a few thousand maps, or a few tens of thousands, or even just one million. The collection intended for digitization consists of over 2.5 million map sheets. If the collection were stacked into a single pile, it would tower over the Washington Monument by more than 300 feet. This doesn’t even include the single-sheet collection housed in the other half of the Geography & Map Division!

A small sample of the 2.5 million maps slated for digitization. Geography & Map Division, Library of Congress.


For a temporal example, let us say that we perform a redundant step when digitizing these maps, something as simple as running an unnecessary process on a computer that slows the scanner by one second per map sheet. From experience, unless you sit down with a stopwatch, you would not notice the delay. Fortunately for the Library (and the taxpayers), we brought a stopwatch to work and did just that. This one-second delay, caused by a switch toggled deep within a scanner preference, would have accrued as the entire collection was fed through, amounting to over 650 hours or over 80 work days lost over the course of the project. Sorry, Summer Interns of 2016, you’ll have to find another task to do.
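The multiplication behind that figure is easy to check. The short Python sketch below is only an illustration, not part of the project's tooling; the eight-hour work day and the function name `cumulative_delay` are assumptions made for the example.

```python
# Illustrative arithmetic only; not code from the Dynamic Indexing Project.
MAP_SHEETS = 2_500_000          # "over 2.5 million map sheets" (from the post)
DELAY_SECONDS_PER_SHEET = 1     # the one-second scanner slowdown (from the post)
HOURS_PER_WORK_DAY = 8          # assumption: a standard eight-hour work day


def cumulative_delay(sheets, delay_seconds, hours_per_day=HOURS_PER_WORK_DAY):
    """Return (hours, work_days) lost when every sheet incurs the same delay."""
    total_seconds = sheets * delay_seconds
    hours = total_seconds / 3600
    work_days = hours / hours_per_day
    return hours, work_days


hours, work_days = cumulative_delay(MAP_SHEETS, DELAY_SECONDS_PER_SHEET)
print(f"{hours:.0f} hours, or about {work_days:.0f} work days")
```

Under those assumptions, the result lands a little under 700 hours, consistent with the "over 650 hours or over 80 work days" quoted above.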

Let’s look at a qualitative example and say we manually entered data for these map sheets. For every 100 map sheets, a single digit was transcribed incorrectly, leaving one in 100 images with a bad value associated with it. Small errors are a daily occurrence in our field; I recall using some GIS stream data for a side project and finding a small error showing a stream running uphill! But when the collection’s sheer size is accounted for, we would see that over 25,000 map sheets would be poorly transcribed. To combat this, we have built automation procedures to limit human interaction and the associated errors.
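The same multiplication applies to transcription errors. As a rough sketch (the function name `expected_bad_records` and the stricter error rates in the loop are hypothetical, chosen only to show how the count scales), the number of affected sheets at a given error rate can be estimated like this:

```python
# Illustrative arithmetic only; the rates beyond 1-in-100 are hypothetical.
MAP_SHEETS = 2_500_000  # "over 2.5 million map sheets" (from the post)


def expected_bad_records(sheets, errors, per_sheets):
    """Expected number of mis-transcribed sheets at `errors` per `per_sheets`."""
    # Integer arithmetic avoids floating-point rounding on large counts.
    return sheets * errors // per_sheets


for per_sheets in (100, 1_000, 10_000):
    bad = expected_bad_records(MAP_SHEETS, 1, per_sheets)
    print(f"1 error per {per_sheets:>6,} sheets -> {bad:>6,} bad records")
```

Even at one error per 10,000 sheets, hundreds of records would still carry a bad value, which is why limiting manual entry matters at this scale.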

Every day is a challenge to beat 99%, because anything less would be a failure. It’s the challenge an analyst signs up for, and we love every day of it.

An index of Syria produced by the Geographic Hotspot Dynamic Indexing Project. Geography & Map Division, Library of Congress.


One Comment

  1. Michael O’Neill
    November 18, 2015 at 12:46 pm

    I pause to contemplate the logistics of scanning 2.5M images. I spent some years overseeing conversion of paper to digital in a city planning department archive, and I’d love to see some more detailed description of that process.

    Clearly technology has advanced, and this glimpse behind the scenes is a valuable contribution to my continuing education. Thanks for posting!
