
Digital Image Processing: It’s All About the Numbers

This is a guest post by Rachel Trent, Digital Collections and Automation Coordinator in the Geography and Map Division.

Every time you look at an online image of a historical map, what you’re viewing is really just a spreadsheet of numbers. Or more likely, three spreadsheets, one each for red, green, and blue (the technical way to describe this is as a “3-dimensional array”, but it’s ok to simply think of it as three spreadsheets). Each of the image’s pixels is represented by a number from the red spreadsheet, the green spreadsheet, and the blue spreadsheet. Your device simply visualizes that numerical data as a grid of colors.

Thousands of Library of Congress maps are imaged each month, allowing you not only to view them online but also to analyze and transform the images using relatively straightforward mathematical computation. This computation is the same approach used any time you apply a filter to a photo on your phone, increase its contrast, crop it, and so on. Your device treats the images as arrays of numbers and runs quick calculations over them. With a bit of programming knowledge, it is surprisingly easy to replicate a wide range of basic image editing techniques.
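For readers curious what that looks like in practice, here is a minimal Python sketch using the OpenCV and NumPy libraries (we'll say more about OpenCV below); the filename is just a placeholder, and the 1.2 multiplier is an arbitrary example of a simple brightness and contrast adjustment.

    import cv2
    import numpy as np

    # Load the image; OpenCV hands it back as an array of numbers with one row per
    # pixel row, one column per pixel column, and three color channels.
    image = cv2.imread("sheet_1.jpg")
    print(image.shape)          # e.g. (height, width, 3)

    # Adjusting the image is just arithmetic on those numbers: scale every value
    # by 1.2, then clip the results back into the 0-255 range.
    adjusted = np.clip(image.astype(np.float32) * 1.2, 0, 255).astype(np.uint8)
    cv2.imwrite("sheet_1_adjusted.jpg", adjusted)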

Below is a Library of Congress map sheet that was imaged and made available online this year. It is the first in a set of U.S. Army Map Service maps covering Pennsylvania at a scale of 1:25,000. Compiled in 1953, this sheet shows the west side of Pittsburgh and surrounding areas. The set has over 300 sheets, which makes it a little larger than average amongst the 12,000+ sets in the Geography and Map Division’s Set Map collection.

Each pixel in a digital image can be represented by its three values: red, green, and blue (or RGB). Sheet 1, Pittsburg West, Pennsylvania 1:25,000. Army Map Service, 1953. Geography and Map Division. [Images below are of the same sheet.]

In the example image above, we’ve zoomed into nine pixels near the mouth of the Allegheny River. Each of the image’s pixels is defined by its color. Because all colors visible to the human eye can be created by some mixture of red, green, and blue, one of the most common ways to represent pixels is with three numbers: the amount of red, green, and blue used to mix the color. For example, the top left pixel above is made with 156 units of red, 101 units of blue, and 74 units of green. (This part gets a little complicated, but in this context the maximum for any of the three colors is 255.)
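As a quick illustration, here is how we might read those three numbers for a single pixel in Python with OpenCV; the filename and the pixel coordinates are placeholders. (One quirk worth noting: OpenCV stores the channels in blue-green-red order, so we reverse them when printing.)

    import cv2

    image = cv2.imread("sheet_1.jpg")
    b, g, r = image[0, 0]       # the top-left pixel's blue, green, and red values
    print(r, g, b)              # three numbers, each between 0 and 255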

When we use image editing software to alter images, the software runs calculations across these numbers to make the edits. If we want a speedier approach that gives us more control, we can use a programming language, such as Python, instead of image editing software.

Let’s say we’d like to crop each of this set’s sheets to the neatline, in order to remove the collar and leave only the actual map. Although there are effective machine learning approaches to this kind of task, for this demo we’ll stick to a more straightforward approach that relies on more intuitive image processing steps, using Python’s OpenCV library. (A few of these steps do employ machine learning techniques behind the scenes, but our overall process is mostly manually configured.) Such an approach often works well for simple maps like those in our Pennsylvania set, but would be less effective for visually complex, diverse, or larger sets.

When we convert an image from RGB to grayscale, each pixel is represented by a single number from 0 to 255.

First, we’ll convert our image from red, green, and blue (known as RGB) to grayscale. Instead of each pixel being defined by three numbers, now each will be defined by just one number (ranging from 0 for black to 255 for white).
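In Python with OpenCV, this step is a single function call; here is a minimal sketch (the filenames are placeholders):

    import cv2

    image = cv2.imread("sheet_1.jpg")
    # Collapse the three color channels into one 0-255 value per pixel.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    cv2.imwrite("sheet_1_gray.jpg", gray)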

When we convert the image to black and white, every pixel in the image becomes either white (255) or black (0).

Next, we will convert the image from grayscale to black and white. Each pixel will now become either 0 (black) or 255 (white), without anything in between. For this example, we will simply set the divider at 200: anything from 0 to 200 we will round down to 0 (in other words, most grays will become black), and anything from 201 to 255 we will round up to 255 (light grays will become white). Only one of our nine example pixels becomes white.
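A rough sketch of this thresholding step in OpenCV might look like the following; the filenames are placeholders, and 200 is the divider described above.

    import cv2

    gray = cv2.imread("sheet_1_gray.jpg", cv2.IMREAD_GRAYSCALE)
    # Pixels of 200 or below become 0 (black); pixels above 200 become 255 (white).
    _, black_white = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY)
    cv2.imwrite("sheet_1_bw.jpg", black_white)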

A process known as “closing” will remove black pixels from mostly-white areas of the image. Here, we’ve applied very subtle closing, to help make the border line around the map more solid and consistent.

Next, we will use a process called “closing” to remove noise in the white areas of the image. (This is what’s called a “morphological” process to “dilate” and then “erode” an image, and it comes pre-packaged in Python’s OpenCV library.) This step will help to close potential holes in the white border around the map, so that it is easier to detect in the next step. In the example above, we’ve applied a very limited amount of closing.
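Here is a minimal sketch of the closing step; the 3-by-3 kernel is an arbitrary illustrative choice (a small kernel keeps the closing subtle), and the filenames remain placeholders.

    import cv2
    import numpy as np

    black_white = cv2.imread("sheet_1_bw.jpg", cv2.IMREAD_GRAYSCALE)
    kernel = np.ones((3, 3), np.uint8)
    # "Closing" dilates the white areas and then erodes them, filling in small black holes.
    closed = cv2.morphologyEx(black_white, cv2.MORPH_CLOSE, kernel)
    cv2.imwrite("sheet_1_closed.jpg", closed)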

OpenCV finds over 19,000 shapes (or “contours”) on our map sheet, each outlined in green here for demonstration purposes. We only need to keep one (the approximate neatline around the inner map), and can filter out the rest.

Next, we will use two more processes built into the Python OpenCV library, the first for finding “contours” (also known as shapes) and the second to smooth out any dents along the edges of our shapes (known as “contour approximation”). We really only want to find one contour (the one along the neatline around the inner map), but our result is over 19,000 contours . . . too many! In the image above, each contour is shown outlined in green.
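In code, those two processes might look something like this; the contour retrieval mode and the 2% approximation tolerance are illustrative assumptions, not the exact settings used on our set.

    import cv2

    closed = cv2.imread("sheet_1_closed.jpg", cv2.IMREAD_GRAYSCALE)
    # Find every shape in the black-and-white image.
    contours, _ = cv2.findContours(closed, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    print(len(contours))     # many thousands of contours on a sheet like ours

    # Smooth each contour, allowing the simplified outline to deviate from the
    # original by up to 2% of its perimeter.
    approximated = [cv2.approxPolyDP(c, 0.02 * cv2.arcLength(c, True), True)
                    for c in contours]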

We can create a set of filters to automatically remove all contours except the contour around the map’s neatline.

We can define a few filters to remove the contours we don’t want, such as filters to remove contours whose area is too small or too large in proportion to the overall image. We can also filter out any shapes that aren’t four-sided (because we know that the neatline is roughly four-sided). With some trial and error, it’s possible to create a set of filters that reliably leaves us with just one contour along the neatline of each sheet in our example set.
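Here is a rough sketch of that filtering; the area bounds and the four-sided test are illustrative assumptions rather than the actual values used for our set.

    import cv2

    closed = cv2.imread("sheet_1_closed.jpg", cv2.IMREAD_GRAYSCALE)
    contours, _ = cv2.findContours(closed, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    image_area = closed.shape[0] * closed.shape[1]

    def looks_like_neatline(contour):
        area = cv2.contourArea(contour)
        if not (0.5 * image_area < area < 0.95 * image_area):   # not too small, not too large
            return False
        corners = cv2.approxPolyDP(contour, 0.02 * cv2.arcLength(contour, True), True)
        return len(corners) == 4                                # roughly four-sided

    candidates = [c for c in contours if looks_like_neatline(c)]
    print(len(candidates))    # ideally exactly one: the contour along the neatline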

Once we’ve isolated the contour outlining the neatline, we can use it to automatically crop the collars—or margins—from our map images.

Lastly, we can return to our original color image and cut out any pixels whose position falls outside the contour’s corners. We can then save the image as a new, cropped image file.
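A minimal sketch of that final step, assuming a variable named neatline holds the single contour left over from the filtering step (filenames are still placeholders):

    import cv2

    original = cv2.imread("sheet_1.jpg")
    # "neatline" is the one contour that survived our filters. Find the rectangle
    # that encloses it, then slice that rectangle out of the original array.
    x, y, w, h = cv2.boundingRect(neatline)
    cropped = original[y:y + h, x:x + w]
    cv2.imwrite("sheet_1_cropped.jpg", cropped)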

Running this process over all the images in our example Pennsylvania set takes just a few minutes and gives us relatively reliable results. For more varied or visually complex sets of map sheets, it may be more effective to use a more nuanced machine learning approach or simply manually crop the images in image editing software. Regardless, if you peek under the hood of any of these three approaches, what you’ll find is a set of numbers and a lot of math.
