
# Digital Image Processing: It’s All About the Numbers

This is a guest post by Rachel Trent, Digital Collections and Automation Coordinator in the Geography and Map Division.

Every time you look at an online image of a historical map, what you’re viewing is really just a spreadsheet of numbers. Or more likely, three spreadsheets, one each for red, green, and blue (the technical way to describe this is as a “3-dimensional array”, but it’s ok to simply think of it as three spreadsheets). Each of the image’s pixels is represented by a number from the red spreadsheet, the green spreadsheet, and the blue spreadsheet. Your device simply visualizes that numerical data as a grid of colors.
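As a minimal sketch of the "three spreadsheets" idea, the snippet below builds a tiny made-up image as a NumPy array and slices out one grid per color. The pixel values and the 2x2 size are assumptions for illustration, not data from the map.

```python
import numpy as np

# A hypothetical 2x2 image: rows x columns x 3 channels (red, green, blue).
image = np.array(
    [
        [[210, 190, 160], [40, 60, 90]],
        [[0, 0, 0], [255, 255, 255]],
    ],
    dtype=np.uint8,
)

# Slice out the three "spreadsheets", one per color channel.
red = image[:, :, 0]
green = image[:, :, 1]
blue = image[:, :, 2]

print(image.shape)  # (rows, columns, channels)
print(red)          # the red "spreadsheet" on its own
```

Each of `red`, `green`, and `blue` is an ordinary two-dimensional grid, and stacking them gives the 3-dimensional array.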

Thousands of Library of Congress maps are imaged each month. That lets you not only view them online but also analyze and transform the images using relatively straightforward mathematical computation. This is the same kind of computation used any time you apply a filter to a photo on your phone, increase its contrast, or crop it. Your device treats the image as an array of numbers and runs quick calculations over it. With a bit of programming knowledge, it is surprisingly easy to replicate a wide range of basic image editing techniques.
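To illustrate how everyday edits reduce to arithmetic, here is a rough sketch of a contrast boost and a crop written with NumPy. The helper names `adjust_contrast` and `crop` and the contrast formula (stretching values away from mid-gray) are assumptions chosen for simplicity. Real photo apps may use different formulas.

```python
import numpy as np

def adjust_contrast(image, factor):
    """Scale each value's distance from mid-gray (128), then clamp to 0-255."""
    stretched = (image.astype(np.float32) - 128.0) * factor + 128.0
    return np.clip(stretched, 0, 255).astype(np.uint8)

def crop(image, top, bottom, left, right):
    """Cropping is just slicing rows and columns out of the array."""
    return image[top:bottom, left:right]

# A hypothetical 2x3 grayscale image.
image = np.array([[100, 128, 156], [0, 200, 255]], dtype=np.uint8)

boosted = adjust_contrast(image, 1.5)  # darks get darker, lights get lighter
top_right = crop(image, 0, 1, 1, 3)    # first row, last two columns
```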

Below is a Library of Congress map sheet that was imaged and made available online this year. It is the first in a set of U.S. Army Map Service maps covering Pennsylvania at a scale of 1:25,000. Compiled in 1953, this sheet shows the west side of Pittsburgh and surrounding areas. The set has over 300 sheets, which makes it a little larger than average amongst the 12,000+ sets in the Geography and Map Division’s Set Map collection.

Each pixel in a digital image can be represented by its three values: red, green, and blue (or RGB). Sheet 1, Pittsburg West, Pennsylvania 1:25,000. Army Map Service, 1953. Geography and Map Division. [Images below are same sheet]

In the example image above, we’ve zoomed into nine pixels near the mouth of the Allegheny River. Each of the image’s pixels is defined by its color. Because a screen produces its colors by mixing red, green, and blue light, one of the most common ways to represent a pixel is with three numbers: the amounts of red, green, and blue used to mix the color. For example, the top left pixel above is made with 156 units of red, 74 units of green, and 101 units of blue. (This part gets a little complicated, but in this context the maximum for any of the three colors is 255.)
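The 255 ceiling can be seen in a short sketch, assuming the common case where each color value is stored as one 8-bit unsigned integer (`uint8` in NumPy). The example pixel uses the red, green, and blue amounts quoted above.

```python
import numpy as np

# The top left example pixel, stored in red, green, blue order.
pixel = np.array([156, 74, 101], dtype=np.uint8)
red, green, blue = (int(value) for value in pixel)

# 8 bits per channel allow 2**8 = 256 distinct values: 0 through 255.
max_value = int(np.iinfo(np.uint8).max)

# The two extremes: no light at all, and every channel at full strength.
black = np.zeros(3, dtype=np.uint8)
white = np.full(3, max_value, dtype=np.uint8)
```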

When we use image editing software to alter an image, it runs calculations across these numbers to make the edits. If we want a faster approach that gives us more control, we can use a programming language, such as Python, instead of image editing software.

Let’s say we’d like to crop each of this set’s sheets to the neatline, in order to remove the collar and leave only the actual map. Although there are effective machine learning approaches to this kind of task, for this demo we’ll stick to a more straightforward approach that relies on more intuitive image processing steps, using Python’s OpenCV library. (A few of these steps do employ machine learning techniques behind the scenes, but our overall process is mostly manually configured.) Such an approach often works well for simple maps like those in our Pennsylvania set, but would be less effective for visually complex, diverse, or larger sets.

When we convert an image from RGB to grayscale, each pixel is represented by a single number from 0 to 255.

First, we’ll convert our image from red, green, and blue (known as RGB) to grayscale. Instead of each pixel being defined by three numbers, now each will be defined by just one number (ranging from 0 for black to 255 for white).

When we convert the image to black and white, every pixel in the image becomes either white (255) or black (0).

Next, we will convert the image from grayscale to black and white. Each pixel will now become either 0 (black) or 255 (white), without anything in between. For this example, we will set the divider simply at 200: anything from 0 to 200 we will round down to 0 (in other words, most grays will become black), and anything from 201 to 255 we will round up to 255 (light grays will become white). Only one of our nine example pixels becomes white.

A process known as “closing” will remove black pixels from mostly-white areas of the image. Here, we’ve applied very subtle closing, to help make the border line around the map more solid and consistent.

Next, we will use a process called “closing” to remove noise in the white areas of the image. (This is what’s called a “morphological” process, which first “dilates” and then “erodes” the image, and it comes pre-packaged in Python’s OpenCV library.) This step will help to close potential holes in the white border around the map, so that it is easier to detect in the next step. In the example above, we’ve applied a very limited amount of closing.

OpenCV finds over 19,000 shapes (or “contours”) on our map sheet, each outlined in green here for demonstration purposes. We only need to keep one (the approximate neatline around the inner map), and can filter out the rest.

Next, we will use two more processes built into the Python OpenCV library, the first for finding “contours” (also known as shapes) and the second to smooth out any dents along the edges of our shapes (known as “contour approximation”). We really only want to find one contour (the one along the neatline around the inner map), but our result is over 19,000 contours . . . too many! In the image above, each contour is shown outlined in green.

We can create a set of filters to automatically remove all contours except the contour around the map’s neatline.

We can easily calculate a few filters to remove the contours we don’t want, such as filters to remove contours whose area is too small or too large in proportion to the overall image. We can also filter out any shapes that aren’t four-sided (because we know that the neatline is roughly four-sided). With some trial and error, it’s possible to create a set of filters that reliably leaves us with just one contour along the neatline of each sheet in our example set.

Once we’ve isolated the contour outlining the neatline, we can use it to automatically crop the collars—or margins—from our map images.

Lastly, we can return to our original color image and cut out any pixels whose position falls outside the contour’s corners. We can then save the image as a new, cropped image file.

Running this process over all the images in our example Pennsylvania set takes just a few minutes and gives us relatively reliable results. For more varied or visually complex sets of map sheets, it may be more effective to use a more nuanced machine learning approach or simply manually crop the images in image editing software. Regardless, if you peek under the hood of any of these three approaches, what you’ll find is a set of numbers and a lot of math.
