This is a guest post by Rachel Trent, Digital Collections and Automation Coordinator in the Geography and Map Division.
The image below is of a TIFF file, but not just any TIFF. Hidden inside are coordinates that bind the image to a specific place on Earth. For every pixel in the image, an estimated latitude and longitude can be calculated, making it a powerful source for computational analysis and spatial visualization. This file is a GeoTIFF, a format widely used by the geospatial community for creating and sharing image-based data, particularly satellite imagery, aerial photography, and other raster datasets. It’s also used for digitized historical maps. GeoTIFFs are a special subtype of TIFF, built with everything a TIFF file is required to contain plus a bonus section of embedded geospatial metadata that links the image to a particular location within a particular coordinate system.
At first glance, GeoTIFFs can appear indistinguishable from standard TIFF files. Their filenames end in the “.tif” file extension, just like any other TIFF. They can be opened and viewed in any application that opens TIFF files. Once opened in a standard TIFF image viewer, however, they can sometimes appear stretched (as seen above), rotated, or warped in more complex ways. The complexity of the warping depends on the type of calculation used during the original georeferencing of the image.
You can recognize GeoTIFFs with somewhat more certainty by opening them in a GIS or in certain web mapping platforms. Below, our example image has been opened in ArcGIS Pro, which correctly places the image north of Budapest, Hungary. If the file had instead dropped into the Atlantic Ocean at 0°N 0°E, that would have been a good sign that we were working with a vanilla TIFF. A properly formatted GeoTIFF with coordinate data will render at a specific location.
Complicating matters, a plain TIFF can also be spatially rendered by a GIS if it is accompanied by sidecar geospatial files, such as an Esri World File and a .prj projection file. Alternatively, a GeoTIFF can contain only the coordinate system information but not the coordinate data, in which case it must also be accompanied by a sidecar file such as an Esri World File in order to render correctly in a GIS. Creators often choose these two methods over stand-alone GeoTIFFs when they want to preserve the original ground control points used to georeference the image, or when they want to perform complex georeferencing without altering the pixels in the original image.
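For readers who have not opened one of these sidecar files, a world file is a six-line plain-text file holding the coefficients of an affine transformation, with no coordinate system information of its own. The sketch below parses one. The names `WorldFile` and `parse_world_file` and the sample values are illustrative assumptions, not taken from the dataset.

```python
from typing import NamedTuple


class WorldFile(NamedTuple):
    """The six world file coefficients, in the order they appear in the file."""
    a: float  # line 1: pixel width in map units
    d: float  # line 2: rotation term (change in y per column)
    b: float  # line 3: rotation term (change in x per row)
    e: float  # line 4: pixel height in map units, usually negative
    c: float  # line 5: x coordinate of the center of the upper-left pixel
    f: float  # line 6: y coordinate of the center of the upper-left pixel


def parse_world_file(text: str) -> WorldFile:
    """Parse the text of a world file into its six coefficients."""
    values = [float(token) for token in text.split()]
    if len(values) != 6:
        raise ValueError(f"expected 6 numbers, found {len(values)}")
    return WorldFile(*values)


# Hypothetical sample values for an unrotated, north-up image.
sample = """0.0001
0.0
0.0
-0.0001
19.00005
47.74995
"""
world = parse_world_file(sample)
print(world)
```

Because the file carries only these six numbers, a GIS still needs a .prj file (or coordinate system information inside the TIFF itself) to know which coordinate system the numbers refer to.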
By using tools like the GDAL command-line utilities, Python, or R, you can directly and unambiguously access the geospatial information inside GeoTIFF files. Take our 1917 Budapest example sheet above, which comes from the experimental Austro-Hungarian Map Set Data Package. Our file is “4962_002_geo.tif” from that dataset. Below, we’ve run a small bit of Python code. The code extracts summary metadata about the file’s coordinate system and calculates its four corner coordinates. We’re using the osgeo Python package, which is maintained by the Open Source Geospatial Foundation (OSGeo) and runs on GDAL (Geospatial Data Abstraction Library). This Python package is one of many options we could use to extract and view the geospatial metadata.
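A minimal sketch of this kind of extraction with the osgeo package might look like the following. It is not necessarily the code used for this post. It assumes the GDAL Python bindings are installed and that `gdal.Info` with `format="json"` returns a dictionary with `size`, `coordinateSystem`, and `cornerCoordinates` entries, as current GDAL releases do. The function names `summarize_info` and `describe_geotiff` are hypothetical.

```python
def summarize_info(info):
    """Reduce a gdal.Info(..., format="json") dictionary to a short summary."""
    wkt = info.get("coordinateSystem", {}).get("wkt", "")
    corners = info.get("cornerCoordinates", {})
    return {
        "size": tuple(info["size"]),  # (width, height) in pixels
        # The first line of the WKT names the coordinate system,
        # e.g. a line beginning with GEOGCRS or GEOGCS.
        "crs": wkt.splitlines()[0] if wkt else None,
        "corners": {
            name: tuple(corners[name])
            for name in ("upperLeft", "upperRight", "lowerRight", "lowerLeft")
            if name in corners
        },
    }


def describe_geotiff(path):
    """Open a GeoTIFF with GDAL and summarize its geospatial metadata."""
    # Imported here so summarize_info stays usable without GDAL installed.
    from osgeo import gdal

    gdal.UseExceptions()       # raise Python exceptions on errors
    dataset = gdal.Open(path)  # opens the file read-only
    return summarize_info(gdal.Info(dataset, format="json"))
```

Calling `describe_geotiff("4962_002_geo.tif")` on a downloaded copy of the file would return the image size, the first line of the coordinate system definition, and the four corner coordinates.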
We can see that this item uses the common WGS84 geographic coordinate system. This makes it easy to calculate the four corner coordinates, which osgeo has done for us above. The file’s metadata doesn’t store all four coordinates themselves. Instead, it stores the upper left coordinate alongside sufficient information about the file’s scale and rotation to allow for calculation of the coordinates of every other pixel in the image. Tools like GDAL and osgeo are designed to calculate and report the corner coordinates quickly.
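To make that calculation concrete, GDAL exposes the stored origin, scale, and rotation as a six-number "geotransform". The sketch below applies it to any pixel position and, from there, to the four corners. The geotransform values and image size are hypothetical, chosen only for illustration, and the function names are assumptions.

```python
def pixel_to_coordinate(geotransform, col, row):
    """Apply a GDAL-style geotransform to a pixel position (col, row)."""
    x_origin, pixel_width, row_rotation, y_origin, col_rotation, pixel_height = geotransform
    x = x_origin + col * pixel_width + row * row_rotation
    y = y_origin + col * col_rotation + row * pixel_height
    return x, y


def corner_coordinates(geotransform, width, height):
    """Coordinates of the outer corners of a width x height pixel image."""
    return {
        "upper_left": pixel_to_coordinate(geotransform, 0, 0),
        "upper_right": pixel_to_coordinate(geotransform, width, 0),
        "lower_right": pixel_to_coordinate(geotransform, width, height),
        "lower_left": pixel_to_coordinate(geotransform, 0, height),
    }


# Hypothetical north-up image: origin at 19.0 E, 47.75 N, with pixels
# 0.0001 degrees wide and tall. The pixel height is negative because
# row numbers increase southward.
example_gt = (19.0, 0.0001, 0.0, 47.75, 0.0, -0.0001)

for name, (lon, lat) in corner_coordinates(example_gt, 5000, 2500).items():
    print(f"{name}: lon={lon:.4f}, lat={lat:.4f}")
```

In a geographic coordinate system such as WGS84, x is longitude and y is latitude. When the two rotation terms are zero the image is north-up, and the calculation reduces to an offset from the upper left corner scaled by the pixel size.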
At the bottom center of the image is Budapest’s northern half. If we look closely, we can see the Hungarian Parliament Building (Országház). Here we’ve selected a pixel at the approximate center of the building and calculated its coordinates (latitude 47.5083011, longitude 19.0467043). The georeferencing of these maps is approximate and estimated to result in a maximum error of 220 meters. Our selected point is indeed about 150 meters from the true center of our target building.
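A distance check of this kind can be sketched with the haversine formula, which treats the Earth as a sphere. The first coordinate pair below is the one calculated from the map. The reference location for the Parliament Building is an approximate value assumed for illustration, not a figure from this post, and the function name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


selected = (47.5083011, 19.0467043)  # calculated from the georeferenced map pixel
reference = (47.5072, 19.0456)       # ASSUMED approximate present-day location

distance = haversine_m(*selected, *reference)
print(f"{distance:.0f} meters")
```

With this assumed reference point the result is on the order of 150 meters, which is within the stated 220 meter maximum error.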
Over 4,800 georeferenced map sheets can be accessed in the full dataset at data.labs.loc.gov. This dataset was created as part of an experimental 6-month project in 2015, under the Library’s former GIS Research Fellows program. More information about the creation of the GeoTIFFs and about how to bulk download the files can be found at data.labs.loc.gov/maps.
Thank you for this well-presented, illustrated blog! Great to see, and terrific to know about the shared access to LC-produced GeoTIFF files. Meanwhile, readers may also be interested to look at one other Library of Congress source of information about the GeoTIFF format, in the Sustainability of Digital Formats resource (“fdd000279”). I’m not sure if URLs are permitted in blog comments — if not, search that site for “GeoTIFF” — and if URLs are OK, this is it: //www.loc.gov/preservation/digital/formats/fdd/fdd000279.shtml.