This is a guest post written by Amanda May, Digital Conversion Specialist in the Preservation Services Division. Her work includes recovering data from removable media in Library collections and providing consultation and services for born-digital collections data.
Of the hundreds of thousands of removable media items in the Library’s collections, the vast majority are optical disks and floppy disks. While optical disks are still in wide use and are very easy to transfer data from, floppy disks are older, and recovering data from them requires specialized hardware, software, and skills. The Preservation Services Division (PSD) maintains several tools for transferring data off of floppy disks, including items that are damaged or are in uncommon formats.
Collections found on floppy disks can be as mundane as practice files for a particular coding language, found tucked in the back of a textbook. Some content creators like Rhoda Métraux (1914-2003) used their computers like typewriters, so we find text files, correspondence, or drafts of published works (//blogs.loc.gov/thesignal/2020/08/metaphors-for-understanding-born-digital-collection-access-part-iii/). Scientists like Edward N. Lorenz (1917-2008) created one-of-a-kind simulation programs demonstrating complex mathematical problems that we can view through use of emulators (//blogs.loc.gov/thesignal/2021/02/born-digital-history/). Once the data is transferred off of the disk, we no longer rely on aging and difficult-to-acquire hardware. The data can be preserved in our digital repository under the best practices for digital preservation.
PSD uses a KryoFlux to create disk images from floppy disks. This floppy controller can create an image of the magnetic fluctuations on the disk which can then be used to create a formatted disk image. Why go through this process? It is sometimes hard to tell from the disk itself what the format will be. Since floppy disks were used so widely for a very long time, the creator may have gone through several different computers and used several disk formats to store their data. While the KryoFlux can create formatted disk images directly from the disk if we know the correct format, sometimes it makes sense to create these magnetic flux images (called preservation streams) and work from them instead of the original item. This makes it possible to create multiple formatted disk images from the stream file while saving wear and tear on our equipment and the original disk. In addition, we can use the KryoFlux to recover data from damaged disks, including one from the American Folklife Center that was partially melted. The KryoFlux can also be used for multiple disk sizes, including 3.5”, 5.25”, and 8” (though the Library does not currently have an 8” floppy drive).
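To illustrate why the format matters when turning a stream into a formatted disk image, here is a minimal Python sketch. It is not part of PSD's workflow or the KryoFlux software. It only shows how the size of a raw sector image follows from a disk's geometry, using a small table of standard IBM PC floppy layouts. The names `KNOWN_GEOMETRIES`, `image_size`, and `guess_formats` are made up for this example, and disks from other systems (such as CP/M machines) will often not match any entry.

```python
# Illustrative only: a few standard IBM PC floppy layouts, given as
# (cylinders, heads, sectors per track, bytes per sector).
KNOWN_GEOMETRIES = {
    '5.25" DD 360 KB': (40, 2, 9, 512),
    '5.25" HD 1.2 MB': (80, 2, 15, 512),
    '3.5" DD 720 KB': (80, 2, 9, 512),
    '3.5" HD 1.44 MB': (80, 2, 18, 512),
}


def image_size(cylinders, heads, sectors_per_track, bytes_per_sector):
    """Return the size in bytes of a raw sector image with this geometry."""
    return cylinders * heads * sectors_per_track * bytes_per_sector


def guess_formats(size_in_bytes):
    """Return the names of known layouts whose raw image size matches."""
    return [
        name
        for name, geometry in KNOWN_GEOMETRIES.items()
        if image_size(*geometry) == size_in_bytes
    ]


if __name__ == "__main__":
    for name, geometry in KNOWN_GEOMETRIES.items():
        print(name, image_size(*geometry), "bytes")
```

A size match is only a hint. Two different formats can produce images of the same size, which is one reason working from a preservation stream and trying several formats can be helpful.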
PSD also uses an FC5025 to image 5.25” floppy disks. This floppy controller, which only works with 5.25” floppy disks, is simple to set up, and its Disk Image and Browse software can make disk images for the most common formats. In some cases, however, we have encountered disks that are not in any of the formats supported by the FC5025. In these cases, the KryoFlux can be used to create a preservation stream image that can then be analyzed and configured to try to recover the data. One Manuscript Division collection contained 5.25” floppy disks that conformed to the Wave Mate Bullet (https://en.wikipedia.org/wiki/Wave_Mate_Bullet) format, a rare find! This computer used the CP/M operating system and formatted floppy disks in a distinctive pattern. By analyzing the scatter plots from the stream file, we were able to figure out this puzzle and build a configuration that produced a formatted disk image. Without a tool like the KryoFlux or the original computer, this data might have been lost forever.
After creating a disk image using the KryoFlux or the FC5025, we use tools like Forensic Toolkit, FTK Imager, and the BitCurator suite to explore the contents, export individual files, create reports about the contents, locate Personally Identifiable Information (PII) and other concerns, and package the data for ingest. We don’t always keep the disk images, but they are usually pretty useful to keep around during collection processing so that we don’t have to go back and use the original item again.
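As a rough illustration of one piece of packaging data for ingest, the Python sketch below builds a checksum manifest for a folder of exported files. It is an assumption for the sake of example, not a description of how Forensic Toolkit, FTK Imager, or BitCurator work, and not PSD's actual procedure. The function names `sha256_of` and `build_manifest` and the folder name `exported_files` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (as a relative path) to its SHA-256 digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


if __name__ == "__main__":
    # "exported_files" is a placeholder folder name for this example.
    for name, checksum in build_manifest("exported_files").items():
        print(checksum, name)
```

Recording checksums at the time of transfer makes it possible to confirm later that the files in the repository are unchanged from what came off the disk.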
Because we have so many floppy disks in the collections, recovering the data from them is an essential part of born-digital preservation at the Library of Congress. We look forward to sharing some of our other born-digital preservation projects in the future.