Kilobytes of Cultural Heritage: Preserving Collections on Floppy Disks

This is a guest post written by Amanda May, Digital Conversion Specialist in the Preservation Services Division. Her work includes recovering data from removable media in Library collections and providing consultation and services for born-digital collections data.

Of the hundreds of thousands of removable media items in the Library’s collections, the vast majority are optical disks and floppy disks. While optical disks are still in wide use and are very easy to transfer data from, floppy disks are older, and recovering data from them takes specialized hardware, software, and skills. The Preservation Services Division (PSD) maintains several tools for transferring data off of floppy disks, even items that are damaged or are in uncommon formats.

Collections found on floppy disks can be as mundane as practice files for a particular coding language, found tucked in the back of a textbook. Some content creators like Rhoda Métraux (1914-2003) used their computers like typewriters, so we find text files, correspondence, or drafts of published works (https://blogs.loc.gov/thesignal/2020/08/metaphors-for-understanding-born-digital-collection-access-part-iii/). Scientists like Edward N. Lorenz (1917-2008) created one-of-a-kind simulation programs demonstrating complex mathematical problems that we can view through use of emulators (https://blogs.loc.gov/thesignal/2021/02/born-digital-history/). Once the data is transferred off of the disk, we no longer rely on aging and difficult-to-acquire hardware to read it. The data can be preserved in our digital repository under the best practices for digital preservation.

Figure 1 3.5" floppy disks in a typical archival collection. Photo Credit: Amanda May
Figure 2 The KryoFlux (in a red case, right) at work. Photo Credit: Amanda May

PSD uses a KryoFlux to create disk images from floppy disks. This floppy controller records the magnetic flux transitions on the disk, and that recording can then be used to create a formatted disk image. Why go through this process? It is sometimes hard to tell from the disk itself what the format will be. Because floppy disks were used so widely for so long, the creator may have gone through several different computers and used several disk formats to store their data. While the KryoFlux can create formatted disk images directly from the disk if we know the correct format, it sometimes makes sense to create these magnetic flux images (called preservation streams) and work from them instead of the original item. This makes it possible to create multiple formatted disk images from the stream file while saving wear and tear on our equipment and the original disk. In addition, we can use the KryoFlux to recover data from damaged disks, including one from the American Folklife Center that was partially melted. The KryoFlux can also be used with multiple disk sizes, including 3.5", 5.25", and 8" (though the Library does not currently have an 8" floppy drive).
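
Once a formatted (sector) image has been decoded, its size is a quick sanity check on the format guess. The sketch below is only an illustration and not how the KryoFlux software identifies formats: it compares an image's byte count with a few well-known IBM PC floppy geometries. The names `Geometry`, `KNOWN_GEOMETRIES`, and `candidate_formats` are hypothetical, and the table is deliberately tiny; formats such as Macintosh, Commodore, Amiga, or CP/M layouts would need their own entries.

```python
from typing import List, NamedTuple


class Geometry(NamedTuple):
    """Physical layout of a formatted floppy disk (illustrative only)."""
    name: str
    tracks: int
    sides: int
    sectors_per_track: int
    bytes_per_sector: int

    @property
    def size(self) -> int:
        # Total bytes in a complete sector image of this geometry.
        return (self.tracks * self.sides
                * self.sectors_per_track * self.bytes_per_sector)


# A small, non-exhaustive table of common IBM PC geometries.
KNOWN_GEOMETRIES = [
    Geometry('5.25" PC 360 KB', 40, 2, 9, 512),
    Geometry('5.25" PC 1.2 MB', 80, 2, 15, 512),
    Geometry('3.5" PC 720 KB', 80, 2, 9, 512),
    Geometry('3.5" PC 1.44 MB', 80, 2, 18, 512),
]


def candidate_formats(image_size: int) -> List[str]:
    """Return the names of known geometries whose image size matches."""
    return [g.name for g in KNOWN_GEOMETRIES if g.size == image_size]


if __name__ == "__main__":
    print(candidate_formats(1_474_560))  # ['3.5" PC 1.44 MB']
    print(candidate_formats(368_640))    # ['5.25" PC 360 KB']
    print(candidate_formats(12_345))     # []
```

An empty result does not mean the image is bad. It may simply mean the disk uses a format outside this short table, which is exactly the situation where working from a preservation stream pays off.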

Figure 3 KryoFlux GUI display from an undamaged disk. Photo Credit: Amanda May

Figure 4 KryoFlux GUI display from a badly damaged disk. Photo Credit: Amanda May

PSD also uses an FC5025 to image 5.25" floppy disks. This floppy controller, which only works with 5.25" floppy disks, is simple to set up, and its Disk Image and Browse software can make disk images for the most common formats. In some cases, however, we have encountered disks that are not in any of the formats supported by the FC5025. In these cases, the KryoFlux can be used to create a preservation stream image that can then be analyzed and configured to try to recover the data. One Manuscript Division collection contained 5.25" floppy disks that conformed to the Wave Mate Bullet (https://en.wikipedia.org/wiki/Wave_Mate_Bullet) format, a fairly rare find! This computer used the CP/M operating system and formatted floppy disks in its own distinctive pattern. By analyzing the scatter plots from the stream file, we were able to solve this puzzle and build a configuration that produces a formatted disk image. Without a tool like the KryoFlux or the original computer, this data might have been lost forever.

Figure 5 The FC5025 (small green card, bottom left) at work. Photo Credit: Amanda May

After creating a disk image using the KryoFlux or the FC5025, we use tools like Forensic Toolkit, FTK Imager, and the BitCurator suite to explore the contents, export individual files, create reports about the contents, locate Personally Identifiable Information (PII) and other concerns, and package the data for ingest. We don’t always keep the disk images, but they are usually pretty useful to keep around during collection processing so that we don’t have to go back and use the original item again.
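
To give a feel for what "locating PII" can mean in practice, here is a minimal sketch that scans exported files for two kinds of pattern: strings shaped like U.S. Social Security numbers and strings shaped like email addresses. It is not how Forensic Toolkit, FTK Imager, or BitCurator work internally. The patterns, the function names `scan_bytes` and `scan_tree`, and the report shape are all assumptions made for illustration, and a regular expression like this will produce both false positives and misses.

```python
import re
from pathlib import Path

# Hypothetical, deliberately simple patterns. Real PII review needs far more
# than this, plus human judgment about every hit.
PII_PATTERNS = {
    "ssn_like": re.compile(rb"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(rb"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
}


def scan_bytes(data: bytes) -> dict:
    """Return every match for each pattern found in a block of bytes."""
    return {
        label: [m.decode("ascii", "replace") for m in pattern.findall(data)]
        for label, pattern in PII_PATTERNS.items()
    }


def scan_tree(root) -> dict:
    """Scan every file under root; keep only files with at least one hit."""
    report = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            hits = {k: v for k, v in scan_bytes(path.read_bytes()).items() if v}
            if hits:
                report[str(path)] = hits
    return report


if __name__ == "__main__":
    sample = b"Contact jane@example.org, SSN 123-45-6789."
    print(scan_bytes(sample))
    # {'ssn_like': ['123-45-6789'], 'email': ['jane@example.org']}
```

Scanning raw bytes rather than decoded text is a deliberate choice here: files recovered from old disks are often in encodings or word processor formats that a plain text reader would reject.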

Because we have so many floppy disks in the collections, recovering the data from them is an essential part of born-digital preservation at the Library of Congress. We look forward to sharing some of our other born-digital preservation projects in the future.

Comments (3)

  1. Thank you for this interesting article. As an owner of a large collection of PC, Macintosh, Commodore and Amiga disks, a question from me would be: after the content has been dumped in KryoFlux (or Greaseweazle) raw format, how do you recommend filing, cataloging, and indexing this information? Is there a prescribed format for storing all that metadata (including images of the packaging, labels, etc.)? Also, in addition to storing privately, I would love to submit the raw images somewhere for posterity. Is there a public service or database where I could upload dumped images? I know of archive.org, so that would be a place to start, but is there anything specific to raw flux dumps? In general I’m interested in all the steps taken after the raw image has been obtained.

    • Filing, cataloguing, and indexing information is largely dependent on your needs and what might best suit the collection, but we can offer insight into the standards we use at the Library. We typically make a scan of the born-digital object being preserved on a flatbed scanner and save it as a TIFF. We use TIFF because, although it is a larger file format that takes up more space, it stores a more detailed, high-quality image, which is preferable for a preservation copy. Occasionally we also make a TIFF scan of the object’s case or accompanying material (booklets, handwritten notes, etc.) if it seems relevant to the collection, is unique, or provides context. We then organize our collection media into a “bag” that consists of the scanned object’s TIFF file, the raw image file (.img) created by the KryoFlux or FC5025, and the file system extracted from that .img file by FTK Imager. You can use whatever naming convention works best for you, but typically something descriptive is best; if it is an individual’s personal collection, for example, something like “Smith_memoirs_001,” with “001” representing an individual CD or floppy disk in the collection, would work well. At the Library, we typically use the object’s barcode as our standard naming convention, so a bag often looks like this:

      00104483905_001.tiff
      00104483905.img
      Files
            Extracted file system in various formats (PDFs, .doc, .VOD, etc.)

      The Library is fortunate to have internal long-term storage facilities to keep and preserve our data. As such, I am not familiar with any public service or database system an individual could upload data to. An external hard drive might be the best place to store collection materials once they have been scanned, ripped, and assembled. Most of our work with born-digital media items is stored on external hard drives while we are processing them, and it remains on these external drives until we can ingest it into our storage space. The key is to make multiple copies of your imaged collections so that if one copy fails, you will not have lost all materials (and subsequently, all of your hard work).
      -Answer courtesy of Digital Conservation Specialist Hannah Noel
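
For readers who want to try the bag layout described in the answer above, here is a minimal sketch. It copies a scan, a disk image, and a folder of extracted files into a directory named after the barcode, using the same file names as the example listing. The checksum manifest (`manifest-sha256.txt`) is an addition of this sketch and is not mentioned in the answer; it is there so that each of the multiple copies can be checked later. The function names `build_bag` and `verify_bag` are hypothetical, and this is not the Library's own tooling.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large disk images do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_bag(barcode, scan, image, extracted_dir, dest_root) -> Path:
    """Assemble <dest_root>/<barcode>/ with the scan, image, and Files/."""
    bag = Path(dest_root) / barcode
    bag.mkdir(parents=True, exist_ok=False)  # refuse to overwrite a bag
    shutil.copy2(scan, bag / f"{barcode}_001.tiff")
    shutil.copy2(image, bag / f"{barcode}.img")
    shutil.copytree(extracted_dir, bag / "Files")

    # Assumption of this sketch: record a checksum for every file in the bag.
    lines = [
        f"{sha256_of(p)}  {p.relative_to(bag).as_posix()}"
        for p in sorted(q for q in bag.rglob("*") if q.is_file())
    ]
    (bag / "manifest-sha256.txt").write_text(
        "\n".join(lines) + "\n", encoding="utf-8")
    return bag


def verify_bag(bag) -> list:
    """Return relative paths that are missing or no longer match the manifest."""
    bag = Path(bag)
    manifest = (bag / "manifest-sha256.txt").read_text(encoding="utf-8")
    failed = []
    for line in manifest.splitlines():
        expected, rel = line.split("  ", 1)
        target = bag / rel
        if not target.is_file() or sha256_of(target) != expected:
            failed.append(rel)
    return failed
```

After copying a bag to a second drive, running `verify_bag` on the copy returns an empty list if every file arrived intact, which is one way to make the "multiple copies" advice checkable rather than hopeful.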

  2. Hi Amelia and Hannah,

    This is very helpful and actually confirms some of my uneducated guesses as to how such archives are maintained. Thank you so much for responding and I will keep trying to identify public archives which I could enrich with the images of floppies — in addition to storing them locally on my hard drives.

    With regards and thanks,
    Adam Podstawczynski
