Can You Digitize A Digital Object? It’s Complicated

Visualization of magnetic information on a Floppy Disk

And if so, why would you ever want to? About a year ago the University of Iowa Libraries Special Collections announced a rather exciting project: to digitize the data tapes from the Explorer I satellite mission. My first thought was that the data on these tapes is likely digital to begin with, so there’s not really something to digitize here. As they explain, the plan is to “digitize the data from the Explorer I tapes and make it freely accessible online in its original raw format, to allow researchers or any interested parties to download the full data set.” It might seem like a minor point for a stickler for vocabulary, but that sounds like transferring or migrating data from its original storage media to new media.

To clarify, I’m not trying to be a pedant here. What they are saying is clear and it makes sense. With that said, I think there are some meaningful issues to unpack here about the difference between digital preservation and digitization, and about reading, encoding and registering digital information. Edit: See the comment from Greg Prickman below for further explanation of why the digitization of the Explorer I tapes, which are in fact analog reel-to-reel recordings, is indeed a case of digitization.

Digitization involves taking digital readings of physical artifacts

In digitization, one uses some mechanism to create a bitstream, a representation of some set of features of a physical object in a sequence of ones and zeros. In this respect, digitization is always about the creation of a new digital object. The new digital object registers some features of the physical object. For example, a digital camera registers a specific range of color values and a specific but limited number of dots per square inch. Digital audio and video recorders capture streams of discrete numerical readings of changes in air pressure (sound) and discrete numerical readings of chroma and luminance values over time. In short, digitization involves taking readings of some set of features of an artifact. (Edit/Note: See comment from Carl Fleischhauer below for a refinement of exactly how these digitization processes work.)
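To make the idea of “taking readings” a bit more concrete, here is a minimal Python sketch of sampling and quantizing a signal. It is an illustration under stated assumptions, not a description of how any particular recorder works: the function name `digitize`, the 440 Hz test tone, and the sample rates and bit depths are all made up for the example.

```python
import math

def digitize(signal, duration_s, sample_rate_hz, bit_depth):
    """Take discrete readings of a continuous signal and store each as an integer code.

    `signal` is any function of time returning a value in [-1.0, 1.0].
    All names and parameters here are illustrative assumptions.
    """
    levels = 2 ** bit_depth                      # how many distinct values a reading can take
    n_samples = round(duration_s * sample_rate_hz)
    codes = []
    for i in range(n_samples):
        t = i / sample_rate_hz                   # moment at which this reading is taken
        value = max(-1.0, min(1.0, signal(t)))   # clamp to the range we can represent
        codes.append(round((value + 1.0) / 2.0 * (levels - 1)))
    return codes

def tone(t):
    """A stand-in for an analog source: a 440 Hz sine wave."""
    return math.sin(2 * math.pi * 440.0 * t)

# The same "artifact" digitized two ways yields two different digital objects.
coarse = digitize(tone, 0.01, 8000, 4)     # few readings, 16 possible values each
fine = digitize(tone, 0.01, 44100, 16)     # many readings, 65536 possible values each
```

Both results are new digital objects that register only some features of the source; which features survive depends on the sampling rate and bit depth chosen.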

Reading bits off old media is not digitization

Taking the description of the data tapes from the Explorer I mission, it sounds like this particular project is migrating data. That would mean reading the sequence of bits off their original media and then making them accessible. On one level it makes sense to call this digitization: the results are digital, and the general objective of digitization projects is to make materials more broadly accessible. Moving the bits off their original media and into an online networked environment feels the same, but it has some important differences. If we have access to the raw data from those tapes, we are not accessing some kind of digital surrogate or some representation of features of the data; we would actually be working with the original. The allographic nature of digital objects means that working with a bit-for-bit copy of the data is exactly the same as working with the bits encoded on their original media. With this noted, perhaps most interestingly, there are times when one does want to actually digitize a digital object.
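One way to see why a bit-for-bit copy counts as “the original” is to compare checksums before and after a migration. The sketch below is a minimal illustration, assuming ordinary files and SHA-256 as the fixity check; the function names `sha256_of` and `migrate` are hypothetical and are not drawn from any specific preservation tool.

```python
import hashlib
import shutil

def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def migrate(src, dst):
    """Copy the bitstream to new storage and confirm the copy is identical.

    Returns the shared digest; raises if the copy differs from the source.
    """
    before = sha256_of(src)
    shutil.copyfile(src, dst)
    after = sha256_of(dst)
    if before != after:
        raise IOError("copy is not bit-for-bit identical to the source")
    return after
```

If the digests match, nothing about the bitstream was sampled, approximated or re-encoded along the way, which is the sense in which migration differs from digitization.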

When we do digitize digital objects

In most contexts of working with digital records and media for long-term preservation, one uses hardware and software to get access to and acquire the bitstream encoded on the storage media. With that said, there are particular cases where you don’t want to do that. In cases where parts of the storage media are illegible, or where there are issues with getting the software in a particular storage device to read the bits off the media, there are approaches that bypass a storage device’s interpretation of its own bits and instead resort to registering readings of the storage media itself. For example, a tool like Kryoflux can create a disk image of a floppy disk that is considerably larger in file size than the actual contents of the disk. In this case, the tool is actually digitizing the contents of a floppy disk. It stops treating the bits on the disk as digital information and shifts to recording readings of the magnetic flux transition timing on the media itself. The result is a new digital object, one from which you can then work to interpret or reconstruct the original bitstream from the recordings of the physical traces of those bits you have digitized.
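As a rough illustration of what reconstructing a bitstream from flux timing readings can look like, here is a deliberately simplified Python sketch. It assumes MFM encoding with a 2 microsecond channel cell, and it assumes the readings are already aligned so that the first cell is a clock cell; real tools have to find sync marks, track drive speed variation and handle damaged regions. It does not parse Kryoflux’s own stream format, and the function names are hypothetical.

```python
def flux_to_cells(intervals_us, cell_us=2.0):
    """Convert times between flux transitions into raw channel cells.

    An interval spanning n cells becomes n - 1 zeros followed by a one.
    The 2.0 microsecond cell length is an assumption for this example.
    """
    cells = []
    for interval in intervals_us:
        n = round(interval / cell_us)   # snap a jittery reading to a whole number of cells
        if n < 1:
            continue                    # too short to be a real cell; treated as noise here
        cells.extend([0] * (n - 1) + [1])
    return cells

def mfm_data_bits(cells):
    """MFM interleaves clock and data cells; keep every second cell as data.

    Assumes the first cell is a clock cell, which real decoders must establish
    from sync patterns rather than take for granted.
    """
    return cells[1::2]

# Slightly noisy readings of transitions nominally 4, 4, 6, 4, 6 and 8 microseconds apart.
readings = [4.1, 3.9, 6.2, 4.0, 5.8, 8.1]
bits = mfm_data_bits(flux_to_cells(readings))
```

The timing readings are the digitized object; the bits are an interpretation of them, and a different cell length or alignment would yield a different interpretation of the same readings.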

So when is and isn’t it digitization?

So, it’s digitization whenever you take digital readings of features of a physical artifact. If you have a bit-for-bit copy of something, you have migrated or transferred the bitstreams to new media but you haven’t digitized them. With that said, there are indeed times when you want to take digital readings of features of the actual analog media on which a set of digital objects are encoded. That is a situation in which you would be digitizing a set of features of the analog media on which digital objects reside. What do you think? Is this a helpful clarification? Do you agree with how I’ve hashed this out?

Comments (6)

  1. Thanks for the insightful splitting of semantic hairs, Trevor, but while we are at it, let’s split a few more. You write, “For example, a digital camera registers a specific range of color values and a specific but limited numbers of dots per square inch.” That’s probably OK for a blog, but it slights the intricacies of our digital cameras today, most of which collect what is called _sensor data_ via a Bayer color filter array. This sensor data is then post-processed, by the camera or by the photographer, at which point you might say that _color_ and _pixels_ really come into view (block that metaphor). The conversion of sensor data into, um, pictures, is a multi-step process. You also wrote, “Digital audio and video recorders capture streams of discrete numerical readings of changes in air pressure (sound) and discrete numerical readings of chroma and luminance values over time.” But that is not what is at stake in digitization, in the “digital readings of physical artifacts” that is your topic for that subsection of the blog. Generally speaking, for audio and video, we are not making new recordings. Rather, we are often working from changes in magnetic data on a tape, flux (as it were) that represents the changes in voltage and frequency in the previously recorded analog signal. This magnetic data is then sampled and quantized into a stream of bits. The conversion of changes in air pressure and the incoming chroma and luminance data into electrical signals occurred far earlier in the life of the content item at hand. And I’ll bet as soon as I post this, further hair-splitting (and error correction!) will occur.

  2. I thought a few clarifications about the term “digital object” might be of interest. The concept of a “digital object” was first used by the Corporation for National Research Initiatives (CNRI) as part of its seminal work on digital libraries starting in the late 1980s, and a digital object was defined specifically as: “a set of bits, or a set of sequences of bits, having an associated unique persistent identifier.” For an early discussion of this work, see “A Framework for Distributed Digital Object Services,” http://www.cnri.reston.va.us/cstr/arch/k-w.html. CNRI has now deployed various components of what became known as the Digital Object Architecture (“Overview of the Digital Object Architecture” available at http://www.cnri.reston.va.us/papers/OverviewDigitalObjectArchitecture.pdf).

    A new international standard, ITU-T X.1255 Recommendation: “Framework for discovery of identity management information,” was approved at an International Telecommunication Union (ITU) meeting in Geneva (ITU-T Study Group 17 (Security)) on September 4, 2013. The Recommendation [available free of charge at http://www.itu.int/rec/T-REC-X.1255-201309-I; ITU announcement: http://newslog.itu.int/archives/137] is based largely on CNRI’s Digital Object Architecture; and Robert E. Kahn, CNRI’s President, served as Editor. While the Recommendation is focused specifically on identity management information, it is applicable more generally to many different types of information in digital form and structured as a digital object, or, more abstractly, a digital entity.

  3. The data on the Explorer 1 tapes are audio tones captured at specific frequencies, recorded on analog tape machines to reel-to-reel tapes. Analyzing the tones after applying various filters allows the data to be studied (this is of course just a crude description of the process). All of this took place in the very analog world of 1958. The process we undertook was a true digitization process, converting the analog tones to digital representations–uncompressed WAV files. Look for the results, with accompanying website featuring other digitized material (photos, documents, etc.) towards the end of 2014.

  4. Dear Greg Prickman, Thanks for your clarifying comment. This was my take–that this is a true digitization process to digitize from analog tapes and create a machine actionable digital file format. In my work at the University of Florida and around the region and with international groups, the terminology for the conversion of legacy data on tape, printed, microform, and other formats is normally referred to as digitizing and sometimes as databasing (in conversations and publications, science museums seem to frequently enough refer to the conversion of printed records and hand written entries into digital form through OCR or through people doing data entry, especially when that entry is done into a tabular or other machine-actionable format, as databasing).

    With confusions on processes and needs, I’m very excited to hear more about this project with this as another example and case study that can help show people how and what’s being done, and to help with the terminology. Congrats on the very exciting work!

  5. Greg, Many thanks for the clarifications! I’ve added an edit to the post and linked to your comment. I likely read too much into the statement about “original raw format.” I’ve added a link to your comment up toward the top of the post as a clarification. Echoing Laurie’s comment, it is a fascinating project and if you are interested I’d be thrilled to interview you about it for The Signal when it’s ready to launch.

    Carl, thanks for the refinements on how exactly digital cameras work in these cases and on how the lower level signals are read straight off the media in A/V digitization. I’ve added a direct link to your comment from the spot where it’s relevant in the post.

  6. This post piqued my interest about exactly what types of objects these data tapes are. The explanation by Greg Prickman, special collections librarian at the University of Iowa, lends clarity to the announcement.

    But it was never likely that the data tapes were digital for several reasons. The Jet Propulsion Laboratory that launched the rocket that put Explorer into orbit did not use digital computers, although it had access to analog computers for simulations. There were very few digital computers at universities in the 1950s and digital audiotape had not been invented yet. Although there were some analog computers like this one at the University of Iowa [http://digital.lib.uiowa.edu/cdm/ref/collection/ictcs/id/8559], James Van Allen’s group in Iowa did not use computers at the time of the Explorer launch in 1958. Starting with the Explorer IV mission later that year, teams of students manually recorded thousands of data points. See the comment of Annabelle Hudmon who supervised the team at [http://www.news.uiowa.edu/van-allen/tributes.html]

    Some digital computers from that era did use magnetic tape, but there certainly would not have been a digital computer on board during that period because the magnetic tape units alone weighed hundreds of pounds. The early Explorer satellites were small, about 30 pounds, with the instruments weighing slightly more than half of that. Explorer I also did not have a tape recorder; it had a transmitter that emitted signals picked up by listening stations around the world. The next successful U.S. satellite mission, Explorer III, did carry a miniature tape recorder designed by University of Iowa graduate student George Ludwig.

    Because the distinctions between digital files of computer data, audio and video do not seem to matter so much in our current time, it can be difficult to remember that digital versions of each were initially produced decades apart. And, even after they were first manufactured, they were not necessarily taken up by broad swaths of society.

    The timeline of analog/digital magnetic tape is approximately this:

    Audio
    Reel-to-reel analog magnetic audiotape was invented in the 1930s. Analog audio cassettes were invented in the 1960s but were only widely adopted during the 1970s. In a twist, during the 1970s some desktop computers used analog audio cassettes to store computer programs by recording the sound of a modem, thereby converting digital information to analog. Digital audiotape did not appear until the late 1960s and was not widely used for some time.

    Computer data
    Early computers used punch cards, paper tape or magnetic drums for storage; versions of these remained in use until the 1970s and 1980s. In 1952 the IBM 729 Magnetic Tape Unit came along with its one-half inch, 2400-foot reels of magnetic tape. These iconic machines used the whirring reel-to-reel tapes that are familiar from 1950s-1970s popular culture.
    [http://www.columbia.edu/cu/computinghistory/701-tape.html]

    Video
    At the beginning of the television era and for some time thereafter, there was no videotape. The only way to record a television broadcast was to film the images on the screen with a motion picture camera. The first practical analog videotapes were invented in 1956. Television networks used them for time delay across time zones. Digital videotape didn’t appear until 1986 and was adopted only in professional studios. Even laserdiscs were analog at the outset (although they had digital audio). By 1996, a “prosumer” digital video format (MiniDV) was available.

    A comparison of the storage capacity of formats:
    Early magnetic computer tape (IBM 729, 7-track, one-half inch wide, 2400-foot reel) = 3 MB
    Analog videotape (standard definition), 1 second = 40 MB
    35 mm color motion picture film, 1 second = 1 GB
    DVD = 4.7 GB
