Respect Des Bits: Archival Theory Encounters Digital Objects & Media

In The Is of the Digital Object and the is of the Artifact I explored the extent to which digital objects confound and complicate some of our conceptions of what exactly digital things are. I’m becoming increasingly convinced that the nature of digital objects offers an important opportunity for the cultural heritage community to consider how some of our core philosophies connect with that nature. If we step back from the representations of digital objects on the screen and think about them as sequences of bits that exist on particular media, I think some core archival principles have much to offer us.

Example of archaeological stratigraphy. The layers of earth offer insight into understanding & contextualizing objects in the layers much like the arrangement of an archive, and the order of bits on a medium or in a file offer context for understanding and contextualizing. Wikimedia Commons

Media/Medium as Fonds

Whatever your feelings about the imperative to Respect Des Fonds, it is a cornerstone of the identity and professional practice of archives. Attempting to maintain the original order in which materials were managed before being accessioned, and making decisions when processing an archive with respect to the whole, both suggest a kind of archaeological or paleontological understanding of documents, records and objects. An object’s meaning is always to be understood in the context of the objects near it and the structure in which it is organized.

To what extent does the relationship of the bits on a medium to the other bits on that medium represent a parallel kind of context? The structure and organization of records and knowledge say as much about the materials as what is inside them. The layers of sediment in which something is found enable you to understand its relationship to other things. Context is itself a text to be read.

The Original Order of Bits

All digital objects actually exist first as analog objects, as bits encoded on a particular medium. At that level, the bits on a particular medium have an original order to them. At the moment of accession, there is a linear set of bits on any particular medium. The hard drive, the optical disk, the 5 inch floppy each have on them an ordered set of bits that can be copied and made sense of by various technologies, both today and in the future. This is similarly true at the file level. Setting aside the actual physical arrangement of individual bits on a disk, each file is composed of a sequence of ones and zeros that comes with an order. The fixity check tells us if this order has been altered.
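To make the idea of a fixity check concrete, here is a minimal sketch in Python. It assumes SHA-256 as the hash (any cryptographic hash would illustrate the same point), and the function names and the sample file are invented for illustration. They are not drawn from any particular archival tool.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file's bytes, read in their stored order."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns the empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path, recorded_digest):
    """True if the file still hashes to the digest recorded at accession."""
    return sha256_of_file(path) == recorded_digest


def demo_fixity():
    """Hypothetical demo: record a digest, then swap the order of two bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "accession.bin"  # invented file name
        sample.write_bytes(b"respect des bits")
        recorded = sha256_of_file(sample)
        unchanged = fixity_ok(sample, recorded)
        # Same bytes, different order: the last two characters are swapped.
        sample.write_bytes(b"respect des bist")
        reordered = fixity_ok(sample, recorded)
    return unchanged, reordered
```

In this sketch the second check fails even though the file contains exactly the same set of bytes as before. The hash is sensitive to order, not just to content, which is why it works as a test of original order at the file level.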

Normalization is Interpretation

When we decide to normalize, or to copy only the representations of digital objects as rendered in particular situations, we are effectively disregarding the original order of the bits on the media. If we copy over the directory structure of files, we can still preserve a good bit of the user-perceived order of the digital context. We can see what things were next to each other in the metaphorical and iconographic folder on someone’s desktop. However, those representations are still (in a sense) translations. They are particular ways of seeing and understanding the underlying bits.

The hashes for a set of files enable us to confirm the bit level fixity of a file, that is, to confirm that the bits are in exactly the same order as they were before. Once a file is normalized we can no longer assert its bit level authenticity. File hashes from the National Software Reference Library.

Beyond this, any attempt to normalize the files themselves, to derive other kinds of files, is a much deeper disregard for the ideal of respecting the integrity, order and structure of digital objects. In this case, even the screen essentialist notion of the digital object is in question. Each of these moves to normalize, each of these transformations and degradations, moves us one step further away from bit level fixity and authenticity, from the authenticity of the fixity check, and toward a kind of performance or restaging of the artifact. We get further and further from being able to assert that what we have is exactly what we were given. We become artists engaged in a performative interpretation or recreation of the artifact.
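A small sketch can show how even a gentle normalization breaks bit level fixity. The example below assumes a very simple normalization, converting Windows-style line endings to Unix-style ones, and uses an invented two-line letter as the sample data. Real normalization workflows are far more invasive than this.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 hex digest of a byte sequence."""
    return hashlib.sha256(data).hexdigest()


def normalize_line_endings(data: bytes) -> bytes:
    """An illustrative 'normalization': rewrite CRLF line endings as LF."""
    return data.replace(b"\r\n", b"\n")


# Invented sample content standing in for a file as it was received.
original = b"Dear Editor,\r\nPlease find the draft attached.\r\n"
normalized = normalize_line_endings(original)

# On screen, both versions present the same two lines of text...
same_visible_text = (
    original.decode("ascii").splitlines() == normalized.decode("ascii").splitlines()
)
# ...but the underlying sequence of bits is no longer the one we were given.
same_bits = digest(original) == digest(normalized)
```

Here `same_visible_text` is true while `same_bits` is false. The rendered representation survives, but the ability to assert that what we hold is exactly what we were given does not.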

Recording of single magnetisations of bits on a 200MB hard disk platter. The order and structure of the magnetizations of bits on a disk are even strata-like: information patterns with a clear original order. Matesy GmbH

The Order and Logic of Digital Media

But why are we even talking about order? Wasn’t the entire point of the digital the end of linearity? Our experience of digital media is one of non-linearity. The first row of the database or the spreadsheet is reorganized based on parameters. The web is made of linked pages and created from a rhizomatic network of connections between nodes. While the representations of digital objects often appear non-linear, it is critical not to be seduced by the flickering and transitory view of digital objects provided by our screens. At the end of the day, every digital object is encoded on some medium, and that encoding is an ordered sequence of bits.

Letting go of representations and embracing the bits

To try to bring this whole discussion back from theory and into practice: when recently working with a set of files from floppy disks in a collection, I came across a set of files I couldn’t open. The extensions made no sense to me, or to anything else for that matter. I changed the extensions to .txt and opened them in a text editor. Lo and behold, they were mostly made of characters that my computer could interpret as text. I didn’t need to know what format the files were in to be able to make sense of most of their contents. I didn’t need a secret decoder ring. I could just tell my computer to pretend this particular set of bits we call a file is all text and show me what it sees.
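The same move can be sketched in a few lines of Python: ignore the format entirely and pull out whatever runs of bytes can be read as plain text. This is only an illustration, assuming printable ASCII and a minimum run length of four characters. The sample bytes are invented and do not come from the files described above.

```python
import re


def printable_runs(data: bytes, min_length: int = 4):
    """Return runs of printable ASCII at least min_length long, in file order."""
    # Bytes 0x20 through 0x7e are the printable ASCII characters, space included.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


# Invented sample: readable text interleaved with binary control bytes,
# roughly what an unknown word processor file might look like.
sample = (
    b"\x00\x01WPC\x02\x00Dear Ms. Rivera,\x00\x00"
    b"\x1bthank you for the letter\xff\x00ok"
)

readable = printable_runs(sample)
```

For this sample, `readable` holds the two longer phrases, while the short fragments "WPC" and "ok" fall under the length threshold. Nothing about the file format had to be known in advance. The order of the bytes alone was enough to recover most of the text.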

This isn’t just true for files with text in them. While you might not be able to play the disk image of a game, anyone can crack it open and look at the text files, various script files, texture files, audio files (in the order they exist) and understand them in context. Even the metaphorical folder names inside that disk image tell us about what is there.

Computers and software become the tools we can use to make sense of the stratigraphy of the disk, to interpret the order of bits. Imaging disks (logical or forensic) attends to that order.

6 Comments

  1. Tibaut
    June 24, 2013 at 3:10 pm

    Very good viewpoint, Trevor.
    I agree that the essential principles of “respect des fonds”, when applied in the digital world, raise questions of original bits order. And if the original bit order is the original presentation of the data, the software used to create it or display it will show its “representation” for human understanding.
    Also, as you said, while the “representation” is what is kept/copied/reproduced (if I understand correctly) during normalization processes, then the question, in my opinion that also arises, is “how close to the original representation is this new representation?”, and particularly since, as you introduced, the original presentation is so different from the new presentation of the bits.
    Another question that may come up is the trust that should be bestowed to the systems and processes that produced the representations, hence the lack ( or presence) of authenticity/reliability/integrity properties critical to records admissibility in different contexts, particularly in the evidence-based Court or pure trust or credit arena….

    I am sure the security and systems experts are taking note :-)

  2. Matthew Kirschenbaum
    June 24, 2013 at 10:05 pm

    Nice piece of work, Trevor, and one that I read sympathetically of course. I think you might do well to push this in the direction of something like ANT, given how closely intertwined the actions of human *and* machine actors are when we’re talking about the behaviors of a file system. While forensic imaging respects original order in the sense of preserving the linear bitstream, that bitstream is itself an artifact of (only ever) *logical* decision making structures, that is “expert systems” optimizing the allocation of physical space on some piece of media. As we all know, often what appears to be a single homogeneous “file” at the level of the screen is really a multiplicity of fragments dispersed across the surface (or storage matrix) of a magnetic or solid state device. (Incidentally, in the MFM imagery you show, I believe we’re looking top-down at the surface of the media, not sideways or section-wise.) Here original order, the stratigraphic topos of data, is a wholly algorithmic artifact; here human agency does not reach.

    In MECHANISMS, I diagrammed the allocation of a particular file I was interested in across the storage sectors of a floppy disk. It was a satisfying thing to have been able to achieve, but also pointless. I remain unshaken in my conviction that forensic imaging is the custodial practice most faithful to our most baseline philological and humanistic traditions. But finding the compelling use cases–the ones where all that digital context engenders some palpable revision of our knowledge of the past–remains very hard.

  3. Trevor
    June 24, 2013 at 10:22 pm

    Great points Matt. It’s much easier to make the cases for the need for logical images of disks and eschewing normalization than it is for forensic disk imaging.

    One tack that I think could work for both is the second argument for original order. The SAA spells out two reasons that original order is useful,

    “Maintaining records in original order serves two purposes. First, it preserves existing relationships and evidential significance that can be inferred from the context of the records. Second, it exploits the record creator’s mechanisms to access the records, saving the archives the work of creating new access tools.”

    Clearly, the evidential nature of the forensic disk is valuable, but beyond that, the notion of “exploiting the record creator’s mechanisms to access the records” gestures toward a cost savings argument. Together, to me, they suggest the archivist’s goal is to value the integrity and authenticity of materials while being as thrifty as possible.

    I’d agree on the value of ANT for thinking through agency and action in these situations. In the same vein, I’ve been becoming increasingly interested in the role that object oriented ontology and Harman’s notions of the quadruple object can play out for folks in the cultural heritage world who interpret, describe, theorize, and present objects all day.

  4. Porter
    June 25, 2013 at 4:05 pm

    Great post, Trevor. I think your point that “Context is itself a text to be read” is particularly important in this discussion. To put this in terms of praxis, bulk_extractor (a digital forensics tool used to find personally identifiable information on a disk image http://www.forensicswiki.org/wiki/Bulk_extractor) relies on context in order to identify critical information such as credit card numbers, social security numbers, email addresses, etc. For example, a disk image may have 9 digit strings of numbers throughout, so in order to identify a particular string of numbers as a social security number, bulk_extractor looks at the context in which they are found, i.e., the letters “ssn” or “social security”.

    The argument for context, then, isn’t only theoretical, but an essential element in how we interrogate these digital objects as objects, especially in the case of the tools we use in those processes.

  5. Bruce N Smith MLIS
    June 28, 2013 at 12:35 pm

    This is an intelligent and easy to read post! I agree with your points, thank you.

    I’ll be pondering the practical difficulties of virtualized environments, and the increasing use of fragmentation (as an alternative security strategy to encryption) across distributed storage devices.

  6. Andrew Jackson
    July 29, 2013 at 9:50 am

    Great post. I don’t think disk imaging is appropriate in all circumstances, but pulling bitstreams out of context should be done with care, recognising that it is an act of interpretation and that something has been lost. The problem is that we don’t really have mature enough tools that let us know precisely what we are discarding, so that it can be done with confidence.

    Also, a closely related post came up recently on the Open Planets Foundation blogs – http://www.openplanetsfoundation.org/blogs/2013-06-12-mia-metadata – It’s a bit further down the context-preservation spectrum than disk imaging, but the underlying issue is the same I think.
