What Do You Mean by Archive? Genres of Usage for Digital Preservers

One of the tricks to working in an interdisciplinary field like digital preservation is that all too often we can be using the same terms but not actually talking about the same things. In my opinion, the most fraught term in digital preservation discussions is “archive.” At this point, it has come to mean so many different things in different contexts that some in digital preservation are reluctant to use the term at all. I wanted to spend a few moments putting text on a URL that anyone can reference from here on out when they need to try and parse and disambiguate what we mean by archive. For some related reading, I’d suggest checking out Kate Theimer’s Archives in Context and as Context and the role of “the professional discipline” in archives and digital archives.

I’d stress here that I’m not really interested in telling people what is and isn’t an archive. Instead, I’m interested in 1) helping people ensure that they aren’t talking past each other and 2) briefly starting to suss out the resonances between these different usages. I would love to hear more perspectives on usage of the term and resonances between those uses in the comments. In many different contexts the term archive carries significant weight. It often brings with it notions of longevity, safe keeping, order and concerns with authenticity; it’s about items or records that hang together for good reason. To varying extents, I think we see these points surface across each of the uses I articulate here. My objective is not to exhaustively describe any of these ways the word is used, but just to briefly gesture toward different usages. I should stress that this is how I sort out some of the different usages of the term. I invite readers to suggest additional or different usages in the comments below the post.

Archive as in Records Management

Manuscript Division stacks with acid-free containers. Manuscript Division Slide Collection

In an organizational context, an archives is often the place in the organization that is required to retain and organize records of the organization. So a radio station, or a hospital, or a financial services company needs to keep around copies of records of its operation for a range of reasons (litigation, tax purposes, posterity, compliance with regulations, etc.). In this case, the archive serves the purpose of organizing, maintaining records and materials for use by the organization. In this case, a big part of the work of an archive is to make sure they are keeping around only what is deemed to be useful for particular future use cases.

Archive as in “The Papers of So and So”

One of the specific senses in which archivists use the term archives is to describe a particular kind of collection. Effectively, an archive is a collection of materials that hang together for a very particular reason. An archive is either the papers of some particular person or the papers or records of a particular organization. What makes it an archive is the fact that the items and records in the collection represent a “fonds,” a particular name for a collection that is the result of the ongoing work of the individual or organization. The words “natural” and “organic” generally come into play here, the idea being that the archive is a collection of items and records that exist as a whole. To contrast with this, an archivist might refer to a collection of rare books pulled together by a collector over time as an “artificial” collection. Artificial in this case is not to say that it’s “bad,” just that the collection was assembled as a set of materials after the fact.

Archive as in “Right Click -> add to Archive”

Example of archive used in web mail.

For most people, the most common usage of the term archive is likely from a context menu in computing. In many operating systems you can simply right click on the icon for a file and click “add to archive” or “create archive.” In these cases, borrowing from a legacy of usage of the term more generally in computing, this ends up meaning “stick it into some kind of compressed container file.” In this vein, the term archive is largely tied to the idea of “back-up.” Effectively, the archived copy of these files is slightly more difficult to get to but right at your fingertips nonetheless.

Usage of the term in web applications, like web email clients, is very similar. In the case of many web mail systems the archive is simply all of your emails that you haven’t deleted and are not in your inbox. In the logic of “piling vs. filing” this actually makes sense. In the past, you might have organized your correspondence and bills in a particular and structured fashion, keeping only what you needed for the future and deliberately putting it where it would be easy to find later. That filing process for managing records is much more in line with what archivists mean by archive. As email has shifted further and further toward something that people expect to be able to simply do full text search against, the term archive has come along with it. But the fact that folks now generally just let it pile up in one big thing called “archive” that they search against is very different from the deliberate, organized thing that archivists are generally talking about.

Computer data storage in a modern office building, taken during the 1980s, Photographs in the Carol M. Highsmith Archive, Library of Congress, Prints and Photographs Division.

Archive as in “Tape Archive”

When IT people use the term archive they are generally talking about a piece of hardware. At the start of each of the Library of Congress storage architecture meetings we generally need to begin with this vocabulary discussion. As an example, many large organizations use an HSM, a hierarchical storage management system, that maintains different tiers of storage that have distinct performance requirements. At this point, the top level might be a relatively small amount of expensive but fast flash memory; below that might be a larger pool of spinning disk storage; and below that you would likely find something called the “archive” layer. In this case, archive means tape archive. Magnetic tape remains the cheapest medium (you can store a lot more data on tape for a lower cost than disk) but it is significantly less responsive, so it is going to take you time to get the information back from tape. So within the design of a storage system, the stuff you need to keep around but don’t need to access that often, your backup copies, and so on end up on the biggest but cheapest tier of your storage system.
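
To make the tiering idea concrete, here is a rough sketch in Python of how a placement rule in such a system might look. The tier names follow the description above, but the idle-time thresholds and the function name choose_tier are assumptions made up for illustration; real HSM products apply far richer policies.

```python
# Hypothetical tiers, fastest and most expensive first.
# The idle-day thresholds are illustrative assumptions, not vendor defaults.
TIERS = [
    ("flash", 7),   # touched within roughly the last week
    ("disk", 90),   # touched within roughly the last three months
]
ARCHIVE_TIER = "tape"  # everything colder lands on the "archive" layer


def choose_tier(days_since_last_access: int) -> str:
    """Return the name of the tier where a file of this age would be placed."""
    for name, max_idle_days in TIERS:
        if days_since_last_access <= max_idle_days:
            return name
    return ARCHIVE_TIER


for idle in (0, 30, 365):
    print(idle, "days idle ->", choose_tier(idle))
```

The point of the sketch is only that “archive” here names the last fallback in a cost-versus-speed ordering, not a curated collection.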

The definition here relies on a long history of using the term archive as a synonym for magnetic tape storage systems. The file format .tar, a way to package data for storage, itself stands for “tape archive.” This use of the term archive goes back to 1940s computer systems architecture. In the original context it referenced online vs. offline storage. The reels of tape were quite literally “off line”: a reel had to be located and mounted before its data became accessible, in contrast to things like magnetic core memory at the time, and later random access memory.
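
For readers who have never looked inside one, the following is a minimal sketch of packaging a couple of files into a .tar container using the tarfile module from Python’s standard library. The file names and contents are invented for the example, and the helper names make_tar and list_tar are my own.

```python
import io
import tarfile


def make_tar(files):
    """Pack a mapping of {name: bytes} into an uncompressed tar archive (returned as bytes)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)  # the header must declare the member's size
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def list_tar(blob):
    """Return the member names stored in a tar archive given as bytes."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as archive:
        return archive.getnames()


# Invented sample content, for illustration only.
blob = make_tar({"letters/1979.txt": b"Dear reader", "notes.txt": b"tape archive"})
print(list_tar(blob))
```

Note that a plain tar file is just a sequence of headers and file contents written one after another, which is exactly the layout that suits a medium read from start to finish like tape.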

Archive in “Web Archive”

Wendy’s Blog: Legal Tags, Legal Blawgs Web Archive, Law Library of Congress

Many organizations are now in the business of harvesting content from the web for long term access and preservation. In these cases, tools like Heritrix, an open source web crawler, are sent out to grab all of the rendered content of a webpage they can get ahold of and, within defined parameters, the other pages it links to and all their associated files. As part of this collection process, the tools log information about the date and time that the data was collected. At this point, tools store that content in WARC files, or Web Archive files, which can then be played back via tools like the Wayback Machine. So there is a lot of information in here that can be used to assert the authenticity of the data: how a particular URL presented itself to Heritrix and how Heritrix interpreted it at a particular moment in time. With that said, it’s much more in keeping with the computing usage of archive as a back-up copy of information than the disciplinary perspective of archives.
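
As a rough illustration of how capture time travels with the content, here is a small Python sketch that assembles a single WARC-style record by hand. The header field names come from the WARC 1.0 format, but this is a simplified assumption-laden example: it writes the simpler “resource” record type rather than whatever record types a crawler such as Heritrix actually emits, the function name build_warc_resource_record is hypothetical, and the URL and payload are invented.

```python
import uuid
from datetime import datetime, timezone


def build_warc_resource_record(target_uri, payload, content_type, captured_at):
    """Assemble one simplified WARC 'resource' record as bytes.

    captured_at must be a timezone-aware datetime; it is written in UTC.
    """
    stamp = captured_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Target-URI: {target_uri}",
        f"WARC-Date: {stamp}",  # when the content was captured
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",
    ]
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # A record ends with two CRLF pairs after the content block.
    return head + payload + b"\r\n\r\n"


# Invented example values, for illustration only.
record = build_warc_resource_record(
    "http://example.org/",
    b"<html><body>Hello</body></html>",
    "text/html",
    datetime(2014, 2, 27, 12, 0, 0, tzinfo=timezone.utc),
)
print(record.decode("utf-8"))
```

Because the target URL and capture date sit in the record itself, a playback tool can later answer the question “what did this URL look like to the crawler at that moment?”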

Archive as in “Digital Archive”

At this point, there are a lot of digital collections that are using the term archive that don’t necessarily square with how archivists have been using the term. For instance, the September 11th Digital Archive, the Bracero Archive, and The Shelley-Godwin Archive are good exemplars of some of the diversity of this usage. In each case, an effort was undertaken to collect or bring together related materials. The September 11th Digital Archive is a crowdsourced collection of materials related to the attacks, the Bracero Archive is a digitized collection of oral history interviews with individuals involved in the Bracero guest worker program, and The Shelley-Godwin Archive brings together digitized copies of primary manuscript sources related to a particular family. The origin of this usage is anchored in Jerome McGann’s work on the Rossetti Archive, which McGann developed from a theoretical perspective on the potential of hypermedia to allow for the creation of new kinds of archives. Alongside this usage, digital archive has also been used as a term to refer to born digital materials processed as part of a more traditional notion of an archive. In this case, see usage of “the born digital archives of Salman Rushdie.”

Some archives purists might call all of these “artificial” collections. I, however, wouldn’t. I don’t think this is so much about computing terminology invading the space as it is about another tradition in which systematically collected materials have been called archives within cultural heritage organizations. Folklife archives, for example the American Folklife Center Archive at the Library of Congress, have long worked to acquire ethnographic field collections for the archive. In these cases, folklorists have gone out and made field recordings and then worked with archivists to organize them for access. With this said, it’s valuable to recognize that generally the term digital archive carries this language and meaning as opposed to the canonical repository for the “papers of so and so” or the records management terminology. That is, digital archives hang together as “a conscious weaving together of different representational media.” For another take on the idea of digital archives see Kate Theimer’s recent presentation at the American Historical Association’s annual meeting, A Distinction worth Exploring: “Archives” and “Digital Historical Representations.”

Notions and Considerations of “The Archive”

The last category I am including here is about theorizing “the Archive.” A broad range of work in literary and media theory focuses attention on “the Archive.” Here I am thinking of Foucault’s notion of “the Archive” in The Archeology of Knowledge, Derrida’s perspective in Archive Fever, and Kittler and Wolfgang Ernst’s notions of archives in Media Archeology. For the most part, this body of work is less about what goes on in an individual archives and is more about the role of “the archive” in society writ large or the idea of “the archive” as traces of the past in objects. For example, for Foucault, “the Archive” is not so much an individual set of materials but a term for the entirety of historical records/evidence that exists to work from. These theoretical takes on “the archive” can be frustrating to many archivists, as much of this work does not engage with the professional practices of archives or with “archival theory,” the body of scholarship which archivists themselves have been building through ongoing practice and research since at least the French Revolution.

At the institutional level, discussions of “the archive” are broadly useful for reflecting on the social roles that archives play in culture. Further, a considerable amount of this work in the Media Archeology and Media Theory traditions focuses on processes of inscription and the embedded logic of different media (optical media, gramophones, databases, the MP3 format, etc.), which are increasingly important genres of artifacts and records that archives are themselves tasked with accessioning. Kirschenbaum’s Mechanisms: New Media and the Forensic Imagination is itself an invaluable exemplar of how work from these media theory traditions can combine with archival theory to produce scholarship that directly informs the development of tools and practices for practicing archivists. Again, these broad and interdisciplinary conversations about archives can be quite useful to those working both in and outside archives.

So, are there other definitions I’m missing? Have I got any of the lineage wrong on this? I’d love to continue this discussion in the comments.

Thanks to Matthew Kirschenbaum, Nicki Saylor, and Kate Theimer for comments and suggestions for improvements to this post.

15 Comments

  1. Dean C. Rowan
    February 27, 2014 at 12:05 pm

    It’s always worth reviewing OED’s take on a term like “archive,” including its etymology. There is a distinct sense of the official to the term, as ἀρχή means “government.” Clearly, the several meanings you have set out above are abstracting from that sense, using it figuratively, which is how a lexicon evolves. I’d argue that the theorized Archive goes a step further. Foucault, Derrida, et al. want to impose on the term a sense of sinister agency, of an official mechanism of control. In this respect, the Archive is the flip-side of the Library, which is often rendered in spiritual terms. Libraries are “sacred,” “hushed,” transcendent, quiet places for contemplation, etc. I’m not especially fond of these murky ascriptions.

    Another work worth examining is Cornelia Vismann’s Files, which focuses on the contents of archives in the records management sense.

  2. Bill LeFurgy
    February 27, 2014 at 12:19 pm

    This is a good overview, and illustrates that “archive” and its pluralized variant have irrevocably escaped the semantic confines of their specialized origins.

    Some of us do remember a time when the terms were used with more precision, and still might quibble (in total futility, no doubt) with how they are applied today. I personally see “records management” as something quite different from “archives,” most especially in the context mentioned above. Organizations have records management programs to “serve the purpose of organizing, maintaining records and materials for use by the organization.” Organizations have archival programs if they keep some of those records for their enduring (i.e. permanent) value. The key difference is that records management is focused on disposition: keeping records for some length of time (often quite briefly) and then disposing of them. Records management and archives are clearly linked, but they are not the same, to my mind.

    Now, in possible contradiction of myself, let me also say that I endorse the more recent usage of the term “personal archiving” and “personal archives” in reference to the material, often digital, that individuals create and maintain about themselves and their families. Some of this material might not be kept permanently (whether accidentally or by design) but people are now in possession of sizable personal collections that have important current and future personal value. Awareness is starting to dawn on many about this value, and that they probably, at some point, need to do something with their material. As you say, “archive” conveys notions of “longevity, safe keeping and order,” and I think these are the right concepts that people need to consider in connection with their personal digital material.

    Now, actually applying those concepts is a tricky business, but “personal archiving” provides the right motivation. Besides, I’m not sure there is a better term. Something like “personal information management” might be more descriptive, but it’s a snooze in terms of impact.

  3. Carl Fleischhauer
    February 27, 2014 at 12:45 pm

    Thanks as always for a helpful exploration of terms and usage. Your report also reminds us of the limits to surgical precision in speaking and writing — even when we wish to be. In your case, this is demonstrated by your apt use of _ostensive_ rather than “dictionary” definitions for the term _archive_ in its many contexts.

    As I read, my thoughts drifted off into one of the next layers (beyond your immediate topic), puzzling over how we find meaning in the contents of an, um, archive(s). And this made me recall the wonderful insider term _diplomatics_. As the redoubtable Wikipedia tells us: “ . . . a scholarly discipline centred on the critical analysis of documents – particularly, but not exclusively, historical documents . . . focuses on the conventions, protocols and formulae that have been used by document creators, and uses these to increase understanding of the processes of document creation, of information transmission, and of the relationships between the facts which the documents purport to record and reality.” (http://en.wikipedia.org/wiki/Diplomatics). If we are to study the entities found in, well, a set or batch of entities (“archive”), and if we wish to get at the relationships between the stated facts and reality (sometimes we don’t!), we will depend upon explanations of what sort of archive this is, and upon its organization. Those of us who work in libraries and (yes) archives are supremely aware of this need and this naturally prejudices us toward certain uses of the term.

  4. Web Webster
    February 27, 2014 at 6:08 pm

    The word “GENRE”

    Oh, but I see how this will do wonders for precision and recall, how it will enhance rankings for the word “archival”, and certainly provide more false drops and eventually lead to calls to the support desk asking, “Why am I getting the wrong information?”

    Not so much a misnomer, but more of how a word such as GENRE once meant “mongrel”, then meant offspring of a tame sow and wild boar, then child of a freeman and slave, then something such as cross-breeding–to a contemporary gas electric combo Prius.

    It will do wonders for voice input with a string of NOT s

    Web

  5. Stevan Lockhart
    February 28, 2014 at 6:12 am

    It was ever thus. The term “database” struggles similarly. Some people talk of a database of information which others would describe as a dataset. Some refer to the underlying technical system, others to a data management application and so on.
    In the digital era, the term archive is similarly conflated to some as a web application, not the process of selection, storage and curation that may be implied.
    The additional difficulty pointed out at the beginning of the piece is the differing expectations of terminology held by practitioners and, for want of a better term, their non-specialist management, who may promote the significance of an evident outcome while misunderstanding the role of design and process underlying that outcome. In this sense, trust among the parties communicating is especially important.

  6. Eldin Rammell
    February 28, 2014 at 11:04 am

    There is also a nuance on “Archive as in Records Management” that is not covered in your otherwise excellent summary. In many organizations there exist “records centers” and “archives”. Your description of archives as “the place in the organization that is required to retain and organize records of the organization” could equally apply to both. The distinguishing factors are that (a) archives are generally for records that are inactive and (b) archives are under the control of an archivist. A record or collection of records is typically organized in records centers, perhaps under special security conditions and environmental controls, but these are often not considered archives if the records are active or semi-active. Once a matter is closed and the records are transferred to “deep storage”, this is “the archive”. On my second point, archives in commercial organizations often have an individual identified as “the archivist”; this is often even a regulatory requirement (e.g. OECD Principles of Good Laboratory Practice). The identification of an archivist role would thus also be helpful in distinguishing archives from records centers.

  7. Jason Cooper
    February 28, 2014 at 12:56 pm

    Archives and records management are clearly related, but not in all contexts. They are particularly intertwined in the corporate and governmental worlds, when materials move from a state of active use, through a period of less active use. At some point, some records are recognized to have enduring value beyond the intentions of their original creation. The records manager transfers these records to the archives (though in some cases these people are in fact one person) where they are properly described, filed, and preserved. This is how the records of the US Government are treated.

    From your examples, records management deals with those records that exist for litigation, tax purposes, and compliance with regulations, while an archives holds them for posterity after the business needs have been met. Interestingly, the use of archive in the email example follows a similar vein of thinking – messages go from “I’m using that” to “This has some value, so I’m going to keep it somewhere else.”

  8. Maarja Krusten
    February 28, 2014 at 4:30 pm

    Excellent summary! Have bookmarked it for future reference. Thanks also for the reference to Kate Theimer, guru to many archivists.

    One small piece of supplemental information on the side. Within the federal government, the venue with which I am most familiar, some records with value for “posterity” are used not only by the “business owners” but by federal historians doing research to support employees of all ranks in an agency or department. Research may provide information for testimony statements, policy making, etc.

    Such people are knowledge accountable officers. In agencies that have no historian, a records officer sometimes handles some of the records-search function, without the full scope of historian duties.

    The records may be held by the agency mission or mission support units or elsewhere in the agency (such as a records room or library annex) or at a Federal records center. In theory, some of the records that a federal historian and business owners use (for different purposes) while active at some later point change in designated status (at least) to “inactive.” Such permanently valuable records then become a part of the collections held for the American people in the U.S. National Archives.

    At that point, the change in legal title affects the means of external access. It changes from the agency Freedom of Information Act request handling process to disclosure determination by the National Archives.

  9. John Rees
    February 28, 2014 at 4:57 pm

    In terms of archival descriptive practices, and theory as I was taught in Archives 101 long ago, your phrase “‘artificial’ collection” is redundant.

    In archival practice a ‘collection’ is naturally ‘artificial’ just as you describe, e.g. the “John Doe Collection of Louisiana Farm Workers Ethnographic Field Recordings.” One should not use the phrase “the collection of John Doe Papers” or “John Doe’s collection over there at the university.”

    But perhaps mine is an artifact of one person’s teaching. SAA’s Glossary of Archival Terminology conflates the phrase as well: http://www2.archivists.org/glossary/terms/c/collection

    DACS rule 2.2.18 similarly distinguishes between describing ‘creators’ and ‘collectors’ but does not delve too far into the semantics or metaphysics of the terms.

  10. Helen Halmay
    February 28, 2014 at 5:10 pm

    This is a very interesting, important topic. If you-all ever agree on a definition of “archive,” I’d like to hear about it, and publish it in my newsletter (with your permission and attribution to you and the Library of Congress, of course). NOTE: you wrote: “…to try and parse and disambiguate what we mean by archive.” It should be: “…to try to parse and … etc.” Just keeping you on your toes. Helen Halmay, Editor – Adelante, member newsletter for the Congress of History of San Diego and Imperial Counties, California

  11. Greg Bak
    March 3, 2014 at 8:35 am

    Great post! It is nice to have this all pulled together.

    Here is another one for you, from OAIS.

    Archive: An organization that intends to preserve information for access and use by a Designated Community.

    This definition is consistent with the rest of OAIS: focussed on access and use, always relative to the needs of a designated community, and emphasizing the social and operational dynamics of the archival organization rather than the technology used for preservation or delivery.

  12. Michael Winter
    March 13, 2014 at 7:55 pm

    It only adds to the confusion that this blog post and the series of replies it occasioned so well clarifies, that “archive(s)” for a very long time has been used as part of the titles of a significant number of scholarly journals, e.g. Archives of Psychiatry and Psychotherapy, Archives of Public Health, Archives of Sexual Behavior, and many others. The examples listed here, by the way, are all for current periodicals still using these titles, that are in no way archives in any sense that most of us would recognize.

  13. Christie Peterson
    March 19, 2014 at 1:41 pm

    Some of my colleagues here at Johns Hopkins have adopted the term “archive” to refer to the layer in a data management stack that manages fixity and integrity. See https://www.youtube.com/watch?v=F6iYXNvCRO4&feature=plcp for a full explanation. While this use of “archive” may raise my hackles a bit as an archivist, it makes for very useful shorthand during discussions about digital preservation.

  14. Katherine D. Harris
    September 17, 2014 at 1:35 pm

    In less detail (due to word count constraints), I’ve rehearsed some of these debates for my entry in the Johns Hopkins Guide to Digital Media (JHUP 2014). There was a huge kerfuffle about archive being used in literary studies and the imposition of “database” in 2007-2009 (PMLA, Digital Humanities Quarterly). And, there have been quite a few disagreements among professionals about the use/abuse/colonization of “archive” by literary studies and Digital Humanists.

    In an attempt to avoid replicating my entry for the JHGDM, I’ve crafted a further response to this idea of the archive for another project, but the keyword entry really is an extension of my thoughts from the JHG. (Due to copyright restrictions, I can’t post the JHG entry online, though.) I welcome comments on that rough draft of “archive”: http://culturedigitally.org/2014/09/archive-draft-digitalkeywords/

  15. Irene M.
    October 15, 2014 at 3:50 am

    Very useful information. Just a thought, would a digital repository like an Institutional Repository in any way be regarded as an archive of sorts? In the sense that it acts as long term access point for information?
