The PDF’s Place in a History of Paper Knowledge: An Interview with Lisa Gitelman

 Lisa Gitelman, Professor of English and of Media, Culture, and Communication at NYU.

What is the document as a format and a medium? Even apart from any divide between analog and digital, the document is itself a form that has its own history, one that has long been tied up in ideas about reproduction. In Paper Knowledge: Toward a Media History of Documents, Lisa Gitelman explores the history of different kinds of documents (from printed bills, notes, receipts and tickets to today’s PDFs) and the technologies used to produce and reproduce them: letterpress, mimeograph, xerography, optical scans.

In this installment of the NDSA Insights Interview series I am thrilled to talk with her about the book and, in particular, about what it has to say to those working to ensure long-term access to digital information that is often encoded in PDFs. For this audience, our questions focus most directly on the book’s ideas about the PDF.

This ticket is an example of the range of things involved in the history of documents. The Pullman Company ticket, Jersey City to Ithaca, New York. May 1911. Miller NAWSA Suffrage Scrapbooks, 1897-1911. Library of Congress, Rare Book and Special Collections Division. http://hdl.loc.gov/loc.rbc/rbcmil.scrp7007904

Trevor: For starters, what exactly is a document? In particular, you talk about the function of the document as “know-show” and the “authority of documents.” Could you unpack those ideas for us a bit?

Lisa: Paper Knowledge sets out to describe documents very broadly, as instruments used in the kinds of knowing that are all wrapped up with showing, and showing wrapped up with knowing. Think about identity documents, of course, that you show to the authorities under specific conditions and to be known by them, but think too of any piece of paper you have ever squirreled away just in case and then produced to convince someone of something. Anything–not just paper–can be mobilized this way and might count as a document, but of course it is primarily the affordances and habitual uses of paper that have helped us to understand documents in the ways we do and now to create and use digital documents too.

In terms of authority, documents tend to possess power partly by dint of the institutions for which or within which they circulate, if you think about the many institutions relevant to the credit economy, civil procedure, voluntary association, medical practice, municipal governance, institutionalized education, corporate communication, etc. I wanted to think about documents in this book partly as an antidote to the contemporary fixation with “the book” as touchstone within our ongoing experience of digital mediation, but also as an antidote to “the literary” as cynosure within academic departments of English. There is a lot we need to know about vernacular texts–their pasts and their potential futures–that has nothing at all to do with books or with literature. Modernity is saturated with documents, though of course the document predates the modern; many of the earliest inscriptions found by archaeologists in the Near East are identified as “administrative,” that is, as documents.

An example of a mimeograph machine in use. Washington, D.C. Learning to use a mimeograph machine at Woodrow Wilson High School. 1943 Oct. Farm Security Administration/Office of War Information Black-and-White Negatives. Library of Congress Prints and Photographs Division.

Trevor: What do you think we learn about the PDF by considering the various historical episodes in the history of the document? I’d be particularly interested in some of the connections to microfilm, photocopying and faxing.

Lisa: I guess one of the lessons I learn over and over is that the familiarity of present conditions can prevent us from seeing those conditions critically. I understood so much more about PDFs and other digital formats by learning some of the history of earlier media for reproducing documents, like microfilm, photocopies and faxes. Not only is the utopian rhetoric that welcomed microforms in the 1930s a humbling reminder that utopian rhetoric about digital media is, well, rhetoric, but the uses and contexts of these earlier media offer instructive parallels that can help defamiliarize present conditions and so make them visible.

Thinking about microfilm as a medium dependent upon a client/server logic or thinking about photocopies as an origin point for “archive-don’t-delete” type thinking (reproduction as preservation) is of course anachronistic. But there are ways in which the histories of these earlier media feed into and enrich our sense of documents in the present, be they PDFs or other forms.

Trevor: You suggest the PDF was created with the idea of corporate authorship in mind. Could you briefly explain that and suggest some of the implications of that idea of authorship?

Lisa: Because PDF technology involves a separation between those who create files and those who merely read them–with a PDF reader application–the technology helps to structure authorship that is often corporate authorship. (The early web worked this way too, if I can generalize, since browsers are distinct from HTML-editors.) For me the quintessential PDF file is an airline boarding pass, printed out or held open on a smartphone, or else it is the manual that explains the smartphone itself, or else the quarterly statements the smartphone corporation publishes for investors.

Going way back in time, there’s a way in which technologies like this reinstall a sort of monopoly that letterpress printers once enjoyed on the look of printedness. Before typewriters and before a whole raft of documentary reproduction technologies developed after them, only printers in printing houses could print; everyone else was trapped in longhand. We’re used to hearing the present moment celebrated as one of amateur cultural production, of YouTube, selfies, and blogs, but there are ways that authorial power remains structured (pre-programmed) by the material conditions of authorship. Paper Knowledge renders some of the history of the development of PDF at Adobe Systems, yet it also tries to gesture toward a broader story, both about techniques of documentary reproduction and about the office culture of the 1990s from and for which PDFs emerged.

A photograph of a Xerox machine. 2. Xerox Model D copier, one of the first production units. Still in use in 1985 – Battelle Memorial Institute, Xerography, 505 King Avenue, Ohio State University, Columbus, Franklin County, OH. Historic American Buildings Survey/Historic American Engineering Record/Historic American Landscapes Survey. Library of Congress Prints and Photographs Division.

Trevor: At one point, you suggest that the PDF imagines its users and its users reimagine it. I would be curious to have you work through an example or two in this regard.

Lisa: I started in on this in a vague way in answer to your last question. Users tend to imagine PDFs primarily in contrast to other formats with which they are familiar: paper, yes, but also other digital formats like *.doc or *.htm or *.jpg. Like older, non-electronic formats, PDFs can feel fixed, locked, in comparison to other digital formats for text, at the same time that they can feel “smart” in comparison with digital formats for images.

Meanwhile, I think PDF technology imagines its users distributed across the hierarchy of an org chart, divided into authors and readers, form-makers and form-fillers. Some users resist. Some loathe PDFs as clunky and backward looking. And of course imagining is culturally and historically specific: imaginations–literally, what is imaginable–can change. As it was first imagined by Adobe, PDF technology “solved” a lot of the difficulties that office workers had in the 1990s, in part by reducing the uses of paper as well as the uses of copiers, fax machines, express mail, interoffice mail, airplanes, envelopes and paper clips. In all of our enthusiastic imagination of “the paperless office,” we tend to forget today about those airplanes and their relations with paper.

Trevor: You focus on documents, and a lot of our readers are archivists who focus on records. To what extent are these the same things with shared histories? Do you think your media history approach to documents would be synonymous with a media history of records? Or, do you think it would be something quite different?

Microfilming Chinese documents. Library of Congress. 1942 June. Farm Security Administration - Office of War Information Photograph Collection (Library of Congress). Library of Congress Prints and Photographs Division. http://www.loc.gov/pictures/item/owi2001006402/PP/

Lisa: I have an earlier book called Always Already New: Media, History, and the Data of Culture that is about both records and documents, though cutely so, because the records in question are by and large the phonograph records used for sound recording starting in 1878. Really I think there isn’t much distance between these two terms, though focusing on documents in this recent project has allowed me to focus in particular on techniques of reproduction.

What’s important is the shared impulse to preserve and interpret that defines both records and documents. Or, better, what’s important is the shared impulse to interpret and preserve, since designating something a document or a record is already in some sense to interpret its value to history, its status as potential evidence, as archivable. Archivists are now necessarily at the forefront in thinking about digital records and what their preservation and access must entail.

Trevor: An article in the Guardian titled “Is the PDF hurting democracy?” noted that “a new report by the World Bank suggests that the venerable PDF is keeping valuable information buried in servers, unread and unloved.” From your perspective on the history of the document as media, is this a meaningful question?

Lisa: Well, it is interesting, though certainly less salient than it might have been in 2000, before Google started to index PDFs in 2001. These World Bank PDFs are findable, after all, they just aren’t mineable yet in any fully automated way, if I understand it correctly. It may help to remember that there has long been a technical literature, a “gray” literature. I’m thinking of the sort of material that circulates outside formal publishing channels, can prove problematic for cataloguers, and has a relatively short shelf-life because so soon obsolete. PDFs now inhabit a similar, gray logic, if one thinks of technical manuals, reports, price lists, college coursepacks, and–ironically–white papers. These are the kinds of documents it can be a challenge to locate, much less preserve. I don’t think that means democracy is on the skids. Today’s networked environment has helped promote a myth of total information–everything available to everyone–but a myth is just that, myth.

An artistic take on the know-show function of a document. [Woman seated near man with spectacles pointing at document]. Elliott, Elizabeth Shippen Green, artist. Charcoal drawing, published as headpiece for: “The Recrudescence of Madame Vic” by Thomas A. Janvier, Harper’s magazine, 112:513 (March 1906). Forms part of: Cabinet of American illustration. Library of Congress Prints and Photographs Division.

Trevor: Aside from being potentially problematic for democracy, a set of scientists have had ongoing meetings and discussions about getting “beyond the PDF.” In this vein, they are interested in developing document formats for scholarly communication that function less as documents and more as structured data. Given that there is this interest, I would be curious to see if you have any thoughts on what resistance in the material or format these kinds of endeavors would need to overcome.

Lisa: As unwise as it may be for me to predict the future, the PDF file has a backward-looking feel to it that begs precisely this question. It may be that future protocols for scholarly communication can displace current publication norms, I don’t know. I’m hopeful. Certainly so much is in flux right now when one considers scientific communication in particular and the success of platforms like arXiv.org. In general I guess one challenge we all face for the future is thinking of ways that the ongoing work of archives and archivists can include and adjust to databases as an important part of the reigning knowledge infrastructure within which we will continue to exist and prosper.

2 Comments

  1. Euan Cochrane
    June 16, 2014 at 10:44 am

    This is a really timely post from my personal perspective. I’ve had some interesting experiences attempting to use an infographic (e.g. http://infogr.am) as a report and presentation format recently. The experiences have really brought that question “what is a document” to the front of my mind.

    Infographics are really digital-first. They are difficult to print and have interactive components. The reception of the infographic we created has been fascinating. It requires a change in the way people think about capturing and presenting information and a reevaluation of the purpose of “documents” as a medium for interacting with information.
    So far the reception has been mixed. A lot of people have asked how they should print the (interactive, single (long) page) infographic and others have been frustrated with how all the information is on a single “page”. Many have enjoyed it though and it does seem to have fulfilled the purpose we selected the format for: effectively presenting a large amount of complicated information in a simple form.

  2. Ed Summers
    June 20, 2014 at 4:10 am

    I’ve got a copy of Paper Knowledge in my to read pile, and after this interview I’m going to put it to the top of the stack :-)

    I have one observation regarding the comparison between PDF and HTML editing:

    “Because PDF technology involves a separation between those who create files and those who merely read them–with a PDF reader application–the technology helps to structure authorship that is often corporate authorship. (The early web worked this way too, if I can generalize, since browsers are distinct from HTML-editors.) ”

    My experience of the early Web was that there was actually a fairly large contingent of people that learned HTML by ‘viewing the source’ for a web page in their web browser and copying things that they liked. Editors like Frontpage and Dreamweaver did come on the scene in 1995 but this was a couple years after a lot of people had been viewing the source in Mosaic.

    The wonderful thing about the early web was that readers were also creators. I think this is still a visible trend today, and browsers have largely retained the ability to view the HTML source for a page you happen to be viewing. Today browsers often come with quite advanced developer tools that allow you to introspect on the web page, see what HTTP requests are going on, view/debug JavaScript and CSS, etc. All this seems quite different from PDF’s explicit separation of publisher and reader.
