Digitization and Digital Preservation: Questions Persist

Digitization–making a digital copy of a non-digital object–is a bedeviling topic for digital preservationists. Establishing a clear line of demarcation between the process of creating the digital copy and the process of keeping the copy over time is the central issue.

Questions about scanning, by Wlef70, on Flickr
I’ve always thought this was semantics. Well-meaning, but ill-informed, people said “digital preservation” when they meant “digitization” in reference to something like scanning a book or stack of family pictures. In my mind there was a bright line between creating the object and preserving it. Helping people understand this distinction would help bridge the gap.

I still think talking about the issue is critically important, but resolution is proving complicated, even within our own community. The Digital Preservation forum on Stack Exchange will, we are told, die as a non-public beta, in part because those of us who participated were unable to agree whether digitization was in scope. This is how it appeared to the presumably impartial site administrator who is pulling the plug:

It seems fairly clear that this proposal was backed by several distinct groups of people, and “digital preservation” means something different to all of them. To some, it means asset management within a company. To others, it’s about preserving legacy software. To others still, it is about manipulating file formats. The end result is that none of these groups is served well by the site we have now.

There is an element of truth here. Participants differed in how they defined digital preservation in connection with digitization. Here’s a shortened version of the question/comment that kicked off the discussion (spelling has been Americanized):

I would propose that this community should be involved in decisions related to the results of a digitization initiative. For example, the file formats and metadata schemas used, where/how the results are stored, and so on. However, questions focused on how to engage in digitization, what types of scanners should be used or resolution to digitize at, will be off topic…. As many of us in the digitization community are well aware, confusing digital preservation for digitization is a common mistake. I’d suggest adding some clear scoping detail on this to the “What kind of questions should I not ask here”?

This is an essential issue. Anyone involved in digital preservation has had experience with people who think the work is purely about scanning, as well as experience with people who are deeply interested in creating optimally preservable content. A public forum absolutely must be clear about whether a distinction is to be drawn between scanning and preserving, and if so, how.

That turned out to be a difficult task. Of the seven responses offered, none were chosen as “the best.” Here is my attempt to distill the essence of each answer, ranked in the order of up-votes.

  • The mechanics of digitization are not in scope of this site, but the preservation of the results is.
  • The answer should be a conditional yes, rather than an unqualified yes… It seems to me that if someone wants to digitally preserve something, and they have a legitimate question about the digitization process, especially if it relates to the preservation aspect, then we shouldn’t be turning them away. They’re a legitimate part of the constituency of this site and we should be taking their questions seriously.
  • There’s a continuum of relevance here. “What’s the best scanner to use” and “what DPI should I scan at” seem obviously out of scope, “should I save as TIFF or JPG” less so, “how should I organize and describe the files once I’ve scanned them” clearly on-topic.
  • I think digitization of analog objects with the goal of digitally preserving them, or keeping them easily-preservable in the future is on topic.
  • Yes. The phrasing is key however and it should be focused on the results of digitization – the formats, the metadata associated with it, the storage etc. and beyond that, errors that appear in the data stream.
  • If you’re not digitizing, how can you be doing digital preservation? Seems like it has to be on-topic.
  • Let’s think about it in narrow terms as creation of digital content from a preservation perspective, which would be in scope.  Anything that doesn’t relate to creating content for clear-cut preservation purposes (equipment, throughput, QC) would be out of scope. (Full disclosure: this was my suggestion).

The range of responses was interesting. Some were black and white (on both sides!) but some hedged, acknowledging a gray area. And, I have to say, the more I considered the comments, the grayer things got. Take my own answer. Is it always true that equipment and quality control are out of scope for digital preservation? On reflection, some scanning workflows will create clearer, more detailed and accurate images than others. Better quality images–for certain types of content, at least–presumably have higher value, which arguably could influence their level of preservation.

Digital preservation is an emergent enterprise, and advocates are naturally eager to promote it as a distinct and deliberate activity. But we have to consider the value of a more general approach to framing the objective. Narrowly drawn questions and answers are important, but they apply narrowly as well. As a community, we still have quite a bit left to do in terms of fleshing out digital preservation methods, concepts and outcomes. More importantly, a huge amount of work is needed to raise public awareness about the need for preserving digital materials.

We should think about the value of engaging anyone with a question that touches even remotely on digital preservation. So what if someone asks about how to “digitally preserve” their family slides by scanning them? Instead of dismissing them as a hopeless noob, why not take the opportunity to explain that scanning is only the first step? They might even come away with a new appreciation for what preservation is all about.

  1. Certainly as part of the digital preservation group at The (United Kingdom) National Archives for the last six months or more I’ve been heavily involved in defining our technical standards for digitisation outputs (ie file format – jp2 – and various sorts of metadata; including information transcribed from the images), and looking at how we can evaluate conformance with those standards. This has to some extent also involved looking at the workflows of various commercial scanning outfits – we want to be sure that the process they propose is robust and will lead to consistent output.


  2. Bill, it’s as much about semantics as it is “do,” “doing,” and “done.” It comes from the same general school of thought (there’s a scanner involved somewhere, maybe), but after that…well…things get complicated.

    “Scanning” – No serious experience required–just need a flatbed and a computer and you can scan anything.

    “Digitize” – Slightly more complex term, but the terms are interchangeable.

    “Digital Archivist” – Here’s where confusion sets in. Take the scanning/digitization elements and add knowledge of formats, metadata, DAMS, etc, and creating online collections (either with physical or born-digital objects). There’s an awareness of digital preservation, and sometimes it is part of the job (i.e. migrating an existing collection from an old format to a new DAMS or updating the file format).

    “Digital Preservation” – Not so much the creation of collections, but the practice of taking measures to ensure existing collections and items are still accessible over time and through newer platforms, and that the information stored is not lost. Additionally, digital preservation is proactive. Digital preservationists will test long-term feasibility of maintaining and preserving information for collections in development (or in the planning stages). They’re the firemen of the whole process.

    “Digital Conservation” – Repairing/attempting to correct information/software which may be damaged or lost or create a “digital compromise” (totally making this up) where the context of an obsolete program is accessible.

  3. I think the more we encourage people to consider life-cycle approaches to digital curation – which includes preservation – the harder it is going to become to exclude any phase of the cycle from consideration when looking at any other phase. Just as we encourage digital curators/preservationists to participate when possible in discussions around best practices during the creation phase of born digital materials, I don’t see how we can create an artificial partition in the life-cycle of digitized materials.

    • Leah: Good point. A true life cycle approach would work very hard to influence all aspects of digital object creation.

  4. I don’t understand the need to partition the various steps from input procedures to ongoing preservation (adjusting formats and hardware/software over time), except that we may need access to experts in each of the disciplines, to provide accurate assessments and advice. Leah seems closest – the life cycle approach, but the cycle doesn’t come back to the beginning (I hope so – I don’t want to re-load our various databases because of technology changes or lost data!). We are a small archival organization – I am it for technology; whatever technological decisions (format, resolution, software) I make, we all use. I don’t want to have to belong to endless groups for access to technological progress and expertise to get the needed information to make those decisions. One stop shopping – I love it!

    • Jamie: You’ve hit on another aspect of the issue–how aggressive (assuming it’s an option) should we be in applying the complete life cycle, which to my mind does go to creation–and even before creation. In an ideal world, preservationists deal with the entire kit and caboodle, because that gives maximum control over preservation outcomes. But that adds complexity, technical, political, etc. Hard to see how we can avoid the dreaded “it depends” qualification.

