The following is a guest post from Megan Phillips, NARA’s Electronic Records Lifecycle Coordinator and an elected member of the NDSA coordinating committee, and Andrea Goethals, Harvard Library’s Manager of Digital Preservation and Repository Services and co-chair of the NDSA Standards and Practices Working Group.
As part of the effort to publicize the NDSA Levels of Digital Preservation and as a way to continue to invite community comment on them, several members of the Levels group wrote a paper about the Levels for the IS&T Archiving 2013 conference. The paper, The NDSA Levels of Digital Preservation: Explanation and Uses, is available online.
At the conference, we got interesting comments and one significant suggestion to improve the paper from Christoph Becker, Senior Scientist at the Department of Software Technology and Interactive Systems, Vienna University of Technology. We wanted to present the suggestion he made here and ask for help from all of you to resolve it.
Christoph wrote that the major aspect of the Levels that he would adjust is the label for the last function, “file formats.” You can see the table here. He pointed out that file formats are just one aspect of a larger preservation challenge related to how data (the bitstream) and computation (the software) collaborate in creating the “performances” that we really care about. New content is often not even file based. Format is just one element out of many that could be significant in preservation, and in some cases the format itself is almost meaningless. Often the real issues are related to specific features or feature sets (e.g. encryption), invalidities and sizes. (Petar Petrov tried to cover part of this problem in his blog post about content profiling.) If you consider research data, for example, the format could be known to be XML-based but have no schema available. The real preservation challenge might be that the data requires a certain analysis module (found here) running on a certain platform, which is dependent on distributed resources: a certain metadata schema (found there) and a certain understanding of semantics (found over here).
Christoph’s suggestion is that the overly specific label “file formats” in the Levels puts forward too narrow a view of the problem in question. The label could skew the real challenge since it excludes part of the problem (and part of the potential community). He suggested possible replacements for the “file formats” label. “Diagnosis and action”? “Issue detection and preservation actions”? “Understandability”? For him, in fact, this is the heart of preservation, and if you look at the SHAMAN/SCAPE capability model that Christoph works on, the preservation capability really is all about the last two rows (operations include metadata), assuming that the bitstream is securely stored and managed.
We (Andrea and Meg) think that Christoph has a valid point, but we’re still not sure of the best label to capture the suite of interrelated issues that need to be addressed in the last row of the Levels chart. Christoph’s suggestions make sense in isolation, but they would overlap with activities in other rows of the chart, and don’t quite convey the concept we originally intended.
- Do you think “file formats” is clear enough as shorthand for these kinds of issues, given where most of us are in our practical digital preservation efforts, or does this need to be changed?
- What label would you use for the last row of the chart? (Content characteristics? Usability? Just plain “formats,” without “file”?)
- Are there other changes you think we should make to improve that row?
- Any changes you’d recommend to other parts of the chart?
In the Archiving 2013 paper, we said that any comments received by August 31, 2013 would influence the next version of the Levels of Digital Preservation, so please suggest improvements! We may come back to you again over the summer to help resolve other issues.