This is a guest post by Shira Peltzman from the UCLA Library.
Last month Alice Prael and I gave a presentation at the annual Code4Lib conference in which I mentioned a project I’ve been working on to update the NDSA Levels of Digital Preservation so that it includes a metric for access. (You can see the full presentation on YouTube at the 1:24:00 mark.)
For anyone who is unfamiliar with NDSA Levels, it’s a tool that was developed in 2012 by the National Digital Stewardship Alliance as a concise and user-friendly rubric to help organizations manage and mitigate digital preservation risks. The original version of the Levels of Digital Preservation includes four columns (Levels 1-4) and five rows. The columns/levels range in complexity, from the least you can do (Level 1) to the most you can do (Level 4). Each row represents a different conceptual area: Storage and Geographic Location, File Fixity and Data Integrity, Information Security, Metadata, and File Formats. The resulting matrix contains a tiered list of concrete technical steps that correspond to each of these preservation activities.
It has been on my mind for a long time to expand the NDSA Levels so that the table includes a means of measuring an organization’s progress with regard to access. I’m a firm believer in the idea that access is one of the foundational tenets of digital preservation. It follows that if we are unable to provide access to the materials we’re preserving, then we aren’t really doing such a great job of preserving those materials in the first place.
When it comes to digital preservation, I think there’s been an unfortunate tendency to give short shrift to access, to treat it as something that can always be addressed in the future. In my view, the lack of any access-related fields within the current NDSA Levels reflects this.
Of course I understand that providing access can be tricky and resource-intensive in general, but particularly so when it comes to born-digital. From my perspective, this is all the more reason why it would be useful for the NDSA Levels to include a row that helps institutions measure, build, and enhance their access initiatives.
While some organizations use NDSA Levels as a blueprint for preservation planning, other organizations — including the UCLA Library where I work — employ NDSA Levels as a means to assess compliance with preservation best practices and identify areas that need to be improved.
In fact, it was in this vein that the need originally arose for a row in NDSA Levels explicitly addressing access. After I suggested that we use NDSA Levels as a framework for our digital preservation gap analysis, it quickly became apparent to me that its failure to address access would be a blind spot too great to ignore.
Providing access to the material in our care is so central to UCLA Library’s mission and values that failing to assess our progress/shortcomings in this area was not an option for us. To address this, I added an Access row to the NDSA Levels designed to help us measure and enhance our progress in this area.
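For readers who want to try a similar gap analysis, here is a minimal sketch of how a self-assessment against the Levels could be tallied. It is an illustration, not part of the NDSA Levels themselves: the function names and the sample answers are hypothetical, and it assumes the levels are cumulative, so an unmet Level 1 leaves an area at "Level 0" even if some higher-level steps are already in place.

```python
# Hypothetical sketch of a gap analysis against the NDSA Levels.
# Assumption: levels are cumulative, so a level only counts if every
# lower level in the same functional area has also been met.

def achieved_level(met_by_level):
    """Return the highest level n such that levels 1..n are all met (0 if Level 1 is unmet)."""
    level = 0
    for n in sorted(met_by_level):
        if not met_by_level[n]:
            break
        level = n
    return level


def gap_analysis(assessment):
    """Map each functional area (table row) to the level it has reached."""
    return {area: achieved_level(levels) for area, levels in assessment.items()}


# Illustrative self-assessment: True means every step in that cell is in place.
assessment = {
    "Storage and Geographic Location": {1: True, 2: True, 3: False, 4: False},
    "File Fixity and Data Integrity": {1: True, 2: False, 3: True, 4: False},
    "Access": {1: False, 2: True, 3: False, 4: False},
}

report = gap_analysis(assessment)
for area, level in report.items():
    print(f"{area}: Level {level}")
```

An institution that treats the levels as independent checklists rather than cumulative tiers would want a different tally, for example a count of met cells per row.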
My aims in crafting the Access row were twofold: First, I wanted to acknowledge the OAIS reference model by explicitly addressing the creation of Dissemination Information Packages (which in turn necessitated mentioning other access-related terms like Designated Community, Representation Information and Preservation Description Information). This resulted in the row feeling rather jargon-heavy, so eventually I’d like to adjust this so that it better matches the tone and language of the other rows.
Second, I tried to remain consistent with the model already in place. That meant designing the steps for each column/level so that they are both content agnostic and system agnostic and can be applied to various collections or systems. For the sake of consistency I also tried to maintain the sub-headings for each column/level (i.e., “protect your data,” “know your data,” “monitor your data,” and “repair your data”) even though some have questioned their usefulness in the past; for more on this, see the comments at the bottom of Trevor Owens’s blog post.
While I’m happy with the end result overall, these categories map better in some instances than in others. I welcome feedback from you and the digital preservation community at large about how they could be improved. I have deliberately set the permissions to allow anyone to view/edit the document, since I’d like for this to be something to which the preservation community at large can contribute.
Fortunately, NDSA Levels was designed to be iterative. In fact, in a paper titled “The NDSA Levels of Digital Preservation: An Explanation and Uses,” published shortly after NDSA Levels’ debut, its authors solicited feedback from the community and acknowledged future plans to revise the chart. Tools like this ultimately succeed because practitioners push for them to be modified and refined so that they can better serve the community’s needs. I hope that enough consensus builds around some of the updates I proposed for them to eventually become officially incorporated into the next iteration of the NDSA Levels if and when it is released.
My suggested updates are in the last row of the Levels of Preservation table below, labeled Access. If you have any questions please contact me: Shira Peltzman, Digital Archivist, UCLA Library, [email protected] | (310) 825-4784.
LEVELS OF PRESERVATION
| |Level 1 (Protect Your Data)|Level 2 (Know Your Data)|Level 3 (Monitor Your Data)|Level 4 (Repair Your Data)|
|---|---|---|---|---|
|Storage and Geographic Location|Two complete copies that are not collocated<br>For data on heterogeneous media (optical disks, hard drives, etc.) get the content off the medium and into your storage system|At least three complete copies<br>At least one copy in a different geographic location<br>Document your storage system(s) and storage media and what you need to use them|At least one copy in a geographic location with a different disaster threat<br>Obsolescence monitoring process for your storage system(s) and media|At least three copies in geographic locations with different disaster threats<br>Have a comprehensive plan in place that will keep files and metadata on currently accessible media or systems|
|File Fixity and Data Integrity|Check file fixity on ingest if it has been provided with the content<br>Create fixity info if it wasn’t provided with the content|Check fixity on all ingests<br>Use write-blockers when working with original media<br>Virus-check high risk content|Check fixity of content at fixed intervals<br>Maintain logs of fixity info; supply audit on demand<br>Ability to detect corrupt data<br>Virus-check all content|Check fixity of all content in response to specific events or activities<br>Ability to replace/repair corrupted data<br>Ensure no one person has write access to all copies|
|Information Security|Identify who has read, write, move, and delete authorization to individual files<br>Restrict who has those authorizations to individual files|Document access restrictions for content|Maintain logs of who performed what actions on files, including deletions and preservation actions|Perform audit of logs|
|Metadata|Inventory of content and its storage location<br>Ensure backup and non-collocation of inventory|Store administrative metadata<br>Store transformative metadata and log events|Store standard technical and descriptive metadata|Store standard preservation metadata|
|File Formats|When you can give input into the creation of digital files encourage use of a limited set of known open file formats and codecs|Inventory of file formats in use|Monitor file format obsolescence issues|Perform format migrations, emulation and similar activities as needed|
|Access|Determine designated community [1]<br>Ability to ensure the security of the material while it is being accessed. This may include physical security measures (e.g. someone staffing a reading room) and/or electronic measures (e.g. a locked-down viewing station, restrictions on downloading material, restricting access by IP address, etc.)<br>Ability to identify and redact personally identifiable information (PII) and other sensitive material|Have publicly available catalogs, finding aids, inventories, or collection descriptions so that researchers can discover material<br>Create Submission Information Packages (SIPs) and Archival Information Packages (AIPs) upon ingest [2]|Ability to generate Dissemination Information Packages (DIPs) on ingest [3]<br>Store Representation Information and Preservation Description Information [4]<br>Have a publicly available access policy|Ability to provide access to obsolete media via its native environment and/or emulation|
1 Designated Community essentially means “users”; the term comes from the Reference Model for an Open Archival Information System (OAIS).
2 The Submission Information Package (SIP) is the content and metadata received from an information producer by a preservation repository. An Archival Information Package (AIP) is the set of content and metadata managed by a preservation repository, and organized in a way that allows the repository to perform preservation services.
3 A Dissemination Information Package (DIP) is the package distributed to a consumer by the repository in response to a request, and may contain content spanning multiple AIPs.
4 Representation Information refers to any software, algorithms, standards, or other information that is necessary to properly access an archived digital file. Or, as Preservation Metadata and the OAIS Information Model puts it, “A digital object consists of a stream of bits; Representation Information imparts meaning to these bits.” Preservation Description Information refers to the information necessary for adequate preservation of a digital object, for example Provenance, Reference, Fixity, Context, and Access Rights Information.
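The File Fixity and Data Integrity row in the table above is one of the easier ones to make concrete. The following is a rough sketch, assuming SHA-256 checksums and a simple in-memory manifest; the function names (`file_checksum`, `create_manifest`, `verify_manifest`) are hypothetical, and a production workflow would more likely rely on an established packaging or repository tool than on a hand-rolled script.

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Compute a checksum by reading the file in chunks, so large files fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def create_manifest(root):
    """Create fixity info: map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_checksum(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root, manifest):
    """Check fixity: return the files that are missing or whose checksum has changed."""
    root = Path(root)
    problems = {}
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file():
            problems[relative_path] = "missing"
        elif file_checksum(target) != expected:
            problems[relative_path] = "checksum mismatch"
    return problems
```

In this sketch, `create_manifest` would be run on ingest when no fixity information arrived with the content, and `verify_manifest` would be run at fixed intervals or after specific events such as a storage migration. An empty result means no corruption was detected. Repairing a damaged file still depends on having another good copy, which is what the Storage and Geographic Location row addresses.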
Thanks for this post, Shira. I’ve thought of Access separately mostly because digital preservation systems that include access functions are out of our financial reach. I completely agree with you that Access as a purpose of preservation is critical, and having it represented in the NDSA Levels is a great planning strategy.
Am I wrong in assuming that having Access as an integral part of a system is only going to happen with the high-ticket subscription? The way I address this in my own small corner of the dp world is through a systematic hybrid approach of preserving the “best copy” and keeping track of “access copies” elsewhere.
Intellectual vs. physical/virtual control seems retro these days but it’s all I’ve got!
Anyway, from a planning perspective I think having Access in the NDSA Levels is a great way to make sure we keep our eyes on that prize and to let developers and our communities know that Access doesn’t happen on its own.
Interesting post and great to hear about the continued and ongoing work to enhance the NDSA levels (which I find incredibly useful in my work).
I like the idea of including access but looking at the access level you suggest currently I think the bar for level 1 is set too high.
At the moment I like to think I’m firmly on level 1 of the NDSA levels and making progress in the other levels so hoping to move forward soon. However, the addition of the access level is going to push me back to level 0 as we haven’t really thought about redaction yet and how we might achieve that.
I think in terms of a progression around access to born digital material I would say that at an early level you would need to be thinking about being able to manage ad hoc requests for access in an ad hoc way (for example, providing a user in the searchroom with a laptop with the relevant files on) but that the more advanced levels would require this to be managed in a more streamlined and consistent way that also addresses access requests from remote users.
First and foremost, really excited to see this Access row proposed for the NDSA Levels of Preservation. We have leveraged these Levels as easily digestible heuristics we can share up and down the digital preservation expertise ladder.
The first two comments make great points. Combining concerns about the separation of access and preservation, and finding oneself bumped back to Level 0 compliance, what if Level 1 was related to analytics of use?
The other rows for Level 1 — “Protecting your Data” — touch on inventories, understanding of data (fixity), identities of individuals interacting with the data. Thinking about access, analytics seem like one of the lowest bars to generating knowledge about use of materials. And if the materials are not yet accessible online, “analytics” might even include reproductions provided for users, subsets made available, etc.
Thinking further into the levels, an idea that seems to be coming from every angle (in the best possible way) is “Collections as Data”, nicely embodied in the recent “Collections as Data 2016” conference (http://digitalpreservation.gov/meetings/dcs16.html). Understanding use of materials hinges on access. To the degree that materials are available in structured formats, or better yet with URIs that can be referenced along the life cycle of their use, perhaps that could integrate into Levels 3 and 4, where feedback from use is channeled back to the repository such that they might “Know”, “Monitor”, or “Repair” their data after it’s been out into the “wild” and back.
Probably should have held off on commenting until thoughts were more fully formed (just saw this this morning), but think it’s an interesting, if not also challenging, idea to include Access in the NDSA Levels. Appreciate the food for thought!
Coming back to this idea; I’ll join the others in saying that an Access functional area makes sense.
I agree with Mitcham that the bar is a bit high for Level One. Focusing on the premise that level one focuses on securing your data, the “ability to ensure the security of the material while it is being accessed” is one appropriate item listed.
Declaring a sufficiently useful designated community is surprisingly difficult for many archivists (who tend to default to “anyone who wants to use it”) and the activity is more in line with level two’s “know your data”, which is more accurately stated in this case as “know your users.”
I would actually break out PII identification and redaction into two steps. It is fairly simple now to identify a lot of PII, but it isn’t perfect, and redaction tools are still very poor in my experience. By all means, we should throw tools like bulk_extractor at our data to find the obvious culprits and restrict access to positive results as a first line of defense (level one); but it takes additional review to identify false positives and find the even trickier false negatives (level two).
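The two-step idea can be illustrated with a small sketch. This is not how bulk_extractor works and it does not use that tool; it is a stand-in that assumes plain-text input and two deliberately simple, hypothetical patterns (email addresses and US Social Security number shapes). Anything that matches is restricted pending review, which is the "first line of defense" step; the human review for false positives and false negatives is not automated here.

```python
import re

# Hypothetical, deliberately simple patterns; a real scanner covers far more cases.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_text(text):
    """Return (label, matched string) pairs for every pattern hit in the text."""
    hits = []
    for label, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return hits


def triage(documents):
    """Step one: restrict anything with a positive hit until a person has reviewed it."""
    restricted = {}
    for name, text in documents.items():
        hits = scan_text(text)
        if hits:
            restricted[name] = hits
    return restricted


# Illustrative input only; the file names and contents are made up.
documents = {
    "letter.txt": "Contact jane@example.org or 123-45-6789.",
    "notes.txt": "Nothing sensitive here.",
    "order.txt": "Part number 555-12-3456 shipped.",
}

restricted = triage(documents)
for name, hits in restricted.items():
    print(name, hits)
```

Note that `order.txt` is flagged even though the number is a part number rather than a Social Security number. That is exactly the kind of false positive that the second, manual review step is meant to catch.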
I would further note that creating SIPs and AIPs has very little to do with access; if we want to emphasize these OAIS packages they belong in the Metadata functional area. Similarly, “storing representation information and preservation description information” is what the file formats and metadata areas are all about. Now, creating DIPs on ingest is the one OAIS-centered recommendation that makes sense for an Access functional area. Relatedly, monitoring your own software library for providing access to the formats in your collection seems like a good match for level three.