Protect Your Data: Information Security and the Boundaries of your Storage System

The following is a guest post from Jane Mandelbaum, co-chair of the National Digital Stewardship Alliance Innovation Working Group and IT Project Manager at the Library of Congress.

The NDSA Levels of Digital Preservation are useful in providing a high-level, at-a-glance overview of tiered guidance for planning for digital preservation. One of the most common requests received by the NDSA group working on this is that we provide more in-depth information on the issues discussed in each cell. To that end, we are excited to continue our series of posts, set up to help you and your organization think through how to go about working your way through the cells on each level.

There are 20 cells in the five levels, so there is much to discuss. We previously wrote about row one cell one, Protect Your Data: Storage and Geographic Location, and about row one cell two, Protect Your Data: File Fixity and Data Integrity. If you want an overall explanation of the levels, take a look at The NDSA Levels of Digital Preservation: An Explanation and Uses.

This post is about row three, column one, the third box, in the NDSA Levels of Digital Preservation.

In this post, we tackle row three cell one Protect Your Data: Information Security.

Storage Systems and Information Security

Requirements for information security in digital preservation may seem daunting, but these requirements can generally be met by simply configuring and operating your Storage System. The initial post in this series touches on the Storage System as a way to manage your stuff.  We’ll start this post by talking about Storage Systems in more depth, and how they address information security needs.

First, how are we defining a Storage System? 

A Storage System is a defined set of hardware, software, services and operational practices that you use to manage your stuff.  Many of us may feel that we don’t manage our own stuff because that is the job of our IT staff, or our external IT service provider.  But if you are engaged in digital preservation, you need to be able to identify the boundaries of the Storage System that does some (or any) digital preservation for you.

We’ll pose a few more questions to help you identify your Storage System.

Second, what are you keeping in the Storage System?

Because digital “stuff” is not tangible, it is sometimes difficult to describe what you have in your Storage System.

So you’re going to need someone who is willing and able to define the scope of the content in your Storage System.  And generally, you’re going to want to establish a few basic guiding principles, such as the following:

  • Anything in the Storage System has to be identified uniquely by a name or other identifier.
  • The base unit in the Storage System is a digital file – a discrete object.

More precisely, a Storage System generally has digital “files,” each with a name or identifier that can be managed and tracked.  A stream of data (such as an audio or video feed) won’t be in your Storage System unless you can define a start and stop point.

You might find it useful to think of the data in a Storage System as files “at rest.”

Third, what makes up a Storage System?

Once we’ve identified files as what we are preserving, we want to define what makes up the Storage System that we’ll keep our files in.  We can think of our Storage System as generally including servers, storage, services and operational practices.

Servers and Storage: Let’s start with servers and storage.  Differentiating between servers and storage is important in using the Levels and it is critical to understanding the Levels requirements of Information Security.

To be useful for this post, we’ll make some gross generalizations about what servers and storage are.  A server does data processing (runs programs, does indexing, does searches, and performs functions on data – read, write, execute and delete).  Your digital “stuff” is not on a server. It is true that a server has to have some amount of data to do its work (executable files, parameter files), but we don’t generally think of this as the “stuff” to be preserved.  As a gross generalization, this server-based data is typically located on small local disks and/or in memory (RAM).

Storage is where your “stuff” is.  This may be one or several different kinds of storage.  The most commonly used kinds of storage in a Storage System are “flash” or Solid State Disk (SSD) storage, hard disk storage, and tape storage.

Servers and storage both start as pieces of equipment that you can see and touch.  Both may be “virtualized” in practice.  A single piece of server equipment may be divided into multiple “virtual” servers.

To manage and protect your “stuff,” you need the functions performed by both servers and storage.  A server does any of the work required to manage your stuff.  Storage contains the files that you are considering in scope for your digital preservation storage system.   Digital objects or items may be complex, such as websites or video games or 3-D visualizations.  But each of these objects or items can be defined as a set of files.  And there’s always an interface that provides for the communication between the server and the storage.  The interface itself consists of both hardware and software.

So here are a few examples of what servers and storage might be in your Storage System.  These are simply examples, and are not intended as a comprehensive list.

  • Storage System 1: One server with multiple large internal hard drives configured according to the principles of RAID (multiple physical drives that appear as one, set up for redundancy and performance optimization), and a regular backup process for the data on the hard drives.
  • Storage System 2: One server attached via a network to a set of hard drives configured according to the principles of RAID (multiple physical drives that appear as one, set up for redundancy and performance optimization), and to a tape system.
  • Storage System 3: Multiple servers attached via a network to multiple sets of hard drives and a tape system.
  • Storage System 4: Multiple servers attached via a network to multiple sets of hard drives and a local tape system, with a remote backup copy of designated files.

What are the services that can be performed in your Storage System? A Storage System is not complete with servers and storage alone.  Users need to be able to perform functions within the system.  These may vary from simple to complex, depending on the organization and its needs.  Examples of a base set of services might be:

  • Inventory of the files in the Storage System;
  • Capability to check the fixity of the files in the Storage System;
  • Capability to import files into, and export files out of, the Storage System;
  • Management of users with access to the files.
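
To make the first two services concrete, here is a minimal sketch in Python of an inventory and a fixity check. It assumes the files sit in an ordinary directory tree and that SHA-256 is an acceptable checksum; the function names (sha256_of, build_inventory, check_fixity) are hypothetical and not part of any particular Storage System product.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_inventory(root: Path) -> dict[str, str]:
    """Inventory service: map each file name (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root: Path, inventory: dict[str, str]) -> list[str]:
    """Fixity service: list files that are missing or whose checksum changed."""
    problems = []
    for name, expected in inventory.items():
        path = root / name
        if not path.is_file() or sha256_of(path) != expected:
            problems.append(name)
    return problems
```

You would build the inventory once when files are imported, store it somewhere safe, and rerun check_fixity on a schedule; an empty list means every inventoried file is present and unchanged.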

What are the Operational Practices, or rules, of your Storage System?  You may have rules that are managed through system functions and system parameters.  For example, your Storage System may enforce file naming conventions or rules about the number of files in a file system.  We will explore these in more depth in another blog post.
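
As a rough illustration of rules like these, the Python sketch below checks a list of file names against a naming convention and a per-directory file limit. Both the pattern and the limit are made-up assumptions for the example; your own rules will differ.

```python
import re

# Hypothetical convention: lowercase letters, digits, underscores and hyphens,
# followed by a dot and an extension (for example "map_001.tif").
NAME_PATTERN = re.compile(r"[a-z0-9_-]+\.[a-z0-9]+")

# Hypothetical limit on the number of files in one directory.
MAX_FILES_PER_DIRECTORY = 10_000


def naming_violations(file_names: list[str]) -> list[str]:
    """Return the file names that do not follow the naming convention."""
    return [name for name in file_names if not NAME_PATTERN.fullmatch(name)]


def exceeds_file_limit(file_names: list[str],
                       limit: int = MAX_FILES_PER_DIRECTORY) -> bool:
    """Return True if a directory holds more files than the rule allows."""
    return len(file_names) > limit
```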

Your Storage System should have rules for access, which include managing who has access and what can be done by those who have access.  This brings us finally to the information security requirement – which is about access to the files.

Who has access to your Storage System and how do you control it?

Access to your “stuff” is the heart of the Information Security row of the Levels.  You should always think about access, no matter what type of data you have.

There are two useful ways to think about what you need.

First, if you think about servers and storage as distinct parts of your Storage System, this will help define your access practices.  You need access policies and practices for both the server and the storage components of your Storage System.  Why is that and what does it buy you?  In the first column of the Information Security row of the Levels, we talk about identifying and restricting who has read, write and delete authorization to individual files.

You can think about this requirement as having two parts — the “who” part and the “files” part.   The “who” part is defined through user accounts, which are generally managed on servers. The user accounts may, for convenience, be members of user groups. Remember that your Storage System may have multiple servers – each of which needs to have an identified list of user accounts and groups.  The “files” may be accessed by different user accounts in different groups on different servers for different purposes.  This may seem confusing, but actually provides flexibility in control over your files.  You can provide different kinds of access to different files for different users.  You can think of this as a matrix that provides access controls through a combination of the user accounts on the servers with the files on the storage.
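
One way to picture that matrix is the small Python sketch below. The server names, user accounts and file name are invented for illustration, and a real Storage System would hold this information in its operating system or storage software rather than in a dictionary.

```python
# Hypothetical access matrix: (server, user account) -> file -> allowed actions.
ACCESS_MATRIX = {
    ("ingest-server", "alice"): {"collection/report.pdf": {"read", "write"}},
    ("ingest-server", "bob"): {"collection/report.pdf": {"read"}},
    ("access-server", "alice"): {"collection/report.pdf": {"read"}},
}


def is_allowed(server: str, account: str, file_name: str, action: str) -> bool:
    """Return True only if the matrix explicitly grants the action."""
    grants = ACCESS_MATRIX.get((server, account), {})
    return action in grants.get(file_name, set())
```

In this sketch the same account (alice) can write to a file from one server but only read it from another, and anything not listed is denied by default.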

Another way of thinking about this is the concept of Role-based Access Controls.  With this concept, you start with the roles that you want your users to play in dealing with your files in your Storage System.  Examples of common roles are:

  • Omnipotent system administrator who can do anything and everything;
  • User administrator who can set up user accounts and user privileges but has no access to your files;
  • File administrator who has read and write access to all your files but cannot set up any user accounts;
  • User who has read access to all your files;
  • User who has read, write and delete access to all your files;
  • User who has read access to some of your files;
  • User who has read, write and delete access to some of your files.
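
The roles above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the role names, the user names, and the idea of using a path prefix ("maps/") to stand for "some of your files" are all hypothetical.

```python
from dataclasses import dataclass

ALL_FILES = ("",)  # the empty prefix matches every file name


@dataclass(frozen=True)
class Role:
    """A named bundle of privileges; users get privileges only through a role."""
    file_actions: frozenset = frozenset()  # subset of {"read", "write", "delete"}
    file_scope: tuple = ALL_FILES          # path prefixes the actions apply to
    manage_users: bool = False


FULL = frozenset({"read", "write", "delete"})
SOME = ("maps/",)  # hypothetical subset of the collection

ROLES = {
    "system_admin": Role(FULL, ALL_FILES, manage_users=True),
    "user_admin": Role(manage_users=True),  # no file access at all
    "file_admin": Role(frozenset({"read", "write"})),
    "reader_all": Role(frozenset({"read"})),
    "editor_all": Role(FULL),
    "reader_some": Role(frozenset({"read"}), SOME),
    "editor_some": Role(FULL, SOME),
}

# Hypothetical assignment of user accounts to roles.
USER_ROLES = {"ana": "user_admin", "ben": "file_admin", "cy": "editor_some"}


def can_access(user: str, action: str, file_name: str) -> bool:
    """Return True if the user's role allows the action on the file."""
    role = ROLES.get(USER_ROLES.get(user, ""))
    if role is None:
        return False
    in_scope = any(file_name.startswith(prefix) for prefix in role.file_scope)
    return in_scope and action in role.file_actions


def can_manage_users(user: str) -> bool:
    """Return True if the user's role may set up accounts and privileges."""
    role = ROLES.get(USER_ROLES.get(user, ""))
    return role is not None and role.manage_users
```

The point of the design is that privileges attach to roles rather than to individuals, so changing what a person can do means changing one role assignment instead of editing permissions file by file.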

So if you’re looking at the requirements in this box, it’s an opportunity to look at two views at the same time: the big picture of the Storage System as a whole, as well as the individual files in the Storage System.

Do these views make sense for the variety of scenarios in the field?  We’re always interested in hearing from practitioners on what is useful for you, so share your thoughts on this topic in the comments below.
