The following is a guest post by Chelcie Rowell, 2012 Junior Fellow.
Frequency? Rare. Impact? Huge. I’m talking about digital disasters.
Stewards of digital content, like stewards of analog content, must plan for catastrophe in advance in order to minimize loss and recover quickly. True, digital disasters may occur infrequently. But at the scale at which institutions collect digital content, and over the length of time they wish to preserve it, the risk of a disaster is non-trivial.
Disasters may be natural (such as tornadoes and earthquakes) or failures of infrastructure (such as power failures). Disasters may result from intentional human action (such as cyberterrorism) or simply human error (such as accidental deletion).
A digital disaster negatively impacts an institution’s digital content. What distinguishes many catastrophes that threaten digital content from those that threaten analog content is that digital disasters may be far less visible. Bit rot, for example, may be a one-in-a-million occurrence, but when it happens, special tools are needed to seek it out before it quietly becomes a digital disaster.
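The usual tool for this job is fixity checking: record a cryptographic checksum for each file at ingest, then periodically re-compute the checksums and compare. As a rough illustration only (the function names and file path below are hypothetical, not any particular tool’s API), a minimal fixity audit might look like this in Python; real-world practice would lean on established tools and formats such as the Library of Congress’s BagIt specification.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=65536):
    """Compute a file's SHA-256 digest, reading in chunks to handle large files."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def make_manifest(root):
    """Record a checksum for every file under root at ingest time."""
    return {str(p): sha256_of(p) for p in Path(root).rglob("*") if p.is_file()}

def audit(manifest):
    """Re-hash each file and return the paths whose bits no longer match."""
    return [path for path, expected in manifest.items()
            if sha256_of(path) != expected]

# Typical use: build the manifest when content is ingested, store it somewhere
# safe (ideally apart from the content itself), then re-run the audit on a schedule.
# manifest = make_manifest("/archive/collection")   # hypothetical path
# for damaged in audit(manifest):
#     print("fixity failure:", damaged)
```

The point of the sketch is simply that silent corruption is invisible until something actively looks for it; a file whose bits have rotted still sits on disk with the same name and size.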
At a recent digital disaster planning workshop, Jessica Branco Colati walked participants through the process of preparing for and recovering from catastrophe. Two recent headlines underscore why that preparation matters, providing concrete examples of the stakes of disaster planning.
When the website for avant-garde 3:AM Magazine suddenly disappeared, what staff initially believed to be a glitch turned into deeper concern when the service provider responsible for managing the site’s servers could not be reached. Editor Andrew Gallix was quoted as saying, “I never expected those who were meant to host and back up our content to just switch us off without even telling us.” Crucial failures of communication made the extent of the digital disaster difficult to assess. Unable to contact their service provider, staff felt powerless to take any action to recover their content. Referring to the missing provider, Gallix said, “At this stage, we do not know if we’ll ever be able to speak to him and if he can switch his server back on long enough to allow us to move 12 years’ worth of content to another, more reliable host.”
Pixar faced a digital disaster of comparably catastrophic impact involving the film Toy Story 2. As one of the film’s technical directors later described, an accidental deletion wiped the working files before the film was finished. What audiences experience as an animated film is actually a complex digital object made up of thousands upon thousands of smaller files. These files, which carry animation, set, and lighting data, are combined and rendered into the frames that sequentially make up the moving image.
As the accidental deletion unfolded, pieces of that complex digital object vanished from disk before the makers’ very eyes. As Oren Jacob, the film’s assistant technical director, put it, “Woody’s hat disappeared. And then his boots disappeared. And then as we kept checking, he disappeared entirely. Woody’s gone.”
At first it seemed the studio could quickly restore the film from backups. But when the backups were revealed to be corrupt, the only recourse was to inventory different versions and perform human-intensive quality review to stitch together enough valid data to render a relatively complete film. Jacob recalled, “In the end, human eyes scanned, read, understood, looked for weirdness, and made a decision on something like 30,000 files that weekend.”
Both episodes raise the issue of risk tolerance. When an institution manages unique digital materials, it must seriously consider what steps to take to prevent, or at least minimize, loss.