The following is a guest post from John Spencer, President of BMS Chace and coordinating committee member for the National Digital Stewardship Alliance. In this post, Spencer shares information he presented on a National Digital Stewardship Alliance panel on the importance of understanding bit-level threats to preserving digital content (PDF) at the 2012 NDIIPP Digital Preservation meeting.
I work with record labels and artists to create digital preservation file sets of commercial and catalog recordings, including the individual multitrack files as well as the various mixes generated in the creative process. My organization also works with institutional archives, and as you can imagine, the approach to archiving is quite different. Institutions and archives have standards and best practices they can rely on, which is extremely important, because an archive may have the only surviving copy of a wax cylinder, a 78-rpm record, or other obsolete format. In fact, there may be only one opportunity for a successful playback and creation of a digital preservation file, which is why organizations like NDSA work to bring cohesiveness to the standards process.
On the other hand, commercial recordings, mostly born-digital these days, face a different set of challenges. Release deadlines and marketing expenditures, along with different file types for e-commerce vs. physical, are a few of the issues that impact how a commercial recording may (or may not) be archived.
For some labels, the only deliverable they may get is an external hard drive, most likely transferred from yet another hard drive, but the release cycle has started, so the files are then moved to the various e-commerce sites as well as physical replication plants. The notion of bit-level fixity unfortunately falls to the bottom of the list, not because labels don’t care (these recordings may turn out to be extremely valuable) but because most commercial label vaults are focused on the “object” as opposed to the underlying “asset.”
Because I work in both the recording industry and archival preservation environments, the thrust of my work with standards organizations like the NDSA is to bridge the gap between the different “camps,” pointing out the similarities whenever possible while bringing relevant standards and best practices to bear on a common goal.
We’re all intimidated by the technical jargon thrown around these days about how we are supposed to keep our music, video and other media viable. Some of us have actually listened, and we’ve backed up our most important projects to external hard drives, optical media and the like.
The Value of Bit Level Fixity to Media Producers
Often, folks in the recording industry I communicate with will ask me something like, “So why are you talking to me about ‘bit-level fixity’? I mean, what else are we supposed to do beyond keeping multiple copies?”
I explain bit-level fixity to them in two ways:
- Bit-level fixity is a “property” of the digital preservation files.
- You ensure the fixity of your content through a “process” of managing digital preservation files and their fixity information.
Bit-level fixity exists so that those tasked with managing digital preservation files have a tool to ensure that when they make a transfer, it is indeed an exact copy. Most of the institutional archival community is well versed in the need for bit-level fixity, which is why they use some sort of checksum algorithm to verify that a copied digital file is indeed a bit-for-bit copy. Many people outside of the archival community have at least heard of the term “checksum” or “MD5 checksum”; MD5 is probably the most widely used checksum algorithm for confirming that a copied file is exactly the same as the source file. There are many free applications that let you create checksums. Many stewardship organizations use a free and open-source tool called Bagger, but you can also simply search for “md5 creation tools” via the search method of your choice.
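To make that concrete, here is a minimal sketch of a checksum comparison in Python (the file names are hypothetical, used only for illustration):

```python
import hashlib

def md5_checksum(path, chunk_size=1024 * 1024):
    """Compute the MD5 digest of a file, reading it in chunks so a
    large audio file never has to fit in memory all at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical file names: compare the source file to its copy.
source = md5_checksum("masters/track_01_mix.wav")
copy = md5_checksum("backup/track_01_mix.wav")
print("bit-for-bit copy" if source == copy else "MISMATCH: copy is not identical")
```

Tools like Bagger perform essentially this comparison, while also packaging the checksums alongside the files so the fixity information travels with the content.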
More importantly, as mentioned above, fixity is a “process.” Many, if not most, digital preservation files are stored on data tape backups. Those backups, while robust, also require a cycle of migration (every three to seven years) that should include bit-level fixity checks to make sure that each subsequent copy is indeed an exact bit-level copy.
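In practice, that process usually means keeping a manifest of checksums alongside the files and re-verifying it after each migration. Here is a hedged sketch of that re-verification step (the “checksum filename” manifest format and the paths are assumptions for illustration):

```python
import hashlib
import os

def verify_manifest(manifest_path, base_dir):
    """Re-check every file against a stored "checksum filename" manifest,
    e.g. after migrating a set of files to new media. Returns the names
    of any files whose current checksum no longer matches."""
    failures = []
    with open(manifest_path) as manifest:
        for line in manifest:
            expected, name = line.strip().split(None, 1)
            digest = hashlib.md5()
            with open(os.path.join(base_dir, name), "rb") as f:
                for chunk in iter(lambda: f.read(1024 * 1024), b""):
                    digest.update(chunk)
            if digest.hexdigest() != expected:
                failures.append(name)
    return failures

# Hypothetical layout: an empty list means every file survived bit-for-bit.
bad = verify_manifest("backup/manifest.md5", "backup")
print("all files verified" if not bad else "fixity failures: %s" % bad)
```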
Libraries and archives may ultimately become the digital stewards of your commercial work, or at the very least they may lead the way in preserving the long-term viability of your recorded assets. In my view, it is imperative that institutional archives and owners of commercial content engage in a more comprehensive dialog, which is why organizations such as the NDSA are so important: they create an environment where all stakeholders, whether public archives and libraries or commercial labels, can share information and implement best practices within their own institutions.
In closing, there are many factors that may impact your personal or professional digital archive, but bit-level fixity is something that you want to add to your toolkit!
Comments (4)
John, very good article, thanks. Outside of data that is being migrated (e.g., your tape example), can you talk about your requirements for data at rest, such as the need for periodically “health checking” the data to check for bit rot? Is this OK at the physical device level (e.g., some storage technologies like object storage can periodically check and correct for bit rot), and if so, how do you associate this metadata with the object/data? Or do you also see a need higher up in the “stack”? Thanks for your musings on this too…
In a perfect world, I’d like to see automated checking of the data tapes at rest (assuming we have at least a couple of copies to start with) in conjunction with a data migration plan for offline media, keeping pace with what commercial IT providers are using.
For example, LTO is great because it has a roadmap, but we can’t assume that a data tape by itself is an archival object; in my opinion it is a carrier, subject to subsequent fixity checks and migration.
Does that make sense?
Yes, makes sense. And to continue, then, expanding this from a tape discussion to an IT and disk discussion (possibly a disk-based cloud): what would an ideal situation be for vendors or IT performing checks on data at rest on disk (possibly comparing a disk copy to the one on the “carrier” tape)? Thanks.
Well, you pose a number of interesting questions. Two things come to mind:
1. need for access vs. last-mile network speeds
2. budget
I (personally) think if you’ve got multiple copies and have them on different fixity check schedules, that’s a good thing. How often? I’d love to see it happen every six months, but see point #2. I’m a big fan of inserting disks in the access/retrieval/management workflow, but it all comes down to $$.
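To illustrate the staggered-schedules idea, here is a small sketch (the locations, dates and intervals are invented for illustration):

```python
from datetime import date, timedelta

# Hypothetical copies, each on its own fixity-check schedule so the
# copies are never all verified (or all neglected) at the same time.
copies = [
    {"location": "tape vault A", "last_checked": date(2012, 1, 15), "interval_days": 180},
    {"location": "tape vault B", "last_checked": date(2012, 4, 15), "interval_days": 180},
    {"location": "spinning disk", "last_checked": date(2012, 6, 1), "interval_days": 90},
]

today = date.today()
for copy in copies:
    due = copy["last_checked"] + timedelta(days=copy["interval_days"])
    status = "fixity check due" if today >= due else "next check " + due.isoformat()
    print(copy["location"] + ": " + status)
```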
Workflows are certainly different between institutions with TDRs (trusted digital repositories) and labels with “release dates.” Also, you must analyze the relationship between the digital archive and the existing IT infrastructure: friend or foe? Help or hindrance? Sometimes the politics come into play as well.