This is a guest post by Lisa Gregory, Digital Projects Liaison, Digital Information Management Program, State Library of North Carolina.
About two weeks ago, I tried a little thought experiment. Throughout the course of an afternoon, for each person I spoke with or saw, I imagined myself talking to them about digital preservation. From grocery store to coffee shop, from close colleague to elevator acquaintance, I mentally engaged them in a tête-à-tête and tried to guess at their responses. Why? Because I think it’s important to remind myself that, for most of the population, digital preservation just isn’t on the radar.
My colleagues and I at the State Library of North Carolina think a lot about those folks even when we’re not mentally ambushing them in grocery stores. We know that our message is relevant to those who don’t have The Signal in their RSS reader account, or who wouldn’t know RSS if it hit them on the head. This week, we released a four-part video tutorial that isn’t for you, colleagues and friends – it’s for them. It’s about one of those first steps that can help someone start doing digital preservation before they even know what it is: file naming.
Our four-part tutorial explains (1) why file naming is important, (2) how to change a file name, (3) what not to do when naming files and (4) best practices for file naming. We tried to make these tutorials no-frills, brief and informative. We tried to present the information in a way that would make sense to folks who might only use one or two programs on their computer and who might not ever care about authenticity or obsolescence. We hope we succeeded.
Please let us know what you think at [email protected] or @ncpedia. If you think they’re useful, pass the videos on to those at the fringes of your lives, outside of your digital preservation posse. I’m hopeful that more of those mental conversations can become real ones.
Comments
Your file naming recommendations seem to me to be a bit out of date and inadequate in some other respects.
Space, Comma, Parenthesis Characters:
For well over a decade, operating systems and software have been compatible with filenames containing these characters. It's unlikely this will change in the future, and it's extremely easy to programmatically substitute one character for another.
Space characters may be incompatible with some obsolete operating systems and software, but they do make filenames easier for both humans and search engines to read and understand.
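As a rough illustration of how simple that substitution is, here is a minimal Python sketch. The helper name `to_safe_name` and the particular mapping (drop parentheses, turn commas into hyphens, turn spaces into underscores) are assumptions chosen for the example, not an established standard.

```python
import re

def to_safe_name(name: str) -> str:
    """Swap spaces, commas and parentheses for more portable characters.

    The mapping is an illustrative assumption, not a standard:
    parentheses are dropped, a comma (plus any surrounding spaces)
    becomes a hyphen, and each remaining run of whitespace becomes
    a single underscore.
    """
    name = name.replace("(", "").replace(")", "")
    name = re.sub(r"\s*,\s*", "-", name)
    name = re.sub(r"\s+", "_", name)
    return name

print(to_safe_name("Best Practices (MP4, HQ).avi"))  # Best_Practices_MP4-HQ.avi
```

Because the conversion is mechanical, a readable original name can always be turned into a restricted one later. The reverse is not true once information has been squeezed out.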
Capitalization:
What your video says about not relying on capitalization for uniqueness is accurate, but some people may misunderstand it and be discouraged from using capitalization at all. Perhaps you could emphasize “UNIQUENESS” and clarify its implications with an example.
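One way to make the uniqueness point concrete: on case-insensitive file systems (the default on Windows and macOS), two names that differ only in letter case refer to the same file. The sketch below flags such pairs. The helper name is hypothetical, and `str.casefold()` only approximates each file system's own case-folding rules.

```python
from collections import defaultdict

def case_collisions(names):
    """Return groups of names that differ only by letter case.

    Such names would collide on a case-insensitive file system, so
    capitalization alone should not be what makes a name unique.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [group for group in groups.values() if len(group) > 1]

names = ["Report 2012.doc", "report 2012.doc", "Minutes.doc"]
print(case_collisions(names))  # [['Report 2012.doc', 'report 2012.doc']]
```

Capitalization used for readability is harmless here. Only names that depend on case to be distinct get flagged.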
Best Practices:
IMHO, your suggested filename format is not adequate for archival needs. A title and date are simply not enough information. The filename format should include additional information such as a category, a sequence number, and an optional version number and notes. Such a format may look something like this.
Category yyyymmdd-ssss Title vvvv (note).ext
Capitalization should be encouraged for readability. If spaces are allowed, proper title capitalization can be used, preserved, and encouraged.
If avoiding spaces, parentheses, and commas is an absolute must, the format would look something like this.
Category-yyyymmdd-ssss-Title-vvvv-note.ext
Category (A Group of Files):
Category is used to identify a group of files. Proper English capitalization rules are applied to each word in the category. Words are separated by space or underscore characters.
Date (yyyymmdd):
Eight digits: a 4-digit year, a 2-digit month, and a 2-digit day. The date is separated from Category by a space or underscore and from Sequence by a hyphen.
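A small sketch of why the yyyymmdd order matters. With the year first and every part zero-padded, sorting the names as plain text also sorts them by date. The helper name `date_field` is illustrative.

```python
from datetime import date

def date_field(d: date) -> str:
    """Return a date as the 8-digit yyyymmdd field."""
    return f"{d.year:04d}{d.month:02d}{d.day:02d}"

# Year first plus zero-padding means that sorting the fields as plain
# text also sorts them chronologically.
dates = [date(2012, 1, 24), date(2011, 12, 5), date(2012, 10, 2)]
fields = sorted(date_field(d) for d in dates)
print(fields)  # ['20111205', '20120124', '20121002']
```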
Sequence (ssss):
The sequence number consists of 4 digits, which indicate the order of this file within the Category. For example, suppose there is an audio book consisting of one MP3 file per chapter. Category would contain the book's title, and Title could contain the chapter title. The sequence number identifies the order in which the chapters are to be played.
For things like TV or radio series, the sequence number could consist of a 2-digit season number followed by a 2-digit episode number.
Sequence is separated from Date by a hyphen and from Title by a space or underscore.
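The fixed 4-digit width is what keeps files in order. Without zero-padding, "10" sorts before "9" as text. This minimal sketch formats both variants described above. The function names and the range checks are assumptions added for illustration.

```python
def sequence_field(n: int) -> str:
    """Zero-pad a sequence number to the 4 digits the scheme uses."""
    if not 0 <= n <= 9999:
        raise ValueError("sequence must fit in 4 digits")
    return f"{n:04d}"

def episode_field(season: int, episode: int) -> str:
    """Encode season and episode as two 2-digit parts, e.g. (3, 7) -> '0307'."""
    if not (0 <= season <= 99 and 0 <= episode <= 99):
        raise ValueError("season and episode must each fit in 2 digits")
    return f"{season:02d}{episode:02d}"

print(sequence_field(7))    # 0007
print(episode_field(3, 7))  # 0307
# Zero-padding keeps text order equal to numeric order:
print(sorted(["10", "9"]), sorted([sequence_field(10), sequence_field(9)]))
# ['10', '9'] ['0009', '0010']
```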
Title:
Title is used to identify individual files within a Category or group. The same rules as for Category apply.
Version:
Version is used in production environments where it is necessary to keep a revision history of the file. Version is optional and is used only when necessary. It is separated from Title and Notes by a space or underscore.
Notes: (note1, note2, etc.)
Notes are enclosed in parentheses and are used to provide additional information about the file's contents, format, etc. Individual items are separated by commas. Lowercase text is preferred except for proper nouns or abbreviations.
Extension:
The file extension identifies the type of file and, by implication, which program(s) can be used to view or modify it.
Example:
The four files containing your video presentation could be named like this.
Digital Preservation 20120124-0001 Why is File Naming Important (MP4, HQ).avi
Digital Preservation 20120124-0002 How to Change a File Name (MP4, HQ).avi
Digital Preservation 20120124-0003 What Not to do When Naming Files (MP4, HQ).avi
Digital Preservation 20120124-0004 Best Practices for File Naming (MP4, HQ).avi
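To show that these names follow mechanically from the fields, here is a sketch of a builder for both variants of the proposed format. `build_name` is a hypothetical helper. Two details are assumptions the comment leaves open: the version is zero-padded to 4 digits, and in the no-space variant each note becomes its own hyphen-separated part, with spaces inside fields turned into underscores.

```python
from datetime import date

def build_name(category, day, seq, title, ext,
               version=None, notes=(), spaces=True):
    """Assemble a file name in the Category/date/sequence/Title format."""
    date_part = f"{day.year:04d}{day.month:02d}{day.day:02d}"
    seq_part = f"{seq:04d}"
    if spaces:
        # Category yyyymmdd-ssss Title vvvv (note).ext
        name = f"{category} {date_part}-{seq_part} {title}"
        if version is not None:
            name += f" {version:04d}"
        if notes:
            name += f" ({', '.join(notes)})"
    else:
        # Category-yyyymmdd-ssss-Title-vvvv-note.ext
        parts = [category, date_part, seq_part, title]
        if version is not None:
            parts.append(f"{version:04d}")
        parts.extend(notes)
        name = "-".join(p.replace(" ", "_") for p in parts)
    return f"{name}.{ext}"

d = date(2012, 1, 24)
print(build_name("Digital Preservation", d, 2, "How to Change a File Name",
                 "avi", notes=["MP4", "HQ"]))
# Digital Preservation 20120124-0002 How to Change a File Name (MP4, HQ).avi
print(build_name("Digital Preservation", d, 2, "How to Change a File Name",
                 "avi", notes=["MP4", "HQ"], spaces=False))
# Digital_Preservation-20120124-0002-How_to_Change_a_File_Name-MP4-HQ.avi
```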
I think file naming is important and that it's everyone's job to do it right. It's possible to write programs to deal with variations in format, but it's impossible to divine information that isn't provided.
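Going the other way, a consistent format lets a program recover the fields from a name. Below is a minimal sketch for the spaced variant. The regular expression and `parse_name` are hypothetical, not part of any standard. One ambiguity shows up right away: a title that ends in a four-digit number (such as a year) would be read as the version.

```python
import re

# Hypothetical pattern for: Category yyyymmdd-ssss Title [vvvv] [(notes)].ext
NAME_RE = re.compile(
    r"^(?P<category>.+?) (?P<date>\d{8})-(?P<seq>\d{4}) "
    r"(?P<title>.+?)"
    r"(?: (?P<version>\d{4}))?"
    r"(?: \((?P<notes>[^)]*)\))?"
    r"\.(?P<ext>[^.]+)$"
)

def parse_name(filename):
    """Split a file name into its fields, or return None if it doesn't fit."""
    m = NAME_RE.match(filename)
    if m is None:
        return None
    fields = m.groupdict()
    # Notes are comma-separated inside the parentheses.
    raw = fields["notes"]
    fields["notes"] = [n.strip() for n in raw.split(",")] if raw else []
    return fields

info = parse_name("Digital Preservation 20120124-0001 "
                  "Why is File Naming Important (MP4, HQ).avi")
print(info["title"], info["notes"])  # Why is File Naming Important ['MP4', 'HQ']
```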
Hope you find this useful.