With Liberty and File Naming for All

This is a guest post by Lisa Gregory, Digital Projects Liaison, Digital Information Management Program, State Library of North Carolina.

About two weeks ago, I tried a little thought experiment. Throughout the course of an afternoon, for each person I spoke with or saw, I imagined myself talking to them about digital preservation. From grocery store to coffee shop, from close colleague to elevator acquaintance, I mentally engaged them in a tête-à-tête and tried to guess at their responses. Why? Because I think it’s important to remind myself that, for most of the population, digital preservation just isn’t on the radar.

My colleagues and I at the State Library of North Carolina think a lot about those folks even when we’re not mentally ambushing them in grocery stores. We know that our message is relevant to those who don’t have The Signal in their RSS reader account, or who wouldn’t know RSS if it hit them on the head. This week, we released a four-part video tutorial that isn’t for you, colleagues and friends – it’s for them. It’s about one of those first steps that can help someone start doing digital preservation before they even know what it is: file naming.

Our four-part tutorial explains (1) why file naming is important, (2) how to change a file name, (3) what not to do when naming files and (4) best practices for file naming. We tried to make these tutorials no-frills, brief and informative. We tried to present the information in a way that would make sense to folks who might only use one or two programs on their computer and who might not ever care about authenticity or obsolescence. We hope we succeeded.

Please let us know what you think at digital.info@ncdcr.gov or @ncpedia. If you think they’re useful, pass the videos on to those at the fringes of your lives, outside of your digital preservation posse. I’m hopeful that more of those mental conversations can become real ones.

4-part tutorial graphic from State Library of NC

 

One Comment

  1. Rick Jafrate
    January 24, 2012 at 5:34 pm

    Your file naming recommendations seem to me to be a bit out of date and inadequate in some other respects.

    Space, Comma, Parenthesis Characters:
    For well over a decade, operating systems and software have been compatible with filenames containing these characters. This is unlikely to change, and it's extremely easy to programmatically substitute one character for another.

    Space characters may be incompatible with some obsolete operating systems and software, but they make filenames easier to read and understand for both humans and search engines.
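    To illustrate how easy such a substitution is, here is a minimal Python sketch. It assumes the goal is to collapse each run of spaces, commas, and parentheses into a single hyphen (or another character of your choice) while leaving the extension alone. The function name `make_safe` is hypothetical.

```python
import re

# Hypothetical set of characters to replace: space, comma, and parentheses.
UNSAFE_RUN = re.compile(r"[ ,()]+")


def make_safe(name: str, replacement: str = "-") -> str:
    """Collapse each run of spaces, commas, and parentheses into `replacement`."""
    stem, dot, ext = name.rpartition(".")
    if not dot:
        # No extension: rpartition put the whole name in `ext`.
        stem, ext = name, ""
    # Substitute, then trim replacement characters left at either end.
    safe_stem = UNSAFE_RUN.sub(replacement, stem).strip(replacement)
    return f"{safe_stem}.{ext}" if dot else safe_stem


print(make_safe("Digital Preservation 20120124-0001 Why is File Naming Important (MP4, HQ).avi"))
# → Digital-Preservation-20120124-0001-Why-is-File-Naming-Important-MP4-HQ.avi
```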

    Capitalization:
    What your video says about not relying on capitalization for uniqueness is accurate, but some people may misunderstand and be discouraged from using capitalization at all. Perhaps you could emphasize “UNIQUENESS” and clarify its implications through an example.
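    One way to make the uniqueness point concrete: on case-insensitive file systems, such as the default configurations of Windows and macOS, "Report.txt" and "report.txt" name the same file. Capitalization can therefore be used freely for readability, but it should never be the only difference between two names. A small Python sketch (the helper name `case_collisions` is hypothetical) that flags such pairs in a list of names:

```python
from collections import defaultdict


def case_collisions(names):
    """Return groups of names that differ only in capitalization."""
    groups = defaultdict(list)
    for name in names:
        # casefold() is a more aggressive lower() intended for caseless matching.
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


print(case_collisions(["Report.txt", "report.txt", "Minutes.txt"]))
# → [['Report.txt', 'report.txt']]
```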

    Best Practices:
    IMHO, your suggested filename format is not adequate to address archival needs. A title and date are simply not enough information. The filename format should include additional information such as a category, a sequence number, and an optional version number and notes. Such a format may look something like this:

    Category yyyymmdd-ssss Title vvvv (note).ext

    Capitalization should be encouraged for readability. If spaces are allowed, proper title capitalization can be used, preserved, and encouraged.

    If avoiding spaces, parentheses, and commas is an absolute must, the format would look something like this:

    Category-yyyymmdd-ssss-Title-vvvv-note.ext

    Category (A Group of Files):
    Category is used to identify a group of files. Proper English capitalization rules are applied to each word in the category. Words are separated by space or underscore characters.

    Date (yyyymmdd):
    Date consists of 8 digits: a 4-digit year, a 2-digit month, and a 2-digit day. Date is separated from Category by a space or underscore and from Sequence by a hyphen.
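    Because the yyyymmdd field is fixed-width and runs from the largest unit to the smallest, sorting file names alphabetically also sorts them chronologically. A minimal Python sketch of producing and checking the field (the function names are hypothetical):

```python
from datetime import date, datetime


def date_field(d: date) -> str:
    """Format a date as the 8-digit yyyymmdd field, zero-padded."""
    return f"{d.year:04d}{d.month:02d}{d.day:02d}"


def parse_date_field(text: str) -> date:
    """Read a yyyymmdd field back; raises ValueError if it is not a real date."""
    if len(text) != 8 or not text.isdigit():
        raise ValueError(f"not an 8-digit date field: {text!r}")
    return datetime.strptime(text, "%Y%m%d").date()


dates = [date(2012, 1, 24), date(2011, 12, 5), date(2012, 1, 3)]
fields = [date_field(d) for d in dates]
# Text order of the fields matches the chronological order of the dates.
assert sorted(fields) == [date_field(d) for d in sorted(dates)]
print(sorted(fields))  # ['20111205', '20120103', '20120124']
```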

    Sequence (ssss):
    The sequence number consists of 4 digits and indicates the order of this file within its Category. For example, suppose there is an audio book consisting of one MP3 file per chapter. Category would contain the book's title and Title could contain the chapter title. The sequence number identifies the order in which the chapters are to be played.

    For things like TV or radio series, the sequence number could consist of a 2-digit season number and a 2-digit episode number.

    Sequence is separated from Date by a hyphen and from Title by a space or underscore.
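    The zero padding matters: unpadded numbers sort as text in the wrong order ("10" before "9"). A minimal Python sketch of both sequence styles, assuming 4-digit item numbers and 2+2-digit season/episode numbers as described above; the function names are hypothetical:

```python
def sequence_field(n: int) -> str:
    """Zero-pad an item number to the 4-digit ssss field."""
    if not 0 <= n <= 9999:
        raise ValueError(f"sequence {n} does not fit in 4 digits")
    return f"{n:04d}"


def episode_field(season: int, episode: int) -> str:
    """Build ssss from a 2-digit season number and a 2-digit episode number."""
    if not (0 <= season <= 99 and 0 <= episode <= 99):
        raise ValueError("season and episode must each fit in 2 digits")
    return f"{season:02d}{episode:02d}"


print(sorted(["10", "9", "1"]))                       # ['1', '10', '9']
print(sorted(sequence_field(n) for n in (10, 9, 1)))  # ['0001', '0009', '0010']
print(episode_field(2, 11))                           # 0211
```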

    Title:
    Title is used to identify individual files within a Category or group. The same rules as for Category apply.

    Version:
    Version is used in production environments where it is necessary to keep a revision history of the file. Version is optional and is used only when necessary. It is separated from Title and Notes by a space or underscore.

    Notes (note1, note2, etc.):
    Notes are enclosed in parentheses and are used to provide additional information about the file's contents, format, etc. Individual items are separated by commas. Lowercase text is preferred except for proper nouns or abbreviations.

    Extension:
    The file extension identifies the file type and, by implication, which program(s) can be used to view or modify it.
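    Putting the fields together, here is a minimal Python sketch that assembles the space-separated variant of the format. It assumes the version is a zero-padded 4-digit number (matching vvvv) and that notes are joined with a comma and a space; `build_filename` is a hypothetical name, and the hyphen-only variant would need a different separator and no parentheses.

```python
from datetime import date
from typing import Optional, Sequence


def build_filename(category: str, when: date, seq: int, title: str, ext: str,
                   version: Optional[int] = None,
                   notes: Sequence[str] = ()) -> str:
    """Assemble 'Category yyyymmdd-ssss Title [vvvv] [(note1, note2)].ext'."""
    parts = [category, f"{when:%Y%m%d}-{seq:04d}", title]
    if version is not None:
        parts.append(f"{version:04d}")  # optional, zero-padded version
    if notes:
        parts.append("(" + ", ".join(notes) + ")")  # optional notes
    return " ".join(parts) + "." + ext.lstrip(".")


print(build_filename("Digital Preservation", date(2012, 1, 24), 1,
                     "Why is File Naming Important", "avi", notes=["MP4", "HQ"]))
# → Digital Preservation 20120124-0001 Why is File Naming Important (MP4, HQ).avi
```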

    Example:

    The four files containing your video presentation could be named like this.

    Digital Preservation 20120124-0001 Why is File Naming Important (MP4, HQ).avi

    Digital Preservation 20120124-0002 How to Change a File Name (MP4, HQ).avi

    Digital Preservation 20120124-0003 What Not to do When Naming Files (MP4, HQ).avi

    Digital Preservation 20120124-0004 Best Practices for File Naming (MP4, HQ).avi
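    Names built this way can be read back into their fields by a program. A minimal Python sketch using a regular expression for the space-separated variant; the pattern and `parse_filename` are hypothetical, not part of any standard. One ambiguity worth noting: a title that ends in a space followed by four digits, such as a year, will be read as a version number.

```python
import re

# Hypothetical pattern: Category yyyymmdd-ssss Title [vvvv] [(notes)].ext
FILENAME = re.compile(
    r"^(?P<category>.+?) "
    r"(?P<date>\d{8})-(?P<seq>\d{4}) "
    r"(?P<title>.+?)"
    r"(?: (?P<version>\d{4}))?"
    r"(?: \((?P<notes>[^()]*)\))?"
    r"\.(?P<ext>[^.]+)$"
)


def parse_filename(name: str):
    """Return a dict of fields, or None if the name does not follow the format."""
    match = FILENAME.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    # Split "MP4, HQ" into ["MP4", "HQ"]; a missing notes group becomes [].
    raw_notes = fields["notes"]
    fields["notes"] = [n.strip() for n in raw_notes.split(",")] if raw_notes else []
    return fields


print(parse_filename(
    "Digital Preservation 20120124-0003 What Not to do When Naming Files (MP4, HQ).avi"))
```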

    I think file naming is important and that it's everyone's job to do it right. It's possible to write programs to deal with variations in format, but it's impossible to divine information that isn't provided.

    Hope you find this useful.
