The following is a guest post by Nicole Contaxis, National Digital Stewardship Resident at the National Library of Medicine. She participates in the NDSR-DC cohort.
The National Library of Medicine (NLM) has a fifty-year tradition of developing software in-house for its own use and for the use of its patrons. As part of the National Digital Stewardship Residency (NDSR) program, I am currently researching this history and devising a pilot preservation strategy (PDF) for NLM-developed software found in NLM’s archives and the offices of staff members.
The first step of this project was to create an inventory of these software programs, including those that have been located as executable files or source code and those that have not. The inventory acts as a checklist of software programs that need to be located and preserved, and as a reference for understanding NLM’s history of software development. Unlike a catalog, it is an informal list intended for internal use. Creating this inventory, however, became a complicated process that highlighted some of the issues facing archivists working with software. Perhaps the biggest issue is deciding which software programs deserve their own inventory record when each program works cooperatively within a complex computing environment.
Before I began the inventory, I knew I would run into significant issues when choosing what data elements to include in an inventory record. While an inventory of books or papers can rely on pre-existing understandings of the material (e.g. “One letter, five pages long” or “one copy of Do Androids Dream of Electric Sheep”), it is difficult to create a software inventory because of how widely software structures can vary. Because NLM has produced software from the 1960s to the present day, an inventory would need to accommodate software developed for batch-processing computers as well as software developed for mobile phones. While designing the inventory record was a challenge, it was not the only hurdle of the inventory process.
As I began to populate the inventory records, I was surprised to realize that it was not clear what did and what did not deserve its own inventory record. What constitutes a separate and individual piece of software within a complex computing system is not straightforward, especially when adequate documentation may not exist.
Here is an example to illustrate this challenge. Grateful Med, one of NLM’s hallmark software programs introduced in 1986, was an end-user-friendly search system that allowed physicians and nurses to search NLM’s bibliographic data to help perform research and treat patients. It was notable at the time because it allowed end users to search the data themselves rather than relying on computing specialists, and because it truly considered the user’s point of view and experience. A new version of Grateful Med was created each year with updated vocabularies and new features.
Coach Metathesaurus was a software program that was developed in 1991, in part, to serve as one of these new features of Grateful Med. Designed to assist end-users with controlled medical vocabularies, it was meant to hook into Grateful Med seamlessly and provide assistance to the user when the user’s search queries returned inadequate responses. Although the piece of software was fully developed and tested in NLM’s Reading Room, it was not implemented across all versions of Grateful Med.
Here, the question becomes: does Coach Metathesaurus deserve its own inventory record? In this instance, I decided that it does. Because it was not implemented across all versions of Grateful Med, I could not confirm that the functionality and history of Coach Metathesaurus would be included in a preserved version of Grateful Med, and it would also be unlikely that anyone would know to search for it.
However, what if Coach Metathesaurus had been implemented in all versions of Grateful Med? In this scenario, would Coach Metathesaurus deserve its own inventory record? This line of questioning begins to show how difficult it is to delineate individual pieces of software within a complex system, which is one of the major obstacles to performing an accurate software inventory.
As I considered this question, both for Coach Metathesaurus and for similar software programs, I established two basic guidelines. The first is to ask what information may be lost or gained by creating a new inventory record for this piece of software. If, as with Coach Metathesaurus, information or functionality would be lost without an additional inventory record, I would make a new record. On the other hand, if the information or functionality could be included in a larger record that adequately reflected the history of that piece of software, it would be included in the larger record.
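As a rough illustration only, the first guideline can be sketched as a small decision function. This is not how the NLM inventory is actually recorded; the class name, field names, and function below are all hypothetical, and real decisions are made case by case rather than from two yes/no flags.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """A piece of software being considered for its own inventory record.

    All field names are hypothetical and exist only for this sketch.
    """
    name: str
    parent: Optional[str]            # larger program it hooks into, if any
    in_all_parent_versions: bool     # present in every version of the parent?
    covered_by_parent_record: bool   # would the parent's record reflect its history?


def needs_own_record(candidate: Candidate) -> bool:
    """Apply the first guideline: create a record if information would be lost."""
    if candidate.parent is None:
        # Standalone software has no larger record to fold into.
        return True
    # Information is preserved only if the component ships with every version
    # of the parent AND the parent's record adequately describes it.
    preserved_by_parent = (
        candidate.in_all_parent_versions and candidate.covered_by_parent_record
    )
    return not preserved_by_parent


# Coach Metathesaurus was not implemented across all versions of Grateful Med,
# so a preserved copy of Grateful Med might not include it.
coach = Candidate(
    name="Coach Metathesaurus",
    parent="Grateful Med",
    in_all_parent_versions=False,
    covered_by_parent_record=False,
)
print(needs_own_record(coach))
```

In this simplified form, the hypothetical scenario raised above (a component implemented in every version of its parent) would only be folded into the larger record if that record also captured its history and functionality.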
The second guideline is less practical but still quite important: when deciding if something deserves its own inventory record, it is helpful to consider the ways in which people experienced that software. For the sake of these decisions, I separated these experiences into roughly two categories: the experiences of the development team and the experiences of the user community. Although experiences within both of these categories can vary wildly, the categories remain helpful when considering what information developers can access and what information users can access. For example, a user would not have known that Coach Metathesaurus was a separate piece of software performing different functions, but a developer would have.
Contemporaneous documentation is necessary when attempting to consider the developer’s and user’s experience of a piece of software. There was clear documentation about how to use Grateful Med, as it was built for external users, rather than for in-house use. However, documentation of the development process itself can be difficult to locate because internal communications, like emails and memos, may be long lost. This lack of documentation is exacerbated when software is developed for internal use rather than external. As such, the archivist working with software intended for internal use needs to perform a significant amount of intellectual labor in order to document the how and why of software development, if that knowledge itself has not already been completely lost to time.
Describing software and drawing boundaries around what does and what does not constitute an individual piece of software is not an easy endeavor. At this point at NLM, decisions about which pieces of software deserve their own inventory record are made on a case-by-case basis. These decisions are made with respect to the developer’s experience of the product, the user’s experience of the product, the available documentation, and the practical needs of NLM. While these guidelines provide a useful roadmap, they require a significant amount of intellectual labor on the archivist’s part, resulting in complex answers. Some of the major obstacles hindering software archival practice lie in creating boundaries between individual pieces of software within computing systems and in respecting both the development and use of software within the description process. Thankfully, at NLM, we’ve been able to recognize these concerns, making the inventory process of our in-house developed software much easier to understand.
Updated 1/19/16 to fix typos.