This is the first in a series of articles that takes a closer look at the National Digital Stewardship Alliance.
The National Digital Stewardship Alliance is rooted in the work of the National Digital Information Infrastructure and Preservation Program. The NDSA is open to any organization committed to digital preservation, and each of its diverse members has an equal say in the decisions that affect the outcome of the NDSA’s work. NDSA members participate in one or more of five working groups; today’s post will look at the Content Working Group.
Simply put, the Content Working Group is attempting to take on all of the digital stuff that currently exists.
The scope of the group’s mission is almost inconceivable. To help understand how it is taking shape, I asked several members of the Content Working Group to describe their understanding of its purpose and activities.
Erik Rau, of the Hagley Museum and Library, said, “The group generally has to do with having a conversation about what institutions need to be aware of when it comes to starting their own digital repositories.”
Jonathan McGlone, of the University of Michigan Library, said, “The group is helping facilitate that conversation by starting to develop some criteria and categories for digital preservation.”
Jared Lyle, of the University of Michigan/ICPSR/Data-PASS, said, “This alliance consists of disparate organizations, some of whom work with very different data and very different problems. But in the end we all have very similar overarching themes and problems that we’re dealing with.”
The Content Working Group consists of over 80 members who are surveying the wilderness of digital files and trying to define what those files are, how they are used, who uses them and how they can be preserved and kept accessible. Members acknowledge the work that has gone before them, such as the Sustainability of Digital Formats, the Federal Agencies Digitization Guidelines Initiative and the Unified Digital Format Registry, to name a few, and they are trying to determine what still needs to be addressed regarding the preservation of digital content.
The Library of Congress’s Abbie Grotke, who, with Cathy Hartman of the University of North Texas, co-chairs the Content Working Group, said, “Cathy and I have tried as best we could to organize the members so they could share experiences and best practices and document things that were at-risk in their content areas. Smaller teams can focus better.”
Most of the members work within their area of expertise, so the group is subdivided into teams, each with a different area of focus, such as Government, Geospatial, Cultural Heritage, Social Sciences and so on. It’s difficult to isolate topical elements in the digital-preservation ecosystem so cleanly, though, so naturally there are overlapping areas. The group is still sorting that out.
Grotke emphasized that the current teams are the best way to organize current group members and, as membership changes from year to year, new teams may get added or current ones dissolved. Also, some teams are addressing new subjects while others are building upon established work. “The geospatial team already has a community formed around it based on the NDIIPP partners,” said Grotke. “So this work is just an extension of the NDIIPP partners’ work.”
Christie Moffatt, of the National Library of Medicine, explains the group’s goals this way. “Once we identify the processes by which digital items are created and managed, we can figure out how we might intervene at different points to engage in the preservation of these resources,” said Moffatt. “We can let people know what has been preserved, what content is most at risk and what can be done to save it all so that people in the future – researchers – can look back and be able to see it.”
Brett Abrams, from the National Archives and Records Administration, is facilitator of the Geospatial team. He suggested that one project his team could work on is identifying data. “There might be a variety of materials out there,” he said. “Even from the 70s and 80s, there might be ASCII flat-files or other kinds of forms that haven’t been saved. Or they might be found and we need to figure out what we can do with them to rescue that material.”
Lyle facilitates the Social Sciences team, which has discussed how to provide guidance to others regarding the time frame for preserving data. While there are many important data collections in the social sciences that should be preserved for the long term, lots of other data, especially now with the proliferation and ease of data collection, isn’t meant to be kept forever. “That’s something that’s relatively new and I don’t think it’s been adequately addressed in the profession and the literature,” said Lyle.
As an archivist, Lyle practices selection and appraisal. He said that as the volume and variety of digital data grow, we should continue to work towards actively appraising it for its long-term value. As an example, he referred to appraisal and selection by federal agencies. “From what I understand, most Federal records are not selected for long-term preservation. So you have retention schedules. And only a small but key portion of the Federal record is preserved for posterity. That’s the portion we deem ‘valuable.’ That doesn’t mean that not all data will be preserved. But we should have a tiered appraisal and selection approach.”
In talking with members of the Content Working Group, the words “history” and “research” come up a lot, so it’s clear who their target stakeholders are. McGlone said, “We’re trying to preserve content both for its own inherent value and because it could be used by future researchers, especially historians.”
Moffatt is the facilitator of the Science, Mathematics, Technology and Medicine team and her specific area of interest is how information, experiences and research in health and medicine are communicated online. Moffatt and her NLM colleagues are conducting a pilot project to archive doctor and patient blogs. Moffatt said, “At the NLM, for example, we’re interested in blogs that show how people participate in their own health in terms of researching medical conditions and bringing this information to the attention of their physicians. So there are the official publications of scientific and medical research, the behind-the-scenes online discussions, notebooks and blogs about how the scientists are doing their work. And then there are non-scientists blogging about science and medicine in their lives. Our Content team has tried to identify content that falls within these areas.”
Grotke is part of the News, Media and Journalism team. She said, “We started to identify what the at-risk content was. ePrints, citizen journalism, RSS feeds. We came up with a whole list of things that we don’t think are currently being preserved in any formalized way.”
Rau is facilitator of the Cultural Heritage team. He said that many of the issues that he deals with involve the human element of digital preservation and the nuances of each situation. “One of the members on the Cultural Heritage Team is involved in an oral history project with Native American groups,” said Rau. “They’re wary of us, naturally. There’s a history of exploitation regarding Native Americans and their cultural legacy. So how do you build trust? And if we do preserve their stuff digitally, who gets access to it? Everyone? Just members of the tribe?”
Glen McAninch, of the Kentucky Department for Libraries and Archives, is a co-facilitator of the Government Content team and he tells a story similar to Rau’s, of negotiating to preserve content. “Some local Kentucky people are protective of their geospatial records because it’s so hard for them to maintain their systems, financially, that they will sell the information to try to keep their program going,” said McAninch. “So the KDLA provides backup to their geospatial records with the understanding that we wouldn’t be the access point for those records; we would only be the preservation point.” Both Rau’s and McAninch’s experiences, and their approaches to policy, may naturally spill over into the Content Working Group as the need arises.
Abrams says that digital geospatial content shouldn’t be too difficult to find, but reaching consensus on preservation among stakeholders is another matter. Abrams said, “The producers are generally government institutions or people who are working as contractors for government agencies, although obviously the Googles of the world — and Microsoft — are coming along, creating imagery and other stuff. The Content Working Group has to be more clear about what type of technical requirements we need before we talk to them. How do we save geospatial files and how can we get some of the major companies that create these formats to work with governments that are interested in more open standards and versions and develop an archivable version of some of these formats? But we’re not ready for discussions with commercial organizations yet.”
The Content Working Group intends to frame the results of its work within a series of case studies: scenarios that describe a digital preservation situation that others could identify with. For example: “A guy walks into a bar with a duck under his arm and orders a martini. The bartender mixes the drink and rings it up, which transmits financial and inventory data to databases. The bartender says to the man, ‘We don’t get many ducks in here.’ …” And so on.
McGlone said, “We’ll come up with case studies that people in other institutions can learn from or apply to a project they’re working on. Maybe it’s digital art that we do a case study on and then artists or museums or other types of institutions working with digital art can use the case study as a way of navigating the more complex aspects of digital preservation.”
Grotke added that the case studies will help document types of content and what the group might do to reach out to stakeholders, such as researchers, content creators and content preservers. “We hope to have case studies from each of the teams by the end of the year,” said Grotke. “Then we’ll start advertising those and then talking about them more publicly. And draw more people into the conversation.”
It would seem that the Content Working Group could also establish some best practices, but that turns out to be not so easy. McAninch said, “Best practice depends on your institution. What you can afford, what you can muster and what type of institution you have. And some organizations have their own successful way of doing things. In the GeoMap project, for example, there were four states. One was a state library; two were archival institutions and geospatial entities independent of the archive. And each state had a different scenario. So when it came to ‘best practices’ it was a sort of down-the-middle thing. For example, North Carolina had widely dispersed geospatial records. And Kentucky had very centralized records. And Montana housed the state library and geospatial people together, so they had just one unit, one institution; Kentucky, Utah and North Carolina had two separate institutions. One size doesn’t necessarily fit all.”
Given the scope of the Content Working Group’s mission and the amount of time each member could realistically devote to it, I asked several members what outcomes they’d like to see as a result of their efforts. McGlone said, “I’d like to see the Arts & Humanities team develop a case study and make it available and then actually hear from a group or someone who has used that case study and said it has benefited them.”
Abrams said, “I would like the group to keep trying to find datasets that have not been saved, and keep working towards bringing them into repositories so they can be saved.”
And Rau said, “What I would like to see are some pretty clear guidelines for any new repository, no matter how large or how small, to be able to figure out how they want to preserve either digitized material that they already own or born digital materials that they collect. What to do with it. How to preserve it. How to make it accessible in the future. Those are the things that, if they come to pass, I think will mean success for the Content Working Group.”