When Andrea Goethals wants to escape the demands of her software engineering work at Harvard University library, she heads to the mountains of Maine. But not for pampered leisure. She and her husband volunteer with the Appalachian Mountain Club, maintaining a trail they’ve both adopted. They purge debris, drain water and remove massive obstacles. On a recent weekend they cleared 16 fallen trees.
Even though Goethals’ trail preservation work is as physically difficult as her digital preservation work is mentally difficult, her intention for both is the same: creating access and utility. It’s a theme that threads through her career.
Goethals, digital preservation and repository services manager at the Harvard University library, studied architecture as an undergraduate. For a while, she worked in sustainable construction, planning for the long-term usefulness of buildings by considering their social, economic and environmental implications. Her interest shifted to urban and regional planning and, while studying at the University of Florida, she had to use the university’s Geographic Information Systems lab. “That was my first introduction to using technology to solve human problems,” she said. “And from there I was drawn more and more towards technology.”
She credits being in the right place at the right time for landing a job at the Florida Center for Library Automation. The initial job interview was also a pivotal career event for her. Goethals said, “The interview became a brainstorming session and I learned about the problems of digital preservation and obsolete formats. I found something I could apply my skills and interests to.”
She analyzed file formats and created action plans for FCLA’s Florida Digital Archive. Meanwhile, the Library of Congress was assessing formats and defining format criteria. She said that other institutions were researching formats as well. “But there weren’t a lot of actual practical plans being made around formats at that time,” said Goethals. “It was more theoretical.” She also helped design and develop FCLA’s repository software, called Dark Archive in the Sunshine State or DAITSS, which helped prepare her for her next great challenge.
In 2005, Goethals accepted a position at Harvard University’s Office for Information Systems, a core division of the Harvard library that provides technology for the central catalog, eResources, main preservation repository and more.
Her first major project with OIS was working on the Digital Repository Service. The basic framework was the same as Florida’s DAITSS, so DRS wasn’t a huge skill leap for Goethals. But the content was modeled differently from DAITSS, so it was an opportunity for her to learn more about digital preservation and content modeling from a different perspective.
By the time Goethals got to Harvard, though, DRS was straining with age and OIS was planning the next version, DRS 2. Her challenge was daunting: migrating not just files but an entire repository system.
Goethals and her team spent the rest of the decade planning DRS 2, which they expect to be ready in 2012. “This version has a lot more collection management and preservation functions than the original,” she said. “The underlying data model has changed a lot from DRS. Almost all of our metadata schemas have changed and the way we package metadata has changed completely.”
She is also concerned about — and preparing for — how the change will affect DRS’s users. “We’re kind of migrating the people in their understanding of the content too,” said Goethals. So, to ease the transition, OIS is planning for the DRS 2 rollout and for helping collection managers at the university understand how to use it.
Harvard has several repositories dedicated to specific purposes. But a lot of other valuable content is drifting loose around the university, such as faculty research, student work and drives full of unprocessed data. Much of it is on unstable media or in danger of being lost during a move. In response, Goethals co-created Zone 1, a catch-all “rescue repository” for homeless content. To enable access by a wide range of users, Zone 1 will deliberately be easy to use and will require only a small amount of metadata. Zone 1 users could then evaluate the content and have it moved to the appropriate long-term preservation repository.
Goethals also worked on the Global Digital Format Registry and the more-inclusive Unified Digital Format Registry. The UDFR came about after OIS completed the GDFR prototype and did some needs assessment among digital preservationists worldwide. The U.K. National Archives had already developed a format registry, PRONOM, which was in wide use. The community of PRONOM users was also interested in the GDFR but they didn’t want to deal with multiple registries. So Harvard and the U.K. National Archives combined their efforts, with some help from the Library of Congress and NARA, and merged the GDFR and PRONOM into a single, shared international formats registry: the UDFR. It is being developed by NDIIPP and NDSA partner the California Digital Library and is expected to be ready in January 2012.
Her digital preservation work extends to websites. Goethals helped create Harvard’s Web Archive Collection Service (WAX), which features browsable public collections and archives of Harvard’s vast website. OIS does its own crawls using Heritrix, though in the future some collections will be crawled by third parties.
Goethals represents Harvard University in the International Internet Preservation Consortium and she is part of the IIPC’s Preservation Working Group. One of the tools the group is working on will help users analyze website preservation risks and compare that information with how other institutions assess risk. Users can learn from each other and refine the tool accordingly.
She has other major projects in the works, but she ruminates about all of her projects collectively as if they were modules in a larger system, just as an architect ponders a building’s structure, exterior and functions simultaneously. She constantly mulls over how to make all the pieces of digital preservation work together. “How do we interconnect all these tools, the format registries, the format identification tools, the migration tools that we need?” she said. “How do we build the infrastructure?”
Someday, on a remote mountain trail in Maine, Goethals will have an “Aha!” moment and the digital preservation community will be better for it.