On Sunday, March 2, I had the opportunity to attend an OpenNews Hack Day event at the Newseum in Washington, DC, sponsored by Knight-Mozilla OpenNews, PopUp Archive, and the Newseum. The event was held in conjunction with the NICAR (National Institute for Computer-Assisted Reporting) conference on working with datasets and developing interactive applications in journalism.
This was not a hackathon, but what they termed a “designathon,” where the goal was to brainstorm about end-to-end approaches for archiving and preserving data journalism projects. The problem of disappearing applications is very well outlined in blog posts by Jacob Harris and Matt Waite, which are part of “The Source Guide to the Care and Feeding of News Apps.” From the introduction to the Guide:
“Any news app that relies on live or updated data needs to be built to handle change gracefully. Even relatively simple interactive features often require special care when it comes time to archive them in a useful way. From launch to retirement and from timeliness to traffic management, we offer a collection of articles that will help you keep your projects happy and healthy until it’s time to say goodbye.”
For some, awareness of the need for digital preservation in this community came from a desire to participate in a wonderful Tumblr called “News Nerd First Projects.” Developers wanted to share their earliest works through this collaborative effort — whether to brag or admit to some embarrassment — and many discovered that their work was missing from the web or still online but irreparably broken. Many were lucky if they had screenshots to document their work. Some found static remnants through the Internet Archive but nothing more.
The event brought together journalists, researchers, software developers and archivists. The group of about 50 attendees broke out into sub-groups to discuss topics including best practices for coding, documenting and packaging up apps; saving and documenting the interactive experience; and documenting cultural context. Not too surprisingly, a lot of the conversation centered on best practices for coding, metadata, documentation, packaging and dealing with external dependencies.
There was a discussion about web harvesting, which captures static snapshots of rendered data and the design but not the interaction or the underlying data. Packaging up the underlying databases and tables captures the vital data so that it can be used for research, but loses the design and the interaction. Packaging up the app and the tables together with a documented environment means that it might run again, perhaps in an emulated environment, but if the app requires interactions with open or commercial external web service dependencies, such as for geolocation or map rendering, that functionality is likely lost. Finding the balance between preserving the data and preserving the interactivity is a difficult challenge.
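To make the "package the app and the tables together with a documented environment" idea a little more concrete, here is a minimal sketch in Python. It is only an illustration, not something presented at the event: the function names, the manifest fields and the idea of listing external services by hand are all my own assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(app_dir, environment, external_services):
    """Describe a news app directory for archiving (hypothetical layout).

    app_dir: folder holding the app's code and exported data tables.
    environment: free-form notes on what the app needs to run.
    external_services: outside dependencies (for example map rendering)
        that cannot be packaged and may stop working later.
    """
    root = Path(app_dir)
    files = [
        {
            "path": p.relative_to(root).as_posix(),
            "bytes": p.stat().st_size,
            "sha256": sha256_of(p),
        }
        for p in sorted(root.rglob("*"))
        if p.is_file()
    ]
    return {
        "files": files,
        "environment": environment,
        "external_services": external_services,
    }
```

The resulting dictionary could be saved alongside the package with `json.dumps(manifest, indent=2)`. The checksums let a future archivist confirm the files are intact, while the list of external services records which parts of the interactive experience are likely to be lost even if the app itself runs again.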
All in all, it’s early days for the conversation in this community, but awareness of the need for digital preservation has already been established and next steps are planned. I am looking forward to seeing this community move forward in its efforts to keep digital news sources alive.