This is Part Two of a two-part interview. Part One ran on Monday Dec. 10, 2012.
In this installment of the Insights Interviews series, a project of the National Digital Stewardship Alliance Innovation Working Group, I’m talking to Dirk von Suchodoletz from the Department of Computer Science at the University of Freiburg, Germany, and a representative to the Open Planets Foundation. He visited the Library in October of 2012 to give a public presentation of his work on emulation, and we thought it would be useful to get him to discuss it in even more detail for the blog.
Are there other pieces of infrastructure that need to be in place for emulation to work?
Emulation strategies depend on a couple of components, such as the hardware emulators themselves and a well-managed software archive. Fortunately, these efforts can be shared among the relevant memory institutions, as they face similar challenges.
If running the actual emulator and reproducing the original environment can be separated from the user interface, a specifically configured environment with remote access can be envisioned. Additionally, services could be distributed over several providers or institutions to enable specialization, following the division-of-labor principle. Remote-access applications and protocols need to be defined to abstract and translate the actual capabilities of the chosen local platform to the remotely running service interfaces.
Optimally, the same base principles are valid for accessing both recent and obsolete environments; emulated original environments then “blend in” seamlessly with current services. These considerations could lead to a solution that provides seamless access to a variety of older software: a game on a 1985 home computer running in the Multiple Emulator Super System (MESS) emulator; mid-2000s Linux, Windows or Sun Solaris desktops; the mid-90s Apple Macintosh PowerPC architecture; and even some modern 3D CAD applications, all through a single application serving as a front-end interface to the emulation services. This application also has to adapt to different input/output methods.
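To make the idea of a single front-end concrete, here is a minimal sketch in Python. All names (the back-end registry, the platform identifiers, the service URL) are invented for illustration; a real service would actually launch the emulator and expose its display over a remote protocol such as VNC.

```python
from dataclasses import dataclass

@dataclass
class EmulatorBackend:
    name: str        # e.g. "MESS", "QEMU" -- real emulators, hypothetical registry
    platforms: set   # platform identifiers this emulator can reproduce

# Hypothetical registry mapping emulators to the environments they cover.
BACKENDS = [
    EmulatorBackend("MESS", {"home-computer-1985"}),
    EmulatorBackend("QEMU", {"linux-2005", "windows-2005", "solaris-2005"}),
    EmulatorBackend("SheepShaver", {"mac-ppc-1995"}),
]

def start_session(platform: str) -> dict:
    """Pick a back-end for the requested platform and return the remote
    endpoint the single front-end application would connect to."""
    for backend in BACKENDS:
        if platform in backend.platforms:
            # A real service would launch the emulator here and hand back
            # live connection details for its remote display.
            return {"emulator": backend.name,
                    "display": f"vnc://emulation-service.example/{platform}"}
    raise ValueError(f"no emulator available for {platform}")
```

The point of the dispatcher is that the front-end never needs to know how each emulator works; it only translates a platform request into a remote display it can render with one set of input/output methods.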
When talking of distributed services, a proper user authentication, authorization and accounting service needs to be provided. Such a service can help protect not only the intellectual-property rights attached to certain artifacts, but also deal with the privacy-related problems of access-restricted objects and help implement a business model for emulation services.
You just did a full day workshop on emulation at iPres 2012 in Toronto in early October. How did that go?
The workshop was actually the first full-day event on emulation at the annual iPres conference. It brought together relevant practitioners and actors in the digital preservation community to discuss new directions for digital preservation and access challenges. Nearly fifty people discussed the challenges they are facing in their institutions or presented possible solutions.
The legal deposit requirements that many countries place on their national libraries bring in a range of non-traditional artifacts such as multimedia objects and computer software. The Danish Legal Deposit Act, for example, was changed in 1998 to include digital materials, including computer games and interactive software on physical media. The game data gathered from web harvesting comprises the most common file types for this kind of material: Flash, Java and, more recently, Unity3d. The digital revolution changes the workflows in governmental departments and thus directly affects the type of material received by the mandated national archives. For example, the Austrian State Archives gets data stored in and created by online applications such as the tax reporting system. Not only public online applications, but also files created in Microsoft Access applications, complex Excel sheets and other so-called “end-user programming” software produce data that is not easily migratable.
A domain with quite different requirements compared to archives and libraries is digital art. As the media and platforms digital artworks were made on decay, new ways to preserve access are needed. Digital art often cannot be easily migrated, as it depends on specific software and hardware setups. Solving emulation for art and games should solve most other emulation cases, or as a museum representative put it: “Digital art and games are the e-books of tomorrow.” Current e-book standards already allow much more than would fit into plain PDF/A.
Migration as a strategy has been pushed to its limits; emulation extends them significantly. Big institutions, however, may not have hit their crisis point yet and thus haven’t started looking in the direction of emulation. Database preservation in particular is often done poorly and cannot cope well with purpose-made business logic built into the interfaces. Emulation is, at the least, a good fall-back solution for objects which failed to migrate. The discussion of applicable strategies to preserve object authenticity can be advanced through significant properties.
Tell us about the current projects you’re working on, including the Baden-Württemberg Functional Longterm Archiving and Access (bwFLA) project.
After a series of more theoretical and prototypical research efforts and implementations, the bwFLA project now tries to practically implement preservation and access workflows. If a digital artifact or a whole computer system becomes subject to digital preservation, a defined workflow is required to support the preservation of the object’s original context, i.e. its rendering environment.
The workflow draws on the user’s knowledge to identify the necessary components of the object’s rendering environment, so that the environment is complete and there are no missing dependencies for the chosen configuration and the digital object’s contextual environment. BwFLA implements both ingest workflows, in which the ingesting user has direct control, and access workflows, which make use of the collected knowledge and the additionally preserved software components to reproduce the artifact’s environment.
The bwFLA preservation approach defines a user-centric workflow which makes use of current user knowledge and thus is able to provide certain guarantees regarding completeness, rendering quality and non-conflicting dependencies. Furthermore, through a defined framework all interactions between the user and the computer environment can be observed and recorded. This helps develop more efficient instantiations of reproduction workflows while also enabling us to gather and preserve more detailed information on how certain computer environments and their software components are used.
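The completeness guarantee described above boils down to a check against the software archive. The following sketch is invented for illustration (the archive contents and function names are not from bwFLA): the user names the components an object’s rendering environment needs, and the ingest workflow reports anything the archive cannot supply.

```python
# Hypothetical contents of the managed software archive.
SOFTWARE_ARCHIVE = {
    "windows-3.1",
    "powerpoint-4.0",
    "truetype-core-fonts",
    "quicktime-2.0",
}

def missing_dependencies(required: set[str]) -> set[str]:
    """Return the components of a rendering environment that the
    software archive cannot provide."""
    return required - SOFTWARE_ARCHIVE
```

An ingest workflow would loop on this check, asking the user for substitutes or adding new archive entries, until the set is empty and the rendering environment is known to be complete.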
While an emulation approach has technical limitations (e.g. DRM, software licenses bound to external servers or hardware dongles, and external data sources), the proposed workflow is able to identify issues and indicate risks related to long-term preservation. BwFLA has now defined work processes and associated workflows and laid the groundwork for a couple of building blocks, including Emulation-as-a-Service components.
We see digital preservation and access as an endeavor that should be coordinated among a federation of the relevant memory institutions and third-party service providers. Today’s IT developments offer the chance for better national and international collaboration if the proper approaches are used. The actual costs can be kept moderate if the major memory institutions pool their efforts to sponsor emulator development communities.
Plus, current developments in general IT, for both commercial and end users, open the chance to integrate digital long-term preservation more tightly into existing systems. The emergence of the cloud paradigm re-centralizes services, and end users interact with them remotely through standardized web-client applications on their various devices. This offers the chance to use many of the same concepts and methods to access obsolete computer environments.
In order to provide a large variety of user-friendly remote emulation services, especially ones with authentic performance and user experience, a distributed system model and architecture suitable to run as a cloud service needs to be agreed upon and developed. Shifting the non-trivial task of emulating obsolete software environments from the end user to specialized providers can help simplify digital preservation and access strategies. Besides offering their users better access to their holdings, libraries and archives may gain new business opportunities by offering services to third parties. Emulation-as-a-Service can help to fill the gap between the successful demonstration of emulation as a long-term access strategy and its perceived availability and usability.
So bwFLA is preservation meets machine learning? The user who knows how to use the system has their interaction recorded. That seems a good solution to the problem of accessing an object in an old emulated environment where how to operate the software is not obvious and unlike modern UI paradigms (or the documentation is missing or sparse). Could you play the macro recording back to people wanting to access the object to show them how it was used?
Machine learning might be a bit too big a term for that. But the idea is that a process “looks over the shoulder of a user as he does things” and records the interaction. That means recording everything the user sends to the machine and everything that appears on the screen.
My colleagues were using this to implement migration-by-emulation scenarios in GUI environments (http://hdl.handle.net/10760/16263). This approach avoids the often impossible alteration and adaptation of outdated software to present-day environments. A virtual machine runs within the host environment and contains the selected original system environment suitable for handling a certain type of digital object. The original system environment is either reproduced from original software stored in the software archive or cloned from a prototypical original system. The process is recorded for a certain prototypical object and then played back on the objects which need to be migrated. The idea behind this approach was to use original applications and environments to deal with, e.g., PowerPoint 4.0 files, which cannot be handled in modern environments, and produce PDFs from them.
The same approach can be used to record the installation steps that reproduce an original environment consisting of the operating system, applications and additional components like font sets or multimedia codecs. Such a recording contains all the interactions necessary to reproduce a certain software setup. Another application could be an “assisted create view”: this would spare the user all the interaction required before the artifact of interest can be rendered, such as starting the appropriate application, configuring it and rendering the artifact on the screen. From that point on the user can take over and simply use the controls to navigate the artifact.
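The record-and-replay idea described above can be sketched in a few lines. Everything here is illustrative (the class, the event format and the example macro are invented, not the bwFLA implementation): a recorder captures timestamped input events, and a replay function feeds them back into an emulator session in place of the interactive user.

```python
import json

class InteractionRecorder:
    """Records everything a user sends to the emulated machine so the
    trace can later be replayed against other objects of the same type."""
    def __init__(self):
        self.events = []

    def capture(self, ms_offset, kind, payload):
        # kind: e.g. "key" or "mouse"; payload: the raw input data
        self.events.append({"t": ms_offset, "kind": kind, "payload": payload})

    def to_json(self):
        return json.dumps(self.events)

def replay(trace_json, send_to_emulator):
    """Feed a recorded trace back into an emulator session, standing in
    for the interactive user."""
    for event in json.loads(trace_json):
        send_to_emulator(event["kind"], event["payload"])

# Example: a recorded "open file, export as PDF" macro for an old
# presentation program (event details are invented for illustration).
rec = InteractionRecorder()
rec.capture(0,    "key", "ctrl+o")
rec.capture(800,  "key", "slides.ppt\n")
rec.capture(2500, "key", "alt+f")       # open the File menu
rec.capture(3000, "key", "export-pdf")

sent = []
replay(rec.to_json(), lambda kind, payload: sent.append((kind, payload)))
```

Because the trace is stored as plain JSON, the same recording can drive a batch migration of many objects or be played back slowly as the “assisted view” walkthrough mentioned above.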
To achieve such preservation or access scenarios without relying on user interaction, the user’s function needs to be replaced by a workflow execution engine (http://escholarship.org/uc/item/8jf067f6?query=Rechert). This requires appropriate interfaces for driving emulators remotely (http://drops.dagstuhl.de/opus/volltexte/2010/2771/pdf/10291.vonSuchodoletzDirk.Paper.2771.pdf). In contrast to simple command-line input-output migration tools, though, a migration-by-emulation service needs a more complex initial setup.