Visualizations and Digital Collections

The following is a guest post by Jefferson Bailey, Strategic Initiatives Manager at Metropolitan New York Library Council, National Digital Stewardship Alliance Innovation Working Group co-chair and a former Fellow in the Library of Congress’s Office of Strategic Initiatives.

An affordance is a characteristic of an object or thing that supports a specific activity. For example, paper documents have specific affordances, such as the ability to be held, to be folded without damage or information loss, and to clearly show signs of alteration and corruption. In fact, there is a whole research project, The Affordances of Paper, devoted to examining the reasons for paper’s increased use in the office environment even well into the digital era. An object’s affordances influence how users relate to that object and guide expectations of how the object will act and react in certain circumstances. For instance, we base preservation assessments of paper on our familiarity with its affordances, such as its tendency to become brittle over time, which the double fold test is designed to measure.

Digital materials also have their affordances. In a previous post, I talked about the characteristics of digital materials that make their preservation especially challenging. In this post, however, I want to highlight one affordance of digital materials that actually facilitates their use and preservation – specifically, their ability to be visualized in the aggregate in order to serve a variety of managerial, access, and risk assessment functions.

Kress Collection View (Scatterplot Display) as seen in Viewshare

Many of the posts on The Signal about the Library of Congress Viewshare platform have discussed the role of visualization in creating new modes of discovery for users of digital collections. Abbey Potter’s recent post offered a great round-up of recent Viewshare posts, papers, and presentations, many of them detailing how visualizations support new ways to use and understand digital collections. One affordance of digital objects is that they often have embedded, machine-generated metadata that can be used to enable new methods of analysis and understanding, such as representation through maps and timelines and other interfaces.

Visualization can enable collection managers to appraise, describe, and manage digital collections in ways not possible with physical records. Indeed, given the ever-increasing volume of material in born-digital archival collections, visualizations are increasingly a crucial tool in a variety of managerial functions for digital stewards, from analyzing directory contents prior to acquisition, to risk assessment, to visualizing contextual relations between collections. The projects described below offer an intriguing glimpse at the role that visualization may come to play in managing digitized and born-digital archival collections.

Distribution of risk (top), and the distribution of the predominant file category (bottom) in each directory of the Record Group. As featured in “Visualization for Archival Appraisal of Large Digital Collections” by Xu, Esteva, & Dott

At the Texas Advanced Computing Center of the University of Texas at Austin, Weijia Xu, Maria Esteva, and Jain Dott are using an extremely large collection of born-digital files (their test-bed collection contains over 60,000 directories containing over 1 million files) to explore incorporating visualizations into the process of archival appraisal. By examining directory hierarchies and file type distributions, along with other aggregate information, through visualization, they hope to enable collection managers to notice trends and spot outliers in the course of better understanding a digital collection’s composition. As they note when describing the project, because of the size of contemporary digital collections, “data has to be presented in ways that facilitate understanding and enrich the analysis.”
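To make the idea of aggregate directory information concrete, here is a minimal sketch in Python of the kind of summary such a visualization might be built on: the number of files and the predominant file category in each directory. This is not the method used by the Texas project. The extension-to-category mapping and the function names (`categorize`, `summarize_directories`) are hypothetical and chosen only for illustration.

```python
import os
from collections import Counter
from pathlib import Path

# Hypothetical mapping from file extension to a coarse category.
# A real appraisal workflow would likely rely on a format registry instead.
CATEGORY_BY_EXTENSION = {
    ".txt": "text", ".doc": "text", ".pdf": "text",
    ".jpg": "image", ".png": "image", ".tif": "image",
    ".csv": "data", ".xml": "data",
}


def categorize(filename):
    """Return the coarse category for a file name, or "other" if unknown."""
    extension = Path(filename).suffix.lower()
    return CATEGORY_BY_EXTENSION.get(extension, "other")


def summarize_directories(root):
    """Map each directory under root (relative path) to a tuple of
    (file count, predominant category). Directories that directly
    contain no files are skipped."""
    summary = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        if not filenames:
            continue
        counts = Counter(categorize(name) for name in filenames)
        predominant, _count = counts.most_common(1)[0]
        summary[os.path.relpath(dirpath, root)] = (len(filenames), predominant)
    return summary
```

A table like the one returned by `summarize_directories` could then be handed to any charting tool, so that a collection manager sees at a glance which directories are dominated by which kinds of files and which ones stand out as outliers.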

A project at the Old Dominion University Web Science and Digital Libraries Research Group has prototyped a number of visualizations for analyzing large collections of web archives. Visualizing these large data sets allows content stewards to gain a better understanding of the depth of web crawls, the frequency of page updates, and the temporal characteristics of certain collections. As the authors of one paper on the project explain, “we found that visualizations are very helpful in exploring digital collections and offering quick insight into the size, length, and content of the visualized collection.”

Other visualization projects include Tim Sherratt’s work with the National Archives of Australia and the digitized newspaper collections of Australia and New Zealand. Also, the Visible Archive project, headed by Mitchell Whitelaw, has explored the role of visualization in stewarding digital collections. The stunning Visible Archive Series Browser, which visualizes collection information and interconnections for the over 65,000 series in the National Archives of Australia, is one example of their work, which is oriented around the concept of “generous interfaces.” As Whitelaw describes, the idea of generous interfaces is “an argument that we can (and should) show more of these collections than the search box normally allows; and that there’s a zone between conventional web design and interactive data visualisation, where generous interfaces might happen.”

Visualizations of web archives, as featured in “Visualizing Digital Collections at Archive-It” by Padia & AlNoamany

Granted, some of these examples serve collection use and discovery more than management, and some may depend upon rich descriptive metadata that is human-supplied rather than machine-generated. But taken as a whole, these projects demonstrate how visualization of digitized and born-digital collections is emerging as an essential tool for understanding collection contents, managing collections over time, mitigating their preservation risks, and making them more available and usable now and into the future – all key responsibilities of digital stewardship.

 

2 Comments

  1. Caleb Waller
    February 5, 2013 at 11:56 pm

    Fantastic article!
    Well written.
    Very educational and insightful.

  2. Lev Manovich
    February 14, 2013 at 10:52 pm

    Excellent article!

    Would like to add a reference to the work of my lab on visualizing massive image and video collections:

    http://lab.softwarestudies.com/2008/09/cultural-analytics.html
