<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Signal: Digital Preservation</title>
	<atom:link href="http://blogs.loc.gov/digitalpreservation/feed/" rel="self" type="application/rss+xml" />
	<link>http://blogs.loc.gov/digitalpreservation</link>
	<description>The Signal: Digital Preservation</description>
	<lastBuildDate>Wed, 16 May 2012 12:11:18 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.1</generator>
		<item>
		<title>All Digital Objects are Born Digital Objects</title>
		<link>http://blogs.loc.gov/digitalpreservation/2012/05/all-digital-objects-are-born-digital-objects/</link>
		<comments>http://blogs.loc.gov/digitalpreservation/2012/05/all-digital-objects-are-born-digital-objects/#comments</comments>
		<pubDate>Tue, 15 May 2012 14:54:32 +0000</pubDate>
		<dc:creator>Trevor Owens</dc:creator>
				<category><![CDATA[Digital Content]]></category>
		<category><![CDATA[born digital]]></category>
		<category><![CDATA[digitized]]></category>
		<category><![CDATA[objects]]></category>

		<guid isPermaLink="false">http://blogs.loc.gov/digitalpreservation/?p=7625</guid>
		<description><![CDATA[Consider this digital photo I took of the face of the Albert Einstein Memorial outside the National Academy of Sciences. Although my photo tells us something about what the memorial looks like, I don&#8217;t think anyone would say that I “digitized” it.  We think about this kind of photo as a creative work (albeit not [...]]]></description>
			<content:encoded><![CDATA[<p><a  href="http://blogs.loc.gov/digitalpreservation/files/2012/05/einstein-monument.jpg"><img class="alignleft size-thumbnail wp-image-7626" src="http://blogs.loc.gov/digitalpreservation/files/2012/05/einstein-monument-150x150.jpg" alt="" width="150" height="150" /></a>Consider this digital photo I took of the face of the Albert Einstein Memorial outside the National Academy of Sciences. Although my photo tells us something about what the memorial looks like, I don&#8217;t think anyone would say that I “digitized” it.  We think about this kind of photo as a creative work (albeit not particularly creative) but it has a creator (in this case me). In contrast, when someone scans a document in a flatbed scanner, or takes a digital photo of a page in a book, we talk of digitizing the document or the book. We tend to think about those digital objects as surrogates for their physical counterparts. I&#8217;m increasingly thinking that this distinction between the born digital and the digitized does more harm than good.</p>
<p><strong>What the Distinction Between Digitized and Born Digital Tries to Capture</strong></p>
<p>Cultural heritage professionals often talk about “born digital” and “digitized” objects. In some respects this distinction captures meaningful differences. A digitized object exists to record and present characteristics of some physical object. In contrast, born digital objects began their existence as digital. In the case of digitized materials, we care about the fidelity of a digitized copy to an original. In contrast, born digital materials do not serve as surrogates for physical objects; these born digital objects are originals. That distinction should help place priority on the preservation of these digital originals. With this noted, the distinction between born digital and digitized objects can obfuscate as much as it illuminates.</p>
<p><strong>Digitization is Always the Creation of a Digital Object</strong></p>
<p>The idea of digitization obfuscates the fact that digitization is not a preservation act. Digitization is a creative act. What is the meaningful distinction between using a scanner to scan a document, taking a digital photo of a document and taking a photo of me holding that document? In the end, all of these create digital files, each of which has an author who made decisions about its composition. There is no large red button that says “digitize” on it. We make decisions about what significant properties we want to record from a physical object and we work to ensure that those properties are recorded in the newly created digital object. When we talk about the scanner “digitizing,” it&#8217;s all too easy to forget the history of the creation of the digital object, and to forget the range of individual and institutional authorial intentions that go into deciding what and how to digitize.</p>
<p><strong>When One Digitizes One Makes Authorial Decisions</strong></p>
<p>Like most words that end in –ation, digitization has that seductive quality of sounding like a trivial and straightforward process. The <a  title="Digitization Guidelines" href="http://www.digitizationguidelines.gov/">Federal Digitization Guidelines</a> are a great resource for helping to make decisions about what matters for a given digitization project. However, individuals and institutions always need to make authorial decisions. Although I work on digital preservation, I often find myself fielding digitization questions. After doing my due diligence to explain that <a  href="http://blogs.loc.gov/digitalpreservation/2011/07/digitization-is-different-than-digital-preservation-help-prevent-digital-orphans/">digital preservation and digitization are fundamentally different things</a>, I go on to help answer these digitization questions. Most questions are something like “what resolution should I scan at?” and my answer is always “It depends. Why are you scanning? What do you want to capture about these physical objects in the new digital objects you are going to create?” There isn’t a right way to digitize something. Instead, there are right ways to make sure that the traces of a particular physical object that you care about are visible in the newly created digital object.</p>
<p>You can say that you only care about the informational qualities of a book, but you still need to go about defining what exactly that information is. For example, one can analyze traces of use of texts by looking at patterns of dirt visible on high resolution scans (see Sarah Werner&#8217;s excellent post, <a  href="http://sarahwerner.net/blog/index.php/2012/04/where-material-book-culture-meets-digital-humanities/">where material culture meets the digital humanities</a>, for more on this dirt example). That dirt represents information that can be captured on scans. There is an inexhaustible amount of information in any physical object and it is up to the digitizer to decide what traces we want to make evident in the new digital object.</p>
<p><strong>Digital Preservation is a Consideration for All These Born Digital Objects</strong></p>
<p>If we want any of these born digital objects to stick around, the ones created on a flatbed scanner or the ones created with a digital camera, we need to be thinking about digital preservation. Beyond the fact that <a  href="http://blogs.loc.gov/digitalpreservation/2011/07/digitization-is-different-than-digital-preservation-help-prevent-digital-orphans/">digitization is not digital preservation</a>,  digitization always results in the creation of a new digital object. If we want to have any access to that new digital object in the future we need to be actively thinking about digital preservation.</p>
<p><strong>What Do You Think?</strong></p>
<p>Do you agree that all digital objects are born digital? Or do you think there are things I am missing in the value of the distinction between born digital and digitized? Let&#8217;s talk about it in the comments.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.loc.gov/digitalpreservation/2012/05/all-digital-objects-are-born-digital-objects/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>GeoMAPP and the Future of Digital Geospatial Preservation</title>
		<link>http://blogs.loc.gov/digitalpreservation/2012/05/geomapp-and-the-future-of-digital-geospatial-preservation/</link>
		<comments>http://blogs.loc.gov/digitalpreservation/2012/05/geomapp-and-the-future-of-digital-geospatial-preservation/#comments</comments>
		<pubDate>Mon, 14 May 2012 17:01:41 +0000</pubDate>
		<dc:creator>Butch Lazorchak</dc:creator>
				<category><![CDATA[Partners and Collaboration]]></category>
		<category><![CDATA[Publications and Resources]]></category>
		<category><![CDATA[Tools and Infrastructure]]></category>
		<category><![CDATA[business planning]]></category>
		<category><![CDATA[geoarchiving]]></category>
		<category><![CDATA[geomapp]]></category>
		<category><![CDATA[geospatial preservation]]></category>

		<guid isPermaLink="false">http://blogs.loc.gov/digitalpreservation/?p=7611</guid>
		<description><![CDATA[At the beginning of April 2012 we published States of Sustainability: A Review of State Projects funded by the National Digital Information Infrastructure and Preservation Program (NDIIPP) (PDF), a report written by Christopher A. Lee. The comprehensive report neatly wraps our recent digital preservation work with state governments, but in the case of the Geospatial [...]]]></description>
			<content:encoded><![CDATA[<p>At the beginning of April 2012 we published <em><a  href="http://www.digitalpreservation.gov/multimedia/documents/ndiipp-states-report032612_final.pdf">States of Sustainability: A Review of State Projects funded by the National Digital Information Infrastructure and Preservation Program (NDIIPP)</a> </em>(PDF), a report written by <a  href="http://blogs.loc.gov/digitalpreservation/2012/04/states-of-sustainability-the-ndiipp-preserving-state-government-information-initiative/">Christopher A. Lee</a>.</p>
<p>The comprehensive report neatly wraps our recent digital preservation work with state governments, but in the case of the <a  href="http://www.digitalpreservation.gov/partners/states_nc.html">Geospatial Multistate Archive and Preservation Project (GeoMAPP)</a>, the report only touches the surface of our decade-long engagement with preserving digital geospatial information.</p>
<div id="attachment_7614" class="wp-caption alignleft" style="width: 310px"><a  href="http://blogs.loc.gov/digitalpreservation/files/2012/05/nc_topography.jpg"><img class="size-medium wp-image-7614" src="http://blogs.loc.gov/digitalpreservation/files/2012/05/nc_topography-300x225.jpg" alt="Topographic North Carolina by user baggis on Flickr" width="300" height="225" /></a><p class="wp-caption-text">Topographic North Carolina by user baggis on Flickr</p></div>
<p>GeoMAPP, led by a unique partnership between the <a  href="http://www.cgia.state.nc.us/">North Carolina Center for Geographic Information and Analysis</a> and the <a  href="http://www.ah.dcr.state.nc.us/archives/">North Carolina State Archives</a>, began its work in 2007, but the groundwork for the project was laid several years before.</p>
<p>NDIIPP recognized the value of geospatial information as a national asset early in the program, noting in 2002’s <em><a  href="http://www.digitalpreservation.gov/multimedia/documents/ndiipp_plan.pdf">Preserving Our Digital Heritage: Plan for the National Digital Information Infrastructure and Preservation Program</a></em> (PDF) that “even more complex digital media types, such as Geographic Information Systems (GIS)…will take libraries, archives, and museums beyond the formats that they have expertise in preserving.”</p>
<p>This acknowledgement of the importance of GIS data manifested itself in two of the original NDIIPP-funded digital preservation projects, the <a  href="http://digitalpreservation.gov/partners/ngda.html">National Geospatial Digital Archive Project</a> and the <a  href="http://digitalpreservation.gov/partners/ncgdap.html">North Carolina Geospatial Data Archiving Project</a>.</p>
<p>NGDA focused on collecting and preserving geospatial data on a large scale by defining a minimum level of preservation while also making preserved data available to the greatest degree possible.</p>
<p>NCGDAP took a slightly different approach. They explored the challenges of preserving geospatial data within the confines of a single state, exploring how the participants in a geospatial ecosystem (libraries, archives, local producers of data and the state geospatial data clearinghouse) interacted.</p>
<div id="attachment_7615" class="wp-caption alignleft" style="width: 310px"><a  href="http://blogs.loc.gov/digitalpreservation/files/2012/05/understanding_maps.jpg"><img class="size-medium wp-image-7615" src="http://blogs.loc.gov/digitalpreservation/files/2012/05/understanding_maps-300x225.jpg" alt="In the future orientation will be simple by user cole007 on Flickr" width="300" height="225" /></a><p class="wp-caption-text">In the future orientation will be simple by user cole007 on Flickr</p></div>
<p>Both projects made it a point to engage with existing geospatial infrastructures. NGDA drew on its natural base of academic map libraries to leverage the activities of the <a  href="http://cuac.wustl.edu/">Cartographic Users Advisory Council</a> and secure the 2009 publication of a <a  href="http://www.tandfonline.com/toc/wmgl20/6/1">special issue</a> of the Journal of Map and Geography Libraries on the preservation of digital geospatial materials. They also published an influential <a  href="http://www.ngda.org/reports/InvestigateGeoDataFinal_v2.pdf">geospatial metadata report</a> (PDF) and later provided consultancy to the Library of Congress on its <a  href="http://www.digitalpreservation.gov/formats/fdd/gis_fdd.shtml">Geospatial Data formats sustainability site</a>.</p>
<p>NCGDAP developed a deep understanding of the relationships between government entities throughout the geospatial ecosystem, from the local producers of geospatial data, such as city and county governments, all the way up to the national <a  href="http://www.fgdc.gov/">Federal Geographic Data Committee</a>. In the course of their work they developed methods to build trust across these interactions, and worked to leverage existing infrastructure to build preservation actions into existing patterns of behavior.</p>
<p>They also brought a deep understanding of the special technical challenges facing geospatial technology, represented by the project participation in the Open Geospatial Consortium’s <a  href="http://www.opengeospatial.org/projects/groups/preservdwg">Data Preservation Domain Working Group</a> and their co-authoring of the Digital Preservation Coalition’s Technology Watch Report on <a  href="http://www.dpconline.org/component/docman/doc_download/363-preserving-geospatial-data-by-guy-mcgarva-steve-morris-and-gred-greg-janee">Preserving Geospatial Data</a> (PDF).</p>
<p>GeoMAPP successfully built on both of these projects as they took the basic work of NCGDAP and expanded it to include Kentucky and Utah (and later, Montana), working to replicate NCGDAP’s work within each of the partner states, but also working to collaborate across state lines.</p>
<p>Two of the original GeoMAPP co-principal investigators were in the inaugural class of the <a  href="http://www.fgdc.gov/ngac">National Geospatial Advisory Committee</a>, a group that reviews geospatial policy and provides a forum to convey views representative of non-federal stakeholders to the FGDC. This participation gave GeoMAPP access to a national network of geospatial decision makers to engage in support of preserving digital geospatial data.</p>
<p>In addition to its national outreach and engagement efforts, GeoMAPP tackled practical preservation issues of interest to both archivists and geospatial professionals. They began their work by conducting both state-specific and national surveys to identify geospatial creation and archiving trends among state and local agencies in North Carolina as well as among state archives and members of the national geospatial community.</p>
<p>The results of these surveys guided their efforts, which eventually included tools for agencies to objectively <a  href="http://www.geomapp.net/docs/GeoMAPP_GeoArchiving_SelfAssessment_20100914.xls">evaluate their potential</a> (XLS) to archive geospatial data; a <a  href="http://www.geomapp.net/docs/GeoMAPP_Storage_Primer_final_20111231.pdf">primer</a> (PDF) on data storage concepts and technologies; guidance on <a  href="http://www.geomapp.net/publications_categories.htm#appraise">geospatial appraisal</a>, <a  href="http://www.geomapp.net/publications_categories.htm#xfr">data transfer</a>, and <a  href="http://www.geomapp.net/publications_categories.htm#process">archival processing</a>; and a geospatial file formats <a  href="http://www.geomapp.net/docs/GeoMAPP_Geospatial_data_file_formats_FINAL_20110701.xls">reference guide</a> (XLS).</p>
<p>The richness of the guidance and tools developed by GeoMAPP provides a template for other states to follow as they begin to address geospatial archiving. GeoMAPP also modeled potential next steps, especially their geospatial archiving <a  href="http://www.geomapp.net/publications_categories.htm#busplan">business planning toolkit</a>. The toolkit can be leveraged by funding agencies and organizations such as the <a  href="http://www.nsgic.org/">National States Geographic Information Council</a> and the FGDC to work towards incorporating geoarchiving as part of national activities such as the <a  href="http://www.geoplatform.gov/home/">Geospatial Platform</a> that will have a huge impact on the ways that geospatial data is created, shared and (hopefully) preserved in the future.</p>
<div id="attachment_7616" class="wp-caption alignleft" style="width: 310px"><a  href="http://blogs.loc.gov/digitalpreservation/files/2012/05/risky_games.jpg"><img class="size-medium wp-image-7616" src="http://blogs.loc.gov/digitalpreservation/files/2012/05/risky_games-300x196.jpg" alt="Risky Games by user centralasian on Flickr" width="300" height="196" /></a><p class="wp-caption-text">Risky Games by user centralasian on Flickr</p></div>
<p>While GeoMAPP has come to a close, its work continues in the <a  href="http://www.digitalpreservation.gov/ndsa/index.html">National Digital Stewardship Alliance</a> Geospatial Content group and <a  href="http://www.fgdc.gov/participation/working-groups-subcommittees/hdwg/index_html">FGDC Users/Historical Data Working Group</a>.</p>
<p>The long-term stewardship of digital geospatial information is an issue that will only become more important over time and NDIIPP remains interested in supporting the efforts. Explore the work of GeoMAPP and let us know what you think. If you’ve got pointers to other activities in this area we’d love to hear about them.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.loc.gov/digitalpreservation/2012/05/geomapp-and-the-future-of-digital-geospatial-preservation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Digital and Print: Living in a World of “Both/And”</title>
		<link>http://blogs.loc.gov/digitalpreservation/2012/05/digital-and-print-living-in-a-world-of-%e2%80%9cbothand%e2%80%9d/</link>
		<comments>http://blogs.loc.gov/digitalpreservation/2012/05/digital-and-print-living-in-a-world-of-%e2%80%9cbothand%e2%80%9d/#comments</comments>
		<pubDate>Fri, 11 May 2012 14:12:06 +0000</pubDate>
		<dc:creator>Susan Manus</dc:creator>
				<category><![CDATA[Digital Content]]></category>
		<category><![CDATA[Outreach and Events]]></category>
		<category><![CDATA[access]]></category>
		<category><![CDATA[digital curation]]></category>
		<category><![CDATA[digital preservation]]></category>

		<guid isPermaLink="false">http://blogs.loc.gov/digitalpreservation/?p=7538</guid>
		<description><![CDATA[The following is a guest post by Ellen O’Donnell, Technical Writer, National Center for Complementary and Alternative Medicine, who recently spent a year on detail in OSI. After a strange winter in Washington, D.C., of no snow and warm temperatures, and a strange spring of early blossoms and drought, I woke up to something rare&#8211;a [...]]]></description>
			<content:encoded><![CDATA[<p><em>The following is a guest post by </em><strong><em>Ellen O’Donnell, </em></strong><em>Technical Writer, National Center for Complementary and Alternative Medicine, who recently spent a year on detail in OSI.</em></p>
<p>After a strange winter in Washington, D.C., of no snow and warm temperatures, and a strange spring of early blossoms and drought, I woke up to something rare&#8211;a rainy, cool Saturday.</p>
<p>“Great,” I thought.  “We need the rain, and I can catch up on some unpacking”&#8211;including 20 boxes of books I had just moved into my new home.</p>
<p>I looked forward to seeing those old “friends” as I unpacked them, but not to the tough decisions that awaited me.  My shelf and storage spaces are limited, and I know I won’t read most of these books again.  But I have found it hard to take well-meaning suggestions, such as donating them, recycling them, or making digital images of ones I’d like to remember.</p>
<div id="attachment_7543" class="wp-caption alignleft" style="width: 310px"><a  href="http://blogs.loc.gov/digitalpreservation/files/2012/05/Suarez-high-res.-photo1.jpg"><img class="size-medium wp-image-7543" src="http://blogs.loc.gov/digitalpreservation/files/2012/05/Suarez-high-res.-photo1-300x200.jpg" alt="" width="300" height="200" /></a><p class="wp-caption-text">Michael Suarez</p></div>
<p>What is the particular power that printed books have over us, and why? Can a digital version really substitute?</p>
<p>Michael Suarez, S.J., D.Phil., director of the <a  href="http://www.rarebookschool.org/">Rare Book School</a>, University of Virginia, Charlottesville, spoke to such concerns in a lecture on April 19 at the <a  href="http://www.nlm.nih.gov/"><span style="text-decoration: underline">National Library of Medicine</span></a>, “The Future for Books in a Digital Age.&#8221;  Among his other hats, Suarez is also Professor of English and Honorary Curator of Special Collections at Virginia, editor-in-chief of <em>Oxford Scholarly Editions Online</em>, co-editor of <em>The Oxford Companion to the Book</em>, and a Jesuit priest.</p>
<p>“The digital world is here,” said Suarez.  He noted that it is changing the shape of human inquiry, of the structures of human knowledge, and of the academy.  It can promote literacy, reading, and imagination.  It gives us fantastic research tools and access, and it’s convenient.  All of these are good things.</p>
<p>But, he added, while we are busy celebrating the digital domain, are we also being mindful of what we have lost?  This, too, is important&#8211;“for balance, and so that we can drive the tools we use, rather than those tools driving us.”</p>
<p>We can start by thinking about what a simulacrum (or representation) is, and what it is not.  When color slide film came into wide use, changing how art history was taught, a ‘color slide controversy’ arose.  People worried that the simulacrum would gain primacy and <strong>become the work of art </strong>in students’ minds<strong>,</strong> especially if they never spent time with originals.  Not only would they miss out on impacts related to a work’s context and physical aspects (such as its true size, colors, and textures), but the new technology could change perception itself.</p>
<p>“Losing the artifactuality of the artifact matters,” Suarez said. “It matters in books as well. An image is not the book.”</p>
<p>He sees a codex book (i.e., one produced in the modern, bound format) as many things, including a transformation of a manuscript; a complex object laced with signs and codes, e.g., linguistic and cultural; “a coalescence of human intention”; and an artifact “made by communities of people and connected to meaning.”</p>
<p>We experience, for example, a book’s paper, type, bindings, and illustrations; variations among copies, editions, and interpretations; handwritten notes.  We give attention to where it came from—including, perhaps, the hands of someone important to us. Such things “are deeply meaningful and deeply human,” Suarez said, and are a part of literacy.  Our books in our own environment can become “totems” full of power and meaning.  “If we lose these things,” Suarez said, “it is to our impoverishment.  We are left with a horizon of dislocation and absence.”</p>
<p>Velocity of access is highly prized in the digital world.  But Suarez wonders what that velocity does to us, and how it affects sustained engagement with beautiful and important works of text or art.  If, through the power of the word search, searching becomes conflated with researching, does every book and corpus then become “an infinitely shuffleable deck of cards&#8211;a kind of never-ending subject index?” Hypertext and hyperlinks call us on many interesting side trips, but what does that do to our original focus?</p>
<p>The effects, at least as he has observed them in his students and himself, can include negative ones&#8211;e.g., upon the quality of attention, comprehension, performance, and scholarship.  This is an area that is being researched, for example by Dr. Maryanne Wolf at Tufts University.</p>
<p>Yet Suarez is also “delighted” by the advent of the digital world, where he spends many of his working hours producing robust content with extensive metadata for a massive digital-humanities project: “I believe that we are way beyond the idea of <strong>either/or</strong>. We need to live in a world of <strong>both/and</strong>. The digital world is <strong>different</strong> from that of print, not better or worse.  I want both.”  Digital works should be preserved with the same care as a first or rare edition, he said, and while marketplace influences on the production of text are as real as ever, printed books are not going away.</p>
<p>T.S. Eliot wrote in 1934, “Where is the knowledge we have lost in information?”  Suarez believes that human desire will remain not only for information, but for knowledge, and the unique experiences that come from engagement of the human mind and heart with printed books.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.loc.gov/digitalpreservation/2012/05/digital-and-print-living-in-a-world-of-%e2%80%9cbothand%e2%80%9d/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>@IIPC12: A Week of Web Archiving</title>
		<link>http://blogs.loc.gov/digitalpreservation/2012/05/iipc12-a-week-of-web-archiving/</link>
		<comments>http://blogs.loc.gov/digitalpreservation/2012/05/iipc12-a-week-of-web-archiving/#comments</comments>
		<pubDate>Thu, 10 May 2012 14:57:15 +0000</pubDate>
		<dc:creator>Abbey Potter</dc:creator>
				<category><![CDATA[Digital Content]]></category>
		<category><![CDATA[Outreach and Events]]></category>
		<category><![CDATA[Partners and Collaboration]]></category>

		<guid isPermaLink="false">http://blogs.loc.gov/digitalpreservation/?p=7551</guid>
		<description><![CDATA[The Library of Congress was thrilled to host the 2012 International Internet Preservation Consortium General Assembly April 30 – May 4th. Over 150 registrants packed meeting rooms to discuss all aspects of web archiving. From legal issues to technical challenges to research use, the entire lifecycle of web archiving was covered. As a library and [...]]]></description>
			<content:encoded><![CDATA[<p>The Library of Congress was thrilled to host the 2012 International Internet Preservation Consortium General Assembly April 30 – May 4th. Over 150 registrants packed meeting rooms to discuss all aspects of web archiving. From legal issues to technical challenges to research use, the entire lifecycle of web archiving was covered.</p>
<p>As a library and heritage practice, web archiving is a little over 10 years old. The IIPC GA is a vital meeting of the professionals who are developing the tools, standards and best practices for this new but growing field. Below is a quick overview of the week’s events. More detailed posts will follow about some of the presentations and workshops. Presentations from the week are <a  href="http://netpreserve.org/events/2012ga.php">posted</a> and video of the proceedings will be posted in the coming weeks. Also, check out the collection of <a  href="http://www.tweetdoc.org/View/43229/IIPC-2012-Conference">tweets</a> from participants.</p>
<p><strong>Day One: The Broad Value of Web Archiving: Demonstrated Use</strong></p>
<p>During the open conference day presenters and participants explored in depth the current and possible uses of web archives in the research, business, legal, and public spheres.</p>
<p>The data contained in web archives conceivably covers virtually all contemporary (and many historical) subjects, in all languages, by an unknowable variety of authors. The detail and scale of information that web archives offer to researchers make them unique and valuable resources. However, there are challenges. Participants discussed the technical and financial difficulties of providing access in the context of a public institution and the copyright and privacy laws that restrict access. Although researchers often build their own archives or use data services provided by private companies, the archives collected by IIPC members have added authenticity because they are being preserved by a trusted party. This is an especially important issue in web archiving for legal purposes.</p>
<p>The collection of at-risk web sites is also an important role IIPC members serve for researchers. In the public sphere especially the web is often the only avenue of communication to constituents. After reorganizations, regime changes or even policy changes government publications on the web are often altered or disappear completely. Several IIPC members and associates are actively collecting, preserving and providing access to these resources.</p>
<p><strong>Day Two: IIPC General Assembly</strong></p>
<p><div id="attachment_7554" class="wp-caption alignleft" style="width: 310px"><a  href="http://netpreserve.org"><img class="size-medium wp-image-7554" src="http://blogs.loc.gov/digitalpreservation/files/2012/05/IIPC_Logo_Extended_FullColor-300x93.png" alt="IIPC logo" width="300" height="93" /></a><p class="wp-caption-text">New IIPC logo.</p></div> The General Assembly is the annual gathering of all IIPC members to discuss IIPC business. The officers gave updates on the past year’s progress. Notable announcements included the selection of Brenda Reyes as the PhD student sponsored by the IIPC, the Internet Archive and the University of North Texas. She is currently working at the National Library of Spain and will spend 3 years studying web archiving at UNT with a summer internship at the Internet Archive. Members should note that the 2013 call for proposals will be released in the weeks following the 2012 GA. Another officer project this year is the redesign of IIPC’s own web site, netpreserve.org. The new IIPC logo and homepage layout were debuted. The new site is expected to be launched well before the end of the calendar year. It will be the definitive resource for web archiving practice, it will help IIPC members work together better, and it will clearly explain the value of the work of the IIPC. Broad participation from members will be needed to launch and maintain the new web site.</p>
<p>Membership also received an update from funded projects including <a  href="http://netpreserve.org/events/dc_ga/02_Tuesday_IIPC/Clarke.pdf">JhoNAS</a>, <a  href="http://netpreserve.org/events/02_Tuesday_IIPC/Hockx-Yu.pdf">Twittervane</a>, and the <a  href="http://netpreserve.org/events/dc_ga/02_Tuesday_IIPC/BnF.pdf">web archiving workshop</a> at the National Library of France.  A project of considerable interest that was demonstrated is the <a  href="http://netpreserve.org/events/dc_ga/02_Tuesday_IIPC/Sanderson.pdf">IIPC Memento Aggregator</a> which holds the possibility of a unified access mechanism for IIPC web archives via a Memento time gate.</p>
<p>The number of IIPC member institutions is up to 42. The four new members this year, <a  href="http://netpreserve.org/events/dc_ga/02_Tuesday_IIPC/Wolven.pdf">Columbia University</a>, <a  href="http://www.library.gwu.edu/">George Washington University</a>, <a  href="http://netpreserve.org/events/dc_ga/02_Tuesday_IIPC/Jaanus.pdf">National Library of Estonia</a>, and <a  href="http://netpreserve.org/events/02_Tuesday_IIPC/Herbert.pdf">Los Alamos National Laboratory</a>, were able to introduce themselves and their interests and activities in web archiving.</p>
<p>Our veteran members were also able to update the group on major developments at their own institutions, including presentations from a first-time GA attendee the <a  href="http://netpreserve.org/events/dc_ga/02_Tuesday_IIPC/Walls.pdf">Government Printing Office</a> and first-time GA presenter the <a  href="http://netpreserve.org/events/dc_ga/02_Tuesday_IIPC/Mar.pdf">National Library of Spain</a>, among others.</p>
<p><strong>Day Three: Working Group Meetings</strong></p>
<p>The Harvesting, Access and Preservation Working Groups met on day three in addition to a Heritrix User Group. Participation in a Working Group is the major benefit and responsibility of each member organization. It is also the structure through which most IIPC funded projects are incubated. Working group members should contact their co-chairs for information about these meetings. The agendas are posted here: <a  href="http://netpreserve.org/events/2012ga.php">http://netpreserve.org/events/2012ga.php</a></p>
<p><strong>Days Four &amp; Five: Workshops</strong></p>
<p>Several topics in web archiving cross working group lines. Thursday and Friday were an opportunity for members and invited guests to discuss important and emerging issues. Depicting and managing the <a  href="http://netpreserve.org/events/dc_ga/04_Thursday/Web%20Archiving%20Lifecycle/Lifecycle.pdf">Web Archiving Lifecycle</a> was discussed in detail. The automated workflow tool NetarchiveSuite was <a  href="http://netpreserve.org/events/dc_ga/NetarchiveSuite_IIPC_GA_2012.pdf">demonstrated</a>, the Unified Digital Format Registry community <a  href="http://netpreserve.org/events/dc_ga/05_Friday/IIPC-2012-UDFR-community-meeting-v07.pdf">met</a>, metrics and quality in web archives were discussed, and ideas were shared about using crowdsourcing methods to engage the public and improve the selection of and access to web archives. In addition to discussions about improving current practices to capture the web as it is today, there were a workshop and panel discussions about <a  href="http://netpreserve.org/events/dc_ga/04_Thursday/Harvesting%20the%20Future%20Web/FutureoftheWeb_OverviewFINAL.pdf">Harvesting the Future Web</a>.</p>
<p>As publishing platforms and technologies continue to shift, the tools and processes of web archivists also need to evolve. It is these kinds of challenges, and the talented people who keep trying to meet them, that will make future IIPC General Assembly meetings just as productive, thought-provoking, and enjoyable as this year’s was.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.loc.gov/digitalpreservation/2012/05/iipc12-a-week-of-web-archiving/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Audio Visual Working Group Update: Evolving Standards</title>
		<link>http://blogs.loc.gov/digitalpreservation/2012/05/audio-visual-working-group-update-evolving-standards/</link>
		<comments>http://blogs.loc.gov/digitalpreservation/2012/05/audio-visual-working-group-update-evolving-standards/#comments</comments>
		<pubDate>Wed, 09 May 2012 13:29:51 +0000</pubDate>
		<dc:creator>Susan Manus</dc:creator>
				<category><![CDATA[Digital Content]]></category>
		<category><![CDATA[Partners and Collaboration]]></category>
		<category><![CDATA[audiovisual file formats]]></category>
		<category><![CDATA[digital collections]]></category>
		<category><![CDATA[digitization]]></category>
		<category><![CDATA[metadata]]></category>
		<category><![CDATA[Standards]]></category>

		<guid isPermaLink="false">http://blogs.loc.gov/digitalpreservation/?p=7491</guid>
		<description><![CDATA[The following is a guest post by Carla Miller, Administrative Specialist for the Office of Strategic Initiatives. This is Part Two of a post reporting on the joint meeting on March 23, 2012 of both the Still Image and Audio Visual Working Groups of the Federal Agencies Digitization Guidelines Initiative hosted at the National Archives [...]]]></description>
			<content:encoded><![CDATA[<p><em>The following is a guest post by <strong>Carla Miller</strong>, Administrative Specialist for the Office of Strategic Initiatives.</em></p>
<p>This is Part Two of a post reporting on the joint meeting on March 23, 2012 of both the Still Image and Audio Visual Working Groups of the <a  href="http://www.digitizationguidelines.gov/">Federal Agencies Digitization Guidelines Initiative</a> hosted at the National Archives and Records Administration’s College Park campus.  <a  href="http://blogs.loc.gov/digitalpreservation/2012/04/seeking-miraculous-and-lossless-victories-an-update-on-the-fadgi-still-image-working-group/">Part One</a> covered the Still Image meeting, and now we move on to the Audio Visual meeting.</p>
<p>After the <a  href="http://www.digitizationguidelines.gov/still-image/">Still Image Working Group</a> meeting, attendees had lunch and then were given a tour by Kate Murray of the <a  href="http://www.archives.gov/preservation/products/definitions/labs.html">Digitization Services</a> facilities at NARA.  Once the tour concluded, Carl Fleischhauer of the Library of Congress led the <a  href="http://www.digitizationguidelines.gov/audio-visual/">Audio-Visual Working Group</a> meeting.</p>
<div id="attachment_7500" class="wp-caption alignleft" style="width: 280px"><a  href="http://blogs.loc.gov/digitalpreservation/files/2012/05/AVNARA_HDFilmScanner21.jpg"><img class="size-medium wp-image-7500 " src="http://blogs.loc.gov/digitalpreservation/files/2012/05/AVNARA_HDFilmScanner21-300x215.jpg" alt="" width="270" height="194" /></a><p class="wp-caption-text">Seen on the digital conversion tour at NARA in College Park, MD: the Sondor Altra Film Scanner, used to capture imagery from motion picture film. Photo by Criss Kovac, courtesy of NARA. </p></div>
<p>Standards are not fixed but continue to evolve over time, and so it is with the standards for embedding metadata in Broadcast WAVE files.  In 2011, the European Broadcasting Union standards body updated its specification for Broadcast WAVE files.  In response to this, Jimi Jones led an effort to draft Version 2 of the FADGI guideline <em><a  href="http://www.digitizationguidelines.gov/guidelines/digitize-embedding.html">Embedding Metadata in Broadcast WAVE Files</a></em>, and at this meeting the revision was adopted by the group.  Thus the FADGI document remains in sync with the revised EBU standard.</p>
<p>Kate Murray from NARA presented information about an XML technical metadata schema they have developed for reformatted video objects.  In support of this schema, NARA has also developed an XML metadata export/extraction tool that will organize and assemble data to meet the schema specification.  NARA has identified the best area within the AVI file header to embed limited and controlled metadata for preservation purposes and has developed a tool that supports embedding, validating and exporting of metadata in AVI files.  This is an interesting example of synergy with open source, since NARA borrows from and uses a pre-existing tool <a  href="http://www.mediainfo.sourceforge.net/en">MediaInfo</a> (available through SourceForge) and contributed a new open source tool <a  href="https://github.com/usnationalarchives/AVI-MetaEdit">AVI MetaEdit</a> (available through GitHub).  Some additional information is provided on pages linked from NARA&#8217;s <a  href="http://www.archives.gov/preservation/products/">Products and Services</a> Web site.</p>
<p>Expert consultant Chris Lacinak discussed his audio performance metrics project.  The project report <em><a  href="http://www.digitizationguidelines.gov/guidelines/digitize-audioperf.html">Analog-to-Digital Converter Performance Specification and Testing Study and Recommended Guideline</a></em> has been posted on the FADGI site for comment.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.loc.gov/digitalpreservation/2012/05/audio-visual-working-group-update-evolving-standards/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Telling Tales: Joe Lambert from the Center for Digital Storytelling</title>
		<link>http://blogs.loc.gov/digitalpreservation/2012/05/telling-tales-joe-lambert-from-the-center-for-digital-storytelling/</link>
		<comments>http://blogs.loc.gov/digitalpreservation/2012/05/telling-tales-joe-lambert-from-the-center-for-digital-storytelling/#comments</comments>
		<pubDate>Tue, 08 May 2012 19:58:52 +0000</pubDate>
		<dc:creator>Butch Lazorchak</dc:creator>
				<category><![CDATA[Digital Content]]></category>
		<category><![CDATA[Partners and Collaboration]]></category>
		<category><![CDATA[Personal Archiving]]></category>
		<category><![CDATA[center for digital storytelling]]></category>
		<category><![CDATA[digital storytelling]]></category>
		<category><![CDATA[filmmaking]]></category>
		<category><![CDATA[insights interview series]]></category>
		<category><![CDATA[joe lambert]]></category>
		<category><![CDATA[multimedia preservation]]></category>
		<category><![CDATA[stories]]></category>
		<category><![CDATA[video]]></category>

		<guid isPermaLink="false">http://blogs.loc.gov/digitalpreservation/?p=7507</guid>
		<description><![CDATA[The following is a guest post by Jane Mandelbaum, IT Project Manager at the Library of Congress’s Office of Strategic Initiatives. The Insights Interview series is an occasional feature sharing interviews and conversations between National Digital Stewardship Alliance Innovation Working Group members and individuals involved in projects related to preservation, access, and stewardship of digital information. [...]]]></description>
			<content:encoded><![CDATA[<p><em>The following is a guest post by</em><em> </em><strong><em>Jane Mandelbaum</em></strong><em>, IT Project Manager at the Library of Congress’s Office of Strategic Initiatives.</em></p>
<p>The Insights Interview series is an occasional feature sharing interviews and conversations between <a  href="http://digitalpreservation.gov/ndsa/working_groups/innovation.html">National Digital Stewardship Alliance Innovation Working Group</a> members and individuals involved in projects related to preservation, access, and stewardship of digital information. In this post, I am excited to have the chance to talk with Joe Lambert, Founder and Executive Director of the Center for Digital Storytelling.</p>
<div id="attachment_7525" class="wp-caption alignleft" style="width: 310px"><a  href="http://blogs.loc.gov/digitalpreservation/files/2012/05/joe_lambert.jpg"><img class="size-medium wp-image-7525" src="http://blogs.loc.gov/digitalpreservation/files/2012/05/joe_lambert-300x225.jpg" alt="Joe Lambert by user cogdogblog on Flickr" width="300" height="225" /></a><p class="wp-caption-text">Joe Lambert by user cogdogblog on Flickr</p></div>
<p><strong>Q1.</strong> Could you give us a quick overview of your organization?</p>
<p>The Center for Digital Storytelling was founded in the early 1990s as a community-based training center in digital media.  From the beginning we were known for encouraging a style of short (2-3 min), personal films.  The method and applications of a workshop model became our focus as we moved to UC Berkeley in 1998.  Since then we have expanded as an international organization with approximately 90 separate projects per year, helping organizations engage populations in producing and distributing stories in numerous contexts.</p>
<p><strong>Q2: </strong>What do you think makes digital storytelling different than other storytelling?</p>
<p>Our original interest was the simple idea of affordability and distribution of video editing as a form of expression, summed up in the idea that what typing did for 20th-century literacy, video editing will do for 21st-century literacy.  But implicit in this understanding is that when you compose in film, with spoken words, sound and image, you are engaged in multi-modal communication, which has an exponentially more complex impact on you as an editor/creator, as well as on your audience. What we share with the storytelling traditions is a strong sense of the formal issues in creating meaningful and powerful stories; we teach the elements of story as a core part of our trainings.</p>
<p><strong>Q3:</strong> How did the center get started and what do you think has contributed to its growth?</p>
<p>The Center grew out of my theater and community arts organization, Life on the Water, in San Francisco, and a specific collaboration between myself and the late Dana Atchley, a professional video producer, designer and performing artist, called Next Exit.   We grew through several stages, and to some extent markets, starting as an arts-based organization working locally but also engaged in the media technology industries of the early and mid-nineties.  In moving to Berkeley, our emphasis became more explicitly educational and tied to discussions of digital media literacy and instructional technology. In this, our third phase, beginning in 2005, our focus became more and more human services and work with NGOs and agencies dealing with post-trauma or difficult life issues, and the use of storytelling as a healing modality.</p>
<p><strong>Q4:</strong> What kinds of people or examples have inspired your work?</p>
<p>I am inspired by numerous sources; certainly the work of Studs Terkel in popularizing stories of ordinary people and oral history.  I am deeply inspired by the general trends of community-based arts where artists engage people in the issues of their lives.  More recently, I take inspiration from StoryCorps and organizations like the Museum of the Person in Brazil that have managed to make reflections on lives a valued and more central part of the dominant cultures in their countries.</p>
<p><strong>Q5:</strong> Can you describe the different ways in which people interact with stories?</p>
<p>At the general level, it is of course a fundamental human activity.  Answering what happened, listening as witness to others&#8217; experiences and to your own, makes us human, and shapes us in countless ways.  In the specific sense of our work, the stories start as deeply personal artifacts, often to be shown to a limited audience of family members, friends and community, and on the other end of the spectrum become broadcast stories consumed by a general public.  We delineate for our partners and clients how the stories serve these different purposes: personal expression and the preservation of memory, tools for learning and the sharing of information, tools for organizing and mobilization, tools for advocacy and social change, and tools for evaluation and reflection.  In each way, the same story can be said to have a different role, simply by the way it is contextualized.</p>
<p><strong>Q6:</strong> I can see that the work includes the recording of stories, and also outreach and education efforts. How do you evaluate the success of what you do? How do you describe your outcomes?</p>
<p>One fortunate part of being a process organization that makes a product is that the products themselves are tangible outcomes.  When we work with a community to discuss issues and address problems, the community ends up with a compendium of stories they can use in all the above ways.  We also develop project plans with specific additional deliverables including curricula, study guides, subtitling, web design, presentation services, etc., that are evaluated for their impact and usefulness.  When possible we like to have a thorough evaluation of the movement of the storytellers from beginning to end of the process, what changed within them in the process, and how the story continues to be used.  Many colleagues in the academic world have taken a much deeper look at the long-term impact of a digital storytelling process within educational and community environments.</p>
<p><strong>Q7:</strong> You offer web media design and production services.  How does that work fit in?</p>
<p>It is a small portion of what we do, but inevitably organizations with which we are collaborating are also in the process of more developed media strategy, including re-design of websites, or the making of documentaries about their organization, or a project, and because we have those capacities as well, we will take on the project to assist.</p>
<div id="attachment_7526" class="wp-caption alignleft" style="width: 247px"><a  href="http://blogs.loc.gov/digitalpreservation/files/2012/05/digital_story1.jpg"><img class="size-full wp-image-7526" src="http://blogs.loc.gov/digitalpreservation/files/2012/05/digital_story1.jpg" alt="Digital Storytelling Viewing Station by user MACSD on Flickr" width="237" height="178" /></a><p class="wp-caption-text">Digital Storytelling Viewing Station by user MACSD on Flickr</p></div>
<p><strong>Q8: </strong>What methods have you found useful in encouraging large-scale projects?  What do you think are keys to scalability?</p>
<p>Besides, obviously, some resource for the technology (although that is endlessly cheaper), the real issue is systems of training facilitators.  Like many honed methods, the quality of experience as well as the effectiveness of the stories comes with the ability to adapt to a given set of individuals and the context of the environment (both in terms of how storytellers are engaged and purposing the work, as well as the issues with the technological infrastructure provided).  We take several years of collaboration with a partner to fully qualify facilitators, but once they are present, programs tend to grow, because these &#8220;lead&#8221; facilitators can then pass those skills on and create sustainable capacities for the organization/community.  The other issue is vision of the ways the stories become part of a community, in ongoing rituals of presentation events, or senses that this particular process serves as a rallying process for certain kinds of campaigns.  In university settings we have seen the projects scale because they have a clear niche in the way people make community. People have learned to gather around each other&#8217;s digital stories, as ways to recognize each other, and support each other&#8217;s interests and unique contributions.</p>
<p><strong>Q9:</strong> How do you think about or encourage preservation of stories into the future?  How do you address that in your partnerships and workshops? What can be learned from how people have preserved stories in the past? What kinds of technology issues have you dealt with, and how have you dealt with them?</p>
<p>Our standing agreement is that all projects, the output film files, and the project resources (in case the film needs to be re-edited from scratch), are maintained by CDS.  So 900-1200 stories a year are archived by our organization through this process.  The archive has taken every media form, from videotape, to CD-ROM, to DVD, every kind of storage media, but mainly external hard drives.  Our centralized archive is on a DROBO 16 TB server, holding approximately 4000 stories. The main lesson is tertiary levels of backup, with no two levels being stored at the same location.  We have used some cloud-based storage, but the data flow is a bit too much for us to do at the informal level.  At six different times we have attempted putting in place a searchable system for the archive, but we still are mainly using a date process, as the files are not maintained with inherent metadata; the metadata format exists, as do the databases, but as a small independent arts organization we have not been able to afford and maintain staff focused on our archive.  As a result, when we are asked, as happens once a month, &#8220;Do you have a copy of my movie from 2001?&#8221;, we have a way of finding the film, but it is by no means at a moment&#8217;s notice, and the archive is not available to researchers and others as a rich research source, which would be our preference.</p>
<p><strong>Q10</strong>: Do you work or interact with organizations such as libraries and archives that collect digital content?</p>
<p>Shortly after the agreement with StoryCorps to bring their material to the LOC, we approached the Folklife folks about the same idea with our archive, but we had no simple way of providing ongoing maintenance, so we gave up on that solution.  We should find an institutional partner to assist us, as it really is an amazing archive at this point.</p>
<div id="attachment_7527" class="wp-caption alignleft" style="width: 310px"><a  href="http://blogs.loc.gov/digitalpreservation/files/2012/05/digital_story2.jpg"><img class="size-medium wp-image-7527" src="http://blogs.loc.gov/digitalpreservation/files/2012/05/digital_story2-300x225.jpg" alt="DS106 Set Title Image by user life-long-learners on Flickr" width="300" height="225" /></a><p class="wp-caption-text">DS106 Set Title Image by user life-long-learners on Flickr</p></div>
<p><strong>Q11</strong>:  What do you think are currently the most pressing problems that need to be addressed for those working with digital content? To what extent do you think we are addressing these problems?</p>
<p>I really cannot speak for the field.  Data management has not vastly improved in terms of standardization, but the cloud-based solutions suggest we are not far from trusting the great brother in the sky to maintain everything for us.  In which case, how you move terabytes of data from a small organization up into the cloud is the problem.</p>
<p>Surprisingly, for our group, the issues that concern us start with digital creation.  We have no simple way to collect metadata information on the films as they are created, so we create a backlog of mind-numbing data entry work to make the documents valuable.  It seems that we need a way for people, even as they are titling their pieces and/or filling out their evaluations, to fill out the story metadata information, choosing appropriate tags and content information, so we would not have to return to the work years later and make sense of what these stories can tell us, or how we can use them as examples.</p>
<p><strong>Q12</strong>: What do you think are the kinds of problems that your organization will be facing in the near future? In the longer future?</p>
<p>We would love help, is what it comes down to.</p>
<p><strong>Q13. </strong>What do you find encouraging in our current world, in terms of your work?  What do you find discouraging?</p>
<p>The pendulum from a data-centric, logico-centric domination of what we consider knowledge and wisdom, to the intuitive/creative/story based wisdom, is swinging our way. We need a more balanced understanding of knowing what it means to be human, and ways for listening processes to be seen as at least as valuable as arguing processes. What is discouraging is how little listening happens, lots of noise, lots of things to listen to, but little real listening.  We are drowning in information immediacy, and we need the lifeboat of reflection.  I keep putting patches in canvas to keep it floating, but the tide keeps rising as far as I can see.</p>
<p><strong>Q14. </strong>How do you think storytelling and listening affect other aspects of people’s lives?</p>
<p>I obviously think story is the whole enchilada of our lives.  We carry a script of ourselves, we generally do not alter the script, and we are trapped inside so many levels of self-identification and self-rationalization that we cannot see how it molds all our choices, our behaviors.  If you go tell and re-tell the seven essential stories of your life (family self/becoming, community self/connecting, essential self/being, loving self/partnering, creative self/serving, thinking self/knowing, and dying self/transcending), you become a more complete person.  It is a challenge, and many more people are taking that challenge, and are made richer for it.</p>
<p><strong>Q15: </strong>What have you learned from other fields or professions?</p>
<p>I read across history, society, cognitive science, technology, health and wellness, mindfulness and spirituality, and stories, many stories.  I take parts of each to consider different parts of the way stories work upon us.  I am currently engulfed in ethics, particularly professional ethics in the human contact professions.  Lots of words to actualize the golden rule.</p>
<p><strong>Q16:</strong> We’re always trying to tell compelling stories that illustrate the importance of digital preservation.  Do you have advice for us?</p>
<p>&#8220;A funny thing happened on the way to the hard drive&#8221; is not a joke I have heard lately, in part because a hard drive is precisely an inconsequential thing.  Pulling a handwritten letter from a file cabinet that was written by your great uncle about his feelings on Woodrow Wilson&#8217;s role at Versailles is one thing; whipping it out of the National Archive database is another.  The visceral still has great hold on us.  Most of the stories we have about the encyclopedia internetica are about the delightful accidental discoveries: you thought you were looking for X and you found Y, and Y, it turns out, is exactly what you needed.  I think those kinds of stories rationalize the great apparatus of memory, that we can make a link that was not expected, and that turns out to be a decisive moment in understanding.  Do you have any of those?</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.loc.gov/digitalpreservation/2012/05/telling-tales-joe-lambert-from-the-center-for-digital-storytelling/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PeDALS Mettle: Final Report of the Persistent Digital Archives and Library System</title>
		<link>http://blogs.loc.gov/digitalpreservation/2012/05/pedals-mettle-final-report-of-the-persistent-digital-archives-and-library-system/</link>
		<comments>http://blogs.loc.gov/digitalpreservation/2012/05/pedals-mettle-final-report-of-the-persistent-digital-archives-and-library-system/#comments</comments>
		<pubDate>Mon, 07 May 2012 17:17:35 +0000</pubDate>
		<dc:creator>Erin Engle</dc:creator>
				<category><![CDATA[Partners and Collaboration]]></category>
		<category><![CDATA[Publications and Resources]]></category>
		<category><![CDATA[digital preservation]]></category>
		<category><![CDATA[PeDALS]]></category>
		<category><![CDATA[state government information]]></category>

		<guid isPermaLink="false">http://blogs.loc.gov/digitalpreservation/?p=7467</guid>
		<description><![CDATA[Four years ago, the National Digital Information Infrastructure and Preservation Program awarded grants to four projects involving multiple states, known as the Preserving State Government Digital Information initiative. At that time, NDIIPP was in the process of expanding its network of partnerships through projects exploring the preservation of and future access to at-risk digital content.   [...]]]></description>
			<content:encoded><![CDATA[<p>Four years ago, the National Digital Information Infrastructure and Preservation Program awarded grants to four projects involving multiple states, known as the <a  href="http://www.digitalpreservation.gov/news/2008/20080127news_article_states.html">Preserving State Government Digital Information initiative.</a></p>
<p>At that time, NDIIPP was in the process of expanding its network of partnerships through projects exploring the preservation of and future access to at-risk digital content.   State government digital information was identified as content particularly at-risk, and there was relatively little experience in cross-state preservation collaboration.  In fact, few individual states had made much headway in managing digital information at the state or local levels.</p>
<p>Flash forward to 2012.  The Preserving State Government Digital Information projects are in the process of wrapping up. We’ve talked about the excellent results of the <a  href="http://blogs.loc.gov/digitalpreservation/2012/04/preserving-digital-legislative-information-wrapping-up-the-mtsa-project/">Model Technological and Social Architecture for the Preservation of State Government Digital Information</a> project.   Now I have the opportunity to share some of the results of the Persistent Digital Archives and Library System research project.</p>
<div id="attachment_7476" class="wp-caption alignleft" style="width: 210px"><a  href="http://blogs.loc.gov/digitalpreservation/files/2012/05/BPE_AZ_team.jpg"><img class="size-full wp-image-7476" src="http://blogs.loc.gov/digitalpreservation/files/2012/05/BPE_AZ_team.jpg" alt="Members of the Persistent Digital Archives and Library System research project team at BPE2010." width="200" height="159" /></a><p class="wp-caption-text">Members of the Persistent Digital Archives and Library System research project team at BPE2010.</p></div>
<p>Led by the <a  href="http://www.lib.az.us/">Arizona State Library, Archives and Public Records,</a> the goal of PeDALS was to develop a shared curatorial framework for the preservation of digital public records and to investigate technical solutions for managing the records, including ingesting and cost-effective storage solutions for large collections of state government agency publications and records.</p>
<p>Results from the technical investigations are among the highlights of the project.  The project adopted a distributed preservation system as a cost-effective storage solution.  Partner states benefited from sharing knowledge, increasing staff technical skills and understanding requirements for maintaining and providing access to electronic records.  Creating and leveraging shared learning resulted in expertise across states, a great outcome.</p>
<p>The <a  href="http://www.pedalspreservation.org/Default.aspx" target="_blank">project website</a> documents the results of the partner efforts mentioned above.  The projects also explored solutions to support the life cycle of the curatorial process.  Project members from Florida, Arizona, Wisconsin and New York developed a core metadata dictionary of the metadata elements used across multiple states; the network architecture developed for the project is laid out; and a description of LOCKSS technology concisely discusses the “digital stacks” approach of the project.</p>
<p>As the <a  href="http://www.digitalpreservation.gov/multimedia/documents/PeDALS%20_Final_Report.pdf">PeDALS final report</a> (PDF) notes, the partner states faced many challenges due to budget cuts, making it difficult to hire, train and retain staff, to participate in project meetings and to contribute staff or technology resources.  Despite these challenges, partner states and project team members developed a strong affinity for collaborating across state lines.  A community of shared practice emerged, engaging in best practices for preserving state government records and developing and testing software, with the hope of fostering a system that could be applied to multiple states.</p>
<p>The PeDALS project officially wrapped up in March 2012.  Going forward, the PeDALS partnership will be a loose confederation of four states: Alabama, Arizona and Wisconsin, with New York as an active observer.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.loc.gov/digitalpreservation/2012/05/pedals-mettle-final-report-of-the-persistent-digital-archives-and-library-system/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The May 2012 Library of Congress Digital Preservation Newsletter is now available</title>
		<link>http://blogs.loc.gov/digitalpreservation/2012/05/the-may-2012-library-of-congress-digital-preservation-newsletter-is-now-available/</link>
		<comments>http://blogs.loc.gov/digitalpreservation/2012/05/the-may-2012-library-of-congress-digital-preservation-newsletter-is-now-available/#comments</comments>
		<pubDate>Fri, 04 May 2012 19:02:59 +0000</pubDate>
		<dc:creator>Erin Engle</dc:creator>
				<category><![CDATA[Digital Content]]></category>
		<category><![CDATA[Education and Training]]></category>
		<category><![CDATA[Outreach and Events]]></category>
		<category><![CDATA[Partners and Collaboration]]></category>
		<category><![CDATA[Personal Archiving]]></category>
		<category><![CDATA[Publications and Resources]]></category>
		<category><![CDATA[Tools and Infrastructure]]></category>
		<category><![CDATA[Videos and Podcasts]]></category>
		<category><![CDATA[digital preservation newsletter]]></category>
		<category><![CDATA[newsletter]]></category>

		<guid isPermaLink="false">http://blogs.loc.gov/digitalpreservation/?p=7458</guid>
		<description><![CDATA[The May 2012 Library of Congress Digital Preservation Newsletter is now available. http://www.digitalpreservation.gov/news/newsletter/201205.pdf In this issue: Exploring Collections using Viewshare The challenges of extracting information from floppy disks U.S. government elections and web archiving at the Spring CNI Meeting Preservation of and access to federally funded scientific data Help launch a digital preservation Q &#38; [...]]]></description>
			<content:encoded><![CDATA[<p>The May 2012 Library of Congress Digital Preservation Newsletter is now available.</p>
<p><a  href="http://www.digitalpreservation.gov/news/newsletter/201205.pdf">http://www.digitalpreservation.gov/news/newsletter/201205.pdf</a></p>
<p>In this issue:</p>
<ul>
<li>Exploring Collections using Viewshare</li>
<li>The challenges of extracting information from floppy disks</li>
<li>U.S. government elections and web archiving at the Spring CNI Meeting</li>
<li>Preservation of and access to federally funded scientific data</li>
<li>Help launch a digital preservation Q &amp; A site</li>
<li>Recent interviews with Bram van de Werf, Lori Phillips, Anne Van Camp, and Ellysa Stern Cahoy</li>
<li>New reports: States of Sustainability: A Review of State Projects Funded by NDIIPP; and Preserving State Government Digital Information, Minnesota Historical Society Final Report</li>
<li>Upcoming Conferences: JCDL2012 (June 10-14) and Screening the Future 2012 (May 21-23)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://blogs.loc.gov/digitalpreservation/2012/05/the-may-2012-library-of-congress-digital-preservation-newsletter-is-now-available/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Life-Saving: The National Software Reference Library</title>
		<link>http://blogs.loc.gov/digitalpreservation/2012/05/life-saving-the-national-software-reference-library/</link>
		<comments>http://blogs.loc.gov/digitalpreservation/2012/05/life-saving-the-national-software-reference-library/#comments</comments>
		<pubDate>Fri, 04 May 2012 16:42:08 +0000</pubDate>
		<dc:creator>Trevor Owens</dc:creator>
				<category><![CDATA[Digital Content]]></category>
		<category><![CDATA[Tools and Infrastructure]]></category>
		<category><![CDATA[National Institute of Standards and Technology]]></category>
		<category><![CDATA[National Software Reference Library]]></category>
		<category><![CDATA[NIST]]></category>
		<category><![CDATA[NSRL]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://blogs.loc.gov/digitalpreservation/?p=7423</guid>
		<description><![CDATA[Insights is an occasional series of posts in which members of National Digital Stewardship Alliance Innovation Working Group take a bit of time to chat with people doing novel, exciting and innovative work in and around digital preservation and stewardship. In this post, I am thrilled to have a chance to hear from Doug White, [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_7431" class="wp-caption alignleft" style="width: 140px"><a  href="http://blogs.loc.gov/digitalpreservation/files/2012/05/doug.jpg"><img class="size-medium wp-image-7431   " src="http://blogs.loc.gov/digitalpreservation/files/2012/05/doug-243x300.jpg" alt="" width="130" /></a><p class="wp-caption-text">Doug White, of NSRL</p></div>
<p>Insights is an occasional series of posts in which members of the <a  href="http://digitalpreservation.gov/ndsa/working_groups/innovation.html">National Digital Stewardship Alliance Innovation Working Group</a> take a bit of time to chat with people doing novel, exciting and innovative work in and around digital preservation and stewardship. In this post, I am thrilled to have a chance to hear from Doug White, project leader for the National Institute of Standards and Technology <a  href="http://www.nsrl.nist.gov/">National Software Reference Library</a>. I heard Doug give a fantastic talk about his work at the CurateGear Workshop (<a  href="http://www.nsrl.nist.gov/Documents/NSRL_Jan_2012.pdf">see slides from the talk here</a>).</p>
<p>Before we dig into the details of the project, you mentioned that the NSRL has already resulted in saving at least one person&#8217;s life. Could you walk us through exactly how that came about? I think it makes for a really compelling story for why software preservation matters.</p>
<div id="attachment_7436" class="wp-caption alignright" style="width: 330px"><a  href="http://blogs.loc.gov/digitalpreservation/files/2012/05/lifesaving_sm.jpg"><img class="size-full wp-image-7436" src="http://blogs.loc.gov/digitalpreservation/files/2012/05/lifesaving_sm.jpg" alt="Nice Paint Job, by Vicki's Picks, on Flickr" width="320" height="167" /></a><p class="wp-caption-text">Nice Paint Job, by Vicki&#039;s Picks, on Flickr</p></div>
<p><strong>Doug:</strong> Certainly; it was an unintentional circumstance. To begin, we often were asked if software may be borrowed from the NSRL, and the response was, “No, we are a reference, not a lending library.” But then we received a call from a Food and Drug Administration agent on a Friday afternoon in December 2004.</p>
<p>A medical supply company in Miami had received a delivery of botulin, which was to be processed into Botox and distributed. However, it was misprocessed, and a dangerous concentrate was distributed. The FDA had all of the information needed to identify the recipients, but the information was in a file created with a 2003 version of a popular business software application. The 2004 version available to the FDA could not open the data file. The manufacturer of the software was also unable to supply the relevant version.</p>
<p>It so happened that one of the agents involved in the case was familiar with the NSRL, and had in fact provided software to us earlier in the year. He called, explained the situation, and asked if we had the 2003 version of the software. We did! The agent then arranged for an FDA contact to come to NIST, get the software, and put it on a jet to Miami. The people working the case in Miami were able to install the old version, open the data file, and trace the paths of the botulin.</p>
<p>Several fortunate events occurred to enable this story to end on a positive note. We have a process in place should this occur again, though we consider the NSRL to be a “last resource.”</p>
<p><strong>Trevor: </strong>I have heard you describe the National Software Reference Library as a library of software, a database of metadata, a NIST publication and a research environment. Could you give us a little background on the project and explain how NSRL serves these different functions?</p>
<p><strong>Doug: </strong>The diagram below is an overview showing several facets of the NSRL. The path using red arrows involves our core operations, green arrows designate “derivative” operations, and blue illustrates some collaborative research.</p>
<p><a  href="http://blogs.loc.gov/digitalpreservation/files/2012/05/NSRL-workflow.jpg"><img class="alignnone size-full wp-image-7425" src="http://blogs.loc.gov/digitalpreservation/files/2012/05/NSRL-workflow.jpg" alt="" width="540" /></a></p>
<p>The physical library is our foundation. At the inception of the project, in 2000, organizations were creating and sharing metadata describing computer files on a very ad hoc basis. If the metadata were questioned, it was highly unlikely that the original media were available to resolve the issue. The NSRL operates in the same fashion as an evidentiary locker, with the original media available in the event of a question.</p>
<p>The physical library has a parallel virtual library. NSRL has created bit-for-bit copies of the original media and images of packaging materials that are kept on a network storage device. I need to point out that the NSRL runs on a network disconnected from the Internet, and in fact, also disconnected from the NIST network infrastructure, using equipment and cables we installed. The media copies can be manipulated automatically and used by multiple processes, and repeated physical contact with original objects is minimized.</p>
<p>From the packaging and media, we collect metadata from every application, from every file. We store the metadata in a PostgreSQL database. The database has several schemas, which act as conceptual boundaries around accession processes, the collection of software application descriptions by manual processes, the collection of content metadata by automated processes, storage processes and publication processes. The work processes and the technology are modular components that are easy to test, maintain, train, or reuse. The database metadata (with the exception of staff information) is available on request.</p>
<p>There is a subset of the collected metadata which is of use to investigators and researchers in the community in which NSRL participates, and the subset is published quarterly as <a  href="http://www.nist.gov/srd/nistsd28.cfm">NIST Special Database #28.</a> The specific data includes:</p>
<ul>
<li>Manufacturer Name</li>
<li>Operating System Name</li>
<li>Operating System Version</li>
<li>Product Name</li>
<li>Product Version</li>
<li>Product Language</li>
<li>Application Type</li>
<li>SHA-1 of file (digital fingerprint)</li>
<li>MD5 of file</li>
<li>CRC32 of file</li>
<li>File Name</li>
<li>File Size</li>
</ul>
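<p>As a rough illustration of the three file-level fingerprints in that list, the following Python sketch computes SHA-1, MD5 and CRC32 for a single file. It is only a sketch under stated assumptions: the function name <code>fingerprint</code>, the chunk size and the uppercase hex formatting are hypothetical choices for this example, and it does not describe NSRL's own tooling.</p>

```python
import hashlib
import zlib


def fingerprint(path, chunk_size=1048576):
    """Return SHA-1, MD5, CRC32 and size for one file (illustrative only)."""
    sha1 = hashlib.sha1()
    md5 = hashlib.md5()
    crc = 0
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha1.update(chunk)
            md5.update(chunk)
            # zlib.crc32 accepts the running value as its second argument.
            crc = zlib.crc32(chunk, crc)
            size += len(chunk)
    return {
        "sha1": sha1.hexdigest().upper(),
        "md5": md5.hexdigest().upper(),
        "crc32": format(crc, "08X"),
        "size": size,
    }
```

<p>Calling <code>fingerprint("setup.exe")</code> (a hypothetical file name) returns a dictionary with the three digests and the byte count, which is enough to compare one file against a published record.</p>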
<p>The research environment allows NSRL to collaborate with researchers who wish to access the contents of the virtual library. Researchers may perform tasks on the NSRL isolated network that involve access to the copies of media, to individual files, or to “snapshots” of software installations. In addition to the media copies, NSRL has compiled a corpus of the 25,000,000 unique files found on the media, and examples of software installation and execution in virtual machines.</p>
<p><strong>Trevor: </strong>Could you give us a brief overview of what exactly is the content of the library? What data and metadata do you collect and how do you work with it?</p>
<p><strong>Doug:</strong> The library contains commercial software, both off-the-shelf shrink-wrapped physical packages and download-only “click-wrapped” digital objects. This includes computer operating systems, business software, games, mobile device apps, multimedia collections and malicious software tools.</p>
<div id="attachment_7438" class="wp-caption alignleft" style="width: 310px"><a  href="http://blogs.loc.gov/digitalpreservation/files/2012/05/metadata_sm.png"><img class="size-medium wp-image-7438" src="http://blogs.loc.gov/digitalpreservation/files/2012/05/metadata_sm-300x229.png" alt="Metadata, by Shira Golding, on Flickr" width="300" height="229" /></a><p class="wp-caption-text">Metadata, by Shira Golding, on Flickr</p></div>
<p>Most of the software in the NSRL is purchased. We try to acquire everything on the top-selling lists. Some software we hear about by word of mouth, some by schedule (like tax programs each tax year, security, antivirus) and some by requests from law enforcement and other agencies. We accept donations from manufacturers and have paperwork to state we will not use the software license. We accept donations of used software as long as it is in usable condition, but there is no guarantee that it will make it into the NSRL.</p>
<p>The data and metadata are detailed in documents on the <a  href="http://www.nsrl.nist.gov/">NSRL website</a>. To summarize, we collect accession data familiar to your readers: the information about the manufacturer and publisher, the minimal requirements listed, the number and types of media, etc. We also process the contents of the media to obtain metadata about the file system(s), directory structure, file types (based on signature strings) and many file-level metadata fields, as I mentioned in the previous question.</p>
<p>NSRL makes minimal use of this metadata. We perform mock investigations using the metadata to measure the applicability. We investigate the randomness of the cryptographic algorithm results. We are constantly seeking related collections with which we could combine an index or translate a taxonomy, to cross-reference NSRL data with other sets.</p>
<p><strong>Trevor: </strong>In the context of thinking about NSRL as a research environment it seems that the key value there is the corpus of software, the 23,809,431 unique files, that you have identified. Could you tell us about some of the research uses these have served so far? The audience for the blog varies widely in technical knowledge so it would be ideal if you could unpack these concepts a bit too.</p>
<p><strong>Doug: </strong>The highest value, in my opinion, is the provenance and persistence of the collection. Given the virtual library, it is easy to apply new technology and new algorithms to the entire set or to specific content automatically, while maintaining the relationship to previous work and the original media.</p>
<p>NSRL has applied several cryptographic algorithms against the corpus, and statistically analyzed the results. This is an interesting measurement of the algorithm properties within the relatively small scope of binary executable file types. NSRL found that indeed there were no collisions among the 25 million files.</p>
<p>Working with a collaborator, we are able to define precise, static content sections of executable files, obtain a digital fingerprint of those sections, then identify those sections when they are present on a running computer. This can allow an investigator to determine that a program was running, even though the files do not exist on the computer.</p>
<p>Working with a collaborator, we are able to provide practical feedback on the development of an algorithm called a similarity digest. Currently, if you have two digital copies of the Gettysburg Address text, one of which begins “Five score and &#8230;”, the two cryptographic hashes of the differing files will be extremely dissimilar, as intended. Two similarity digest results on the two Address files will be similar, and the similarity can be measured. Algorithms of this kind are also known as “fuzzy” hashes, and they tend to be impractical for very large sets. We are assisting in developing a practical implementation.</p>
<p>NSRL has in the past limited metadata collection to the content of the application media. We have now acquired the resources and defined the processes to automatically install an operating system on a virtual machine, run the OS, perform noteworthy tasks, install applications, generate content, uninstall applications, etc. This enables the collection of metadata on dynamic system files, registries, log files, memory and various versions of user-generated files. We can use some of this metadata as feedback into our core process, and we have some research opportunities.</p>
<p>Another imminent collaboration is the creation of many word processing documents, created with different applications and multiple versions, that contain the same text. A corpus of document tags or codes spanning versions and products has generated some interest.</p>
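<p>The contrast in the Gettysburg Address example can be sketched in a few lines of Python. This is a loose illustration, not a similarity digest: it uses SHA-1 from <code>hashlib</code> for the cryptographic side and, as a stand-in for a real fuzzy hash, compares the two texts directly with <code>difflib.SequenceMatcher</code> from the standard library. The constants and function names are hypothetical.</p>

```python
import hashlib
from difflib import SequenceMatcher

ORIGINAL = (
    "Four score and seven years ago our fathers brought forth "
    "on this continent a new nation"
)
# The altered copy differs only in its first word.
ALTERED = ORIGINAL.replace("Four", "Five", 1)


def sha1_hex(text):
    """Cryptographic hash: any change yields an unrelated digest."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def shared_hex_positions(first, second):
    """Count positions where two hex digests happen to agree."""
    return sum(a == b for a, b in zip(first, second))


def text_similarity(first, second):
    """Stand-in similarity score between 0.0 and 1.0 (not a fuzzy hash)."""
    return SequenceMatcher(None, first, second).ratio()


if __name__ == "__main__":
    digest_a = sha1_hex(ORIGINAL)
    digest_b = sha1_hex(ALTERED)
    print("SHA-1 original:", digest_a)
    print("SHA-1 altered: ", digest_b)
    print("matching hex positions:", shared_hex_positions(digest_a, digest_b))
    print("text similarity:", round(text_similarity(ORIGINAL, ALTERED), 3))
```

<p>The direct comparison needs both full texts at hand, which is exactly what a similarity digest avoids: it condenses each file to a short value, and the similarity is measured between those values.</p>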
<p><strong>Trevor: </strong>Could you tell us a little bit about the NSRL environment? What kinds of technologies and software are you currently using and what are you exploring for use in the future?</p>
<p><strong>Doug: </strong>We are fortunate to have three contiguous rooms, one that houses the physical library, one that houses the data entry workstations, and one that houses servers and storage. The proximity of the rooms allowed us to pull our own cables, which makes that level of our infrastructure a controlled, known quantity.</p>
<p>The physical library has an alarmed, multi-factor entry control. The shelf system is a powered collapsing system which defaults to a closed, fire-retardant position. The environment is not kept within the recommended practices for archives; this was considered, but not implemented. Heat, fire, humidity and other risks are minimized to the best extent we can.</p>
<p>NSRL has strived to keep infrastructure implementations to hardware and technologies that can be quickly obtained and made functional in the event of a disaster. I would prefer to not name manufacturers at this time, but am willing to discuss those details with individuals.</p>
<div id="attachment_7441" class="wp-caption alignright" style="width: 330px"><a  href="http://blogs.loc.gov/digitalpreservation/files/2012/05/adhoc_sm.jpg"><img class="size-full wp-image-7441" src="http://blogs.loc.gov/digitalpreservation/files/2012/05/adhoc_sm.jpg" alt="Ad Hoc, by Steve Rhodes, on Flickr" width="320" height="184" /></a><p class="wp-caption-text">Ad Hoc, by Steve Rhodes, on Flickr</p></div>
<p>In the second room, core work is performed using OpenSuSE Linux workstations for browser-based data entry and media copying. The Linux machines can be created in bulk or ad hoc using a net boot image. This room also contains a system used to perform software installations, so the NSRL can collect installed files, registry information and other artifacts of a running application. This room contains a computer attached to the internet on which NSRL downloads digital-only distributions of software. A photography stand and flatbed scanning stations are in this room, used to create digital photos of packaging, so these photos can be used for data entry and research instead of shelved material.</p>
<p>Movement of original packages and media is limited to the previous two rooms.</p>
<p>The third room is a computer server room with racks of equipment. The media copies are stored on a commercial, expandable network storage device (currently 42TB) that can be accessed by Windows, Apple and Linux computers. We have several quad-core rack-mounted servers that perform the automated distributed metadata collection tasks. A PostgreSQL database and an Apache webserver reside on one of the rack servers, which is dedicated to these functions. The database is on local storage in that server.</p>
<p>The equipment described in the previous paragraph is duplicated, and that is the research environment. Media images, individual files, virtual machine slices and all databases are backed up across a dedicated fiber connection to storage several buildings distant. Verification of critical files is performed nightly. We also periodically ship copies of the critical files to the NIST Boulder, CO, campus.</p>
<p>The software we use is mostly written in Perl, with some PHP for the browser-based data entry. Reuse is key, as is flexibility; the NSRL code is essentially a wrapper or application interface which calls third-party tools to manipulate media, files or systems.</p>
<div id="attachment_7443" class="wp-caption alignleft" style="width: 330px"><a  href="http://blogs.loc.gov/digitalpreservation/files/2012/05/e.jpg"><img class="size-full wp-image-7443" src="http://blogs.loc.gov/digitalpreservation/files/2012/05/e.jpg" alt="E Is For Evidence, by Howdy, I'm H. Michael Karshis, on Flickr" width="320" height="160" /></a><p class="wp-caption-text">E Is For Evidence, by Howdy, I&#039;m H. Michael Karshis, on Flickr</p></div>
<p>We have a quality assurance process that involves loading NSRL quarterly candidate releases into several third-party digital forensics tools, in each publishing cycle.</p>
<p>We don&#8217;t anticipate substantial changes to our technology or software in the near future. If anything, we would revisit our internal database design, and address some issues that did not scale up as well as we expected.</p>
<p><strong>Trevor:</strong> If other organizations have special collections would NSRL be interested in adding those collections to the reference library? If yes, what process would you suggest to someone interested in discussing such an arrangement?</p>
<p><strong>Doug: </strong>NSRL is very interested in pursuing loan arrangements with other institutions. Transfer of materials to NIST need not be a requirement. Please contact me, or any NSRL staff, via nsrl@nist.gov or 301-975-3262.</p>
<p><strong>Trevor:</strong> Are there more research uses or ways that you think the NSRL could play a role in digital preservation work and research? Further, if any of the folks who follow this blog are interested in doing research involving the software corpus, what should they put together and how should they go about getting in touch with your team?</p>
<p><strong>Doug:</strong> We are new participants in the community, so I believe we are still at the point of introducing ourselves. I am hopeful that uses may be identified as our capabilities and activities are made known. This blog is a step in that direction, and I thank you for this opportunity.  Anyone with questions regarding research access should contact me.</p>
<p><strong>Trevor: </strong>As a final question, could you tell us a bit about how the NSRL came about? One of the tricky parts of digital stewardship is establishing the value of, and need for, building and maintaining collections, and I think the story of the need and uses that the NSRL serves offers a powerful frame for thinking about the kinds of coalitions and common needs that digital stewardship initiatives work to support.</p>
<p><strong>Doug: </strong>Prior to NIST involvement in digital forensics, law enforcement identified the need for automated methods to review the large number of files in investigations involving computers. The FBI &#8220;Known File Filter&#8221; project supplied hash values of known files; the NDIC &#8220;Hashkeeper&#8221; project supplied hash values of installed files and of &#8220;known malicious&#8221; data files. Several commercial and open source tools existed that each used different hash values (CRC32, MD4, MD5, SHA-1).</p>
<p>Hash values were exchanged informally throughout the entire community via email, FTP sites, etc. Investigators had to know where to find hash sets; investigators had to judge the quality of the hash sets. There was no central, trusted repository, and there were open avenues for conflicts of interest.</p>
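<p>To make the idea of a hash set concrete, here is a minimal Python sketch of the kind of automated review described above: each file's SHA-1 is looked up in a set of known values, and only the unmatched files are left for a person to examine. The function name, the in-memory <code>files</code> mapping and the sample hash set are assumptions made for this example; real forensic tools and published hash sets have their own formats.</p>

```python
import hashlib


def split_known_unknown(files, known_sha1):
    """Partition files by whether their SHA-1 appears in a known hash set.

    files: mapping of file name to file content (bytes); hypothetical input.
    known_sha1: set of uppercase hexadecimal SHA-1 strings.
    """
    known, unknown = [], []
    for name, data in sorted(files.items()):
        digest = hashlib.sha1(data).hexdigest().upper()
        if digest in known_sha1:
            known.append(name)      # matches a reference value, can be skipped
        else:
            unknown.append(name)    # not in the set, needs a closer look
    return known, unknown


if __name__ == "__main__":
    # SHA-1 of the three bytes "abc", used here as a one-entry hash set.
    sample_set = {"A9993E364706816ABA3E25717850C26C9CD0D89D"}
    sample_files = {"system.dll": b"abc", "notes.txt": b"something else"}
    print(split_known_unknown(sample_files, sample_set))
```

<p>The usefulness of such a filter depends entirely on how far the hash set can be trusted, which is the gap described in this answer.</p>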
<p>NIST was contacted because of its history of impartiality in research and standards development. Among the benefits of this involvement were:</p>
<ul>
<li>NIST is an unbiased organization, not in law enforcement, not a vendor</li>
<li>NIST can control quality of data</li>
<li>NIST can provide traceability by retaining original software</li>
<li>NIST can provide data in formats usable by many existing tools</li>
<li>NIST has a distribution mechanism in the Standard Reference Data service</li>
</ul>
<p>The result of this is a data set that is court-admissible, a process that is transparent, and a collection open to researchers.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.loc.gov/digitalpreservation/2012/05/life-saving-the-national-software-reference-library/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Web Archiving Arrives: Results from the NDSA Web Archiving Survey</title>
		<link>http://blogs.loc.gov/digitalpreservation/2012/05/web-archiving-arrives-results-from-the-ndsa-web-archiving-survey/</link>
		<comments>http://blogs.loc.gov/digitalpreservation/2012/05/web-archiving-arrives-results-from-the-ndsa-web-archiving-survey/#comments</comments>
		<pubDate>Thu, 03 May 2012 13:38:58 +0000</pubDate>
		<dc:creator>Susan Manus</dc:creator>
				<category><![CDATA[Digital Content]]></category>
		<category><![CDATA[Partners and Collaboration]]></category>
		<category><![CDATA[digital preservation]]></category>
		<category><![CDATA[IIPC]]></category>
		<category><![CDATA[ndsa]]></category>
		<category><![CDATA[web archiving]]></category>

		<guid isPermaLink="false">http://blogs.loc.gov/digitalpreservation/?p=7365</guid>
		<description><![CDATA[The following is a guest post by Jefferson Bailey, Fellow at the Library of Congress’s Office of Strategic Initiatives. The NDSA Content Working Group, one of the five working groups of the National Digital Stewardship Alliance focuses on identifying content already preserved, investigating guidelines for the selection of significant content, discovery of at-risk digital content [...]]]></description>
			<content:encoded><![CDATA[<p><em>The following is a guest post by <strong>Jefferson Bailey</strong>, Fellow at the Library of Congress’s Office of Strategic Initiatives.</em><em> </em></p>
<p>The <a  href="http://digitalpreservation.gov/ndsa/working_groups/content.html">NDSA Content Working Group</a>, one of the five working groups of the <a  href="http://digitalpreservation.gov/ndsa/">National Digital Stewardship Alliance</a>, focuses on identifying content already preserved, investigating guidelines for the selection of significant content, discovering at-risk digital content or collections, and matching orphan content with NDSA partners who will acquire the content, preserve it, and provide access to it. As part of this effort, the group conducted a survey of organizations in the United States that are actively involved in, or planning to start, programs to archive content from the web. Conducted from October 3 through October 31, 2011, the survey aimed to better understand the landscape of web archiving activities in the United States by identifying the organizations involved, the types of web content being preserved, the tools and policies being used, and the types of access being provided.</p>
<p>Preliminary results of the report presented here are being released in conjunction with the <a  href="http://netpreserve.org/about/index.php">International Internet Preservation Consortium</a> 2012 General Assembly taking place this week here at the Library of Congress. The full report will be made available soon.</p>
<p>The survey featured 28 questions and garnered 77 unique responses from a range of institutions, with survey participants primarily representing the cultural heritage (29%, 22 of 77), government (22%, 17 of 77), and university communities (46%, 36 of 77). Of the survey respondents, 31% (24 of 77) were members of the NDSA and 8% (6 of 77) were members of the IIPC.</p>
<p><strong>Web Archiving Activity</strong></p>
<p>The current web archiving activities of the survey respondents were as follows:</p>
<ul>
<li>63% (49 of 77) have an active web archiving program.</li>
<li>16% (12 of 77) are actively testing a web archiving program.</li>
<li>17% (13 of 77) are planning on pursuing a web archiving program in the near future.</li>
<li>4% (3 of 77) formerly managed web archiving programs, but no longer do so.</li>
</ul>
<div id="attachment_7397" class="wp-caption alignleft" style="width: 310px"><a  href="http://blogs.loc.gov/digitalpreservation/files/2012/05/web_survey_blog_img13.jpg"><img class="size-medium wp-image-7397" src="http://blogs.loc.gov/digitalpreservation/files/2012/05/web_survey_blog_img13-300x223.jpg" alt="" width="300" height="223" /></a><p class="wp-caption-text">Chart 1: Status of current web archiving activities.</p></div>
<p>Interestingly, of the 71 respondents that identified their web archiving goals, 80% (57 of 71) were archiving content “from other organizations or individuals for future research,” 69% (49 of 71) were preserving their own institutional web content, and 49% (35 of 71) were doing both.</p>
<p>In reviewing the full survey results, a number of themes emerged.</p>
<p><strong>The recent emergence of web archiving, especially at academic institutions</strong></p>
<p>One surprising result was the preponderance of universities that have initiated web archiving programs in the last 5 years. Of the 68 respondents that identified the specific year their web archiving began, nearly a third, 32% (22 of 68) began their programs within the last two years, the exact same number of institutions (22, 32%) that began archiving web content in the 17 years between 1989 and 2006. The recent surge in web archiving within the last 5 years – 68% (46 of 68) of those surveyed – is primarily due to universities starting web archiving programs.</p>
<div id="attachment_7372" class="wp-caption alignright" style="width: 310px"><a  href="http://blogs.loc.gov/digitalpreservation/files/2012/05/web_survey_blog_img2.jpg"><img class="size-medium wp-image-7372" src="http://blogs.loc.gov/digitalpreservation/files/2012/05/web_survey_blog_img2-300x242.jpg" alt="" width="300" height="242" /></a><p class="wp-caption-text">Chart 2: Year began archiving web content.</p></div>
<p><strong>Inconsistent custodianship</strong></p>
<p>One discovery of the survey was the low percentage of respondents that have transferred their archived data from their external service to their institution. Only 18% (9 of 49) of survey members have transferred their data in-house, including only 2 of the 12 government respondents and only 4 of the 25 university respondents. A total of 82% of those using an external service have not transferred data to their institution. Free text comments for this question pointed to many concerns about transferring externally harvested data to in-house systems, including “duplicate costs,” a lack of infrastructure, confidentiality concerns, and cataloging and accessibility challenges.</p>
<div id="attachment_7374" class="wp-caption alignleft" style="width: 310px"><a  href="http://blogs.loc.gov/digitalpreservation/files/2012/05/web_survey_blog_img3.jpg"><img class="size-medium wp-image-7374" src="http://blogs.loc.gov/digitalpreservation/files/2012/05/web_survey_blog_img3-300x204.jpg" alt="" width="300" height="204" /></a><p class="wp-caption-text">Chart 3: Rates of transferring web content in-house for those collecting content through an external service.</p></div>
<p><strong>Lack of policies and unclear guidance on permissions</strong></p>
<p>Internal policy documentation appeared to be an area of continued improvement for many institutions. While some programs had incorporated web-materials into existing policies and procedures, others had not and some seemed unsure of their institution’s current policy status for web content.</p>
<p>The survey also brought to light an acute lack of clarity around seeking permission from content creators, both for harvesting and for providing access to collections. Chart 4 and Chart 5 show policies related to seeking permission from content creators to harvest content and to provide access to archived web sites.</p>
<div id="attachment_7377" class="wp-caption alignright" style="width: 310px"><a  href="http://blogs.loc.gov/digitalpreservation/files/2012/05/web_survey_blog_img41.jpg"><img class="size-medium wp-image-7377" src="http://blogs.loc.gov/digitalpreservation/files/2012/05/web_survey_blog_img41-300x219.jpg" alt="" width="300" height="219" /></a><p class="wp-caption-text">Chart 4: Policies towards seeking permission to crawl websites. </p></div>
<p><strong>Collecting trends and collaborative potential</strong></p>
<p>The types of content being acquired included websites, blogs, and social media:</p>
<ul>
<li>78% (60 of 77) include or plan to include websites in their archive</li>
<li>57% (44 of 77) include or plan to include blogs in their archive</li>
<li>38% (29 of 77) include or plan to include social media in their archive</li>
</ul>
<div id="attachment_7378" class="wp-caption alignleft" style="width: 310px"><a  href="http://blogs.loc.gov/digitalpreservation/files/2012/05/web_survey_blog_img5.jpg"><img class="size-medium wp-image-7378" src="http://blogs.loc.gov/digitalpreservation/files/2012/05/web_survey_blog_img5-300x222.jpg" alt="" width="300" height="222" /></a><p class="wp-caption-text">Chart 5: Policies towards seeking permission to provide access to archived web content.</p></div>
<p>A free-text survey question asked respondents to “briefly describe the scope of your web archive collections.” Broadly stated, these responses fell into one of three categories: institutional self-documentation, collection enhancement, and thematic. Chart 6 shows how respondents answered when asked to choose from among a variety of specific subject topics.</p>
<p>The potential for collaboration was a notable aspect of the survey results. Though only 23% of organizations were currently collaborating on web archiving, 96% (64 of 67) answered either “yes” (34, 51%) or “maybe” (30, 45%) when asked if they were interested in participating in future collaborative collecting activities. As these numbers demonstrate, there is significant interest in the collaborative opportunities around joint web archiving, but little current action in this area.</p>
<div id="attachment_7381" class="wp-caption aligncenter" style="width: 310px"><a  href="http://blogs.loc.gov/digitalpreservation/files/2012/05/web_survey_blog_img61.jpg"><img class="size-medium wp-image-7381 " src="http://blogs.loc.gov/digitalpreservation/files/2012/05/web_survey_blog_img61-300x161.jpg" alt="" width="300" height="161" /></a><p class="wp-caption-text">Chart 6: Subjects currently or planned to be represented in respondents’ web archives.</p></div>
<div id="attachment_7382" class="wp-caption aligncenter" style="width: 310px"><a  href="http://blogs.loc.gov/digitalpreservation/files/2012/05/web_survey_blog_img7.jpg"><img class="size-medium wp-image-7382 " src="http://blogs.loc.gov/digitalpreservation/files/2012/05/web_survey_blog_img7-300x189.jpg" alt="" width="300" height="189" /></a><p class="wp-caption-text">Chart 7: Current participation and interest in future participation in collaborative web archiving.</p></div>
<p>While the survey exposed the continued challenges of preserving content that is created on the web, as well as the ongoing permission and management challenges of providing access, it also pointed to the growing importance of web archiving as a core function of collection development for many institutions. This, coupled with the openness towards collaboration, suggests that many of the challenges evident in the report will be addressed in due time by the combined efforts of the entire community. Events like the IIPC General Assembly and alliances like the NDSA are a key part of the knowledge-sharing and collaboration essential to organizations as they work to archive and preserve web collections.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.loc.gov/digitalpreservation/2012/05/web-archiving-arrives-results-from-the-ndsa-web-archiving-survey/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

