Helen Hockx-Yu: Web Archiving at the British Library

The following is a guest post by Abbey Potter, Program Officer, NDIIPP.  She is also Communications Officer for the IIPC.

It’s been three weeks since the NDIIPP/NDSA partners meeting, and to keep the “Make it Work” spirit going, we’d like to focus on the work of one of the speakers from that event, Helen Hockx-Yu.  Her team sets a great example of striding through challenges and providing innovative services that help move the practice of web archiving forward.

Helen Hockx-Yu during presentation at partners meeting. Photo by Barry Wheeler.

Helen Hockx-Yu is the Head of Web Archiving at the British Library.  She was previously a project manager for Planets, the European Union-funded digital preservation initiative, and a program manager at the Joint Information Systems Committee, an agency that supports the use of digital technologies in higher education in the United Kingdom.


We know Helen best through the International Internet Preservation Consortium where she is the co-chair of the Access Working Group. The British Library is a founding member of the IIPC and is well represented throughout the organization: Sean Martin is the 2011 chair of the Steering Committee and Lewis Crawford is the co-chair of the Harvesting Working Group.  This team has accomplished much over the past couple of years despite an uncertain legal environment, budget cuts and departmental reorganization.

For several years the British Library has been preparing for the imminent passing of legislation that would allow the capture of the UK web domain under a legal deposit regime. While awaiting the new policy, the team has been put to good use building capacity and operating under a selective or thematic approach for harvesting web sites. The UK Web Archive contains over 10,000 archived web sites that users can browse, search, and view alongside other digital resources at the British Library. As a result, the page views of the archive increased approximately 50% from April 2010 to March 2011.

Helen has focused on building the web archiving team at the British Library with people who have diverse skill sets and a wide base of technical experience, targeting industry hires and offering training opportunities and staff exchanges. Just as the web archives are integrated resources in the British Library catalog, the web archiving program, as of this year, has been established as a “business as usual” unit and not a “one-off” or special project.

Establishing and developing the technical know-how for operating a web archiving program in-house has made it possible for the team to leverage data processing and analytics to build visualizations that offer new ways to discover the unique content in the UK Web Archive. The British Library just announced the ability to view search results from the UK Web Archive in an N-gram, or chart, that plots the frequency of the appearance of a term over time in the archive. Users can also browse the UK Web Archive through tag clouds and mapping of subject terms. These tools highlight the use of web archives as data sets, an emerging and important mode of access that changes the way libraries can support research and use of digital materials.
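To give a rough sense of what sits behind a chart like this, here is a minimal sketch in Python of counting how often a term appears per year. It is not the British Library's implementation; the function name `term_frequency_by_year`, the `(year, text)` input shape, and the sample texts are all made up for illustration.

```python
from collections import Counter


def term_frequency_by_year(documents, term):
    """Count whole-word occurrences of `term` per year.

    `documents` is an iterable of (year, text) pairs. This shape is an
    assumption for illustration, not the UK Web Archive's data model.
    """
    counts = Counter()
    needle = term.lower()
    for year, text in documents:
        # Naive whitespace tokenization; a real index would also
        # normalize punctuation and word forms.
        counts[year] += text.lower().split().count(needle)
    # Sort by year so the result can be plotted as a time series.
    return dict(sorted(counts.items()))


# Hypothetical sample texts, invented for this example.
sample = [
    (2008, "credit crunch hits the high street"),
    (2009, "recession deepens as the credit crunch continues"),
    (2009, "olympics preparations continue"),
    (2010, "election coverage dominates"),
]

series = term_frequency_by_year(sample, "crunch")
print(series)
```

Each year's count becomes one point on the chart. A production service would more likely normalize by the amount of text archived in each year, so that growth in the archive itself does not read as growth in the term.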

Helen and her team embody the “make it work” mentality of doing the best possible job with the resources available. They are leading the field of web archiving into new modes of access – opening libraries and archives to the possibilities of building services that can serve new approaches to research.

