
Extending the Life of a Story Through Taxonomy at National Public Radio


Hannah Sommers has done just about every job one can do in a library.  Today she serves as NPR’s first Library Program Manager, helping forge a new path for the profession in her role directing product development for the NPR Library. This is her guest post.

NPR Headquarters in Washington, DC. Photo Credit: Stephen Voss/NPR.

NPR’s mission is to create a more informed public, news cycle by news cycle, one story at a time.  As a news organization, we operate with a general orientation toward making sense of what’s coming next.  This forward-leaning, future-oriented posture can present interesting challenges for the immediate and longer-term preservation of our stories.  NPR is not a cultural heritage institution, but every day our team of professional librarians stewards cultural heritage.  Increasingly we are acting as workflow designers and knowledge engineers, determining how a story can continue to meet an expanding array of business needs long after (sometimes decades after) it is broadcast or published.  Our session at Digital Preservation 2014 explores the role we play at NPR. We’ll look at how our team views our context as a set of creative constraints, and why we think taxonomy matters beyond facilitating navigation and supporting distribution channels, although both of those functions are critical.  Ours is a story about relevance.  Ours is a story in progress.

From line producers to news executives to bloggers, our colleagues spend their waking hours thinking about what’s next — the next interview, the next newscast, the next deadline, the next time-shifted audio experience.  We strive to deliver information that makes a difference to our audience: information that is relevant and connects to individuals wherever they may be.  Some listeners will be making breakfast, others will be driving a car, browsing a tablet or plugged into a smart phone.  Both the content and the mode of delivery need to be relevant and to answer to digital realities.

Keeping track of all the stories NPR has produced is one of several places librarians enter the content life cycle.  In the 1970s librarians established a workflow to create a proxy for each story in electronically searchable descriptive metadata.  That process has evolved and carries us into the present day.  A key feature of this legacy is the ability to search across the years for all stories about a particular topic, person or geographic region.  Need all the stories where Ronald Reagan’s voice is heard?  Done.  Do you need a list of all the coverage on automobile recalls?  Paris, France?  Extreme sports?  You can find them in one search.

To do this we leverage taxonomies: controlled vocabularies of terms (topics, place names, people names).  The taxonomies that NPR librarians use have evolved since the 1970s and they play a critical role in facilitating searches across our archive of 800,000 radio stories.  Taxonomy provides one aspect of a structured data approach that makes exploration (and possible future remixing) of our stories possible in ways we haven’t thought of yet.  Our colleagues at the New York Times describe in an exciting level of detail what they think might be possible via these types of structured data in their recent Innovation Report.
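To make the idea concrete, here is a minimal sketch of how a controlled vocabulary can drive this kind of search. It is an illustration only, not a description of NPR's actual systems: the `Story` record, its field names, the sample terms, the `NARROWER` hierarchy and the `search` function are all assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Story:
    """Descriptive-metadata proxy for one story (hypothetical fields)."""
    story_id: int
    title: str
    topics: set = field(default_factory=set)  # controlled topic terms
    places: set = field(default_factory=set)  # controlled place names
    voices: set = field(default_factory=set)  # people whose voices are heard


# Hypothetical slice of a hierarchical topic vocabulary:
# each term maps to its narrower terms.
NARROWER = {
    "Transportation": ["Automobiles"],
    "Automobiles": ["Automobile recalls"],
}


def expand(term):
    """Return the term together with every narrower term beneath it."""
    found = {term}
    for child in NARROWER.get(term, []):
        found |= expand(child)
    return found


def search(stories, topic=None, place=None, voice=None):
    """Return IDs of stories matching every criterion that was supplied."""
    wanted_topics = expand(topic) if topic else None
    hits = []
    for story in stories:
        if wanted_topics is not None and not (story.topics & wanted_topics):
            continue
        if place is not None and place not in story.places:
            continue
        if voice is not None and voice not in story.voices:
            continue
        hits.append(story.story_id)
    return hits


# Made-up sample records, for illustration only.
ARCHIVE = [
    Story(1, "Recall announced", topics={"Automobile recalls"}),
    Story(2, "Speech in Paris", places={"Paris, France"},
          voices={"Reagan, Ronald"}),
    Story(3, "Highway funding", topics={"Transportation"}),
]

print(search(ARCHIVE, topic="Automobiles"))
print(search(ARCHIVE, voice="Reagan, Ronald"))
```

Because every story carries terms drawn from the same list, a search on a broader term such as "Automobiles" also picks up stories tagged with a narrower one, which is the kind of consistency that free-text keywords cannot guarantee.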

Photo left to right: Justin Bachorik (NPR developer), Sarah Knight (NPR taxonomist), and Brian D’Astous (NPR developer) discuss the sprint cycle just closed during the retrospective meeting. Credit: Hannah Sommers

Today at NPR we are looking for ways to extend the benefits of descriptive metadata, taxonomies and “tags” to stories that are not presented in the context of a newsmagazine such as Morning Edition or All Things Considered.  The challenge of timely and accurate tagging is particularly acute in the case of digital storytelling when, for example, new blogs can spring up around specific events and wind down just as quickly.

Due in large part to the challenge of scaling workflow and because of the time-sensitivities inherent in news, we have struggled to bring the rich benefits of controlled vocabulary to digital-only stories.  Now, we are pivoting to a native strength of our team, taxonomy, and doubling down on identifying a scalable solution that delivers unique value for internal business units and for audience engagement in the digital realm.  (Our colleague, Jonathan Epstein, discusses one aspect of our work, the buy vs. build dilemma, here.)

As with everything we do, our work is grounded in a creative process framework known as Scrum. If you’re not familiar, check out this quick primer.  For the past few months we have been evaluating software and workflow approaches to introduce semantic processing technologies into our story tagging flow.  We’ve taken some time to iterate on an approach explored by two developer colleagues during NPR’s tenth round of Serendipity Day.

In tandem, we’ve appointed NPR’s first taxonomy lead, Sarah Knight.  As a result, digital-only storytelling is poised to receive the subject analysis and tagging treatment that has always benefited NPR’s radio stories.  With stronger descriptive metadata, digital stories become more interconnected and better accounted for within our systems in a way not possible before.  When we talk about the benefits of taxonomy with our business partners we talk about increasing relevancy in sponsorship placement, increasing the range of pathways for an audience to discover a story, gaining a finer-grained understanding of what interests our audience and creating new experiences around specific kinds of content.  These are the benefits that extend to our immediate bottom line.  In themselves they present a compelling business case.

But these are not the only benefits.  We also understand taxonomy as a preservation tactic.  Our industry is evolving more quickly than the systems used to report the news itself, and is shedding library departments even faster.  Each digital story that “knows what it’s about” from the tags it carries is a story more likely to be remembered because its tags connected it to an interested audience in the first place.  It is a story that is inoculated against digital invisibility.  It has a better chance of being accounted for, rediscovered and reused.   It is a story that has a better chance of being part of a group of stories.  It is a story that has a much better chance of persisting.

Comments (3)

  1. I am always interested in complex relationships. For instance:

    With the example of stories where “Ronald Reagan’s voice is heard,” how would one find that using NPR’s taxonomy? Would there be a tag for “Ronald Reagan,” and a property attached to that tag (a tagged tag?) saying that particular subject’s voice is heard? Or would there just be a tag for “Ronald Reagan’s voice is heard?” Or is there some other way that I never would have thought of?

  2. The investment in taxonomies by NPR Librarians will be a rewarding digital effort with far reaching implications! Can’t wait to see what they come up with next.

  3. Thanks, Lauren!
    Dustin, glad you’re interested. Here’s a little more info. We collect metadata at the story level. We use structured metadata and controlled value lists to facilitate this type of search. For each radio story we indicate the names of persons whose voices are heard and pair that with a code from a controlled list that reflects more information about their role in the story (direct interview, a public statement, a performance, etc.).

    Taxonomy is a flexible word. Some would refer to the above as taxonomy because we’re leveraging structured lists to produce results. Typically when our Library team talks about taxonomy we’re referring to the hierarchical lists that reflect topical and geographic concepts. We are thinking about ways we might usefully expand our tagging vocabularies beyond these.
