Managing a Library of Congress Worth of Data

The following is a guest post by Kate Zwaard and David Brunton, both Supervisory IT Specialists in the Library of Congress Repository Development Center.

“Computer data storage in a modern office” by Carol Highsmith, from the archive in the LC Prints and Photographs Division

The Library of Congress’s digital collections are growing at a rate of 1.5 terabytes per day (that means, by the popular measure, we collect a “Library of Congress” worth of data each week, if anyone’s counting). The Repository Development Center, where we work, builds software and services to help manage and preserve the digital collections of the Library of Congress.
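For anyone who is counting, here is a rough sketch of that arithmetic. The 1.5 terabytes per day comes from the figure above; the size of one “Library of Congress” is not given in this post, so the 10 terabytes used below is an assumption based on the commonly quoted informal figure, and the names in the code are purely illustrative.

```python
# Growth rate stated in the post.
TB_PER_DAY = 1.5

# Assumption: the informal "one Library of Congress" unit is taken as 10 TB.
# This figure is not given in the post.
LOC_UNIT_TB = 10.0


def days_to_collect(unit_tb: float, rate_tb_per_day: float) -> float:
    """Days needed to collect unit_tb at a constant daily rate."""
    return unit_tb / rate_tb_per_day


weekly_tb = TB_PER_DAY * 7
days = days_to_collect(LOC_UNIT_TB, TB_PER_DAY)
print(f"{weekly_tb} TB per week; one unit roughly every {days:.1f} days")
```

Under that assumption, a week of growth slightly exceeds one unit, which is all the “each week” claim needs.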

What is a digital repository? There are whole books written on this topic, but we understand a digital repository to be software and hardware that:

  • Keeps digital material safe from accidental or unauthorized change or destruction;
  • Makes it possible to get material in the door, described, managed, preserved and available to the people who will use it.

There has been so much news lately about the challenges the federal government faces in making software that we’d like to share with you some of what has worked for us.

We craft our projects

The “project” is our unit of management in the RDC. The only required project document in the RDC is a project charter, which may be one page long and can be written by anyone in the group. The charter is posted on our group wiki, sent to the mailing list, then scheduled for discussion. After the team discusses the charter, including feasibility, risks and success criteria, the chief of the group approves or rejects the proposal. The decision about which projects to approve is made based on the agency’s annual objectives, available staffing, input from users and a sense of the areas of greatest need and impact.

Crafting a project of the right size is difficult, but important. When a project starts out too big, it only gets bigger, leading naturally to schedule extensions and scope creep. If a project is too big, we work out how to break it into approachable chunks.

We use free and open source software extensively

The long list of open source tools we use is complemented by a shorter list of tools that we release ourselves or contribute to. When we build something useful within the community of practice (either of librarians or other developers), we try to make the parts that are most useful available for others to use under very permissive terms. Typically, this is either a statement of public domain or a BSD-style license.

We work incrementally

The Repository Development Center organizes its work into projects. Within a project, tickets are grouped into releases. The RDC releases new features from at least one project most weeks. We focus on incrementally improving the Library of Congress with each release. Sometimes we coordinate work across projects and time the resulting releases together. This gives us the feeling of “wow.”

Teams get software running quickly and continuously improve it. This means there is typically not a large up-front design phase for new tools. Instead, we keep our work tied to the agency’s evolving needs so that when we create something new, it meets one of the agency’s current objectives. We try something to see if it works, rather than talking about it and coming up with a prediction.

We don’t lie about deadlines

Software projects arrive with a lot of pressure to talk about scope and deadlines. Often this is true even before we have a good idea of what the work is or when it will be needed. Our approach to working within this constraint is to schedule frequent releases for our projects, and to keep a good handle on internal dependencies and external priorities.

Seeing progress helps stakeholders focus on outcomes, which allows them (and consequently us) some agility with scope and deadline. Getting something up and running quickly helps everyone figure out what functionality is necessary for a tool to be immediately useful and what functionality can be added as enhancements.

We can control either scope or schedule on a project. Scoped projects are completed when the scope is complete. Projects with a real date go “live” when the date arrives. In a situation where both scope and schedule are fixed, it has been our experience that software development groups compensate with either considerable padding on the schedule or absurdly tight scoping that puts outcomes at risk. We try not to do this.

We are part of a community

Individual projects are conceived of as a partnership between developers and content owners. These projects are managed iteratively by small project teams in close collaboration with colleagues making curatorial decisions.

Making software connects us to our community. We are committed to being part of a society of people caring for cultural heritage information and the community of people who are making software for libraries and archives. We think we’re stronger for sharing with each other what works, and that being part of a robust community is part of what makes the work fun.

Comments (2)

  1. Very kick-ass mission statement. You should make this mandatory reading for everyone applying for your current openings.

  2. Agreed. Every development group should have this kind of dedication to structure, honesty and craft that I see in this blog post. This approach shows a lot of thought and hard work to forge something that works.
