I want to preface this post by reiterating one of our general disclaimers, to wit: “This blog does not represent official Library of Congress communications.” I do so because this post will edge slightly closer to “editorializing” than most of my previous posts.
Working in the Office of Communications as I do, I’m aware of the lion’s share of news coverage about and references to the Library, whether via Lexis-Nexis, Google News, or any number of other monitoring services to which we have access. This is a rough approximation — and perhaps a bit hyperbolic — but I would guess that somewhere between a quarter and a third of the general references I see to the Library are along these lines: “Our servers can store the equivalent of the Library of Congress,” or “our network is fast enough to download the entire Library of Congress in a millisecond.” You get the drift.
So it raises the question: just how “big” is the Library of Congress in terms of our content, especially if one tried to equate it to the digital realm?
I won’t go into any of the specific claims that are being made, but they’re easy to find out there in the ether, and suffice it to say that the Library would stand behind very few if any of them. There are certain things we can quantify, but far more that are purely speculative.
For instance, we can say that, as of this moment, about 74 terabytes of our collections are digitized and freely and publicly available on the Internet. We can also say that we have about 15.3 million digital items online.
Some may be tempted to extrapolate that those digital items represent a precise percentage of the nearly 142 million items in the Library’s physical collections, and then estimate some kind of digital corollary. But comparing digital and physical items is apples and oranges, at best. A simple example of that fallacy would be represented by a single photograph online depicting several physical objects.
Other digital estimates are likely based on the number of books and printed items in our collections, which is currently about 32 million. One could attempt to establish the average length of those items (pages, words, characters, etc.) and extrapolate the digital equivalent of those 32 million physical items.
Assuming one could do that with any degree of accuracy — and that’s a big assumption — it overlooks the fact that those 32 million books represent only about one-quarter of the entire physical collections. The rest are in the form of manuscripts, prints, photographs, maps, globes, moving images, sound recordings, sheet music, oral histories, etc. So how does that other three-quarters of the Library equate digitally? Can one automatically assume the digital resolution at which all maps or photographs, for instance, would be scanned? Those are major wildcards indeed.
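To see how sensitive the book extrapolation is to its inputs, here is a rough sketch. Only the 32 million figure comes from this post; the pages-per-item and characters-per-page values are made-up guesses for illustration, not Library figures, and the result covers plain text only (no scanned page images).

```python
BOOKS = 32_000_000  # from the post: books and printed items


def plain_text_terabytes(pages_per_item, chars_per_page, bytes_per_char=1):
    """Estimate plain-text size in decimal terabytes for hypothetical averages."""
    total_bytes = BOOKS * pages_per_item * chars_per_page * bytes_per_char
    return total_bytes / 10**12


# Hypothetical low and high guesses; neither is a measured average.
low = plain_text_terabytes(pages_per_item=150, chars_per_page=1_500)
high = plain_text_terabytes(pages_per_item=400, chars_per_page=3_000)
print(f"{low:.1f} TB to {high:.1f} TB")
```

Under these assumed inputs the two guesses differ by more than a factor of five, and that is before accounting for the three-quarters of the collections that are not books at all.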
And then there are our motion pictures, videos and sound recordings alone: around 6 million items stored at our new Packard Campus for Audio-Visual Conservation in Culpeper, Va. What is their digital equivalent? Most people who record television programs onto a computer or DVR know that a hard drive with hundreds of gigabytes or even a terabyte or more can quickly fill up.
One more thing we can quantify or at least estimate: The folks at the Packard Campus say that when their systems are fully online, they expect to be able to digitize between 3 and 5 petabytes of content per year. (That is to say, 3,000 to 5,000 terabytes, for those who are playing at home. Put another way, a single petabyte stored on CD-ROMs would create a stack of discs more than a mile high.) And even at that rate, it would still take decades to digitize the existing content.
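The CD-ROM comparison can be checked with back-of-the-envelope arithmetic. This sketch assumes a 700 MB disc and a nominal 1.2 mm disc thickness with no cases or sleeves; neither value is stated in the post, so treat them as assumptions.

```python
import math

PETABYTE_BYTES = 10**15          # decimal petabyte, i.e. 1,000 terabytes
CD_CAPACITY_BYTES = 700 * 10**6  # assumption: 700 MB per CD-ROM
CD_THICKNESS_MM = 1.2            # assumption: bare disc, no case
MM_PER_MILE = 1_609_344          # 1 mile = 1,609.344 meters


def cd_stack_height_miles(total_bytes):
    """Height in miles of a stack of CD-ROMs holding total_bytes."""
    discs = math.ceil(total_bytes / CD_CAPACITY_BYTES)
    return discs * CD_THICKNESS_MM / MM_PER_MILE


height = cd_stack_height_miles(PETABYTE_BYTES)
print(f"{height:.2f} miles")
```

With those assumptions, one petabyte works out to roughly 1.4 million discs and a stack a little over a mile high, which is consistent with the figure quoted above.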
So at this point, we’re talking about potentially mind-boggling amounts of data; we’re into the territory of angels and heads of pins. And while it is a bit of a tangent, it points to the continued importance of libraries as places, and of librarians, when too many people assume that “everything is online.”
While it is certainly flattering that the Library of Congress is used as a typical benchmark against which others measure their content or data capacity, we would do well to take these claims with at least a shaker of salt. We are far “bigger” than many of them might think.
Fascinating! Thanks for posting this.
This is rather fascinating – it’s daunting to imagine how much human output has been collected and stored by the LoC.
I picture you in one of the dark, labyrinthine Name of the Rose libraries writing this, btw.
I personally think that the number of books does not make any library big or small. The main concern should be whether the content of the library is a good source of knowledge or not. Anyway, thank you very much for the post.
This is amazing. I knew that the Library of Congress had an incredible amount of information, but I never fully realized everything that it holds. Thank you for posting; this gave me a new perspective and appreciation.
Library of Congress
I was thinking about this the other day. In the event that tragedy struck the LOC, is there a separate record of the property so that maybe a new compilation could be assembled, even though it would not be near complete?
Jason, many of our collections are unique in all the world, but collections security is something that we — particularly the Librarian of Congress — take VERY seriously.
In terms of our digital content, there are many analogous security measures and redundancies.
Each month, 500 million people interact on the Yahoo! website, doing searches, emailing, reading news and other content. It adds up to 25 terabytes of data each day. If you digitized all of the books in the Library of Congress, you’d get about 10 terabytes of data. So, on Yahoo! alone, people consume more than two Libraries of Congress a day. HOWEVER, data at the Library of Congress are organized, whereas Internet data are scattered. Search companies like Google are trying to organize all of that information but, until they do, the Library of Congress will be the premier source of information. I suspect the Library will evolve, too, changing its model to fit in seamlessly with the coming Internet age.
Your attempt at comparison is very flawed. You cannot on one hand (Yahoo!) look at the size of the data times the number of people accessing it and on the other (LoC) look at only the data. Well okay, you can. But it’s meaningless. Plus, given that the article emphasizes that the LoC holds more than books and that they are storing about 5PB a year, your post seems trite.
I have wondered about this for quite some time. Thank you for posting this! 74 terabytes is impressive. However, it is strangely comforting to know that not all of the Library of Congress is available online or even digitized yet. It gives me hope that there are yet things to discover in the world, including new books, maps and other similar things, and that a good many of them reside safely in the Library of Congress.
I’m trying to find the length, width and height of the Library of Congress for a school poster but I can’t seem to find it 🙁
It turns out to be not so easy to answer, in part because the “Library” is not a single building. There are three main ones on Capitol Hill (Jefferson, Madison, Adams), plus the Packard Campus in Culpeper, VA, and several off-site storage facilities.
You’re probably thinking about the Jefferson Building, the showcase facility. But even that has a cellar, basement, four floors above grade and then the dome. You’ll find more about it here: //www.loc.gov/loc/lcib/9712/timeline.html
Maps and floorplans of the main buildings here: //www.loc.gov/visit/maps-and-floor-plans/
Short answer: The LOC is the largest library in world history. It currently has more than 170 million items and more than 830 miles of shelving spread across four primary buildings and several smaller facilities.
In case you’re still looking for more precise architectural specs, try the Architect of the Capitol, as they manage the facility: http://www.aoc.gov/library-congress