In July 2011, Nicholas Taylor posted an entry to this blog about the amount of data transferred to the Library of Congress and the likely sources of some of the public perceptions of the size of the Library’s digital collections. And Matt Raymond of the Library posted an excellent overview of the size of the Library’s collections in February 2009.
Since then, I’ve become somewhat obsessed with references to the size of the collections, and the use of “a Library of Congress” as a unit of measure. Just check Wikipedia under “unusual units of measurement.”
The ur-number seems to come from a 1997 report written by Michael Lesk titled “How Much Information Is There In the World?” In that report he provides a proposed calculation for the “size” of a digitized book, and the guesstimate that the Library had 20 million books. To be fair, this report also makes a guesstimate about the size of collections of photographs, video, and audio, and comes up with the figure of 3 petabytes’ worth of collections. For 1997, this was a very well-informed estimate.
But the numbers that caught the public’s imagination were the ones for books. And that 10 TB figure is everywhere.
So, how many Libraries of Congress does it take to…? Or how many Libraries of Congress can be contained in…?
- “Every Six Hours, the NSA Gathers as Much Data as Is Stored in the Entire Library of Congress.” LINK
- “Facebook’s photo collection has a staggering 140 billion photos, that’s over 10,000 times larger than the Library of Congress.” LINK
- “The [Honeywell India Technology] centre stores some 32 terabytes (32,768 GB) of data. That’s five times more than the world’s largest library – the US Library of Congress.” LINK
- “The fiber optic cable is capable of transmitting data at a maximum of 40 gigabits per second from deep-sea locations where gaps of instrument coverage currently exist. For comparison, the entire print collection of the Library of Congress could be transmitted over the link in just more than 30 minutes.” LINK
- “There are 25 Petabytes (10^15) created every day and thrown into the internet. This is 70 times larger than the Library of Congress.” LINK
- “…it is estimated that the entire collection of the Library of Congress including photos, sound recordings and movies might take 3,000 TB of storage. Assuming $100 each for 2 TB hard drives, the entire book collection of the Library of Congress could be stored on about $1500 worth of hard drives at today’s prices.” LINK
- “The upper end of the reference configurations is 96 blades [servers] with 1,152 cores, 9.2 TB memory and 57.6 TB of disk storage, enough disk space to store the entire Library of Congress six times.” LINK
- “He keeps 500 terabytes of storage near Factual’s headquarters. That’s about twice the amount needed to hold the entire Library of Congress.” LINK
- “The size of Facebook’s data retention database alone would be larger than all of the content that the Library of Congress has put online to date.” LINK
- “… in a world where the entire Library of Congress will soon be accessible on a mobile device with search procedures that are vastly better than any card catalog, factual mastery will become less and less important.” LINK
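Out of curiosity, the quotes above can be turned around to see what size each one implies for a single “Library of Congress.” The following is only a rough sketch in Python, not anyone's official arithmetic. The function names are made up for illustration, decimal units (1 PB = 1,000 TB) are assumed, and “five times more than” is read as a plain factor of five.

```python
# Back-of-the-envelope check of the quoted comparisons.
# All names here are hypothetical helpers; decimal units are an assumption.

# (label, quoted quantity in TB, claimed number of "Libraries of Congress")
QUOTES = [
    ("Honeywell India centre (32 TB, 5x)", 32.0, 5),
    ("Daily internet data (25 PB, 70x)", 25_000.0, 70),
    ("96-blade configuration (57.6 TB, 6x)", 57.6, 6),
    ("Factual storage (500 TB, 2x)", 500.0, 2),
]


def implied_loc_tb(quantity_tb, multiple):
    """Size of one 'Library of Congress' implied by a quote, in TB."""
    return quantity_tb / multiple


def transfer_minutes(size_tb, link_gbps):
    """Minutes needed to send size_tb terabytes over a link_gbps link."""
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9)
    return seconds / 60


def drives_capacity_tb(budget_usd, price_per_drive_usd, drive_tb):
    """Total capacity in TB of the whole drives a budget can buy."""
    return (budget_usd // price_per_drive_usd) * drive_tb


if __name__ == "__main__":
    for label, quantity_tb, multiple in QUOTES:
        print(f"{label}: {implied_loc_tb(quantity_tb, multiple):.1f} TB per LoC")
    # The 10 TB book figure sent over the 40 gigabit-per-second cable
    print(f"10 TB at 40 Gb/s: {transfer_minutes(10, 40):.1f} minutes")
    # $1500 of 2 TB drives at $100 each
    print(f"$1500 of drives: {drives_capacity_tb(1500, 100, 2)} TB")
```

Under those assumptions, the implied size of one Library of Congress ranges from roughly 6 TB to more than 350 TB depending on which quote you pick. Only the fiber optic cable example, at about 33 minutes for 10 TB, lines up neatly with the 10 TB figure.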
I have more of these, but I am always looking to add to my growing collection. Please let me know about more by commenting!