When I wrote my post on the “Library of Congress” as a unit of measure, I expected to receive some feedback.
And boy, did I.
Sure enough, I received some new examples:
- “In less than two years the app has already hosted more than 500 million images — more than 30 times greater than the entire photo archive of the Library of Congress.” LINK
- “MAST is currently home to an estimated 200 terabytes of data, which… is nearly the same amount of information contained in the U.S. Library of Congress.” LINK
- “This year, CenturyLink projects that 1.8 zettabytes of data will be created. By 2015, the projection is 7.9 zettabytes. That’s the equivalent of 18 million times the digital assets stored by the Library of Congress today.” LINK
- Twitter “needed just 20 terabytes to back up every tweet that’s ever existed… that’s about twice the estimated size of the print collection of the Library of Congress.” LINK
- “A TB, or terabyte, is about 1.05 million MB. All the data in the American Library of Congress amounts to 15 TB.” LINK
- “One petabyte of data is equivalent to 13.3 years of high-definition video, or all of the content in the U.S. Library of Congress — by its own claim the largest library in the world — multiplied by 50…” LINK
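Taken at face value, each of these quotes implies a size for the Library. Backing those sizes out is simple arithmetic, and it shows how far apart they are. The sketch below assumes decimal (SI) units throughout (1 PB = 1,000 TB, 1 ZB = 1,000,000,000 TB). It skips the photo-archive quote because that one counts images rather than storage. The labels are my own shorthand, and the quotes don't measure the same thing (print collection, digital assets, "all the data"), which is part of the problem.

```python
# Sketch: back out the Library of Congress size implied by each quote above.
# Assumption: decimal (SI) units, 1 PB = 1,000 TB and 1 ZB = 1,000,000,000 TB.
TB_PER_PB = 1_000
TB_PER_ZB = 1_000_000_000

# Implied size of "the Library of Congress" in terabytes, per quote.
implied_tb = {
    "MAST (about the same as LoC)": 200,
    "CenturyLink (7.9 ZB = 18 million x LoC)": 7.9 * TB_PER_ZB / 18_000_000,
    "Twitter (20 TB = about 2 x LoC print collection)": 20 / 2,
    "Terabyte explainer (LoC = 15 TB)": 15,
    "Petabyte explainer (1 PB = 50 x LoC)": 1 * TB_PER_PB / 50,
}

# Print the estimates from smallest to largest.
for source, tb in sorted(implied_tb.items(), key=lambda kv: kv[1]):
    print(f"{tb:8.1f} TB  {source}")

# Ratio between the largest and smallest implied size.
low, high = min(implied_tb.values()), max(implied_tb.values())
print(f"Spread: {high / low:.0f}x between smallest and largest estimate")
```

Under these assumptions the implied sizes run from 10 TB to roughly 439 TB, a spread of more than 40x.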
But I also got calls, emails, and tweets asking why I didn’t set the record straight about the size of the Library’s digital collections and share a number. The answer to that question is: it depends.
Do we count… items or files or the amount of storage used? What constitutes an item?
Do we count… master files? Derivative files? Copies on servers? Copies on tape? Second (third, fourth) copies in other distributed preservation locations?
Do we count… files we “own”? Files we have in our physical control? Files that live elsewhere but that we license access to?
And, when we digitize one more item at 5 p.m. that hadn’t existed in our collections at 4:59 p.m., do we update our counts/extents?
So, here’s what I can say: the Library of Congress has more than 3 petabytes of digital collections. What I can also say with certainty is that by the time you read this, all the numbers (counts and amount of storage) will have changed.