(UPDATE: Here’s a December 2017 status report on our work with the Twitter archives.)
An element of our mission at the Library of Congress is to collect the story of America and to acquire collections that will have research value. So when the Library had the opportunity to acquire an archive from the popular social media service Twitter, we decided this was a collection that should be here.
In April 2010, the Library and Twitter signed an agreement providing the Library the public tweets from the company’s inception through the date of the agreement, an archive of tweets from 2006 through April 2010. Additionally, the Library and Twitter agreed that Twitter would provide all public tweets on an ongoing basis under the same terms.
The Library’s first objectives were to acquire and preserve the 2006-10 archive; to establish a secure, sustainable process for receiving and preserving a daily, ongoing stream of tweets through the present day; and to create a structure for organizing the entire archive by date.
This month, all those objectives will be completed. We now have an archive of approximately 170 billion tweets and growing. The volume of tweets the Library receives each day has grown from 140 million in February 2011 to nearly half a billion as of October 2012.
The Library’s focus now is on addressing the significant technology challenges to making the archive accessible to researchers in a comprehensive, useful way. These efforts are ongoing and a priority for the Library.
Twitter is a new kind of collection for the Library of Congress but an important one to its mission. As society turns to social media as a primary method of communication and creative expression, social media is supplementing, and in some cases supplanting, letters, journals, serial publications and other sources routinely collected by research libraries.
Although the Library has been building and stabilizing the archive and has not yet offered researchers access, we have nevertheless received approximately 400 inquiries from researchers all over the world. Some broad topics of interest expressed by researchers run from patterns in the rise of citizen journalism and elected officials’ communications to tracking vaccination rates and predicting stock market activity.
Attached is a white paper [PDF] that summarizes the Library’s work to date and outlines present-day progress and challenges.
When and how will we be able to access all the tweets?
Thanks Rick. Technology to allow for scholarly access to large data sets is lagging behind technology for creating and distributing such data. The Library is working to develop a basic level of access that can be implemented while archival access technologies catch up. These efforts are ongoing and a priority for the Library, but we cannot provide an estimated timeframe at this point. When technologically feasible, access to the archive would be offered to researchers on site.
Though this is not my insight, is it applicable to this project of archiving the Tweets of Twitter (& in #haiku form)?
& nobody is list’nen
4 not feel’N heard
cc 2012 greg robie
And that should have been 2013, not 2012 . . . which is corrected in the Tweet I tweeted of it. 😉
I’m sorry, but this is ridiculous. Of all the worthless things to archive, we are spending money on this?
Only 13% of Americans have Twitter accounts… compared with Facebook holding 70% of Americans. So not only do you have a tiny minority, but the data itself is of extremely poor quality.
At least with, say, Facebook, people are able to “write sentences,” and people publish articles, poems, etc. What was the last story you read entirely on Twitter??
Hi Gayle and Erin:
I am lead for Google’s Developer Relations efforts around our Data products. I have been following this effort and would like to talk to you about some ideas I have for actually exposing this data to the public. I think we can get the query times down quite a bit further than 24 hours 🙂
– Michael Manoochehri
To preserve and enshrine a Mount Everest of largely mindless babble by Mr. & Ms Everyman strikes me as utterly risible.
To those wondering about why we should keep such “low quality” communications, please consider that the Library of Congress originally rejected an archival copy of Walt Disney’s Bambi because the film was deemed “saccharine and derivative.” In short: archival value and perceived value are not the same thing.
Do you archive tweets from all over the world, without distinction between countries?
Or do you archive only tweets from the United States?
I would not like to have my words and thoughts sold to others for marketing purposes. I tweet about some frivolous things, admittedly, but not often. So I would not be comfortable being used for market research, although it’s inevitable. Or so it seems.
If it’s for research purposes only, such as sociological studies, and maybe for future generations, then fine, be my guest. However, I have three main questions.
1) What rights do I have to the archived tweets if any?
2) What limitations does the Library of Congress have on distributing the information, if any?
3) What are the restrictions on private tweets? Can the Library archive them? If archived, are there limitations as to whom these tweets are available, considering that I intended for them to be private?
Hopefully these questions can be answered. Bear in mind that I’m a young, young adult. If any of the questions I asked seem ignorant, feel free to correct them and offer help. But please refrain from sarcasm. Thanks, and good article.
I live in China studying environmental issues, and it would be nice to have the archive because a) Twitter is blocked in China, and b) having access to archives of air pollution data allows one to know whether things are changing, rather than just letting the data go down the memory hole.
Students at the university here are interested in using the archive for their 4th year thesis. What is the likelihood that access to the collection will be provided by January, 2014?
Wondering if there’s a status update on this? When will it be available?! Thank you…
+1 to Karen’s inquiry
How does this benefit the public?
How can this be used by public?
What is the status of this project?
What is the status of this project? What technology did you use to archive all the tweets? I am interested to know how we can access the data.
Meanwhile, the Library of Congress seems to feel that it not only has a right to publish, but that it is also ethical to publish, in its catalog, private emails that are forwarded to its employees… as it did with a private email that I sent to a private organization, which forwarded the email to the LOC. I asked LOC officials to have the email removed (it should be enough that I don’t want my private emails published in a public forum, but in this case it actually has potential to cause harm). But the Library of Congress refused.
This issue of what is preserved forever by the LOC, a taxpayer-funded institution, over the privacy wishes of the citizens it is supposed to serve, is major.
Since learning this about archiving my tweets, I will now need to put more thought into what I tweet. If I don’t, someday my great-great-great-grandchildren will think I was the all-American idiot.
Now I just have to start thinking about something smart to say. This could be a real stretch for me.
Why no update in so long? Curious as to how things are progressing.
I am also curious as to the lack of updates on this. So I assume there has not been a lot of progress on this initiative in the past two years? It would be great to find out when this archive will be ready. This is going to be a hugely useful tool.
Yet another researcher wondering.. what is the status of this project? Has it been abandoned?
As a student researcher in his first data science course I wish this was available. Perhaps when I finish my doctorate.
So you guys keep all the porn?
What a waste of money. I think the taxpayers ought to have been asked if they wanted their taxes devoted to cataloging tweets. I absolutely do not support this. My God, there are people starving, without a decent place to live or sleep, without proper medications, and we’re spending our time and resources on something like this. It’s called the Library of Congress, not the Library of Meaningless Banter. Are you kidding me!
The Library of Congress are data and word hoarders. Looks like they never found a tweet or word they didn’t need to record, file, copy, and save.
The link to the white paper is broken. Can you update the link or send me a link to its current location?
Wondering what the status is on this feat. I’m a doctoral candidate looking to do twitter archival research so having this available would be great.
What is the status of this? Can a person request a dump of all tweets for a given day? That would be awesome…