How Tweet It Is!: Library Acquires Entire Twitter Archive

(UPDATE: Here’s a January 2013 status report on our work with the Twitter archives.)

(UPDATE: We posted an FAQ on April 28.)

Have you ever sent out a “tweet” on the popular Twitter social media service?  Congratulations: Your 140 characters or less will now be housed in the Library of Congress.

That’s right.  Every public tweet, ever, since Twitter’s inception in March 2006, will be archived digitally at the Library of Congress. That’s a LOT of tweets, by the way: Twitter processes more than 50 million tweets every day, with the total numbering in the billions.

We thought it fitting to give the initial heads-up to the Twitter community itself via our own feed @librarycongress.  (By the way, out of sheer coincidence, the announcement comes on the same day our own number of feed-followers has surpassed 50,000. I love serendipity!)

We will also be putting out a press release later with even more details and quotes.  Expect to see an emphasis on the scholarly and research implications of the acquisition.  I’m no Ph.D., but it boggles my mind to think what we might be able to learn about ourselves and the world around us from this wealth of data.  And I’m certain we’ll learn things that none of us now can even possibly conceive.

Just a few examples of important tweets in the past few years include the first-ever tweet from Twitter co-founder Jack Dorsey (http://twitter.com/jack/status/20), President Obama’s tweet about winning the 2008 election (http://twitter.com/barackobama/status/992176676), and a set of two tweets from a photojournalist who was arrested in Egypt and then freed because of a series of events set into motion by his use of Twitter (http://twitter.com/jamesbuck/status/786571964) and (http://twitter.com/jamesbuck/status/787167620).

Twitter plans to make its own announcement today on its blog from “Chirp,” the Official Twitter Developer Conference, in San Francisco.  (UPDATE: Here’s their post.)

So if you think the Library of Congress is “just books,” think of this: The Library has been collecting materials from the web since it began harvesting congressional and presidential campaign websites in 2000.  Today we hold more than 167 terabytes of web-based information, including legal blogs, websites of candidates for national office, and websites of Members of Congress.

We also operate the National Digital Information Infrastructure and Preservation Program (www.digitalpreservation.gov), which is pursuing a national strategy to collect, preserve and make available significant digital content, especially information that is created in digital form only, for current and future generations.

In other words, if you’re looking for a place where important historical and other information in digital form should be preserved for the long haul, we’re it!

(Thanks to my co-blogger, Jennifer, for the headline.  She always does a much better job of that than I do!)

99 Comments

  1. Keith Privette
    April 14, 2010 at 12:39 pm

    I think this is downright awesome! I am part of a generation leaving a legacy like others before! Thank you Library of Congress for undertaking this huge task of curating digital history for us!

  2. @Dayngr
    April 14, 2010 at 1:50 pm

    The question now is, who is the copyright holder? Twitter or the person who tweeted the tweet?

  3. Borges Would Be Proud/Astounded
    April 14, 2010 at 1:54 pm

    Archiving the ephemeral, the meaningless and the lulzy. Strange project – doesn’t this seem to be an overly commercial endeavor? If tweets ™ are in, how about craigslist.org postings? Spam bot postings? Are you all keeping up with the google buzz ™ feeds or my facebook updates? (You see my point, I hope.) Because all of that information is just as culturally vacant to be archiving in the LOC.

    This is making a meaningless library indexing ephemeral nothings. Thanks for contributing to our literary and cultural heritage with this.

  4. Duzell
    April 14, 2010 at 1:58 pm

    This is bollocks. What right does the government have to my tweets or other people’s PRIVATE tweets? They made them private for a reason. Aren’t there privacy laws and privacy policies on Twitter? And this isn’t just American citizens and the like, everyone uses Twitter. It may seem odd to say this since you’re the government, but who the hell do you think you are?

  5. Randy
    April 14, 2010 at 2:06 pm

    You have 167 terabytes of web info! Do you have a backup of that?

  6. calvin
    April 14, 2010 at 2:09 pm

    My tax dollars are paying for this? You can’t be serious.

  7. Charlie Slavin
    April 14, 2010 at 2:37 pm

    Speaking of serendipity — the announcement of this came literally minutes before I am off to introduce a Pulitzer Prize-winning journalist. The title of her talk is “Please don’t tweet me: journalism today”!

  8. Jeff Corbett
    April 14, 2010 at 2:40 pm

    A huge waste of time and money. How much will that cost tax payers? The time and money should be spent in much more worthwhile endeavours. Why do we need this?

  9. Joe Citizen
    April 14, 2010 at 2:45 pm

    I can see a lot of political aspirations dashed by people pulling out old Tweets. I’ve always thought of the service as quite banal and narcissistic, but I’ve had a Twitter account to provide feedback to a college and a couple of vendors. I think I’ll close my account now. I don’t need to risk Tweeting something hurtful or stupid that will be around for all recorded time.

  10. Nerfherder
    April 14, 2010 at 3:12 pm

    Just goes to show that dinosaurs “run” the country…

  11. Tim Baran
    April 14, 2010 at 3:16 pm

    Very cool! Will it be searchable and accessible to the general public?

    Leery of private orgs having increasing control over our public (and private) discourse so it’s refreshing to see LOC taking the lead in archiving this info!

  12. Ruth
    April 14, 2010 at 3:23 pm

    So, if I graduate from library school in two years, think I can get a job with you to continue sorting this out?

  13. Elliot V
    April 14, 2010 at 3:24 pm

    Thanks for this information all! It’s pretty amazing that the entire archive will be in the LOC. This will do wonders for social networking and research.

  14. Shaun
    April 14, 2010 at 3:47 pm

    So with no warning, every public tweet we’ve ever published is saved for all time? What the hell. That’s awful.

    We should have been warned about this. Now people will be able to look up our tweets for the rest of our lives, and there’s no way we can have them removed.

    Even if our tweets aren’t bad or anything, this is hugely inappropriate.

  15. Jeffrey Greenberg
    April 14, 2010 at 4:47 pm

    How can the archive be accessed? What are the access policies? What is the API for access?

  16. Bryan C.
    April 14, 2010 at 5:53 pm

    What we really need is a snapshot of the web from 1994-1998. The Internet Archive only has partial data starting in mid-1996 which means all of the first wave of web-sites and early on-line businesses that didn’t survive the 1996 collapse are now gone along with all of the reviews, articles, and perspectives from a formative moment in Internet history.

    Case in point. I published the first video game magazine on the web, Game Zero magazine. We ceased primary publication at the end of 1996. In 1995 we were ranked one of the top 10 resource sites for video game information on the Internet (ranked higher than Nintendo) and at our peak we were handling over a million page views a month.

    Game Zero even pioneered many things people now take for granted from the gaming press (first to publish videos of game play, same-day trade show coverage, etc.). Unfortunately, all of the online resources (short of a broken link in the Wired archives) that referenced the magazine and its firsts are gone from the Internet, leaving no citable sources about the magazine for reference. If I didn’t have such a strong opinion about maintaining my published works, even the Game Zero site would probably have been long gone (goodness knows many of the editors begged me for years to pull it).

    Situations like this have created a dilemma for sites like Wikipedia, which by its own rules says that only citable resources can be indexed, as that is its primary way to establish notability for an entry.

    All in all, there probably isn’t anything to be done about it at this point short of inventing a time machine, but it’s always good to be aware of this crisis so we can work to prevent it from happening again.

  17. Joe McCarthy
    April 15, 2010 at 5:42 pm

    This sounds like an incredibly valuable resource. There was an entire workshop on Microblogging at the Human Factors in Computing Systems conference (CHI 2010) this past week: http://www.cs.unc.edu/~julia/chi2010.html

    Some large industry research labs have purchased special access to tweets. Making all tweets available to all researchers will enable more researchers to study the evolution of the use of this increasingly popular social media tool.

    Any estimate on when the database will be made available, how often it will be updated, and where and how it will be accessible?

    Thanks!

  18. big deal
    April 15, 2010 at 7:27 pm

    Don’t tweet private info on the web and you have nothing to worry about. Handle your reputation online as you would normally. If you’re an idiot in real life you’ll be an idiot on twitter, and everyone will know it regardless. That being said, it’s not the most pressing thing facing our country at the moment, and I’m sure the money could be well spent elsewhere.

  19. Anon
    April 15, 2010 at 9:47 pm

    What a COLOSSAL waste of time/money/effort!

  20. *m
    April 15, 2010 at 10:15 pm

    How will tweets be treated if they were initially public, but then the account went private? Are they also accessible? If yes, boo! If no, yay! I say those who want to publicly tweet should be accessible, while those who only want their friends to see their words, should be allowed their privacy.

  21. yeah sure
    April 16, 2010 at 1:05 am

    because the government is limited by law regarding data mining and databases, so they need to pull stunts like this

  22. Joshua Rogers
    April 16, 2010 at 1:05 am

    Most of the negative comments on here seem to be quite odd. It’s as though most people believe that removing a tweet actually removes all record of its existence. Google archives twitter, different websites aggregate tweets, tweets get retweeted or sent to cell phones. Once it’s out, it’s out.

    There is no putting the genie back in the bottle. Showing anger at the Library of Congress for archiving tweets also shows a lack of understanding about how the internet works. Anyone requiring a warning that words can’t be unsaid should probably not be using such a service (or the internet as a whole.) When you distribute information (such as a tweet) you lose the ability to control it any further.

    Additionally, it might be worth mentioning that people probably shouldn’t tweet things that they don’t want others to see. I’d hoped that was obvious though.

  23. George Larry
    April 16, 2010 at 2:06 am

    Now, everyone can view what you posted on Twitter! Isn’t that amazing? Your Twitter history is for ALL to see! Congratulations!

  24. Bill in San Diego
    April 16, 2010 at 2:55 am

    THIS IS THE STUPIDEST MOST WASTEFUL IDEA I’VE HEARD OF IN A LONG TIME. SHAME ON OUR SELF-SERVING WASTEFUL GOVERNMENT !!

  25. Se Jung Park
    April 16, 2010 at 4:30 am

    Where can I use Twitter Archive? I can’t find the function on this web. Please let me know.

  26. Lean IT manager
    April 16, 2010 at 7:20 am

    For ages Twitter was the poster child of a cool app looking for a business model to make money. Looks like they found it.

    Selling tweets to the government, who would have figured. Companies have left countries for less.

    Question that remains unanswered is who owns the non US originated tweets. Questions that we will be able to answer (for years to come) are: what did we have for lunch on Monday April 12 2010. A small snack for man but a giant leap for ….?

  27. Korodzik
    April 16, 2010 at 10:38 am

    “Waste of time/taxpayer money?” Don’t judge what is worthless and what is not, for you never know what use might the future have for them. Would you archive, say, a generic announcement about taxes? No? Ahem: the Rosetta Stone was just that.

    What worries me is that many, many tweets (the majority, perhaps) contain not an ounce of original content (like a diary entry), but rather just a copy of whatever news headline is cool at the moment, and a link to it.

  28. Stephanie
    April 16, 2010 at 11:25 am

    As a researcher, librarian, and lifelong learner, I can tell you that though it seems inconsequential now, these archives will be hugely useful in the future for research and historic preservation. You would not believe how many archives I have found so useful and informative that seem to be of little value to the general population. Think about using the tweets of Obama or other historic figures to gain insights on their decisions? What if we had had those for Lincoln or Roosevelt? Sounds crazy, I know, but consider it! Also, all I hear in a small town and as a librarian is how angry people are that our historic archives are not kept up as they should be and how records of historic significance are thrown away or desecrated when we have the chance to save them. Thank you to LOC for having the foresight and intuitive thinking to see the potential.

  29. Joshua
    April 16, 2010 at 11:26 am

    How would someone access this information? What is the process?

  30. Chelsey
    April 16, 2010 at 12:15 pm

    I honestly don’t know why people are making a big deal about the government looking at your tweets. Don’t tweet if you don’t want them to have them. But I do agree with some people that it is a waste of time and effort. And why are they waiting until now to want to look at them? Anyways, if you don’t like it, don’t tweet. But I know I won’t stop!!!! TWITTER FOR LIFE BABE lol

  31. Andrew
    April 16, 2010 at 12:23 pm

    Joshua, I can’t speak for everybody else, but I know that I’m not so much upset about tweets being preserved for all time as I am that taxpayer money is going towards a project so completely unnecessary, mundane and moronic.

    Has anybody at the LoC actually seen the things people tweet? And they still think that this is something worth documenting? Honestly, I’m hoping that this is some kind of horribly late April Fools joke.

  32. PeteJ
    April 16, 2010 at 1:19 pm

    Does “Every public tweet, ever, since Twitter’s inception in March 2006” include posts which a user subsequently deleted? Posts from accounts that have been deleted? I’m afraid it does rather sound like that, but I’m hoping that may be just a spot of heat-of-the-moment hyperbole. :-)

    I do take the point made by Joshua above that, technically, once something is out there, it’s out there, crawled, copied, cached. So, yes, it is the responsibility of the poster to understand that and beware.

    However, I also think there’s a difference, at least in spirit, between a third party having cached a copy of a post which I subsequently deleted, and Twitter itself having kept a copy in its back-end archive all along and then made it available again – but I hope the latter turns out not to be the case.

    Some clarification on these issues would be welcome.

  33. Jim
    April 16, 2010 at 1:30 pm

    Divorce lawyers, politicians and private investigators will have a heyday with this information.

  34. Bryan
    April 16, 2010 at 1:36 pm

    Anything posted on Twitter is the property of Twitter. There is no “privacy” or “ownership”. You signed that away for your “tweets” the instant you signed up to Twitter.

    Too bad some people are too stupid to read the full text of the member agreements before agreeing to them. Stupidity has its own reward.

  35. Concerned – but not too concerned – citizen
    April 16, 2010 at 2:10 pm

    Ok, so if my account is private, does it stay private or are my tweets archived for public use too?
    I don’t post any super private information and I try to be careful of the sorts of things I post. I realize that anything posted – even if it’s private or eventually deleted – is out there in cyberspace somewhere or on someone’s servers forever.
    The problem is that it’s not likely that those tweets are going to be used for “research” or “study.” Under these circumstances the government and twitter are encouraging the study of people’s tweets.
    Now, if your account is public, I have much less of a problem with this. You’re agreeing to allow anyone to see your information.
    If, however, your account is private I would hope that it would remain so. At the very least, I would hope that those holding private accounts would be given the option to opt out and close your account without any of your tweets being archived.
    There are a whole host of Constitutional questions that this acquisition brings up. Don’t get me wrong… I see the proposed value in such a project. I’m just not sold on the idea yet.

  36. jt
    April 16, 2010 at 5:34 pm

    “Once it’s out, it’s out.”

    It’s simplistic to say that all forms of “out” are the same. Is a comment in isolation in a blog that someone needs to look at the same as the comment in a large, easily searchable collection?

    Librarians make things more accessible, so information that’s processed and organized by LOC is not the same as those pieces of information in isolation. Wholesale collection is different than small pieces being public separately. That can be good or bad depending on your perspective – either way it’s different.

  37. Michael Hartzell
    April 16, 2010 at 6:25 pm

    It reminds me a little of Miracle on 42nd Street. Until the official government stepped in, there was a debate.

    This is a milestone to say the least. It proves that business owners are publishers and what Dad said: Everything you say counts so think before you speak.

    I am so excited I have blogged and tweeted it which of course also goes down in the Library of Congress? Have to wonder how many tweets about the LOC will be in the LOC in reference to saving tweets in the LOC.

    Now that makes me SMink. SMile and Think at the same time. :)

  38. Sean
    April 19, 2010 at 12:29 am

    Twitter is certainly crap, but would the author kindly research and address the actual cost of this project?

    It seems to me the scale of this project is easily overestimated. The data can fit on a consumer hard drive, and the operation (ETL, storage, backup, etc.) seems like it should be trivially folded into what must be an already reasonably robust IT infrastructure to support the LoC’s research mission.

    Did this project cost the LoC anything other than slivers of existing staff and systems? Or was there new staff, new infrastructure, or some other capital or new non-trivial operational expense?

    The internet is a hotbed for rumor and wild speculation. Please get the numbers.

    Thanks,
    Sean

  39. anon_librarian
    April 19, 2010 at 12:59 am

    Oh great…now the government can track even our most mundane commentary and hold us accountable. So much for privacy and intellectual freedom…

  40. Dale McNamee
    April 19, 2010 at 1:39 am

    What a waste of space that could be better used for really historic books and data. Also, what a waste of scarce tax dollars !

    I’m not happy at all in knowing that as my taxes are going up (and they are guaranteed to with all of the spending), some of it will be used to archive the most mind-numbingly boring messages of the self-absorbed.

    As for the grandkids, they will have well written observations and well thought out ideas that will be more informative than 140 character “tweets” !

  41. Dale McNamee
    April 19, 2010 at 1:58 am

    Why not archive IM’s and SMS messages as well ?

    Here are a few of mine :

    “I’m at the Walmart…”
    “I’ll be home at 5PM…”
    “Traffic is backed up on 695…”

    Some really historically significant stuff here, worthy of archiving…

  42. You already agreed to this
    April 19, 2010 at 2:46 am

    Re-read the terms of service. You know, that bunch of legalese we all agreed to in order to activate our Twitter accounts. The short, short version: You wrote it, YOU own it. BUT – by posting it, you also gave Twitter the right to do whatever they like with it, for free. (It’s in the first paragraph of the Your Rights section in the ToS. Check it.)

  43. Jim
    April 19, 2010 at 4:53 am

    I shouldn’t imagine that it would cost a huge amount of money to do this. Storage is cheap and you just need a code-chimp to whack out an API to capture the public timeline then ask Twitter to give you a good API Limit because you’re “special”.

    1TB of storage to a MotP is around $45. The racks and servers are what costs and the infrastructure would already be there. Storage is a matter of plugging it in.

    Bryan C is spot on. I had various sites from 1995 onwards. Would love to see them again. Yep, I did backup. Onto (now dead) 5¼” floppies. Remember them?

  44. c.v.antony
    April 19, 2010 at 7:14 am

    Compared with the infinite quantum of resources involved, what is the gain?

  45. Gary Green
    April 19, 2010 at 7:30 am

    I think those people who are concerned that their Tweets will be available for the world to see and that Library of Congress are now archiving these tweets should have read the Terms & conditions on Twitter – it seems it’s all included there.

    You should also be aware that there are also other Tweet archiving services already available that may be storing your tweets.

    I know direct messages sent on Twitter won’t be stored. I assume that private account data won’t be stored either? Can the Library of Congress confirm this?

    I don’t agree with the attitude that this project/plan is a waste of time. I agree that there is rubbish spouted on Twitter (I can take some credit for that ;-) ), but at the same time when you look at the discussions happening between people who use it for professional purposes (eg at academic conferences), it does provide a useful means of research, inspiration and information. Even as a social history resource it will be of value in the future.

    Will Library of Congress be classifying/cataloguing the tweets? I know it wouldn’t be possible on a tweet by tweet basis, but how about on a hashtag basis?

  46. Oxa
    April 19, 2010 at 11:57 am

    Processing this is a major waste of taxpayer dollars. Nothing more needs to be said.

  47. Dana
    April 19, 2010 at 12:47 pm

    (following @Bryan C.) A great proportion of Twitter posts contain shortened URLs, which without their host service provide absolutely zero context. So unlike the Wayback Machine, which at least contains a semi-descriptive URL, this archive will contain millions of links to a small number of URL shortener services that no longer exist. Is there any plan to address this?

  48. djuyadi
    April 19, 2010 at 1:01 pm

    I can’t really get the fundamental idea behind this project. This is what I want to say: quote from the post, “I’m no Ph.D., but it boggles my mind to think what we might be able to learn about ourselves and the world around us from this wealth of data.”

    But since the library is the biggest in the world and has huge amount of collections, no wonder it runs this kind of archiving public streams.

    I think this project really has to be appreciated. Just wait and see what it’s worth.

  49. @edbice
    April 19, 2010 at 5:36 pm

    my two cents: good move. libraries are supposed to gather published information. twitter is the birth of informal, digital publishing. it does not seem appropriate to let this historical record be left to other services to preserve- the LOC is a perfect location for this data.

    the announcement should, however, have included lots more details on the terms of access for this data? APIs or only viewable on microfiche with a security clearance?

    there are bound to be many awkward moments when people realize they are ‘publishing’ on twitter– natural selection, etc.–but the LOC should not be criticized for preserving what many would rather was lost to digital decay.

  50. Ann
    April 19, 2010 at 10:44 pm

    Hmm, I’m a bit ambivalent about it. On one hand, it’s a great advancement in future technology and data collecting for statistical references. For the use of good that is, and by no means can the government interfere or use it for evil. I already explained the other hand.
    I can foresee an upcoming thriller/action movie on the controversial nature of collecting ‘tweets’.

    I have to ask though, who or what is paying for all of that storage memory?

  51. thanks
    April 20, 2010 at 8:59 am

    Listen to Jim.

    What’s continually interesting… since the early 90’s… is the “general” opinion toward crawling, if there is such a thing. I guess most people cannot manage a little automation.

    These days, it’s “OK” for Google and some other big players to crawl any site. We even make it easier for them to do so. It wasn’t always like that.

    Did sites even realise Google was doing widescale crawling back when they were at Stanford? Did anyone care (besides Stanford)?

    It’s “OK” for the Internet Archive to crawl any site, as long as their crawler respects robots.txt.
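A minimal sketch of what "respects robots.txt" can look like in practice, using Python's standard `urllib.robotparser`. The robots.txt content, the `example.com` URLs, the `example-archiver` agent name, and the `may_crawl` helper are all invented for illustration; this is not the Internet Archive's or the Library's actual crawler logic.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Parse the rules from memory instead of fetching them over the network.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(url, agent="example-archiver"):
    """Return True if the parsed rules allow `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


print(may_crawl("http://example.com/public/page.html"))
print(may_crawl("http://example.com/private/page.html"))
```

A polite crawler would run a check like this before every request and skip any URL the site has disallowed.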

    And now it’s OK for LOC to crawl Twitter. Do they need permission? From Twitter? From subscribers? From taxpayers?

    As Jim says, any chimp at a keyboard can do some crawling, generate some large chunks of data and store it. It takes little time and money.

    And it’s naive to think that there is not massive crawling going on year after year by parties who do not announce what they’re doing, as LOC does here. The parties could be chimps at universities, companies, governments, other organisations, or even at home. 5.25in floppies, stacked CD’s, terabyte towers, server rooms, data centers, whatever.

    At one time, years ago, Google was such a party. They crawled, and crawled, using Stanford’s computers.

    They didn’t ask for permission of the sites they crawled. They just did it.

    Be thankful they did. They do a tremendous job.

    Just don’t try to crawl Google, unless you have permission, or you’re Microsoft. lol.

  52. Brian T
    April 20, 2010 at 11:49 am

    There are still a couple of very important questions not answered, not the least of which is how one goes about data-mining the information?

    More importantly, however, even if the original Tweeter still owns the copyright, doesn’t publication to the LoC by Twitter (as making it available in a searchable electronic archive online and free to the public ultimately does) without express written permission of the original author undermine that author’s ability to profit from their own work by, say, collecting their tweets into a volume for sale? One wouldn’t prevent a poet or a short-story writer from collecting their works into a single volume, but if the aggregation has already occurred on a grand scale, couldn’t Twitter claim a copyright on the collection, and therefore any subsidiary collections be considered a derivative work, despite what their privacy policy says?

    We’ve already seen one similar legal loophole in the ongoing Google Books case where authors are having their works made available de-facto without obtaining express written permission. Is this opening a can of worms? Twitter probably needs to proceed very carefully.

  53. Gail King
    April 20, 2010 at 8:44 pm

    Twitter gave the archive to the Library of Congress, so no cost there.

    Disk space is cheap now.

    Libraries are about more than books: they are about information in whatever form.

    Anything posted on the Web, including unlocked tweets, is published publicly. Anyone could save it, so why not the LC?

    Google is going to have it first anyway. And it’s going to be completely searchable.

    Twitter will probably sell the archive to anyone else who asks. They will mine it and count it and anyone who ever said a product name will probably get marketed at.

    It’s nice to know that someone other than for-profit companies has that data.

  54. Michael
    April 20, 2010 at 10:37 pm

    What a lot of comments for and against! Firstly, words, tweets and anything you post on the web is implicitly provided for the public (globally) to see. If you value your privacy and think your rights will be violated, don’t do anything online. When arrows are shot from the bow, they’re gone. You don’t have control anymore. Fullstop. Secondly, read the ToS before you use ANY tool or app from the Internet. You agreed to the ToS before you use and post anything online. Your own fault, so take some responsibility for that.

    At some point in the (far) future, people may be glad that archiving of tweets and other online data was done. It documents humanity and what we are at this point in time. It would be like looking backwards in time – regardless if your tweets are menial or earth-shatteringly useful. Future generations would be able to see what ails us, what inspires us and more by looking at snippets of archived tweets. It is immensely useful for humanity’s history. Don’t just think, ‘Me, Me, Me’. Whatever we do now online is a legacy to humanity’s future and captures the essence of who we are.

  55. jsbrodhead
    April 21, 2010 at 5:02 am

    There is no point to this. Seems only like government collecting information for government.
    Do not forget the Federal Statutes enacted after the Nixon Admin gathered info on private citizens… and please remind the current administration of those statutes.
    Regards,
    jsbrodhead

  56. Korodzik
    April 22, 2010 at 5:22 pm

    “Processing this is a major waste of taxpayer dollars. Nothing more needs to be said.”
    Comments like that amuse me. No substance, no arguments, just ‘I’m right and don’t need to prove it.’

    What also amuses me are all these panicked comments: “Oh noes! Now the whole world/the evil government will be able to read my tweets… the tweets which I, personally, have published on the Internet for all to see, but somehow was never concerned with anyone accessing them, until now!”

  57. Korodzik
    April 22, 2010 at 5:30 pm

    Hey, wanna know what all these tweets can be useful as?

    A language corpus.

    Basically, tweets are words. Lots and lots of words. The lexicographers of the future will surely appreciate such a big chunk o’ language.

    You can build statistics from it that show popularity of various words over time. You can see a tweet which coined a particular word. You can see which words are generally popular and which are not, and thus build a general psychological image of twitterers.
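A minimal sketch of the word-popularity and first-use statistics described above, assuming tweets arrive as (year, text) pairs. The sample tweets, the tokenizing pattern, and the function names are invented for illustration; a real corpus would need proper timestamps and far more careful tokenization.

```python
from collections import Counter, defaultdict
import re

# Hypothetical sample data: (year, tweet text) pairs invented for illustration.
SAMPLE_TWEETS = [
    (2008, "just setting up my account"),
    (2009, "lol this is great"),
    (2009, "reading the news lol"),
    (2010, "lol lol the library is archiving tweets"),
]

# Very rough tokenizer: runs of lowercase letters and apostrophes.
WORD_RE = re.compile(r"[a-z']+")


def word_counts_by_year(tweets):
    """Return {year: Counter of lowercase words} for (year, text) pairs."""
    counts = defaultdict(Counter)
    for year, text in tweets:
        counts[year].update(WORD_RE.findall(text.lower()))
    return counts


def first_use(tweets, word):
    """Return the earliest year in which `word` appears, or None."""
    years = [y for y, text in tweets if word in WORD_RE.findall(text.lower())]
    return min(years) if years else None


counts = word_counts_by_year(SAMPLE_TWEETS)
for year in sorted(counts):
    # A Counter returns 0 for words it has never seen.
    print(year, counts[year]["lol"])
print("first use of 'lol':", first_use(SAMPLE_TWEETS, "lol"))
```

Grouping counts by year keeps the sketch simple; the same idea works with finer time buckets if the data carries full timestamps.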

  58. EM
    April 23, 2010 at 11:25 pm

    This is despicable! How dare the government compile such a database! I guarantee you that a large percentage of those tweets would never have been sent, or would have been seriously altered, if the senders thought that the government would be cataloging them for eternity. Anyone that is ok with this is just simply giving our heritage of freedom away to a government that may not be so benign 5, 10 or 20 years from now. You may love and trust Barack Obama and Nancy Pelosi but what if someone as “dangerous” as Dick Cheney or someone really dangerous like David Duke were to be elected president and have all your tweets, would you be so happy? These are dangerous times, don’t be so quick to give away the sacred freedoms that your forefathers left to you under the guise of “being part of history” or for ANY other reason. If we lose our rights to privacy and free speech the government owns us. For example, what if a protest group wants to organize a rally at a specific location via Twitter, and Big Brother has access and decides to head them off by shutting down the park they are meeting in? It allows them to control and squelch free speech. What happens when they decide it would be great for history to add all of Google’s databases to the Library of Congress?

  59. OliverK
    April 25, 2010 at 3:42 am

    stay out of my tweets. that’s why I set them private.

  60. Ricker
    April 25, 2010 at 8:53 pm

    So, where are all the folks who got up in arms when the Bush Administration demanded to be able – through the Patriot Act – to wiretap known international terrorists? Their expectation of privacy is to be protected by the 4th Amendment, but mine has no value? Talk about selective indignation and offense! I’ll never Tweet again, now that every character I post becomes the public record with no probable cause for suspicion, no warrant to search and no consent from me! Nobody ever told me that could happen! Is this really America?

  61. AKlarmann
    April 26, 2010 at 6:37 am

    Nice news and a big corpus to extract some insightful information. The following problems remain:
    1. how to come by the evolving “short message syntax” and what does it really mean concerning our language and knowledge
    2. how to teach the people active forgetting, an obviously important function which our brain does along the way, when all information is kept forever
    3. how could we access the dataset systematically … is there an API?

    Good work!

  62. Korodzik
    April 26, 2010 at 1:52 pm

    “For example, what if a protest group wants to organize a rally at a specific location via Twitter, Big Brother has access and decides to head them off by shutting down the park they are meeting in.”

    Good god! The government can access PUBLICLY AVAILABLE INFORMATION ON THE INTERNET now?! There’s no hiding from these monsters!

    “I’ll never Tweet again, now that every character I post becomes the public record”

    Fun fact: they already are public record. Because, y’know, YOU published them.

    Seriously, why are some people so dense?! Sorry, but I just can’t grasp WHY would anyone be angry about having his tweets be public knowledge AFTER they published them to the entire Internet out of their own will!! I mean, WHAT?!

  63. Brooke
    May 2, 2010 at 8:35 pm

    Clearly some of you haven’t been paying attention. Do you really think you own any rights to information you post on the web? Privacy doesn’t exist when you willingly post something to a public site.

    Twitter is an excellent example of the web 2.0 phenomenon and how we are creating and spreading information. Who are any of you to decide what the future information users will find important and useful? It’s thought that 80% of silent films are gone forever because no one took the time to preserve them. I’m sure there are plenty of scholars who would love to have studied these films.

    What would you like the Library of Congress to spend money on? This is what they do. Your precious tax dollars are going to much less important pursuits than the preservation of our cultural heritage.

  64. www.tweetlaterpro.com
    May 24, 2010 at 10:08 am

    Everyone should know by now that the internet is forever. No matter what, anything you put out on the internet will remain forever.

  65. M
    May 26, 2010 at 8:46 pm

    So, are they saving, cataloging, shelving, searching, “researching” our cell conversations too? How about our tv surfing? Oh, and how about those records that are kept on what we buy at the grocery store? Gee, 1000 years from now someone will be amazed at the amount of chewing gum I bought in 2010! Of course, they won’t know what it is. Well, it doesn’t matter because storage of digital stuff is useless and silly because the future generations will have way different kinds of storage and won’t be able to use those terabytes…witness the VCR/CD/DVD/Blu-ray evolution.

  66. Korodzik
    June 2, 2010 at 3:06 am

    @M: I can’t help but notice that we are talking about Twitter and not grocery store records, so your wisecrack about chewing gum is baseless. Also, do you really think that the LOC is unaware of the impending obsolescence of data storage and is unprepared?

  67. patience
    June 6, 2010 at 1:50 pm

    from a historian perspective that relishes in old diary entries of long forgotten john doe… there is a lot of wealth in the tiny scraps of communicative trash people leave behind!

  68. bbvesdfgsdf
    June 22, 2010 at 5:57 am

    i cant even see what people said after 2 days because people tweet so much and they are saving every single one?

  69. Brent Arnesen
    July 19, 2010 at 11:41 am

    Historians in the future, when researching the downfall of America, will now know exactly what the American people were thinking when it happened:

    “Eating yummy flambe’ from Nero’s on 14th, and playing Guitar Hero all day…”

  70. Brandon
    July 29, 2010 at 9:12 am

    This is absolute lunacy. Another waste of taxpayers’ money.

  71. Luke Peters
    July 29, 2010 at 9:32 am

    Judging from these comments there are a considerable number of folks that have yet to read the terms of service – I quote

    “By submitting, posting or displaying Content on or through the Services, you grant us a worldwide, non-exclusive, royalty-free license (with the right to sublicense) to use, copy, reproduce, process, adapt, modify, publish, transmit, display and distribute such Content in any and all media or distribution methods (now known or later developed).”

    If you are upset by them doing this perhaps you should have read the agreement first!

  72. Tara Drexler
    July 29, 2010 at 7:31 pm

    I agree with patience. This can be a very valuable resource to future historians and current sociologists. The thoughts of every-day people have been precious and hard to find. Up until now, the main source of information on the thoughts of average people has been inferred from the thoughts of famed and outspoken individuals throughout history. Now, researchers will know what people really think, right down to decisions about where to go for dinner. :)

  73. brian
    August 5, 2010 at 3:46 pm

    How do I opt out of this?

  74. Mike Brown
    September 3, 2010 at 12:42 am

    In comment #47, Dana asked the same question I’m wondering about: so many tweets contain links to a URL shortening & redirection service. How is this being addressed in the archive?

  75. Boyan Yurukov
    September 27, 2010 at 4:56 am

    Technically no one has allowed their tweets to be archived like that. Currently the agreement is that tweets have an expiration date (usually a few weeks). There is no mention in the privacy policy that you have the right to record and keep track of that data. Not that such things have ever stopped you before.

  76. drbexl
    December 3, 2010 at 5:29 am

    “Currently the agreement is that tweets have an expiration date (usually a few weeks).”

    Tweets have no expiration date – they are still available via Google, etc. and always have been!

    This could make for very interesting data for historians… about our “real” lives, not just about those in power. Interesting

  77. berikson
    January 5, 2011 at 8:57 am

    I can understand being upset about “private” information, however…as a historian/genealogist – this (albeit mundane) information could be extremely valuable to a researcher. I would absolutely love to be able to “hear” my ancestors “talk”. It would be amazing to see how they think, what concerns them, even down to the cadence of their “speech” (we tend to type how we talk). You could even gain an idea of reactions to historical events – imagine if your 6th great grandchildren were to find out where you were – and what you thought about something like 9-11? (just an example…) I, personally, would find it fascinating to find out what my 3rd great grandfather thought about his oldest 2 sons going to the Civil War. Our family has a few letters from that time – just scraps – but to have a nearly daily account – that would be research GOLD.
    Just my humble opinion.

  78. Jason
    January 7, 2011 at 7:21 am

    I can’t believe that all the tweets since 2006 are going to be stored. That’s a lot of tweets!

  79. eb
    January 7, 2011 at 8:15 am

    you guys are being monitored and your just like “thats cool” , u see these are all tests, and when no one stands up for justice, the chains wrap tight,!!! man, o , man are americans dumB!!!!

  80. Mrjonesygirl5
    April 4, 2011 at 3:50 pm

    can we access these?

  81. Suzy Q
    June 19, 2011 at 5:00 pm

    About a year ago, the FBI started following social sites like this as a form of intelligence gathering. This sounds very time consuming and cumbersome. Now that they are being handed over the contents of a database it will be so much easier for them to catalog dissenters as though they are criminals and terrorists.

    I understand @Berikson’s point about history and genealogy but the letters you have were passed down through the family. They weren’t confiscated by the govt under the guise of anti-terrorism.

    The FBI announced that they were going to follow information posted to social networking sites which I take proved cumbersome. How convenient that twitter has seen it fit to betray its loyal users who have made it so successful by handing over the database of tweets.

  82. Zil Maddet
    June 20, 2011 at 1:16 pm

    This is terrific: a pointless, unnecessary expense by the government to swallow an ocean of data – mostly useless, senseless data with minimal value. Now future generations can bear witness to how utterly stupid and vain we were – 1. for creating this steaming mountain of pointless gibberings, and 2. for preserving it for posterity.

    LOC, you nimrods.

  83. Willie Bigs
    July 5, 2011 at 8:22 am

    Cuts in arts and education, but there is financing for this? *reaches for phone. calls a Republican member of Congress*

  84. @iC
    July 13, 2011 at 7:38 pm

    Any chance of providing an API perhaps w/o user ID (largely removing the privacy issue) Just the 140 message?

    THX

  85. A Question
    August 5, 2011 at 5:55 am

    GREAT! So, how do WE search this archive? Like, my own tweets are not retrievable via twitter but if the LOC has them and THEY ARE MY TWEETS (and I don’t remember giving them away) then I should be able to track them down again, surely?

  86. Noah David Simon (@CriticalAnalyst)
    September 21, 2011 at 1:55 am

    where are they going to house all the twitter accounts that were suspended. that is the question!

  87. sqlservermanagementstudio.net
    October 21, 2011 at 12:41 pm

    Technically there is no way someone can store all tweets since 2000 or any year, because you cannot keep your database increasing all the time if you are keeping them live, so tweets should be stored for a few years or some time period and then deleted or archived in flat stream.
    Rebecca
    Thanks

  88. Supuhstar
    November 17, 2011 at 11:02 am

    so… where can I browse said tweets

  89. JH
    December 7, 2011 at 4:03 pm

    Are you people serious?

    I realize that data storage is getting cheap, but we, the citizens, do not have money to spare to pay for frivolous nonsense like this.

    You, down in DC, need to start to get a clue.

  90. Michelle
    December 9, 2011 at 3:04 am

    Ok so let me get this straight. Someone saying I had carrots for dinner is in need of research? I am just pleading the case not every tweet is a winner! I really don’t see the expensive project will help us understand the generation. I think it may be of use when recording things of historian status. They can take that information and log it, but what average Joe had for dinner or tweeting I am bored Holla at me… Is this of value to studying the generation? I am just saying maybe selective tweets would be more beneficial to history. I am not sure I see the value. Lots of useless comments on tweeter. I know any tweet I have put out is of no historical value. Well now all that is left to say is please fellow tweeters think before you tweet. We will look like idiots in the future. LOL

  91. Grace Park
    December 28, 2011 at 5:24 am

    I don’t understand those people who say that this is bad.
    I mean, every tweet that we make is going to be history!
    For example, if a war breaks out, the tweets can later be used by our descendants to work on! Public feeling and all that! We are living to make history and twitter is helping a lot by saving proofs of that history!

  92. Grace Park
    December 28, 2011 at 5:31 am

    And not to forget! Languages which may be lost after this time can also be found again through tweets! Isn’t that cool?

  93. ryan
    January 27, 2012 at 12:27 am

    asked a question as to where you go to even read the tweets and i guess my comment got deleted, this smells fishy… now watch this comment disappear too

  94. Carrington
    March 12, 2012 at 5:55 pm

    I honestly think this is the most ridiculous thing that has happened in a long time and it brings mediocrity to an all new low. Let us celebrate the moronic ramblings of every two-cent celebrity and horror of the moment! Not only celebrate it, but keep it forever for our children and grandchildren see just how mundane we truly can be – not to mention how dumb and trivial we may come across. I say we in the royal way – and am not including myself. “Hey kids, forget about all those books on learning, get on-line (the new boob-tube) and check out what Lindsay Lohan tweeted back in the day!”

    Ridiculous.

  95. Saskboy
    March 22, 2012 at 1:13 pm

    How can we search it though?

  96. zach
    July 4, 2012 at 3:25 pm

    You all don’t understand the big picture. If the government can see everything we say about everyday tasks in our lives, they effectively know everything about us. If America ever loses its democratic status and somehow transforms into a dictatorship or other oppressive government, this library will be the perfect way to see who is a threat to the new leadership and the consequences could be bad. Just think about it, anyone in the world can view your tweets, depending on how much/ what you tweet. It could follow you your whole life, like an unintentional tattoo of all your ideas and activities.

  97. Tom
    January 8, 2013 at 3:38 pm

    Are you going to be making this data available, either as subsets, or in total?

    Being able to download a day would allow testing of analysis, a month might be a reasonable size.

    It might be possible to host the data elsewhere, if you are allowed to redistribute it.

  98. JA Callahan
    February 17, 2013 at 1:50 am

    The Government is going to be able to text mine / text analyze / find sentiment (feelings towards a subject) from these tweets. In doing so the Government will have a societal consensus opinion on EVERYTHING. This will help the Government do good things and bad things. It will increase the Government’s power and ability to strategize / create propaganda greatly.

  99. Troy
    February 16, 2014 at 12:50 pm

    Tweets are archived after 6 months. Meaning if you don’t want your tweets archived you have 5 months and 28 days to delete them. There are apps that do this for you at the touch of a button.
