Preserving Social Media for Future Historians

Information scientist Katrin Weller’s research investigates how future historians might use social media as primary source materials, and how such materials should be preserved. One of two inaugural Kluge Fellows in Digital Studies, Weller was in residence at the Library of Congress from January to June 2015. She sat down with Jason Steinhauer to discuss her research and the prospect of creating a guide to using social media as historical resources.

Hi, Katrin. Your research investigates whether social media data will be primary source materials for future historians. Will they be, and why or why not?

Social media data and other online communication data will surely be used by future historians to learn about our times. They won’t be the only source material, as current traditional sources will still remain. But social media are already being used as a new type of data source by contemporary scholars in various disciplines: political science, sociology, linguistics, communication science, geography, physics, computer science and many more. It is logical to assume that future historians will also look at these sources.

For what purpose will it be used and what might future scholars learn from this data?

Social media are used as a platform to discuss major events such as elections, political crises, natural disasters or cultural celebrations. For example, historians may want to discover the first people who reported live from what later became known as the Arab Spring. They will try to identify different locations of protest activities, such as during the Occupy Wall Street movement, based on communication in different social media channels. Social media are also used by numerous politicians and other public figures. Historians may want to retrace what Barack Obama said on Twitter during an election campaign and how people reacted to that in social media conversations.

Social media data can also be used to study aspects of everyday life, including popular culture, fashion, nutrition, health and well-being, or travel. Social media data open a window to everyday communication, little notes and observations, which remind us more of the spoken conversations that are typically ephemeral. It is fascinating to see thoughts on everyday life being shared on that large scale.

The only thing that would prevent this scenario is if the data are no longer accessible due to a lack of preservation efforts.

So what should we be thinking about now to ensure social media is preserved?

It’s a very good question and we still need a lot of work to answer it more comprehensively.

First of all, there is the more general topic of digital long-term preservation. We must ensure that storage devices remain intact and that we still have devices and software that can read specific file formats. Further technical challenges need to be solved for social media data, including how to handle their size and how to make them searchable.

Then there are legal and ethical challenges of archiving social media data. Usually the data come from social media platforms operated by large companies, such as Facebook or Yahoo, each of which has its own terms of service. A lot of social media data are not openly available, and access often depends on agreements with the respective companies, which may or may not have an interest in sharing access to their data or discussing archival strategies.

Third, all preservation strategies have to happen within a framework that ensures that the social media users–the people who have actually created the content within a social media platform–and their interests are protected. Here we need usable approaches to protect privacy, for example.

Finally, we need to start working on how to preserve relevant contextual information. A lot of the context of social media data quickly gets lost, but it is important when we want to interpret the data. For example, the look and feel of a social media platform changes over time, and it is already very difficult to trace how a specific social media platform looked two years ago, which buttons were placed where and which interactions were possible. But the look and feel strongly influences how people use social media and interact with one another.

Much of your research in Germany focuses on Twitter, and its role in documenting how significant events unfold in real time and how people respond to them. In your opinion, will Twitter be the barometer that future historians use to gauge what events and ideas were significant in our times, or will future historians decide that based on other sources, and then look to Twitter to gauge how we responded?

There currently are some connections between what is prominently discussed on Twitter and what makes it into the traditional news. Journalists have started to pick up trending topics from Twitter and are commenting on them on TV or news web sites–and of course Twitter users are commenting on events that make it into the news. The ability to connect users around topics rather than focusing on existing “friendship” connections distinguishes Twitter from other social networking sites such as Facebook and makes it special. Thus it will make sense to look at the relation of Twitter and traditional news in the future. In some cases it may be interesting to mine the whole collection of Twitter data for trends and interesting topics. But in most cases the decision about what is a significant event comes first–by looking at the broader picture of worldwide events and their connections–and social media sources like Twitter will subsequently be mined for reactions to those events.

Statista.com reports that, as of the first quarter of 2015, Twitter averaged 236 million monthly active users–impressive, but only about 3 percent of the world’s total population. How do we accurately gauge Twitter’s significance in being representative of contemporary thoughts and mores?

Exactly, that is what we have to constantly keep in mind. And the fact that only 3 percent use Twitter is not even the most critical argument here. It’s that we know this is not a representative sample but a very biased one. For example, we know very well that some countries are not represented through Twitter at all. There is the general phenomenon of the digital divide, with populations that are not online at all, and there are countries where other social networking sites are more frequently used or where Twitter may even be prohibited. When we look at tweets around Hurricane Sandy in 2012, they will tell us quite a lot about what was going on in New York City but nothing about what happened in Haiti around the same time. And even among Twitter users in the U.S., Twitter is not representative of the overall population. It is more frequently used by people of specific age groups and with specific backgrounds. That is why it is so important to generate information about user demographics, so that we can understand these biases.

How do other social media fit into your research: Instagram, Pinterest, Snapchat, etc.? Should these be preserved for future historical study, and how might that be done?

It’s very important to also consider the various other social media platforms. They may cover very different topics, use different media formats and address different audiences. Photo and video sharing platforms like Instagram, Flickr, YouTube and Vine are very interesting sources, but they also pose additional technical challenges to archiving, as they are more dependent on content-descriptive metadata (what do we see in an image, what is happening in a video) for searching and interpreting. I also consider blogs very important accounts of individual histories. And then there’s a lot of other interactional Web data that in the future can shed light on how we lived, such as products on eBay or activities on flat sharing platforms. Snapchat and WhatsApp data would be valuable, too, but are much more difficult to obtain.

What parallels do you draw between social media and past modes of communication: is social media the equivalent of yesterday’s postcard? The carte-de-visite? Are such analogies helpful in any way?

In some cases those comparisons are useful and make sense. Postcards and letters are a good example, as many status updates on social media are indeed a next level of writing postcards or letters to friends and family. And many personal blog posts surely share many of the characteristics of traditional diaries as historical sources. But not all social media content is that personal; in other cases, social media content comes from journalists or public relations offices. It can be designed as advertisement and even propaganda, which calls for very careful source criticism. There’s also a lot more to social media data that we do not find in traditional sources: interactions in the form of “likes” and comments, timestamps and geocodes.

What do you hope will come out of your research, both at the Kluge Center and more generally?

Very generally, I hope to contribute to the understanding of what can be studied through social media and how data collection and research methods affect the outcomes in social media studies.

A bit more specifically, I want to create a critical guide to using social media data as historical sources. My hope is that my insights into today’s social media research will help to provide the necessary context information for future historians.

For this effort I learned a lot during my time at the Kluge Center. I benefited greatly from talking to experts in Web archiving who preserve collections of Websites and who introduced me to the specific challenges of this type of resource. And I learned a lot from the interdisciplinary atmosphere at the Kluge Center and from discussions with historians with different backgrounds about their challenges in working with different material at the Library of Congress. The materials they worked with often had parallels to social media content, whether it be the diaries of war veterans resembling today’s blogs, construction manuals that mirror instructional YouTube videos, or fan fiction that manifests itself on today’s social media.

Katrin Weller is a senior researcher at GESIS Leibniz Institute for the Social Sciences in Cologne and the author of “Knowledge Representation in the Social Semantic Web.” The Kluge Center is currently seeking applicants for the next Kluge Fellows in Digital Studies. Applications are due December 6th.
