Breaking: A New “News” Archive!

A new digital collection, The General News on the Internet, is a free archive of online-only news sites collected from the web. The Library of Congress began preserving these sites in June 2014.

How are these news sites captured? The Library uses a hybrid approach: weekly captures of each website, augmented by twice-daily captures of known RSS (Really Simple Syndication) feeds. This combination produces a more complete news archive than weekly captures alone. Given the dynamic nature of today's 24-hour news cycle, these archives aim to capture as much of the news distribution as possible within current limitations in technology and resources.

(2001) Detroit Free Press. United States. [Archived Web Site] Retrieved from the Library of Congress, //www.loc.gov/item/lcwaN0011385/.

You will see that some of the sites in this web archive include captures from before 2014, and here's why. Many news sites were already captured as part of the Library's September 11, 2001 Web Archive, the Public Policy Topics Web Archive, and the United States Elections Web Archive. The web archiving team also picks up content through links from other sites: if another website we archive links to additional web pages, we follow those links to capture their context and ensure useful content is preserved. We also capture content embedded in those pages. The web archive access tools point to all available versions of each site. Of the sites in the collection, 17 are openly available online from anywhere, and 37 are available on-campus only.

You will see that we are not including major news sites and are focusing only on born-digital sites. Copyright restrictions play a major role in this decision, and we also wanted to capture sites that could be at risk of disappearing. For instance, the Christian Science Monitor ceased daily print publication in 2009, and we wanted to add its website to the archive to preserve its content for posterity.

(2010) LGBTQ Nation / News, Opinions, Arts and Culture – The Most Followed LGBTQ News Source. United States. [Archived Web Site] Retrieved from the Library of Congress, //www.loc.gov/item/lcwaN0018339/

(2010) The Texas Tribune. United States. [Archived Web Site] Retrieved from the Library of Congress, //www.loc.gov/item/lcwaN0017064/

Why do the archives stop at 2018 right now? Everything in this archive is under a one-year embargo. As items come out of the embargo period, more recent captures will appear, and you'll continue to see more content become available as records are added.

(2002) Slate Magazine. United States. [Archived Web Site] Retrieved from the Library of Congress, //www.loc.gov/item/lcwaN0010234/

For tips on searching the collection, visit this page.

More information on the web archiving program for researchers and site owners is available here.

Have you used this resource? Let us know in the comments! Questions about using the Web Archive? Contact the Web Archiving Team or Ask a Librarian, and follow The Signal blog for announcements as additional content becomes available.