Today’s guest post is from Madeline Goebel, a Digital Collections Specialist at the Library of Congress. As a reader of the Signal, you may already be familiar with By the People, the Library of Congress’s crowdsourcing program that allows volunteers to transcribe, review, and tag digitized pages from the Library’s collections. Further, you may already know …
Introduction: The Selected Datasets Collection was publicly launched in June 2020 as part of the Library’s ongoing efforts to support emerging data-driven styles of research. Since then, our initial offering of twenty datasets has grown to nearly 200 unique items, and we’ve continued to refine the technical workflows by which content is prepared and delivered to …
Friends, data wranglers, lend me your ears: the Library of Congress’ Selected Datasets Collection is now live! You can now download datasets of the Simple English Wikipedia, the Atlas of Historical County Boundaries, sports economic data, half a million emails from Enron, and urban soil lead abatement from this online collection. This initial set of …
It has been just over a year since we kicked off a deep dive into the Library of Congress Web Archives on the Signal! Now at over 2 petabytes, the web archives are a complex aggregation of interrelated web objects that make up the internet as we know it (images, text, code, audio, video, etc.). …
Interested in learning more about what’s new in the Library of Congress’s digital collections? The Signal shares updates on new additions to our digital collections and we love showing off all the hard work of our colleagues from across the Library. Read on for a sample of what’s been added recently and some of our favorite highlights. …
This blog post was guest-authored by Rachel Trent, Senior Digital Collections Data Librarian. For nearly twenty-five years, the Library of Congress has been archiving campaign websites for Presidential, Congressional, and gubernatorial elections. Back in 2022, we released a dataset of index files for the United States Elections Web Archive, and we are happy to announce …
Catalog records are key to storing and finding digital library materials. As the volume of digital materials continues to grow rapidly, the Library of Congress is exploring whether AI can help catalogers by automating the generation of metadata. AI could provide an opportunity to speed up description workflows. Yet there are numerous machine learning (ML) …