Digital Scholarship Resource Guide: So now you have digital data… (part 3 of 7)

This is part three of our Digital Scholarship Resource Guide created by Samantha Herron. See part one about digital scholarship projects and part two about how to create digital documents.

So now you have digital data…

Great! But what to do?

Regardless of what your data are (sometimes they're just pictures, documents, and notes; sometimes they're numbers and metadata), storage, organization, and management can get complicated.

Here is an excellent resource list from the CUNY Digital Humanities Resource Guide that covers cloud storage, password management, note storage, calendar/contacts, task/to-do lists, citation/reference management, document annotation, backup, conferencing & recording, screencasts, posts, etc.

From the above, I will highlight:

  • Cloud-based secure file storage and sharing services like Google Drive and Dropbox. Both services offer some storage space for free, but additional storage requires a monthly fee. With Dropbox, users can save a file to a folder on their computer and access it on their phone or online. Dropbox folders can be shared with collaborators and kept in sync. Google Drive is a web-based service available to anyone with a Google account; any file can be uploaded, stored, and shared with others through Drive. Drive also stores Google Docs and Sheets, which can be written in the browser and edited collaboratively in real time.
  • Zotero, a citation management tool. Zotero allows users to create and organize citations using collections and tags. Zotero can detect bibliographic information on a web page and add it to a library with the click of a button. It can generate citations, footnotes, endnotes, and in-text citations in many citation styles, and it integrates with Microsoft Word.

If you have a dataset:

Here are some online courses from School for Data about how to extract, clean, and explore data.
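As a rough illustration of those three steps (extract, clean, explore), here is a minimal Python sketch using only the standard library. It is our own example, not taken from the School for Data courses: the sample rows and column names are hypothetical, and a real project would read an exported CSV file rather than a string.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data standing in for a real export; note the stray
# whitespace, inconsistent capitalization, and one missing year.
RAW = """title,city,year
Letter to a friend, Boston ,1852
Diary entry,boston,1853
Sermon notes,New York,
Travel log,NEW YORK ,1855
"""

def load_rows(text):
    """Extract: parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

def clean_row(row):
    """Clean: trim whitespace everywhere and normalize city capitalization."""
    cleaned = {key: value.strip() for key, value in row.items()}
    cleaned["city"] = cleaned["city"].title()
    return cleaned

def explore(rows, column):
    """Explore: count each non-blank value in a column, plus how many are blank."""
    values = [row[column] for row in rows]
    blanks = sum(1 for v in values if v == "")
    return Counter(v for v in values if v != ""), blanks

rows = [clean_row(r) for r in load_rows(RAW)]
city_counts, _ = explore(rows, "city")
year_counts, missing_years = explore(rows, "year")

print(dict(city_counts))  # {'Boston': 2, 'New York': 2}
print(missing_years)      # 1
```

Before cleaning, " Boston ", "boston", "New York", and "NEW YORK " would have counted as four different cities; the cleaning step is what makes the counts meaningful.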

OpenRefine is one popular tool for working with and organizing data. It's like a very fancy Excel sheet.

It looks like this:

Screenshot of the Open Refine tool.

Here is an introduction to OpenRefine from Owen Stephens on behalf of the British Library, 2014. Programming Historian also has a tutorial for cleaning data with OpenRefine.
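One of OpenRefine's most useful cleaning features is clustering, which groups different spellings of the same value; one of its documented keying methods is called "fingerprint". The Python sketch below only approximates that idea (trim, lowercase, strip accents and punctuation, then sort and de-duplicate the words so that values with matching keys fall into one group). It is an illustration under those assumptions, not OpenRefine's own code, and the sample names are made up.

```python
import re
import unicodedata
from collections import defaultdict

def fingerprint(value):
    """Build a normalized key, loosely modeled on OpenRefine's 'fingerprint' method."""
    value = value.strip().lower()
    # Fold accented characters to plain ASCII (e.g. "é" becomes "e").
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Drop punctuation, keeping word characters and whitespace.
    value = re.sub(r"[^\w\s]", "", value)
    # Split into words, remove duplicates, sort, and join back together.
    return " ".join(sorted(set(value.split())))

def cluster(values):
    """Group values whose keys match; keep only groups with more than one spelling."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [vs for vs in groups.values() if len(set(vs)) > 1]

names = ["Twain, Mark", "Mark Twain", "mark twain ",
         "Library of Congress", "Congress, Library of", "Boston"]
for group in cluster(names):
    print(group)
# ['Twain, Mark', 'Mark Twain', 'mark twain ']
# ['Library of Congress', 'Congress, Library of']
```

As in OpenRefine, a cluster is only a suggestion: you still decide which spelling to keep before merging.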

Some computer-y basics

A sophisticated text editor is good to have. Unlike a word processor such as Microsoft Word, a text editor is used to edit plain text: text without formatting such as font, size, or page breaks. Text editors are important for writing code and manipulating text. Your computer probably has one preloaded (e.g. Notepad on Windows computers), but there are more robust ones that can be downloaded for free, like Notepad++ for Windows, TextWrangler for Mac OS X, or Atom for either.

The command line is a way of interacting with a computer program through text instructions (commands) instead of point-and-click GUIs (graphical user interfaces). For example, instead of clicking on your Documents folder and scrolling through to find a file, you can type text commands into a command prompt to do the same thing. Knowing the basics of the command line helps you understand how a computer thinks, and it can be a good introduction to code-ish things for those with little experience. This Command Line Crash Course from Learn Python the Hard Way gives a quick tutorial on how to use the command line to move through your computer's file structure.
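On macOS and Linux, typing `cd Documents` moves into your Documents folder and `ls` lists what is inside it; on Windows, `dir` does the listing. For comparison, here is a hedged sketch of the same two ideas (list a folder, then search it for a file by name) in Python, using the standard `pathlib` module. The `show_folder` and `find_file` helpers and the throwaway folder are hypothetical, created only so the example runs on any computer.

```python
import tempfile
from pathlib import Path

def show_folder(folder):
    """Like typing `ls` (macOS/Linux) or `dir` (Windows): list a folder's contents, sorted."""
    return sorted(p.name for p in Path(folder).iterdir())

def find_file(folder, name):
    """Like searching by name: look through every subfolder and return matching paths."""
    return sorted(p.relative_to(folder).as_posix() for p in Path(folder).rglob(name))

# Build a small, temporary "Documents" folder so the example runs anywhere;
# it is deleted automatically when the `with` block ends.
with tempfile.TemporaryDirectory() as tmp:
    docs = Path(tmp) / "Documents"
    (docs / "thesis").mkdir(parents=True)
    (docs / "notes.txt").write_text("reading notes")
    (docs / "thesis" / "chapter1.txt").write_text("draft")

    listing = show_folder(docs)
    found = find_file(docs, "chapter1.txt")

print(listing)  # ['notes.txt', 'thesis']
print(found)    # ['thesis/chapter1.txt']
```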

Codecademy has free, interactive lessons in many different programming languages.

Python seems to be the programming language of choice for digital scholars (and a lot of other people). It's relatively intuitive to learn and can be used to build a variety of programs.
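To give a sense of what a few lines of Python look like, here is a minimal sketch (our own illustration, not from any of the resources above) that counts the most frequent words in a passage. The function name and the simple "lowercase letters and apostrophes" definition of a word are assumptions chosen for brevity; real text analysis usually needs more careful tokenizing.

```python
import re
from collections import Counter

def word_frequencies(text, top_n=3):
    """Lowercase the text, pull out runs of letters/apostrophes, and return the most common words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top_n)

sample = "So now you have digital data. Great! But what to do with the data?"
print(word_frequencies(sample))  # [('data', 2), ('so', 1), ('now', 1)]
```

Words that tie on count are listed in the order they first appear, which is why "so" and "now" follow "data".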

Screenshot of a command line interface.

Next week we will dive into Text Analysis. See you then!
