Spotlighting Research Data: Building Relationships with Outreach for the NYU Data Catalog

This is a guest post by Nicole Contaxis, Data Catalog Coordinator at NYU Health Sciences Library. You can email her at [email protected].

Screenshot of the NYU Data Catalog Homepage.

An increasing number of publishers and grant-funding organizations are requiring researchers to share their data, so libraries and other institutions are creating tools and strategies to support researchers in this effort. To meet these challenges and communicate the benefits of data sharing, the NYU Health Sciences Library created the NYU Data Catalog, a low-barrier way for researchers to share information about their data.

The NYU Data Catalog
The NYU Data Catalog is a searchable and browsable online collection of dataset descriptions. Rather than function as a data repository, the catalog is a digital way-finder for researchers looking for data relevant to their work. Each dataset is described in detail with rich metadata. The metadata elements Access Restrictions and Access Instructions explain who can access each dataset and how. Other important descriptors include Subject Domains and Keywords, which give users a better idea of the content of the dataset. Some metadata elements are not intended for all types of datasets but are particularly helpful in certain circumstances. Geographic Coverage and Timeframe of Data Collection, for example, help researchers identify data about population characteristics and public health by explaining where and over what period the data was collected. When these descriptors are not pertinent to a dataset, we simply leave them blank.
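
To make the shape of a record more concrete, here is a minimal sketch in Python. It is an illustration only, not the catalog's actual schema: the class name, field names and sample values are assumptions based on the metadata elements named above, and the optional elements default to blank.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical model of one catalog record; not the catalog's actual schema."""
    title: str
    access_restrictions: str
    access_instructions: str
    subject_domains: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    # Optional elements: only meaningful for some datasets, otherwise left blank.
    geographic_coverage: Optional[str] = None
    timeframe_of_data_collection: Optional[str] = None

    def filled_elements(self) -> Dict[str, Any]:
        """Return only the elements that were filled in, skipping blank ones."""
        return {
            name: value
            for name, value in vars(self).items()
            if value is not None and value != "" and value != []
        }


# Placeholder values invented for illustration only.
example = DatasetRecord(
    title="Example Community Health Survey",
    access_restrictions="Application required",
    access_instructions="Contact the listed researcher",
    subject_domains=["Public Health"],
    keywords=["survey", "diabetes"],
    timeframe_of_data_collection="2010-2012",
)

for name, value in example.filled_elements().items():
    print(f"{name}: {value}")
```

Because Geographic Coverage was left blank in this invented example, it is simply omitted when the record is displayed.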

Some of the datasets in the catalog are created by researchers at NYU and others are created by outside agencies, like the U.S. Census Bureau. The catalog includes information on licensed datasets, datasets that require an IRB application and datasets that are publicly available. To connect researchers to colleagues with knowledge about these datasets and to encourage collaboration, the catalog lists the NYU researcher who has authored the data or has expertise in it (e.g. published on the dataset or used it previously in her research).

In this way, the NYU Data Catalog was designed to:

  • increase the visibility of research data generated at NYU
  • facilitate collaboration across departments and institutes
  • help researchers locate and understand datasets generated by outside institutions
  • support the process of re-using research data.

These goals, while lofty, are attainable with adequate researcher participation. For that reason, the NYU Health Sciences Library is currently engaged in a comprehensive outreach effort.

A community meeting of participants in the NYU study, Diabetes Research, Education, and Action for Minorities. A description of the data from this study is now available on the NYU Data Catalog. Credit: Laura Wyatt.

The success of the catalog relies on researcher buy-in. In order for the catalog to be a helpful resource, researchers need to contribute records for their datasets and they need to use the catalog to locate datasets and possible collaborators. Achieving adequate user participation for library projects is not a novel obstacle, and this issue has received attention on The Signal previously with posts about the Smithsonian’s Transcription Center.

For projects that require user participation, like the NYU Data Catalog, it is imperative to perform outreach in a way that ensures researchers feel comfortable contributing to the resource, using it, sharing it with other researchers and updating their contributions as their expertise and publication history grow over time. It is not enough for researchers to contribute records for their research data; we want the catalog to grow and change along with the research community.

Outreach for Building & Maintaining Relationships
Outreach for this project is best understood as a bedrock for building and maintaining relationships with researchers. To design our outreach strategy, we have drawn on the experience and expertise of other librarians, including T-Kay Sangwand’s work on ethical partnerships for digital libraries and Micha Broadnax’s work on archival outreach with students. Building a successful relationship with a researcher means that she will be engaged with the catalog, that she will be more likely to point her students to it, that she will be more likely to use it herself and that she will be more likely to contribute new datasets as her research expands.

To help maintain relationships with researchers, we are working to create services that will continue to engage researchers after they initially describe their data in the catalog. We are working towards creating usable and helpful analytics so that we can send reports to researchers on how frequently users look at their dataset records.

The Story of One Record
Each record in the NYU Data Catalog is the result of a discussion between the cataloger and the researcher. While a cataloger can gather a substantial amount of information about a researcher’s data from her publications and grants, researcher approval and input are necessary to ensure that each record is accurate, helpful, and complete. Research data is not a monolith. It needs to be cataloged in a way that respects differences across academic disciplines, privacy and ethics concerns, and data sharing requirements from publishers and grant funders. For these reasons, it is necessary to listen attentively to each individual researcher while cataloging her data. Deferring to her subject expertise is particularly important.

Laura Wyatt, for example, is the Research Data Manager for the Section for Health Equity at the Department of Population Health in the NYU School of Medicine. We located her while becoming better acquainted with the staff, faculty and research projects within the Department of Population Health. After introducing Ms. Wyatt to the NYU Data Catalog via email, we set up an in-person meeting to discuss the various datasets in her care and whether they would be a good fit for the catalog. During the meeting, Ms. Wyatt mentioned that the team had heard about the catalog before and had wanted to contribute to it but, with publication and grant application deadlines, had never been able to complete the process. Although Ms. Wyatt needed to confirm with each of the Principal Investigators what could be shared, she was able to contribute five unique datasets. Those datasets include:

Screenshot of the DREAM dataset on the NYU Data Catalog.

Throughout the outreach process, it has become increasingly apparent that focused and personalized attention, demonstrated through individualized emails and one-on-one meetings, helps increase researcher participation. Because of the number of obligations researchers have, it is important to demonstrate that the cataloger has the time and energy to address their specific needs and the needs of their data. Individual outreach, including exploring each researcher’s work before emailing her, can make all the difference. Even researchers who are interested in sharing their data may not contribute to the catalog unless they are individually addressed. Forming an individual relationship may be time-consuming, but it can substantially improve the quantity and quality of researcher contributions.

With permission from the Principal Investigators, Wyatt sent detailed information about each dataset, including a description, time frame, geographic coverage, subject domains, keywords, grant support, and publications that describe how the data was collected or analyzed. Beyond adding information about how to access the datasets and whom to contact about them, there was little additional work for the cataloger to do.

Making the Invisible Visible
It is important to note that some of the datasets in the catalog, like the datasets that Wyatt helped contribute, are only made visible through the NYU Data Catalog. Although the publications related to these datasets are available elsewhere, the NYU Data Catalog is the only resource that describes the data explicitly and explains how to access it. Because we allow researchers to retain control over their data, there are fewer obstacles to contributing to the catalog than there are to depositing data in a repository. While it would be ideal for researchers to store their data in a repository, and we do encourage them to do so, it is not always practical, possible or desirable. By being flexible, we are able to highlight unique datasets that cannot be found anywhere else.

Performing outreach and building relationships with researchers requires time and energy, but it allows us to highlight previously unknown datasets, encourage collaboration and create a resource for the research community. Tools and strategies that help researchers share and re-use data are only helpful with researcher buy-in. We at the NYU Health Sciences Library aim to generate that buy-in by developing long-lasting relationships.

Code Availability
The NYU Data Catalog’s code is available on GitHub and its documentation is available on the Open Science Framework. Moving forward, the NYU Health Sciences Library hopes to work with other institutions so that they too can create catalogs for datasets relevant to their researchers. If other institutions implement the Data Catalog’s code, it would facilitate the creation of a cross-institutional data catalog that would enable greater data discovery through federated searching.
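
As a rough illustration of what federated searching could look like, the sketch below sends one query to several catalogs and merges the results, tagging each hit with its source. Everything here is an assumption made for illustration: the function names, the institution names and the in-memory stand-in catalogs are hypothetical, and the sketch does not use the Data Catalog's actual code or any real API.

```python
from typing import Callable, Dict, List

# A search function takes a query string and returns matching records.
SearchFn = Callable[[str], List[dict]]


def make_catalog(records: List[dict]) -> SearchFn:
    """Build an in-memory stand-in for one institution's catalog search."""
    def search(query: str) -> List[dict]:
        q = query.lower()
        return [r for r in records if q in r["title"].lower()]
    return search


def federated_search(query: str, catalogs: Dict[str, SearchFn]) -> List[dict]:
    """Send one query to every catalog and merge the hits, tagged by source."""
    merged = []
    for institution, search in catalogs.items():
        for hit in search(query):
            merged.append({**hit, "institution": institution})
    return merged


# Hypothetical institutions and records, invented for illustration only.
catalogs = {
    "Institution A": make_catalog([{"title": "Asthma Cohort Data"}]),
    "Institution B": make_catalog(
        [{"title": "Asthma Clinic Visits"}, {"title": "Census Extract"}]
    ),
}

results = federated_search("asthma", catalogs)
for hit in results:
    print(f'{hit["institution"]}: {hit["title"]}')
```

In practice each stand-in would be replaced by a call to an institution's own catalog, which is why shared code and shared metadata elements would make this kind of cross-institutional search easier to build.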
