This is a guest post by Nicole Contaxis, Data Catalog Coordinator at NYU Health Sciences Library.
An increasing number of publishers and grant-funding organizations are requiring researchers to share their data, so libraries and other institutions are creating tools and strategies to support researchers in this effort. To meet these challenges and communicate the benefits of data sharing, the NYU Health Sciences Library created the NYU Data Catalog, a low-barrier way for researchers to share information about their data.
The NYU Data Catalog
The NYU Data Catalog is a searchable and browsable online collection of datasets. Rather than function as a data repository, the catalog is a digital way-finder for researchers looking for data relevant to their work. Each dataset is described in detail with rich metadata. We include information about who can access each dataset and how, using the metadata elements Access Restrictions and Access Instructions. Other important descriptors include Subject Domains and Keywords, which are meant to give users a better idea of the content of the dataset. Some metadata elements are not intended for all types of datasets but are particularly helpful in certain circumstances. Geographic Coverage and Timeframe of Data Collection help researchers identify data about population characteristics and public health by explaining where and over what period the data was collected. When these descriptors are not pertinent to the dataset, we simply leave them blank.
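To make the shape of such a record concrete, here is a minimal sketch in Python. It is not the catalog's actual schema or code: the class name, field names, helper functions and the example record are all hypothetical, chosen only to mirror the metadata elements named above and to show optional descriptors being left blank.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DatasetRecord:
    """One catalog entry; field names are illustrative, not the catalog's real schema."""
    title: str
    access_restrictions: str
    access_instructions: str
    subject_domains: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    # Optional descriptors: left as None when not pertinent to the dataset.
    geographic_coverage: Optional[str] = None
    timeframe: Optional[str] = None

    def populated_fields(self) -> dict:
        """Return only the elements that were actually filled in."""
        return {k: v for k, v in vars(self).items() if v not in (None, "", [])}


def find_by_keyword(records: List[DatasetRecord], term: str) -> List[DatasetRecord]:
    """Case-insensitive exact match against keywords and subject domains."""
    needle = term.lower()
    return [
        r for r in records
        if any(needle == tag.lower() for tag in r.keywords + r.subject_domains)
    ]


# Made-up example record (not a real catalog entry).
example = DatasetRecord(
    title="Example Community Survey",
    access_restrictions="Application required",
    access_instructions="Contact the listed researcher",
    subject_domains=["Public Health"],
    keywords=["diabetes", "survey"],
)
```

In this sketch, `example.populated_fields()` omits Geographic Coverage and Timeframe because they were left blank, and `find_by_keyword([example], "Diabetes")` returns the record regardless of letter case.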
Some of the datasets in the catalog are created by researchers at NYU and others are created by outside agencies, like the U.S. Census Bureau. The catalog includes information on licensed datasets, datasets that require an IRB application, and datasets that are publicly available. To connect researchers to colleagues with knowledge about these datasets and to encourage collaboration, the catalog lists the NYU researcher who has authored the data or has expertise in it (e.g., has published on the dataset or used it previously in her research).
In this way, the NYU Data Catalog was designed to:
- increase the visibility of research data generated at NYU
- facilitate collaboration across departments and institutes
- help researchers locate and understand datasets generated by outside institutions
- support the process of re-using research data.
These goals, while lofty, are attainable with adequate researcher participation, and for that reason the NYU Health Sciences Library is currently engaged in a comprehensive outreach effort.
The success of the catalog relies on researcher buy-in. In order for the catalog to be a helpful resource, researchers need to contribute records for their datasets and they need to use the catalog to locate datasets and possible collaborators. Achieving adequate user participation for library projects is not a novel obstacle, and this issue has received attention on The Signal previously with posts about the Smithsonian’s Transcription Center.
For projects that require user participation, like the NYU Data Catalog, it is imperative to perform outreach in a way that ensures researchers feel comfortable contributing to the resource, using it, sharing it with other researchers and updating their contributions as their expertise and publication history grow over time. It is not enough for researchers to contribute records for their research data; we want the catalog to grow and change along with the research community.
Outreach for Building & Maintaining Relationships
Outreach for this project is best understood as a bedrock for building and maintaining relationships with researchers. To design our outreach strategy, we have drawn on the experience and expertise of other librarians, including T-Kay Sangwand’s work on ethical partnerships for digital libraries and Micha Broadnax’s work on archival outreach with students. Building a successful relationship with a researcher means that she will be engaged with the catalog, that she will be more likely to point her students to it, that she will be more likely to use it herself and that she will be more likely to contribute new datasets as her research expands.
To help maintain relationships with researchers, we are working to create services that will continue to engage researchers after they initially describe their data in the catalog. We are working towards creating usable and helpful analytics so that we can send reports to researchers on how frequently users look at their dataset records.
The Story of One Record
Each record in the NYU Data Catalog is the result of a discussion between the cataloger and the researcher. While a cataloger can gather a substantial amount of information about a researcher’s data from her publications and grants, researcher approval and input are necessary to ensure that each record is accurate, helpful, and complete. Research data is not a monolith. It needs to be cataloged in a way that respects differences across academic disciplines, privacy and ethics concerns, and data sharing requirements from publishers and grant funders. For these reasons, it is necessary to listen attentively to each individual researcher while cataloging their data. Deferring to their subject expertise is particularly important.
Laura Wyatt, for example, is the Research Data Manager for the Section for Health Equity at the Department of Population Health in the NYU School of Medicine. We located her while becoming better acquainted with the staff, faculty and research projects within the Department of Population Health. After introducing Ms. Wyatt to the NYU Data Catalog via email, we set up an in-person meeting to discuss the various datasets in her care and whether they would be a good fit for the catalog. During the meeting, Ms. Wyatt mentioned that the team had heard about the catalog before and had wanted to contribute to it but, because of publication and grant application deadlines, had never been able to complete the process. Although Ms. Wyatt needed to confirm with each of the Principal Investigators what could be shared, she was able to contribute five unique datasets. Those datasets include:
- Asian American Partnership in Research and Empowerment
- Diabetes Research, Education, and Action for Minorities
- Community Health Resources and Needs Assessment
- Racial and Ethnic Approaches to Community Health across the U.S. Risk Factor Survey
- Reaching Immigrants through Community Empowerment.
Throughout the outreach process, it has become increasingly apparent that focused and personalized attention, demonstrated through individualized emails and one-on-one meetings, helps increase researcher participation. Because of the number of obligations researchers have, it is important to demonstrate that the cataloger has the time and energy to address their specific needs and the needs of their data. Individual outreach, including exploring each researcher’s work before emailing her, can make all of the difference. Even researchers who are interested in sharing their data may not contribute to the catalog unless they are individually addressed. Forming an individual relationship may be time-consuming but it can make a big difference in the quantity and quality of researcher contributions.
With permission from the Principal Investigators, Wyatt sent detailed information about each dataset, including a description, time frame, geographic coverage, subject domains, keywords, grant support, and publications that describe how the data was collected or analyzed. While we added information about how to access the datasets and whom to contact about them, there was little additional work for the cataloger to do.
Making the Invisible Visible
It is important to note that some of the datasets in the catalog, like the datasets that Wyatt helped contribute, are only made visible by the NYU Data Catalog. Although the publications related to these datasets are available elsewhere, the NYU Data Catalog is the only resource that describes the data explicitly and explains how to access it. Because we allow researchers to retain control over their data, there are fewer obstacles to contributing to the catalog than there are to depositing data in a repository. While it would be ideal for researchers to store their data in a repository, and we do encourage them to do so, it is not always practical, possible or desirable. By being flexible, we are able to highlight unique datasets that cannot be found anywhere else.
Performing outreach and building relationships with researchers requires time and energy but it allows us to highlight previously unknown datasets, encourage collaboration and create a resource for the research community. Building tools and devising strategies to help researchers share and re-use data is only helpful with researcher buy-in. We at the NYU Health Sciences Library aim to generate that buy-in by developing long-lasting relationships.
Code Availability
In addition to helping researchers share and locate data, the NYU Health Sciences Library has made the NYU Data Catalog’s code available on GitHub and its documentation available on the Open Science Framework. Moving forward, the library hopes to work with other institutions so that they too can create catalogs for datasets relevant to their researchers. If other institutions implement the Data Catalog’s code, it would facilitate the creation of a cross-institutional data catalog that would enable greater data discovery through federated searching.
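As a rough illustration of what federated searching could look like, the Python sketch below sends one query to several catalogs and merges the results, labeling each hit with its source. It is only a sketch under stated assumptions: it does not use the Data Catalog's actual code or any real API, and the function names, the placeholder institutions and the dataset titles are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: each participating catalog exposes some search function that
# takes a query string and returns matching dataset titles.
SearchFn = Callable[[str], List[str]]


def federated_search(query: str, catalogs: Dict[str, SearchFn]) -> List[Tuple[str, str]]:
    """Send one query to every catalog and merge the results, tagged by source."""
    results: List[Tuple[str, str]] = []
    for institution, search in catalogs.items():
        try:
            titles = search(query)
        except Exception:
            # One unreachable catalog should not break the whole search.
            continue
        results.extend((institution, title) for title in titles)
    return results


def make_local_search(titles: List[str]) -> SearchFn:
    """Build an in-memory stand-in for a remote catalog (illustration only)."""
    def search(query: str) -> List[str]:
        return [t for t in titles if query.lower() in t.lower()]
    return search


def unavailable(query: str) -> List[str]:
    """Stand-in for a catalog that cannot be reached."""
    raise ConnectionError("catalog offline")


# Placeholder institutions and made-up dataset titles.
catalogs = {
    "Institution A": make_local_search(["Example Diabetes Survey", "Example Air Quality Study"]),
    "Institution B": make_local_search(["Example Diabetes Registry"]),
    "Institution C": unavailable,
}
hits = federated_search("diabetes", catalogs)
```

Here `hits` contains one matching title each from Institution A and Institution B, while the unreachable Institution C is skipped rather than causing the search to fail. A real deployment would replace the in-memory stand-ins with whatever search interface the participating catalogs agree on.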