Purpose
To identify opportunities for librarians and information professionals to accelerate discovery and advance health through biomedical data science, with a specific focus on areas where the National Library of Medicine’s (NLM) Office of Engagement and Training (OET) and the Network of the National Library of Medicine (NNLM) can help build capacity in the library community.
Background
As part of A Platform for Biomedical Discovery and Data-Powered Health: Strategic Plan 2017-2027, NLM identified several overarching goals to help unleash the potential of data and information to accelerate and transform discovery and improve health and healthcare.
The first of these stated goals was to “accelerate discovery and advance health by providing the tools for data-driven research,” which contained several objectives, including Objective 1.2, “Advance research and development in biomedical informatics and data science.” Additionally, Strategic Plan Goal 3, “Build a workforce for data-driven research and health,” included Objective 3.1, “Expand and enhance research training for biomedical informatics and data science,” and Objective 3.2, “Assure data science and open science proficiency.”
To pursue these goals and objectives, in April 2019, the NLM Office of Strategic Initiatives (OSI) hosted a workshop titled "Developing the Librarian Data Science and Open Science Workforce," with the intent of identifying “the set of skills that librarians will need to advance work in data science and open science.” The report emerging from the workshop outlined seven broad areas of “core skills for librarians for data science and open science.”
Relatedly, in April 2020, the Medical Library Association unveiled a new “Data Services Competency” framework, “to guide librarians who are pursuing training to enhance their data and open science skills.” The Data Services Competency provides a list of relevant skills that are “needed for providing data services in a library setting.” This focus on the services being provided by the librarian, as opposed to a particular job title or description, supports the idea that data literacy and competency may be of use throughout medical librarianship, and not just for those in data librarian roles.
This document uses both the OSI workshop report and the Medical Library Association Data Services Competency as jumping-off points. It attempts to identify ways in which OET and the NNLM can further Strategic Plan Goal 1 (with a focus on Objective 1.2) and Goal 3 (with a focus on Objectives 3.1 and 3.2) by building capacity throughout the health sciences librarian community, concentrating on the specific areas and skills most relevant to data-driven research and health. Additionally, this document attempts to draw explicit connections between these areas of capacity-building and related NLM offerings.
Data Science Areas of Activity
Librarians and information professionals can support research, conduct data science, and promote health in a variety of ways.
We have identified the following three broad areas of activity in support of data-driven research and health, where the skills and expertise of the library community could be most helpful.
A. Accelerating Discovery through Data Science
This area is most closely connected to NLM Strategic Plan Objective 1.2 (“Advance Research and Development in Biomedical Informatics and Data Science”), and specifically describes efforts to directly pursue new discovery in the fields of health and biomedicine through data-driven research and development.
Historically, librarians and information professionals have primarily played a useful support role in the research enterprise (see Area C, below). However, librarians bring a facility for organizing and interpreting information and data, proficiency in working with a wide variety of types of data (such as bibliographic data and usage data), and a tradition of “building bridges” between concepts, fields, researchers, and organizations, both within their own institutions and beyond. These strengths may make them uniquely suited to participate more directly in the data-driven research process, especially for interdisciplinary research that involves forging connections between research efforts in different fields.
Additionally, many librarians are increasingly leveraging new data-driven techniques to accelerate discovery in their own research. From using text mining and machine learning to facilitate the process of creating systematic reviews to conducting bibliometric analyses to gain new understanding of the current state of research in particular scientific fields, applying data science methods to the study of science and research can generate new insights and help shape future research.
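As a minimal illustration of the kind of bibliometric analysis mentioned above, the following Python sketch counts publications per year and tallies the most frequent keywords in a small set of citation records. The records, field names, and function names are hypothetical and invented for illustration; an actual analysis would draw on exported bibliographic data and would likely need considerable cleaning first.

```python
from collections import Counter

# Hypothetical citation records, invented for illustration only.
records = [
    {"year": 2018, "keywords": ["machine learning", "systematic review"]},
    {"year": 2019, "keywords": ["text mining", "systematic review"]},
    {"year": 2019, "keywords": ["bibliometrics"]},
    {"year": 2020, "keywords": ["machine learning", "text mining"]},
]


def publications_per_year(recs):
    """Return a dict mapping each year to its number of records, sorted by year."""
    counts = Counter(r["year"] for r in recs)
    return dict(sorted(counts.items()))


def top_keywords(recs, n=3):
    """Return the n most frequent keywords as (keyword, count) pairs."""
    counts = Counter(k.lower() for r in recs for k in r["keywords"])
    return counts.most_common(n)


if __name__ == "__main__":
    print(publications_per_year(records))
    print(top_keywords(records))
```

Simple counts like these are only a starting point, but they show how a few lines of code can summarize the shape of a body of literature.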
B. Advancing Health and Healthcare Through the Application of Data-Centric Clinical Informatics
This area includes the operation, maintenance, and support of information systems in clinical settings, and connects to NLM Strategic Plan Goal 1 (“accelerate discovery and advance health by providing the tools for data-driven research”) more broadly.
Hospital and clinical librarians are positioned to contribute to these efforts, especially in cases where clinical data is being used as an input to data-driven research, which is directly connected to NLM Strategic Plan Objective 1.1 (“Connect the resources of a digital research enterprise”).
C. Supporting the Research Enterprise with Effective Research Data Management
This area covers activities related to efficiently, effectively, and securely organizing, storing, preserving, and sharing data collected and used in biomedical research. It includes organizing information and facilitating access to it, and is deeply connected to the ideals of Open Science, which is why this area has historically been the one most closely connected to the traditional librarianship skill set.
This area also can serve as a bridge to or between the other two areas listed above: data-driven research (area A) requires the development and execution of comprehensive and mature data management plans; and effective research data management is key in the responsible use of clinical information as a substrate for research, both in ensuring the privacy and security of data, and in working with those who manage clinical information systems (area B) to extract potential research data.
Where OET and NNLM Can Build Capacity
The following list enumerates skills and competencies which may be useful or necessary in pursuing the above areas of activity. Many of these skills are already present in the librarian/information professional community, and are ready to be developed further. Other competencies are similar or related to skills traditionally found in the library community, and offer an opportunity for librarians to grow into new or expanded roles.
This list is roughly organized into categories following the research process, from gathering open data, to conceptualizing and conducting data-driven research, to managing and sharing research data. Many of the skills listed overlap categories, and may be relevant in many phases of the research process.
NLM offerings that relate to the listed competencies are noted in brackets.
The Area(s) of Activity (outlined above) to which each skill is relevant are noted with a check mark.
Category | Area A: Accelerating Discovery through Data Science | Area B: Advancing Health and Healthcare Through the Application of Data-Centric Clinical Informatics | Area C: Supporting the Research Enterprise with Effective Research Data Management |
1. Accessing Open Data for use in Computational Research | | | |
A. Finding and using data repositories | √ | | √ |
B. Using APIs to retrieve data [E-Utilities] | √ | √ | √ |
C. Citing and referencing shared data [Citing Medicine] | √ | | √ |
2. Conceptualizing, Planning, and Conducting Data-Driven Research | | | |
A. Designing research according to scientifically sound and domain-relevant practices | √ | | |
B. Cleaning, standardizing and transforming data (data wrangling) | √ | √ | √ |
C. Analyzing data using various methods | √ | | |
3. Managing Research Data (see above, but also:) | | | |
A. Explaining the research lifecycle | √ | √ | |
B. Developing data management plans | √ | | |
C. Tracking data workflows | √ | | |
4. Sharing Research Data | | | |
A. Complying with public access mandates [NIH Manuscript Submission System, ClinicalTrials.gov] | √ | √ | |
B. Structuring data for and submitting data to repositories [GenBank, SRA] | √ | √ | |
5. Building and Maintaining Systems to Support Data-Driven Research and Health | | | |
A. Designing and managing databases | √ | √ | √ |
B. Conducting systems analysis | √ | | |
C. Building data models | √ | √ | |
D. Applying standard information architecture (HL7) | √ | | |
E. Creating, applying and enforcing metadata standards | √ | √ | |
F. Creating and improving search, retrieval and sorting algorithms [CHiQA, Best Match] | √ | | |
6. Working with Data Responsibly | | | |
A. Applying and championing Open Science principles and practices | √ | √ | |
B. Designing and documenting reproducible research protocols | √ | | |
C. Creating and implementing data policies and governance | √ | √ | √ |
D. Managing data privacy and security [NLM Scrubber] | √ | √ | √ |
E. Upholding and teaching ethical standards and practices for data | √ | √ | √ |
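To make the skill "Using APIs to retrieve data [E-Utilities]" more concrete, the following Python sketch shows one way a request to the NCBI E-utilities ESearch service might be constructed and its JSON response read. It is a minimal sketch under stated assumptions, not a recommended implementation: the function names are hypothetical, the sample response contains made-up IDs, and the example deliberately avoids a live network call.

```python
import json
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, db="pubmed", retmax=20):
    """Build an ESearch request URL asking for JSON output (hypothetical helper)."""
    params = {"db": db, "term": term, "retmode": "json", "retmax": retmax}
    return ESEARCH_URL + "?" + urlencode(params)


def parse_id_list(response_text):
    """Extract the list of record IDs from an ESearch JSON response."""
    data = json.loads(response_text)
    return data["esearchresult"]["idlist"]


# Illustrative response with made-up IDs, shaped like ESearch JSON output.
sample_response = '{"esearchresult": {"count": "2", "idlist": ["111", "222"]}}'

if __name__ == "__main__":
    print(build_esearch_url("data science AND libraries"))
    print(parse_id_list(sample_response))
    # To query the live service, one could fetch the URL, for example:
    #   from urllib.request import urlopen
    #   with urlopen(build_esearch_url("data science")) as resp:
    #       ids = parse_id_list(resp.read().decode("utf-8"))
```

Separating URL construction from response parsing keeps each step easy to inspect and test, which may be helpful when teaching API concepts to those new to programming.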
Notes
This list is not exhaustive, and omits many skills or competencies that librarians and information professionals may require to do their jobs effectively. The list above focuses on the skills presented in the first three broad areas identified in the OSI workshop report (Data Skills, Computational Skills, and Research and Subject Matter Knowledge), which primarily include skills that are uniquely relevant to supporting data-driven research and health, with a particular emphasis on technical areas that require a foundation of computational literacy. These are areas where OET sees the most immediate opportunities for building capacity in the library community.
Additionally, the list is not intended to be viewed as a mandatory set of prerequisites. Rather, the list should be viewed as aspirational, as a menu of possible areas of professional development and capacity-building. A well-rounded health sciences librarian might have particular expertise in one or more of these competencies, plus a basic understanding and familiarity with several others; not every librarian should expect to need expert-level abilities in every one of these skills.
Changelog
8/19/2020, v1.01: In the "Where OET and NNLM Can Build Capacity" table, "Accelerating Open Data" corrected to "Accessing Open Data."
8/4/2020, v1.0: First publication.
Last Reviewed: May 18, 2022