Research highlighted from Dr. Brennan's Swearing-in Ceremony

NLM-Funded Research Projects Highlighted by Dr. Patricia Flatley Brennan at her Swearing-in Ceremony.

Dr. Patti Brennan was sworn in as NLM Director on September 12, 2016. During her presentation, Dr. Brennan highlighted several NLM-funded research projects. Two research projects were from NLM Biomedical Informatics trainees, Emily Mallory (Stanford University) and Justin Mower (Baylor College of Medicine/Rice University), and two were from our R01 grantees, Dr. Matthew Scotch (Arizona State University) and Dr. Elizabeth Chen (Brown University).

Link to NIH Videocast of Dr. Brennan’s Swearing-in Ceremony: https://videocast.nih.gov/summary.asp?Live=19671&bhcp=1

Abstract of Emily Mallory’s Presentation at 2016 NLM Informatics Training Conference:

Constructing a Biomedical Relationship Database from Literature using DeepDive

Authors: Emily K Mallory, Ce Zhang, Christopher Re, Russ B Altman, Stanford University

Abstract: A complete repository of biomedical relationships is key for understanding cellular processes, human disease and drug response. After decades of experimental research, the majority of the discovered biomedical relationships exist solely in textual form in the literature. While curated databases have experts manually annotate relevant relationships or interactions from text, these databases struggle to keep up with the exponential growth of the literature. DeepDive is a trained system for extracting information from a variety of sources, including text. In this work, we developed multiple entity and relationship application tasks to extract biomedical relationships from full text articles. Each relationship extractor identified candidate relations using co-occurring entities within an input sentence. Using a set of generic feature patterns, DeepDive computed a probability that an individual candidate relation was a true relationship based on the sentence. For extracting gene-gene relationships, our system achieved 76% precision and 49% recall in extracting direct and indirect interactions. For randomly curated extractions, the system achieved between 62% and 83% precision based on direct or indirect interactions, as well as sentence-level and document-level precision. In addition, we developed extractors for gene-disease and gene-drug relationships. This work represents the first application of DeepDive to the biomedical domain.
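The candidate-generation step described in the abstract (pairing entities that co-occur within one input sentence) can be illustrated with a minimal Python sketch. This is not DeepDive code and not the authors' implementation: the toy lexicon `GENE_LEXICON`, the whitespace tokenizer, and the function names are all hypothetical stand-ins, and the later step of assigning each candidate a probability is left out.

```python
from itertools import combinations

# Hypothetical toy lexicon; a real system would use a gene-name tagger.
GENE_LEXICON = {"BRCA1", "TP53", "MDM2"}


def find_gene_mentions(sentence):
    """Return (token_index, gene) pairs for lexicon genes found in the sentence."""
    # Naive tokenization: strip commas and periods, then split on whitespace.
    tokens = sentence.replace(",", " ").replace(".", " ").split()
    return [(i, tok) for i, tok in enumerate(tokens) if tok in GENE_LEXICON]


def candidate_relations(sentence):
    """Pair every two distinct gene mentions that co-occur in one sentence."""
    mentions = find_gene_mentions(sentence)
    return [
        (a, b)
        for (_, a), (_, b) in combinations(mentions, 2)
        if a != b  # skip a gene paired with a repeat mention of itself
    ]


if __name__ == "__main__":
    print(candidate_relations("MDM2 binds TP53 and inhibits it."))
```

Each pair returned here is only a candidate. In the work described above, a separate probabilistic step decides whether the sentence actually supports a relationship between the two entities.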

This research was supported by NLM training grant:

Altman, Russ B
Biomedical Informatics Training at Stanford
4 T15 LM007033-33
Stanford University

Abstract of Justin Mower’s Presentation at 2016 NLM Informatics Training Conference:

Classification of Literature Derived Drug Side Effect Relationships

Authors: Justin Mower¹,², Devika Subramanian³, Trevor Cohen¹,²

¹Baylor College of Medicine, Houston, TX, ²University of Texas Health Science Center at Houston, Houston, TX, ³Rice University, Houston, TX

Abstract: Adverse drug events (ADEs) are one of the leading causes of preventable patient morbidity and mortality. An important aspect of post-marketing drug surveillance involves identifying potential side-effects utilizing ADE reporting systems and/or Electronic Health Records. Due to the inherent noise of these data, identified drug/ADE associations must be manually reviewed by domain experts – a human-intensive process that scales poorly with large numbers of possibly dangerous associations and rapid growth of biomedical literature.

Consequently, recent work has employed scalable Literature Based Discovery methods, which exploit implicit relationships between biomedical entities within the literature to assist in identifying plausible drug/ADE connections. We extend this work by evaluating machine learning classifiers applied to high-dimensional vector representations of relationships extracted from the literature by the SemRep Natural Language Processing system, as a means to identify true drug/ADE connections. Evaluating against a manually curated reference standard, we show that applying a classifier to such representations improves performance over previous approaches. These trained systems are able to reproduce outcomes of the extensive manual literature review process used to create the reference standard, paving the way for assisted, automated review as an integral component of the pharmacovigilance process.
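As a rough illustration of applying a classifier to vector representations of drug/ADE pairs, the following Python sketch trains a small logistic regression by gradient descent. The abstract does not say which classifiers were evaluated, so logistic regression is an assumed stand-in. The three-dimensional vectors are made-up toy values, not SemRep-derived representations, and all function names are hypothetical.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(vectors, labels, epochs=200, lr=0.5):
    """Fit weights and a bias with plain batch gradient descent."""
    dim = len(vectors[0])
    weights, bias = [0.0] * dim, 0.0
    n = len(vectors)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(vectors, labels):
            # Prediction error for this example under the current parameters.
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Return the estimated probability that x is a true drug/ADE pair."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Toy training set (made up for illustration): label 1 = true association.
TOY_VECTORS = [[1.0, 0.9, 0.1], [0.8, 1.0, 0.0], [0.1, 0.0, 0.9], [0.0, 0.2, 1.0]]
TOY_LABELS = [1, 1, 0, 0]

if __name__ == "__main__":
    w, b = train_logistic(TOY_VECTORS, TOY_LABELS)
    print(predict(w, b, [0.9, 0.9, 0.1]) > 0.5)
```

In the setting the abstract describes, the labels would come from the manually curated reference standard, and performance would be measured on pairs held out from training.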

This research was supported by NLM training grant:

Kavraki, Lydia E
NLM Training Program in Biomedical Informatics for Predoc & Postdoctoral Fellows
4 T15 LM007093-25
Rice University

Abstract of Dr. Matthew Scotch’s R01 Research Project:

Merging Viral Genetics with Climate and Population Data for Zoonotic Surveillance

Recent events such as pandemic influenza A (H1N1) pdm09 have demonstrated how mutations in a viral genome can greatly impact disease spread and population health risk. Thus, there is now a greater need to merge viral genetics within state health agency surveillance practice. This is particularly relevant for zoonotic viruses that are transmittable between animals and humans such as influenza, rabies, and West Nile Virus. As an added complexity, there are many potential drivers of virus transmission that need to be considered including climate, population and travel, and ultimately, genetic polymorphisms in the virus itself. Zoonotic disease surveillance at the state level is most often performed using data that originates from passive case reporting by laboratories or clinicians rather than secondary data from resources such as GenBank. While these data are sufficient for federal reporting purposes and basic trend analysis, they only measure the number of suspected or confirmed cases and not the genetic characteristics of the virus. When states and federal agencies do use genotyping, it is often limited to certain pathogens (mostly bacteria) and only for samples that are reported through passive surveillance or during outbreak investigations. The omission of secondary viral genetic data limits the types of analysis by state health agencies. For example, current reportable disease data do not enable epidemiologists to determine the origin of a particular viral strain, trace how it has spread, or identify climate, population, and genetic factors enabling it to propagate. In this study, we will develop and evaluate an integrated bioinformatics framework to supplement current zoonotic disease surveillance approaches at state health agencies. We hypothesize that a framework that properly merges viral genetic data with climate, population, and travel data can accurately predict the timing of initial peaks of seasonal epidemics caused by zoonotic viruses.
Health agencies can then use these trends to prioritize control measures and reduce morbidity and mortality. In addition, we will address the barriers to health agency utilization of bioinformatics resources and secondary data by developing an online portal for accessing and querying of complex viral genetic models. We will measure the perceived usefulness of information from our framework as part of our long-term goal of utilization and adoption by health agencies. In Aim 1, we will develop an automated bioinformatics system that models virus diffusion while testing the significance of climate, population, and genetic predictors. As part of this effort, we will provide a publicly available Web portal for health agencies and other users to access our results and run their own models. In Aim 2, we will use our platform to identify significant climate, population, and genetic predictors of diffusion across different zoonotic viruses including influenza and WNV. In Aim 3, we will evaluate the accuracy of a bioinformatics system that uses statistically significant climate, population, and genetic predictors to identify seasonal trends of zoonotic virus epidemics and communicate these findings to different health agencies.

Scotch, Matthew
Merging Viral Genetics with Climate and Population Data for Zoonotic Surveillance
5 R01 LM012080-02
Arizona State University-Tempe Campus

Abstract of Dr. Elizabeth Chen’s R01 project:

Leveraging the EHR to Collect and Analyze Social, Behavioral & Familial Factors

The importance of understanding interactions among social, behavioral, environmental, and genetic factors and their relationship to health has led to greater interest in studying these determinants of disease in the biomedical research community. While some knowledge exists regarding contributions of specific determinants such as socioeconomic status, educational background, tobacco and alcohol use, and genetic susceptibility to particular diseases or conditions, enhanced methods are needed to analyze and ascertain interrelationships among multiple determinants and to discover potentially unexpected relationships that may ultimately contribute to improving patient care and population health. The increased adoption of electronic health record (EHR) systems has the potential for enhanced collection and access to a wide range of information about an individual's lifetime health status and health care to support a range of "secondary uses" such as biomedical, behavioral and social science, and public health research. Traditionally, clinicians document an individual's health history in clinical notes, including social and behavioral factors within the "social history" section and familial factors in the "family history" section. While some EHR systems have specific modules for collecting social and family history in structured or semi-structured formats, a large amount of this information is recorded primarily in narrative format, thus necessitating automated methods to facilitate the extraction and integration of social, behavioral, and familial factors for subsequent uses. Once extracted, knowledge acquisition and discovery methods can be applied to both confirm known relationships relative to specific diseases or conditions as well as to potentially discover new relationships.
We hypothesize that advanced computational methods can transform social, behavioral, and familial factors from the EHR into a rich longitudinal resource for generating knowledge regarding various determinants of health including their temporal progression, severity, and relationship to health conditions. Towards this goal, the specific aims are to: (1) develop comprehensive information models and natural language processing (NLP) techniques to represent, extract, and integrate social, behavioral, and familial factors from social and family history information in the EHR, (2) adapt and extend data mining techniques to identify non-temporal and temporal relationships among these factors and diseases, and (3) evaluate and validate known and candidate new relationships for specific conditions (pediatric asthma and epilepsy). This multi-site proposal will involve a transdisciplinary team of investigators from the University of Vermont and University of Minnesota, use of EHR data from both institutions, and collaborative development and evaluation of the NLP and data mining techniques. Ultimately, this work has the potential to provide a generalizable approach for supporting and enhancing existing knowledge regarding the interactions among social, behavioral, and familial factors and diseases.

Chen, Elizabeth S
Leveraging the EHR to Collect and Analyze Social, Behavioral & Familial Factors
7 R01 LM011364-04
Brown University

Last Reviewed: November 4, 2016
