Grants and Funding: Extramural Programs (EP)
2016 NLM Informatics Lecture Series – Speaker Profiles
Mark Craven is a professor in the Department of Biostatistics and Medical Informatics at the University of Wisconsin, and an affiliate faculty member in the Department of Computer Sciences. He is the Director of the Center for Predictive Computational Phenotyping, one of the NIH's Centers of Excellence for Big Data Computing. He is also the Director of the NIH/NLM-funded Computation and Informatics in Biology and Medicine (CIBM) Training Program, and a member of the Institute for Clinical and Translational Research, the Carbone Cancer Center, and the Genome Center of Wisconsin. The focus of his research program is on developing and applying machine-learning methods to the problems of inferring models of, and reasoning about, networks of interactions among genes, proteins, clinical and environmental factors, and phenotypes of interest. He is also a member of the National Library of Medicine Biomedical Library and Informatics Review Committee.
Inferring Host-Pathogen Interactions from Diverse Data Sources
Insight into the mechanisms and context of host-pathogen interactions can be gained by applying computational methods to a broad range of experimental, observational, and secondary data sources. Dr. Craven will discuss his work in three studies that involve developing and applying predictive methods to characterize host-pathogen interactions. In the first study, he focuses on inferring, from genome-wide loss-of-function experiments, the host subnetworks that are involved in viral replication. Although these experiments can identify the host factors that directly or indirectly facilitate or inhibit the replication of a virus in a host cell, they do not elucidate how these genes are organized into the biological pathways that mediate host-virus interactions. His team is developing novel computational methods that use a wide array of secondary data sources, including the scientific literature, to transform the measurements from these assays into hypotheses that predict the pathways in the cell that relate implicated genes to viral replication. In the second study, he is applying machine-learning methods to understand how variation in the genome of HSV-1 influences multiple ocular disease phenotypes in a host. In the third study, he is investigating the extent to which risk for various infectious disease phenotypes can be predicted from electronic health records.
Inferring Host-Pathogen Interactions from Diverse Data Sources
University of Wisconsin-Madison
2015 NLM Informatics Lecture Series – Speaker Profiles
Joshua Denny, MD is an Associate Professor in the Departments of Biomedical Informatics and Medicine at Vanderbilt University Medical Center. A primary interest of his lab has been the development of the phenome-wide association study (PheWAS) method, applied to electronic health records (EHRs) to rapidly uncover genetic pleiotropy and highlight potential drivers of genetic associations with endophenotypes. He helps lead local and network-wide pharmacogenetics implementation efforts. He is part of the NIH-supported Electronic Medical Records and Genomics (eMERGE), Pharmacogenomics Research Network (PGRN), and Implementing Genomics in Practice (IGNITE) networks. He is a past recipient of the American Medical Informatics Association New Investigator Award, the Homer Warner Award, and the Vanderbilt Chancellor’s Award for Research. Dr. Denny remains active in clinical care and in teaching students. He is also a member of the National Library of Medicine Biomedical Library and Informatics Review Committee.
Use of Clinical Big Data to Inform Precision Medicine
Precision medicine offers the promise of improved diagnosis and more effective, patient-specific therapies. Typically, clinical research studies have been pursued by enrolling a cohort of willing participants in a town or region and obtaining information and tissue samples from them. At Vanderbilt, Dr. Denny and his team have linked phenotypic information from de-identified EHRs to a DNA repository of nearly 200,000 samples, creating a ‘virtual’ cohort. This approach allows study of the genomic basis of disease and drug response using real-world clinical data. Finding the right information in the EHR can be challenging, but the combination of billing data, laboratory data, medication exposures, and natural language processing has enabled efficient study of genomic and pharmacogenomic phenotypes. The Vanderbilt research team has put many of these pharmacogenomic discoveries into practice through clinical decision support. The EHR also enables the inverse experiment, PheWAS: starting with a genotype and discovering all the phenotypes with which it is associated. PheWAS requires a densely phenotyped population such as that found in the EHR. Dr. Denny’s research team has used PheWAS to replicate more than 300 genotype-phenotype associations, characterize pleiotropy, and discover new associations. They have also used PheWAS to identify characteristics within disease subtypes.
From GWAS to PheWAS: Scanning the EMR Phenome for Gene-Disease Associations
Atul Butte, MD, PhD is the founding Director of the newly established Institute of Computational Health Sciences at the University of California, San Francisco, and a Professor of Pediatrics. Before taking this position, he was chief of the Division of Systems Medicine and Associate Professor of Pediatrics at Stanford University and Lucile Packard Children’s Hospital, where he had been a faculty member for the previous decade. Trained in both Computer Science and Medicine at Brown University, Dr. Butte previously worked as a software engineer at Apple Inc. and Microsoft Corp., and received his PhD in Health Sciences and Technology from Harvard Medical School and MIT. He has authored nearly 200 publications, and his research has repeatedly been featured in the New York Times, the Wall Street Journal, and Wired Magazine. In 2013, Dr. Butte was recognized by the White House as an Open Science Champion of Change for promoting science through publicly available data. Dr. Butte is also a founder of several Bay Area biotech startup companies and a member of the National Library of Medicine Biomedical Library and Informatics Review Committee.
Data-Driven Precision Medicine
There is an urgent need to translate genome-era discoveries into clinical utility, but the difficulties in making bench-to-bedside translations have been well described. The nascent field of translational bioinformatics may help. Dr. Butte's lab builds and applies computational tools to convert hundreds of trillions of points of molecular, clinical, and epidemiological data, collected by researchers and clinicians worldwide over the past decade and now commonly known as “big data”, into new diagnostics, therapeutics, and insights into rare and common diseases. Dr. Butte, a bioinformatician and pediatric endocrinologist, will highlight how publicly available molecular measurements can be used to find new uses for drugs, including drug repositioning for inflammatory bowel disease, and to discover new treatable inflammatory mechanisms of disease in type 2 diabetes. He will also discuss how the next generation of biotech companies might even start in your garage.
Integrating Microarray and Proteomic Data by Ontology-based Annotation
John Pestian, PhD, MBA, is a professor of Pediatrics and Biomedical Informatics at Cincinnati Children’s Hospital Medical Center, University of Cincinnati. He joined the faculty in 2000 as the founding director of the Division of Biomedical Informatics. He has been active in translating neuropsychiatric innovations from the bench to the bedside. One innovation, Optimization and Individualization of Medication Selection and Dosing, has been used to identify optimal neuropsychiatric drugs for over 150,000 people. Dr. Pestian’s lab currently focuses on collecting and analyzing prospective multimodal data, such as words, vocal sounds, and facial expressions, for predicting repeated suicide attempts, depression states, and anxiety in adolescents.
Phenotypical Cohort Retrieval Using the Multi-Institutional Pediatric Epilepsy Decision Support (MiPeds) System
The Multi-Institutional Pediatric Epilepsy Decision Support (MiPeds) system provides point-of-care surveillance of phenotypically similar pediatric epilepsy patients using the electronic health records (EHRs) from Cincinnati Children’s Hospital Medical Center, Children’s Hospital of Philadelphia, and Children’s Hospital of Colorado. Using this near real-time cohort retrieval system, the three organizations can review similarities and differences in clinical measures such as medication side effects, seizure types, seizure frequency, quality of life, and neurological abnormalities. This talk will describe the successes and challenges of developing MiPeds. Examples will focus on aligning each organization's research and clinical needs with data standards, factors that influence centralization and decentralization, automated methods of de-identification, the usefulness of i2b2, developing collaborative measures of data quality and quality of care, and searching and visualization. Efforts to generalize this novel approach to other neuropsychiatric diseases will be described as well.
Capturing Patient-Provider Encounter through Text Speech and Dialogue Processing
Cincinnati Children's Hospital Medical Center
2014 NLM Informatics Lecture Series – Speaker Profiles
Dr. Peter Szolovits is Professor of Computer Science and Engineering in the MIT Department of Electrical Engineering and Computer Science, Professor of Health Sciences and Technology in the Harvard/MIT Division of Health Sciences and Technology, and head of the Clinical Decision-Making Group within the MIT Computer Science and Artificial Intelligence Laboratory. His research centers on the application of AI methods to problems of medical decision making, natural language processing to extract meaningful data from clinical narratives to support translational medicine, and the design of information systems for health care institutions and patients. Dr. Szolovits received his bachelor's degree in physics and his PhD in information science, both from Caltech. Dr. Szolovits was elected to the Institute of Medicine of the National Academies and is a Fellow of the American Association for Artificial Intelligence, the American College of Medical Informatics and the American Institute for Medical and Biological Engineering. He is a member of the National Library of Medicine’s Biomedical Library and Informatics Review Committee. He is the 2013 recipient of the Morris F. Collen Award of Excellence from the American College of Medical Informatics.
How to Learn in “The Learning Healthcare System”
The Institute of Medicine has argued for more than 20 years that we should view every patient interaction as an (uncontrolled) experiment and learn from its outcome. Dr. Szolovits has participated in numerous collaborative projects that try to apply this approach to data about a broad range of patients with conditions such as arthritis, cardiovascular disease, diabetes, inflammatory bowel disease, autism, and depression. In this lecture, he will review some of the methodological challenges he has encountered and the hard-won lessons he has learned. These include the careful formulation of study goals, the importance of open data, what kinds of models to build, how to extract meaning from narrative text, and how to incorporate non-traditional sources of data into a research protocol. Dr. Szolovits will also describe a largely unsuccessful effort to ease the data collection burden in health care by having computerized speech understanding systems listen to and analyze conversations between doctors and patients.
Capturing Patient-Provider Encounter through Text Speech and Dialogue Processing
Massachusetts Institute of Technology
Dr. Guergana Savova is an Associate Professor at Harvard Medical School and Boston Children’s Hospital. Her research interests are in natural language processing (NLP), especially as applied to the text generated by physicians (the clinical narrative). She has been creating gold-standard annotated resources based on computable definitions and developing methods for computable solutions. The focus of Dr. Savova's research is higher-level semantic and discourse processing of the clinical narrative. Dr. Savova's research with her collaborators has led to the creation of the clinical Text Analysis and Knowledge Extraction System (cTAKES; ctakes.apache.org). Dr. Savova is on the editorial board of the Journal of the American Medical Informatics Association (JAMIA) and is a reviewer for several journals, including the Journal of Biomedical Informatics (JBI) and the Journal of Language Resources and Evaluation (LREC), and for many conferences and workshops. She is also a member of the National Library of Medicine's Biomedical Library and Informatics Review Committee. Dr. Savova holds a PhD in Linguistics with a minor in Cognitive Science and a Master of Science in Computer Science from the University of Minnesota.
Temporal Relation Discovery from the Clinical Narrative
There is an abundance of health-related free text that can be used for a variety of immediate biomedical applications: phenotyping for genome-wide studies, clinical point of care, patient-powered applications, and biomedical research. The presentation will cover current research problems in natural language processing, such as temporal relation discovery, a research program funded by grants from the National Library of Medicine (thyme.healthnlp.org). The talk will also outline resources with computable gold-standard annotations created under several NIH-funded projects and describe several state-of-the-art system evaluations organized around them (2013 and 2014 CLEF/ShARe Shared Tasks; 2014 and 2015 SemEval Task 7 Analysis of Clinical Text; 2015 SemEval ClinicalTempEval). Applications of NLP to biomedical problems will be discussed within the framework of national networks such as Electronic Medical Records and Genomics (eMERGE), Pharmacogenomics Research Network (PGRN), Informatics for Integrating Biology and the Bedside (i2b2), and the Patient-Centered Outcomes Research Institute (PCORI).
Temporal Relation Discovery for Clinical Text
5 R01 LM010090-04
Children’s Hospital Boston
Dr. Chunhua Weng is the Florence Irving Assistant Professor of Biomedical Informatics at Columbia University, where she has been a faculty member since 2007. Before arriving at Columbia, she obtained an undergraduate degree in computer science from Nankai University, P. R. China, a master’s degree in Information and Computer Science from the University of California, Irvine, and a Ph.D. in Biomedical and Health Informatics from the University of Washington, Seattle. Dr. Weng’s current primary research interests are (1) designing and applying text knowledge engineering methods to improve the computability of clinical research designs; and (2) designing data-driven methods to increase the transparency and generalizability of clinical research. Dr. Weng serves on the National Library of Medicine Biomedical Library and Informatics Review Committee.
Bridging the Semantic Gap between Research Eligibility Criteria and Clinical Data: Methods and Issues
With the burgeoning adoption of electronic health records (EHRs), vast amounts of clinical data are increasingly available for computational reuse. It is imperative that the scientific community leverage Big Data to accelerate clinical and translational science at low cost and large scale. A critical step toward this goal is matching clinical research eligibility criteria to clinical data for cohort identification. However, this task is complicated by the semantic gap between free-text eligibility criteria and raw clinical data: each criterion can be expressed in many ways and represented by myriad clinical data points. The semantic gap is a significant, multifactorial problem because of the central role that eligibility criteria play in clinical and translational research. In a typical study, eligibility criteria undergo a complex evolution: they are perceived, defined, interpreted, implemented, and adapted by various stakeholders for a series of clinical research tasks. During the design phase, investigators choose eligibility criteria to define a study’s target population. During screening and recruitment, the criteria are used and interpreted by clinical research coordinators, query analysts, and even research volunteers themselves, each with different decision support needs. Later, the criteria are summarized in meta-analyses for developing clinical practice guidelines and, eventually, interpreted by physicians to screen patients for evidence-based care. At each step, their intended meanings can be misinterpreted, as in the game of “telephone”. In this lecture, Dr. Weng will describe ongoing efforts to bridge this semantic gap from multiple angles and the value of using computable clinical research eligibility criteria to understand clinical trial design patterns and their impact on the semantic gap.
Bridging the Semantic Gap between Research Eligibility Criteria and Clinical Data
2013 NLM Informatics Lecture Series – Speaker Profiles
Dr. Cardozo is Associate Professor of Biochemistry and Molecular Pharmacology at NYU School of Medicine (NYUSOM). An active clinician, educator and computational structural biologist specializing in drug/vaccine design and protein engineering, Dr. Cardozo has been funded both by the Bill and Melinda Gates Foundation and the NIH. He has developed the first known inhibitors of several challenging drug targets. Dr. Cardozo was awarded a "Grand Opportunities" ARRA award to develop a novel chemical biology network that can match biomarkers of complex diseases to drugs. Because of his diverse background in medicine, biology, surgery, chemistry and computer science, Dr. Cardozo was recognized with a 2008 NIH Director's New Innovator Award and was recently awarded the NIDA Avant-Garde Award for HIV/AIDS Research. He serves on the National Library of Medicine Biomedical Library and Informatics Review Committee. At NYUSOM, he serves as Graduate Advisor for the Computational Biology Program. He also currently serves on the Young and Early Career Investigator Committee for the Global HIV Enterprise. Dr. Cardozo received his MD-PhD from NYU School of Medicine.
Matching Complex Biomarkers to Drugs Using HistoReceptomic Signatures
Personalized medicine posits that individuals suffering from complex diseases exhibit unique genomic activity profiles to which drug treatments can be matched. Unfortunately, most drugs were discovered phenotypically and have unknown and complex mechanisms of action, making them difficult to match to personalized profiles. We derived a novel molecular signature for drug action by integrating a large set of drug:receptor affinities across the human proteome with receptor gene-expression data in human tissues. The resulting HistoReceptomic signatures can potentially be used to match complex diagnostic biomarkers of disease to drugs. To demonstrate the utility of the approach, we applied it to a psychiatric disease, schizophrenia, for which drug action is not well understood. Specifically, we used this approach to characterize the atypical pharmacologic action (“atypia”) of the antipsychotic drug clozapine, i.e., its beneficial effects that the typical antipsychotic drug chlorpromazine does not exhibit. Our results suggest that the antipsychotic effects common to clozapine and chlorpromazine derive most strongly from the drugs’ action on 5-HT2a and 5-HT2c receptors in the prefrontal cortex and caudate nucleus, respectively; histamine H1 receptors in the superior cervical ganglion; and muscarinic acetylcholine M3 receptors in the prefrontal cortex. In contrast, targets exclusive to clozapine are dopamine D4 receptors in the pineal gland and muscarinic acetylcholine M1 receptors in the prefrontal cortex. These results provide novel perspectives on the mechanism of action of antipsychotics as well as the atypical action of clozapine in schizophrenia. Most importantly, the HistoReceptomics approach might be used generally to match complex biomarkers of disease to drugs or drug combinations.
A Chemical Biological Network for Personalized Medicine
New York University School of Medicine
Dr. Gonzalez is an assistant professor in the Department of Biomedical Informatics at Arizona State University and data core director of one of the Alzheimer’s Disease Centers supported by the National Institute on Aging. She is a member of the NLM’s chartered scientific review committee. She leads the Discovery through Integration and Extraction of Genomic Knowledge lab, which works in the area of knowledge discovery; her research focuses on translational applications of information extraction using natural language processing techniques. Her research has contributed to the advancement of knowledge discovery methods across the biomedical spectrum.
Can Social Media Provide Reliable Signals of Adverse Drug Reactions?
Pre-market testing of drugs produces reasonably high-quality information about a drug’s efficacy as a treatment for the condition for which it was approved, but gives a very incomplete picture of its safety. Only after a drug is marketed and used more widely over longer periods of time is it possible to identify other effects, such as rare but serious adverse effects or effects that are more common in special subgroups excluded from the trials, among others. Post-marketing surveillance currently relies on voluntary reporting to the FDA by health care professionals (and, more recently, by patients themselves). Self-reported patient information captures a valuable perspective not captured by other means and has been found to be of similar quality to that provided by health professionals. However, the value of numerous, informal self-reports such as those found in social network postings has not been evaluated. Through recently awarded NIH/NLM funding, Dr. Gonzalez is deploying the infrastructure needed to explore the value of such postings as a source of “signals” of potential adverse drug reactions soon after drugs reach the market. Despite the significant challenge of processing colloquial text, her studies have shown promising results. Additional evaluation on unannotated comments revealed encouraging correlations between the adverse drug reactions found by her system and the documented reactions for those drugs. This presentation will give an overview of the methods and ongoing findings of the project, particularly as Dr. Gonzalez seeks to answer the question: can social media provide reliable signals of adverse drug reactions?
Mining Social Network Postings for Mentions of Potential Adverse Drug Reactions
Arizona State University
2012 NLM Informatics Lecture Series – Speaker Profiles
Dr. Gregory Cooper is a Professor of Biomedical Informatics and of Intelligent Systems at the University of Pittsburgh, where he has been a faculty member since 1990. Before arriving at the University of Pittsburgh, he obtained an undergraduate degree in computer science from MIT, a Ph.D. in Medical Information Sciences from Stanford University, and an M.D. from Stanford. His research theme is the application of probability theory, decision theory, Bayesian statistics, and artificial intelligence to biomedical informatics problems. His current research focuses on problems that include clinical alerting based on machine learning, causal modeling and discovery from clinical and biological data, computer-aided medical diagnosis and prediction, and the detection and characterization of disease outbreaks using clinical data. He is best known for his research on Bayesian networks, especially work on learning Bayesian networks from data. Dr. Cooper was elected a Fellow of the American College of Medical Informatics in 1991, and in 2006 he was elected a Fellow of the Association for the Advancement of Artificial Intelligence.
Machine Learning of Patient-Specific Predictive Models from Clinical Data
A patient-specific predictive model is a model constructed to be tailored to the particular history, symptoms, signs, laboratory results, and other features of the patient case at hand. Such a model can be applied to perform risk assessment, diagnosis, prognosis, and the prediction of response to therapy. In contrast, traditional population-wide models are constructed to predict well on average across all future patient cases. By taking advantage of the known features of a given patient case, the patient-specific method may learn a model that predicts better than a population-wide method. In particular, a patient-specific approach focuses the search for predictive models on those that are closely related to the current patient case, and it specializes model evaluation (scoring) to be sensitive to the features of the current case.
This talk will describe the implementation and evaluation of a particular approach to patient-specific predictive modeling. The evaluation considers two domains. One involves predicting whether a patient with community-acquired pneumonia will develop severe sepsis. The other involves predicting whether a patient with heart failure will develop serious medical complications. The results of these studies support the hypothesis that patient-specific modeling can improve the prediction of clinical outcomes.
This talk will also discuss how patient-specific methods might be applied in personalized medicine, where the predictive model for a patient is individualized based on both traditional clinical data and high-throughput molecular measurements, such as whole-genome data.
Predicting Patient Outcomes from Clinical and Genome-Wide Data
1 R01 LM010020-01
University of Pittsburgh at Pittsburgh
Dr. Hurdle earned his MD from the University of Colorado and his MS in Computer Science from Columbia University in 1981. After working in healthcare informatics, including a stint as CIO for The Graduate Hospital in Philadelphia, he returned to research, completing his PhD in Computer Science at the University of Utah in 1994. He has completed two informatics fellowships: a postdoctoral fellowship in the Utah/VA postdoctoral program (1996-97) and, in 2007, a term as a Senior Fellow at the National Library of Medicine. Dr. Hurdle has a broad interest in clinical research and public health informatics. His current research interests include building tools to unlock the content of clinical narratives using natural language processing, finding high-performance computing solutions to clinical research informatics challenges, and exploring novel ways to use informatics to address regulatory and bioethical concerns. His research also includes a historical interest in health-services research and a developing interest in nutritional data mining to improve individual and population diet-related outcomes. Dr. Hurdle is an appointed member of the NLM Biomedical Library and Informatics Review Committee. He has also served as chair of the American Medical Informatics Association's Ethics Committee when it created AMIA's first code of professional conduct.
Nutritional Informatics: Integrating Real-Time Dietary Patterns into the Electronic Health Record
Improving the dietary health of the nation has been a long-standing goal of healthcare researchers and practitioners, as well as of the federal government. Efforts such as the National Health and Nutrition Examination Survey (NHANES) are important epidemiological tools in the battle against weight-related morbidity and mortality. We propose to bring informatics technology to bear on this problem as a personalized medicine intervention. Our preliminary data indicate that data mining can extract a variety of dietary patterns from family food-item sales data. In collaboration with researchers at the USDA, we are exploring ways to map these dietary patterns to standard dietary metrics, such as the Healthy Eating Index (HEI). The work Dr. Hurdle will discuss opens a new research direction: finding ways to integrate these real-time dietary data into the EHR in a clinically meaningful way. Because these metrics are collected automatically at the point of purchase from grocery sales transactions, they are virtually free of reporting bias and impose no respondent burden on patients. We see this very much as personalized medicine. By linking dietary pattern metrics to the EHR, dietary trends could become as amenable to monitoring and counseling in the clinical setting as other common biomarker measures, such as lipid panels.
Hurdle, John F.
POET-2: High-performance Computing for Advanced Clinical Narrative Preprocessing
1 R01 LM010981-01A1
University of Utah
Dr. Wagner is an Associate Professor of Biomedical Informatics and Intelligent Systems at the University of Pittsburgh. He directs the Real-time Outbreak and Disease Surveillance (RODS) laboratory.
Dr. Wagner’s research focuses on real-time methods for detecting and characterizing disease outbreaks, including the development and testing of operational biosurveillance systems. In his role as director of the RODS Laboratory, Dr. Wagner led the development and implementation of two widely used biosurveillance systems: the RODS system and the National Retail Data Monitor (NRDM). Currently, Dr. Wagner is developing a third system called BioEcon, a decision analytic tool for use by analysts working in health departments.
After completing his education (BS in biology, SUNY at Stony Brook; MD, NYU School of Medicine), Dr. Wagner practiced internal medicine from 1979 to 1988 at Baltimore City Hospital, Bellevue Hospital, and with the Hawaii Permanente Medical Group. He then moved to Pittsburgh where he received additional formal training in artificial intelligence (PhD, Intelligent Systems, University of Pittsburgh) and joined the Pitt faculty in 1991. He also practiced geriatric medicine until 2002.
A Decision-Theoretic Model of Disease Surveillance and Control and a Prototype Implementation for Influenza
This talk will first describe a decision-theoretic model of disease surveillance and control, followed by a description of a prototype system for influenza monitoring based on the model. The decision-theoretic model connects disparate work in epidemiological modeling and disease control under a uniform mathematical formulation. The last part of the talk will focus on an ontology for population disease models and an infrastructure called the Apollo Web Service that allows end-user applications and epidemic models to interoperate. The expectation is that the theoretical model, the prototype, and the interoperability infrastructure will stimulate new avenues of research in disease surveillance/control and epidemic modeling.
Wagner, Michael M.
Decision Making in Biosurveillance
5 R01 LM009132-04
University of Pittsburgh at Pittsburgh
The slides for this presentation are available upon request by contacting Ms. Ebony Hughes at Ebony.Hughes@nih.gov.