NLM Ada Lovelace Computational Health Lecture Series – Speaker Profiles
John H. Holmes, PhD, is Professor of Medical Informatics in Epidemiology at the University of Pennsylvania Perelman School of Medicine. He is Associate Director of the Penn Institute for Biomedical Informatics and Past Chair of the Graduate Group in Epidemiology and Biostatistics. Dr. Holmes has been recognized nationally and internationally for his work on developing and applying new artificial intelligence approaches to mining epidemiologic surveillance data. His research interests center on the intersection of medical informatics and clinical research, specifically evolutionary computation and machine learning approaches to knowledge discovery in clinical databases, deep electronic phenotyping, and interoperable information systems infrastructures for epidemiologic surveillance, applied across a broad array of clinical domains, including cardiology and pulmonary medicine. He has served as co-lead of the Governance Core for the SPAN project, a scalable distributed research network, and participates in the FDA Sentinel Initiative. Dr. Holmes is an elected Fellow of the American College of Medical Informatics (ACMI), the American College of Epidemiology (ACE), and the International Academy of Health Sciences Informatics (IAHSI).
Traditional methods of epidemic modeling continue to be used fruitfully for characterizing outbreaks and predicting the spread of disease in populations. However, these methods typically rely on what are known as “compartment models,” which require assumptions that are not necessarily sensitive to the ever-changing environmental, behavioral, temporospatial, and social phenomena that influence disease spread. Compartment models can be enriched by the judicious use of robust methods drawn from the field of artificial intelligence that allow us to model more accurately and more quickly the population and disease dynamics that are central to developing policies for prevention, detection, and treatment. We will explore these approaches, including some that are currently in use as well as a proposal for novel, next-generation machine learning tools for epidemiologic investigation.
This lecture will be live-streamed globally, and subsequently archived, by NIH VideoCast: https://videocast.nih.gov/watch=37909
2019 NLM Biomedical Informatics & Data Science Lectures – Speaker Profiles
Quynh Nguyen is Assistant Professor of Epidemiology and Biostatistics at the University of Maryland School of Public Health. She is a social epidemiologist focusing on contextual and economic factors as they relate to health. She has extensive experience using numerous national and international population-based health surveys to examine social and economic predictors of health, and to quantify national and international patterns in health disparities. Her current research program focuses on creating and validating neighborhood indicators constructed from nontraditional Big Data sources such as social media data and Google Street View (GSV) images.
Using Google Street View Images to Examine Links Between the Built Environment and Health
Advances in neighborhood research have been constrained by the lack of neighborhood data for many geographical areas. Dr. Nguyen will discuss the use of Google Street View (GSV) images as a source of national data on built environment features and the use of computer vision to label images for indicators of walkability, urban development, and physical disorder. She will describe how her team collects images and identifies relevant built environment features in street images. Dr. Nguyen will present preliminary analyses examining associations between built environment features and health outcomes at the census tract and county levels. GSV images represent an underutilized resource for building national data on neighborhoods and examining the influence of built environments on community health outcomes across the United States.
Neighborhood Looking Glass: 360-Degree Automated Characterization of the Built Environment for Neighborhood Effects Research
University of Maryland, College Park
Matthew Scotch is Associate Professor of Biomedical Informatics at Arizona State University (ASU). He is also Assistant Director of ASU’s Biodesign Center for Environmental Health Engineering. His work lies at the intersection of bioinformatics and public health informatics and focuses on the theory and application of genomics-informed public health surveillance of RNA viruses. Dr. Scotch has a particular interest in human and avian influenza. He has published extensively on this work, including in journals such as Molecular Biology and Evolution, PLoS Computational Biology, Viruses, Virus Evolution, and Bioinformatics. Dr. Scotch is a Council member for the International Society for Influenza and other Respiratory Virus Diseases and an Editor for Infection, Genetics and Evolution and for Scientific Reports. He is a frequent reviewer on NLM study sections, a member of AMIA since the early 2000s, and a former Chair of the AMIA Public Health Informatics Working Group.
Informatics for Genomics-informed Surveillance of RNA Viruses
Genomics-informed surveillance is now recognized as an important extension to the monitoring of rapidly evolving pathogens. Next-generation sequencing can produce large amounts of data for tracking viruses of public health importance, and biomedical informatics approaches can facilitate the translation of these data into information for public health surveillance. Thus, epidemiologists can identify new outbreaks or monitor the course of a known epidemic by leveraging pathogen sequences (and corresponding metadata) generated from the clinical specimens of sick patients. In this presentation, Dr. Scotch will discuss NLM-funded projects related to the development and evaluation of a surveillance system that uses virus sequences to study the evolution, spread, and population size of viruses across geographic areas. This includes the development of a pipeline for virus phylogeography and spread, and its use in a newly funded project on metagenomics of wastewater for outbreak detection and epidemic monitoring, including for seasonal influenza. This work aims to highlight the value of using biomedical informatics to translate viral genetic data into valuable information for surveillance of both known and novel viruses.
Merging Viral Genetics with Climate and Population Data for Zoonotic Surveillance
Bioinformatics Framework for Wastewater-based Surveillance of Infectious Diseases
Arizona State University-Tempe Campus
Noémie Elhadad is Associate Professor and co-interim Chair of the Department of Biomedical Informatics at Columbia University, affiliated with the Columbia Computer Science Department and the Data Science Institute. She received her PhD in Computer Science from Columbia University. Her research is at the intersection of machine learning, technology, and medicine. She investigates ways in which observational clinical data (e.g., electronic health records) and patient-generated data (e.g., online health community discussions, mobile health data) can enhance access to relevant information for patients, clinicians, and health researchers alike, and can impact the care and health of patients. Dr. Elhadad is a current member of NLM's Biomedical Informatics, Library and Data Sciences Review Committee.
Advancing Women's Health through Data Science and Personal Health Informatics
Endometriosis is a chronic, inflammatory, and estrogen-dependent condition with a high burden on quality of life, estimated to affect 6-10% of women of reproductive age worldwide. Despite its high prevalence, it is an enigmatic condition: there is currently no cure and no known biomarker or non-invasive diagnostic test for this multifactorial disease. In this talk, Dr. Elhadad will report on ongoing research on two inter-related questions: how to characterize and discover the different ways in which endometriosis presents in individuals, essentially phenotyping the disease, and how to support individuals with self-discovery and management of the disease given its heterogeneous presentations. She will show the current characterization of endometriosis from clinical data sources and discuss its limitations, specifically the disconnect with the day-to-day patient experience of endometriosis. She will present the design and development of a personal health informatics solution (a research app called Phendo) and the analysis of the data contributed by Phendo participants towards phenotyping endometriosis. Finally, she will discuss how these data can be leveraged further to support individuals in learning about and self-managing their condition, as well as facilitating shared decision making with their providers.
PhendoPHL: A Data-Science-Enabled Personal Health Library to Manage Endometriosis
Columbia University Health Sciences
Samantha Kleinberg is an Associate Professor of Computer Science at Stevens Institute of Technology. She received her PhD in Computer Science from New York University and was a Computing Innovation Fellow at Columbia University in the Department of Biomedical Informatics. She is the recipient of NSF CAREER and JSMF Complex Systems Scholar Awards and is a 2016 Kavli Fellow of the National Academy of Sciences. She is the author of “Causality, Probability, and Time” (Cambridge University Press, 2012) and “Why: A Guide to Finding and Using Causes” (O’Reilly Media, 2015). Dr. Kleinberg is a current member of NLM's Biomedical Informatics, Library and Data Sciences Review Committee.
From Data to Decisions: Large-Scale Causal Inference in Biomedicine
The collection of massive observational datasets has led to unprecedented opportunities for causal inference, such as using electronic health records to identify risk factors for disease. However, our ability to understand these complex datasets has not grown at the same pace as our ability to collect them. While causal inference has traditionally focused on pair-wise relationships between variables, biological systems are highly complex, and knowing when events may happen is often as important as knowing whether they will. Motivated by the analysis of intensive care unit data, this talk discusses new methods to automatically extract causal relationships from data and how these have been applied to gain new insight into stroke recovery. Finally, the speaker will discuss recent findings in cognitive science and how they can help us make better use of causal information for decision-making.
BIGDATA: Causal Inference in Large-Scale Time Series
Stevens Institute of Technology
2018 NLM Biomedical Informatics & Data Science Lectures – Speaker Profiles
Elizabeth Chen is the Founding Associate Director of the Brown Center for Biomedical Informatics (BCBI), Associate Professor of Medical Science, and Associate Professor of Health Services, Policy & Practice at Brown University. She received a BS in Computer Science from Tufts University and PhD in Biomedical Informatics from Columbia University. Within BCBI, Dr. Chen leads the Clinical Informatics Innovation and Implementation (CI3) Laboratory that is focused on leveraging EHR technology and data to improve healthcare delivery and biomedical discovery. Her research interests include clinical documentation, clinical decision support, health information needs, standards and interoperability, natural language processing, and data mining. Dr. Chen is an elected fellow of the American College of Medical Informatics and is currently a member of NLM’s Biomedical Informatics, Library and Data Sciences Review Committee.
Knowledge Discovery in Clinical and Biomedical Data: Case Studies in Pediatrics and Mental Health
With the widespread adoption of electronic health records and increasing discoveries reported in biomedical literature, computational approaches are needed for further knowledge discovery and hypothesis generation. Challenges include the capture of key information within text and standardization issues, requiring use of natural language processing and data integration techniques. Clinical data mining and biomedical literature mining have been used in a range of contexts to discover disease knowledge such as comorbidities and patterns related to social, behavioral, and familial (SBF) factors. In this lecture, a series of case studies will be presented on representing, extracting, integrating, mining, and visualizing SBF factors and comorbidities for pediatric and mental health conditions. Collectively, these studies demonstrate use of systematic processes and development of open-source tools for transforming clinical and biomedical data into knowledge.
Leveraging the EHR to Collect and Analyze Social, Behavioral & Familial Factors
University of Vermont
John Gennari has been a professor in biomedical and health informatics at the University of Washington for over 15 years. His doctorate and background are in computer science, but he began working in biomedical informatics in the early ‘90s, starting with work on the Protégé knowledge representation and ontology development system. He has had a wide range of experiences on large, multi-institutional, multi-disciplinary projects, which led to his research focus on knowledge reuse and knowledge sharing. His expertise is in ontology development, standards, and semantic web tools. Dr. Gennari is the Graduate Program Director at UW, overseeing the PhD and research M.S. programs. He is currently on the NLM Biomedical Informatics, Library and Data Sciences Review Committee.
Semantic Annotations, Reuse, and Reproducibility
Biomodeling (or biosimulation modeling) has the potential to revolutionize patient-specific health care and precision medicine. To increase our knowledge and management of complex pathologies, biomodeling provides the ability to produce detailed, mechanistic simulations of dynamic biological processes and their participants. The development of these biomodels can be viewed as analogous to software development. To be effective and to scale to larger systems, models must include clear documentation (semantic annotations), be developed in a reproducible manner, and be designed for plug-and-play reuse so that researchers can build on the efforts of others. In this presentation, Dr. Gennari will report on his group’s efforts to standardize practices for semantic annotation and to demonstrate the value of those annotations both for semantic searching over model repositories and for model merging and model reuse tasks. Over the last several years, they have succeeded in building community-wide agreement on both the importance of semantic annotation and the format of these annotations. In addition, using their annotation and model reuse tool, they have developed several demonstration examples of model merging that leverage semantic annotation. Finally, as an important consequence of this work, he will report on the initiation of a new Center for Reproducible Biomedical Modeling.
Physiological Knowledge Integration and Recombinant Modeling via Accelerated Sema
University of Washington
2017 NLM Biomedical Informatics & Data Science Lectures – Speaker Profiles
S. Joshua Swamidass is an Assistant Professor of Laboratory and Genomic Medicine at Washington University School of Medicine (http://swami.wustl.edu). His group develops new computational methods at the intersection of biology, medicine, and chemistry. He is funded by the National Library of Medicine (NLM) to model bioactivation pathways and how those pathways change in children. Dr. Swamidass is currently on the NLM Biomedical Informatics, Library and Data Sciences Review Committee.
Translating from Chemistry to Clinic with Deep Learning
Many medicines become toxic only after bioactivation by metabolizing enzymes. Often, metabolic enzymes transform them into chemically reactive species, which subsequently conjugate to proteins and cause adverse events. For example, carbamazepine is epoxidized by P450 enzymes in the liver; the epoxide then conjugates to proteins, causing Stevens-Johnson syndrome in some patients. The most difficult drug reactions to predict, idiosyncratic adverse drug reactions, often depend on bioactivation. Our group has been using deep learning to model the metabolism of diverse chemicals and the subsequent reactivity of their metabolites. Deep learning systematically summarizes the information from thousands of publications into quantitative models of bioactivation, predicting exactly how medicines are modified by metabolic enzymes. These models give a deeper understanding of why some drugs become toxic and others do not. At the same time, deep learning can be used to understand drug toxicity as it arises in clinical data, and why some patients are affected but not others. A conversation between the basic and clinical sciences is now possible, in which patient outcomes can be understood in light of bioactivation mechanisms, and these mechanisms can explain why some patients are susceptible to drug toxicity and others are not.
Data and Tools for Modeling Metabolism and Reactivity
University of Arkansas for Medical Sciences
Computationally Modeling the Impact of Ontogeny on Drug Metabolic Fate
University of Wisconsin-Madison
Dr. James Cimino is a board-certified internist and clinical informatician, currently a Professor of Medicine and inaugural Director of the Informatics Institute at the University of Alabama-Birmingham School of Medicine. He has been carrying out clinical informatics research, building clinical information systems, teaching medical informatics and medicine, and caring for patients for over 30 years, with principal research areas in desiderata for controlled terminologies, mobile and Web-based clinical information systems for clinicians and patients, a context-aware form of clinical decision support called “infobuttons”, and clinical research data repositories. Past appointments include Professor of Biomedical Informatics and Medicine at Columbia University and Chief of the Laboratory for Informatics Development at the NIH Clinical Center and the National Library of Medicine. He is co-editor of a leading textbook on Biomedical Informatics and is an Associate Editor of the Journal of Biomedical Informatics. His honors include fellowships of the American College of Physicians and the American College of Medical Informatics, the Donald A.B. Lindberg Award for Innovation in Informatics and the President’s Award from the American Medical Informatics Association, and induction into the National Academy of Medicine. Dr. Cimino is currently on the National Library of Medicine Biomedical Informatics, Library and Data Sciences Review Committee.
Transforming Electronic Health Records from Annoyances to Assistants: A Research Agenda for the Next Decade
Clinical informatics research, and before that, medical informatics research, has made great strides in developing tools to help clinicians improve clinical decision-making and patient care. Yet electronic health record (EHR) systems today show little aptitude for even simple tasks, like retrieving relevant patient information while suppressing that which is irrelevant. When artificial intelligence is brought to bear, the best that EHRs seem to do is overwhelm us with alerts that a clinician must override to take action. When the “learning health system” attempts to use data from these systems, it must rely on indirect methods, such as machine learning and natural language processing, to figure out what was actually going on with the patient. The advances that have been made to bring decision support into EHRs rely on formally represented – that is, structured and coded – data, such as problem lists, laboratory results, and medication lists. What’s missing is a formal representation of the clinical cognition of the patient’s situation: what we think is going on, what our goals are, what we are trying to do about it, and why we have chosen to do it that way. Adding such information to the EHR would enable informaticians to enhance their tools in ways that improve situational awareness, reduce information overload, make decision support systems provide more relevant knowledge to clinicians, and enable clinical researchers to draw more solid inferences from observational data. Informatics research is needed to understand what needs to be captured, determine how it should be represented, design user interfaces that minimize the effort required, and develop tools that ultimately reduce the work of clinical documentation by reducing redundant data entry, anticipating and executing work plans, and improving the quality and efficiency of patient care. Dr. Cimino will provide illustrations of the formal representation and use of clinical cognition and present a roadmap for research, development, and education toward that goal.
David Page is a Vilas Distinguished Achievement Professor at the University of Wisconsin-Madison. His primary appointment is in the Department of Biostatistics and Medical Informatics in the School of Medicine and Public Health, with an appointment in the Department of Computer Sciences, where he teaches machine learning. He received his PhD in computer science from the University of Illinois at Urbana-Champaign and became involved in biomedical applications of machine learning as a post-doc in what was then the Computing Laboratory at Oxford University. He directs the Cancer Informatics Shared Resource of the Carbone Cancer Center and is a member of the Genome Center of Wisconsin. He previously served on the NIH's BioData Management and Analysis Study Section and the scientific advisory boards for the Wisconsin Genomics Initiative and the Observational Medical Outcomes Partnership, as well as the editorial boards of Machine Learning and Data Mining and Knowledge Discovery. He currently serves on the National Library of Medicine Study Section (BLIRC) and directs the EHR project within UW-Madison's BD2K Center for Predictive Computational Phenotyping.
The widespread use of electronic health records and the many recent successes of machine learning raise at least two questions. How well can future health events of patients be predicted from EHR data, at various lengths of time in advance? And how can such predictions improve human health? This talk answers the first question via an approach called high-throughput machine learning, and it speculates about answers to the second question. In particular, this talk argues that many healthcare applications require not just accurate prediction, but accurate prediction by causally-faithful models. Causal discovery from observational data is already a major research direction in machine learning and statistics, and this talk discusses new approaches across the spectrum from when "we know all the relevant variables" to when "we know only one relevant variable" for the task at hand. If time permits, the talk will also touch on the issue of protecting patient privacy while empowering the construction of accurate predictive models.
Secure Sharing of Clinical History & Genetic Data: Empowering Predictive Personalized Medicine
University of Wisconsin-Madison
2016 NLM Biomedical Informatics & Data Science Lectures – Speaker Profiles
Professor Sean Mooney has spent his career as a researcher and group leader in biomedical informatics. He now leads research IT for UW Medicine as its first Chief Research Information Officer (CRIO), heading efforts to support and build clinical informatics platforms. He is a professor in the Department of Biomedical Informatics and Medical Education at the University of Washington. Prior to his CRIO role, he was an Associate Professor and Director of Bioinformatics at the Buck Institute for Research on Aging. As an Assistant Professor, he was appointed in Medical and Molecular Genetics at Indiana University School of Medicine and was founder and director of the Indiana University School of Medicine Bioinformatics Core. In 1997, he received his B.S. with Distinction in Biochemistry and Molecular Biology from the University of Wisconsin at Madison. He received a Ph.D. in 2001 from the University of California, San Francisco, and then completed an American Cancer Society John Peter Hoffman Fellowship at Stanford University.
Interpretation of Human Genomes And Identification of Impactful Variants Using Biomedical Informatics
Whole-exome and whole-genome sequencing continue to challenge researchers with a wealth of genetic variants of unknown disease effect. We are investigating genomic and proteomic attributes that describe genetic variants in human genome sequences, and we are using those attributes to predict pathogenic variants that affect protein structure and function, mRNA processing and translation, and transcriptional regulation. To that end, we have built the MutPred suite of tools for discovering and characterizing pathogenic and pharmacogenetic variants from whole-genome sequencing. We are applying these tools in collaboration with genetic studies to better understand the causes of human disease, which I will illustrate with examples of both complex and monogenic diseases. Further, we are leveraging the crowd by organizing and participating in community challenges (critical assessments) to build a better understanding of the types of approaches that perform well in genome interpretation, and in what context. I will discuss our involvement in two critical assessment communities, the Critical Assessment of Genome Interpretation and the Critical Assessment of Functional Annotation.
Informatic Profiling of Clinically Relevant Mutation
University of Washington
Kellie Archer is a Professor in the Department of Biostatistics and Director of the Massey Cancer Center Biostatistics Shared Resource at Virginia Commonwealth University. She completed her PhD at The Ohio State University and previously worked there in support of research associated with the Cancer and Leukemia Group B (CALGB) Leukemia Correlative Sciences Committee. She now works primarily on developing innovative statistical methods and software for the analysis of high-dimensional datasets such as those arising from high-throughput genomic platforms. Dr. Archer has authored or co-authored 107 published papers and two book chapters and has given over 30 university seminars and professional conference presentations. She holds an editorial appointment at Progress in Transplantation and is a Statistical Consultant for Radiology and the Nature Publishing Group. She serves as a member of the National Library of Medicine Biomedical Library and Informatics Review Committee.
Predicting an Ordinal Response Using Features from High-Throughput Genomic Assays
Ordinal scales are commonly used to measure health status and disease-related outcomes. An ordinal outcome takes on one of several categorical levels where there is a clear ordering of the categorical levels but no intrinsic numerical relationship between them. As an example, economic status is often recorded as an ordinal outcome taking on three categorical levels of low, medium, and high income. Notable examples in medicine include stage of cancer, grading the severity of an adverse event, and response of target lesions to chemotherapy. In addition, repeated measurements are common in clinical practice for tracking and monitoring the progression of complex diseases. Classical likelihood-based ordinal modeling methods have contributed to the analysis of data in which the response categories are ordered and the number of predictor variables is smaller than the sample size. With genomic technologies increasingly applied to identify molecular markers associated with complex disease phenotypes and outcomes, many research studies now include high-dimensional feature data where the number of predictor variables greatly exceeds the sample size, so that traditional methods cannot be applied. To fill this void we have developed penalized ordinal response models for classifying and predicting an ordinal response. Additionally, we adapted our method to the longitudinal setting to enable modeling disease progression over time. We demonstrate our methods using data from two different studies that used high-throughput genomic platforms, the Illumina GoldenGate Methylation BeadArray and Affymetrix gene expression profiles.
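The core idea of penalized ordinal classification with more predictors than samples can be illustrated in a few lines. The sketch below is a toy example only, not Dr. Archer's actual software or data: it decomposes a three-level ordinal response into binary "exceeds threshold k" problems and fits each with L1-penalized logistic regression in scikit-learn (all variable names and the simulated data are invented for illustration).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Simulated high-dimensional data: p >> n, as in genomic studies.
rng = np.random.default_rng(0)
n, p = 60, 200
X = rng.normal(size=(n, p))
# Ordinal response (0 < 1 < 2) driven by only the first 3 features,
# split at the tertiles of the latent score.
score = X[:, :3].sum(axis=1)
y = np.digitize(score, np.quantile(score, [1 / 3, 2 / 3]))

# One L1-penalized binary fit per cumulative threshold P(y > k);
# the penalty performs feature selection despite p >> n.
models = [
    LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(
        X, (y > k).astype(int)
    )
    for k in range(2)
]

def predict_ordinal(X_new):
    # Predicted class = number of thresholds the sample exceeds.
    exceed = np.column_stack(
        [m.predict_proba(X_new)[:, 1] > 0.5 for m in models]
    )
    return exceed.sum(axis=1)

preds = predict_ordinal(X)
print((preds == y).mean())  # in-sample accuracy of the sketch
```

This threshold decomposition is one simple way to respect the ordering of the response; likelihood-based penalized cumulative-logit models, as in the work described above, fit all thresholds jointly rather than as independent binary problems.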
Informatic tools for Predicting an Ordinal Response for High-Dimensional Data
Virginia Commonwealth University
Mark Craven is a professor in the Department of Biostatistics and Medical Informatics at the University of Wisconsin, and an affiliate faculty member in the Department of Computer Sciences. He is the Director of the Center for Predictive Computational Phenotyping, one of the NIH's Centers of Excellence for Big Data Computing. He is also the Director of the NIH/NLM-funded Computation and Informatics in Biology and Medicine (CIBM) Training Program, and a member of the Institute for Clinical and Translational Research, the Carbone Cancer Center, and the Genome Center of Wisconsin. The focus of his research program is on developing and applying machine-learning methods to the problems of inferring models of, and reasoning about, networks of interactions among genes, proteins, clinical and environmental factors, and phenotypes of interest. He is also a member of the National Library of Medicine Biomedical Library and Informatics Review Committee.
Inferring Host-Pathogen Interactions from Diverse Data Sources
Insight into the mechanisms and context of host-pathogen interactions can be gained by applying computational methods to a broad range of experimental, observational, and secondary data sources. Dr. Craven will discuss his work in three studies that involve developing and applying predictive methods in order to characterize host-pathogen interactions. In the first study, he is focused on inferring host subnetworks that are involved in viral replication from genome-wide loss-of-function experiments. Although these experiments can identify the host factors that directly or indirectly facilitate or inhibit the replication of a virus in a host cell, they do not elucidate how these genes are organized into the biological pathways that mediate host-virus interactions. His team is developing novel computational methods that use a wide array of secondary data sources, including the scientific literature, to transform the measurements from these assays into hypotheses that predict the pathways in the cell that relate implicated genes to viral replication. In the second study, he is applying machine-learning methods to understand how variation in the genome of HSV-1 influences multiple ocular disease phenotypes in a host. In the third study, he is investigating the extent to which risk for various infectious disease phenotypes can be predicted from electronic health records.
Inferring Host-Pathogen Interactions from Diverse Data Sources
University of Wisconsin-Madison
2015 NLM Biomedical Informatics & Data Science Lectures – Speaker Profiles
Joshua Denny, MD is an Associate Professor in the Departments of Biomedical Informatics and Medicine at Vanderbilt University Medical Center. A primary interest of his lab has been development of the phenome-wide association study (PheWAS) method applied to electronic health records (EHRs) to rapidly uncover genetic pleiotropy and highlight potential drivers of genetic associations with endophenotypes. He helps lead efforts for local and network pharmacogenetics implementation activities. He is part of the NIH-supported Electronic Medical Records and Genomics (eMERGE) network, Pharmacogenomics Research Network (PGRN), and Implementing Genomics in Practice (IGNITE) networks. He is past recipient of the American Medical Informatics Association New Investigator Award, Homer Warner Award, and Vanderbilt Chancellor’s Award for Research. Dr. Denny remains active in clinical care and in teaching students. He is also a member of the National Library of Medicine Biomedical Library and Informatics Review Committee.
Use of Clinical Big Data to Inform Precision Medicine
Precision medicine offers the promise of improved diagnosis and more effective, patient-specific therapies. Typically, clinical research studies have been pursued by enrolling a cohort of willing participants in a town or region and obtaining information and tissue samples from them. At Vanderbilt, Dr. Denny and his team have linked phenotypic information from de-identified EHRs to a DNA repository of nearly 200,000 samples, creating a ‘virtual’ cohort. This approach allows study of the genomic basis of disease and drug response using real-world clinical data. Finding the right information in the EHR can be challenging, but the combination of billing data, laboratory data, medication exposures, and natural language processing has enabled efficient study of genomic and pharmacogenomic phenotypes. The Vanderbilt research team has put many of these discovered pharmacogenomic characteristics into practice through clinical decision support. The EHR also enables the inverse experiment, starting with a genotype and discovering all the phenotypes with which it is associated: PheWAS. PheWAS requires a densely phenotyped population such as that found in the EHR. Dr. Denny’s research team has used PheWAS to replicate more than 300 genotype-phenotype associations, characterize pleiotropy, and discover new associations. They have also used PheWAS to identify characteristics within disease subtypes.
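As a rough illustration of the PheWAS idea described above, the sketch below scans a set of binary phenotype codes for association with a single genetic variant using a 2x2 Fisher's exact test. The cohort, variant, and phenotype names are simulated and purely illustrative; real PheWAS implementations use curated phenotype code definitions, covariate adjustment, and multiple-testing correction.

```python
import numpy as np
from scipy.stats import fisher_exact

def phewas_scan(genotype, phenotypes):
    """Toy PheWAS: test one binary genotype (carrier yes/no) against
    every binary phenotype code with a 2x2 Fisher's exact test."""
    results = {}
    for code, cases in phenotypes.items():
        table = [
            [np.sum(genotype & cases), np.sum(genotype & ~cases)],
            [np.sum(~genotype & cases), np.sum(~genotype & ~cases)],
        ]
        odds_ratio, p = fisher_exact(table)
        results[code] = (odds_ratio, p)
    return results

# Simulated cohort: one phenotype is enriched among carriers, one is not
rng = np.random.default_rng(0)
carrier = rng.random(2000) < 0.3
phenotypes = {
    "hypothyroidism": rng.random(2000) < np.where(carrier, 0.30, 0.05),
    "fracture": rng.random(2000) < 0.10,
}
hits = phewas_scan(carrier, phenotypes)
for code, (or_, p) in sorted(hits.items(), key=lambda kv: kv[1][1]):
    print(f"{code}: OR={or_:.2f}, p={p:.2e}")
```

Sorting by p-value surfaces the associated phenotype first, which is the essence of scanning the phenome from a fixed genotype.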
From GWAS to PheWAS: Scanning the EMR Phenome for Gene-Disease Associations
Atul Butte, MD, PhD, is the founding Director of the newly established Institute of Computational Health Sciences at the University of California, San Francisco, and a Professor of Pediatrics. Prior to his new position, he was chief of the Division of Systems Medicine and Associate Professor of Pediatrics at Stanford University and Lucile Packard Children’s Hospital, where he had been a faculty member for a decade. Trained in both Computer Science and Medicine at Brown University, Dr. Butte previously worked as a software engineer at Apple Inc. and Microsoft Corp., and received his PhD in Health Sciences and Technology from Harvard Medical School and MIT. He has authored nearly 200 publications, with research repeatedly featured in the New York Times, the Wall Street Journal, and Wired Magazine. In 2013, Dr. Butte was recognized by the White House as an Open Science Champion of Change for promoting science through publicly available data. Dr. Butte is also a founder of several Bay Area biotech startup companies. He is also a member of the National Library of Medicine Biomedical Library and Informatics Review Committee.
Data-Driven Precision Medicine
There is an urgent need to translate genome-era discoveries into clinical utility, but the difficulties in making bench-to-bedside translations have been well described. The nascent field of translational bioinformatics may help. Dr. Butte's lab builds and applies computational tools to convert hundreds of trillions of points of molecular, clinical, and epidemiological data collected by researchers and clinicians worldwide over the past decade, now commonly known as “big data”, into new diagnostics, therapeutics, and insights into rare and common diseases. Dr. Butte, a bioinformatician and pediatric endocrinologist, will highlight how publicly available molecular measurements can be used to find new uses for drugs, including drug repositioning for inflammatory bowel disease and the discovery of new treatable inflammatory mechanisms of disease in type 2 diabetes, and how the next generation of biotech companies might even start in your garage.
Integrating Microarray and Proteomic Data by Ontology-based Annotation
John Pestian, PhD, MBA, is a Professor of Pediatrics and Biomedical Informatics at Cincinnati Children’s Hospital Medical Center and the University of Cincinnati. He joined the faculty in 2000 as the founding director of the Division of Biomedical Informatics. He has been active in translating neuropsychiatric innovations from the bench to the bedside. One innovation, Optimization and Individualization of Medication Selection and Dosing, has been used to identify optimal neuropsychiatric drugs in over 150,000 people. Dr. Pestian’s lab currently focuses on the collection and analysis of prospective multimodal data, such as words, vocal sounds, and facial expressions, for predicting repeated suicide attempts, depression states, and anxiety in adolescents.
Phenotypical Cohort Retrieval Using the Multi-Institutional Pediatric Epilepsy Decision Support (MiPeds) System
The Multi-Institutional Pediatric Epilepsy Decision Support (MiPeds) system provides point-of-care surveillance of phenotypically similar pediatric epilepsy patients using the electronic health records (EHRs) of Cincinnati Children’s Hospital Medical Center, Children’s Hospital of Philadelphia, and Children’s Hospital of Colorado. Using this near real-time cohort retrieval system, the three organizations can review similarities and differences in clinical measures such as medication side effects, seizure types, seizure frequency, quality of life, and neurological abnormalities. This talk will describe the successes and challenges of developing MiPeds. Examples will focus on aligning the research and clinical needs of each organization with data standards, factors that influence centralization and decentralization, automated methods of de-identification, the usefulness of i2b2, developing collaborative measures of data quality and quality of care, and searching and visualization. Efforts to generalize this novel approach to other neuropsychiatric diseases will be described as well.
NLM-Funded Research
Capturing Patient-Provider Encounter through Text Speech and Dialogue Processing
Cincinnati Children's Hospital Medical Center
2014 NLM Biomedical Informatics & Data Science Lectures – Speaker Profiles
Dr. Peter Szolovits is Professor of Computer Science and Engineering in the MIT Department of Electrical Engineering and Computer Science, Professor of Health Sciences and Technology in the Harvard/MIT Division of Health Sciences and Technology, and head of the Clinical Decision-Making Group within the MIT Computer Science and Artificial Intelligence Laboratory. His research centers on the application of AI methods to problems of medical decision making, natural language processing to extract meaningful data from clinical narratives to support translational medicine, and the design of information systems for health care institutions and patients. Dr. Szolovits received his bachelor's degree in physics and his PhD in information science, both from Caltech. Dr. Szolovits was elected to the Institute of Medicine of the National Academies and is a Fellow of the American Association for Artificial Intelligence, the American College of Medical Informatics and the American Institute for Medical and Biological Engineering. He is a member of the National Library of Medicine’s Biomedical Library and Informatics Review Committee. He is the 2013 recipient of the Morris F. Collen Award of Excellence from the American College of Medical Informatics.
How to Learn in “The Learning Healthcare System”
The Institute of Medicine has argued for more than 20 years that we should view every patient interaction as an (uncontrolled) experiment, and learn from its outcome. Dr. Szolovits has been a participant in numerous collaborative projects trying to apply this method to data about a broad range of patients suffering from conditions such as arthritis, cardiovascular disease, diabetes, inflammatory bowel disease, autism, and depression. In this lecture, he will review some of the methodological challenges he has encountered and the hard-won lessons he has learned. These include the careful formulation of study goals, the importance of open data, what kinds of models to build, how to extract meaning from narrative text, and how to incorporate non-traditional sources of data into a research protocol. Dr. Szolovits will also describe a largely unsuccessful effort to ease the data collection burden in health care by having computerized speech understanding systems listen to and analyze conversations between doctors and patients.
Capturing Patient-Provider Encounter through Text Speech and Dialogue Processing
Massachusetts Institute of Technology
Dr. Guergana Savova is an Associate Professor at Harvard Medical School and Boston Children’s Hospital. Her research interests are in natural language processing (NLP), especially as applied to the text generated by physicians (the clinical narrative). She has been creating gold-standard annotated resources based on computable definitions and developing methods for computable solutions. The focus of Dr. Savova's research is higher-level semantic and discourse processing of the clinical narrative. Her research with her collaborators has led to the creation of the clinical Text Analysis and Knowledge Extraction System (cTAKES; ctakes.apache.org). Dr. Savova is on the editorial board of the Journal of the American Medical Informatics Association (JAMIA) and is a reviewer for several journals, including the Journal of Biomedical Informatics (JBI) and Language Resources and Evaluation (LREC), and for many conferences/workshops. She is also a member of the National Library of Medicine's Biomedical Library and Informatics Review Committee. Dr. Savova holds a PhD in Linguistics with a minor in Cognitive Science and a Master of Science in Computer Science from the University of Minnesota.
Temporal Relation Discovery from the Clinical Narrative
There is an abundance of health-related free text that can be used for a variety of immediate biomedical applications: phenotyping for genome-wide association studies, clinical point of care, patient-powered applications, and biomedical research. The presentation will cover current research problems in natural language processing such as temporal relation discovery, a research program funded by grants from the National Library of Medicine (thyme.healthnlp.org). The talk will also outline resources with computable gold-standard annotations created under several NIH-funded projects, and describe several state-of-the-art system evaluations organized around them (the 2013 and 2014 CLEF/ShARe shared tasks; the 2014 and 2015 SemEval Task 7, Analysis of Clinical Text; and the 2015 SemEval Clinical TempEval). Applications of NLP to biomedical problems will be discussed within the framework of national networks such as electronic Medical Records and Genomics (eMERGE), the Pharmacogenomics Research Network (PGRN), Informatics for Integrating Biology and the Bedside (i2b2), and the Patient-Centered Outcomes Research Institute (PCORI).
Temporal Relation Discovery for Clinical Text
5 R01 LM010090-04
Children’s Hospital Boston
Dr. Chunhua Weng is the Florence Irving Assistant Professor of Biomedical Informatics at Columbia University, where she has been a faculty member since 2007. Before arriving at Columbia, she obtained an undergraduate degree in computer science from Nankai University, P. R. China, a master’s degree in Information and Computer Science from the University of California, Irvine, and a Ph.D. in Biomedical and Health Informatics from the University of Washington in Seattle. Dr. Weng’s current primary research interests are (1) designing and applying text knowledge engineering methods to improve the computability of clinical research designs; and (2) designing data-driven methods to increase the transparency and generalizability of clinical research. Dr. Weng serves on the National Library of Medicine Biomedical Library and Informatics Review Committee.
Bridging the Semantic Gap between Research Eligibility Criteria and Clinical Data: Methods and Issues
With the burgeoning adoption of electronic health records (EHRs), vast amounts of clinical data are increasingly available for computational reuse. It is imperative that the scientific community leverage Big Data to accelerate clinical and translational science at low cost and large scale. A critical step toward this goal is matching clinical research eligibility criteria to clinical data for cohort identification. However, this task is complicated by the semantic gap between free-text eligibility criteria and raw clinical data: each criterion can be described in many ways and represented by a myriad of clinical data points. In fact, the semantic gap is a significant multifactorial problem because of the central role that clinical research eligibility criteria play in clinical and translational research. In a typical study, they undergo a complex evolution: they are perceived, defined, interpreted, implemented, and adapted by various stakeholders for a series of clinical research tasks. During the design phase, investigators choose eligibility criteria to define a study’s target population. During screening and recruitment, the criteria are used and interpreted by clinical research coordinators, query analysts, and even research volunteers themselves, each possessing different decision support needs. Later, the criteria are summarized in meta-analyses for developing clinical practice guidelines and, eventually, interpreted by physicians to screen patients for evidence-based care. At each step, their intended meanings can be misinterpreted, as in the game of “telephone”. In this lecture, Dr. Weng will describe ongoing efforts to bridge this semantic gap from multiple angles and the value of using computable clinical research eligibility criteria to understand clinical trial design patterns and their impact on the semantic gap.
Bridging the Semantic Gap between Research Eligibility Criteria and Clinical Data
2013 NLM Biomedical Informatics & Data Science Lectures – Speaker Profiles
Dr. Cardozo is Associate Professor of Biochemistry and Molecular Pharmacology at NYU School of Medicine (NYUSOM). An active clinician, educator and computational structural biologist specializing in drug/vaccine design and protein engineering, Dr. Cardozo has been funded both by the Bill and Melinda Gates Foundation and the NIH. He has developed the first known inhibitors of several challenging drug targets. Dr. Cardozo was awarded a "Grand Opportunities" ARRA award to develop a novel chemical biology network that can match biomarkers of complex diseases to drugs. Because of his diverse background in medicine, biology, surgery, chemistry and computer science, Dr. Cardozo was recognized with a 2008 NIH Director's New Innovator Award and was recently awarded the NIDA Avant-Garde Award for HIV/AIDS Research. He serves on the National Library of Medicine Biomedical Library and Informatics Review Committee. At NYUSOM, he serves as Graduate Advisor for the Computational Biology Program. He also currently serves on the Young and Early Career Investigator Committee for the Global HIV Enterprise. Dr. Cardozo received his MD-PhD from NYU School of Medicine.
Matching Complex Biomarkers to Drugs Using HistoReceptomic Signatures
Personalized medicine theorizes that individuals suffering from complex diseases exhibit unique genomic activity profiles to which drug treatments can be matched. Unfortunately, most drugs were discovered phenotypically and have unknown and complex mechanisms of action, making their matching to personalized profiles difficult. We derived a novel molecular signature for drug action by integrating a large set of drug:receptor affinities across the human proteome with receptor gene-expression data in human tissues. The resulting HistoReceptomic signatures can potentially be used to match diagnostic complex biomarkers of disease to drugs. To demonstrate the utility of the approach, we applied it to a psychiatric disease, schizophrenia, for which drug action is not well understood. Specifically, we used this approach to characterize the atypical pharmacologic action (“atypia”) of the antipsychotic drug clozapine, i.e., its beneficial effects that the typical antipsychotic drug chlorpromazine does not exhibit. Our results suggest that the common antipsychotic effects of clozapine and chlorpromazine derive most strongly from the drugs’ action on 5-HT2a and 5-HT2c receptors in the prefrontal cortex and caudate nucleus, respectively; histamine H1 receptors in the superior cervical ganglion; and muscarinic acetylcholine M3 receptors in the prefrontal cortex. In contrast, targets exclusive to clozapine are dopamine D4 receptors in the pineal gland and muscarinic acetylcholine M1 receptors in the prefrontal cortex. These results provide novel perspectives on the mechanism of action of antipsychotics as well as the atypical action of clozapine in schizophrenia. Most importantly, the HistoReceptomic approach might be used generally to match complex biomarkers of disease to drugs or drug combinations.
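The integration step described above, combining drug:receptor affinities with receptor expression across tissues, can be sketched as a simple elementwise tensor product. All receptor names, tissues, and numbers below are invented for illustration and are not the actual HistoReceptomic data; the real method operates over the whole proteome and many tissues.

```python
import numpy as np

# Hypothetical inputs, illustrative values only
receptors = ["5HT2A", "H1", "D4"]
tissues = ["prefrontal_cortex", "pineal_gland"]

# affinity[d, r]: how strongly drug d binds receptor r
affinity = np.array([
    [2.0, 1.5, 0.1],   # chlorpromazine-like profile (made up)
    [2.0, 1.2, 1.8],   # clozapine-like profile (made up)
])
# expression[r, t]: expression of receptor r in tissue t
expression = np.array([
    [3.0, 0.2],
    [0.5, 0.1],
    [0.4, 2.5],
])

# Score for each (drug, receptor, tissue) triple = affinity x expression;
# broadcasting expands the two matrices into one 3-D signature tensor
signature = affinity[:, :, None] * expression[None, :, :]
print(signature.shape)  # (2 drugs, 3 receptors, 2 tissues)

# Receptor-tissue pairs unique to the second drug stand out in the difference
diff = signature[1] - signature[0]
```

In this toy setup, the D4/pineal entry of `diff` dominates, mirroring how a signature contrast could flag targets exclusive to one drug.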
A Chemical Biological Network for Personalized Medicine
New York University School of Medicine
Dr. Gonzalez is an Assistant Professor in the Department of Biomedical Informatics at Arizona State University and data core director of one of the National Institute on Aging-supported Alzheimer’s Disease Centers. She is a member of the NLM’s chartered scientific review committee. She leads the Discovery through Integration and Extraction of Genomic Knowledge lab, focusing her research on translational applications of information extraction using natural language processing techniques. Her research has contributed to the advancement of knowledge discovery methods across the biomedical spectrum.
Can social media provide reliable signals of adverse drug reactions?
Pre-market testing of drugs produces reasonably high-quality information about the efficacy of the drug as a treatment for the condition for which it was approved, but gives a very incomplete picture of the drug’s safety. It is only after a drug is marketed and used on a more widespread basis over longer periods of time that it is possible to identify other effects, such as rare but serious adverse effects, or those that are more common in the special subgroups excluded from the trial, among others. Post-marketing surveillance currently relies on voluntary reporting to the FDA by health care professionals (and, recently, patients themselves). Self-reported patient information captures a valuable perspective not captured by other means, and has been found to be of similar quality to that provided by health professionals. However, the value of numerous, informal self-reports such as those found in social network postings has not been evaluated. Through recently awarded NIH/NLM funding, Dr. Gonzalez is deploying the infrastructure needed to explore the value of such postings as a source of “signals” of potential adverse drug reactions soon after the drugs hit the market. Despite the significant challenge of processing colloquial text, her studies have shown promising results. Additional evaluation on unannotated comments revealed encouraging correlations between adverse drug reactions found by her system and the documented reactions for those drugs. An overview of the methods and ongoing findings of this project will be discussed in this presentation, particularly as Dr. Gonzalez seeks to answer the question: can social media provide reliable signals of adverse drug reactions?
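To make the "signal" idea concrete, here is a deliberately tiny sketch of one common pattern in this space: lexicon-based spotting of adverse-event mentions in postings, followed by a proportional reporting ratio (PRR) as a simple disproportionality statistic. The lexicon, postings, and drug names are invented, and this is not Dr. Gonzalez's actual pipeline, which must handle colloquial, misspelled text far beyond exact token matching.

```python
from collections import Counter

# Toy ADR lexicon and invented postings, for illustration only
ADR_LEXICON = {"nausea", "headache", "rash", "insomnia"}

postings = [
    ("drugA", "started drugA last week and the nausea is awful"),
    ("drugA", "drugA gave me a rash and nausea"),
    ("drugA", "day 3 on drugA, bad nausea again"),
    ("drugB", "drugB works fine, slight headache once"),
    ("drugB", "no issues with drugB so far"),
]

def extract_mentions(text):
    """Exact-match token lookup against the lexicon (toy NLP step)."""
    return {tok.strip(".,") for tok in text.lower().split()} & ADR_LEXICON

counts = Counter()   # (drug, adr) co-mention counts
totals = Counter()   # postings per drug
for drug, text in postings:
    totals[drug] += 1
    for adr in extract_mentions(text):
        counts[(drug, adr)] += 1

def prr(drug, adr):
    """PRR: rate of the ADR in postings about this drug vs. all other drugs."""
    a = counts[(drug, adr)]
    other = sum(c for (d, e), c in counts.items() if d != drug and e == adr)
    other_total = sum(t for d, t in totals.items() if d != drug)
    rate_drug = a / totals[drug]
    rate_other = (other + 0.5) / (other_total + 0.5)   # smoothed to avoid /0
    return rate_drug / rate_other

print(prr("drugA", "nausea"))  # elevated: nausea co-occurs only with drugA
```

A PRR well above 1 flags a drug-event pair for human review; it is a screening heuristic, not proof of causation.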
Mining Social Network Postings for Mentions of Potential Adverse Drug Reactions
Arizona State University
2012 NLM Biomedical Informatics & Data Science Lectures – Speaker Profiles
Dr. Gregory Cooper is a Professor of Biomedical Informatics and of Intelligent Systems at the University of Pittsburgh, where he has been a faculty member since 1990. Prior to arriving at the University of Pittsburgh, he obtained an undergraduate degree in computer science from MIT, a Ph.D. in Medical Information Sciences from Stanford University, and an M.D. from Stanford. His research theme is the application of probability theory, decision theory, Bayesian statistics, and artificial intelligence to biomedical informatics problems. His current research is focused on problems that include clinical alerting based on machine learning, causal modeling and discovery from clinical and biological data, computer-aided medical diagnosis and prediction, and the detection and characterization of disease outbreaks using clinical data. He is best known for his research on Bayesian networks, especially work on learning Bayesian networks from data. Dr. Cooper was elected as a Fellow into the American College of Medical Informatics in 1991. In 2006 he was elected as a Fellow into the Association for the Advancement of Artificial Intelligence.
Machine Learning of Patient-Specific Predictive Models from Clinical Data
A patient-specific predictive model is a model that is constructed in a way that tailors it to the particular history, symptoms, signs, laboratory results, and other features of the patient case at hand. Such a model can be applied to perform risk assessment, diagnosis, prognosis, and the prediction of response to therapy. In contrast, traditional population-wide models are constructed to perform predictions well on average for all future patient cases. By taking advantage of the known features of a given patient case, the patient-specific method may learn a model that predicts better than a population-wide method. In particular, a patient-specific approach focuses the search for predictive models to those that are closely related to the current patient case, and it specializes model evaluation (scoring) to be sensitive to the features of the current case.
This talk will describe the implementation and evaluation of a particular approach to patient-specific predictive modeling. The evaluation considers two domains. One involves predicting whether a patient with community acquired pneumonia will develop severe sepsis. The other involves predicting whether a patient with heart failure will develop serious medical complications. The results of these studies provide support that patient-specific modeling can improve the prediction of clinical outcomes.
This talk will also discuss how patient-specific methods might be applied in personalized medicine, where the predictive model for a patient is individualized, based on the use of both traditional clinical data as well as high-throughput molecular measurements, such as whole genome data.
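The contrast between population-wide and patient-specific prediction can be sketched in a few lines. The snippet below uses a similarity-weighted (nearest-neighbor) estimate as a deliberately simple stand-in for the model-search approach described in the talk; the cohort is synthetic and the numbers are illustrative.

```python
import numpy as np

def population_predict(X_train, y_train):
    """Population-wide baseline: one prevalence estimate for everyone."""
    return y_train.mean()

def patient_specific_predict(X_train, y_train, x_new, k=25):
    """Patient-specific flavor: focus the estimate on training cases most
    similar to the index patient (a toy stand-in for specializing model
    search and scoring to the case at hand)."""
    dists = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(dists)[:k]
    return y_train[nearest].mean()

# Synthetic cohort: outcome risk depends strongly on the first feature
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
y = (rng.random(500) < 1 / (1 + np.exp(-3 * X[:, 0]))).astype(float)

high_risk_patient = np.array([2.0, 0.0, 0.0, 0.0])
pop = population_predict(X, y)                       # cohort-average risk
ps = patient_specific_predict(X, y, high_risk_patient)
print(pop, ps)
```

For a patient far from the cohort average, the case-focused estimate diverges sharply from the one-size-fits-all prevalence, which is the intuition behind tailoring models to the index case.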
Predicting Patient Outcomes from Clinical and Genome-Wide Data
1 R01 LM010020-01
University of Pittsburgh at Pittsburgh
Dr. Hurdle earned his MD from the University of Colorado and his MS in Computer Science from Columbia University in 1981. After working in healthcare informatics, including a stint as CIO for The Graduate Hospital in Philadelphia, he returned to research, completing his PhD in Computer Science at the University of Utah in 1994. He has completed two informatics fellowships: a postdoctoral fellowship in the Utah/VA postdoctoral program (1996-97), and a term as a Senior Fellow at the National Library of Medicine in 2007. Dr. Hurdle has a broad interest in the areas of clinical research and public health informatics. His current research interests include building tools to unlock the content of clinical narratives using natural language processing; finding high-performance computing solutions to clinical research informatics challenges; and exploring novel ways to use informatics to address regulatory and bioethical concerns. His research also includes a historical interest in health-services research and a developing interest in nutritional data mining to improve individual and population diet-related outcomes. Dr. Hurdle is an appointed member of the NLM Biomedical Library and Informatics Review Committee. He also served as chair of the American Medical Informatics Association's Ethics Committee when it created AMIA's first code of professional conduct.
Nutritional Informatics: Integrating real-time dietary patterns into the Electronic Health Record
Improving the dietary health of the nation has been a long-standing goal of healthcare researchers and practitioners, as well as of the federal government. Efforts such as the National Health and Nutrition Examination Survey (NHANES) are important epidemiological tools in the battle against weight-related healthcare morbidity and mortality. We propose here to bring informatics technology to bear as a personalized medicine intervention in that same effort. We have preliminary data indicating that, using data mining, we can extract a variety of dietary patterns from family food item sales data. In collaboration with researchers at the USDA, we are exploring ways to map these dietary patterns to standard dietary metrics, such as the Healthy Eating Index (HEI). The goal of the work Dr. Hurdle will discuss is a new research direction: to find ways to integrate these real-time dietary data into the EHR in a clinically meaningful way. Such metrics, because they are collected automatically at the point of purchase from grocery sales transactions, are virtually free of reporting bias and impose no respondent burden on patients. We see this very much as personalized medicine. By linking dietary pattern metrics to the EHR, dietary trends could become as amenable to monitoring and counseling in the clinical setting as other common biomarker measures such as lipid panels.
Hurdle, John F.
POET-2: High-performance Computing for Advanced Clinical Narrative Preprocessing
1 R01 LM010981-01A1
University of Utah
Dr. Wagner is an Associate Professor of Biomedical Informatics and Intelligent Systems at the University of Pittsburgh. He directs the Real-time Outbreak and Disease Surveillance (RODS) laboratory.
Dr. Wagner’s research focuses on real-time methods for detecting and characterizing disease outbreaks, including the development and testing of operational biosurveillance systems. In his role as director of the RODS Laboratory, Dr. Wagner led the development and implementation of two widely used biosurveillance systems: the RODS system and the National Retail Data Monitor (NRDM). Currently, Dr. Wagner is developing a third system called BioEcon, a decision-analytic tool for use by analysts working in health departments.
After completing his education (BS in biology, SUNY at Stony Brook; MD, NYU School of Medicine), Dr. Wagner practiced internal medicine from 1979 to 1988 at Baltimore City Hospital, Bellevue Hospital, and with the Hawaii Permanente Medical Group. He then moved to Pittsburgh where he received additional formal training in artificial intelligence (PhD, Intelligent Systems, University of Pittsburgh) and joined the Pitt faculty in 1991. He also practiced geriatric medicine until 2002.
Decision-theoretic Model of Disease Surveillance and Control and a Prototype Implementation for the Disease Influenza
This talk will first describe a decision-theoretic model of disease surveillance and control, followed by a description of a prototype system for influenza monitoring based on the model. The decision-theoretic model connects disparate work in epidemiological modeling and disease control under a uniform mathematical formulation. The last part of the talk will focus on an ontology for population disease models and an infrastructure called the Apollo Web Service that allows end-user applications and epidemic models to interoperate. The expectation is that the theoretical model, the prototype, and the interoperability infrastructure will stimulate new avenues of research in disease surveillance/control and epidemic modeling.
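The core of a decision-theoretic framing like the one described above is choosing the action that minimizes expected cost given the current outbreak probability. The toy sketch below illustrates only that skeleton; the action names, costs, and probabilities are invented and bear no relation to the actual model or the Apollo Web Service.

```python
# Minimal expected-cost decision rule for surveillance-and-control,
# with illustrative costs (e.g., "act" = recommend a control measure now,
# "wait" = continue monitoring).

def expected_cost(p_outbreak, cost_act, cost_miss):
    """cost_act: cost of intervening, incurred whether or not an outbreak
    occurs; cost_miss: extra cost of an unmitigated outbreak if we wait."""
    return {"act": cost_act, "wait": p_outbreak * cost_miss}

def decide(p_outbreak, cost_act=1.0, cost_miss=10.0):
    costs = expected_cost(p_outbreak, cost_act, cost_miss)
    return min(costs, key=costs.get)

# The decision flips once p_outbreak exceeds cost_act / cost_miss = 0.1
print(decide(0.05))  # wait
print(decide(0.20))  # act
```

The threshold p = cost_act / cost_miss is where the two expected costs cross; connecting surveillance evidence (which updates p) to control decisions through such a threshold is the kind of uniform formulation the model provides.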
Wagner, Michael M.
Decision Making in Biosurveillance
5 R01 LM009132-04
University of Pittsburgh at Pittsburgh
The slides for this presentation are available upon request by contacting Ms. Ebony Hughes at Ebony.Hughes@nih.gov.
Last Reviewed: June 11, 2020