2021 NLM Ada Lovelace Computational Health Lecture Series – Speaker Profiles
Omolola Ogunyemi, PhD, FACMI
Omolola Ogunyemi, PhD, FACMI is the Director of Charles R. Drew University of Medicine and Science's Center for Biomedical Informatics and a co-chair of the UCLA CTSI's biomedical informatics program. She is also an Adjunct Professor of Radiological Sciences in the David Geffen School of Medicine at UCLA with the Medical and Imaging Informatics group. She was recently a Principal Investigator on a National Library of Medicine (NLM)-funded R01 grant to develop a variety of machine learning approaches for identifying patients with latent/undiagnosed diabetic retinopathy from electronic health records or digital retinal images. Dr. Ogunyemi’s research at the CBI focuses on novel biomedical informatics solutions for problems that affect medically underserved communities. Her research interests include computerized medical decision support, reasoning under uncertainty, 3D graphics and visualization, and machine learning.
Prior to her role at Charles R. Drew University of Medicine and Science, Dr. Ogunyemi was a biomedical informatics faculty member in the Department of Radiology at Brigham and Women's Hospital and Harvard Medical School. She was also a member of the affiliated faculty in the Harvard-MIT Division of Health Sciences and Technology. Dr. Ogunyemi holds an undergraduate degree in Computer Science from Barnard College, Columbia University and an M.S.E. and Ph.D. in Computer and Information Science from the University of Pennsylvania.
Lecture Abstract
Diabetic retinopathy is the leading cause of blindness in working age adults in the United States. It is challenging to address in both rural and urban underserved settings, which suffer from shortages of eye specialists. This talk will describe the approach taken to address this condition in a medically underserved area (South Los Angeles) by researchers in the Center for Biomedical Informatics at Charles R. Drew University of Medicine and Science, using telehealth and machine learning on data from patient electronic health records.
2020 NLM Ada Lovelace Computational Health Lecture Series – Speaker Profiles
John Holmes, PhD
John H. Holmes, PhD, is Professor of Medical Informatics in Epidemiology at the University of Pennsylvania Perelman School of Medicine. He is the Associate Director of the Penn Institute for Biomedical Informatics and is Past-Chair of the Graduate Group in Epidemiology and Biostatistics. Dr. Holmes has been recognized nationally and internationally for his work on developing and applying new artificial intelligence approaches to mining epidemiologic surveillance data. Dr. Holmes’ research interests are focused on the intersection of medical informatics and clinical research, specifically evolutionary computation and machine learning approaches to knowledge discovery in clinical databases, deep electronic phenotyping, interoperable information systems infrastructures for epidemiologic surveillance, and their application to a broad array of clinical domains, including cardiology and pulmonary medicine. He has served as the co-lead of the Governance Core for the SPAN project, a scalable distributed research network, and participates in the FDA Sentinel Initiative. Dr. Holmes is an elected Fellow of the American College of Medical Informatics (ACMI), the American College of Epidemiology (ACE), and the International Academy of Health Sciences Informatics (IAHSI).
Lecture Abstract
Traditional methods of epidemic modeling continue to be used fruitfully for characterizing outbreaks and predicting the spread of disease in populations. However, these methods typically rely on what are known as “compartment models”, requiring assumptions that are not necessarily sensitive to the ever-changing environmental, behavioral, temporospatial, and social phenomena that influence disease spread. Compartment models can be enriched by the judicious use of robust methods drawn from the field of artificial intelligence that allow us to model more accurately and more quickly the population and disease dynamics that are central to developing policies for prevention, detection, and treatment. We will explore these approaches, including some that are currently in use as well as a proposal for novel, next-generation machine learning tools for epidemiologic investigation.
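For readers unfamiliar with the term, the following is a minimal sketch of one classic compartment model, the susceptible-infected-recovered (SIR) model. It is offered only as an illustration and is not taken from the lecture: the function name, the parameter values, and the forward Euler integration are all assumptions. The fixed `beta` and `gamma` rates are the kind of simplifying assumption the abstract describes.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, dt=0.1):
    """Integrate the SIR equations with forward Euler steps.

    beta  : transmission rate per day (assumed constant, illustrative)
    gamma : recovery rate per day (assumed constant, illustrative)
    Returns a list of (day, S, I, R) tuples, one per whole day.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    steps_per_day = round(1 / dt)
    history = [(0, s, i, r)]
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            # People move S -> I through contact, and I -> R through recovery.
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((day, s, i, r))
    return history


if __name__ == "__main__":
    # Hypothetical outbreak: 10 cases in a closed population of 10,000.
    for day, s, i, r in simulate_sir(10_000, 10, beta=0.3, gamma=0.1, days=60)[::10]:
        print(f"day {day:3d}  S={s:8.1f}  I={i:8.1f}  R={r:8.1f}")
```

Because the rates never change, this sketch cannot react to shifts in behavior or environment, which is the limitation that motivates enriching such models with data-driven methods.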
This lecture will be live-streamed globally, and subsequently archived, by NIH video casting: https://videocast.nih.gov/watch=37909
Quynh Nguyen, PhD, MSPH
Quynh Nguyen is Assistant Professor of Epidemiology and Biostatistics at the University of Maryland School of Public Health. She is a social epidemiologist focusing on contextual and economic factors as they relate to health. She has extensive experience using numerous national and international population-based health surveys to examine social and economic predictors of health, and to quantify national and international patterns in health disparities. Her current research program focuses on creating and validating neighborhood indicators constructed from nontraditional Big Data sources such as social media data and Google Street View (GSV) images.
Lecture Abstract
Using Google Street View Images to Examine Links Between the Built Environment and Health
Advances in neighborhood research have been constrained by the lack of neighborhood data for many geographical areas. Dr. Nguyen will discuss the use of Google Street View (GSV) images as a source of national data on built environment features and the use of computer vision to label images for indicators of walkability, urban development, and physical disorder. She will discuss how they collect images and how they identify relevant built environment features from street images. Dr. Nguyen will present preliminary analyses examining associations between built environment features and health outcomes at the census tract and county levels. GSV images represent an underutilized resource for building national data on neighborhoods and examining the influence of built environments on community health outcomes across the United States.
Quynh Nguyen
Neighborhood Looking Glass: 360 Degree Automated Characterization of the Built Environment for Neighborhood Effects Research
R01 LM012849
University of Maryland, College Park
2019 NLM Biomedical Informatics & Data Science Lectures – Speaker Profiles
Matthew Scotch, PhD, MPH
Matthew Scotch is Associate Professor of Biomedical Informatics at Arizona State University (ASU). He is also Assistant Director of ASU’s Biodesign Center for Environmental Health Engineering. His work lies at the intersection of bioinformatics and public health informatics and focuses on the theory and application of genomics-informed public health surveillance of RNA viruses. Dr. Scotch has a particular interest in human and avian influenza. He has published extensively on this work, including in journals such as Molecular Biology and Evolution, PLoS Computational Biology, Viruses, Virus Evolution, and Bioinformatics. Dr. Scotch is a Council member for the International Society for Influenza and other Respiratory Virus Diseases, an Editor for Infection, Genetics, and Evolution, and Scientific Reports. He is a frequent reviewer on NLM study sections, a member of AMIA since the early 2000s, and a former Chair of the AMIA Public Health Informatics Working Group.
Lecture Abstract
Informatics for Genomics-informed Surveillance of RNA viruses
Genomics-informed surveillance is now recognized as an important extension to the monitoring of rapidly evolving pathogens. Next generation sequencing has the ability to produce large amounts of data for tracking viruses of public health importance. Biomedical informatics approaches are able to facilitate the translation of these data into information for public health surveillance. Thus, epidemiologists can identify new outbreaks or monitor the course of a known epidemic by leveraging pathogen sequences (and corresponding metadata) generated from the clinical specimens of sick patients. In this presentation, Dr. Scotch will discuss NLM-funded projects related to the development and evaluation of a surveillance system that uses virus sequences to study the evolution, spread, and population size of viruses across geographic areas. This includes the development of a pipeline for virus phylogeography and spread and its utilization as part of a newly funded project on metagenomics of wastewater for outbreak detection and epidemic monitoring including seasonal influenza. This work aims to highlight the value of using biomedical informatics to translate viral genetic data into valuable information for surveillance of both known and novel viruses.
Matthew Scotch
Merging Viral Genetics with Climate and Population Data for Zoonotic Surveillance
R01 LM012080
Bioinformatics Framework for Wastewater-based Surveillance of Infectious Diseases
R01 LM013129
Arizona State University-Tempe Campus
Noémie Elhadad, PhD
Noémie Elhadad is Associate Professor and co-interim Chair at the Department of Biomedical Informatics at Columbia University, affiliated with the Columbia Computer Science Department and Data Science Institute. She received her PhD in Computer Science from Columbia University. Her research is at the intersection of machine learning, technology, and medicine. She investigates ways in which observational clinical data (e.g., electronic health records) and patient-generated data (e.g., online health community discussions, mobile health data) can enhance access to relevant information for patients, clinicians, and health researchers alike and can impact care and health of patients. Dr. Elhadad is a current member of NLM's Biomedical Informatics, Library and Data Sciences Review Committee.
Lecture Abstract
Advancing Women's Health through Data Science and Personal Health Informatics
Endometriosis is a chronic, inflammatory, and estrogen-dependent condition with a high burden on quality of life, estimated to affect 6-10% of women of reproductive age worldwide. Despite its high prevalence, it is an enigmatic condition: there is currently no cure and no known biomarker or non-invasive diagnostic test for this multifactorial disease. In this talk, Dr. Elhadad will report on ongoing research on two inter-related questions: how to characterize and discover the different ways in which endometriosis presents in individuals, essentially phenotyping the disease, and how to support individuals with self-discovery and management about the disease considering its heterogeneous presentations. She will show the current characterization of endometriosis from clinical data sources and discuss its current limitations, specifically the disconnect with the day-to-day patient experience of endometriosis. She will present the design and development of a personal health informatics solution (a research app called Phendo) and the analysis of the data contributed by Phendo participants towards phenotyping endometriosis. Finally, she will discuss how these data can be leveraged further to support individuals in learning about and self-managing their condition, as well as facilitating shared decision making with their providers.
Noémie Elhadad
PhendoPHL: A Data-Science Enabled Personal Health Library to Manage Endometriosis
R01 LM013043
Columbia University Health Sciences
Samantha Kleinberg, PhD
Samantha Kleinberg is an Associate Professor of Computer Science at Stevens Institute of Technology. She received her PhD in Computer Science from New York University and was a Computing Innovation Fellow at Columbia University in the Department of Biomedical Informatics. She is the recipient of NSF CAREER and JSMF Complex Systems Scholar Awards and is a 2016 Kavli Fellow of the National Academy of Sciences. She is the author of “Causality, Probability, and Time” (Cambridge University Press, 2012) and “Why: A Guide to Finding and Using Causes” (O’Reilly Media, 2015). Dr. Kleinberg is a current member of NLM's Biomedical Informatics, Library and Data Sciences Review Committee.
Lecture Abstract
From Data to Decisions: Large-Scale Causal Inference in Biomedicine
The collection of massive observational datasets has led to unprecedented opportunities for causal inference, such as using electronic health records to identify risk factors for disease. However, our ability to understand these complex data sets has not grown at the same pace as our ability to collect them. While causal inference has traditionally focused on pair-wise relationships between variables, biological systems are highly complex and knowing when events may happen is often as important as knowing whether they will. Motivated by the analysis of intensive care unit data, this talk discusses new methods to automatically extract causal relationships from data and how these have been applied to gain new insight into stroke recovery. Finally, the speaker will discuss recent findings in cognitive science and how they can help us make better use of causal information for decision-making.
Samantha Kleinberg
BIGDATA: Causal Inference in Large-Scale Time Series
R01 LM011826
Stevens Institute of Technology
2018 NLM Biomedical Informatics & Data Science Lectures – Speaker Profiles
Elizabeth Chen, PhD
Elizabeth Chen is the Founding Associate Director of the Brown Center for Biomedical Informatics (BCBI), Associate Professor of Medical Science, and Associate Professor of Health Services, Policy & Practice at Brown University. She received a BS in Computer Science from Tufts University and PhD in Biomedical Informatics from Columbia University. Within BCBI, Dr. Chen leads the Clinical Informatics Innovation and Implementation (CI3) Laboratory that is focused on leveraging EHR technology and data to improve healthcare delivery and biomedical discovery. Her research interests include clinical documentation, clinical decision support, health information needs, standards and interoperability, natural language processing, and data mining. Dr. Chen is an elected fellow of the American College of Medical Informatics and is currently a member of NLM’s Biomedical Informatics, Library and Data Sciences Review Committee.
Lecture Abstract
Knowledge Discovery in Clinical and Biomedical Data: Case Studies in Pediatrics and Mental Health
With the widespread adoption of electronic health records and increasing discoveries reported in biomedical literature, computational approaches are needed for further knowledge discovery and hypothesis generation. Challenges include the capture of key information within text and standardization issues, requiring use of natural language processing and data integration techniques. Clinical data mining and biomedical literature mining have been used in a range of contexts to discover disease knowledge such as comorbidities and patterns related to social, behavioral, and familial (SBF) factors. In this lecture, a series of case studies will be presented on representing, extracting, integrating, mining, and visualizing SBF factors and comorbidities for pediatric and mental health conditions. Collectively, these studies demonstrate use of systematic processes and development of open-source tools for transforming clinical and biomedical data into knowledge.
Elizabeth Chen
Leveraging the EHR to Collect and Analyze Social, Behavioral & Familial Factors
R01 LM011364
University of Vermont
John Gennari, PhD
John Gennari has been a professor in biomedical and health informatics at the University of Washington for over 15 years. His doctorate and background are in computer science, but he began working in biomedical informatics in the early ’90s, beginning with work on the Protégé knowledge representation and ontology development system. He has had a wide range of experiences on large, multi-institutional, multi-disciplinary projects, and this led to his research focus of knowledge reuse and knowledge sharing. His expertise is in ontology development, standards and semantic web tools. Dr. Gennari is the Graduate Program Director at UW, overseeing the PhD and research M.S. programs. He is currently on the NLM Biomedical Informatics, Library and Data Sciences Review Committee.
Lecture Abstract
Semantic Annotations, Reuse, and Reproducibility
Biomodeling (or biosimulation modeling) has the potential to revolutionize patient-specific health care and precision medicine. To increase our knowledge and management of complex pathologies, biomodeling provides the ability to produce detailed, mechanistic simulations of the dynamic biological processes and their participants. The development of these biomodels can be viewed as analogous to software development. To be effective and to scale to larger systems, the models must include clear documentation (semantic annotations), be developed in a reproducible manner, and be designed to allow for plug-and-play reuse so that researchers can build from the efforts of others. In the presentation, Dr. Gennari will report on his group’s efforts to standardize practices for semantic annotation, and to demonstrate the value of those annotations both for semantic searching over model repositories and for model merging and model reuse tasks. Over the last several years, they have succeeded in building community-wide agreement on both the importance of semantic annotation and the format of these annotations. In addition, using their annotation and model reuse tool, they have developed several demonstration examples of model merging that leverage the use of semantic annotation. Finally, as an important consequence of their work, he will also report on the initiation of a new Center for Reproducible Biomedical Modeling.
John Gennari
Physiological Knowledge Integration and Recombinant Modeling via Accelerated Sema
R01 LM011969
University of Washington
S. Joshua Swamidass, MD PhD
S. Joshua Swamidass is an Assistant Professor of Laboratory and Genomic Medicine at Washington University School of Medicine (http://swami.wustl.edu). His group studies information with new computational methods, at the intersection of biology, medicine and chemistry. He is funded by the National Library of Medicine (NLM) to model bioactivation pathways, and how bioactivation pathways change in children. Dr. Swamidass is currently on the NLM Biomedical Informatics, Library and Data Sciences Review Committee.
Lecture Abstract
Translating from Chemistry to Clinic with Deep Learning
Many medicines become toxic only after bioactivation by metabolizing enzymes. Often, metabolic enzymes transform them into chemically reactive species, which subsequently conjugate to proteins and cause adverse events. For example, carbamazepine is epoxidized by P450 enzymes in the liver, but then conjugates to proteins, causing Stevens-Johnson syndrome in some patients. The most difficult-to-predict drug reactions, idiosyncratic adverse drug reactions, often depend on bioactivation. Our group has been using deep learning to model the metabolism of diverse chemicals, and the subsequent reactivity of their metabolites. Deep learning systematically summarizes the information from thousands of publications into quantitative models of bioactivation, predicting exactly how medicines are modified by metabolic enzymes. These models are giving deeper understanding of why some drugs become toxic, and others do not. At the same time, deep learning can be used to understand drug toxicity as it arises in clinical data, and why some patients are affected, but not others. A conversation between the basic and clinical sciences is now possible, where patient outcomes can be understood in light of bioactivation mechanisms, and these mechanisms can explain why some patients are susceptible to drug toxicity, and others are not.
S. Joshua Swamidass
Data and Tools for Modeling Metabolism and Reactivity
R01LM012222
University of Arkansas for Medical Sciences
Computationally Modeling the Impact of Ontogeny on Drug Metabolic Fate
R01LM012482
University of Wisconsin Madison
2017 NLM Biomedical Informatics & Data Science Lectures – Speaker Profiles
James Cimino, MD
Dr. James Cimino is a board-certified internist and clinical informatician, currently a Professor of Medicine and inaugural Director of the Informatics Institute at the University of Alabama-Birmingham School of Medicine. He has been carrying out clinical informatics research, building clinical information systems, teaching medical informatics and medicine, and caring for patients for over 30 years, with principal research areas in desiderata for controlled terminologies, mobile and Web-based clinical information systems for clinicians and patients, a context-aware form of clinical decision support called “infobuttons”, and clinical research data repositories. Past appointments include Professor of Biomedical Informatics and Medicine at Columbia University and Chief of the Laboratory for Informatics Development at the NIH Clinical Center and the National Library of Medicine. He is co-editor of a leading textbook on Biomedical Informatics and is an Associate Editor of the Journal of Biomedical Informatics. His honors include fellowships of the American College of Physicians and the American College of Medical Informatics, the Donald A.B. Lindberg Award for Innovation in Informatics and the President’s Award from the American Medical Informatics Association, and induction into the National Academy of Medicine. Dr. Cimino is currently on the National Library of Medicine Biomedical Informatics, Library and Data Sciences Review Committee.
Lecture Abstract
Transforming Electronic Health Records from Annoyances to Assistants: A Research Agenda for the Next Decade
Clinical informatics research, and before that, medical informatics research, has made great strides in developing tools to help clinicians improve clinical decision-making and patient care. Yet, electronic health records (EHR) systems today show little aptitude for even simple tasks, like retrieving relevant patient information, while suppressing that which is irrelevant. When bringing artificial intelligence to bear, the best EHRs seem able to do is overwhelm us with alerts that a clinician must override to take action. When the “learning health system” attempts to use data from these systems, it must rely on indirect methods, such as machine learning and natural language processing, to figure out what was actually going on with the patient. The advances that have been made to bring decision support into EHRs rely on formally represented – that is, structured and coded – data, such as problem lists, laboratory results and medication lists. What’s missing is a formal representation of the clinical cognition of the patient’s situation: what we think is going on, what our goals are, what we are trying to do about it, and why we have chosen to do it that way. Adding such information to the EHR would enable informaticians to enhance their tools in ways that will improve situational awareness, reduce information overload, make decision support systems provide more relevant knowledge to clinicians, and enable clinical researchers to draw more solid inferences from observational data. Informatics research is needed to understand what needs to be captured, determine how it should be represented, design user interfaces to minimize the effort required, and develop tools that ultimately reduce the work of clinical documentation by reducing redundant data entry, anticipating and executing work plans, and improving the quality and efficiency of patient care. Dr. Cimino will provide illustrations of the formal representation and use of clinical cognition and present a roadmap for research, development and education toward that goal.
David Page, PhD
David Page is a Vilas Distinguished Achievement Professor at the University of Wisconsin-Madison. His primary appointment is in the Dept. of Biostatistics and Medical Informatics in the School of Medicine and Public Health, with an appointment in the Dept. of Computer Sciences where he teaches machine learning. His PhD in CS is from the University of Illinois at Urbana-Champaign, and he became involved in biomedical applications of machine learning as a post-doc in what was then the Computing Laboratory at Oxford University. He directs the Cancer Informatics Shared Resource of the Carbone Cancer Center and is a member of the Genome Center of Wisconsin. He previously served on the NIH's BioData Management and Analysis Study Section and the scientific advisory boards for the Wisconsin Genomics Initiative and the Observational Medical Outcomes Partnership, as well as the editorial boards for Machine Learning and Data Mining and Knowledge Discovery. He currently is on the National Library of Medicine Study Section (BLIRC) and directs the EHR project within UW-Madison's BD2K Center for Predictive Computational Phenotyping.
Lecture Abstract
Interpretation of Human Genomes And Identification of Impactful Variants Using Biomedical Informatics
The widespread use of electronic health records and the many recent successes of machine learning raise at least two questions. How well can future health events of patients be predicted from EHR data, at various lengths of time in advance? And how can such predictions improve human health? This talk answers the first question via an approach called high-throughput machine learning, and it speculates about answers to the second question. In particular, this talk argues that many healthcare applications require not just accurate prediction, but accurate prediction by causally-faithful models. Causal discovery from observational data is already a major research direction in machine learning and statistics, and this talk discusses new approaches across the spectrum from when "we know all the relevant variables" to when "we know only one relevant variable" for the task at hand. If time permits, the talk will also touch on the issue of protecting patient privacy while empowering the construction of accurate predictive models.
Page, David
Secure Sharing of Clinical History & Genetic Data: Empowering Predictive Personalized Medicine
R01LM011028
University of Wisconsin Madison
2016 NLM Biomedical Informatics & Data Science Lectures – Speaker Profiles
Sean Mooney, PhD
Professor Mooney has spent his career as a researcher and group leader in biomedical informatics. He now leads research IT for UW Medicine and is leading efforts to support and build clinical informatics platforms as its first Chief Research Information Officer (CRIO). He is a professor in the Department of Biomedical Informatics and Medical Education at the University of Washington. Previous to his CRIO role, he was an Associate Professor and Director of Bioinformatics at the Buck Institute for Research on Aging. As an Assistant Professor, he was appointed in Medical and Molecular Genetics at Indiana University School of Medicine and was founder and director of the Indiana University School of Medicine Bioinformatics Core. In 1997, he received his B.S. with Distinction in Biochemistry and Molecular Biology from the University of Wisconsin at Madison. He received a Ph.D. in 2001 at the University of California in San Francisco, and then an American Cancer Society John Peter Hoffman Fellowship at Stanford University.
Lecture Abstract
Interpretation of Human Genomes And Identification of Impactful Variants Using Biomedical Informatics
Whole exome and whole genome sequencing are continuing to challenge researchers with a wealth of genetic variants of unknown disease effects. We are investigating genomic and proteomic attributes that describe genetic variants in human genome sequences and then we are using those attributes to predict pathogenic variants that affect protein structure and function, mRNA processing and translation, and transcriptional regulation. To that end, we have built the MutPred suite of tools for discovering and characterizing pathogenic and pharmacogenetic variants from whole genome sequencing. We are applying these tools in collaboration with genetic studies to better understand the causes of human disease, and I will illustrate using examples of both complex and monogenic diseases. Further, we are leveraging the crowd by organizing and participating in community challenges (critical assessments) to build a better understanding of the types of approaches that perform well in genome interpretation and in what context. I will discuss our involvement in two critical assessment communities, the Critical Assessment of Genome Interpretation and the Critical Assessment of Functional Annotation.
Mooney, Sean
Informatic Profiling of Clinically Relevant Mutation
R01LM009722
University of Washington
Kellie Archer, PhD
Kellie Archer is a Professor in the Department of Biostatistics and Director of the Massey Cancer Center Biostatistics Shared Resource at Virginia Commonwealth University. She completed her PhD at The Ohio State University and previously worked there in support of research associated with the Cancer and Leukemia Group B (CALGB) Leukemia Correlative Sciences Committee. She now works primarily in developing innovative statistical methods and software for the analysis of high-dimensional datasets such as those arising from high-throughput genomic platforms. Dr. Archer is the author or co-author of 107 published papers, two book chapters, and over 30 university seminars/professional conference presentations. She holds an editorial appointment at Progress in Transplantation and is a Statistical Consultant for Radiology and the Nature Publishing Group. She serves as a member of the National Library of Medicine Biomedical Library and Informatics Review Committee.
Lecture Abstract
Predicting an Ordinal Response Using Features from High-Throughput Genomic Assays
Ordinal scales are commonly used to measure health status and disease related outcomes. An ordinal outcome takes on one of several categorical levels where there is a clear ordering of the categorical levels but no intrinsic numerical relationship between them. As an example, economic status is often recorded as an ordinal outcome taking on three categorical levels of low, medium, and high income. Notable examples in medicine include stage of cancer, grading the severity of an adverse event, and response of target lesions to chemotherapy. In addition, repeated measurements are common in clinical practice for tracking and monitoring the progression of complex diseases. Classical likelihood-based ordinal modeling methods have contributed to the analysis of data in which the response categories are ordered and the number of predictor variables is smaller than the sample size. With the emergence of genomic technologies being increasingly applied to identify molecular markers associated with complex disease phenotypes and outcomes, many research studies now include high dimensional feature data where the number of predictor variables greatly exceeds the sample size, so that traditional methods cannot be applied. To fill this void we have developed penalized ordinal response models for classifying and predicting an ordinal response. Additionally, we adapted our method to the longitudinal setting to enable modeling disease progression along with time. We demonstrate our methods using data from two different studies that used high-throughput genomic platforms, the Illumina GoldenGate Methylation BeadArray and Affymetrix gene expression profiles.
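As a rough illustration of what a likelihood-based ordinal model computes, the sketch below evaluates category probabilities under a cumulative logit (proportional odds) model. This is a standard textbook formulation, not the penalized method described in the abstract; the function name and the threshold values are assumptions chosen for the example.

```python
import math


def cumulative_logit_probs(linear_predictor, thresholds):
    """Category probabilities under a cumulative logit model.

    The model assumes P(Y <= k) = sigmoid(threshold_k - linear_predictor),
    so K - 1 increasing thresholds define K ordered categories.
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in increasing order")
    cumulative = [1 / (1 + math.exp(-(t - linear_predictor))) for t in thresholds]
    cumulative.append(1.0)  # P(Y <= highest category) is always 1
    probs = []
    previous = 0.0
    for c in cumulative:
        # Each category's probability is the gap between adjacent cumulative values.
        probs.append(c - previous)
        previous = c
    return probs


if __name__ == "__main__":
    # Hypothetical three-level outcome (e.g. low / medium / high).
    for eta in (-2.0, 0.0, 2.0):
        print(eta, [round(p, 3) for p in cumulative_logit_probs(eta, [-1.0, 1.0])])
```

A larger linear predictor shifts probability toward the higher categories while respecting their order. In the high-dimensional setting the abstract describes, the coefficients behind the linear predictor would need a penalty to be estimable at all.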
Archer, Kellie
Informatic tools for Predicting an Ordinal Response for High-Dimensional Data
R01 LM011169
Virginia Commonwealth University
Mark Craven, PhD
Mark Craven is a professor in the Department of Biostatistics and Medical Informatics at the University of Wisconsin, and an affiliate faculty member in the Department of Computer Sciences. He is the Director of the Center for Predictive Computational Phenotyping, one of the NIH's Centers of Excellence for Big Data Computing. He is also the Director of the NIH/NLM-funded Computation and Informatics in Biology and Medicine (CIBM) Training Program, and a member of the Institute for Clinical and Translational Research, the Carbone Cancer Center, and the Genome Center of Wisconsin. The focus of his research program is on developing and applying machine-learning methods to the problems of inferring models of, and reasoning about, networks of interactions among genes, proteins, clinical and environmental factors, and phenotypes of interest. He is also a member of the National Library of Medicine Biomedical Library and Informatics Review Committee.
Lecture Abstract
Inferring Host-Pathogen Interactions from Diverse Data Sources
Insight into the mechanisms and context of host-pathogen interactions can be gained by applying computational methods to a broad range of experimental, observational, and secondary data sources. Dr. Craven will discuss his work in three studies that involve developing and applying predictive methods in order to characterize host-pathogen interactions. In the first study, he is focused on inferring host subnetworks that are involved in viral replication from genome-wide loss-of-function experiments. Although these experiments can identify the host factors that directly or indirectly facilitate or inhibit the replication of a virus in a host cell, they do not elucidate how these genes are organized into the biological pathways that mediate host-virus interactions. His team is developing novel computational methods that use a wide array of secondary data sources, including the scientific literature, to transform the measurements from these assays into hypotheses that predict the pathways in the cell that relate implicated genes to viral replication. In the second study, he is applying machine-learning methods to understand how variation in the genome of HSV-1 influences multiple ocular disease phenotypes in a host. In the third study, he is investigating the extent to which risk for various infectious disease phenotypes can be predicted from electronic health records.
Craven, Mark
Inferring Host-Pathogen Interactions from Diverse Data Sources
R01LM007050-06
University of Wisconsin Madison
2015 NLM Biomedical Informatics & Data Science Lectures – Speaker Profiles
Joshua Denny, MD
Joshua Denny, MD is an Associate Professor in the Departments of Biomedical Informatics and Medicine at Vanderbilt University Medical Center. A primary interest of his lab has been development of the phenome-wide association study (PheWAS) method applied to electronic health records (EHRs) to rapidly uncover genetic pleiotropy and highlight potential drivers of genetic associations with endophenotypes. He helps lead efforts for local and network pharmacogenetics implementation activities. He is part of the NIH-supported Electronic Medical Records and Genomics (eMERGE), Pharmacogenomics Research Network (PGRN), and Implementing Genomics in Practice (IGNITE) networks. He is a past recipient of the American Medical Informatics Association New Investigator Award, Homer Warner Award, and Vanderbilt Chancellor’s Award for Research. Dr. Denny remains active in clinical care and in teaching students. He is also a member of the National Library of Medicine Biomedical Library and Informatics Review Committee.
Lecture Abstract
Use of Clinical Big Data to Inform Precision Medicine
Precision medicine offers the promise of improved diagnosis and more effective, patient-specific therapies. Typically, clinical research studies have been pursued by enrolling a cohort of willing participants in a town or region, and obtaining information and tissue samples from them. At Vanderbilt, Dr. Denny and his team have linked phenotypic information from de-identified EHRs to a DNA repository of nearly 200,000 samples, creating a ‘virtual’ cohort. This approach allows study of the genomic basis of disease and drug response using real-world clinical data. Finding the right information in the EHR can be challenging, but the combination of billing data, laboratory data, medication exposures, and natural language processing has enabled efficient study of genomic and pharmacogenomic phenotypes. The Vanderbilt research team has put many of these discovered pharmacogenomic characteristics into practice through clinical decision support. The EHR also enables the inverse experiment – starting with a genotype and discovering all the phenotypes with which it is associated – PheWAS. PheWAS requires a densely phenotyped population such as that found in the EHR. Dr. Denny’s research team has used PheWAS to replicate more than 300 genotype-phenotype associations, characterize pleiotropy, and discover new associations. They have also used PheWAS to identify characteristics within disease subtypes.
Denny, Joshua
From GWAS to PheWAS: Scanning the EMR Phenome for Gene-Disease Associations
R01LM010685-01A1
Vanderbilt University
Atul Butte, MD, PhD
Atul Butte, MD, PhD is the founding Director of the newly established Institute of Computational Health Sciences at the University of California, San Francisco, and a Professor of Pediatrics. Prior to his new position, he was the chief of the Division of Systems Medicine and Associate Professor of Pediatrics at Stanford University and Lucile Packard Children’s Hospital, where he was a faculty member for the past decade. Trained in both Computer Science and Medicine at Brown University, Dr. Butte previously worked as a software engineer at Apple Inc. and Microsoft Corp., and received his PhD in Health Sciences and Technology from Harvard Medical School and MIT. He has authored nearly 200 publications, with research repeatedly featured in the New York Times, the Wall Street Journal, and Wired Magazine. In 2013, Dr. Butte was recognized by the White House as an Open Science Champion of Change for promoting science through publicly available data. Dr. Butte is also a founder of several Bay Area biotech startup companies and a member of the National Library of Medicine Biomedical Library and Informatics Review Committee.
Lecture Abstract
Data-Driven Precision Medicine
There is an urgent need to translate genome-era discoveries into clinical utility, but the difficulties in making bench-to-bedside translations have been well described. The nascent field of translational bioinformatics may help. Dr. Butte's lab builds and applies computational tools to convert hundreds of trillions of points of molecular, clinical, and epidemiological data collected by researchers and clinicians worldwide over the past decade, now commonly known as “big data”, into new diagnostics, therapeutics, and insights into rare and common diseases. Dr. Butte, a bioinformatician and pediatric endocrinologist, will highlight how publicly available molecular measurements can be used to find new uses for drugs, including drug repositioning for inflammatory bowel disease, and to discover new treatable inflammatory mechanisms of disease in type 2 diabetes, and how the next generation of biotech companies might even start in your garage.
Butte, Atul
Integrating Microarray and Proteomic Data by Ontology-based Annotation
R01LM009719-01A1
Stanford University
John Pestian, PhD, MBA
John Pestian, PhD, MBA is a professor of Pediatrics and Biomedical Informatics at Children’s Hospital Medical Center, University of Cincinnati. He joined the faculty in 2000 as the founding director of the Division of Biomedical Informatics. He has been active in translating neuropsychiatric innovations from the bench to the bedside. One innovation, Optimization and Individualization of Medication Selection and Dosing, has been used to identify optimal neuropsychiatric drugs in over 150,000 people. Dr. Pestian’s lab currently focuses on collection and analysis of prospective multimodal data, such as words, vocal sounds, and facial expressions, for predicting repeated suicide attempts, depression states, and anxiety in adolescents.
Lecture Abstract
Phenotypical Cohort Retrieval Using the Multi-Institutional Pediatric Epilepsy Decision Support (MiPeds) System
The Multi-Institutional Pediatric Epilepsy Decision Support (MiPeds) system provides point-of-care surveillance of phenotypically similar pediatric epilepsy patients using the electronic health records (EHRs) from Cincinnati Children’s Hospital Medical Center, Children’s Hospital of Philadelphia, and Children’s Hospital of Colorado. Using this near real-time cohort retrieval system, the three organizations can review similarities and differences in clinical measures such as medication side effects, types of seizures, seizure frequency, quality of life, and neurological abnormalities. This talk will describe the successes and challenges of developing MiPeds. Examples will focus on aligning the research and clinical needs of each organization with data standards, factors that influence centralization and decentralization, automated methods of de-identification, the usefulness of I2B2, developing collaborative measures of data quality and quality of care, and searching and visualization. Efforts to generalize this novel approach to other neuropsychiatric diseases will be described as well.
NLM-Funded Research
Pestian, John
Capturing Patient-Provider Encounter through Text Speech and Dialogue Processing
R01LM011124-03
Cincinnati Children's Hospital Medical Center
2014 NLM Biomedical Informatics & Data Science Lectures – Speaker Profiles
Last Reviewed: July 2, 2025