NLM Informatics Training Conference

Welcome to the NLM Informatics Training Conference!

The National Library of Medicine supports research training in biomedical informatics and data science at sixteen educational institutions in the United States. These training programs offer graduate education and postdoctoral research experiences in a wide range of areas, including health care informatics, translational bioinformatics, clinical research informatics, and public health informatics. In all of these areas, biomedical data science concepts and methods are part of the core curriculum. Seven programs also offer special tracks in environmental exposure informatics.

Each year an Informatics Training Conference is convened to bring NLM trainees together to showcase their work, to evaluate the full scope of current work in the field, and to meet their peers. The 2018 Informatics Training Conference will be held at Vanderbilt University, Nashville, TN, on June 4-5, 2018.

Trainees appointed at the Veterans Administration (VA) sponsored training programs, NLM’s intramural trainees, and informatics trainees at the NIH Clinical Center are also invited to attend and make presentations.

The 2018 annual conference is hosted by the Department of Biomedical Informatics at Vanderbilt University.

2018 NLM Training Conference Participants

Conference Awards

Best Presentations

Burcu Darst (1st Day)
University of Wisconsin, Madison
Integrative Network of Metabolomics, Genomics and Alzheimer's Risk Factors

Abstract:
Although Alzheimer’s disease (AD) is highly heritable, few genetic variants have been associated with it. Genetic factors may only convey AD risk in individuals with certain environmental exposures. A multi-omics approach could be informative. We developed an integrated network to investigate relationships between metabolomics, genomics, and AD risk factors using the Wisconsin Registry for Alzheimer’s Prevention participants. Analyses included 1,111 Caucasian participants with whole blood expression for 11,376 genes (imputed from genome-wide genotyping using PrediXcan), 1,097 fasting plasma metabolites, and 19 AD risk factors. Residuals from the 12,493 variables, adjusted for sex and age, were used to test all 78,031,278 pairwise Spearman correlations. Correlations meeting a Bonferroni-adjusted P-value=6.4e-10 were used to develop an undirected graphical network, focusing on inter-omic relationships. Community detection identified the best partitions of variables. Our inter-omic network had 424 nodes and 679 edges, including 135 metabolite-gene and 529 metabolite-AD risk factor edges. No AD risk factors were directly linked to genes. However, most communities, such as those centered on insulin resistance and body mass index, included genes that were indirectly linked to AD risk factors through metabolites, suggesting that genes may influence AD risk through particular metabolites. Investigating these integrative communities further will be informative.
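The test count and significance threshold quoted in the abstract can be checked with a few lines of arithmetic. The sketch below assumes a family-wise alpha of 0.05, which the abstract does not state. The `toy_pvalues` dictionary and its variable names are hypothetical and only illustrate how correlations passing the threshold would become network edges.

```python
from math import comb

# Count reported in the abstract.
n_variables = 12_493

# Assumption: family-wise alpha of 0.05 (not stated in the abstract).
alpha = 0.05

# Number of unordered variable pairs, i.e. pairwise correlation tests.
n_tests = comb(n_variables, 2)

# Bonferroni correction: divide alpha by the number of tests.
threshold = alpha / n_tests

print(n_tests)             # 78031278
print(f"{threshold:.1e}")  # 6.4e-10

# Hypothetical p-values for three variable pairs (illustration only).
toy_pvalues = {
    ("metabolite_A", "gene_X"): 1.0e-12,
    ("metabolite_A", "risk_factor_Y"): 3.0e-11,
    ("metabolite_B", "gene_X"): 2.0e-4,
}

# Keep only pairs that pass the corrected threshold as undirected edges.
edges = [pair for pair, p in toy_pvalues.items() if p <= threshold]
print(len(edges))  # 2
```

Both printed values match the figures in the abstract (78,031,278 tests and a threshold of 6.4e-10), which suggests the conventional 0.05 level was used.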

Dana Womack (2nd Day)
Oregon Health and Science University
Secondary Use of Ambient Data to Enable Automated Workplace Insight

Abstract:
Hospital clinical leaders face the discordant challenge of doing more with less, while avoiding detrimental patient safety, workload, and economic outcomes. Demand spikes and sustained periods of elevated patient need can overwhelm caregiver capacity, leading to adaptive work performance. To improve observability of workplace activity, this study articulates a method for liberation, aggregation, and analysis of ambient data that is automatically produced by operational systems. Guided by clinicians’ knowledge of real-world signs of strain, activity feature models are defined and applied to over 300,000 timestamped events from four operational systems. Summary statistics for extracted feature values reveal differences in activities such as medication administration, helping behaviors, and missed breaks on shifts with and without unplanned overtime. Discriminative features were applied to a classifier to predict unplanned overtime with 68% accuracy for a medical intensive care unit.

Best Open Mic

Benjamin Cordier
Oregon Health and Science University
Quantum Algorithms for Bioinformatics and Clinical Informatics

Best Poster

Amelia Averitt
Columbia University
Noisy-Or Risk Allocation for Causal Inference

Abstract:
All methods for causal inference with observational data require assumptions, such as fully observed covariates. Failure to meet this assumption, and others, may result in biased estimates via backdoor paths. We hypothesize that this bias will be lessened when the modeling assumptions are more principled. This research introduces a Bayesian, probabilistic model, Noisy-Or Risk Allocation (NORA), for the generation of causal knowledge from observational data. Given an outcome and a set of exposures, NORA infers the risk of the outcome for each exposure. The model is built on a different set of assumptions than other methods of inference, such as logistic regression (LR). In simulation, NORA recovered the risks of exposures, and produced estimates that were less biased than LR in the absence of fully observed covariates (37.1% vs 78.9%). We additionally present evidence of NORA’s ability to recover known, causal relationships from noisy observational data. Using electronic health record data from NewYork-Presbyterian Hospital, NORA identified Hypoplastic Left Heart Syndrome (58.2%) as the highest risk exposure of Heart Failure, and Renal Obstructive Defects (36.5%) for the outcome, Kidney Disease. This research demonstrates NORA’s robustness to confounding by the backdoor path, and ability to recover clinically meaningful causal relationships from observational data.
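For readers unfamiliar with the noisy-or idea that gives NORA its name, the sketch below shows the generic noisy-or gate: each exposure that is present independently triggers the outcome with its own risk, and the outcome occurs unless every trigger fails. This is not the authors' Bayesian model or inference procedure, which the abstract does not describe. The function name `noisy_or`, the `leak` parameter, and the risk values are assumptions for illustration.

```python
from math import prod

def noisy_or(risks, exposed, leak=0.0):
    """Outcome probability under a generic noisy-or gate.

    risks   -- per-exposure probability of triggering the outcome
    exposed -- booleans marking which exposures are present
    leak    -- background risk when no exposure is present (assumed)
    """
    # Probability that the background and every present exposure
    # all fail to trigger the outcome.
    no_trigger = (1.0 - leak) * prod(
        1.0 - r for r, x in zip(risks, exposed) if x
    )
    return 1.0 - no_trigger

# Hypothetical per-exposure risks, not values from the abstract.
risks = [0.5, 0.2]

p_none = noisy_or(risks, [False, False])  # no exposure, no leak
p_first = noisy_or(risks, [True, False])  # only the first exposure
p_both = noisy_or(risks, [True, True])    # 1 - (0.5 * 0.8)

print(p_none, p_first, round(p_both, 3))
```

With both exposures present, the combined risk is about 0.6, which is higher than either individual risk but lower than their sum, as expected when the triggers act independently.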

Last Reviewed: August 6, 2018
