
Grants and Funding: Extramural Programs (EP)

NLM Informatics Training Conference 2011

National Institutes of Health (NIH) Campus, Natcher Conference Center, Bethesda, MD
June 28–30, 2011
Agenda and Abstracts of Presentations and Posters

Agenda

Tuesday, June 28, 2011

Time / Agenda Item
7:00 AM - 7:45 AM
Breakfast
Poster Setup
Day 1 Group
7:45 AM - 8:00 AM
Welcome
8:00 AM - 8:20 AM
NLM Director's Remarks
8:20 AM - 8:30 AM
Introduction of Training Directors and Trainees; Program Update
8:30 AM - 9:45 AM
Plenary Paper Session #1 (1 hour 15 minutes, paper + discussion) (Main Auditorium, Lower Level) Moderator: George Hripcsak, Columbia
9:45 AM - 10:30 AM
Poster Session and Coffee Break - Attended - Day 1 Group: Posters (45 minutes Grouped by Topic) (Atrium, 1st Floor)
  • Topic 1 - Healthcare
    Ligons/Pittsburgh; Tong/UCLA; Salmasian/Columbia; O'Rourke/MGH-Harvard; Ronquillo/Harvard; Essaid/OHSU; Bhatia/Vanderbilt; Liu/Vanderbilt; Gobbel/VA; Hope/VA
  • Topic 2 - Translational
    Newburger/Stanford; McPeek Hinz/Vanderbilt; Pehlke/Wisconsin; Sprecher/Yale; Tegge/Missouri; Peng/Regenstrief
  • Topic 3 - Bioinformatics
    Linares/Utah; Stokes/Pittsburgh; Grinter/Missouri; Rojas/U Texas-Houston; Zhang/U Texas-Houston; Newkirk/UC Irvine
10:30 AM - 11:30 AM
Open Mic Session #X1 - Health Care & Public Health Informatics (Main Auditorium) Moderator: Bill Hersh, OHSU
  • Ahmed - Vanderbilt: MyMedEffects: Improving Drug Safety by Enabling Self-Reporting of Adverse Events and Side Effects using Social Microblogging
  • Farrington - U Virginia: Mixed Reality Simulator for the Treatment of Phantom Limb Pain
  • Hinz - Vanderbilt: Risk Stratification of Outpatient Populations for Outcomes in Quality Care Markers
  • Killoran - U Texas HSC: Linking EMR Usability and Outcomes in Clinical Anesthesia Practice
  • McKanna - OHSU: Clinical Validation of Computer Game Play as an Unobtrusive and Continuous Measure of Divided Attention in Elderly
  • Weiskopf - Columbia: Quality Assessment of EHR Data for Secondary Use
  • Workman - Utah: A New NLP Approach for Point-of-Care Data Discovery
11:30 AM - 12:30 PM
Lunch
  1. Birds of a Feather Lunch Tables (Bioinformatics, Health Care Informatics, Public Health Informatics, Translational Bioinformatics)
  2. Grant Program Session for Trainees (K grants, ESI R01 grants, R21 grants) (Room F1/F2, Lower Level)
11:30 AM - 12:30 PM
Executive Session of Training Directors (Room E1/E2, Lower Level) Session Chair: Dr. Donald A.B. Lindberg
12:45 PM - 2:00 PM

Plenary Paper Session #2 (1 hour 15 minutes, papers + discussion) (Main Auditorium) Moderator: Pierre Baldi, University of California, Irvine

2:00 PM - 3:00 PM
Parallel Paper Focus Session A (1 hour, 3 papers each + discussion) Focus Session A1 (Main Auditorium) Moderator: JT Finnell, Indiana

Focus Session A2 (Balcony A, 1st Floor, Upper Level) Moderator: Ira Kalet, University of Washington

3:00 PM - 4:00 PM
Poster Session and Coffee Break - Attended - Day 1 Group: Posters (1 hour Grouped by Topic) (Atrium, 1st Floor)
  • Topic 1 - Healthcare
    Ligons/Pittsburgh; Tong/UCLA; Salmasian/Columbia; O'Rourke/MGH-Harvard; Ronquillo/Harvard; Essaid/OHSU; Bhatia/Vanderbilt; Liu/Vanderbilt; Gobbel/VA; Hope/VA
  • Topic 2 - Translational
    Newburger/Stanford; McPeek Hinz/Vanderbilt; Pehlke/Wisconsin; Sprecher/Yale; Tegge/Missouri; Peng/Regenstrief
  • Topic 3 - Bioinformatics
    Linares/Utah; Stokes/Pittsburgh; Grinter/Missouri; Rojas/U Texas-Houston; Zhang/U Texas-Houston; Newkirk/UC Irvine
4:00 PM
Buses to Hotel
5:30 PM - 9:00 PM
Dinner at the Strathmore

Wednesday, June 29, 2011

Time / Agenda Item
7:30 AM - 8:15 AM
Breakfast
Poster Setup
Day 2 Group
8:15 AM - 9:30 AM

Open Mic Session #X2 - Bioinformatics & Translational Informatics (Main Auditorium) Moderator: Bill Caldwell, University of Missouri

  • Bruggner - Stanford: Automated Identification of Predictive Cell Populations from High-Dimensional Mass Cytometry Data
  • Diedrich - U Utah: Diagnosing a Newly Identified Cardiovascular Syndrome by Quantitative Medical Image Analysis and Electronic Medical Records
  • Fluitt - U Wisconsin: Monomeric and Early Aggregation Structures of Short Polyglutamine Chains
  • Harpaz - Columbia: Adverse Drug Event Discovery in Large Medical Data Repositories
  • Laderas - OHSU: Building and Analyzing a Dynamic Model to Prioritize Combination Drug Therapies in a Cancer Signaling Pathway
  • Lesniak - U Virginia: Integrating Touch Receptor Imaging and Recording through Computational Modeling
9:30 AM - 10:15 AM

Poster Session and Coffee Break - Attended - Day 2 Group: Posters (45 minutes Grouped by Topic) (Atrium, 1st Floor)

  • Topic 1 - Tools/Techniques
    Hugine/Virginia, Pivovarov/Columbia, Ikuta/Harvard, Liu/Colorado, Nettleton/OHSU, Bromley/UWashington, Fernald/Stanford, Jing/NLM
  • Topic 2 - Public Health
    Le/UWashington, Shen/JHU, Brewster/Utah, Quintiliani/Boston U, Ambert/OHSU
  • Topic 3 - Health Care
    Craven/Missouri, Mohammed-Rajput/Regenstrief
  • Topic 4 - Bioinformatics
    Bolen/Yale, Eskow/Colorado, Ochoa/U Houston, Williams/Virginia, Kayala/UC Irvine, Ortiz/Virginia, Timm/Wisconsin
10:15 AM - 11:30 AM

Plenary Paper Session #3 (1 hour 15 minutes, papers + discussion) (Main Auditorium) Moderator: Harold Lehmann, Johns Hopkins

11:30 AM - 12:30 PM

Lunch
Grants Management/X-TRAIN Meeting (Room F1/F2, Lower Level)

12:30 PM - 2:00 PM

Parallel Papers Focus Session B (1.5 hours, papers + discussion)

Focus Session B1: (Main Auditorium) Moderator: Larry Hunter, University of Colorado

Focus Session B2: (Balcony A, 1st Floor, Upper Level) Moderator: Joyce Mitchell, University of Utah

Focus Session B3: (Balcony B, 1st Floor, Upper Level) Moderator: Cindy Gadd, Vanderbilt University

2:00 PM - 2:45 PM

Poster Session and Coffee Break - Attended - Day 2 Group: Posters (45 minutes Grouped by Topic) (Atrium, 1st Floor)

  • Topic 1 - Tools/Techniques
    Hugine/Virginia, Pivovarov/Columbia, Ikuta/Harvard, Liu/Colorado, Nettleton/OHSU, Bromley/UWashington, Fernald/Stanford, Jing/NLM
  • Topic 2 - Public Health
    Le/UWashington, Shen/JHU, Brewster/Utah, Quintiliani/Boston U, Ambert/OHSU
  • Topic 3 - Health Care
    Craven/Missouri, Mohammed-Rajput/Regenstrief
  • Topic 4 - Bioinformatics
    Bolen/Yale, Eskow/Colorado, Ochoa/U Houston, Williams/Virginia, Kayala/UC Irvine, Ortiz/Virginia, Timm/Wisconsin
2:45 PM - 4:00 PM

Plenary Paper Session #4 (1 hour 15 minutes, papers + discussion) (Main Auditorium) Moderator: George Phillips, University of Wisconsin-Madison

4:00 PM - 5:30 PM

NLM Showcase (Lister Hill Lobby, Building 38A)

Lobby Posters & Demos:

  • Lee Peters and Olivier Bodenreider: RxNav
  • Tom Rindflesch and Marcelo Fiszman: SemanticMedline
  • Lisa Forman and May Cheh: Genetics Home Reference
  • Marie Gallagher: Profiles in Science
  • Alan Aronson: MetaMap
  • Allen Browne and Chris Lu: Lexical Tools
  • Michael Ackerman: Video Medical Interpretation
  • Craig Locatis: Co-Location and Dispersion as Factors in Distance Education
  • Aurelie Neveol: Integrated Access to Disease Information Through PubMed
  • Rezarta Islamaj: Relationship Identification Model: How Do Medical Concepts Relate in Patient Records?
  • Jiao Li and Zhiyong Lu: Enriching and Integrating Drug Resources for Health Consumers
  • Jan Willis, Steve Emrick, and Patrick McLaughlin: UMLS Resources
  • Dianne Babski and Shana Potash: NLM Challenge
  • Donald Bliss and Amy Moran: Visualizing Cells and Viruses at Molecular Resolution: A Collaborative Project With The Laboratory for Cell Biology, NCI

Visitor Center:

  • Clem McDonald and Swapna Abhyankar: NLM Personal Health Record

7th Floor Conference Room:

  • (4:00) Dina Demner-Fushman: iMedline: Enhancing MEDLINE Citations With Relevant Figures and Images
  • (4:30) Rodney Long: Imaging Tools for Cancer Research
  • (5:00) Daniel Le: Machine Learning to Extract Bibliographic Data from Medical Articles
Evening
Dinner on your own

Thursday, June 30, 2011

Time / Agenda Item
7:30 AM - 8:15 AM
Breakfast
8:15 AM - 9:45 AM

175th Anniversary Informatics Careers Panel - 6 Former NLM Trainees (Main Auditorium) Moderator: Ted Shortliffe, President & CEO, American Medical Informatics Association

Panelists:

  • Joan Ash, OHSU
  • Atul Butte, Stanford University
  • Jim Cimino, NIH Clinical Center
  • John Glaser, Siemens Healthcare
  • Xinghua Lu, University of Pittsburgh
  • Hong Yu, University of Wisconsin – Milwaukee
9:45 AM - 10:00 AM
Coffee Break
10:00 AM - 10:45 AM

Open Mic Session #X3 - Bioinformatics & Translational Informatics (Main Auditorium) Moderator: Perry Miller, Yale University

  • Hughes - U Virginia: Historical Data Enhances Safety Supervision System Performance: In Silico Validation
  • Lu - NCBI: New Opportunities and Challenges in Literature Search
  • McDade - U Pittsburgh: Proteogenomics: Enhancement Through Careful Consideration of Biological Information
  • Nabavi - Harvard: CNV 'hot spots' and Breast Cancer Classification
  • Ovcharenko - NCBI: NCBI Dcode.org Toolkit for Comparative Genomics
11:00 AM - Noon
Closing Session and Speaker/Poster Awards
Dr. Valerie Florance

Presentation Abstracts

Using Informatics Tools to Study Obesity and Outcomes after Critical Illness

Swapna Abhyankar, Fiona M Callaghan, Dina Demner-Fushman, Clement J McDonald, National Library of Medicine

Abstract:

Obesity is associated with chronic diseases such as diabetes as well as higher healthcare costs. However, some studies have found that obesity may actually be protective during critical illness. There are no large studies on obesity and long-term outcomes after ICU hospitalization, or comparing outcomes of obese patients with and without chronic conditions.

We are using MIT's MIMIC-II database containing de-identified data on over 19,000 adult ICU patients. We mapped lab and other variables to standard vocabularies, e.g. LOINC, in order to analyze comparable data, and we used natural language processing to retrieve unstructured data.

Initial survival analysis using the Cox proportional hazard model of over 12,000 ICU subjects shows that after adjusting for age, gender, initial SAPS score, first ICU service, mechanical ventilation, and diabetes, obesity and overweight are protective during and after critical illness. Compared to the normal weight group, overweight, obese, and underweight patients had 0.75, 0.76, and 1.47 times the hazard of death a year after their last hospitalization (p < 0.0001), with a similar pattern for in-hospital mortality. Age, SAPS score, and ICU service were also predictive. We expect our results to be even more robust after adding additional data and covariates.
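The hazard ratios reported above come from a Cox proportional hazards model, in which each covariate's coefficient is the log of its hazard ratio relative to the reference group. A minimal sketch of that relationship, using only the hazard ratios quoted in the abstract (the coefficients shown are implied, not taken from the study):

```python
import math

# The abstract's reported hazard ratios relative to the normal-weight
# reference group in a Cox proportional hazards model.
hazard_ratios = {"overweight": 0.75, "obese": 0.76, "underweight": 1.47}

# In a Cox model, h(t|x) = h0(t) * exp(beta . x), so each category's
# coefficient is simply the log of its hazard ratio.
coefficients = {k: math.log(hr) for k, hr in hazard_ratios.items()}

for group, beta in coefficients.items():
    print(f"{group}: beta = {beta:+.3f} -> HR = {math.exp(beta):.2f}")
```

A negative coefficient (HR below 1) indicates a protective association; only the underweight group carries a positive coefficient here.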



Evidence-based Expert Routines in Medical Image Analysis from Gaze Recordings

Blake Anderson, Chi-Ren Shyu, Sanda Erdelez, Gerald Arthur, University of Missouri-Columbia

Abstract:

Medical experts frequently rely on scans from advanced imaging technologies. Interpreting a medical image is a complex, but semi-systematic procedure and an excellent target for identifying potential visual routines through image informatics. These visual routines derived from experts contain many clues about visual knowledge and its representation. By collecting eye tracking data through an inexpensive webcam-based gaze tracking method we are attempting to identify the routines of student and expert radiographers as they survey medical images. Through computational analysis of the results, we are providing insight into the behaviors and properties related to medical visual routines. We begin by identifying the signatures of visual routines as observed through eye tracking, and proceed to model these routines in a cognitive framework implemented with ACT-R. Discovering and reconstructing the visual processes associated with medical images will help us recognize and understand the tacit knowledge gained from extensive experience with medical imagery. These expert routines, once implemented in software, could potentially serve to reduce medical error, train new experts, and provide an understanding of the human visual system in diverse medical specialties including radiology, pathology and dermatology.



Improved Post-processing Strategies for Multi-dimensional Magnetic Resonance Spectroscopy (MRS)

Brian Burns, Albert Thomas, Alex Bui, University of California, Los Angeles

Abstract:

1H Magnetic resonance spectroscopy (MRS) is a powerful and non-invasive tool for measuring metabolic function and can detect up to 20 metabolites (0.5-10mM) that are commonly found in the human brain in vivo. Unfortunately, the clinical translation of MRS is difficult for two reasons: 1) the limited ability of current algorithms to sufficiently resolve different metabolite concentrations; and 2) the relatively long scan times required to obtain the imaging data. This research presents two methods to improve acquisition and signal resolution through the application of new computational techniques. Metabolites' resonance signals exist in a small spectral window (<500-1000Hz at 3T), and many exhibit complicated multiplet spectral structures caused by degenerate chemical shifts and J coupling. Due to these two major interactions, one-dimensional (1D) point resolved spectroscopy (PRESS) spectra are overcrowded and dominated by the singlets of N-acetylaspartate (NAA), creatine (Cre), and total choline (tCho). Two-dimensional (2D) J-resolved PRESS (JPRESS) and localized correlated spectroscopy (L-COSY) disentangle overcrowded spectra by spreading the spectral information into a second dimension. Metabolite concentrations from the 1D MR spectra are commonly estimated by peak height or area. In overcrowded spectra, such estimates are unreliable for singlets and impossible for overlapping multiplets. Therefore, to obtain objective estimates, prior knowledge-based spectral fitting methods, LC-Model for 1D and ProFit for 2D MRS, were developed. Our recent research has shown that the ProFit algorithm, when applied to the L-COSY and JPRESS sequences, facilitates more accurate and consistent assessment of metabolite concentrations in vivo than the LC-Model algorithm when applied to the 1D PRESS spectra. Unfortunately, the lengthy scan times for 2D MRS are clinically unacceptable because of the repetitive spectral encoding strategy to sample the second dimension.
Classic Fourier-based reconstruction techniques require linear sampling densities to cover the required spectral bandwidth. Yet spacing and sampling density need not be linear: non-Fourier reconstruction techniques, such as maximum entropy reconstruction (MaxEnt), can instead be used to reconstruct 2D spectra sampled with arbitrary sampling densities. By incorporating a priori knowledge into the sampling density, scan times can be greatly reduced by sampling known singlet and multiplet resonances. This approach can greatly decrease 2D scan times and provide a means to directly estimate the T2* decay of individual metabolites and reconstruct higher resolution images.



Towards Personalized Therapy for Ependymoma, A Pediatric Brain Tumor

Matthew Burstein1, Chris Man1, Rudy Guerra2, Ching Lau1

  1. Texas Children's Hospital, Baylor College of Medicine
  2. Rice University

Abstract:

Overall survival of patients with ependymoma (EPN), the third most common pediatric intracranial tumor, is poor despite modern advances in neurosurgical and radiotherapy techniques. This is because the biology of EPN is not well understood and no targeted therapy exists. In an attempt to stratify patients and identify novel drug targets, we profiled Copy Number Aberrations (CNAs) and mRNA expression of 118 EPN cases using array-based methods.

Unsupervised hierarchical clustering of CNAs revealed three distinct groups with different clinical outcomes. Membership in the cluster defined by whole arm gain of 1q multiplied the risk of disease progression or relapse by 3.94 (HR, P=2E-03) in a CoxPH model containing all available clinical data (P=6.43E-08). Further investigation into CNA-driven gene expression within these 1q gains revealed a parsimonious two gene signature that outperformed CNA-based stratification (HR=10.362, Coef P=4.4E-06, Model P=4.06E-08). This signature was validated in an independent set of 23 cases (HR=25.126, Coef P=0.0053, Model P=0.00239). Targeted therapy is available for these genes and is currently being tested on an orthotopic xenograft model of EPN.



Naïve EHR-Based Phenotype Identification for Rheumatoid Arthritis

Robert J Carroll, Joshua C Denny, Vanderbilt University

Abstract:

Electronic Health Records provide a broad patient pool for clinical and genomic research. These pools are limited by the rate of case identification and by the accuracy of that identification. Phenotype identification using informatics algorithms has been shown to replicate, in association studies, known genetic associations found in clinical trials. Increasing the accuracy of these methods would lead to an increase in power for studies utilizing them. Decreasing the expert evaluation and curation required for model generation and feature creation could allow for broader application of phenotype identification methods with respect to both institutions and research topics. This study shows that a naïve bag of words approach in conjunction with a Support Vector Machine is able to identify Rheumatoid Arthritis cases from an enriched pool of subjects with an AUC of 0.94. This is comparable to a previously published, retrained logistic regression model using pre-specified attributes.
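The bag-of-words representation the abstract relies on can be sketched in a few lines. This is an illustrative toy, not the authors' pipeline: the example notes and vocabulary handling are hypothetical, and a real system would feed vectors like these to a Support Vector Machine.

```python
from collections import Counter

# Minimal bag-of-words sketch: each clinical note becomes a fixed-length
# count vector over a shared vocabulary, suitable as SVM input.
def bag_of_words(note: str) -> Counter:
    # Lowercase and split on whitespace; a real pipeline would also strip
    # punctuation and possibly filter stop words.
    return Counter(note.lower().split())

def to_vector(bag: Counter, vocabulary: list) -> list:
    # Project the bag onto a fixed vocabulary so every note yields a
    # vector of equal length.
    return [bag[term] for term in vocabulary]

notes = [
    "rheumatoid arthritis with positive rheumatoid factor",
    "osteoarthritis of the knee, no inflammatory markers",
]
vocab = sorted({t for n in notes for t in n.lower().split()})
vectors = [to_vector(bag_of_words(n), vocab) for n in notes]
print(vectors[0])
```

The "naïve" in the abstract refers to exactly this: no expert-curated features, just raw token counts.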



Towards Early Recognition of Clinical Deterioration: Mining Nursing Documentation

Sarah A Collins, David Albers, David K Vawdrey, Columbia University

Abstract:

Background: Nurses alter their monitoring behavior as the clinical condition of a patient deteriorates. Moreover, nurses detect subtle changes and record concerns before trends in physiological measurement are apparent. We hypothesized that the presence of different types of nursing documentation (specifically, optional free-text comments in electronic health record flowsheets) might be useful to predict deterioration and mortality.

Methods: Using data-mining methods, we analyzed electronic nursing documentation for cardiac arrest patients in the 48 hours prior to arrest, and for a set of randomly selected control patients in the 48 hours after admission. The frequency of vital sign measurements and the number of comments recorded were compared in the two groups to identify associations between documentation and survival.

Results: There were 201 cardiac arrest patients and 15,089 control patients. Increased comment documentation was associated with increased mortality for arrest patients (p<0.01). More frequent vital sign documentation was associated with increased survival after arrest (p<0.05). The distribution of comments for the arrest and control populations differed, with a higher proportion of arrest patients having more comments (Kolmogorov-Smirnov test, p < 0.0001).
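The two-sample Kolmogorov-Smirnov statistic cited above is just the maximum vertical distance between two empirical CDFs. A minimal sketch, with hypothetical comment counts (the study's actual data are not reproduced here):

```python
# Two-sample Kolmogorov-Smirnov statistic: the largest gap between the
# empirical cumulative distribution functions of two samples.
def ks_statistic(sample_a, sample_b):
    points = sorted(set(sample_a) | set(sample_b))
    def ecdf(sample, x):
        return sum(v <= x for v in sample) / len(sample)
    return max(abs(ecdf(sample_a, x) - ecdf(sample_b, x)) for x in points)

# Hypothetical comment counts per patient: the arrest group skews higher.
arrest_comments = [4, 6, 7, 9, 12]
control_comments = [0, 1, 1, 2, 3]
print(round(ks_statistic(arrest_comments, control_comments), 2))
```

In practice one would use a library routine (e.g., SciPy's `ks_2samp`) to also obtain the p-value; the statistic itself is what separates the two comment distributions.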

Conclusion: Increased documentation frequency is associated with survival outcomes in hospital patients. Future work includes interventions to recognize and mitigate patient deterioration.



Development and Evaluation of Electronic Support Tools for Physician Handoff of Care

Justin M DeVoge, Kim E Brantley, Thomas E Perez, Matthew Bolton, Leigh Baumgart, Ellen J Bass, University of Virginia

Abstract:

Handover is a mechanism for transferring patient information, responsibility, and authority from one set of hospital caregivers to another. Electronic health records generally do not support such continuity of care activities. Thus we designed and implemented a custom resident handoff of care tool. Utilizing Keystroke-Level Models and surveys, we compared our custom handoff of care tool with that used prior to its introduction. Survey data compared resident preferences for information elements referenced when engaging in handover, when providing patient care, and those reported as rarely referenced. Keystroke-Level Models were developed for the tasks of adding a patient, editing an existing patient, discharging a patient, readmitting a previously discharged patient, and printing a patient list. For seven of the eight survey questions that directly compared the tools, the new custom system was preferred to the prior system. Model data predicted shorter task completion times with the custom tool for four of the five tasks. These results indicate that efficiencies in computer supported workflow can be realized in resident handover systems.
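A Keystroke-Level Model predicts task time by summing standard operator times over the sequence of physical and mental operators a task requires. The sketch below uses the classic Card, Moran & Newell operator estimates; the operator sequence shown is hypothetical, not one of the study's actual models.

```python
# Standard KLM operator times (seconds), per Card, Moran & Newell.
OPERATOR_SECONDS = {
    "K": 0.20,  # keystroke or button press
    "P": 1.10,  # point with mouse
    "H": 0.40,  # home hands between keyboard and mouse
    "M": 1.35,  # mental preparation
}

def klm_estimate(operators: str) -> float:
    # Total predicted task time is the sum of the operator times.
    return sum(OPERATOR_SECONDS[op] for op in operators)

# Hypothetical sequence for "add a patient": think, point, click, type 6 keys.
print(round(klm_estimate("MPK" + "K" * 6), 2))
```

Comparing such sums for the same task modeled on two interfaces is how the predicted completion-time differences between the custom and prior tools would be derived.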



Computer Assisted Update of a Consumer Health Vocabulary By Mining Social Network Data

Kristina M Doing-Harris, Qing Zeng-Treitler, Department of Biomedical Informatics, University of Utah

Abstract:

Background: Consumer health vocabularies (CHV) aid consumer health informatics applications. To continue to do so they must evolve with consumers' language.

Objective: Create a computer assisted update (CAU) system operating on live corpora to identify new terms for the Open Access Collaborative (OAC) CHV.

Methods: The CAU system has three main parts: Web crawler; candidate term filter utilizing natural language processing tools including term recognition methods; and human review interface. In evaluation, the CAU system was applied to the health-related social network website PatientsLikeMe.com. Utility was assessed by comparing the candidate term list generated to a list of valid terms hand extracted from the crawled web pages.

Results: The CAU system identified 88,994 unique terms in 300 crawled PatientsLikeMe.com web pages. The manual review identified 651 valid terms omitted from the OAC CHV or the Unified Medical Language System (UMLS) Metathesaurus (i.e., one valid term per 136.7 candidates). The term filter selected 774 candidates, of which 237 were valid terms, i.e., one valid term among every three or four candidates reviewed.
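The review-burden figures above follow directly from the counts reported, and can be checked with a few lines of arithmetic:

```python
# Checking the abstract's review-burden arithmetic: the raw crawl versus
# the filtered candidate list.
raw_terms, raw_valid = 88994, 651
filtered, filtered_valid = 774, 237
print(round(raw_terms / raw_valid, 1))      # candidates per valid term, unfiltered
print(round(filtered / filtered_valid, 1))  # candidates per valid term, filtered
```

The filter thus cuts the human reviewer's workload from roughly 137 candidates per valid term to about 3.3.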

Conclusion: The CAU system is effective for generating a list of candidate terms for human review during CHV development. The resulting system is linked off the consumerhealthvocab.org home page.



Evolutionary Meta-Analysis Reveals Ancient Constraints Affecting Missing Heritability and Reproducibility in Disease Association Studies

Joel T Dudley, Rong Chen, Sudhir Kumar, Atul J Butte, Stanford University

Abstract:

Genome-wide disease association studies contrast genetic variation between disease cohorts and healthy populations to discover single nucleotide polymorphisms (SNPs) and other genetic markers revealing underlying genetic architectures of human diseases. Despite many large efforts over the past decade, these studies are yet to identify many reproducible genetic variants that explain significant proportions of the heritable risk of common human diseases. Here, we report results from a multi-species comparative genomic meta-analysis of 6,720 risk variants for more than 420 disease phenotypes reported in 1,814 studies, which is aimed at investigating the role of evolutionary histories of genomic positions on the discovery, reproducibility, and missing heritability of disease associated SNPs (dSNPs) identified in association studies. We show that dSNPs are disproportionately discovered at conserved genomic loci in both coding and non-coding regions, as the effect size (odds ratio) of dSNPs relates strongly to the evolutionary conservation of their genomic positions. Our findings indicate that association studies are biased towards discovering rare variants, because strongly conserved positions only permit minor alleles with lowest frequencies. Using published data from a large case-control study, we demonstrate that the use of a straightforward multi-species evolutionary prior improves the power of association statistics to discover SNPs with reproducible genetic disease associations. Therefore, long-term evolutionary histories of genomic positions are poised to play a key role in reassessing data from existing disease association studies and in the design and analysis of future studies aimed at revealing the genetic basis of common human diseases.



Graphical Models for Integrating Multiple Sources of Genome-Scale Data

Daniel Dvorkin, University of Colorado, Brian Biehs, University of California, San Francisco, Katerina J Kechris, University of Colorado

Abstract:

Genome-wide information sources such as transcription factor binding data, gene expression data, and sequence-derived measures are used to identify binding regions and genes which are important to biological processes such as development or disease. These heterogeneous data can be difficult to use together, or integrate, due to the different biological meanings and statistical distributions of the various data types. However, each data type can provide valuable information for understanding the processes under study.

We present here a graphical mixture model approach for data integration. Model fitting is computationally efficient and produces results which have clear biological and statistical interpretations. The Hedgehog signaling pathway in Drosophila, which is critical in embryonic development, is used as an example. We show that genes in the pathway can be better identified with data integration than with any single data source, and compare results when using different data sources and model topologies.



Designing Effective Clinical Trials Using Simulations

Vincent A Fusaro, Prasad Patil, Chih-Lin Chi, Charles F Contant, Peter J Tonellato, Harvard Medical School

Abstract:

Randomized clinical trials are unsustainable in the era of personalized medicine due to the exponential number of combinations related to dosing, PK-PD response, demographic, phenotypic, and genomic information necessary for evaluating personalized treatment options. In order to realize the potential of personalized drug treatment new innovative high-impact computational strategies are needed to simulate outcomes of potential clinical trial designs in order to suggest those likely to succeed and avoid those likely to fail. Here, we report the creation and validation of a clinical trial simulation framework to model warfarin dosing and INR response to guide clinical trial design. We validated the framework by reproducing the results from the COUMA-GEN clinical trial and demonstrate that our simulation achieved the same primary endpoint - no significant difference between the two study arms. Furthermore, because our framework is modular, we also examined another dose adjustment protocol and can show that protocol exhibits a statistical difference between the two arms. We envision this framework will guide clinical trial planning and design by examining the likely outcomes of multiple algorithms and protocols prior to initiating the actual clinical trial.



Critical Finding Capture in the Impression Section of Radiology Reports

Esteban F Gershanik, Ronilda Lacson, Ramin Khorasani, Brigham and Women's Hospital, Harvard Medical School, Boston, MA

Abstract:

Introduction: The radiology report communicates imaging findings to the ordering physician. Due to the substantial amount of information included in the report, physicians often focus on the summarized "impression" section. This study evaluated documentation of a critical radiology finding (i.e. presence of a pulmonary nodule) in the "impression" section of a report, and describes how an automated application can improve documentation.

Methods: A retrospective review of all reports of chest CT scans performed at an academic institution from October 2009 to September 2010 was performed. A validated natural language processing (NLP) application was utilized to evaluate the frequency of reporting the presence of a nodule in the "impression" section of the report, in comparison to the "findings" section.

Results: 3,401 reports documented pulmonary nodules in the "findings" section. Among them, 2,162 were also documented in the "impression" section. The remaining 36.4% of nodules were not documented in the impression, but were detected by the NLP application.
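The reported percentage follows from the two counts in the results, as a quick check confirms:

```python
# Of 3,401 reports with nodules in "findings", 2,162 also mention them in
# "impression"; the remainder are the discrepant reports.
findings, impression = 3401, 2162
missing = findings - impression
print(missing, round(100 * missing / findings, 1))  # 1239 36.4
```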

Conclusions: The study revealed discrepant documentation in the "findings" versus "impression" sections, with over a third of "impressions" missing a critical finding. Utilizing an automated system can improve documentation of critical findings and promote more effective communication between radiologists and ordering physicians.



Reducing Variability in Co-Expression Network Construction

David L Gibbs, Armand Bankhead III, Oregon Health & Science University

Abstract:

Systems biology approaches, such as co-expression network analysis, have been shown to be useful in the interpretation of high throughput data. Co-expression analysis utilizes a similarity metric, such as correlation, to identify emergent modules of highly connected genes.

Unfortunately, computational limitations require that a subset of genes be used to construct the network. Furthermore, the resulting modules are highly sensitive to small changes in data. In this work, we present a feature selection method to overcome correlation network instability. A fitness function, taking subsets of genes as input, was minimized using the genetic algorithm "rgenoud". By constructing a set of validation networks using the leave-one-out methodology, network stability is quantified. The identified subset of genes significantly improved module stability compared to networks conventionally constructed with the most variable genes. Using a leukemia data set, a highly stable module was discovered consisting of 278 genes. The module was present in all validation networks, but was absent in networks constructed from the most variable genes. The module was useful for machine learning classification of leukemia subtype with an average accuracy of 96.68%. Using the DAVID database, the module was found to be enriched with cancer-associated pathways including Wnt signaling and apoptosis.
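The leave-one-out validation idea described above can be sketched at the level of a single network edge: rebuild the correlation with each sample removed, and call the edge stable only if it survives every rebuild. The data and the 0.8 threshold below are hypothetical, and the study's fitness function and genetic-algorithm search are not reproduced here.

```python
# Pearson correlation from scratch (no external libraries).
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def edge_is_stable(gene_a, gene_b, threshold=0.8):
    # Leave each sample out in turn; the edge must survive every rebuild.
    for i in range(len(gene_a)):
        a = gene_a[:i] + gene_a[i + 1:]
        b = gene_b[:i] + gene_b[i + 1:]
        if abs(pearson(a, b)) < threshold:
            return False
    return True

# Hypothetical expression profiles for two genes across five samples.
gene_a = [1.0, 2.1, 2.9, 4.2, 5.0]
gene_b = [1.2, 2.0, 3.1, 3.9, 5.1]
print(edge_is_stable(gene_a, gene_b))
```

A module-level version of this check, applied across all validation networks, is what quantifies the stability improvement the abstract reports.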



Automated Annotation of Electronic Health Records Using Computer-Adaptive Learning Tools

Glenn T Gobbel, Ruth Reeves, Theodore Speroff, Steven H Brown, Michael E Matheny, VA Tennessee Valley Healthcare System, and Vanderbilt University, Nashville, TN

Abstract:

Clinical records contain critical information for identifying risk factors and symptoms. Unfortunately, collecting this information requires time-consuming manual review and annotation. This study developed a naïve Bayes machine learning system and evaluated the feasibility of using it to automatically annotate medical documents.

After randomly dividing 2,860 medical discharge summaries into training and test sets, natural language processing (NLP) generated a reference standard by mapping document phrases to SNOMED-CT concepts. We then used the reference standard to iteratively train our naïve Bayes machine-learning program. The resulting system was evaluated for recall and precision on the test set containing 16,007 unique phrases mapping to 12,055 distinct SNOMED-CT concepts. Recall and precision were 56% and 88% after 10 training documents, 88% and 91% after 100 documents, and 99% and 96% after 1,430 documents. Processing speed was ~8,000 phrases per second.
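The core of a naïve Bayes annotator like the one described is word-count statistics per concept plus Laplace smoothing. This toy sketch is not the authors' system: the training pairs and labels are hypothetical, and real phrases map to SNOMED-CT concepts rather than two coarse classes.

```python
import math
from collections import Counter, defaultdict

# Hypothetical phrase-to-concept training pairs.
training = [
    ("chest pain", "symptom"),
    ("sharp chest pain", "symptom"),
    ("aspirin 81 mg", "medication"),
    ("daily aspirin", "medication"),
]

word_counts = defaultdict(Counter)
label_counts = Counter()
for phrase, label in training:
    label_counts[label] += 1
    word_counts[label].update(phrase.split())

vocab = {w for counts in word_counts.values() for w in counts}

def classify(phrase):
    best_label, best_score = None, -math.inf
    for label in label_counts:
        total = sum(word_counts[label].values())
        score = math.log(label_counts[label] / len(training))
        for w in phrase.split():
            # Laplace smoothing keeps unseen words from zeroing the score.
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

print(classify("chest pain on exertion"))
```

Because scoring is a single pass of count lookups per phrase, throughput on the order of thousands of phrases per second is plausible for this class of model.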

The measured accuracy, speed, and efficiency demonstrate the feasibility of the approach. Training on approximately 100 reference documents was needed to generate a highly reliable system. This approach could reduce manual review costs for collecting free text data, providing an efficient way to identify and test candidate risk factors. Such a system might also provide critical feedback during patient encounters.



Characterization of Macrophage Signatures in Induced Sputum of Severe Asthmatics

Jose L Gomez-Villalobos, Mario F Perez, Arron Mitchell, Geoffrey L Chupp, Yale University

Abstract:

Genome-wide study of the airway transcriptome in asthma has been a major challenge given the need for invasive sampling of the bronchial tree with bronchoscopy. Induced sputum, a safer, non-invasive strategy, has the potential to be used in large scale studies and provide further insights into the molecular signatures of asthma. Using a combination of publicly accessible data available in the Gene Expression Omnibus from samples obtained through bronchoscopy and expression microarray data from induced sputum samples obtained at the Yale Center for Asthma and Airways Disease, we enriched whole sputum microarray data for macrophage signatures in severe asthma. Our analysis reveals the presence of groups of genes involved in apoptosis, leukotriene pathways, and cytokine signaling pathways not previously identified in severe asthma. It also provides preliminary evidence of an association with DENND1B, a gene recently implicated in pediatric asthma, which may play a role in the severity of airflow obstruction in adults with asthma. Further validation of these findings is being performed by a combination of RT-PCR and protein quantification.



Computational Prediction and Experimental Verification of MAP Kinase Docking Sites

Elizabeth A Gordon,1,2,3 Vishal R Patel,2,3 Thomas C Whisenant,1,2,3 Robyn M Kaake,1 Lan Huang,1 Pierre Baldi,1,2,3 and Lee Bardwell1,2,3

  1. Department of Developmental and Cell Biology
  2. Institute for Genomics and Bioinformatics
  3. Center for Complex Biological Systems, University of California, Irvine

Abstract:

To understand signaling networks, new methods are needed to identify novel kinase substrates. Spatial regulation of MAP kinase signaling occurs at multiple levels: in addition to subcellular compartmentalization, MAP kinases make extensive use of docking and scaffolding interactions to bind their regulators and substrates. We have developed a hybrid computational search algorithm that combines machine learning and expert knowledge to identify novel MAP kinase docking sites (D-sites), and used this algorithm to search the human genome. Predictions were tested by peptide array, followed by rigorous biochemical verification with in vitro binding and kinase assays. We identified several new D-site-dependent MAPK substrates, including the hedgehog-regulated transcription factors Gli1 and Gli3, suggesting there may be a direct connection between MAP kinase and hedgehog signaling. This finding has potential translational relevance to pancreatic cancer, gastric cancer, melanoma, and several other tumor types. Another novel substrate we discovered is SMTNL2, a relatively poorly characterized member of the Smoothelin family. In the case of SMTNL2, we identified phosphorylation on residues in close proximity to the docking site and showed that it was MAPK-dependent in cell culture. In humans, SMTNL2 expression correlates with aerobic capacity and is downregulated in Duchenne muscular dystrophy (DMD). These and other new substrates are being further characterized in vivo using cell-based assays and fluorescent imaging methods.



Using Temporal Mining to Examine the Development of Lymphedema

Jason M Green, Sowjanya Paladugu, Bob R Stewart, Jane M Armer, Chi-Ren Shyu

Abstract:

In breast cancer survivors, one of the most common and debilitating treatment side effects is lymphedema (LE), a chronic disease that manifests primarily as excessive arm swelling. The onset of LE has been reported over a wide temporal range after treatment, with documented diagnoses occurring from soon after surgery to decades later. Early detection of and intervention in LE have been shown to be critically important, as the condition becomes chronic and irreversible in its more severe stages. One question of interest to LE researchers is whether there are temporal patterns of limb volume change that are common in the development of LE, as these may have significant clinical value.

An experiment was designed to investigate this, in which data from a group of 232 women with breast cancer were analyzed. Limb volume measurements were collected preoperatively, postoperatively, and at regular intervals over a 30-month observation period. A temporal mining algorithm was utilized to elucidate common patterns in limb volume changes, and results show reinforcement of existing themes in the literature as well as new findings. The patterns could be used to construct an early detection protocol for identifying those at risk of developing LE so that early intervention can be initiated.



Data Source Characteristics, Data Quality and Information Needs Related to Immunization: A Qualitative Study

Rebecca A Hills, Debra Revere, Blaine Reeder, William B Lober, University of Washington

Abstract:

Immunization Information Systems (IISs) collect, store and generate reports on immunization data. IISs and the data transferred to and from them represent one of the most common examples of regular clinical data transfer between health care providers and public health. However, in Washington State, IISs are rarely used by epidemiologists in public health practice.

The goal of this study is to characterize this rich data source within the framework of the information needs of public health practitioners. We conducted 15 interviews with 19 individuals working in a variety of roles in public health. All interviewees worked at either the state health department or a local health department in Washington and were identified by the researchers or their supervisors as having some interest in immunization information as part of their regular work. Interviews focused on the work and information needs of these public health practitioners pertaining to immunizations. Interviews were transcribed and analyzed qualitatively using thematic coding techniques. Analysis revealed significant information needs and data quality characteristics, including accessibility, bias, data format, data quality, granularity, and metadata availability. These themes were used to develop a hierarchical model of information quality.

This work brings us closer to a grounded understanding of the information needs of public health practitioners related to immunization. These results have the potential to inform decisions about systems to support information exchange between health care providers and public health.



Learning Disease Patterns from Medical Images: Applications to Alzheimer's Disease Research

Chris Hinrichs, Vikas Singh, Sterling Johnson, University of Wisconsin-Madison

Abstract:

Alzheimer's Disease (AD) affects over 5 million people in the United States, and is the most common form of age-related dementia. Because the brain is so robust to neuronal damage, neuropathology can precede outward signs of cognitive impairment by as much as several decades; yet once brain matter is lost, it is irrecoverable. Therefore, treatment must also precede clinical dementia. Non-invasive indicators such as MRI and PET scans, or cerebrospinal fluid (CSF) measures, can provide early clues, but the sheer volume of data, surprising levels of subject heterogeneity, and subtlety of the disease patterns (relative to normal aging) mean that traditional statistical group analyses have difficulty predicting at an individual level which patients are most likely to develop AD. In particular, our research focuses on combining multiple heterogeneous data modalities into a single integrated machine learning framework without compromising model complexity or generalizability.

Recent results have applied kernel methods such as Multi-Kernel Learning (MKL) to the problem, showing gains over single-modality methods. In addition, we have extended the MKL model to allow both prior-based and empirical modulation of interactions between kernels / modalities by means of non-isotropic norm regularization of kernel combination weights. Ongoing work includes applications to clinical trial enrichment and further development of multiple kernel methods.
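The core object of multi-kernel learning, a single kernel formed as a nonnegative weighted combination of per-modality kernels, can be sketched as follows; the kernels and weights here are hypothetical illustrations, not the authors' learned values.

```python
# Combined kernel as a nonnegative weighted sum of per-modality kernels,
# the basic object an MKL solver optimizes over (weights fixed by hand here)
def combine(kernels, weights):
    assert all(w >= 0 for w in weights)  # keep the result a valid kernel
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels))
             for j in range(n)] for i in range(n)]

K_mri = [[1.0, 0.2], [0.2, 1.0]]   # hypothetical MRI similarity kernel
K_pet = [[1.0, 0.6], [0.6, 1.0]]   # hypothetical FDG-PET similarity kernel
K = combine([K_mri, K_pet], [0.5, 0.5])
```

An MKL solver would additionally learn the weights, subject to the regularization on them that the abstract describes.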



Proteomics-Scale Mathematical Modeling of Translation During Yeast Osmotic Stress

Shane L Hubler, M Violet Lee, Scott Topper, Audrey Gasch, Joshua J Coon, University of Wisconsin-Madison

Abstract:

Yeast is well suited to a systems biological approach to studying how living cells adapt to environmental stresses. To model how cells respond to non-fatal stress, we performed salt-stress experiments on yeast. The experiment was performed as a biological triplicate on a 6-point time series (0, 30, 60, 90, 120, and 240 minutes). We used DNA microarrays and combined high mass accuracy mass spectrometry with isobaric tagging on the entire transcriptome/proteome to quantify the yeast's transcriptional and translational responses, respectively. Here we report four increasingly complex mathematical models we developed to describe protein translation, applied to these yeast experiments. The modeling suggests that a complex coordination of transcription, translation, and cell division enabled yeast to adapt to the environmental stress in a two-step process. In particular, we found that down-regulation of transcripts was not done to reduce the corresponding protein levels, as might be expected, but instead to increase the translational resources available to the up-regulated, stress-response transcripts. Indeed, down-regulated transcripts did not correlate with protein levels. We also show that transcript abundance is a poor substitute for protein abundance unless one takes into account changing translation rates (one during cell-cycle arrest and another during the post-stress response) and cell growth rates.



A System to Translate Physician Actions into Decision Support

Jeffrey Klann, Stephen Downs, Peter Szolovits, JT Finnell, Matthew Palakal, Gunther Schadow, Regenstrief Institute, Inc.

Abstract:

Nationwide, physician computer decision-support systems, which have been frequently shown to reduce costs and improve care quality, are underutilized and poorly maintained. However, this research shows that the collective wisdom of physicians can supplement this manually-maintained decision support. Collective wisdom is captured but untapped in electronic health records and order entry systems. I have developed new methods to rapidly find and learn domain-specific Bayesian networks from order-entry data and test results. Once learned, these networks can be used to dynamically make treatment suggestions based on the current context of the patient.

This work first uses principles from voting theory to explore why aggregated treatment data does, in fact, frequently constitute wisdom. Then, using order-entry data from the county hospital, it demonstrates an overall system that computes treatment suggestions with these Bayesian network methods, and it shows that the system predicts what physicians actually do with high accuracy in several domains.



The Feasibility of Automating Audit and Feedback for ART Guideline Adherence in Malawi

Zach Landis Lewis, Claudia Mello-Thoms, Oliver J Gadabu, Miranda Gillespie, Gerald P Douglas, Rebecca S Crowley, University of Pittsburgh

Abstract:

Objective: To determine the feasibility of using electronic medical record (EMR) data to provide audit and feedback of Anti-Retroviral Therapy (ART) clinical guideline adherence to health care workers (HCWs) in Malawi.

Materials and Methods: We evaluated recommendations from Malawi's ART guideline using GuideLine Implementability Appraisal (GLIA) criteria. Recommendations that passed selected criteria were converted into ratio-based performance measures. We queried representative EMR data to determine the feasibility of generating feedback for each performance measure, summed clinical encounters representing each performance measure's denominator, and then measured the distribution of encounter frequency for individual HCWs across nurse and clinical officer (CO) groups.

Results: We analyzed 423,831 encounters in the EMR data and generated automated feedback for 21 recommendations (12%) from Malawi's ART guidelines. We identified 11 nurse recommendations and 8 CO recommendations. Individual nurses and COs had an average of 45 and 59 encounters per month, per recommendation, respectively. Another 37 recommendations (21%) would support audit and feedback if additional routine EMR data are captured and temporal constraints are modeled.

Discussion: It appears feasible to implement automated guideline adherence feedback for key, high-impact recommendations. Feedback reports may support workplace learning by increasing HCWs' opportunities to reflect on their performance.

Conclusion: A moderate number of recommendations from Malawi's ART guidelines can be used to generate automated guideline adherence feedback using existing EMR data. Further study is needed to determine the receptivity of HCWs to peer comparison feedback and barriers to the implementation of automated audit and feedback in low-resource settings.



A Partitioning Based Adaptive Method for Robustly Removing Irrelevant Features

Guodong Liu, Lan Kong, Vanathi Gopalakrishnan, University of Pittsburgh

Abstract:

Most high-dimensional biomedical datasets contain many features irrelevant to the target. Irrelevant features often lead to an intractably large model space and introduce noise that severely hinders efforts to build robust prediction models. While feature selection methods can be used for irrelevant feature removal, they are prone to losing relevant features. We propose a novel method, the Partitioning-based Adaptive Irrelevant Feature Eliminator (PAIFE), for irrelevant feature removal. PAIFE evaluates feature-target relationships not only over a whole dataset but also over partitioned subsets, and is extremely effective in identifying features whose relevance to the target is conditioned on other features. PAIFE adaptively employs the most appropriate evaluation strategy, statistical test, and parameter instantiation, depending on the characteristics of different datasets/subsets. We envision PAIFE being used as a data pre-processing tool for dimensionality reduction over high-dimensional datasets. Experiments on synthetic datasets showed that PAIFE consistently outperformed state-of-the-art feature selection methods in removing irrelevant features while retaining features relevant to the target. Experiments on eight genomic datasets also demonstrated that PAIFE was able to remove significant numbers of irrelevant features from real datasets. Classification models constructed from the retained features either matched or beat classification models using all features.
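The central intuition, that a feature can look irrelevant on the whole dataset yet be strongly relevant within partitions defined by another feature, can be illustrated with a toy sketch (hypothetical data and thresholds, not the published algorithm):

```python
import random

def pearson(xs, ys):
    # Pearson correlation: the kind of feature-target relevance score
    # evaluated both on the whole set and on partitioned subsets
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs) ** 0.5
    vy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

random.seed(0)
# Hypothetical data: x predicts y only conditional on a partitioning feature z
z = [random.randint(0, 1) for _ in range(400)]
x = [random.random() for _ in range(400)]
y = [xi if zi == 1 else 1 - xi for xi, zi in zip(x, z)]

whole = abs(pearson(x, y))          # near zero: x looks irrelevant globally
sub = max(abs(pearson([xi for xi, zi in zip(x, z) if zi == s],
                      [yi for yi, zi in zip(y, z) if zi == s]))
          for s in (0, 1))          # strong relevance within each partition
keep = sub > 0.5                    # a partitioned test retains the feature
```

A whole-dataset filter would discard x; evaluating within partitions reveals its conditional relevance.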



A Qualitative Study of Multiple Perspectives on Electronic Patient Data Used for Clinical Decision Support in Hospitals and Clinics

James L McCormack, Joan Ash, Oregon Health & Science University

Abstract:

Clinical Decision Support (CDS) systems, accessed through Electronic Health Records and Computerized Provider Order Entry, have great potential to reduce medical errors, increase the quality and consistency of care, and help reduce healthcare costs. This study is a secondary analysis of qualitative field data collected over four years by researchers based at Oregon Health & Science University. The Provider Order Entry Team (POET) used participant observation, semi-structured interviews, and extensive field notes to explore the perceptions of CDS by purposively selected informants in two academic and five community hospitals, two outpatient clinics, and one Veterans Administration hospital. The author conducted a grounded theory analysis of a subset of these data to further explore one major theme from the earlier analysis: "CDS requires a foundation of high quality and accessible patient data." Preliminary results identified several new sub-themes, including the ways that ownership, burden of entry, ease of access, interoperability, and the representation of clinical data affect the perceptions of CDS by clinical users and other stakeholders. Ethnographic methods are a useful approach for informatics research and evaluation. This analysis extends the work of POET to develop a deeper understanding of the importance of clinical data in users' acceptance and use of CDS.



Temporal Modulation of Olfactory Signals From Dendrodendritic Synaptic Clusters

Thomas S McTavish, Michele Migliore, Michael Hines, Gordon Shepherd, Perry Miller, Yale University

Abstract:

On their lengthy lateral dendrites, mitral cells of the olfactory bulb form dendrodendritic synapses with clusters of granule cell interneurons. With biologically realistic computational models of the mitral-granule circuit, we explored the functions of these clusters and of the spatial arrangements of the cells in the propagation of signals through the olfactory bulb. We found that clusters are most effective at modulating signals when they are proximal to the somas of mitral cells. We also found that synchrony, which may be important for downstream processing, is enhanced between mitral cells when they share clusters, and that asymmetric clustering results in relative phase shifts between mitral cell spike trains. Therefore, our model predicts particular connectivity between mitral pairs if their spikes exhibit phase patterning. In addition to exploring the role of clusters, we quantified the functional effects of synaptic weights and virtual odor concentrations: stronger synaptic weights increase synchrony, and relative odor concentrations can also induce a shift in the phase response between mitral cell spike trains. Collectively, our results indicate that the spatial locations of dendrodendritic synaptic clusters in the mitral-granule circuit modulate the timing of mitral cell spikes.



Behavioral Medicine Perspectives on a Clinical Decision Support System for Chronic Pain

Amanda M Midboe, Eleanor T Lewis, Ruth C Cronkite, Mary K Goldstein, Jodie A Trafton, VA Palo Alto Healthcare System, and Stanford University School of Medicine Department of Psychiatry and Behavioral Sciences

Abstract:

Development of clinical decision support systems (CDSSs) has tended to focus largely on use in medical settings, facilitating decision-making by physicians to enhance medication management and support diagnosis. However, an understanding of behavioral medicine perspectives on the usefulness of CDSS for patient care can expand CDSSs to improve management of chronic disease. The current study explores feedback from behavioral medicine providers regarding the potential for CDSSs to improve decision-making, care coordination, and guideline adherence in pain management.

We performed qualitative analysis of semi-structured interview responses from 16 behavioral medicine stakeholders (9 psychologists, 3 pharmacists, 2 primary care physicians, and 2 nursing leaders), following demonstration of an existing CDSS for opioid prescribing, ATHENA-OT. Respondents suggested that a CDSS could assist with decision making by educating providers, providing recommendations about behavioral therapy, facilitating risk assessment, and improving referral decisions. They suggested that a CDSS could improve care coordination by facilitating division of workload, improving patient education, and increasing consideration and knowledge of options in other disciplines. Clinical decision support systems are promising tools for improving behavioral medicine care for chronic pain.



Semantic Annotation and Analysis of Neuroimaging Data through Web Services

Nolan Nichols, Daniel Rubin, Jim Brinkley, University of Washington

Abstract:

Brain imaging datasets are increasingly being shared publicly, and researchers need to mine these large-scale datasets to better understand how brain function and structure relate to different cognitive tasks across participant populations. For example, a dataset may contain the derived statistical images produced by the analysis of a functional magnetic resonance imaging (fMRI) study examining differences between schizophrenic patients and normal controls. Rather than requiring researchers to download these statistical images to explore locally, we developed the Image Annotation Service (IAS) to expose voxel-level information for semantic query through web services. The IAS processes statistical images registered in a standard spatial coordinate system where each voxel corresponds to an anatomical entity that is labeled in a brain atlas. In previous work these labels were manually curated into the Foundational Model of Anatomy (FMA) ontology, providing the means to automatically generate semantic annotations of statistical fMRI data in the Annotation and Image Markup (AIM) format. In response to a web service query, the IAS generates an AIM file in XML and populates it with statistical values, coordinates, and FMA identifiers for each region defined by the query. Future work will extend the IAS to enable large-scale semantic query and analysis of neuroimaging data.



Environment-Wide Association Studies (EWAS) on Serum Lipid Levels

Chirag J Patel, Mark R Cullen, John PA Ioannidis, Atul J Butte, Stanford University

Abstract:

Both genetic and environmental factors contribute to complex disease-associated phenotypes, such as lipid levels. While genome-wide association studies (GWAS) systematically test genetic factors against disease, testing and reporting one or a few environmental factors at a time leads to a fragmented literature on environmental correlates. We systematically evaluated environmental correlates of risk factors for heart disease (HDL cholesterol, LDL cholesterol, and triglycerides) with an environment-wide association study (EWAS). We utilized four independent cohorts from the CDC/NHANES survey, comprising samples representative of the US population (N=500-7,000). We associated 322 unique environmental factors directly measured in blood and urine with lipid levels while adjusting for other risk factors. We controlled for multiple comparisons by estimating the false discovery rate, and significant findings were validated across independent cohorts. In sum, we identified and validated up to 22 environmental factors associated with lipid levels. For example, we found that markers of air pollution, nicotine, pesticides, and other industrial contaminants were associated with unfavorable lipid levels. On the other hand, nutrient and vitamin markers, such as enterolactone and vitamin D, were associated with favorable lipid levels. EWAS offers a way to comprehensively and systematically screen, validate, and report associations of environmental factors at large scale.
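The multiple-comparison control described above can be sketched with the standard Benjamini-Hochberg false discovery rate procedure; the p-values below are hypothetical, not the study's results.

```python
def benjamini_hochberg(pvals, q=0.05):
    # declare significant the largest prefix of sorted p-values
    # satisfying p_(k) <= q * k / m
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            k = rank
    return {order[j] for j in range(k)}  # indices of significant factors

# hypothetical p-values from per-factor regressions of lipid level on exposure
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
significant = benjamini_hochberg(pvals, q=0.05)
```

Factors surviving the FDR threshold would then be carried forward for validation in the independent cohorts.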



Characterization of Disease Time Course using ICD9 Codes

Adler Perotte, George Hripcsak, Columbia University

Abstract:

Background: This work is part of an effort to create data-driven and computable representations for the time course of a large number of conditions automatically. In this study, we exploited ICD9 code documentation because it is ubiquitous and generalizable to other sites, although we recognize the limitation that it is used for reimbursement.

Methods: We studied chronicity in two models. In the first model, we evaluated the differential entropy of documentation patterns across the population for each condition. This was done with the hypothesis that there would be more uncertainty in the documentation patterns of chronic conditions. In the second model, we aggregated ICD9 code time series across the population for each condition and inspected density estimates for these aggregated time series.

Results: In the first model, we used differential entropy as a test for chronic conditions, and it had an ROC AUC of 0.89 (CI 0.81-0.94) as compared to a clinician reviewer. It was observed that chronic conditions have higher entropy than their acute counterparts. In the second model, several characteristic temporal profiles were revealed, including permanent, chronic, acute, and refractory.

Conclusion: We have demonstrated two methods, at different granularities, for characterizing the time course of conditions.
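The entropy-based chronicity test in the first model can be sketched as follows (hypothetical documentation times, not the study's ICD9 data): codes for chronic conditions recur over years and spread across more histogram bins, so they score higher entropy than codes for acute conditions, which cluster shortly after onset.

```python
import math
from collections import Counter

def entropy_bits(times, bins=10, lo=0.0, hi=36.0):
    # Shannon entropy (bits) of a histogram of documentation times (months)
    counts = Counter(min(int((t - lo) / (hi - lo) * bins), bins - 1)
                     for t in times)
    n = len(times)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

acute = [1, 1, 2, 2, 2, 3, 1, 2]          # codes cluster shortly after onset
chronic = [1, 5, 9, 14, 18, 22, 27, 33]   # codes recur across years
is_chronic = entropy_bits(chronic) > entropy_bits(acute)
```

Thresholding such an entropy score against clinician review is what yields the ROC curve reported in the Results.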



Implementing a Generalized Pipeline to Analyze Open Source Microarray Data

Sharanya Raghunath, Lewis J Frey, Department of Biomedical Informatics, University of Utah

Abstract:

In recent immunotherapy clinical trials, the use of dendritic cell (DC)-based vaccines has shown promising results. Most immunology researchers use bench techniques to characterize the effectiveness of an immune response. Microarray experiments are crucial to understanding the immune system at a molecular level; however, the lack of integrated models limits researchers from fully utilizing the data. The goal of our computational approach is to transform microarray data sets into representations that have applicability at the wet bench. To understand the mechanism of an immune response, we consolidated 125 microarray data sets from the Gene Expression Omnibus. We will analyze the data using a combination of tools to perform cross-validation, classification, and pathway analysis. Finally, the R statistics package will be used to generate graphical representations. Because of the complexity of each step, our lab developed a software package that is flexible and generalizable to various studies. ML-Flex is a unified platform that can implement algorithms from different software packages. By generating a pipeline to pre-process microarray data, we can carry out extensive machine learning analysis in ML-Flex. Our methods will offer a generalized approach for analyzing microarray data in order to develop experimental models for bench researchers.



Design and Implementation of Asynchronous Messaging in a Medical Record System

Zeshan A Rajput, Dieterich Lawson, Jeremy Keiper, Paul Biondich, Indiana University and Regenstrief Institute, Indianapolis

Abstract:

In many parts of the developing world, cellular networks provide the only reliable communications infrastructure. To support initiatives targeting both community-based health workers as well as patients, we created a system to support asynchronous communication within an open source medical record system. The system was designed using currently existing communications libraries and supports e-mail, SMS, and Twitter. SMS functionality is provided either by a tethered modem or through SMPP to a telecommunications provider.

One implementation of the system provides two-way communication between users of the medical record system and patients. A second implementation uses automated routines to monitor the quality of data within satellite medical record system installations and delivers timely notifications to centrally located system administrators. Both implementations are currently in progress at the USAID-AMPATH Partnership in Eldoret, Kenya. Future work will focus on registering patients in the medical record system from remote locations, on providing alerts to patients and community health workers based on the occurrence of predetermined notifiable conditions within the medical record system, and on facilitating limited information retrieval from the medical record system at remote locations.



Visual Science Informatics of CLABSI for Decision Making - Evaluation Study

Yair G Rajwan, Pamela W Barclay, Theressa Lee, I-Fong Sun, Catherine Passaretti

Johns Hopkins University, Maryland Health Care Commission, Johns Hopkins Bayview Hospital

Abstract:

The purpose of this study was to evaluate information visualizations of central line-associated bloodstream infection (CLABSI) outcome data for decision making by health care consumers and practitioners. In this presentation, we display several options for public visualization of CLABSI data and describe observations from two focus groups. Survey results collected from the two focus groups are presented; these were used to develop the final recommendations for how to visualize publicly reported CLABSI data from Maryland acute care hospitals. Finally, we summarize recommendations and conclusions on presenting CLABSI outcome data based on our evaluation study.



Zernike Phase Contrast Cryo-EM Reveals the Portal Complex in Herpes Virus

Ryan H Rochat, Xiangan Liu, Kuniaki Nagayama, Frazer Rixon, Wah Chiu, Baylor College of Medicine

Abstract:

Herpes simplex viruses cause some of the oldest documented human diseases, with descriptions of the infections dating back more than 5,000 years. While de novo atomic-resolution structural studies of icosahedrally arranged viral proteins, like the herpes simplex virus type 1 (HSV-1) capsid, have become routine in cryo-electron microscopy (cryo-EM), resolving the non-icosahedral components of these viruses (e.g., the genome packaging apparatus) is non-trivial. The low-contrast images characteristic of the low-dose imaging conditions in conventional cryo-EM make identifying and aligning these non-icosahedrally arranged components a technical obstacle.

However, using a newly developed Zernike Phase Contrast electron cryo-microscope in Japan and a suite of novel algorithms recently developed in our laboratory, we have been able to resolve the structure of the HSV-1 portal in the context of the capsid shell for the first time. This work directly removes uncertainty over the location and stoichiometry of the herpes virus portal. As HSV-1 is by far the most prevalent form of herpes in humans, with global estimates of latent infection in adults reaching nearly 90%, a detailed understanding of the proteins involved in dsDNA loading and release upon infection may lead to potential targets for future therapies directed against HSV-1 infection.



Transcriptional Regulation of Mammary Gland Development as a Model for Breast Cancer

Michael L Salmans, Padhraic Smyth, Bogi Andersen, University of California, Irvine

Abstract:

Mammary gland branching morphogenesis is driven by terminal end buds (TEBs), stem cell-enriched spherical structures at the ends of the growing ducts whose proliferative and invasive nature makes them an excellent model for studying oncogenic mechanisms. To study the role of transcriptional regulation during mammary gland development, we generated a mouse model expressing a dominant-negative Co-factor of LIM (CLIM) protein, a transcriptional co-regulator required for branching morphogenesis. We performed a time-course microarray analysis to characterize gene expression profiles from TEB and duct cells during four developmental stages of puberty. Through this analysis we have gained insights into (a) the transcriptional networks involved in normal mammary gland development; (b) the gene expression profiles that characterize TEB and duct cells; and (c) the correlation between breast cancer gene signatures and TEB and duct gene signatures. We found a high correlation of the TEB gene signature with aggressive breast cancers, suggesting that while the TEB proliferates and invades in a controlled manner, it has cancer-like properties. We also identified the genes regulated by CLIM that are required for branching morphogenesis. Interestingly, CLIM is a direct transcriptional regulator of Her2, Her3, and Fgfr2, which are essential signaling proteins in mammary gland development and carcinogenesis.



Wireless Data Collection of Self-Administered Surveys using Tablet Computers

Kyle W Singleton, Mars Lan, Corey Arnold, Lisa Arangua, Alex AT Bui, Lillian Gelberg, University of California, Los Angeles

Abstract:

The accurate and expeditious collection of survey data by coordinators in the field is critical to supporting clinical research studies. Early methods using paper documentation have slowly evolved into electronic capture systems; tools such as REDCap illustrate this transition. However, many current systems are tailored to web browsers running on desktop or laptop computers, requiring keyboard and mouse input. We present a system that uses a touchscreen interface running on a tablet PC, designed with consideration for portability, limited screen space, wireless connectivity, and potentially inexperienced or poorly literate users. The system consists of an online survey designer, a survey client, and administrative tools to manage subject enrollment and follow-up with survey participants. The framework was developed using C#, ASP.NET, and SQL Server by multiple programmers over the course of the last year. Working with UCLA Family Medicine, we have deployed the system for data collection in a group of Los Angeles-area clinics for a study on drug addiction and treatment. We present the overall architecture, design considerations, and initial feedback from users in the field.



Natural Language Processing of Patient Text Messages in a Medication Management System

Shane P Stenner, Joshua C Denny, Kevin B Johnson, Vanderbilt University

Abstract:

Mobile technologies provide a platform for electronic patient-directed medication management and an opportunity to implement guideline-based support for patients. MyMediHealth (MMH) is a medication management system that includes a medication scheduler, a medication administration record, and a reminder engine that sends short message service (SMS) text messages to cell phones. Although this system is well designed for scheduled medications, it requires unprompted text-based communication to record the administration of certain medications, such as those that are taken on an as-needed basis. Unprompted text-based communication with patients using natural language could engage patients in their healthcare but presents unique natural language processing (NLP) challenges.

We developed a new functional component of MMH, the Patient SMS Text Tagger (PaSTe). The PaSTe web service uses NLP techniques, custom lexicons, and existing knowledge sources, such as the National Library of Medicine's RxNav, to extract and tag medication concepts from patient text messages. An early prototype of the system was assessed using sample messages from test users. Output of the system was compared with manually tagged messages. We report on the precision and recall of PaSTe for extracting and tagging medication concepts from patient messages. Future improvements, evaluation, and expected use of PaSTe are discussed.
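As a rough illustration of lexicon-based concept tagging of patient messages (a simplified sketch with an invented mini-lexicon, not the PaSTe implementation or its RxNav integration):

```python
import re

# Hypothetical mini-lexicon standing in for terms drawn from NLM knowledge
# sources; the real PaSTe service uses far richer lexicons and NLP.
LEXICON = {
    "ibuprofen": "MEDICATION",
    "advil": "MEDICATION",
    "tylenol": "MEDICATION",
}

# Simple dose pattern: a number followed by a unit.
DOSE_RE = re.compile(r"\b(\d+(?:\.\d+)?)\s*(mg|ml|mcg)\b", re.IGNORECASE)

def tag_message(text):
    """Return (concept, tag) pairs found in a patient SMS message."""
    tags = []
    for match in DOSE_RE.finditer(text):
        tags.append((match.group(0), "DOSE"))
    for token in re.findall(r"[a-zA-Z]+", text.lower()):
        if LEXICON.get(token) == "MEDICATION":
            tags.append((token, "MEDICATION"))
    return tags

print(tag_message("took 200 mg advil for my headache"))
```

Output of a real tagger would of course depend on normalization, misspelling handling, and context that this sketch omits.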



Graph Theory for Automatic Summarization of Biomedical Text

Han Zhang, Marcelo Fiszman, Dongwook Shin, Thomas C Rindflesch, National Library of Medicine

Abstract:

Semantic MEDLINE manages the results of PubMed searches by summarizing predications extracted from citations with SemRep. Results are presented to the user as a graph of interconnected predications in which nodes represent arguments and arcs represent predicates. Currently, summarization depends on predetermined schemas and is not effective for large graphs. We generalized the method to graphs of any size on any biomedical topic using the graph-theoretic constructs of node degree, which is the number of arcs linked to a node, and cliques, structures in which each node is linked to every other node in the structure. Both are measures of connectedness, and we assume they contribute substantially to the most informative aspects of a summary. After eliminating predications with frequency of occurrence and node degree below a computed threshold, a hierarchical clustering algorithm grouped cliques into the most salient aspects of the final summary. System results covering eleven topics, including diseases, physiologic processes, and substances, were compared to a baseline determined by computing the silhouette coefficient. Clusters were evaluated by measuring cohesion within the individual clusters and separation among the clusters, and also by comparing each cluster to MeSH terms assigned to the citations from which cluster predications were extracted.
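The two graph-theoretic constructs named above can be made concrete with a small pure-Python sketch (a toy predication graph with invented nodes, not the Semantic MEDLINE code): node degree is the size of each neighbor set, and maximal cliques can be enumerated with the classic Bron-Kerbosch algorithm.

```python
def degrees(graph):
    """Node degree: the number of arcs incident on each node."""
    return {n: len(nbrs) for n, nbrs in graph.items()}

def bron_kerbosch(R, P, X, graph, cliques):
    """Enumerate maximal cliques: sets in which every node links to every other."""
    if not P and not X:
        cliques.append(sorted(R))
        return
    for v in list(P):
        bron_kerbosch(R | {v}, P & graph[v], X & graph[v], graph, cliques)
        P.remove(v)
        X.add(v)

# Toy predication graph: nodes are arguments, arcs stand in for predicates.
graph = {
    "aspirin": {"inflammation", "pain", "cox2"},
    "inflammation": {"aspirin", "pain", "cox2"},
    "pain": {"aspirin", "inflammation"},
    "cox2": {"aspirin", "inflammation"},
}

print(degrees(graph))
cliques = []
bron_kerbosch(set(), set(graph), set(), graph, cliques)
print([c for c in cliques if len(c) >= 3])
```

On this toy graph the two maximal 3-cliques share the high-degree nodes, matching the intuition that connectedness marks the most informative parts of a summary.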



Poster Abstracts

Day 1 Group

Automated Plan-Recognition of Chemotherapy Protocols

Haresh Bhatia, Mia Levy, Vanderbilt University

Abstract:

Cancer patients are often treated with multiple sequential chemotherapy protocols ranging in complexity from simple to highly complex patterns of multiple repeating drugs. Clinical documentation procedures that focus on details of single drug events, however, make it difficult for providers and systems to efficiently abstract the sequence and nature of treatment protocols. We have developed a data-driven method for cancer treatment plan recognition that takes as input pharmacy chemotherapy dispensing records and produces the sequence of identified chemotherapy protocols. Compared to a manually annotated gold standard, our method was 75% accurate and 80% precise for a breast cancer testing set (110 patients, 1,827 drug events). This method for cancer treatment plan recognition may provide clinicians and systems an abstracted view of the patient's treatment history.
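The general idea of mapping dispensing records to protocol names can be sketched as follows (an illustrative greedy matcher with hypothetical protocol definitions; the authors' actual method and protocol knowledge base are not described here):

```python
# Hypothetical protocol templates: each protocol is a list of per-cycle drug
# combinations. Real regimens encode doses, days, and repeat counts.
PROTOCOLS = {
    "AC": [{"doxorubicin", "cyclophosphamide"}],
    "T":  [{"paclitaxel"}],
}

def recognize(dispensing_events):
    """Greedily map a sequence of per-day drug sets to protocol names,
    collapsing repeated cycles of the same protocol."""
    plan = []
    for day_drugs in dispensing_events:
        match = next((name for name, cycles in PROTOCOLS.items()
                      if day_drugs in cycles), None)
        if match and (not plan or plan[-1] != match):
            plan.append(match)
    return plan

events = [
    {"doxorubicin", "cyclophosphamide"},
    {"doxorubicin", "cyclophosphamide"},   # repeated AC cycle
    {"paclitaxel"},
]
print(recognize(events))  # ['AC', 'T']
```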



An Ontological Exploration of Medication Propositions in Clinical Records

Shahim Essaid, Oregon Health and Science University

Abstract:

Medication entries in clinical records play a central role during medication reconciliation. However, medication lists and medication coding systems frequently do not capture the full meaning of these entries in order to support such processes. This compromises our ability to computationally reason over these facts and this is a challenge for developing automated tools to support medication reconciliation.

Current coding systems provide adequate referential semantics (i.e. word or code meaning) for medications but we lack a clear understanding of the propositional semantics (i.e. sentence meaning) in which these coding systems are being used. We propose a theoretical approach to identify and categorize the various propositions being made in reference to medications, within clinical records. Speech act and social ontology theories provide us with an understanding that allows us to explicitly specify the types, structures, and meanings of medication related propositions in clinical systems. A pilot study is planned where authors and records of such propositions from various clinical settings will be observed and the various propositions noted will then be confirmed and clarified through unstructured interviews. Our goal is to identify, analyze, and categorize these propositions, with an attempt to propose a novel, but grounded, systematic representational scheme.



Evaluation of a Simulation Based Intervention for Individuals with Diabetes

Bryan Gibson1,2, Michael Lincoln1, Matthew Samore1,2, Nancy Staggers1, Charlene Weir1,2

  1. Salt Lake City George E. Whalen VA Medical Center, Salt Lake City UT
  2. Department of Biomedical Informatics, University of Utah, Salt Lake City, UT

Abstract:

Self-management is a complex task for patients with Type 2 Diabetes Mellitus (T2DM). Prior evidence indicates that few people with T2DM use physical activity as a self-management strategy. In addition, many individuals with T2DM have low outcome expectancy regarding the effect of physical activity on glycemia.

Simulation is a powerful mechanism for increasing knowledge and motivation. Using simulated glucose curves, we developed two versions of a short animated film. The "control" version of the film presents key concepts in diabetes self-management in a graphical manner that is intended to improve understanding, even in subjects with low numeracy. The "intervention" version of the film additionally includes the expected change in the glucose curve with increased physical activity, and a planning component intended to help the viewer form implementation intentions (specific "if-then" plans).

We are conducting a randomized trial comparing the efficacy of the two versions of the film in facilitating short-term behavior change. Outcome measures include: diabetes-related knowledge, attitudes and intentions regarding physical activity, and self-reported physical activity. We are measuring diabetes-related numeracy and outcome expectancy as possible moderators. We will present the preliminary results of this trial as well as lessons learned.



An Inverse Docking Approach for Identifying New Potential Anti-Cancer Targets

Sam Z Grinter1,2,4, Yayun Liang1,5, Sheng-You Huang1,2,3,4, Salman M Hyder1,5, and Xiaoqin Zou1,2,3,4

  1. Dalton Cardiovascular Research Center, 134 Research Park Drive, University of Missouri, Columbia, MO 65211, United States
  2. Department of Physics and Astronomy, Informatics Institute, University of Missouri, Columbia, MO 65211, United States
  3. Department of Biochemistry, Informatics Institute, University of Missouri, Columbia, MO 65211, United States
  4. Informatics Institute, University of Missouri, Columbia, MO 65211, United States
  5. Department of Biomedical Sciences, University of Missouri, Columbia, MO 65211, United States

Abstract:

Inverse docking is a relatively new technique that has been used to identify potential receptor targets of small molecules. As a validation study, we present the first stage results of an inverse-docking study which seeks to identify potential direct targets of PRIMA-1. PRIMA-1 is well known for its ability to restore mutant p53's tumor suppressor function, leading to apoptosis in several types of cancer cells. For this reason, we believe that potential direct targets of PRIMA-1 identified in silico should be experimentally screened for their ability to inhibit cancer cell growth. The highest-ranked human protein of our PRIMA-1 docking results is oxidosqualene cyclase (OSC), which is part of the cholesterol synthetic pathway. We show that both PRIMA-1 and Ro 48-8071, a known potent OSC inhibitor, significantly reduce the viability of BT-474 and T47-D breast cancer cells relative to normal mammary cells. In addition, like PRIMA-1, we find that Ro 48-8071 results in increased binding of p53 to DNA in BT-474 cells (which express mutant p53). For the first time, Ro 48-8071 is shown as a potent agent in killing human breast cancer cells. The potential of OSC as a new target for developing anticancer therapies is worth further investigation.



Evaluation of Documentation of Delirium in the VA Electronic Health Record

Carol J Hope, Nicolette Estrada, Adi V Gundlapalli, Mike Lincoln, Charlene Weir, Brian C Sauer, VA Salt Lake City Health Care System and the University of Utah

Abstract:

The true prevalence of delirium is not accurately reflected in the medical record or administrative coding. Efforts to improve detection and identification of patients with delirium would benefit from recent advances in computer-based methods. As a first step, the documentation of the diagnosis was evaluated in a cohort of 25 elderly VA inpatients. These veterans were confirmed to have delirium by the "delirium" mental health consultation team. Two clinical researchers conducted a retrospective review of the records of the entire hospitalization for this cohort. Using a standardized data collection protocol, data on the documentation and administrative coding of the disease and associated signs and symptoms were extracted and analyzed. Of the 25 patients, a total of 12 had specified diagnoses of delirium with 4 noted in the discharge summary, 10 in the mental health consult note, and one in a physician note. Symptoms of delirium were identified in a total of 16 patients, with 11 documented in the discharge summary, 4 in mental health consults, and 8 in the physician's notes. Seven of 25 patients were assigned delirium-related ICD-9 codes. These results will inform the development of an ontology and natural language processing methods to identify patients with delirium.



Assessing the Usability of a Medication Delivery Unit Through Inspection Methods

Frank M Ligons, Katrina M Romagnoli, Suzanne Browell, Harry S Hochheiser, Steven M Handler, University of Pittsburgh

Abstract:

Background: Polypharmacy and medication non-adherence are common problems in older adults, potentially leading to medication-related problems and increased healthcare expenditures. Computerized Medication Delivery Units (MDUs) may improve adherence by providing reminders to take medication. These devices have potentially complex user interfaces, possibly presenting usability challenges for older adults with cognitive, visual, or fine-motor impairments.

Objective: To conduct a pilot study examining the usability of a commercial telemedicine MDU (the EMMA® from INRange Systems) by triangulating three different inspection methods to identify potential interface problems that may influence usability for older adults. 

Methods: This study applied three usability inspection methods (Heuristic Evaluation, Cognitive Walkthrough, and Simulated Elderly Interaction) to the EMMA®.

Results: Each method revealed different problems, with repeated discoveries via different methods providing triangulated evidence. Significant usability issues include major challenges related to physical interaction with the device. Patients suffering from common impairments may face difficulties using the EMMA®.

Conclusion: The combination of inspection methods identified potential usability problems that might have been overlooked without triangulation.  Despite the device's general usability, issues of varying severity were discovered. Further analyses, employing usability testing in older adults with a variety of impairments, are needed to validate the findings.



Physiology-Based Modeling of Nanoparticles and Their Extrapolation to Humans

Olinto Linares1, Shraddha S Sadekar2-3, A Ray2-3, Julio Facelli1, and Hamid Ghandehari2-4

  1. Department of Biomedical Informatics
  2. Department of Pharmaceutics and Pharmaceutical Chemistry
  3. Center for Nanomedicine Nano Institute of Utah
  4. Department of Bioengineering, University of Utah

Abstract:

The use of nanoparticles as anticancer drug carriers has expanded rapidly around the world because of their promise in combating this class of diseases. This rapid expansion in nanoparticle research demands new methods for the interpretation, analysis, and quantification of the toxicity, biodistribution, and final fate of these nanoparticles, and for their extrapolation to human populations as the final target.

Physiologically based models (PBMs) have been used for small molecules and could be used for nanoparticle studies, with some model modifications necessary for appropriate use in nanoparticle research. This study proposes the use of PBMs to characterize the in vivo biodistribution of two of the most promising nanoparticle drug carriers: poly(amidoamine) (PAMAM) dendrimers containing surface hydroxyl groups and linear N-(2-hydroxypropyl) methacrylamide (HPMA) copolymers. The second part of this study proposes the quantifications and considerations necessary for translation to human populations.



Determining Drug Exposure Status of Patients in Electronic Medical Records

Mei Liu, Vivian K Kawai, Charles M Stein, Dan M Roden, Hua Xu, Vanderbilt University

Abstract:

Medication errors have serious impacts on hospital safety and quality. Many of these errors are due to confusion about patients' medication regimens. Electronic Medical Records (EMRs) provide valuable resources for medication reconciliation (MR) and clinical research. However, detailed medication information is often embedded in narratives, where it is not immediately available for analysis. Patient drug exposure information is critical for both MR and drug-related clinical research. Here, we introduce a framework combining Natural Language Processing and Machine Learning to predict patient drug exposure status from time-sequenced drug mentions in EMRs.

The framework consists of two steps: 1) label drug mentions with status (e.g. "start" and "stop"); and 2) predict if a patient is ON/OFF a drug using status and temporal information associated with the mentions. Both phases were implemented using Support Vector Machines. Phase I was treated as a sentence classification task using contextual features. Phase II used the status and temporal information as features and investigated two temporality-related issues: 1) time window; and 2) temporal representation. We applied the framework to 107 admissions from 61 patients for warfarin exposure prediction. 92.8% accuracy was achieved for Phase I, and 81% precision and 83% recall were achieved for Phase II.
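The two-phase structure can be sketched in miniature as follows. Note this is a deliberately simplified stand-in: the paper uses SVM classifiers for both phases, whereas this sketch substitutes a keyword rule for Phase I and a "most recent status within a time window" rule for Phase II, with invented cue words and dates.

```python
from datetime import date

# Hypothetical status cue words (Phase I in the paper is an SVM, not a rule).
START_CUES = {"start", "begin", "resume"}
STOP_CUES = {"stop", "hold", "discontinue"}

def label_mention(sentence):
    """Phase I stand-in: label a drug mention with a status."""
    words = set(sentence.lower().split())
    if words & STOP_CUES:
        return "stop"
    if words & START_CUES:
        return "start"
    return "continue"

def predict_exposure(mentions, query_date, window_days=90):
    """Phase II stand-in: ON/OFF from time-sequenced mention statuses,
    using only mentions inside the time window before the query date."""
    recent = [(d, label_mention(s)) for d, s in mentions
              if 0 <= (query_date - d).days <= window_days]
    if not recent:
        return "UNKNOWN"
    return "OFF" if max(recent)[1] == "stop" else "ON"

mentions = [
    (date(2011, 1, 5), "start warfarin 5 mg daily"),
    (date(2011, 3, 1), "hold warfarin before procedure"),
]
print(predict_exposure(mentions, date(2011, 3, 10)))  # OFF
```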



Profiling Physician Quality Metrics, Practice Habits, and Patient Mortality Risk

Eugenia R McPeek Hinz, Joshua C Denny, Vanderbilt University

Abstract:

Quality care metrics provide a means to compare physicians across practices. Variables that influence the quality of outpatient care include physician factors, time constraints, patient demographics, and comorbidities. We used Learning Portfolio, an educational website designed to track clinical experiences, to log all patients seen by primary care physicians from 2009-2010. Patient data from our electronic health record were placed in a database including demographics, diagnoses, quality metrics, and medication usage. From these data, each patient's mortality risk was estimated using a previously validated score. A dynamic interface allowed physicians to view and query their practice data. The patient data and mortality risk measurement revealed significant differences in the characteristics of primary care physician practices. Overall, 65.1% of all patients were female, with female physicians consistently seeing a higher percentage of female patients (average of 72.3% versus 53.7% for male providers). The average patient age varied widely per provider, from 40.9 to 64.1 years, with correlating differences in mortality risk frequently seen. Patient demographic factors and mortality risk significantly influence physician practice quality metrics. Differences in the mortality risk of patient populations may explain some of the differences in quality metrics for a physician. Future practice profiling and quality improvement data may improve with incorporation of such data.



Identifying Mendelian Disease Mutations in Colorectal Cancer Patients

Daniel E Newburger, Hua Xu, Georges Natsoulis, Hanlee Ji, Serafim Batzoglou, Stanford University

Abstract:

Although more than twenty percent of colorectal cancer (CRC) cases are familial, the vast majority of Mendelian mutations responsible for heightened CRC susceptibility remain unidentified. Discovering and characterizing these mutations represents an important goal in translational medicine; it offers affected families the benefits of predictive genetic screening and personalized treatment. Recent advances in high-throughput sequencing provide a new platform for the investigation of Mendelian disease that overcomes the limitations of traditional linkage approaches, but current methods fail to provide comprehensive coverage of genomic mutations and lack robust examination of mutation detection confidence. We therefore developed a novel mutation ranking method for the identification of dominant Mendelian mutations that incorporates information from both functional genomic annotations and pedigree structure. We likewise have built an integrated exome and whole genome sequencing pipeline for comprehensive identification of genomic variants. By applying our mutation ranking method and variant pipeline to a family with familial CRC, we successfully identified a causative, germline CRC mutation that was overlooked by clinical screening. We hope that this work will serve as a framework both for further investigation of familial CRC syndromes and for other dominant diseases in small, complex family pedigrees.



Global Analysis of Cohesin-Mediated Gene Regulation in a CdLS Mouse Model

Daniel Newkirk, Richard Chien, Aniello Infante, Kyoko Yokomori, Xiaohui Xie, University of California at Irvine

Abstract:

Cohesin is an essential complex required for sister chromatid cohesion and chromosome segregation in mitosis. However, mutations of the cohesin loading factor Nipbl and cohesin subunits were found to cause the human developmental disorder Cornelia de Lange Syndrome (CdLS), strongly suggesting a critical role of cohesin in developmental gene regulation. In order to understand the transcriptional role of cohesin in CdLS pathogenesis, we performed chromatin immunoprecipitation combined with high-throughput sequencing (ChIP-Seq) on mouse embryonic fibroblasts (MEFs) derived from wild type and Nipbl (+/-) mice that closely model human CdLS. Since cohesin also binds to repeat sequences, which may be important for its transcriptional activity, we developed a new tool called AREM to analyze repetitive regions traditionally excluded in ChIP-Seq analysis. We mapped genome-wide cohesin binding sites in the wild type and mutant MEFs. By correlating cohesin binding with MEF expression array data, we showed that cohesin sites are strongly associated with genes that are significantly dysregulated in the mutant, and that cohesin binding was indeed decreased at some of the most affected genes, such as the Pcdhb cluster and adipocyte-related genes. Our results indicate that the reduction of genome-wide cohesin binding causes expression abnormalities and underlies the pathogenesis of CdLS.



Psychosocial Risk Factors are Linked to High Hospital Costs among High-Risk Patients

Julia O'Rourke1, Jeff Weilburg2, Adrian Zai1

  1. Laboratory of Computer Science, Massachusetts General Hospital
  2. Massachusetts General Physicians Organizations

Abstract:

The Massachusetts General Hospital is aiming to improve the cost-effectiveness of care among high-risk patients covered by the Centers for Medicare & Medicaid Services (CMS). To achieve this goal, we set forth to identify reversible psychosocial risk factors associated with increased hospitalization cost, a critical step for developing patient-specific intervention strategies.

We analyzed the total hospitalization cost for CMS patients from December 2006 to July 2009 to determine whether or not an association exists between psychosocial characteristics and total hospitalization cost. We extracted psychiatric diagnoses from the electronic health record (EHR) and grouped them into ten variables (adjustment, affective, anxiety, axis 2, neuropsychiatric, psychosis, sleep, substance abuse, other, and unknown disorders). We also extracted other psychosocial characteristics (adherence, refusal of services and high rate of missed appointments) from free text clinical notes utilizing natural language processing techniques.

The final linear regression model described about 10% of variance in the total hospitalization cost. We identified psychosis, neuropsychiatric disorders, substance abuse, adherence and high rate of missed appointments to be associated with high total hospitalization cost.



Quantitation of Collagen Alignment: A Tool for Characterizing Cancer Invasion and Progression

Carolyn Pehlke, Jared Doot, Curtis Rueden, Robert Nowak, David Beebe, Patricia Keely and Kevin W Eliceiri, University of Wisconsin-Madison

Abstract:

It has been hypothesized that the angle of collagen fibers relative to the tumor boundary may be used as a predictor of imminent invasion and metastasis. Hence, the ability to quickly and accurately quantify both fiber angle and any related structural changes of the collagen matrix could result in an effective experimental and diagnostic tool. We have developed a semi-automated analysis program based on the curvelet transform of Candes and Donoho. This approach is well suited for collagen alignment analysis, as the curvelet transform is specifically designed to determine a sparse representation of edges in images, even in the presence of complex geometries such as those associated with tumor-associated collagen signatures (TACS).

The fundamental advantage of the curvelet transform applied to the problem of detecting tumor-associated collagen signatures is its ability to retain orientation information from the image. The curvelet transform varies not only with scale and location in the image, but also with the orientation of edges. This makes it possible to examine all prominent edges at a particular orientation and a particular scale (varying only the location of the fixed-scale, fixed-orientation curvelet in the image). When applied to the tumor-associated collagen signatures problem, the curvelet transform becomes a powerful tool for detecting the presence of long straight edges and their location, scale, and orientation. By obtaining accurate quantitative data regarding collagen amount, morphology, and organization/orientation, biologically relevant data can be derived. Our curvelet analysis program uses an implementation of the wrapping-based digital curvelet transform.



Selective Polypharmacology Towards Drug Repositioning Based on Cancer Related Signaling Pathways

Xiaodong Peng, Liwei Li, Bo Wang, Samy Meroueh, Regenstrief Institute

Abstract:

In recent years, drug development progress in industry has been painfully slow, and drug repositioning is a much-needed response. Docking drugs into cavities on cancer signaling pathway components could make it possible to repurpose existing drugs for the treatment of cancer. Currently, docking is focused on all cavities identified on the targets based on the physiochemical properties of residues. In this work, we identified cavities by their distance to biologically active residues of the targets. Previously, we docked the library of marketed drugs to all cavities in protein structures from the Human Cancer Protein Interacting Networks (HCPIN) and the Human Druggable Proteome (HDP). We narrowed these structures to those that contained biologically-relevant binding cavities occurring either at enzymatic active sites or protein-protein interaction sites. We mined the resulting database and (i) identified the signaling pathway preferences of drugs; (ii) found that cancer drugs favored binding targets from cancer signaling pathways; and (iii) found that in most cases cancer drugs targeted multiple signaling pathways compared to non-cancer drugs and non-drug compounds.

This work is the first attempt at rational repurposing of non-oncology drugs in a multitargeted manner based on active protein structures with direct implication for treatment of cancer.



Sequence Based Alignment Used to Detect Rare Variants in Human Mitochondria

Mark Rojas, Jesse Howard, William Widger, Yuriy Fofanov, University of Houston

Abstract:

Cardiovascular disease is the number one killer of Americans, affecting an estimated 80,000,000 people. For mitochondrial ATP production in diabetic heart tissue, fatty acid oxidation increases to 95% or more of total fuel consumption. The shift in fuel signals a redistribution of metabolic pathways that may lead to mitochondrial dysfunction due to an increase in mitochondrial reactive oxygen species (ROS) production. By observing mutations in the mtDNA, the percentage of dysfunction can be calculated and used as an early measure of mitochondrial damage.

Collaborative efforts have been made to obtain and isolate mtDNA from multiple samples of mitochondria from cardiac muscle. Deep sequencing using Next Generation Sequencing (NGS) and a comprehensive computational approach have allowed us to perform an analysis identifying the proportions of each type of genomic variation.

These findings raised a concern about the reliability of SNP data. When a read is misaligned it can incorrectly imply the presence of a SNP. These SNP Identification Errors (SIEs) can arise from sequencing errors or errors in the alignment algorithm. Discerning SIEs from real SNPs is the primary obstacle in the development of a robust SNP calling algorithm for post-mapping analysis and a major focus of this research.



Trends in Genetic Testing Using Electronic Health Records (EHR)

Jeremiah G Ronquillo1,2, Rie Sakai3, Cheng Li4, William T Lester2

  1. Harvard Medical School
  2. Massachusetts General Hospital
  3. Juntendo University School of Medicine
  4. Harvard School of Public Health

Abstract:

Background: Advances in clinical informatics have helped physicians use EHRs to integrate patient information, including genetic testing, which has become increasingly important for clinical management. However, a quantitative analysis of the epidemiology of genetic testing in a tertiary care center has not been performed.

Methods: We used the Research Patient Data Registry to identify patients at Massachusetts General Hospital (MGH) who received genetic testing from 2001-2010. Relevant summary statistics were calculated.

Results: A total of 53,507 patients received genetic testing from 2001-2010, with an increasing trend (0.71% over the 10-year period). A median of 1.23% (interquartile range 1.03-1.46%) of all patients seen at MGH received genetic testing. A median of 73.6% (interquartile range 72.9-75.4%) of tested patients were female and 26.4% (interquartile range 24.7-27.1%) were male, with a significant association between gender and the year the genetic test was ordered (p<0.0001). From 2001-2005, 39-49 year olds were the most frequently tested (ranging from 25-37% of patients), while 29-39 year olds predominated from 2006-2010 (32-38%). Race and ethnicity ranged from 70-89% Caucasian, 2-6% Black, 1-6% Asian, 2-14% Hispanic, and 3-8% Other/Unknown.

Conclusion: The number of patients receiving genetic testing has increased over time but remains low, with most tested patients being female, Caucasian, and younger over time.



Developing a Meaningful Measure of Use

Hojjat Salmasian, Rita Kukafka, Columbia University

Abstract:

Background: The "meaningful use" provisions of the HITECH Act require covered entities to adopt specific features in their electronic health records (EHRs). Yet a great deal is unknown about how EHRs are used, and there is no standard for measuring use. As part of a larger project to determine factors predicting use of specific EHR functions, we analyzed EHR usage logs from 927 unique providers participating in the Primary Care Information Project (PCIP) to construct a use metric.

Methods: We developed a multi-variable composite use metric that combined information from five variables representing basic clinical use of the EHR.

Results: The five variables (measuring blood pressure, reviewing labs, ordering labs, making prescriptions, and entering billing information) received similarly high scores in principal component analysis (ranging from 0.65 to 0.75). Our use metric accounted for 90% of the variation in the original five variables and covered more practice-months than any individual variable. Use of more advanced features was associated with higher values of the composite basic use metric.
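As a loose illustration of deriving such a composite metric, the sketch below computes the first principal component of five correlated usage variables by power iteration (pure Python with invented toy numbers; not the PCIP data or analysis):

```python
import math

def standardize(col):
    """Center and scale one variable to zero mean, unit variance."""
    m = sum(col) / len(col)
    sd = math.sqrt(sum((x - m) ** 2 for x in col) / len(col)) or 1.0
    return [(x - m) / sd for x in col]

def first_component(rows, iters=200):
    """Loadings of the leading principal component, via power iteration
    on the covariance matrix of the standardized variables."""
    cols = [standardize(list(c)) for c in zip(*rows)]
    n, p = len(rows), len(cols)
    cov = [[sum(cols[i][k] * cols[j][k] for k in range(n)) / n
            for j in range(p)] for i in range(p)]
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Hypothetical practice-months: five correlated basic-use counts per row.
rows = [
    [10, 8, 5, 20, 15],
    [12, 9, 6, 22, 17],
    [3, 2, 1, 5, 4],
    [11, 9, 6, 21, 16],
]
loadings = first_component(rows)
print([round(x, 2) for x in loadings])
```

Because the toy variables move together, all five loadings come out similar in size, mirroring the "similarly high scores" the abstract reports for the real variables.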

Discussion: We believe a multi-variable approach to measuring basic EHR use can be a more meaningful alternative to measuring single EHR functions and can be contrasted with the utilization of more advanced functions.



Deriving Biomarkers using Brief-Exposure to Trastuzumab in HER2+ Breast Cancer

Emmett Sprecher, Sudipa Sarkar, Kimberly Lezon-Geyda, Lyndsay Harris, David Tuck, Yale University

Abstract:

The "brief exposure" paradigm proposes that biomarkers derived from in vivo assessment of genomic response to targeted drugs will better predict benefit for cancer patients than baseline status alone. Although trastuzumab has remarkably improved outcomes in HER2+ breast cancer (a common and aggressive subtype), resistance remains a significant problem requiring improved biomarkers. Therefore, we aimed to identify genomic differences induced across brief trastuzumab treatment between resistant and responsive HER2+ breast tumors. From a preoperative clinical trial in 80 patients, we obtained biopsies pre and post exposure to a single dose of trastuzumab, then evaluated response following trastuzumab and chemotherapy. We considered differential gene expression (GE) before, after, and across trastuzumab treatment for the responsive and resistant tumors. Differential GE between responsive and resistant tumors at baseline was rare. However, 61 genes changed significantly (FDR q-value < 0.05) in responsive tumors, while none did in resistant tumors. Among other genes, GRB7 and HER2, both in the 17q21 amplicon, showed significantly reduced expression in responders. The brief exposure paradigm has the potential to detect important biological changes that may predict response and contribute to adaptive therapeutic decision making.
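The "FDR q-value < 0.05" threshold refers to multiple-testing control; the standard Benjamini-Hochberg procedure behind such q-values can be sketched as follows (a textbook implementation with invented p-values, not the authors' analysis code):

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg q-values: for each p-value, the smallest FDR at
    which that test would be called significant."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    prev = 1.0
    # Walk from the largest p-value down, taking a running minimum of p*m/rank.
    for rank_from_end, i in enumerate(reversed(order)):
        rank = m - rank_from_end
        prev = min(prev, pvalues[i] * m / rank)
        q[i] = prev
    return q

pvals = [0.001, 0.008, 0.039, 0.041, 0.20]
print([round(x, 3) for x in bh_qvalues(pvals)])
```

Here the first two genes would pass q < 0.05 while the raw-p < 0.05 genes at ranks 3 and 4 land just above it, illustrating why q-values are stricter than unadjusted p-values.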



Modular Relief (MoRF) for Ranking Genetic Predictors of Disease

Matthew Stokes, Shyam Visweswaran, University of Pittsburgh

Abstract:

Identification of genetic variants that are predictive of disease is an important goal in bioinformatics. The genetic patterns underlying disease risk are often complex, however, and may involve multiple interacting genes. Brute-force search for these multi-locus epistatic effects is infeasible for high-dimensional genomic data (e.g., GWAS SNP data). The Relief family of algorithms is a powerful tool for efficiently detecting interactions between genetic variants, even in the absence of main effects. Several versions of Relief have been developed over the past two decades. We introduce a framework called Modular ReliefF (MoRF), which abstracts the common features of the Relief algorithms.  This framework allows new versions of Relief to be easily specified as a collection of component functions. Using the MoRF framework, we developed a new spatially weighted version of Relief, which has significantly greater accuracy than other Relief algorithms in detecting interacting genetic variants in both synthetic and real genome-wide data.
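The core Relief idea, and the pluggable-component spirit of MoRF, can be illustrated with a minimal sketch (not the MoRF code; the `diff` function and toy SNP data are invented for illustration):

```python
def relieff_weights(data, labels, diff):
    """Minimal ReliefF-style feature scoring. `diff` is a pluggable component
    function, in the spirit of MoRF's modular design: weights rise when a
    feature differs from the nearest miss and agrees with the nearest hit."""
    n_feat = len(data[0])
    w = [0.0] * n_feat
    for i, x in enumerate(data):
        def nearest(same):
            # Nearest hit (same class) or nearest miss (other class).
            cands = [j for j in range(len(data))
                     if j != i and (labels[j] == labels[i]) == same]
            return min(cands, key=lambda j: sum(diff(f, x, data[j])
                                                for f in range(n_feat)))
        hit, miss = nearest(True), nearest(False)
        for f in range(n_feat):
            w[f] += diff(f, x, data[miss]) - diff(f, x, data[hit])
    return w

# One possible component function for discrete genotypes.
genotype_diff = lambda f, a, b: 0.0 if a[f] == b[f] else 1.0

# Toy SNP data: feature 0 determines the class label, feature 1 is noise.
data = [[0, 1], [0, 0], [1, 1], [1, 0]]
labels = [0, 0, 1, 1]
print(relieff_weights(data, labels, genotype_diff))  # [4.0, -4.0]
```

Swapping in a different `diff` or neighbor-weighting function is exactly the kind of component substitution the MoRF framework is described as enabling.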



Interpreting Discordant Results Between Two Diagnosis-Relapse Data Sets

Allison N Tegge1, Dong Xu2, Charles W Caldwell3

  1. MU Informatics Institute
  2. Department of Computer Science, University of Missouri-Columbia
  3. Department of Pathology and Anatomical Sciences, University of Missouri-Columbia

Abstract:

In Childhood Acute Lymphoblastic Leukemia (ALL), there is a relapse rate of almost 25%, and these children usually have a poor prognosis after relapse. Previous studies have identified sets of significantly differentially expressed genes between diagnosis and relapse, but have yet to define the underlying biology of relapse. A deeper understanding of the molecular mechanisms that cause relapse will aid in both prevention and treatment of relapsed individuals.

Using two publicly available, independent microarray data sets for childhood ALL at time of diagnosis and relapse, we applied various methods to identify perturbations in regulation between diagnosis and relapse. The two data sets yielded inconsistent results when the same methods were applied. The incoherence in results suggests that additional factors, other than the disease itself, influence the collected raw data. Possible factors for these differences include, but are not limited to, treatment, methodologies, and geography. Overall, this raises questions about the reliability and consistency of publicly available data.



A Tool to Utilize Information from Clinical Trials

Maurine Tong, Anna Wu, William Speier, SuGeun Chae, Ricky Taira, University of California, Los Angeles

Abstract:

A knowledge representation is necessary to standardize and organize large amounts of information, such as that presented in the results of clinical trials. Unfortunately, the contents of published papers reporting on clinical trials are often heterogeneous with respect to the types of information presented, and frequently vary in completeness. As such, clinical trial results are laborious to interpret, and the synthesis of their information is often time-consuming. Given the increasing roles of evidence-based medicine and comparative effectiveness research, physicians need to be aware of the most up-to-date scientific knowledge when answering clinical questions - but gaining such knowledge from clinical trials is presently impractical at best. Tools that can standardize and compile clinical trial results into knowledge-bases are needed, facilitating the retrieval and logical presentation of information to a physician on demand.

The goal of this research is to develop a set of tools to assist in the creation of such a clinical trial results knowledge-base. As a testbed, we examine a specific disease within neuro-oncology, glioblastoma multiforme (GBM), and both recent and ongoing clinical trials to treat this cancer. A tool was created to help extract and structure information from selected clinical trials; these data were then compiled into a knowledge-base, and a visualization was developed to facilitate navigation. The tool includes functionality to: 1) structure the content of papers into a machine-understandable representation based on CONSORT guidelines; 2) summarize papers by study design (including eligibility criteria and intervention details) and results; 3) link results and conclusions to supporting evidential data; and 4) aggregate results across multiple papers in a unifying representation. The tool also allows a user to search the knowledge-base through keyword searches, as well as searches based on related molecular and genetic traits.



Positive-Feedback & Sensitivity of Memory Consolidation to Protein Synthesis

Yili Zhang, Paul Smolen, Douglas A Baxter, John H Byrne, Department of Neurobiology and Anatomy, The University of Texas Medical School at Houston, Houston, TX

Abstract:

Memory consolidation requires kinase activation and protein synthesis. Blocking either process shortly after training disrupts memory stabilization, which suggests that a time window exists during which these processes are necessary. The present study used models of kinase auto-activation and synthesis to investigate the ways in which positive-feedback loops contribute to this window. By simulating protein synthesis inhibition (PSI) before or after training, we found that positive-feedback loops can account for a time window during which consolidation is sensitive to PSI at physiological "dosages". Simulations using noisy stimuli suggested that PSI increases the sensitivity of memory to noise by delaying consolidation. Similar results were found with kinase inhibition. Simulations also reproduced reconsolidation (without PSI) and demonstrated a brief time window during which PSI can disrupt reconsolidation, similar to that for consolidation.

The resistance of a consolidated memory to PSI resulted from a net increase in protein synthesis as compared to synthesis before training. This elevated protein synthesis resulted from the positive feedback. Similar results were obtained with the more complex model of Pettigrew et al. (2005). Although our models are based on simplifications of mechanisms underlying consolidation, they illustrate that consolidation and reconsolidation may depend, in part, on dynamics of molecular positive-feedback loops.
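The qualitative behavior can be illustrated with a one-variable bistable positive-feedback model (not the authors' model; all parameter values are invented for illustration): a training stimulus drives a protein above the threshold of its autocatalytic synthesis loop, so PSI applied before the high state is reached erases the "memory", while the same PSI applied after consolidation does not.

```python
def simulate(psi_start=None, psi_len=6.0, T=60.0, dt=0.005):
    """Euler integration of a minimal bistable feedback loop:
      dP/dt = synth * (basal + 2*P^2 / (0.25 + P^2)) - 0.5 * P
    A training stimulus (t < 1) raises basal synthesis; PSI sets
    synth to zero for psi_len time units. Values are illustrative."""
    P, t = 0.01, 0.0
    while t < T:
        synth = 1.0
        if psi_start is not None and psi_start <= t < psi_start + psi_len:
            synth = 0.0                      # complete synthesis block
        basal = 0.3 if t < 1.0 else 0.002    # training stimulus ends at t=1
        P += (synth * (basal + 2 * P * P / (0.25 + P * P)) - 0.5 * P) * dt
        t += dt
    return P
```

Running the model without PSI leaves the protein at its high stable state; PSI starting right after training (t = 1) drops it below the unstable threshold and the trace collapses to the low state, whereas the same PSI starting at t = 20 only transiently dips the consolidated high state.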



Day 2 Group

Novel Approaches to using Text-mining for Optimizing Community-Curated Neuroscience Database Workflows

Kyle H Ambert, Aaron Cohen, Oregon Health & Science University

Abstract:

The emphasis of multilevel modeling techniques in the Neurosciences has led to an increased need for large-scale databases containing neuroscientific data. Despite this, such databases are not being populated at a rate commensurate with their demand amongst Computational Neuroscientists. The reasons for this are common to scientific database curation in general--limitation of resources. Much of Neuroscience's long tradition of research is documented in computationally inaccessible formats, such as PDF, making data extraction laborious and expensive. Here, we propose a series of studies designed to mitigate the bottlenecks in Neuroscience database curation. In particular, we focus our efforts on the Neuron Registry, a community-curated database of neuron-related information pulled from the primary literature. We describe three research projects. First, we identify the most effective approaches to document classification for this setting, including a comparison of various algorithm and feature representation combinations. Next, we develop methods for paragraph prioritization--novel algorithms which, given information putatively located in a particular document, will prioritize its paragraphs according to their likelihood of being the primary communication of the information in question. Finally, we apply structured machine learning techniques that leverage a well-defined Neuroscience ontology to perform information extraction for populating the database directly.



Cell Subset Prediction for Improving Blood Genomic Studies

Christopher R Bolen, Mohamed Uduman, Steven H Kleinstein, Yale University

Abstract:

Genome-wide transcriptional profiling of patient blood samples offers a powerful tool to investigate underlying disease mechanisms and inform personalized treatment decisions. Most studies are based on analysis of total peripheral blood mononuclear cells (PBMCs), a mixed cell population. In these cases, accuracy is inherently limited, since cell subset-specific differential expression of gene signatures will be diluted by RNA from other cells. While using specific PBMC subsets for transcriptional profiling would improve our ability to extract knowledge from these data, it is rarely obvious which cell subset(s) will be most informative. We have developed a computational method (Subset Prediction from Enrichment Correlation, SPEC) to predict the cellular source of a pre-defined gene expression signature using only data from total PBMCs. SPEC does not rely on the occurrence of cell subset-specific genes in the signature, but rather takes advantage of correlations with subset-specific genes across a population. Validation using multiple experimental datasets demonstrates that SPEC can accurately identify the source of a gene signature as myeloid or lymphoid, as well as differentiate between B cells, T cells, NK cells, and monocytes. Using SPEC, we predict that myeloid cells are the source of the interferon-stimulated gene signature associated with HCV patients who are non-responsive to standard therapy.
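The enrichment-correlation idea can be sketched as: score each sample on the signature, then ask which subset's marker genes co-vary with that score across samples. A minimal sketch of this idea (gene names and data are invented; this is not the published SPEC implementation):

```python
def spec_scores(expr, signature, subset_markers):
    """Correlate a gene-signature score with subset-specific marker
    expression across samples of mixed-cell (PBMC-like) data.
    expr: {gene: [value per sample]}. Illustrative of the idea only."""
    def mean(v):
        return sum(v) / len(v)

    def pearson(a, b):
        ma, mb = mean(a), mean(b)
        cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
        sa = sum((x - ma) ** 2 for x in a) ** 0.5
        sb = sum((y - mb) ** 2 for y in b) ** 0.5
        return cov / (sa * sb)

    n = len(next(iter(expr.values())))
    # per-sample signature score: mean expression of signature genes
    sig_score = [mean([expr[g][i] for g in signature]) for i in range(n)]
    # average correlation of that score with each subset's markers
    return {subset: mean([pearson(sig_score, expr[g]) for g in markers])
            for subset, markers in subset_markers.items()}
```

With synthetic data in which the signature tracks a myeloid marker and runs opposite to a lymphoid marker, the myeloid subset scores near +1 and the lymphoid subset near -1.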



Dietary Pattern Analysis Using Electronic Grocery Transaction Data

Philip J Brewster, John F Hurdle, Department of Biomedical Informatics, University of Utah

Abstract:

The aim of dietary pattern analysis is to measure the distribution of food groups as surrogate indicators of dietary health in a defined population.  Rather than evaluating the effect of individual foods in isolation, the idea of a dietary pattern is intended to account for the range in variation of how foods are actually consumed in different combinations during meals. Traditional methods of data collection to ascertain dietary patterns in nutritional epidemiology - dietary recall and food frequency questionnaires - are costly and limited in scope and accuracy.  The potential for systematic error due to reliance on self-reported factors to estimate dietary health is significant.

Our research shows the promise of automating methods of data collection in nutritional epidemiology using electronic grocery transaction records.  Thanks to a unique data-sharing agreement with a nationwide retail grocery chain, we are designing exploratory studies that measure the distribution of food group data at varying degrees of dimensionality.  Thus, transactional item-set frequency (‘market baskets') represents the range of dietary choices made by individual shoppers over time.  Geospatial factors derived from the retail data may refine epidemiological hypotheses on how store location affects the ‘healthiness' of food shopping patterns in certain populations.
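As a sketch of what analysis over grocery transaction records can look like, the following computes a shopper's food-group distribution and the item pairs that recur across 'market baskets' (item names and group assignments are invented for illustration):

```python
from collections import Counter
from itertools import combinations

def food_group_distribution(baskets, item_to_group):
    """Proportion of purchased items falling in each food group,
    a crude surrogate for a shopper's dietary pattern.
    Groupings are illustrative."""
    counts = Counter(item_to_group.get(item, "other")
                     for basket in baskets for item in basket)
    total = sum(counts.values())
    return {group: c / total for group, c in counts.items()}

def frequent_pairs(baskets, min_support=2):
    """Item pairs co-purchased in at least min_support baskets."""
    pair_counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    return {p: c for p, c in pair_counts.items() if c >= min_support}
```

The same counting generalizes to longer item sets; in practice, dimensionality is controlled by the food-group mapping rather than by raw item identifiers.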



DIVE: A Data Intensive Visualization Engine

Dennis Bromley, Steve Rysavy, Valerie Daggett, University of Washington

Abstract:

Molecular dynamics protein simulations produced in the Daggett Lab are created at atomistic and picosecond resolutions, exemplifying the phenomenon of information overload.  To address this, we developed DIVE, an extensible streaming visual analytics pipeline, to better understand protein science and develop in silico therapeutics.

DIVE's streaming architecture allows access to more data than can fit into local RAM.  Coupled with automatic CPU load-balancing and GPU-accelerated 3D visualization, DIVE allows exploration of massive biomedical datasets at interactive speeds; the current version successfully streamed a 70 GB dataset from server to workstation at ten frames per second.  To make the interaction more intuitive, DIVE dynamically creates a propertied object model from arbitrary SQL data and exposes it to any .NET host, including scripting languages and applications.  DIVE also hosts internal micro-scripting that offers arbitrary programmatic transform and filtering capabilities, even for non-programmers.  Pipeline sources and sinks can be built using any .NET technology.  For example, we integrated Microsoft Excel as a data sink and now leverage Excel's analysis and presentation tools within DIVE.  DIVE also supports research continuity by maintaining both a data-provenance trail and a ‘virtual lab book' that automatically records data manipulations, transforms, and user comments for examination by future researchers.



Electronic Health Record Implementation Processes in Rural Critical Access Hospitals

Catherine K Craven1, Lanis Hicks1, Russell Leftwich2, MaryEllen Sievert1, Gregory L Alexander1, Chi-Ren Shyu1

  1. University of Missouri Informatics Institute, Columbia MO
  2. State of Tennessee Office of eHealth Initiatives

Abstract:

As a result of the HITECH Act, with its Medicare and Medicaid financial incentives and eventual penalties, increasing numbers of small, rural hospitals will implement electronic health records (EHRs). The purpose of this study is to characterize the implementation processes by which rural critical access hospitals (CAHs) are preparing for Meaningful Use-compliant EHR rollout.

The research questions are as follows:

  1. What are rural CAHs doing in preparation for EHR rollout to address policy adherence, technology requirements, and people and organizational readiness?
  2. Does preparation in these small, rural settings align with or diverge from known implementation processes and "best practices" in larger hospital settings, and if so, how and why?
  3. What are successful and unsuccessful readiness strategies in small, rural settings?

Qualitative methods, effective for gathering rich data on complex, dynamic processes, will be used, with the approach grounded in Grounded Theory and an objectivist Glaserian emphasis. Data collection strategies include in-depth interviews, focus groups, and observations. The data collection protocol, with detailed topical sub-questions, is being developed based on the clinical information systems (CIS) implementation and adoption literature and via interviews with a spectrum of experts, including but not limited to Office of the National Coordinator for Health Information Technology Regional Extension Center (REC) staff and Federal Advisory Committee and Workgroup members. Data collection units are six rural hospitals: two CAHs in Missouri, two CAHs in Tennessee, and two small, rural hospitals in Chile, as a comparison to the U.S. hospitals. Narrow-level units are individuals involved in the implementation process, and groups of these stakeholders. The State of Tennessee Office of eHealth Initiatives, MO HIT Assistance Center (MO REC), tnREC (TN REC), and International Medical Informatics Association representatives are assisting with hospital selection. Data collection will start in July. Data analysis will be conducted with the ATLAS.ti coding software.



A Machine Learning Approach to Predicting Flexible Regions in Proteins

Elizabeth Eskow, Deanne Sammond, Hubert Yin, Debra Goldberg, University of Colorado, Asa Ben-Hur, Colorado State University

Abstract:

Proteins are dynamic molecules. Their flexibility plays a key role in protein function, and is essential to many interactions with other proteins or molecules. Due to the excessive computational cost of calculating which areas of the protein are flexible, this computation is typically not included in computational protein design or other molecular modeling algorithms. We are developing a machine learning approach to predicting protein backbone flexibility at the residue level, which we believe can provide beneficial information for scoring functions within existing protein modeling software.

By comparing different conformations of the same protein molecule we are able to label residues for training and testing with several classes representing distinct ranges of observed flexibility. Input features to the machine learning classification are calculated from individual experimental structures, including secondary structure and the number and types of non-covalent interactions in which a side-chain participates. Areas of protein flexibility are often found to be consecutive in sequence space; we therefore include features from neighboring residue windows as attributes to encourage the prediction of these flexible regions. Preliminary classification results will be presented showing the promise of the method.
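The neighboring-residue windowing described above can be sketched as follows: each residue's feature vector is concatenated with those of its sequence neighbors, with zero-padding at the chain ends (this is a generic sliding-window construction, not the authors' code, and the feature values below are invented):

```python
def window_features(per_residue, w=2):
    """Augment each residue's feature vector with those of its
    sequence neighbors in a +/- w window (zero-padded at the ends),
    so a classifier can exploit the tendency of flexible residues
    to occur in contiguous regions."""
    n = len(per_residue)
    k = len(per_residue[0])
    pad = [0.0] * k
    out = []
    for i in range(n):
        row = []
        for j in range(i - w, i + w + 1):
            row.extend(per_residue[j] if 0 <= j < n else pad)
        out.append(row)
    return out
```

Each output row has (2w + 1) times the original feature count, and the window width w is a tunable trade-off between local context and dimensionality.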



Generating Credible Hypotheses for Drug-Drug and Drug-Gene Phenotypes

Guy Haskin Fernald, Nicholas P Tatonetti, Russ B Altman, Stanford University

Abstract:

Patient response to treatment by drugs is highly variable and depends on, among other things, patient-specific genetics and the presence of other drugs in the system.  It is estimated that more than 100,000 people die every year in the United States due to adverse drug reactions.  Understanding the molecular mechanisms involved in pharmacogenomics and drug-drug interactions will lead to both improved patient outcomes and a reduction in adverse drug reactions.  Unfortunately, the molecular mechanisms of drug-drug and drug-gene interactions are difficult to study and in many cases are unique to the specific combination of drugs and genes involved.  Today there exist many large-scale biological and chemoinformatic databases that contain knowledge about the individual components involved in these interactions, such as OMIM, iRefWeb, and SMPDB.  Using these resources independently, it is difficult to generate credible hypotheses that identify causal interactions to explain drug-drug or drug-gene phenotypes.  In this work we present an integrated approach for generating candidate interactions that may cause drug-drug and drug-gene interaction phenotypes.  The method starts with known interaction partners of the drugs and genes involved and finds likely pathways to connect them to observed interaction phenotypes.



Factors that Affect Interpretation of Data Displayed in Treemaps

Akilah L Hugine, Ellen J Bass, and Stephanie A Guerlain, University of Virginia 

Abstract:

Quality improvement may involve the interpretation of large data sets.  Treemaps are space-constrained visualizations that display multidimensional data as sets of nested rectangles, with area proportional to a specified dimension of the data.  Analysts must compare hierarchical data in the treemap by comparing the areas of rectangles located at different horizontal distances and offset angles from each other.  In a controlled experiment, participants were presented with two rectangular stimuli. They identified which was smaller and then estimated the percent difference in area by making a "quick visual judgment".  Results indicate that true area differences and angle of offset have a significant effect on the accuracy of perceived differences in area, but that horizontal distance does not. While treemaps can help support fast characterization of the data, they may not be well-suited when highly accurate comparisons of data elements are required. Future work will examine the use of treemaps for interpreting data collected as part of the National Surgical Quality Improvement Program.



Validation of Radiation Data Extraction Using the PARSE Open Source Toolkit

Ichiro Ikuta, Aaron Sodickson, Elliot Wasser, Graham Warden, Ramin Khorasani, Harvard Medical School

Abstract:

Purpose: We sought to validate the automated extraction of radiation data from unstructured nuclear medicine reports.  This information provides more accurate, patient-specific calculations of radiation dose in medical imaging. 

Methods: This project was approved by the IRB and was compliant with HIPAA.  1,000 nuclear medicine reports from the past 10.6 years were randomly selected to yield 95% confidence intervals no wider than 3.1%.  The open-source toolkit PARSE (Perl Automation for Radiopharmaceutical Selection and Extraction) was implemented to automate the extraction of radiation data fields (unit of radioactivity, quantity of administered activity, and radiopharmaceutical).  Precision and recall with 95% confidence intervals were determined.  The goal for extraction measurements was greater than 90%. 

Results: PARSE precision was 98.8% ± 0.60% for units of radioactivity, 97.9% ± 0.79% for quantitative administered activity, and 95.3% ± 1.17% for radiopharmaceutical.  Recall was 99.4% ± 0.43% for units of radioactivity, 99.0% ± 0.55% for quantitative administered activity, and 98.4% ± 0.71% for radiopharmaceutical. 
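The reported "value ± half-width" figures can be reproduced with a standard normal-approximation confidence interval for a proportion. A sketch with invented counts (not the study's actual confusion matrix):

```python
import math

def precision_recall_ci(tp, fp, fn, z=1.96):
    """Precision and recall with normal-approximation 95% CIs,
    returned as (percentage, half-width) pairs, mirroring the
    'value +/- half-width' style of reporting. Counts illustrative."""
    def prop_ci(k, n):
        p = k / n
        half = z * math.sqrt(p * (1 - p) / n)
        return 100 * p, 100 * half

    return {"precision": prop_ci(tp, tp + fp),
            "recall": prop_ci(tp, tp + fn)}
```

For rare failure modes or small samples, an exact or Wilson interval would be the more careful choice; the normal approximation is adequate at the sample sizes and proportions reported here.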

Conclusion: PARSE is reliable for the automation of data extraction from nuclear medicine reports to meet both research and clinical needs in radiology.  PARSE provides a way to more accurately estimate patient-specific radiation exposures in radiology.



Graphical Methods for Reducing, Visualizing and Analyzing Large Data Sets Using Hierarchical Terminologies

Xia Jing, James J Cimino, National Library of Medicine, NIH

Abstract:

Objective: To explore new graphical methods for reducing and analyzing large data sets in which the data are coded with a hierarchical terminology.

Methods: We use a hierarchical terminology to organize a data set and display it in a graph. We reduce the size and complexity of the data set by considering the terminological structure and the data set itself (using a variety of thresholds) as well as contributions of child level nodes to parent level nodes.

Results: We found that our methods can reduce large data sets to a manageable size and highlight the differences among graphs. The thresholds used as filters to reduce the data set can be used alone or in combination. We applied our methods to two data sets containing information about how nurses and physicians query online knowledge resources. The reduced graphs make the differences between the two groups readily apparent.

Conclusions: This is a new approach to reduce size and complexity of large data sets and to simplify visualization. This approach can be applied to any data sets that are coded with hierarchical terminologies.
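The core reduction step, propagating counts up the terminology and filtering by a threshold, can be sketched as follows (the hierarchy and counts below are invented for illustration):

```python
def aggregate_and_prune(counts, parent, threshold):
    """Propagate leaf counts up a hierarchical terminology, then
    keep only nodes whose aggregated count meets the threshold.
    parent maps each node to its parent (the root maps to None)."""
    total = dict(counts)
    for node, c in counts.items():
        p = parent.get(node)
        while p is not None:
            # each leaf count contributes to every ancestor
            total[p] = total.get(p, 0) + c
            p = parent.get(p)
    return {n: c for n, c in total.items() if c >= threshold}
```

Varying the threshold, or applying separate thresholds at different hierarchy levels, yields the family of progressively simpler graphs described above.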



Learning to Predict Chemical Reactions

Matthew A Kayala, Chloe-Agathe Azencott, Jonathan H Chen, and Pierre Baldi, University of California, Irvine

Abstract:

Predicting the course of chemical reactions is essential to the practice of organic chemistry, with applications ranging from improving drug synthesis to understanding the origin of life.  Previous computational approaches are not high-throughput, are not generalizable or scalable, or lack sufficient data.  Here, we describe a new approach to reaction prediction.  Using a physically motivated conceptualization, we describe mechanistic reactions as interactions between coarse molecular orbitals (MOs).  Using an existing rule-based system, we derive a restricted dataset of 2,989 productive and 6.15 million unproductive mechanistic steps.  We then pose identifying productive reactions as a machine learning ranking problem: given input reactants and conditions, learn a ranking model over potential MO interactions such that the top-ranked interactions yield the major products.  Our artificial neural network based implementation follows a two-stage approach.  We first train atom-level reactivity classifiers to filter the vast majority of non-productive reactions.  Then, we train ranking models on pairs of interacting MOs to learn a relative productivity function over mechanistic steps. Our trained models exhibit close to perfect recovery of the rule-based labels.  Furthermore, the ranking system correctly predicts multi-step reactions and shows promising generalizability, making reasonable predictions for cases not handled by the rule-based expert system.



Data Visualization Techniques for Older Adults' Wellness

Thai Le, Katarzyna Wilamowska, Hilaire Thompson, George Demiris, University of Washington

Abstract:

Over the last few decades, there has been an increasing focus on developing applications to monitor older adult health status. Each of these applications produces large amounts of data, which unfortunately are not presented in a meaningful way. The presentation of health data can often be fragmented, and despite the growing amount of data provided about the status of an individual, the overall well-being of a specific individual is still difficult to assess. We propose to apply techniques for visualization of wellness incorporating physical, social, spiritual, and cognitive measures collected from health monitoring applications in an independent retirement community. In a 2-month pilot study with 27 older adults (mean age 88 years), we examined information sources from study intake methods and commercially available informatics tools and explored how they can be presented to older adults, their family members, and care providers.

We present several approaches to visualizing wellness data focusing on a holistic wellness model and wellness trajectory over time. The importance of these visualization techniques lies in the integrated view of well-being which can then be leveraged by both older adults and clinicians as a shared decision support tool.



From Graphs to Events: A Subgraph Matching Approach for Information Extraction from Biomedical Text

Haibin Liu, Ravikumar Komandur, Karin Verspoor, University of Colorado

Abstract:

An important task in biological information extraction is to identify descriptions of biological events involving genes or proteins, such as binding events or post-translational modifications. We propose a graph-based approach to automatically learn rules for detecting biological events in the life-science literature. The event rules are learned by identifying the key contextual dependencies from full syntactic parsing of annotated text. Event recognition is performed by searching for an isomorphism between event rules and the dependency graphs of complete sentences in the input texts. We explored methods such as performance-based rule ranking to improve precision, and merged rules across multiple event types to increase recall.

We applied our approach to the datasets of the BioNLP 2011 shared task (BioNLP-ST'11) to tackle the GENIA event extraction (GE) and the Epigenetics and Post-translational Modifications (EPI) tasks. We achieved a 41.13% F-score in detecting events across 9 types in the GE task, and a 52.67% F-score in identifying events across 15 types in the EPI task. Our performance on both tasks is comparable to the state-of-the-art systems. Our approach does not require any external domain-specific resources. It may be generalized to extract events from other domains where training data is available.
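Matching an event rule against a sentence's dependency graph amounts to finding a label-preserving subgraph isomorphism. A brute-force sketch for the small graphs typical of event rules (the graph encoding and the example labels are invented, not taken from the shared-task data):

```python
from itertools import permutations

def find_match(rule, sentence):
    """Search for a label-preserving embedding of a small event-rule
    graph into a sentence dependency graph. Graphs are given as
    (node_labels, edges), with edges as (src, dst, dep_label) triples.
    Brute force, acceptable for the handful of nodes in a rule."""
    r_nodes, r_edges = rule
    s_nodes, s_edges = sentence
    s_edge_set = set(s_edges)
    for combo in permutations(list(s_nodes), len(r_nodes)):
        m = dict(zip(r_nodes, combo))          # candidate node mapping
        if any(r_nodes[rn] != s_nodes[m[rn]] for rn in r_nodes):
            continue                            # node labels must agree
        if all((m[a], m[b], d) in s_edge_set for a, b, d in r_edges):
            return m                            # all rule edges present
    return None
```

Production systems replace the permutation search with backtracking matchers that prune by label early, but the correctness condition is the same: every rule edge must map onto a sentence dependency with an identical label.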



Resource Constrained Countries Need Health Information Exchanges Too!

Nareesa A Mohammed-Rajput, Paul G Biondich, Regenstrief Institute, Indianapolis, Indiana

Abstract:

Healthcare systems around the world were not designed to facilitate data flow amongst providers in different geographical locations, especially those in resource-constrained environments. This lack of information sharing causes fragmented patient care. A public-private partnership (PPP) emerged as a solution to strengthen infrastructure and facilitate information exchange in healthcare delivery systems. The PPP intends to accomplish this goal by developing an enterprise architecture framework that, when implemented, will facilitate data exchange and information flow. Reference implementations are planned in 5 representative resource-constrained environments - Rwanda, Cambodia, Mozambique, Zimbabwe, and Ghana.

By creating partnerships locally and globally, Ministries of Health in reference implementation countries learn how to build local and sustainable capacity to maintain the health information exchanges. By building partnerships between Ministries and universities, technical workers receive training and potential employment. From past and current work, we see the emergence of registries for patients, clinics, hospitals, and healthcare practitioners, and of national identification numbers to track interactions between patients and healthcare systems. Fostering local capacity will ultimately result in a sustainable model for data exchange.



Mapping Neural Response to Alcohol Using Optical Imaging Techniques

Rosemary T Nettleton, Eilis Boudreau, Oregon Health and Science University; Yali Jai, Ruikang K Wang, University of Washington

Abstract:

The use of functional imaging techniques to study drugs of abuse, especially alcohol, is in its infancy and their use in the mouse brain is severely limited by its small size.

We propose the use of two complementary optical imaging techniques, laser speckle imaging (LSI) and Doppler optical microangiography (DOMAG), to quantitatively and non-invasively map changes in cerebral blood flow and blood volume down to the level of individual vessels (10 micrometers) in the mouse brain. Our preliminary LSI and DOMAG mouse data show that alcohol decreases arterial and venous flow. However, several biophysical and computational issues must be addressed to realize the full potential of these techniques. For example, blood vessel size could differentially impact alcohol-induced changes, and these changes may be region-specific. Furthermore, analysis of the imaging data is computationally intense. Thus, only a fraction of the vessels from our dataset have been analyzed.

The purpose of this work is to increase the efficient analysis of optical imaging data in these techniques and to combine whole brain qualitative LSI data with quantitative regional DOMAG data, demonstrating their use in determining the differential response of mice known to have different genetically mediated behavioral responses to alcohol.



Developing "Structure Space" Parameters Using Point Mutation Data

Daniel Ochoa1, George E Fox1, Yuriy Fofanov2

  1. Department of Biology and Biochemistry, University of Houston
  2. Department of Computer Science, University of Houston

Abstract:

The "structure space" of an RNA or protein molecule is defined as all the functional (valid) primary sequence variations that exist, or could exist, for that molecule. Understanding the nature of real structure spaces could provide significant insight to how evolution occurs at the molecular level. As a result, a number of studies have been undertaken using small RNAs as model molecules. However, these studies only consider Watson-Crick secondary structure and ignore tertiary interactions, and interactions with other molecules, altogether. A more realistic model system is required in order to make further progress.

Using Vibrio proteolyticus as a model, the available Vibrio 5S rRNA sequences were compared and cross-referenced with data from in vivo point mutational studies done with V. proteolyticus, in order to determine how useful point mutational data are in predicting functional variants. In general, the sequences with fewer variations contained a higher percentage of functional variants. Sequences with a larger number of variations contained a higher percentage of non-functional or undetermined variants. These initial results suggest that point mutational data may be useful in predicting, with very high reliability, functional and non-functional variants of the V. proteolyticus sequence containing as many as 9-10 changes.



Quantifying Time Varying Insulin Sensitivity

Edward A Ortiz and Stephen D Patek, University of Virginia

Abstract:

Insulin sensitivity (SI) is central to mathematical models of glucose-insulin interaction, characterizing the effect of insulin in reducing blood glucose (BG) concentration.  Accurate assessment of SI contributes to safe and effective insulin therapy, particularly for ICU patients who are prone to wild SI fluctuations.

BG, insulin, and feeding data from 192 burn-unit patients have been analyzed to produce time-varying estimates of SI via Kalman filtering, with an extended version of Bergman's minimal model. For each patient, 25 SI functions are assessed (using different filter configurations), each of which is superimposed onto the extended minimal model and simulated using the same insulin and feeding inputs. The resulting simulated BG traces are compared to actual BG, with fit quality measured as one minus the ratio of the sum of absolute errors to the sum of actual BG values offset by 50 mg/dl; one is the best possible score.
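The fit-quality score described above can be written directly. Reading "offset by 50 mg/dl" as adding 50 to each actual value is our assumption from the wording, and the traces below are invented:

```python
def fit_quality(actual, simulated, offset=50.0):
    """One minus the ratio of summed absolute errors to summed
    actual BG values offset by 50 mg/dl (our reading: offset is
    added); 1.0 is a perfect fit. Traces are illustrative."""
    err = sum(abs(a - s) for a, s in zip(actual, simulated))
    denom = sum(a + offset for a in actual)
    return 1.0 - err / denom
```

The offset keeps the denominator well away from zero at low glucose values, so the score degrades gracefully rather than exploding when BG is near the hypoglycemic range.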

Preliminary results show that best fit quality is greater than .73 for 95% of the patients, whereas this quality of fit (or better) is achieved in only 45% of the population using the baseline value of SI.

This study suggests that patient-adapted SI can at least partially explain BG fluctuations within a burn-unit population.

top


A Comprehensive Data-Driven Method for Ontological Term Aggregation

Rimma Pivovarov, Noemie Elhadad, Columbia University

Abstract:

Background: Granularity creates a problem of signal dilution when performing analyses on ontologically standardized datasets. If semantically similar concepts can be aggregated, a sparse dataset becomes much richer and better suited to addressing various research questions. Similar concepts that require aggregation are context-dependent, however: terms such as "coughing" and "acute cough" should be combined in a study about kidney disease, but left separate in a pneumonia study.

Methods: We leverage distributional semantics of clinical text, lexical inclusion, and ontological structure to identify similar concepts in an ontology. We demonstrate our method on a collection of Chronic Kidney Disease (CKD) patients' notes and SNOMED-CT.

Results: We show that a three-tiered filter on the clinical text removes a large portion of the noise in the dataset. Additionally, in a small-scale experiment we extracted 58 pairs of similar concepts from a corpus of 115,259 CKD patient notes. The final set of similar concepts will be freely available online.

Conclusion: By taking advantage of different ontological relationships, and by combining patterns of text usage in clinical notes with knowledge-representation information, we are able to separate context-dependent concepts that are merely related from those that are semantically similar.

top


Exploratory Analysis of an Intervention Delivery Channel Effectiveness Scale

Lisa M Quintiliani1, Julie A Wright2, Timothy Edgar3, Robert H Friedman1

  1. Boston University
  2. University of Massachusetts Boston
  3. Emerson College

Abstract:

The emergence of new technologies indicates the need to fit delivery channels (i.e., ways to deliver information such as texting or podcasting) to individuals' perceptions of meaningful channel characteristics, which may increase the effectiveness of health behavior change programs. We developed a 27-item Channel Effectiveness Scale [CES] representing 7 established channel characteristics: credibility, decodability, intrusiveness, safeness, participation, personalization, and depth. Here, we present exploratory scale analyses. The CES was administered at 6 months during a 3-group (automated computer telephone, Web, and control) randomized trial targeting diet. Exploratory analyses were conducted among automated telephone participants (n=129). Most were non-Hispanic White (98%) and male (68%). Principal components analysis yielded 11 items accounting for 74% of total variance across four components: credibility (n=3), decodability (n=2), intrusiveness (n=3), and relevance (n=3), thus eliminating "safeness" due to high ratings, eliminating "participation" due to overlap with other characteristics, and combining "personalization" and "depth" into "relevance". Cronbach's alpha ranged from 0.7 to 0.9. Positive ratings (reflecting high credibility, decodability, relevance, and low intrusiveness) were correlated with automated telephone use (r=0.33, P<0.0001). The CES demonstrated internal reliability and was associated with intervention use. Future work will entail confirmatory analyses of scale stability across different channels and validation with behavioral change.
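The internal-reliability statistic reported here, Cronbach's alpha, is a short computation over per-item scores. A minimal sketch with invented respondent data (not the study's), using only the standard library:

```python
import statistics

def cronbach_alpha(items):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances /
    variance of respondent totals). `items` is a list of per-item
    score lists, aligned across respondents."""
    k = len(items)
    item_vars = sum(statistics.variance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]
    return k / (k - 1) * (1 - item_vars / statistics.variance(totals))

# Three highly correlated items (5 hypothetical respondents each)
# yield a high alpha, consistent with an internally reliable subscale.
items = [[4, 5, 3, 4, 2],
         [4, 4, 3, 5, 2],
         [5, 5, 3, 4, 1]]
print(round(cronbach_alpha(items), 2))  # → 0.93
```

Sample variances (`statistics.variance`) are used throughout, which is the conventional choice for scale data.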

top


A Text Mining System using Statistical Machine Learning Approaches for Genetic Epidemiology Articles

Terry Shen, Niels Kasch, Tom Armstrong, Harold Lehmann, Tim Oates, Johns Hopkins University, University of Maryland Baltimore County

Abstract:

Most common and complex diseases, such as diabetes, are influenced at some level by variation in the genome as well as environmental exposures. To truly address the goal of translational research, genomic variation in the context of gene-environment interactions (GEI) must be taken into consideration. Research done in public health genetics, specifically in the area of single nucleotide polymorphisms (SNPs), is the first step to understanding human genomic variation. Understanding the implication of a given SNP on a phenotype or clinical outcome requires annotation of the SNP. In turn, SNP annotation is a challenge due to functional uncertainty and to the volume of relevant scientific articles. Both of these aspects are especially true in the context of GEI, where an extra level of uncertainty enters the relationships. The main objective of this project is to develop and evaluate a translational-informatics method that supports machine-learning text mining for determining hypotheses and results in GEI studies.

top


Quantitative Measures of Viral RNA Validate and Improve a Predictive Model

Collin Timm, Ankur Gupta, Jim Rawlings, and John Yin, University of Wisconsin-Madison

Abstract:

Viruses are a major threat to human health, causing diseases such as influenza, AIDS and some cancers. Understanding and predicting the replication of viruses will allow researchers to efficiently and effectively design anti-viral drugs and treatments. Using a well-studied model system for RNA viruses, a mechanism-based model was built and solved using a combination of ordinary differential and algebraic equations [1]. All equations are based on known aspects of the viral growth cycle: the RNA genome enters the cell, mRNAs are transcribed by the viral polymerase, proteins are synthesized that switch the polymerase to a replication mode, and finally the virus components are assembled and released as infectious particles.
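As a toy illustration of this style of modeling (not the Lim et al. model itself), a growth cycle like the one described can be written as coupled ODEs and stepped with forward Euler; all species names and rate constants below are invented for the sketch:

```python
# Toy intracellular growth cycle, stepped with forward Euler:
# genome templates (G) are transcribed into mRNA (M), mRNA is
# translated into protein (P), protein drives genome replication,
# and genome plus protein are consumed to assemble released virus
# particles (V). Rate constants are illustrative assumptions.
k_tx, k_tl, k_rep, k_asm = 2.0, 1.0, 0.5, 0.1

G, M, P, V = 1.0, 0.0, 0.0, 0.0   # one genome enters the cell
dt = 0.001
for _ in range(int(10 / dt)):      # integrate to t = 10 (arbitrary units)
    dG = k_rep * P - k_asm * G * P  # replication vs. packaging loss
    dM = k_tx * G                   # transcription from genome
    dP = k_tl * M                   # translation from mRNA
    dV = k_asm * G * P              # assembly and release
    G += dG * dt
    M += dM * dt
    P += dP * dt
    V += dV * dt

print(f"virus particles released by t=10: {V:.1f}")
```

In practice a stiff ODE solver would replace the fixed-step Euler loop, and parameters would be fit to measured virus production data, which is where the qRT-PCR quantification described above comes in.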

Although the predictive model was fit to experimental virus production data and predicts the growth of viral mutants, the model fails to predict viral mRNA and genome levels by orders of magnitude. We seek to advance this model through absolute quantification of viral mRNA and genome on a per cell basis using qRT-PCR. These data will help us validate the mechanistic model and determine parameters to describe the complicated aspects of the viral infection.

1. Lim K, Lang T, Lam V, Yin J (2006) Model-based design of growth-attenuated viruses. PLoS Comput Biol 2(9): e116. DOI: 10.1371/journal.pcbi.0020116

top


Restoring Touch Sense: Modulating Signals to Optimally Depolarize Sensory Neurons

Aaron L Williams, Gregory J Gerling, University of Virginia

Abstract:

Prior work on tactile mechanotransduction in the neural informatics community has sought to predict the timing of action potentials using force sensors and mathematical algorithms. The next major gap is to optimally depolarize neurons with artificial touch sensors connected directly to nerves, enabling closed-loop, motor plus sensory prosthetic touch. Building this signal modulation function consists of two challenges: calibrating it to deliver reliable and robust trains of action potentials that are psychophysically discernible, and limiting the amplitude and duration of delivered energy to prevent damage to the nerve. We are presently developing a two-part model, where 1) compressive force from the environment is transformed to analog voltage and then to action potential timing with a leaky integrate-and-fire neuronal model, and 2) event timing is transformed to current pulses to depolarize the afferent. Transformations will be parameterized with in vivo electrophysiological recordings from peripheral nerves of the rat. Minimal current pulse amplitude and duration will be found by electrically stimulating the sural nerve and refined by use of a partial factorial experiment to further explore the solution space.
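The first transformation stage, from an analog drive signal to action-potential timing via a leaky integrate-and-fire neuron, can be sketched in a few lines. All parameters below are illustrative, not the calibrated values obtained from the rat recordings:

```python
def lif_spike_times(input_current, dt=0.001, tau=0.02, r=1.0,
                    v_thresh=1.0, v_reset=0.0):
    """Leaky integrate-and-fire sketch: the membrane potential v
    integrates the input with a leak (dv/dt = (-v + R*I)/tau, stepped
    with forward Euler); each threshold crossing records a spike time
    and resets the potential."""
    v = v_reset
    spikes = []
    for i, current in enumerate(input_current):
        v += dt * (-v + r * current) / tau
        if v >= v_thresh:
            spikes.append(i * dt)
            v = v_reset
    return spikes

# Constant suprathreshold drive (e.g., a steady compressive force
# mapped to 1.5 V-equivalent units) produces a regular spike train.
drive = [1.5] * 1000  # 1 s of input at 1 ms resolution
times = lif_spike_times(drive)
print(f"{len(times)} spikes in 1 s")
```

A force-to-voltage calibration in front of this function, and a spike-time-to-current-pulse stage after it, would complete the two-part pipeline described in the abstract.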

top