Grants and Funding: Extramural Programs (EP)

NLM Informatics Training Conference 2009

Oregon Health and Science University
Old Library, BICC, Hospital South, School of Nursing

June 23-24, 2009

Agenda and Abstracts of Presentations and Posters


Tuesday, June 23, 2009

7:45 - 8:00 Welcome (Dr. William Hersh & Dr. Mark Richardson, Dean, School of Medicine, OHSU)
   
8:00 - 8:20 NLM Director's Remarks (Donald A.B. Lindberg)
   
8:20 - 8:30 Introductions of Training Directors and Trainees; Program Update (Dr. Valerie Florance)
   
8:30 - 10:00 Plenary Paper Session # 1 – 5 papers Moderator: Dr. Pierre Baldi
 
   
10:00 – 10:45 Poster Break – Attended – Day 1 Group: Posters
  Voted Best Poster, Day 1: Development and Evaluation of a Widget-based ‘Web 2.0’ Electronic Health Record, Yalini Senathirajah
   
10:45 – 12:15 Parallel Paper Sessions – 3 papers each + discussion
   
  Network Session A1 Moderator: Dr. Perry Miller
 
   
  Network Session A2 Moderator: Dr. Harold Lehmann
 
   
12:15 - 1:15 Executive Session of Training Directors (Session Chair: Dr. Donald A.B. Lindberg)
   
1:15 - 2:30 Plenary Paper Session # 2 – 4 papers Moderator: Dr. George Demiris
 
   
2:30 - 4:00 Parallel Paper Sessions – 3 papers each + discussion
   
  Network Session B1 Moderator: Dr. Stephen Downs
 
   
  Network Session B2 Moderator: Dr. David Pollock
 
   
  Network Session B3 Moderator: Dr. Alex Bui
 
   
4:00 – 4:30 Poster Break – Attended – Day 1 Group: Posters
  Voted Best Poster, Day 1: Development and Evaluation of a Widget-based ‘Web 2.0’ Electronic Health Record, Yalini Senathirajah

Wednesday, June 24, 2009

8:15 – 9:45 Plenary Paper Session #3 - 5 papers Moderator: Dr. Russ Altman
 
   
9:45 – 10:15 Poster Break – Attended – Day 2 Group
  Voted Best Poster, Day 2: Sigmoid: An Integrative System for Pathway Bioinformatics and Systems Biology, Ben Compani, University of California, Irvine
   
10:15 - 11:30 OHSU Showcase
   
12:30 – 1:30 Parallel Paper Sessions - 2 papers each + discussion
   
  Network Session C1 Moderator: Dr. Stephanie Guerlain
 
   
  Network Session C2 Moderator: Dr. Bill Caldwell
 
   
  Network Session C3 Moderator: Dr. Cindy Gadd
 
   
1:30 – 2:00 Poster Break – Attended – Day 2 Group
  Voted Best Poster, Day 2: Sigmoid: An Integrative System for Pathway Bioinformatics and Systems Biology, Ben Compani, University of California, Irvine
   
2:00 - 3:15 Plenary Paper Session # 4: - 4 papers - Moderator: Dr. George Hripcsak
 
   
3:15 – 3:30 Closing Session (Dr. William Hersh) & Poster Awards (Dr. Valerie Florance)


PRESENTATION ABSTRACTS

Effect of Actionable Reminders on Performance of Overdue Testing

Authors:
Robert El-Kareh, Tejal Gandhi, Eric G Poon, John Orav, Thomas Sequist, Brigham and Women's Hospital, Boston, MA

Abstract:
Background: Many patients do not receive appropriate testing for preventive care and chronic disease management. Electronic reminders linked to computerized order entry (actionable reminders) might facilitate ordering of overdue tests. We studied the impact of actionable reminders on cancer screening, osteoporosis screening, and diabetes care.

Methods: We identified 4 intervention primary care clinics and 4 control clinics matched on baseline screening mammography rates. Reminders prompted clinicians during office visits with patients overdue for mammograms, bone density scans, and diabetic monitoring. We analyzed patient visits during a 6-month period prior to the actionable reminder implementation and a 6-month period following implementation. We fit multivariable logistic regression models with generalized estimating equations to measure the effect of the reminders on testing rates.

Results: There were no differences in rates of appropriate testing between the intervention and control sites for overdue screening mammography (11.0% vs. 14.3%, p=0.20), bone density exams (6.2% vs. 5.8%, p=0.61), and diabetes testing for hemoglobin A1c (46.4% vs. 48.6%, p=0.71) or low-density lipoprotein (24.2% vs. 20.7%, p=0.58).

Conclusion: Actionable reminders did not improve receipt of breast cancer screening, osteoporosis screening, and diabetes care. Future work should investigate the limitations of such systems and how to improve their effectiveness.


A Novel Visualization Tool for Evaluating Adverse Events in Multi-Drug Regimens

Authors:
Jon D Duke, Shaun J Grannis, Regenstrief Institute, Inc.

Abstract:
Patients on multiple medications are at increased risk for adverse drug events. To evaluate such events, the physician must be aware of the potential side effects of all of a patient’s medications. Current information resources are inefficient for researching such multi-drug regimens because they require that each medication be looked up individually. The results must then be interpreted to determine which drug is most likely to have caused the adverse event. We describe a novel information resource called Rxplore, which allows users to retrieve adverse reaction data for multiple medications simultaneously. Results are delivered as an intuitive visualization to allow for rapid interpretation of the data. We are currently engaged in a laboratory study measuring physician speed and accuracy in retrieving adverse reaction data using Rxplore versus UpToDate. Preliminary results show a 60% increase in speed when using Rxplore without any loss of accuracy. A survey of the study participants reveals a high level of satisfaction with the application. Future work will include integrating Rxplore into an electronic medical record and evaluating the system’s impact on the evaluation of adverse drug events (ADEs) in the clinical setting.


Display Format Affects Physicians’ Interpretations of Laboratory Data

Authors:
David T Bauer, Stephanie A Guerlain, University of Virginia

Abstract:
Data overload is a significant problem in many areas of health care and can lead to suboptimal care. Computer-based information systems provide opportunities to remedy this by presenting medical data in new, more efficient ways. In this study, we sought to design a laboratory data display for an electronic flowsheet that suits physicians’ information needs. The design features small, data-dense graphics called sparklines, which are intended to emphasize trends and aid comparisons among variables over time. To evaluate the display, physicians in a pediatric intensive care unit were asked to talk aloud as they assessed four hypothetical patients with the graphical display and with a traditional tabular display. Verbalizations were transcribed and coded, and the within-subject design allowed us to compare participants’ descriptions of the same data in each display condition. Assessment time and responses to a questionnaire about the displays and cases were also analyzed. Physicians completed assessments significantly faster with the graphical display, and considerable differences were found in how participants interpreted laboratory data based on display format. These results indicate that display format affects how data are interpreted and must be considered in the design of health care information systems.
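
To make the sparkline idea concrete, the following is a minimal text-only sketch in Python. It is not the display evaluated in the study; the function name, the eight-level character set, and the sample values are illustrative assumptions.

```python
# Illustrative only: a text sparkline built from Unicode block characters.
# The study's actual graphical display is not described at this level of detail.
BARS = "▁▂▃▄▅▆▇█"

def sparkline(values):
    """Map each value to one of eight bar heights, scaled to the series range."""
    if not values:
        return ""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat series has no range to scale against; use a mid-height bar.
        return BARS[3] * len(values)
    scale = (len(BARS) - 1) / (hi - lo)
    return "".join(BARS[round((v - lo) * scale)] for v in values)

if __name__ == "__main__":
    # Hypothetical serial lab values, invented for this example.
    hypothetical_lab_series = [4.1, 4.3, 4.0, 3.6, 3.2, 3.5, 3.9]
    print("hypothetical lab trend:", sparkline(hypothetical_lab_series))
```

Because each series is scaled to its own minimum and maximum, a sketch like this emphasizes the shape of a trend and hides absolute magnitude, which is one reason such displays are usually paired with the numeric values.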


Real-Time Surveillance for Rapid Correction of Clinical Decision Support Failures

Authors:
Allison B McCoy, Josh F Peterson, Lemuel R Waitman, Vanderbilt University

Abstract:
The utilization of clinical decision support (CDS) is increasing among healthcare facilities which have implemented computerized physician order entry or electronic medical records. Formal prospective evaluation of CDS implementations occurs rarely, and misuse or flaws in system design are often unrecognized. Retrospective review can identify failures but is too late to make critical corrections or initiate redesign efforts. Real-time surveillance of user responses and patient outcomes comprises one approach to give immediate feedback to CDS designers and help operate a safety net which intercepts CDS failures. We outline four types of CDS (passive alerts, interruptive alerts, order sets, and complex ordering advisors) and describe common failures with surveillance applications for high-alert medications such as aminoglycosides, anticoagulants, and insulin. We then present a computerized tool for high-alert medication prescriptions which serves developers, clinical pharmacists, and institutional physician leaders. The tool has two views: the surveillance view allows users to scan all CDS failures and prioritize high-risk scenarios, and the patient detail view provides context for understanding CDS failures. Entries on the surveillance tool populate automatically when CDS is used or ignored, allowing verification of prescription safety and accuracy.


Translating a Technique From Proteomics to Organizational Dynamics: Clustering of Tasks, Knowledge, and Resources in PH Work

Authors:
Jonathan Keeling, Jacqueline Merrill, Department of Biomedical Informatics, Columbia University

Abstract:
Public health (PH) is the science and practice of protecting and improving the health of a community. The PH system incorporates bench research, clinical preventive care, and assuring population health via a national system of local health departments. We applied a single informatics technique at two levels of PH research. First, we used hierarchical clustering (between-groups linkage, cosine distance) as an analytical technique to determine whole-organism effects of uncharacterized small molecules by identifying nearest neighbor compounds with known effects. Through this technique half of the unpromising candidates in a cohort were eliminated early in the drug discovery process. Second, we used clustering as a knowledge discovery technique to elucidate patterns in PH work. Task, knowledge, and resource profiles of 1062 employees from 11 county health departments were hierarchically clustered and qualitatively examined to identify causal patterns. We found clusters on specific job titles, education, job level, and experience but not on departments, divisions, or age. This implies that information needs of PH workers may be empirically quantified and that commonalities in PH practice might be leveraged across PH departments to improve performance in a standard way. Informatics methods applied across PH science exemplify the translational paradigm advanced by NIH.
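
The clustering technique named above (hierarchical clustering with between-groups linkage and cosine distance) can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and the toy profiles are invented here, and between-groups linkage is taken to mean average linkage, which is how that term is commonly used.

```python
import math
from itertools import combinations

def cosine_distance(u, v):
    """1 minus the cosine similarity of two equal-length numeric profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for an all-zero profile")
    return 1.0 - dot / (norm_u * norm_v)

def average_linkage_clusters(profiles, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters whose
    members have the smallest mean pairwise cosine distance."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            dists = [cosine_distance(profiles[i], profiles[j])
                     for i in clusters[a] for j in clusters[b]]
            mean_dist = sum(dists) / len(dists)
            if best is None or mean_dist < best[0]:
                best = (mean_dist, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)

if __name__ == "__main__":
    # Invented task/knowledge/resource profiles for four hypothetical employees.
    toy_profiles = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0],
                    [0.0, 1.0, 0.0], [0.0, 0.9, 0.2]]
    print(average_linkage_clusters(toy_profiles, n_clusters=2))
```

The pairwise recomputation is quadratic per merge, which is acceptable for a sketch; a dataset the size of the one described (1062 profiles) would normally be handled with a library implementation.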


Mapping MicroRNAs Genes to Active Breast Cancer Regions Using LIA - Locus Intersection Analysis Algorithm

Authors:
George Gorgon (1), Chris Zaleski (2), Scott A Tenebaum (2), David P Tuck (1)
(1) Yale University, New Haven, CT; (2) State University of New York, Albany, NY

Abstract:
MicroRNAs are a class of small non-coding RNA genes whose final products are RNA molecules (~22 nucleotides long) capable of both repressing and up-regulating translation. We hypothesized that some of these molecules could be concomitantly over-expressed or repressed in breast cancer by virtue of their genes being located within or in the vicinity of active breast cancer regions, e.g., amplicons or deletions. The repressed or amplified microRNA could play an active role in the regulation of cell cycles by activating or suppressing genes important in cellular growth control and carcinogenesis. In this work, we developed a robust algorithm, the Locus Intersection Analysis Algorithm (LIA), to perform comparisons among chromosomal coordinates from multiple datasets. LIA executes comparisons among any coordinate-based data, identifying shared or common regions. The chromosomal coordinates of amplified and deleted breast cancer regions from different datasets were mapped to coordinates of known microRNA molecules, generating a list of potential biologically active molecules. Additionally, target genes for selected microRNA molecules were queried using public databases. A growing body of experimental evidence has confirmed that some of the microRNAs we identified are indeed important in carcinogenesis. Some literature-based data on the biological role of our findings are also presented. The use of LIA allows experimental biologists to narrow down a potential list of target sites for experimental validation.
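
The core operation described above, finding coordinate ranges that overlap on the same chromosome, can be sketched as follows. This is not the LIA implementation, whose details are not given in the abstract; the `Region` type, the `flank` parameter (a stand-in for "in the vicinity of"), and all names and coordinates are hypothetical.

```python
from collections import namedtuple

# Closed interval [start, end] on a named chromosome. Hypothetical structure.
Region = namedtuple("Region", ["chrom", "start", "end", "name"])

def intersect_regions(regions_a, regions_b, flank=0):
    """Return (a_name, b_name, overlap_start, overlap_end) for every pair that
    overlaps on the same chromosome. `flank` widens each region in regions_a
    by that many bases on both sides before testing for overlap."""
    by_chrom = {}
    for b in regions_b:
        by_chrom.setdefault(b.chrom, []).append(b)
    hits = []
    for a in regions_a:
        for b in by_chrom.get(a.chrom, []):
            start = max(a.start - flank, b.start)
            end = min(a.end + flank, b.end)
            if start <= end:
                hits.append((a.name, b.name, start, end))
    return hits

if __name__ == "__main__":
    # Invented coordinates and labels, for illustration only.
    amplified = [Region("chr1", 1000, 2000, "amplicon_X")]
    genes = [Region("chr1", 1500, 1521, "gene_a"),
             Region("chr1", 2050, 2071, "gene_b"),
             Region("chr2", 1500, 1521, "gene_c")]
    print(intersect_regions(amplified, genes))
    print(intersect_regions(amplified, genes, flank=100))
```

Grouping the second dataset by chromosome avoids comparing regions that can never overlap; for genome-scale inputs, sorting each group and sweeping, or using an interval tree, would reduce the remaining pairwise work.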


Occupancy Classification of PWM-Inferred Transcription Factor Binding Sites

Authors:
Hollis Wright, Shannon McWeeney, Aaron Cohen, Kemal Sönmez, Greg Yochum, Oregon Health and Science University

Abstract:
Computational prediction of transcription factor binding sites (TFBS) is a difficult process and fraught with high rates of error; for example, the popular position-weight matrix methods such as TRANSFAC/MATCH exhibit very poor specificity. Recently, methods for improving the accuracy of TFBS prediction using additional data beyond the TFBS motif sequence have shown promise. In this presentation we will primarily discuss our evaluation of several machine learning methods (Bayesian networks, SVM, Reconstructability Analysis) as classifiers of predicted TFBS as high-occupancy (i.e., biologically relevant) or low-occupancy sites using a variety of sequence and chromatin features, including information-theoretic metrics, DNA methylation, and histone modifications. We will also briefly present work on exploring the possibility of combining TF-specific classifiers into a general classifier for TFBS occupancy through classifier stacking, and on evaluating the structure of gene interaction networks constructed using occupancy predictions as compared to interaction networks constructed from biological data.


A Mathematical Model for Methionine Metabolism

Authors:
Keith Booher, Tarek Najdi, Todd Johnson, Eric Mjolsness, and Peter Kaiser, University of California, Irvine

Abstract:
Cancer cell methionine dependency describes the phenomenon in which cancer cells undergo cell cycle arrest followed by apoptosis when methionine is restricted from the growth media and replaced with the immediate metabolic precursor homocysteine. By contrast, non-transformed cells proliferate in media supplemented with homocysteine suggesting a pathway containing therapeutic drug targets. Methionine and homocysteine are metabolites in the transmethylation and methionine cycle metabolic pathways. In order to understand the nature of cancer cell methionine dependency, we endeavor to develop an in silico mathematical model of methionine metabolism. As a first step, we have begun modeling the highly conserved pathways in the yeast model.

Using kMech, we implemented a Lam-Delosme simulated annealing numerical optimization algorithm to produce a Cellerator output such that the metabolites of the methionine metabolic pathway reached a steady state concentration at their published levels in yeast. Furthermore, application of the algorithm generated parameters within pre-determined constraints dictated by known values for the Km, kcat, and enzyme concentration governing each reaction of the pathway. This work presents a first step towards simulation of methionine dependency of cancer and our goal to identify candidate cancer drug targets in silico.


Social, Organizational, and Contextual Aspects of CDSS for Intensive Insulin Therapy

Authors:
Thomas R Campion Jr., Cynthia S Gadd, Asli Ozdas, Nancy M Lorenzi, Lemuel R Waitman, Vanderbilt University

Abstract:
Historically clinical decision support system (CDSS) evaluations have focused on practitioner performance rather than social, organizational, and contextual factors, and CDSS for intensive insulin therapy (IIT) is no exception. IIT, which relies on frequent blood glucose measurements and insulin infusion adjustments to maintain tight blood glucose control, has been the standard of critical care since 2003. However, recent studies have questioned the therapy’s mortality benefit and safety. Computer-based IIT approaches, which have generally outperformed paper-based versions in terms of protocol adherence and practitioner performance, typically involve interaction between nursing staff, blood glucose testing devices, and CDSS modules, a process more complex and susceptible to error than most studies acknowledge. Examining the interdependent aspects of this process can possibly facilitate improvement. This work 1) reviews computer-based IIT evaluation literature using institutional theory, a discipline from sociology and organizational studies, to show the inconsistent reporting of social, organizational, and contextual elements, 2) demonstrates elements frequently omitted from IIT evaluations through a case study, and 3) assesses IIT CDSS workflow complexity by quantifying nurse data entry error and appropriateness of clinical judgment. By addressing social, organizational, and contextual aspects of CDSS for IIT, researchers and practitioners can potentially improve practitioner and patient outcomes.


Predicting Treatment-Induced Acute Hypoglycemia in the Intensive Care Unit

Author:
Ying Zhang, Harvard Medical School, Massachusetts Institute of Technology

Abstract:
The benefits of tight glycemic control in critical care could be achieved by implementing more intense insulin therapy. However, such therapy also exposes ICU patients to higher risk of acute hypoglycemia. To make tight glycemic control safer, this project develops models for predicting the occurrences of hypoglycemia during intravenous insulin infusion before the actual hypoglycemic events take place. Data from 3116 adult ICU patients have been retrospectively analyzed to elucidate glycemic dynamics and to devise a methodology for proactive prediction. Mutual information, embedded selection by classification trees, odds ratios of categorized clinical time series and occurrences of acute hypoglycemia are used to compare features of patients’ glycemic dynamics. Machine learning is then applied to key features to generate predictive models of acute hypoglycemia. Results show that blood glucose levels and change in dose response to insulin within the last two hours are the most informative features. Predictive models built with the key features could accurately predict 82.12% of acute hypoglycemic events (specificity: 89.87%; positive predictive value: 88.72%; accuracy: 86.00%). Future work will focus on using the mechanistic approach from this project to discover trends in the clinical data leading up to acute hypoglycemic episodes.
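
For readers less familiar with the four figures reported above, the sketch below shows how they are conventionally derived from a confusion matrix. It assumes that the "82.12% of acute hypoglycemic events" figure is a sensitivity, which is the most natural reading but is not stated outright. The counts in the usage example are invented and are not the study's data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard summary metrics from confusion-matrix counts.
    tp/fn: events predicted / missed; tn/fp: non-events correctly / wrongly flagged."""
    return {
        "sensitivity": tp / (tp + fn),            # share of true events predicted
        "specificity": tn / (tn + fp),            # share of non-events left unflagged
        "ppv": tp / (tp + fp),                    # share of alarms that were real
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

if __name__ == "__main__":
    # Invented counts, chosen only to make the arithmetic easy to follow.
    for name, value in classification_metrics(tp=80, fp=10, tn=90, fn=20).items():
        print(f"{name}: {value:.2%}")
```

Positive predictive value depends on how common the event is in the evaluated sample, so a PPV measured on a balanced sample would generally not carry over to a setting where hypoglycemia is rare.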


Using Predictive Methods in Conjunction with Risk-Based Insulin Reduction for Glycemic Control in Type 1 Diabetes

Authors:
Colleen S Hughes, Stephen D Patek, Marc Breton, Boris Kovatchev, University of Virginia

Abstract:
Continuous glucose monitoring is an enabling technology for future systems that regulate glucose concentration in patients with type 1 diabetes, with the key idea being to adjust the insulin pump settings automatically based on measured deviation from a target value. Safety is at the forefront in the design of these systems, and there is a clear need for supervisory processes that lower the risk for hypoglycemia in closed-loop, open-loop, and advisory mode settings. We have developed an algorithm for smoothly attenuating insulin pump injections by (1) monitoring CGM and insulin injection data, and (2) predicting the patient’s risk of hypoglycemia for some time period ahead (e.g. 30 minutes). Pump output is regulated so that injections are dramatically reduced when the risk of hypoglycemia is high. Preclinical in-silico trials using an FDA-accepted metabolic simulation environment demonstrate that, in situations involving an artificially elevated basal insulin rate, hypoglycemia (70 mg/dl) can be avoided 96% of the time, compared to 60% using algorithms that shut off the pump based on a linear prediction of hypoglycemia from time series data.
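
The contrast drawn above, between shutting the pump off on a linear prediction and smoothly attenuating delivery as risk rises, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the risk function, the `safe_level` and `gain` values, and the use of CGM data alone (the abstract's algorithm also uses insulin injection data). It is not the authors' algorithm and has no clinical validity.

```python
def predict_glucose(cgm_mg_dl, minutes_per_sample, horizon_min):
    """Linear extrapolation from the last two CGM samples (the baseline idea)."""
    if len(cgm_mg_dl) < 2:
        raise ValueError("need at least two CGM samples")
    slope = (cgm_mg_dl[-1] - cgm_mg_dl[-2]) / minutes_per_sample
    return cgm_mg_dl[-1] + slope * horizon_min

def shutoff_rate(requested_rate, predicted_mg_dl, threshold=70.0):
    """Baseline: deliver nothing once predicted glucose falls below the threshold."""
    return 0.0 if predicted_mg_dl < threshold else requested_rate

def attenuated_rate(requested_rate, predicted_mg_dl, safe_level=110.0, gain=10.0):
    """Smooth alternative: scale delivery by 1 / (1 + gain * risk), where risk is a
    hypothetical measure that grows as predicted glucose drops below safe_level."""
    risk = max(0.0, (safe_level - predicted_mg_dl) / safe_level)
    return requested_rate / (1.0 + gain * risk)

if __name__ == "__main__":
    # Invented CGM trace: falling 1 mg/dl per minute, sampled every 5 minutes.
    predicted = predict_glucose([100.0, 95.0], minutes_per_sample=5, horizon_min=30)
    print("predicted glucose in 30 min:", predicted)
    print("shut-off baseline rate:", shutoff_rate(1.0, predicted))
    print("attenuated rate:", attenuated_rate(1.0, predicted))
```

The design point the sketch tries to show is that attenuation starts reducing delivery before the prediction crosses the hypoglycemia threshold and never switches abruptly, whereas the shut-off rule is all-or-nothing.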


Assigning Individual Weights to Pedigree-Members for Genetic Association Analysis

Authors:
Stacey Knight, Ryan P Abo, Jathine Wong, Alun Thomas, and Nicola J Camp, Biomedical Informatics Department, University of Utah

Abstract:
While methods exist to appropriately perform association analyses in pedigrees, they are computationally impractical for genomewide association (GWA) studies. Here we introduce a new algorithm which, using all relationships simultaneously, assigns weights to pedigree members that can be used to address relatedness. We compare this new method with an existing weighting algorithm, a naïve analysis (where relatedness is ignored) and an empirical method that appropriately accounts for all relationships. Framingham GWA data were used with a dichotomous phenotype based on HDL cholesterol level (1,611 cases and 4,043 controls). Cochran-Armitage trend tests were performed for 17,333 SNPs using both weighting systems and the naïve approach, and a subset of 500 SNPs were tested empirically. Results from the two weighting methods were strongly correlated (r=0.96). Compared to the empirical results our new weighting method performed better than the existing approach (r=0.89 vs r=0.83), which is due to a more moderate down-weighting. However, the naive analysis obtained the best correlation with the empirical gold standard results (r=0.99). Our results suggest that weighting methods do not accurately represent tests that account for familial relationships in genetic association analyses and are inferior to the naïve method as an efficient first-step GWA screening tool.
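
As background for the comparison above, the following sketch implements the standard, unweighted Cochran-Armitage trend test for a 2 x 3 genotype table, which corresponds to the naive analysis that ignores relatedness. It does not implement either pedigree weighting scheme, which the abstract does not specify. The additive scores (0, 1, 2) are a common default assumed here, and the genotype counts in the usage example are invented.

```python
import math

def cochran_armitage_z(cases, controls, scores=(0, 1, 2)):
    """Z statistic of the Cochran-Armitage trend test.
    cases[i] and controls[i] are counts in genotype category i."""
    if not (len(cases) == len(controls) == len(scores)):
        raise ValueError("cases, controls, and scores must have equal length")
    r1, r2 = sum(cases), sum(controls)
    n = r1 + r2
    cols = [a + b for a, b in zip(cases, controls)]
    k = len(scores)
    # Trend statistic: score-weighted contrast between cases and controls.
    t_stat = sum(t * (a * r2 - b * r1) for t, a, b in zip(scores, cases, controls))
    # Variance of the trend statistic under the null hypothesis of no association.
    sum_sq = sum(t * t * c * (n - c) for t, c in zip(scores, cols))
    cross = sum(scores[i] * scores[j] * cols[i] * cols[j]
                for i in range(k) for j in range(i + 1, k))
    variance = (r1 * r2 / n) * (sum_sq - 2 * cross)
    if variance == 0:
        raise ValueError("variance is zero; the table has no usable variation")
    return t_stat / math.sqrt(variance)

def two_sided_p(z):
    """Two-sided p-value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

if __name__ == "__main__":
    # Invented genotype counts (0, 1, or 2 copies of an allele).
    z = cochran_armitage_z(cases=[10, 20, 30], controls=[30, 20, 10])
    print("z =", z, "p =", two_sided_p(z))
```

A weighted variant would replace each individual's unit contribution to the counts with a weight, which is where the weighting algorithms compared in the abstract would enter.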


Robust Three Dimensional Object Modeling and Tracking for Human Surveillance Using Stereo Vision

Authors:
Robert Luke, Derek Anderson, James Keller
National Library of Medicine Predoctoral Fellow
MU Biomedical Informatics Research Training Program
Electrical and Computer Engineering Department and MU Informatics Institute University of Missouri

Abstract:
A new procedure for robust scene modeling using stereo vision is presented. Contrary to previous approaches, models are built in a three-dimensional world instead of two-dimensional image space. Sets of stereo camera pairs are placed throughout an environment and voxel spaces for each camera pair are intersected. The resulting voxel space represents the solid volume occupied by all objects. The objects are segmented and tracked through time. Humans in the scene are distinguished from other objects for the goal of monitoring activity. The proposed system is being built to monitor the well-being of elders. This approach is important and different in three respects: it is extremely robust to dynamic illumination conditions, such as lighting changes and shadows; stereo vision and voxel construction give the objects built in the scene high accuracy; and object identification and reasoning allow for the tracking of humans.


Modeling Diffusion-Weighted MRI Images Using Bayesian Finite Mixture Models

Authors:
Juan Eugenio Iglesias, Paul Thompson, Zhuowen Tu, University of California, Los Angeles

Abstract:
Diffusion weighted magnetic resonance imaging (DW-MRI) is a technique that measures the 3D profile of water diffusion in the brain at each spatial location in vivo. This information can be used to track axonal fibers in the brain. In diffusion tensor imaging, a zero-mean Gaussian probability distribution function (PDF) is fitted to the DW-MRI data at each spatial location. However, this model fails to explain fiber crosses and bifurcations. One way of overcoming this limitation is to sample the diffusivity on a high number of directions on a sphere around each voxel. This approach is known as high angular resolution diffusion imaging. The existing literature in the DW-MRI domain focuses on optimizing the fit of the individual tensors, putting less emphasis on the joint statistics of the tensors as a field. In this study, we introduce Bayesian finite mixture models for studying the DW-MRI images as a field. The result is an algorithm which produces a general model that can be used in different DW-MRI applications. A denoising technique is illustrated here, with promising results. The application to fiber tracking remains as future work.


Computational Modeling of Genome-Wide Targeting of Somatic Hypermutation

Authors:
Jamie L Duke, Man Liu, David G Schatz, Steven H Kleinstein, Yale University

Abstract:
Activation Induced Cytidine Deaminase (AID) is required for somatic hypermutation (SHM) of the B cell receptor during normal immune responses. Mistargeting of AID can lead to mutation of non-immunoglobulin genes and has been proposed as a contributing factor of tumorigenesis. Through large-scale sequencing, we have shown that AID targets a large fraction of expressed genes in normal B cells, and the results further suggest that the B cell genome is protected by two distinct processes: targeting of AID to particular genes and gene-specific targeting of high-fidelity repair to AID-induced lesions [1]. From these experiments, we compiled a dataset of 26 Mb of sequence from 180 genes in wild-type and various knockout mouse models. Each gene exhibits a unique mutation pattern and genomic context, offering an unprecedented opportunity to address several questions concerning AID targeting and mechanism of action.

Our analysis includes a comparison of targeting mechanisms for SHM in non-immunoglobulin genes versus immunoglobulin genes, and an examination of the potential functional consequences of aberrant SHM. Additionally, models have been developed to quantitatively distinguish between genes which are strongly and weakly targeted by AID and SHM with current results suggesting the mechanism of AID targeting is at least partially shared between immunoglobulin and non-immunoglobulin genes.


Developing and Evaluating a Semantic Model for NLP

Authors:
Jeannie Y Irwin, Henk Harkema, Lee M Christensen, Wendy W Chapman, University of Pittsburgh

Abstract:
Natural language processing applications that extract information from text rely on semantic models. The objective of this project is to describe a methodology for creating a semantic model to represent the clinical information that will be automatically extracted from textual clinical records. We illustrate two of the four stages of the methodology in this project using the case study of encoding information from dictated dental exams: (1) develop an initial model from a set of training documents and (2) iteratively evaluate and evolve the model while developing annotation guidelines. Using these two steps, we have developed a semantic model for the dental domain comprising 13 nodes and 16 relationships. Eleven of the nodes represent an underlying Bayesian network that infers a high-level concept. To evaluate the model, three annotators slotted conditions and defined relationships of concepts found in twelve hard tissue exams. Although our model was quite complex, the annotators had high agreement when modeling the concepts and relationships. Our approach for developing and evaluating a semantic model has been useful in guiding our model development and could be used for other domains.


Organizing Biological Parts: A Semantic Model of Synthetic Biology Designs

Authors:
Michal Galdzicki, Maxwell L Neal, Daniel L Cook, John H Gennari, University of Washington

Abstract:
Synthetic biology holds significant promise for the development of new biotechnology in the biomedical domain. The public information resources for the design and implementation process in synthetic biology are plagued by data which are inconsistent, weakly described, and not machine interpretable. Recent efforts have established standardized rules for physical composition and assembly of DNA segments coding for the parts intended for the creation of new biological systems. However, the functional and behavioral description of the role the regulatory sequences and genes will play is encoded in disjoint information resources as mathematical models of the biochemical process. These models serve as predictive tools in the design process and should be linked directly to corresponding entries in the Registry of Standardized Biological Parts. We propose to create a semantic architecture necessary to represent the building blocks in terms of their expected functional description as specified in the mathematical models to make them accessible using a standard query language. The eventual goal is to advance the synthetic biology design process by accelerating access to the knowledge created in previously designed systems.


Evaluating TimeML's Recognition of Temporal Expressions Within Medical Documents

Authors:
Ferdo R Ong (1,2), Ruth Reeves (1,2), Ted Speroff (1,2), Steven H Brown (1,2,3)
(1) Department of Veterans Affairs Tennessee Valley Healthcare System
(2) Department of Biomedical Informatics, Vanderbilt University
(3) Health and Medical Informatics Office, Department of Veterans Affairs

Abstract:
Introduction: TimeML is an XML-based markup language for encoding temporal and event time information for use in automatic text processing, developed primarily for general news articles. This study examines the frequency and usage of temporal expressions in medical documents, and provides a baseline comparison between human and TimeML's automated mark-up of these expressions.

Methods: Two informaticians annotated a random sample of 100 medical documents for DATE, TIME, DURATION, and SET (reoccurring times) using a general-purpose text annotation tool. A reference standard was created from the merged annotations. The same documents were then processed using TimeML for the automated recognition of temporal expressions. Comparison was performed against the reference standard.

Results: TimeML compared to the reference standard: precision 0.1309, recall 0.4512, F-measure 0.203.

Discussion: TimeML found no instances of SET, whereas 32.2% of all the expression types recognized in the reference standard were of type SET. The largest classification disagreement was instances classified by TimeML as DURATION, but classified by the reference standard as DATE, TIME, or SET. TimeML classified all expressions with time units as DURATION. Although this is sometimes the correct classification, the strategy vastly over-generates.
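
The reported F-measure follows from the reported precision and recall if it is the balanced F1 score (the harmonic mean of the two), which appears to be the case. The short sketch below shows the arithmetic; the function name and the `beta` parameter are conventions assumed for this example and do not come from the study.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

if __name__ == "__main__":
    # Precision and recall as reported in the Results above.
    print(round(f_measure(0.1309, 0.4512), 3))
```

Because the harmonic mean is pulled toward the smaller value, the low precision dominates the score, consistent with the over-generation of DURATION noted in the Discussion.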


Molecular Origins of DNA Flexibility: Sequence Effects on Conformational and Mechanical Properties

Authors:
Vanessa Ortiz, Edward Sambriski, Juan de Pablo, University of Wisconsin-Madison

Abstract:
DNA sequence plays an important role in nucleosome stability and dynamics by influencing the conformational and mechanical properties of the DNA segment in contact with the histone protein complex. Here, we elucidate the sources of that sequence influence by performing free-energy calculations on a DNA model that accounts for both sequence-dependent base-pair step deformability and base-pairing. Previous DNA mesoscale models do not take base-pairing into account, and we show that this effect is key for observing sequence effects on bending. Sequence is observed to affect the conformational space of DNA by creating kinks on the segment. These kinks arise from the ability of certain sequences to slide their strands along each other, making the bases form non-native contacts. Bending stiffness is also affected by sequence, but only in its directionality (different sequences will bend at the same energy cost but with different preference for the direction in which they adopt the bend). Short DNA segments (68 bp) are found to be very flexible, with persistence lengths of 27 nm, roughly half of the persistence length reported for long segments. Implications for nucleosome positioning are examined.


Using Comparative Genomics to Improve Protein Phosphorylation Prediction

Authors:
Samuel M Pearlman, James E Ferrell, Jr., Stanford University

Abstract:
Investigation into the evolutionary origins of phosphorylation has revealed the usefulness of comparative genomic data in identifying phosphorylation sites, both by examining the site itself and nearby positions. I will evaluate the benefits of incorporating evolutionary features, including the rate of conservation of phosphorylatable residues and the rates of replacement of phosphorylatable residues by acidic residues, into phosphorylation site prediction methods. I also will present work on examining and quantifying selective conservation at positions close to the phosphosite, indicating the possible co-evolution of binding motifs.


Computational Comparison Method for Biological Pathways Suggests Clinical Biomarkers

Authors:
Mary F McGuire, M Sriram Iyengar, David W Mercer, University of Texas Health Science Center at Houston

Abstract:
A major current challenge in translational systems biology is to derive clinically meaningful results, along with new hypotheses for treatment and drug targets, from ever-increasing amounts of quantitative and qualitative data. These data are typically of high dimensionality, obtained from large-scale biochemical assays and from biomedical databases. We describe a method that enables dimensionality reduction of the data and computationally tractable comparisons of evoked biological pathways across time and clinical outcomes. The method is applied to serum assays of 27 biomolecules taken across clinical outcomes and multiple time points from trauma patients meeting standardized criteria. First, significance sets of biomolecules S(t) that differentiate outcomes in defined time periods are identified by applying statistical methods. Second, representative values of assays for specific clinical outcomes in these time periods are input into a pathway knowledge base, resulting in sets of evoked biological pathways. These pathways are mapped to matrices, and biological questions such as which molecules appear only in one outcome, or only in one time period, can be answered through matrix algebra. From these computations, a training model for biomarker identification for a specific outcome is derived. Future work includes testing and verification of this model based on a prospective study.
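
The matrix step described above can be illustrated with a minimal sketch. It assumes a binary membership matrix (rows are outcomes, columns are molecules) and asks which molecules appear in only one outcome. The outcome labels, molecule names, and the function `unique_to` are hypothetical and are not taken from the authors' method.

```python
# Hypothetical membership matrix: 1 means the molecule appears in the
# pathways evoked for that outcome, 0 means it does not.
MOLECULES = ["mol_A", "mol_B", "mol_C", "mol_D"]
MEMBERSHIP = {
    "outcome_1": [1, 1, 0, 1],
    "outcome_2": [1, 0, 1, 1],
}


def unique_to(outcome, membership, molecules):
    """Return molecules present for `outcome` and absent from all others."""
    # Column sums count in how many outcomes each molecule appears.
    column_sums = [
        sum(row[j] for row in membership.values())
        for j in range(len(molecules))
    ]
    row = membership[outcome]
    return [
        name
        for j, name in enumerate(molecules)
        if row[j] == 1 and column_sums[j] == 1
    ]


print(unique_to("outcome_1", MEMBERSHIP, MOLECULES))  # ['mol_B']
print(unique_to("outcome_2", MEMBERSHIP, MOLECULES))  # ['mol_C']
```

The same column-sum idea would extend to time periods by adding one row per (outcome, period) pair.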


The Effect of Simulation as a Component of an Exercise Decision Support System Within a PHR

Authors:
Bryan Gibson, Michael Lincoln, Matthew Samore, Nancy Staggers, Charlene Weir, University of Utah

Abstract:
Simulation is a powerful mechanism for individual behavior change. Patient-directed advice and reminder systems have already been shown to be effective in increasing physical activity in persons with diabetes. This project describes software that integrates an exercise advice and reminder system with an exercise simulation module within a personal health record (PHR). The logic of the advice system is drawn from evidence-based consensus documents and takes into account the patient's readiness to change, glycemic control, medication regimen, the presence of diabetic complications, musculoskeletal conditions, and access to different modes of exercise. In the simulation component the veteran can explore both the acute (single session) and long-term (3 months) effects of different exercise routines on their blood glucose, hemoglobin A1c, blood pressure, weight and lipids. The system is being piloted as an adjunct to an ongoing clinical trial. A trial with veterans is planned to determine if use of the system, with and without the simulator, results in changes in motivational outcomes (stages of change, self-efficacy) and clinical outcomes (blood glucose, hemoglobin A1c, blood pressure, weight and lipids). Future work will recruit veterans to provide their PHR data for research to improve the predictive equations used in the simulator.


SmartCane: Active Guidance Towards Proper Cane Usage

Authors:
Lawrence Au, Winston Wu, Maxim Batalin, William Kaiser, University of California, Los Angeles

Abstract:
The usage of conventional assistive cane devices is critical in reducing the risk of falls, which are particularly detrimental to the elderly and the disabled. Individuals with the greatest risk typically rely on cane devices for support of ambulation. The results of many studies, however, have shown that incorrect usage is prevalent among cane users. In this presentation, we describe the development of a real-time data processing algorithm based on the SmartCane platform developed at UCLA. The algorithm provides direct detection of cane usage characteristics. Specifically, the system supports direct feedback to the cane user, allowing proper guidance and avoiding misuse. The algorithm processes data locally and classifies whether an individual is executing a stride with a proper cane motion and the correct applied forces.


Home Monitoring Improves Survival of Post-Lung Transplantation Patients

Authors:
HoJung Yoon, Stanley Finkelstein, University of Minnesota

Abstract:
Home monitoring is an increasingly important component in chronic disease care. This study presents a survival analysis for the subjects in the Lung Transplant Home Monitoring Program at the University of Minnesota. The home-monitoring program measured daily pulmonary function and respiratory symptoms, and adherence was calculated from the number of weekly data transmissions. The data from the 246 subjects in the study provided 132,822 daily spirometry readings for this analysis. The subjects’ adherence rates were correlated with their survival length. The Kaplan-Meier analysis of the subjects who survived at least one year showed a statistically significant improvement in survival for the subjects with high adherence to home monitoring (Log-Rank p=0.0072). We conclude that higher adherence to home monitoring improves survival of post-lung transplant subjects after the first year following transplantation.
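
For readers unfamiliar with the Kaplan-Meier estimate named above, the following is a minimal sketch of the product-limit calculation. It is a generic textbook form, not the authors' analysis code. The function name `kaplan_meier` and the toy follow-up times are illustrative assumptions.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each observed event time.

    times  -- follow-up time for each subject
    events -- 1 if death was observed at that time, 0 if censored
    """
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # Subjects still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
    return curve


# Toy data (hypothetical): four subjects, one censored at time 2.
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

Comparing two adherence groups would mean computing one such curve per group and then applying a log-rank test, which this sketch does not implement.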


Determining the Statistical Significance of Survivorship Prediction Models

Authors:
Holly P Berty, Haiwen Shi, James Lyons-Weiler, University of Pittsburgh

Abstract:
The assessment of statistical significance of survivorship differences of model-predicted groups is an important step in survivorship studies. Some models determined to be significant using current methodologies are assumed to have predictive capabilities. These methods compare parameters from predicted classes, not random samples from homogeneous populations, and may be insensitive to prediction errors. Type I-like errors can result wherein models with high prediction error rates are accepted. We developed and evaluated an alternate statistic for determining the significance of survivorship between or among model-derived survivorship classes. We propose and evaluate a new statistical test, the F* test, which incorporates parameters that reflect prediction errors unobserved by the current methods of evaluation. We found the Log Rank test identified fewer failed models than the F* test. When both tests were significant, we found a more accurate model. Using two prediction models applied to eight datasets, we found the F* test gave a correct inference five out of eight times, whereas the Log Rank test identified only one model out of the eight correctly. Our empirical evaluation reveals that the hypothesis testing inferences derived using the F* test exhibit better parity with the accuracy of prediction models than other options.


A General Statistical Method for Dose Titration

Authors:
Robert G Turcott, Hersh Sagreiya, Euan A Ashley, Russ B Altman, and Amar K Das, Stanford University

Abstract:
In its most general form, dose titration includes both pharmacologic and non-pharmacologic therapeutic interventions. Despite being a common challenge in clinical medicine, statistical tools for the assessment of titration data are lacking. We have developed a general statistical approach to dose titration and applied it to two distinct clinical challenges: estimation of optimum warfarin dose and optimization of cardiac pacemaker timing intervals. In each domain, the patient-specific optimum dose (drug weight or pacing interval) was estimated from a mathematical function fit to the measured data. The precision of the estimated optimum was quantified using bootstrapping. The optimum warfarin dose was associated with an average precision of ±18%, suggesting that titration within this range may be of limited utility. The precision of the estimated optimum pacing interval was significantly smaller for impedance cardiography than either of the echocardiographic methods that were tested, suggesting that the former technique is superior. Dose titration is a common medical challenge that is amenable to statistical analysis. The method proposed here can be readily integrated into the electronic medical record, and can provide formal analyses to guide clinical decision making.
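
The two steps named in the abstract (fit a function to the measured dose-response data, then bootstrap the precision of the estimated optimum) can be sketched as follows. The quadratic response curve is an assumption chosen for illustration, since the abstract does not state which function was fit. All function names and the sample data are hypothetical.

```python
import random
import statistics


def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c; returns (a, b, c)."""
    # Power sums for the 3x3 normal equations.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(3):
        pivot = max(range(i, 3), key=lambda r: abs(m[r][i]))
        m[i], m[pivot] = m[pivot], m[i]
        for r in range(3):
            if r != i:
                f = m[r][i] / m[i][i]
                m[r] = [v - f * w for v, w in zip(m[r], m[i])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def optimum(xs, ys):
    """Dose at the vertex of the fitted parabola."""
    a, b, _ = fit_quadratic(xs, ys)
    return -b / (2 * a)


def bootstrap_optimum(xs, ys, n_boot=200, seed=0):
    """Return (mean, standard deviation) of the optimum over resamples."""
    rng = random.Random(seed)
    pairs = list(zip(xs, ys))
    estimates = []
    while len(estimates) < n_boot:
        sample = [rng.choice(pairs) for _ in pairs]
        if len({x for x, _ in sample}) < 3:
            continue  # a quadratic needs at least three distinct doses
        bx, by = zip(*sample)
        estimates.append(optimum(bx, by))
    return statistics.mean(estimates), statistics.stdev(estimates)


# Hypothetical titration data: response measured at six dose levels.
doses = [1, 2, 3, 4, 5, 6]
responses = [7.4, 3.1, 1.3, 1.2, 3.4, 7.1]
print(optimum(doses, responses))
print(bootstrap_optimum(doses, responses))
```

The bootstrap standard deviation plays the role of the precision figure quoted in the abstract. A real analysis would choose the fitted function to match the clinical domain.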


Analysis of Transposons Using Transposition Assays and Sequencing Technologies

Authors:
Kenny Daily, Kim Nguyen, Paul Rigor, Pierre Baldi, Suzanne Sandmeyer, University of California, Irvine

Abstract:
The mobile genetic element Ty3 transposon in Saccharomyces cerevisiae represents a class of retrotransposable elements which are predecessors to retroviruses such as HIV and MLV. Unlike the promiscuous genomic integration which characterizes retroviruses, the Ty3 transposable element integrates at genomic loci upstream of genes targeted by the Polymerase-III transcription machinery. We develop a computational analysis pipeline similar to ChIP methods on data obtained from transposition assays using several Ty3 mutants along with high-throughput sequencing technologies which generate millions of sequence reads. The computational and modeling techniques, while comparable to those for ChIP-Seq, pose different constraints for noise removal, determination of a significant signal (insertion frequency) threshold, and comparison of signals between strains. The combination of experimental and computational techniques allows for the characterization of the Ty3 transposition profile as well as the elucidation of its transcriptome targeting specificity. We are able to computationally verify hundreds of Ty3 targets from previous experimental data and discover de novo sites of integration, as well as quantify how targets can facilitate Ty3 integration. Given mutant strains of Ty3, we can help uncover the mechanisms of its targeting affinity, possibly leading to gene therapy applications.


Prediction of Transcription Factor Binding Sites Using Multiple Linear, Multivariate Regression Techniques

Authors:
Elizabeth A Siewert, Katerina J Kechris, University of Colorado-Denver

Abstract:
Identification of transcription factor binding sites (TFBS) is an important, but difficult area of study. TFBSs, short sequences in the promoter region of genes, are recognized by regulatory proteins that are involved in the proper regulation of gene expression, which is critical for the viability of an organism. Detecting TFBSs is difficult because they are very short (5-20 bases long), contain degeneracies at some of the positions in the site, and are buried in unknown locations in the long promoter region of a gene. Earlier attempts incorporated both expression data and promoter sequences into a linear-model framework, regressing expression on counts of putative TFBS in promoters for a single species. Since then it has been shown that looking at sequence data across multiple species improves the prediction of TFBSs. In this work, we describe an extension of the single-species, linear-model framework for the analysis of paired cross-species sequence and expression data. A repeated-measures model (a special case of the multivariate model) for gene expression measurements across species is used, accounting for phylogenetic relationships among species through the covariance of the error structure. This multiple linear, multivariate algorithm is applied to a four-species, yeast data set under heat-shock conditions and comparisons are made to the single-species algorithm using both independent transcription factor binding and expression data sets.


Developing Tools for Semantic Composability in Biosimulation

Authors:
Maxwell Lewis Neal, Daniel L Cook, Michal Galdzicki, John H Gennari; University of Washington

Abstract:
For decades researchers in the biomedical sciences have used computational simulations to understand the dynamics of biological processes. These simulations continue to increase in complexity as computational power becomes more affordable and biological knowledge accumulates. Biosimulation models today may contain thousands of variables and equations, and since most models are still hand-coded, this complexity poses information management challenges. Biological modelers therefore have a need for tools that minimize hand-coding and support model reuse. To meet this need we are developing a component-based biological modeling system called SemGen. As with existing tools used in component-based software engineering, SemGen will allow users to build and manage complex biosimulation models from reusable code modules. SemGen leverages our SemSim (Semantic Simulation) model description format to represent legacy models as lightweight, OWL-based ontologies. We use URI pointers to link SemSim model codewords to classes in reference ontologies like the Foundational Model of Anatomy and the Ontology of Physics for Biology, thereby making these models semantically composable. That is, SemSim model compositions result in biologically valid systems wherein the components share the meaning of the data exchanged through their interfaces. This form of composability has great potential for automating model composition, thereby advancing the entire biosimulation field.


Does the HITREF Evaluation Framework Discern A Difference Between Systems?

Authors:
Paulina S Sockolow, Harold P Lehmann, Johns Hopkins University

Abstract:
Objective: To assess whether the complete, evidence-based Health Information Technology Resource-based Evaluation Framework (HITREF) could discern a difference between two very different programs for All-inclusive Care for Elders (PACE) sites: one with an electronic health record (EHR) and one with paper patient records. Nationally, PACEs provide services to over 17,000 nursing-home eligible, frail elderly who choose to remain in their homes. HITREF was developed for this study and derives in part from the previously published Ammenwerth-de Keizer model.

Design: Non-equivalent comparison study with two groups, the intervention and purposefully selected non-equivalent comparison sites, to account for temporal trends.

Data Collection: HITREF was operationalized as a survey to learn about clinician satisfaction with their patient record system. Surveys were administered twice, simultaneously, at the EHR PACE and the second PACE site.

Analysis: HITREF’s ability to discern a difference between the 2 sites was assessed by comparing changes in clinician satisfaction at the EHR site to clinician satisfaction at the paper-based site. The unit of analysis is the clinician.


Clinical Recommendation Algorithms for Corollary Ordering

Authors:
Jeffrey Klann, Gunther Schadow, JM McCoy, Regenstrief Institute and Indiana University

Abstract:
Corollary orders are decision support rules designed to reduce physician errors of omission, and they have been shown to more than double physician compliance. Like other decision support content, they are time-consuming and expensive to maintain manually. Corollary orders uniquely have an Order A -> Order B structure similar to that of e-commerce recommendation algorithms, suggesting such algorithms could automate corollary order development. This project describes a recommender based on association rule mining and uses it to generate corollary ordering rules from 866,445 orders made in the Wishard Memorial Hospital inpatient setting in 2007. The resulting rules are then analyzed through direct examination, by measuring sensitivity and specificity of correctness, and through a subjective relevance measurement. Subsequently, we will use the results as a benchmark to improve the algorithm, moving toward a fully automated algorithm to be run in real time on existing decision support systems, with additional evaluation. This project confirms prior indications that association rules are useful in building corollary orders, extends previous work by developing an automated recommender algorithm, and suggests that algorithms from the e-commerce domain are applicable to the more complex world of medical care.
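
A minimal sketch of association rule mining for the Order A -> Order B structure is shown below. It counts how often pairs of orders co-occur in the same ordering session and keeps rules that pass support and confidence thresholds. The thresholds, the order names, and the function `corollary_rules` are illustrative assumptions, not the recommender the authors built.

```python
from collections import Counter
from itertools import permutations


def corollary_rules(sessions, min_support=2, min_confidence=0.6):
    """Return [(a, b, confidence)] for candidate rules 'order a -> order b'.

    sessions -- list of lists, each holding the orders placed together
    """
    single = Counter()
    pair = Counter()
    for orders in sessions:
        items = set(orders)
        single.update(items)
        # Ordered pairs, because a -> b and b -> a are different rules.
        pair.update(permutations(sorted(items), 2))
    rules = []
    for (a, b), n_ab in pair.items():
        confidence = n_ab / single[a]
        if n_ab >= min_support and confidence >= min_confidence:
            rules.append((a, b, confidence))
    return sorted(rules, key=lambda r: (-r[2], r[0], r[1]))


# Hypothetical ordering sessions.
SESSIONS = [
    ["drug_X", "lab_Y"],
    ["drug_X", "lab_Y"],
    ["drug_X"],
    ["lab_Y", "lab_Z"],
    ["lab_Y"],
    ["lab_Y"],
]
for a, b, conf in corollary_rules(SESSIONS):
    print(f"{a} -> {b} (confidence {conf:.2f})")
```

Here only `drug_X -> lab_Y` survives: the reverse rule has low confidence, and `lab_Z -> lab_Y` falls below the support threshold.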


A Network Model for Predicting Residue Contacts in the Human Mediator Protein Complex

Authors:
Elizabeth Eskow, Greg Caporaso, Rob Knight, Dylan Taatjes, Debra Goldberg, University of Colorado

Abstract:
Mediator, a multi-subunit protein complex conserved throughout eukaryotes, is required for expression of all protein-coding genes. A four-subunit CDK8 subcomplex can associate with and modulate the function of Mediator and has been purified from recombinant expression and also directly from human cells. We describe a network model for evaluating the consistency of residue contact predictions between pairs of subunits in CDK8. We also describe the computational methods, such as sequence covariation analysis and structural comparative modeling, used to make the predictions, along with results from predictions between 2 of the CDK8 subunits. The predictions provide guidance for biochemistry experiments. The experimentally validated residue contacts are then included in the network model to enhance its ability to discriminate the accuracy of the predictions. The nodes of the network are the residues that are predicted (or known) to interact with residues in another subunit, and the edges are the predicted or known contacts, colored by the method of discovery. Multiple edges between nodes exist if the contact has been predicted (or validated) in multiple ways, and are weighted by their comparative reliability. The network is a dynamic picture of our combined predicted and experimental knowledge about residue contacts in the subcomplex.


Integrative Protein Fold Recognition by Alignments and Machine Learning

Authors:
Allison N Tegge (1), Zheng Wang (2), Jianlin Cheng (3)
(1) National Library of Medicine Predoctoral Fellow; (2) MU Biomedical Informatics Research Training Program; (3) Computer Science Department and MU Informatics Institute, University of Missouri

Abstract:
Protein fold recognition is an essential task in understanding protein tertiary structure and protein functions. The fold of a protein can be recognized through alignment and machine learning methods. In the past, alignment methods such as sequence alignment, sequence-structure alignment, and profile alignment have been used to identify similarly folded structures. Additional approaches utilize machine learning methods, such as support vector machines and neural networks. The machine learning methods use the sequence comparison information generated by various alignment methods, in addition to other predicted structure features of the proteins, to identify the structural folds for the protein sequence. These machine learning predictions, with the inclusion of additional features, outperform the previous methods that use solely alignment methods for fold recognition.


Supermarket Sales Data as a Public Health Surveillance Tool

Authors:
Kristina M Brinkerhoff, Kristine C Jordan, John F Hurdle, University of Utah

Abstract:
Researchers develop nutritional assessment methods to better understand the interplay between diet and health. Traditional methods rely on self-reported behavior and are extremely resource intensive to employ. To overcome these limitations, we examined supermarket sales data as an inexpensive, indirect nutritional assessment method. Sales data show great potential as a nutritional assessment surrogate at the household level and as a nutritional surveillance tool in large studies at the public-health level. For a preliminary study, we assessed the utility of mapping >2.0 million food items (representing 36,000 discrete customers) to a publicly available nutrient database. Subsequently, we collected one year of retrospective sales data from a supermarket chain for 50 consenting households in Utah. We compared sales data against a standard nutritional assessment method, the Household Food Inventory, using the USDA food categories: dairy, vegetables, fruits, meats, baked goods, grains, legumes, sweets, and fats/oils. Pearson’s correlation coefficient was used to identify significant correlations between the two data sources. Preliminary analysis using macronutrients (carbohydrate, protein, and fat) is also presented. Future work includes comparing sales data to self-reported dietary intake.


Perplexity Analysis of Obesity News Coverage

Authors:
Delano J McFarlane, Rita Kukafka, Columbia University

Abstract:
News coverage helps define what the public thinks is salient, and is often the public’s initial and primary source of information. Unfortunately, research shows that health news is often biased, inaccurate or incomplete. Therefore it is important that health news evaluations be accurate, thorough and timely. Informatics can help, but an understanding of how health news coverage differs from more general content is essential to appropriate method selection. Language model (LM) perplexity is often used to evaluate and compare corpora content. Perplexity measures a probability distribution’s ability to predict events in another distribution. Low LM perplexity signifies that a corpus’ content is more restrictive in vocabulary and syntax than another. This may mean that one corpus’ content is a Sublanguage of another and methods capable of exploiting Sublanguage properties would be appropriate. A perplexity analysis of obesity news was performed to test if news content from a specific health topic is a Sublanguage. Perplexity increased as news coverage became more general relative to obesity news (obesity news control corpus = 169, general health news = 228, general news = 273). These results suggest methods that exploit Sublanguage properties may be appropriate for evaluating health news coverage of specific health topics.
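
As a rough illustration of how perplexity compares corpora, the sketch below trains a unigram language model on one token list and evaluates it on another. Lower values mean the test text is more predictable under the trained model. The unigram model, the add-one smoothing, and the toy token lists are simplifying assumptions. The abstract does not say which language model the authors used.

```python
import math
from collections import Counter


def unigram_perplexity(train_tokens, test_tokens):
    """Perplexity of a smoothed unigram model on `test_tokens`."""
    counts = Counter(train_tokens)
    # Simplification: the vocabulary includes test words so that
    # add-one smoothing gives unseen words a non-zero probability.
    vocab = set(train_tokens) | set(test_tokens)
    total = len(train_tokens) + len(vocab)
    log_prob = sum(
        math.log2((counts[w] + 1) / total) for w in test_tokens
    )
    return 2 ** (-log_prob / len(test_tokens))


# Toy corpora (hypothetical).
topic_train = "obesity diet weight obesity diet obesity".split()
topic_test = "obesity diet obesity".split()
general_test = "election weather obesity sports".split()

print(unigram_perplexity(topic_train, topic_test))
print(unigram_perplexity(topic_train, general_test))
```

The second value is larger because the general text contains words the topic model rarely or never saw, which mirrors the trend reported in the abstract.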


Characterization of Cardiac Tamponade and Pulsus Paradoxus Using a Human Cardiorespiratory Model

Authors:
Deepa Ramachandran, Rice University, Chuan Luo, Rice University, Tony S Ma, VA Medical Center, John W Clark, Jr., Rice University

Abstract:
Our large-scale model of the human cardiorespiratory system is employed to study mechanisms underlying chronic cardiac tamponade and pulsus paradoxus, resulting from fluid accumulation in the pericardial sac surrounding the heart. The model integrates hemodynamics, whole-body gas exchange, and autonomic nervous system control to provide simulations of pressure, volume, and blood flow waveforms, and can mimic clinical phenomena associated with tamponade including elevation and equilibration of pericardial and chamber pressures, cardiac output and ejection time reduction, changes in right heart hemodynamics, abnormal transvalvular flow, partial chamber collapse, and pulsus paradoxus. We present detailed analysis of the diastolic and systolic effects of pericardial constraint, including the appearance of atrioventricular interaction which alters blood flow phasing, chamber mechanics, and septal motion. Respiratory effect on cardiopulmonary pressures and flows and pulsus paradoxus is also analyzed. By employing a contractile pump septal model, the prominent role of the septum in tamponade and pulsus paradoxus is demonstrated. Simulation results suggest two distinct mechanisms underlying effusion-generated pulsus paradoxus, namely series and parallel ventricular interaction. Our study provides biophysically based insights into cardiac tamponade with pulsus paradoxus, including the roles played by septal motion, atrioventricular interaction, pulmonary blood pooling, and depth of respiration.


Machine Learning for Personalized Medicine

Authors:
Eric B Lantz (1), International Warfarin Pharmacogenetics Consortium (2), Jesse Davis (1), David Page (1), Michael D Caldwell (3)
(1) University of Wisconsin-Madison, Madison, WI; (2) Pharmacogenomics Knowledge Base, www.pharmgkb.org; (3) Marshfield Clinic, Marshfield, WI

Abstract:
With advances in genotyping technology and the rapidly spreading use of electronic medical records, it may soon be possible for clinics to have extensive genetic data and clinical histories on large numbers of patients. Personalized medicine seeks to use these data to help inform treatment decisions by predicting patient response to treatment. This talk will begin by presenting a recent success in this direction by the International Warfarin Pharmacogenetics Consortium (IWPC) in predicting stable dose of the anticoagulant warfarin. (NLM sites UW-Madison/Marshfield and Stanford were both involved in this work.) It then will discuss joint work with Marshfield Clinic in predicting which patients on Cox2 inhibitors (a class of pain relievers) are at increased risk for myocardial infarction. These efforts provide examples of how machine learning and statistical algorithms can be used to provide predictive models for personalized medicine.


A Cognitive Model of Medical Record Coding: Implications for Understanding Inter-rater Agreement

Authors:
Emily M Campbell, Oregon Health & Science University, Dean F Sittig, University of Texas School of Health Information Sciences at Houston, Brian Hazlehurst, Kaiser Permanente, Portland, OR, Wendy Chapman, University of Pittsburgh, Aaron M Cohen, Oregon Health & Science University

Abstract:
Expert human performance identifying relevant concepts in free text remains the benchmark for evaluating coding tasks. Results from any automated method should therefore approach or equal human performance to be considered successful. By understanding what causes human raters to disagree when coding clinical information, we should be able to reduce variability among coders and improve benchmarks for evaluating coding performance. This study explored the cognitive differences between lay persons and clinical experts when coding ambulatory care documents for the presence or absence of information related to smoking and asthma. Early results indicate that experts appear to use far less textual data than lay persons to form clinical opinions, but that answering ternary-style questions (e.g., where the answer is one of “Yes”, “No”, or “I’m not sure”) is very difficult for both groups. In addition, in the absence of specific, irrefutable textual statements (e.g., “this patient’s current asthma is well controlled” or “this person smokes 1 pack of cigarettes a day”) both groups have difficulty identifying “clearly stated” concepts in clinical documents. These results suggest that understanding how experts extract concepts from ambiguous language can help reduce the variability among human coders, and improve benchmarks for evaluating automated coding systems.


Electronic Health Record Disease Footprint: The Case of Influenza The Test-Performance Characteristics of Controlled-Vocabulary Entry in an EHR for Population Based Surveillance

Authors:
Jacob Aaronson, Harold Lehmann, Johns Hopkins University

Abstract:
Coded electronic health record (EHR) data has the potential to provide a more specific signal than text-based chief complaints in real-time surveillance. Our working hypothesis is that diseases leave a "footprint" of findings documented in an EHR. The goal of this study was to define and to assess the performance characteristics of such a footprint for culture-positive influenza. This study evaluated and compared the performance characteristics of EHR encounter text-based chief complaints, signs (temperature), MEDCIN®-tagged symptoms, and ICD-9 coded diagnoses. The Department of Defense (DoD) Clinical Data Mart was queried for all encounters documented in the DoD EHR AHLTA across a twelve-state area for which there was a laboratory order for influenza during the period September 2007 through May 2008. A total of 5407 encounters met these criteria and were included in the analysis. Test performance characteristics of clinically relevant and syndromic-surveillance-relevant single findings and influenza case description groups were derived. Preliminary results suggest that coded symptoms are more predictive of influenza than text-based chief complaints.
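
Test performance characteristics of a single finding are conventionally derived from a 2x2 table of finding status against the reference result (here, the influenza culture). The sketch below shows the standard definitions. The function name and the counts are hypothetical and are not study data.

```python
def performance_characteristics(tp, fp, fn, tn):
    """Standard 2x2 test characteristics for one finding.

    tp -- finding present, culture positive
    fp -- finding present, culture negative
    fn -- finding absent, culture positive
    tn -- finding absent, culture negative
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical counts for a coded symptom.
print(performance_characteristics(tp=40, fp=10, fn=20, tn=130))
```

Running the same function for a text-based chief complaint and for a coded symptom would give the side-by-side comparison the abstract describes.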



DAY 1 POSTER ABSTRACTS

Building Language Support Tools into Information Retrieval Systems

Authors:
Steven D Bedrick, William R Hersh, Oregon Health and Science University

Abstract:
Today, clinicians and researchers who wish to publish their work for an international audience must publish in English. This linguistic monoculture can be a significant barrier to access for readers from non-Anglophone countries who wish to search, consume, and contribute to their fields' bodies of knowledge. There exists a great deal of research about the challenges facing non-native English speaking (NNES) clinicians and researchers attempting to publish in English; much less is known about the language-related difficulties facing NNES scholars who must search and retrieve English-language literature, and what may be done to help ameliorate these difficulties. This project consists of a series of experimental user interfaces (UI) for a biomedical literature database search engine, a novel approach to medical machine translation, and a set of methodologies that we are using to evaluate these systems' relative efficacy when used by NNES clinicians of varying levels of English proficiency. Our UIs implement a variety of language support features, including query-building tools, several forms of query translation, and a number of different result-presentation modes. Future work will investigate the relationship between English proficiency and language support feature preference.


A Methodology for the Formal Verification of Medical, Human-interactive Systems

Authors:
Matthew L Bolton and Ellen J Bass, University of Virginia

Abstract:
Human behavior has contributed to between 44,000 and 98,000 deaths nationwide every year in medical practice. Such failures are complex in that they may be influenced by many factors including the plan of care, medical personnel’s actions, device automation, human-device interfaces, and the operational environment. This work introduces a methodology that integrates task analytic models of human behavior with formal models and model checking in order to formally verify properties of human-interactive systems. In this methodology, models that fail verification produce a counterexample illustrating how the failure occurred. Thus, this system allows analysts to verify that modeled human behavior will never produce a failure, or understand what factors contributed to a discovered failure. This methodology is illustrated with a case study based on the programming of a patient controlled analgesia pump used in the University of Virginia Hospital. Two specifications, one which verifies to true, and another which produces a counterexample, are used to illustrate the different analysis and visualization capabilities of the methodology.


HIV/AIDS amongst Northern Thailand’s Ethnic Minorities: Mitigating Obstacles to Provide Health Education and Improve Health Seeking Behaviors for Marginalized Hill Tribes

Authors:
G Pammie R Crawford, Harold P Lehmann, Robert Lawrence, David Celantano, Johns Hopkins University

Abstract:
Asia’s HIV prevalence is second only to Sub-Saharan Africa. Thailand’s governmental efforts to reduce HIV were successful with many, but marginalized groups (i.e. MSM, IDU, ethnic minorities) still suffer from extremely high HIV prevalence. Northern Thailand is the country’s HIV epicenter where prevalence amongst some sub-populations can reach 50%. Ethnic minorities are further disadvantaged because they are unrecognized by Thailand’s government and cannot access health services at government hospitals/clinics. Most are illiterate and unable to speak Thai. This project will focus on eliciting health information needs of these tribes to better understand health seeking behaviors/understanding of HIV behavior risks. Fourteen villages in northern Thailand have been randomly selected and matched. Within these communities purposive sampling (snow-ball/chain) will be performed to create a non-probability sample of participants. Focus groups/in-depth interviews will be utilized to elicit data. Data will be transcribed, translated and coded. Integrative memos and conceptual frameworks will be developed to understand health seeking behaviors and information needs of these groups. Results will be used to develop culturally-appropriate HIV health education interventions. Messages will be provided in pictorial and audio format via computer-assisted delivery mechanisms to overcome language and literacy barriers. Efforts should improve health behaviors amongst these disadvantaged populations.



Arterial Tortuosity in High-Risk Intracranial Aneurysm Pedigrees

Authors:
Karl Diedrich, John Roberts, Dennis Parker, University of Utah

Abstract:
This study seeks to determine whether the mean tortuosity (twistedness) of intracranial arteries indicates intracranial aneurysm risk. We will quantitatively measure the tortuosity of intracranial arteries in high-risk cases drawn from high-risk family pedigrees and in normal-risk control subjects. Initial qualitative assessment indicated a correlation between tortuosity and aneurysm risk, prompting this quantitative study. Subjects were imaged with Time of Flight Magnetic Resonance Angiography to highlight flowing arterial blood. We apply image-processing tools developed with ImageJ to segment arteries, find centerlines, and compute quantitative tortuosity measurements. For segmentation of arteries we apply an edge detection method to remove the scalp, and a Maximum Intensity Projection Z-buffer algorithm to identify artery seed voxels. We developed a new seed intensity histogram method to determine neighbor-growing thresholds and use this to generate 3-D segmented arteries. Centerlines are determined using distance-from-edge scoring and Dijkstra's shortest paths algorithm. Tortuosity is measured from centerlines using the distance factor metric, sum of angles, and mean curvature methods. Qualitative assessment shows that tortuosity increases with age, so quantitative scores can be correlated with age to test their validity. The end goal of this study is to develop quantitative tortuosity measures for use as a phenotype in epidemiological studies.
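
As a rough illustration of the distance factor metric named in this abstract, the sketch below assumes the common definition: centerline path length divided by the straight-line distance between the centerline's endpoints. The abstract does not give a formula or code, so the function name `distance_factor` and the sample points are hypothetical.

```python
import math


def distance_factor(centerline):
    """Path length along a centerline divided by its endpoint-to-endpoint distance.

    Assumed definition (not stated in the abstract). `centerline` is a list of
    (x, y, z) points ordered along the vessel.
    """
    if len(centerline) < 2:
        raise ValueError("need at least two centerline points")
    # Sum the lengths of consecutive segments along the centerline.
    path_length = sum(math.dist(p, q) for p, q in zip(centerline, centerline[1:]))
    chord = math.dist(centerline[0], centerline[-1])
    if chord == 0:
        raise ValueError("endpoints coincide; distance factor is undefined")
    return path_length / chord


# Hypothetical centerlines: a straight vessel and one with a single bend.
straight = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
bent = [(0, 0, 0), (1, 1, 0), (2, 0, 0)]

print(distance_factor(straight))  # 1.0 for a perfectly straight path
print(distance_factor(bent))      # greater than 1.0 for the bent path
```

Under this definition a value of 1.0 means a straight vessel, and larger values mean a more tortuous one.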



Bridging the Semantic Gap with Ranking SVM in Content-Based Medical Image Retrieval

Authors:
Haiying Guan, Sameer Antani, L Rodney Long, National Library of Medicine

Abstract:
With increasing use of images in clinical medicine and biomedical research, there is a compelling need for efficient image retrieval techniques to support medical informatics applications. Content-Based Image Retrieval (CBIR) has been proposed as a possible solution for this problem. Previous research has mainly focused on extracting low-level visual features (e.g., color, texture, shape, spatial layout) and using them directly to compute image similarity. Extensive experiments have shown that such visual features cannot always capture the desired semantic concepts in an image. This poses a serious shortcoming in developing search and retrieval techniques where use of standardized biomedical concepts is routine. We present an approach for bridging the "semantic gap" between high-level semantic concepts and low-level image features to improve retrieval quality using the Ranking Support Vector Machine (Ranking SVM) algorithm. Ranking SVM is a supervised learning algorithm that models the relationship between semantic concepts and image features and performs retrieval at the semantic level. We apply it to the problem of vertebra shape retrieval on a digitized spine x-ray image collection from the second National Health and Nutrition Examination Survey (NHANES II). Initial results using the proposed method report retrieval precision of 45%, an improvement of 15% over using image features alone.



Interface Design for a Web-Based Semi-Automated Early Systemic Inflammatory Response Syndrome (SIRS) Screening Tool

Authors:
Stephen L Jones, Laura J Moore, Frederick A Moore, The Methodist Hospital – Houston TX, Jiajie Zhang, Todd R Johnson, The University of Texas School of Health Information Sciences at Houston, Houston, TX

Abstract:
Sepsis is an enormous healthcare problem in the United States, with an incidence of over 750,000 cases per year and a mortality rate approaching 60% if the patient progresses to septic shock. It is a range of clinical conditions caused by the body's systemic response to an infection, which if it develops into septic shock, is accompanied by single or multiple organ dysfunction or failure, leading to death. Early signs of sepsis are often missed and time critical interventions are delayed. Part of the problem may be attributable to a lack of situation awareness (SA) by members of the healthcare team of the patient’s risk for developing sepsis in the next 12-24 hours. The purpose of this project is to examine a process and method for designing a web-based tool to facilitate the early identification of patients at risk for developing sepsis in the next 12 to 24 hours. Using human computer interaction (HCI) methodologies we developed a functioning prototype of a web-based systemic inflammatory response syndrome screening tool. Further study of the usability of the prototype and of the impact upon clinical outcomes remains to be done to fully validate the tool produced from this project.



The Topology of the Bacterial Co-Conserved Protein Network and its Implications for Predicting Protein Function

Authors:
Anis Karimpour-Fard, Sonia M Leach, Ryan T Gill, and Lawrence E Hunter, University of Colorado

Abstract:
The number of published sequenced genomes has been growing in recent years, and at the present time, about 800 microbial genomes are fully sequenced. The next step after sequencing is to predict genes and their functions from the sequence. The explosion of sequence information has widened the gap between the number of predicted proteins and the number of experimentally characterized ones. Escherichia coli K12 is best characterized, but still has 15% of genes with unknown function. Other genomes have between 15% and 70% uncharacterized genes. The best established method for function prediction is based on sequence homology to proteins of known function. Unfortunately, strictly homology-based predictions are of limited use due to the large number of homologous protein families with no known function for any member. Another way to assess the function of a sequence is through identification of its interactions with other proteins. Co-conservation (phylogenetic profiles) is an alternative source of information for generating protein interaction networks. Prior studies in yeast suggest that the topology of protein-protein interaction networks generated from physical interaction assays can offer important insight into protein function. Here, we hypothesize that in bacteria, the topology of protein interaction networks derived via co-conservation information could similarly improve methods for predicting protein function. We showed that some properties of the physical yeast interaction network hold in our bacteria co-conservation networks, such as high connectivity for essential proteins. However, the high connectivity among protein complexes in the yeast physical network was not seen in the co-conservation network which uses all bacteria as the reference set. We also revealed ways that connectivity in our networks can be informative for the functional characterization of proteins. 
Lastly, by integrating functional information from different annotation sources and using the network topology, we were able to infer function for uncharacterized proteins.
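
To make the idea of a co-conservation (phylogenetic profile) network concrete, here is a minimal sketch. It assumes each protein is described by the set of genomes in which it is present, and it links two proteins when the Jaccard similarity of their profiles reaches a threshold. The similarity measure, the threshold, and all names and data are assumptions for illustration; the abstract does not state which measure the authors used.

```python
from itertools import combinations


def jaccard(profile_a, profile_b):
    """Jaccard similarity of two phylogenetic profiles (sets of genome IDs)."""
    union = profile_a | profile_b
    return len(profile_a & profile_b) / len(union) if union else 0.0


def co_conservation_edges(profiles, threshold=0.75):
    """Return (protein, protein, score) for pairs with similar profiles.

    `threshold` is an arbitrary illustrative cutoff.
    """
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        score = jaccard(profiles[a], profiles[b])
        if score >= threshold:
            edges.append((a, b, score))
    return edges


# Toy data: protein -> genomes in which a homolog is present (hypothetical).
profiles = {
    "protA": {"g1", "g2", "g3", "g4"},
    "protB": {"g1", "g2", "g3"},
    "protC": {"g5", "g6"},
}

edges = co_conservation_edges(profiles)

# Connectivity (degree) of each protein in the resulting network.
degree = {name: 0 for name in profiles}
for a, b, _ in edges:
    degree[a] += 1
    degree[b] += 1

print(edges)
print(degree)
```

Node degree computed this way is one simple notion of the "connectivity" the abstract refers to.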



Using the Guideline Elements Model to Determine Completeness and Accuracy of Guideline for DSS

Authors:
Octavis D Lampkin, Harold Lehmann, Karen Robinson, Johns Hopkins University

Abstract:
Clinical guidelines are a major tool in improving the quality of medical care. However, most guidelines are in free text, not in a formal, executable format, and are not easily accessible to clinicians at the point of care. Implementing guidelines in computer-based decision support systems promises to improve the acceptance and application of guidelines in daily practice because the actions and observations of health care workers are monitored and advice is generated whenever a guideline is not followed. This poster will use the Guideline Elements Model (GEM), developed by Rick Shiffman, to examine whether the completeness and accuracy of a guideline affect whether it can be computerized for a computer-based decision support system.



Task Performance Efficiency in a Touchscreen EMR for Low-Resource Settings

Authors:
Zach Landis Lewis, Gerald P Douglas, Valerie Monaco, Rebecca S Crowley, University of Pittsburgh

Abstract:
Objective: To determine the relative efficiency of novices compared to a prediction of skilled use when performing touchscreen EMR tasks.

Measurements: Thirty-one common EMR tasks were selected. The authors observed novice users performing the tasks and recorded timestamped performance data. The skilled user performance time for each task was predicted using human performance modeling software. Differences between novice and skilled task performance times were measured. Novice efficiency, errors, and task completion rates were evaluated with respect to user interface design.

Results: Nineteen participants performed 31 EMR tasks seven times for a total of 4,123 observed performances. A representative analysis of 12 tasks was conducted, leaving 1,596 performances. Mean novice performance time was significantly slower than mean predicted skilled performance time (p < 0.001). Novices performed faster than the predicted skilled time in 65 (9%) completed tasks. Novices failed to complete 42 (3%) tasks.

Conclusion: Within the first hour of system use, novices performed touchscreen tasks more slowly than predictions of skilled use. Novices performed above the skilled level some of the time with low rates of task failure. These findings suggest the system supports a primary design goal – to allow novice users to perform tasks efficiently and effectively.



How “Should” We Write Guideline Recommendations?

Authors:
Edwin A Lomotan, George Michel, Zhenqiu Lin, Richard N Shiffman, Yale Center for Medical Informatics, Yale University

Abstract:
Increasing attention has focused on transforming the knowledge contained in clinical practice guidelines into computable formats. A major challenge is how to translate commonly found deontic terminology (words such as “should,” “may,” “must,” and “is indicated,”) into decision support tools. Using an electronic survey, we investigated the understanding of deontic expressions by members of the health services community. Researchers developed a clinical scenario and presented participants with recommendations containing 12 deontic terms and phrases. Participants indicated the level of obligation they believed guideline authors intended by using a slider mechanism ranging from “No obligation” to “Full obligation.” 445/1332 registrants (36%) of the 2008 annual conference of the Agency for Healthcare Research and Quality submitted the on-line survey. “Must” conveyed the highest level of obligation and least amount of variability. “May” and “may consider” conveyed the lowest levels of obligation. All other terms conveyed intermediate levels. Members of the health services community believe guideline authors intend variable levels of obligation when using different deontic terms within practice recommendations. Ranking of a subset of terms by intended level of obligation is possible. “Must,” “should,” and “may” are ideally suited to represent a standard set of deontic expressions for use by guideline developers.



Creating a Clinical Collaborative Community Using Information Technology

Authors:
Johnny Y Mei, Patty Hoey, Paul Nichol, Veterans Health Administration, Seattle, Washington

Abstract:
The Veterans Health Administration (VHA) currently does not have a single location where individuals and communities can contribute information, share knowledge, and store it for reference and reuse. Like any large, successful enterprise, VHA has tacit knowledge that must be transformed into organizational value using a knowledge-transfer platform linked to an already existing process: the creation of computerized patient record system (CPRS, a VHA medical record) order dialogs, templates, clinical reminders, and other tools. People speak more than they can write, and know more than they can speak. This work-in-progress project is more than merely updating the current request procedure with more sophisticated forms that populate a repository. It is about providing and facilitating a virtual place for a “meeting of minds” where the content for a CPRS tool can be collaboratively developed, making use of the best minds, experience, and research in and outside of the agency, before anyone opens a blank request form. In doing so, stakeholders, subject matter experts, and interested parties contribute to new knowledge creation and promote its reuse throughout the organization. Some categorical measures will include usability, acceptability, productivity, cost and time savings, and improved workflow processes using the virtual SharePoint



Accelerating Total Variation Regularization for Matrix-Valued Images on GPUs

Authors:
Maryam Moazeni, Alex Bui, Majid Sarrafzadeh, University of California, Los Angeles

Abstract:
The advent of matrix-valued magnetic resonance imaging modalities such as diffusion tensor imaging (DTI) requires extensive computational acceleration. Computational acceleration using graphics processing units (GPUs) can make the regularization (denoising) of DTI images viable in clinical settings, improving the quality of DTI images in a broad range of applications. Moreover, such acceleration will provide a means of moving advanced image processing to the point of care. Construction of DTI images consists of direction-specific MR measurements. Compared with conventional MR, direction-sensitive acquisition has a lower signal-to-noise ratio. Therefore, high noise levels often limit DTI imaging. Advanced post-processing of imaging data can improve the quality of estimated tensors. However, the post-processing problem becomes more computationally difficult for matrix-valued images. This poster describes the acceleration of a total variation (TV) regularization method for matrix-valued images, in particular for DTI images, using an NVIDIA Quadro FX 5600. The TV regularization of a 3D image with 128³ voxels ultimately achieves a 128x speedup and requires ~1.5 minutes on the Quadro, whereas the same algorithm on a dual-core CPU takes more than 3 hours. This study provides insight into adapting methods to the GPU architecture for other image processing algorithms designed for matrix-valued images.



Development of a Learning Health System for Inflammatory Bowel Disease

Authors:
Marc D Natter1,4,5, Athos Bousvaros2,4, Joshua Korzenik3,4, Benjamin M Adida1,2,4, Kenneth D Mandl1,4,5
1Children’s Hospital Informatics Program, 2Center for Inflammatory Bowel Disease, Children’s Hospital Boston, 3Crohn’s and Colitis Center, Massachusetts General Hospital, 4Harvard Medical School, Boston, MA; 5Harvard-MIT Division of Health Sciences & Technology, Cambridge, MA

Abstract:
The inflammatory bowel diseases (IBD), Crohn’s disease (CD) and ulcerative colitis (UC), affect over 1 million children and adults. Despite markedly increasing incidence of IBD in the modern era, an exact cause remains unknown. New knowledge relating epidemiology, treatment, and phenotype to genotype is clearly needed. By ensuring that data captured during clinical encounters drives research for all patients from participating IBD centers, we enable creation of a “learning health system” in which outcomes are continuously monitored and hypotheses rapidly tested. We hypothesize that (a) there is temporal-spatial clustering of new onset cases of IBD and disease exacerbations and (b) nutritional factors and intestinal microbiota modulate IBD disease course. In order to obtain the requisite phenotypic data, we have designed a flexible registry infrastructure offering semantic interoperability between (a) a CDISC-compliant Electronic Data Capture tool, (b) a Patient Survey Tool embedded in a Personally-Controlled, Health Care Record, and (c) a registry-specific, cross-institutionally queriable data-mart. Pilot implementation is proceeding at the pediatric IBD Center, with subsequent incorporation at the adult IBD Center, and an expected enrollment of 300 pediatric and 3000 adult subjects. This registry will also enable long-term, longitudinal studies following pediatric subjects with IBD across sites and into adulthood.



Genome-wide Association Study in the Alzheimer’s Disease Neuroimaging Initiative Cohort

Authors:
Kwangsik Nho1, Andrew J. Saykin2, Li Shen2, Sungeun Kim2, John D. West2, and the Alzheimer’s Disease Neuroimaging Initiative
1Regenstrief Institute, Inc., 2Indiana University School of Medicine

Abstract:
Genome-wide association analysis has become an important topic in genetics studies of complex diseases owing to the recent advances in high-throughput genotyping techniques. Genome-wide association studies have successfully identified numerous loci which strongly affect susceptibility to common diseases and also influence disease-related traits. Alzheimer’s disease (AD) is the most common neurodegenerative disease. In this project we analyze genome-wide single nucleotide polymorphism (SNP) data of the Alzheimer’s Disease Neuroimaging Initiative (ADNI) to evaluate genetic effects on the brain in AD and amnestic mild cognitive impairment (MCI). In particular, candidate neuroimaging measures such as hippocampal volume are used as phenotypes in addition to the clinical data. Before the actual analysis, quality control is performed to remove questionable individuals and SNPs: for individuals, we detect and remove outliers based on heterozygosity and potential population stratification, and for SNPs, we apply marker exclusion criteria such as minor allele frequency and the Hardy-Weinberg equilibrium test. The resulting SNP data are tested for standard allelic association using the χ² test in relation to diagnosis or imaging phenotypes. In addition, we investigate possible effects of population structure that could affect association analysis using a cluster-based approach.
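
For readers unfamiliar with the allelic association test mentioned in this abstract, the sketch below shows a standard 1-degree-of-freedom χ² test on a 2×2 table of allele counts (cases vs. controls, minor vs. major allele). The function name and the counts are hypothetical and are not ADNI data; the authors' actual software is not described in the abstract.

```python
import math


def allelic_chi_square(case_minor, case_major, ctrl_minor, ctrl_major):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    Returns (chi2, p_value). Counts are alleles, i.e. two per genotyped person.
    """
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column of the table is empty")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical SNP: 100 cases and 100 controls (200 alleles per group).
chi2, p_value = allelic_chi_square(60, 140, 40, 160)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

In a genome-wide setting this test is repeated for every SNP that passes quality control, so the resulting p-values need a multiple-testing correction.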



Approaches to Tagging by Physicians: A Design Exploration

Authors:
Rupa Patel, Walter Curioso, Kristen Shinohara, Laura Marshall, University of Washington

Abstract:
Tagging provides a way to organize information that can facilitate personal information retrieval and discovery of personal and community resources. However, tagging in the medical community is not well understood. A repository of clinical content tagged within an online community can potentially enhance the information-seeking behavior and collaboration of healthcare professionals. Initial semi-structured interviews with internal medicine physicians at University of Michigan have revealed that tagged case reports would be desirable to aggregate online across institutions. Further semi-structured interviews conducted in February to March 2009 focus on a group of 10-12 physicians at three Seattle-based organizations (Seattle Children’s, Poly Clinic, and University of Washington Medical Center). Early results suggest that physicians do not anticipate that they would use free-entry tagging, but would accept a standard set of tags that are suggested. We plan to iteratively propose designs to promote individuals’ tagging and facilitate online community relevance.



E-Health Solutions for Cancer Disparities in Rural Missouri: Consumer Health Informatics Approach

Authors:
Keila E Pena-Hernandez, Suzanne A Boren, Jeannette Jackson-Thompson, Charles W Caldwell, University of Missouri

Abstract:
Significant health problems are encountered by rural populations in the United States. The causes of disparities in cancer care are multi-faceted, requiring both quantitative and qualitative research approaches. The purpose of this study is to explore the current knowledge and information-seeking methods of underserved cancer patients in rural Missouri. A qualitative approach will be used to obtain and organize themes and perceptions on factors influencing cancer treatment choices among adult residents of rural counties in Missouri. Individual interviews will be conducted to assess level of health literacy (i.e., the degree to which people have the capacity to obtain, process, and understand basic health information and services needed to make appropriate health decisions) and explore the perception of cancer risk, community knowledge about cancer, fatalism, mistrust of the medical system, and perception of patients’ cancer treatment experiences. Quantitative data from the Missouri Cancer Registry will be used to describe the gender and racial composition of individuals with cancer living in rural counties and to assess stage at diagnosis. Findings from this study will allow for a better understanding of cancer disparities and provide new insights on possible solutions to cancer disparities in rural populations of Missouri.



Estimating Genome-wide Mutation Rates in the Social Amoeba Dictyostelium discoideum

Authors:
Gerda Saxer1, Mark Rojas2, Sara Middlemist1, David C Queller1, Yuriy Fofanov2, Joan E Strassmann1
1Department of Ecology and Evolutionary Biology, Rice University
2Bioinformatics Laboratory, Department of Computer Science, University of Houston

Abstract:
Mutations create the genetic variation that enables organisms to adapt to changing environments. However, most mutations are detrimental and have been linked to diseases such as diabetes and coronary heart disease. Understanding how often mutations occur, where in the genome they occur, and what kinds of mutations occur is essential for a better understanding of evolution in general and disease evolution in particular. Mutation rates have so far been difficult to assess and have been limited to estimates for single genes. In addition, selection often purges deleterious mutations, which can lead to an underestimation of genome-wide mutation rates. To address these limitations, we used an experimental evolution approach and evolved multiple lines of the social amoeba Dictyostelium discoideum for 1000 generations under conditions that allowed the accumulation of mutations. We are currently in the process of sequencing the whole genomes of these lines using high-throughput sequencing technology to identify the mutations. Since all these lines share a common ancestor and evolved under controlled conditions for the same number of generations, we will be able to estimate the genome-wide mutation rate for a eukaryote.



Leveraging Existing Biological Knowledge in Genome Wide Association Studies

Authors:
Ronald P Schuyler, Lawrence E Hunter, University of Colorado

Abstract:
Many areas of high-throughput data analysis have benefited from a systems-level view that takes into consideration known interactions between the many parts. Current methods of incorporating existing knowledge into the analysis of genome-wide association studies use known gene-gene associations, such as shared annotation terms or biological pathways, to build explicit models of epistatic interaction between sets of loci. These models assume that specific combinations of variants at two or more loci are required to see an effect. However, there may be cases in which any one of a set of related variants contributes to the phenotype at a level undetectable in a single-variant analysis due to low penetrance or small sample size; such cases would be missed by epistatic models. We are developing a method to combine association results from multiple loci that do not reach significance individually into one observation. By combining results from multiple loci that are unlinked genetically but associated in the existing background knowledge, we may reveal sets that collectively exceed a significance threshold even after adjusting for the additional tests. This approach may be viewed as a complement to analyses of single variants and epistatic models.
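
The abstract does not say how association results from several loci are combined. Purely as an illustration of the general idea, the sketch below uses Fisher's combined probability test, a standard technique that assumes the individual tests are independent (plausible for genetically unlinked loci). This is a swapped-in example, not the authors' method; the function name and p-values are hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom under the null. For even degrees of freedom the
    survival function has the closed form
        exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    """
    k = len(p_values)
    if k == 0 or any(not 0 < p <= 1 for p in p_values):
        raise ValueError("p-values must be non-empty and lie in (0, 1]")
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Hypothetical loci from one pathway, none individually striking.
locus_p_values = [0.04, 0.03, 0.06, 0.05]
combined = fisher_combined_p(locus_p_values)
print(f"combined p = {combined:.4f}")
```

As the abstract notes, any such combined result still has to be adjusted for the additional tests introduced by examining many candidate sets.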



Voted Best Poster, Day 1

Development and Evaluation of a Widget-based ‘Web 2.0’ Electronic Health Record

Authors:
Yalini Senathirajah, Suzanne Bakken, Columbia University

Abstract:
The complex, variable nature of health information system requirements and vendor-controlled development has resulted in systems that frequently neither reflect users’ domain knowledge nor meet their needs. ‘Web 2.0’ approaches have transformed the commercial/public internet world. Our research explores the technical feasibility, usefulness, cognitive effects, efficiency, and task-technology fit of a ‘web 2.0’ EHR interface. We have created a widget-based ‘web 2.0’ EHR interface which allows users to select, configure, and share information, displays and tools, via simple interfaces, without programmers. Logfile analyses of current patterns of EHR use and expert consultation inform the creation of initial sets of widgets/views. Evaluation includes expert heuristic evaluation, structured interviews, analysis of user-created materials, and usability testing using clinical scenarios to assess cognitive effects, efficiency, and task-technology fit. Preliminary testing revealed the system’s technical feasibility and usefulness. Logfile studies showed diverse and stereotypical patterns of information access. Implementation and data analysis are in progress. Possible advantages include greater suitability to user needs, incorporation of multiple information sources, interoperability, agile reconfiguration, capture of user tacit knowledge, efficiencies due to workflow and HCI improvements, and greater acceptance. This could lead to new research directions in clinical informatics and interaction design.



Combinatorial Control of Exon Definition

Authors:
Peter J Shepard, Ester Choi, Klemens Hertel, University of California, Irvine

Abstract:
Pre-mRNA splicing is carried out by the spliceosome, which identifies exons and removes intervening introns. Alternative splicing results in the generation of multiple isoforms from gene transcripts. Splice-site selection depends on multiple parameters, and the relative contributions of these parameters control how efficiently the splice sites of an exon are recognized. Here we examine how the strength of splice sites affects exon recognition by constructing a large set of 3-exon minigenes that have variable 5' and 3' splice site strengths and assaying for inclusion levels of the internal exon. Additionally, we have constructed a database of cassette exons and constitutively spliced exons. We analyze in silico the contribution that the 3' and 5' splice site strengths make toward efficient exon recognition by using a Linear Discriminant Analysis model. Using both approaches, we find that the combined effect of 3' and 5' splice site strength influences the efficiency of exon recognition more than the individual contribution of either splice site. We conclude that the cellular decision to constitutively include an exon in a transcript is largely dependent on the combined strength of the 3' and 5' splice sites. Based on these results, we provide a model predicting the inclusion level of an exon.



Evaluation of Genome-Wide Association Study Results Through Development of Ontology Fingerprint

Authors:
Lam C Tsoi1, Michael Boehnke2, Richard L Klein1, and W Jim Zheng1
1Medical University of South Carolina, 2University of Michigan

Abstract:
Genome-wide association (GWA) studies may identify multiple variants that are associated with a disease or trait. To narrow down candidates for further validation, it is important to assess quantitatively how identified genes relate to a phenotype of interest. We describe an approach that characterizes genes or biological concepts (phenotypes, pathways, diseases, etc.) by an ontology fingerprint: a list of Gene Ontology terms that are overrepresented among the PubMed abstracts discussing the gene or biological concept, together with the enrichment p-values of these terms generated from a hypergeometric enrichment test. We then quantify the relevance between genes and the phenotype from a GWA study by calculating similarity scores between their ontology fingerprints using enrichment p-values. We validate this approach by correctly identifying corresponding genes for biological pathways with a ninety percent average area under the ROC curve (AUC). We applied this approach to rank genes identified through a GWA study that are associated with lipid concentrations in plasma, as well as to prioritize genes within linkage disequilibrium (LD) blocks. We found that the genes with the highest scores were: ABCA1, LPL, and CETP for HDL; LDLR, APOE, and APOB for LDL; and LPL, APOA1, and APOB for triglyceride. In addition, we identified genes relevant to lipid metabolism from the literature even in cases where such knowledge was not reflected in the current annotation of these genes. These results demonstrate that ontology fingerprints can be used effectively to prioritize genes from GWA studies for experimental validation.
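
The hypergeometric enrichment test mentioned in this abstract can be sketched as follows. The example assumes a one-sided test: given a corpus of abstracts, some of which mention an ontology term, it computes the probability that at least the observed number of the abstracts discussing a gene would mention the term by chance. The function name, argument names, and counts are hypothetical, and the authors' exact procedure may differ.

```python
from math import comb


def hypergeom_enrichment_p(total, annotated, sample, hits):
    """One-sided hypergeometric p-value, P(X >= hits).

    total     -- abstracts in the whole corpus
    annotated -- abstracts in the corpus that mention the ontology term
    sample    -- abstracts that discuss the gene of interest
    hits      -- abstracts in the sample that mention the term
    """
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(total - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / comb(total, sample)


# Toy corpus: 10 abstracts, 4 mention the term; the gene appears in 5
# abstracts, 3 of which mention the term.
p_value = hypergeom_enrichment_p(total=10, annotated=4, sample=5, hits=3)
print(f"enrichment p = {p_value:.4f}")
```

A fingerprint in the sense of the abstract would then pair each overrepresented term with a p-value computed along these lines.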



Comparative Genomics of the Environmental Stress Response in Ascomycete Fungi

Authors:
Dana J Wohlbach, Jessica Clarke, Audrey P Gasch, University of Wisconsin-Madison

Abstract:
In the Ascomycete fungi, the transcriptional response to diverse stresses involves the co-regulation of approximately 900 genes and is termed the Environmental Stress Response (ESR). This gene expression profile is conserved in both Saccharomyces cerevisiae (Sc_ESR) and Schizosaccharomyces pombe (Sp_ESR), species that diverged approximately 500 million years ago. Although the response is the same, the stress signals and the transcription factors that ultimately activate the ESR are very different in S. cerevisiae and Sz. pombe. Using a comparative genomics approach that takes advantage of the growing numbers of sequenced fungal genomes, we developed a novel method to assign orthologs and paralogs to the whole genomes of 42 Ascomycete fungi. From these lists of orthologous genes, we identified orthologs to both the Sc_ESR and the Sp_ESR and characterized the enrichment of known cis-regulatory sequences in the induced ESR (iESR) genes. Based on the distribution of these cis-regulatory sequences, we propose models for how the regulation of the ESR has evolved in the Ascomycete lineage.



DAY 2 POSTER ABSTRACTS

An EHR Dashboard to Improve Compliance with Inpatient Quality Measures

Authors:
Barry Aaronson, David Stone, Matthew Schaft, Derk Adams, Astrid Schreuder, Christine Cottingham, Margaret Neff, J Richard Goss, University of Washington

Abstract:
Hospitals are required to report performance on multiple key quality and safety measures to government entities. Performance data on many of these measures is publicly available at hospitalcompare.gov. Performance on most of these measures at most hospitals is suboptimal (not 100%). Efforts to improve performance on these measures typically involve analysis and presentation of retrospective reports of performance at hospital meetings and conferences. This system of delayed reporting has had limited success in improving performance.

In an effort to improve performance on key quality measures, we created within our Electronic Health Record (EHR) a real-time dashboard for quality and safety measures. Shown on a large wall-mounted monitor at the nursing station, the dashboard displays a grid with patients as rows and key quality measures as columns. Indicators in compliance display as green, whereas indicators out of compliance display as red. This system provides real-time feedback on current patient status for multiple quality measures at a single glance. By enhancing situational awareness for the entire care team, the dashboard has the potential to significantly improve compliance with quality measures and therefore quality of care. Clinical studies are needed to determine whether this system achieves its anticipated beneficial effects.



Sono-Seq: Characterization of a New Biological Method through Data Integration

Authors:
Raymond K Auerbach1, Ghia Euskirchen1, Joel Rozowsky1, Nathan Lamarre-Vincent2, Zarmik Moqtaderi2, Philippe Lefrançois1, Kevin Struhl2, Mark Gerstein1, Michael Snyder1
1Yale University, New Haven, CT, 2Harvard Medical School, Boston, MA

Abstract:
The identification of novel genomic features typically involves a large degree of data integration across protocols, platforms and datasets and remains a key challenge in bioinformatics. By integrating several publicly available datasets from GEO and other published sources and by using a combination of signal aggregation, binning, and intersections of chromosomal coordinates, we successfully characterize one such new protocol, “Sono-Seq.” Sono-Seq sites are located in regions of high chromatin accessibility and are co-associated with RNA Polymerase II ChIP regions, transcription start sites, histone H3 lysine 4 trimethylation marks, and CpG islands. Additionally, we show that Sono-Seq sites occur preferentially at promoters of actively transcribed genes relative to inactive genes. The pattern of breakage by Sono-Seq overlaps with, but is distinct from, those observed for other open chromatin markers. Our results demonstrate that Sono-Seq is a useful and simple method for mapping many local alterations in chromatin structure and give insights into sample types, such as input DNA, normal IgG, MNase-digested DNA, and naked DNA, that may be used as references when scoring ChIP-chip (microarray) and ChIP-Seq (next-generation DNA sequencing) experiments. Understanding the characteristics of these reference sample types also has a direct effect on the bioinformatics methods used to analyze these experiments.
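
The coordinate-intersection step mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the helper name `count_overlapping` and the toy coordinates are assumptions made for illustration, and intervals are treated as half-open (start, end) pairs on a single chromosome.

```python
def count_overlapping(sites, features):
    """Count sites that overlap at least one feature.

    Both arguments are lists of half-open (start, end) intervals on one
    chromosome. Hypothetical helper for illustration, not the authors' code.
    """
    hits = 0
    for s_start, s_end in sites:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(f_start < s_end and s_start < f_end for f_start, f_end in features):
            hits += 1
    return hits


# Toy data (invented): three candidate sites and two promoter regions.
sites = [(100, 200), (450, 500), (900, 950)]
promoters = [(150, 300), (480, 600)]
overlap = count_overlapping(sites, promoters)
print(overlap, "of", len(sites), "sites overlap a promoter")
```

A real analysis would work per chromosome and would likely use an indexed interval structure rather than this quadratic scan.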



Characterizing the Range of Prostate Abnormalities Palpable by Digital Rectal Examination

Authors:
Leigh A Baumgart, Gregory J Gerling, Ellen J Bass, Reba Moyer Childress, Marcus L Martin, University of Virginia

Abstract:
Although the digital rectal exam (DRE) is a common method of screening for prostate cancer, the limits of examiners' ability to perform this hands-on exam are unknown. Perceptible limits are some unresolved function of the size, depth and hardness of abnormalities within a given prostate stiffness. To better understand the perceptible limits of the DRE, we conducted a psychophysical study with 18 participants using a custom-built apparatus to simulate prostate tissue and abnormalities in various configurations. Utilizing a modified version of the psychophysical method of constant stimuli, we uncovered thresholds of absolute detection and variance in ability between examiners. Within silicone elastomers that mimic normal prostate, abnormalities at a depth of 5 mm need to be at least 20 mm³ in volume (4 mm diameter) to be consistently detected. In contrast to this material, which has a stiffness of 21 kPa, abnormalities located in simulated tissue of greater stiffness (82 kPa, akin to inflammation) must be twice that volume. In addition, the study indicates that size and depth of abnormalities most influence detectability, while the relative hardness between abnormalities and tissue affects detectability for some size-depth combinations. The work is useful for informing the development of training and allowing clinicians to set performance expectations.



Impact of a Change in Care Delivery Technology on Nurse Workflow

Authors:
Rhonda G Cady, Stanley M Finkelstein, University of Minnesota

Abstract:
Workflow analysis of the interactions, distributed across people, artifacts and time, which occur during the process of health care is a prerequisite to successful implementation of health information technology, particularly when the technology changes the delivery mechanism. The purpose of this sequential, mixed methods research is to understand the transformation of workflow and ensure quality of conformance when the delivery mode of triage for children with complex special health care needs changes from telephone to home-based, interactive video. A cognitive ethnography of the nurse’s workflow using telephone and video triage is used to identify the tasks, interaction, artifacts and modifications to workflow when the delivery mode changes. A time-motion study validates tasks extracted from the ethnographic data and quantifies the time needed to conduct the tasks of telephone and video triage. A process measure of time provides a direct calculation of the efficiency of tasks delivered by telephone and video triage, and allows comparison of the two forms of delivery. Analysis of the qualitative and quantitative data using the Interactive Sociotechnical Analysis Framework and statistical process control techniques will categorize any unintended workflow consequences and minimize design deficiencies such as reduced productivity, increased workload and most important, increased error.



Enhanced Laboratory Reports: Leveraging a Regional Health Information Exchange

Authors:
Kevin C Chang, Martin M Were, Siu Hui, J Marc Overhage, Indiana University School of Medicine and Regenstrief Institute, Inc.

Abstract:
Today’s primary care physicians face increasing demands on their time while the scope of services they provide continues to escalate. Decisions are often made on the fly without complete data from a patient’s record. In response, the Regenstrief Institute has created a system that leverages the Indiana Health Information Exchange to provide context-sensitive data to primary care providers on returned laboratory test results. Our system extracts information from data repositories in the Indiana Network for Patient Care and adds the following to the traditional reports: historical test results, medication dispensing events, historical visit information, and clinical reminders. We call these compiled summaries “Enhanced Laboratory Reports.” The enhanced reports are delivered to a large group of practices already connected through the Indiana Health Information Exchange by the DOCS4DOCS® messaging service. This paper will address design and implementation challenges of creating this system and provide a preliminary analysis of initial practitioner responses to the Enhanced Laboratory Reports.



An Emergency Department Discharge Framework for Preventing Adverse Events

Authors:
Kou-Wei Chiu, Michael Matheny, Ian Jones, Dominik Aronsky, Vanderbilt University

Abstract:
Limited research exists on creating and examining the impact of a framework that assists clinicians in identifying potential adverse events at the time of discharge from the Emergency Department (ED). This study proposes one approach: an ED Discharge Writer that integrates patient information from disparate sources, supports clinicians in the discharge process, and helps them identify potential areas that require their attention. We report on the progress of an actual implementation of such an application and its use in identifying potentially inappropriate prescriptions in elderly patients seen at the Vanderbilt University Hospital Emergency Department.



Voted Best Poster, Day 2

Sigmoid: An Integrative System for Pathway Bioinformatics and Systems Biology

Authors:
Ben Compani, T Su, I Chang, P Baldi, E Mjolsness, University of California, Irvine

Abstract:
Motivation: Progress in systems biology critically depends on developing scalable informatics tools to model and visualize complex biological systems, and to flexibly store information about these systems and their models. Here we describe Sigmoid, a generative, scalable software infrastructure for pathway bioinformatics and systems biology. Several features of the three-tier Sigmoid architecture, in aggregate, position it uniquely within the realm of currently available systems biology software. Sigmoid uses the web services framework to create a truly distributed system. This flexible framework offers powerful modularity that, in conjunction with the generative nature of the Sigmoid coding cycle, significantly reduces the development time needed to integrate new components. In addition, the OJB object-relational bridge offers the advantages of object-oriented programming in conjunction with relational databases. Sigmoid capitalizes on the robust mathematical software tools and problem-solving environment that Mathematica offers, along with the Xcellerator/kMech/Cellzilla packages designed to facilitate biological modeling via automated equation generation. The synthesis of these features yields a flexible, scalable architecture that not only allows for manageable adoption of new system components, but may also allow Sigmoid to operate within yet larger bioinformatics frameworks.



Comparison of Selection Pressures on the Genetic-Basis of Seven WTCCC Diseases

Authors:
Erik Corona, Joel Dudley, Atul Butte, Stanford University

Abstract:
Genome-wide selection analysis of risky (increasing susceptibility) and protective (decreasing susceptibility) SNP alleles has never been attempted for any disease. Using data from the Wellcome Trust Case Control Consortium (WTCCC) and the HapMap project, selection analysis is conducted for 7 diseases. Type 1 Diabetes (T1D) and Crohn's Disease (CD) risky SNP alleles have recently undergone positive selection. There are 137 SNPs strongly associated with T1D (p-value cutoff 0.005) that also show strong signs of positive selection. Of these, 117 are selecting for the risky allele while only 20 select for the protective allele. The Human Leukocyte Antigen (HLA) region (strongly associated with T1D) contains some of the strongest selection signals. Selection in the HLA region heavily favors T1D risky alleles. The genetic basis of T1D and CD has recently undergone positive selection. Coronary Artery Disease (CAD) SNPs show the least selection, even when risky and protective alleles are checked for selection separately. CAD is unique in that overall selection, risky selection, and protective selection all fall below the expected random levels of selection among WTCCC SNPs. This may be due to people historically dying before the expected age of CAD onset.



Developing New Anti-Tumor Agents by Understanding the Action Mechanism of PRIMA-1

Authors:
Sam Z Grinter, Yayun Liang, Sheng-You Huang, Salman M Hyder, Xiaoqin Zou
National Library of Medicine Predoctoral Fellow, MU Biomedical Informatics Research Training Program and MU Informatics Institute, University of Missouri

Abstract:
We use a bioinformatics approach to study the molecular basis of PRIMA-1’s potent anti-tumor effects. PRIMA-1 is an organic compound that activates mutant p53 protein, restoring the tumor-suppressing functionality present in wild-type p53. However, the mechanism of PRIMA-1 is unclear and the target(s) of this agent are unknown. In this study, we use our new protein-ligand docking tool to perform inverse docking, screening a large database of protein structures for molecular targets of PRIMA-1. Our preliminary study has identified a potential protein target involved in the cholesterol synthetic pathway. Using an inhibitor of this protein, we have found a novel agent that potently kills human breast cancer cells, as determined by a sulforhodamine B cell culture assay. Our approach of combining in silico database screening and cell culture assays for human breast cancer cells may easily be generalized to other cancer-related studies.



The Impact of a Clinical Information System on Users in the Dental School: A Case Study

Authors:
Heather K Hill, Joan S Ash, Oregon Health & Science University

Abstract:
Objective: To understand the impact on users in the dental school setting of integrating a clinical information system (CIS) into patient care.

Design: We used qualitative research methods, including interviews, observations and focus groups, to capture the experiences of CIS users at a single institution. The data were analyzed using the grounded theory approach.

Results: Nine themes emerged from the data: 1) CIS benefits were disproportionate among users, 2) Communicating about the CIS was challenging, 3) Users experienced a range of strong emotions, 4) The instructor persona diminished, 5) There was variation in how users’ time was impacted, 6) The training and support needs of end-users were significant, 7) There were shifts in the school’s power structure, 8) Lack of CIS usability made documentation cumbersome, 9) Clinicians’ workflow was disrupted.

Conclusion: By identifying the issues that were experienced by users as the CIS was integrated into patient care, administrators and faculty will be better equipped to manage their impact on the success of CIS implementation.



Endemic Limitations of Target-Decoy Database Strategy for Peptide Identification

Authors:
Shane L Hubler, Graeme C McAlister, Joshua J Coon, Gheorghe Craciun, University of Wisconsin-Madison

Abstract:
Tandem mass spectrometry is often used to determine the protein composition of complex mixtures, due to its extremely high sensitivity. Current practice dictates that researchers use a target database, containing the proteome of interest, and a decoy database, typically of the same size as the original. Next, they apply a scoring algorithm to find the best answer. They set their False Discovery Rate (FDR) by choosing a score threshold above which the percentage of best scores from the decoy database is less than the FDR. We describe an alternative methodology that provably under-estimates the FDR and compare this with the results from a target-decoy database. We find that the target-decoy technique under-estimated the true FDR for all scores and, for FDR < 65%, under-estimated FDR by a factor of 2-4 (p-values ranged from 0.01 to 10⁻⁵⁹). We show that this problem arises from peptide sequence homologies and is endemic to the target-decoy database technique. Future work will verify this finding for a variety of assumptions, algorithms, databases, and experiments. In addition, we will derive a method that provides a confidence measure of the FDR, providing both an upper and a lower limit on FDR.
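
The conventional target-decoy thresholding described in this abstract can be sketched in a few lines of Python. This is only an illustration of the standard estimate, not the authors' alternative method; the function names and the toy scores are assumptions, and the sketch assumes separate target and decoy searches against databases of equal size.

```python
def decoy_fraction(target_scores, decoy_scores, threshold):
    """Estimated FDR at a threshold: decoy hits divided by target hits at or above it."""
    n_target = sum(s >= threshold for s in target_scores)
    n_decoy = sum(s >= threshold for s in decoy_scores)
    return n_decoy / n_target if n_target else 0.0


def lowest_threshold(target_scores, decoy_scores, max_fdr):
    """Return the lowest observed target score whose estimated FDR is <= max_fdr."""
    for t in sorted(set(target_scores)):
        if decoy_fraction(target_scores, decoy_scores, t) <= max_fdr:
            return t
    return None


# Toy best-match scores (invented for illustration).
target = [9.1, 8.4, 7.7, 6.0, 5.2, 4.8, 3.9, 3.1]
decoy = [5.0, 4.1, 3.5, 2.2]
cutoff = lowest_threshold(target, decoy, max_fdr=0.2)
print("score cutoff:", cutoff)
```

The abstract's point is that the ratio computed by `decoy_fraction` can fall below the true FDR, so the cutoff found this way may be too permissive.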



Effect of Visual Feedback on Copy/Paste Behavior in Electronic Clinical Documentation

Authors:
Michael Jernigan, William Lester, Massachusetts General Hospital

Abstract:
Studies suggest that copy/paste events in electronic documentation result in a “high-risk” error in over one third of patient charts. However, all studies to date are limited in that they are retrospective analyses of unstructured text. We hypothesize that real-time graphical user feedback during note writing will influence user copy/paste activity.

We designed and installed an inpatient, electronic documentation system at four community hospitals in the Portland, Oregon area. This system creates structured notes, facilitates capture of copy/paste activity on a sentence by sentence level, and has the option to display graphical user feedback of sentences that are pasted and not altered. We are currently studying this graphical interface intervention in a 9 month, off-on-off, prospective trial. Seventy clinicians are currently enrolled and have written 15,000 notes with an average of 20 pasted sentences per note at baseline. The study’s intervention phase is scheduled for completion at the end of March. I will present interim results comparing the baseline and intervention phases of the trial.

Our hypothesis is that this intervention, designed to increase user awareness of unaltered pasted text, will change copying and pasting behavior, and specifically that it will reduce unaltered pasting in narrative sections of the note.



Mortality Prediction in Patients with Septic Shock

Authors:
Richard Lu, Ronilda Lacson, Brigham and Women’s Hospital

Abstract:
Introduction: Several published ICU severity scoring systems predict mortality from all causes, but far fewer address septic shock specifically, which has a very high mortality rate. The objectives of this research are (1) to use machine learning algorithms to predict mortality among septic shock patients, and (2) to identify the variables most strongly associated with mortality.

Methods: Data on 1,372 patients with septic shock were obtained from the MIMIC II database. After reserving 30% of the data for validation, Logistic Regression, Neural Network, Classification Tree and Bayesian Network models were developed using the remaining data. Calibration of all models was assessed using the Hosmer-Lemeshow goodness-of-fit test.

Results: Lactic acid level is the best single predictor of mortality in septic shock (OR = 1.33, p = 3.38 × 10⁻⁹) using multiple logistic regression (AUC = 0.71). Models using Neural Network, Classification Tree and Bayesian Network have AUCs of 0.69, 0.65 and 0.74, respectively. The models were well-calibrated (p > 0.10).

Conclusion: This research provides empirical evidence that machine learning algorithms can predict mortality in patients who develop septic shock. More clinical studies are warranted to determine the impact of lactic acidosis on patient mortality and whether its correction might lead to improved outcomes.
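
The AUC values reported above summarize how well each model ranks patients who died above those who survived in the held-out data. A minimal Python sketch of that calculation follows. It uses the pairwise (Mann-Whitney) formulation, and the labels and predicted risks are invented toy values, not MIMIC II data.

```python
def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy validation set: 1 = died, 0 = survived, with hypothetical predicted risks.
labels = [1, 1, 1, 0, 0, 0]
risks = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
print(round(auc(labels, risks), 3))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two outcome groups.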



Hyper- and Hypo-Dynamism in Gene Expression Across Multiple Studies

Authors:
Alexander A Morgan, Atul J Butte, Stanford University

Abstract:
Very high-throughput measurements of mRNA expression, such as DNA microarrays, have given the biomedical research community an overview of processes at a molecular level. The subsequent collection of the results of many of these experiments in repositories such as the Gene Expression Omnibus (GEO) provides an even loftier view of the variations in mRNA expression across all genes and all measured conditions. We investigated the properties of this very high-dimensional space by looking at those genes whose mRNA varies the most (hyper-variable genes) and least (hypo-variable genes) and examining the physiological and functional associations with high and low variation. In this study we develop methods for examining and summarizing overall gene expression variation and demonstrate their application in a meta-analysis involving 29,000 microarrays. We show that overall patterns of variation in gene expression levels are conserved across three species, and we show associations with known disease-related genes. We also show the physiological importance of these measures of expression variation in association with the functional properties of the genes. These results can influence thinking about the selection of genes for microarray design and about how to analyze measurements of mRNA expression variation in a global context of expression variation across many conditions.



Can de facto Dosing Practices Bridge the Knowledge Gap in Pedi Drug Recommendations?

Authors:
Elisabeth L Scheufele, Anil Dubey, Greg Estey, Henry Chueh, Massachusetts General Hospital

Abstract:
A knowledge gap exists in pediatric medication dosing recommendations due in part to the complexity of performing studies of medication efficacy and safety in this population. One possible resource to close this gap resides in the electronic prescribing practices of pediatric clinicians. In this study, de facto pediatric weight-based levothyroxine dosing was studied as a potential source for pediatric clinical decision support. This was accomplished by extracting physical exam and prescription details from a well-used clinical data warehouse to calculate weight-based dosing practices, and comparing the results with established medication recommendations. Of the 854 instances of weight-based dosing, 728 (85.2%) prescriptions were under the recommended dosing range, 80 (9.37%) prescriptions were within the range and 46 (5.39%) prescriptions were over the range. These results indicate that real world practices for certain medications may differ from recommendations. In summary, this work demonstrates that it is possible to extract de facto weight-based medication dosing practice patterns from a clinical data warehouse. Such information may be a potentially valuable resource in pediatric clinical decision support, particularly in situations where practice diverges from recommendations, and can help close the knowledge gap where pediatric drug dosing information is sparse or unavailable.
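
The comparison step described in this abstract (computing a weight-based dose and checking it against a recommended range) can be sketched as follows. The function name, the toy prescriptions, and especially the range limits are placeholders chosen for illustration. They are assumptions, not the recommendations used in the study.

```python
from collections import Counter


def classify_dose(daily_dose_mcg, weight_kg, low_mcg_per_kg, high_mcg_per_kg):
    """Label a prescription as under, within, or over a per-kilogram range."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    per_kg = daily_dose_mcg / weight_kg
    if per_kg < low_mcg_per_kg:
        return "under"
    if per_kg > high_mcg_per_kg:
        return "over"
    return "within"


# Placeholder range in mcg/kg/day: hypothetical values, not a clinical recommendation.
PLACEHOLDER_RANGE = (4.0, 6.0)

# Toy (daily dose in mcg, weight in kg) pairs, invented for illustration.
prescriptions = [(50, 20.0), (100, 20.0), (150, 20.0), (60, 30.0)]
tally = Counter(classify_dose(d, w, *PLACEHOLDER_RANGE) for d, w in prescriptions)
print(dict(tally))
```

In practice the recommended range would depend on factors such as patient age, so a single fixed range is a simplification.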



Novel Tool for High-Throughput Glycerophospholipid Profiling and Difference Testing

Authors:
Peter S Straub, Eric L Purser, David L Tabb, Vanderbilt University

Abstract:
Glycerophospholipids (GPLs), the primary component of cell membranes, display a diversity of both structures and functions, including important roles in cell signaling pathways. As such, researchers require a means of identifying and quantifying GPLs. Direct infusion electrospray ionization mass spectrometry (DI-ESI-MS) constitutes a high-throughput, cost-effective approach to GPL identification and quantitation. Our software, LipiDiff, automates the processing of DI-ESI-MS data from experiments employing complex hierarchies of samples. Users define a lipid search space, and LipiDiff matches putative lipid species to peaks observed in DI-ESI-MS scans and detects statistically significant differences in GPL intensities between cohorts of spectra. After intensity normalization, an unpaired t-test using a multiple testing correction is conducted between cohorts of spectra for each lipid in the search space. Written in C#, LipiDiff features a step-by-step analysis wizard and reads files from multiple manufacturers via the ProteoWizard library. We will validate LipiDiff on multiple instruments with different mass accuracies using samples of known composition. In addition, we will use enzymes expected to alter the GPL distribution (such as phospholipase D) on cell extracts to verify the difference testing component.
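
The abstract does not say which multiple testing correction LipiDiff applies. As one common option, the Benjamini-Hochberg adjustment is sketched below in Python (LipiDiff itself is written in C#). The function name and the per-lipid p-values are assumptions for illustration only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-lipid p-values from unpaired t-tests between two cohorts.
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
print([round(p, 4) for p in adj_p])
```

A lipid would then be reported as differing between cohorts when its adjusted p-value falls below the chosen significance level.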



Alterations in the Reaction Kinetics of the MAP Kinase Signaling Cascades Through the Introduction of Obstacles on a Lattice-Gas Automata Model

Authors:
Joshua E Swearingen and John H Schwacke, Medical University of South Carolina

Abstract:
The dynamics of molecular events occurring in the cell are fundamentally altered by changes in concentration. The concentration of these agents could be seen as local densities resulting not only from the concentrations of the molecules of interest, but also from smaller localized volumes created by the relatively high densities of nonreactive species present within the cell. Simulations of these reactions in non-homogeneous media show a divergence of behavior from classical kinetics due to the effects of this crowded environment and to the limitations of continuous systems when dealing with the small numbers of molecules often found in intracellular environments. Through simulations done at the level of individual molecules using lattice-gas automata models, we investigate the effect of crowding on components of the MAP Kinase system. This system has been shown to exhibit emergent behavior, such as bistability due to dual-site phosphorylation. We find significant changes in the rates of phosphorylation through the introduction of obstacles. We consider the divergence of the observed kinetics over a range of obstacle densities and levels of molecular mobility, and determine how large the deviation is from traditional mass action towards more fractal-like kinetics.



Ordering the Patient Problem List

Authors:
Tielman T Van Vleck, Noémie Elhadad, Columbia University

Abstract:
While much research has studied automated generation of patient problem lists from clinical notes, little has been done to examine whether there are patterns to how physicians order information when authoring a problem list. We collected 7673 physician-authored Past Medical History (PMH) sections from initial visit notes as a proxy for problem lists. We cast ordering as a ranking task. We extracted features which could influence the order of concepts in the PMH: concept UMLS codes, UMLS and MedLEE semantic types, severity (both as concept frequency and presence of related symptoms and conditions in the chart), temporal information, and corresponding anatomical system. Models were built for various feature sets using an SVM-based ranker. Accuracy of each model-generated list was compared to a gold-standard list using Kendall’s tau. The best feature combination yielded rankings with an average tau of 0.8, showing significant agreement between the learned and actual orderings. Temporal-based features yielded the best ordering. Current analysis indicates some regularity in how physicians order information in problem lists. In an automatically generated problem list, a temporal ordering strategy will yield a list most appropriate given the reading physician’s expectations.
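
Kendall's tau, the agreement measure used above, compares how many pairs of items appear in the same relative order in two lists. The following Python sketch computes the tau-a variant for two orderings of the same distinct items. The function name and the example problem lists are invented for illustration.

```python
from itertools import combinations


def kendall_tau(predicted, gold):
    """Kendall's tau-a between two orderings of the same distinct items."""
    if len(gold) < 2 or len(set(gold)) != len(gold) or set(predicted) != set(gold):
        raise ValueError("need two orderings of the same distinct items")
    position = {item: i for i, item in enumerate(predicted)}
    concordant = discordant = 0
    # In every pair from combinations(), a precedes b in the gold ordering.
    for a, b in combinations(gold, 2):
        if position[a] < position[b]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


# Toy example: one adjacent pair is swapped relative to the gold-standard list.
gold = ["diabetes", "hypertension", "asthma", "GERD"]
predicted = ["diabetes", "asthma", "hypertension", "GERD"]
print(round(kendall_tau(predicted, gold), 3))
```

A value of 1 means identical orderings, -1 means fully reversed orderings, and 0 means no overall agreement.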



A Comparison of Two Approaches for Identifying Negations in Radiology Reports

Authors:
Julie A Womack1, Matthew Scotch2, Cynthia Gibert3, Wendy Chapman4, Michael Yin5, Amy Justice1,6, Cynthia Brandt1,2
1West Haven VA Healthcare System, Department of Veterans Affairs, West Haven, CT, USA, 2Yale Center for Medical Informatics, Yale University, 3Veterans Affairs Medical Center and Department of Medicine, George Washington University Medical Center, 4Department of Biomedical Informatics, University of Pittsburgh, 5PH8–876, Division of Infectious Diseases, Department of Medicine, Columbia College of Physicians and Surgeons, 6Section of General Internal Medicine, Department of Internal Medicine, Yale University

Abstract:
We compared a text mining approach that required extensive knowledge of a programming language against one with minimal requirements. Results will include an assessment of accuracy and precision of these two approaches for full text clinical searches.

Introduction: Most health care providers have limited informatics expertise. Text-processing tools that are accessible and require minimal programming skills may facilitate automated electronic medical record (EMR) chart review.

Methods: This project was implemented to support a case-control study investigating the impact of HIV status on wrist, hip, and vertebral fractures in the Veterans Aging Cohort Study (VACS)1. SQL Server 2008 Full Text Search (SQL Server) and NegEx2 were used to identify patients with fractures and exclude those with negative fracture reports or with old fractures. The gold standard will be established by chart reviews done by two clinicians (JW and CB), with a third (CG) arbitrating discrepancies.
SQL Server utilized the “CONTAINS” predicate and the prefix term “fractu*” and the “AND NOT” predicate along with proximity terms to identify and exclude radiology reports negative for fracture and positive for old fractures.
With NegEx, written in the Python 2.3 programming language, we marked up the same notes and identified those fracture expressions that were negated. Using regular expressions, we identified those reports that included non-negated fracture expressions. We will use a similar process to identify reports of old fractures only.
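
The idea of separating negated from non-negated fracture mentions can be illustrated with a much-simplified Python 3 sketch. This is not NegEx itself, whose trigger lexicon and scoping rules are far more extensive. The trigger list, the four-word window, and the function name are assumptions made for illustration.

```python
import re

# A tiny illustrative trigger list; longer phrases come first so they match first.
NEGATION_TRIGGERS = r"(?:no evidence of|negative for|without|no)"
# A trigger, then at most four intervening words, then a fracture term.
NEGATED = re.compile(
    rf"\b{NEGATION_TRIGGERS}\b(?:\s+\w+){{0,4}}?\s+fractu\w*", re.IGNORECASE
)
FRACTURE = re.compile(r"\bfractu\w*", re.IGNORECASE)


def has_non_negated_fracture(report):
    """Return True if a fracture term remains after negated spans are removed."""
    for sentence in re.split(r"[.;]\s*", report):
        remainder = NEGATED.sub(" ", sentence)
        if FRACTURE.search(remainder):
            return True
    return False


print(has_non_negated_fracture("No evidence of acute fracture."))
print(has_non_negated_fracture("Acute fracture of the distal radius. No dislocation."))
```

The first call reports no non-negated mention and the second reports one. A production approach would also need to handle hyphenated modifiers, historical ("old") fractures, and triggers that follow the finding.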

Results: 151,270 radiology reports were available for review. SQL Server identified 23,595 reports (16%) that included the term “fracture.” Of these, 1523 reports (6%) matched search criteria for positive fracture, and 697 of those (46%) matched criteria for acute fractures. NegEx identified 6363 reports (27%) that matched search criteria. Identification of acute fractures is pending. The reviewers are validating the negative and positive fracture reports. Determinations of precision and accuracy will be made.

Conclusions: Conclusions regarding the accuracy and precision of SQL Server versus NegEx will be drawn. While radiology reports have a more well-defined terminology than other text notes, the presentation of negative and incidental findings presents a challenge. Ease of use may not trump accuracy and precision of outcomes.

References
1Justice AC, Dombrowski E, Conigliaro J, et al. Veterans Aging Cohort Study (VACS): Overview and description. Med Care. Aug 2006;44(8 Suppl 2):S13-24.
2Chapman WW, Bridewell W, Hanbury P, Cooper GF, Buchanan BG. A simple algorithm for identifying negated findings and diseases in discharge summaries. J Biomed Inform. Oct 2001;34(5):301-310.



Bayesian Combinatorial Partitioning for Detecting Gene-Gene Interactions

Authors:
An-Kwok Ian Wong, Shyam Visweswaran, University of Pittsburgh

Abstract:
An important challenge in the analysis of single nucleotide polymorphism (SNP) data is the identification of SNPs that interact in a nonlinear fashion in their association with disease. Such epistatic interactions among genes likely underlie the inheritance of complex diseases. We have developed a Bayesian combinatorial partitioning (BCP) method for analyzing genetic effects on a dichotomous outcome variable. Combinatorial methods are multi-locus methods that search over all possible combinations of markers to find combinations that are associated with the phenotype and thus are capable of detecting epistatic interactions among genes. A widely used combinatorial partitioning method is the multifactor dimensionality reduction method (MDR).



Semantic MEDLINE, with Summarization Enhancement as a Natural Language Processing Application in Content Discovery

Authors:
T Elizabeth Workman1, Marcelo Fiszman2, Joyce A. Mitchell1, John F Hurdle1, Thomas C Rindflesch2
1University of Utah; 2National Library of Medicine

Abstract:
The accelerated growth of the available biomedical literature has caused a crisis in literature evaluation. Researchers have difficulty staying abreast of new findings in their fields. As an alternative to traditional information retrieval, we studied the application of the Semantic MEDLINE model as a discovery tool for genetic database curators. METHODS: We collected citations from three source databases (MEDLINE, BIOSIS, CINAHL), and processed them with Semantic MEDLINE, a multi-step application that utilizes the NLM’s Semantic Knowledge Representation tools SemRep, Summarization, and Visualization. We developed a new Summarization schema focused on identifying salient research in the genetic etiology of disease. We used bladder cancer as a test condition, because it presents an especially interesting potential for emerging research. We compared the output to existing data in the curated genetic databases Genetics Home Reference (GHR) and Online Mendelian Inheritance in Man (OMIM). RESULTS: Semantic MEDLINE, with the new genetic etiology of disease summarization schema, identified 76 unique genetic entities from the professional literature indexed by MEDLINE, BIOSIS, and CINAHL that were implicated in bladder cancer. Many of these entities were not represented in the curated databases.
