
NLM Informatics Training Conference 2007

June 26–27, 2007 at Stanford University

Agenda and Abstracts of Presentations and Posters


Tuesday, June 26, 2007

7:45 - 7:50 Welcome (Russ Altman)
   
7:50 - 8:10 NLM Director's Remarks (Donald A.B. Lindberg)
   
8:10 - 8:25 Introductions of Training Directors and Trainees, Overview of Program (Valerie Florance)
   
8:30 - 10:10 Plenary Paper Session #1 - Key Topics in Informatics - 5 papers (Session Chair: Robert Greenes)
 
10:30 - 11:30 Parallel Paper Sessions
   
  Session A - Health Care Quality - 3 papers (Session Chair: Charles Caldwell)
 
  Session B - Tools & Techniques, Part 1 - 3 papers (Session Chair: Peter Tarczy-Hornoch)
 
  Session C - Translational Informatics - 3 papers (Session Chair: Joyce Mitchell)
 
1:00 - 2:00 Poster Session
   
  Executive Session of Training Directors
   
2:00 - 3:40 Plenary Paper Session - Health Care Informatics - 5 papers (Session Chair: William Tierney)
 


Wednesday, June 27, 2007

7:00 - 7:50 Training Directors Business Meeting Slides
   
8:00 - 9:20 Plenary Paper Session - Bioinformatics - 4 papers (Session Chair: Russ Altman)
 
9:20 - 10:20 Parallel Paper Sessions
   
  Session D - Tools & Techniques, Part 2 - 3 papers (Session Chair: G. Anthony Gorry)
 
  Session E - Consumer/Public Health Informatics - 3 papers (Session Chair: Cynthia Gadd)
 
  Session F - Knowledge Discovery & Summarization - 3 papers (Session Chair: Alex Bui)
 
10:40 - 12:20 Plenary Paper Session - Modeling - 5 papers (Session Chair: George Phillips)
 
12:20 - 1:20 Grants Administration Workshop Slides
   
1:00 - 1:20 Repeat of Poster Session
   
1:20 - 3:00 Plenary Paper Session - Information Retrieval/Information Studies - 5 papers (Session Chair: William Hersh)
 
3:00 - 3:30 Closing Session (Russ Altman)
   





PRESENTATION ABSTRACTS



Understanding Workflow and Information Flow in Chronic Disease Care

Authors:
Kim M Unertl, Matthew B Weinger, Nancy M Lorenzi, Kevin B Johnson, Vanderbilt University

Abstract:
Health information technology (HIT) may help to address the significant and growing problems presented by chronic disease care. However, not enough is known about how the chronic disease environment differs from the primary care environment. This study examined workflow and information flow in chronic disease care. Over 150 hours of direct observation in three ambulatory specialty clinics identified elements of the workflow and information flow, usage patterns of existing HIT, and gaps between user needs and existing functionality. Clinic-specific models of workflow, information flow, and temporal flow were developed and further refined using semi-structured interviews. Generalized models were developed that identified the common aspects of workflow and information flow across all three clinics. Aspects of chronic disease care workflow that are important to address in the design of HIT were identified. Core similarities among different chronic disease domains were identified, as were crucial differences. Comprehending the unique aspects of workflow and information flow in chronic disease care early in a software design process may assist in improving user satisfaction and increasing the likelihood of implementation success.




Automated Function Prediction: A Comparison of SeqFEATURE to Other Methods and Applications to Structural Genomics

Authors:
Shirley Wu, Michael P Liang, Russ B Altman, Stanford University

Abstract:
Determining protein function is critical for furthering our understanding of biological processes and facilitates advances in human health and disease treatment. The function of a protein depends on its structure, and protein structure offers an important level of information that sequence alone cannot reveal. Until recently, however, structure-based analysis of proteins has been limited by the relative lack of structural data. Although the burgeoning field of structural genomics is rapidly increasing the number and diversity of protein structures available, many of the new structures are of hypothetical proteins or proteins of unknown function. Often, these proteins share very little sequence similarity to known proteins, and so methods to predict the function of a protein from its structure are needed. We have constructed a library of 3-D functional site models derived from PROSITE patterns, called SeqFEATURE, which can be used to scan protein structures for function. SeqFEATURE is based on the FEATURE system, which describes the local 3-D environment around sites of interest using a naive Bayes framework. We compared SeqFEATURE to sequence-based methods (PROSITE and Pfam) and to structure-based methods (Secondary Structure Matching (SSM) and 3-D templates). We have also applied SeqFEATURE in a scan of the entire Protein Data Bank and analyzed a subset of structures from the TargetDB database of structural genomics targets. This analysis has yielded a number of intriguing predictions that warrant further study.
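
The naive Bayes scoring that FEATURE uses can be illustrated in miniature: each observed property of the local 3-D environment contributes an independent log-likelihood-ratio term for "functional site" versus "non-site," and the terms are summed. The property names and probabilities below are invented for illustration; they are not FEATURE's actual parameters.

```python
import math

# Toy naive Bayes log-odds score in the spirit of FEATURE:
# each observed property of the local 3-D environment contributes
# log P(property | site) - log P(property | non-site), summed under
# the naive (conditional independence) assumption.
# All probabilities here are made-up illustrative numbers.
likelihoods = {
    # property: (P(observed | site), P(observed | non-site))
    "negative_charge_shell1": (0.70, 0.20),
    "his_residue_shell2":     (0.55, 0.10),
    "solvent_accessible":     (0.40, 0.60),
}

def log_odds(observed_properties):
    score = 0.0
    for prop in observed_properties:
        p_site, p_bg = likelihoods[prop]
        score += math.log(p_site) - math.log(p_bg)
    return score

# A positive total favors the functional-site model for this environment.
score = log_odds(["negative_charge_shell1", "his_residue_shell2"])
print(round(score, 3))
```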




A Systems-Based, Computational Approach to Link Phenotypes with Causal Genetic Events

Authors:
Kartik M Mani, Celine Lefebvre, Wei Keat Lim, Kai Wang, Andrea Califano, Columbia University

Abstract:
A significant focus of bioinformatics methods in cancer research is the computational identification of genes that may be causally related to the presentation of a specific phenotype. This task has typically been accomplished in gene-centric fashion, using learning algorithms to discover genes that, for example, differentiate tumors from their normal counterparts. These methods tend to identify long lists of genes, whose rank is often inconsistent between datasets and generally a poor predictor of their causal role. In this study, we argue that key genes which are causally related to specific phenotypes naturally emerge when analyzed in the context of network dynamics. We first infer a genome-wide, B cell-specific interaction network using a Bayesian evidence integration framework. We then use a large compendium of microarray expression profiles to identify those interactions which exhibit significantly altered behavior in each phenotype of interest. Finally, we calculate a statistical enrichment score for the genes involved in these interactions. We show that for four well-annotated phenotypes, the method identifies the causal gene reported in the literature within the top 20 from over 6000 possible candidates (top 0.3%), significantly outperforming standard approaches and providing a more mechanistic view of the genetic changes underlying the phenotypic transition.
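
Enrichment steps like the one described above are commonly computed as a hypergeometric tail probability; whether this exact form matches the authors' statistic is an assumption, and the counts below are invented. A minimal sketch:

```python
from math import comb

# Hypergeometric enrichment: probability of observing >= k hits for a gene
# among n dysregulated interactions, when the gene participates in K of the
# N total interactions in the network. All counts are invented illustration.
def enrichment_p(N, K, n, k):
    return sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)
    ) / comb(N, n)

# Toy example: 1000 interactions total, a gene appears in 30 of them;
# 50 interactions are altered in the phenotype, and 8 involve the gene
# (expected by chance: 50 * 30 / 1000 = 1.5, so 8 is a strong enrichment).
p = enrichment_p(N=1000, K=30, n=50, k=8)
print(p < 0.01)
```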




Building 3D Chemical Structures from 2D Information

Authors:
Ryan W Benz, Pierre Baldi, University of California, Irvine

Abstract:
The exploration of chemical space, which contains all possible small organic molecules, is important for discovering new compounds of biological interest. Chemoinformatics has played an important role in this exploration, using the tools of computer science, mathematics, and chemistry to uncover patterns and information in chemical data. One existing challenge in chemoinformatics is incorporating 3D chemical structures into useful molecular descriptors. As such information is often not readily available, computational methods for generating reliable molecular coordinates are important, particularly in the study of virtual chemical space. Here we present recent work aimed at generating high quality, 3D molecular structures based upon 2D connection information. Starting from the molecular structures contained in the Cambridge Structural Database, molecules are separated into rigid segments that are stored in a new database along with the torsion angles between segments. This database can then be used to reconstruct 3D molecular structures using 2D SMILES representations as input. Analysis of the fragment distributions will also be presented along with a quality assessment of the generated structures.




Sharing Personal Health Information within Social Networks

Authors:
Meredith M Skeels, Wanda Pratt, University of Washington

Abstract:
Understanding how and why people share health information with their social networks is important when designing technology to support health consumers. We conducted a qualitative study, using semi-structured interviews with 13 health consumers, to describe what health information consumers share, how they share it, and why they share it. These interviews revealed protocols for sharing information, reasons to share and not to share health information, and criteria for deciding how much health information is shared with specific people. We found that many health consumers share health information broadly and that sharing occurs for a variety of reasons. Participants described learning about health conditions, treatments, and outcomes from other people's experiences and described turning to people close to them for support and reassurance as well as for information and expertise. Health information was shared in a variety of formats, often using multiple communication modalities. The findings from this study will inform the design of technology to aid personal health information management and to support sharing and collaboration within social networks.




The Impact of Pediatric Adverse Events on the Cost-Effectiveness of Oseltamivir

Authors:
Tara A Lavelle, Timothy M Uyeki, Lisa A Prosser, Harvard University, Boston, MA, Centers for Disease Control and Prevention, Atlanta, GA

Abstract:
Studies have demonstrated the benefit oseltamivir treatment provides in reducing the duration of influenza symptoms in children, but recent reports of neuropsychiatric adverse events (AE) deserve consideration. This study investigated the effect that these AE have on the cost-effectiveness of oseltamivir treatment in children. A decision tree was developed to evaluate the costs and effectiveness of 3 clinical options for otherwise healthy 5-11 year old children with influenza-like illness: no antiviral treatment, testing then treatment with oseltamivir, and empirical oseltamivir treatment. In the base case analysis, where neuropsychiatric AE were estimated to occur in <0.1% of treated patients, testing then treatment with oseltamivir led to an incremental cost-effectiveness ratio (ICER) of $28,811 per quality adjusted life year (QALY), compared to no antiviral treatment. Empirical treatment was a more costly, but more effective strategy, with an ICER of $60,511. Assuming a willingness-to-pay threshold of $100,000/QALY, empirical treatment with oseltamivir remained cost-effective until the probability of neuropsychiatric AE reached 1.4%. Testing then treatment, however, remained cost-effective provided that the probability of these AE remained below 19.1%. These results indicate that oseltamivir remains a cost-effective treatment option in this pediatric population, even when allowing for substantial increases in the likelihood of neuropsychiatric AE.
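
The incremental cost-effectiveness ratios quoted above follow the standard definition ICER = (cost_new - cost_base) / (QALY_new - QALY_base), compared against a willingness-to-pay threshold. The per-strategy cost and QALY inputs are not given in the abstract, so the numbers below are hypothetical placeholders that merely illustrate the computation:

```python
# Standard ICER computation. The cost/QALY figures below are hypothetical
# placeholders, NOT the study's inputs.

def icer(cost_new, qaly_new, cost_base, qaly_base):
    """Incremental cost per quality-adjusted life year gained."""
    return (cost_new - cost_base) / (qaly_new - qaly_base)

def cost_effective(icer_value, wtp=100_000):
    """Compare an ICER against a willingness-to-pay threshold ($/QALY)."""
    return icer_value <= wtp

# Hypothetical strategies: (cost in dollars, effectiveness in QALYs).
no_treatment = (100.0, 0.9800)
test_then_treat = (160.0, 0.9821)

ratio = icer(*test_then_treat, *no_treatment)
print(round(ratio, 2), cost_effective(ratio))
```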




Usage and Perceptions of Athena-Hypertension DSS by Primary Care Physicians

Authors:
Martha Michel (Palo Alto VA CHCE Postdoc), Susana Martins (Palo Alto VA GRECC), Nancy Lin (Palo Alto VA GRECC/ Stanford), Mary K Goldstein (Palo Alto VA GRECC/ Stanford)

Abstract:
ATHENA-Hypertension, an automated decision support system (DSS), provides patient-specific recommendations to primary care physicians at the point of care. Our study aims were to describe (1) clinician-reported perceptions and use of ATHENA and (2) actual interactions with ATHENA during a 15-month randomized implementation trial. For each visit, data were collected on the type of interaction (e.g., updated ATHENA Advisory, accessed patient history). At trial end, experimental-group clinicians were sent a survey that included questions about usability and navigation of ATHENA. Overall, the interaction rate was 52% in the experimental group (10739 interactions / 20524 displayed Advisories) and 16.5% in the controls (3694 interactions / 22301 hypertension reminders) (p<0.001). Of the survey respondents (n=44), eighty percent reported good or excellent ease of navigation in finding information, and fifty-two percent reported good or excellent integration into their clinical workflow. Providers tended to interact with ATHENA-Hypertension and reported high satisfaction with navigation.
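
The difference in interaction rates reported above (10739/20524 vs. 3694/22301) can be checked with a standard two-proportion z-test; the sketch below uses only the counts given in the abstract, and is one plausible way to obtain a p-value like the one reported, not necessarily the authors' test.

```python
import math

# Two-proportion z-test on the interaction counts reported above:
# z = (p1 - p2) / sqrt(pooled * (1 - pooled) * (1/n1 + 1/n2)).
def two_proportion_z(x1, n1, x2, n2):
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

z = two_proportion_z(10739, 20524, 3694, 22301)
print(round(z, 1))  # a |z| this large corresponds to p far below 0.001
```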




Adherence to Home-Monitoring and Mortality in Post-Lung Transplantation Patients

Authors:
Hojung J Yoon, Hongfei Guo, Marshall Hertz, Stanley Finkelstein, University of Minnesota

Abstract:
Home-monitoring is an important aspect of chronic disease care in promoting early detection and prevention of long-term complications. However, adherence to home-monitoring has often been less than optimal, and this suboptimal adherence poses great challenges in chronic disease management. Demonstrating a correlation between adherence and better survival may provide further motivation to improve rates of adherence. To determine the relationship between adherence behavior patterns and mortality, we studied the home-monitoring and mortality records of post-lung transplant patients. Since there are different ways to define adherence, several definitions were compared against post-transplant survival. In this study, we examined 246 patients' data with 132,822 daily transmission readings of spirometry. This project will describe several definitions of adherence and how the home-monitoring data under each definition correlate with the mortality data in these patients, using a Cox proportional hazards model. The proportion of time adherent, the mean gap length, and the variability in the adhered-to schedule will be analyzed, adjusting for confounding variables.
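
The competing adherence definitions mentioned above can be made concrete on a toy daily-transmission record. The metrics below are plausible operationalizations of "proportion adherent," "mean gap," and "schedule variability," not necessarily the study's exact definitions, and the 14-day record is invented.

```python
import statistics

# Toy 14-day record: 1 = spirometry reading transmitted that day.
days = [1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1]

# Definition 1: proportion of days with a transmission.
proportion = sum(days) / len(days)

# Definition 2: mean gap length (runs of consecutive missed days).
gaps, run = [], 0
for d in days:
    if d == 0:
        run += 1
    elif run:
        gaps.append(run)
        run = 0
if run:
    gaps.append(run)
mean_gap = statistics.mean(gaps) if gaps else 0.0

# Definition 3: variability of the inter-transmission intervals.
times = [i for i, d in enumerate(days) if d]
intervals = [b - a for a, b in zip(times, times[1:])]
variability = statistics.pstdev(intervals)

print(round(proportion, 3), round(mean_gap, 3), round(variability, 3))
```

Each definition could then enter a Cox model as a covariate, which is how the abstract describes relating adherence to survival.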




Content Validity of Obtrusiveness Model of Home Telehealth Technologies

Authors:
Brian Hensel, University of Missouri-Columbia, George Demiris, University of Washington-Seattle, Karen Courtney, University of Pittsburgh

Abstract:
This research examines the content validity of a conceptual model of the perceived obtrusiveness of home telehealth technologies [1] by 1) comparing model dimensions and subcategories to perceptions of specific technologies voiced by focus groups from residential care facilities [2]; and 2) gathering feedback on the model's definition of obtrusiveness and its dimensions and associated subcategories via a Web-based Delphi survey of scholars and practitioners familiar with telehealth. Transcripts of the focus groups contained examples of all eight dimensions and sixteen of the twenty-two subcategories of the model. The analysis suggests that the model's dimensions comprehensively capture obtrusiveness as defined; that at least some subcategories need to be more clearly defined; and that combining some subcategories and creating new ones should be considered. Preliminary (May 2007) results of the first round of the Delphi survey support the focus group findings and also suggest clarifying the model's definition of obtrusiveness. Additional rounds will be conducted, and iterative changes made, until an informed consensus is reached among participants. The results will be used to develop a measurement instrument that will be tested for content, predictive, concurrent, and construct validity, and for reliability. User perceptions of obtrusiveness are important to the adoption of these technologies and can inform their design.

1. Hensel BK, Demiris G, Courtney KL. Defining Obtrusiveness in Home Telehealth Technologies: A Conceptual Framework. J Am Med Inform Assoc. 2006;13:428-431.

2. Courtney KL, Demiris G, Hensel BK. Obtrusiveness of Information-based Assistive Technologies as Perceived by Older Adults in Residential Care Facilities: A Secondary Analysis. Med Inform Internet Med. In press.




Evaluation of the VA/KP Subset of SNOMED for E-Prescription Clinical Decision Support

Authors:
Surendranath Mantena, Gunther Schadow, Regenstrief Institute, Indiana University School of Medicine

Abstract:
There is a need for a standardized clinical terminology to represent clinical terms or concepts used for medical indications. The FDA has adopted the VA/KP Problem List Subset of SNOMED as the terminology to represent medical conditions in electronic labels. In this paper, we evaluate the ability of this subset to represent text phrases extracted from a medication decision support system and from the indications sections of existing labels. We compiled a test set of 1265 distinct phrases and mapped them to (1) the UMLS, (2) the entire SNOMED, (3) all precoordinated terms from the "findings" category of SNOMED, and (4) the VA/KP Subset. 95% of the phrases mapped to concepts in the UMLS, 90.3% to SNOMED, 79.5% to the SNOMED precoordinated terms, and 71.1% mapped completely or partially to concepts in the VA/KP Subset. Our results suggest that it may be advisable for the FDA to use the full SNOMED precoordinated set as the terminology of choice rather than the empirically constrained VA/KP Subset.




Cryptographic Accuracy Annotations for Electronically Exchanged Personal Health Information

Authors:
David Haight and Patricia Brennan, University of Wisconsin

Abstract:
Accurate, reliable, and contextually complete information about a patient's health, lifestyle factors, and clinical care is the fundamental currency on which effective and efficient health care delivery is based. New mechanisms for collecting and aggregating clinically relevant information, such as Personal Health Records (PHRs) and Health Information Exchanges (HIEs), are increasingly being used to address the growing distribution of patient information across fragmented care settings. Clinicians assess the clinical significance of information based on its content and on the additional context surrounding each data element, which they use to judge each element's quality and accuracy. Accuracy has two dimensions: the correctness of each data element and the completeness of the aggregated information. Aggregating information from a variety of sources using PHRs or HIEs yields a data set whose elements vary in accuracy, reliability, and completeness as they pertain to the care of the patient at a point in time. This work proposes a technical infrastructure to implement integrity annotations, which express the correctness and completeness attributes of information elements using public-key cryptographic methods, enabling effective computational representation and analysis for use in clinical contexts.




The Biofluidome: Determination and Quantification of Functional Peripheral Proxies

Authors:
Gil Alterovitz, Michael Xiang, Amelia Chang, Marco F Ramoni, Children's Hospital Informatics Program, Division of Health Sciences and Technology, Harvard Medical School and Massachusetts Institute of Technology, Boston

Abstract:
Recent work in clinically-oriented proteomics investigations has targeted the discovery of viable biomarkers for functional, disease, and drug state characterization. While the accessibility of proteins in biofluids makes this approach attractive, an important concern is the loss of information when moving from examining actual tissues to the testing of associated biofluids. This study, therefore, addresses this problem by quantifying the amount of information transferred between body tissues and potential biofluid proxies. An information theoretic approach was used to map proteins from 160 tissue-biofluid combinations to functional, disease, and drug interaction spaces. In this space, the extent of information loss (across a modeled channel from tissue to fluid) was calculated using relative entropy. The result is the biofluidome, a network of significant biofluid-tissue relationships (p < 0.01) which spans the three spaces. Using this technique, validation of existing relationships was achieved, in addition to yielding novel, unexplored biofluid-tissue relationships. This investigation has laid the foundations for further work on biomarker discovery, facilitating the process by constraining the focus of wet lab validation to specific biofluids and biomarkers - which are shown to carry a statistically significant amount of information about specific tissues.
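
Relative entropy (Kullback-Leibler divergence) between a tissue's annotation distribution P and a biofluid's distribution Q is the core quantity here: D(P || Q) = sum_i p_i log2(p_i / q_i). A minimal sketch with invented three-category distributions (the study's actual functional/disease/drug spaces are far larger):

```python
import math

# KL divergence D(P || Q), measuring in bits the information lost when a
# biofluid's annotation distribution Q is used as a proxy for a tissue's
# distribution P. The distributions below are invented for illustration.
def relative_entropy(p, q):
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

tissue   = [0.50, 0.30, 0.20]   # P: functional-category frequencies in tissue
biofluid = [0.40, 0.35, 0.25]   # Q: same categories measured in the biofluid

print(round(relative_entropy(tissue, biofluid), 4))  # 0 would mean no loss
```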




Genome-Wide Linkage Analyses for Asthma Predisposition Loci in Extended Pedigrees

Authors:
Craig C Teerlink, Nicola J Camp, Lisa A Cannon-Albright, University of Utah, Salt Lake City

Abstract:
Asthma is a multi-factorial disease with undetermined genetic factors. We performed a genome-wide scan to identify predisposition loci for asthma, using 565 STR markers. The asthma phenotype consisted of physician-confirmed presence or absence of asthma symptoms. We used 82 extended Utah pedigrees ranging from three to six generations with 746 affected individuals, ranging from two to 40 per pedigree. We performed parametric multipoint linkage analyses with dominant and recessive models. Our primary analysis revealed suggestive evidence of linkage to regions 5q (LOD = 3.75, recessive model), 6p (LOD = 2.08, dominant model), and 11q (LOD = 1.73, recessive model). Marginal evidence of linkage was found on chromosome 19q (LOD = 1.14), which included a single pedigree with a pedigree-specific LOD of 2.28. All of the regions indicated in these analyses (5q, 6p, 11q and 19q) have previously been identified as regions of interest in other studies. These results indicate that the Utah extended pedigrees are useful in confirmation of regions of interest and support further investigation of these regions.
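
A LOD score is log10 of the likelihood ratio between linkage at recombination fraction theta and no linkage (theta = 0.5). The toy sketch below uses a simple phase-known count of recombinant meioses, a drastic simplification of the multipoint, model-based pedigree likelihoods used in the study; the counts are hypothetical.

```python
import math

# LOD(theta) = log10[ L(theta) / L(0.5) ] for r recombinants out of n
# phase-known meioses. Counts are hypothetical illustration; real extended
# pedigrees require multipoint likelihoods under dominant/recessive models.
def lod(theta, recombinants, meioses):
    n, r = meioses, recombinants
    l_theta = theta ** r * (1 - theta) ** (n - r)
    l_null = 0.5 ** n
    return math.log10(l_theta / l_null)

# Maximize the LOD over a grid of theta values in (0, 0.5):
best = max((lod(t / 100, 2, 20), t / 100) for t in range(1, 50))
print(round(best[0], 2), best[1])  # max LOD and the theta achieving it
```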




Detecting Natural Selection Across Populations

Authors:
Eleanne Solorzano and Hongyu Zhao, Yale University

Abstract:
The detection of signals of positive selection is very important because it can provide significant insights into recent evolutionary processes in humans. Various methods exist to detect selection based on the analysis of haplotypes. For example, Sabeti et al. (2002) discuss using long-range haplotypes in human populations to detect natural selection. More recently, selection methods have focused on the whole genome. For example, Voight et al. (2006) reported on a genome-wide scan for signals of very recent positive selection in favor of variants that have not yet reached fixation. Zhang et al. (2006) recently developed the whole genome long-range haplotype test (WGLRH), which uses genome-wide distributions to test for recent positive selection. There is a need to detect natural selection across populations. In this project, a new method of detecting natural selection across populations is introduced. The method is based on linear models and takes the dependency among populations into account. Simulations are performed using the software simuPOP (Peng and Kimmel, 2005). Power and Type I error rates are computed for various sample sizes.




The Cost of Adverse Drug Events in Ambulatory Care

Authors:
Matthew M Burton1,2, Carol Hope3, Michael D Murray4, Siu Hui1,2, J Marc Overhage1,2
1 Regenstrief Institute, Inc
2 Division of General Internal Medicine, Department of Medicine, Indiana University School of Medicine
3 Johns Hopkins Hospital Pharmacy Information Systems
4 University of North Carolina School of Pharmacy

Abstract:
Background: Many justifications for ePrescribing predict savings achieved by reducing the number of adverse drug events (ADEs) in the ambulatory setting; however, there is little evidence from which to estimate the size of these savings. Estimating this cost in the ambulatory setting would improve the reliability of these predictions. Methods: We identified patients with potential ADEs in a primary care practice setting and characterized the patients' age along with charge and utilization indicators for six weeks pre- and post-event. We then used linear regression to determine the charges attributable to an ADE. Results: Following an ambulatory visit, charges were higher for patients determined to have ADEs, increasing in a linear fashion: two ADEs ($4,976); one ADE ($2,337); and no ADEs ($1,943). The charge attributable to a single ADE is $643 (2001 U.S. dollars) or $926 (adjusted to 2006 U.S. dollars). Conclusions: Patients with ADEs incur greater charges. The charges attributable to ambulatory ADEs represent a significant cost to the health care delivery system, on the order of $8 billion annually.
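
The roughly linear rise in charges with ADE count corresponds to an ordinary least-squares fit of charges on the number of ADEs. The sketch below fits the three group means quoted in the abstract; this is a simplification of the study's patient-level regression, so the resulting slope is not expected to match the covariate-adjusted $643 estimate.

```python
# OLS slope/intercept from the three group-mean charges quoted above.
# Fitting group means simplifies the study's patient-level linear
# regression, so the slope differs from the reported $643 attributable
# charge, which adjusted for other covariates.
ades = [0, 1, 2]
charges = [1943.0, 2337.0, 4976.0]

n = len(ades)
mean_x = sum(ades) / n
mean_y = sum(charges) / n
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(ades, charges))
    / sum((x - mean_x) ** 2 for x in ades)
)
intercept = mean_y - slope * mean_x
print(round(slope, 2), round(intercept, 2))  # dollars per additional ADE
```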




Content-Based Image Retrieval of Malignant Brain Tumors

Authors:
Shishir Dube (a), Suzie El-Saden (a), Timothy F. Cloughesy (b), and Usha Sinha (a)
a Medical Imaging Informatics, University of California at Los Angeles, CA, USA
b Department of Neurology, University of California at Los Angeles, CA, USA

Abstract:
We propose a prognostic content-based image retrieval system for patients diagnosed with glioblastoma multiforme (GBM) that predicts time to survival (TTS). Our proposed system consists of three components: 1) a preprocessing scheme to condition the image quality and provide consistency; 2) a fast multilevel segmentation technique; and 3) a multivariate linear model for prognosis. The multivariate linear model, as applied to a set of training data, had a correlation coefficient of 0.848, which indicated a strong association of the selected extracted features with time to survival. However, when test volumes were queried with the multivariate model, the predicted TTS differed significantly. This resulted from confounding variables (e.g., multiple surgeries) that were not represented in the training set. Future work will involve expanding the training set and incorporating additional features not explicitly extracted from the segmented tumor regions.




The Effects of Hands Free Communication Devices: Communication Changes Among Nurses, Nurse Managers, and IT Staff

Authors:
Joshua E Richardson, Joan Ash, Oregon Health & Science University

Abstract:
Vocera (Vocera Communications, Inc.) is a hands-free information and communication technology composed of wearable "badges" and server-based software. The technology is increasingly being used in clinical care settings, particularly among nursing staff. Quantitative studies and surveys report that clinicians who use Vocera, or hands-free communication devices (HFCDs), experience faster communication times but also have concerns about reliability and patient confidentiality. The objective of this qualitative research is to describe the multiple perspectives of staff nurses, nurse managers, and IT staff in relation to the use of HFCDs in hospitals. The researcher conducted semi-structured interviews and field observations of HFCD users in an academic medical center and a community hospital. Participants included nurse managers, staff nurses, IT support staff, and IT administrators. Preliminary results show that users perceive HFCDs to speed communication among staff and, to a lesser degree, across departments. Staff nurses describe drawing on a different set of skills to manage HFCD calls, such as handling patient information and observing etiquette, and may see the implementation of HFCDs as a series of trade-offs. Many interviewees believed appropriate training is critical to effective use of HFCDs.




The Available Health Record is Often Deficient, But Can We Work with What We Have?

Authors:
Andrew J Brunskill, Gayle Reiber, Ruth Etzioni, Kenric Hammond, VA HSR&D, FHCRC and University of Washington, Seattle

Abstract:
Predicting the risk of diabetes complications from administrative and clinical databases allows resources to be assigned where they are most likely to benefit. But some strongly predictive "key" disease, personal, laboratory, or pharmaceutical variables are often absent or flawed, for example, duration of diabetes. In the "ACQUIP" data set [approximately 4500 male patients at the VA with diabetes followed for 2 years or more], an extensive range of self-reported data and other variables is available. A multivariate logistic model that included self-reported duration of diabetes showed moderate [AUC 0.70] prediction of future incident congestive heart failure within two years. If duration of diabetes was omitted and employment status substituted, the model showed similar [AUC 0.69] prediction, albeit on a smaller group of patients. Similar effects can be shown over a range of variables. Conclusion: It is ideal to have all data on all patients, but use of alternative available variables and innovative analytic models may compensate for the lack of "key" variables in this redundant and over-determined situation.
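
The AUC values quoted above measure how well a model ranks cases ahead of non-cases: the AUC equals the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case (the Mann-Whitney formulation). A minimal sketch on invented predictions, not the ACQUIP model's output:

```python
# Rank-based AUC: fraction of (case, control) pairs where the case has the
# higher predicted risk; ties count 1/2. Scores are invented illustration.
def auc(case_scores, control_scores):
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

cases    = [0.9, 0.7, 0.6, 0.4]        # hypothetical predicted CHF risks, cases
controls = [0.8, 0.5, 0.3, 0.2, 0.1]   # hypothetical risks, non-cases

print(auc(cases, controls))
```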




Improving Diabetes Population Management Efficiency with an Informatics Solution

Authors:
Adrian Zai, Richard Grant, Carl Andrews, Ronnie Yee, Henry Chueh, Massachusetts General Hospital

Abstract:
Population-level strategies to organize and deliver care have been shown to improve diabetes management, although further research is required to translate registry information into action. We set out to develop a Registry Population Manager (RPM), a web-based application that organizes and presents information about groups of diabetic patients within Partners Healthcare. Since our current method for intervening on diabetic patients continues to fall short of evidence-based goals, our primary intention is to design a user interface that minimizes inefficiencies within the current workflow process. We recently initiated our first pilot study at a single clinical practice, and are pleased to report a significant gain in workflow process efficiency. Although the core function of a diabetes registry is to organize and present information about populations of diabetic patients, it need not be limited to those standard functions. In our case, we have developed a registry that also streamlines workflow, leading to more efficient population evaluation and action.




Counterion Localization Caused by DNA Supercoiling Affects Type II Topoisomerase Recognition

Authors:
Graham L Randall*,§, E Lynn Zechiedrich*,†, B Montgomery Pettitt*,§
* Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine
† Department of Molecular Virology and Microbiology, Baylor College of Medicine
§ Department of Chemistry, University of Houston

Abstract:
In a population of circular DNA plasmids, type II topoisomerases have been shown to reduce, by approximately 1.8-fold, the variance in the degree of winding, or linking number (Lk). Data show that the concentration of positively charged counterions has a dramatic effect on the degree of DNA supercoiling (writhe), and that the activity of type II topoisomerases is highly dependent on the concentration of these counterions. We hypothesize that the more open and accessible geometry of supercoiled DNA at low salt concentrations makes the molecule a more suitable substrate for type II topoisomerases. We have performed preliminary calculations to quantify the effect of Lk on the local concentration of counterions. Surprisingly, solutions to the Poisson-Boltzmann equation for an all-atom model of a DNA double helix in an electrolytic solution show that the average electrostatic potential increases as the change in Lk decreases from +12 to -12 turns, but our calculations do not show a significant change in counterion concentrations. We are presently expanding our investigation of these effects using molecular dynamics to simulate all-atom explicit-solvent systems of DNA helices with various Lk.




Statistical Analysis of the Genomic Distribution and Correlation of Regulatory Elements in the ENCODE Regions

Authors:
Zhengdong D Zhang1, Alberto Paccanaro2, Yutao Fu3, Sherman Weissman1, Zhiping Weng3, Joseph Chang1, Michael Snyder1, Mark Gerstein1
1 Yale University
2 University of London
3 Boston University

Abstract:
The comprehensive inventory of functional elements in 44 human genomic regions carried out by the ENCODE project enables, for the first time, a global analysis of the genomic distribution of transcriptional regulatory elements. In this study we developed an intuitive yet powerful approach to analyze the distribution of regulatory elements found in many different ChIP-chip experiments on a 10-100 kb scale. First, we focus on the overall chromosomal distribution of regulatory elements in the ENCODE regions and show that it is highly non-uniform. We demonstrate, in fact, that regulatory elements are strongly associated with the locations of known genes. Further examination on a local, single-gene scale shows an enrichment of regulatory elements near both transcription start and end sites. Our results indicate that, overall, these elements are clustered into regulatory-rich "islands" and regulatory-poor "deserts." Next, we examine how consistent the non-uniform distribution is across different transcription factors. We perform a multivariate analysis of all the factors in the framework of a biplot, which enhances biological signals in the experiments. This analysis groups transcription factors into sequence-specific and sequence-nonspecific clusters. Moreover, with experimental variation carefully controlled, detailed correlations show that the distribution of sites for a given factor is generally reproducible between different laboratories and microarray platforms. Data sets associated with histone modifications have particularly strong correlations. Finally, we show how the correlations between factors change when only regulatory elements far from transcription start sites are considered.




Effects of Aging on Mouse Transcriptional Networks

Authors:
Lucinda K Southworth, Art B Owen, Stuart K Kim, Stanford University

Abstract:
Although many genes have been implicated in the aging process, little is known about age-related changes in transcriptional control. We sought to test the hypothesis that transcriptional regulation becomes less efficient with age by examining gene expression networks. We built gene co-expression networks for mice aged 16 and 24 months and found an age-associated decline in co-expression at both the network level and the gene-group level. This finding supports the hypothesis of a loosening of transcriptional control with age, with targeted decline in specific gene groups.
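
The network construction step can be sketched as follows, assuming Pearson correlation across arrays and a hard threshold on |r| to define edges; the gene names and expression profiles are invented toy data, not the mouse measurements:

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coexpression_edges(profiles, threshold=0.9):
    """Return gene pairs whose profiles correlate above |threshold|."""
    return [
        (g1, g2)
        for (g1, x), (g2, y) in combinations(profiles.items(), 2)
        if abs(pearson(x, y)) >= threshold
    ]

# Toy profiles for four genes across five arrays.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],   # tracks geneA closely
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],   # anti-correlated with geneA
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0],   # unrelated
}
edges = coexpression_edges(profiles)
```

In this framing, a loosening of transcriptional control with age would show up as fewer edges, at the same threshold, in networks built from older animals.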




Peptide Identification in Whole-Sample Mass Spectrometry Proteomics

Authors:
Richard Pelikan, Miloš Hauskrecht, University of Pittsburgh

Abstract:
Peptide identification methods for whole-sample mass spectrometry (MS) proteomics are not well developed. While sophisticated tandem MS/MS instrumentation exists for accurate peptide identification after sample separation, there are few options for those who produce data from intact protein samples. We present a novel algorithm that uses available information from the literature and online protein databases to provide reliable labeling of features in whole-sample MS proteomic data. Our model decomposes the MS signal into a location aspect and an intensity aspect; labels must be a good match to both aspects simultaneously. The location aspect matches labels to peaks based on the weight of the label's peptide, and the intensity aspect matches labels to peaks based on the expected relative abundance of the label's peptide. A dynamic programming technique is applied to find the most probable assignment of labels to peaks, in a fashion similar to sequence alignment. Future work will focus on incorporating additional information to determine the effects of post-translational modifications and experimental design on relative abundances of molecules in the sample.
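
The alignment-style dynamic program can be sketched as below; the additive match score, which penalizes location and intensity mismatch, is a simplified stand-in for the probabilistic model described in the abstract, and the peak and label values are invented:

```python
def assign_labels(peaks, labels, gap=-1.0):
    """Align labels (expected mass, expected relative abundance) to observed
    peaks (m/z, intensity), both sorted by mass. Either side may be skipped
    at a gap penalty; returns the best-scoring monotone assignment."""
    def match(peak, label):
        # Stand-in score: reward closeness in both location and intensity.
        return 2.0 - abs(peak[0] - label[0]) - abs(peak[1] - label[1])

    n, m = len(peaks), len(labels)
    # score[i][j]: best score aligning the first i peaks with first j labels
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = float("-inf")
            if i > 0:
                best = max(best, score[i - 1][j] + gap)   # unlabeled peak
            if j > 0:
                best = max(best, score[i][j - 1] + gap)   # unmatched label
            if i > 0 and j > 0:
                best = max(best, score[i - 1][j - 1] + match(peaks[i - 1], labels[j - 1]))
            score[i][j] = best

    # Traceback to recover the actual peak -> label pairing.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + match(peaks[i - 1], labels[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return list(reversed(pairs))

peaks = [(100.0, 0.9), (205.0, 0.5), (300.0, 0.2)]   # (m/z, intensity)
labels = [(100.1, 0.8), (300.2, 0.25)]               # (mass, abundance)
pairs = assign_labels(peaks, labels)  # peak 0 -> label 0, peak 2 -> label 1
```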




Formal Usability Testing Finds New Problems for Novice Users of Pediatric Portals

Authors:
Maria Britto1,2,4, Holly Jimison4, John Pestian1,2, Marta Render2,3, and William Hersh4
1 Cincinnati Children's Hospital Medical Center
2 The University of Cincinnati College of Medicine
3 The Cincinnati VA Medical Center
4 Oregon Health & Science University

Abstract:
Patient portals may improve chronic disease outcomes, but few have been rigorously evaluated for usability by parents. Using formal scenario-based testing with think-aloud protocols, we evaluated portals for parents of children with cystic fibrosis, diabetes or arthritis. The portals included real-time views of key medical record elements and secure messaging. They were developed and tested with clinicians and parents and underwent previous heuristic review. Sixteen parents used a prototype and test data to complete 14 comprehension or navigation tasks followed by a validated satisfaction questionnaire. Sessions were videotaped and content-analyzed. Participants were mostly mothers (80%) and very familiar with their child's condition (64%), but described their computer savvy as low (44%) or medium (50%). Mean task completion times ranged from 73 (±61) seconds to locate a document to 431 (±289) seconds to graph lab results. Tasks such as graphing, location of data, requesting access, and data interpretation were challenging. Satisfaction was greatest for interface pleasantness (5.9 ±0.7) and likeability (5.8±0.6) and lowest for error messages (1.8 ± 1.5) and clarity of information (3.2 ± 2.2). Despite parent involvement and prior heuristic testing, scenario-based testing demonstrated difficulties in navigation, language complexity, lack of feedback and error recovery, and provider-based organizational schema.




A Combined Qualitative Method for Testing an Interactive Risk Communication Tool

Authors:
Jessica S Ancker, MPH1 and Rita Kukafka, DrPH, MA1,2
1 Department of Biomedical Informatics
2 Department of Sociomedical Sciences, Columbia University

Abstract:
Background: Descriptions of risks in words, numbers, and graphics can be associated with comprehension problems. Our novel risk communication tool, developed on the basis of cognitive theory, involves a game-like interaction to provide an experience of the probability of a health event. Method: For this tool to be useful, it must be intuitive and address lay conceptions of risk. Usability methods to test the computer interaction alone are insufficient. We have developed a combined qualitative method incorporating standard focus group methods and scenario-based usability testing in a community-based participatory research setting. In this procedure, a trained facilitator mediates discussion between software developers and participants. Preliminary results: The method has been applied in five focus groups. Participant comments have been used to revise the tool in several iterations. The tool appears to help some people envision themselves affected by the health risk in a way that static pictures do not. Discussion: The combined method has enabled a series of productive discussions between a developer and potential lay users in a collaborative setting. Complete transcript analysis is expected to lead to richer understandings of lay models of risk and probability, computer and Internet use, and health issues in a disadvantaged community.




RF++: Robust Random Forest for Clustered Data Classification

Authors:
Yuliya V Karpievitch1,2, Anthony P Leclerc3, Elizabeth G Hill2, Jonas S Almeida1
1 Medical University of South Carolina, Charleston
2 M. D. Anderson Cancer Center, University of Texas, Houston
3 University of Charleston, Charleston

Abstract:
In many biological experiments, such as biomarker identification, it is common to collect multiple samples from an individual subject and/or technical replicates of the same subject sample. This produces clustered data, which arise when some data points form clusters because they come from the same source. Clustered data cannot be properly analyzed using conventional classification methods, which assume the samples are independent and identically distributed (i.i.d.). We present a new Random Forest-based algorithm, RF++, which can classify clustered or longitudinal data in a statistically robust fashion. RF++ handles clustered data automatically, eliminating the need to average or otherwise reduce replicate samples to single samples, and produces statistically valid results without data reduction. Tree weights based on within-subject proximities are incorporated into the classifier, yielding improved classification. In our simulations we address two distinct properties of RF++: its variable importance measure and its classification performance based on subject-level bootstrapping. Though we specifically consider disease biomarker identification and classification in mass spectrometry data here, RF++ can, in general, be applied to any clustered or non-clustered data set. To facilitate experimentation with RF++ we provide a knowledge-assisted graphical user interface.
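
Subject-level bootstrapping, resampling whole subjects with replacement so that replicate samples are never split apart, can be sketched as follows; the subject IDs and feature values are invented:

```python
import random

def subject_bootstrap(samples, rng=None):
    """Draw one bootstrap replicate at the subject level: resample subject
    IDs with replacement, then carry along *all* replicate samples of each
    drawn subject. `samples` maps subject ID -> list of feature vectors."""
    rng = rng or random.Random()
    subjects = list(samples)
    drawn = [rng.choice(subjects) for _ in subjects]
    boot = []
    for s in drawn:
        boot.extend(samples[s])  # replicates are never split across draws
    return drawn, boot

# Three subjects, each with one or more technical replicates.
samples = {
    "subj1": [[0.1, 1.2], [0.2, 1.1]],
    "subj2": [[0.9, 0.3]],
    "subj3": [[0.5, 0.5], [0.4, 0.6], [0.6, 0.4]],
}
drawn, boot = subject_bootstrap(samples, random.Random(0))
```

Resampling at the subject level restores the i.i.d. assumption where it actually holds, between subjects rather than between correlated replicates.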




Forecasting Emergency Department Crowding by Discrete Event Simulation

Authors:
Nathan R Hoot, Larry J LeBlanc, Ian Jones, Dominik Aronsky, Vanderbilt University

Abstract:
Emergency department overcrowding decreases the quality of and access to health care. A discrete event simulation was developed to model workflow, allowing the distributions of many operational variables to be forecast at arbitrary times in the future. Every simulation forecast represented the average of 1000 observations. First, at three random times daily during 2006 (n=1095), a four-hour forecast was obtained for the waiting count, waiting time, occupancy level, length of stay, boarding count, and boarding time. Second, at consecutive 10-minute intervals during 2006 (n=52,560), a four-hour forecast of ambulance diversion probability was determined using the fraction of observations having 10 patients waiting and all beds full, per local institutional policy. The actual and predicted operational data were strongly correlated for the waiting count (r=0.57), waiting time (r=0.56), occupancy level (r=0.82), length of stay (r=0.87), boarding count (r=0.86), and boarding time (r=0.84). The ambulance diversion forecast showed high discriminatory power for diversion status four hours in the future (AUC=0.88). The simulation accurately predicted the future operating status of the emergency department, so it may provide early warnings of overcrowding to hospital administrators. Future research must weigh the timeliness of forecasts against the cost of interventions to alleviate overcrowding.
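
A minimal discrete event simulation of this kind, with exponential interarrival and treatment times and a fixed number of beds, might look like the sketch below; the rates, bed count, and horizon are illustrative assumptions, not the parameters of the Vanderbilt model:

```python
import heapq
import random

def simulate_waiting_count(rate_in, rate_out, beds, horizon, rng):
    """One run of a toy emergency department: exponential interarrival and
    treatment times, `beds` servers. Returns the waiting count at `horizon`."""
    events = [(rng.expovariate(rate_in), "arrival")]
    busy, waiting = 0, 0
    while events:
        t, kind = heapq.heappop(events)
        if t > horizon:
            break
        if kind == "arrival":
            heapq.heappush(events, (t + rng.expovariate(rate_in), "arrival"))
            if busy < beds:
                busy += 1
                heapq.heappush(events, (t + rng.expovariate(rate_out), "departure"))
            else:
                waiting += 1
        else:  # departure: next waiting patient enters a bed, if any
            if waiting > 0:
                waiting -= 1
                heapq.heappush(events, (t + rng.expovariate(rate_out), "departure"))
            else:
                busy -= 1
    return waiting

def forecast(rate_in, rate_out, beds, horizon, runs=1000, seed=42):
    """Average the waiting count over many stochastic runs, mirroring the
    abstract's forecasts averaged over 1000 observations."""
    rng = random.Random(seed)
    return sum(
        simulate_waiting_count(rate_in, rate_out, beds, horizon, rng)
        for _ in range(runs)
    ) / runs
```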




Towards Electronic Health Information Exchanges between Clinical Care and Public Health

Author:
Patricia Swartz, Johns Hopkins University

Abstract:
Health departments need timely data from physicians to respond to population health threats and to issue preventive measures. These data, in turn, have the potential to inform individual clinical decisions. Current public health data systems, however, suffer from underreporting, as physicians may not be aware of public health reporting requirements or may lack efficient methods of communicating. Electronic health information exchanges (HIEs) may improve public health reporting and enable bi-directional communication between clinicians and public health.

The goal of this study is to inform the development of electronic HIEs between clinicians and health departments by documenting (1) physicians' experience with current public health data reporting and (2) what data, information, or knowledge from health departments could benefit physicians.

An online survey of current reporting patterns and the types of feedback desired was conducted among ambulatory physicians recruited through professional medical associations (State and Regional Primary Care Associations, AAP, etc.). The survey addressed reporting of infectious diseases, chronic conditions, and other conditions, including reporting rates and barriers, and provided both quantitative and qualitative data that were related to practice profiles and physician characteristics.

The types of information that physicians are interested in receiving from health departments will inform the development of electronic HIEs between clinical care and public health, in service of their common goals of delivering quality care and protecting the public's health.




Factual Versus Narrative Messaging: Different Modalities and Personal Involvement. Looking for the Best Strategy of Persuasion. The Weight Loss and College Drinking Examples

Author:
Julia Braverman, Medical Informational System Unit, Boston University, Boston, MA

Abstract:
Health communications use factual information and/or personal testimonials to inform and influence individual decisions that enhance health. Increasingly, Web and other computer-based systems are being used to communicate with patients. We evaluated the relative effectiveness of testimonials compared to factual messages delivered to a recipient through the Web. A total of 420 participants took part in two Web-based experiments, in which they were randomly assigned to be exposed to one of four kinds of messages about weight management (Experiment 1) or moderating alcohol drinking (Experiment 2). The study demonstrated that testimonials were more persuasive when presented in audio than when written on a computer screen. Testimonials were also more persuasive than factual messages for individuals with low, rather than high, readiness to change their behavior. We interpret the results in terms of the elaboration likelihood model, which states that persuasion depends on the recipient's motivation to scrutinize the message. These findings can help in developing more effective computer-based health communication.




Identification and Extraction of Functional Protein Point Mutation Effects in Biomedical Literature

Authors:
Lawrence C Lee, Fred E Cohen, University of California, San Francisco

Abstract:
Point mutations relay invaluable information regarding the functional properties of proteins. We have previously described a method, called Mutation GraB, that identifies point mutation terms in the literature and associates them with their protein of origin. To extend the functionality and usefulness of point mutation extraction, we now present a method for the identification and extraction of functional point mutation effects in the biomedical literature. We have divided this task into two distinct subtasks: (A) given sentences containing point mutation term(s), identify those that describe a functional effect of the point mutation(s); (B) given sentences containing functional point mutation effect(s), identify the affecting verb phrase and affected noun phrase corresponding to the point mutation. For task A, we used both Naïve Bayes and Maximum Entropy classifiers trained on selected sentence features to achieve F-measures in the range of 0.75-0.80. For task B, we utilized a rule-based decision tree to extract the mutation effects using phrase patterns and word locations as features; this achieved an F-measure of 0.60-0.65. These results were generated from a set of 1034 tagged sentences retrieved from the cystic fibrosis literature.
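
For task A, a bag-of-words Naïve Bayes classifier with add-one smoothing can be sketched as follows; the training sentences and labels are invented stand-ins, not items from the tagged cystic fibrosis corpus:

```python
import math
from collections import Counter

class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words features, add-one smoothing."""
    def fit(self, sentences, labels):
        self.classes = sorted(set(labels))
        self.priors = {c: labels.count(c) / len(labels) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for sent, lab in zip(sentences, labels):
            self.counts[lab].update(sent.lower().split())
        self.vocab = {w for cnt in self.counts.values() for w in cnt}
        return self

    def predict(self, sentence):
        words = sentence.lower().split()
        def log_post(c):
            total = sum(self.counts[c].values())
            return math.log(self.priors[c]) + sum(
                math.log((self.counts[c][w] + 1) / (total + len(self.vocab)))
                for w in words if w in self.vocab
            )
        return max(self.classes, key=log_post)

# Invented examples: sentences that do / do not describe a mutation effect.
train = [
    ("the G551D mutation abolishes channel gating", "effect"),
    ("the mutation R117H reduces chloride conductance", "effect"),
    ("patients carrying F508del were enrolled in the study", "no-effect"),
    ("the cohort included carriers of the mutation", "no-effect"),
]
clf = NaiveBayes().fit([s for s, _ in train], [l for _, l in train])
```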




Identifying Anatomical Phrases in Clinical Reports using Shallow Semantic Parsing Methods

Authors:
Vijay Bashyam and Ricky Taira, Medical Imaging Informatics, University of California at Los Angeles

Abstract:
Natural language processing (NLP) is being applied to several information extraction tasks in the biomedical domain. The unique nature of clinical information calls for an NLP system designed specifically for this domain. We describe a method to identify semantically coherent phrases within clinical free-text reports. This preprocessing is an important step towards full syntactic parsing within a clinical NLP system. To demonstrate and evaluate the system, the semantic phrase chunker was used to identify anatomical phrases within radiology reports in the genitourinary (GU) domain. A discriminative classifier based on support vector machines (SVMs) was used to assign words to one of five phrase classification categories. The classifier was trained on 1,000 hand-tagged sentences from a corpus of GU radiology reports, using n-grams, syntactic tags, and semantic labels as features. Evaluation was conducted on a separate, blind test set of 250 sentences from the same domain. The system achieved overall performance scores of 0.87 (precision), 0.91 (recall), and 0.89 (balanced F-score), illustrating that anatomical phrase extraction can be accomplished rapidly and accurately.




Automatic Summarization of Mouse Gene Information by Clustering and Sentence Extraction from MEDLINE Abstracts

Authors:
Jianji Yang, Aaron M Cohen, William Hersh, Oregon Health & Science University

Abstract:
Tools that automatically summarize gene information from the literature have the potential to help genomics researchers better interpret gene expression data and investigate biological pathways. Finding information on sets of genes is a common task for genomics researchers, and PubMed is still the first choice because the most recent and original information can only be found in the unstructured, free-text biomedical literature. However, finding information on a set of genes by manually searching and scanning the literature is time-consuming and daunting. We built and evaluated a query-based automatic summarizer of information on mouse genes studied in microarray experiments. The system clusters a set of genes by MeSH, GO, and free-text features and summarizes each gene with ranked sentences extracted from MEDLINE abstracts. Evaluation showed that the system provides meaningful clusters and that informative sentences are ranked higher by the algorithm.
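
The sentence-ranking step can be sketched as idf-weighted term overlap between each candidate sentence and the gene's query terms; this scoring is a generic extractive-summarization stand-in, not the authors' exact algorithm, and the sentences and terms are invented:

```python
import math

def rank_sentences(sentences, query_terms, corpus=None):
    """Rank sentences by summed idf-weighted overlap with the query terms."""
    corpus = corpus or sentences
    n = len(corpus)

    def idf(term):
        # Rarer terms in the corpus carry more weight.
        df = sum(term in s.lower().split() for s in corpus)
        return math.log((n + 1) / (df + 1)) + 1

    def score(sentence):
        words = set(sentence.lower().split())
        return sum(idf(t) for t in query_terms if t in words)

    return sorted(sentences, key=score, reverse=True)

sentences = [
    "this kinase regulates apoptosis in neurons",
    "the gene is located on chromosome 7",
    "expression of the kinase increases",
]
ranked = rank_sentences(sentences, ["kinase", "apoptosis"])
```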




Using Mathematical Models to Identify Novel JNK Substrates

Authors:
Thomas Whisenant, David Ho, Ryan Benz, Frank Antilla, Pierre Baldi, Lee Bardwell, University of California, Irvine

Abstract:
Mitogen-activated protein kinases (MAPKs) are a ubiquitously expressed, conserved group of proteins involved in the transduction of a diverse array of extracellular signals. The JNK MAP kinase pathway has been implicated in a variety of diseases associated with deregulation of apoptotic signaling. Many substrates of JNK interact through a conserved docking site (D-site), which confers specificity to the interaction. We developed an algorithm to identify additional substrates in the human transcriptome by identifying their D-sites. The algorithm's output was confirmed with a semi-high-throughput peptide macroarray. The full-length sequences of the best candidate peptides were cloned, along with their docking-site mutants, for GST pulldown assays to assess the impact of the D-site on the interaction with JNK. We identified multiple novel substrates that bind to JNK in vitro through their D-sites.
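
The motif-scanning step can be sketched with a regular expression. The consensus used here, a basic patch of two to three Arg/Lys residues, a short spacer, then a hydrophobic phi-X-phi patch, is a commonly cited approximation of the MAPK docking-site consensus, not necessarily the exact pattern used by the authors; the protein sequence is invented:

```python
import re

# Approximate D-site consensus: 2-3 basic residues (R/K), a spacer of
# 1-6 arbitrary residues, then a hydrophobic phi-X-phi motif
# (phi in {L, I, V}).
D_SITE = re.compile(r"[RK]{2,3}.{1,6}[LIV].[LIV]")

def find_d_sites(protein_seq):
    """Return (start, matched_subsequence) for each candidate D-site."""
    return [(m.start(), m.group()) for m in D_SITE.finditer(protein_seq)]

# Toy sequence with one planted docking site.
seq = "MSDVEKKRGAANLQLFMAAPTTT"
hits = find_d_sites(seq)
```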




A Bayesian Model for Modeling Overlapping Gene Expression Modules using Latent Variables

Authors:
Thomas Asbury, Adam Richards, Xinghua Lu, Medical University of South Carolina

Abstract:
Gene expression data have typically been modeled by grouping genes into independent, disjoint collections through clustering of their common expression profiles. The actual gene expression network is more complicated, with genes participating in multiple expression modules which, in turn, are regulated by different signal transduction pathways in a context-specific manner. We present a Bayesian graphical model that accommodates these assumptions: it allows genes to be members of multiple modules and explicitly represents the state of the signaling components regulating the expression of each module. The model simulates the gene expression system by integrating information from both microarray experiments and genomic sequences. This is achieved by introducing two important latent variables: a set of membership variables per gene, indicating to which modules a gene belongs, and a set of switching state variables, representing the states of the signaling system under a given experiment. A variational Bayesian inference algorithm is developed to infer these latent variables, which can be interpreted in a biologically meaningful fashion.




Closing the Gap in Homology Modeling and its Application to Drug Development

Authors:
Jeff Reneker, Chi-Ren Shyu, University of Missouri-Columbia

Abstract:
Homology modeling is widely used to predict structures of novel protein sequences. Structures of homologous proteins must already be solved to serve as models for the query sequence. Homologous proteins are commonly discovered through sequence alignments which, quite often, introduce gaps in the query sequence, the subject sequence, or both. How best to handle gaps and thus improve the accuracy of the predicted structures is currently a major focus of investigation.

Structure-based drug design in silico requires very accurate structural predictions of a target protein so that inhibitors of the natural protein can be developed that bind to the active site and block metabolic activity. Therefore, a necessary first step toward better, faster drug design is to find better ways to handle alignment gaps and generate increasingly accurate structure predictions.

We are currently investigating gaps from PSI-BLAST alignments and statistically determining the likelihood of finding accurate substructures that cover these regions from within the set of outputted alignments. Our findings are being incorporated into a homology modeling, feature selection algorithm currently under development. We will compare our structure predictions with two published examples of in silico drug design via structure modeling: bacterial MurA and human heparanase.




Statistical Relational Learning for Biomedical Domains

Authors:
Jesse Davis1, Elizabeth Burnside1, Vitor Santos Costa2, Jude Shavlik1, David Page1
1 University of Wisconsin - Madison
2 Universidade do Porto, Portugal

Abstract:
Structured data, such as patient clinical histories or molecular structures, represent an obstacle for machine learning research. Such data are most naturally stored in a relational database with multiple tables, whereas most machine learning algorithms assume data are stored in a single table. Statistical relational learning (SRL) upgrades standard learning algorithms to multi-table settings. Currently, SRL techniques learn joint probability distributions over the fields of a relational database, but they are constrained to use only the tables and fields already in the database, without modification. In contrast, human users of relational databases find it beneficial to define alternative views of a database: further fields or tables that can be computed from existing ones. Our research develops a state-of-the-art SRL framework that defines new views of the data and is applicable to clinical and biological problems. Thus far, we have focused on two application domains: providing a decision support system for radiologists who read mammograms and predicting three-dimensional Quantitative Structure-Activity Relationships for drug design. In the mammography domain, our system made more accurate predictions than radiologists and identified a novel feature that is indicative of malignancy. For drug activity prediction, our system produced more accurate predictions and discovered biologically relevant pharmacophores.




Use of Classification Models Based on Usage Data for the Selection of Infobutton Resources

Authors:
Guilherme Del Fiol, Peter J Haug, University of Utah, Salt Lake City, Utah

Abstract:
"Infobuttons" are information retrieval tools that predict the questions and the on-line information resources that a clinician may want to look at in a particular context while using an EMR system. The goal of this study was to employ infobutton usage data to produce classification models that predict the information resource that a clinician is most likely to select. Methods: Five data mining techniques were applied to a data set containing 7,968 infobutton sessions conducted at Intermountain Healthcare. The data set included 13 attributes describing the user, the patient, and the EMR module being used. We compared the ability of the models to predict the resources that users selected and calculated the agreement (kappa) with the actual choices made by clinicians. The infobutton implementation currently in place at Intermountain Healthcare was used as a referent. Results: Agreement using any of the classification models (kappa > 0.8) was significantly higher than agreement observed using the current system (kappa = 0.39). Two to five attributes were sufficient for the models to achieve their best performance. Conclusion: Applying data mining tools to infobutton usage data is a promising strategy for improving the prediction capability of infobuttons, directing clinicians to more relevant information.

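Cohen's kappa, the agreement statistic used in this study, corrects raw accuracy for the agreement expected by chance; a self-contained implementation (the resource labels in the example are invented placeholders, not Intermountain's actual infobutton resources):

```python
from collections import Counter

def cohens_kappa(predicted, actual):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement and p_e the agreement expected by chance from the marginal
    frequencies of each label."""
    n = len(predicted)
    p_o = sum(p == a for p, a in zip(predicted, actual)) / n
    pred_freq = Counter(predicted)
    act_freq = Counter(actual)
    p_e = sum(pred_freq[c] * act_freq[c] for c in pred_freq) / (n * n)
    if p_e == 1.0:          # degenerate case: a single label on both sides
        return 1.0
    return (p_o - p_e) / (1 - p_e)

# Invented example: model predictions vs. the resources users selected.
pred = ["resA", "resA", "resB", "resB"]
act = ["resA", "resB", "resA", "resB"]
kappa = cohens_kappa(pred, act)  # 0.0: agreement no better than chance
```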



Contextual Analysis of Variation and Quality in Human-Curated Gene Ontology Annotations

Author:
W John MacMullen, School of Information & Library Science, University of North Carolina, Chapel Hill

Abstract:
Two prospective randomized controlled studies of scientific curators of model organism databases (MODs) were conducted using common document collections to investigate the origins, nature, and extent of variation in curators' Gene Ontology (GO) annotations. Additional contextual data about curators' backgrounds, experience, personal annotation behaviors, and work practices were also collected to provide further means of explaining variation. A corpus of nearly 4,000 new GO annotations covering five organisms was generated by 31 curators and analyzed at the paper, instance, and GO-element levels. Variation was observed by organism expertise, by group assignment, and between individual and consensus annotations. Years of GO curation experience was found not to be a predictor of annotation instance quantities. Five facets of GO annotation quality (Consistency, Specificity, Completeness, Validity, and Reliability) were evaluated for utility and showed promise for use in training novice curators. Pairwise matching and comparison of instances proved difficult and atypical, limiting the usefulness of the quality measures. Content analysis was performed on more than 600 pages of curators' hand-annotated paper journal articles used in GO annotation, yielding six types of common notations.




Adoption of Innovation in Practice: An In-Depth Study of Heparin Dosing by Residents

Authors:
Prudence W Dalrymple, Michael B Streiff, Harold P Lehmann, Johns Hopkins University

Abstract:
The Institute of Medicine (2001) recommends developing and disseminating clinical guidelines and providing tools and systems to support their implementation. Effective systems design requires a deep understanding of the work patterns and information flow in a clinical environment, as well as ongoing evaluation and re-design to keep pace with changing standards of care. We report an in-depth study of the therapeutic heparin practice of 25 residents admitting patients to selected inpatient units in an academic medical center. Using mixed methods, including observation, semi-structured interviews, document review, and analysis of patient charts, this research portrays in depth the ways in which residents acquire and communicate information while ordering and managing therapeutic heparin. Within the theoretical framework of diffusion of innovation, the results provide insight into the process through which evidence-based guidelines are (or are not) adopted in practice and suggest decision support features needed by clinicians when ordering and managing high-risk drugs. The insights generated by in-depth examination of a specific class of drugs have the potential to inform the design and implementation of tools that support similar clinical needs.




A Study of Experimental Information Management in Biomedical Research

Author:
Nicholas Anderson, University of Washington, Seattle

Abstract:
Microarray expression research requires a range of skills, instrumentation, and resources that are not all typically available within an individual researcher's laboratory. As a consequence, such resources are increasingly accessed through research collaborations, on-line repositories, or shared instrumentation and analysis facilities. The typical researcher faces novel experimental information management challenges in organizing these resources. The ad hoc coordination of processes, data, biological samples, and expertise makes it difficult to capture experimental workflows so as to ensure reproducibility and consistency. We therefore surveyed (n=49) and conducted in-depth qualitative interviews with (n=21) microarray researchers using two microarray core facilities, to understand how the management of distributed experimental information and resources affected their experimental work. This study identified specific commonalities across research environments that informatics solutions need to address (socio-technical organization, research collaboration, and information use), as well as general strategies for managing the existing data heterogeneity across research environments.




Sequential Search Result Refinement of the Medical Literature

Authors:
Len Y Tanaka, Jorge R Herskovic, M Sriram Iyengar, Elmer V Bernstam, The University of Texas School of Health Information Sciences at Houston

Abstract:
Reviewing all relevant research to answer questions about health and disease is nearly impossible due to the rapid growth of the biomedical literature. In fact, there is so much information available that a search for "breast cancer" returns an overwhelming 157,781 articles (PubMed, March 2007). We previously found that we could prioritize the important articles identified by experts as "must read" from very large result sets using simple citation count (i.e., the number of times a paper is cited) and PageRank (also known as "the Google algorithm"). These algorithms rely on the Science Citation Index, a database of citations from one article to another. However, there is usually a lag between when an article is written and when it is first cited. Our current work involves developing a sequential process to refine search result sets to yield the highest concentration of cited articles using only the information available at the time of publication. We have identified key factors, including journal impact factor, author network, and MeSH term article counts, as having a role in future citation. Completion of this project would yield a search interface that presents scientific articles, including the most recent, in order of importance.
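
The two ranking signals, citation count and PageRank, can be sketched on a toy citation graph; the article IDs are invented:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iterative PageRank. `links` maps each article to the articles it
    cites; rank flows from the citing article to the cited articles."""
    nodes = set(links) | {v for vs in links.values() for v in vs}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for citing, cited in links.items():
            if cited:
                share = damping * rank[citing] / len(cited)
                for c in cited:
                    new[c] += share
            else:
                # Dangling node: spread its rank uniformly.
                for n in nodes:
                    new[n] += damping * rank[citing] / len(nodes)
        rank = new
    return rank

def citation_counts(links):
    """Simple citation count for every article in the graph."""
    counts = {n: 0 for n in set(links) | {v for vs in links.values() for v in vs}}
    for cited in links.values():
        for c in cited:
            counts[c] += 1
    return counts

# Toy graph: p1-p4 all cite p5; p5 cites p6, which cites nothing.
links = {"p1": ["p5"], "p2": ["p5"], "p3": ["p5"],
         "p4": ["p5"], "p5": ["p6"], "p6": []}
ranks = pagerank(links)
counts = citation_counts(links)
```

Note the difference between the signals: p6 has a single citation, but because that citation comes from the heavily cited p5, PageRank still assigns it a high score.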




Sharing Detailed Research Data is Associated with Increased Citation Rate

Authors:
Heather A Piwowar, Roger S Day, Douglas B Fridsma, University of Pittsburgh

Abstract:
Sharing research data provides benefit to the general scientific community, but the benefit is less obvious for the investigator who makes his or her data available. We examined the citation history of 85 cancer microarray clinical trial publications with respect to the availability of their data. The 48% of trials with publicly available microarray data received 85% of the aggregate citations. Publicly available data was significantly (p = 0.006) associated with a 69% increase in citations, independently of journal impact factor, date of publication, and author country of origin. This association between publicly available data and increased literature impact may further motivate investigators to share their detailed research data.

top





POSTER ABSTRACTS



Real-Time Recognition of Falls from Silhouettes

Authors:
Derek Anderson, James M Keller, Marjorie Skubic, University of Missouri-Columbia

Abstract:
Falls are a serious problem among the elderly. The impact itself can cause injury, but the time spent on the floor before someone discovers the person can be equally critical. The recognition of falls first requires reliable segmentation of a human from the background. To preserve privacy we extract silhouettes, binary maps that indicate only the image pixels corresponding to a person's location. Graphics Processing Units, specialized hardware used in the computer gaming and film industries, are leveraged to achieve real-time extraction of silhouettes in both color and texture spaces. Two different approaches to fall detection are being explored. The first applies Hidden Markov Models for temporal pattern recognition using features extracted from the silhouettes. In the second, we will use fuzzy logic to recognize falls in a 3D model acquired from multiple cameras viewing the same scene. A larger set of fall data, based on what a group of nurses identified as common or important, is being collected to demonstrate the effectiveness of these approaches.
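As a sketch of the first approach, a two-state Hidden Markov Model over coarse silhouette features can be decoded with the Viterbi algorithm. The states, probabilities, and feature symbols below are invented for illustration; the abstract does not give the authors' trained model:

```python
# Minimal Viterbi decoding for a hypothetical two-state fall HMM.
states = ["upright", "fallen"]
start = {"upright": 0.9, "fallen": 0.1}
trans = {
    "upright": {"upright": 0.95, "fallen": 0.05},
    "fallen": {"upright": 0.10, "fallen": 0.90},
}
# Observations are coarse silhouette features: tall vs. flat aspect ratio.
emit = {
    "upright": {"tall": 0.8, "flat": 0.2},
    "fallen": {"tall": 0.1, "flat": 0.9},
}

def viterbi(obs):
    # probability of the best path ending in each state, plus the paths
    best = {s: start[s] * emit[s][obs[0]] for s in states}
    paths = {s: [s] for s in states}
    for o in obs[1:]:
        new_best, new_paths = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[p] * trans[p][s])
            new_best[s] = best[prev] * trans[prev][s] * emit[s][o]
            new_paths[s] = paths[prev] + [s]
        best, paths = new_best, new_paths
    return paths[max(states, key=best.get)]

path = viterbi(["tall", "tall", "flat", "flat", "flat"])
# → ["upright", "upright", "fallen", "fallen", "fallen"]
```

The transition probabilities make brief flat frames tolerable but a sustained flat silhouette decodes as a fall.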

top



Long-Term Anticoagulation after Travel-Related Pulmonary Embolus: A Decision Analysis

Authors:
Sara Z Baig, Wendy Golden, Arathi Rajendra, Stephen Pauker, John B Wong, Tufts-New England Medical Center

Abstract:
Our decision consult service was asked whether a 75-year-old rock-climbing physician with an air travel-related pulmonary embolism (PE) should receive lifelong anticoagulation. Using data from the medical literature, we constructed a Markov computer simulation model to compare lifelong anticoagulation with no anticoagulation, considering the morbidity and mortality risks of recurrent DVT, PE, and major bleeding, and his personal preferences, e.g., avoiding warfarin and considering an intracranial bleed (ICB) to be equivalent to dying. No anticoagulation yielded 10.8 quality-adjusted life years, and lifelong anticoagulation would extend his life by 1 quality-adjusted month, despite his preferences to avoid warfarin. Strategies that discontinued anticoagulation after a complication did not alter the preferred strategy of lifelong warfarin. To determine the critical determinants, all parameters were varied over broad ranges in sensitivity analysis. Anticoagulation remained favored unless quality of life while taking warfarin fell below 0.98 (baseline 0.99), the annual probability of ICB exceeded 0.2% (baseline 0.15%), or the annual probability of DVT fell below 13% (baseline 15%). This analysis was consistent with guideline recommendations for idiopathic PE and helped the physician accept lifelong anticoagulation. Our analysis demonstrates how patient-centered values might affect evidence-based risk and benefit assessments to optimize individualized decision-making.
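A Markov cohort simulation of this kind accumulates discounted quality-adjusted life years as the cohort moves through health states each annual cycle. The probabilities, utilities, and horizon below are hypothetical and deliberately simplified to a single absorbing death state; they are not the study's actual inputs:

```python
# Illustrative Markov cohort model: lifelong anticoagulation vs. none.
# All probabilities and utilities are invented for this sketch.
def qalys(p_event, p_death_event, p_death_other, utility, years=30, discount=0.03):
    alive, total = 1.0, 0.0
    for t in range(years):
        # accrue discounted quality-adjusted time for survivors this cycle
        total += alive * utility / (1 + discount) ** t
        # fatal recurrent events plus other-cause mortality
        alive -= alive * (p_event * p_death_event + p_death_other)
    return total

no_anticoag = qalys(p_event=0.15, p_death_event=0.10, p_death_other=0.040, utility=1.00)
anticoag = qalys(p_event=0.02, p_death_event=0.10, p_death_other=0.045, utility=0.99)
```

Even with a small utility penalty for taking warfarin, the lower event rate leaves the anticoagulation strategy ahead in this toy parameterization, mirroring the direction of the study's result.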

top



Critical Issues in EHR Adoption: Impact of Electronic Discovery on E-Health

Author:
Katherine L Ball, Division of Health Sciences Informatics, Johns Hopkins University School of Medicine

Abstract:
Recent changes to the Federal Rules of Civil Procedure (FRCP) in the areas of discovery and disclosure, retention and destruction, preservation orders, and spoliation of electronically stored information (the legal electronic health record, or L-EHR) will affect biomedical informatics in many ways. Special considerations should include forms of data production, report outputs, metadata, and the handling of legacy data.

Because courts are time- and document-centric, the outputs required of the L-EHR must accurately reflect the healthcare delivered and the information available in the health information technology (HIT) system for medical decision making at the time of care delivery. The renditions of the data derived from clinical HIT systems must account for the processes associated with the normal, disruptive workflows of clinical medicine. A well-defined L-EHR, consisting of a collection of quality outputs from our "paperless systems," is critical to the knowledge provided by the science of health informatics and the business of medicine.

Acceptance and adoption of interoperable HIT will be augmented by skilled informaticians designing and developing applications that most clearly reflect the standard of care delivered and the evidence supporting that care, for research in clinical, legal and policy domains. The iterative processes of procurement, development, testing, implementation and evaluation of HIT systems must take the FRCP for electronic discovery into consideration.

Acknowledgments: Author's Fellowship supported in part by Grant 5T15LM007452-05 from the National Library of Medicine.

References: 1. Addison K, Braden JH, Cupp JE, Emmert D, Hall LA, Hall T, et al. Update: Guidelines for defining the legal health record for-disclosure purposes. J AHIMA. 2005 Sep;76(8):64A-G.
2. Foundation of Research and Education of AHIMA. Update: Maintaining a legally sound health record--paper and electronic. J AHIMA. 2005 Nov-Dec;76(10):64A-L.
3. For the text of the FRCP Amendments, see http://www.uscourts.gov/rules/EDiscovery_w_Notes.pdf.
4. Baldwin-Stried K. E-discovery and HIM: How amendments to the federal rules of civil procedure will affect HIM professionals. J AHIMA. 2006 Oct;77(9):58-60ff.
5. The Sedona guidelines for managing information and records in the electronic age. September 2005. Available from: http://www.thesedonaconference.org/dltForm?did=TSG9_05.pdf.
top



Pilot Study of the Semantic Differential Power Perception (SDPP) Survey

Author:
Christa E Bartos, University of Pittsburgh

Abstract:
This is a pilot study of a survey instrument developed to measure individuals' perceptions of the personal power they have in their work domain and their perceptions of computerized physician order entry (CPOE). The survey uses semantic differential questions, which measure attitudes and concepts, and is based on French and Raven's six bases of power. To ensure that the semantic differential word pairs represent the appropriate power bases, reliability and validity testing were performed. The pilot survey was administered in electronic form and distributed via a link within an email. The ten subjects in the pilot study were physicians, nurses, and health unit coordinators from a large tertiary care hospital. Reliability testing was done by computing Cronbach's alpha for each power base. Validity testing was done by correlating with the Sources of Power Audit survey (an established instrument based on the same bases of power). Reliability is > 0.83 for all but one power base (0.411), and validity is > 0.87 for all but one power base (0.58). Acceptable reliability and validity values were achieved by making minor changes in the semantic differential word pairs used in those power bases.
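Cronbach's alpha, the reliability statistic used here, compares summed item variances to the variance of the total score. A minimal stdlib computation on made-up survey responses (not the study's data):

```python
# Cronbach's alpha for a small set of hypothetical survey items.
def cronbach_alpha(items):
    # items: one inner list of respondent scores per item
    k = len(items)
    n = len(items[0])
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    totals = [sum(item[i] for item in items) for i in range(n)]
    item_var_sum = sum(var(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / var(totals))

# Three items answered by five hypothetical respondents (1-5 scale).
scores = [
    [4, 5, 3, 5, 4],
    [4, 4, 3, 5, 4],
    [5, 5, 2, 5, 3],
]
alpha = cronbach_alpha(scores)  # ≈ 0.877
```

Highly consistent items, as in this toy set, push alpha toward 1; an item that tracks the others poorly pulls it down, which is what the low-scoring power base in the pilot would look like.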

top



Protein Simulation Data Mining: A Harmonic Domain Classification Based on Wavelet Analysis

Authors:
Noah C Benson, Valerie Daggett, University of Washington

Abstract:
Protein domain classification has traditionally been limited to structural domains, which must consist of contiguous amino acids in a polypeptide chain. While these classification methods are useful in understanding structure, describing folding pathways, and designing novel proteins, they often fail to describe overall dynamics or intra-molecular interactions. We propose a new type of domain classification, harmonic domains, defined as regions of a protein that vibrate similarly on a short timescale. These domains can be extracted from molecular dynamics simulations by wavelet decomposition, which transforms the atomic trajectories into a hybrid frequency-time space, followed by comparison of the data in this space and clustering. We find that most proteins have a small number of domains that are continuous in space, though not necessarily in sequence, and that harmonic domains may be important in explaining certain aspects of disease.
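One level of a Haar wavelet transform illustrates the decomposition step: approximation coefficients capture slow motion and detail coefficients capture fast vibration, so residues whose detail coefficients look alike would be clustered into one harmonic domain. The one-dimensional toy trajectory below is invented, and a real analysis would use a finer wavelet over full 3D trajectories:

```python
# One level of a Haar wavelet transform on a toy 1-D atomic trajectory.
import math

def haar_step(signal):
    # pairwise averages (slow motion) and differences (fast motion)
    avg = [(signal[i] + signal[i + 1]) / math.sqrt(2) for i in range(0, len(signal), 2)]
    det = [(signal[i] - signal[i + 1]) / math.sqrt(2) for i in range(0, len(signal), 2)]
    return avg, det

trajectory = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
approx, detail = haar_step(trajectory)
```

Because the Haar basis is orthonormal, the transform preserves the signal's energy, split between the two coefficient sets.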

top



Enabling the Creation of New Drugs: Discovering the Structures of Proteins

Authors:
Christopher A Bottoms, Rajkumar Bondugula, John J Tanner, Dong Xu, University of Missouri-Columbia

Abstract:
By knowing the structures of proteins, we can design drugs that target specific diseases. If a protein can be crystallized, X-rays can be used to obtain information from the protein crystal that can lead to solving its structure. Currently, this process can be simplified if the structure of a very similar protein (i.e., at least 40% identical) is known. However, many proteins lack similar known structures, so more laborious methods are required to solve their structures. A method has been developed for predicting protein structures based on sequence and structural features of proteins that are less than 30% identical. These predicted structures can be used to simplify the process of solving protein structures from X-ray data, enabling more automated structure determination and thus more efficient production of drug targets. More specifically, sequence and structural information for a training set of about 6,000 proteins is used to correlate sequence information with structural information. These correlations are used to create candidate protein structures (e.g., 10,000) based on the sequence of the novel protein. Structures that pass quality tests are then used to help convert the X-ray data into a protein structure.

top



Hybrid Data Integration and Reasoning for Genome Annotation and Curation

Authors:
Eithon Cadag, Dhileep Sivam, Peter Myler, Peter Tarczy-Hornoch, University of Washington

Abstract:
In modern biology, much of the analytical phase of genomic research has shifted away from the bench and toward the workstation. Public databases, which house millions of sequences, records and other resources of biological relevance, are a common starting point for scientists interested in "omic" research. While these repositories contain vast amounts of biological information, it is left to the scientist to parse through the data, understand the connections between data sources and make sense of the contents. For example, annotation, the process whereby the role of a gene is determined, has become increasingly laborious as the rate of putative gene discovery increases. One solution to this problem uses a federated data integration approach to collect annotation-relevant information from disparate databases. We present a system and method that addresses problems in annotation by coupling a data integration system, BioMediator, to an inference engine that reasons over biological information. The hybrid system is capable of annotating genes with a precision as good as or better than current GenBank annotations ~80% of the time. This approach will be used to computationally curate candidate pathogenic genes in infectious prokaryotes before structural analysis, using scientist- and literature-derived rules.

top



Analysis of a Computerized Sign-Out Tool: Identification of Unanticipated Uses

Authors:
Thomas R Campion, Jr., Stuart T Weinberg, Nancy M Lorenzi, Lemuel R Waitman, Vanderbilt University

Abstract:
A computerized tool designed to facilitate physician sign-out has been in use at Vanderbilt University Hospital and Children's Hospital for close to a decade. The authors produced descriptive statistics for a three month period of sign-out tool use. Results showed anticipated use by resident physicians and nurse practitioners to generate and print notes, as well as unanticipated use by nurses, case managers, and medical receptionists/care partners to print providers' notes. Resident physicians and nurse practitioners authored 90% of notes they printed. In contrast, nurses authored only 8% of the notes they printed, despite printing more notes than residents overall. Proportions of unique note generation and printing were similar to nursing for other non-providers. The difference in contributions and printing for providers and non-providers suggests that providers serve as "producers" and non-providers as "consumers" of sign-out content. Although this trend was observed in most units, exceptions included the trauma intensive care unit, where case managers generated 64% of sign-out content, and the medical intensive care unit, where medical receptionists printed 64% of notes. Findings have implications for workflow and redesign of the sign-out tool.

top



MRSA/VRE Infections in Patients Who Require Isolation

Authors:
Randy J Carnevale, Randolph A Miller, Dario A Giuse, Thomas R Talbot, Vanderbilt University

Abstract:
Vanderbilt University Hospital (VUH) policy states that inpatients with active methicillin-resistant Staphylococcus aureus (MRSA) or vancomycin-resistant Enterococcus (VRE) infections require contact isolation precautions. Nevertheless, isolation is not timely for some inpatients. Measuring physician compliance with the MRSA/VRE isolation policies requires both a reliable means to identify patients' new MRSA- or VRE-positive cultures and a way to ascertain which patients are not yet in isolation. To identify MRSA/VRE infections, we created a parser that processes the human-readable bacterial culture reports from the VUH lab system and records culture results in a relational database. Database queries thus identify all cases with active MRSA and VRE infections. Determining patients' isolation status is more difficult. Automated review of care provider order entry (CPOE) records can identify some isolated patients - those with active isolation orders. Others lack isolation orders, but their isolation status is available from the infection control service's records. From these retrospective data, we found that 57% of all MRSA/VRE-infected patients lacked an isolation order, even though many of these patients were actually placed in isolation. There is room to improve isolation practices at VUH through real-time decision support tools, and we are currently developing such a tool.
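The parsing step might look like the sketch below. The actual VUH report format is not given in the abstract, so the line layout, regular expression, and sample text are purely illustrative:

```python
# Hypothetical parser for free-text culture report lines.
import re

PATTERN = re.compile(
    r"(?P<mrn>\d+)\s*\|\s*(?P<organism>MRSA|VRE)\s*\|\s*(?P<result>positive|negative)",
    re.IGNORECASE,
)

def parse_reports(text):
    rows = []
    for line in text.splitlines():
        m = PATTERN.search(line)
        if m:
            # normalize case before loading into the relational database
            rows.append((m.group("mrn"), m.group("organism").upper(), m.group("result").lower()))
    return rows

sample = """\
1001 | MRSA | positive
1002 | VRE | negative
free-text comment line with no result
1003 | vre | POSITIVE
"""
positives = [r for r in parse_reports(sample) if r[2] == "positive"]
```

Rows like these, once loaded into a relational table, make "all active MRSA/VRE cases" a simple query rather than a chart review.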

top



The Impact of a Subcutaneous Insulin Decision Support System for Inpatient Glycemic Control

Authors:
Karen Chang, Sona Sharma, Matthew Bair, James Walsh, Center of Excellence on Implementing Evidence-Based Practice, Roudebush VAMC, Indianapolis, IN

Abstract:
Hyperglycemia is a powerful predictor of morbidity and mortality in hospitalized patients. Effective management of inpatient glucose requires insulin administered as: (1) basal insulin, (2) mealtime insulin, and (3) correction-dose insulin. To facilitate effective use of insulin for inpatient glycemic control, a Computerized Patient Record System (CPRS)-based Decision Support System (DSS) for subcutaneous insulin was introduced in late 2004. The purpose of this study was to examine the impact of the DSS on inpatient hyperglycemia (% of hospital days with average daily blood glucose > 180 mg/dL) and hyperglycemia treatment strategies. A retrospective pre-post cohort study was conducted. We examined admissions from a 6-month period before implementation of the DSS in 2004 and from the same 6 months in 2005 and 2006. The proportion of hospital days with hyperglycemia in 2004, 2005, and 2006 was examined using a 2-sample test for equality of proportions with continuity correction. Among patients at high risk of developing hyperglycemia who were hospitalized at the Roudebush VAMC for at least 48 hours during the study period, significant reductions in hyperglycemia occurred each year. Increased usage of the DSS was also observed over the study period.
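The 2-sample test for equality of proportions with continuity correction can be computed directly from the counts; the hyperglycemic-day counts below are hypothetical, not the study's results:

```python
# Two-sample test for equality of proportions with continuity correction.
import math

def prop_test(x1, n1, x2, n2):
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    cc = 0.5 * (1 / n1 + 1 / n2)  # Yates continuity correction
    z = (abs(p1 - p2) - cc) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))
    return z, p_value

# Hypothetical: 400/1000 hyperglycemic days pre-DSS vs. 330/1000 post-DSS.
z, p_value = prop_test(400, 1000, 330, 1000)
```

With these made-up counts the difference of seven percentage points is well beyond chance at conventional thresholds.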

top



Using Clinical Laboratory Data and Gene Expression Measurements to Infer Gene Function

Authors:
David P Chen, Atul J Butte, Stanford University

Abstract:
Traditionally, the elucidation of genes involved in biological processes has involved gene expression profiling or the perturbation of genes of interest in model organisms followed by examination of phenotypic response. Although research using model organisms has undoubtedly increased knowledge about human biological processes, their suitability has often been questioned. This, along with the difficulty in the acquisition and examination of human data, creates a distinct gap in our understanding of these processes as they occur in humans. We propose a novel hybrid in silico / in vivo method that uses human phenotypic data in the form of clinical laboratory measurements to infer genes that may be involved in human biological processes. We show that the correlation between clinical laboratory measurements to gene expression measurements across corresponding diseases can identify genes involved in biological processes that have been previously verified as well as predict novel associations. Our results reiterate the influence of IL7 on the amount of lymphocytes in humans and suggest a similar correlation between IL13 and lymphocytes.
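The core computation is a correlation between a clinical laboratory measurement and a gene's expression across matched diseases. The values below are invented solely to show the mechanics:

```python
# Pearson correlation between a lab measurement and gene expression
# across diseases; all numbers are made up for illustration.
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

lymphocyte_count = [1.2, 2.5, 3.1, 4.0, 4.8]  # one value per disease
il7_expression   = [0.9, 2.1, 3.3, 3.8, 5.0]  # matched diseases
r = pearson(lymphocyte_count, il7_expression)
```

A strong correlation like this toy one is the kind of signal that would nominate a gene as possibly involved in the corresponding biological process, subject to the verification the abstract describes.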

top



ChemDB: A Public Database of Small Molecules and Related Chemoinformatics Resources

Authors:
Jonathan Chen, Erik Linstead, S Joshua Swamidass, Dennis Wang, Yimeng Dou, Pierre F Baldi, University of California, Irvine

Abstract:
ChemDB is a chemical database containing ~5M commercially available small molecules. The data are publicly available over the Web for download and for targeted search using a variety of powerful methods. The chemical data include predicted 3D structures, ideal for docking and other studies, and physicochemical properties such as solubility. Recent developments include optimization of chemical structure (and substructure) similarity search algorithms, enabling full database searches in less than a second. A text-based search engine allows efficient searching of compounds across over 65M vendor annotations, such as systematic and common names, with fuzzy text matching capabilities that yield productive results even when the correct spelling of a chemical name is unknown. Finally, built-in reaction models enable searches through virtual chemical space, consisting of hypothetical products readily synthesizable from the building blocks in ChemDB. ChemDB is available at http://cdb.ics.uci.edu.
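Chemical similarity search of this kind is commonly built on Tanimoto similarity over binary substructure fingerprints; whether ChemDB uses exactly this measure is not stated in the abstract, and the fingerprints below are toy bit sets:

```python
# Tanimoto similarity on binary substructure fingerprints.
def tanimoto(a, b):
    # a, b: sets of "on" fingerprint bit positions
    union = len(a | b)
    return len(a & b) / union if union else 0.0

query = {1, 4, 7, 9, 12}
database = {
    "similar-compound":   {1, 4, 7, 9, 13},  # hypothetical entries
    "unrelated-compound": {2, 3, 20},
}
hits = sorted(database, key=lambda name: tanimoto(query, database[name]), reverse=True)
```

Because the measure is a ratio of shared to total bits, it can be bounded cheaply from bit counts alone, which is one standard route to sub-second full-database searches.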

top



ca!: A Low Cost Distributed Sensor Network Utilizing the caBIG™ Grid

Authors:
Martin Cryer and Lewis Frey, University of Utah, Salt Lake City, Utah

Abstract:
Existing solutions for environmental monitoring of events such as a radiological dispersion device detonation are not in widespread use, are dependent upon reliable network communications and do not provide a common data access standard. The ca! project will provide low cost, highly available, sensor devices that utilize first responder vehicle networking resources to provide a self-healing network. Data will be aggregated at a regional level, utilizing the Cancer Biomedical Informatics Grid (caBIG™) and the Cancer Data Standards Repository (caDSR) to provide a data abstraction for use by applications. This data will be made available across the caBIG™ grid network to other caBIG™ compliant applications, enabling a wide area network hierarchy of analysis recourses. First responder applications will utilize local sensor data to increase safety awareness of first responder personnel. Regional caBIG™ certified applications will enable data integration with clinical information systems for patient influx prediction and environmental toxin contamination determination for patients, staff and facilities. Aggregated sensor data will provide information to regional and national agencies in order to be able to monitor more than one simultaneous event. The ca! systems will re-use existing in-use technologies throughout, to provide a low barrier to entry for adoption.

top



KFC Server: Using Decision Trees to Predict Protein Interaction Hot Spots

Authors:
Steven J Darnell, Laura E LeGault, David Page, Julie C Mitchell, University of Wisconsin-Madison

Abstract:
Protein-protein interactions can be altered by mutating one or more "hot spots," the subset of residues that account for most of the interface's binding free energy. The identification of hot spots requires a significant experimental effort, highlighting the practical value of hot spot predictions. We present two knowledge-based models that improve the ability to predict hot spots: K-FADE uses shape specificity features calculated by the Fast Atomic Density Evaluation (FADE) program, and K-CON uses biochemical contact features. The combined K-FADE/CON (KFC) model displays better overall predictive accuracy than Robetta's computational alanine scanning (Robetta-Ala). In addition, because these methods predict different subsets of known hot spots, a large and significant increase in accuracy is achieved by combining KFC and Robetta-Ala.

We also introduce the KFC Server (http://www.mitchell-lab.org/kfc/), a public web server that uses the KFC model to predict protein interaction hot spots, and an interactive molecular viewer to display the results. The analysis used by the KFC Server is very fast (typically taking less than one minute for most protein complexes) and is well suited for processing jobs from many users.
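In spirit, a learned KFC tree reduces to nested threshold tests on shape-specificity and contact features. The features, thresholds, and labels below are invented for illustration; the actual trees come from training on known hot spots, not hand-coding:

```python
# A hand-written decision tree in the spirit of KFC's learned models.
# Feature names and thresholds are hypothetical.
def predict_hot_spot(residue):
    # residue: dict of shape-specificity and contact features
    if residue["fade_score"] > 0.5:          # strong shape specificity
        if residue["contacts"] >= 8:         # many biochemical contacts
            return "hot spot"
        return "not hot spot"
    if residue["delta_sasa"] > 40.0:         # large buried area on binding
        return "hot spot"
    return "not hot spot"

examples = [
    {"fade_score": 0.7, "contacts": 10, "delta_sasa": 20.0},
    {"fade_score": 0.2, "contacts": 3,  "delta_sasa": 55.0},
    {"fade_score": 0.6, "contacts": 4,  "delta_sasa": 10.0},
]
labels = [predict_hot_spot(r) for r in examples]
```

Evaluating a fixed tree like this is trivially fast, which is consistent with the server processing most complexes in under a minute.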

top



Use and Value of the Consolidated Health Data Repository

Authors:
Jack M Davis, Fellow, Medical Informatics, Michael Lincoln, Medical Informatics Fellowship Director, Omar Bouhaddou, Standards and Terminology Services, Salt Lake City VA Medical Center

Abstract:
The Consolidated Health Data Repository (CHDR) is a bi-directional, real-time computed data exchange between selected Department of Defense (DoD) and Veterans Affairs (VA) health care facilities. Because of the increased number of military members, both active duty and separated, who seek care in DoD and VA facilities (a result of aging as well as the current conflicts in Iraq and Afghanistan), patients admitted for care at one agency's facility may be transferred to another's. A crucial aspect of such a patient's continuity of care is the timely, accurate, and usable transfer of his or her health care records.

The primary scope of CHDR is to provide accurate, up-to-date patient information on drug-drug interactions and drug allergies shared between DoD and VA facilities. The purpose is to enhance patient safety, decision support for the clinicians, and accuracy in data transmitted between facilities.

Since June 2006, CHDR has been used in selected DoD and VA health care facilities. To assess the value of CHDR data, a questionnaire has been submitted to health care providers in both DoD and VA health care facilities who have had access to, been trained in, and use CHDR data. The purpose of the questionnaire is to assess the use, value, and trustworthiness of CHDR data on behalf of patients transferred from a DoD to a VA, or a VA to a DoD, health care facility.

top



Amplification Distortion Test: A Method to Fine Map Selection in Tumors

Authors:
Ninad Dewal, Columbia University, Matthew Freedman, Dana Farber Cancer Institute, Harvard Medical School, Thomas LaFramboise, Case Western Reserve University, Itsik Pe'er, Columbia University

Abstract:
Selection of amplified genomic segments in particular cellular lineages drives tumor development. However, pinpointing genes under such selection has been difficult because of these regions' large sizes. We propose a new method, the Amplification Distortion Test (ADT), that identifies specific nucleotide alleles that confer better survival on tumor cells when somatically amplified. ADT will draw upon existing statistical and population genetics techniques, such as the Transmission Disequilibrium Test, and be extended to evaluate distortion on individuals' haplotypes in addition to single markers, via adaptation of haplotype-based computational methods. Because of its genome-wide scale, such analysis presents computationally complex challenges, which will be addressed through novel algorithms. ADT is in its formative stages but shows strong potential; a prototype version has revealed amplification distortion on human chromosome 17q. We plan to test and apply our method on 1900 tumor samples typed genome-wide for copy number variation and 240K single nucleotide polymorphisms, obtained from our collaborators at the Broad Institute, Dana Farber Cancer Institute, and Case Western Reserve University. Identifying tumor-preferred alleles with ADT would demonstrate its success, paving a new path for cancer-gene discovery with potential therapeutic and diagnostic benefits for clinical oncology.
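The Transmission Disequilibrium Test that ADT builds on reduces to a McNemar-style chi-square on how often the allele of interest is transmitted versus not transmitted; in ADT's setting, "transmission" becomes preferential somatic amplification. The counts below are hypothetical:

```python
# McNemar-style TDT statistic on hypothetical allele counts.
def tdt(transmitted, untransmitted):
    # chi-square with 1 degree of freedom under the null of 50/50
    b, c = transmitted, untransmitted
    return (b - c) ** 2 / (b + c)

chi2 = tdt(transmitted=60, untransmitted=35)
significant = chi2 > 3.84  # 5% critical value, chi-square with 1 d.f.
```

Under no selection the two counts should be roughly equal; a large imbalance, as in this toy example, flags an allele preferentially retained in amplified tumor segments.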

top



Details of a Successful Clinical Decision Support System

Authors:
Jeff Friedlin, Paul R Dexter, J Marc Overhage, Regenstrief Institute, Inc, and Indiana University School of Medicine

Abstract:
Computerized physician order entry (CPOE) with clinical decision support (CDS) is regarded as a highly effective way to improve the quality of health care and increase patient safety. Creating a CPOE/CDS system is a complex task, and some systems fail despite time-consuming and expensive development. The CPOE system at the Regenstrief Institute incorporates sophisticated CDS and is one of the oldest and most successful in the U.S. We recently completed a full analysis of our CPOE/CDS system and present details of its structure, functionality and contents. The 1,306 total rules in our CDS system consist of 898 (69%) supporting rules and 408 (31%) reminder rules. Approximately 72% of the reminder rules are triggered by a treatment order, 15% by the order of a diagnostic test, 10% by entry of a diagnosis or problem, and 3% by the passage of time. Reminder rules are generally complex: only 61 (15%) use just one supporting rule, and the average number of supporting rules used by reminder rules is 25. We describe our method of monitoring our CDS system and its potential use in provider education and cost analysis. Our successful CDS system can serve as a model for the future development of similar systems.

top



Interpreting Weight Data: The Moderating Effect of Presentation and Personality

Authors:
Jeana Frost, Julia Braverman, Boston University

Abstract:
While communicating health information is central to medical informatics, the ideal presentation of information is not well understood. Using weight loss as an example, we explored how changes to the format and representation of data alter patients' interpretation of the underlying dataset. We compared presentation types (tables and graphs) and reference points (baseline and goal) on subsequent evaluations of the data and document how these evaluations interact with a personality trait, in particular a preference for preventing adverse consequences versus optimizing positive outcomes. 452 people participated in an online study. Main effects were observed with participants predicting greater future weight loss when goals rather than baseline levels were highlighted and when the data was presented in a table versus a graph. Irrespective of presentation type, personality impacted evaluations with promotion-focused participants predicting greater success than prevention-focused participants. For graphs, there was a three way interaction between personality type and data framing with personality influencing which type of graph framing was most useful. This work suggests how design choices impact patients' reactions to clinical information and how personality traits factor into those reactions.

top



Agile Semantic Meta-data Evolution in Proteomics LIMS

Authors:
Robert E Gorlitsky, John H Schwacke, Medical University of South Carolina

Abstract:
Scientific knowledge in general, and the field of proteomics in particular, is growing and changing at an increasingly rapid pace. Laboratory information management systems (LIMS) are not sufficiently adaptable to keep up with the rapidly changing research environment, because knowledge management experts must be consulted to update the semantic structure of the data stored in these systems. Ideally, biologists should be able to easily incorporate new data elements into a LIMS so that the semantic structures can be iteratively refined over time while maintaining compatibility with earlier datasets. This project aims to incorporate semantic meta-data evolution functionality into a proteomics LIMS. A novel semantic similarity algorithm for ranking the similarity of ontology terms is being created and validated. A web-based ontology management interface is being created that will incorporate this algorithm to advise users of related terms in an existing ontology. This interface will then be incorporated into the new MUSC proteomics LIMS and evaluated for usability. The experience gained through the creation and use of this tool will lead to insights about the advantages and disadvantages of incorporating user-driven ontology evolution into data management systems.
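One plausible shape for a term-similarity measure is path length through the closest common ancestor of an is-a hierarchy; the abstract does not specify the authors' algorithm, and the toy ontology below is invented:

```python
# Path-based term similarity over a toy is-a hierarchy (all terms hypothetical).
parent = {
    "tryptic digest": "sample prep",
    "in-gel digest": "sample prep",
    "sample prep": "proteomics process",
    "mass spec run": "proteomics process",
    "proteomics process": None,
}

def ancestors(term):
    chain = [term]
    while parent[term] is not None:
        term = parent[term]
        chain.append(term)
    return chain

def similarity(a, b):
    # inverse of the is-a path length through the closest common ancestor
    pa, pb = ancestors(a), ancestors(b)
    common = next(t for t in pa if t in pb)
    return 1.0 / (1.0 + pa.index(common) + pb.index(common))

close = similarity("tryptic digest", "in-gel digest")
far = similarity("tryptic digest", "mass spec run")
```

Ranking existing terms by such a score is how the interface could suggest "in-gel digest" to a biologist about to coin a near-duplicate of it.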

top



Building Clinical Information System Balance Score Card: A Retrospective Study of Understanding Clinical Automation and Usability Scores to Improve Healthcare IT Decision Making

Authors:
Dwayne Grant, Neil Powe, Ruben Amarasingham, Harold Lehmann, Aaron Cunningham, Johns Hopkins University

Abstract:
Health Information Technology (HIT) implementation is an expensive endeavor: healthcare organizations spend between $17 billion and $42 billion on HIT per year. Health IT decision makers are responsible for planning the acquisition and implementation of health information technology, and many organizational factors influence its adoption. In this project, benchmark data were used to present a comparative analysis of hospital clinical information system usability and automation scores using a validated instrument. This project describes the CIT reports, the approach to developing a survey instrument, and preliminary results of the follow-up study regarding health IT decision makers' use of benchmark data. The project aim is to understand the information needs of health IT decision makers to guide future development of a clinical information system balance score card. Future work will also focus on the development of an information architecture to present clinical information system benchmark data.

top



A Cognitive Approach to Understanding Physician Use of CPOE: Implications for Improving Technology Acceptance

Authors:
Kenneth P Guappone, Joan Ash, Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University

Abstract:
The purpose of this qualitative study is to understand the cognitive processes that occur as physicians use computerized provider order entry (CPOE) systems. The participants are community hospital physicians using inpatient commercial systems. The usability engineering methods consist of a survey to establish the participants' computer experience and "think-aloud" sessions. These sessions record the user in the live environment, collecting real-time images of each computer screen with screen capture software, user verbalizations that are recorded and then transcribed verbatim, and physical actions captured with a video recorder. The data will be analyzed with software that merges the audio verbalizations, the written transcriptions, and the images of the computer screen and of the user. Through protocol analysis, the cognitive goals of subjects doing similar tasks will be recorded and analyzed in order to understand the cognitive model that users create, based on Norman's theory of action. By comparing these models to the intended task and the interface of the system, I hope to recommend changes in further interface development, create ideas for user training, guide system implementation, and identify future research needs in this area.

top



Predicting the Risk and Trajectory of Intensive Care Patients using Survival Models

Authors:
Caleb W Hug, Peter Szolovits, Massachusetts Institute of Technology, Cambridge, MA

Abstract:
Increasing availability of clinical, laboratory and signals data about patients receiving intensive care makes realistic the possibility of having computers "understand" the evolving condition of a patient. We propose a patient representation that leverages patient outcome prediction models in order to illuminate the current state and the near-term trajectory of a patient as he or she progresses through the ICU. Patient acuity scores such as the Simplified Acute Physiology Score (SAPS) and Acute Physiology and Chronic Health Evaluation (APACHE) score are commonly available in the ICU but their calibration typically limits the score to one value at 24 hours after admission. We have studied patients from retrospective data in a variety of intensive care units at a Boston-area hospital. In this data set, we have identified 13,327 patients with 156 characteristics for study. Our representation uses a time-dependent Cox proportional hazards model and considers the large number of patients who leave the ICU alive to be censored cases. This representation predicts final patient outcome slightly better than the SAPS I. More importantly, these continually available predictions demonstrate interesting trends as a patient progresses through the intensive care unit.

top



pH-Dependent Conformation of Adenine on Gold Surfaces via Surface Enhanced Spectroscopy: Theoretical and Experimental Investigation

Authors:
Benjamin G Janesko, Oara Neumann, Janardan Kundu, Dongmao Zhang, Naomi J Halas and Gustavo E Scuseria, Rice University

Abstract:
The interaction between DNA and metal surfaces is important for several existing and proposed nanobiological systems. Strong interactions between gold and the DNA base adenine have been known for decades. However, the precise binding mode is incompletely characterized, particularly for solvated systems. We present experimental evidence that both the bare adenine base and adenine-containing DNA show similar reversible, pH-dependent changes in their binding to tunable gold plasmonic substrates. These substrates enable the system to be probed by surface-enhanced infrared and Raman spectroscopies. Ab initio electronic structure calculations are performed on several model adenine-gold complexes, and the calculated spectra are compared with experiment. This combination of multiple spectroscopic techniques and high-level calculations provides novel insights into the pH-dependent interaction.

top



Using Pseudotorsions to Build Structure into RNA Electron Density

Authors:
Kevin S Keating, Leven M Wadley, Anna M Pyle, Yale University

Abstract:
The role of RNA as an information carrier in the cell has long been known. However, recent studies have found increasingly complex roles for RNA, including RNA molecules that catalyze reactions and regulate gene expression. The backbone of RNA is critical to these functions; however, extensive studies of the backbone are hampered by the difficulty of accurately determining its structure. When using X-ray crystallography, complications arise due to both the large number of torsional angles per residue and the difficulty of obtaining well-diffracting crystals.

A previously developed pseudotorsional system allows for accurate and automated analysis of low resolution RNA structures. This system involves two virtual dihedrals that use only the phosphate and C4' atoms of each nucleotide. However, this analysis does not alleviate the difficulty of accurately locating other backbone atoms in low resolution structures. To aid in this, we have combined the pseudotorsions with an all-atom backbone rotamer library. The pseudotorsions can then be used to predict the appropriate rotamer during the structure building process, which will allow for accurate determination of the RNA backbone in low and medium resolution structures.
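The two pseudotorsions described above are ordinary dihedral angles computed over just the phosphate (P) and C4' atoms of consecutive nucleotides. As a minimal sketch of the underlying calculation (the coordinates below are illustrative placeholders, not real RNA atoms):

```python
import math

def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def cross(a, b): return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])
def dot(a, b): return sum(x * y for x, y in zip(a, b))

def dihedral(p1, p2, p3, p4):
    """Signed torsion angle (degrees) defined by four 3D points."""
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1, n2 = cross(b1, b2), cross(b2, b3)         # normals of the two planes
    norm_b2 = math.sqrt(dot(b2, b2))
    m1 = cross(n1, tuple(x / norm_b2 for x in b2))
    return math.degrees(math.atan2(dot(m1, n2), dot(n1, n2)))

# One pseudotorsion spans C4'(i-1), P(i), C4'(i), P(i+1); the other spans
# P(i), C4'(i), P(i+1), C4'(i+1).  Placeholder coordinates:
eta = dihedral((0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.5, 0.0), (2.5, 1.5, 1.2))
```

A cis arrangement of the four points yields 0 degrees and a trans arrangement 180 degrees, matching the usual torsion-angle convention.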

top



Detection and Analysis of Fixed Nucleosome Positions in Human Hox Clusters

Authors:
Peter Kharchenko, Peter Park, Children's Hospital Informatics Program, Boston; Caroline Woo, Robert Kingston, Department of Molecular Biology, Massachusetts General Hospital

Abstract:
Chromatin structure is thought to play a critical role in the regulation of gene expression. Studies of specific loci have illustrated that chromatin changes can modulate DNA accessibility; however, the scope of such changes and the mechanisms by which they occur remain poorly understood. To this end, we investigate the distribution of fixed nucleosome positions in 0.5 Mb of the human genome covering all four Hox gene clusters. Using genome tiling arrays, we measured DNA protection patterns established by micrococcal nuclease digestion in two human cell lines. An HMM-based algorithm was used to identify putative locations of fixed nucleosomes, corresponding linker regions, and regions likely to contain delocalized nucleosomes. Based on biological replicates and immunoprecipitation measurements using histone H3 antibodies, we estimate that the false positive rate of fixed nucleosome predictions is below 20%. Using the predicted nucleosome positions, we are able to analyze nucleosome density at various genomic regions and illustrate its dependency on transcriptional activity. We identify regions of nucleosome depletion and classify the differences in nucleosome distribution between the HeLa and K562 cell lines. Our results illustrate the degree to which changes in nucleosome positions at various genomic regions are related to the regulation of transcriptional activity.

top



Using Logical Semantic NLP and UMLS Mapping to Tailor Online Information Retrieval

Authors:
Susan Kossman1, Josette Jones3, and Patricia Flatley Brennan1,2
1 University of Wisconsin-Madison, School of Nursing
2 University of Wisconsin-Madison, College of Engineering
3 Indiana University School of Nursing and School of Informatics

Abstract:
Retrieving appropriate information to support necessary knowledge and skills for chronic care management can be difficult. This poster presents initial work building a system for retrieving pertinent, current online healthcare information at the point of need, tailored to consumers' and healthcare providers' information and learning preferences. We are using a logical semantics approach to natural language processing (NLP) of web-based information resources related to management of depression in adolescents. This approach parses text at the sentence level, then uses string matching and probabilistic likelihood to map sentences to unique codeable propositions representing their meaning (1). To build the knowledge base of propositions, we identified a base set of high quality webpages through focused searching of PubMed and MedlinePlus in topic areas identified through a needs assessment in the Blue Sky study. The propositions developed describe the semantic meaning of page content and user attributes. To increase information retrieval accuracy, sensitivity, and precision, we mapped propositions to a set of UMLS terminologies relating to nursing and psychology (for use in a tailoring algorithm) and to MeSH and SNOMED CT (for use in an algorithm triggering automatic updates to the knowledge base by matching key words in new articles published in PubMed and MedlinePlus).

Reference:
1. Jamieson P. Process and system for extracting the semantics of sentences in a knowledge domain. Patent Application. 2006.

top



Benefits of Ensemble Refinement in Macromolecular Crystallography

Authors:
Elena J Levin, Dmitry A Kondrashov, Gary E Wesenberg, and George N Phillips Jr., University of Wisconsin-Madison

Abstract:
In crystallography, each atom is represented by three coordinates indicating its average position and a temperature factor describing the variance of an isotropic, harmonic distribution of deviations from that mean. Although this single conformer model is adequate for describing the average structure of proteins, it is limited in its ability to account for their dynamic motion, which may be anisotropic, anharmonic, and multimodal. An alternative approach, called ensemble refinement, is to use multiple copies of the entire protein, each accounting for a fraction of the total electron density. In this study we apply ensemble refinement to a set of 50 experimental X-ray structures solved by the Center for Eukaryotic Genomics, as well as three simulated datasets generated by molecular dynamics simulations. We then carry out a systematic evaluation of the technique's benefits relative to single conformer refinement. The results suggest that ensemble refinement leads to notable improvements in the agreement between the crystallographic model and the experimental data for the majority of the structures tested, driven primarily by improved modeling of the magnitude and directionality of the proteins' motions.

Support was provided by NIH PSI grants p50 GM64598 and U54 GM074901, as well as training grants T15 LM007359 and GM07215.

top



Heuristic Sample Selection to Minimize the Reference Standard Training Set for a Part-of-Speech Tagger

Authors:
Kaihong Liu, Wendy Chapman, Rebecca Hwa, Rebecca S Crowley, University of Pittsburgh

Abstract:
Part-of-speech tagging represents an important first step for most medical NLP systems. The majority of current statistically-based POS taggers are trained using a general English corpus. Consequently, these systems perform poorly on medical text. Annotated medical corpora are difficult to develop because of the time and labor required. We investigated a heuristic-based sample selection method to minimize annotated corpus size for retraining a Maximum Entropy (ME) POS tagger. We developed a manually annotated domain specific corpus (DSC) of surgical pathology reports and a domain specific lexicon (DL). We sampled the DSC using two heuristics to produce smaller training sets and compared the retrained performance against (1) the original ME modeled tagger trained on general English, (2) the ME tagger retrained on the DL, and (3) the MedPost tagger trained on MEDLINE abstracts. We found that the ME tagger retrained with a DSC was superior to the tagger retrained with the DL, and also superior to MedPost. Heuristic methods for sample selection produced performance equivalent to use of the entire training set, but with many fewer sentences. Learning curve analysis showed that sample selection would enable an 84% decrease in the size of the training set without a decrement in performance.

top



Context-Aware Mapping of Gene Names Using Trigrams

Authors:
ThaiBinh Luong, Nam Tran, Michael Krauthammer, Yale University

Abstract:
We are working on text mining strategies to uncover gene disease associations from the biomedical literature. One of the current bottlenecks in text mining is the ability to accurately recognize gene names, and link them to external database identifiers. Here, we present a method to map gene names to their respective NCBI EntrezGene identifiers. We use a combination of two methods to address term variability and term ambiguity when mapping biomedical terms. The first method involves splitting potential gene strings, or "entities", into overlapping groups of three alphanumeric characters (trigrams). This approximate-mapping step examines entities based on the actual makeup of the gene string, and successfully addresses term variability. The second, fine-mapping method examines the context of these entities (the words in an abstract) to disambiguate between genes that are lexically close. Our approach is unique in that it is realized as a sequence of simple matrix manipulations, which allows for a fast implementation of the algorithm.
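The approximate-mapping step above can be sketched in a few lines. The trigram decomposition follows the abstract; the Jaccard overlap used to score candidate matches is our illustrative choice, since the abstract does not specify the comparison metric:

```python
def trigrams(name):
    """Overlapping 3-character windows over the alphanumeric characters of a gene string."""
    s = "".join(ch for ch in name.lower() if ch.isalnum())
    return {s[i:i + 3] for i in range(len(s) - 2)}

def similarity(a, b):
    """Jaccard overlap of trigram sets (an illustrative score, not the paper's)."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

# Lexical variants of the same gene share their trigrams despite
# punctuation and spacing differences:
score = similarity("BRCA-1", "BRCA1")   # identical alphanumeric content -> 1.0
```

Because trigram sets can be encoded as rows of a sparse indicator matrix, scoring every entity against every gene name reduces to a matrix product, consistent with the matrix-manipulation framing in the abstract.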

top



Using Term Frequency to Identify Trends in the Media's Coverage of Health

Authors:
Delano J McFarlane, Rita Kukafka, Columbia University

Abstract:
Objective: Determine if term frequency can be used to characterize the media's coverage of health. Methods: SalientNews, a news analysis system that we built, was used to collect and analyze over 10,000 news articles published online between October 2006 and March 2007. Rainbow, a statistical text classification program, was used to identify health related news articles. Frequencies were calculated for terms appearing in article titles and descriptions. Trend and usage analysis was conducted for terms appearing in the top 20 most frequent for any week of analysis. Results: 132 terms met the criteria for analysis. Some terms (e.g. new, children, report) had consistently high frequencies (over 10 weeks in the top 20) and appeared in a variety of health news stories. Other terms (e.g. coli, autism) exhibited more dramatic changes in frequency (under 3 weeks in the top 20) and were typically associated with specific health related events and debates. Conclusions: Term frequency can be used during automated news analysis to identify terms that characterize the media's coverage of health. Based on preliminary analysis, terms used in health news often emphasized the impact of health issues on children, the importance of new reports and studies, and the sporadic prominence of certain health topics.
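The weekly top-20 computation described in the Methods can be sketched with a simple counter. The input format and stop-word list below are assumptions for illustration, not details from the SalientNews system:

```python
from collections import Counter

def weekly_top_terms(articles_by_week, k=20,
                     stop=("the", "a", "of", "in", "and", "to", "for")):
    """For each week, count term frequencies over titles/descriptions
    and keep the k most frequent terms.

    articles_by_week: {week_label: [text, ...]} -- an assumed input format."""
    top = {}
    for week, texts in articles_by_week.items():
        counts = Counter(
            w for text in texts
            for w in text.lower().split()
            if w.isalpha() and w not in stop
        )
        top[week] = [term for term, _ in counts.most_common(k)]
    return top

# Hypothetical mini-corpus of health headlines:
weeks = {"2006-W40": ["New autism report", "E coli outbreak sickens children"],
         "2006-W41": ["Children and new vaccine report"]}
trends = weekly_top_terms(weeks)
```

Terms that stay in the top k across many weeks (like "new" or "report") mark recurring framings, while terms that spike briefly (like "coli") mark event-driven coverage, mirroring the two patterns reported in the Results.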

top



Enzyme Mechanism Based Modeling and Simulation of Metabolic Pathways in Escherichia coli

Authors:
Tarek S Najdi, Chin-Rang Yang, Eric D Mjolsness, G Wesley Hatfield, University of California, Irvine

Abstract:
To elucidate the systems biology of the model organism Escherichia coli, we develop mathematical models to simulate carbon flow through the metabolic pathways of central metabolism and pyruvate family amino acid biosynthesis. To achieve this goal, we use the kMech/Cellerator enzyme mechanism-based modeling described in the schematic below. Here, we describe a more flexible model in Cellerator, which generalizes the Monod, Wyman, Changeux (MWC) model for enzyme allosteric regulation to allow for multiple substrate, activator, and inhibitor binding sites. In addition, we use a random steady state model to describe catalysis by enzyme complexes such as pyruvate dehydrogenase (PDH) and alpha-ketoglutarate dehydrogenase (KGDH) in the TCA cycle. To verify our simulations and models for enzyme mechanisms, especially under conditions of metabolic and genetic perturbations, we explore the overall effect of cellular growth on alternative carbon sources such as acetate on the central pathways of metabolism and how well this shift correlates with our simulations.

top



NDCs Collected from Diverse Sources and Translated into Clinical Drug Codes

Author:
Linas Simonaitis, Regenstrief Institute

Abstract:
Background: National Drug Codes (NDCs) are often present in the medication order messages sent by pharmacy computer systems. However, NDCs are not suitable for clinical applications. Drug Knowledge Bases (DKBs) have been developed by four commercial enterprises, as well as the National Library of Medicine. Methods: We obtained NDCs from 12 data sources in the Indiana Network for Patient Care (INPC). We also examined copies of six DKBs. We calculated the rates at which NDCs from each data source were mapped to each DKB. Results: The majority of NDCs were successfully mapped to at least one DKB. However, an important minority of NDCs failed to be recognized by any DKB, largely because most health care institutions invent NDC-like codes for local use. Locally-invented NDC-like codes constituted a substantial fraction of all codes (range: 0.48% to 23%; median: 7.2%) when calculated on the basis of distinct codes. However, they constituted a lesser percentage of codes (range: 0.17% to 7.4%; median: 1.4%) when calculated on the basis of message volume. Discussion: Drug Knowledge Bases can successfully translate NDCs in pharmacy messages into a clinically-oriented set of codes. Health care institutions should limit the creation of locally-invented NDC-like codes.
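The mapping rate computed in the Methods reduces to a set intersection over distinct codes. A minimal sketch, using made-up NDC strings (the codes below are hypothetical, not drawn from the INPC data):

```python
def mapping_rate(source_codes, dkb_codes):
    """Fraction of a source's distinct NDC codes recognized by one drug knowledge base."""
    source = set(source_codes)
    return len(source & set(dkb_codes)) / len(source)

# Two plausible-looking NDCs plus one locally-invented NDC-like code
# (all three strings are fabricated for illustration):
source = ["00002-3227-30", "00069-4200-30", "99999-0001-01"]
dkb = ["00002-3227-30", "00069-4200-30"]
rate = mapping_rate(source, dkb)   # 2 of 3 distinct codes map
```

Weighting the same calculation by message volume rather than distinct codes gives the second set of percentages reported in the Results.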

top



Construction of 3D Morphological Imaging Atlases for Osteoarthritis

Authors:
Hussain Tameem and Usha Sinha, Medical Imaging Informatics, University of California at Los Angeles

Abstract:
3D magnetic resonance (MR) imaging of articular cartilage allows for accurate morphological assessment with relevance for identifying osteoarthritis (OA) status and, subsequently, monitoring progression and response to treatment. We propose the creation of morphological atlases of the cartilage using normal subjects stratified by age and sex. These atlases capture the variation of shape in normal subjects and are then used to classify new imaging studies as belonging to "normal" (asymptomatic of OA) or "abnormal" (symptomatic of OA) populations. The classification is performed by: 1) analysis of the 3D deformation field required to move imaging voxels to their corresponding locations in the atlas, such that deformations beyond 2 standard deviations of normal variation constitute regions with large morphological changes; and 2) generating active shape models from the normal subject data and using shape coefficients to classify cartilage morphology. The methodology is evaluated by building an atlas of 20 normal subjects in one sub-type and testing the classification potential with three symptomatic and three asymptomatic subjects.

top



Validation of Primary Care Provider Data in Hospital Admission Records

Authors:
Jacob Tripp and Stanley Huff, University of Utah, Salt Lake City, Utah

Abstract:
Healthcare is becoming increasingly specialized. Patients with chronic co-morbidities often are treated by numerous specialists, and primary care providers are forced to attempt to coordinate the care provided by all of these specialists. Often critical details of inpatient care episodes are not successfully communicated to outpatient providers charged with follow-up care. Vital to the successful communication of these details is the correct identification of who should be receiving this communication. This project describes various methods of validating Primary Care Provider identification in a large EMR that stores both inpatient and outpatient data. In 2006, 29,792 inpatient encounters were recorded, and of these, 20,291 had a primary care provider recorded upon admission or during the encounter. We would like to know how accurate these records are, and we are currently examining criteria including whether or not the patient later had an outpatient encounter with the provider identified as their primary care provider during the inpatient encounter.

top



Using Array Data Type to Store Microarray Data in an Object-Relational Database - PostgreSQL

Authors:
Lam C Tsoi and W Jim Zheng, Department of Biostatistics, Bioinformatics and Epidemiology, Medical University of South Carolina

Abstract:
A well-designed microarray database can provide valuable information on gene expression levels. However, designing an efficient microarray database with minimum space usage is not an easy task, since designers need to integrate the microarray data with information about genes, probe annotation, and the descriptions of each microarray experiment. Developing novel methods to store microarray data can greatly improve the efficiency and usefulness of such data. A novel schema is proposed to store microarray data using the array data type in an object-relational database management system, PostgreSQL. The implemented database stores all the microarray data from the same chip in a single variable-length array, which helps to increase data retrieval and space efficiency. By using the array data type to store all the experimental results for a probe, keeping annotation and expression values in the same table, the storage size of the database can be greatly reduced. Processing queries will also be more efficient, since the number of records in the table is minimized and accessing an array by index is very fast.

top



Adaptive Multi-Scale Stochastic Simulations for Chemical Reaction Systems

Authors:
Jesse H Turner, Dennis Cox, Rice University

Abstract:
Modeling the time evolution of a biochemical system in a cellular environment poses a challenge for quantitative biologists. Three major approaches model the time-dependent behavior of biochemical systems: ordinary differential equations (ODEs), stochastic differential equations (SDEs), and Gillespie's stochastic simulation algorithm.

Unfortunately, many chemical reaction systems exhibit characteristics not well captured individually by any of these methods. Therefore, a hybrid model incorporating aspects from all three must be employed. The aim is to construct one that is close in accuracy to Gillespie's algorithm, but comparable in computational speed to a differential equations approach. Then, applications to biological reaction systems would be more tractable.
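For reference, the exact method that a hybrid scheme must approximate can be stated compactly. A minimal sketch of Gillespie's direct method for the single degradation reaction X -> 0 (a toy system chosen for illustration, not one from the authors' work):

```python
import random

def gillespie_decay(n0, k, t_end, seed=0):
    """Gillespie's direct method for the reaction X -> 0 with rate constant k.

    Returns the trajectory as a list of (time, copy_number) pairs."""
    rng = random.Random(seed)
    t, n = 0.0, n0
    traj = [(t, n)]
    while n > 0:
        a = k * n                    # total propensity of the single reaction
        t += rng.expovariate(a)      # exponentially distributed waiting time
        if t > t_end:
            break
        n -= 1                       # fire the reaction
        traj.append((t, n))
    return traj

traj = gillespie_decay(n0=100, k=0.1, t_end=50.0)
```

Because the cost of the exact method grows with the number of reaction events, a hybrid scheme switches species with large copy numbers to ODE or SDE treatment and reserves event-by-event simulation for rare species, which is the efficiency trade-off the abstract targets.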

top



Statistical Image Processing for Hearing Aids

Authors:
Eilat Vardi-Gonen and Gabor T Herman, The Graduate Center, City University of New York

Abstract:
A novel image processing approach to signal processing for the hearing aid application is described. The goal of the work is to increase the intelligibility of noisy speech signals. Because the application is hearing aids, signal processing must be performed in real time. The methodology includes the following steps as the noisy speech signal arrives: 1) transform a section of the noisy signal into a noisy grayscale image column, 2) create a binary mask for the noisy grayscale column, 3) estimate a grayscale column that corresponds to clean speech, and 4) synthesize the estimated clean speech signal.
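Steps 2 and 3 can be sketched on a single spectrogram column. The energy-threshold mask below is an illustrative stand-in, since the abstract does not specify the mask criterion:

```python
def binary_mask(column, threshold):
    """Step 2 sketch: flag time-frequency cells whose magnitude exceeds a
    threshold (an assumed criterion, not the paper's actual mask rule)."""
    return [1 if v >= threshold else 0 for v in column]

def apply_mask(column, mask, floor=0.0):
    """Step 3 sketch: estimate the clean column by suppressing masked-out cells."""
    return [v if m else floor for v, m in zip(column, mask)]

# One hypothetical grayscale spectrogram column:
noisy = [0.1, 2.4, 0.05, 3.1, 0.2]
mask = binary_mask(noisy, threshold=1.0)   # -> [0, 1, 0, 1, 0]
clean = apply_mask(noisy, mask)            # -> [0.0, 2.4, 0.0, 3.1, 0.0]
```

In the real-time setting, each incoming signal section would produce one such column, be masked, and then be resynthesized into audio (step 4) before the next section arrives.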

top



A Regression Model for Predicting Conditional Survival for Head & Neck Cancer Patients

Authors:
Samuel J Wang1, Clifton D Fuller2, Dean F Sittig3, John M Holland1, Charles R Thomas1
1 Department of Radiation Medicine, Oregon Health & Science University
2 Department of Radiation Oncology, University of Texas Health Science Center
3 Applied Research in Medical Informatics, Northwest Permanente, PC

Abstract:
Conditional survival (CS) accounts for changing risk over time and is a more accurate estimate of prognosis for patients who have already survived a time period following diagnosis and treatment. The purpose of this study was to construct a statistical model and web-based tool to predict individualized CS for head & neck cancer survivors. Using 27,825 head & neck cancer patients diagnosed between 1988 and 1997 from the NCI Surveillance, Epidemiology, and End Results (SEER) 17 database, we built a multivariate Cox proportional hazards regression prediction model. Patient and tumor characteristics included as covariates were age, sex, race, tumor site, stage, and grade. The primary endpoint was overall CS. The model was validated for discrimination using the concordance index, and a calibration plot was constructed. Bootstrapping was used to correct for optimistic bias. A web-based software tool was built to calculate customized CS probability. The regression model showed good calibration and discrimination, with a bootstrap-corrected C-index of 0.71. For a 65-yr old white male with a moderately-differentiated tonsil cancer with regional lymph nodes, the predicted 5-yr overall CS increased from 50% at diagnosis to 63% after 3 years. This regression model can accurately predict CS for head & neck cancer patients.
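Given a fitted Cox model, the conditional survival calculation itself is a ratio of survival probabilities: CS(t | s) = S(s + t | x) / S(s | x). A minimal sketch with an invented baseline curve (the numbers below are illustrative and are not the fitted SEER model, though they are chosen to echo the 50% to 63% example):

```python
import math

def conditional_survival(base_surv, lp, s, t):
    """CS(t | survived s) under a Cox model.

    base_surv: {year: baseline survival S0(year)}; lp: the linear predictor
    x'beta for this patient.  Uses S(u|x) = S0(u)**exp(lp)."""
    hr = math.exp(lp)
    return base_surv[s + t] ** hr / base_surv[s] ** hr

# Hypothetical baseline survival curve (not from the paper):
S0 = {0: 1.00, 3: 0.60, 5: 0.50, 8: 0.378}

at_diagnosis = conditional_survival(S0, lp=0.0, s=0, t=5)   # 5-yr survival at diagnosis
after_3_yrs = conditional_survival(S0, lp=0.0, s=3, t=5)    # 5-yr CS having survived 3 yrs
```

Because early deaths thin out the highest-risk patients, the conditional estimate after three years exceeds the estimate at diagnosis, which is exactly the effect the abstract quantifies.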

top



Extracting Subject Demographics from Abstracts of Randomized Clinical Trial Reports

Authors:
Rong Xu, Yael Garten, Kaustubh S Supekar, Amar K Das, Russ B Altman, Alan M Garber, Stanford University

Abstract:
In order to make more informed healthcare decisions, consumers need information systems that deliver accurate and reliable information about their illnesses and potential treatments. Reports of randomized clinical trials (RCTs) provide reliable medical evidence about the efficacy of treatments. Current methods to access, search for, and retrieve RCTs are keyword-based, time-consuming, and suffer from poor precision. Personalized semantic search and medical evidence summarization aim to solve this problem. The performance of these approaches may improve if they have access to study subject descriptors (e.g. age, gender, and ethnicity), trial sizes, and diseases/symptoms studied.

We have developed a novel method to automatically extract such subject demographic information from RCT abstracts. We used text classification augmented with a Hidden Markov Model to identify sentences containing subject demographics, and subsequently these sentences were parsed using Natural Language Processing techniques to extract relevant information. Our results show accuracy levels of 82.5%, 92.5%, and 92.0% for extraction of subject descriptors, trial sizes, and diseases/symptoms descriptors respectively.
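Once a sentence has been classified as containing demographics, the final parsing step reduces to pattern extraction. The regex below is a simplified stand-in for the NLP techniques the abstract describes, shown only to make the extraction target concrete:

```python
import re

def extract_trial_size(sentence):
    """Illustrative pattern-based extraction of a trial size from a
    demographics sentence (the paper's actual parser is more sophisticated)."""
    m = re.search(r"\b(\d[\d,]*)\s+(?:patients|subjects|participants|men|women)\b",
                  sentence)
    return int(m.group(1).replace(",", "")) if m else None

size = extract_trial_size("A total of 1,250 patients were randomized to treatment.")
```

Subject descriptors such as age ranges, gender, and ethnicity would be extracted by analogous patterns over the sentences the classifier and HMM flag.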

top