Licensee Research Use of MEDLINE®/PubMed® Data
This report is sorted alphabetically by organization/institution. It lists NLM's licensees who have submitted
information about their use of the data for research and permitted NLM to make their information available on the Web.
Click on the Research Category title to view the summary report for that research category. General information about NLM's Web site for licensees' research
projects is available.
| 1 |
Advanced Health Media 2840 Morris Ave, Union, NJ 07083 | ||
|
Eric Johnson |
| ||
|
Research Category: Information Extraction or Retrieval Methods Research Project: IM2 KOL Identification Purpose/Goal: To identify therapeutic class experts to act as investigators or speakers for healthcare companies Results/Outcomes, e.g., data or schema/tools developed: We have developed a search and query interface for capturing the necessary information. While still in its infancy, we are working to expand our capabilities through this platform. Citations of related published papers: Web addresses to pertinent sites further describing the work and/or its results: http://www.insiteresearch.net | |||
| 2 |
Arity Corporation Research & Development | ||
|
Peter Gabel peter.gabel@arity.com |
Pamela Schaepe pamela.schaepe@arity.com | ||
|
Research Category: Information Extraction or Retrieval Methods Research Project: NLP Information Extraction for Biomedical Research & Development Purpose/Goal: It is critical for the pharmaceutical industry to accurately assess the enormous volume of literature for safety data. Natural Language Processing techniques should contribute significantly to this process. Results/Outcomes, e.g., data or schema/tools developed: Project in progress. Citations of related published papers: Web addresses to pertinent sites further describing the work and/or its results: | |||
| 3 |
ATA SpA - Advanced Technology Assessment | ||
|
Massimo Riccaboni info@atalab.com |
| ||
|
Research Category: Information Extraction or Retrieval Methods Research Project: Purpose/Goal: The focus of this project is to parse affiliations associated with MEDLINE/PubMed records and link them to GIS data in order to generate statistics on worldwide scientific productivity in the Life Sciences at different aggregation levels. Ad hoc text-mining techniques are being developed and tested with MEDLINE/PubMed, CatfilePlus, and Serfile data, in order to produce a set of tools to analyze unstructured and semi-structured information from scientific publications in the Life Sciences. Results/Outcomes, e.g., data or schema/tools developed: This is an ongoing project; preliminary results indicate that automated or semi-automated extraction and identification of geographical information from PubMed affiliations is possible, although a detailed investigation is still needed to optimize the extraction of pieces of information such as institution names. Improvements are also needed in the recognition of city names (with a special emphasis on Unicode compatibility). Citations of related published papers: Web addresses to pertinent sites further describing the work and/or its results: | |||
| 4 |
Canaledge Inc. | ||
|
Yoshiyuki Kobayashi yashi@canaledge.com |
Takao Asanuma asanuma@canaledge.com | ||
|
Research Category: Information Extraction or Retrieval Methods Research Project: Text-mining system development Purpose/Goal: Redistribution or provision of access to the results of our research to our customers. The following products or product components will be provided: 1. Relationship Data 2. Pathway Viewer 3. Technical Term Highlight Function 4. Tagged Corpus. The Medline data will be used internally for text mining. The research results produced by our text-mining process will be provided to our customers, but not the actual Medline citation data. We will not provide a commercial citation search service, such as is available via PubMed, using Medline data. Results/Outcomes, e.g., data or schema/tools developed: Still under development. Citations of related published papers: Web addresses to pertinent sites further describing the work and/or its results: | |||
| 5 |
Carnegie Mellon University | ||
|
Eric Nyberg |
| ||
|
Research Category: Information Extraction or Retrieval Methods Research Project: Purpose/Goal: Apply information extraction and semantic indexing techniques developed in the JAVELIN project to the MEDLINE data Results/Outcomes, e.g., data or schema/tools developed: Citations of related published papers: Web addresses to pertinent sites further describing the work and/or its results: | |||
| 6 |
ChemGenex Pharmaceuticals | ||
|
Jeremy Jowett jjowett@idi.org.au |
| ||
|
Research Category: Biological Knowledge Discovery Research Project: Data mining for functional information on positional candidate genes for diabetes and obesity Purpose/Goal: Human genetics research using large family pedigrees and genetic linkage analysis allows the identification of genomic intervals that are likely to contain disease-influencing genes. Each interval may contain as many as 300 genes. To rank genes for further analysis, we incorporate the PubMed data into a search engine to reveal functional information about each gene. Results/Outcomes, e.g., data or schema/tools developed: We have successfully used the database to assist in ranking candidate genes located within genetic intervals; analysis of the top-ranked genes is ongoing. Citations of related published papers: Web addresses to pertinent sites further describing the work and/or its results: | |||
| 7 |
Children's Hospital of Philadelphia | ||
|
Peter White white@genome.chop.edu |
| ||
|
Research Category: Information Extraction or Retrieval Methods Research Project: Mining the bibliome: Information extraction of the biomedical literature Purpose/Goal: Our goal is qualitatively better methods for automatically extracting information from the biomedical literature, relying on recent progress and new research in three areas: high-accuracy parsing, shallow semantic analysis, and integration of large volumes of diverse data. We are focusing initially on pediatric oncology. This application, worthwhile in its own right, provides an excellent test bed for broader research efforts in natural language processing and data integration. In particular, we propose to develop and test new general methods for information extraction from text, based on our ongoing research in corpus-based algorithms for parsing, predicate-argument analysis and reference resolution. We will apply these general methods to particular problems in biomedical information extraction. The engine of recent progress in language processing research has been linguistic data: text corpora, treebanks, lexicons, test corpora for information retrieval and information extraction, and so on. As part of the project, we are developing and publishing new linguistic resources in three categories: a large corpus of biomedical text annotated with syntactic structures (Treebank) and shallow semantic structures ("proposition bank" or Propbank); a large set of biomedical abstracts and full-text articles annotated with entities and relations of interest to researchers, such as enzyme inhibition or mutation/cancer connections (Factbanks); and broad-coverage lexicons and tools for the analysis of biomedical texts. Results/Outcomes, e.g., data or schema/tools developed: A general machine-learning algorithm for entity tagging of biological object classes. High-accuracy entity tagger instances for identifying genes, genomic variations, and malignancy types. A procedure for accurate normalization of gene mentions. 
Automated tagging of all MEDLINE content with the aforementioned taggers, and automated normalization of all gene mentions. Ability to generate lists of genes implicated in specific biological processes by biomedical text rapidly and more accurately than domain experts. A query interface with query expansion from normalization to search MEDLINE for mentions of human genes. Citations of related published papers: McDonald RT, Winters RS, Mandel M, Jin Y, White PS, Pereira F. An entity tagger for recognizing acquired genomic variations in cancer literature. Bioinformatics, 20:3249-3251, 2004. McDonald R, Pereira F, Kulick S, Winters RS, Jin Y, White PS. Simple algorithms for complex relation extraction with applications to biomedical IE. Proceedings of the Association for Computational Lin Jin Y, McDonald RT, Klerman K, Mandel MA, Liberman MY, Pereira F, Winters RS, White PS. Identifying and extracting malignancy types in cancer literature. Proceedings of BioLINK 2005, in press. Web addresses to pertinent sites further describing the work and/or its results: http://bioie.ldc.upenn.edu/ http://fable.chop.edu | |||
| 8 |
CINECA | ||
|
Andrew Emerson a.emerson@cineca.it |
| ||
|
Research Category: Biological Knowledge Discovery Research Project: MedMOLE - Mining On-Line Expert on MedLine Purpose/Goal: DNA microarray technology is a high-throughput method for gaining information on gene function. This large amount of data can be analyzed to identify groups of genes that share common expression characteristics, but the results provide little information about functional biological correlations among the genes within a cluster. The published literature, on the other hand, provides a potential source of information to assist in the interpretation of clustering results. We have developed a tool (MedMOLE) that improves the comprehension of microarray experimental results by grouping co-regulated genes on the basis of the informational content of MEDLINE documents. The tool relies on two components: a gene name extractor and a mining algorithm. The name extractor is based on existing dictionaries of gene names and aliases. The mining algorithm analyses the co-occurrences of words in the selected documents in order to automatically interpret the context, identify where the gene names appear, and map documents/genes into functional classes. Results/Outcomes, e.g., data or schema/tools developed: The MedMOLE tool is freely available on the web. Citations of related published papers: Web addresses to pertinent sites further describing the work and/or its results: http://medmole.cineca.it/ | |||
| 9 |
Columbia University | ||
|
Stephen Johnson sbj2@columbia.ed |
| ||
|
Research Category: Information Extraction or Retrieval Methods Research Project: CIQR Purpose/Goal: CIQR attempts to fill gaps in physician knowledge. To do this, it is vital to understand the goals and environment of the users who are seeking information. Information seeking strategies are complex plans for finding information that go far beyond simple queries based on a few keywords. Strategies are efficient in prioritizing where to look for information, and adaptive in the sense that finding too little or too much material can lead to trying another method. There is evidence that experts are far better at constructing successful strategies, but more research needs to be conducted in the health care domain on the analysis and representation of techniques for seeking information. We will experiment with automating complex search strategies on the Medline database. Results/Outcomes, e.g., data or schema/tools developed: Citations of related published papers: Web addresses to pertinent sites further describing the work and/or its results: http://lucid.cpmc.columbia.edu/ciqr/ | |||
| 10 |
David Calloway | ||
|
David Calloway calloway@novatechnologies.net |
| ||
|
Research Category: Information Extraction or Retrieval Methods Research Project: wikipdf Purpose/Goal: This research is primarily targeted at helping students and researchers understand technical papers that are often littered with unfamiliar terms and domain-specific jargon. The wikipdf tool is an online web site (www.wikipdf.com) that automatically generates glossaries of terms extracted from Adobe PDF journal articles. The glossaries include short (~1 paragraph) definitions of all rare and/or unusual words, along with links to Wikipedia for complete, detailed descriptions. Results/Outcomes, e.g., data or schema/tools developed: This project is currently under development, but a prototype can be found at www.wikipdf.com. Citations of related published papers: Web addresses to pertinent sites further describing the work and/or its results: http://www.wikipdf.com | |||
| 11 |
EMBL-European Bioinformatics Institute Rebholz Group | ||
|
Peter Stoehr |
| ||
|
Research Category: Information Extraction or Retrieval Methods Research Project: Whatizit Purpose/Goal: The group focuses on the extraction of facts from scientific literature in molecular biology. This is mainly based on, but not limited to, pattern matching and other high-throughput methods. The group has experience in chunk parsing and natural language processing (NLP), and has applied its methods to different tasks (refer to publications). These include the identification of terminology, abbreviations, mutations, and relations between named entities, e.g., protein-protein interactions. Results/Outcomes, e.g., data or schema/tools developed: Two tools have been developed and are freely available for use, 'Whatizit' and 'EBIMed'. 'Whatizit' employs a pipeline of filters to enrich plain text with annotation. Interesting pieces of text (e.g., gene and protein names, GO terms) are wrapped in XML elements. Some of the XML tags applied carry link information to biological databases. 'EBIMed' combines information retrieval and extraction from Medline. EBIMed finds MEDLINE abstracts and analyses them to offer a complete overview of associations between UniProt protein/gene names, GO annotations, Drugs and Species. The results are shown in a table that displays all the associations and links to the sentences that support them and to the original abstracts. Citations of related published papers: Rebholz-Schuhmann, D., Kirsch, H. and Couto, F. Facts from text-is text mining ready to deliver? PLoS Biol. 2005, 3 (2): e65. PMID: 15719064 Web addresses to pertinent sites further describing the work and/or its results: http://www.ebi.ac.uk/Rebholz-srv/whatizit/form.jsp http://www.ebi.ac.uk/Rebholz-srv/ebimed/index.jsp http://www.ebi.ac.uk/Rebholz/ | |||
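The "pipeline of filters" idea in the entry above can be illustrated with a minimal sketch. It is not Whatizit's implementation or output format: the element names (`protein`, `go`), the `ref` attribute, the function names, and the tiny dictionaries are all assumptions made for illustration. Each filter wraps dictionary terms in an XML element carrying a database reference, and later filters skip over tags inserted by earlier ones.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Splits annotated text into tags (odd indices) and plain text (even indices).
TAG_SPLIT = re.compile(r"(<[^>]+>)")


def make_filter(tag, dictionary):
    """Build a filter that wraps each dictionary term in <tag ref="...">.

    `dictionary` maps a surface term to a database identifier (hypothetical
    data; a real system would load curated term lists).
    """
    # Longest terms first so that longer names win over their substrings.
    terms = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b")

    def wrap(match):
        term = match.group(0)
        return f"<{tag} ref={quoteattr(dictionary[term])}>{term}</{tag}>"

    def apply(text):
        parts = TAG_SPLIT.split(text)
        # Only rewrite the text between tags; leave existing tags untouched.
        return "".join(
            part if i % 2 else pattern.sub(wrap, part)
            for i, part in enumerate(parts)
        )

    return apply


def annotate(text, filters):
    """Escape plain text for XML, then run it through each filter in order."""
    out = escape(text)
    for f in filters:
        out = f(out)
    return out


# Illustrative dictionaries (assumed, not taken from the tool).
protein_filter = make_filter("protein", {"p53": "UniProt:P04637"})
go_filter = make_filter("go", {"apoptosis": "GO:0006915"})

annotated = annotate("p53 induces apoptosis & arrest", [protein_filter, go_filter])
print(annotated)
```

Matching here is case-sensitive and purely dictionary-based; terms containing characters that XML escaping rewrites (such as `&`) would need extra handling.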
| 12 |
Erasmus MC Department of Urology | ||
|
Guido Jenster g.jenster@erasmusmc.nl |
| ||
|
Research Category: Biological Knowledge Discovery Research Project: Gene information extraction from MEDLINE Purpose/Goal: Two problems we encounter are i) the need for detailed information on the large numbers of genes-of-interest that are derived from high-throughput data analyses, and ii) automated extraction of this gene-information from the MEDLINE literature database. The main objectives of our research project are to summarize knowledge and extract new knowledge from the combined gene-information from MEDLINE. To achieve this, we extract gene-information from MEDLINE based on gene-concept co-publication and combine these data for users to mine. Results/Outcomes, e.g., data or schema/tools developed: We have generated search strings for 15,621 genes based on full gene name, symbol and aliases and identified the abstracts in which they were mentioned. In addition, thesauri for diseases, tissues, molecular functions, biological processes, and cellular components were generated and used to extract the MEDLINE abstracts in which each concept of each thesaurus was mentioned. Co-publication matrices of gene-concept pairs were generated. An interface called CoPub Mapper was constructed to mine the co-publication data. Citations of related published papers: Alako BT, Veldhoven A, van Baal S, Jelier R, Verhoeven S, Rullmann T, Polman J, Jenster G. CoPub Mapper: mining MEDLINE based on search term co-publication. BMC Bioinformatics. 2005;6(1):51. PMID: 15760478 Web addresses to pertinent sites further describing the work and/or its results: http://www.erasmusmc.nl/gatcplatform | |||
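As a rough illustration of the co-publication counting described in the entry above, the sketch below counts how many abstracts mention both a gene and a thesaurus concept. It is not the CoPub Mapper code: the function names, the word-boundary matching strategy, and the sample abstracts and thesauri are assumptions chosen to keep the example small.

```python
import re
from collections import Counter
from itertools import product


def compile_thesaurus(thesaurus):
    """Map each entry name to a regex matching any of its search strings."""
    return {
        name: re.compile(
            r"\b(?:" + "|".join(re.escape(s) for s in strings) + r")\b",
            re.IGNORECASE,
        )
        for name, strings in thesaurus.items()
    }


def mentioned(text, compiled):
    """Return the names of all thesaurus entries mentioned in the text."""
    return {name for name, pattern in compiled.items() if pattern.search(text)}


def copublication_counts(abstracts, genes, concepts):
    """Count abstracts in which a gene and a concept are both mentioned."""
    gene_patterns = compile_thesaurus(genes)
    concept_patterns = compile_thesaurus(concepts)
    counts = Counter()
    for text in abstracts:
        found_genes = mentioned(text, gene_patterns)
        found_concepts = mentioned(text, concept_patterns)
        # Each abstract contributes at most one count per gene-concept pair.
        counts.update(product(sorted(found_genes), sorted(found_concepts)))
    return counts


# Hypothetical sample data: search strings built from name, symbol, aliases.
genes = {"AR": ["AR", "androgen receptor"], "TP53": ["TP53", "p53"]}
concepts = {
    "prostate cancer": ["prostate cancer", "prostatic carcinoma"],
    "breast cancer": ["breast cancer"],
}
abstracts = [
    "The androgen receptor (AR) is overexpressed in prostate cancer.",
    "TP53 mutations are frequent in prostate cancer and breast cancer.",
    "AR signalling was studied in cell lines.",
]

counts = copublication_counts(abstracts, genes, concepts)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

Word-boundary matching keeps the short symbol "AR" from matching inside "are"; real gene symbols are far more ambiguous than this, so a production system would need additional disambiguation.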
| 13 |
Food Industry Research & Development Institute | ||
|
Wang Chun-lin clw@firdi.org.tw |
| ||
|
Research Category: Biological Knowledge Discovery Research Project: Purpose/Goal: MEDLINE/PubMed is the most valuable biological/medical text resource in the world. Numerous bioresources are reported in MEDLINE/PubMed. The objective of our ongoing research project is to establish knowledge of bioresource (mainly microbial) usage in the MEDLINE/PubMed text. Results/Outcomes, e.g., data or schema/tools developed: The project is still running, and we are trying to find microbes published in PubMed that are available in our local depository. Citations of related published papers: Web addresses to pertinent sites further describing the work and/or its results: | |||
| 14 |
Fujitsu Limited Makuhari Systems Laboratory | ||
|
Shuhei Kinoshita kino@strad.ssg.fujitsu.com |
Masato Mori masatom@strad.ssg.fujitsu.com | ||
|
Research Category: Information Extraction or Retrieval Methods Research Project: Bio Chemical Information Project Purpose/Goal: Our goal is to find therapeutic target proteins and to design new drugs by in silico methods. Results/Outcomes, e.g., data or schema/tools developed: About 3.5 million items of PPI or PCI information have been extracted from Medline abstracts by our original method. The accuracy is about 80%. Some therapeutic target proteins have been extracted by our biologists. Citations of related published papers: Kinoshita S, Cohen DB, Ogren PV, Hunter L. BioCreAtIvE Task1A: entity identification with a stochastic tagger. BMC Bioinformatics 2005; PMID: 15960838 Web addresses to pertinent sites further describing the work and/or its results: | |||
| 15 |
Marquette University | ||
|
Craig Struble craig.struble@marquette.edu |
| ||
|
Research Category: Information Extraction or Retrieval Methods Research Project: Purpose/Goal: We have two primary research projects using MEDLINE. The first is to identify experimental techniques used in research articles. The motivation is to allow investigators to assess the quality of research results in a paper based on the techniques used. The other project is to extract information about protein kinase inhibitors. The objective of this project is to construct a database of protein kinase inhibitor information for drug screening and design purposes. Results/Outcomes, e.g., data or schema/tools developed: No results have been published yet, but we expect software to be released as a result of both projects. Citations of related published papers: Web addresses to pertinent sites further describing the work and/or its results: | |||
| 16 |
Monash University and The Beddoes Group | ||
|
Adam Tucker adam.tucker@med.monash.edu.au |
mail@allori.org | ||
|
Research Category: Other Research Project: Anaesthesia and Medline Purpose/Goal: The aim is to get a better appreciation of the data that describes our field and other fields that relate to anaesthesia Results/Outcomes, e.g., data or schema/tools developed: Descriptive material including text and diagrams (maps) Citations of related published papers: Web addresses to pertinent sites further describing the work and/or its results: | |||
| 17 |
NAIST | ||
|
Kouichi Doi doy@is.naist.jp |
| ||
|
Research Category: Information Extraction or Retrieval Methods Research Project: Purpose/Goal: Our overall goal is the creation of a database of protein-protein interactions. Our subgoals are as follows: 1. named entity recognition of proteins, 2. classification of verbs or verb phrases, 3. extraction of abbreviations. Results/Outcomes, e.g., data or schema/tools developed: Citations of related published papers: Web addresses to pertinent sites further describing the work and/or its results: | |||
| 18 |
Nara Institute of Science and Technology | ||
|
Yuji Matsumoto |
Masashi Shimbo | ||
|
Research Category: Information Extraction or Retrieval Methods Research Project: Purpose/Goal: Our research project aims to build a user-friendly retrieval system for scientific/medical literature. MEDLINE is used as the target literature database, as well as to evaluate the effectiveness of various text mining/machine learning methods that we use to extract additional information useful to the users of the retrieval system. Results/Outcomes, e.g., data or schema/tools developed: Citations of related published papers: M. Shimbo, T. Yamasaki, and Y. Matsumoto. Sentence Role Identification in Medline Abstracts: Training Classifier with Structured Abstracts. In: Active Mining, LNAI 3430, Springer, 2005. Web addresses to pertinent sites further describing the work and/or its results: | |||
| 19 |
National Cheng-Kung University Department of Computer Science and Information Engineering | ||
|
Wen-Hsiang Lu whlu@mail.ncku.edu.tw |
| ||
|
Research Category: Information Extraction or Retrieval Methods Research Project: MMODE: Cross-Language Medical Information Retrieval for Consumers Purpose/Goal: In the past decade, cross-language information retrieval (CLIR) has become an important research topic in the field of information retrieval. Many methods have been proposed to deal with the problems of query translation, including machine translation (MT) systems, bilingual dictionaries, and parallel and comparable corpora. However, these methods are not fully appropriate for direct application to cross-language medical information retrieval (CLMIR). For example, general-purpose machine translation systems and bilingual dictionaries might be incapable of translating medical terms, and large-scale medical parallel corpora are rarely available for increasing translation coverage. Recently, some CLMIR research has effectively handled these problems by using high-quality MT systems, comparable corpora, or a combination of an MT system and a multilingual medical thesaurus. MeSH has been shown to be an effective thesaurus for CLMIR. However, manual lexicography is time-consuming and not cost-effective. We have proposed an effective Web-based term translation method to enhance the coverage of translations of diverse unknown (new) Web terms. We have also compiled over 19,000 entries (including all of the 9,646 disease terms) of the Chinese-English MeSH using our proposed Web-based term translation method. Based on this bilingual thesaurus, we have started developing a practical cross-language medical metasearch engine. Results/Outcomes, e.g., data or schema/tools developed: We developed MMODE (Multilingual Medical Online Data Explorer), a prototype cross-language medical metasearch engine that provides convenient cross-language search services for disease information from English source websites, including PubMed, MedlinePlus, NLH QA service, and Google. 
MMODE provides a bilingual MeSH tree for consumers to clarify their information needs by navigating the Chinese-English disease terms. MMODE translates users' queries into English, sends the English queries to the source websites, and retrieves relevant articles, news, and images from these sites. To help users understand the retrieved English documents, MMODE labels the English medical terms with the corresponding Chinese translation in the documents. Citations of related published papers: Lu WH, Lin SJ, Chan YC, Chen KH. Semi-Automatic Construction of the Chinese-English MeSH Using Web-Based Term Translation Method. Proc AMIA Symp. 2005. Web addresses to pertinent sites further describing the work and/or its results: http://mmode.twbbs.org/mmodeurl2 | |||
| 20 |
National University of Singapore | ||
|
Ailing Zhu |
Jian Li | ||
|
Research Category: Biological Knowledge Discovery Research Project: Purpose/Goal: Find disease candidate genes. Results/Outcomes, e.g., data or schema/tools developed: NA Citations of related published papers: Web addresses to pertinent sites further describing the work and/or its results: | |||
| 21 |
Novo Nordisk A/S DK-2880 Bagsvaerd | ||
|
Henning Nielsen hepn@novonordisk.com |
| ||
|
Research Category: Biological Knowledge Discovery Research Project: Purpose/Goal: Novo Nordisk is a Danish-based research, pharmaceutical and biotech company. Our main research focus is on the therapy areas Diabetes, Haemophilia, Growth Hormone, Hormone Replacement Therapy and Industrial Enzymes. MEDLINE is a main source of literature for our Research and Development Organisation and is made available through a secure company intranet to all our researchers. MEDLINE is incorporated in a cross-searchable database set-up consisting of the main biomedical databases: MEDLINE, Embase, Biosis and Current Contents. Searching is based on Livelink Discovery Server, a bibliographic retrieval system. This set-up allows us to run profiles and create alerts in the database in house. MEDLINE data is used in scientific research in the above-mentioned areas. The data found in MEDLINE is used to establish the state of the art and to survey research areas of interest to Novo. We are currently investigating the possibilities of data mining MEDLINE data for greater benefit in our research. Results/Outcomes, e.g., data or schema/tools developed: Citations of related published papers: Web addresses to pertinent sites further describing the work and/or its results: http://www.novonordisk.com/science/default.asp | |||
| 22 |
OmniViz, Inc. | ||
|
Jeffrey Saffer |
| ||
|
Research Category: Information Extraction or Retrieval Methods Research Project: Development of Visualization Software Purpose/Goal: To develop means of quickly understanding large collections of biomedical literature, thus enabling researchers to gain valuable context for research decisions. Results/Outcomes, e.g., data or schema/tools developed: Software for visualizing MEDLINE data. Citations of related published papers: Web addresses to pertinent sites further describing the work and/or its results: | |||
| 23 |
Polish Academy of Sciences Institute of Biochemistry & Biophysics | ||
|
Pawel Siedlecki |
| ||
|
Research Category: Information Extraction or Retrieval Methods Research Project: Purpose/Goal: Using the MEDLINE abstract database to mine amino acid sequence-related data that could help to annotate or discover new functions. Automation of data mining is sought, and a query builder with various elements is the ultimate goal. Using the system, one can automatically list abstracts related to the query sequences and make comparisons to other subsets. Results/Outcomes, e.g., data or schema/tools developed: A tool is under preparation that will allow users to automatically list abstracts related to the query sequences and make comparisons to other subsets. Citations of related published papers: Web addresses to pertinent sites further describing the work and/or its results: | |||
| 24 |
Public Health Genetics Unit, Cambridge | ||
|
Julian Higgins julian.higgins@mrc-bsu.cam.ac.uk |
Roger Hale roger.hale@linguamatics.com | ||
|
Research Category: Information Extraction or Retrieval Methods Research Project: Purpose/Goal: The project comprises an investigation of our collaborator's (Linguamatics) I2E text-mining system and its ability to identify and extract information from records of studies in human genome epidemiology. Results/Outcomes, e.g., data or schema/tools developed: The project will start proper in November 2005. Citations of related published papers: Web addresses to pertinent sites further describing the work and/or its results: http://www.phgu.org.uk http://www.linguamatics.com | |||
| 25 |
Spanish National Biotechnology Center (CNB-CSIC) Protein Design Group | ||
|
Martin Krallinger martink@cnb.uamm.es |
martingenetech@yahoo.com | ||
|
Research Category: Information Extraction or Retrieval Methods Research Project: NATURAL LANGUAGE PROCESSING STRATEGIES Purpose/Goal: Develop data sets and strategies to automatically extract functional (annotation-relevant) descriptions of proteins. Results/Outcomes, e.g., data or schema/tools developed: A tool for retrieval of annotation-relevant text passages has been implemented, based in part on statistical analysis of a collection of PubMed abstracts associated with proteins and functional concepts (Gene Ontology Annotation). Citations of related published papers: Web addresses to pertinent sites further describing the work and/or its results: http://www.pdg.cnb.uam.es/martink/ | |||
| 26 |
SUNY Stony Brook Dept. of Computer Science | ||
|
Steven Skiena skiena@cs.sunysb.edu |
| ||
|
Research Category: Information Extraction or Retrieval Methods Research Project: Purpose/Goal: Our research project revolves around several aspects of improving our Lydia text analysis system (www.textmed.org) to analyze PubMed abstracts: -- Constructing and annotating canonical lists of gene names, drugs, etc. -- Improving the accuracy of text markup on PubMed abstracts. -- Building and analyzing the graph of interesting relationships between these entities. -- Extracting the nature of relationships ("regulates", "inhibits", "treats") identified between drugs, genes, and diseases. -- Identifying acronyms, abbreviations, and other aliases for entities. -- Clustering drugs, genes, and diseases by similarity based on the associations they share. -- Proposing new drug candidates for a given disease based on similarities of references. Results/Outcomes, e.g., data or schema/tools developed: Citations of related published papers: Web addresses to pertinent sites further describing the work and/or its results: http://www.textmed.org | |||
| 27 |
U TX-Houston School of Health Information Sciences | ||
|
Elmer Bernstam Elmer.V.Bernstam@uth.tmc.edu |
| ||
|
Research Category: Information Extraction or Retrieval Methods Research Project: MedlineQBE Purpose/Goal: Information overload is no longer a theoretical concept, but a real impediment to education, research and patient care. Our long term goal is to improve patient care by providing better information retrieval tools to students, researchers and clinicians. Our unifying hypothesis is that techniques pioneered on the World Wide Web (WWW) can be successfully adapted to the combination of MEDLINE and the Science Citation Index (SCI) to improve information retrieval. The combination of MEDLINE and SCI is a hyperlinked environment similar to the WWW. Therefore, successful WWW algorithms can be applied to identify the most important and relevant articles to fulfill users' information needs. This research is a continuation of work performed during NLM fellowship training at Stanford where we designed and implemented the Medline Query-by-Example (MQBE) computational framework. The MQBE framework is being used to minimize the human effort required to implement and evaluate information retrieval strategies. MQBE has been enhanced to store statements of information need, queries and relevance judgments of real users who are using the system to satisfy real information needs. A new evaluation methodology developed by Joachims that uses clickthroughs to compare alternative search strategies will be employed to compare algorithms to each other. In addition, the usage data can subsequently be used to associate queries with "interesting" articles as identified by previous users; an approach based on collaborative filtering. Our collaborators include researchers from Vanderbilt (Dr. Constantin Aliferis and his lab) and Oregon Health Sciences Universities (Dr. William Hersh). 
Results/Outcomes, e.g., data or schema/tools developed: The Medline Query-By-Example framework abstracts away the user and database interfaces and provides well-defined interfaces for results ranking, retrieval and similarity-determination algorithms. The Qubec information retrieval system is based on the MedlineQBE framework. Citations of related published papers: Bernstam EV, Herskovic JR, Aphinyanaphongs Y, Aliferis CF, Sriram MG, Hersh WR. Using citation data to improve retrieval from MEDLINE. J Am Med Inform Assoc, in press. PMID: 16221938 Herskovic JR, Bernstam EV. Using incomplete citation data for MEDLINE results ranking. Proceedings of the AMIA Fall Symposium 2005, pp. 316-20. Web addresses to pertinent sites further describing the work and/or its results: | |||
| 28 |
UCLA | ||
|
Robert Bilder rbilder@mednet.ucla.edu |
| ||
|
Research Category: Ontologies or Classification Schema Research Project: Cognitive Phenotyping for Neuropsychiatric Therapeutics Purpose/Goal: This project, as part of the NIH Roadmap Initiative, aims to accelerate discovery and overcome obstacles that have hindered research specifically in the discovery of causes and treatments for neuropsychiatric syndromes. As part of this work, we are developing ontologies for complex neuropsychiatric syndromes at syndromal, symptom, cognitive, and neuroanatomic levels, to be linked to other sources of biological knowledge. We are also developing tools for mapping literatures and extracted concepts onto the human brain. Results/Outcomes, e.g., data or schema/tools developed: We have developed some controlled vocabularies for syndromes, symptoms, cognition and anatomic terms, with class hierarchies representing many of these concepts. We have conducted some 'proof of concept' mappings of specific concepts (e.g., 'schizophrenia AND cognition') onto the human brain, using a knowledgebase of functional imaging studies and concept associations between syndromal and cognitive function levels. We have developed the initial specifications of a knowledgebase schema for a 'cognitive atlas' that will help 'register' concepts across widely different levels of investigation, and are currently developing a pipeline of tools for literature-mining, based on this architecture. Citations of related published papers: Bilder R. Cognitive Phenomics for Neuropsychiatric Therapeutics. Schiz Bull, 2005; 31(2), 318-319. Web addresses to pertinent sites further describing the work and/or its results: http://www.phenomics.ucla.edu | |||
| 29 |
University Health Network Jurisica Lab | ||
|
David Otasek |
| ||
|
Research Category: Information Extraction or Retrieval Methods Research Project: Purpose/Goal: Automated extraction of protein-protein interactions from Medline/PubMed abstracts. Results/Outcomes, e.g., data or schema/tools developed: Citations of related published papers: Web addresses to pertinent sites further describing the work and/or its results: | |||
| 30 |
University Health Network, Microarray Centre MaRS Centre, Toronto Medical Discovery Tower, 9-308G | ||
|
LU ZHIBIN zlu@uhnresearch.ca |
| ||
|
Research Category: Biological Knowledge Discovery Research Project: Purpose/Goal: Finding homologs, placement in gene network through identification of protein-protein interactions and general knowledge discovery of proteins. Results/Outcomes, e.g., data or schema/tools developed: Citations of related published papers: Web addresses to pertinent sites further describing the work and/or its results: http://data.microarrays.ca http://www.microarrays.ca | |||
| 31 |
University of Arizona The Artificial Intelligence Lab | ||
|
Hsinchun Chen hchen@eller.arizona.edu |
Cathy Larson cal@eller.arizona.edu | ||
|
Research Category: Biological Knowledge Discovery Research Project: GeneScene (please note that the name of this project is undergoing revision and will soon change) Purpose/Goal: The research goal of GeneScene is to develop novel Natural Language Processing (NLP) techniques to support efficient and effective text analysis in biomedical fields, particularly the analysis of genetic regulatory pathways, which is crucial for a thorough understanding of biological processes such as gene regulation and cancer development. GeneScene also aims to form an integrated framework for pathway-related knowledge representation and visualization using a combination of different approaches. The ultimate goal of GeneScene is to provide biomedical researchers with a platform for pathway-related literature abstraction, data analysis and knowledge integration, thereby supporting hypothesis development and scientific discovery. Results/Outcomes, e.g., data or schema/tools developed: We have developed shallow and full parsing-based natural language processing (NLP) techniques to extract pathway relations from the PubMed abstracts. They either use templates based on closed-class words (prepositions) and capture relations between noun phrases (shallow parser), or use a broad-coverage syntactic-semantic hybrid grammar to identify verb relations (full parser). To increase precision, both approaches use relevant biomedical lexicons such as GO, HUGO, and UMLS to filter the extracted relations. With these techniques the parsers can reach a precision of up to 90.8% and a recall of up to 61%. To organize the extracted raw pathway relations and to form an integrated framework for pathway-related knowledge representation, we developed a feature decomposition approach to biomedical concept matching and relation aggregation. Features are attributes assigned to the extracted entities and connectors.
Feature synonyms are stored in lexicons built upon biological databases and ontologies such as RefSeq, EntrezGene, HUGO, SGD, and Gene Ontology. A BioAggregate Tagger then uses a decompositional approach to identify key features in extracted name strings, aggregate multiple references to the same substance or function, and consolidate the relations at different granularities. Our studies indicated promising network consolidation, and extracted information can also be matched to external resources by this approach. We also developed a GeneScene Visualizer to support searching and browsing of the regulatory pathways. It allows users to search for specific genes, to browse retrieved relations, to display them as networks with different levels of detail, and to access the underlying PubMed abstract. Citations of related published papers: Leroy G, Chen H, Martinez JD. A shallow parser based on closed-class words to capture relations in biomedical text. J Biomed Inform. 2003 Jun;36(3):145-58. PMID: 1461522 McDonald DM, Chen H, Su H, Marshall BB. Extracting gene pathway relations using a hybrid grammar: the Arizona Relation Parser. Bioinformatics. 2004 Dec 12;20(18):3370-8. PMID: 15256411 Marshall R, Su H, McDonald D, Chen H. Linking ontological resources using aggregatable substance identifiers to organize extracted relations. Pac Symp Biocomput. 2005;:162-73. PMID: 15759623 Web addresses to pertinent sites further describing the work and/or its results: http://genescene.arizona.edu/index.html http://ai.eller.arizona.edu/research/bioinformatics/index.htm | |||
| 32 |
University of Oklahoma 101 David L. Boren Blvd. | ||
|
Jonathan Wren Jonathan.Wren@OU.edu |
| ||
|
Research Category: Biological Knowledge Discovery Research Project: Purpose/Goal: 1) Knowledge discovery using entities found within MEDLINE - for both shared relationship analysis (e.g. to find commonalities for microarray responders) and implicit relationship analysis (e.g. to discover previously unknown/undocumented relationships). 2) Searching for sequence data within abstracts (e.g. immune epitopes). 3) Extract URLs from abstracts for accessibility analysis. 4) Extract citation information to see if such information can be located online (i.e. via Google). Results/Outcomes, e.g., data or schema/tools developed: 1) IRIDESCENT software developed for knowledge discovery & microarray analysis 2) SRCA software developed to extract sequence (e.g. protein, DNA) data from text and identify associated parameters 3) Rates of URL decay in MEDLINE established 4) Study conducted regarding journal articles that can be found online at non-journal websites Citations of related published papers: Wren JD, Bekeredjian R, Stewart JA, Shohet RV, Garner HR "Knowledge discovery by automated identification and ranking of implicit relationships" Bioinformatics 2004 Feb;20(3): 389-98 Wren JD "404 Not Found: The Stability and Persistence of URLs Published in MEDLINE" Bioinformatics 2004 Mar; 20(5): 668-72 Wren JD "Open access and openly accessible: A study of scientific publications shared via the Internet" British Medical Journal 2005 May 14; 330(7500):1128-31 Web addresses to pertinent sites further describing the work and/or its results: http://faculty-staff.ou.edu/W/Jonathan.D.Wren-1/ http://www.sci-tech-today.com/story.xhtml?story_id http://www.nature.com/nature/journal/v428/n6983/full/428592a_fs.html | |||
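Goal 3 in the University of Oklahoma entry above, extracting URLs from abstracts, might look roughly like the following. This is a minimal sketch and not the licensee's code: the regular expression, the `extract_urls` name and the trailing-punctuation rule are assumptions made for illustration, and the later accessibility check (requesting each URL) is left out.

```python
import re
from typing import List

# Deliberately simple pattern: a scheme followed by a run of characters
# that are not whitespace or angle brackets.
URL_PATTERN = re.compile(r"(?:https?|ftp)://[^\s<>]+", re.IGNORECASE)


def extract_urls(abstract: str) -> List[str]:
    """Return the distinct URLs found in one abstract, in order of appearance."""
    urls: List[str] = []
    for match in URL_PATTERN.finditer(abstract):
        # Trim sentence punctuation that tends to cling to the end of a URL.
        url = match.group(0).rstrip(".,;:)]}")
        if url not in urls:
            urls.append(url)
    return urls
```

Trimming trailing punctuation is a heuristic: it would also shorten a URL that legitimately ends in one of those characters, so a real study would need to validate the extracted strings.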
| 33 |
University of Sheffield Department of Computer Science | ||
|
Angus Roberts |
| ||
|
Research Category: Other Research Project: CLEF postgraduate studentship Purpose/Goal: Research the use of corpus-based lexical relation extraction techniques for the extraction of meronyms (part-whole relations) from biomedical texts, including research papers and clinical records. Lexico-syntactic patterns that encode meronyms are iteratively learned from texts, and applied to further texts in order to expand the set of known meronyms. The part-whole relation is of particular importance in biomedicine, and the newly learned relations could be used, for example, in automated ontology construction. Results/Outcomes, e.g., data or schema/tools developed: Prototype relation extraction tools; development ongoing. Citations of related published papers: Roberts, A. Learning Meronyms from Biomedical Text. Association of Computational Linguistics 2005, Proceedings of the Student Research Workshop, 49-54 Roberts, A. Learning Parts and Wholes from Biomedical Texts. Proceedings of the 8th Research Colloquium of the UK special-interest group in Computational Linguistics (CLUK-05), 63-70 Web addresses to pertinent sites further describing the work and/or its results: http://www.dcs.shef.ac.uk/~angus/phd.html | |||
| 34 |
University of Texas at Austin Center for Computational Biology and Bioinformatics | ||
|
Edward Marcotte |
| ||
|
Research Category: Information Extraction or Retrieval Methods Research Project: Purpose/Goal: We plan to mine the MEDLINE data to develop new information extraction methods and to enhance our knowledge of protein-protein interactions. Results/Outcomes, e.g., data or schema/tools developed: Citations of related published papers: Web addresses to pertinent sites further describing the work and/or its results: | |||
| 35 |
UT Southwestern Medical Center at Dallas NA2.226 | ||
|
Harold Garner |
Justin Hicks | ||
|
Research Category: Biological Knowledge Discovery Research Project: INNOVATION LABS DATA MINING Purpose/Goal: 1) Develop a more effective way of searching for relevant citations in MEDLINE. 2) Develop a tool to automatically find novel and potentially biologically meaningful relationships. Results/Outcomes, e.g., data or schema/tools developed: To be updated. Citations of related published papers: Knowledge discovery by automated identification and ranking of implicit relationships. Bioinformatics. 2004 Feb 12;20(3):389-98. Epub 2004 Jan. PMID: 14960466 Shared relationship analysis: ranking set cohesion and commonalities within a literature-derived relationship network. PMID: 14734310 Web addresses to pertinent sites further describing the work and/or its results: http://invention.swmed.edu/etblast/index.shtml http://invention.swmed.edu/etblast/etblast.shtml | |||
| 36 |
Wageningen University and Research Centre Laboratory of Bioinformatics | ||
|
Jack Leunissen jack.leunissen@wur.nl |
| ||
|
Research Category: Information Extraction or Retrieval Methods Research Project: BIOMETA Purpose/Goal: The data are used in three different projects. The first project (BIOMETA) aims at the development of novel text analysis and information extraction methods based upon N-index phrases. Results/Outcomes, e.g., data or schema/tools developed: As part of the BIOMETA project, a complete index of term pairs is constructed after (NLP-based) processing of the MEDLINE abstracts. This allows rapid retrieval of documents matching specific phrases, such as "aspirin causes cancer", or "aspirin ? cancer", or "aspirin causes ?" etc. Citations of related published papers: Web addresses to pertinent sites further describing the work and/or its results: | |||
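The wildcard phrase queries in the BIOMETA entry above ("aspirin ? cancer", "aspirin causes ?") can be illustrated with a small in-memory index. This is a sketch under stated assumptions and not the BIOMETA implementation: it assumes an earlier NLP step has already reduced each abstract to (subject, relation, object) triples, and the `PhraseIndex` class, its methods and the document IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]
WILDCARD = "?"


class PhraseIndex:
    """Maps (subject, relation, object) patterns to document IDs."""

    def __init__(self) -> None:
        self._postings: Dict[Triple, Set[str]] = defaultdict(set)

    def add(self, doc_id: str, triple: Triple) -> None:
        """Index one extracted triple for a document."""
        subject, relation, obj = (term.lower() for term in triple)
        # Store the document under every pattern in which each slot is
        # either the actual term or the wildcard (8 patterns per triple).
        for s in (subject, WILDCARD):
            for r in (relation, WILDCARD):
                for o in (obj, WILDCARD):
                    self._postings[(s, r, o)].add(doc_id)

    def query(self, phrase: str) -> Set[str]:
        """Look up a three-term phrase; '?' matches any term in that slot."""
        parts = phrase.lower().split()
        if len(parts) != 3:
            raise ValueError("expected three space-separated terms")
        key = (parts[0], parts[1], parts[2])
        # Return a copy so callers cannot mutate the index.
        return set(self._postings.get(key, set()))
```

A short usage example:

```python
index = PhraseIndex()
index.add("doc1", ("aspirin", "causes", "cancer"))
index.add("doc2", ("aspirin", "prevents", "cancer"))
matches = index.query("aspirin ? cancer")  # documents doc1 and doc2
```

Precomputing every wildcard pattern trades memory for lookup speed, since each query becomes a single dictionary access. An index over all of MEDLINE would more likely keep one posting list per term and intersect them at query time.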
Last updated: 22 August 2007
First published: 01 August 2005
Metadata| Permanence level: Permanence Not Guaranteed