Fact Sheet
The National Center for Biotechnology Information
Programs and Activities

Introduction

Understanding the elegant natural language of living cells is the quest of modern molecular biology. From an alphabet of only four letters, representing the chemical subunits of DNA, emerges a syntax representing the life processes required to build and maintain a human being. The unraveling and use of this "alphabet" to understand new "words and phrases" is a central focus of the field of molecular biology. The staggering volume of molecular data and the subtle patterns that encode biological information have made computerized databases and analysis tools an absolute requirement. The challenge is to find new approaches for dealing with the volume and complexity of data, and to provide researchers with better access to analysis and computing tools in order to advance understanding of our genetic legacy and its role in health and disease.

Creating the National Center

The late Senator Claude Pepper recognized the importance of computerized information processing methods for the conduct of biomedical research and sponsored legislation that, in November 1988, established the National Center for Biotechnology Information (NCBI) at the National Library of Medicine (NLM). The NLM was chosen for its experience in creating and maintaining biomedical databases, and because, as part of the National Institutes of Health (NIH), it could establish an intramural research program in computational molecular biology. NCBI's mission is to develop new information technologies to aid in the understanding of fundamental molecular and genetic processes that control health and disease. Its mandate includes four major tasks:

Perform research into advanced methods of computer-based information processing for analyzing the structure and function of biologically important molecules.
Create automated systems for storing, retrieving, and analyzing knowledge about molecular biology, biochemistry, and genetics.
Facilitate the use of databases and software by biotechnology researchers and medical personnel.
Coordinate efforts to gather biotechnology information worldwide.

Basic Research

NCBI has a multi-disciplinary research group composed of computer scientists, molecular biologists, mathematicians, biochemists, research physicians, and structural biologists concentrating on basic and applied research in computational molecular biology. These investigators not only make important contributions to basic science but also develop new methods for applied research activities. Together they are studying fundamental biomedical problems at the molecular level using mathematical and computational methods. A sampling of research interests includes: detection of genes and analysis of gene organization, repeating sequence patterns, protein domains and structural elements; creation of a gene map of the human genome; mathematical modeling of the kinetics of HIV infection; analysis of the effects of sequencing errors on database searching; development of new algorithms for database searching and multiple sequence alignment; construction of non-redundant sequence databases; mathematical models for estimating the statistical significance of sequence similarity; and vector models for text retrieval. Additionally, NCBI investigators maintain ongoing collaborations with several institutes within the NIH and with numerous academic and government research laboratories.

Databases and Software

NCBI maintains GenBank®, the NIH genetic sequence database. NCBI staff with advanced training in molecular biology build the database from sequences submitted by researchers and individual laboratories and from data exchanged with the other members of the International Nucleotide Sequence Database Collaboration, including the European Molecular Biology Laboratory (EMBL) and the DNA Database of Japan (DDBJ). Arrangements with the U.S. Patent and Trademark Office enable the incorporation of patent sequence data.

In addition to GenBank, NCBI supports and distributes a variety of databases for the medical and scientific communities. Resources range from molecular and literature databases to sequence similarity tools to structure information and genomic data.

Entrez is NCBI's search and retrieval system that provides users with integrated access to sequence, mapping, taxonomy, expression, and structural data. Entrez also provides graphical views of sequences and chromosome maps. Two powerful and unique features of Entrez are the ability to retrieve related sequences, structures, and references from pre-computed similarity searches, and the ability to provide integrated access across the various databases. The Entrez global query feature searches a subset of Entrez databases at one time. The Entrez Gene database is a gene-based resource that links to a variety of related data.

The Entrez literature databases give users direct access to the biomedical literature. The journal literature is available through PubMed®, a Web search interface that provides access to the 12 million journal citations in MEDLINE® and contains links to full-text articles at participating publishers' Web sites. PubMed Central, a digital archive of full-text life sciences journal literature, provides access to over 200,000 full-text journal articles from over 140 journals. Online Mendelian Inheritance in Man (OMIM) is a catalog of human genes and genetic disorders, and the Books database contains over 30 electronic books.

BLAST, a program for sequence similarity searching developed at NCBI, is instrumental in identifying genes and genetic features. BLAST can execute sequence searches against a DNA database of over 2 million sequences in less than 15 seconds. Various BLAST programs exist for different types of searches, along with specialized search options for the protein database and organism genomes.
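To give a feel for what sequence similarity searching involves, the following is a minimal Python sketch of one common idea: index short words in a subject sequence, look up the words of a query, and extend each exact word match in both directions. It is a toy illustration only and is not the BLAST algorithm, which also handles mismatches, gaps, scoring, and statistical significance. The function names, the word size of 4, and the example sequences are all hypothetical.

```python
from collections import defaultdict


def index_words(sequence, word_size):
    """Map every word of length word_size to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(sequence) - word_size + 1):
        index[sequence[pos:pos + word_size]].append(pos)
    return index


def extend_hit(query, subject, q_pos, s_pos, word_size):
    """Extend an exact word match left and right while the letters still match."""
    q_start, s_start = q_pos, s_pos
    while q_start > 0 and s_start > 0 and query[q_start - 1] == subject[s_start - 1]:
        q_start -= 1
        s_start -= 1
    q_end, s_end = q_pos + word_size, s_pos + word_size
    while q_end < len(query) and s_end < len(subject) and query[q_end] == subject[s_end]:
        q_end += 1
        s_end += 1
    return q_start, s_start, q_end - q_start


def best_ungapped_match(query, subject, word_size=4):
    """Return (query_start, subject_start, length) of the longest exact
    shared segment reachable from a word match, or None if no word matches."""
    index = index_words(subject, word_size)
    best = None
    for q_pos in range(len(query) - word_size + 1):
        word = query[q_pos:q_pos + word_size]
        for s_pos in index.get(word, []):
            hit = extend_hit(query, subject, q_pos, s_pos, word_size)
            if best is None or hit[2] > best[2]:
                best = hit
    return best


if __name__ == "__main__":
    # Hypothetical sequences that share the segment ACCGTAAG.
    print(best_ungapped_match("TTGACCGTAAGG", "AAACCGTAAGCT"))
```

Indexing the subject once means each query word is looked up directly rather than compared against every position, which is the kind of shortcut that makes searching large sequence collections practical.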

Other NCBI resources include the Map Viewer, which provides integrated displays of genomic maps for various organisms; the Cn3D structure viewer; and genome-specific resources, including those for the human genome. Additional software tools include Open Reading Frame Finder (ORF Finder), Electronic PCR, and the sequence submission tools Sequin and BankIt. All of NCBI's databases and software tools are available from the Web or by FTP.
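As a rough illustration of what an open reading frame search does, the Python sketch below scans the three forward reading frames of a DNA string for stretches that begin with the start codon ATG and end at the first in-frame stop codon. It is a simplified sketch under stated assumptions and does not reproduce NCBI's ORF Finder: it ignores the reverse strand, alternative start codons, and alternative genetic codes. The function name, the `min_codons` parameter, and the example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_orfs(dna, min_codons=2):
    """Return a list of (start, end, frame) tuples for forward-strand ORFs.

    start is the index of the A in ATG, end is one past the last base of
    the stop codon, and frame is 0, 1, or 2. min_codons is the minimum
    number of codons before the stop codon, counting the start codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through the sequence one complete codon at a time.
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == START_CODON:
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None
    return orfs


if __name__ == "__main__":
    # Hypothetical sequence containing ATG AAA TTT TAG in frame 2.
    sequence = "CCATGAAATTTTAGGG"
    for start, end, frame in find_orfs(sequence):
        print(frame, start, end, sequence[start:end])
```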

Education and Training

NCBI fosters scientific communication in the area of computing as applied to molecular biology and genetics by sponsoring meetings, workshops, and lecture series. NCBI also staffs exhibit booths at various scientific conferences and meetings. Learning opportunities are available via courses and on-line tutorials that focus on NCBI services. Postdoctoral fellow positions are available as part of the NIH Intramural Research Program.

For Further Information

NCBI publishes a newsletter, NCBI News, four times per year. To receive a free subscription or for further information about any of our databases, services, or programs, contact NCBI at info@ncbi.nlm.nih.gov or (301) 496-2475.

A complete list of NLM Fact Sheets is available at:
(alphabetical list): http://www.nlm.nih.gov/pubs/factsheets/factsheets.html
(subject list): http://www.nlm.nih.gov/pubs/factsheets/factsubj.html

Or write to:

FACT SHEETS
Office of Communications and Public Liaison
National Library of Medicine
8600 Rockville Pike
Bethesda, Maryland 20894

Phone: (301) 496-6308
Fax: (301) 496-4450
Email: publicinfo@nlm.nih.gov

U.S. National Library of Medicine