Documentation for Distribution of 2003 Baseline MEDLINE® Database

U.S. National Library of Medicine
November 26, 2002

This document accompanies the distribution of the baseline MEDLINE database for 2003 in XML format using the November 1, 2002 NLM MEDLINE DTD. It should be forwarded to appropriate technical and policy staff at your organization. It is also available in the MEDLINE documentation section of NLM's Licensee information web page along with other documentation and announcements to licensees.

    1. Record count and new DTD in use
      The baseline database contains 11,847,524 MEDLINE records through the MEDLINE 2002 production year. These data completely replace all previously distributed MEDLINE data. All baseline records contain <DateCompleted>; none are in-process records. Version 4 of NLM's MEDLINE DTD dated November 1, 2002 is now in use. This DTD references the MedlineCitation DTD which in turn references the NLMCommon DTD. The MEDLINE DTD, therefore, is the "parent" DTD and the starting point for licensees.

    2. Baseline DLT tape
      The 2003 baseline MEDLINE DLT tape contains one Tar file. When dearchived, the Tar file contains 396 data files. Records with any publication date may reside in any file. Refer to the corresponding 'File Names, Record Counts, and File Size' documentation and the tape transmittal sheet that provides specifications about the DLT tape.

    3. Baseline via ftp
      The 2003 baseline data also reside in files on NLM’s public server for ftp. Licensees who have requested ftp of baseline data instead of distribution on DLT tape are sent login and other instructions under separate cover.

    4. Use of 2003 MeSH® vocabulary
      The baseline MEDLINE data reflect the 2003 MeSH Vocabulary. Close to 1,400,000 records in the baseline distribution have been revised as a result of the change to 2003 MeSH. About 945,000 of these revisions involved a change from the old MeSH heading Adolescence to the new heading Adolescent. Many MEDLINE licensees download the MeSH Vocabulary File to take full advantage of the hierarchical nature of the controlled vocabulary in their implementations of MEDLINE. New files containing Pharmacologic Actions of a Given Substance and Substances with a Given Pharmacologic Action are also available from the MeSH web page.

    5. Additional maintenance
      Additional maintenance is needed to apply the 2003 MeSH Vocabulary to MEDLINE records. NLM will process about 250 more MeSH heading changes, which are expected to result in about 18,000 additional revised MEDLINE records. These records will be distributed in update files after generation of the baseline database (as will large numbers of other records involving various types of maintenance during the year). NLM urges licensees to process the revised records in addition to the new records and reminds licensees to process update files in ascending numeric order based on filename.
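
    Items 1 and 2 above give figures a licensee can check after receiving the baseline: 396 data files in the tar file, 11,847,524 records in total, and a <DateCompleted> element in every record. The following is a minimal sketch of such a check, not an NLM-supplied tool. It assumes that each record is a <MedlineCitation> element with <DateCompleted> as a direct child, and that the members of the tar file are uncompressed XML; the function names are illustrative only.

```python
import io
import tarfile
import xml.etree.ElementTree as ET


def count_records(xml_stream):
    """Return (total, missing) for one XML data file.

    total   -- number of <MedlineCitation> elements seen (assumed record tag)
    missing -- how many of them have no <DateCompleted> child
    """
    total = 0
    missing = 0
    # iterparse streams the file, so a large data file is never held in memory whole.
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag == "MedlineCitation":
            total += 1
            if elem.find("DateCompleted") is None:
                missing += 1
            elem.clear()  # drop the children of the record just counted
    return total, missing


def count_records_in_tar(tar_path=None, fileobj=None):
    """Map each regular file in the tar archive to its (total, missing) pair."""
    counts = {}
    with tarfile.open(name=tar_path, fileobj=fileobj) as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories and other non-file entries
            counts[member.name] = count_records(archive.extractfile(member))
    return counts
```

    Under these assumptions, the number of keys in the returned mapping can be compared with the 396 data files, the sum of the totals with the 11,847,524 records, and the sum of the missing counts with zero. The per-file totals can likewise be compared with the 'File Names, Record Counts, and File Size' documentation.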

    Licensees are reminded to check the announcements and documentation sections of the NLM Licensee web page periodically for new technical and administrative information and also to forward e-mail messages to appropriate staff in the organization. Announcements are posted on the web after they have been e-mailed directly to licensees. The 'Database Documentation' section contains a MEDLINE Update/Documentation Chart for 2003 that summarizes the content of the daily update files. Licensees should also read the NLM Technical Bulletin, particularly the Nov-Dec 2002 issue that contains articles on the 2003 MeSH Vocabulary and MEDLINE data changes reflected in the baseline database, as well as forthcoming changes.

    Login instructions for ftp of update files are enclosed. They are for NLM MEDLINE licensees only; do not share directory or file names with others. Be sure to use the IP address you registered with NLM. All other IP addresses will be blocked from retrieving the files. Update files must be processed in ascending numeric order based on filename. Licensees should be sure to read the _stats.html file that accompanies each data file and also look for occasional _notes.txt files that may appear later in the day for additional information.
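
    The ordering requirement above can be sketched as follows. A plain alphabetical sort can misorder names whose numeric parts differ in length, so this sketch sorts on the number itself. It assumes the sequence number is the last run of digits in each filename; the helper names and the filenames in the usage note are hypothetical, not actual NLM update file names.

```python
import re


def numeric_key(filename):
    """Sort key: the integer value of the last run of digits in the name."""
    runs = re.findall(r"\d+", filename)
    if not runs:
        raise ValueError(f"no numeric part in {filename!r}")
    return int(runs[-1])


def in_processing_order(filenames):
    """Return the filenames sorted in ascending numeric order."""
    return sorted(filenames, key=numeric_key)
```

    For example, in_processing_order(["update_10.xml", "update_9.xml"]) places update_9.xml first, whereas an alphabetical sort would place update_10.xml first.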
Jane L. Rosov
MEDLARS Management Section
National Library of Medicine
8600 Rockville Pike
Bethesda MD 20894
phone: 301-496-7706
fax: 301-496-0822

First published: 26 November 2002
Last updated: 11 December 2006
Date Archived: 20 March 2007
Permanence level: Permanent: Dynamic Content