Documentation for Distribution of 2003 Baseline
MEDLINE® Database
U.S. National Library of Medicine
November 26, 2002
This document accompanies the distribution of the baseline MEDLINE database
for 2003 in XML format using the November 1, 2002 NLM MEDLINE DTD. It should be
forwarded to appropriate technical and policy staff at your organization. It is
also available in the MEDLINE documentation section of NLM's Licensee information web
page along with other documentation and announcements to licensees.
- THE DATA
- Record count and new DTD in use
The baseline database contains
11,847,524 MEDLINE records through the MEDLINE 2002 production year. These
data completely replace all previously distributed MEDLINE data. All
baseline records contain <DateCompleted>; none are in-process records.
Version 4 of NLM's MEDLINE DTD, dated November 1, 2002, is now in use. This DTD
references the MedlineCitation DTD, which in turn references the NLMCommon DTD.
The MEDLINE DTD, therefore, is the "parent" DTD and the starting point for licensees.
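As an illustration of the statement that every baseline record carries <DateCompleted>, the following is a minimal sketch of how a licensee might confirm this while loading a data file. It assumes each record is a <MedlineCitation> element with <PMID> and <DateCompleted> children. The wrapper element name and the PMIDs in the sample are made up for the demonstration. It uses only Python's standard ElementTree streaming parser, so it is not an NLM-supplied tool.

```python
import io
import xml.etree.ElementTree as ET


def check_date_completed(source):
    """Count MedlineCitation records and collect PMIDs of any lacking DateCompleted.

    Assumes each record is a <MedlineCitation> element with <PMID> and
    <DateCompleted> children; the root element name is not checked.
    """
    total = 0
    missing = []
    # iterparse streams the input, so a large data file need not be parsed into one tree first
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "MedlineCitation":
            total += 1
            if elem.find("DateCompleted") is None:
                missing.append(elem.findtext("PMID"))
            elem.clear()  # drop this record's children once checked to limit memory use


    return total, missing


# Hypothetical two-record sample; the second record deliberately lacks DateCompleted.
sample = b"""<MedlineCitationSet>
  <MedlineCitation>
    <PMID>12345</PMID>
    <DateCompleted><Year>2002</Year><Month>11</Month><Day>01</Day></DateCompleted>
  </MedlineCitation>
  <MedlineCitation>
    <PMID>67890</PMID>
  </MedlineCitation>
</MedlineCitationSet>"""

print(check_date_completed(io.BytesIO(sample)))  # (2, ['67890'])
```

For a genuine baseline file, the `missing` list should come back empty, because none of the baseline records are in-process records.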
- Baseline DLT tape
The 2003 baseline MEDLINE DLT tape contains one Tar file. When dearchived, the Tar
file yields 396 data files. Files are not organized by publication date; a record
with any publication date may reside in any file. Refer to the corresponding 'File
Names, Record Counts, and File Size' documentation and to the tape transmittal sheet,
which provides specifications for the DLT tape.
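Before dearchiving, a licensee might want to confirm that the Tar file holds the expected 396 data files. The following is a minimal sketch that uses Python's standard `tarfile` module to list regular files without extracting them. The file names in the demonstration archive are hypothetical, and the authoritative names and counts are those given in the 'File Names, Record Counts, and File Size' documentation.

```python
import io
import tarfile

EXPECTED_DATA_FILES = 396  # data-file count stated for the 2003 baseline Tar file


def list_data_files(path=None, fileobj=None):
    """Return the sorted names of regular files in a Tar archive, without extracting it."""
    with tarfile.open(name=path, fileobj=fileobj, mode="r") as tar:
        return sorted(m.name for m in tar.getmembers() if m.isfile())


def has_expected_count(names, expected=EXPECTED_DATA_FILES):
    """True if the archive holds exactly the expected number of data files."""
    return len(names) == expected


# Build a tiny in-memory archive with made-up file names to exercise the functions.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name in ("medline03n0002.xml", "medline03n0001.xml", "medline03n0003.xml"):
        data = b"<MedlineCitationSet/>"
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
buf.seek(0)

names = list_data_files(fileobj=buf)
print(len(names), names[0], has_expected_count(names))  # 3 medline03n0001.xml False
```

For the real tape, the same function would be called with `path=` set to the local copy of the Tar file.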
- Baseline via ftp
The 2003 baseline data also reside in files on NLM’s
public server for ftp. Licensees who have requested ftp of baseline data
instead of distribution on DLT tape are sent login and other instructions
under separate cover.
- Use of 2003 MeSH® vocabulary
The baseline MEDLINE data
reflect the 2003 MeSH Vocabulary. Close to 1,400,000 records in the baseline
distribution have been revised as a result of a change to 2003 MeSH. About
945,000 of these revised records involved a change from the old MeSH heading
Adolescence to the new heading Adolescent. Many MEDLINE licensees download
the MeSH Vocabulary File from http://www.nlm.nih.gov/mesh/ to
fully take advantage of the hierarchical nature of the controlled vocabulary
in their implementations of MEDLINE (see http://www.nlm.nih.gov/bsd/licensee/announce/2002.html#o_29_mesh
and http://www.nlm.nih.gov/bsd/licensee/announce/2002.html#n_06_mesh).
New files containing Pharmacologic Actions of a Given Substance and
Substances with a Given Pharmacologic Action are also available from the
MeSH web page.
- Additional maintenance
Additional maintenance is needed to apply the 2003
MeSH Vocabulary to MEDLINE records. NLM will process about 250 more MeSH
heading changes, which are expected to result in about 18,000 additional revised
MEDLINE records. These records will be distributed in update files after
generation of the baseline database (as will large numbers of other records
involving various types of maintenance during the year). NLM urges licensees
to process the revised records in addition to the new records and reminds
licensees to process update files in ascending numeric order based on
filename.
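"Ascending numeric order based on filename" is safest when implemented with a numeric sort key rather than a plain string sort. A string sort puts a name ending in 10 before one ending in 9 whenever the numbers are not zero-padded. The following is a minimal sketch that assumes, hypothetically, that the sequence number is the last run of digits in each file name. The names used in the demonstration are made up.

```python
import re


def numeric_key(name):
    """Sort key: the last run of digits in the file name, as an integer."""
    numbers = re.findall(r"\d+", name)
    if not numbers:
        raise ValueError(f"no sequence number in file name {name!r}")
    return int(numbers[-1])


def in_processing_order(filenames):
    """Return update file names in ascending numeric order."""
    return sorted(filenames, key=numeric_key)


# Made-up names: a plain string sort would put "upd10" before "upd9".
print(sorted(["upd10.xml", "upd9.xml"]))               # ['upd10.xml', 'upd9.xml']
print(in_processing_order(["upd10.xml", "upd9.xml"]))  # ['upd9.xml', 'upd10.xml']
```

Raising an error for a name without digits is a deliberate design choice. It is meant to stop processing instead of guessing where an unexpected file belongs in the sequence.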
- DOCUMENTATION
Licensees are reminded to check the
announcements and documentation sections of the NLM Licensee web page
periodically for new technical and administrative information and also to
forward e-mail messages to appropriate staff in the organization.
Announcements are posted on the web after they have been e-mailed directly to
licensees. The 'Database Documentation' section contains a MEDLINE
Update/Documentation Chart for 2003 that summarizes the content of the daily
update files. Licensees should also read the NLM Technical
Bulletin, particularly the Nov-Dec 2002 issue that contains articles
on the 2003 MeSH Vocabulary and MEDLINE data changes reflected in the baseline
database, as well as forthcoming changes.
- UPDATE FILES VIA FTP
Login instructions for ftp of update files are enclosed. They are for NLM MEDLINE
licensees only; do not share directory or file names with others. Be sure to use the
IP address you registered with NLM; all other IP addresses will be blocked from
retrieving the files. Update files must be processed in ascending numeric order by
file name. Licensees should be sure to read the _stats.html file that accompanies
each data file and also look for occasional _notes.txt files, which may appear later
in the day with additional information.
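To avoid overlooking a _stats.html or _notes.txt file, a download script can pair each data file with its companions in the directory listing. The following is a minimal sketch that assumes a hypothetical naming pattern: a data file <stem>.xml or <stem>.xml.gz, a statistics file <stem>_stats.html, and an optional notes file <stem>_notes.txt. The actual names on NLM's server may differ, and the listing shown is made up.

```python
DATA_SUFFIXES = (".xml.gz", ".xml")  # assumed data-file extensions; longest checked first


def stem_of(name):
    """Return the file name without its data suffix, or None if it is not a data file."""
    for suffix in DATA_SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return None


def companions(listing):
    """Map each data file in a directory listing to its companion files.

    Hypothetical naming: for data file <stem>.xml(.gz), the statistics file is
    <stem>_stats.html and the optional notes file is <stem>_notes.txt.
    """
    present = set(listing)
    report = {}
    for name in sorted(listing):
        stem = stem_of(name)
        if stem is None:
            continue  # skip the companion files themselves and anything else
        stats = stem + "_stats.html"
        notes = stem + "_notes.txt"
        report[name] = {
            "stats": stats if stats in present else None,
            "notes": notes if notes in present else None,
        }
    return report


listing = [
    "medline03n0397.xml.gz", "medline03n0397_stats.html",
    "medline03n0398.xml.gz", "medline03n0398_stats.html", "medline03n0398_notes.txt",
]
report = companions(listing)
print(report["medline03n0397.xml.gz"])  # {'stats': 'medline03n0397_stats.html', 'notes': None}
```

Because notes files may appear later in the day, a script along these lines would typically re-fetch the listing and run the check again before it treats a day's files as final.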
- Jane L. Rosov
- MEDLARS Management Section
- National Library of Medicine
- 8600 Rockville Pike
- Bethesda MD 20894
- janer@nlm.nih.gov
- phone: 301-496-7706
- fax: 301-496-0822