Overview of Annual Baseline Distribution of MEDLINE/PubMed Data
A baseline MEDLINE/PubMed database is available each December, after NLM has maintained the MEDLINE records to reflect the new year's Medical Subject Headings (MeSH®) vocabulary. NLM also makes any other end-of-year changes it deems necessary to candidate records previously distributed to licensees. Licensees should discard all previously received records and load the new baseline files as a complete replacement, so that they hold the most current and accurate version of each record as the new NLM data production year begins. The baseline MEDLINE database is distributed to licensees via FTP as several hundred files in XML format.
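The complete reload described above can be sketched as follows. This is a minimal illustration, not NLM's prescribed procedure. It assumes the baseline files have been downloaded to a local directory as gzip-compressed XML named `*.xml.gz`, and that each record is a `MedlineCitation` element with a `Status` attribute and a direct `PMID` child. The function name `reload_from_baseline` and the in-memory dictionary are hypothetical stand-ins for a real loader and database.

```python
import gzip
import xml.etree.ElementTree as ET
from pathlib import Path


def reload_from_baseline(baseline_dir):
    """Build a fresh PMID -> Status mapping from every baseline file.

    Assumption: files are gzip-compressed XML named *.xml.gz.
    """
    # Start from an empty store: previously received records are discarded.
    records = {}
    for path in sorted(Path(baseline_dir).glob("*.xml.gz")):
        with gzip.open(path, "rb") as handle:
            # Stream the file so a large XML document is not parsed in one call.
            for _, elem in ET.iterparse(handle, events=("end",)):
                if elem.tag == "MedlineCitation":
                    records[elem.findtext("PMID")] = elem.get("Status")
                    # Drop the element's children once it has been read.
                    elem.clear()
    return records
```

Building a new mapping rather than updating an old one mirrors the instruction to discard earlier records: nothing from the previous production year can survive the reload by accident.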
The baseline files include MEDLINE records as well as completed and quality-reviewed non-MEDLINE records found in PubMed (MedlineCitation Status = PubMed-not-MEDLINE and MedlineCitation Status = OLDMEDLINE). After the baseline files are generated, update files containing new, maintained, and deleted records are distributed daily via FTP; these update files also include records with MedlineCitation Status = In-Data-Review and MedlineCitation Status = In-Process. Daily distribution pauses during November and December while NLM makes the transition to the new year's MeSH® vocabulary used to index the articles.
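Applying one update file to an existing set of records might look like the sketch below. It assumes that new and maintained records arrive as `MedlineCitation` elements that replace any stored record with the same PMID, and that deletions arrive as `PMID` children of a `DeleteCitation` element. The function name `apply_update` and the dictionary store are hypothetical.

```python
import xml.etree.ElementTree as ET


def apply_update(store, xml_text):
    """Apply one update file to `store`, a dict mapping PMID -> Status."""
    root = ET.fromstring(xml_text)

    # New and maintained records overwrite whatever is stored for that PMID.
    for citation in root.iter("MedlineCitation"):
        store[citation.findtext("PMID")] = citation.get("Status")

    # Deleted records are removed; a PMID that was never stored is ignored.
    for deleted in root.iter("DeleteCitation"):
        for pmid in deleted.findall("PMID"):
            store.pop(pmid.text, None)

    return store
```

Update files would be applied in the order they were issued, so that the latest version of a maintained record is the one that remains.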
The baseline and update files do not include PubMed records with the XML MedlineCitation Status = Publisher, which are retrieved by the PubMed search strategy publisher [sb]. These records, comprising approximately 2% of PubMed content, do not reside in NLM's Data Creation and Maintenance System (DCMS), from which the exports are made, and are therefore not distributed. See section '1., 6. MedlineCitation Status attribute: Publisher' of the MEDLINE/PubMed XML Element Descriptions for more information on Publisher status records.
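The status rules in this overview can be summarized in a small lookup, shown here as a hedged sketch. The groupings follow the text above; treating "MEDLINE" as a literal Status value and comparing statuses case-insensitively are assumptions, as are the names `classify_status` and `publisher_pmids`. The second function is a simple sanity check that reports any Publisher status record found in a distributed file, where none is expected.

```python
import xml.etree.ElementTree as ET

# Groupings taken from the overview text; compared in lower case because
# the exact casing in real files is an assumption here.
BASELINE_AND_UPDATES = {"medline", "pubmed-not-medline", "oldmedline"}
UPDATES_ONLY = {"in-data-review", "in-process"}
NOT_DISTRIBUTED = {"publisher"}


def classify_status(status):
    """Return where a record with this MedlineCitation Status is expected."""
    key = status.strip().lower()
    if key in BASELINE_AND_UPDATES:
        return "baseline and updates"
    if key in UPDATES_ONLY:
        return "updates only"
    if key in NOT_DISTRIBUTED:
        return "not distributed"
    return "unknown"


def publisher_pmids(xml_text):
    """List PMIDs in a file whose Status classifies as 'not distributed'."""
    root = ET.fromstring(xml_text)
    return [
        citation.findtext("PMID")
        for citation in root.iter("MedlineCitation")
        if classify_status(citation.get("Status", "")) == "not distributed"
    ]
```

An empty list from `publisher_pmids` is the expected result for any baseline or update file.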