MEDLINE®/PubMed® Data Maintenance Overview
NLM performs two general types of maintenance on MEDLINE records during each year:
- Individual maintenance takes place on individual records, for example, to identify citations as being retracted, add commentary, or correct data entry errors.
- Global maintenance is performed to make the same type of change to large numbers of records, for example:
  - Maintenance on elements such as <MedlineTA>, <Title>, and <ISSN> when changes are made to their corresponding source serial records
  - Updates to the <NameOfSubstance> element as the corresponding record in the MeSH® vocabulary is created or edited
  - End-of-year changes in the MeSH® vocabulary from one heading to another
Large quantities of maintained records may appear in update files from time to time, because NLM maintains large groups of records throughout the year. It is therefore particularly important to process the revised records that are distributed in daily update files.
The most recent date on which a record was revised, whether for global or individual maintenance, often resides in the <DateRevised> element. It is possible, however, for large numbers of records to be maintained and not have an initial or updated <DateRevised> element. Do not depend on the initial presence of <DateRevised>, or on a change to an existing <DateRevised> value, to indicate that a record has been maintained.
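Because <DateRevised> is not a reliable signal, one option is to compare the content of the incoming record with the copy already loaded. The following is a minimal sketch of that idea, not an NLM-recommended method: it assumes each record is available as a parsed <MedlineCitation> element, and the function names `record_fingerprint` and `was_maintained` are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET


def record_fingerprint(citation: ET.Element) -> str:
    """Return a SHA-256 digest of the record's canonical XML form.

    Canonicalization (Python 3.8+) normalizes attribute quoting and,
    with strip_text=True, ignores indentation-only whitespace, so two
    copies that differ only in layout get the same digest.
    """
    serialized = ET.tostring(citation, encoding="unicode")
    canonical = ET.canonicalize(serialized, strip_text=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def was_maintained(stored: ET.Element, incoming: ET.Element) -> bool:
    """True when the incoming record differs in content from the stored one,
    regardless of whether <DateRevised> is present or has changed."""
    return record_fingerprint(stored) != record_fingerprint(incoming)
```

Storing the digest alongside each loaded record would avoid re-serializing the stored copy on every comparison.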
Read the medline17nxxxx_stats.html file on the FTP server for a breakdown of the categories of records in each file, along with other information.
The new baseline database produced each year contains all records from the previous year's Baseline files and all records from the previous year's Daily Update files (minus deleted citations), whether or not those records were maintained during or at the end of the previous production year.
Maintained and Deleted Records
New records are distributed in update files. Records that are maintained and MedlineCitation PMIDs of records that are deleted during the production year are also included in the daily update files. NLM urges users to process these records so their version of PubMed can be as current as possible.
Points to consider regarding update files:
- Update files should be applied after the baseline files and processed in ascending numeric order by filename.
- Records that have the same date in the <DateCreated> element and the <DateRevised> element are new to the database.
- Records in MEDLINE, PubMed-not-MEDLINE, and OLDMEDLINE statuses are considered to be completed records and thus contain the <DateCompleted> element. Completed records that are subsequently revised receive an updated <DateRevised> element.
- In-Data-Review, In-Process, and Publisher Supplied status records are not in a completed status and thus do not contain the <DateCompleted> element.
- Users should compare PMIDs in update files with those in records previously loaded. If there is no match, the record is new. If there is a match, the record is either a completed record that has been revised, or a record whose <MedlineCitation> Status has changed; e.g., it has been elevated from In-Data-Review status to In-Process status, or from In-Process status to MEDLINE or PubMed-not-MEDLINE status.
- Replace records with <DateRevised> only if that date is later than that on your existing record; this will be a concern only if files are processed out of ascending numeric order. It is possible for large numbers of records to be maintained and not have an initial or updated <DateRevised> element.
- A record may contain more than one <PMID>. The highest-level <PMID>, immediately following <MedlineCitation>, is the unique number identifying the record. Do not confuse it with the <PMID> element that resides in the <CommentsCorrections> group of elements, which references, for example, a citation that is associated with (e.g., corrects or retracts) the record in hand.
- DeleteCitation is created only if there are PMIDs to delete.
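The points above can be sketched as a small routine that applies one update file to a local collection of records. This is an illustration under stated assumptions, not NLM's own procedure: `apply_update`, `date_tuple`, and the in-memory `store` dictionary are hypothetical names, and the sketch assumes that <DateRevised> holds <Year>, <Month>, and <Day> children and that <DeleteCitation> lists the PMIDs to remove as <PMID> children. As a design choice, an incoming record is skipped only when its <DateRevised> is strictly earlier than the stored one, since a record can change status or be maintained without a new <DateRevised> value.

```python
import xml.etree.ElementTree as ET


def date_tuple(citation, tag):
    """Return (year, month, day) for a date element, or None if absent.
    Assumes Year/Month/Day children (an assumption about the layout)."""
    node = citation.find(tag)
    if node is None:
        return None
    return tuple(int(node.findtext(part)) for part in ("Year", "Month", "Day"))


def apply_update(store, xml_text):
    """Apply one update file to `store`, a dict mapping PMID -> record element.
    Returns counts of new, replaced, skipped, and deleted records."""
    root = ET.fromstring(xml_text)
    counts = {"new": 0, "replaced": 0, "skipped": 0, "deleted": 0}

    for citation in root.iter("MedlineCitation"):
        # "PMID" matches direct children only, so this is the record's own
        # identifier, never a <PMID> nested inside <CommentsCorrections>.
        pmid = citation.findtext("PMID")
        existing = store.get(pmid)
        if existing is None:
            store[pmid] = citation  # no match: the record is new
            counts["new"] += 1
            continue
        old_rev = date_tuple(existing, "DateRevised")
        new_rev = date_tuple(citation, "DateRevised")
        # Guard against files processed out of ascending order.
        if old_rev is not None and new_rev is not None and new_rev < old_rev:
            counts["skipped"] += 1
            continue
        store[pmid] = citation  # revised record or status change
        counts["replaced"] += 1

    # <DeleteCitation> is present only when there are PMIDs to delete.
    for deletion in root.iter("DeleteCitation"):
        for pmid_node in deletion.findall("PMID"):
            if store.pop(pmid_node.text, None) is not None:
                counts["deleted"] += 1
    return counts
```

To follow the ordering rule, load the baseline files first and then call `apply_update` once per update file, iterating over the filenames in ascending numeric order.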