National Library of Medicine Technical Bulletin

Table of Contents: 2018 JULY–AUGUST No. 423


Incorporating Values for Indexing Method in MEDLINE/PubMed XML

Incorporating Values for Indexing Method in MEDLINE/PubMed XML. NLM Tech Bull. 2018 Jul-Aug;(423):e2.

2018 August 15 [posted]

The MEDLINE/PubMed DTD was modified in 2017 to incorporate the attribute "IndexingMethod" for the element <MedlineCitation> (see MEDLINE/PubMed XML Element Descriptions and their Attributes). Values will now be applied as appropriate for this attribute in citations indexed for MEDLINE to document the method by which the set of Medical Subject Headings (MeSH) indexing terms was determined for a citation. IndexingMethod values are for computational analysis of MEDLINE XML and are not searchable in PubMed. It is particularly important for researchers who use MEDLINE indexing as a gold standard for training machine learning algorithms to be able to identify in the MEDLINE XML those citations that were indexed solely by a human method versus those that were indexed by a semi-automated method (algorithm results reviewed by a human) or an automated method (algorithm alone).

IndexingMethod is an implied attribute, meaning that it will only be present if a value is specified. If the IndexingMethod attribute is not present, the citation was indexed entirely by a human.

The values to be added are:

Curated – MeSH indexing is provided algorithmically and a human reviewed (and possibly modified) the algorithm results
Automated – MeSH indexing is provided algorithmically
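For readers who process the XML programmatically, the attribute semantics described above can be sketched in Python as follows. This is a minimal illustration, not an NLM tool: the three-citation sample and its PMIDs are made up, the label "Human" is a placeholder chosen here for citations without the attribute (it is not an IndexingMethod value), and the default only makes sense for citations that were indexed for MEDLINE.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample for illustration only. Real MEDLINE/PubMed records
# contain many more child elements, and these PMIDs are invented.
SAMPLE_XML = """
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation Status="MEDLINE" Owner="NLM">
      <PMID Version="1">1</PMID>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation Status="MEDLINE" Owner="NLM" IndexingMethod="Curated">
      <PMID Version="1">2</PMID>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation Status="MEDLINE" Owner="NLM" IndexingMethod="Automated">
      <PMID Version="1">3</PMID>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
"""


def indexing_method(citation):
    """Return the indexing method recorded on a <MedlineCitation> element.

    IndexingMethod is an implied attribute, so a missing attribute is read
    as fully human indexing. "Human" is a placeholder label used in this
    sketch, not a value defined by the DTD.
    """
    return citation.get("IndexingMethod", "Human")


root = ET.fromstring(SAMPLE_XML)

# Map each PMID to the method by which its MeSH terms were assigned.
methods = {
    citation.findtext("PMID"): indexing_method(citation)
    for citation in root.iter("MedlineCitation")
}

# Keep only human-indexed citations, e.g. as a gold-standard training set.
human_only = [pmid for pmid, method in methods.items() if method == "Human"]

print(methods)
print(human_only)
```

For full baseline files, which are large, a streaming parser such as `xml.etree.ElementTree.iterparse` would likely be a better fit than loading the whole document at once.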

The algorithm that currently supports MEDLINE indexing is the Medical Text Indexer (MTI), a product of the National Library of Medicine (NLM) Indexing Initiative.

Beginning in September 2018, these values will be added as appropriate for newly completed MEDLINE citations. For previously completed citations that were indexed by one of these methods, values will be added with the 2019 MEDLINE/PubMed baseline file that is produced in December.

Curated
Citations with the value Curated are those for which MTI has been the "first line indexer," and a human has reviewed (and potentially modified) the results. This includes citations from approximately 650 journals that currently have all citations completed by MTI First Line (MTIFL), and citations from issues of other journals for which humans have reviewed the MTI indexing. Upon implementation, approximately 18% of newly completed citations will have the value of Curated. With the 2019 MEDLINE/PubMed baseline, approximately 450,000 previously completed citations will have this value added.

Automated
The value Automated is applied to citations for comments, which currently represent approximately 5% of newly completed citations. With the 2019 MEDLINE baseline, the value Automated will also be applied to OLDMEDLINE citations (approximately 2 million), previously completed comments (approximately 250,000 citations), and citations for an experimental automatically indexed batch that was completed in 2016 (approximately 11,000 citations).

Citations completed by an indexing method of Automated or Curated represent a small proportion of all MEDLINE citations. MEDLINE citations that have been completed by a human indexing method currently number approximately 22 million.
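As a rough back-of-the-envelope check, the approximate figures quoted in this article can be combined to estimate how small that proportion is. The sketch below uses only the rounded counts stated above; the variable names are illustrative, and because the counts refer to slightly different points in time, the result should be read as an order-of-magnitude estimate rather than an official statistic.

```python
# Approximate counts quoted in this article (rounded by the source).
curated_backfill = 450_000        # previously completed Curated citations
automated_oldmedline = 2_000_000  # OLDMEDLINE citations
automated_comments = 250_000      # previously completed comments
automated_batch_2016 = 11_000     # experimental automatically indexed batch
human_indexed = 22_000_000        # citations completed by a human method

automated_total = automated_oldmedline + automated_comments + automated_batch_2016
non_human_total = curated_backfill + automated_total
all_citations = non_human_total + human_indexed

# Share of citations whose MeSH terms came from Curated or Automated indexing.
non_human_share = non_human_total / all_citations

print(f"Automated: {automated_total:,}")
print(f"Curated + Automated: {non_human_total:,}")
print(f"Share of all MEDLINE citations: {non_human_share:.1%}")
```

On these figures, Curated and Automated citations together come to roughly 2.7 million, or about one in nine MEDLINE citations.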

While MEDLINE indexing has traditionally involved full human curation, these automated and semi-automated methods of MEDLINE indexing have been explored in recent years to increase our efficiency and focus expert human effort in key areas to keep up with the ever-expanding volume of biomedical literature. In addition, NLM recently initiated MEDLINE 2022: A Five-Year Development Plan to maintain the usefulness of MEDLINE as a tool for discovering and analyzing the biomedical literature. One of the goals of the MEDLINE 2022 project is to implement a range of indexing methods to ensure the timely assignment of MeSH to MEDLINE citations. Providing XML data on the method used to index citations for MEDLINE supports our effort to be transparent about all facets of the MEDLINE 2022 project.

Additional information about the projects and citation sets mentioned in this article can be found here:

NLM Indexing Initiative
MTI and MTIFL
OLDMEDLINE Data
MEDLINE 2022: A Five-Year Development Plan

Please send any comments and questions regarding changes to the MEDLINE indexing process to the NLM Support Center.
