National Library of Medicine Technical Bulletin

Table of Contents: 2021 NOVEMBER–DECEMBER No. 443


MEDLINE 2022 Initiative: Transition to Automated Indexing

MEDLINE 2022 Initiative: Transition to Automated Indexing. NLM Tech Bull. 2021 Nov-Dec;(443):e5.

2021 December 01 [posted]

As part of the efforts of the National Library of Medicine (NLM) to transform and accelerate biomedical discovery and improve health and health care, we are transitioning to automated MeSH indexing of MEDLINE citations in PubMed. Automated indexing will provide users with timely access to MeSH indexed metadata and allow NLM to scale MeSH indexing for MEDLINE to the volume of published biomedical literature. Human indexers have been and will continue to be involved in the refinement of automated indexing algorithms and will play a significant role in the quality assurance approaches for automated indexing.

In 2018, NLM launched the MEDLINE 2022 initiative, a five-year development plan that aims to ensure that MEDLINE continues to evolve to meet the needs of users in an age of data-driven discovery. A key goal of this initiative involved implementing a range of indexing methods to ensure the timely assignment of MeSH to MEDLINE citations. Based on the successful pilot of automated indexing on a limited scale since 2016, NLM determined that fully automated MEDLINE indexing would be implemented with quality control, and that human curation and automation would be specifically applied to improve the discoverability of chemical and gene information in MEDLINE.

Figure 1: MEDLINE 2022 Initiative Goals. (1) A 24-hour response time for MeSH indexed citations to appear in PubMed; (2) expanded chemical and gene curation by subject matter experts; (3) continuous improvement of the automatic indexing algorithm.

Automated MeSH indexing has been under development at NLM for many years, and the most significant outcome is the Medical Text Indexer (MTI), developed by researchers in the Lister Hill National Center for Biomedical Communications. MTI is not new; it has been used to provide indexing suggestions for human indexers since 2002 and was incorporated as the "first line" of indexing, with subsequent human curation, for a set of journals starting in 2011. Automated indexing with a version of MTI has been used for OLDMEDLINE citations since 2015, for comments since 2016, and for processing an experimental batch of backlogged citations in 2016. Since 2018, the method of indexing has been identified in the XML of all completed citations.
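For readers who work with citation XML directly, the following is a minimal sketch of how the indexing method might be read from a record. It assumes the method is carried in an IndexingMethod attribute on the MedlineCitation element and that the attribute may be absent; verify both points against the current PubMed DTD. The sample XML, the PMIDs, and the function name indexing_methods are illustrative, not real data or an NLM tool.

```python
import xml.etree.ElementTree as ET

# Hand-written, illustrative fragment; these are not real PubMed records.
# Assumption: the indexing method appears as an "IndexingMethod" attribute
# on <MedlineCitation>, and the attribute is omitted on some records.
SAMPLE_XML = """
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation Status="MEDLINE" IndexingMethod="Automated">
      <PMID>00000001</PMID>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation Status="MEDLINE">
      <PMID>00000002</PMID>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
"""


def indexing_methods(xml_text):
    """Map each PMID in xml_text to its stated indexing method.

    Records without the attribute are reported as "not specified".
    """
    root = ET.fromstring(xml_text)
    methods = {}
    for citation in root.iter("MedlineCitation"):
        pmid = citation.findtext("PMID")
        methods[pmid] = citation.get("IndexingMethod", "not specified")
    return methods


if __name__ == "__main__":
    for pmid, method in indexing_methods(SAMPLE_XML).items():
        print(pmid, method)
```

Reporting a missing attribute as "not specified" rather than guessing a method keeps the sketch from asserting anything the record itself does not state.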

The MTI algorithm has been undergoing refinements in recent years as we move towards automation, including incorporation of deep learning approaches to improve the application of MeSH subheadings, the incorporation of rules and triggers for the indexing of Publication Types, and the application of IM designation. The version of MTI used for current automated indexing is called MTIA, and it is being applied to citations from a variety of journals. Human curation of MTIA-indexed citations originally involved a scan of all citations indexed by MTIA but has been modified to focus curation on specific sets of citations (e.g., those involving genes and proteins) to scale curation and to ensure that indexed terms are correct and irrelevant terms are not indexed.

Recognizing that chemicals and genes are among the most searched data points in PubMed, we are working to improve recognition of these entities by MTIA and are evaluating the incorporation of chemicals identified by the NLM-Chem identification tool. We are also evaluating NLM-Gene as a tool to support curation at scale for the creation of GeneRIFs (the links made between PubMed and the Gene database).

By mid-2022 we expect that all citations indexed for MEDLINE will be indexed by MTIA, with human curation applied as indicated. Beyond achievement of this major milestone, the MTIA algorithm will continue to be refined and improved.

Watch for future NLM Technical Bulletin articles for updates on developments. If you have questions or suggestions, please contact NLM Customer Support.
