Metathesaurus Life Cycle

The Metathesaurus is a large, multi-purpose, and multi-lingual vocabulary database containing information about biomedical and health-related concepts, their various names, and the relationships among them. The Metathesaurus is built from electronic versions of various thesauri, classifications, code sets, and lists of controlled terms used in patient care, biomedical literature, and clinical and health services research. The Metathesaurus is created through a process called the Metathesaurus life cycle, which involves acquiring the electronic vocabulary data and converting it into the standard Metathesaurus format.

The Metathesaurus life cycle can be broken down into five major stages: inversion, insertion, editing, production, and release.

Inversion is the process of converting an electronic version of vocabulary data, such as Logical Observation Identifiers Names and Codes (LOINC) or SNOMED CT, to the common Metathesaurus input format. Before converting the vocabulary data, the explicit and implied semantics (or meaning) and structure of the vocabulary must be carefully analyzed. During the analysis, considerable care is taken to:

  • find systematic naming patterns that may differ from those used to name the same concepts in other sources
  • determine whether terms labeled as synonyms by the source reflect a strict interpretation of synonymy or are closely related terms
  • develop algorithms for assigning default semantic types to various categories of terms in the vocabulary
  • identify certain term types (such as short forms or non-standard abbreviations) that should be flagged as suppressible synonyms in the Metathesaurus
  • determine whether there may be considerable undetected synonymy within the vocabulary
  • determine whether there is intentional use of the same name for different concepts

Once the semantics and structure of the vocabulary are clearly understood, it is "inverted" into a standard Metathesaurus input format, with unique Metathesaurus atomic identifiers generated and assigned to all of its strings or concept names. When converting the vocabulary files to a usable format, the inversion process maintains source transparency. Source transparency guarantees that the conversion of the vocabulary data from its native format to the Metathesaurus format does not change or obscure the data.
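As a rough illustration of inversion, the sketch below converts rows from a made-up source into a common record and assigns each string a sequential identifier. The `Atom` fields, the `A0000001` identifier format, and the sample rows are assumptions made for illustration; they are not the actual Metathesaurus input format. The source code, term type, and string are copied unchanged, which is the idea behind source transparency.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Atom:
    atom_id: str    # hypothetical identifier format, e.g. "A0000001"
    source: str     # abbreviation of the source vocabulary
    code: str       # the source's own code, kept as-is
    term_type: str  # the source's own term type, kept as-is
    string: str     # the name exactly as the source supplied it


def invert(source, rows, start=1):
    """Turn (code, term_type, string) rows into Atom records.

    Nothing in the row is rewritten; only an identifier is added.
    """
    ids = count(start)
    return [
        Atom(f"A{next(ids):07d}", source, code, term_type, string)
        for code, term_type, string in rows
    ]


# Made-up rows from a made-up source called "DEMO".
rows = [
    ("X001", "PT", "Gallbladder Diseases"),
    ("X001", "SY", "Disease of gallbladder"),
    ("X002", "PT", "Grapes"),
]
atoms = invert("DEMO", rows)
for atom in atoms:
    print(atom)
```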

After the source vocabulary is inverted into a standard Metathesaurus format, it is inserted into the Metathesaurus Information Database (MID) editing database. This involves the development of a set of rules, or recipe, for what types of merges with existing Metathesaurus content should or should not be allowed. For example, since a newer version of SNOMED CT includes existing SNOMED CT identifiers and most of these previous identifiers were already in the Metathesaurus, rules for when to merge based on these identifiers are created.

A basic recipe contains a series of steps that load data from the vocabulary files into tables, determine where new concepts should start, perform integrity checks, merge the source vocabulary into the editing database, and perform post-merge operations. A test insertion is completed to make sure the insertion recipe is implemented properly and that the recipe is correct for the vocabulary data. Once the test insertion is approved, the real insertion is scheduled and performed.
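The recipe steps above can be sketched as a single function that runs them in order. This is a simplified model under stated assumptions: the function name `run_recipe`, the use of plain integers as concept numbers, and the single merge rule (reuse the concept of any source identifier that is already known) are all hypothetical and stand in for the much richer rules of a real recipe.

```python
def run_recipe(rows, existing):
    """Insert a source into a toy editing database.

    rows:     (source_id, name) pairs from the inverted vocabulary
    existing: source_id -> concept number already in the database
    Returns (assigned, new_ids) without modifying `existing`.
    """
    # Step 1: load the vocabulary data into a working table.
    table = list(rows)

    # Step 2: determine where new concept numbers should start.
    next_concept = max(existing.values(), default=0) + 1

    # Step 3: integrity checks before anything is merged.
    source_ids = [source_id for source_id, _ in table]
    if len(source_ids) != len(set(source_ids)):
        raise ValueError("duplicate source identifier in input")
    if any(not name.strip() for _, name in table):
        raise ValueError("empty name in input")

    # Step 4: merge. A known identifier joins its existing concept;
    # an unknown identifier starts a new concept.
    assigned = {}
    for source_id, _ in table:
        if source_id in existing:
            assigned[source_id] = existing[source_id]
        else:
            assigned[source_id] = next_concept
            next_concept += 1

    # Step 5: post-merge operation, here just a list of what was new.
    new_ids = sorted(s for s in assigned if s not in existing)
    return assigned, new_ids


existing = {"S1": 10, "S2": 11}
assigned, new_ids = run_recipe([("S1", "Grapes"), ("S3", "Raisins")], existing)
print(assigned, new_ids)
```

Because the function does not modify `existing`, calling it on a copy of the data works as a dry run, which is roughly the role the test insertion plays before the real one.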

Once a new vocabulary is inserted into the editing database, human experts with the required clinical, chemical, or basic science expertise review and edit Metathesaurus entries affected by the automated insertion routines. Editors ensure that the Metathesaurus accurately reflects the meanings present in its source vocabularies and that value-added information (e.g., semantic types) is applied correctly. Editing resolves conflicts between a source’s view of synonymy and the Unified Medical Language System (UMLS) view of synonymy. When two source vocabularies differ in their views of synonymy, the editors determine the view (perhaps a third alternative) that will be reflected in the Metathesaurus concept structure.

Concepts are reviewed for synonymy by identifying all synonymous strings and making sure they appear in the same concept. Synonymy is first determined by the source-provided vocabulary data: strings that a source says are synonymous are algorithmically placed in the same concept. Next, algorithms and merge functions search for synonymy among sources. Finally, editors analyze the concepts to make sure the asserted synonymy is correct and that there are no cases of missed synonymy.
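One way to picture the first, algorithmic step is as grouping: every pair of strings that a source asserts to be synonymous ends up in the same group. The sketch below does this with a small union-find structure. The function name and the choice of union-find are assumptions for illustration; the source does not say how the grouping is implemented.

```python
def group_synonyms(strings, pairs):
    """Group strings so that every asserted synonym pair shares a group.

    strings: all strings under consideration
    pairs:   (a, b) tuples a source asserts to be synonymous;
             both members must appear in `strings`
    """
    parent = {s: s for s in strings}

    def find(s):
        # Follow parent links to the group representative,
        # shortening the path along the way.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for s in strings:
        groups.setdefault(find(s), set()).add(s)
    return list(groups.values())


strings = [
    "Gallbladder Diseases",
    "Disease of gallbladder",
    "Gall Bladder Diseases",
    "Grape",
    "Grapes",
]
pairs = [
    ("Gallbladder Diseases", "Disease of gallbladder"),
    ("Disease of gallbladder", "Gall Bladder Diseases"),
    ("Grape", "Grapes"),
]
groups = group_synonyms(strings, pairs)
print(groups)
```

Synonymy asserted this way is transitive: the first and third gallbladder strings land in one group even though no single pair links them directly. That is one reason the result still needs editorial review.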

If patterns of missed synonymy are discovered, the concepts in question are reviewed further to determine whether a merge is required. Two concepts may be merged if they are determined to have the same meaning. Editors may also choose to keep one concept and retire the other rather than merging the two. An editor may split a concept into two if they conclude that one or more atoms in the concept do not mean the same thing as its other atoms. The split atoms are removed from the concept and placed in a new concept.
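The merge and split operations can be modeled on a toy data structure in which each concept is a set of atoms. This is a minimal sketch, assuming a plain dictionary from concept identifier to atom set; the identifiers `C1`, `C2`, `C3` and the function names are hypothetical.

```python
def merge(concepts, target, other):
    """Move every atom of `other` into `target` and drop `other`."""
    if target == other:
        raise ValueError("cannot merge a concept with itself")
    concepts[target] |= concepts.pop(other)


def split(concepts, source, atoms, new_id):
    """Move `atoms` out of `source` into a new concept `new_id`."""
    atoms = set(atoms)
    # The atoms must be a non-empty proper subset, so that both
    # the old and the new concept still contain something.
    if not atoms or not atoms < concepts[source]:
        raise ValueError("atoms must be a non-empty proper subset")
    if new_id in concepts:
        raise ValueError("new concept identifier already in use")
    concepts[source] -= atoms
    concepts[new_id] = atoms


concepts = {
    "C1": {"Kidney Failure"},
    "C2": {"Renal Failure", "Grapes"},
}
merge(concepts, "C1", "C2")           # same meaning: combine
split(concepts, "C1", {"Grapes"}, "C3")  # different meaning: separate
print(concepts)
```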

Editors consider the following when determining synonymy:

  1. Respect the synonymy asserted by a source unless there are disagreements with other sources.
  2. Lexical variations: strings that vary only in singular/plural form, direct/indirect form, punctuation, etc. are considered synonymous (e.g., Feet vs Foot). However, not all lexical variants are synonymous (e.g., Home Nursing and Nursing Homes).
  3. Synonymy between unlike strings (e.g., Kidney Failure and Renal Failure).
  4. Clues from the source vocabulary, such as contexts, definitions, and scope notes.
  5. Editors' own biomedical knowledge.
  6. Synonymous strings belong in the same concept.

Examples of Synonymy:

  • Gallbladder Diseases is synonymous with Disease of gallbladder and with Gall Bladder Diseases.
  • Ensure Plus strawberry liquid Tetrapak is synonymous with Ensure Plus strawb Tetrapak.
  • AFDC (Aid to Families with Dependent Children) is synonymous with Aid to Families with Dependent Children.
  • Spine (Vertebral Column): Excision is synonymous with Vertebral Column: Excisions.
  • Pelvic Neoplasms is synonymous with Neoplasms of Pelvis and Pelvis Neoplasms.
  • Sudafed 60mg tablet is synonymous with Sudafed, 60 mg oral tablet.
  • Grape is synonymous with Grapes.
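The lexical-variation consideration, and its caveat, can be demonstrated with a deliberately crude normalization: lowercase the string, drop punctuation, strip a trailing "s" from longer words, and ignore word order. This is only a sketch under those assumptions; it is not the tooling actually used to build the Metathesaurus, and the function name `normalize` is hypothetical.

```python
import re


def normalize(string):
    """Return a crude comparison key for a term."""
    words = re.findall(r"[a-z0-9]+", string.lower())
    stems = [
        # Naive plural stripping; irregular plurals are not handled.
        w[:-1] if len(w) > 3 and w.endswith("s") and not w.endswith("ss") else w
        for w in words
    ]
    # Sorting makes the key insensitive to word order.
    return " ".join(sorted(stems))


for a, b in [
    ("Grape", "Grapes"),
    ("Home Nursing", "Nursing Homes"),
    ("Feet", "Foot"),
]:
    print(a, "|", b, "|", normalize(a) == normalize(b))
```

The key matches Grape with Grapes, as intended, but it also matches Home Nursing with Nursing Homes, which are not synonymous, and it misses Feet and Foot, which are. Matches of this kind can only suggest candidates, which is why the other considerations in the list still apply.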

The production stage is the most important in the Metathesaurus life cycle because it generates the product distributed to the public. This level of importance is reflected in the complex nature of the code and processes (especially validation) involved in this stage. The bulk of the production process involves extracting the releasable data from the editing database and producing and validating the final release files.

The production process is completed in four broad steps:

  1. Pre-production: The MID editing database is checked to make sure there are no additional concepts that need review and all pre-production operations are completed.
  2. Synchronization: Through a process called synchronization, the files in the MID are transferred to the Metathesaurus Release Database (MRD).
  3. File Generation: Once the MRD data files are ready, a series of algorithms are applied to produce each of the desired release files.
  4. File Validation: After a release file is produced, it undergoes a rigorous quality assurance process.

The newly generated and validated Metathesaurus release files are made available to the public through the UMLS. The installation program for all UMLS knowledge sources, including the Metathesaurus, is the Java program MetamorphoSys, which is included with each UMLS release. This full-featured installation tool allows the user to configure the data in a variety of ways before installing it on a local machine, and it enables users to create customized Metathesaurus subsets. Users can also search and view Metathesaurus data through the UMLS Terminology Services (UTS) Metathesaurus Browser. A free UMLS Metathesaurus License is required in order to use the UTS. In addition to MetamorphoSys and the UTS Metathesaurus Browser, the UMLS API (REST or SOAP) can be used to search the Metathesaurus data.
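As a sketch of what a REST search might look like, the code below builds a search URL and fetches the JSON response using only the Python standard library. The base URL, the `/search/current` path, and the `string` and `apiKey` parameter names are assumptions not stated on this page and should be checked against the current UMLS API documentation; the API key placeholder stands for the key tied to a UMLS license.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL; verify against the current UMLS API documentation.
BASE_URL = "https://uts-ws.nlm.nih.gov/rest"


def build_search_url(term, api_key, version="current"):
    """Build a search URL for `term` (assumed endpoint and parameters)."""
    query = urlencode({"string": term, "apiKey": api_key})
    return f"{BASE_URL}/search/{version}?{query}"


def search(term, api_key):
    """Send the request and return the parsed JSON body as-is."""
    with urlopen(build_search_url(term, api_key)) as response:
        return json.load(response)


# Building the URL needs no network access or license.
print(build_search_url("renal failure", "YOUR_API_KEY"))
```

The `search` function returns the decoded response without assuming its structure, so inspect the result before relying on particular fields.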

Last Reviewed: May 20, 2021