
Unified Medical Language System® (UMLS®)

2015AB UMLS NCBI Taxonomy Source Information

This page lists UMLS Metathesaurus data elements and traces them back to the specific source data that populates them.



VSAB: NCBI2015_03_23

Summary of Changes

None

Notes:

Many concepts and terms from the NCBI Taxonomy are excluded during Metathesaurus source processing.  The criteria for determining which concepts and terms are excluded or retained are outlined below.  See the term type descriptions for additional information.

1.  Exclude all names that do not have one of the following name classes:
    scientific name
    synonym
    equivalent name
    common name
    authority

2.  Exclude all concepts below the "species" level in the hierarchy.  Selected concepts with a rank of "no rank" may be retained, depending on their hierarchical level.

3.  Exclude all concepts that have a "division id" value of 11 (environmental samples), along with their descendants.

4.  Exclude concepts and terms based on certain patterns; e.g., remove concepts with rank = "species" whose scientific name contains any of the words "uncultured," "clone," "unidentified," or "uncultivated."

5.  Exclude concepts with ugly names (e.g., "xxxx", "4").

6.  Exclude concepts and their children if the concept name is enclosed in single or double quotes.

7.  Exclude concepts starting with "other", "unclassified", "unclassified sequences", "artificial sequences", "insertion sequences", "midivariant sequence", or "transposons", and all their children.

8.  Exclude concepts containing "?" and their children.
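
A few of the criteria above can be sketched in code. The following is a minimal illustration only, not the actual Metathesaurus processing logic: it covers rules 1, 3, 4, and 8, ignores the propagation of exclusions to descendants, and assumes whole-word, case-insensitive matching for rule 4. The function and constant names are hypothetical.

```python
# Hypothetical sketch of exclusion rules 1, 3, 4, and 8.
# It does NOT propagate exclusions to descendants, and the whole-word,
# case-insensitive matching used for rule 4 is an assumption.

KEPT_NAME_CLASSES = {
    "scientific name", "synonym", "equivalent name", "common name", "authority",
}
ENVIRONMENTAL_SAMPLES_DIVISION = 11
EXCLUDED_SPECIES_WORDS = {"uncultured", "clone", "unidentified", "uncultivated"}


def keep_name(name_class):
    """Rule 1: keep only names with one of the listed name classes."""
    return name_class in KEPT_NAME_CLASSES


def keep_concept(rank, division_id, scientific_name):
    """Rules 3, 4, and 8 applied to a single concept (no hierarchy handling)."""
    if division_id == ENVIRONMENTAL_SAMPLES_DIVISION:
        return False  # rule 3: environmental samples
    if rank == "species":
        words = set(scientific_name.lower().split())
        if words & EXCLUDED_SPECIES_WORDS:
            return False  # rule 4: excluded words in a species name
    if "?" in scientific_name:
        return False  # rule 8: names containing "?"
    return True
```

In a fuller version, each excluded concept would also need to be used to prune its subtree, which requires the parent links from nodes.dmp.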

Source file: taxdmp.zip.
Files included in taxdmp.zip are:

File           Description
citations.dmp  Citations (not directly processed)
delnodes.dmp   Deleted nodes (not directly processed)
division.dmp   Divisions
gc.prt         Genetic code table (not directly processed)
gencode.dmp    Genetic codes (not directly processed)
merged.dmp     Merged nodes (not directly processed)
names.dmp      Taxonomy names
nodes.dmp      Taxonomy nodes (hierarchy)
readme.txt     README file
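
As a rough illustration of how a row from one of the .dmp files might be read, the sketch below splits a line into fields. It assumes the delimiter convention described in the taxdmp readme.txt (fields separated by a tab, a vertical bar, and a tab, with each row ending in a tab and a vertical bar). The helper name parse_dmp_line and the sample line are illustrative, not part of the source.

```python
def parse_dmp_line(line):
    """Split one .dmp row into a list of field strings.

    Assumes fields are separated by "\t|\t" and the row ends with "\t|".
    """
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]  # drop the row terminator
    return [field.strip() for field in line.split("\t|\t")]


# Illustrative names.dmp-style row: tax_id, name_txt, unique name, name class.
sample = "9606\t|\tHomo sapiens\t|\t\t|\tscientific name\t|\n"
fields = parse_dmp_line(sample)
```

Here the third field is empty because the sample row carries no unique name.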

Identifiers:

Identifiers are assigned as follows:
  • CODE: names.dmp.tax_id
  • SAUI:  not applicable
  • SCUI: names.dmp.tax_id
  • SDUI: not applicable

Atoms (MRCONSO):

Term Type  Description             Origin
AUN        Authority name          CODE = names.dmp.tax_id
                                   STRING = names.dmp.name_txt
                                   SCUI = names.dmp.tax_id
                                   TTY = "AUN" is assigned where "name class" = "authority"
CMN        Common name             CODE = names.dmp.tax_id
                                   STRING = names.dmp.name_txt
                                   SCUI = names.dmp.tax_id
                                   TTY = "CMN" is assigned where "name class" = "common name"
EQ         Equivalent name         CODE = names.dmp.tax_id
                                   STRING = names.dmp.name_txt
                                   SCUI = names.dmp.tax_id
                                   TTY = "EQ" is assigned where "name class" = "equivalent name"
SCN        Scientific name         CODE = names.dmp.tax_id
                                   STRING = names.dmp.name_txt
                                   SCUI = names.dmp.tax_id
                                   TTY = "SCN" is assigned where "name class" = "scientific name"
SY         Designated synonym      CODE = names.dmp.tax_id
                                   STRING = names.dmp.name_txt
                                   SCUI = names.dmp.tax_id
                                   TTY = "SY" is assigned where "name class" = "synonym"
UAUN       Unique authority name   CODE = names.dmp.tax_id
                                   STRING = names.dmp.unique_name
                                   SCUI = names.dmp.tax_id
                                   TTY = "UAUN" is assigned where "name class" = "authority"
UCN        Unique common name      CODE = names.dmp.tax_id
                                   STRING = names.dmp.unique_name
                                   SCUI = names.dmp.tax_id
                                   TTY = "UCN" is assigned where "name class" = "common name"
USN        Unique scientific name  CODE = names.dmp.tax_id
                                   STRING = names.dmp.unique_name
                                   SCUI = names.dmp.tax_id
                                   TTY = "USN" is assigned where "name class" = "scientific name"
USY        Unique synonym          CODE = names.dmp.tax_id
                                   STRING = names.dmp.unique_name
                                   SCUI = names.dmp.tax_id
                                   TTY = "USY" is assigned where "name class" = "synonym"
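
The term type assignments listed for MRCONSO can be summarized as a lookup from name class to term type. The sketch below is one possible reading, not the actual source-processing code: the function name, the dictionary-based atom representation, and the choice to emit a "unique" atom only when the unique name field is non-empty are all assumptions.

```python
# Name class -> (TTY for name_txt, TTY for unique_name), following the
# term type list in this section. No "unique" term type is listed for
# "equivalent name", so that slot is None.
NAME_CLASS_TO_TTY = {
    "authority": ("AUN", "UAUN"),
    "common name": ("CMN", "UCN"),
    "equivalent name": ("EQ", None),
    "scientific name": ("SCN", "USN"),
    "synonym": ("SY", "USY"),
}


def atoms_for_name_row(tax_id, name_txt, unique_name, name_class):
    """Build hypothetical atom records for one names.dmp row."""
    ttys = NAME_CLASS_TO_TTY.get(name_class)
    if ttys is None:
        return []  # name class is not retained
    tty, unique_tty = ttys
    atoms = [{"CODE": tax_id, "SCUI": tax_id, "STRING": name_txt, "TTY": tty}]
    # Assumption: a "unique" atom is created only when unique_name is non-empty.
    if unique_name and unique_tty:
        atoms.append(
            {"CODE": tax_id, "SCUI": tax_id, "STRING": unique_name, "TTY": unique_tty}
        )
    return atoms
```

For example, a row with name class "synonym" and a non-empty unique name would yield one "SY" atom and one "USY" atom, both carrying the same CODE and SCUI.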

Note on suppressibility:  For term types "AUN" and "UAUN," MRCONSO.SUPPRESS is set to "Y."

Atoms with other term types may systematically have SUPPRESS set to "E" based on certain string patterns that indicate ambiguity or are not terminologically useful.  In addition, for term types other than "SCN," if names.dmp contains a "unique_name," the atom created from "name_txt" will have SUPPRESS set to "E."
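
The two deterministic suppressibility rules can be expressed as a small function. This is a sketch under stated assumptions: the string-pattern rule is omitted because the patterns are not given here, the default value "N" for unsuppressed atoms is assumed, and the function name is hypothetical.

```python
# Term types whose STRING comes from names.dmp.name_txt (per the atoms list).
NAME_TXT_TTYS = {"AUN", "CMN", "EQ", "SCN", "SY"}


def suppress_flag(tty, has_unique_name):
    """Return a SUPPRESS value for an atom.

    The pattern-based "E" rule is not modelled; "N" as the default is an
    assumption.
    """
    if tty in {"AUN", "UAUN"}:
        return "Y"
    # name_txt atoms other than SCN are set to "E" when a unique name exists.
    if tty in NAME_TXT_TTYS and tty != "SCN" and has_unique_name:
        return "E"
    return "N"
```

Under this reading, an "SY" atom whose names.dmp row also has a unique name gets "E", while the corresponding "USY" atom does not.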

Attributes (MRSAT):

Attribute Name  Description                                   Origin
DIV             Division/phyla                                nodes.dmp.division id
                                                              The ATV is the textual value of the "division id" from division.dmp
RANK            Taxonomic rank (e.g. kingdom, species, etc.)  nodes.dmp.rank
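
As an illustration of the two attributes, the sketch below builds DIV and RANK records for one node. The division_names argument stands in for a lookup table built from division.dmp; which division.dmp column supplies the text is not stated here, so the table is left to the caller. All names and sample values are hypothetical.

```python
def attributes_for_node(tax_id, rank, division_id, division_names):
    """Build hypothetical MRSAT-style records for one nodes.dmp row.

    division_names maps a division id to its textual value; how that text
    is taken from division.dmp is an assumption left to the caller.
    """
    return [
        {"CODE": tax_id, "ATN": "DIV", "ATV": division_names[division_id]},
        {"CODE": tax_id, "ATN": "RANK", "ATV": rank},
    ]


# Made-up values for illustration only.
example_divisions = {"0": "Example division"}
example_attrs = attributes_for_node("100", "species", "0", example_divisions)
```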

Definitions (MRDEF):


No definitions are included in the NCBI data.


Relationships (MRREL):

REL        RELA              Inverse RELA       ORIGIN
PAR / CHD  (no RELA)                            nodes.dmp.parent tax_id
SY         expanded_form_of  has_expanded_form  Connects names.dmp.name_txt to names.dmp.unique_name
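
The relationships above could be derived along the following lines. This is a simplified sketch, not the MRREL build itself: tuples are written as (subject, REL, object) for readability and do not follow MRREL column direction conventions, the root node is assumed to list itself as its own parent, and the direction of expanded_form_of between the two names is left unspecified because it is not stated here. Function names and sample ids are hypothetical.

```python
def hierarchy_relationships(parent_of):
    """Turn a {tax_id: parent tax_id} map into PAR/CHD pairs.

    (child, "PAR", parent) is read as "child has parent"; this is a
    simplification, not the MRREL direction convention.
    """
    rels = []
    for tax_id, parent in parent_of.items():
        if tax_id == parent:
            continue  # assumption: the root lists itself as its parent
        rels.append((tax_id, "PAR", parent))
        rels.append((parent, "CHD", tax_id))
    return rels


def synonym_pairs(name_rows):
    """Pair name_txt with unique_name for rows that have a unique name.

    Each pair corresponds to one SY relationship (RELA expanded_form_of /
    has_expanded_form); direction is not modelled here.
    """
    return [(name_txt, unique) for name_txt, unique in name_rows if unique]


# Made-up ids and names for illustration only.
example_rels = hierarchy_relationships({"1": "1", "2": "1", "3": "2"})
example_pairs = synonym_pairs([("Example", "Example <group>"), ("Other", "")])
```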

Mappings (MRMAP):


No mappings are included in the NCBI data.