
NCBI (NCBI Taxonomy) - Source Representation


This page lists specific source data elements and provides information on their representation in the UMLS Metathesaurus.

VSAB: NCBI2023_04_13

Notes

The UMLS includes concepts whose "specified_species" value in nodes.dmp is 1, indicating that a species in the node's lineage has a formal name. All higher-level concepts needed to make a complete hierarchy are also included. All other concepts are excluded.
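The inclusion rule above can be sketched as a walk from every qualifying node up to the root. This is a minimal illustration, not the UMLS production code: it assumes nodes.dmp has already been parsed into a hypothetical dict `nodes` mapping tax_id to a (parent_tax_id, specified_species) pair, and relies on the NCBI root (tax_id 1) being its own parent.

```python
def included_tax_ids(nodes):
    """Return the set of tax_ids kept under the specified_species rule.

    nodes: hypothetical dict mapping tax_id -> (parent_tax_id, specified_species).
    """
    included = set()
    for tax_id, (_parent, specified) in nodes.items():
        if specified != 1:
            continue
        # Add the node and every ancestor so the hierarchy stays complete.
        # Once we reach a node already included, its ancestors are included too.
        current = tax_id
        while current not in included:
            included.add(current)
            parent = nodes[current][0]
            if parent == current:  # the root is its own parent
                break
            current = parent
    return included


# Illustrative toy tree: only node 10 has specified_species = 1.
toy = {1: (1, 0), 2: (1, 0), 10: (2, 1), 11: (2, 0), 20: (1, 0), 21: (20, 0)}
print(sorted(included_tax_ids(toy)))
```

Stopping at the first already-included ancestor keeps the walk roughly linear in the size of the retained tree rather than re-walking shared lineages.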


Summary of Changes:

No changes were made to the source data format or to Metathesaurus source processing.

Source-Provided Files: Summary

The complete NCBI release can be downloaded from the taxonomy FTP site: https://ftp.ncbi.nih.gov/pub/taxonomy/

Documentation and Reference

File Name | Description
readme.txt | README for file descriptions

Data Files

File Name | Description
citations.dmp* | Citations file (not processed)
delnodes.dmp* | Deleted nodes file (not processed)
division.dmp | Divisions file
gc.prt* | Genetic code table (not processed)
gencode.dmp* | Genetic codes file (not processed)
merged.dmp* | Merged nodes file (not processed)
names.dmp | Taxonomy names file
nodes.dmp | Taxonomy nodes file

Not included: Files marked with an asterisk (*) and fields marked "not processed" are not used. In addition, certain concepts and terms are not included in the Metathesaurus based on the criteria described in the "Notes" section above.


Source-Provided Files: Details

Details on the format of the input files and the representation of the source data.

file: division.dmp

Divisions

# | Field Name | Description | Representation
1 | division id | taxonomy database division id | Used to map the "division id" field of nodes.dmp to the expanded value in the "division name" field
2 | division cde | GenBank division code (three characters) | not processed
3 | division name | division name | MRSAT.ATN = "DIV"
4 | comments | comments | not processed
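The division id to division name lookup described above can be sketched as follows. This assumes the .dmp layout documented in NCBI's readme.txt (fields separated by "\t|\t", each record terminated by "\t|\n"); the function names are hypothetical and the sample rows are illustrative only.

```python
def parse_dmp_line(line):
    """Split one .dmp record into its fields.

    Assumes fields are delimited by '\t|\t' and the record ends with '\t|\n'.
    """
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]  # drop the record terminator
    return line.split("\t|\t")


def load_division_names(lines):
    """Map division id (field 1) to division name (field 3) from division.dmp lines."""
    divisions = {}
    for line in lines:
        fields = parse_dmp_line(line)
        division_id, name = fields[0], fields[2]
        divisions[division_id] = name
    return divisions


sample = ["0\t|\tBCT\t|\tBacteria\t|\t\t|\n", "9\t|\tVRL\t|\tViruses\t|\t\t|\n"]
print(load_division_names(sample))
```

The resulting dict is what a nodes.dmp "division id" would be looked up in to produce the "DIV" attribute value.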

file: names.dmp

# | Field Name | Description | Representation
1 | tax_id | identifier of node associated with this name | MRCONSO.CODE, MRCONSO.SCUI
2 | name_txt | name itself | MRCONSO.STR
3 | unique name | unique variant of the name if not unique | MRCONSO.STR
4 | name class | type of name | Used to assign MRCONSO.TTY

Only the following name class values are included in the Metathesaurus:

scientific name
synonym
equivalent name
common name

TTY values are assigned as follows:
name class | name_txt TTY | unique name TTY (if populated)
scientific name | SCN | USN
synonym | SY | USY
equivalent name | EQ | UE
common name | CMN | UCN
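The TTY table above can be read as a small lookup applied to each names.dmp row. The sketch below is a hypothetical illustration, not the Metathesaurus loader: it returns (CODE/SCUI, STR, TTY) tuples, uses tax_id for both CODE and SCUI as the names.dmp table states, and emits a second atom only when "unique name" is populated. The sample values are illustrative.

```python
# name class -> (TTY for name_txt, TTY for unique name), per the table above
TTY_BY_CLASS = {
    "scientific name": ("SCN", "USN"),
    "synonym": ("SY", "USY"),
    "equivalent name": ("EQ", "UE"),
    "common name": ("CMN", "UCN"),
}


def atoms_for_name(tax_id, name_txt, unique_name, name_class):
    """Return (CODE/SCUI, STR, TTY) tuples for one names.dmp row.

    Rows whose name class is not in TTY_BY_CLASS produce no atoms.
    """
    if name_class not in TTY_BY_CLASS:
        return []
    name_tty, unique_tty = TTY_BY_CLASS[name_class]
    atoms = [(tax_id, name_txt, name_tty)]
    if unique_name:  # populated only when name_txt is not unique
        atoms.append((tax_id, unique_name, unique_tty))
    return atoms


print(atoms_for_name("2", "Bacteria", "Bacteria <bacteria>", "scientific name"))
```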


NCBI names with name class = "authority" are used to create MRSAT.ATN = "AUTHORITY_NAME". Names with any other "name class" value are excluded.
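A minimal sketch of how "authority" rows could become attributes rather than atoms. The record shape is hypothetical: each attribute is keyed by tax_id as "CODE", and the exact MRSAT level the attribute is attached to is an assumption here. Only ATN = "AUTHORITY_NAME" comes from the text above; the sample row is illustrative.

```python
def authority_attributes(name_rows):
    """Build AUTHORITY_NAME attributes from names.dmp rows.

    name_rows: (tax_id, name_txt, unique_name, name_class) tuples.
    The dict layout (CODE/ATN/ATV) is a hypothetical stand-in for an MRSAT row.
    """
    attrs = []
    for tax_id, name_txt, _unique, name_class in name_rows:
        if name_class == "authority":
            attrs.append({"CODE": tax_id, "ATN": "AUTHORITY_NAME", "ATV": name_txt})
    return attrs


rows = [
    ("9606", "Homo sapiens", "", "scientific name"),
    ("9606", "Homo sapiens Linnaeus, 1758", "", "authority"),
]
print(authority_attributes(rows))
```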

file: nodes.dmp

# | Field Name | Description | Representation
1 | tax_id | node id in GenBank taxonomy database | Used to create the hierarchy
2 | parent_tax_id | parent node id in GenBank taxonomy database | Used to create the hierarchy
3 | rank | rank of this node (e.g., superkingdom, kingdom) | MRSAT.ATN = "RANK"
4 | embl code | locus-name prefix | not processed
5 | division id | division id (see division.dmp file) | MRSAT.ATN = "DIV"; the ATV is the division name for this division id, taken from division.dmp
6 | inherited div flag | 1 if node inherits division from parent | not processed
7 | genetic code id | see gencode.dmp file | not processed
8 | inherited GC flag | 1 if node inherits genetic code from parent | not processed
9 | mitochondrial genetic code id | see gencode.dmp file | not processed
10 | inherited MGC flag | 1 if node inherits mitochondrial gencode from parent | not processed
11 | GenBank hidden flag | 1 if name is suppressed in GenBank entry lineage | not processed
12 | hidden subtree root flag | 1 if this subtree has no sequence data yet | not processed
13 | comments | free-text comments and citations | not processed
14 | plastid genetic code id | see gencode.dmp file | not processed
15 | inherited PGC flag | 1 if node inherits plastid gencode from parent | not processed
16 | specified_species | 1 if species in the node's lineage has formal name | Used to identify atoms to include in UMLS processing; see "Notes" above
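The processed nodes.dmp fields can be sketched as producing parent/child hierarchy edges plus RANK and DIV attributes. This is an illustration under stated assumptions: rows are already split into field lists (0-based indexes, so "division id" is index 4), `division_names` is a hypothetical division id to name dict built from division.dmp, and the root (tax_id 1, its own parent) gets no parent edge. The sample rows are illustrative.

```python
def node_records(rows, division_names):
    """Return (hierarchy_edges, attributes) from parsed nodes.dmp rows.

    rows: lists of fields (tax_id, parent_tax_id, rank, embl code, division id, ...).
    division_names: hypothetical dict mapping division id -> division name.
    """
    hierarchy, attributes = [], []
    for fields in rows:
        tax_id, parent_tax_id, rank, division_id = fields[0], fields[1], fields[2], fields[4]
        if tax_id != parent_tax_id:  # skip the self-referencing root
            hierarchy.append((parent_tax_id, tax_id))
        attributes.append((tax_id, "RANK", rank))
        # The DIV value is the expanded division name, not the numeric id.
        attributes.append((tax_id, "DIV", division_names[division_id]))
    return hierarchy, attributes


rows = [["1", "1", "no rank", "", "8"], ["9606", "9605", "species", "HS", "5"]]
edges, attrs = node_records(rows, {"8": "Unassigned", "5": "Primates"})
print(edges)
print(attrs)
```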