This page lists specific source data elements and provides information on their representation in the UMLS Metathesaurus.
The UMLS includes concepts where the "specified_species" value in nodes.dmp = 1, indicating that a species in the node's lineage has a formal name. All higher level concepts needed to make a complete hierarchy are also included. Other concepts are excluded.
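The inclusion rule can be sketched in Python as an ancestor closure. It starts from every node whose specified_species flag is 1 and walks up parent_tax_id to the root, so that every higher-level concept needed for a complete hierarchy is kept. This is a minimal sketch, not NLM's processing code. It assumes the taxdump row layout described in the NCBI readme (fields separated by `\t|\t`, rows ending in `\t|`) and reads specified_species from field 16, as listed in the nodes.dmp table. The helper names `split_dmp`, `included_tax_ids` and `row` and the sample rows are hypothetical.

```python
def split_dmp(line: str) -> list[str]:
    # Assumed taxdump layout: fields separated by "\t|\t", each row ends with "\t|".
    return line.rstrip("\n").removesuffix("\t|").split("\t|\t")


def included_tax_ids(node_lines) -> set[str]:
    """Nodes with specified_species == "1" plus every ancestor up to the root."""
    parent: dict[str, str] = {}
    seeds: set[str] = set()
    for line in node_lines:
        fields = split_dmp(line)
        tax_id, parent_id = fields[0], fields[1]
        parent[tax_id] = parent_id
        # Field 16 (index 15) is specified_species, per the nodes.dmp table.
        if len(fields) > 15 and fields[15] == "1":
            seeds.add(tax_id)

    keep: set[str] = set()
    for start in seeds:
        node = start
        # Walk up the lineage until the root or an already-kept node is reached.
        while node not in keep:
            keep.add(node)
            up = parent.get(node)
            if up is None or up == node:  # the root lists itself as its parent
                break
            node = up
    return keep


def row(tax_id: str, parent_id: str, specified: str) -> str:
    # Hypothetical 16-field row; fields 3-15 are left empty for brevity.
    return "\t|\t".join([tax_id, parent_id] + [""] * 13 + [specified]) + "\t|\n"


sample = [row("1", "1", "0"), row("2", "1", "0"), row("3", "2", "1"),
          row("4", "1", "0"), row("5", "3", "0")]
print(sorted(included_tax_ids(sample)))  # ['1', '2', '3']
```

Only ancestors of flagged nodes are pulled in; unflagged descendants (node 5) and unrelated branches (node 4) stay excluded.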
The complete NCBI release can be downloaded from the taxonomy ftp site: https://ftp.ncbi.nih.gov/pub/taxonomy/
Documentation file
File Name | Description |
---|---|
readme.txt | README with file descriptions |
Data files
File Name | Description |
---|---|
citations.dmp* | Citations file (not processed) |
delnodes.dmp* | Deleted nodes file (not processed) |
division.dmp | Divisions file |
gc.prt* | Genetic code table (not processed) |
gencode.dmp* | Genetic codes file (not processed) |
merged.dmp* | Merged nodes file (not processed) |
names.dmp | Taxonomy names file |
nodes.dmp | Taxonomy nodes file |
Not included: Files marked with an asterisk (*) and fields marked "not processed" are not used. In addition, certain concepts and terms are excluded from the Metathesaurus according to the criteria described in the notes above.
Details on the format of the input files and the representation of source data.
Divisions
# | Field Name | Description | Representation |
---|---|---|---|
1 | division id | taxonomy database division id | Used to map the "division id" field of nodes.dmp to the expanded value in the "division name" field |
2 | division cde | GenBank division code (three characters) | not processed |
3 | division name | division name | MRSAT.ATN = "DIV" |
4 | comments | comments | not processed |
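The division lookup can be sketched as a dictionary from division id to division name, which then supplies the ATV of the DIV attribute for each nodes.dmp row. This is a minimal sketch under the same assumed `\t|\t` row layout; the function names are hypothetical and the sample rows are illustrative, not copied from a specific NCBI release.

```python
def split_dmp(line: str) -> list[str]:
    # Assumed taxdump layout: fields separated by "\t|\t", each row ends with "\t|".
    return line.rstrip("\n").removesuffix("\t|").split("\t|\t")


def load_divisions(division_lines) -> dict[str, str]:
    """Map division id (field 1) to division name (field 3); code and comments are skipped."""
    divisions = {}
    for line in division_lines:
        fields = split_dmp(line)
        divisions[fields[0]] = fields[2]
    return divisions


def div_attribute(node_line: str, divisions: dict[str, str]) -> tuple[str, str, str]:
    """Return (tax_id, ATN, ATV) for the DIV attribute of one nodes.dmp row."""
    fields = split_dmp(node_line)
    tax_id, division_id = fields[0], fields[4]  # field 5 of nodes.dmp is division id
    return (tax_id, "DIV", divisions[division_id])


# Illustrative rows only.
division_dmp = [
    "0\t|\tBCT\t|\tBacteria\t|\t\t|\n",
    "5\t|\tPRI\t|\tPrimates\t|\t\t|\n",
]
node_row = "9606\t|\t9605\t|\tspecies\t|\tHS\t|\t5\t|\n"  # truncated after field 5
divisions = load_divisions(division_dmp)
print(div_attribute(node_row, divisions))  # ('9606', 'DIV', 'Primates')
```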
Names
# | Field Name | Description | Representation |
---|---|---|---|
1 | tax_id | identifier of node associated with this name | MRCONSO.CODE, MRCONSO.SCUI |
2 | name_txt | name itself | MRCONSO.STR |
3 | unique name | unique variant of the name if the name is not unique | MRCONSO.STR |
4 | name class | type of name; only the following name class values are included in the Metathesaurus: scientific name, synonym, equivalent name, common name | Used to assign MRCONSO.TTY according to the name class. NCBI names with name class = "authority" are used to create MRSAT.ATN = "AUTHORITY_NAME". Atoms with other "name class" values are excluded. |
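The name-class filter can be sketched as follows: rows with an included class become candidate atoms, "authority" rows become AUTHORITY_NAME attributes, and everything else is dropped. This is a minimal sketch, not NLM's code. The TTY mapping is not reproduced here, so each atom simply keeps its name class. Two details are labeled assumptions: the unique name, when present, is kept as an additional string, and the attribute value for an authority row is its name_txt. Function names and sample rows are hypothetical.

```python
def split_dmp(line: str) -> list[str]:
    # Assumed taxdump layout: fields separated by "\t|\t", each row ends with "\t|".
    return line.rstrip("\n").removesuffix("\t|").split("\t|\t")


INCLUDED_CLASSES = {"scientific name", "synonym", "equivalent name", "common name"}


def process_names(name_lines):
    """Split names.dmp rows into candidate atoms and AUTHORITY_NAME attributes."""
    atoms = []        # (tax_id, string, name_class); tax_id feeds CODE and SCUI
    authorities = []  # (tax_id, "AUTHORITY_NAME", name_txt)
    for line in name_lines:
        tax_id, name_txt, unique_name, name_class = split_dmp(line)[:4]
        if name_class == "authority":
            # Assumption: the attribute value is the authority name text itself.
            authorities.append((tax_id, "AUTHORITY_NAME", name_txt))
        elif name_class in INCLUDED_CLASSES:
            atoms.append((tax_id, name_txt, name_class))
            if unique_name:
                # Assumption: the unique variant is kept as an additional string.
                atoms.append((tax_id, unique_name, name_class))
        # Rows with any other name class are excluded.
    return atoms, authorities


# Illustrative rows only.
names_dmp = [
    "2\t|\tBacteria\t|\tBacteria <bacteria>\t|\tscientific name\t|\n",
    "9606\t|\tHomo sapiens\t|\t\t|\tscientific name\t|\n",
    "9606\t|\thuman\t|\t\t|\tgenbank common name\t|\n",
    "9606\t|\tHomo sapiens Linnaeus, 1758\t|\t\t|\tauthority\t|\n",
]
atoms, authorities = process_names(names_dmp)
print(len(atoms), len(authorities))  # 3 1
```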
Nodes
# | Field Name | Description | Representation |
---|---|---|---|
1 | tax_id | node id in GenBank taxonomy database | Used to create the hierarchy |
2 | parent_tax_id | parent node id in GenBank taxonomy database | Used to create the hierarchy |
3 | rank | rank of this node (e.g., superkingdom, kingdom) | MRSAT.ATN = "RANK" |
4 | embl code | locus-name prefix | not processed |
5 | division id | division id (see division.dmp file) | MRSAT.ATN = "DIV". The ATV is the division name for this division id, taken from division.dmp |
6 | inherited div flag | 1 if node inherits division from parent | not processed |
7 | genetic code id | see gencode.dmp file | not processed |
8 | inherited GC flag | 1 if node inherits genetic code from parent | not processed |
9 | mitochondrial genetic code id | see gencode.dmp file | not processed |
10 | inherited MGC flag | 1 if node inherits mitochondrial gencode from parent | not processed |
11 | GenBank hidden flag | 1 if name is suppressed in GenBank entry lineage | not processed |
12 | hidden subtree root flag | 1 if this subtree has no sequence data yet | not processed |
13 | comments | free text comments and citations | not processed |
14 | plastid genetic code id | see gencode.dmp file | not processed |
15 | inherited PGC flag | 1 if node inherits plastid gencode from parent | not processed |
16 | specified_species | 1 if a species in the node's lineage has a formal name | Used to identify atoms to include in UMLS processing; see the note above |
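The hierarchy and RANK attributes can be sketched from the first three nodes.dmp fields. Each kept node contributes a (child, parent) edge, except the root, which lists itself as its own parent, plus a RANK attribute. This is a minimal sketch under the same assumed row layout. It takes the set of kept tax_ids as input; when that set is closed over ancestors, every parent in an edge is also kept. Function names and sample rows are hypothetical, and the sample rows are truncated to three fields.

```python
def split_dmp(line: str) -> list[str]:
    # Assumed taxdump layout: fields separated by "\t|\t", each row ends with "\t|".
    return line.rstrip("\n").removesuffix("\t|").split("\t|\t")


def hierarchy_and_ranks(node_lines, keep: set[str]):
    """Return (child, parent) edges and (tax_id, "RANK", rank) attributes for kept nodes."""
    edges, ranks = [], []
    for line in node_lines:
        tax_id, parent_id, rank = split_dmp(line)[:3]
        if tax_id not in keep:
            continue
        if parent_id != tax_id:  # the root node lists itself as its parent
            edges.append((tax_id, parent_id))
        ranks.append((tax_id, "RANK", rank))
    return edges, ranks


# Hypothetical rows, truncated to tax_id, parent_tax_id and rank.
nodes_dmp = [
    "1\t|\t1\t|\tno rank\t|\n",
    "2\t|\t1\t|\tsuperkingdom\t|\n",
    "1224\t|\t2\t|\tphylum\t|\n",
    "99\t|\t1\t|\tgenus\t|\n",
]
edges, ranks = hierarchy_and_ranks(nodes_dmp, keep={"1", "2", "1224"})
print(edges)  # [('2', '1'), ('1224', '2')]
```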