Unified Medical Language System® (UMLS®)
2012AB NCBI Taxonomy Source Information
Notes
Many concepts and terms from the NCBI Taxonomy are excluded during
Metathesaurus source processing. The criteria for determining
which concepts and terms are excluded or retained are outlined
below. See term type descriptions for additional information.
1. Exclude all names that do not have one of the following name
classes:
scientific name
synonym
equivalent name
common name
authority
2. Exclude all concepts below the "species" level in the
hierarchy. Selected concepts with a rank of "no rank" may be
retained, depending on their hierarchical level.
3. Exclude all concepts with a "division id" value of 11
(environmental samples) and their descendants.
4. Exclude concepts and terms based on certain patterns, e.g.
remove concepts that have rank = "species" and contain any of the following
words in the scientific name: "uncultured," "clone," "unidentified," "uncultivated."
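The criteria above can be sketched as two small predicates. This is only an illustration under stated assumptions, not the actual UMLS source-processing code: the function names are hypothetical, the pattern words are matched as whole words (the source does not say how matching is done), and criterion 2 and the "descendants" part of criterion 3 are left out because they need the hierarchy.

```python
import re

# Name classes retained by criterion 1 (taken from the list above).
RETAINED_NAME_CLASSES = {
    "scientific name",
    "synonym",
    "equivalent name",
    "common name",
    "authority",
}

# Words named in criterion 4. The source introduces them with "e.g.",
# so the real pattern list may be longer (assumption: these four only).
EXCLUDED_WORDS = {"uncultured", "clone", "unidentified", "uncultivated"}

# "division id" value for environmental samples (criterion 3).
ENVIRONMENTAL_SAMPLES_DIVISION = 11


def keep_name(name_class):
    """Criterion 1: keep a name only if its name class is retained."""
    return name_class.strip().lower() in RETAINED_NAME_CLASSES


def keep_concept(rank, division_id, scientific_name):
    """Criteria 3 and 4 for a single node, ignoring its ancestors."""
    if division_id == ENVIRONMENTAL_SAMPLES_DIVISION:
        return False
    if rank == "species":
        # Assumption: whole-word, case-insensitive matching.
        words = re.findall(r"[a-z]+", scientific_name.lower())
        if any(word in EXCLUDED_WORDS for word in words):
            return False
    return True
```

Whole-word matching is a design choice made here so that a name merely containing "clone" as a substring is not dropped; the real processing may behave differently.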
Summary of Changes:
1) Dropping the properties name class = unpublished name and name class = unique unpublished name in the names.dmp file resulted in deprecated attribute names (ATNs):
- UNIQ_UNPUBL_NAME
- UNPUBL_NAME
Source-Provided Files: Summary
The complete NCBI release can be downloaded from the taxonomy ftp site: ftp://ftp.ncbi.nih.gov/pub/taxonomy/
Documentation and Reference
| File Name | Description |
|---|---|
| readme.txt | README for file descriptions |
Data Files
| File Name | Description |
|---|---|
| citations.dmp* | Citations file (not processed) |
| delnodes.dmp* | Deleted nodes file (not processed) |
| division.dmp | Divisions file |
| gc.prt* | Genetic code table (not processed) |
| gencode.dmp* | Genetic codes file (not processed) |
| merged.dmp* | Merged nodes file (not processed) |
| names.dmp | Taxonomy names file |
| nodes.dmp | Taxonomy nodes file |
Not included:
Selected files and fields are not processed. In addition,
certain concepts and terms are not included in the Metathesaurus based
on the criteria described in the "Notes" section above.
Source-Provided Files: Details
Details on format of input files and representation of source data.
file: division.dmp
Divisions
| # | Field Name | Description | Representation |
|---|---|---|---|
| 1 | division id | taxonomy database division id | Used to map the "division id" field of nodes.dmp to the expanded value found in the "division name" |
| 2 | division cde | GenBank division code (three characters) | not processed |
| 3 | division name | division name | MRSAT.ATN = "DIV" |
| 4 | comments | comments | not processed |
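As a rough sketch of how the division id to division name lookup could be built, the snippet below reads rows in the layout used by the NCBI `.dmp` files, assumed here to be fields separated by `\t|\t` with each row ending in `\t|`. The helper names and the two sample rows are illustrative, not part of the UMLS processing itself.

```python
def split_dmp_line(line):
    """Split one .dmp row into a list of stripped field values.

    Assumes fields are separated by "\t|\t" and the row ends with "\t|".
    """
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]
    return [field.strip() for field in line.split("\t|\t")]


def load_division_names(lines):
    """Map division id (int) to division name.

    The division code and comments fields are read but ignored,
    mirroring the "not processed" entries in the table above.
    """
    names = {}
    for line in lines:
        if not line.strip():
            continue
        division_id, _code, division_name, _comments = split_dmp_line(line)
        names[int(division_id)] = division_name
    return names


# Illustrative rows in the assumed layout (comments left empty).
sample = [
    "0\t|\tBCT\t|\tBacteria\t|\t\t|\n",
    "11\t|\tENV\t|\tEnvironmental samples\t|\t\t|\n",
]
division_names = load_division_names(sample)
```

With such a mapping, the value stored for the "DIV" attribute of a node would be `division_names[division_id]`.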
file: names.dmp
| # | Field Name | Description | Representation |
|---|---|---|---|
| 1 | tax_id | identifier of node associated with this name | MRCONSO.CODE, MRCONSO.SCUI |
| 2 | name_txt | name itself | MRCONSO.STR |
| 3 | unique name | unique variant of the name if not unique | MRCONSO.STR |
| 4 | name class | type of name. Only the following name class values are included in the Metathesaurus: scientific name, synonym, equivalent name, common name, authority | Used to assign MRCONSO.TTY; TTY values are assigned by name class (see term type descriptions). Atoms with other "name class" values are excluded during UMLS source processing |
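The field mapping above could be applied to already-split names.dmp rows roughly as follows. This is a hedged sketch: the `Atom` type and `atoms_from_names` function are hypothetical, and emitting the "unique name" as a second atom is an assumption based only on both fields being mapped to MRCONSO.STR. TTY assignment is omitted because the mapping is not given here.

```python
from typing import NamedTuple

# Name classes included in the Metathesaurus, per the table above.
RETAINED_NAME_CLASSES = {
    "scientific name",
    "synonym",
    "equivalent name",
    "common name",
    "authority",
}


class Atom(NamedTuple):
    code: str        # MRCONSO.CODE (from tax_id)
    scui: str        # MRCONSO.SCUI (from tax_id)
    string: str      # MRCONSO.STR (from name_txt or unique name)
    name_class: str  # would drive MRCONSO.TTY; mapping not shown


def atoms_from_names(rows):
    """Yield atoms from (tax_id, name_txt, unique_name, name_class) rows."""
    for tax_id, name_txt, unique_name, name_class in rows:
        if name_class not in RETAINED_NAME_CLASSES:
            continue  # other name classes are excluded
        yield Atom(tax_id, tax_id, name_txt, name_class)
        if unique_name:
            # Assumption: the unique variant becomes its own string.
            yield Atom(tax_id, tax_id, unique_name, name_class)


# Illustrative rows, not copied from a release.
rows = [
    ("9606", "Homo sapiens", "", "scientific name"),
    ("9606", "human", "", "genbank common name"),
    ("2", "Bacteria", "Bacteria <illustrative>", "scientific name"),
]
atoms = list(atoms_from_names(rows))
```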
file: nodes.dmp
| # | Field Name | Description | Representation |
|---|---|---|---|
| 1 | tax_id | node id in GenBank taxonomy database | Used to create the hierarchy. Also used to identify concepts to be excluded based on "rank": all concepts below the "species" level are excluded. |
| 2 | parent_tax_id | parent node id in GenBank taxonomy database | Used to create the hierarchy. Also used to identify concepts to be excluded based on "rank": all concepts below the "species" level are excluded. |
| 3 | rank | rank of this node (e.g. superkingdom, kingdom) | MRSAT.ATN = "RANK". Also used to identify concepts to be excluded based on "rank": all concepts below the "species" level are excluded. |
| 4 | embl code | locus-name prefix | not processed |
| 5 | division id | division id (see division.dmp file) | MRSAT.ATN = "DIV". The ATV is the value of the division name for this division id, from division.dmp |
| 6 | inherited div flag | 1 if node inherits division from parent | not processed |
| 7 | genetic code id | see gencode.dmp file | not processed |
| 8 | inherited GC flag | 1 if node inherits genetic code from parent | not processed |
| 9 | mitochondrial genetic code id | see gencode.dmp file | not processed |
| 10 | inherited MGC flag | 1 if node inherits mitochondrial gencode from parent | not processed |
| 11 | GenBank hidden flag | 1 if name is suppressed in GenBank entry lineage | not processed |
| 12 | hidden subtree root flag | 1 if this subtree has no sequence data yet | not processed |
| 13 | comments | free text comments and citations | not processed |
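To illustrate how tax_id and parent_tax_id can be used to create the hierarchy, and how an exclusion such as "division id 11 and their descendants" could then be propagated down it, here is a minimal sketch. The function names and the node ids in the sample are hypothetical, and the root is assumed to list itself as its own parent.

```python
from collections import defaultdict


def build_children(nodes):
    """Map each parent tax_id to the list of its child tax_ids.

    nodes: iterable of (tax_id, parent_tax_id, rank, division_id).
    """
    children = defaultdict(list)
    for tax_id, parent_tax_id, _rank, _division_id in nodes:
        if tax_id != parent_tax_id:  # assumed: root is its own parent
            children[parent_tax_id].append(tax_id)
    return children


def excluded_by_division(nodes, division_id=11):
    """Return tax_ids in the given division plus all their descendants."""
    nodes = list(nodes)
    children = build_children(nodes)
    stack = [tax_id for tax_id, _p, _r, div in nodes if div == division_id]
    excluded = set()
    while stack:
        tax_id = stack.pop()
        if tax_id in excluded:
            continue
        excluded.add(tax_id)
        stack.extend(children.get(tax_id, ()))
    return excluded


# Made-up nodes: 101 is in division 11, and 102 sits below it.
sample_nodes = [
    (1, 1, "no rank", 8),
    (100, 1, "genus", 0),
    (101, 100, "species", 11),
    (102, 101, "no rank", 0),
    (103, 100, "species", 0),
]
excluded = excluded_by_division(sample_nodes)
```

An explicit stack is used instead of recursion because the taxonomy can be deep; the rank-based exclusion (concepts below "species") would need a separate walk and is not shown.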
