DRAFT
SECTION 2
METATHESAURUS®
2.0 INTRODUCTION
The Metathesaurus is a very large, multi-purpose, and multi-lingual vocabulary database that contains information about biomedical and health-related concepts, their various names, and the relationships among them. Designed for use by system developers, the Metathesaurus is built from the electronic versions of many different thesauri, classifications, code sets, and lists of controlled terms used in patient care, health services billing, public health statistics, indexing and cataloging biomedical literature, and/or basic, clinical, and health services research. These are referred to as the "source vocabularies" of the Metathesaurus. The term Metathesaurus draws on the third definition of the prefix "meta" in Webster's Dictionary, i.e., "more comprehensive, transcending." In a sense, the Metathesaurus transcends the specific thesauri, vocabularies, and classifications it encompasses.
The Metathesaurus is organized by concept or meaning. In essence, its purpose is to link alternative names and views of the same concept together and to identify useful relationships between different concepts.
The Metathesaurus is linked to other UMLS Knowledge Sources. All concepts in the Metathesaurus are assigned to at least one semantic type from the Semantic Network (Section 3.0). This provides consistent categorization of all concepts in the Metathesaurus at the relatively general level represented in the Semantic Network. Many of the words and multi-word terms that appear in concept names or strings in the Metathesaurus also appear in the SPECIALIST lexicon (Section 4). The lexical tools (Section 4) are used to generate the word, normalized word, and normalized string indexes to the Metathesaurus. MetamorphoSys (Section 6) is the software tool for customizing the Metathesaurus for specific purposes. It is also the install program for all of the UMLS resources.
2.0.1 Scope of the Metathesaurus
The scope of the Metathesaurus is determined by the combined scope of its source vocabularies. Many relationships (primarily synonymous), concept attributes, and some concept names are added by the NLM during Metathesaurus creation and maintenance, but essentially all the concepts themselves come from one or more of the source vocabularies. With very few exceptions, if none of the source vocabularies contains a concept, that concept will not appear in the Metathesaurus.
2.0.2 Preservation of Content and Meaning from Source Vocabularies
The Metathesaurus reflects and preserves the meanings, concept names, and relationships from its source vocabularies. When two different source vocabularies use the same name for differing concepts, the Metathesaurus represents both of the meanings and indicates which meaning is present in which source vocabulary. When the same concept appears in different hierarchical contexts in different source vocabularies, the Metathesaurus includes all the hierarchies. When conflicting relationships between two concepts appear in different source vocabularies, both views are included in the Metathesaurus. Although specific concept names or relationships from some source vocabularies may be idiosyncratic and lack face validity, they are still included in the Metathesaurus.
In other words, the Metathesaurus does not represent a comprehensive NLM-authored ontology of biomedicine or a single consistent view of the world (except at the high level of the semantic types assigned to all its concepts).
The Metathesaurus preserves the many views of the world present in its source vocabularies because these different views may be useful for different tasks.
Although it preserves all the meanings and content in its source vocabularies, the Metathesaurus stores this information in a single common format.
The native format of each vocabulary is carefully studied and then "inverted" into the common Metathesaurus format. For some vocabularies, this involves representing implied information in a more explicit format. For example, if a source vocabulary stores its preferred concept name as the first occurrence in a list of alternative concept names, that first name is explicitly tagged as the preferred name for that source in the Metathesaurus.
2.0.3 Need to Customize the Metathesaurus
Because it is a multi-purpose resource that includes concepts and terms from many different source vocabularies developed for very different purposes, the Metathesaurus must be customized for effective use in most specific applications. Your decisions about what to include in your customized subset(s) of the Metathesaurus will have a significant effect on its utility in your systems. Vocabulary sources that are essential for some purposes, e.g., LOINC for standard exchange of laboratory data, may be detrimental for others, such as natural language processing. It can also be important to exclude a subset of the concept names found in a vocabulary source that is otherwise useful, e.g., non-standard abbreviations or shortened forms that lack face validity or produce spurious results in natural language processing.
The Metathesaurus contains source vocabularies produced by many different copyright holders. The majority of the content of the Metathesaurus is available for use under the basic (and quite open) terms described in sections 1-11 and 13-16 of the Metathesaurus license. However, some vocabulary producers place additional restrictions on the use of their content as distributed within the Metathesaurus. The various levels of additional restrictions are described in Section 12 of the license. The level that applies to each vocabulary is recorded in the Appendix to the license, in Appendix B.4 of this documentation, and in the MetamorphoSys install and customization program (Section 6.0). If a UMLS user already has a separate license for use of one of the source vocabularies, the user's existing license also applies to that source as distributed within the Metathesaurus. In some cases, UMLS users may have to request permission or negotiate a separate license with a vocabulary producer in order to use that vocabulary in a production system. There may be a charge associated with these separate permissions or license agreements.
The Metathesaurus is designed to facilitate customization. All information in the Metathesaurus is labeled as to its source(s), so it is possible to determine which concept names, attributes, and relationships come from which source vocabularies and which attributes and relationships were added during Metathesaurus construction. The labels allow UMLS users to subset the Metathesaurus by excluding information from specific source vocabularies, including those for which they do not have necessary licenses or permissions. It is also easy to exclude all source vocabularies that have particular restriction levels or all information in particular languages. In addition to identifying the source(s), restriction levels, and language of the information it contains, the Metathesaurus includes various more specific concept name flags and relationship labels that can help UMLS users to exclude content that is not relevant or helpful for particular applications.
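As a rough illustration of the kind of filtering these labels make possible, the Python sketch below keeps only atoms from allowed sources, in wanted languages, and at or below a chosen restriction level. It is not how MetamorphoSys works internally: the `Atom` record, the `subset` function, the source names SRC1-SRC3, and their restriction levels are all hypothetical, although the field names loosely echo RRF column names (CUI, LAT, SAB).

```python
from dataclasses import dataclass

# Hypothetical record for one atom; field names loosely echo RRF columns.
@dataclass(frozen=True)
class Atom:
    cui: str     # concept identifier
    lat: str     # language, e.g. "ENG"
    sab: str     # root (versionless) source abbreviation
    string: str  # the concept name itself

def subset(atoms, excluded_sources, languages, restriction_levels, max_level):
    """Keep atoms whose source is not excluded, whose language is wanted,
    and whose source restriction level is at most max_level.
    Sources missing from restriction_levels are treated as too restricted."""
    return [
        a for a in atoms
        if a.sab not in excluded_sources
        and a.lat in languages
        and restriction_levels.get(a.sab, max_level + 1) <= max_level
    ]

# Usage with made-up sources and made-up restriction levels.
atoms = [
    Atom("C1", "ENG", "SRC1", "name from SRC1"),
    Atom("C1", "SPA", "SRC1", "nombre de SRC1"),
    Atom("C1", "ENG", "SRC2", "name from SRC2"),
    Atom("C1", "ENG", "SRC3", "name from SRC3"),
]
levels = {"SRC1": 0, "SRC2": 3, "SRC3": 0}
kept = subset(atoms, excluded_sources={"SRC3"}, languages={"ENG"},
              restriction_levels=levels, max_level=2)
print([a.string for a in kept])  # ['name from SRC1']
```

Treating a source with no recorded restriction level as excluded is a deliberately conservative choice for licensing purposes; a real subset should take its levels from the license appendix or MetamorphoSys.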
MetamorphoSys, the install and customization program distributed with the UMLS (Section 6), makes it easy to generate custom subsets. MetamorphoSys also includes default settings that generate subsets that may be generally useful. MetamorphoSys can also be used to change the default preferred names of concepts (explained in Section 2.2.6); to change the default character set (from 7-bit ASCII to Unicode UTF-8); and to include versioned vocabulary source abbreviations in every Metathesaurus file (see Section 2.1).
2.0.4 Metathesaurus Release Formats
Metathesaurus users may select from two relational formats: the Rich Release Format (RRF), introduced in 2004, and the Original Release Format (ORF). Both are available as output options of MetamorphoSys, the UMLS install and customization program (Section 6). All Rich Release Format file names have an extension (.RRF). Original Release Format files have no extension. Both formats are described in this documentation (usually abbreviated as RRF and ORF). There is also a White Paper explaining the rationale for the Rich Release Format and a detailed description of the differences between the .RRF files and the Original Format files.
The Rich Release Format has a number of advantages and is the preferred format for new users of the Metathesaurus and for most data creation applications.
2.1 SOURCE VOCABULARY
The Metathesaurus contains concepts, concept names, and other attributes from more than 100 terminologies, classifications, and thesauri, some in multiple editions. There is a concept in the Metathesaurus for each source vocabulary itself, which is assigned the semantic type "Intellectual Product". A special file (MRSAB.RRF and MRSAB in ORF) stores the version of each source vocabulary present in a particular edition of the Metathesaurus. All other Metathesaurus files that reference source vocabularies use "root" or versionless abbreviations, e.g., ICD9CM, not ICD9CM2003, thus avoiding routine wholesale updates to reflect the new versions. If you prefer to have versioned vocabulary source abbreviations in your custom Metathesaurus subset files, MetamorphoSys offers this as an option.
A complete list of the Metathesaurus source vocabularies with their root and versioned source abbreviations appears in Appendix B.4 of this documentation. The list is alphabetized by the abbreviation used for each vocabulary source in the Metathesaurus. For each source, Appendix B.4 also gives the number of its concept names present in the Metathesaurus, the type of hierarchies or contexts it has (if any), and whether it is one of the small number of source vocabularies that are not routinely updated in the Metathesaurus.
The Metathesaurus source vocabularies include terminologies designed for use in patient-record systems; large disease and procedure classifications used for statistical reporting and billing; more narrowly focused vocabularies used to record data related to psychiatry, nursing, medical devices, adverse drug reactions, etc.; disease and finding terminologies from expert diagnostic systems; and some thesauri used in information retrieval. A categorized list of the English-language source vocabularies is available.
2.1.1 Inclusion of U.S. Standard Code Sets and Terminologies
The Metathesaurus includes the code sets mandated for use in electronic administrative transactions in the U.S. under the provisions of the Health Insurance Portability and Accountability Act (HIPAA). With the exception of the National Drug Codes (NDC), the Metathesaurus includes all concepts and terms from these code sets. NDC codes available from the Food and Drug Administration are included as attributes of clinical drug concepts present in the FDA National Drug Code Directory (MTHFDA), which is a source vocabulary.
NLM intends to incorporate all clinical terminologies designated as target U.S. government-wide standards by the Consolidated Health Informatics (CHI) initiative and/or recommended as U.S. standards by the National Committee on Vital and Health Statistics. Several of these (e.g., LOINC, SNOMED CT, RxNorm) are already present in the Metathesaurus.
Appendix B.4 indicates whether a vocabulary has been designated as a HIPAA or CHI standard.
2.1.2 Inclusion of Languages Other than English
The Metathesaurus structure can accommodate translations of its source vocabularies into languages other than English. Many translations in many different languages are present in this edition of the Metathesaurus. The Metathesaurus includes many translations of some source vocabularies, e.g., NLM's Medical Subject Headings (MeSH) and the International Classification of Primary Care; one or a few translations of others; and, in many cases, only the English version. As previously explained, MetamorphoSys (see Section 6) makes it easy to create a subset of the Metathesaurus that excludes the languages that are not relevant in a particular application.
2.2 CONCEPTS, CONCEPT NAMES, AND THEIR IDENTIFIERS
The Metathesaurus is organized by concept. One of its primary purposes is to connect different names for the same concept from many different vocabularies. The Metathesaurus assigns several types of unique, permanent identifiers to the concepts and concept names it contains, in addition to retaining all identifiers that are present in the source vocabularies. The Metathesaurus "concept structure" includes concept names, their identifiers, and key characteristics of these concept names (e.g., language, vocabulary source, name type). The entire concept structure appears in a single file in the Rich Release Format (MRCONSO.RRF). An abbreviated version of the concept structure is split between two files in the Original Format (MRCON and MRSO).
2.2.1 Concepts and Concept Identifiers
A concept is a meaning. A meaning can have many different names. A key goal of Metathesaurus construction is to understand the intended meaning of each name in each source vocabulary and to link all the names from all of the source vocabularies that mean the same thing (the synonyms). This is not an exact science. The construction of the Metathesaurus is based on the assumption that specially trained subject experts can determine synonymy with a degree of accuracy that is highly useful. Metathesaurus editors decide what view of synonymy to represent in the Metathesaurus concept structure. Please note that each source vocabulary’s view of synonymy is also present in the Metathesaurus, irrespective of whether it agrees or disagrees with the Metathesaurus view.
Each concept or meaning in the Metathesaurus has a unique and permanent concept identifier (CUI). The CUI has no intrinsic meaning. In other words, you cannot infer anything about a concept just by looking at its CUI. In principle, the identifier for a concept never changes, irrespective of changes over time in the names that are attached to it in the Metathesaurus or in the source vocabularies.
In practice, a CUI will be removed from the Metathesaurus when it is discovered that two CUIs name the same concept, in other words, when previously undiscovered synonymy comes to light. In these cases, one of the two CUIs will be retained, all relevant information in the Metathesaurus will be linked to it, and the other CUI will be retired.
Retired CUIs are never re-used. Each edition of the Metathesaurus includes files that detail any such changes from the previous edition. One Metathesaurus file (MRCUI.RRF and MRCUI in ORF) tracks such changes from 1991 to the present, allowing users to determine the fate of any CUI that is no longer present in the Metathesaurus.
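The history file makes it possible to map an identifier stored in a local system onto a current one. The sketch below follows merge records until it reaches a CUI in the current release. It assumes, as a simplification, that the history has already been loaded into a dictionary from retired CUI to (relationship, successor) pairs, and that a merge is labeled "SY" and a deletion "DEL"; check the MRCUI column and label definitions in your release before relying on this. The function name and the identifiers in the usage lines are invented.

```python
def resolve_cui(cui, history, current_cuis, max_hops=50):
    """Follow merge records from a possibly retired CUI to one in the current release.

    history: dict mapping a retired CUI to a list of (rel, successor) pairs,
             assumed here to be loaded from the CUI history file beforehand.
    Only rel == "SY" (assumed to mean "merged into") is followed. Returns None
    when the CUI was deleted, has no single successor, or a cycle is found.
    """
    seen = set()
    while cui not in current_cuis:
        if cui in seen or len(seen) >= max_hops:
            return None  # guard against cycles or runaway chains in bad data
        seen.add(cui)
        successors = [succ for rel, succ in history.get(cui, []) if rel == "SY" and succ]
        if len(successors) != 1:
            return None  # deleted, ambiguous, or not recorded
        cui = successors[0]
    return cui

# Invented identifiers: C_OLD1 merged into C_OLD2, which later merged into C_NEW.
history = {
    "C_OLD1": [("SY", "C_OLD2")],
    "C_OLD2": [("SY", "C_NEW")],
    "C_GONE": [("DEL", "")],
}
current = {"C_NEW"}
print(resolve_cui("C_OLD1", history, current))  # C_NEW
print(resolve_cui("C_GONE", history, current))  # None
```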
2.2.2 Concept Names and String Identifiers
Each unique concept name or string in each language in the Metathesaurus has a unique and permanent string identifier (SUI). Any variation in character set, upper-lower case, or punctuation is a separate string, with a separate SUI. The same string in different languages (e.g., English and Spanish) will have a different string identifier for each language. If the same string, e.g., Cold, has more than one meaning, the string identifier will be linked to more than one concept identifier (CUI).
2.2.3 Atoms and Atom Identifiers
The basic building blocks or "atoms" from which the Metathesaurus is constructed are the concept names or strings from each of the source vocabularies. Each and every occurrence of a string in each source vocabulary is assigned a unique atom identifier (AUI). If exactly the same string appears twice in the same vocabulary, for example, as both the long name and the short name for the same concept or as an alternate name for two different concepts in the same vocabulary source, a unique AUI is assigned for each occurrence. When the same string appears in multiple source vocabularies, it will have AUIs for every time it appears as a concept name in each of those sources. All of these AUIs will be linked to a single string identifier (SUI), since they represent occurrences of the same string. Unlike string identifiers, a single AUI is always linked to a single concept identifier, because each occurrence of a string in a source can only have one meaning.
AUIs appear in the RRF (.RRF files), but not in the ORF.
2.2.4 "Terms" and Lexical Identifiers
For English language entries in the Metathesaurus only, each string is linked to all of its lexical variants or minor variations by means of a common term identifier (LUI). (In the Metathesaurus, therefore, an English "term" is the group of all strings that are lexical variants of each other.) English lexical variants are detected using the lvg program, one of the UMLS lexical tools (see Section 4). As similar tools become available for other languages, they may be used to create lexical variant groups in other languages. (In the meantime, the LUI for a non-English string is really another string identifier.)
Like a string identifier, the LUI for an English string may be linked to more than one concept. This occurs when strings that are lexical variants of each other have different meanings. In contrast, each string identifier and each atom identifier can only be linked to a single LUI.
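The effect of grouping strings into terms can be approximated with a deliberately crude normalization. The sketch below is not lvg: it only lowercases, splits on non-alphanumeric characters, strips a trailing plural "s", and sorts the words, whereas lvg handles inflection, spelling variants, and much more. The `crude_norm` and `group_terms` names are hypothetical.

```python
import re
from collections import defaultdict

def crude_norm(s):
    """Hypothetical stand-in for lexical normalization (NOT lvg):
    lowercase, split on non-alphanumerics, strip a trailing plural 's'
    from words longer than three letters, and sort the words."""
    words = [w for w in re.split(r"[^0-9a-z]+", s.lower()) if w]
    words = [w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words]
    return " ".join(sorted(words))

def group_terms(strings):
    """Group strings that share a normalized form, i.e. a toy 'term'."""
    groups = defaultdict(list)
    for s in strings:
        groups[crude_norm(s)].append(s)
    return dict(groups)

for key, members in group_terms([
    "Atrial Fibrillation", "Atrial Fibrillations",
    "Auricular Fibrillation", "Auricular Fibrillations",
]).items():
    print(key, "->", members)
# atrial fibrillation -> ['Atrial Fibrillation', 'Atrial Fibrillations']
# auricular fibrillation -> ['Auricular Fibrillation', 'Auricular Fibrillations']
```

Even this toy rule puts "Cold" and "COLD" in one group, which matches their shared LUI L0009264 in Figure 2; it also shows why a term can span meanings.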
2.2.5 Uses of Concept, String, Atom, and Term Identifiers
In the Metathesaurus, every CUI (concept) is linked to at least one AUI (atom), SUI (string), and LUI (term), but can be linked to many of each of these. Every AUI (atom) is linked to a single SUI (string), a single LUI (term), and a single CUI (concept). Each SUI (string) can be linked to many AUIs (atoms), to a single LUI (term), and to more than one CUI (concept), although the typical case is one CUI. Each LUI (term) can be linked to many AUIs (atoms), many SUIs (strings), and more than one CUI (concept), although the typical case is one CUI.
FIGURE 1.
Levels, by indentation: Concept (CUI), Terms (LUIs), Strings (SUIs), Atoms (AUIs, RRF only)
C0004238  Atrial Fibrillation (preferred), Atrial Fibrillations, Auricular Fibrillation, Auricular Fibrillations
    L0004238  Atrial Fibrillation (preferred), Atrial Fibrillations
        S0016668  Atrial Fibrillation (preferred)
            A0027665  Atrial Fibrillation (from MSH)
            A0027667  Atrial Fibrillation (from PSY)
        S0016669  Atrial Fibrillations
            A0027668  Atrial Fibrillations (from MSH)
    L0004327  Auricular Fibrillation, Auricular Fibrillations (synonym)
        S0016899  Auricular Fibrillation (preferred)
            A0027930  Auricular Fibrillation (from PSY)
        S0016900  Auricular Fibrillations (plural variant)
            A0027932  Auricular Fibrillations (from MSH)
In the abbreviated example in Figure 1, "Atrial Fibrillation" appears as an atom in more than one source vocabulary and has a distinct AUI for each occurrence. Since each of these atoms has an identical string or concept name, they are linked to a single SUI. "Atrial Fibrillations", the plural of "Atrial Fibrillation", has a different string identifier. Since the singular and plural are lexical variants of each other, both are linked to the same LUI. There is a different LUI and different SUIs and AUIs for "Auricular Fibrillation" and its plural "Auricular Fibrillations." Since "Atrial Fibrillation" and "Auricular Fibrillation" have been judged to have the same meaning, they are linked to the same CUI.
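The linking rules above can be checked mechanically against the Figure 1 data. The sketch below stores one row per atom, transcribed from the figure, and derives the identifier links from those rows. The row layout and the `links` helper are illustrative assumptions, not a Metathesaurus file format.

```python
from collections import defaultdict

# One row per atom, transcribed from Figure 1: (AUI, SUI, LUI, CUI, string, source).
FIGURE_1_ATOMS = [
    ("A0027665", "S0016668", "L0004238", "C0004238", "Atrial Fibrillation", "MSH"),
    ("A0027667", "S0016668", "L0004238", "C0004238", "Atrial Fibrillation", "PSY"),
    ("A0027668", "S0016669", "L0004238", "C0004238", "Atrial Fibrillations", "MSH"),
    ("A0027930", "S0016899", "L0004327", "C0004238", "Auricular Fibrillation", "PSY"),
    ("A0027932", "S0016900", "L0004327", "C0004238", "Auricular Fibrillations", "MSH"),
]
AUI, SUI, LUI, CUI = 0, 1, 2, 3  # column positions in each row

def links(rows, from_col, to_col):
    """Map each identifier in from_col to the set of identifiers it links to in to_col."""
    result = defaultdict(set)
    for row in rows:
        result[row[from_col]].add(row[to_col])
    return dict(result)

# Every atom belongs to exactly one string, one term, and one concept.
for col in (SUI, LUI, CUI):
    assert all(len(v) == 1 for v in links(FIGURE_1_ATOMS, AUI, col).values())

print(sorted(links(FIGURE_1_ATOMS, SUI, AUI)["S0016668"]))  # ['A0027665', 'A0027667']
print(sorted(links(FIGURE_1_ATOMS, CUI, LUI)["C0004238"]))  # ['L0004238', 'L0004327']
```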
All of these identifiers serve important purposes in building the Metathesaurus, in allowing efficient and accurate customization for specific purposes, and in identifying changes in its concept and concept name coverage over time.
CUIs link all information in the Metathesaurus related to particular concepts. In other words, a CUI can be used to retrieve all the concept names, relationships, and attributes for a particular concept that appear in any Metathesaurus file. CUIs also serve as permanent, publicly available identifiers for biomedical concepts or meanings to which many individual source vocabularies are linked. Users of the Metathesaurus are strongly encouraged to incorporate CUIs in their local applications to support data exchange and linking, and to assist migration between individual source vocabularies should that become necessary in the future.
Users of the Metathesaurus are also encouraged to incorporate SUIs in local applications. Inclusion of SUIs will allow more efficient updating of local systems as new versions of the Metathesaurus are issued.
The value of retaining LUIs in local applications (as opposed to their use in creating the customized version of the Metathesaurus to be used locally) will vary depending on local system approaches to detecting and dealing with minor variations in language.
AUIs link all information in the Metathesaurus related to particular atoms or occurrences of strings in a specific source vocabulary. AUIs can assist users of the Metathesaurus in identifying those cases in which a source vocabulary's concept structure differs from that of the Metathesaurus. Many users of the Metathesaurus will have no need to store these identifiers in local applications.
2.2.6 Default Preferred Names for Metathesaurus Concepts
As a convenience for those who build the Metathesaurus, one string from one English term is designated and labeled as the default preferred name of each concept in the Metathesaurus. To avoid laborious selection among alternative terms and strings, selection of the default preferred name for any Metathesaurus concept is based on an order of precedence of all the types of English strings in all the Metathesaurus source vocabularies. Different types of strings from each vocabulary, e.g., preferred terms, cross references, abbreviations, have different positions in this order. The factors considered in establishing the default order of precedence include breadth of subject coverage, frequency of update, and the degree to which the source's concept names are used in regular clinical or biomedical discourse. The default order of precedence appears in MRRANK.RRF, in MRRANK in ORF, and in Appendix B, Section B.5 of this documentation.
The default order of precedence will not be suitable for all applications of the Metathesaurus. MetamorphoSys (Section 6) can be used to change the selection of preferred names to feature terminology from the source vocabularies most appropriate to particular user populations. For example, concept names from SNOMED CT may be preferred in clinical applications, and terminology from MeSH may be preferred in literature retrieval systems.
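Choosing a default preferred name amounts to taking the highest-ranked atom of a concept under an ordering of (source, string type) pairs, and customizing it amounts to supplying a different ordering. The sketch below assumes atoms are (source, type, string) triples already restricted to English, and that a larger number means higher precedence. The `preferred_name` function, the source and type labels, and the rank values are invented for illustration and are not taken from MRRANK.

```python
def preferred_name(atoms, rank):
    """Return the string of the atom whose (source, type) pair ranks highest.
    atoms: (source, type, string) triples for one concept.
    rank: dict mapping (source, type) to a number; larger means preferred.
    Pairs absent from rank sort below every ranked pair."""
    best = max(atoms, key=lambda a: rank.get((a[0], a[1]), float("-inf")))
    return best[2]

# Hypothetical atoms of one concept and two hypothetical precedence orders.
atoms = [
    ("SRC_LIT", "PT", "Name used in literature indexing"),
    ("SRC_CLIN", "PT", "Name used in clinical records"),
    ("SRC_CLIN", "AB", "Abbrev"),
]
default_rank = {("SRC_LIT", "PT"): 30, ("SRC_CLIN", "PT"): 20, ("SRC_CLIN", "AB"): 10}
clinical_rank = {("SRC_CLIN", "PT"): 30, ("SRC_LIT", "PT"): 20, ("SRC_CLIN", "AB"): 10}

print(preferred_name(atoms, default_rank))   # Name used in literature indexing
print(preferred_name(atoms, clinical_rank))  # Name used in clinical records
```

Swapping the rank table, rather than editing names, keeps the choice of preferred name a pure function of the precedence order, which is the idea behind customizing it per user population.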
2.2.7 Strings with Multiple Meanings
In some cases, the same name (with or without differences in upper-lower case) may apply to different concepts, usually (but not always) in different Metathesaurus source vocabularies. In the abbreviated example that follows, the string "Cold" is a name for the temperature in one vocabulary. In another vocabulary, "Cold" is an alternate name for the "Common cold". In a third vocabulary, "COLD" is an acronym for "chronic obstructive lung disease". As a result, "Cold" or "COLD" appears as a name of more than one concept in the Metathesaurus. The plain strings "Cold" and "COLD" have explicit "ambiguous string" indicators in the Metathesaurus (a value of A in the AM attribute). Where necessary, Metathesaurus editors have also created descriptive names to avoid situations in which an ambiguous name, such as "Cold", might otherwise be the default preferred name for a Metathesaurus concept. Where they exist, these disambiguating names have the highest precedence in the Metathesaurus.
In the past, artificial strings, e.g., Cold <1>, were created to give each meaning a unique name. Such strings continue to appear in the Metathesaurus, but are not being generated for new entries. There are separate files containing the LUIs and SUIs of all ambiguous terms and strings known to the Metathesaurus (AMBIGLUI.RRF, AMBIG.LUI in ORF, AMBIGSUI.RRF, AMBIG.SUI in ORF).
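Because ambiguity shows up in the concept structure as a string or term identifier linked to more than one CUI, it can be detected directly, as in the sketch below, which uses (CUI, LUI, SUI) rows transcribed from the abbreviated Figure 2. This only illustrates the idea; the AMBIGSUI and AMBIGLUI files are the distributed lists, and the `ambiguous` helper is hypothetical.

```python
from collections import defaultdict

# One (CUI, LUI, SUI) row per atom, transcribed from Figure 2.
FIGURE_2 = [
    ("C0009264", "L0215040", "S0288775"),
    ("C0009264", "L0009264", "S0007170"),
    ("C0009264", "L0009264", "S0026353"),
    ("C0009443", "L0009443", "S0026747"),
    ("C0009443", "L0009264", "S0026353"),
    ("C0024117", "L0498186", "S0837575"),
    ("C0024117", "L0008703", "S0837576"),
    ("C0024117", "L0009264", "S0829315"),
    ("C0024117", "L0009264", "S0474508"),
]

def ambiguous(rows, col):
    """Identifiers in column col (1 = LUI, 2 = SUI) linked to more than one CUI."""
    cuis = defaultdict(set)
    for row in rows:
        cuis[row[col]].add(row[0])
    return {ident for ident, linked in cuis.items() if len(linked) > 1}

print(ambiguous(FIGURE_2, 2))  # {'S0026353'}  the string "Cold"
print(ambiguous(FIGURE_2, 1))  # {'L0009264'}  the term grouping Cold, COLD and their <n> forms
```

Note that the artificial strings "Cold <1>" and "COLD <3>" each belong to one concept, so only their shared term identifier is flagged.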
FIGURE 2.
Levels, by indentation: Concepts (CUIs), Terms (LUIs), Strings (SUIs), Atoms (AUIs, RRF only)
C0009264  cold temperature
    L0215040  cold temperature
        S0288775  cold temperature
            A0318651  cold temperature (from CSP)
    L0009264  Cold <1>, Cold
        S0007170  Cold <1>
            A0016032  Cold <1> (from MTH)
        S0026353  Cold
            A0040712  Cold (from MSH)
C0009443  Common Cold
    L0009443  Common Cold
        S0026747  Common Cold
            A0041261  Common Cold (from MSH)
    L0009264  Cold
        S0026353  Cold
            A0040708  Cold (from COSTAR)
C0024117  Chronic Obstructive Airway Disease
    L0498186  Chronic Obstructive Airway Disease
        S0837575  Chronic Obstructive Airway Disease
            A0896021  Chronic Obstructive Airway Disease (from MSH)
    L0008703  Chronic Obstructive Lung Disease
        S0837576  Chronic Obstructive Lung Disease
            A0896023  Chronic Obstructive Lung Disease (from MSH)
    L0009264  COLD <3>, COLD
        S0829315  COLD <3>
            A0887858  COLD <3> (from MTH)
        S0474508  COLD
            A0539536  COLD (from SNMI)
2.2.8 Concept Names Added during Metathesaurus Construction
Although the vast majority of concept names present in the Metathesaurus come from one or more of its source vocabularies, some concept names are created during Metathesaurus construction. This occurs in the following circumstances:
(a) A unique name is created for a string with multiple meanings (the case explained in Section 2.2.7),
(b) A more explicit name is created when none of the source vocabulary names for a concept conveys its meaning adequately,
(c) An American English variant is generated for a British spelling,
(d) An equivalent basic Latin ASCII character set string is generated for a string in an extended character set, such as Unicode.
Like all other concept names in the Metathesaurus, names created during Metathesaurus construction are labeled to indicate their source.
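One common way to derive a basic Latin ASCII string from an accented Unicode string, shown here as a sketch and not as NLM's actual procedure, is to apply Unicode compatibility decomposition (NFKD) with Python's standard `unicodedata` module and discard the combining marks that result. The `ascii_equivalent` name is hypothetical. Characters that have no decomposition, such as "ß", are simply dropped by this sketch, whereas a transliteration table would map them (for example, to "ss").

```python
import unicodedata

def ascii_equivalent(s):
    """Approximate a basic Latin ASCII form of s: decompose accented letters
    (NFKD), drop combining marks, then drop any remaining non-ASCII characters."""
    decomposed = unicodedata.normalize("NFKD", s)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_marks.encode("ascii", "ignore").decode("ascii")

print(ascii_equivalent("Sjögren's syndrome"))  # Sjogren's syndrome
print(ascii_equivalent("Ménière's disease"))   # Meniere's disease
```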
2.3 RELATIONSHIPS AND RELATIONSHIP IDENTIFIERS
The Metathesaurus includes many relationships between different concepts (in addition to the synonymous relationships in the Metathesaurus concept structure described in Section 2.2). Most of these relationships come from individual source vocabularies. Some are added by NLM during Metathesaurus construction. Some have been contributed by Metathesaurus users to support certain types of applications.
Relationships are expressed in terms of CUIs (in the RRF and ORF) and AUIs (in the RRF only). Metathesaurus relationship files do not include concept names.
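Because relationship rows carry identifiers but no names, displaying them means looking names up in the concept structure. The sketch below parses one pipe-delimited RRF row and attaches names by CUI. The MRREL column list is written from our reading of the RRF layout and should be checked against the column definitions shipped with your release (MRFILES.RRF and MRCOLS.RRF); the sample row, the identifiers CUI_A and CUI_B, the source SRC1, and the helper names are invented. The sketch does not interpret the direction in which REL or RELA should be read.

```python
# Column order for MRREL.RRF as we understand it; verify against MRFILES.RRF/MRCOLS.RRF.
MRREL_COLUMNS = ("CUI1", "AUI1", "STYPE1", "REL", "CUI2", "AUI2", "STYPE2",
                 "RELA", "RUI", "SRUI", "SAB", "SL", "RG", "DIR", "SUPPRESS", "CVF")

def parse_rrf_line(line, columns):
    """Split one pipe-delimited RRF row into a dict keyed by column name.
    RRF rows end with a trailing '|', which yields one extra empty field."""
    fields = line.rstrip("\r\n").split("|")
    if fields and fields[-1] == "":
        fields.pop()
    if len(fields) != len(columns):
        raise ValueError(f"expected {len(columns)} fields, got {len(fields)}")
    return dict(zip(columns, fields))

def named_relationship(row, names_by_cui):
    """Return (name1, label, name2), falling back to the CUI when no name is known.
    The more specific RELA label is used when present, otherwise REL."""
    name1 = names_by_cui.get(row["CUI1"], row["CUI1"])
    name2 = names_by_cui.get(row["CUI2"], row["CUI2"])
    return name1, row["RELA"] or row["REL"], name2

# Invented sample row: 16 fields followed by the trailing '|'.
values = ["CUI_A", "", "", "RO", "CUI_B", "", "", "", "R_1", "", "SRC1", "SRC1", "", "", "N", ""]
line = "|".join(values) + "|"
row = parse_rrf_line(line, MRREL_COLUMNS)
print(named_relationship(row, {"CUI_A": "Name A", "CUI_B": "Name B"}))
# ('Name A', 'RO', 'Name B')
```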
In general, the Metathesaurus indicates the author of each relationship, that is, one of the source vocabularies, the Metathesaurus itself, or another supplier. Some relationships added in the early years of Metathesaurus development (less than 6% of the current total and declining) are attributed to the Metathesaurus, but actually came from specific source vocabularies.
2.3.1 Basic categories of non-synonymous relationships
The Metathesaurus contains non-synonymous relationships between concepts from the same source vocabulary (intra-source vocabulary relationships) and between concepts in different vocabularies (inter-source vocabulary relationships). The Metathesaurus does not include all possible non-synonymous relationships between the concepts it contains. It includes all relationships present in its source vocabularies and some additional relationships designed to connect related concepts. In general, the relationships asserted by source vocabularies connect closely related concepts, such as those that share some common property or are related by definition. For example, a member of a class of drugs (e.g., penicillin) will be connected to the name for the class (e.g., antibiotics); a bacterial infection will be connected to the bacterium that causes it.
2.3.1.1 Intra-Source Relationships
The majority of intra-source relationships are asserted or implied by the individual source vocabularies. Such relationships occur in a source vocabulary's explicit or implied hierarchical arrangements or contexts, cross-reference structures, rules for applying qualifiers, or connections between different types of names for the same concept (e.g., abbreviations and full forms). The primary Metathesaurus relationships file (MRREL.RRF, and MRREL in the ORF) contains the "distance-1" hierarchical relationships, i.e., immediate parent, immediate child, and immediate sibling relationships, as well as other types of intra-source relationships.
A subset of the contextual or hierarchical relationships is also distributed in a special contexts file (MRCXT.RRF and MRCXT in ORF) to facilitate the construction of user displays. A "computable" representation of the complete hierarchies is provided in MRHIER.RRF (in RRF only). This file represents all sibling relationships even when there are thousands of siblings. Appendix B.4 indicates which source vocabularies have hierarchical contexts, which of these allow concepts to appear in multiple hierarchies, and whether sibling relationships are represented in MRCXT or only in MRHIER.RRF.
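Distance-1 parent links are enough to reconstruct complete hierarchies, which is the work a precomputed hierarchy file spares its users. The sketch below collects all ancestors of a concept and enumerates every path to a root, allowing a concept to have several parents (that is, to appear in several hierarchies). The `parents` mapping, the concept labels A-D, and both function names are hypothetical, and `paths_to_root` assumes the links contain no cycles.

```python
def ancestors(concept, parents):
    """All ancestors of concept, given immediate ('distance-1') parent links.
    parents maps a concept to a list of its immediate parents; several
    parents are allowed, so the structure is a DAG rather than a tree."""
    found = set()
    stack = list(parents.get(concept, ()))
    while stack:
        p = stack.pop()
        if p not in found:
            found.add(p)
            stack.extend(parents.get(p, ()))
    return found

def paths_to_root(concept, parents):
    """Every path from a root down to concept (root first). Assumes no cycles."""
    ps = parents.get(concept, ())
    if not ps:
        return [[concept]]
    return [path + [concept] for p in ps for path in paths_to_root(p, parents)]

# Hypothetical hierarchy: D sits under both B and C, which share the root A.
parents = {"D": ["B", "C"], "B": ["A"], "C": ["A"]}
print(sorted(ancestors("D", parents)))  # ['A', 'B', 'C']
print(paths_to_root("D", parents))      # [['A', 'B', 'D'], ['A', 'C', 'D']]
```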
Some of the intra-source vocabulary relationships are statistical relationships, which are computed by determining the frequency with which concepts in specific vocabularies co-occur in records in a database. For example, there are co-occurrence relationships for the number of times concepts have co-occurred as key topics within the same articles, as evidenced by the Medical Subject Headings assigned to those articles in the MEDLINE database. Co-occurrence relationships have also been computed for different ICD-9-CM diagnosis codes assigned to the same patients as reflected in a discharge summary database. In contrast to the relationships asserted within source vocabularies, the statistical relationships in the Metathesaurus can connect very different concepts, such as diseases and drugs. There are specific Metathesaurus files for the co-occurrence relationships (MRCOC.RRF and MRCOC in ORF).
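The basic computation behind a co-occurrence relationship is a count of how often two concepts are assigned to the same record. The sketch below shows only that counting step over hypothetical records of concept labels; how NLM selects records, which assignments count (for example, only key topics), and any statistics beyond raw counts are not described here and are not modeled. The `cooccurrence` name is hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(records):
    """Count, for each unordered pair of concepts, the number of records
    (e.g. articles or patients) to which both were assigned."""
    counts = Counter()
    for concepts in records:
        # set() so a concept repeated within one record is counted once
        for pair in combinations(sorted(set(concepts)), 2):
            counts[pair] += 1
    return counts

# Hypothetical records, each listing the concepts assigned to it.
records = [["X", "Y", "Z"], ["X", "Y"], ["Y", "X", "X"]]
counts = cooccurrence(records)
print(counts[("X", "Y")], counts[("X", "Z")], counts[("Y", "Z")])  # 3 1 1
```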
2.3.1.2 Inter-Source Relationships
The primary inter-source relationships in the Metathesaurus are the synonymous relationships represented in the Metathesaurus concept structure (Section 2.2). The Metathesaurus also includes some relationships between non-synonymous concepts from different source vocabularies. Some of these inter-source relationships are generated during Metathesaurus construction to connect specific "orphan" concepts (with few or no ancestors, siblings, or children in their own source vocabularies) to the richer contextual information in another source vocabulary. Some are supplied by Metathesaurus users who find "like" or "similar" relationships a useful addition to the Metathesaurus's relatively strict view of synonymy. In both cases, these relationships are distributed in MRREL.RRF and MRREL in ORF.
Many inter-source relationships between non-synonymous concepts are produced through specific efforts to create a mapping between two different source vocabularies. These mappings may be created by an individual source vocabulary producer, by a third party with a particular need for a mapping, or by NLM or under NLM supervision specifically for distribution within the Metathesaurus. The number of NLM-supervised mappings is expected to increase. There are specific Metathesaurus files for mappings in the RRF (MRMAP.RRF and MRSMAP.RRF). A subset of the mappings appears in MRATX in the ORF. Mappings involving SNOMED CT appear in the RRF only.
2.3.2 Relationship Labels
In addition to being identified as to their source, all relationships (outside the basic concept structure) in the Metathesaurus carry a general label (REL) describing their basic nature, such as Broader, Narrower, Child of, Qualifier of, etc. Most of these relationships are either directly asserted in a source vocabulary or implied by the structure of the source vocabulary. A complete list of the general relationship labels appears in MRDOC.RRF and MRDOC, and in Appendix B.3 of this documentation.
About a quarter of the relationships in the Metathesaurus also carry an additional label (RELA),
obtained from a source vocabulary, that explains the nature of the relationship more exactly, such as "is a",
branch_of, or component_of.
The Digital Anatomist vocabulary and RxNorm are examples of source vocabularies that include
such relationship labels. A complete list of the additional relationship labels appears in MRDOC.RRF and in Appendix B.3 in this documentation.
2.3.3 Relationship Identifiers
Every relationship present in the Metathesaurus has a unique relationship identifier (RUI).
The primary purpose of these identifiers is to enable easy detection of changes in relationships
across versions of the Metathesaurus. The appearance or disappearance of a relationship identifier
indicates a change in the relationships present in the Metathesaurus.
Some source vocabularies have their own relationship identifiers. Where they exist, these
identifiers are also present in the Metathesaurus.
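Since the appearance or disappearance of an RUI signals a change, comparing the relationships of two versions reduces to a set difference on their RUIs. The following Python sketch assumes the RUI column has already been extracted from MRREL.RRF of each version; the function name rui_changes and the sample identifiers are hypothetical and not part of any NLM tool.

```python
from typing import Iterable, Set, Tuple


def rui_changes(old_ruis: Iterable[str], new_ruis: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Compare the relationship identifiers (RUIs) of two Metathesaurus versions.

    Returns (added, removed): RUIs found only in the newer version, and
    RUIs found only in the older version.
    """
    old, new = set(old_ruis), set(new_ruis)
    added = new - old    # relationships that appeared in the newer version
    removed = old - new  # relationships no longer present in the newer version
    return added, removed


if __name__ == "__main__":
    # Hypothetical RUIs, for illustration only.
    added, removed = rui_changes(["R0000001", "R0000002"], ["R0000002", "R0000003"])
    print(sorted(added), sorted(removed))  # ['R0000003'] ['R0000001']
```

The same comparison applies to attribute identifiers (ATUIs, Section 2.4.2).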
2.4 ATTRIBUTES AND ATTRIBUTE IDENTIFIERS
In the Metathesaurus, attributes include every discrete piece of information
about a concept, an atom, or a relationship that is not (1) part of the basic Metathesaurus concept
structure (Section 2.2) or (2) distributed in one of the relationship files (Section 2.3).
2.4.1 Kinds of Attributes
The Metathesaurus includes concept attributes, atom attributes, and relationship attributes.
Concept attributes are added during Metathesaurus construction and apply to all names
of a concept. For example, the semantic types
"Pathologic Function" and "Finding" are attributes of the concept with the preferred name "Atrial
Fibrillation" and are applicable to any atom connected to that concept.
Atom attributes come from a particular source vocabulary. Some of them are of general
interest; others are relevant only to a particular source vocabulary. For example, the definition
"Disorder of cardiac rhythm characterized by rapid, irregular atrial impulses and ineffective atrial
contractions." is an attribute of the atom "Atrial Fibrillation" that comes from the Medical Subject
Headings (MeSH). It may be one of several definitions connected to names
of this concept, because the Metathesaurus includes all definitions provided by any of its source
vocabularies. Although this particular definition comes from MeSH, it might well be useful in
Metathesaurus applications that otherwise do not use MeSH.
In contrast, the date an occurrence of a string (an atom) was added to a source vocabulary
applies only to that specific atom. The utility of specific atom attributes will vary considerably for
different applications of the Metathesaurus.
Relationship attributes come from a particular source vocabulary and describe special characteristics of particular relationships in that source, e.g.,
refinability.
The majority of attributes are distributed in MRSAT.RRF and MRSAT in the ORF.
In these files, each row contains the name of the attribute, the source of the attribute, and
the value of the attribute, in addition to all appropriate identifiers. There are
separate files for selected attributes such as the semantic types (MRSTY.RRF and MRSTY in the ORF) and
the definitions (MRDEF.RRF and MRDEF in the ORF).
2.4.2 Attribute Identifiers
Each occurrence of each attribute within the Metathesaurus is assigned a
unique attribute identifier (ATUI). The appearance or disappearance of ATUIs signals changes in the
content of the Metathesaurus; thus, ATUIs support the efficient production of a complete
change set for each new version of the Metathesaurus. ATUIs appear only in the RRF, not in the ORF.
2.5 DATA ABOUT THE METATHESAURUS
The Metathesaurus contains a number of files that provide useful "metadata" or
data about the Metathesaurus itself. These metadata files describe (1) characteristics of the current version of
the Metathesaurus; (2) changes between the current version and the previous version; and (3) the history
of concept identifiers (CUIs) from 1991 to the present.
2.5.1 Characteristics of the Current Metathesaurus
There are discrete Metathesaurus files for:
a) the names and sizes of every Metathesaurus file (MRFILES.RRF and MRFILES in ORF),
b) the names and size range of every Metathesaurus data element (MRCOLS.RRF and MRCOLS in ORF),
c) the possible values for selected data elements that contain a finite set of abbreviated values (MRDOC.RRF only). NOTE: eventually this file will include values for every data element that contains a finite set of abbreviated values.
d) the source vocabularies in the Metathesaurus (MRSAB.RRF and MRSAB in ORF),
e) the LUIs and SUIs for terms and strings that are known to be ambiguous, that is, to have multiple meanings (to be linked to multiple concept identifiers) within the Metathesaurus (AMBIGLUI.RRF and AMBIGSUI.RRF in RRF and AMBIGLUI and AMBIGSUI in ORF),
f) the order of precedence of vocabulary sources and term types that is used to compute the default preferred concept name for each concept in the Metathesaurus (MRRANK.RRF and MRRANK in ORF). NOTE: MetamorphoSys can be used to change this order.
MRCOLS, MRDOC, MRSAB, and MRRANK contain data that do not appear in the actual Metathesaurus
content files. The others are computable from the Metathesaurus content files. They are pre-computed and
provided in separate files as a convenience to users of the Metathesaurus.
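To illustrate how an order of precedence yields a default preferred name, the Python sketch below picks, among a concept's atoms, the one whose (source, term type) pair ranks highest. It is a simplification for illustration, not NLM's procedure: it assumes a rank table held in a Python dict in which a larger number means higher precedence, and it does not reproduce the actual layout of MRRANK.RRF. The function name and rank values are hypothetical.

```python
from typing import Dict, List, Tuple

# (SAB, TTY, STR) for one atom of a concept.
Atom = Tuple[str, str, str]


def preferred_name(atoms: List[Atom], rank: Dict[Tuple[str, str], int]) -> str:
    """Return the string of the atom whose (SAB, TTY) pair ranks highest.

    Pairs missing from `rank` are treated as lowest precedence (-1).
    On ties, the first atom in `atoms` wins.
    """
    if not atoms:
        raise ValueError("a concept must have at least one atom")
    best = max(atoms, key=lambda atom: rank.get((atom[0], atom[1]), -1))
    return best[2]


if __name__ == "__main__":
    # Hypothetical precedence values, for illustration only.
    rank = {("SNOMEDCT", "PT"): 20, ("MSH", "MH"): 10}
    atoms = [
        ("MSH", "MH", "Acquired Immunodeficiency Syndrome"),
        ("SNOMEDCT", "PT", "AIDS"),
    ]
    print(preferred_name(atoms, rank))  # AIDS
    # Changing the order of precedence changes the result.
    rank[("MSH", "MH")] = 30
    print(preferred_name(atoms, rank))  # Acquired Immunodeficiency Syndrome
```

The second call mirrors the effect of reordering precedence in MetamorphoSys: the same atoms yield a different default name.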
2.5.2 Changes between the Current Metathesaurus and the Previous Version
Each version of the Metathesaurus contains a set of files that summarize changes
from the previous version.
CHANGE/MERGEDCUI.RRF in the RRF (CHANGE/MERGED.CUI in the ORF) documents cases in which two
discrete concepts in the previous version of the Metathesaurus are now considered to be synonyms.
CHANGE/MERGEDLUI.RRF in the RRF (CHANGE/MERGED.LUI in the ORF) documents cases in which two
discrete terms in the previous version of the Metathesaurus are now identified as lexical variants
of each other, based on the current version of luinorm (the program used to compute them).
Three files contain the CUIs, LUIs, and SUIs for Metathesaurus concepts, terms, and strings that
appeared in the previous version but are not in the current version (CHANGE/DELETEDCUI.RRF,
CHANGE/DELETEDLUI.RRF, and CHANGE/DELETEDSUI.RRF in the RRF; CHANGE/DELETED.CUI,
CHANGE/DELETED.LUI, and CHANGE/DELETED.SUI in the ORF).
NOTE: In future versions of the Metathesaurus, change files will also be provided for
relationships and attributes in the RRF only. The generation of these files is dependent on the relationship
and attribute identifiers (RUI and ATUI) introduced in the 2004AA version of the Metathesaurus.
2.5.3 Historical CUIs
The retired CUI file (MRCUI.RRF in RRF and MRCUI in ORF) includes all CUIs present in any previous
version of the Metathesaurus but not in the current version. In general, the file maps each
retired CUI to one or more current CUIs.
2.6 CONCEPT NAME INDEXES
2.6.0 INTRODUCTION
To assist system developers in building applications
that retrieve all strings or concept names which include specific words
or groups of words, three indexes to the concept names are provided: a
Word Index, a Normalized Word Index (for English words only), and a
Normalized String Index (for English strings only). The indexes are
described in Sections 2.6.1, 2.6.2, and 2.6.3, respectively.
To make the distinctions among them clearer, the examples include words
or strings that would appear in each index for the following set of
Metathesaurus concept names:
| Concept name | (CUI, LUI, SUI) |
| Lung Diseases, Obstructive | (C0024117, L0024117, S0058463) |
| Obstructive Lung Diseases | (C0024117, L0024117, S0068169) |
| Lung Disease, Obstructive | (C0024117, L0024117, S0058458) |
| Obstructive Lung Disease | (C0024117, L0024117, S0068168) |
2.6.1 WORD INDEX
2.6.1.1 Description
The word index connects each individual word in any
Metathesaurus string to all its related string, term, and concept
identifiers. There are separate word index files for each language in
the Metathesaurus.
There is one entry for each word found in each unique string
in each language. Each entry has five subelements.
1. LAT - 3-letter abbreviation for language
2. WD - Word
3. CUI - concept unique identifier
4. LUI - term identifier
5. SUI - string identifier
2.6.1.2 Definition of a Word
For the purpose of creating this index, a word is defined as a
token containing only alphanumeric characters with length one or
greater; for more information, see the SPECIALIST Lexicon and tools.
2.6.1.3 Word Index Example
For the four concept names listed in Section 2.6.0, the word
index will contain multiple entries for each of the following words:
disease, diseases, lung, obstructive. Two of the entries generated for
the names "Lung Disease, Obstructive" and "Obstructive Lung Disease" are
shown below:
ENG|disease|C0024117|L0024117|S0058458|
ENG|disease|C0024117|L0024117|S0068168|
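The Python sketch below shows one way to build word index entries for the example names. It assumes that "alphanumeric" means Unicode letters and digits (underscore excluded) and that words are lowercased, as the example entries suggest; the SPECIALIST lexical tools may tokenize differently, and the function name word_index_rows is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# A word: a maximal run of letters and/or digits (underscore excluded).
WORD_RE = re.compile(r"[^\W_]+")

# (string, CUI, LUI, SUI) for the four example names in Section 2.6.0.
NAMES = [
    ("Lung Diseases, Obstructive", "C0024117", "L0024117", "S0058463"),
    ("Obstructive Lung Diseases", "C0024117", "L0024117", "S0068169"),
    ("Lung Disease, Obstructive", "C0024117", "L0024117", "S0058458"),
    ("Obstructive Lung Disease", "C0024117", "L0024117", "S0068168"),
]


def word_index_rows(lat: str, names: Iterable[Tuple[str, str, str, str]]) -> List[str]:
    """Build rows LAT|WD|CUI|LUI|SUI|, one per distinct lowercased word
    in each unique string."""
    rows = set()  # the set drops repeated words and repeated strings
    for text, cui, lui, sui in names:
        for word in WORD_RE.findall(text):
            rows.add(f"{lat}|{word.lower()}|{cui}|{lui}|{sui}|")
    return sorted(rows)


if __name__ == "__main__":
    for row in word_index_rows("ENG", NAMES):
        if "|disease|" in row:
            print(row)
```

Run as written, it prints the two "disease" entries shown above; the plural "diseases" produces its own separate entries.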
2.6.2 NORMALIZED WORD INDEX
2.6.2.1 Description
The normalized word index connects each individual normalized
English word to all its related string, term, and concept identifiers.
There is one entry for each normalized word found in each
unique English string. There are no entries for other languages in this
index. Each entry has five subelements.
1. LAT - language (always ENG in this edition of the Metathesaurus)
2. NWD - normalized word
3. CUI - concept unique identifier
4. LUI - term unique identifier
5. SUI - string identifier
2.6.2.2 Definition of Normalized Word
The normalization process involves breaking a string into its
constituent words, lowercasing each word and converting it to its
uninflected form. Normalized words are generated by uninflecting each
word and stripping out a small number of stop words. The uninflected
forms are generated using the SPECIALIST lexicon if the words appear in
the lexicon; otherwise they are generated algorithmically.
2.6.2.3 Normalized Word Example
For the four concept names listed in Section 2.6.0, the
normalized word index will contain multiple entries for each of the
following words: disease, lung, obstructive. Because the normalized word
index contains base forms only, it does not contain entries for the
plural "diseases". In this index, therefore, all four concept names are
linked to the normalized word "disease", as follows:
ENG|disease|C0024117|L0024117|S0058458|
ENG|disease|C0024117|L0024117|S0058463|
ENG|disease|C0024117|L0024117|S0068168|
ENG|disease|C0024117|L0024117|S0068169|
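A minimal Python sketch of the normalized word index for the same example follows. The real uninflection relies on the SPECIALIST lexicon and lexical tools (Section 4); here TOY_LEXICON and TOY_STOP_WORDS are hypothetical stand-ins, the actual stop-word list is not reproduced, and words missing from the toy lexicon are simply left unchanged rather than uninflected algorithmically.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

WORD_RE = re.compile(r"[^\W_]+")  # letters and/or digits, underscore excluded

# Hypothetical stand-ins for the SPECIALIST lexicon and the stop-word list.
TOY_LEXICON: Dict[str, str] = {"diseases": "disease"}
TOY_STOP_WORDS: Set[str] = {"of", "the", "and"}

NAMES = [
    ("Lung Diseases, Obstructive", "C0024117", "L0024117", "S0058463"),
    ("Obstructive Lung Diseases", "C0024117", "L0024117", "S0068169"),
    ("Lung Disease, Obstructive", "C0024117", "L0024117", "S0058458"),
    ("Obstructive Lung Disease", "C0024117", "L0024117", "S0068168"),
]


def uninflect(word: str) -> str:
    """Look the lowercased word up in the toy lexicon; return it unchanged if absent."""
    return TOY_LEXICON.get(word, word)


def normalized_words(text: str) -> List[str]:
    """Split into words, lowercase them, drop stop words, and uninflect."""
    words = (w.lower() for w in WORD_RE.findall(text))
    return [uninflect(w) for w in words if w not in TOY_STOP_WORDS]


def normalized_word_rows(names: Iterable[Tuple[str, str, str, str]]) -> List[str]:
    """Build rows ENG|NWD|CUI|LUI|SUI|, one per normalized word per string."""
    rows = set()
    for text, cui, lui, sui in names:
        for nwd in normalized_words(text):
            rows.add(f"ENG|{nwd}|{cui}|{lui}|{sui}|")
    return sorted(rows)


if __name__ == "__main__":
    for row in normalized_word_rows(NAMES):
        if row.startswith("ENG|disease|"):
            print(row)
```

Run as written, it prints the four "disease" entries listed above and no entry for "diseases".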
2.6.3 NORMALIZED STRING INDEX
2.6.3.1 Description
The normalized string index connects the normalized form of a
Metathesaurus string to all its related string, term, and concept
identifiers. There is one entry for each unique (non-normalized)
English string. There are no entries for other languages in this index.
Each entry has five subelements.
1. LAT - language (always ENG in this edition of the Metathesaurus)
2. NSTR - normalized string
3. CUI - concept unique identifier
4. LUI - term identifier
5. SUI - string identifier
2.6.3.2 Definition of Normalized String
The normalization process involves breaking a string into its
constituent words, lowercasing each word, converting each word to its
uninflected form, and sorting the words in alphabetic order. Normalized
strings are generated by uninflecting each word and leaving out a small
number of stop words. The uninflected forms are generated using the
SPECIALIST lexicon if the words appear in the lexicon; otherwise they
are generated algorithmically.
2.6.3.3 Normalized String Example
Since the four concept names listed in Section 2.6.0 are
composed of the same set of normalized words, the Normalized String
Index will contain four entries for a single string: disease lung
obstructive, in which the component normalized words appear in
alphabetical order. The complete set of Normalized String Index
entries generated by the four concept names is as follows:
ENG|disease lung obstructive|C0024117|L0024117|S0058458|
ENG|disease lung obstructive|C0024117|L0024117|S0058463|
ENG|disease lung obstructive|C0024117|L0024117|S0068168|
ENG|disease lung obstructive|C0024117|L0024117|S0068169|
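Extending the same approach, a normalized string is the list of normalized words sorted alphabetically and joined with single spaces. The Python sketch below again uses hypothetical toy stand-ins (TOY_LEXICON, TOY_STOP_WORDS) in place of the SPECIALIST lexicon and the real stop-word list, so it approximates rather than reproduces the lexical tools described in Section 4.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

WORD_RE = re.compile(r"[^\W_]+")  # letters and/or digits, underscore excluded

# Hypothetical stand-ins for the SPECIALIST lexicon and the stop-word list.
TOY_LEXICON: Dict[str, str] = {"diseases": "disease"}
TOY_STOP_WORDS: Set[str] = {"of", "the", "and"}

NAMES = [
    ("Lung Diseases, Obstructive", "C0024117", "L0024117", "S0058463"),
    ("Obstructive Lung Diseases", "C0024117", "L0024117", "S0068169"),
    ("Lung Disease, Obstructive", "C0024117", "L0024117", "S0058458"),
    ("Obstructive Lung Disease", "C0024117", "L0024117", "S0068168"),
]


def normalized_words(text: str) -> List[str]:
    """Lowercase each word, drop stop words, and uninflect via the toy lexicon."""
    words = (w.lower() for w in WORD_RE.findall(text))
    return [TOY_LEXICON.get(w, w) for w in words if w not in TOY_STOP_WORDS]


def normalize_string(text: str) -> str:
    """Sort the normalized words alphabetically and join them with spaces."""
    return " ".join(sorted(normalized_words(text)))


def normalized_string_rows(names: Iterable[Tuple[str, str, str, str]]) -> List[str]:
    """Build rows ENG|NSTR|CUI|LUI|SUI|, one per unique string."""
    rows = {f"ENG|{normalize_string(text)}|{cui}|{lui}|{sui}|"
            for text, cui, lui, sui in names}
    return sorted(rows)


if __name__ == "__main__":
    for row in normalized_string_rows(NAMES):
        print(row)
```

Run as written, it prints one row per example string, all sharing the normalized string "disease lung obstructive".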
2.6.4 WORD INDEX PROGRAMS
The programs that generate these indexes are written in Java.
They may be of use to system developers who are developing their own
interfaces to the UMLS data or for other purposes. Section
4 includes information about these and other lexical programs provided
with the UMLS Knowledge Sources.
2.7 FILE FORMATS - METATHESAURUS RICH RELEASE FORMAT (RRF) AND ORIGINAL RELEASE FORMAT (ORF)
2.7.0 INTRODUCTION
Metathesaurus users may select from two relational
formats: the Rich Release Format (RRF), first introduced in 2004,
and the Original Release Format (ORF). Both are available as
output options of MetamorphoSys, the UMLS install and customization
program (Section 6).
Developers are encouraged to use the RRF, which offers
significant advantages in source vocabulary "transparency" (that is, the
ability to represent the detailed semantics of each source vocabulary
exactly); in the ability to generate complete and accurate change sets
between versions of the Metathesaurus; and in more convenient
representations of concept name, source, and hierarchical context
information. A more complete discussion of the rationale for the RRF and a detailed description of
the differences between the two formats are available.
Neither Metathesaurus format is fully normalized. By
design, there is duplication of data among different files and within
certain files. In particular, relationships between different
Metathesaurus concepts appear twice (e.g., from entry A to entry B and
from entry B to entry A). Developers will need to make their own
decisions about the extent to which this redundancy should be retained,
reduced, or increased for their specific applications.
Section 2.7.1 describes the files in the RRF. Section
2.7.2 describes the files in the ORF.
2.7.1 METATHESAURUS RICH RELEASE FORMAT (RRF)
All file names begin with the letters MR (Metathesaurus
Relational) and are followed by letters that denote the file contents
(e.g., MRREL=relationships, MRSAB=source abbreviations), and then a
file extension .RRF.
All files except MRRANK.RRF are sorted by row.
2.7.1.1 Data Files
The data in each Metathesaurus entry may be represented in
more than 20 different "relations" or files. These files correspond to
the four logical groups of data elements described in Sections 2.2-2.5
and the indexes described in Section 2.6 as follows:
Concepts, Concept Names, and their sources (2.2) = MRCONSO.RRF
Attributes (2.4) = MRSAT.RRF, MRDEF.RRF, MRSTY.RRF, MRLO.RRF, MRHIST.RRF
Relationships (2.3) = MRREL.RRF, MRCOC.RRF, MRCXT.RRF, MRHIER.RRF, MRMAP.RRF, MRSMAP.RRF
Data about the Metathesaurus (2.5) = MRFILES.RRF, MRCOLS.RRF, MRDOC.RRF, MRRANK.RRF,
MRSAB.RRF, AMBIGLUI.RRF, AMBIGSUI.RRF, CHANGE/MERGEDCUI.RRF,
CHANGE/MERGEDLUI.RRF, CHANGE/DELETEDCUI.RRF, CHANGE/DELETEDLUI.RRF,
CHANGE/DELETEDSUI.RRF, MRCUI.RRF
Indexes (2.6) = MRXW_BAQ.RRF, MRXW_DAN.RRF, MRXW_DUT.RRF, MRXW_ENG.RRF,
MRXW_FIN.RRF, MRXW_FRE.RRF, MRXW_GER.RRF, MRXW_HEB.RRF, MRXW_HUN.RRF,
MRXW_ITA.RRF, MRXW_NOR.RRF, MRXW_POR.RRF, MRXW_RUS.RRF, MRXW_SPA.RRF,
MRXW_SWE.RRF, MRXNW_ENG.RRF, MRXNS_ENG.RRF
2.7.1.2 Columns and Rows
Each file or named table of data values has by definition a
fixed number of columns; the number of rows depends on the content of a
particular version of the Metathesaurus.
A column is a sequence of all the values in a given data
element or logical subelement. In general, columns for longer variable
length data elements will appear to the right of columns for shorter
and/or fixed length data elements. The information for all columns in
the files is described in MRCOLS.RRF and in Appendix B.1.1,
Metathesaurus Column Descriptions.
A row contains the values for one or more data elements or
logical subelements for one Metathesaurus entry. Depending on the
nature of the data elements involved, each Metathesaurus entry may have
one or more rows in a
given file. The values for the different data elements or logical
subelements represented in the row are separated by vertical bars (|).
If an optional element is blank, the vertical bars are still used to
maintain the correct positioning of the subsequent elements. Each row
is terminated by a
vertical bar and line termination.
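Under these rules, splitting a row on vertical bars and discarding the empty piece after the final bar recovers exactly one value per column, with blank optional elements kept as empty strings. The following Python sketch assumes rows are read as text lines and that field values never contain a vertical bar; the function name parse_rrf_line is hypothetical.

```python
from typing import List


def parse_rrf_line(line: str) -> List[str]:
    """Split one RRF row into its field values.

    Each row ends with a vertical bar before the line terminator, so the
    final empty piece produced by split() is discarded. Blank optional
    fields are preserved as empty strings.
    """
    body = line.rstrip("\r\n")
    if not body.endswith("|"):
        raise ValueError("RRF rows are expected to end with '|'")
    return body.split("|")[:-1]


if __name__ == "__main__":
    # Sample row from MRSTY.RRF (Section 2.7.1.3.7).
    print(parse_rrf_line("C0001175|T047|B2.2.1.2.1|Disease or Syndrome|AT17683839||\n"))
    # ['C0001175', 'T047', 'B2.2.1.2.1', 'Disease or Syndrome', 'AT17683839', '']
```

The trailing empty string in the output is the blank CVF column, which keeps the row at six fields, matching the six MRSTY.RRF columns.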
2.7.1.3 Descriptions of Each File
The descriptions of the files appear in the following order:
a) Key data about the Metathesaurus: Files; Columns or data
elements; Documentation that explains the meaning of abbreviations that
appear as values in Metathesaurus data elements and attributes,
b) Concept names and their vocabulary sources
c) Attributes
d) Relationships
e) Other data about the Metathesaurus
f) Indexes
Each file description lists the columns or data elements that
appear in the file and includes sample rows from the file.
2.7.1.3.1 Files (File = MRFILES.RRF)
There is exactly one row in this file for each physical segment of each logical file.
| Col. | Description |
| FIL | Physical FILENAME |
| DES | Descriptive Name |
| FMT | Comma separated list of column names (COL), in order |
| CLS | # of COLUMNS |
| RWS | # of ROWS |
| BTS | Size in bytes in this format (ISO/PC or Unix) |
Sample Records
MRCOC.RRF|Co-occurringConcepts|CUI1,AUI1,CUI2,AUI2,SAB,COT,COF,COA,CVF|9|13939548|786509996|
MRSTY.RRF|Semantic Types|CUI,TUI,STN,STY,ATUI,CVF|6|1146352|64528811|
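Because FMT lists each file's column names in order, MRFILES.RRF can drive a generic reader that turns any data row into named fields. The Python sketch below illustrates the idea with the sample MRSTY.RRF entry above; the function names column_map and as_record are hypothetical.

```python
from typing import Dict, Iterable, List


def parse_rrf_line(line: str) -> List[str]:
    """Split one RRF row (terminated by '|' and a line ending) into fields."""
    return line.rstrip("\r\n").split("|")[:-1]


def column_map(mrfiles_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each FIL in MRFILES.RRF to its ordered list of column names (FMT)."""
    columns: Dict[str, List[str]] = {}
    for line in mrfiles_lines:
        fil, _des, fmt, cls, _rws, _bts = parse_rrf_line(line)
        names = fmt.split(",")
        if len(names) != int(cls):
            raise ValueError(f"{fil}: FMT lists {len(names)} columns but CLS is {cls}")
        columns[fil] = names
    return columns


def as_record(fil: str, line: str, columns: Dict[str, List[str]]) -> Dict[str, str]:
    """Turn one data row of `fil` into a {column name: value} dict."""
    values = parse_rrf_line(line)
    names = columns[fil]
    if len(values) != len(names):
        raise ValueError(f"{fil}: expected {len(names)} fields, got {len(values)}")
    return dict(zip(names, values))


if __name__ == "__main__":
    mrfiles = ["MRSTY.RRF|Semantic Types|CUI,TUI,STN,STY,ATUI,CVF|6|1146352|64528811|"]
    cols = column_map(mrfiles)
    rec = as_record("MRSTY.RRF", "C0001175|T047|B2.2.1.2.1|Disease or Syndrome|AT17683839||", cols)
    print(rec["STY"])  # Disease or Syndrome
```

Checking FMT against CLS, and each row against the column count, is a cheap safeguard against reading a file with the wrong layout.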
2.7.1.3.2 Data Elements (File = MRCOLS.RRF)
There is exactly one row in this file for each column or data
element in each file. Data elements that appear in multiple files, e.g., CUI, AUI, will have multiple rows in this file.
| Col. | Description |
| COL | Column or data element name |
| DES | Descriptive Name |
| REF | Documentation Section Number |
| MIN | Minimum Length, Characters |
| AV | Average Length |
| MAX | Maximum Length, Characters |
| FIL | Physical FILENAME in which this field occurs |
| DTY | SQL-92 data type for this column |
Sample Records
AUI|Unique identifier for atom||8|8.00|8|MRCONSO.RRF|char(8)|
CODE|Unique Identifier or code for string in source||1|6.4|21|MRCONSO.RRF|varchar(50)|
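The DTY column makes it possible to generate table definitions for loading the files into a relational database. The Python sketch below builds a CREATE TABLE statement from MRCOLS.RRF rows; it assumes the column order is supplied separately (for example, from the FMT column of MRFILES.RRF), and it uses only the two sample columns above for brevity. The function name create_table_sql is hypothetical.

```python
from typing import Iterable, List


def parse_rrf_line(line: str) -> List[str]:
    """Split one RRF row (terminated by '|' and a line ending) into fields."""
    return line.rstrip("\r\n").split("|")[:-1]


def create_table_sql(fil: str, column_order: List[str], mrcols_lines: Iterable[str]) -> str:
    """Build a CREATE TABLE statement for `fil` from the DTY values in MRCOLS.RRF.

    `column_order` lists the columns in file order; MRCOLS.RRF supplies
    only each column's SQL-92 data type.
    """
    types = {}
    for line in mrcols_lines:
        col, _des, _ref, _min, _av, _max, col_fil, dty = parse_rrf_line(line)
        if col_fil == fil:
            types[col] = dty
    missing = [c for c in column_order if c not in types]
    if missing:
        raise KeyError(f"no MRCOLS.RRF entry for {fil} columns: {missing}")
    table = fil.split(".")[0]  # "MRCONSO.RRF" -> "MRCONSO"
    body = ",\n".join(f"  {c} {types[c]}" for c in column_order)
    return f"CREATE TABLE {table} (\n{body}\n);"


if __name__ == "__main__":
    mrcols = [
        "AUI|Unique identifier for atom||8|8.00|8|MRCONSO.RRF|char(8)|",
        "CODE|Unique Identifier or code for string in source||1|6.4|21|MRCONSO.RRF|varchar(50)|",
    ]
    print(create_table_sql("MRCONSO.RRF", ["AUI", "CODE"], mrcols))
```

A complete MRCONSO definition would list all of its columns; the two-column output here only demonstrates the mapping from DTY to column types.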
2.7.1.3.3 Documentation for Abbreviated Values (File = MRDOC.RRF)
There is exactly one row in this table for each allowed value
of selected data elements or attributes that have a finite number of
abbreviations as allowed values. Examples of such data elements include
TTY, ATN, TS, STT, REL, RELA.
| Col. | Description |
| KEY | Data element or attribute |
| VALUE | Abbreviation that is one of its values |
| TYPE | Type of information in EXPL column |
| EXPL | Explanation of VALUE |
Sample Records
ATN|DDF|expanded_form|Drug Doseform|
ATN|DHJC|expanded_form|HCPCS J-code|
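Applications typically load MRDOC.RRF once and use it to expand abbreviated values for display. The Python sketch below indexes rows by (KEY, VALUE, TYPE) and falls back to the raw abbreviation when no expanded form is documented; the function names load_mrdoc and expand are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def parse_rrf_line(line: str) -> List[str]:
    """Split one RRF row (terminated by '|' and a line ending) into fields."""
    return line.rstrip("\r\n").split("|")[:-1]


def load_mrdoc(lines: Iterable[str]) -> Dict[Tuple[str, str, str], str]:
    """Index MRDOC.RRF rows as (KEY, VALUE, TYPE) -> EXPL."""
    docs: Dict[Tuple[str, str, str], str] = {}
    for line in lines:
        key, value, typ, expl = parse_rrf_line(line)
        docs[(key, value, typ)] = expl
    return docs


def expand(docs: Dict[Tuple[str, str, str], str], key: str, value: str) -> str:
    """Return the expanded form of an abbreviated value, or the value itself."""
    return docs.get((key, value, "expanded_form"), value)


if __name__ == "__main__":
    docs = load_mrdoc([
        "ATN|DDF|expanded_form|Drug Doseform|",
        "ATN|DHJC|expanded_form|HCPCS J-code|",
    ])
    print(expand(docs, "ATN", "DHJC"))  # HCPCS J-code
```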
2.7.1.3.4 Concept Names and Sources (File = MRCONSO.RRF)
There is exactly one row in this file for each atom (each
occurrence of each unique string or concept name within each source
vocabulary) in the Metathesaurus, i.e., there is exactly one row for
each unique AUI in the Metathesaurus. Every string or concept name in
the Metathesaurus appears in this file, connected to its language,
source vocabularies, and its concept identifier. The values of TS, STT, and ISPREF reflect the default order of precedence of vocabulary sources and term types in MRRANK.RRF.
| Col. | Description |
| CUI | Unique identifier for concept |
| LAT | Language of Term |
| TS | Term status |
| LUI | Unique identifier for term |
| STT | String type |
| SUI | Unique identifier for string |
| ISPREF | Atom status - preferred (Y) or not (N) for this string within this concept |
| AUI | Unique identifier for atom |
| SAUI | Source asserted atom identifier [optional] |
| SCUI | Source asserted concept identifier [optional] |
| SDUI | Source asserted descriptor identifier [optional] |
| SAB | Source abbreviation |
| TTY | Term type in source |
| CODE | "Most useful" source asserted identifier (if the source vocabulary has more than one) or a Metathesaurus-generated source entry identifier (if the source vocabulary has none) (optional - present if the UI is an AUI) |
| STR | String |
| SRL | Source Restriction Level |
| SUPPRESS | Suppressible flag - N or Y; Y indicates that the string may lack face validity or otherwise be problematic in many applications. |
| CVF | Content view flag [not yet in use] |
Sample Records
C0001175|ENG|P|L0001175|VO|S0010340|Y|A0019182||M0000245|D000163|MSH|PM|D000163|Acquired Immunodeficiency Syndromes|0|N||
C0001175|ENG|S|L0001842|PF|S0011877|N|A2878223|103840012|62479008||SNOMEDCT|PT|62479008|AIDS|4|N||
C0001175|ENG|P|L0001175|VC|S0354232|Y|A2922342|103845019|62479008||SNOMEDCT|SY|62479008|Acquired immunodeficiency syndrome|4|Y||
C0001175|FRE|P|L0162173|PF|S0226654|Y|A0248753||||INS|MH|d000163|SIDA|3|N||
C0001175|RUS|P|L0904943|PF|S1108760|Y|A1165232||||RUS|MH|D000163|SPID|3|N||
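A common first step in an application is to list each concept's names in one language while leaving out atoms flagged as suppressible. The Python sketch below does this for the sample records above. It assumes the 18 columns appear in the order of the table above and treats any SUPPRESS value other than N as suppressible; the function name names_by_concept is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional

# Column order as listed in the MRCONSO.RRF table.
MRCONSO_COLUMNS = [
    "CUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "AUI", "SAUI", "SCUI",
    "SDUI", "SAB", "TTY", "CODE", "STR", "SRL", "SUPPRESS", "CVF",
]


def parse_rrf_line(line: str) -> List[str]:
    """Split one RRF row (terminated by '|' and a line ending) into fields."""
    return line.rstrip("\r\n").split("|")[:-1]


def names_by_concept(lines: Iterable[str], lat: Optional[str] = None,
                     include_suppressible: bool = False) -> Dict[str, List[str]]:
    """Group "STR [SAB/TTY]" labels by CUI, optionally restricted to one language."""
    out: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        values = parse_rrf_line(line)
        if len(values) != len(MRCONSO_COLUMNS):
            raise ValueError(f"expected {len(MRCONSO_COLUMNS)} fields, got {len(values)}")
        rec = dict(zip(MRCONSO_COLUMNS, values))
        if lat is not None and rec["LAT"] != lat:
            continue
        if rec["SUPPRESS"] != "N" and not include_suppressible:
            continue  # anything other than N is treated as suppressible here
        out[rec["CUI"]].append(f"{rec['STR']} [{rec['SAB']}/{rec['TTY']}]")
    return dict(out)


SAMPLE = [
    "C0001175|ENG|P|L0001175|VO|S0010340|Y|A0019182||M0000245|D000163|MSH|PM|D000163|"
    "Acquired Immunodeficiency Syndromes|0|N||",
    "C0001175|ENG|S|L0001842|PF|S0011877|N|A2878223|103840012|62479008||SNOMEDCT|PT|62479008|AIDS|4|N||",
    "C0001175|ENG|P|L0001175|VC|S0354232|Y|A2922342|103845019|62479008||SNOMEDCT|SY|62479008|"
    "Acquired immunodeficiency syndrome|4|Y||",
    "C0001175|FRE|P|L0162173|PF|S0226654|Y|A0248753||||INS|MH|d000163|SIDA|3|N||",
]

if __name__ == "__main__":
    print(names_by_concept(SAMPLE, lat="ENG"))
```

With lat="ENG", the suppressible SNOMEDCT synonym and the French atom are both excluded, leaving the MSH and SNOMEDCT/PT names.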
2.7.1.3.5 Simple Concept and Atom Attributes (File = MRSAT.RRF)
There is exactly one row in this table for each concept, atom,
or relationship attribute that does not have a sub-element structure.
All Metathesaurus concepts and a minority of Metathesaurus
relationships have entries in this file. This file includes all source
vocabulary attributes that do not fit into other categories.
| Col. | Description |
| CUI | Unique identifier for concept (if UI is a relationship identifier, this will be CUI1 for that relationship) |
| LUI | Unique identifier for term (optional - present for atom attributes, but not for relationship attributes) |
| SUI | Unique identifier for string (optional - present for atom attributes, but not for relationship attributes) |
| METAUI | Metathesaurus atom identifier (will have a leading A) or Metathesaurus relationship identifier (will have a leading R) or blank if it is a concept attribute. |
| STYPE | The name of the column in MRCONSO.RRF or MRREL.RRF that contains the identifier to which the attribute is attached, e.g., SAUI, SCUI, SRUI, CODE, CUI, AUI. Many attributes currently shown as linked to Metathesaurus AUIs will be linked to one of the source vocabulary identifiers as vocabularies that were added to the Metathesaurus prior to the development of the RRF are updated and brought into complete alignment with the RRF. |
| CODE | "Most useful" source asserted identifier (if the source vocabulary contains more than one) or a Metathesaurus-generated source entry identifier (if the source vocabulary has none) (optional - present if UI is an AUI) |
| ATUI | Unique identifier for attribute |
| SATUI | Source asserted attribute identifier (optional - present if it exists) |
| ATN | Attribute name. Possible values appear in MRDOC.RRF and are described in Appendix B.2. |
| SAB | Abbreviation of the source of the attribute. Possible values appear in MRSAB.RRF and are listed in Appendix B.4. |
| ATV | Attribute value described under specific attribute name in Appendix B.2. A few attribute values exceed 1,000 characters. Many of the abbreviations used in attribute values are explained in MRDOC.RRF and included in Appendix B.3. |
| SUPPRESS | Suppressible flag |
| CVF | Content view flag [not yet in use] |
Sample Records
C0001175|L0001175|S0010339|A0019180|AUI|D000163|AT15797077||FX|MSH|AIDS Dementia Complex|N||
C0001175|L0001175|S0354232|A2922342|SAUI|62479008|AT34794876||DESCRIPTIONSTATUS|SNOMEDCT|0|N||
C0001175|L2810384|S3645548|A3814219|SCUI|62479008|AT33494582||CTV3ID|SNOMEDCT|XE0RX|N||
C0001175|L2810384|S3645548|A3814219|SCUI|62479008|AT33652930||ISPRIMITIVE|SNOMEDCT|0|N||
C0001175|||R19334287|SRUI||AT37098279||REFINABILITY|SNOMEDCT|1|N||
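Because atom, relationship, and concept attributes share one file, a reader usually selects rows by METAUI (and CUI) and groups values by source and attribute name. The Python sketch below does this for the sample records above. It assumes the 13 columns appear in the order of the table above; the function name attributes_for is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Column order as listed in the MRSAT.RRF table.
MRSAT_COLUMNS = [
    "CUI", "LUI", "SUI", "METAUI", "STYPE", "CODE", "ATUI", "SATUI",
    "ATN", "SAB", "ATV", "SUPPRESS", "CVF",
]


def parse_rrf_line(line: str) -> List[str]:
    """Split one RRF row (terminated by '|' and a line ending) into fields."""
    return line.rstrip("\r\n").split("|")[:-1]


def attributes_for(lines: Iterable[str], cui: str, metaui: str) -> Dict[Tuple[str, str], List[str]]:
    """Collect (SAB, ATN) -> [ATV, ...] for one concept and one METAUI.

    Pass an AUI (leading A) or an RUI (leading R); concept attributes have
    a blank METAUI, so metaui="" selects those.
    """
    found: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for line in lines:
        rec = dict(zip(MRSAT_COLUMNS, parse_rrf_line(line)))
        if rec["CUI"] == cui and rec["METAUI"] == metaui:
            found[(rec["SAB"], rec["ATN"])].append(rec["ATV"])
    return dict(found)


SAMPLE = [
    "C0001175|L0001175|S0010339|A0019180|AUI|D000163|AT15797077||FX|MSH|AIDS Dementia Complex|N||",
    "C0001175|L0001175|S0354232|A2922342|SAUI|62479008|AT34794876||DESCRIPTIONSTATUS|SNOMEDCT|0|N||",
    "C0001175|L2810384|S3645548|A3814219|SCUI|62479008|AT33494582||CTV3ID|SNOMEDCT|XE0RX|N||",
    "C0001175|L2810384|S3645548|A3814219|SCUI|62479008|AT33652930||ISPRIMITIVE|SNOMEDCT|0|N||",
    "C0001175|||R19334287|SRUI||AT37098279||REFINABILITY|SNOMEDCT|1|N||",
]

if __name__ == "__main__":
    print(attributes_for(SAMPLE, "C0001175", "R19334287"))  # {('SNOMEDCT', 'REFINABILITY'): ['1']}
```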
2.7.1.3.6 Definitions (File = MRDEF.RRF)
There is exactly one row in this file for each definition in the Metathesaurus. A definition is an attribute of an atom (an
occurrence of a string in a source vocabulary). A few definitions approach 3,000 characters in
length.
| Col. | Description |
| CUI | Unique identifier for concept |
| AUI | Unique identifier for atom |
| ATUI | Unique identifier for attribute |
| SATUI | Source asserted attribute identifier [optional - present if it exists] |
| SAB | Abbreviation of the source of the definition |
| DEF | Definition |
| SUPPRESS | Suppressible flag |
| CVF | Content view flag [not yet in use] |
Sample Records
C0001175|A0019180|AT15060425||MSH|An acquired defect of cellular immunity associated with infection by the human immunodeficiency virus (HIV), a CD4-positive T-lymphocyte count under 200 cells/microliter or less than 14% of total lymphocytes, and increased susceptibility to opportunistic infections and malignant neoplasms. Clinical manifestations also include emaciation (wasting) and dementia. These elements reflect criteria for AIDS as defined by the CDC in 1993.|N||
C0001175|A0021048|AT14042185||CSP|one or more indicator diseases, depending on laboratory evidence of HIV infection (CDC); late phase of HIV infection characterized by marked suppression of immune function resulting in opportunistic infections, neoplasms, and other systemic symptoms (NIAID).|N||
C0001175|A0021055|AT18420297||PDQ|Acquired immunodeficiency syndrome. An acquired defect in immune system function caused by human immunodeficiency virus 1 (HIV-1). AIDS is associated with increased susceptibility to certain cancers and to opportunistic infections, which are infections that occur rarely except in individuals with weak immune systems.|N||
2.7.1.3.7 Semantic Types (File = MRSTY.RRF)
There is exactly one row in this file for each Semantic Type
assigned to each concept. All Metathesaurus concepts have at least one
entry in this file. Many have more than one entry. The TUI, STN, and STY
are all direct links to the UMLS Semantic Network (Section 3).
| Col. | Description |
| CUI | Unique identifier of concept |
| TUI | Unique identifier of Semantic Type |
| STN | Semantic Type tree number |
| STY | Semantic Type. The valid values are defined in the Semantic Network. |
| ATUI | Unique identifier for attribute |
| CVF | Content view flag [not yet in use] |
Sample Record
C0001175|T047|B2.2.1.2.1|Disease or Syndrome|AT17683839||
2.7.1.3.8.a Locators (File = MRLO.RRF)
Note: NLM intends to
eliminate this file from the Metathesaurus with the 2004AB version. Some of the information is
outdated, some duplicates information contained in other Metathesaurus files, and some is easily
obtained from other publicly available sources, e.g., PubMed.
This file lists selected information sources in which atoms from particular
source vocabularies were detected.
There is one row in this table for each atom identified as
appearing in each of a selected set of machine-readable information
sources.
| Col. | Description |
| CUI | Unique identifier of concept |
| AUI | Unique identifier for atom |
| ISN | Name of information source or database in which concept appears |
| FR | Frequency count of number of occurrences of concept in the information source (optional) |
| UN | Meaning of frequency (optional) |
| SUI | Unique identifier of string if name used in information source appears in MRCONSO.RRF (optional) |
| SNA | Actual name that occurs in the information source if not otherwise present in the Metathesaurus (optional) |
| SOUI | Unique identifier of record in which the concept appears in source (optional) |
| CVF | Content view flag [not yet in use] |
2.7.1.3.8.b History (File = MRHIST.RRF)
This file tracks source-asserted history information. It currently includes SNOMED CT history only.
| Col. | Description |
| CUI | Unique identifier for concept |
| SOURCEUI | Source asserted unique identifier |
| SAB | Source abbreviation |
| SVER | Release date or version number of a source |
| CHANGETYPE | Source asserted code for type of change |
| CHANGEKEY | CONCEPTSTATUS (if history relates to a SNOMED CT concept) or DESCRIPTIONSTATUS (if history relates to a SNOMED CT atom) |
| CHANGEVAL | CONCEPTSTATUS value or DESCRIPTIONSTATUS value after the change took place [NOTE: the change may have affected something other than the status value] |
| REASON | Explanation of change if present |
| CVF | Content view flag [not yet in use] |
Sample Records
C0000294|108821000|SNOMEDCT|20001101|0|CONCEPTSTATUS|0|||
C0000294|108821000|SNOMEDCT|20020731|2|CONCEPTSTATUS|0|FULLYSPECIFIEDNAME CHANGE||
C0000294|1185494016|SNOMEDCT|20020731|0|DESCRIPTIONSTATUS|0|||
C0000294|1461100014|SNOMEDCT|20030131|0|DESCRIPTIONSTATUS|0|||
2.7.1.3.9 Related Concepts (File = MRREL.RRF)
There is one row in this table for each relationship between concepts or atoms known to the Metathesaurus, with the following
exceptions found in other files: co-occurrences found in MRCOC.RRF, and pair-wise mapping relationships between two source
vocabularies found in MRMAP.RRF and MRSMAP.RRF.
Note that for asymmetrical relationships there is one row for
each direction of the relationship. Note also the direction of REL -
the relationship which the SECOND concept or atom (with Concept Unique
Identifier CUI2 and Atom Unique Identifier AUI2) HAS TO the FIRST concept or atom (with Concept Unique
Identifier CUI1 and Atom Unique Identifier AUI1).
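Because REL states the relationship the second concept has to the first, a row is easiest to read with CUI2 as the subject. The Python sketch below renders a row, represented as a dict keyed by the column names in the table that follows, in that order. The function name describe_relationship, the identifiers, and the label values in the example are hypothetical.

```python
from typing import Dict


def describe_relationship(row: Dict[str, str]) -> str:
    """Render one MRREL.RRF row as a sentence.

    REL (and RELA, if present) give the relationship that the SECOND
    concept (CUI2) has TO the FIRST concept (CUI1), so CUI2 is the subject.
    """
    label = row["REL"]
    if row.get("RELA"):
        label += f" ({row['RELA']})"
    return f"{row['CUI2']} has relationship {label} to {row['CUI1']}"


if __name__ == "__main__":
    # Hypothetical identifiers and labels, for illustration only.
    row = {"CUI1": "C_FIRST", "REL": "CHD", "CUI2": "C_SECOND", "RELA": "branch_of"}
    print(describe_relationship(row))  # C_SECOND has relationship CHD (branch_of) to C_FIRST
```

Reading rows this way avoids the common mistake of treating CUI1 as the subject of REL.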
| Col. | Description |
| CUI1 | Unique identifier of first concept |
| AUI1 | Unique identifier for first atom |
| STYPE1 | The name of the column in MRCONSO.RRF that contains the identifier used for the first concept or first atom in the source of the relationship. |
| REL | Relationship of second concept or atom to first concept or atom |
| CUI2 | Unique identifier of second concept |
| AUI2 | Unique identifier for second atom |
| STYPE2 | The name of the column in MRCONSO.RRF that contains the identifier used for the second concept or second atom in the source of the relationship. |
| RELA | Additional (more specific) relationship label (optional) |
| RUI | Unique identifier for relationship |
| SRUI | Source asserted relationship identifier, if present |
| SAB | Abbreviation of the source of relationship |
| SL | Source of relationship labels |
| RG | Relationship or role group; an identifier that links semantically connected relationships with the same CUI1 and AUI1 values |
| DIR | Source asserted directionality flag: Y indicates that this is the direction of the relationship in its source; N indicates that it is not; a blank indicates that it is not important or has not yet been determined. |
| MG | Machine generated and unverified indicator (optional) |
| SUPPRESS | Suppressible flag |
| CVF | Content view flag [not yet in use] |
Sample Records
2.7.1.3.10 Co-occurring Concepts (File = MRCOC.RRF)
This file includes statistical aggregations of co-occurrences
of meanings in external data sources. These exist at the AUI level.
There are two rows in this table for each pair of atoms that co-occur
in each information source represented: one for each direction of the
relationship. (Note that the COA data may be different for each
direction of the relationship.) Many Metathesaurus concepts have no
entries in this file. Due to the very large number of co-occurrence
relationships, they are distributed in a separate file.
| Col. | Description |
| CUI1 | Unique identifier of first concept |
| AUI1 | Unique identifier of first atom |
| CUI2 | Unique identifier of second concept, or not present. Note: Where CUI2 is not present and COT is LQ (MeSH topical qualifier), the count of citations of CUI1 with no MeSH qualifiers is reported in COF. |
| AUI2 | Unique identifier of second atom |
| SAB | Abbreviation of the Source of co-occurrence information |
| COT | Type of co-occurrence |
| COF | Frequency of co-occurrence, if applicable |
| COA | Attributes of co-occurrence, if applicable |
| CVF | Content view flag [not yet in use] |
Sample Records
Co-occurrences are concepts that occur together in the same "entries" in
some information source. The relationships represented here are obtained
from machine manipulation of the information source. Co-occurrence
relationships may exist between similar concepts (e.g., "Atrial
Fibrillation" and "Arrhythmia"), between very different concepts that
nevertheless have some important connection in the field of biomedicine
(e.g., "Atrial Fibrillation" and "Digoxin"), or between a primary
concept and a qualifier (e.g., "Lithotripsy" and "instrumentation"). A
co-occurrence relationship can exist between two concepts that have no
other apparent relationship, although the frequency of such
co-occurrences will be small.
In the current Metathesaurus, there are three sources of
co-occurrence data: MEDLINE, AI/RHEUM, and CCPSS. From MEDLINE,
co-occurrence data was computed for concepts that were designated as
principal or main points in the same journal article; i.e., the
co-occurrence counts do not include articles in which either or both of
the concepts were present and indexed in MEDLINE but not designated as
main points. (A concept is considered to be a main point if an asterisk (*) is
attached to the main heading or any of its subheadings.)
Two overall frequencies of MEDLINE co-occurrence are provided:
one for recent MEDLINE data (MED) and one for MEDLINE data from a
preceding block of years (MBD); see SOC for
date ranges in the current edition. Separate counts are provided for
the frequencies with which the first concept was qualified by different
MeSH qualifiers or by no qualifier at all when it co-occurred with the
second concept. There are separate entries for each direction of the
co-occurrence relationship. The related subheading occurrence
information in each entry belongs to the first concept in the entry and
is therefore different for each direction of the relationship.
In addition to the specific qualifier information associated
with two co-occurring concepts, this element also includes, in entries
whose type of co-occurrence is LQ or LQB, totals for the number of times
each main concept was qualified by a specific subheading or by no subheading.
The AI/RHEUM co-occurrence data represent the co-occurrence of
diseases and findings in the AI/RHEUM knowledge base, i.e., the
diseases that co-occur with a particular finding and the findings that
co-occur with a particular disease. Each disease/finding pair can
co-occur only once in the AI/RHEUM knowledge base.
In CCPSS, the co-occurrence data is extracted from patient
records and includes problem-problem co-occurrences within a patient
record as well as problem-modifier co-occurrences.
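Because each direction of a co-occurrence is stored as its own row, a lookup has to state which concept comes first. The following Python sketch assumes that MRCOC.RRF rows follow the column order of the table above, with each row terminated by a vertical bar; the CUIs, AUIs, SAB, and COT values in the example are invented for illustration.

```python
# Column order assumed from the MRCOC.RRF table in this section.
MRCOC_COLS = ["CUI1", "AUI1", "CUI2", "AUI2", "SAB", "COT", "COF", "COA", "CVF"]


def parse_rrf_line(line, cols):
    """Split one pipe-delimited row (terminated by '|') into a dict keyed by column name."""
    line = line.rstrip("\r\n")
    if line.endswith("|"):
        line = line[:-1]  # every row ends with a vertical bar
    values = line.split("|")
    if len(values) != len(cols):
        raise ValueError(f"expected {len(cols)} fields, got {len(values)}")
    return dict(zip(cols, values))


def directed_cooccurrences(lines, cui1, cui2):
    """Return (SAB, COT, COF) for rows in which cui1 is the first concept and cui2 the second.

    Rows for the opposite direction (cui2 -> cui1) are separate rows whose
    COA data may differ, so they are deliberately not included.
    """
    hits = []
    for line in lines:
        row = parse_rrf_line(line, MRCOC_COLS)
        if row["CUI1"] == cui1 and row["CUI2"] == cui2:
            cof = int(row["COF"]) if row["COF"] else None
            hits.append((row["SAB"], row["COT"], cof))
    return hits


rows = [
    "C0000001|A0000001|C0000002|A0000002|MED|LQ|12|||",
    "C0000002|A0000002|C0000001|A0000001|MED|LQ|12|||",
    "C0000001|A0000001|C0000002|A0000002|MBD|LQ|3|||",
]
print(directed_cooccurrences(rows, "C0000001", "C0000002"))
# [('MED', 'LQ', 12), ('MBD', 'LQ', 3)]
```

Querying with the CUIs swapped returns only the reverse-direction row, which is why an application interested in both directions should query twice.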
2.7.1.3.11 "Computable" Hierarchies (File = MRHIER.RRF)
This file contains one row for each hierarchy or context in which each atom appears. If a source
vocabulary does not contain hierarchies, its atoms will have no rows in this file. If a source
vocabulary is multi-hierarchical (allows the same atom to appear in more than one hierarchy), some of
its atoms will have more than one row in this file. MRHIER.RRF provides a complete and compact
representation of all hierarchies present in all Metathesaurus source vocabularies. Hierarchical
displays can be computed by combining data in this file with data in MRCONSO.RRF. The “distance-1
relationships”, i.e., immediate parent, immediate child, and sibling relationships, represented in
MRHIER.RRF also appear in MRREL.RRF. Most of the hierarchical relationships in MRHIER.RRF
(excluding some sibling relationships) also appear in a much larger, “pre-computed” format in
MRCXT.RRF (Section 2.7.1.3.12). NLM plans to phase out MRCXT.RRF (which has reached an unwieldy size)
in favor of providing users with tools that generate hierarchical displays based on MRHIER.RRF and
MRCONSO.RRF.
|
Col.
|
Description
|
|
CUI
|
Unique identifier for concept
|
|
AUI
|
Unique identifier for atom
|
|
CXN
|
Context number (e.g., 1,2,3)
|
|
PAUI
|
Unique identifier of atom's immediate parent within this context
|
|
SAB
|
Source of atom (and therefore of hierarchical context)
|
|
RELA
|
Relationship of atom to its immediate parent
|
|
PTR
|
Path to the top or “root” of the hierarchical context
from this atom, represented as a list of AUIs separated by periods (.). The first one in the
list is the top of the hierarchy; the last one in the list is the immediate parent of the atom,
which also appears as the value of PAUI.
|
|
HCD
|
Source asserted hierarchical number or code for this atom
in this context
|
|
CVF
|
Content view flag [not yet in use]
|
Sample Records
C0001175|A2878223|1|A3316611|SNOMEDCT|isa|A3684559.A2880798.A339606.A3287869.A3316611|||
C0001175|A2878223|2|A3512124|SNOMEDCT|isa|A3684559.A2880798.A3398606.A3287869.A3512124|||
C0001175|A2878223|3|A3696836|SNOMEDCT|isa|A3684559.A2880798.A3398606.A3399957.A3399109.A3144217.A3696836|||
C0001175|A2878223|4|A3512124|SNOMEDCT|isa|A3684559.A2880798.A3398606.A3399957.A3399109.A3512124|||
C0001175|A2878223|5|A3316611|SNOMEDCT|isa|A3684559.A2880798.A3512117.A3082701.A3316611|||
C0001175|A2878223|6|A2888699|SNOMEDCT|isa|A3684559.A2880798.A3512117.A3082701.A3398847.A3398762.A2888699|||
C0001175|A2878223|7|A3316611|SNOMEDCT|isa|A3684559.A2880798.A3512117.A3287869.A3316611|||
C0001175|A2878223|8|A3512124|SNOMEDCT|isa|A3684559.A2880798.A3512117.A3287869.A3512124|||
C0001175|A2988194|1|A2888699|SNOMEDCT|isa|A3684559.A2880798.A3512117.A3082701.A3398847.A3398762.A2888699|||
To find the specific concept names used in a hierarchy, look up the atom identifiers from the AUI and
PTR data elements in MRCONSO.RRF.
For most source vocabularies, the value of RELA (if present) applies up the hierarchy to the top or
root. In other words, it also applies to the relationship between the atom's parent and the atom's
grandparent, etc. The two exceptions in this version of the Metathesaurus are GO (Gene Ontology)
and NIC (Nursing Intervention Classification). Except for GO and NIC atoms, the MRHIER rows for an
atom's ancestors (parent, grandparent, etc.) contain no added information except the source asserted
hierarchical number or code (HCD). If this is not of interest, there may be no reason to find
MRHIER rows for an atom's ancestors.
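The PTR description and the name-lookup advice above can be combined into a small helper that lists an atom's full path and attaches names to it. This is a sketch: it assumes MRHIER.RRF rows follow the column order of the table above, and `names_by_aui` is a hypothetical dictionary you would build yourself from the atom identifiers and strings in MRCONSO.RRF.

```python
MRHIER_COLS = ["CUI", "AUI", "CXN", "PAUI", "SAB", "RELA", "PTR", "HCD", "CVF"]


def parse_rrf_line(line, cols):
    """Split one pipe-delimited row (terminated by '|') into a dict keyed by column name."""
    line = line.rstrip("\r\n")
    if line.endswith("|"):
        line = line[:-1]
    values = line.split("|")
    if len(values) != len(cols):
        raise ValueError(f"expected {len(cols)} fields, got {len(values)}")
    return dict(zip(cols, values))


def path_to_atom(row):
    """Return AUIs from the top of the hierarchy down to, and including, the atom itself."""
    ptr = row["PTR"].split(".") if row["PTR"] else []
    # The last AUI in PTR is the immediate parent, which should equal PAUI.
    if ptr and ptr[-1] != row["PAUI"]:
        raise ValueError(f"PTR does not end with PAUI {row['PAUI']}")
    return ptr + [row["AUI"]]


def path_names(row, names_by_aui):
    """Replace each AUI on the path with a name, keeping the AUI when no name is known."""
    return [names_by_aui.get(aui, aui) for aui in path_to_atom(row)]


row = parse_rrf_line(
    "C0001175|A2878223|5|A3316611|SNOMEDCT|isa|A3684559.A2880798.A3512117.A3082701.A3316611|||",
    MRHIER_COLS,
)
print(path_to_atom(row))
# ['A3684559', 'A2880798', 'A3512117', 'A3082701', 'A3316611', 'A2878223']
```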
To find an atom's siblings in a specific context, find all MRHIER.RRF rows that share its SAB,
RELA*, and PTR values.
To find an atom's children in a specific context, append a period (.) and the atom's AUI to its
PTR and find all MRHIER.RRF rows with its SAB, RELA*, and the expanded PTR.
*The RELA is needed to retrieve correct siblings and children for University of Washington
Digital Anatomist (UWDA) hierarchies. Some UWDA atoms appear in multiple hierarchies that are
distinguished ONLY by their RELA values.
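The sibling and child rules above translate directly into filters over parsed MRHIER.RRF rows. The sketch keeps RELA in both comparisons, as the UWDA note requires, and assumes the column order of the table above. Treating an atom with an empty PTR as a root whose children have a PTR equal to its own AUI is an assumption, not something this section states; the example rows are invented.

```python
MRHIER_COLS = ["CUI", "AUI", "CXN", "PAUI", "SAB", "RELA", "PTR", "HCD", "CVF"]


def parse_rrf_line(line, cols):
    """Split one pipe-delimited row (terminated by '|') into a dict keyed by column name."""
    line = line.rstrip("\r\n")
    if line.endswith("|"):
        line = line[:-1]
    values = line.split("|")
    if len(values) != len(cols):
        raise ValueError(f"expected {len(cols)} fields, got {len(values)}")
    return dict(zip(cols, values))


def siblings(rows, target):
    """AUIs sharing the target's SAB, RELA, and PTR, excluding the target atom itself."""
    return sorted({
        r["AUI"] for r in rows
        if r["SAB"] == target["SAB"]
        and r["RELA"] == target["RELA"]
        and r["PTR"] == target["PTR"]
        and r["AUI"] != target["AUI"]
    })


def children(rows, target):
    """AUIs whose PTR equals the target's PTR extended by '.' and the target's AUI."""
    # Assumption: a root atom (empty PTR) has children whose PTR is just its AUI.
    expanded = f"{target['PTR']}.{target['AUI']}" if target["PTR"] else target["AUI"]
    return sorted({
        r["AUI"] for r in rows
        if r["SAB"] == target["SAB"]
        and r["RELA"] == target["RELA"]
        and r["PTR"] == expanded
    })


rows = [parse_rrf_line(line, MRHIER_COLS) for line in [
    "C1|A10|1|A1|SRC|isa|A0.A1|||",
    "C2|A20|1|A1|SRC|isa|A0.A1|||",
    "C3|A30|1|A10|SRC|isa|A0.A1.A10|||",
    "C4|A40|1|A1|SRC|part_of|A0.A1|||",  # different RELA, so not a sibling of A10
]]
print(siblings(rows, rows[0]), children(rows, rows[0]))
# ['A20'] ['A30']
```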
2.7.1.3.12 Contexts (File = MRCXT.RRF)
This very large file contains pre-computed hierarchical context information (including concept names) intended to facilitate
the display of hierarchies present in UMLS source vocabularies. All of the information in this file (plus additional
sibling relationships) can be computed by joining the MRHIER.RRF file with MRCONSO.RRF. There can be many
rows in this file for each occurrence of an atom in a hierarchy in any of the UMLS source vocabularies (a "context" in
this discussion). Many Metathesaurus concepts have many atoms with contexts, while others may have none. The number of
rows per context depends on the number of ancestor, sibling, or child terms an atom has in that context.
Because some atoms have multiple contexts in the same source (e.g., MeSH), a context number (CXN, e.g., 1, 2, 3) is used
to identify all members of the same context. CXNs are not global but are created as required for each atom.
Each distinct context for a single atom can be retrieved with a CUI-AUI-SAB-CXN key. The "distance-1 relationships,"
i.e., the immediate parent, immediate child, and sibling relationships, represented in MRCXT.RRF,
are also present in the MRREL.RRF file.
|
Col.
|
Description
|
|
CUI
|
Unique identifier of concept
|
|
SUI
|
Unique identifier for string used in this
context
|
|
AUI
|
Unique identifier for atom that has this context
|
|
SAB
|
Source abbreviation. Allowed values appear in MRSAB.RRF
and are listed in Appendix B.4
|
|
CODE
|
Unique Identifier or code for string in
that source
|
|
CXN
|
The context number (if the atom has multiple contexts)
|
|
CXL
|
Context member label, i.e., ANC for
ancestor of this atom, CCP for the atom itself, SIB for sibling of this
atom, CHD for child of this atom
|
|
RNK
|
For rows with a CXL value of ANC, the rank
of the ancestors (e.g., a value of 1 denotes the most remote ancestor
in the hierarchy)
|
|
CXS
|
String or concept name for context member
|
|
CUI2
|
Concept identifier of context member
(may be empty if context member is not yet in the Metathesaurus)
|
|
AUI2
|
Atom identifier of context member
|
|
HCD
|
Source hierarchical number or code of context member (if present).
|
|
RELA
|
Additional relationship label providing further
categorization of the CXL, if applicable and known. Valid values listed in Appendix B.3.
|
|
XC
|
A plus(+) sign indicates that the CUI2 for
this row has children in this context. If this field is empty, the CUI2
does not have children in this context
|
|
CVF
|
Content view flag [not yet
in use]
|
Sample Records
C0001175|S1911299|A1855909|ICPC2P|B9001|1|ANC|1|ICPC2-Plus|C1140253|A1861145|||||
C0001175|S1911299|A1855909|ICPC2P|B90001|1|ANC|2|BLOOD/BLOOD FORMING ORGANS/IMMUNE MECHANISM|C0847039|A1852564|B||||
C0001175|S1911299|A1855909|ICPC2P|B90001|1|ANC|2|Diagnosis/Diseases Component|C0497531|A0916974|7||||
C0001175|S1911299|A1855909|ICPC2P|B90001|1|ANC|3|HIV-INFECTION/AIDS|C0497169|A1852069|B90||||
C0001175|S1911299|A1855909|ICPC2P|B90001|1|CCP||Acquired Immune-Deficiency Syndrome|C0001175|A1855909|B90001||||
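Since each context is identified by a CUI-AUI-SAB-CXN key, a reader would typically group MRCXT.RRF rows by that key and order the ANC rows by RNK. A minimal sketch, assuming the column order of the table above; the rows in the example are invented placeholders, not UMLS data.

```python
from collections import defaultdict

MRCXT_COLS = ["CUI", "SUI", "AUI", "SAB", "CODE", "CXN", "CXL", "RNK",
              "CXS", "CUI2", "AUI2", "HCD", "RELA", "XC", "CVF"]


def parse_rrf_line(line, cols):
    """Split one pipe-delimited row (terminated by '|') into a dict keyed by column name."""
    line = line.rstrip("\r\n")
    if line.endswith("|"):
        line = line[:-1]
    values = line.split("|")
    if len(values) != len(cols):
        raise ValueError(f"expected {len(cols)} fields, got {len(values)}")
    return dict(zip(cols, values))


def group_contexts(lines):
    """Group parsed rows by the CUI-AUI-SAB-CXN key that identifies one context."""
    contexts = defaultdict(list)
    for line in lines:
        row = parse_rrf_line(line, MRCXT_COLS)
        contexts[(row["CUI"], row["AUI"], row["SAB"], row["CXN"])].append(row)
    return contexts


def ancestor_path(context_rows):
    """Names of ANC members, from the most remote ancestor (RNK 1) to the nearest."""
    ancestors = [r for r in context_rows if r["CXL"] == "ANC"]
    ancestors.sort(key=lambda r: int(r["RNK"]))
    return [r["CXS"] for r in ancestors]


def expandable_children(context_rows):
    """CHD members whose XC is '+', i.e., children that have children of their own."""
    return [r["CXS"] for r in context_rows if r["CXL"] == "CHD" and r["XC"] == "+"]


lines = [
    "C1|S1|A1|SRC|X1|1|ANC|2|Parent|C2|A2|||||",
    "C1|S1|A1|SRC|X1|1|ANC|1|Root|C3|A3|||||",
    "C1|S1|A1|SRC|X1|1|CCP||Self|C1|A1|||||",
    "C1|S1|A1|SRC|X1|1|CHD||Child|C4|A4|||+||",
]
contexts = group_contexts(lines)
ctx = contexts[("C1", "A1", "SRC", "1")]
print(ancestor_path(ctx))  # ['Root', 'Parent']
```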
2.7.1.3.13 Mappings (File = MRMAP.RRF)
Representations of simple and complex mappings between (1)
concept names or (usually) their surrogates (identifiers or codes) from
one source vocabulary or from the Metathesaurus and (2) concept names
or (usually) their surrogates from another source vocabulary or from
the Metathesaurus. This file can accommodate multiple purpose-specific
mappings between the same source vocabularies and/or conditional rules
for when mappings apply. Source asserted historical mappings
(i.e., mappings between obsolete terms/concepts and current ones) are
included here.
|
Col.
|
Description
|
|
MAPSETCUI
|
Unique identifier for the map set to which
this mapping belongs
|
|
MAPSETSAB
|
Source abbreviation for the map set
|
|
MAPSUBSETID
|
Map subset identifier (optional)
|
|
MAPRANK
|
Order in which mappings in a subset should
be applied (optional)
|
|
FROMUI
|
Mapped_from identifier
(source-id assigned by the Metathesaurus as a simple id for what may be
a complex expression in FROMEXPR) |
|
FROMEXPR
|
Mapped_from expression,
which can be a single identifier or concept name or a complex expression
involving multiple identifiers or concept names, Boolean operators,
and/or punctuation
|
|
FROMTYPE
|
Type of mapped_from
expression
|
|
FROMRULE
|
Machine processible rule
for when the mapped_from is valid (optional)
|
|
FROMRES
|
Restriction on when the
mapped_from should be used (optional)
|
|
REL
|
Relationship
|
|
RELA
|
Additional relationship
label (optional)
|
|
TOUI
|
Mapped_to identifier
(target id assigned by the Metathesaurus as a simple id for what may be
a complex expression in TOEXPR)
|
|
TOEXPR
|
Mapped_to expression,
which can be a single identifier or concept name or a complex
expression involving multiple identifiers or concept names, Boolean
operators, and/or punctuation
|
|
TOTYPE
|
Type of mapped_to
expression
|
|
TORULE
|
Machine processible rule
for when the mapped_to is valid (optional)
|
|
TORES
|
Restriction on when the
mapped_to should be used (optional)
|
|
MAPRULE
|
Machine processible rule
for when to apply mapping (optional)
|
|
MAPTYPE
|
Type of mapping
|
|
MAPATN
|
Row level attribute name
associated with this mapping [not yet in use]
|
|
MAPATV
|
Row level attribute value
associated with this mapping [not yet in use]
|
|
CVF
|
Content view flag [not yet
in use]
|
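Where a map set uses subsets and ranks, an application applies the mappings in each subset in MAPRANK order. The sketch below assumes the 21-column order of the table above and that a lower MAPRANK is applied first; this section does not say which way the ranking runs, so that direction should be checked against the map set's own documentation. The map set identifiers, expressions, and the `make_row` helper are hypothetical.

```python
MRMAP_COLS = [
    "MAPSETCUI", "MAPSETSAB", "MAPSUBSETID", "MAPRANK", "FROMUI", "FROMEXPR",
    "FROMTYPE", "FROMRULE", "FROMRES", "REL", "RELA", "TOUI", "TOEXPR",
    "TOTYPE", "TORULE", "TORES", "MAPRULE", "MAPTYPE", "MAPATN", "MAPATV", "CVF",
]


def parse_rrf_line(line, cols):
    """Split one pipe-delimited row (terminated by '|') into a dict keyed by column name."""
    line = line.rstrip("\r\n")
    if line.endswith("|"):
        line = line[:-1]
    values = line.split("|")
    if len(values) != len(cols):
        raise ValueError(f"expected {len(cols)} fields, got {len(values)}")
    return dict(zip(cols, values))


def ordered_targets(lines, mapsetcui, fromexpr):
    """Return (MAPSUBSETID, TOEXPR) pairs for one mapped_from expression in one map set.

    Rows are ordered by subset, then by MAPRANK (assumed: lower rank applied first);
    rows without a MAPRANK sort after ranked rows in the same subset.
    """
    hits = [row for row in (parse_rrf_line(l, MRMAP_COLS) for l in lines)
            if row["MAPSETCUI"] == mapsetcui and row["FROMEXPR"] == fromexpr]

    def sort_key(row):
        rank = int(row["MAPRANK"]) if row["MAPRANK"] else float("inf")
        return (row["MAPSUBSETID"], rank)

    return [(row["MAPSUBSETID"], row["TOEXPR"]) for row in sorted(hits, key=sort_key)]


def make_row(**fields):
    """Hypothetical test helper: build a row, leaving unspecified columns empty."""
    return "|".join(fields.get(col, "") for col in MRMAP_COLS) + "|"


lines = [
    make_row(MAPSETCUI="C9999999", FROMEXPR="X1", MAPSUBSETID="1", MAPRANK="2", TOEXPR="B"),
    make_row(MAPSETCUI="C9999999", FROMEXPR="X1", MAPSUBSETID="1", MAPRANK="1", TOEXPR="A"),
    make_row(MAPSETCUI="C8888888", FROMEXPR="X1", TOEXPR="Z"),
]
print(ordered_targets(lines, "C9999999", "X1"))  # [('1', 'A'), ('1', 'B')]
```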
2.7.1.3.14 Simple Mappings (File = MRSMAP.RRF)
A simpler representation of most of the mappings in MRMAP.RRF.
This file is provided to serve applications which do not require the
full richness of the MRMAP.RRF data structure. It does not include
entries for mappings that have MAPSUBSETID and MAPRANK values in
MRMAP.RRF.
|
Col.
|
Description
|
|
MAPSETCUI
|
Unique identifier for the map set
|
|
MAPSETSAB
|
Source abbreviation for the map set
|
|
FROMEXPR
|
Mapped_from expression
|
|
FROMTYPE
|
Type of mapped_from expression
|
|
REL
|
Relationship
|
|
RELA
|
Additional relationship
label
|
|
TOEXPR
|
Mapped_to expression
|
|
TOTYPE
|
Type of mapped_to
expression |
|
CVF
|
Content view flag [not yet
in use]
|
2.7.1.3.15 Source Information (File=MRSAB.RRF)
The UMLS Metathesaurus uses "versionless" or "root" Source
Abbreviations (SABs) in the data files. MRSAB.RRF connects each
root SAB to fully specified version information for the current
release. For example, the released SAB for MeSH is now simply "MSH"; in
MRSAB.RRF, you will see a current versioned SAB, e.g.,
MSH2003_2002_10_24. Because MRSAB.RRF carries the version information,
all other Metathesaurus files can use versionless source abbreviations,
so rows whose data do not change between versions remain unchanged.
MetamorphoSys can produce files with either the root or the versioned
SABs, so either form can be available in custom subsets of the Metathesaurus.
There is one row in this file for every version of every
source in the current Metathesaurus; the field CURVER has the value 'Y'
to identify the version in this Metathesaurus release. Future releases
of MRSAB.RRF will also contain historical version information, with a
row (CURVER value 'N') for each version of each source that has appeared
in any Metathesaurus release.
The structure of MRSAB.RRF is as follows:
Field
|
Full Name
|
Description
|
VCUI
|
CUI
|
CUI of the versioned SRC
concept for a source
|
RCUI
|
Root CUI
|
CUI of the root SRC concept
for a source
|
VSAB
|
Versioned Source Abbreviation
|
The versioned source
abbreviation for a source, e.g., MSH2003_2002_10_24
|
RSAB
|
Root Source Abbreviation
|
The root source abbreviation
for a source, e.g., MSH
|
SON
|
Official Name
|
The official
name for a source
|
SF
|
Source Family
|
The Source Family for a source
|
SVER
|
Version
|
The source version, e.g., 2001
|
MSTART
|
Meta Start Date
|
The date a source became
active, e.g., 2001_04_03 |
MEND
|
Meta End Date
|
The date a source ceased to be
active, e.g., 2001_05_10
|
IMETA
|
Meta Insert Version
|
The version of the
Metathesaurus in which a source first appeared, e.g., 2001AB
|
RMETA
|
Meta Remove Version
|
The version of the
Metathesaurus from which a source was removed, e.g., 2001AC
|
SLC
|
Source License Contact
|
The source license contact
information
|
SCC
|
Source Content Contact
|
The source content contact
information
|
SRL
|
Source Restriction Level
|
0,1,2,3,4 - explained in the License Agreement.
|
TFR
|
Term Frequency
|
The number of terms for this
source in MRCONSO.RRF, e.g., 12343 |
CFR
|
CUI Frequency
|
The number of CUIs associated
with this source, e.g., 10234
|
CXTY
|
Context Type
|
The type of context (per section
2.3.2) from the UMLS documentation |
TTYL
|
Term Type List
|
Term type list from source,
e.g., MH,EN,PM,TQ |
ATNL
|
Attribute Name List
|
The attribute name list (from
MRSAT.RRF), e.g., MUI,RN,TH,... |
LAT
|
Language
|
The language
of the terms in the source
|
CENC
|
Character Encoding
|
Character set as specified by
the IANA official names for character assignments
http://www.iana.org/assignments/character-sets
|
CURVER
|
Current Version
|
A Y or N flag indicating
whether or not this row corresponds to the current version of the named
source |
SABIN
|
Source in Subset
|
A Y or N flag indicating
whether or not this row is represented in the current MetamorphoSys
subset. Initially always Y where CURVER is Y, but later is
recomputed by MetamorphoSys. |
SSN
|
Source short name
|
The short name of a source
as used by the NLM Knowledge Source Server.
|
SCIT
|
Source citation
|
Citation information for a
source. This is intended to replace the SOS attributes in the SRC
concepts.
|
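Because the other files carry only root SABs, an application that needs to report source versions can look them up here. A minimal sketch, assuming MRSAB.RRF rows are pipe-delimited in the field order of the table above; `make_row` is a hypothetical test helper, and apart from the MSH versioned SAB quoted earlier in this section, the values are invented.

```python
MRSAB_COLS = [
    "VCUI", "RCUI", "VSAB", "RSAB", "SON", "SF", "SVER", "MSTART", "MEND",
    "IMETA", "RMETA", "SLC", "SCC", "SRL", "TFR", "CFR", "CXTY", "TTYL",
    "ATNL", "LAT", "CENC", "CURVER", "SABIN", "SSN", "SCIT",
]


def parse_rrf_line(line, cols):
    """Split one pipe-delimited row (terminated by '|') into a dict keyed by column name."""
    line = line.rstrip("\r\n")
    if line.endswith("|"):
        line = line[:-1]
    values = line.split("|")
    if len(values) != len(cols):
        raise ValueError(f"expected {len(cols)} fields, got {len(values)}")
    return dict(zip(cols, values))


def current_versions(lines):
    """Map each root SAB (RSAB) to the versioned SAB (VSAB) whose CURVER is 'Y'."""
    versions = {}
    for line in lines:
        row = parse_rrf_line(line, MRSAB_COLS)
        if row["CURVER"] == "Y":
            versions[row["RSAB"]] = row["VSAB"]
    return versions


def make_row(**fields):
    """Hypothetical test helper: build a row, leaving unspecified fields empty."""
    return "|".join(fields.get(col, "") for col in MRSAB_COLS) + "|"


lines = [
    make_row(RSAB="MSH", VSAB="MSH2003_2002_10_24", CURVER="Y"),
    make_row(RSAB="MSH", VSAB="MSH_OLDER_VERSION", CURVER="N"),  # invented historical row
]
print(current_versions(lines))  # {'MSH': 'MSH2003_2002_10_24'}
```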
2.7.1.3.16 Concept Name Ranking (File=MRRANK.RRF)
There is exactly one row for each concept name type from each
Metathesaurus source vocabulary (each SAB-TTY combination). The RANK
and SUPPRES values in the distributed file are those used in Metathesaurus
production. Users are free to change these values to suit their needs
and preferences, then change the naming precedence and suppressibility
by using MetamorphoSys to create a customized Metathesaurus.
|
Col.
|
Description
|
|
RANK
|
Numeric order of precedence, higher value
wins
|
|
SAB
|
Abbreviation for source vocabulary
|
|
TTY
|
Abbreviation for concept name type in
source vocabulary
|
|
SUPPRES
|
Flag indicating whether all atoms
(concept names) with this SAB and TTY have been identified as lacking
face validity or general utility.
|
Sample Records
0210|AIR|SY|N|
0209|ULT|PT|N|
0208|CPT|PT|N|
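Because a higher RANK wins, choosing a display name among a concept's atoms amounts to taking the atom whose SAB-TTY pair has the highest rank. The sketch uses the three sample rows above. Treating any SUPPRES value other than N as suppressible, and giving unlisted SAB-TTY pairs a rank of -1, are assumptions; the atom names are placeholders.

```python
MRRANK_COLS = ["RANK", "SAB", "TTY", "SUPPRES"]


def parse_rrf_line(line, cols):
    """Split one pipe-delimited row (terminated by '|') into a dict keyed by column name."""
    line = line.rstrip("\r\n")
    if line.endswith("|"):
        line = line[:-1]
    values = line.split("|")
    if len(values) != len(cols):
        raise ValueError(f"expected {len(cols)} fields, got {len(values)}")
    return dict(zip(cols, values))


def load_ranks(lines):
    """Map (SAB, TTY) to (numeric rank, SUPPRES flag)."""
    ranks = {}
    for line in lines:
        row = parse_rrf_line(line, MRRANK_COLS)
        ranks[(row["SAB"], row["TTY"])] = (int(row["RANK"]), row["SUPPRES"])
    return ranks


def best_name(atoms, ranks, skip_suppressible=True):
    """atoms: iterable of (SAB, TTY, name). Return the name with the highest RANK."""
    candidates = []
    for sab, tty, name in atoms:
        rank, suppres = ranks.get((sab, tty), (-1, "N"))  # assumed default for unlisted pairs
        if skip_suppressible and suppres != "N":
            continue
        candidates.append((rank, name))
    return max(candidates)[1] if candidates else None


ranks = load_ranks(["0210|AIR|SY|N|", "0209|ULT|PT|N|", "0208|CPT|PT|N|"])
atoms = [("CPT", "PT", "name from CPT"), ("AIR", "SY", "name from AIR")]
print(best_name(atoms, ranks))  # name from AIR
```

Editing RANK values in a local copy of the file changes which name wins, which is the kind of customization MetamorphoSys applies.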
2.7.1.3.17 Ambiguous Term Identifiers (File = AMBIGLUI.RRF)
There is exactly one row in this table for each Lexical Unique
Identifier (LUI) that is linked to multiple Concept Unique Identifiers
(CUIs); i.e., it identifies those lexical variant classes which have
multiple meanings in the Metathesaurus.
In the Metathesaurus, the LUI links all strings within the
English language that are identified as lexical variants of each other
by the luinorm program found in the UMLS SPECIALIST Lexicon and Tools
(see Section 4). LUIs are assigned irrespective of the meaning of each
string. This table may be useful to system developers who wish to make
use of the lexical programs in their applications.
|
Col.
|
Description
|
|
LUI
|
Lexical Unique Identifier
|
|
CUIS
|
List of Concept Unique Identifiers to which
the LUI is linked, separated by commas, e.g., C#######,C#######
|
2.7.1.3.18 Ambiguous String Identifiers (File=AMBIGSUI.RRF)
There is exactly one row in this file for each string identifier (SUI)
that is linked to multiple concept identifiers (CUIs).
This file is now in the META directory (it used to be in the CHANGE directory).
In the Metathesaurus, there is only one SUI for each unique string
within each language, even if the string has
multiple meanings. This table is only of interest to system developers
who make use of the SUI in their applications or in local data files.
|
Col.
|
Description
|
|
SUI
|
String Unique Identifier
|
|
CUIS
|
List of Concept Unique Identifiers to which
the SUI is linked, separated by commas, e.g., C#######,C#######
|
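Both AMBIGLUI.RRF and AMBIGSUI.RRF pair an identifier with a comma-separated CUI list, so one reader can serve both. A sketch, assuming two pipe-delimited columns per row with each row terminated by a vertical bar, as elsewhere in the Metathesaurus; the identifiers in the example are invented.

```python
def load_ambiguous(lines):
    """Map each LUI (or SUI) to the list of CUIs it is linked to."""
    table = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if line.endswith("|"):
            line = line[:-1]  # drop the terminating vertical bar
        ident, cuis = line.split("|")  # raises ValueError if a row has the wrong shape
        table[ident] = cuis.split(",")
    return table


ambiguous = load_ambiguous([
    "L0000001|C0000001,C0000002|\n",
    "L0000002|C0000003,C0000004,C0000005|\n",
])
print(ambiguous["L0000001"])  # ['C0000001', 'C0000002']
```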
2.7.1.3.19 Metathesaurus Change Files
There are six files or relations that identify key differences
between entries in the previous and the current edition of the
Metathesaurus. Developers can use these special files to determine
whether there have been changes that affect their applications.
The usefulness of individual files will depend on how data
from the Metathesaurus have been linked or incorporated in a particular
application.
Each relation or named table of data has a fixed number of
columns and variable number of rows. A column is a sequence of all the
values in a given data element. A row contains the values for two or
more data elements for one entry. The values for the different data
elements in the row are separated by vertical bars (|). Each row ends
with a vertical bar and line termination.
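The row conventions above (a fixed number of columns, values separated by vertical bars, and a closing bar on every row) are enough for a defensive reader. A Python sketch; the two-column example uses the DELETEDCUI.RRF layout (PCUI, PSTR) described next, with an invented row.

```python
def read_rrf(lines, n_cols):
    """Parse pipe-delimited rows, checking the terminating bar and the column count."""
    rows = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.endswith("|"):
            raise ValueError(f"line {lineno}: row does not end with '|'")
        # Drop only the terminating bar, so an empty last column is preserved.
        fields = line[:-1].split("|")
        if len(fields) != n_cols:
            raise ValueError(f"line {lineno}: expected {n_cols} columns, got {len(fields)}")
        rows.append(tuple(fields))
    return rows


print(read_rrf(["C0000001|Example preferred name|\n"], 2))
# [('C0000001', 'Example preferred name')]
```

Stripping only the final bar, rather than all trailing bars, is what keeps an empty last column from being mistaken for a short row.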
2.7.1.3.19.1 Deleted Concepts (File=CHANGE/DELETEDCUI.RRF)
There is exactly one row in this table for each reviewed concept that was
present in the previous Metathesaurus and is not present in the 2003AC
Metathesaurus.
Cols.
PCUI Concept Unique Identifier in the previous Metathesaurus
PSTR Preferred name of this concept in the previous Metathesaurus
2.7.1.3.19.2 Merged Concepts (File=CHANGE/MERGEDCUI.RRF)
There is exactly one row in this table for each released
concept in the previous Metathesaurus (CUI1) that was merged into
another released concept from the previous Metathesaurus (CUI2). When
this merge occurs, the first CUI (CUI1) is retired; this table shows
the CUI (CUI2) for the merged concept in this Metathesaurus.
Entries in this file represent concept pairs that were considered to have
different meanings in the previous edition, but which are now identified as
synonyms.
Cols.
PCUI1 Concept Unique Identifier in the previous Metathesaurus
CUI Concept Unique Identifier in this Metathesaurus in
format C#######
2.7.1.3.19.3 Deleted Terms (File=CHANGE/DELETEDLUI.RRF)
There is exactly one row in this table for each Lexical Unique
Identifier (LUI) that appeared in the previous Metathesaurus, but does
not appear in this Metathesaurus.
Metathesaurus Lexical Unique Identifiers (LUIs) are assigned
by the luinorm program, part of the LVG program in the UMLS SPECIALIST
Lexicon and Tools; see Section 4.
These entries represent the cases where LUIs identified by the
previous release's luinorm program, when used to identify lexical
variants in the previous Metathesaurus, are no longer found with this
release's luinorm on this release's Metathesaurus. This does not
necessarily imply the deletion of a string or a concept from the
Metathesaurus.
Cols.
PLUI Lexical Unique Identifier in the previous Metathesaurus
PSTR Preferred Name of Term in the previous Metathesaurus
2.7.1.3.19.4 Merged Terms (File=CHANGE/MERGEDLUI.RRF)
There is exactly one row in this file for each case in which
strings had different Lexical Unique Identifiers (LUIs) in the previous
Metathesaurus yet share the same LUI in this Metathesaurus; a LUI
present in the previous Metathesaurus is therefore absent from this
Metathesaurus.
Metathesaurus Lexical Unique Identifiers (LUIs) are assigned
by the luinorm program, part of the LVG program in the UMLS SPECIALIST
Lexicon and Tools; see Section 4.
These entries represent the cases where separate lexical
variants as identified by the previous release's luinorm program
version are a single lexical variant as identified by this release's
luinorm.
Cols.
PLUI Lexical Unique Identifier in the previous Metathesaurus but not
present in this Metathesaurus
LUI Lexical Unique Identifier into which it was merged in this
Metathesaurus
2.7.1.3.19.5 Deleted Strings (File=CHANGE/DELETEDSUI.RRF)
There is exactly one row in this file for each string in each
language that was present in an entry in the previous Metathesaurus and
does not appear in this Metathesaurus.
Note that this does not necessarily imply the deletion of a
term (LUI) or a concept (CUI) from the Metathesaurus. A string deleted
in one language may still appear in the Metathesaurus in another
language.
Cols.
PSUI String Unique Identifier in previous Metathesaurus that is not
present in this Metathesaurus
PSTR Preferred name of term in previous Metathesaurus that is not
present in this Metathesaurus
2.7.1.3.19.6 Retired CUI Mapping (File=MRCUI.RRF)
There are one or more rows in this file for each Concept
Unique Identifier (CUI) that existed in any prior release but is not
present in the current release. The file includes mappings to current
CUIs as synonymous or to one or more related current CUIs where
possible. If a synonymous mapping cannot be found, other relationships
between the CUIs can be created. These relationships can be Broader
(RB), Narrower (RN), or Other Related (RO). Some CUIs may be mapped to
more than one other CUI using these relationships.
CUIs may be retired when (1) two released concepts are found
to be synonyms and so are merged, retiring one CUI; (2) the
concept no longer appears in any source vocabulary and is not 'rescued'
by NLM; or (3) the concept is an acknowledged error in a source
vocabulary or is determined to be a Metathesaurus production error.
See Sections 2.7.1.3.19.1 through 2.7.1.3.19.5 for files
of changes from the last release only, without mappings.
|
Col.
|
Description
|
|
CUI1
|
Unique identifier for first concept --
Retired CUI - was present in some prior
release, but is currently missing
|
|
VER
|
The last release version in which CUI1 was
a valid CUI
|
|
REL
|
Relationship
|
|
RELA
|
Relationship attribute
|
|
MAPREASON
|
Reason for mapping
|
|
CUI2
|
Unique identifier for second concept -- The
current CUI that CUI1 most closely maps
to.
|
|
MAPIN
|
Mapping in current subset. Values of
Y or N or null, used with
MetamorphoSys to indicate excluded CUIs
|
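Applications that stored CUIs from an earlier release can use MRCUI.RRF to find where each retired CUI now points. The sketch assumes the column order of the table above and simply groups the current CUIs by REL; it makes no assumption about which REL value marks a synonymous mapping. The CUIs and version in the example are invented.

```python
from collections import defaultdict

MRCUI_COLS = ["CUI1", "VER", "REL", "RELA", "MAPREASON", "CUI2", "MAPIN"]


def parse_rrf_line(line, cols):
    """Split one pipe-delimited row (terminated by '|') into a dict keyed by column name."""
    line = line.rstrip("\r\n")
    if line.endswith("|"):
        line = line[:-1]
    values = line.split("|")
    if len(values) != len(cols):
        raise ValueError(f"expected {len(cols)} fields, got {len(values)}")
    return dict(zip(cols, values))


def current_targets(lines, retired_cui):
    """Return {REL: [current CUI2, ...]} for one retired CUI; rows without a CUI2 are skipped."""
    targets = defaultdict(list)
    for line in lines:
        row = parse_rrf_line(line, MRCUI_COLS)
        if row["CUI1"] == retired_cui and row["CUI2"]:
            targets[row["REL"]].append(row["CUI2"])
    return dict(targets)


lines = [
    "C0000009|2003AB|RB|||C0000010||",
    "C0000009|2003AB|RO|||C0000011||",
    "C0000009|2003AB|RO|||C0000012||",
]
print(current_targets(lines, "C0000009"))
# {'RB': ['C0000010'], 'RO': ['C0000011', 'C0000012']}
```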
Sample Records:
2.7.1.3.20 Word Index (File = MRXW_BAQ.RRF, MRXW_DAN.RRF, MRXW_DUT.RRF, MRXW_ENG.RRF, MRXW_FIN.RRF, MRXW_FRE.RRF, MRXW_GER.RRF, MRXW_HEB.RRF, MRXW_HUN.RRF, MRXW_ITA.RRF, MRXW_NOR.RRF, MRXW_POR.RRF, MRXW_RUS.RRF, MRXW_SPA.RRF, MRXW_SWE.RRF)
There is one row in these tables for each word found in each
unique Metathesaurus string (ignoring upper-lower case). All
Metathesaurus entries have entries in the word index. The entries are
sorted in ASCII order.
|
Col.
|
Description
|
|
LAT
|
Abbreviation of language of the string in
which the word appears
|
|
WD
|
Word in lowercase
|
|
CUI
|
Concept identifier
|
|
LUI
|
Term identifier
|
|
SUI
|
String identifier
|
Sample Records from MRXW_ENG.RRF
ENG|anaemia|C0002871|L0280031|S0352688|
ENG|anemia|C0002871|L0002871|S0013742|
ENG|anemias|C0002871|L0002871|S0013787|
ENG|blood|C0002871|L0376533|S0500659|
ENG|cells|C0002871|L0376533|S0500659|
Sample Records from MRXW_FRE.RRF
FRE|ANEMIE|C0002871|L0162748|S0227229|
2.7.1.3.21 Normalized Word Index (File=MRXNW_ENG.RRF)
There is one row in this table for each normalized word found
in each unique English-language Metathesaurus string. All
English-language Metathesaurus entries have entries in the normalized
word index. There are no normalized word indexes for other languages
in this edition of the Metathesaurus.
|
Col.
|
Description
|
|
LAT
|
Abbreviation of language of the string in
which the word appears (always ENG in this edition of the Metathesaurus)
|
|
NWD
|
Normalized word in lowercase (described in Section 2.6.2.1)
|
|
CUI
|
Concept identifier
|
|
LUI
|
Term identifier
|
|
SUI
|
String identifier
|
Sample Records
ENG|anaemia|C0002871|L0280031|S0352688|
ENG|anemia|C0002871|L0002871|S0013742|
ENG|anemia|C0002871|L0002871|S0013787|
ENG|blood|C0002871|L0376533|S0500659|
ENG|cell|C0002871|L0376533|S0500659|
2.7.1.3.22 Normalized String Index (File=MRXNS_ENG.RRF)
There is one row in this table for each normalized string found
in each unique English-language Metathesaurus string (ignoring
upper-lower case). All English-language Metathesaurus entries have
entries in the normalized string index. There are no normalized string
indexes for other languages in this edition of the Metathesaurus.
|
Col.
|
Description
|
|
LAT
|
Abbreviation of language of the string
(always ENG in this edition of the Metathesaurus)
|
|
NSTR
|
Normalized string in lowercase (described in Section 2.6.3.1)
|
|
CUI
|
Concept identifier
|
|
LUI
|
Term identifier
|
|
SUI
|
String identifier
|
Sample Records
ENG|anaemia|C0002871|L0280031|S0352688|
ENG|anaemia unspecified|C0002871|L0696700|S0803315|
ENG|anemia|C0002871|L0002871|S0013787|
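A normalized string index is typically loaded into a lookup table keyed by NSTR. The sketch below does not normalize anything itself: a query must already have been normalized with the same SPECIALIST lexical tool that produced the index (Section 4), and the column order is assumed from the table above. The rows are the sample records shown above.

```python
from collections import defaultdict

MRXNS_COLS = ["LAT", "NSTR", "CUI", "LUI", "SUI"]


def parse_rrf_line(line, cols):
    """Split one pipe-delimited row (terminated by '|') into a dict keyed by column name."""
    line = line.rstrip("\r\n")
    if line.endswith("|"):
        line = line[:-1]
    values = line.split("|")
    if len(values) != len(cols):
        raise ValueError(f"expected {len(cols)} fields, got {len(values)}")
    return dict(zip(cols, values))


def build_index(lines):
    """Map each normalized string to the set of CUIs that have it."""
    index = defaultdict(set)
    for line in lines:
        row = parse_rrf_line(line, MRXNS_COLS)
        index[row["NSTR"]].add(row["CUI"])
    return index


index = build_index([
    "ENG|anaemia|C0002871|L0280031|S0352688|",
    "ENG|anaemia unspecified|C0002871|L0696700|S0803315|",
    "ENG|anemia|C0002871|L0002871|S0013787|",
])
print(sorted(index.get("anemia", set())))  # ['C0002871']
```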
2.7.2 METATHESAURUS ORIGINAL RELEASE FORMAT (ORF)
Note: The preferred and more complete format is described above in
Section 2.7.1, the Metathesaurus Rich Release Format (RRF).
All files except MRRANK are sorted by row.
2.7.2.1. Data Files
The data in each Metathesaurus entry may be represented in
more than 20 different "relations" or files. These files correspond to
the four logical groups of data elements described in Sections 2.2-2.5 and
the indexes described in Section 2.6 as follows:
- Metathesaurus Concept Names and their sources (2.7.2.2): MRCON, MRSO
- Attributes (2.7.2.3): MRSAT, MRDEF, MRSTY, MRLO
- Relationships between Different Concept Names (2.7.2.4): MRREL, MRCOC, MRATX, MRCXT
- Data about the Metathesaurus (2.7.2.5): MRSAB, MRRANK, AMBIG.LUI, AMBIG.SUI, DELETED.CUI, MERGED.CUI, DELETED.LUI, MERGED.LUI, DELETED.SUI, MRCUI
- Indexes (2.7.2.6): MRXW.BAQ, MRXW.DAN, MRXW.DUT, MRXW.ENG, MRXW.FIN, MRXW.FRE, MRXW.GER, MRXW.HEB, MRXW.HUN, MRXW.ITA, MRXW.NOR, MRXW.POR, MRXW.RUS, MRXW.SPA, MRXW.SWE, MRXNW.ENG, MRXNS.ENG
The AMBIG* files provide a convenient way to identify all
Metathesaurus terms and strings that have more than one meaning in
Metathesaurus source vocabularies.
2.7.2.2 Columns and Rows
Each relation or named table of data values has by definition a fixed
number of columns; the number of rows depends on the content of a
particular version of the Metathesaurus.
A column is a sequence of all the values in a given data
element or logical subelement. In general, columns for longer variable
length data elements will appear to the right of columns for shorter
and/or fixed length data elements. The information for all columns in
the ORF files is described in Appendix B.1.2, ORF Columns or Data Elements.
A row contains the values for one or more data elements or
logical subelements for one Metathesaurus entry. Depending on the
nature of the data elements involved, each Metathesaurus entry may have
one or more rows in a given file. The values for the different data
elements or logical subelements represented in the row are separated by
vertical bars (|). If an optional element is blank, the vertical bars are
still used to maintain the correct positioning of the subsequent elements.
Each row is terminated by a vertical bar and line termination.
2.7.2.3 Descriptions of Each File
The descriptions of the files appear in the following order:
- Key data about the Metathesaurus: Files, Columns or data elements
- Concept names and their vocabulary sources
- Attributes
- Relationships
- Other data about the Metathesaurus
- Indexes
2.7.2.3.1 Files (File = MRFILES)
There is exactly one row in this file for each physical
segment of the files in the relational format. The columns or data
elements in the file are:
|
Col.
|
Description
|
|
FIL
|
Physical FILENAME
|
|
DES
|
Descriptive Name
|
|
FMT
|
Comma separated list of COL, in order
|
|
CLS
|
# of COLUMNS
|
|
RWS
|
# of ROWS
|
|
BTS
|
Size in bytes in this format (ISO/PC or
Unix)
|
Sample Records
MRATX|Associated Expressions|CUI,SAB,REL,ATX|4|7295|442571|
MRCOC|Co-occurring Concepts|CUI1,CUI2,SAB,COT,COF,COA|6|9061980|343331578|
MRCOLS|Attribute Relation|COL,DES,REF,MIN,AV,MAX,FIL, DTY|8|115|5728|
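Because FMT lists a file's columns in order, MRFILES can drive a generic reader instead of hard-coded column lists. A sketch using the sample records above; stripping whitespace around each column name (the MRCOLS sample shows a space before DTY) and cross-checking CLS against the FMT list are defensive choices, not documented requirements.

```python
MRFILES_COLS = ["FIL", "DES", "FMT", "CLS", "RWS", "BTS"]


def parse_rrf_line(line, cols):
    """Split one pipe-delimited row (terminated by '|') into a dict keyed by column name."""
    line = line.rstrip("\r\n")
    if line.endswith("|"):
        line = line[:-1]
    values = line.split("|")
    if len(values) != len(cols):
        raise ValueError(f"expected {len(cols)} fields, got {len(values)}")
    return dict(zip(cols, values))


def columns_by_file(lines):
    """Map each physical file name to its ordered column list, validated against CLS."""
    layout = {}
    for line in lines:
        row = parse_rrf_line(line, MRFILES_COLS)
        cols = [c.strip() for c in row["FMT"].split(",")]
        if len(cols) != int(row["CLS"]):
            raise ValueError(f"{row['FIL']}: FMT lists {len(cols)} columns, CLS says {row['CLS']}")
        layout[row["FIL"]] = cols
    return layout


layout = columns_by_file([
    "MRATX|Associated Expressions|CUI,SAB,REL,ATX|4|7295|442571|",
    "MRCOC|Co-occurring Concepts|CUI1,CUI2,SAB,COT,COF,COA|6|9061980|343331578|",
    "MRCOLS|Attribute Relation|COL,DES,REF,MIN,AV,MAX,FIL, DTY|8|115|5728|",
])
print(layout["MRCOC"])  # ['CUI1', 'CUI2', 'SAB', 'COT', 'COF', 'COA']
```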
2.7.2.3.2 Data Elements (File = MRCOLS)
There is exactly one row in this file for each column or data element
in each file in the relational format.
|
Col.
|
Description
|
|
COL
|
Column or data element name
|
|
DES
|
Descriptive Name
|
|
REF
|
Documentation Section Number
|
|
MIN
|
Minimum Length, Characters
|
|
AV
|
Average Length
|
|
MAX
|
Maximum Length, Characters
|
|
FIL
|
Physical FILENAME in which this field occurs
|
|
DTY
|
SQL-92 data type for this column
|
Sample Records
ATN|Attribute name||2|3.15|7|MRSAT|varchar(20)|
ATV|Attribute value||1|9.71|3634|MRSAT|varchar(4000)|
ATX|Associated expression||5|35.89|242|MRATX|varchar(300)|
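The DTY column makes it possible to generate table definitions for loading a file into an SQL database. In this sketch the column order is passed in explicitly (for example, taken from the FMT value in MRFILES), and only the two MRSAT columns from the sample records above are used; a real definition would list every column of the file.

```python
MRCOLS_COLS = ["COL", "DES", "REF", "MIN", "AV", "MAX", "FIL", "DTY"]


def parse_rrf_line(line, cols):
    """Split one pipe-delimited row (terminated by '|') into a dict keyed by column name."""
    line = line.rstrip("\r\n")
    if line.endswith("|"):
        line = line[:-1]
    values = line.split("|")
    if len(values) != len(cols):
        raise ValueError(f"expected {len(cols)} fields, got {len(values)}")
    return dict(zip(cols, values))


def create_table(mrcols_lines, fil, column_order):
    """Return a CREATE TABLE statement for one file, using the MRCOLS data types."""
    types = {}
    for line in mrcols_lines:
        row = parse_rrf_line(line, MRCOLS_COLS)
        if row["FIL"] == fil:  # MRCOLS has one row per column per file
            types[row["COL"]] = row["DTY"]
    missing = [c for c in column_order if c not in types]
    if missing:
        raise ValueError(f"no MRCOLS entry for {fil}: {missing}")
    body = ",\n".join(f"  {col} {types[col]}" for col in column_order)
    return f"CREATE TABLE {fil} (\n{body}\n);"


mrcols = [
    "ATN|Attribute name||2|3.15|7|MRSAT|varchar(20)|",
    "ATV|Attribute value||1|9.71|3634|MRSAT|varchar(4000)|",
    "ATX|Associated expression||5|35.89|242|MRATX|varchar(300)|",
]
print(create_table(mrcols, "MRSAT", ["ATN", "ATV"]))
```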
2.7.2.3.3 Concept Names (File = MRCON)
There is exactly one row in this file for each meaning of each unique
string in the Metathesaurus, i.e., there is exactly one row for each
unique CUI-SUI combination in the Metathesaurus. Any difference in
upper-lower case, word order, etc. creates a different unique string.
|
Col.
|
Description
|
|
CUI
|
Unique identifier for concept
|
|
LAT
|
Language of Term
|
|
TS
|
Term status
|
|
LUI
|
Unique identifier for term
|
|
STT
|
String type
|
|
SUI
|
Unique identifier for string
|
|
STR
|
String
|
|
LRL
|
Least Restriction Level
|
Sample Records
C0002871|ENG|P|L0002871|PF|S0013742|Anemia|0|
C0002871|ENG|P|L0002871|VP|S0013787|Anemias|0|
C0002871|ENG|P|L0002871|VC|S0352787|ANEMIA|0|
C0002871|ENG|P|L0002871|VC|S0414880|anemia|0|
C0002871|ENG|P|L0002871|VO|S0470197|Anemia, NOS|3|
C0002871|ENG|S|L0280031|PF|S0803242|Anaemia|3|
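A common use of MRCON is to pick one display name per concept. The sketch assumes the convention that a concept's preferred name is the row in the chosen language with term status P and string type PF; it uses the sample records above.

```python
MRCON_COLS = ["CUI", "LAT", "TS", "LUI", "STT", "SUI", "STR", "LRL"]


def parse_rrf_line(line, cols):
    """Split one pipe-delimited row (terminated by '|') into a dict keyed by column name."""
    line = line.rstrip("\r\n")
    if line.endswith("|"):
        line = line[:-1]
    values = line.split("|")
    if len(values) != len(cols):
        raise ValueError(f"expected {len(cols)} fields, got {len(values)}")
    return dict(zip(cols, values))


def preferred_names(lines, lat="ENG"):
    """Map each CUI to the string whose TS is P and STT is PF (assumed convention)."""
    names = {}
    for line in lines:
        row = parse_rrf_line(line, MRCON_COLS)
        if row["LAT"] == lat and row["TS"] == "P" and row["STT"] == "PF":
            names.setdefault(row["CUI"], row["STR"])  # keep the first match per CUI
    return names


sample = [
    "C0002871|ENG|P|L0002871|PF|S0013742|Anemia|0|",
    "C0002871|ENG|P|L0002871|VP|S0013787|Anemias|0|",
    "C0002871|ENG|P|L0002871|VC|S0352787|ANEMIA|0|",
    "C0002871|ENG|P|L0002871|VC|S0414880|anemia|0|",
    "C0002871|ENG|P|L0002871|VO|S0470197|Anemia, NOS|3|",
    "C0002871|ENG|S|L0280031|PF|S0803242|Anaemia|3|",
]
print(preferred_names(sample))  # {'C0002871': 'Anemia'}
```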
2.7.2.3.4 Vocabulary Sources (File = MRSO)
The vocabulary source(s) for a concept, term, and string.
There is exactly one row in this file for each source of each
string in the Metathesaurus. All Metathesaurus concepts have entries in
this file.
|
Col.
|
Description
|
|
CUI
|
Unique identifier for concept
|
|
LUI
|
Unique identifier for term
|
|
SUI
|
Unique identifier for string
|
|
SAB
|
Source abbreviation. Allowed values are
listed in Appendix B, Section B.2
|
|
TTY
|
Term type in that source. Allowed values
are listed in Appendix B, Section B.4.
|
|
CODE
|
Unique Identifier or code for string in
that source.
|
|
SRL
|
Source Restriction Level
|
Sample Records
C0002871|L0002871|S0013742|CCS|MD|4.1|0|
C0002871|L0002871|S0013742|ICPCPAE|PT|B82005|3|
C0002871|L0002871|S0013742|LCH|PT|U000235|0|
C0002871|L0002871|S0013742|MSH|MH|D000740|0|
C0002871|L0002871|S0013742|MTH|PT|U000161|0|
C0002871|L0002871|S0013742|MTH|PT|U000164|0|
C0002871|L0002871|S0013742|PSY|PT|02450|3|
C0002871|L0002871|S0013742|RCDAE|PT|XM05A|3|
The information in MRSO can be used in combination with MRCON to determine whether a particular concept, name, or code is present in a particular source, and in what form it appears.
Note: In the RRF, the concept name and vocabulary source information appear in a single file, MRCONSO.RRF.
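Combining MRCON and MRSO in this way amounts to a join on the CUI-SUI pair: MRSO says which source, term type, and code a string has, and MRCON supplies the string itself. The following is a minimal Python sketch. It assumes the column orders given in the two tables above and the trailing-`|` row format of the sample records. The function names are hypothetical, and the layout of MRCONSO.RRF is not covered here.

```python
MRCON_FIELDS = ["CUI", "LAT", "TS", "LUI", "STT", "SUI", "STR", "LRL"]
MRSO_FIELDS = ["CUI", "LUI", "SUI", "SAB", "TTY", "CODE", "SRL"]


def parse_rows(lines, fields):
    """Turn pipe-delimited rows (each ending in '|') into dicts."""
    return [dict(zip(fields, line.rstrip("\r\n").split("|")[:-1])) for line in lines]


def forms_in_source(cui, sab, mrcon_lines, mrso_lines):
    """Return (TTY, CODE, STR) for every MRSO entry of `cui` from source `sab`."""
    # The string lives in MRCON; MRSO points at it through the CUI-SUI pair.
    strings = {(r["CUI"], r["SUI"]): r["STR"]
               for r in parse_rows(mrcon_lines, MRCON_FIELDS)}
    results = []
    for r in parse_rows(mrso_lines, MRSO_FIELDS):
        if r["CUI"] == cui and r["SAB"] == sab:
            results.append((r["TTY"], r["CODE"], strings.get((r["CUI"], r["SUI"]))))
    return results


mrcon = [
    "C0002871|ENG|P|L0002871|PF|S0013742|Anemia|0|",
    "C0002871|ENG|S|L0280031|PF|S0803242|Anaemia|3|",
]
mrso = [
    "C0002871|L0002871|S0013742|MSH|MH|D000740|0|",
    "C0002871|L0002871|S0013742|PSY|PT|02450|3|",
]
print(forms_in_source("C0002871", "MSH", mrcon, mrso))
# [('MH', 'D000740', 'Anemia')]
```

An empty result means the concept has no string from that source in the files that were read.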
2.7.2.3.5 Simple Concept and String Attributes (File = MRSAT)
There is exactly one row in this table for each concept, term, and string attribute that does not have a sub-element structure. All Metathesaurus concepts have entries in this file.
Col. | Description
CUI | Unique identifier for concept
LUI | Unique identifier for term (optional)
SUI | Unique identifier for string (optional)
CODE | Unique identifier or code for entry in the source of the attribute, e.g., for all attributes derived from MeSH, the MeSH unique identifier (optional)
ATN | Attribute name. Possible values are all described in Appendix B, Section B.1.2.
SAB | Abbreviation of the source of the attribute. Allowed values are listed in Appendix B, Section B.2.
ATV | Attribute value, described under the specific attribute name in Appendix B, Section B.1.2. A few attribute values exceed 1,000 characters.
Sample Records
C0002871|L0002871|S0013742|D000740|MMR|MSH|19960610|
C0002871|L0002871|S0013742|D000740|MN|MSH|C15.378.71|
C0002871|L0002871|S0013742|D000740|TH|MSH|POPLINE (1994)|
C0002871|L0002871|S0414880|208/04453|SOS|PDQ|secondary related condition|
C0002871|L0002871|S0470197|DC-10010|SIC|SNMI|285.9|
2.7.2.3.6 Definitions (File = MRDEF)
There is exactly one row in this file for each definition in the Metathesaurus. A few definitions approach 3,000 characters in length.
Col. | Description
CUI | Unique identifier for concept
SAB | Abbreviation of the source of the definition
DEF | Definition
Sample Records
C0002871|MSH|A reduction in the number of circulating erythrocytes or in the quantity of hemoglobin.|
2.7.2.3.7 Semantic Types (File = MRSTY)
There is exactly one row in this file for each semantic type assigned to each concept. All Metathesaurus concepts have at least one entry in this file. Many have more than one entry.
Col. | Description
CUI | Unique identifier of concept
TUI | Unique identifier of semantic type
STY | Semantic type. The valid values are defined in the Semantic Network.
Sample Record
C0002871|T047|Disease or Syndrome|
2.7.2.3.8 Locators (File = MRLO)
Note: NLM intends to eliminate this file from the
Metathesaurus effective with the 2004AB version. Some of the
information is outdated and some is duplicative of information
contained in other Metathesaurus files.
There is one row in this table for each Metathesaurus concept identified as appearing in each of a selected set of machine-readable information sources. If the same concept is identified as appearing in more than one of these information sources (e.g., MEDLINE, DXPLAIN, QMR), it will have multiple rows in this table.
These columns are described in the appendix:
Col. | Description
CUI | Unique identifier of concept
ISN | Name of information source or database in which concept appears
FR | Frequency count of number of occurrences of concept in the information source (optional)
UN | Meaning of frequency (optional)
SUI | Unique identifier of string if name used in information source appears in MRCON (optional)
SNA | Actual name that occurs in the information source if not otherwise present in the Metathesaurus (optional)
SOUI | Unique identifier of record in which the concept appears in source (optional)
2.7.2.3.9 Related Concepts (File = MRREL)
There is one row in this table for each relationship between Metathesaurus concepts known to the Metathesaurus. The following exceptions are found in other files: co-occurrences (MRCOC), locator information (MRLO), and associated expressions (MRATX).
Note that for asymmetrical relationships there is one row for each direction of the relationship. Note also the direction of REL: it is the relationship that the SECOND concept (Concept Unique Identifier CUI2) HAS TO the FIRST concept (Concept Unique Identifier CUI1).
Col. | Description
CUI1 | Unique identifier of first concept
REL | Relationship of second concept to first concept
CUI2 | Unique identifier of second concept
RELA | Relationship attribute
SAB | Abbreviation of the source of relationship
SL | Source of relationship labels
MG | Machine-generated and unverified indicator (optional)
Sample Records
C0002871|CHD|C0002891|isa|MSH|MSH|| [Anemia, Neonatal (C0002891) has CHILD REL and isa RELA to Anemia (C0002871)]
C0002871|RB|C0221016||MTH|MTH|| [Red blood cell disorder, NOS (C0221016) has broader REL to Anemia (C0002871)]
C0002871|RL|C0002886|mapped_to|SNMI|SNMI|| [Anemia, Macrocytic (C0002886) has like relationship to Anemia (C0002871)]
C0002871|RO|C0002886|clinically_associated_with|CCPSS|CCPSS|| [Megaloblastic anemia due to folate deficiency, NOS (C0151482) has clinically_associated_with relationship to Anemia (C0002871)]
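Because REL is the relationship the second concept has to the first, MRREL rows are easy to misread. The following is a minimal Python sketch that renders a row in the documented direction, in the style of the bracketed annotations above. It assumes the column order in the table above. `REL_LABELS` covers only the REL values used in these sample records and falls back to the raw code for any other value. `describe` is a hypothetical helper.

```python
MRREL_FIELDS = ["CUI1", "REL", "CUI2", "RELA", "SAB", "SL", "MG"]

# Labels only for the REL values that appear in the sample records above.
REL_LABELS = {"CHD": "child", "RB": "broader", "RL": "like", "RO": "other related"}


def describe(line):
    """Render an MRREL row in its documented direction: CUI2 has REL to CUI1."""
    row = dict(zip(MRREL_FIELDS, line.rstrip("\r\n").split("|")[:-1]))
    label = REL_LABELS.get(row["REL"], row["REL"])  # fall back to the raw code
    text = f"{row['CUI2']} has {label} relationship to {row['CUI1']}"
    if row["RELA"]:
        text += f" (RELA={row['RELA']})"
    return f"{text} [source {row['SAB']}]"


print(describe("C0002871|CHD|C0002891|isa|MSH|MSH||"))
# C0002891 has child relationship to C0002871 (RELA=isa) [source MSH]
```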
2.7.2.3.10 Co-occurring Concepts (File = MRCOC)
There are two rows in this table for each pair of concepts that co-occur in each information source represented, one for each direction of the relationship. (Note that the COA data may be different for each direction of the relationship.) Many Metathesaurus concepts have no entries in this file. Due to the very large number of co-occurrence relationships, they are distributed in a separate file.
Col. | Description
CUI1 | Unique identifier of first concept
CUI2 | Unique identifier of second concept. Note: Where COT is MeSH topical qualifier (LQ) and CUI2 is not present, the count of citations of CUI1 with no MeSH qualifiers is reported.
SOC | Abbreviation of the source of co-occurrence information, if applicable
COT | Type of co-occurrence
COF | Frequency of co-occurrence, if applicable
COA | Attributes of co-occurrence, if applicable
Sample Records
C0002871||MED|LQ|1||
C0002871|C0000530|MBD|L|2|CI=1,EN=1,ME=1,PA=1|
C0002871|C0000727|MBD|L|1|BL=1,ET=1|
C0002871|C0000737|MBD|L|1|ET=1|
C0002871|C0000772|MBD|L|2|CN=2|
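The qualifier counts in COA belong to the first concept in the row, so a reader only has to split them into a mapping. The following is a minimal Python sketch. Based solely on the sample records, it assumes that COA holds comma-separated `QUALIFIER=count` pairs and may be empty. Values for other sources or co-occurrence types may be formatted differently. The function names are hypothetical.

```python
MRCOC_FIELDS = ["CUI1", "CUI2", "SOC", "COT", "COF", "COA"]


def parse_coa(coa):
    """Turn a COA value such as 'CI=1,EN=1' into {'CI': 1, 'EN': 1}."""
    counts = {}
    for item in coa.split(","):
        if not item:
            continue  # empty COA, or a stray comma
        key, _, value = item.partition("=")
        counts[key] = int(value)
    return counts


def parse_cooccurrence(line):
    """Parse one MRCOC row; COF becomes an int (or None) and COA a dict."""
    row = dict(zip(MRCOC_FIELDS, line.rstrip("\r\n").split("|")[:-1]))
    row["COF"] = int(row["COF"]) if row["COF"] else None
    row["COA"] = parse_coa(row["COA"])
    return row


row = parse_cooccurrence("C0002871|C0000530|MBD|L|2|CI=1,EN=1,ME=1,PA=1|")
print(row["COF"], row["COA"])  # 2 {'CI': 1, 'EN': 1, 'ME': 1, 'PA': 1}
```

The LQ row `C0002871||MED|LQ|1||` parses to an empty CUI2 and an empty COA mapping. The caller is responsible for interpreting such rows as described in the CUI2 note above.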
Co-occurrences are concepts that occur together in the same "entries" in some information source. The relationships represented here are obtained from machine manipulation of the information source. Co-occurrence relationships may exist between similar concepts (e.g., "Atrial Fibrillation" and "Arrhythmia"), between very different concepts that nevertheless have some important connection in the field of biomedicine (e.g., "Atrial Fibrillation" and "Digoxin"), or between a primary concept and a qualifier (e.g., "Lithotripsy" and "instrumentation"). A co-occurrence relationship can exist between two concepts that have no other apparent relationship, although the frequency of such co-occurrences will be small.
In the current Metathesaurus, there are three sources of co-occurrence data: MEDLINE, AI/RHEUM, and CCPSS. From MEDLINE, co-occurrence data was computed for concepts that were designated as principal or main points in the same journal article, i.e., the co-occurrence counts do not include articles in which either or both of the concepts were present and indexed in MEDLINE but not designated as main points. (A concept is considered to be a main point if the * is attached to the main heading or any of its subheadings.)
Two overall frequencies of MEDLINE co-occurrence are provided:
one for recent MEDLINE data (MED) and one for MEDLINE data from a
preceding block of years (MBD); see SOC
for
date ranges in the current edition. Separate counts are provided for
the frequencies with which the first concept was qualified by different
MeSH qualifiers or by no qualifier at all when it co-occurred with the
second concept. There are separate entries for each direction of the
co-occurrence relationship. The related subheading occurrence
information in each entry belongs to the first concept in the entry and
is therefore different for each direction of the relationship.
In addition to the specific qualifier information associated with two co-occurring concepts, this element (COA) also includes totals for the number of times each main concept was qualified by a specific subheading or by no subheading. These totals appear in entries whose type of co-occurrence (COT) is LQ or LQB.
The AI/RHEUM co-occurrence data represent the co-occurrence of
diseases and findings in the AI/RHEUM knowledge base, i.e., the
diseases that co-occur with a particular finding and the findings that
co-occur with a particular disease. Each disease/finding pair can
co-occur only
once in the AI/RHEUM knowledge base.
In CCPSS, the co-occurrence data is extracted from patient
records and includes problem-problem co-occurrences within a patient
record as well as problem-modifier co-occurrences.
2.7.2.3.11 Concept Contexts (File = MRCXT)
There are rows in this file for each occurrence of a concept
in a hierarchy in any of the UMLS source vocabularies - a "context" in
this discussion. Many Metathesaurus concepts have multiple contexts
while others may have none. The number of rows per context differs
depending on the number of ancestor, sibling, or child terms the
concept has in that context. Because some concepts have multiple
contexts in the same source (e.g., MeSH), a context number (CXN - e.g.,
1, 2, 3) is used to identify all members of the same context. The CXNs
are not global but are created as required for each concept. Since some
concepts have multiple contexts in the same vocabulary with the same
SUI, each distinct context can be retrieved with a CUI-SUI-SAB-CXN key.
The "distance-1 relationships," i.e., the immediate parent, immediate
child, and sibling relationships, represented in this file are also
present in the MRREL file.
(Note: The RELA was incorrectly called REL in versions before
2001.)
Col. | Description
CUI | Unique identifier of concept
SUI | Unique identifier for string used in this context
SAB | Source abbreviation. Allowed values are listed in Appendix B.4.
CODE | Unique identifier or code for string in that source
CXN | The context number (to distinguish multiple contexts in the same source with the same SUI)
CXL | Context member label, i.e., ANC for ancestor of this concept, CCP for the concept itself, SIB for sibling of this concept, CHD for child of this concept
RNK | For rows with a CXL value of ANC, the rank of the ancestor (e.g., a value of 1 denotes the most remote ancestor in the hierarchy)
CXS | String for context member
CUI2 | Unique concept identifier of context member (may be empty if context member is not yet in the Metathesaurus)
HCD | Hierarchical number or code of context member in this source (optional)
RELA | Relationship attribute providing further categorization of the CXL, if applicable and known. Allowed values are listed in Appendix B.3.
XC | A plus (+) sign indicates that the CUI2 for this row has children in this context. If this field is empty, the CUI2 does not have children in this context.
Sample Records
C0002871|S0013742|MSH|D000740|1|ANC|1|MeSH|C0220876||||
C0002871|S0013742|MSH|D000740|1|ANC|2|Diseases (MeSH Category)|C0012674|C|||
C0002871|S0013742|MSH|D000740|1|ANC|3|Hemic and Lymphatic Diseases|C0018981|C15|||
C0002871|S0013742|MSH|D000740|1|ANC|4|Hematologic Diseases|C0018939|C15.378|isa||
C0002871|S0013742|MSH|D000740|1|CCP||Anemia|C0002871|C15.378.71|isa|+|
C0002871|S0013742|MSH|D000740|1|CHD||Anemia, Aplastic|C0002874|C15.378.71.85|isa|+|
C0002871|S0013742|MSH|D000740|1|SIB||Blood Protein Disorders|C0005830|C15.378.147|isa|+|
C0002871|S0013742|MSH|D000740|1|CHD||Anemia, Hemolytic|C0002878|C15.378.71.141|isa|+|
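Since each context is identified by the CUI-SUI-SAB-CXN key and ancestors are ranked from the most remote (RNK 1) downward, the path to the top of a hierarchy can be rebuilt by grouping and sorting. The following is a minimal Python sketch. It assumes the column order in the table above and the trailing-`|` row format. `ancestor_paths` is a hypothetical helper, and the rows are deliberately given out of order.

```python
from collections import defaultdict

MRCXT_FIELDS = ["CUI", "SUI", "SAB", "CODE", "CXN", "CXL", "RNK",
                "CXS", "CUI2", "HCD", "RELA", "XC"]


def ancestor_paths(lines):
    """Map each CUI-SUI-SAB-CXN context key to its ancestors, most remote first."""
    ancestors = defaultdict(list)
    for line in lines:
        row = dict(zip(MRCXT_FIELDS, line.rstrip("\r\n").split("|")[:-1]))
        if row["CXL"] != "ANC":
            continue  # CCP, SIB, and CHD rows carry no RNK
        key = (row["CUI"], row["SUI"], row["SAB"], row["CXN"])
        ancestors[key].append((int(row["RNK"]), row["CXS"]))
    # RNK 1 is the most remote ancestor, so sorting by rank walks down the tree.
    return {key: [name for _, name in sorted(pairs)] for key, pairs in ancestors.items()}


rows = [
    "C0002871|S0013742|MSH|D000740|1|ANC|3|Hemic and Lymphatic Diseases|C0018981|C15|||",
    "C0002871|S0013742|MSH|D000740|1|ANC|1|MeSH|C0220876||||",
    "C0002871|S0013742|MSH|D000740|1|CCP||Anemia|C0002871|C15.378.71|isa|+|",
    "C0002871|S0013742|MSH|D000740|1|ANC|4|Hematologic Diseases|C0018939|C15.378|isa||",
    "C0002871|S0013742|MSH|D000740|1|ANC|2|Diseases (MeSH Category)|C0012674|C|||",
]
for key, path in ancestor_paths(rows).items():
    print(key, " > ".join(path))
```

For the sample context, this yields the path MeSH > Diseases (MeSH Category) > Hemic and Lymphatic Diseases > Hematologic Diseases, which sits immediately above Anemia.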
2.7.2.3.12 Associated Expressions (File = MRATX)
There is one row in this table for each vocabulary expression (i.e., combination of terms from a specific Metathesaurus source vocabulary) identified as having a relationship to a concept in the Metathesaurus. Most Metathesaurus concepts have no entries in this table.
Col. | Description
CUI | Unique identifier of concept to which the expression is related
SAB | Abbreviation of source of terms in expression. Allowed values are listed in Appendix B, Section B.1.
REL | Relationship of meaning of expression to main concept
ATX | Associated expression
Sample Records
C0001207|MSH|S|<Acromegaly> AND <Gigantism>|
C0001296|LCH|U|<Insurance>/<Statistics>|
C0001355|MSH|S|<Kidney Failure, Acute> AND <Kidney Papillary Necrosis>|
2.7.2.3.13 Source Information (File = MRSAB)
The UMLS Metathesaurus uses "versionless" or "root" Source Abbreviations (SABs) in the data files. MRSAB connects each root SAB to fully specified version information for the current release. For example, the released SAB for MeSH is now simply "MSH"; in MRSAB, you will find the current versioned SAB, e.g., MSH2003_2002_10_24. MetamorphoSys can produce files with either the root or the versioned SABs, so users can work with either form.
There is one row in this file for every version of every source in the current Metathesaurus. When complete, the file will also contain historical information, with a row for each version of each source that has appeared in any Metathesaurus release. Note that the field CURVER has the value 'Y' to identify the version in this Metathesaurus release. Future releases of MRSAB will also contain historical version information in rows with CURVER value 'N'.
Because MRSAB carries the version information, all other Metathesaurus files can use versionless source abbreviations, so rows whose data do not change between versions also remain unchanged.
The full structure of MRSAB is as follows:
Field | Full Name | Description
VCUI | CUI | CUI of the versioned SRC concept for a source
RCUI | Root CUI | CUI of the root SRC concept for a source
VSAB | Versioned Source Abbreviation | The versioned source abbreviation for a source, e.g., MSH2003_2002_10_24
RSAB | Root Source Abbreviation | The root source abbreviation for a source, e.g., MSH
SON | Official Name | The official name for a source
SF | Source Family | The source family for a source
SVER | Version | The source version, e.g., 2001
VSTART | Valid Start Date For A Source | Source's start date for valid use, e.g., 2004_04_03
VEND | Valid End Date For A Source | Source's end date for valid use, e.g., 2003_05_10
IMETA | Meta Insert Version | The version of the Metathesaurus in which a source first appeared, e.g., 2001AB
RMETA | Meta Remove Version | The version of the Metathesaurus in which a source was removed, e.g., 2001AC
SLC | Source License Contact | The source license contact information
SCC | Source Content Contact | The source content contact information
SRL | Source Restriction Level | 0, 1, 2, 3
TFR | Term Frequency | The number of terms for this source in MRCON/MRSO, e.g., 12343
CFR | CUI Frequency | The number of CUIs associated with this source, e.g., 10234
CXTY | Context Type | The type of context (per Section 2.3.2 of the UMLS documentation)
TTYL | Term Type List | Term type list from source, e.g., MH,EN,PM,TQ
ATNL | Attribute Name List | The attribute name list (from MRSAT), e.g., MUI,RN,TH,...
LAT | Language | The language of the source
CENC | Character Encoding | Character set as specified by the IANA official names for character assignments (http://www.iana.org/assignments/character-sets)
CURVER | Current Version | A Y or N flag indicating whether or not this row corresponds to the current version of the named source
SABIN | Source in Subset | A Y or N flag indicating whether or not this row is represented in the current MetamorphoSys subset. Initially always Y where CURVER is Y, but later recomputed by MetamorphoSys.
2.7.2.3.14 Concept Name Ranking (File = MRRANK)
There is exactly one row for each concept name type from each Metathesaurus source vocabulary (each SAB-TTY combination). The RANK and SUPPRESS values in the distributed file are those used in Metathesaurus production. Users are free to change these values to suit their needs and preferences. They can then use MetamorphoSys to create a customized Metathesaurus that reflects the changed naming precedence and suppressibility (TS in MRCON).
Col. | Description
RANK | Numeric order of precedence; the higher value wins
SAB | Abbreviation for source vocabulary
TTY | Abbreviation for concept name type in source vocabulary
SUPPRESS | Flag indicating that this SAB and TTY will create a TS=s MRCON entry; see TS
Sample Records
0210|AIR|SY|N|
0209|ULT|PT|N|
0208|CPT|PT|N|
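Since the higher RANK wins, choosing a preferred name among a concept's candidate strings is a maximum over their SAB-TTY ranks. The following is a minimal Python sketch that uses the sample MRRANK rows. It assumes the column order in the table above. The candidate strings are placeholders, and `load_ranks` and `preferred_name` are hypothetical helpers. The sketch ignores SUPPRESS and does not attempt to reproduce how Metathesaurus production or MetamorphoSys applies the ranking.

```python
MRRANK_FIELDS = ["RANK", "SAB", "TTY", "SUPPRESS"]


def load_ranks(lines):
    """Map each (SAB, TTY) pair to its numeric RANK."""
    ranks = {}
    for line in lines:
        row = dict(zip(MRRANK_FIELDS, line.rstrip("\r\n").split("|")[:-1]))
        ranks[(row["SAB"], row["TTY"])] = int(row["RANK"])  # '0210' -> 210
    return ranks


def preferred_name(candidates, ranks):
    """Return the string whose SAB-TTY has the highest RANK (higher value wins).

    `candidates` holds (SAB, TTY, STR) tuples for one concept; pairs missing
    from MRRANK are ignored. SUPPRESS is not consulted in this sketch.
    """
    ranked = [c for c in candidates if (c[0], c[1]) in ranks]
    if not ranked:
        return None
    best = max(ranked, key=lambda c: ranks[(c[0], c[1])])
    return best[2]


ranks = load_ranks(["0210|AIR|SY|N|", "0209|ULT|PT|N|", "0208|CPT|PT|N|"])
names = [("CPT", "PT", "name from CPT"), ("AIR", "SY", "name from AIR")]
print(preferred_name(names, ranks))  # name from AIR
```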
2.7.2.3.15 Ambiguous Term Identifiers (File = AMBIG.LUI)
There is exactly one row in this table for each Lexical Unique Identifier (LUI) that is linked to multiple Concept Unique Identifiers (CUIs); i.e., it identifies those lexical variant classes which have multiple meanings in the Metathesaurus.
In the Metathesaurus, the LUI links all strings within the English language that are identified as lexical variants of each other by the luinorm program found in the UMLS SPECIALIST Lexicon and Tools (see Section 4). LUIs are assigned irrespective of the meaning of each string. This table may be useful to system developers who wish to make use of the lexical programs in their applications.
Col. | Description
LUI | Lexical Unique Identifier
CUIs | List of Concept Unique Identifiers to which the LUI is linked, separated by commas, e.g., C#######,C#######
2.7.2.3.16 Ambiguous String Identifiers (File = AMBIG.SUI)
There is exactly one row in this file for each string identifier (SUI) that is linked to multiple concept identifiers (CUIs). This file is now in the META directory (it used to be in the CHANGE directory).
In the Metathesaurus, there is only one SUI for each unique string within each language, even if the string has multiple meanings. This table is only of interest to system developers who make use of the SUI in their applications or in local data files.
Col. | Description
SUI | String Unique Identifier
CUIs | List of Concept Unique Identifiers to which the SUI is linked, separated by commas, e.g., C#######,C#######
2.7.2.3.17 Metathesaurus Change Files
There are six files or relations that identify key differences
between entries in the previous and the current edition of the
Metathesaurus.
Developers can use these special files to determine whether there have
been changes that affect their applications.
The usefulness of individual files will depend on how data
from the Metathesaurus have been linked or incorporated in a particular
application.
Each relation or named table of data has a fixed number of
columns and variable number of rows. A column is a sequence of all the
values in a given data element. A row contains the values for two or
more data elements for one entry. The values for the different data
elements in the row are separated by vertical bars (|). Each row ends
with a vertical bar and line termination.
2.7.2.3.17.1 Deleted Concepts (File=DELETED.CUI)
There is exactly one row in this table for each reviewed
concept that was
present in the previous Metathesaurus and is not present in the current
Metathesaurus.
Cols.
CUI Concept Unique Identifier in the previous Metathesaurus
STR Preferred name of this concept in the previous Metathesaurus
2.7.2.3.17.2 Merged Concepts (File=MERGED.CUI)
There is exactly one row in this table for each released concept in the previous Metathesaurus (CUI1) that was merged into another released concept from the previous Metathesaurus (CUI2). When such a merge occurs, CUI1 is retired; this table shows the CUI (CUI2) of the merged concept in this Metathesaurus.
Entries in this file represent concept pairs that were considered to have different meanings in the previous edition but are now identified as synonyms.
Cols.
CUI1 Concept Unique Identifier in the previous Metathesaurus
CUI2 Concept Unique Identifier in this Metathesaurus in format C#######
2.7.2.3.17.3 Deleted Terms (File=DELETED.LUI)
There is exactly one row in this table for each Lexical Unique Identifier (LUI) that appeared in the previous version of the Metathesaurus but does not appear in this version.
Metathesaurus Lexical Unique Identifiers (LUIs) are assigned by the luinorm program, part of the LVG program in the UMLS SPECIALIST Lexicon and Tools; see Section 4 in this manual.
These entries represent the cases where LUIs identified by the previous release's luinorm program, when used to identify lexical variants in the previous Metathesaurus, are no longer found with this release's luinorm on this release's Metathesaurus. This does not necessarily imply the deletion of a string or a concept from the Metathesaurus.
Cols.
LUI Lexical Unique Identifier in the previous Metathesaurus
STR Preferred Name of Term in the previous Metathesaurus
2.7.2.3.17.4 Merged Terms (File=MERGED.LUI)
There is exactly one row in this file for each case in which
strings had different Lexical Unique Identifiers (LUIs) in the previous
Metathesaurus yet share the same LUI in this Metathesaurus; a LUI
present in the previous Metathesaurus is therefore absent from this
Metathesaurus.
Metathesaurus Lexical Unique Identifiers (LUIs) are assigned
by the luinorm program, part of the LVG program in the UMLS SPECIALIST
Lexicon and Tools; see Sections 4 and 4.8 in this manual.
These entries represent the cases where lexical variants that the previous release's luinorm program identified as separate are identified as a single lexical variant by this release's luinorm.
Cols.
LUI Lexical Unique Identifier in the previous Metathesaurus but not
present in this Metathesaurus
LUI Lexical Unique Identifier into which it was merged in this
Metathesaurus
2.7.2.3.17.5 Deleted Strings (File=DELETED.SUI)
There is exactly one row in this file for each string in each language that was present in an entry in the previous Metathesaurus and does not appear in this Metathesaurus.
Note that this does not necessarily imply the deletion of a
term (LUI) or a concept (CUI) from the Metathesaurus. A string deleted
in one language may still appear in the Metathesaurus in another
language.
Cols.
SUI String Unique Identifier in previous Metathesaurus that is not
present in this Metathesaurus
LAT Three character abbreviation of language of string that has been
deleted.
STR Preferred name of term in previous Metathesaurus that is not
present in this Metathesaurus.
2.7.2.3.17.6 Retired CUI Mapping (File=MRCUI)
There are one or more rows in this file for each Concept Unique Identifier (CUI) that existed in any prior release but is not present in the current release. Where possible, the file maps each retired CUI to a synonymous current CUI or to one or more related current CUIs. If a synonymous mapping cannot be found, other relationships between the CUIs can be created: Broader (RB), Narrower (RN), or Other Related (RO). Some CUIs may be mapped to more than one other CUI using these relationships.
CUIs may be retired when (1) two released concepts are found to be synonyms and are merged, retiring one CUI; (2) the concept no longer appears in any source vocabulary and is not 'rescued' by NLM; or (3) the concept is an acknowledged error in a source vocabulary or is determined to be a Metathesaurus production error.
See the META/CHANGE files, especially MERGED.CUI and DELETED.CUI, for the changes from the last release only, without mappings.
Col. | Description
CUI1 | Retired CUI: was present in some prior release, but is currently missing
VER | The last release version in which CUI1 was a valid CUI
CREL | The relationship CUI2 has to CUI1, if present, or DEL if CUI2 is not present. Valid values currently are SY, DEL, RO, RN, and RB.
CUI2 | The current CUI that CUI1 most closely maps to
MAPIN | Values of Y or N or null, used with MetamorphoSys to indicate excluded CUIs
Sample Records:
C0079138|2001|DEL||Y|
C0079138|2001|RO|C0037440|Y|
C0079151|1993|DEL||N|
C0079158|1997|SY|C0009081||
C0079167|1997|SY|C0010042|N|
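A common use of MRCUI is to update CUIs stored in local data. The following is a minimal Python sketch that prefers a synonymous (SY) mapping, then falls back to RB/RN/RO mappings, and otherwise reports the CUI as deleted. It assumes the column order in the table above. `load_retired` and `update_cui` are hypothetical helpers. The sketch treats any CUI absent from MRCUI as current and does not follow chains where a CUI2 is itself later retired.

```python
from collections import defaultdict

MRCUI_FIELDS = ["CUI1", "VER", "CREL", "CUI2", "MAPIN"]


def load_retired(lines):
    """Group MRCUI rows by retired CUI: {CUI1: [(CREL, CUI2 or None), ...]}."""
    retired = defaultdict(list)
    for line in lines:
        row = dict(zip(MRCUI_FIELDS, line.rstrip("\r\n").split("|")[:-1]))
        retired[row["CUI1"]].append((row["CREL"], row["CUI2"] or None))
    return retired


def update_cui(cui, retired):
    """Suggest replacements for a stored CUI, preferring synonymous mappings."""
    if cui not in retired:
        return ("current", [cui])
    mappings = retired[cui]
    synonyms = [c2 for rel, c2 in mappings if rel == "SY" and c2]
    if synonyms:
        return ("synonym", synonyms)
    related = [(rel, c2) for rel, c2 in mappings if rel in ("RB", "RN", "RO") and c2]
    if related:
        return ("related", related)
    return ("deleted", [])


mrcui = [
    "C0079138|2001|DEL||Y|",
    "C0079138|2001|RO|C0037440|Y|",
    "C0079151|1993|DEL||N|",
    "C0079158|1997|SY|C0009081||",
    "C0079167|1997|SY|C0010042|N|",
]
retired = load_retired(mrcui)
print(update_cui("C0079158", retired))  # ('synonym', ['C0009081'])
```

With the sample records, C0079138 yields a related (RO) mapping to C0037440, while C0079151 has no mapping and is reported as deleted.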
2.7.2.3.18 Word Index (File = MRXW.BAQ, MRXW.DAN, MRXW.DUT, MRXW.ENG, MRXW.FIN, MRXW.FRE, MRXW.GER, MRXW.HEB, MRXW.HUN, MRXW.ITA, MRXW.NOR, MRXW.POR, MRXW.RUS, MRXW.SPA, MRXW.SWE)
There is one row in these tables for each word found in each unique Metathesaurus string (ignoring upper-lower case). All Metathesaurus entries have entries in the word index. The entries are sorted in ASCII order.
Col. | Description
LAT | Abbreviation of language of the string in which the word appears
WD | Word in lowercase
CUI | Concept identifier
LUI | Term identifier
SUI | String identifier
Sample Records from MRXW.ENG
ENG|anaemia|C0002871|L0280031|S0352688|
ENG|anemia|C0002871|L0002871|S0013742|
ENG|anemias|C0002871|L0002871|S0013787|
ENG|blood|C0002871|L0376533|S0500659|
ENG|cells|C0002871|L0376533|S0500659|
Sample Records from MRXW.FRE
FRE|ANEMIE|C0002871|L0162748|S0227229|
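Because the word index has one row per word per string, a multi-word lookup can intersect the strings found for each word. The following is a minimal Python sketch that uses the MRXW.ENG sample records. It assumes the column order in the table above. The query tokenizer, which lowercases the query and splits it on non-alphanumeric characters, is a stand-in and not the word-breaking rules used to build MRXW. `build_index` and `strings_with_all_words` are hypothetical helpers.

```python
import re
from collections import defaultdict

MRXW_FIELDS = ["LAT", "WD", "CUI", "LUI", "SUI"]


def build_index(lines):
    """Map each lowercase word to the (CUI, SUI) pairs whose string contains it."""
    index = defaultdict(set)
    for line in lines:
        row = dict(zip(MRXW_FIELDS, line.rstrip("\r\n").split("|")[:-1]))
        index[row["WD"]].add((row["CUI"], row["SUI"]))
    return index


def strings_with_all_words(query, index):
    """Return the (CUI, SUI) pairs whose string contains every word in `query`.

    Splitting on non-alphanumeric characters is a stand-in tokenizer, not the
    word-breaking rules that were used to build MRXW.
    """
    words = [w for w in re.split(r"[^0-9a-z]+", query.lower()) if w]
    if not words:
        return set()
    hits = set(index.get(words[0], ()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


mrxw_eng = [
    "ENG|anaemia|C0002871|L0280031|S0352688|",
    "ENG|anemia|C0002871|L0002871|S0013742|",
    "ENG|anemias|C0002871|L0002871|S0013787|",
    "ENG|blood|C0002871|L0376533|S0500659|",
    "ENG|cells|C0002871|L0376533|S0500659|",
]
index = build_index(mrxw_eng)
print(strings_with_all_words("Blood cells", index))  # {('C0002871', 'S0500659')}
```

Matching is on the exact word, so "anemia" does not find the string indexed under "anemias". The normalized word index described next is intended for that kind of lookup.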
2.7.2.3.19 Normalized Word Index (File=MRXNW.ENG)
There is one row in this table for each normalized word found in each unique English-language Metathesaurus string. All English-language Metathesaurus entries have entries in the normalized word index. There are no normalized word indexes for other languages in this edition of the Metathesaurus.
Col. | Description
LAT | Abbreviation of language of the string in which the word appears (always ENG in this edition of the Metathesaurus)
NWD | Normalized word in lowercase (described in Section 2.6.2.1)
CUI | Concept identifier
LUI | Term identifier
SUI | String identifier
Sample Records
ENG|anaemia|C0002871|L0280031|S0352688|
ENG|anemia|C0002871|L0002871|S0013742|
ENG|anemia|C0002871|L0002871|S0013787|
ENG|blood|C0002871|L0376533|S0500659|
ENG|cell|C0002871|L0376533|S0500659|
2.7.2.3.20 Normalized String Index (File=MRXNS.ENG)
There is one row in this table for each normalized string found in each unique English-language Metathesaurus string (ignoring upper-lower case). All English-language Metathesaurus entries have entries in the normalized string index. There are no normalized string indexes for other languages in this edition of the Metathesaurus.
Col. | Description
LAT | Abbreviation of language of the string (always ENG in this edition of the Metathesaurus)
NSTR | Normalized string in lowercase (described in Section 2.6.3.1)
CUI | Concept identifier
LUI | Term identifier
SUI | String identifier
Sample Records
ENG|anaemia|C0002871|L0280031|S0352688|
ENG|anaemia unspecified|C0002871|L0696700|S0803315|
ENG|anemia|C0002871|L0002871|S0013787|