No changes were made to the release format of CHV or the processing for the 2012AA Metathesaurus. Updates to the CHV content primarily consisted of deletion and/or correction of misspelled terms.
In the previous version, CHV term values with potential spelling errors were identified by setting MRCONSO.SUPPRESS="E". Twenty atoms remain with SUPPRESS="E"; these will be reviewed in future versions to determine whether they should be changed to "N".
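As a rough illustration of how those remaining atoms could be located, the sketch below scans MRCONSO.RRF-style rows for CHV atoms whose SUPPRESS value is "E". It assumes the standard pipe-delimited MRCONSO column order (SAB in the 12th field, STR in the 15th, SUPPRESS in the 17th). The function name `chv_atoms_with_suppress` and the sample rows are hypothetical and are not real release data.

```python
import io

# Zero-based column positions in MRCONSO.RRF, assuming the standard layout:
# CUI|LAT|TS|LUI|STT|SUI|ISPREF|AUI|SAUI|SCUI|SDUI|SAB|TTY|CODE|STR|SRL|SUPPRESS|CVF|
SAB, STR, SUPPRESS = 11, 14, 16


def chv_atoms_with_suppress(lines, flag="E"):
    """Yield the STR of each CHV atom whose SUPPRESS column equals `flag`."""
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) <= SUPPRESS:
            continue  # skip malformed or truncated rows
        if fields[SAB] == "CHV" and fields[SUPPRESS] == flag:
            yield fields[STR]


# Hypothetical sample rows, for illustration only.
sample = io.StringIO(
    "C0000001|ENG|S|L1|PF|S1|Y|A1|s1|c1||CHV|SY|c1|exmaple term|0|E||\n"
    "C0000002|ENG|P|L2|PF|S2|Y|A2|s2|c2||CHV|PT|c2|example term|0|N||\n"
    "C0000003|ENG|P|L3|PF|S3|Y|A3|||D1|MSH|MH|D1|other term|0|E||\n"
)
flagged = list(chv_atoms_with_suppress(sample))
```

In practice the same function could be given an open file handle for a real MRCONSO.RRF file instead of the in-memory sample.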
The CHV distribution includes the following. These files, along with additional information, can be accessed at http://www.consumerhealthvocab.org.
The CUI and UMLS Preferred Name are not explicitly represented in the Metathesaurus; however, they are processed to help discover synonymy between CHV terminology and other UMLS sources. The UMLS preferred flag column is not processed. CHV terms with disparaged = "yes" are not included in the Metathesaurus at this time. Attributes with value "\N" are not included in the Metathesaurus release.
The following is a list of elements available for CHV in the tab-separated data file.
| # | Field | Description | Representation |
|---|---|---|---|
| 1 | CUI | UMLS CUI for this term. (string) | Used to discover synonymy between CHV terms and terms from other UMLS sources |
| 2 | CHV_term | Term as found in text. (string) | MRCONSO.STR |
| 3 | UMLS_preferred_name | Preferred name for UMLS CUI. (string) | Used to discover synonymy between CHV terms and terms from other UMLS sources |
| 4 | CHV_preferred_name | Preferred name as defined in Consumer Health Vocabulary. (string) | Not directly processed |
| 5 | Explanation | Explanation or definition for the term, if available. (string) | MRDEF.DEF |
| 6 | CHV_preferred | A boolean variable (yes/no) indicating whether this is the preferred CHV name for the concept. (string) | Used to determine TTY. CHV Terms with preferred flag = "yes" are assigned TTY = "PT". CHV Terms with preferred flag = "no" are assigned TTY = "SY". |
| 7 | UMLS_preferred | A boolean variable (yes/no) indicating whether this is the preferred CHV name for the concept. (string) | Not processed |
| 8 | Disparaged | A value of "yes" in the CHV data indicates a misspelling or other abnormality. For this version, disparaged terms were not processed, so all cases of ATN="DISPARAGED" have ATV="no". (string) | CHV Terms with Disparaged = "yes" are not included in the Metathesaurus at this time |
| 9 | Frequency_score | Estimate of the difficulty of a term, i.e. how likely it is that an average reader will be familiar with or understand a given term. Based on the frequency in several large text corpora. A higher score indicates that a term is more familiar (less difficult). (real number) | MRSAT.ATN = "FREQUENCY" |
| 10 | Context_score | Context-based estimate of the difficulty of the term. (real number) | MRSAT.ATN = "CONTEXT_SCORE" |
| 11 | CUI_score | Estimate of the difficulty of the concept (CUI) derived from determining how closely related the concept is to known examples of easy and difficult concepts. (real number) | MRSAT.ATN = "CUI_SCORE" |
| 12 | Combo_score | Combination of frequency, context and CUI scores. Also uses whether or not the term is a top word. (real number) | MRSAT.ATN = "COMBO_SCORE" |
| 13 | Combo_score_no_top_words | A slight modification to Combo_score that ignores the top word criterion. The top word list is a list of easy words from the Dale-Chall list. (real number) | MRSAT.ATN = "COMBO_SCORE_NO_TOP_WORDS" |
| 14 | CHV_string_id | Unique identifier for each entry in the CHV. (string) | MRCONSO.SAUI |
| 15 | CHV_concept_id | Unique identifier for every concept in the CHV. (string) | MRCONSO.CODE, MRCONSO.SCUI |
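
The processing rules in the table can be sketched in a few lines of Python. This is a minimal, non-authoritative sketch rather than the actual Metathesaurus build logic: it assumes the data file has no header row and that the 15 columns appear in the order listed above. The function name `process_row`, the output dictionary layout, the treatment of a "\N" Explanation as missing, and the sample rows are all hypothetical.

```python
import csv
import io

# Column order as listed in the table (assumed; no header row).
FIELDS = [
    "CUI", "CHV_term", "UMLS_preferred_name", "CHV_preferred_name",
    "Explanation", "CHV_preferred", "UMLS_preferred", "Disparaged",
    "Frequency_score", "Context_score", "CUI_score", "Combo_score",
    "Combo_score_no_top_words", "CHV_string_id", "CHV_concept_id",
]

# Score columns and the MRSAT.ATN each one is represented as.
SCORE_ATN = {
    "Frequency_score": "FREQUENCY",
    "Context_score": "CONTEXT_SCORE",
    "CUI_score": "CUI_SCORE",
    "Combo_score": "COMBO_SCORE",
    "Combo_score_no_top_words": "COMBO_SCORE_NO_TOP_WORDS",
}

NULL = r"\N"  # placeholder for a missing value in the data file


def process_row(values):
    """Map one CHV row to an atom-like dict, or None if it is disparaged."""
    row = dict(zip(FIELDS, values))
    if row["Disparaged"] == "yes":
        return None  # disparaged terms are not included
    # Attributes whose value is "\N" are left out.
    attributes = {
        atn: row[col] for col, atn in SCORE_ATN.items() if row[col] != NULL
    }
    attributes["DISPARAGED"] = row["Disparaged"]
    explanation = row["Explanation"]
    return {
        "STR": row["CHV_term"],
        # Preferred flag "yes" gives PT; "no" gives SY.
        "TTY": "PT" if row["CHV_preferred"] == "yes" else "SY",
        "SAUI": row["CHV_string_id"],
        "CODE": row["CHV_concept_id"],
        "SCUI": row["CHV_concept_id"],
        # Assumption: a "\N" or empty explanation yields no definition.
        "DEF": None if explanation in ("", NULL) else explanation,
        "attributes": attributes,
    }


# Hypothetical rows, for illustration only (identifiers and scores are made up).
sample_rows = [
    ["C0000001", "heart attack", "Example Concept", "heart attack", NULL,
     "yes", "no", "no", "0.9", NULL, "0.8", "0.85", "0.84", "S001", "K001"],
    ["C0000001", "hart attack", "Example Concept", "heart attack", NULL,
     "no", "no", "yes", "0.1", NULL, "0.8", "0.3", "0.3", "S002", "K001"],
    ["C0000001", "cardiac infarct", "Example Concept", "heart attack", NULL,
     "no", "no", "no", "0.4", "0.5", "0.8", "0.6", "0.6", "S003", "K001"],
]
text = "\n".join("\t".join(r) for r in sample_rows)
reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
atoms = [atom for atom in map(process_row, reader) if atom is not None]
```

With these sample rows, the disparaged misspelling is dropped, the first row becomes a PT atom without a CONTEXT_SCORE attribute, and the third row becomes an SY atom.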