CHV (Consumer Health Vocabulary) - Source Representation



Summary of Changes:

No changes were made to the release format of CHV or the processing for the 2012AA Metathesaurus.  Updates to the CHV content primarily consisted of deletion and/or correction of misspelled terms. 

In the previous version, CHV term values with potential spelling errors were flagged by setting MRCONSO.SUPPRESS="E". There are 20 atoms remaining with SUPPRESS="E"; these will be reviewed in future versions to determine whether they should be changed to "N".
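The remaining flagged atoms can be located by scanning MRCONSO. The following is a minimal sketch, assuming the standard pipe-delimited MRCONSO.RRF layout in which SAB, STR, and SUPPRESS are the 12th, 15th, and 17th fields. The function name and the sample rows are made up for illustration and are not real Metathesaurus data.

```python
# Zero-based column positions in a pipe-delimited MRCONSO.RRF row.
SAB, STR, SUPPRESS = 11, 14, 16


def chv_suppress_e(lines):
    """Return the STR values of CHV atoms whose SUPPRESS flag is "E"."""
    hits = []
    for line in lines:
        fields = line.rstrip("\r\n").split("|")
        if len(fields) <= SUPPRESS:
            continue  # skip blank or malformed rows
        if fields[SAB] == "CHV" and fields[SUPPRESS] == "E":
            hits.append(fields[STR])
    return hits


# Hypothetical rows for illustration only (identifiers and terms are invented).
sample = [
    "C0000001|ENG|P|L0000001|PF|S0000001|Y|A0000001|0000000001|0000000001||CHV|PT|0000000001|example term|0|N||",
    "C0000002|ENG|S|L0000002|PF|S0000002|N|A0000002|0000000002|0000000002||CHV|SY|0000000002|exampel term|0|E||",
    "C0000003|ENG|P|L0000003|PF|S0000003|Y|A0000003||M0000003|D000003|XYZ|MH|D000003|other term|0|E||",
]

flagged = chv_suppress_e(sample)
```

With a real release, the same function could be applied to an open file handle on MRCONSO.RRF, since it only needs an iterable of lines.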


Source-Provided Files: Summary

The CHV distribution includes the following files. These files, along with additional information, can be accessed at http://www.consumerhealthvocab.org

Documentation:

| File | Description |
|---|---|
| http://www.consumerhealthvocab.org/ | Source website |
| ReadMe.pdf | README file |

Data:

| File | Description |
|---|---|
| CHV_concepts_terms_flatfile_20110204.tsv | Tab-separated data file |

Not included:

The CUI and UMLS Preferred Name are not explicitly represented in the Metathesaurus; however, they are processed to help discover synonymy between CHV terminology and other UMLS sources. The UMLS preferred flag column is not processed. CHV terms with disparaged = "yes" are not included in the Metathesaurus at this time. Attributes with value "\N" are not included in the Metathesaurus release.
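The exclusion rules above could be applied while reading the tab-separated file roughly as follows. This is a sketch under stated assumptions: the 15 columns are taken to appear in the order given in the field list of this document, the file is assumed to have no header row, and the function name and sample rows are hypothetical.

```python
# Assumed column order, taken from the field list in this document.
FIELDS = [
    "CUI", "CHV_term", "UMLS_preferred_name", "CHV_preferred_name",
    "Explanation", "CHV_preferred", "UMLS_preferred", "Disparaged",
    "Frequency_score", "Context_score", "CUI_score", "Combo_score",
    "Combo_score_no_top_words", "CHV_string_id", "CHV_concept_id",
]

NULL = r"\N"  # literal backslash-N marks a missing value in the data file


def included_rows(lines):
    """Yield a dict for each row that passes the exclusion rules."""
    for line in lines:
        values = line.rstrip("\r\n").split("\t")
        if len(values) != len(FIELDS):
            continue  # skip blank or malformed rows
        row = dict(zip(FIELDS, values))
        if row["Disparaged"] == "yes":
            continue  # disparaged terms are not included
        # Drop every field whose value is the "\N" placeholder.
        yield {name: value for name, value in row.items() if value != NULL}


# Invented rows for illustration only.
sample = [
    "\t".join(["C0000001", "example term", "Example Term", "example term",
               NULL, "yes", "no", "no", "0.8", "0.7", "0.6", "0.7", "0.7",
               "0000000001", "0000000001"]),
    "\t".join(["C0000002", "exampel term", "Example Term", "example term",
               NULL, "no", "no", "yes", "-1", "-1", "-1", "-1", "-1",
               "0000000002", "0000000001"]),
]

rows = list(included_rows(sample))
```

Here the second sample row is skipped because it is disparaged, and the first row loses its Explanation field because that value is "\N".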


Source-Provided Files: Details

The following is a list of elements available for CHV in the tab-separated data file.

Notes: 
  • During Metathesaurus source processing, CHV term values with potential spelling errors were identified by comparing words in the CHV term to words in the Specialist Lexicon and to a subset of words in the Metathesaurus (MRXW.ENG).  Term values which had potential spelling errors have MRCONSO.SUPPRESS="E".  It is anticipated that any errors will be corrected in a future update of CHV.
  • All score attributes have a range of 0 to 1 (a higher score implies the term is easier). A value of -1 indicates the score could not be estimated.
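The two notes above can be illustrated with a short sketch. It is not the actual Metathesaurus processing: the tokenization, the case folding, and the `known_words` set (standing in for words from the Specialist Lexicon and MRXW.ENG) are all assumptions, and both function names are hypothetical.

```python
import re


def suppress_flag(term, known_words):
    """Return "E" if any word of the term is absent from known_words, else "N".

    Splitting on runs of letters and lowercasing are simplifying assumptions.
    """
    words = re.findall(r"[a-z]+", term.lower())
    return "N" if all(word in known_words for word in words) else "E"


def parse_score(value):
    """Convert a score string to a float, or None when it is -1 (not estimated)."""
    score = float(value)
    return None if score == -1 else score


# Tiny stand-in word list for illustration only.
known = {"heart", "attack"}

flag_ok = suppress_flag("Heart attack", known)
flag_bad = suppress_flag("hart atack", known)
```

Treating -1 as a missing value rather than a number keeps it from being averaged or compared with real scores in the 0 to 1 range.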
| # | Field | Description | Representation |
|---|---|---|---|
| 1 | CUI | UMLS CUI for this term. (string) | Used to discover synonymy between CHV terms and terms from other UMLS sources |
| 2 | CHV_term | Term as found in text. (string) | MRCONSO.STR |
| 3 | UMLS_preferred_name | Preferred name for UMLS CUI. (string) | Used to discover synonymy between CHV terms and terms from other UMLS sources |
| 4 | CHV_preferred_name | Preferred name as defined in Consumer Health Vocabulary. (string) | Not directly processed |
| 5 | Explanation | Explanation or definition for the term, if available. (string) | MRDEF.DEF |
| 6 | CHV_preferred | A boolean variable (yes/no) indicating whether this is the preferred CHV name for the concept. (string) | Used to determine TTY. CHV terms with preferred flag = "yes" are assigned TTY = "PT". CHV terms with preferred flag = "no" are assigned TTY = "SY". |
| 7 | UMLS_preferred | A boolean variable (yes/no) indicating whether this is the preferred CHV name for the concept. (string) | Not processed |
| 8 | Disparaged | A value of "yes" in the CHV data indicates a misspelling or other abnormality. For this version, disparaged terms were not processed, so all cases of ATN="DISPARAGED" have ATV="no". (string) | CHV terms with Disparaged = "yes" are not included in the Metathesaurus at this time |
| 9 | Frequency_score | Estimate of the difficulty of a term, i.e. how likely it is that an average reader will be familiar with or understand a given term. Based on the frequency in several large text corpora. A higher score indicates that a term is more familiar (less difficult). (real number) | MRSAT.ATN = "FREQUENCY" |
| 10 | Context_score | Context-based estimate of the difficulty of the term. (real number) | MRSAT.ATN = "CONTEXT_SCORE" |
| 11 | CUI_score | Estimate of the difficulty of the concept (CUI), derived from determining how closely related the concept is to known examples of easy and difficult concepts. (real number) | MRSAT.ATN = "CUI_SCORE" |
| 12 | Combo_score | Combination of frequency, context and CUI scores. Also uses whether or not the term is a top word. (real number) | MRSAT.ATN = "COMBO_SCORE" |
| 13 | Combo_score_no_top_words | A slight modification to Combo_score that ignores the top word criterion. The top word list is a list of easy words from the Dale-Chall list. (real number) | MRSAT.ATN = "COMBO_SCORE_NO_TOP_WORDS" |
| 14 | CHV_string_id | Unique identifier for each entry in the CHV. (string) | MRCONSO.SAUI |
| 15 | CHV_concept_id | Unique identifier for every concept in the CHV. (string) | MRCONSO.CODE, MRCONSO.SCUI |
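
The field-to-Metathesaurus mapping listed above can be summarized in a short sketch. It only restates the Representation entries as code and is not the actual source-processing implementation; the `represent` function, the dictionary layout, and the sample row are hypothetical.

```python
# CHV score columns and the MRSAT attribute names (ATN) they map to.
SCORE_ATNS = {
    "Frequency_score": "FREQUENCY",
    "Context_score": "CONTEXT_SCORE",
    "CUI_score": "CUI_SCORE",
    "Combo_score": "COMBO_SCORE",
    "Combo_score_no_top_words": "COMBO_SCORE_NO_TOP_WORDS",
}

NULL = r"\N"  # placeholder for a missing value in the CHV data file


def represent(row):
    """Map one CHV row (a dict keyed by field name) to atom, definition, attributes."""
    atom = {
        "STR": row["CHV_term"],
        # The preferred flag determines the term type.
        "TTY": "PT" if row["CHV_preferred"] == "yes" else "SY",
        "SAUI": row["CHV_string_id"],
        "CODE": row["CHV_concept_id"],
        "SCUI": row["CHV_concept_id"],
    }
    # The explanation becomes MRDEF.DEF when one is present.
    explanation = row.get("Explanation", NULL)
    definition = None if explanation in ("", NULL) else explanation
    # Score columns become (ATN, ATV) pairs; "\N" values are left out.
    attributes = [
        (atn, row[field])
        for field, atn in SCORE_ATNS.items()
        if row.get(field, NULL) != NULL
    ]
    attributes.append(("DISPARAGED", row["Disparaged"]))
    return atom, definition, attributes


# Invented row for illustration only.
sample_row = {
    "CUI": "C0000001", "CHV_term": "example term",
    "Explanation": NULL, "CHV_preferred": "no", "Disparaged": "no",
    "Frequency_score": "0.8", "Context_score": NULL, "CUI_score": "0.6",
    "Combo_score": "0.7", "Combo_score_no_top_words": "0.7",
    "CHV_string_id": "0000000002", "CHV_concept_id": "0000000001",
}

atom, definition, attributes = represent(sample_row)
```

The CUI, UMLS_preferred_name, CHV_preferred_name, and UMLS_preferred columns are deliberately absent from the output, since the table lists them as used only for synonymy discovery or as not processed.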