VSAB: CHV2011_02
No changes were made to the release format of CHV or the processing for the 2012AA Metathesaurus. Updates to the CHV content primarily consisted of deletion and/or correction of misspelled terms.
In the previous version, CHV term values with potential spelling errors were identified by setting MRCONSO.SUPPRESS="E". Twenty atoms remain with SUPPRESS="E"; these will be reviewed in future versions to determine whether they should be changed to "N".
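
To see which CHV atoms still carry the flag, one could scan MRCONSO.RRF directly. The following is a minimal sketch, assuming the standard pipe-delimited MRCONSO column order and that CHV atoms are identified by SAB="CHV"; the function name and the sample rows are hypothetical and for illustration only.

```python
# Standard MRCONSO.RRF column order (pipe-delimited, trailing pipe on each row).
MRCONSO_COLUMNS = [
    "CUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "AUI", "SAUI",
    "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR", "SRL", "SUPPRESS", "CVF",
]


def chv_atoms_with_suppress(lines, flag="E", sab="CHV"):
    """Yield (AUI, STR) for atoms from `sab` whose SUPPRESS equals `flag`.

    `lines` is any iterable of MRCONSO.RRF rows, e.g. an open file handle.
    """
    for line in lines:
        fields = line.rstrip("\n").split("|")
        row = dict(zip(MRCONSO_COLUMNS, fields))
        if row.get("SAB") == sab and row.get("SUPPRESS") == flag:
            yield row["AUI"], row["STR"]


# Hypothetical rows; identifiers and strings are invented for this example.
sample_rows = [
    "C0000001|ENG|P|L0000001|PF|S0000001|Y|A0000001|s1|c1||CHV|PT|c1|heart attack|0|N||",
    "C0000001|ENG|S|L0000002|PF|S0000002|N|A0000002|s2|c1||CHV|SY|c1|hart attack|0|E||",
    "C0000001|ENG|S|L0000003|PF|S0000003|N|A0000003||d1||MSH|MH|d1|Myocardial Infarction|0|N||",
]

flagged = list(chv_atoms_with_suppress(sample_rows))
```

With a real release, passing an open handle on MRCONSO.RRF in place of `sample_rows` would list the atoms still awaiting review.
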
File | Description |
---|---|
http://www.consumerhealthvocab.org/ | source website |
ReadMe.doc | README file |

File | Description |
---|---|
CHV_concepts_terms_flatfile_20110204.tsv | Tab-separated data file |
Term Type | Origin |
---|---|
PT | CODE: CHV_concept_id; TTY=PT is assigned where CHV_preferred_name="yes" |
SY | CODE: CHV_concept_id; STR: Term; SAUI: CHV_string_id; SCUI: CHV_concept_id; TTY=SY is assigned where CHV_preferred_name="no" |
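
The term type mapping can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual inversion code: it assumes each flat-file row is available as a dict keyed by the field names used in the table (CHV_concept_id, Term, CHV_string_id, CHV_preferred_name), and it applies the STR, SAUI, and SCUI mapping to PT rows as well, although the table spells that mapping out only for SY. The function name `to_atom` is hypothetical.

```python
def to_atom(row):
    """Map one CHV flat-file row (a dict) to MRCONSO-style fields.

    Assumption: keys match the field names in the term type table.
    """
    preferred = row["CHV_preferred_name"].strip().lower()
    if preferred == "yes":
        tty = "PT"  # preferred term
    elif preferred == "no":
        tty = "SY"  # synonym
    else:
        raise ValueError(f"unexpected CHV_preferred_name: {preferred!r}")

    return {
        "CODE": row["CHV_concept_id"],
        "STR": row["Term"],
        "SAUI": row["CHV_string_id"],
        "SCUI": row["CHV_concept_id"],
        "TTY": tty,
    }


# Hypothetical row; the identifiers are invented for this example.
example = to_atom({
    "CHV_concept_id": "c1",
    "Term": "heart attack",
    "CHV_string_id": "s1",
    "CHV_preferred_name": "yes",
})
```
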
Note: All score attributes have a range 0 to 1 (a higher score implies the term is easier). A value of -1 indicates the score could not be estimated.
Attribute Name | Description | Origin |
---|---|---|
COMBO_SCORE | Combination of frequency, context and CUI scores. Also uses whether or not the term is a top word. (real number) | Combo_score |
COMBO_SCORE_NO_TOP_WORDS | A slight modification to Combo_score that ignores top word criterion. The top word list is a list of easy words from the Dale-Chall list. (real number) | Combo_score_no_top_words |
CONTEXT_SCORE | Context based estimate of the difficulty of the term. (real number) | Context_score |
CUI_SCORE | Estimate of the difficulty of the concept (CUI) derived from determining how closely related the concept is to known examples of easy and difficult concepts. (real number) | CUI_score |
DISPARAGED | A value of "yes" in the CHV data indicates a misspelling or other abnormality. For this version, disparaged terms were not processed, so all cases of ATN="DISPARAGED" have ATV="no" | Disparaged field (yes/no flag) |
FREQUENCY | Estimate of the difficulty of a term, i.e. how likely it is that an average reader will be familiar with or understand a given term. Based on the frequency in several large text corpora. A higher score indicates that a term is more familiar (less difficult). (real number) | Frequency_score |
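
When consuming the score attributes, the -1 sentinel should be kept apart from genuine scores so that it is not averaged or ranked as a very difficult term. A minimal sketch, assuming the attribute value arrives as a string (as it would in an ATV field); the helper name `parse_score` is hypothetical.

```python
from typing import Optional


def parse_score(raw: str) -> Optional[float]:
    """Parse a CHV score attribute value.

    Returns None when the value is -1 (score could not be estimated),
    otherwise a float in the documented range 0 to 1.
    """
    value = float(raw)
    if value == -1:
        return None
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"score outside documented range 0 to 1: {value}")
    return value


easy = parse_score("0.75")    # a higher score implies an easier term
unknown = parse_score("-1")   # score could not be estimated
```

Returning `None` rather than -1 makes it harder to mix unestimated scores into numeric comparisons by accident.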