Unified Medical Language System® (UMLS®)
MetamorphoSys is the UMLS installation wizard and Metathesaurus customization tool included in each UMLS release. MetamorphoSys installs one or more of the UMLS Knowledge Sources and enables you to create customized Metathesaurus subsets. Each UMLS release includes an updated version of MetamorphoSys. Use only the version of MetamorphoSys distributed with the most current UMLS release.
- System Requirements
- Starting MetamorphoSys
- MetamorphoSys Screens and Tabs
- Customizing the Metathesaurus Tabs
- MetamorphoSys Metathesaurus Filters
- Options for Advanced Users
- Completing a Subset
MetamorphoSys has been tested on the following operating systems:
- Windows 7 Enterprise, Vista, XP
- Macintosh OS X (Leopard, Snow Leopard)
- Linux*
*Linux note: All releases are fully tested under Ubuntu Desktop. Other Linux releases may work equally well.
- Minimum 30 GB of free hard disk space.
- Minimum 2 GB of RAM, preferably more. With less memory, virtual memory paging will greatly increase processing time.
- CPU speed of at least 2 GHz for reasonable installation times.
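Before downloading, you may want to confirm that the target volume meets the disk space requirement. The following is a minimal Python sketch, not part of MetamorphoSys. The function name is hypothetical, and it assumes 1 GB means 2^30 bytes.

```python
import shutil

REQUIRED_FREE_GB = 30  # minimum free disk space listed above


def has_enough_free_space(path=".", required_gb=REQUIRED_FREE_GB):
    """Return True if the volume holding `path` has at least `required_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    # Assumption: 1 GB is treated as 2**30 bytes.
    return free_bytes >= required_gb * 1024**3


if __name__ == "__main__":
    print("Enough free space:", has_enough_free_space("."))
```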
Use a high-speed internet connection to download the files from the UMLS Web site. Download and extract all UMLS data and zip files to the same directory.
All file sizes are checked at installation. The Validate Distribution option lets you verify the integrity of the downloaded .nlm files. It compares special MD5 signatures to those in the release MD5 and CHK files, and is a useful first step in troubleshooting a UMLS installation.
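If you want an independent check, for example to compare two copies of the same download, a standard MD5 digest of each .nlm file can be computed as sketched below. This is only an illustration with hypothetical function names. Because Validate Distribution uses special MD5 signatures, these standard digests may not match the values in the release MD5 file.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_of_nlm_files(directory):
    """Map each .nlm file name in `directory` to its standard MD5 digest."""
    return {p.name: md5_of_file(p) for p in sorted(Path(directory).glob("*.nlm"))}
```

Calling `md5_of_nlm_files` on two download directories and comparing the resulting dictionaries shows whether the copies are byte-for-byte identical.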
After the UMLS release is downloaded, it must include these files*, in the same directory:
- mmsys.zip (zipped MetamorphoSys application)
- 2010ab-1-meta.nlm (compressed Metathesaurus data)
- 2010ab-2-meta.nlm (compressed Metathesaurus data)
- 2010ab-otherks.nlm (compressed Semantic Network and SPECIALIST Lexicon)
You must unzip mmsys.zip to the same directory as the other downloaded files. After you unzip the mmsys.zip file, the MetamorphoSys application will begin.
*File names for the 2010AB UMLS Release
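A quick way to confirm that all required files are in one directory is sketched below in Python. The file names are those listed above for the 2010AB release. The function name is hypothetical and the script is not part of the UMLS distribution.

```python
from pathlib import Path

# File names for the 2010AB release, as listed above.
REQUIRED_FILES = [
    "mmsys.zip",
    "2010ab-1-meta.nlm",
    "2010ab-2-meta.nlm",
    "2010ab-otherks.nlm",
]


def missing_release_files(directory, required=REQUIRED_FILES):
    """Return the required file names that are not present in `directory`."""
    folder = Path(directory)
    return [name for name in required if not (folder / name).is_file()]
```

An empty list from `missing_release_files("path/to/download")` means every listed file is present. For other releases, pass that release's file names as `required`.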
Open a terminal window and change to the directory of the downloaded files. Type the appropriate command for your platform:
- ./run.bat (Windows)
- ./run_mac.sh (Macintosh; or click on the run_mac.command file)
MetamorphoSys screens and tabs will lead you through the process of installing the UMLS Knowledge Sources and customizing the Metathesaurus.
Welcome to MetamorphoSys
Select one of the following:
- Install UMLS — to install one or more UMLS Knowledge Sources.
- Browse my Subset — to open the RRF subset browser.
File menu option:
Advanced menu options:
- Customize My Subset (customizes an existing Metathesaurus subset)
- Build MRCXT (opens the MRCXT Builder)
- Copy Database Load Scripts to Hard Drive (copies database load scripts to local storage)
Validate Distribution confirms that all UMLS files have transferred correctly and are complete. The process takes approximately 30 minutes and produces a log file (validation.log) and an alert box that displays a statement regarding the validity of the files. Use Validate Distribution as the first step in troubleshooting when experiencing any malfunctions.
In order to create correct subsets, use the version of MetamorphoSys that matches the version of the Metathesaurus release files being subsetted. Do not use older versions of MetamorphoSys with newer or older release files; use the version of MetamorphoSys included with the release files.
MetamorphoSys creates a top-level destination directory in local storage for the UMLS Knowledge Sources. The directory is named with the UMLS Release version, for example: 2010AB. The following directory structure is created beneath the destination directory, shown below for the 2010AB release:
Install any one, two, or all three Knowledge Sources as follows:
- Metathesaurus: META directory
- Semantic Network: NET directory
- SPECIALIST Lexicon: LEX directory
The META directory is populated with the Metathesaurus subset files created during installation. Depending on your configuration, some of these files may contain zero bytes.
Use the Browse button to locate source and destination directory locations.
Click OK to proceed with installation. A progress monitor tracks each step of the installation process. If the Metathesaurus is selected, installation will begin after all Metathesaurus options are selected.
To cancel installation at any time, click 'Cancel' at the bottom of the Install UMLS progress screen, or at the bottom of the MetamorphoSys progress window.
Configuration files include all of the options and filters that have been selected. By saving a configuration file users can reproduce a subset exactly. Select New Configuration to create a new subset configuration. Select Open Configuration to open a previously saved configuration file.
License Agreement Notice
The Metathesaurus contains source vocabularies produced by many different copyright holders. The majority of the content of the Metathesaurus is available for use under the basic (and quite open) terms described in the Metathesaurus license agreement.
Some vocabulary producers place additional restrictions on the use of their content as distributed within the Metathesaurus.
Levels of additional restrictions are described in Section 12 of the UMLS Metathesaurus License. Individual vocabularies and their restriction levels are listed in Appendix 1 of the UMLS Metathesaurus License. If you already have a separate license for one of the source vocabularies, your existing license also applies to that source as distributed within the Metathesaurus. In some cases, you may have to request permission or negotiate a separate license with a vocabulary producer in order to use that vocabulary in a production system. There may be a charge associated with these separate permissions or license agreements.
Click Accept or Do Not Accept after reviewing the UMLS Metathesaurus License.
Select Default Subset
You must select one of four pre-defined default subsets as a starting point:
- Active Subset: excludes "legacy" sources that have not been updated for several years in the UMLS Metathesaurus.
- Level 0: contains vocabulary sources for which no additional license agreements are necessary beyond the UMLS license.
- Level 0 + SNOMED CT: contains all Level 0 sources and SNOMED CT.
- SNOMEDCT + SCTUSX: includes only SNOMED CT and the US Extension to SNOMED CT.
Use the Source List tab to modify your default subset.
We encourage you to suggest additional default subsets for future releases.
Select and complete Option tabs in any order. Note that the selections made in one option tab may affect the display and available choices on other Options tabs.
Select 'Reset' on the menu bar, then select the appropriate Reset command to restore the default settings for any option.
When you have completed configuring your Metathesaurus subset, go to the menu bar, select 'Done', and then 'Begin Subset'.
You will be prompted to save your configuration. Name your configuration file, which will be stored in the destination META directory. This file documents your configuration choices, and can be used as the starting point for a later customization using the Customize My Subset option on the Welcome screen.
Input Options Tab
This tab allows users to indicate the location of required directories, the configuration file, and the input and output directories.
For the initial installation, NLM Data File Format must be selected.
When customizing an existing subset, use Browse to select its current format of either Original Release Format or Rich Release Format.
Output Options Tab
- Select Output Format
Select either Rich Release Format or Original Release Format. Rich Release Format is the default selection for the initial installation and for customizing an existing subset in the Rich Release Format. Original Release Format is the default for customizing an existing subset in the Original Release Format. Note: You cannot generate a correct Rich Release Format subset from Original Release Format.
- Subset Folder
Indicate where the new subset files should be placed.
- Write Database Load Scripts
Outputs a load script in either Oracle or MySQL format, which you may further optimize or customize. See the Load Scripts page for more information on UMLS load scripts.
- Source Abbreviation Format
Source vocabulary information in the Metathesaurus content can be identified by a versionless, or Root Source Abbreviation (RSAB), or by the longer and more descriptive Versioned Source Abbreviation (VSAB). The default is the RSAB, but you may choose to include the VSABs. For example,
- MSH is the Root Source Abbreviation (RSAB)
- MSH_2003_12_12 is the Versioned Source Abbreviation (VSAB)
- Maximum Field Length
Restrict fields in your output to the maximum field length allowed in your application or database software. Beginning with the 2007AA Release the default value for this field is 3990 characters.
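To decide whether the default suits your database, you could scan an output file for fields longer than your limit. The sketch below is an illustration only. It assumes the output files are pipe-delimited text and counts characters rather than bytes. The function name is hypothetical.

```python
def overlong_fields(lines, max_length=3990, delimiter="|"):
    """Yield (line_number, field_index, length) for each field longer than max_length.

    Line numbers start at 1 and field indexes start at 0.
    """
    for line_number, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split(delimiter)
        for field_index, field in enumerate(fields):
            if len(field) > max_length:
                yield line_number, field_index, len(field)
```

Pass an open text file (or any iterable of lines) as `lines`, and set `max_length` to the limit of your own application or database.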
- Remove MTH Only Concepts
Select this option to retain MTH atoms ONLY when they overlap with atoms from other sources in your subset.
- Calculate MD5 Values for Output Files
When this box is checked, MetamorphoSys generates an mmsys.md5 file in the Metathesaurus subset directory (META/mmsys.md5). The MD5 values in this file can be used to verify the data integrity of the Metathesaurus files (RRF or ORF), and can be useful when troubleshooting problems. Please note that these MD5 values are intended for comparing different runs. They are calculated in a platform-independent way and ignore differing line terminations. For this reason, values produced by native MD5 calculation programs may differ from those in the mmsys.md5 file.
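The effect of line terminations on an MD5 value can be illustrated with a short Python sketch. It is not the algorithm MetamorphoSys uses, which is not described here. It only shows why a native MD5 program can disagree with a calculation that ignores line endings. The sample rows are made up.

```python
import hashlib


def md5_ignoring_line_endings(data: bytes) -> str:
    """MD5 of `data` after converting CRLF and lone CR line endings to LF."""
    normalized = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return hashlib.md5(normalized).hexdigest()


# The same two made-up rows with Unix and Windows line endings.
unix_text = b"row one|\nrow two|\n"
windows_text = unix_text.replace(b"\n", b"\r\n")

# A plain MD5 sees different bytes, so the digests differ.
plain_match = hashlib.md5(unix_text).hexdigest() == hashlib.md5(windows_text).hexdigest()
# After normalizing line endings, the digests agree.
normalized_match = md5_ignoring_line_endings(unix_text) == md5_ignoring_line_endings(windows_text)

print(plain_match, normalized_match)  # prints: False True
```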
- Add UTF-8 BOM Characters to Output Files
When this box is checked, all output data files are prepended with a byte order mark. This beginning-of-file marker (3 bytes) indicates that the file is encoded as UTF-8.
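Downstream programs sometimes need to detect or skip the byte order mark. A minimal Python sketch follows. The function names are hypothetical, and the 3-byte mark is the standard UTF-8 BOM (EF BB BF).

```python
import codecs


def has_utf8_bom(path):
    """Return True if the file starts with the 3-byte UTF-8 byte order mark."""
    with open(path, "rb") as handle:
        return handle.read(3) == codecs.BOM_UTF8


def read_text_without_bom(path):
    """Read a UTF-8 file as text, dropping a leading BOM if one is present."""
    with open(path, encoding="utf-8-sig") as handle:
        return handle.read()
```

The `utf-8-sig` codec reads files with or without a BOM, so the same loader works whether or not this option was checked.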
- Build Browser Index Files
This option creates index files used by the RRF Browser to look up data. Leaving this box checked is highly recommended. Unchecking this option will cause the RRF Browser to run much more slowly.
Source List Tab
The Source List tab displays all source vocabularies in the current version of the Metathesaurus. Sources are sorted alphabetically by Source Abbreviation in the default display. At the top of the Source List tab there are two radio buttons:
The highlighted sources reflect the default subset selected earlier in the installation process. You may select or deselect additional sources to include or exclude from your subset. Leave the button set to 'Select Sources to EXCLUDE from subset' in order to highlight sources that will be removed from your customized Metathesaurus subset.
Or you may choose 'Select sources to INCLUDE in subset'. When selected, only the highlighted sources will be included in your local subset.
Note: The highlighted sources do NOT change when you switch between these two options. If a source is highlighted for EXCLUSION from a subset, and you choose 'Select sources to INCLUDE in subset', that source will now be highlighted for INCLUSION in your subset.
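The meaning of the two radio buttons can be sketched as a small Python function. This is an illustration of the behavior described above, not MetamorphoSys code. The function name and the `mode` values are hypothetical.

```python
def sources_in_subset(all_sources, highlighted, mode):
    """Return the set of sources kept in the subset.

    mode "exclude": highlighted sources are removed from the subset.
    mode "include": only highlighted sources are kept in the subset.
    """
    all_sources = set(all_sources)
    highlighted = set(highlighted) & all_sources
    if mode == "exclude":
        return all_sources - highlighted
    if mode == "include":
        return highlighted
    raise ValueError("mode must be 'exclude' or 'include'")


# The same highlighting gives opposite results under the two modes.
everything = {"MSH", "SNOMEDCT", "CPT", "HCPT"}
highlighted = {"CPT", "HCPT"}
kept_when_excluding = sources_in_subset(everything, highlighted, "exclude")
kept_when_including = sources_in_subset(everything, highlighted, "include")
```

Here `kept_when_excluding` holds MSH and SNOMEDCT, while `kept_when_including` holds CPT and HCPT. This is why you should review the highlighting after switching buttons.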
To select or deselect additional rows, hold down the <CTRL> key while making your selection.
You may sort the Source List by Full Source Name, Source Abbreviation, Source Family, Language, or Level (UMLS License Restriction Level). Click on the column header to re-sort the list by that data.
The complete Metathesaurus contains over 150 source vocabularies and in its entirety is an extremely large and unwieldy set of data files. Carefully consider what sources will contribute useful data to your application, and then exclude other sources, to reduce the size of output subsets and to improve application performance.
Consider also that the data from some sources may be incompatible with your intended application. They may contain terms that are recognizable only within the context of a specific source; or they may contain abbreviations that are confusing, or not particularly useful to your application.
Additional information about source vocabularies is available under UMLS Source Release Documentation. You may also contact the source providers included in the Appendix to the License Agreement for additional documentation or information.
You may select individual sources to remove based on the Full Source Name or Source Abbreviation. You may take advantage of groups of related vocabularies, called Source Families, to assist in the removal of related sources when one source is selected.
Note, for example, that CPT (the AMA's Physicians' Current Procedural Terminology, CPT4) is also a part of HCPT (the Health Care Financing Administration Common Procedure Coding System, HCPCS). Both vocabularies must be removed to exclude all sources of CPT information.
You may also exclude sources by language, or by license restriction level. To reset source selections and return to the default list, select Reset Sources to Exclude Defaults under Reset on the menu bar.
The Precedence tab displays the default order of precedence of Metathesaurus source and term type combinations as determined by NLM. One string from one English term is designated and labeled as the default preferred name of each concept in the Metathesaurus. Selection of the default preferred name for any Metathesaurus concept is based on an order of precedence of all the types of English strings in all the Metathesaurus source vocabularies. Different types of strings, e.g., preferred terms, cross references, and abbreviations from each vocabulary will have different positions in this order.
The default order of precedence determined by NLM will not be suitable for all applications of the Metathesaurus. MetamorphoSys can be used to change the selection of preferred names to feature terminology from the source vocabularies most appropriate to particular user populations.
You may reorder the ranking of source and term type combinations by cutting and pasting, or dragging and dropping the rows in the Precedence List. Term types from sources that have been excluded on the Source List tab will not be displayed.
The ranking of sources and term types will affect the output subset. In particular, the name of a concept will be determined by the highest ranking term type in that concept.
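The effect of the ranking on a concept's name can be sketched as follows. This is a simplified Python illustration. The source abbreviations (SRC_A, SRC_B), term types, and function name are hypothetical, and the actual selection in MetamorphoSys may involve further rules not described here.

```python
def preferred_name(atoms, precedence):
    """Pick the string whose (source, term type) pair ranks highest.

    atoms: list of (source, term_type, string) tuples for one concept.
    precedence: list of (source, term_type) pairs, highest rank first.
    Returns None if no atom has a ranked source and term type.
    """
    rank = {pair: position for position, pair in enumerate(precedence)}
    ranked = [atom for atom in atoms if (atom[0], atom[1]) in rank]
    if not ranked:
        return None
    best = min(ranked, key=lambda atom: rank[(atom[0], atom[1])])
    return best[2]


# Hypothetical concept with two names from two sources.
atoms = [
    ("SRC_A", "SY", "Heart attack"),
    ("SRC_B", "PT", "Myocardial infarction"),
]
default_order = [("SRC_A", "PT"), ("SRC_B", "PT"), ("SRC_A", "SY")]
custom_order = [("SRC_A", "SY"), ("SRC_A", "PT"), ("SRC_B", "PT")]

name_by_default = preferred_name(atoms, default_order)
name_after_reorder = preferred_name(atoms, custom_order)
```

With the first ordering the concept is named "Myocardial infarction". Moving the SRC_A synonym row to the top changes the name to "Heart attack".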
The Suppressibility tab displays source/term type combinations to be marked as suppressible in the output subset. Term types from sources that have been excluded on the Source List will not display. For a new subset, the initial display highlights default source/term types made suppressible by NLM. You may select or deselect source/term types to be marked as suppressible in your output subsets. When customizing an existing subset, the initial display highlights your suppressibility settings for that subset.
MetamorphoSys Filters allow users to create a custom subset containing a specific group of terms. The Enable/Disable Filter option is listed under the File Menu. When a filter is enabled, its corresponding tab appears on the UMLS Metathesaurus Configuration screen. When a filter is disabled, its tab disappears. An option for specific help information for each filter is displayed on the Help Menu when that filter is selected. The current filters are:
- Attributes Type List
- Content View List
- Languages List
- Relations List
- Sample Meta
- Semantic Types List
- Source Subset List
- Source Term Types List
- UI List
This command allows the user to import filters developed according to the Filter API. Filters cannot be exported or removed from the application, but they can be disabled. A window will pop up with all filters available for import. These filters are found in the METAMSYS/ext directory.
Two simple import filters are provided as examples of custom filtering:
- NosNec (for Testing): To exclude "NOS" or "NEC" strings from the output subset
- OddEven (for Testing): To exclude odd or even numbered CUIs from the output subset
When an import filter is selected, its option tab appears on the Metathesaurus configuration screen.
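The selection logic of the two example filters might look like the Python sketch below. The Filter API itself is not described in this document, so this only mimics the decisions such filters make. The function names are hypothetical. The sketch assumes that "NOS" and "NEC" appear as whole upper-case words and that a CUI is the letter C followed by digits.

```python
def is_nos_or_nec(string):
    """True if the string contains NOS or NEC as a whole upper-case word."""
    for mark in ",()":
        string = string.replace(mark, " ")
    tokens = string.split()
    return "NOS" in tokens or "NEC" in tokens


def cui_is_odd(cui):
    """True if the numeric part of a CUI such as 'C0000039' is odd."""
    return int(cui[1:]) % 2 == 1
```

A NosNec-style filter would drop strings for which `is_nos_or_nec` is true. An OddEven-style filter would drop concepts according to the result of `cui_is_odd`.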
Opens a configuration window which contains the following user capability:
Auto Select Related Items - If this check-box is selected, there is no prompt when the selected row shares a Source Family or has a Dependent Source. The system automatically selects the Dependent Source rows or the rows with the same Source Family. The default for this flag is false.
Advanced Source List Options
Opens a configuration window which contains the following user capabilities:
Enforce Family Selection
If Enforce Family Selection is selected, you will be prompted to select other sources in the same Source Family.
Enforce Dependent Source Selection
If Enforce Dependent Source Selection is selected, and you select a source in the Dependent Source Associations table, you may select any dependent sources listed. As with Enforce Family Selection, this function exists for deselection of sources as well. The default for this flag is true.
This selection also provides the following capabilities:
- Click the Add button to add Source/Dependent Source relationships to the Dependent Source Associations table.
- Click the Clear button to clear the whole table.
- Click on a specific line(s) and press the Delete button to remove the line(s).
- Click the Source or Dependent Source table header to sort the table.
- Click a table header again to reverse the sort order.
- Click the Done button at the bottom of the window to exit the Advanced Options dialog.
Advanced Suppressibility Options (Remove Suppressible Data)
You may specify which of three types of suppressible data to exclude from your customized subsets:
- Source Term Type: groups of terms are marked suppressible by Source/Term Type.
- Editor Assigned: specific terms are marked suppressible by Metathesaurus editors.
- Obsolete: terms identified as Obsolete in their source vocabularies.
Rich Release Format (RRF)
If you select Remove Source Term Type suppressible data, data with a SUPPRESS flag set to Y will be removed.
If you select Remove Editor Assigned suppressible data, data with a SUPPRESS flag set to E will be removed.
If you select Remove Obsolete data, data with a SUPPRESS flag set to O will be removed.
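The mapping between the three options and the SUPPRESS flag values can be sketched in Python as follows. The option keys, function name, and sample rows are hypothetical. Only the flag values Y, E, and O come from the description above.

```python
# SUPPRESS flag value removed by each option, as described above.
FLAG_FOR_OPTION = {
    "source_term_type": "Y",
    "editor_assigned": "E",
    "obsolete": "O",
}


def remove_suppressible(rows, options):
    """Drop rows whose SUPPRESS flag matches any selected removal option.

    rows: list of dicts, each with a "SUPPRESS" key.
    options: iterable of keys from FLAG_FOR_OPTION.
    """
    flags_to_remove = {FLAG_FOR_OPTION[option] for option in options}
    return [row for row in rows if row["SUPPRESS"] not in flags_to_remove]


# Made-up rows; "N" stands for a row that is not suppressible.
rows = [
    {"STR": "term 1", "SUPPRESS": "N"},
    {"STR": "term 2", "SUPPRESS": "Y"},
    {"STR": "term 3", "SUPPRESS": "E"},
    {"STR": "term 4", "SUPPRESS": "O"},
]
kept = remove_suppressible(rows, ["source_term_type", "obsolete"])
```

In this example `kept` contains "term 1" and "term 3", because only the Y and O rows were selected for removal.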
Original Release Format (ORF)
In ORF, all three types of suppressibility are represented by ts=s or ts=p. Selecting only one or two options above will result in a subset that still contains some terms where ts=s or ts=p.
Advanced Semantic Types To Exclude
These options are available when the Semantic Types to Exclude filter has been enabled from the File menu and allow you to set the predicate for concept removal. There are two choices:
- Remove concept with ANY selected Semantic Type - If this option is selected, a concept will be removed if any of its Semantic Types appear on the exclude list.
- Remove concept with ALL selected Semantic Types - If this option is selected, a concept will be removed only if all of its Semantic Types are on the exclude list.
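The difference between the two predicates can be expressed in a few lines of Python. This is a sketch of the rule as described above, with a hypothetical function name. The semantic type names are example values only.

```python
def should_remove(concept_types, excluded_types, predicate="ANY"):
    """Decide whether a concept is removed under the ANY or ALL predicate."""
    concept_types = set(concept_types)
    excluded_types = set(excluded_types)
    if not concept_types:
        # Assumption: a concept with no semantic types is never removed.
        return False
    if predicate == "ANY":
        return bool(concept_types & excluded_types)
    if predicate == "ALL":
        return concept_types <= excluded_types
    raise ValueError("predicate must be 'ANY' or 'ALL'")


concept = {"Disease or Syndrome", "Finding"}
exclude = {"Finding"}
removed_under_any = should_remove(concept, exclude, "ANY")
removed_under_all = should_remove(concept, exclude, "ALL")
```

Here the concept is removed under ANY, because one of its types is excluded, but kept under ALL, because "Disease or Syndrome" is not on the exclude list.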
When all options have been explored and you have completed configuring your Metathesaurus subset, select Done from the menu bar, and then Begin Subset. To save your configuration in order to create a subset at a later time, select Save Configuration from the File menu.
To return to Metathesaurus default selections for all of the tabs (Input Options, Output Options, Source List, Precedence and Suppressibility), use the Reset menu. The default selections are those listed in the mmsys.prop.default file in the config folder. The mmsys.prop.sav file contains the properties used in the last run of MetamorphoSys. Please note: the choice of format, Original Release Format or Rich Release Format, will not be reset on either the Output Options tab or the Input Options tab.
The Install UMLS Metathesaurus progress monitor charts the process through the following steps: Initializing the CUI List, Subsetting Content, Subsetting Indexes, and Final Processes. To stop processing and exit MetamorphoSys at any time, press Cancel at the bottom of the progress monitor. The interrupted process cannot be resumed. The configuration must be recalled (if saved), or recreated (if not saved), and subsetting must be started again.
MetamorphoSys produces an install.log file in the release directory, containing the log of the installation process up to the start of Metathesaurus subsetting. It records which operations were selected, and reports the results of file validations against both CHK and MD5 files. If the downloaded files pass validation, processing continues and subsetting begins. If files fail validation, the install.log is displayed.
When subsetting is complete, progress and error messages and the configuration settings are displayed on the screen and also written to a log file called mmsys.log in the directory containing the subsetted files. The subsetted Metathesaurus files are located in the chosen destination directory.