Skip Navigation Bar
 

Unified Medical Language System® (UMLS®)

Rich Release Format MySQL Load Script

This document is a guide to loading an RRF subset of the UMLS Metathesaurus into a MySQL database by using the scripts and files created by MetamorphoSys. To generate the necessary files and scripts while creating your subset, select the MySQL database in the “Write Database Load Scripts” section of the "Output Options" tab. You may also generate these files after creating your subset: Select "Advanced" then "Copy Database Load Scripts to Hard Drive" from the MetamorphoSys welcome screen.

Recommended Software Version: MySQL Server 5.5

Creating a Database

In order to load your subset into a MySQL database, you will have to either use an existing database or create a new one. Two important considerations when creating a MySQL database are the default character set and collation settings. We recommend using UTF8 and the utf8_unicode_ci collation setting. For example:

CREATE DATABASE IF NOT EXISTS umls CHARACTER SET utf8 COLLATE utf8_unicode_ci;

Configuring Parameters for Performance

To optimize for read performance, the MySQL 5.5 server requires changing buffer sizes to make use of the memory available. Each parameter is subject to the amount of memory on each server and should be sized appropriately in the my.cnf or my.ini file

  • The key_buffer parameter stores indexes in memory. This buffer should be large enough to hold all indexes. To allocate a buffer of 600MB use:

    key_buffer = 600M
  • The table_cache parameter informs MySQL about how many tables it can have open at any given time. Make sure this parameter is not set to high for MySQL may run out of file descriptors and refuse connections or fail to execute queries. To allocate a cache of 300 use:

    table_cache = 300
  • The sort_buffer_size parameter determines how much area in memory MySQL has to sort data before they are returned.To allocate a buffer size of 500MB use:

    sort_buffer_size = 500M
  • The read_buffer_size parameter determines that amount of memory to allocate for each sequential scan of a table. To allocate a buffer size of 200MB use:

    read_buffer_size = 200M
  • The query_cache_limit parameter set the maximum size of each query that will be cached.  If the query results exceed the query cache limit, the results or data will not be cached. The default is 1M.

    query_cache_limit = 3M
  • The query_cache_size parameter sets the amount of memory globally available for the query cache. This should be adjusted upward as the database increases in size. The default size is 0, so the query cache is disabled by default.

    query_cache_size = 100M
  • The myisam_sort_buffer_size parameter is the buffer size for creating indexes on myisam tables. For the table load, this setting is important. To allocate a buffer size of 200M, use:

    myisam_sort_buffer_size = 200M
  • The bulk_insert_buffer_size parameter caches data on bulk inserts to increase performance. To allocate a buffer of size 100M, use:

    bulk_insert_buffer_size = 100M
  • The join_buffer_size parameter is used for certain kinds of non-index table joins. Increasing the size of this parameter makes certain queries faster. To allocate a buffer of size 100M, use:

    join_buffer_size = 100M

Configuring and Running the Load Script

Windows

If you are running MetamorphoSys in a Windows environment, in your output directory (after running MetamorphoSys with MySQL selected in the “Write Database Load Scripts” section of the "Output Options" tab) you should see a batch file script called "populate_mysql_db.bat".  This script can be used to load your subset into a MySQL database. The directory will also contain an SQL script for creating the tables, and a SQL script for creating the indexes.

Edit the script and configure the four parameters at the top of the script. These are:

set MYSQL_HOME=<path to MYSQL_HOME>
set user=<username>
set password=<password>
set db_name=<db_name>

MYSQL_HOME should be set to the installation directory of your local MySQL server. This is likely to be something like "C:\Program Files\MySQL\MySQL Server 5.5". Remember to enclose the value in quotes if the directory name contains spaces. After configuring these four parameters, this section of your script may look something like the following:

set MYSQL_HOME="C:\Program Files\MySQL\MySQL Server 5.5"
set user=myuser
set password=p4ssw0rd
set db_name=umls

You are now ready to run the script. To do so, navigate to your subset directory using the Windows explorer and double-click on the "populate_mysql_db.bat" file you just finished editing.

Here are some things to keep in mind when using this script:

  1. The script is written to expect that there is a local MySQL Server installation and that you plan to load your UMLS Metathesaurus subset into a database being managed by this local server.   Before getting this far, you should have already installed and configured your MySQL Server.   For more information on that, you are encouraged to visit http://www.mysql.com.
  2. If your database is configured without a password you can leave the password setting blank, but you will have to update the script "populate_mysql_db.bat" and remove all references to the "-p%password%" parameter.  Thus, a line like this:

    %MYSQL_HOME%\bin\mysql -u %user% -p%password% --local-infile=1 %db_name% < mysql_tables.sql >> mysql.log 2>&1

    Must be changed to this

    %MYSQL_HOME%\bin\mysql -u %user% --local-infile=1 %db_name% < mysql_tables.sql >> mysql.log 2>&1
  3. The script will automatically create a log file called "mysql.log" in the same directory as the script.  If you are planning on running the script more than once, you should consider renaming the log file before your second run, otherwise the contents of the first run will be lost.
  4. While the script is running, you can monitor the process from within the MySQL command shell by using "mysql> show processlist;"
  5. The load may take many hours (especially if you have selected a large subset). When it completes, each of the tables will be loaded, and an initial set of indexes will have been created.

Linux, Macintosh, or Solaris

If you are running MetamorphoSys in a Linux, Macintosh, or Solaris environment, in your output directory (after running MetamorphoSys with MySQL selected in the “Write Database Load Scripts” section of the "Output Options" tab) you should see a .sh shell script called "populate_mysql_db.sh".  This script can be used to load your subset into a MySQL database. The directory will also contain a SQL script for creating the tables, and an SQL script for creating the indexes.

Edit the script and configure the four parameters at the top of the script. These are:

MYSQL_HOME=<path to MYSQL_HOME>
user=<username>
password=<password>
db_name=<db_name>

MYSQL_HOME should be set to the installation directory of your local MySQL server. Another way to think of this is that the setting of MYSQL_HOME should contain a bin/ directory that contains the "mysql" command. This may even be something like "/usr". After configuring these four parameters, this section of your script may look something like the following.

MYSQL_HOME=/usr
user=myuser
password=p4ssw0rd
db_name=umls

You are now ready to run the script. To do so, navigate to your subset directory and invoke the script from the command line.It may be necessary to change the file permissions to make the script executable before it can be started. For example:

% cd <subset directory>
% chmod 775 populate_mysql_db.sh
% populate_mysql_db.sh &

Here are some things to keep in mind when using this script:

  1. The script is written to expect that there is a local MySQL Server installation and that you plan to load your UMLS Metathesaurus subset into a database being managed by this local server. Before getting this far, you should have already installed and configured your MySQL Server. For more information on that, you are encouraged to visit http://www.mysql.com.
  2. If your database is configured without a password you can leave the password setting blank, but you will have to update the script "populate_mysql_db.bat" and remove all references to the "-p%password%" parameter. Thus, a line like this:

    $MYSQL_HOME/bin/mysql -u $user -p$password $db_name < mysql_tables.sql >> mysql.log 2>&1

    Must be changed to this

    $MYSQL_HOME/bin/mysql -u $user $db_name < mysql_tables.sql >> mysql.log 2>&1
  3. The script will automatically create a log file called "mysql.log" in the same directory as the script.  If you are planning on running the script more than once, you should consider renaming the log file before your second run, otherwise the contents of the first run will be lost.
  4. While the script is running, you can monitor the process from within the MySQL command shell by using "mysql> show processlist;"
  5. The load may take many hours (especially if you have selected a large subset). When it completes, each of the tables will be loaded, and an initial set of indexes will have been created.

Sample Queries

Sample queries are available from the UMLS Database Query Diagrams Web page.