![]() |
|
| Unified Medical Language System | |
Original Release Format Oracle Load Script
This document is a guide to loading an ORF subset of the UMLS Metathesaurus into an Oracle database by using the scripts and files created by MetamorphoSys. To generate the necessary files and scripts, select the "Write Oracle Load Script" checkbox on the "Output Options" tab of the MetamorphoSys configuration screen..
Recommended Software Version
Oracle 9i Release 2 (Standard Edition, Enterprise Edition) or
Oracle 10g Release 2 (Standard
Edition, Enterprise Edition)
To load your data into an Oracle database, you will
have to either use an existing database or create a new one. The
full complexity of creating an Oracle database is outside the scope of
this document. However, we have included some discussion
regarding this process below and we do have a couple of specific
recommendations. In particular, recommend that you specify UTF8
as
the
CHARACTER SET and
NATIONAL CHARACTER SET and configure the database
to use "character" instead of "byte" semantics by default when
interpreting character field lengths. These settings are
configured through three Oracle parameters:
% setenv ORACLE_SID UMLSDBNote: the SYS password is typically configured when Oracle is installed. Now, you are ready to issue the CREATE DATABASE command. Here is a sample create database command that addresses the character set issues described above.
% sqlplus /nolog
SQL> CONNECT SYS/<sys pwd> AS SYSDBA
SQL> -- create the server parameter file
SQL> CREATE SPFILE='/d1/oracle/dbs/spfileUMLSDB.ora' FROM
SQL> PFILE='/d1/oracle/dbs/init.ora';
SQL> SHUTDOWN
SQL> STARTUP NOMOUNT
SQL> ALTER SYSTEM SET nls_length_semantics='CHAR' scope=BOTH;
CREATE DATABASE umlsdb
USER SYS IDENTIFIED BY
sys_p4ssw0rd
USER SYSTEM IDENTIFIED BY
system_p4ssw0rd
CHARACTER SET UTF8
NATIONAL CHARACTER SET UTF8
LOGFILE GROUP 1 ('/d1/..../redo01.log)
SIZE 500M,
GROUP 2 ('/d1/..../redo02.log) SIZE 500M,
GROUP 3 ('/d1/..../redo02.log) SIZE 500M
MAXLOGFILES 5
MAXLOGMEMBERS 5
MAXLOGHISTORY 1
MAXDATAFILES 100
MAXINSTANCES 1
ARCHIVELOG
EXTENT MANAGEMENT LOCAL
DATAFILE '/d2/....' SIZE 500M
REUSE
SYSAUX DATAFILE '/d2/....' SIZE
500M REUSE
DEFAULT TABLESPACE META
DATAFILE
'/d1/..../meta01.dbf' SIZE 15000M REUSE
EXTENT MANAGEMENT LOCAL
UNIFORM SIZE 10M
DEFAULT TEMPORARY TABLESPACE TEMP
TEMPFILE '/d2/..../temp01.dbf'
SIZE 5000M REUSE
EXTENT MANAGEMENT LOCAL
UNIFORM SIZE 10M
UNDO TABLESPACE UNDO_TBS
DATAFILE
'/d3/..../undotbs01.dbf' SIZE 2000M
REUSE AUTOEXTEND ON MAXSIZE
UNLIMITED
SET TIME_ZONE = 'PST';
One consideration in the create table statement is the size of the main
data tablespace datafile. In the example above, we create a META
tablespace with an initial data file called "meta01.dbf" whose starting
size is 15 GB. We recommend that you either create a very large
data file like this one or configure the tablespace to use
AUTOEXTEND. Ensure your system has enough disk space to
accommodate the full volume of data.
Once the database is created, you should run the admin scripts to
create the data dictionary tables and the structures needed for PL/SQL
procedures. For example,
% setenv ORACLE_SID UMLSDB
% sqlplus /nolog
SQL> CONNECT SYS/<sys pwd>AS SYSDBA
SQL> @$ORACLE_HOME/rdbms/admin/catalog.sql
SQL> @$ORACLE_HOME/rdbms/admin/catproc.sql
With this done, you should now be ready to create a user.
Creating a User
Whether you create a database manually or with the configuration
assistant, the next step is to create your own user. It is very
important that you not load data as the SYS or
SYSTEM user. We recommend that you set the default tablespace for
this user to the META tablespace you created and set the user quota to
UNLIMITED. After creating a user, the next step is to grant this user
the appropriate
appropriate privileges. For example,
% setenv ORACLE_SID UMLSDB
% sqlplus /nolog
SQL> CONNECT SYS/<sys pwd> AS SYSDBA
SQL> CREATE USER umls
SQL> IDENTIFIED BY umls_p4ssw0rd
SQL> DEFAULT TABLESPACE META
SQL> QUOTA UNLIMITED ON META;
SQL> GRANT CONNECT TO umls;
SQL> GRANT RESOURCE TO umls;
Note: this process can also be performed through a GUI interface
with a tool like Oracle Enterprise Manager.
Configuring Parameters for Performance
Following are a number of initialization parameter recommendations
designed for an Oracle 10g system to handle the performance load of
your database. This
guide is not intended to be exhaustive and is not guaranteed to
increase performance of your specific system. Please take the
time to
understand the individual Oracle parameters to maximize the individual
performance of your system.
Windows
If you are running MetamorphoSys in a Windows environment, in your
output directory (after running MetamorphoSys with the "Write Oracle
Load Script" box checked) you should see a batch file script called
"populate_oracle_db.bat". This script can be used to load your
subset
into an Oracle database. The directory will also contain an
SQL script for creating the tables,
an SQL script for creating the indexes, an SQL*Loader control file
(.ctl) for each of the data files to be loaded, and a
script for creating the join of MRCON and MRSO (MRCONSO).
SQL*Loader is the Oracle tool for loading text files into a database
and has a variety of parameters and configuration options (this too is
outside the scope of this document). The control files used by
the script to load each file into an Oracle database table are designed
to be relatively simple and relatively efficient. If you are
interested in more information, you are encouraged to look up the SQL*Loader Control File Reference
in the Oracle manual.
Edit the script and configure the four parameters at the top of the
script. These are:
set ORACLE_HOME=<path to
ORACLE_HOME>
set user=<username>
set password=<password>
set tns_name=<db_name>
ORACLE_HOME should be set to the installation directory of your
local
Oracle client or server. This may be something like
"D:\oracle\product\10.2.0\db_1". After
configuring these four parameters, this section of your script may look
something like the following.
set
ORACLE_HOME=D:\oracle\product\10.2.0\db_1
set user=myuser
set password=p4ssw0rd
set tns_name=UMLSDB
You are now ready to run the script. To do so, navigate to
your subset directory using the Windows explorer and double-click on
the "populate_oracle_db.bat" file you just finished editing.
Here are some things to keep in mind when using this script:
Linux, Macintosh, or Solaris
If you are running MetamorphoSys in a Linux, Macintosh, or Solaris environment, in your output directory (after running MetamorphoSys with the "Write Oracle Load Script" box checked) you should see a .sh shell script called "populate_oracle_db.sh". This script can be used to load your subset into a Oracle database. The directory will also contain an SQL script for creating the tables, an SQL script for creating the indexes, an SQL*Loader control file (.ctl) for each of the data files to be loaded, and a script for creating the join of MRCON and MRSO (MRCONSO).
SQL*Loader is the Oracle tool for loading text files into a database and has a variety of parameters and configuration options (this too is outside the scope of this document). The control files used by the script to load each file into an Oracle database table are designed to be relatively simple and relatively efficient. If you are interested in more information, you are encouraged to look up the SQL*Loader Control File Reference in the Oracle manual.export ORACLE_HOME=<path to
ORACLE_HOME>
user=<username>
password=<password>
tns_name=<tns_name>
ORACLE_HOME should be set to the installation directory of your
local Oracle client or server. This may be something like
"/u01/app/oracle/product/10.2.0". After
configuring these four
parameters, this section of your script may look something like the
following.
export
ORACLE_HOME=/u01/app/oracle/product/10.2.0
user=myuser
password=p4ssw0rd
tns_name=umls
Note: passwords containing $ or # characters may cause problems as
these have special meaning within the shell environment. In this
case, you may want to enclose the password in quotes in the script
itself (e.g. password="pass#word"). You are now ready to run the
script. To do so, navigate to
your
subset directory and invoke the script from the command line. It
may be necessary to change the file permissions to make the script
executable before it can be started. For example:
% cd <subset directory>
% chmod 775 populate_oracle_db.sh
% populate_oracle_db.sh &
Here are some things to keep in mind when using this script:
Sample Queries
We have provided a number of sample queries to demonstrate how to access certain kinds of data.
Get the English preferred name for a UMLS CUI.
SELECT str FROM MRCON
WHERE cui = ? AND stt = 'PF'
AND ts = 'P' AND lat = 'ENG';
Get MRCONSO rows (atoms) for a SAB and CODE.
SELECT * FROM MRCONSO
WHERE sab = ? AND code = ?
ORDER BY cui, lat, lower(ts), lui, stt;
Get all MRCONSO rows (atoms) for a CUI, in precedence order.
SELECT * FROM MRCONSO
WHERE cui = ? ORDER BY cui, lat, lower(ts), lui, stt;
Get all MRCONSO rows (atoms), with STYs, in precedence order.
SELECT a.*, sty FROM MRCONSO a. MRSTY b
WHERE cui = ?
ORDER BY cui, lat, lower(ts), lui, stt;
Perform a norm string search in MRXNS_ENG to get CUI.
SELECT a.* FROM MRCONSO a, MRXNS_ENG b
WHERE a.cui = b.cui and nstr = ? and a.sui = b.sui;
Find all Semantic Types for a particular string.
SELECT sty FROM MRCON
a, MRSTY b WHERE a.cui = b.cui AND str = ?;
Find all contexts for a particular "atom". (Note: this requires creating an ORF subset
with a MRCXT file).
SELECT * FROM MRCXT
WHERE cui = ? AND sui = ?
ORDER BY cui, sui, code, cxn, cxl, rnk;
Find all definitions for a particular string.
SELECT def FROM MRCON a, MRDEF b
WHERE a.cui = b.cui AND str = ?;
Find preferred MUI of a MeSH D#.
SELECT atv AS mui FROM MRSAT a, MRSO b
WHERE a.cui = b.cui
AND a.sui = b.sui
AND b.sab = 'MSH'
AND b.tty = 'MH'
AND atn = 'MUI'
AND b.code = ?;
Last reviewed: 05 July 2007
Last updated: 05 July 2007
First published: 05 July 2007
Metadata| Permanence level: Permanent: Dynamic Content