
Medical Subject Headings

XML MeSH, 2014. Documentation and Availability

1. General

XML MeSH contains all currently maintained MeSH data. A list of XML Data Elements is available. Also available is a conversion table which lists ELHILL MeSH and ASCII MeSH elements with the corresponding XML MeSH tags. XML MeSH also includes data that are not in earlier MeSH formats, such as a concept structure similar to that of the UMLS. For more detailed information on the concept structure as well as the XML format, a background narrative is available.

2. Restrictions on use

There is no charge. Use of the XML MeSH file data is subject to conditions which are detailed in the Memorandum of Understanding.

3. Availability

The data for Descriptors and Qualifiers are updated annually, and users of the data are encouraged to obtain each new year's data.

Supplementary Concept Records (formerly Supplementary Chemical Records) are updated in-house on a daily basis and are released in XML weekly (Sunday). They are coordinated with 2014 MeSH descriptors, so the data elements that refer to a specific descriptor, such as the <HeadingMappedTo> element, have been updated to match a descriptor in 2014 MeSH.

MeSH Descriptors and Qualifiers are also published annually in a printed version, including MeSH Trees.

4. File format

4.1 Compressed files

Because the XML MeSH files are considerably larger than ASCII MeSH files, the Descriptor and SCR files are available in both compressed (ZIP and GZ) format and full format. The file of Qualifiers is available only as the full XML file.
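
Because the full files run to hundreds of megabytes, it can be convenient to read the GZ version directly rather than unpack it first. The following is a minimal sketch in Python, not an NLM-supplied tool. The function name count_records and the path in the usage note are hypothetical; the record tag defaults to the <DescriptorRecord> element described in section 4.3.

```python
import gzip
import xml.etree.ElementTree as ET


def count_records(path, record_tag="DescriptorRecord"):
    """Count elements named record_tag in a gzip-compressed XML file.

    The file is decompressed as a stream, so it is never written to
    disk in full format.
    """
    count = 0
    # Binary mode lets the XML parser handle the character encoding itself.
    with gzip.open(path, "rb") as stream:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            if elem.tag == record_tag:
                count += 1
                elem.clear()  # drop the finished record's children to save memory
    return count
```

A call such as count_records("descriptors.xml.gz"), where the path is whatever name the downloaded GZ file was saved under, would return the number of Descriptor records in the file.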

4.2 ASCII and UTF-8

Data in XML MeSH files are encoded in the Unicode character set, specifically UTF-8. Most of the data are in 7-bit ASCII, i.e., US-ASCII, which is a subset of UTF-8. However, a relatively small number of terms and Annotations contain one or more characters with diacritics, for example, "Carbocaïne", a French trade name for the anesthetic Mepivacaine. (Note the small "i" with dieresis, known in French as the tréma.) These are coded in UTF-8 and will be displayed correctly by UTF-8 applications. Otherwise, they may appear differently in different displays. Codings for diacritics in NLM data can be found in the table MEDLINE Character Database.
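
The relationship between ASCII and UTF-8 can be illustrated with the term quoted above. This is a small Python sketch for illustration only; Latin-1 is chosen here simply as one example of a non-UTF-8 character set that an application might wrongly assume.

```python
term = "Carbocaïne"  # the "ï" is U+00EF, small i with dieresis

encoded = term.encode("utf-8")
print(len(term), len(encoded))  # 10 characters, 11 bytes
print(encoded[7:9].hex())       # c3af: the two bytes that encode the "ï"

# Pure 7-bit ASCII text is byte-for-byte identical in UTF-8.
assert "Mepivacaine".encode("ascii") == "Mepivacaine".encode("utf-8")

# Reading the same bytes with the wrong character set garbles only
# the non-ASCII character; the ASCII letters are unaffected.
misread = encoded.decode("latin-1")
print(misread)                  # CarbocaÃ¯ne
```

Only the character with the diacritic occupies more than one byte, which is why files that are read as ASCII or Latin-1 look correct almost everywhere and wrong only in the few terms of this kind.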

4.3 XML tagged format

Like all XML data, XML MeSH data consist of text bounded by beginning and end tags specific to each data element. The tag for a Descriptor record, for example, is:

 <DescriptorRecord ...> ... </DescriptorRecord>

An example of a term is:

 <String>Heart</String>

Each data element or occurrence is contained on a single line, but this is not required by the XML format, which uses the end tag to mark the end of the data unambiguously.
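
As a sketch of why the line layout does not matter to an XML parser, the Python fragment below parses the same data written on one line and on several lines. The two-element fragment is a deliberately simplified illustration built from the tags shown above, not the full structure of a real Descriptor record, and the helper name terms is hypothetical.

```python
import xml.etree.ElementTree as ET

# The same data with and without line breaks between elements.
ONE_LINE = "<DescriptorRecord><String>Heart</String></DescriptorRecord>"
MULTI_LINE = """<DescriptorRecord>
  <String>Heart</String>
</DescriptorRecord>"""


def terms(xml_text):
    """Return the text of every <String> element in the fragment."""
    root = ET.fromstring(xml_text)
    return [elem.text for elem in root.iter("String")]


print(terms(ONE_LINE))    # ['Heart']
print(terms(MULTI_LINE))  # ['Heart']
```

The parser relies on the begin and end tags alone. Note that a line break placed inside an element, between <String> and </String>, would become part of that element's text, so this equivalence applies to breaks between elements.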

For more detailed information on the XML format of MeSH, a background narrative is available.

5. Contents of files - 2014 MeSH

Files updated weekly. Counts as of September 9, 2013.

Record Type                        Total Records  Total Terms  File size (1)  Bytes (1)    ZIP/GZ file
Descriptors                        27,149                      294MB          308,280,313  16MB
Qualifiers                         83             ***          468KB          473,328      ***
Supplementary Concept Records (2)  218,817                     677MB          709,642,214  40MB

ZIP, GZ = compressed formats.

All sizes apply to files on the NLM Unix file server. (So byte counts do not include characters for CR or EOF.)

1 Uncompressed.
2 Formerly Supplementary Chemical Records.

6. Contact

For questions concerning distribution, format, etc., contact:

Jacque-Lynne Schulman
Medical Subject Headings
voice: 301-496-1495; FAX: 301-402-2002
email: schulman@nlm.nih.gov