XML MeSH, 2012. Documentation and Availability

1. General

XML MeSH contains all currently maintained MeSH data. A list of XML Data Elements is available. Also available is a conversion table which lists ELHILL MeSH and ASCII MeSH elements, with the corresponding XML MeSH tag. XML also includes data which are not in earlier MeSH formats, such as the concept structure similar to the UMLS. For more detailed information on the concept structure as well as the XML format, a background narrative is available.

2. Restrictions on use

There is no charge. Use of the XML MeSH file data is subject to conditions which are detailed in the Memorandum of Understanding.

3. Availability

The data for Descriptors and Qualifiers are updated annually, and users of the data are encouraged to obtain each new year's data.

Supplementary Concept Records (formerly Supplementary Chemical Records) are updated in-house on a daily basis and are released in XML weekly (on Sundays). They are coordinated with 2012 MeSH descriptors, so data elements that refer to a specific descriptor, such as the <HeadingMappedTo> element, have been updated to match a descriptor in 2012 MeSH.

MeSH Descriptors and Qualifiers are also published annually in a printed version, including MeSH Trees.

4. File format

4.1 Compressed files

Because the XML MeSH files are considerably larger than ASCII MeSH files, the Descriptor and SCR files are available in both compressed formats (ZIP and GZ) and full (uncompressed) format. The file of Qualifiers is available only as the full XML file.
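As an illustration of working with the GZ form, the following minimal sketch opens a file as UTF-8 text whether or not it is gzip-compressed. The helper name `open_mesh_text` and the file name `sample_desc.xml.gz` are hypothetical and are not names used by NLM; the tiny file written here merely stands in for a real download.

```python
import gzip
import os
import tempfile


def open_mesh_text(path):
    """Open an XML file as UTF-8 text, transparently handling .gz files.

    Hypothetical helper for illustration; not part of any NLM tooling.
    """
    if path.endswith(".gz"):
        # "rt" makes gzip decompress and decode to text in one step.
        return gzip.open(path, mode="rt", encoding="utf-8")
    return open(path, mode="r", encoding="utf-8")


# Write a tiny stand-in file so the sketch runs without any download.
sample = "<DescriptorRecord>...</DescriptorRecord>\n"
gz_path = os.path.join(tempfile.mkdtemp(), "sample_desc.xml.gz")
with gzip.open(gz_path, mode="wt", encoding="utf-8") as f:
    f.write(sample)

# Read it back through the helper.
with open_mesh_text(gz_path) as f:
    first_line = f.readline()

print(first_line.strip())
```

Reading the compressed file directly avoids keeping a second, much larger uncompressed copy on disk.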

4.2 ASCII and UTF-8

Data in XML MeSH files are encoded in the Unicode character set, specifically UTF-8. Most of the data are 7-bit ASCII (US-ASCII), which is a subset of UTF-8. However, a relatively small number of terms and Annotations contain one or more characters with diacritics, for example "Carbocaïne", a French trade name for the anesthetic Mepivacaine (note the small "i" with dieresis, known in French as the tréma). These characters are encoded in UTF-8 and will be displayed correctly by UTF-8-aware applications; other applications may render them differently. Codings for diacritics in NLM data can be found in the table MEDLINE Character Database.
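The effect of UTF-8 encoding on a term with a diacritic can be seen in a short sketch. It uses the "Carbocaïne" example from the text; the helper name `non_ascii_chars` is hypothetical, and the last line simulates what an application that wrongly assumes Latin-1 would show.

```python
def non_ascii_chars(text):
    """Return (character, UTF-8 bytes) pairs for characters outside US-ASCII."""
    return [(ch, ch.encode("utf-8")) for ch in text if ord(ch) > 127]


# "\u00ef" is the precomposed "i with dieresis" from the example above.
term = "Carboca\u00efne"
encoded = term.encode("utf-8")

# The term has 10 characters but 11 bytes: the accented letter takes two bytes.
print(len(term), len(encoded))
print(non_ascii_chars(term))

# Decoding the same bytes with the wrong character set garbles the term.
misread = encoded.decode("latin-1")
print(misread)
```

Pure US-ASCII terms produce an empty list from `non_ascii_chars`, which is why most of the data look the same in ASCII and UTF-8 applications.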

4.3 XML tagged format

Like all XML data, XML MeSH data consist of text bounded by beginning and end tags specific to each data element. The tag for a Descriptor record, for example, is:

 <DescriptorRecord ...> ... </DescriptorRecord>

An example of a term is:


Each data element or occurrence is contained on a single line, but this is not required by the XML format, which uses the end tag to mark the end of the data unambiguously.

For more detailed information on the XML format of MeSH, a background narrative is available.
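Because records are delimited by tags rather than by line breaks, a streaming XML parser can process the large files one record at a time. The following is a minimal sketch under stated assumptions: only `<DescriptorRecord>` is taken from the text above, while the enclosing `<DescriptorRecordSet>` and the child elements `<DescriptorUI>` and `<DescriptorName>/<String>` are assumed here for illustration and should be checked against the list of XML Data Elements. The identifiers and headings in the sample are invented.

```python
import io
import xml.etree.ElementTree as ET

# Invented sample data; element names other than DescriptorRecord are assumptions.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<DescriptorRecordSet>
  <DescriptorRecord>
    <DescriptorUI>X0000001</DescriptorUI>
    <DescriptorName><String>Example Heading A</String></DescriptorName>
  </DescriptorRecord>
  <DescriptorRecord>
    <DescriptorUI>X0000002</DescriptorUI>
    <DescriptorName>
      <String>Example Heading B</String>
    </DescriptorName>
  </DescriptorRecord>
</DescriptorRecordSet>
"""


def iter_descriptors(source):
    """Yield (identifier, name) for each DescriptorRecord in an XML stream."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        # The "end" event fires once the closing tag has been read,
        # regardless of how the record is split across lines.
        if elem.tag == "DescriptorRecord":
            yield elem.findtext("DescriptorUI"), elem.findtext("DescriptorName/String")
            elem.clear()  # release the record's children to limit memory use


records = list(iter_descriptors(io.BytesIO(SAMPLE)))
for ui, name in records:
    print(ui, name)
```

The second sample record spans several lines on purpose: the parser relies on the end tag, not the line layout, to find the end of each element.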

5. Contents of files - 2012 MeSH

Files updated weekly. Counts as of September 9, 2011.

Record Type         Total records   Total terms   File size¹   Bytes¹        ZIP/GZ
Descriptors         26,581          209,237       288MB        301,810,673   15MB
Qualifiers          83              ***           476KB        487,012       ***
Concept Records²    203,012         506,395³      638MB        669,171,583   39MB

ZIP, GZ = compressed formats.

All sizes apply to files on the NLM Unix file server. (So byte counts do not include characters for CR or EOF.)

1 Uncompressed.
2 Formerly Supplementary Chemical Records.
3 Total SCR terms as of September 22, 2011.
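The table does not state whether MB and KB are decimal or binary units. A quick arithmetic check, sketched below with the byte counts copied from the table, suggests that the rounded sizes correspond to binary units (1 MB = 2^20 bytes, 1 KB = 2^10 bytes); this is an inference from the numbers, not something the documentation states.

```python
# Uncompressed byte counts copied from the table above.
BYTES = {
    "Descriptors": 301_810_673,
    "Qualifiers": 487_012,
    "Concept Records": 669_171_583,
}

# Convert using binary units and round to the nearest whole unit.
descriptors_mb = round(BYTES["Descriptors"] / 2**20)
qualifiers_kb = round(BYTES["Qualifiers"] / 2**10)
concept_mb = round(BYTES["Concept Records"] / 2**20)

# The same count in decimal megabytes, for comparison.
descriptors_decimal_mb = round(BYTES["Descriptors"] / 10**6)

print(descriptors_mb, qualifiers_kb, concept_mb, descriptors_decimal_mb)
```

Comparing a downloaded file's size against the byte counts is a simple integrity check, keeping in mind the note above that the counts were taken on a Unix file server.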

6. Contacts

For questions concerning the content of XML MeSH, contact:

Stuart Nelson, M.D.
Head, Medical Subject Headings
National Library of Medicine
6701 Democracy Blvd, Suite 202
Bethesda, MD 20894

voice: 301-496-1495; FAX: 301-402-2002

For questions concerning distribution, format, etc., contact:

Jacque-Lynne Schulman
Medical Subject Headings
voice: 301-496-1495; FAX: 301-402-2002