XML MeSH Documentation and Availability

1. General

XML MeSH contains all currently maintained MeSH data. A list of XML Data Elements is available. Also available is a conversion table which lists ELHILL MeSH and ASCII MeSH elements with the corresponding XML MeSH tags. XML MeSH also includes data which are not in earlier MeSH formats, such as a concept structure similar to that of the UMLS. For more detailed information on the concept structure as well as the XML format, a background narrative is available.

2. Restrictions on use

There is no charge. Use of the XML MeSH file data is subject to conditions which are detailed in the Memorandum of Understanding.

3. Availability

The data for Descriptors and Qualifiers are updated annually, near the end of the calendar year. Users of the data are encouraged to obtain the latest data. A preview of next year's MeSH is released in the fall of the preceding year.

Supplementary Concept Records (formerly Supplementary Chemical Records) are updated daily and released Monday through Friday. They are coordinated with MeSH descriptors so that the data elements that refer to a specific descriptor, such as the <HeadingMappedTo> element, match the same release year.

MeSH Descriptors and Qualifiers, including MeSH Trees, are published only electronically.

4. File format

4.1 Compressed files

Because the XML MeSH files are considerably larger than ASCII MeSH files, the Descriptor and SCR files are available in both compressed format (ZIP and GZ) and full format. The file of Qualifiers is available as the full XML file only.
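
A compressed file does not have to be unpacked to disk before it is read. The following is a minimal Python sketch, not an official NLM tool, that streams a GZ file and counts the records in it. The sample data, the function name count_records, and the <DescriptorRecordSet> wrapper are assumptions made for illustration; only the <DescriptorRecord> tag is taken from this document.

```python
import gzip
import io
import xml.etree.ElementTree as ET

# Made-up miniature stand-in for a compressed Descriptor file.
# Only the DescriptorRecord tag comes from the documentation; the
# wrapper element and the record contents are illustrative assumptions.
SAMPLE_XML = (
    b"<DescriptorRecordSet>"
    b"<DescriptorRecord><Name>First example</Name></DescriptorRecord>"
    b"<DescriptorRecord><Name>Second example</Name></DescriptorRecord>"
    b"</DescriptorRecordSet>"
)


def count_records(fileobj, tag="DescriptorRecord"):
    """Stream a gzip-compressed XML file and count elements named `tag`."""
    count = 0
    with gzip.open(fileobj, "rb") as stream:
        # iterparse yields each element once its end tag has been read,
        # so the whole file never has to be held in memory at once.
        for _event, elem in ET.iterparse(stream, events=("end",)):
            if elem.tag == tag:
                count += 1
                elem.clear()  # drop the record's children after counting
    return count


# Compress the sample in memory so the example is self-contained.
compressed = io.BytesIO(gzip.compress(SAMPLE_XML))
record_count = count_records(compressed)
print(record_count)
```

With a downloaded file, a path string could be passed to count_records in place of the in-memory buffer, since gzip.open accepts either.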

4.2 ASCII and UTF-8

Data in XML MeSH files are encoded in the Unicode character set, specifically UTF-8. Most of the data are in 7-bit ASCII format, i.e., US-ASCII, which is a subset of UTF-8. However, a relatively small number of terms and Annotations contain one or more diacritical characters, for example, "Carbocaïne", a French trade name for the anesthetic Mepivacaine. (Note the small "i" with dieresis, known in French as the tréma.) These characters are coded in UTF-8 format and will be correctly displayed by UTF-8 applications. Applications that assume a different encoding may display them differently. Codings for diacritics in NLM data can be found in the table MEDLINE Character Database.
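
The effect of the encoding on a term such as "Carbocaïne" can be illustrated with a short Python sketch. It uses only the two names mentioned above; the variable names are arbitrary, and Latin-1 is chosen here merely as one example of a non-UTF-8 reader.

```python
# "Carbocaïne" written with an escape for the i with dieresis (U+00EF).
term = "Carboca\u00efne"

# In UTF-8 the ASCII letters take one byte each; the i with dieresis takes two.
encoded = term.encode("utf-8")
print(len(term), len(encoded))
print(encoded)

# A pure US-ASCII term is byte-for-byte identical in ASCII and UTF-8.
plain = "Mepivacaine"
same_bytes = plain.encode("ascii") == plain.encode("utf-8")

# Decoding the UTF-8 bytes as Latin-1 (an assumed wrong choice) turns the
# two-byte sequence into two unrelated characters.
misread = encoded.decode("latin-1")

# Decoding as UTF-8 restores the original term.
roundtrip = encoded.decode("utf-8")
```

This is why the diacritics display correctly only in applications that read the files as UTF-8.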

4.3 XML tagged format

Like all XML data, XML MeSH data consist of text bounded by beginning and end tags specific to each data element. The tag for a Descriptor record, for example, is:

 <DescriptorRecord ...> ... </DescriptorRecord>

An example of a term is:


Each data element or occurrence is contained on a single line, but this is not required by the XML format, which uses the end tag to unambiguously mark the end of the data.
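
That line breaks carry no structural meaning can be checked with a small Python sketch. The <Name> child element and its text are hypothetical and are not taken from the MeSH element list; only <DescriptorRecord> appears in this document.

```python
import xml.etree.ElementTree as ET

# The same record written on one line and spread over several lines.
# <Name> is a hypothetical child element used only for illustration.
ONE_LINE = "<DescriptorRecord><Name>Example heading</Name></DescriptorRecord>"
MULTI_LINE = """<DescriptorRecord>
  <Name>Example heading</Name>
</DescriptorRecord>"""


def child_text(xml_text, tag):
    """Return the text of the first child named `tag`, or None if absent."""
    root = ET.fromstring(xml_text)
    node = root.find(tag)
    if node is None or node.text is None:
        return None
    return node.text.strip()


# The parser relies on the end tags, not on line boundaries,
# so both layouts give the same value.
layouts_agree = child_text(ONE_LINE, "Name") == child_text(MULTI_LINE, "Name")
print(layouts_agree)
```

A program that reads the files with an XML parser, rather than line by line, is therefore unaffected if the layout of the files changes.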

For more detailed information on the XML format of MeSH, a background narrative is available.

5. Contact

For questions concerning distribution, format, etc., contact:

Jacque-Lynne Schulman
Medical Subject Headings
voice: 301-496-1495; FAX: 301-402-2002