Medical Subject Headings
XML MeSH, 2010. Documentation and Availability
XML MeSH contains all currently maintained MeSH data. A list of XML Data Elements is available. Also available is a conversion table which lists ELHILL MeSH and ASCII MeSH elements with the corresponding XML MeSH tags. XML MeSH also includes data that are not in earlier MeSH formats, such as a concept structure similar to that of the UMLS. For more detailed information on the concept structure as well as the XML format, a background narrative is available.
2. Restrictions on use.
There is no charge. Use of the XML MeSH file data is subject to conditions which are detailed in the Memorandum of Understanding.
The data for Descriptors and Qualifiers are updated annually, and users of the data are encouraged to obtain the new year's data.
Supplementary Concept Records (formerly Supplementary Chemical Records) are updated in-house on a daily basis and are released in XML weekly (Sunday). They are coordinated with 2010 MeSH descriptors, so that data elements that refer to a specific descriptor, such as the <HeadingMappedTo> element, have been updated to match a descriptor in 2010 MeSH.
MeSH Descriptors and Qualifiers are also published annually in a printed version, including MeSH Trees.
4. File format
4.1 Compressed files
Because the XML MeSH files are considerably larger than ASCII MeSH files, the Descriptor and SCR files are available in both compressed format (ZIP and GZ) and full format. The file of Qualifiers is available only as the full XML file.
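As a minimal sketch of how a program might read any of these formats without unpacking the file first, the Python helper below chooses an opener from the file extension. The function name open_mesh_xml and the assumption that a ZIP archive holds a single XML member are illustrative and not part of the NLM distribution.

```python
import gzip
import zipfile
from pathlib import Path


def open_mesh_xml(path):
    """Return a binary file object for a full (.xml), .gz, or .zip file.

    Hypothetical helper: the format is chosen from the file extension only.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".gz":
        # gzip.open decompresses on the fly as the file is read.
        return gzip.open(path, "rb")
    if suffix == ".zip":
        archive = zipfile.ZipFile(path)
        # Assumption: the archive contains exactly one XML member.
        member = archive.namelist()[0]
        return archive.open(member)
    # Full (uncompressed) format.
    return open(path, "rb")
```

The returned object can be passed directly to an XML parser, so the same downstream code works for the compressed and the full files.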
4.2 ASCII and UTF-8
Data in XML MeSH files are encoded in the Unicode character set, specifically UTF-8. Most of the data are in 7-bit ASCII format, i.e., US-ASCII, which is a subset of UTF-8. However, a relatively small number of terms and Annotations contain one or more diacritical characters, for example, "Carbocaïne", a French trade name for the anesthetic Mepivacaine. (Note the small "i" with dieresis, known in French as the tréma.) These are coded in UTF-8 format and will be correctly displayed by UTF-8 applications. In applications that assume a different encoding, they may be displayed incorrectly. Codings for diacritics in NLM data can be found in the table MEDLINE Character Database.
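The effect of the encoding can be illustrated with a short Python sketch that uses the term quoted above. The helper non_ascii_chars is a hypothetical name introduced here; it simply lists the characters in a string that fall outside US-ASCII.

```python
# "Carbocaïne", written with an escape for the i with dieresis (U+00EF).
term = "Carboca\u00efne"

encoded = term.encode("utf-8")
print(len(term))     # number of characters
print(len(encoded))  # number of bytes; the non-ASCII letter takes two
print(encoded)

# Decoding with the right encoding restores the original term.
decoded = encoded.decode("utf-8")

# Decoding the same bytes as Latin-1 shows the kind of garbling that a
# non-UTF-8 application may display.
mojibake = encoded.decode("latin-1")
print(mojibake)


def non_ascii_chars(text):
    """Return the sorted set of characters outside 7-bit US-ASCII."""
    return sorted({ch for ch in text if ord(ch) > 127})
```

Running non_ascii_chars over each term is one simple way to find the small number of records that need UTF-8-aware handling.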
4.3 XML tagged format
Like all XML data, XML MeSH data consist of text bounded by beginning and end tags specific to each data element. For example, the tags for a Descriptor record are:
<DescriptorRecord ...> ... </DescriptorRecord>
An example of a term is:
Each data element or occurrence is contained on a single line, but this is not required by the XML format, which uses the end tag to mark the end of the data unambiguously.
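The following Python sketch shows one way to read tagged records with the standard library, and that line breaks do not affect the result. Only the DescriptorRecord tag comes from the text above; the enclosing DescriptorRecordSet element, the DescriptorUI and DescriptorName/String paths, and the record values are assumptions made for illustration and should be checked against the list of XML Data Elements.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample record, not real MeSH data.
SAMPLE_MULTI_LINE = b"""<DescriptorRecordSet>
<DescriptorRecord>
<DescriptorUI>D999999</DescriptorUI>
<DescriptorName><String>Example Heading</String></DescriptorName>
</DescriptorRecord>
</DescriptorRecordSet>"""

# The same record with every line break removed.
SAMPLE_ONE_LINE = SAMPLE_MULTI_LINE.replace(b"\n", b"")


def iter_descriptors(source):
    """Yield (identifier, name) pairs, one per DescriptorRecord element."""
    # iterparse reads the input incrementally, which suits large files.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "DescriptorRecord":
            ui = elem.findtext("DescriptorUI")
            name = elem.findtext("DescriptorName/String")
            yield ui, name
            # Release the children of the finished record to limit memory use.
            elem.clear()


multi = list(iter_descriptors(io.BytesIO(SAMPLE_MULTI_LINE)))
single = list(iter_descriptors(io.BytesIO(SAMPLE_ONE_LINE)))
print(multi)
print(multi == single)
```

Because the parser relies on the end tags rather than on line boundaries, both layouts produce the same records.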
For more detailed information on the XML format of MeSH a background narrative is available.
5. Contents of files. 2010 MeSH.
ZIP, GZ = compressed formats.
All sizes apply to files on the NLM Unix file server, so byte counts do not include characters for CR or EOF.
Supplementary Concept Records were formerly called Supplementary Chemical Records.
6. For questions concerning the content of XML MeSH, contact:
Stuart Nelson, M.D.
Head, Medical Subject Headings
National Library of Medicine
8600 Rockville Pike
Bethesda, MD 20894
voice: 301-496-1495; FAX: 301-402-2002
For questions concerning distribution, format, etc., contact:
Jacque-Lynne Schulman
Medical Subject Headings
voice: 301-496-1495; FAX: 301-402-2002
email: email@example.com