Medical Subject Headings

Citation Maintenance tasks in XML format

1. The need for citation maintenance - MeSH changes

The Global Citation Maintenance (GCM) data in XML format make available the annual changes that NLM makes in the MeSH indexing of citations in PubMed and distributed MEDLINE. Users of other systems that use MeSH for subject indexing may also find the GCM data helpful for their indexed documents, but they must be aware of relevant differences from the NLM database. For example, the searches required by Manual tasks are specific to PubMed syntax.

The MeSH vocabulary is updated annually. The primary goal of citation maintenance is to ensure that the existing MeSH indexing of citations is consistent with the current version of the MeSH vocabulary while retaining the intent of the existing indexing. Changes in MeSH that may affect citations are: (a) deletions of MeSH headings, and (b) changes in the preferred term of a MeSH heading. Indexing terms that have been deleted or replaced in the MeSH vocabulary must themselves be removed or replaced in the citation in order to remain consistent with MeSH. Citation maintenance is concerned with how to replace the old reference appropriately.

Citations or other documents indexed with MeSH are usually indexed either by the MeSH term or by the MeSH Unique Identifier (UI), which refers to a MeSH vocabulary record. The GCM data are intended to provide sufficient information to allow systems using either terms or UIs to be updated correctly.

In the past, the MeSH Section has made available lists of Deleted Headings (deleted Descriptor records) and Replaced Headings (changes in Descriptor preferred terms). However, no information has been available for changes in Supplementary Concept Records (SCRs), nor has more detailed record information, such as the unique identifier, been included.

Citation maintenance is accomplished by "tasks" - database transactions which make a specific change in the indexing of a set of citations. See section 3 for types of tasks and section 6 for a detailed description of task elements. An essential feature of executing these tasks is the relative order among tasks of different types, as well as the order of the required citation queries. See section 4 for a fuller account of the sequence of tasks and queries. A chart is also available which represents the maintenance procedure graphically.

2. Availability

GCM data represent annual changes in the MeSH vocabulary which are available in MEDLINE by January of each year. Annual changes in Supplementary Concept Records (SCRs), especially changes affecting Descriptors, are included in the data, though SCR changes made regularly throughout the year are not currently included.

3. Types of maintenance tasks

For a detailed explanation of the GCM file format, see sections 5 and 6, below. The format of the task data, and how they are to be used, depends on the type of task, as explained in the following subsections.

3.1 Updating the indexing - the MeSH preferred term and UI

Indexing with MeSH headings consists in the assignment to a citation of a reference to a MeSH Descriptor, Qualifier, or Supplementary Concept Record (SCR). The reference may be either: (a) the preferred term in the record, for example, 'Heart Arrest', or (b) an alphanumeric unique identifier (UI) for the MeSH record, for example, 'D006323'. Citations in NLM's Medline XML, for example, use the preferred term in the <MeshHeading>, <NameOfSubstance>, and <QualifierName> elements. Other systems may index with only the MeSH UI and not the MeSH term. To accommodate both types of indexing, the GCM data include both a MeSH UI and the corresponding preferred term for every update action.

Specific "tasks" or transactions are created to change the MeSH indexing in a citation. A task either: (a) replaces an existing MeSH reference with another, (b) adds a reference, or (c) deletes a reference.
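The three kinds of change can be sketched as follows. This is a minimal illustration only: it assumes a citation's indexing is held as a plain list of MeSH UIs, and the `apply_task` helper, its parameters, and the removal of duplicates are assumptions of this sketch, not part of the GCM distribution.

```python
def apply_task(mesh_uis, action, existing_ui=None, new_ui=None):
    """Return a new list of MeSH UIs with one maintenance action applied.

    Hypothetical helper: 'mesh_uis' stands for the MeSH references of a
    single citation; real storage depends on the local system.
    """
    result = list(mesh_uis)
    if action == "Replace":
        # Swap the old reference for the new one, keeping its position.
        result = [new_ui if ui == existing_ui else ui for ui in result]
    elif action == "Add":
        result.append(new_ui)
    elif action == "Delete":
        result = [ui for ui in result if ui != existing_ui]
    else:
        raise ValueError(f"unknown action: {action!r}")
    # Assumption of this sketch: drop duplicates that a Replace or Add
    # may have introduced, keeping first-seen order.
    return list(dict.fromkeys(result))


# Methanogens (D008699) replaced by Euryarchaeota (D019605).
citation = ["D008699", "D006323"]
print(apply_task(citation, "Replace", existing_ui="D008699", new_ui="D019605"))
```

A term-indexed system could use the same shape with preferred terms in place of UIs.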

3.2 Main types of tasks

Maintenance tasks are divided into three categories that reflect the source of the task. The category affects the order in which the task is executed and its scope.

  • Preferred Term changes

    When the preferred term in a MeSH record has changed, indexing by MeSH term must be replaced by the new preferred term. For example, in 2005 MeSH the preferred term for the heading Myocardial Diseases was changed to 'Cardiomyopathies'. This is essentially a name change and is usually the most transparent of indexing changes.

    Preferred term tasks are applied to every citation in the database and always replace an existing preferred term with a different preferred term.

  • "Automatic" tasks - algorithmic replacements

    When a MeSH record is deleted, references to the record are usually replaced with references to a different MeSH record. For example, in 2005 MeSH the Descriptor record for Methanogens (UI = D008699) was deleted. Existing citation references were replaced with references to another record, Euryarchaeota (UI = D019605). These tasks are called automatic because the replacement is determined by algorithm, though the replacement is originally specified by the MeSH subject specialist when the MeSH record is deleted.

    Automatic tasks are applied to every citation in the database and either replace an existing value with a new value, or delete the old value altogether.

    Note that the result of applying Automatic tasks is that every MeSH record referenced in the citations is valid in the new MeSH year. Combined with the application of Preferred Term changes, the result is that all citation references to MeSH records are valid MeSH terms or UIs for the new MeSH year. (This assumes that citation references prior to maintenance were valid for the previous MeSH year.)

  • "Manual" tasks - case by case changes, requiring a search

    This type of task is called "Manual" because a MeSH specialist determines the proper maintenance on a case-by-case basis. Manual tasks are often used to refine the results of a previously-run Automatic task. For this reason, Manual tasks must be run after Automatic tasks. (Thus a Manual task may apply to data introduced by a previously run Automatic task.)

    While Automatic and Preferred Term tasks are applied to every citation in the database, Manual tasks apply only to citations identified by searches in GCM_SEARCH.XML. A Manual task may replace an existing value with a new value, but may also just add a value or just delete a value. Manual tasks are not essential for preserving valid MeSH references, but they are necessary for preserving the intent of the existing indexing.

4. Order of tasks and queries

The order in which the tasks and queries must be performed can be critical because a task or query may be affected by a previous task. This is especially true when the indexing is done with MeSH terms rather than by MeSH Unique Identifiers (UIs), since terms may be changed without a change in UI.

4.1 Queries for Manual tasks are run before maintenance.

Whether indexing by MeSH term or UI, if Manual tasks are to be used, the queries for the Manual tasks must be independent of later maintenance tasks. This is because the queries used to restrict the application of Manual tasks refer to MeSH terms in the previous year's MeSH, and so could be affected by either the Automatic tasks or the Preferred Term changes implemented after the queries are formulated. There are at least two ways to achieve this independence. NLM uses the first method.

  1. Save citation identifiers for later Manual tasks.

    One way to implement this is to save the citation identifiers which are retrieved by the search, mapped to a given <MTaskID>. These may range in number from a handful to hundreds. Then when Manual tasks are run, they apply to the citation identifiers associated with that <MTaskID>. NLM uses this method.

  2. Preserve parallel unmaintained citations for Manual tasks.

    An alternative is to create two copies of the citation database - the first of which is not maintained, and the second of which is maintained. Then run the search statements for Manual tasks against the first, non-maintained, database, but apply the maintenance to the second, maintained database. This obviates the need to create special storage for citation references, but requires a duplicate database.
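The first method (saving citation identifiers per Manual task before any maintenance runs) might look like the sketch below. The `collect_citation_ids` function, the `run_search` callback, the lookup table standing in for a search engine, the task ID M1108, and the second search string are all hypothetical and are used only for illustration.

```python
def collect_citation_ids(searches, run_search):
    """Map each Manual task ID to the citation IDs its search retrieves.

    'searches' maps MTaskID -> search string (as in GCM_SEARCH.XML);
    'run_search' is a caller-supplied function (hypothetical here) that
    runs a search against the not-yet-maintained database and returns IDs.
    """
    return {task_id: set(run_search(query)) for task_id, query in searches.items()}


# Stand-in for a real search engine: a fixed lookup table, illustration only.
fake_index = {"biota [nm]": [101, 102], "heart arrest [mh]": [102, 200]}

saved = collect_citation_ids(
    {"M1107": "biota [nm]", "M1108": "heart arrest [mh]"},
    lambda query: fake_index.get(query, []),
)
print(sorted(saved["M1107"]))
```

When the Manual tasks are run later, each one would be applied only to the identifiers stored under its own task ID.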

4.2 Automatic tasks

Automatic tasks are the principal maintenance tasks and the first tasks to be done. Manual tasks are run after the Automatic and Preferred Term tasks because the Manual tasks are written to supplement or adjust those results. The order among Automatic tasks does not matter since one Automatic task cannot impact another Automatic task - the maintained-to Descriptor cannot be a deleted record.

4.3 Preferred Term tasks - run after Automatic tasks but before Manual tasks

When updating indexing by term, rather than indexing by UI, it is possible for a Preferred Term task to impact an Automatic task. Therefore, Preferred Term tasks must be run after Automatic tasks.

However, Manual tasks are written with the expectation that Automatic and Preferred Term tasks have already been run. Therefore, Preferred Term tasks must be completed before Manual tasks.

As noted earlier, changes in the MeSH preferred term are implemented only for systems that index by MeSH term rather than MeSH Unique Identifier (UI). However, systems that index with MeSH UI must have available a database of MeSH terms for the new MeSH year in order to display or otherwise produce the appropriate preferred term.

4.4 Manual tasks - run after Preferred Term tasks

Manual tasks are usually created to supplement Automatic tasks. They are written with the assumption that the Automatic tasks have already run, and so are always run against the citation database after the Automatic tasks. For similar reasons, Manual tasks are run after Preferred Term tasks.

4.5 Summing up the order of processing

The following table summarizes the steps required for updating a term-indexed database. The processing will be the same for UI-indexed databases except that step (3) - PrefTerm tasks - will not be applicable. A chart is also available which represents the maintenance procedure graphically.

| Process | Description | Sequence |
|---|---|---|
| 1. Queries for Manual tasks | Retrieve sets of citations to be used to specify the range of Manual tasks to be run later. | Query results must be obtained first since later maintenance could impact the queries, written for the previous year's MeSH. |
| 2. Automatic tasks | Replace all references to deleted MeSH records with references to other MeSH records. | Must be run before Manual tasks since Manual tasks are written to supplement Automatic tasks. |
| 3. PrefTerm tasks | Replace MeSH preferred term with a different preferred term. | Must be run after Automatic tasks to avoid impacting these tasks. |
| 4. Manual tasks | Supplement Automatic tasks, usually by adding additional references. Applied to citations previously obtained by query. | Must be run after Automatic tasks, applied to citations identified earlier by queries for each Manual task. |

The <Sequence> element in the GCM XML is designed to ensure this order, as well as the order among Manual tasks.
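A system that has already loaded the tasks could obtain the run order simply by sorting on the <Sequence> value. The following is a rough sketch: the dictionaries, and the task IDs P12 and M1100, are made up for illustration and do not come from a real GCM file.

```python
# Illustrative task records; only the fields needed for ordering are shown.
tasks = [
    {"MTaskID": "M1107", "TaskSourceType": "Manual", "Sequence": 4},
    {"MTaskID": "P12", "TaskSourceType": "PrefTerm", "Sequence": 2},
    {"MTaskID": "A2", "TaskSourceType": "Automatic", "Sequence": 1},
    {"MTaskID": "M1100", "TaskSourceType": "Manual", "Sequence": 3},
]

# sorted() is stable, so tasks sharing a Sequence value keep their file order.
ordered = sorted(tasks, key=lambda task: task["Sequence"])
run_order = [task["MTaskID"] for task in ordered]
print(run_order)
```

Note that the queries for Manual tasks (step 1) are not tasks and carry no <Sequence> value; they still have to be run before any of the sorted tasks.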

5. Files

GCM data are distributed in two files.

  • GCM.XML. The main file includes a list of every maintenance task, with the old and new values, MeSH UI, etc. See below for a more detailed description of the elements.

  • GCM_SEARCH.XML. Some maintenance tasks apply only to a specified subset of the database and so they require a search description that narrows the scope of the task. This file is a list of the searches (in PubMed format) for each of the Manual tasks.

In practice the file names will reflect the MeSH year of annual changes. So, for example, for 2005 MeSH, the files will be GCM2005.XML and GCM_SEARCH2005.XML.

The XML structure for GCM.XML is relatively simple, with only two element levels and two attributes. See the GCM2005.DTD and sample GCM2005.XML file. The GCM_SEARCH.XML file is even simpler, with a task ID mapping the search to the corresponding task in the GCM.XML file. See GCM_SEARCH2005.DTD and sample GCM_SEARCH2005.XML file. See also the more detailed data element descriptions for both files, below.

Data are encoded in UTF-8 format. Currently the data are also compatible with 7-bit ASCII encoding.

Files are also available for all MeSH records in XML format. Medline and other NLM data in XML format are also available.
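As a rough illustration of reading GCM.XML, the sketch below parses a hand-written fragment with Python's standard `xml.etree.ElementTree` module. The fragment uses the element and attribute names described in this document, but the order of the child elements and the pairing of the task ID A2 with the Methanogens replacement are assumptions; consult the DTD for the authoritative structure.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment following the element names in this document;
# child order and the MTaskID/Sequence pairing are illustrative assumptions.
SAMPLE_GCM = """<?xml version="1.0" encoding="UTF-8"?>
<CitMaintTaskSet>
  <CitMaintTask Action="Replace" TaskSourceType="Automatic">
    <MTaskID>A2</MTaskID>
    <MeSHYear>2005</MeSHYear>
    <ExistingMeSHUI>D008699</ExistingMeSHUI>
    <NewMeSHUI>D019605</NewMeSHUI>
    <ExistingMeSHPrefTerm>Methanogens</ExistingMeSHPrefTerm>
    <NewMeSHPrefTerm>Euryarchaeota</NewMeSHPrefTerm>
    <ExistingMeSHRecType>DESCRIPTOR</ExistingMeSHRecType>
    <NewMeSHRecType>DESCRIPTOR</NewMeSHRecType>
    <Sequence>1</Sequence>
  </CitMaintTask>
</CitMaintTaskSet>
"""


def parse_tasks(xml_text):
    """Return one dict per <CitMaintTask>, merging attributes and child elements."""
    # Parse from bytes so the UTF-8 encoding declaration is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    tasks = []
    for node in root.findall("CitMaintTask"):
        task = dict(node.attrib)  # Action, TaskSourceType
        for child in node:
            task[child.tag] = (child.text or "").strip()
        tasks.append(task)
    return tasks


tasks = parse_tasks(SAMPLE_GCM)
print(tasks[0]["Action"], tasks[0]["ExistingMeSHUI"], "->", tasks[0]["NewMeSHUI"])
```

Elements that are absent or null for a given task (for example <ExistingMeSHUI> when Action is Add) would simply be missing or empty in the resulting dictionary, so callers should use `task.get(...)` rather than direct indexing for optional fields.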

6. XML elements

The following two tables list each XML element and attribute for the two files, with a brief description. Following the tables, there is a more discursive description of the elements, including examples in XML format.

6.1 Synopsis of XML elements

The following is a list of GCM elements in tabular format, with a brief description of each.

GCM.XML

| Element/attribute | Value Range | Description |
|---|---|---|
| CitMaintTaskSet | | Set of all tasks. Root element. |
| CitMaintTask | | Specific task to replace, add, or delete indexing data. |
| /Action | Replace, Add, Delete | Nature of the change to the citation. |
| /TaskSourceType | Manual, Automatic, PrefTerm | Process by which task was created. |
| MTaskID | M..., A...., P.... | Unique identifier for the task. Leading alphabetic, remainder numeric. |
| MeSHYear | (YYYY) | Year when annual MeSH changes first appear in January. |
| ExistingMeSHUI | D......, C......, Q...... | UI of the MeSH record reference being replaced or deleted. Null when Action is Add. Same value as NewMeSHUI for PrefTerm change. |
| NewMeSHUI | D......, C......, Q...... | UI of the MeSH record reference replacing the old value, or being added. Null when Action is Delete. Same value as ExistingMeSHUI when only the preferred term is being changed. May include attached Qualifier UI. |
| ExistingMeSHPrefTerm | (string) | Preferred term for ExistingMeSHUI. |
| NewMeSHPrefTerm | (string) | Preferred term for NewMeSHUI. |
| ExistingMeSHRecType | DESCRIPTOR, SCR, QUALIFIER | |
| NewMeSHRecType | DESCRIPTOR, SCR, QUALIFIER | |
| MajorTopicYN | Y, N | New value may be marked as the major topic of the citation. |
| Sequence | (positive integer) | Order in which tasks must be run. |

GCM_SEARCH.XML

| Element/attribute | Value Range | Description |
|---|---|---|
| CitMaintSearchSet | | Set of all searches for Manual tasks. Root element. |
| CitMaintSearch | | Information needed to identify the search which is needed to apply a Manual task in GCM.XML. |
| MTaskID | M..., A...., P.... | Maps search to Manual task in GCM.XML having the same <MTaskID>. |
| MeSHYear | (YYYY) | Year when annual MeSH changes first appear in January. Not the MeSH year of the MeSH terms in the search, which is one year previous to <MeSHYear>. |
| SearchPubMed | (free text) | Search limiting application of a Manual task. Must be run prior to any maintenance. |
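The link between the two files through <MTaskID> can be sketched as below, again with `xml.etree.ElementTree`. The fragment is hand-written from the element names in this document. Pairing task M1107 with this particular search, and the task ID M1200, are assumptions made only for the example.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment; the MTaskID/search pairing is illustrative only.
SAMPLE_SEARCH = """<CitMaintSearchSet>
  <CitMaintSearch>
    <MTaskID>M1107</MTaskID>
    <MeSHYear>2005</MeSHYear>
    <SearchPubMed>biota [nm] AND+MEDLINE+[sb]</SearchPubMed>
  </CitMaintSearch>
</CitMaintSearchSet>
"""


def searches_by_task(xml_text):
    """Map each <MTaskID> to its <SearchPubMed> string."""
    root = ET.fromstring(xml_text)
    return {
        node.findtext("MTaskID"): node.findtext("SearchPubMed")
        for node in root.findall("CitMaintSearch")
    }


searches = searches_by_task(SAMPLE_SEARCH)

# Check that every Manual task ID taken from GCM.XML has a search.
manual_task_ids = ["M1107", "M1200"]  # M1200 is a made-up ID with no search
missing = [task_id for task_id in manual_task_ids if task_id not in searches]
print(searches["M1107"])
print(missing)
```

A check of this kind is useful before maintenance starts, because a Manual task without a search has no defined scope.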

6.2 Alphabetic List of XML elements

The following are the elements in the two XML files.

Action
Description: Nature of the change to the citation. One of the following: Replace, Add, Delete.
Example:

  <CitMaintTask Action="Replace" TaskSourceType="Automatic">

Subelement of: n/a; attribute of <CitMaintTask>
In file: GCM.XML.
Required element: yes

<CitMaintSearch>
Description: Information needed to apply a citation search to a given Manual task. Used to restrict the application of a Manual task to a given set of citations. The search applies to the Manual task in GCM.XML which has the same <MTaskID>.
Subelement of: <CitMaintSearchSet>.
In file: GCM_SEARCH.XML.
Required element: yes

<CitMaintSearchSet>
Description: Set of all <CitMaintSearch> elements in GCM_SEARCH.XML. Root element.
Subelement of: none; this is the root element of GCM_SEARCH.XML.
In file: GCM_SEARCH.XML.
Required element: yes

<CitMaintTask>
Description: Transaction consisting of all the information needed to change an instance of MeSH indexing in a citation record.
Subelement of: <CitMaintTaskSet>
In file: GCM.XML.
Required element: yes

<CitMaintTaskSet>
Description: The set of all <CitMaintTask> elements in the GCM.XML file.
Subelement of: none; this is the root element of GCM.XML.
In file: GCM.XML.
Required element: yes

<ExistingMeSHPrefTerm>
Description: Preferred term in MeSH for <ExistingMeSHUI>. Null when Action is Add. Critical for PrefTerm changes, may be redundant for Automatic and Manual changes. May be the same as <NewMeSHPrefTerm> in the same task when TaskSourceType is Manual or Automatic.
Example:

  <ExistingMeSHPrefTerm>Aborigines</ExistingMeSHPrefTerm> 

Subelement of: <CitMaintTask>
In file: GCM.XML.
Required element: no

<ExistingMeSHRecType>
Description: The MeSH record type of the <ExistingMeSHUI>. One of the following: DESCRIPTOR, QUALIFIER, SCR. Null when Action is Add. Redundant in that the record type may be inferred from the initial character of <ExistingMeSHUI> (D, Q, C). Designed to make it easier for users of XML to extract actions pertaining to only one record type. May be different from <NewMeSHRecType> in the same task.
Example:

 <ExistingMeSHRecType>SCR</ExistingMeSHRecType> 

Subelement of: <CitMaintTask>
In file: GCM.XML.
Required element: no
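The inference from the initial character of the UI, mentioned in the description above, can be sketched as follows. The `record_type` function and the lookup table name are illustrative, not part of the GCM data.

```python
# D, Q, C prefixes as listed in the description above.
RECORD_TYPE_BY_PREFIX = {"D": "DESCRIPTOR", "Q": "QUALIFIER", "C": "SCR"}


def record_type(mesh_ui):
    """Infer the MeSH record type from the first character of a UI."""
    if not mesh_ui:
        # The element is null, e.g. ExistingMeSHUI when Action is Add.
        return None
    return RECORD_TYPE_BY_PREFIX[mesh_ui[0]]


print(record_type("C039562"))
```

An unrecognized prefix raises `KeyError` in this sketch, which may be preferable to silently guessing a record type.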

<ExistingMeSHUI>
Description: UI of the MeSH record reference being replaced or deleted. Matches the seven-character string in a <DescriptorUI>, <SupplementalRecordUI>, or <QualifierUI>. Null when Action is Add. Same value as <NewMeSHUI> in the same task for a PrefTerm change. Not necessarily in the previous year of MeSH, but could be an intermediate value in the maintenance process.
Example:

  <ExistingMeSHUI>C039562</ExistingMeSHUI>    

Subelement of: <CitMaintTask>
In file: GCM.XML.
Required element: no

<MajorTopicYN>
Description: Medline indexing includes an optional indicator for Descriptors representing a main point of a citation. So in a maintenance task which adds a reference to a citation (Add or Replace), the major topic of the citation may be indicated by a "Y" value. (Cf. Medline MajorTopicYN, which is an attribute of <DescriptorName> rather than a separate element. GCM.XML uses a separate element for MajorTopicYN rather than making it an attribute of two elements - <NewMeSHPrefTerm> and <NewMeSHUI>.)
Example:

 <MajorTopicYN>Y</MajorTopicYN> 

Subelement of: <CitMaintTask>
In file: GCM.XML.
Required element: no

<MeSHYear>
Description: Year when annual MeSH changes first appear in January. All "new" data in the XML will be consistent with MeSH data in that <MeSHYear>. In GCM_SEARCH.XML it has this meaning as well; it does not mean the MeSH year of the MeSH terms in the <SearchPubMed> element, which will be the year prior to the <MeSHYear>.
Example:

  <MeSHYear>2005</MeSHYear> 

Subelement of: <CitMaintTask> (GCM.XML); <CitMaintSearch> (GCM_SEARCH.XML)
In file: GCM.XML, GCM_SEARCH.XML.
Required element: yes

<MTaskID>
Description: Unique identifier for each <CitMaintTask>. For PrefTerm tasks the value begins with 'P', for Automatic tasks 'A', and for Manual tasks 'M'. Will be unique across years. The numeric portion has no inherent significance.
Examples:

  <MTaskID>A2</MTaskID>
  <MTaskID>M1107</MTaskID>

Subelement of: <CitMaintTask> (GCM.XML); <CitMaintSearch> (GCM_SEARCH.XML)
In file: GCM.XML; GCM_SEARCH.XML.
Required element: yes

<NewMeSHPrefTerm>
Description: Preferred term in MeSH for <NewMeSHUI>. Null when Action is Delete. Critical for PrefTerm changes, may be redundant for Automatic and Manual changes. May be the same as <ExistingMeSHPrefTerm> in the same task when TaskSourceType is Manual or Automatic.
Example:

  <NewMeSHPrefTerm>Oceanic Ancestry Group</NewMeSHPrefTerm>

Subelement of: <CitMaintTask>
In file: GCM.XML.
Required element: no

<NewMeSHRecType>
Description: The MeSH record type of the <NewMeSHUI>. One of the following: DESCRIPTOR, QUALIFIER, SCR. Null when Action is Delete. Redundant in that the record type may be inferred from the initial character of <NewMeSHUI> (D, Q, C). Designed to make it easier for users of XML to extract actions pertaining to only one record type. May be different from <ExistingMeSHRecType> in the same task.
Example:

  <NewMeSHRecType>DESCRIPTOR</NewMeSHRecType>

Subelement of: <CitMaintTask>
In file: GCM.XML.
Required element: no

<NewMeSHUI>
Description: UI of the MeSH record reference replacing the existing value, or being added. Matches the seven-character string in a <DescriptorUI>, <SupplementalRecordUI>, or <QualifierUI>. Null when Action is Delete. Same value as <ExistingMeSHUI> in the same task for a PrefTerm change. When a <DescriptorUI>, the value may include an attached <QualifierUI>. (See the second example.)
Examples:

  <NewMeSHUI>D043203</NewMeSHUI>
  <NewMeSHUI>D008628/Q000627</NewMeSHUI>

Subelement of: <CitMaintTask>
In file: GCM.XML.
Required element: no
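Because a <NewMeSHUI> value may carry an attached Qualifier UI after a slash, a consumer will probably want to split it before looking up either record. A minimal sketch, assuming at most one attached Qualifier as in the example above (`split_new_mesh_ui` is a hypothetical helper name):

```python
def split_new_mesh_ui(value):
    """Split a NewMeSHUI value into (record_ui, qualifier_ui_or_None)."""
    # partition() returns empty strings when no "/" is present.
    ui, _, qualifier_ui = value.partition("/")
    return ui, qualifier_ui or None


print(split_new_mesh_ui("D008628/Q000627"))
print(split_new_mesh_ui("D043203"))
```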

<Sequence>
Description: Number indicating the order in which tasks for a given year are executed. The order in which the tasks must be performed is: (a) Automatic, (b) Preferred Term, and (c) Manual. In addition, a specific order may be required within the Manual tasks. To guarantee this order, the <Sequence> values are assigned in the following way:

All Automatic tasks have a value of 1.
All PrefTerm tasks have a value of 2.
All Manual tasks have a value of 3 or greater, depending on the order specified by the analyst creating the Manual task.

Example:

  <Sequence>1</Sequence> 

Subelement of: <CitMaintTask>
In file: GCM.XML.
Required element: yes

<SearchPubMed>
Description: A citation search used to restrict the application of a Manual task specified in GCM.XML. PubMed format - see http://www.ncbi.nlm.nih.gov/entrez/query/static/help/pmhelp.html.

Example:

  <SearchPubMed>biota [nm] AND+MEDLINE+[sb]</SearchPubMed>

Subelement of: <CitMaintSearch>
In file: GCM_SEARCH.XML.
Required element: yes

TaskSourceType
Description: Process by which task was created. One of the following: PrefTerm, Automatic, Manual.
Example:

  <CitMaintTask Action="Replace" TaskSourceType="Automatic"> 

Subelement of: n/a; attribute of <CitMaintTask>
In file: GCM.XML.
Required element: yes