April 20, 2000 [posted]
Changes in MeSH Data Structure
The underlying structure of MeSH has changed. The data creation system for MeSH has been converted from a term-based system to a concept-oriented system to make it more compatible with the Unified Medical Language System (UMLS). This was part of a two-year project in which the old Model 204 mainframe Database Management System (DBMS) was replaced with an Oracle®-based client-server system. While these changes are transparent to most users, they greatly facilitate the creation and maintenance of the MeSH vocabulary, and ultimately have a fundamental role in the underlying structure of MeSH.
MeSH Descriptors (main headings), Qualifiers, and Supplementary Concept Records still exist. Entry terms, whether printed in the MeSH Tools (Print Entry Terms) or not (Non-Print Entry Terms), still provide access points to the MeSH components.
Descriptors (Main Headings) Now Concept-based
The new structure is centered on descriptors, concepts, and terms rather than just descriptors and terms. Our understanding of what a descriptor consists of has been refined. A descriptor is now viewed as a class of concepts, and a concept as a class of synonymous terms.
A descriptor class consists of one or more concepts closely related to each other in meaning. For the purposes of indexing, retrieval, and organization of the literature, these concepts are best lumped together in one class. It has been recognized for some time that not every term that we might wish to explore is sufficiently distinct in meaning to serve well as a descriptor. For example, the NISO standard for Monolingual Thesauri talks of quasi-synonyms (terms that don't have the same meaning, such as "roughness" and "smoothness", but are a means of addressing the same underlying phenomenon). An entry term like "Isometric Exercise" is narrower in meaning than the main heading "Exercise", but is left in the Exercise descriptor class because its meaning overlaps with that of another entry term, "Aerobic Exercise." Recognizing that a descriptor is a class of concepts helps us to understand what we are dealing with.
The descriptor will have a preferred concept. One of the terms naming that concept will be the preferred term of the preferred concept and will take on the role of naming the descriptor; it will thus be the main heading. Each of the subordinate concepts will also have a preferred term, as well as a labeled (broader, narrower, related) relationship to the preferred concept. Terms with the same meaning will be grouped in the same concept.
The above "related" relationship between a subordinate concept and the preferred concept is not to be confused with the relationship between descriptors described with the traditional thesaurus moniker "see related." Those "see related" relationships are between descriptors, and serve as pointers from one descriptor to other descriptors whose use should be considered in indexing or in searching. The "related" relationship between a subordinate concept and a preferred concept indicates that the same descriptor is to be used for both meanings in indexing and in searching.
The more formal new structure allows such relationships to be expressed in a way that can be manipulated computationally. Furthermore, it allows each concept to carry its own unique attributes that have not been previously represented. This may include such things as separate definitions, and translations into foreign languages. If a given concept becomes sufficiently distinct to warrant its own descriptor class, it can be moved to its own new descriptor class and thus receive its own place in the hierarchical MeSH trees.
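The three-level structure described above (descriptor, concept, term) can be sketched in a few lines of Python. This is only an illustrative model, not the actual MeSH data creation system: the class names `Concept` and `Descriptor`, their fields, and the `promote` method are hypothetical, and the relation assigned to "Aerobic Exercise" is an assumption, since the text states only that "Isometric Exercise" is narrower than "Exercise".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Concept:
    """A class of synonymous terms, named by its preferred term."""
    preferred_term: str
    synonyms: List[str] = field(default_factory=list)
    # Relationship to the descriptor's preferred concept: None for the
    # preferred concept itself, else "broader", "narrower" or "related".
    relation: Optional[str] = None
    # Each concept can carry its own attributes, such as a definition.
    definition: Optional[str] = None

    def terms(self) -> List[str]:
        return [self.preferred_term] + self.synonyms


@dataclass
class Descriptor:
    """A class of closely related concepts."""
    preferred_concept: Concept
    subordinate_concepts: List[Concept] = field(default_factory=list)

    @property
    def main_heading(self) -> str:
        # The preferred term of the preferred concept names the descriptor.
        return self.preferred_concept.preferred_term

    def all_terms(self) -> List[str]:
        terms = self.preferred_concept.terms()
        for concept in self.subordinate_concepts:
            terms.extend(concept.terms())
        return terms

    def promote(self, preferred_term: str) -> "Descriptor":
        """Move a subordinate concept into a new descriptor class of its own."""
        for i, concept in enumerate(self.subordinate_concepts):
            if concept.preferred_term == preferred_term:
                moved = self.subordinate_concepts.pop(i)
                moved.relation = None  # it is now a preferred concept
                return Descriptor(preferred_concept=moved)
        raise KeyError(preferred_term)


exercise = Descriptor(
    preferred_concept=Concept("Exercise"),
    subordinate_concepts=[
        Concept("Isometric Exercise", relation="narrower"),
        # Assumption: the source does not state this concept's relation.
        Concept("Aerobic Exercise", relation="narrower"),
    ],
)
```

In this sketch, calling `exercise.promote("Isometric Exercise")` would return a new descriptor whose main heading is "Isometric Exercise" and would remove that concept from the Exercise class, mirroring the case where a concept becomes distinct enough to warrant its own descriptor.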
As an example of how the modifications are represented in the structure, consider the Main Heading, AIDS Dementia Complex. Under the old term-based creation system, AIDS Dementia Complex had six print entry terms:
Each term's relationship to the descriptor is listed in parentheses. But there was no way to tell the relationship of the narrower entry terms to each other. In the new creation system using the concept-oriented structure we have:
It can be seen that concept classes II and III are, respectively, narrower than and related to concept class I (the preferred concept), but are not equivalent to each other. Each concept class could be given its own definition if desired. It can also be seen that HIV Encephalopathy and AIDS Encephalopathy are synonymous terms within the same concept class.
Searching Remains the Same
By using the concept as the key unit in the new structure, appropriate non-synonymous relationships can now be represented separately and finer shades of meaning may be clarified. Lumping these non-synonymous concepts together into one descriptor class does not alter the traditional function of entry vocabulary.
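The unchanged function of entry vocabulary can be illustrated with a small lookup sketch: every term in a descriptor class, whichever concept it belongs to, resolves to the same main heading. The data layout and the names `DESCRIPTORS`, `build_entry_index`, and `resolve` are hypothetical and are not part of any NLM interface. Placing the two encephalopathy terms in a concept separate from the preferred one is also an assumption; the text states only that they are synonymous within one concept class.

```python
from typing import Dict, List

# Main heading -> list of concepts, each concept a list of synonymous terms.
# The first concept listed is treated as the preferred concept.
DESCRIPTORS: Dict[str, List[List[str]]] = {
    "Exercise": [
        ["Exercise"],
        ["Isometric Exercise"],
        ["Aerobic Exercise"],
    ],
    "AIDS Dementia Complex": [
        ["AIDS Dementia Complex"],
        # Assumed to form a subordinate concept of their own.
        ["HIV Encephalopathy", "AIDS Encephalopathy"],
    ],
}


def build_entry_index(descriptors: Dict[str, List[List[str]]]) -> Dict[str, str]:
    """Map every term, in any concept, to the main heading of its descriptor."""
    index: Dict[str, str] = {}
    for heading, concepts in descriptors.items():
        for concept_terms in concepts:
            for term in concept_terms:
                index[term.lower()] = heading
    return index


ENTRY_INDEX = build_entry_index(DESCRIPTORS)


def resolve(term: str) -> str:
    """Return the main heading used to index or search for the given term."""
    return ENTRY_INDEX[term.strip().lower()]
```

Here `resolve("Isometric Exercise")` and `resolve("Aerobic Exercise")` both return "Exercise": the concept level adds finer shades of meaning to the record without changing which heading a searcher or indexer ends up using.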
The parent/child structure of the MeSH trees has not been changed. These relationships are between descriptors, not between concepts, although in many instances they can appear to be identical. In the past, the blurring of the significance of the hierarchical relationships has often led to confusion about the motivation for a given representation. Now, a simpler test, "should a search for documents dealing with A find all (or most) documents dealing with B?", should clarify the motivation for a given tree structure.
Searching with MeSH via any of NLM's interfaces has not changed. Finding and displaying MeSH headings and their tree structures from all of the MeSH Browsers also remains the same.
Supplementary Concept Records
While the conceptual structure of the descriptors and qualifiers has been relatively well established, work is proceeding on the 110,000+ Supplementary Concept Records. Editing this number of records will take some time. However, this work is behind the scenes and should not affect any functionality.
By Allan Savage
Savage A. Changes in MeSH Data Structure. NLM Tech Bull. 2000 Mar-Apr;(313):e2.