Symbolic and Spatial Knowledge Model

Carol Bean, Ph.D., Pat Molholt, Ph.D.
Office of Scholarly Resources and Dept. of Medical Informatics
Columbia University Health Sciences

Celina Imielinska, Ph.D., Lisa Laino-Pepper, M.S.
Dept. of Electrical Engineering and Computer Science
Stevens Institute of Technology

INTRODUCTION

The NLM's Visible Human Project dataset of digitized anatomical images, supported by the concurrent technological advances needed to process and manipulate them, promises near-term availability of tools for medicine that until now could only be imagined. The ability to realize the full potential of these developments will ultimately rest upon how well we are able to associate meaning with these image data. We believe that this can best be accomplished using a knowledge base rich in content, organized by a robust and flexible structure.

A network-based, platform-independent electronic "course" in anatomy is being developed to augment the traditional classroom-laboratory experience, as part of the integrated systems-based electronic curriculum recently introduced at Columbia University's College of Physicians and Surgeons. This paper presents the rationale of the conceptual model and the development of the core elements of the model. The model integrates images and content and is the basis of the collaborative design effort between Columbia University and Stevens Institute of Technology.

The model is consistent with the general vision of "structural informatics" presented by researchers at the University of Washington, which views anatomical information as having two distinct knowledge components: spatial knowledge about the images of structural representation, and symbolic knowledge about the structures that are represented in the images [1-3]. The present conceptual data model may be considered "relational" for two reasons: it uses non-hierarchical associative relationships to provide a rich and expressive network linking the conceptual entities, and it is designed to be implemented as a modular component of a relational database management system. A simplified version of the model is presented in modified ER-diagram format (Figure 1) and its class structure is described.

APPLICATION

The desiderata driving the conceptual design were to develop a modular knowledge base of anatomy, capable of accommodating different perspectives of the content and of providing linkages within and among both local and global resources. It should be independent from yet compatible with such resources, supporting seamless integration among them. The medical domain has an exceptionally rich concept content, and developing an equally rich relationship structure will optimize expressivity and flexibility and enable the desired linkages. Implementation using the Columbia-Presbyterian Medical Center's (CPMC) approach of a modular component-based architecture will ensure internal institutional integration and enable external integration via WWWeb-based interfaces to the Internet [4].

The knowledge model accommodates different perspectives or views of anatomy. Within the pedagogical setting(s), a student's orientation within the model varies according to the curricular context, and consequently defines the dominant or default perspective on the material; for example, clinical anatomy traditionally uses a regional perspective, while the basic sciences require one more oriented toward functional systems. The necessity and utility of such distinctions become more acute under clinical considerations; for example, even basic anatomical orientation varies for the radiologist, depending on what imaging modality is used.

In addition to the introductory course in clinical (gross) anatomy, the anatomy knowledge base is planned as the main cognitive perspective for the first-year course Science Basic to the Practice of Medicine and Dentistry (SBPMD), and will eventually serve as the primary integrating axis to organize the entire core curriculum in the basic health sciences and, later, clinical training. Beyond its use in the first- and second-year curriculum, the knowledge model also will be incorporated into various aspects of the third- and fourth-year curriculum to support integration of the core content into subsequent clinical training. A student might then move from a clinical record or set of records to a curriculum component to review various aspects of a particular case, disease, or procedure, based on the links existing between the anatomy knowledge base and the Medical Entities Dictionary (MED) underlying the CPMC Clinical Information System. The decision to organize the curriculum knowledge structure model around various anatomical perspectives is based on the fact that medical training, and ultimately medical practice itself, proceeds on the expectation of a solid foundation in this most basic of medical domains; another reason is the maturity and stability of the anatomy domain content.

The model must also support a variety of linkages among widely disparate scholarly and professional resources both inside and external to the institution, ranging from on-line textbooks to bibliographic databases such as MEDLINE, from the hospital's clinical information system to the automated alert and decision support system. Such demands require a design both flexible and expressive enough to be consistent and compatible with information resources that will vary widely in their structure and content, in sometimes unpredictable ways.

The combined knowledge base will be mounted on a commercial relational database management system that is accessible via the WWWeb-based electronic curriculum using Perl-scripted Common Gateway Interface calls to perform SQL queries to retrieve and present both text and images from their respective database components. The conceptual design enables the model to be independent of physical design and implementation considerations.
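The retrieval step described above can be sketched as follows. This is only an illustration under stated assumptions: the paper specifies Perl-scripted CGI calls against a commercial RDBMS, whereas the sketch uses Python's built-in sqlite3 module with an in-memory database, and the table names, column names, and sample rows (`concept`, `image`, 'FEMUR', the file paths) are hypothetical.

```python
import sqlite3

# In-memory stand-in for the relational database; schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE concept (id INTEGER PRIMARY KEY, name TEXT, description TEXT);
    CREATE TABLE image (id INTEGER PRIMARY KEY,
                        concept_id INTEGER REFERENCES concept(id),
                        path TEXT);
    INSERT INTO concept VALUES (1, 'FEMUR', 'Long bone of the thigh.');
    INSERT INTO image VALUES (1, 1, 'slices/femur_0001.gif');
    INSERT INTO image VALUES (2, 1, 'slices/femur_0002.gif');
""")


def fetch_concept(conn, name):
    """Return the text and image paths for one concept, or None if unknown."""
    rows = conn.execute(
        "SELECT c.description, i.path "
        "FROM concept c LEFT JOIN image i ON i.concept_id = c.id "
        "WHERE c.name = ? ORDER BY i.id",
        (name,),  # parameter binding avoids building SQL from user input
    ).fetchall()
    if not rows:
        return None
    # A concept with no images yields one row whose path is NULL; skip it.
    return {
        "description": rows[0][0],
        "images": [path for _, path in rows if path is not None],
    }
```

A gateway script would call something like `fetch_concept(conn, "FEMUR")` and format the returned text and image paths for presentation.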

SYMBOLIC KNOWLEDGE BASE ORGANIZATION

With its Meta-Thesaurus providing a compilation of controlled medical vocabularies, and the Semantic Net a conceptual model covering much of the biomedical domain, the NLM's Unified Medical Language System (UMLS) was judged to provide the ideal basis for the fundamental content and structure of the model. The anatomy model is based on a monohierarchy, somewhat of a departure from current trends. The simplicity of this structure provides the flexibility to accommodate the integration of a variety of other structural paradigms. From there, the design of the knowledge model reflects primarily the hierarchy and class system that can be derived from an analysis of the relational structure of the subject domain, clinical anatomy. The use of associative relationships to help determine the structure and organization of the model is an unusual approach to classification and conceptual design.

When two or more closely related but distinct classes enter into many of the same relationships with the same semantic types or classes, it suggests the existence of a common superclass, and the necessity of creating one. Thus, a single superclass (ANATOMIC ENTITY) was created to subsume all anatomic concepts in the knowledge base, so relationships and attributes that affect all of them could be efficiently applied using inheritance principles. Likewise, several other high-level and intermediary superclasses were created specifically to cluster concepts that entered into the same relationships, based on the view that a concept class may be defined by these relationships.
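The inheritance principle described above can be illustrated with a minimal sketch. The relationship names (`part_of`, `has_location`, `supplied_by`, `produced_by`) and the use of Python classes are assumptions made for illustration; they are not taken from the model itself.

```python
class AnatomicEntity:
    """Root superclass subsuming all anatomic concepts."""

    # Relationships declared here apply to every subclass by inheritance.
    # The names are hypothetical.
    relationships = {"part_of", "has_location"}

    @classmethod
    def all_relationships(cls):
        """Collect relationships declared on this class and all its ancestors."""
        collected = set()
        for klass in cls.__mro__:
            collected |= getattr(klass, "relationships", set())
        return collected


class AnatomicStructure(AnatomicEntity):
    relationships = {"supplied_by"}  # assumed name, for illustration only


class AnatomicSubstance(AnatomicEntity):
    relationships = {"produced_by"}  # assumed name, for illustration only
```

Declaring a relationship once on the superclass makes it available to every subclass, while relationships specific to one class stay local to it.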

One facet that emerged as an important distinguishing characteristic among anatomic entities was the notion of form as distinct from substance (structure and situation versus material) and function. To express and preserve this, we needed to create basic, or high-level, classes to accommodate the concepts that were defined by these distinctions and entered into relationships based on them. ORGAN SYSTEMS, TISSUE SYSTEMS, CELLULAR SYSTEMS, and PHYSIOLOGICAL SYSTEMS were classified as functional entities under ANATOMIC SYSTEMS. Concepts that reflect the orientation or location of anatomic entities in physical space were grouped together as ANATOMIC SITES, differentiating positive space (BODY LOCATION OR REGION) from negative space (BODY SPACE OR JUNCTION). All compositional elements and products were combined under ANATOMIC SUBSTANCES, distinguishing those that "make up" (STRUCTURAL COMPONENTS) from those that are "made by" (BODY SUBSTANCES). ANATOMIC STRUCTURES includes the discrete, concrete anatomic entities, as well as their variants over temporal and functional dimensions.

The scope of the clusterings was evaluated and adjusted accordingly, narrowing it for some, such as those that played a primary role in providing a conceptual axis for organization (e.g., ANATOMIC SYSTEMS and ANATOMIC SITES), and enriching the scope of others, particularly those with substantial concept proliferation (e.g., ANATOMIC STRUCTURES and ANATOMIC SUBSTANCES).

Some new classes required reorganization into subclasses, both for simple efficiency's sake, and to disambiguate complex composite concepts. For example, BONE AND BONES, MUSCLE AND MUSCLES, and NERVE AND NERVES are ambiguous composite terms under two classes in different hierarchical axes, with specific meanings indistinguishable out of context. They were split into the unambiguous atomic terms and reassigned as BONE, MUSCLE, and NERVE under TISSUE; and BONES, MUSCLES, and NERVES under BODY PARTS. Other classes were combined along a particular unifying facet to create a superordinate class, e.g., TISSUE, CELL, CELL COMPONENT, and GENE/GENOME as subclasses to the superclass STRUCTURAL COMPONENTS.
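The class assignments named in this section can be written down as a single-parent (monohierarchy) map, as in the sketch below. Two points are assumptions rather than statements from the text: the four high-level classes are attached directly to ANATOMIC ENTITY, and BODY PARTS is left without a parent because the text does not say where it attaches. The dictionary representation itself is also only illustrative.

```python
# Each class has exactly one parent (monohierarchy).
PARENT = {
    # Assumption: the high-level classes sit directly under ANATOMIC ENTITY.
    "ANATOMIC SYSTEMS": "ANATOMIC ENTITY",
    "ANATOMIC SITES": "ANATOMIC ENTITY",
    "ANATOMIC SUBSTANCES": "ANATOMIC ENTITY",
    "ANATOMIC STRUCTURES": "ANATOMIC ENTITY",
    "ORGAN SYSTEMS": "ANATOMIC SYSTEMS",
    "TISSUE SYSTEMS": "ANATOMIC SYSTEMS",
    "CELLULAR SYSTEMS": "ANATOMIC SYSTEMS",
    "PHYSIOLOGICAL SYSTEMS": "ANATOMIC SYSTEMS",
    "BODY LOCATION OR REGION": "ANATOMIC SITES",
    "BODY SPACE OR JUNCTION": "ANATOMIC SITES",
    "STRUCTURAL COMPONENTS": "ANATOMIC SUBSTANCES",
    "BODY SUBSTANCES": "ANATOMIC SUBSTANCES",
    "TISSUE": "STRUCTURAL COMPONENTS",
    "CELL": "STRUCTURAL COMPONENTS",
    "CELL COMPONENT": "STRUCTURAL COMPONENTS",
    "GENE/GENOME": "STRUCTURAL COMPONENTS",
    # Atomic terms split out of the ambiguous composite terms.
    "BONE": "TISSUE",
    "MUSCLE": "TISSUE",
    "NERVE": "TISSUE",
    "BONES": "BODY PARTS",
    "MUSCLES": "BODY PARTS",
    "NERVES": "BODY PARTS",
    # BODY PARTS has no entry: its parent is not stated in the text.
}


def ancestors(concept):
    """Return the chain of superclasses from nearest parent upward."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain
```

Because every class has a single parent, `ancestors` follows one unambiguous path, which is what separates BONE (a tissue) from BONES (body parts) once the composite term has been split.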

Compared to class and hierarchical structures, there has been a relative lack of attention to the associative relationships among classes, and to the application of these relationships to individual class members. For example, even though the UMLS contains some 50 "non-hierarchical" associations, there is little direct connection between the relational structure of the Semantic Net classes and the concepts themselves in the Meta-Thesaurus, a connection that must be present in a working knowledge base. As shown above, there are gaps in the UMLS class structure applicable to the anatomical subject domain, with concomitant gaps in the associated relational structure. Further, recent research [5] indicates that many of the associative relationships for anatomic entities among the Medical Subject Headings (MeSH) are insufficiently specified to support the level of detailed expression needed here; unpublished data on UMLS as a whole suggest similar trends. These areas provide the focus for our current efforts in symbolic knowledge base development.

Figure 1: Knowledge Base-Visible Human Project Data

SPATIAL REPRESENTATION

The original color anatomy data are stored in the global Exact Coordinate system, as a collection of slices and a voxel-based volume. We associate with each anatomical structure its Volume of Interest, which can be viewed either as a stack of successive 2D arrays of pixels, which are subsets of original color slices, or a voxel-based cuboid, a sub-volume of the voxel-based volume.
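The two views of a Volume of Interest can be sketched with nested lists. The axis order `volume[z][y][x]`, the half-open index ranges, and the function name `extract_voi` are assumptions for illustration; the actual storage layout is not specified here.

```python
def extract_voi(volume, z_range, y_range, x_range):
    """Return a cuboid sub-volume as a stack of 2D pixel arrays.

    `volume` is indexed as volume[z][y][x] (assumed axis order), and each
    range is a half-open (start, stop) pair in global coordinates.
    """
    (z0, z1), (y0, y1), (x0, x1) = z_range, y_range, x_range
    return [
        [row[x0:x1] for row in slice_2d[y0:y1]]  # crop each slice to the VOI
        for slice_2d in volume[z0:z1]            # keep only the VOI's slices
    ]


# Toy volume of 5 slices, 3 rows, 4 columns; each voxel records its own
# (z, y, x) coordinates so the crop is easy to check.
volume = [[[(z, y, x) for x in range(4)] for y in range(3)] for z in range(5)]
voi = extract_voi(volume, z_range=(1, 3), y_range=(0, 2), x_range=(2, 4))
```

The result can be read either way: `voi[k]` is one 2D array of pixels (a subset of an original slice), and `voi` as a whole is the voxel-based cuboid.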

The Exact Volume of Interest is a cuboid defined in the global Exact Data Coordinate system. The Compressed Volume of Interest is a cuboid with compressed voxels. We store the segmented anatomical structures in 2D Boundary and Region masks over the slices in the Exact Volume of Interest. In addition, 3D Compressed Region Masks will be used to identify the compressed solid interiors of the structures. The extracted anatomical entities are stored in two representations: hollow 3D surfaces and compressed solid, voxel-based interiors, obtained from the Exact and Compressed Volume of Interest, respectively [6]. Each such representation will reside in a fixed location in the 3D model of complete anatomy. The 3D surface representation of a single structure is stored at a number of mesh resolutions, to support zooming. The vertices in such multi-resolution meshes will store information about color and texture, to enable rendering of the structure with the original colors. The solid interiors are derived from the 3D Compressed Region Masks and the Compressed Volume of Interest.

Finally, a scene is a collection of 3D surface-based anatomical structures displayed in 3D space, specified by a 3D box. Such a scene may contain only a subset of all anatomical structures whose spatial locations intersect the scene box. For each anatomical structure displayed in a scene, one may access its solid interior.
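Selecting candidate structures for a scene can be sketched as an axis-aligned box intersection test. The sketch assumes each structure is summarized by a bounding box in the same coordinate system as the scene box; the `Box` class, the function name, and the structure names and coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned 3D box given by its minimum and maximum corners."""
    lo: tuple  # (x, y, z) minimum corner
    hi: tuple  # (x, y, z) maximum corner

    def intersects(self, other):
        # Two boxes overlap only if their extents overlap on every axis.
        return all(
            self.lo[i] <= other.hi[i] and other.lo[i] <= self.hi[i]
            for i in range(3)
        )


def structures_in_scene(scene_box, structure_boxes):
    """Return the names of structures whose boxes intersect the scene box."""
    return sorted(
        name for name, box in structure_boxes.items()
        if box.intersects(scene_box)
    )


# Hypothetical bounding boxes, for illustration only.
scene = Box((0, 0, 0), (10, 10, 10))
boxes = {
    "FEMUR": Box((5, 5, 5), (15, 15, 15)),
    "SKULL": Box((20, 20, 20), (30, 30, 30)),
}
visible = structures_in_scene(scene, boxes)
```

A bounding-box test is only a coarse filter; a structure whose box overlaps the scene may still have no surface inside it, so a finer check against the stored masks or meshes could follow.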

The integration of the actual data between the symbolic and spatial components of the knowledge base itself will be provided by means of a central "canonical" key or reference table. Here, all named ANATOMIC ENTITIES in the symbolic knowledge base are associated with their respective sets of 2D boundary masks in the spatial knowledge base, providing a derivation of the full range of xyz coordinates for each.
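How a key table might yield coordinate ranges can be sketched as follows. The table layout (entity name mapped to slice index, mapped to a set of boundary pixels), the entity 'FEMUR', and the pixel values are all hypothetical; the sketch only shows that per-slice 2D masks are sufficient to derive the full xyz extent of a structure.

```python
# Hypothetical key table: entity name -> {slice index z: set of (x, y) boundary pixels}
KEY_TABLE = {
    "FEMUR": {
        10: {(4, 7), (5, 7)},
        11: {(4, 8), (6, 9)},
    },
}


def xyz_extent(entity):
    """Derive the (min, max) range along x, y, and z from an entity's 2D masks."""
    masks = KEY_TABLE[entity]
    xs = [x for pixels in masks.values() for x, _ in pixels]
    ys = [y for pixels in masks.values() for _, y in pixels]
    zs = list(masks)  # the slice indices supply the z coordinates
    return (min(xs), max(xs)), (min(ys), max(ys)), (min(zs), max(zs))
```

The x and y ranges come from the boundary pixels within each slice, and the z range comes from which slices carry a mask for the entity.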

ACKNOWLEDGEMENTS

This work was supported in part by NLM Training Grant LM07079.

REFERENCES

[1] Brinkley JF. Structural informatics and its applications in medicine and biology. Academic Medicine 1991; 66(10):589-591.

[2] Brinkley JF, Eno K, Sundsten JW. Knowledge-based client-server approach to structural information retrieval: the Digital Anatomist Browser. Computer Methods and Programs in Biomedicine 1993; 40(2):131-145.

[3] Rosse C. The potential of computerized representations of anatomy in the training of health care professionals. Academic Medicine 1995; 70(6):499-505.

[4] Clayton PD, Sideli RV, Sengupta S. Open architecture and integrated information at CPMC. M.D. Computing 1992; 9(5):297-303.

[5] Bean CA. Analysis of explicit non-hierarchical associative relationships among medical subject headings (MeSH): Anatomical and related terminology. IN R. Green (ed.). Knowledge Organization and Change: Proceedings of the Fourth International ISKO Conference, Washington DC, July 1996, Frankfurt/Main:INDEKS Verlag, pp. 80-86.

[6] Imielinska C, Bean C, Laino-Pepper L, Molholt P. Network-based Medical Visualization. Technical Report No. 9604, Stevens Institute of Technology, April 1996.