1. The 8 Karat Gold Standard
The work of Guttmann et al. [2] focused on the reproducibility and comparison of various segmentation techniques within useful limits, but did not address their accuracy, since no ground truth was available. In other words, it made possible the relative comparison of different techniques against each other, but it did not answer the question of how well the computer-generated model of a particular structure, normal or pathologic, reflects the real structure. Building on this work, we proceeded to build the first module of a validation suite, consisting of an arbitrary number of individual labelmaps of a given structure from the VHF data. Segmentation is performed interactively by trained specialists who are familiar with the anatomy of the region as well as with gross and cross-sectional anatomy. Each segmentation is “as good as it gets”, based on clinical and anatomical judgement that has been tested and is currently in use in clinical image-guided neurosurgery as well as in several federally funded research projects. Each individual labelmap, as well as a statistical mean of all the labelmaps, can serve as a rough yet dependable (8 karat) standard against which automatic segmentation techniques can be compared. Figure 1 illustrates the process, and Figure 2 presents a magnified and simplified example. An algorithm passes the test if its result falls within the limits of any of the individual labelmaps, or within their “average”. A detailed description of this suite will be published upon completion in the near future. Admittedly, this is a relative method that falls short of scientific expectations. Nevertheless, once completed, it will represent valuable progress, since at present no common standard exists and the performance of each algorithm is assessed by clinical judgement, with no means of comparing several algorithms in a uniform, if not objective, manner.
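As an illustration only, the pass criterion above can be sketched in Python with NumPy. The sketch assumes that each expert labelmap is a binary array of the same shape, takes the “average” to be the voxel-wise mean thresholded at 0.5 (a majority vote), and interprets “falls within the limits” as a Dice overlap of at least a hypothetical threshold `min_dice`. Neither the Dice criterion nor the threshold value is specified in the text.

```python
import numpy as np


def mean_labelmap(expert_masks):
    """Voxel-wise mean of several binary expert labelmaps (values in [0, 1])."""
    stack = np.stack([np.asarray(m, dtype=float) for m in expert_masks])
    return stack.mean(axis=0)


def dice(a, b):
    """Dice overlap between two binary masks (1.0 if both are empty)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    total = a.sum() + b.sum()
    return 1.0 if total == 0 else 2.0 * np.logical_and(a, b).sum() / total


def passes_8k_test(candidate, expert_masks, min_dice=0.8):
    """Hypothetical 8 karat criterion: the candidate must agree, to within
    `min_dice`, with at least one expert labelmap or with their majority vote."""
    experts = list(expert_masks)
    references = experts + [mean_labelmap(experts) >= 0.5]
    return any(dice(candidate, ref) >= min_dice for ref in references)
```

Accepting a match against any single reference mirrors the wording “any of the individual labelmaps, or their average”; a stricter suite could require agreement with the mean alone.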
2. The 14 Karat Gold Standard - Mutual anatomical information
In current practice, a given anatomical structure is segmented
based on its image properties as they appear in the data set, such as
color, edges and shape, with or without the use of prior knowledge
of such properties. The structure is treated in an anatomical
vacuum, so to speak. This may be adequate where the structure
(e.g., a viscus, a blood vessel or a nerve) is surrounded by relatively
homogeneous tissue such as connective tissue, but almost any given
structure intersects other significant, well-defined structures,
and these neighborhoods are anatomically significant. A validation method
that takes these critical points into account can exclude not only "improbable"
boundaries but also "impossible" ones. In other words, the boundaries of
a nerve may be hard to establish with certainty where the nerve is surrounded
by connective tissue, but they become clear-cut where the nerve intersects
a blood vessel, which has its own precise boundaries. Hence, the boundary
of the nerve can be defined as a set of points, some of which are
"probable" and others "certain".
In the example shown in Figure 3, the right C3 nerve (arrow) is
bordered by the vertebral artery and by the second and third cervical
vertebrae. Thus, the relative uncertainty of the nerve's boundaries can
be significantly reduced if one takes these anatomical neighborhoods
into account (Figure 4 A through D). In this way, the cloud of points
from the 8 karat suite, within which any segmentation should "reasonably"
be found, can be improved on by adding a number of points that a
segmentation cannot include without failing the test.
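The two ideas of this section, boundary points that are "certain" versus "probable" and points that a segmentation cannot include without failing, can be sketched in Python with NumPy. The sketch assumes a 3-D multi-label volume in which 0 marks unlabeled (e.g., connective) tissue and every other integer marks a segmented structure such as the C3 nerve or the vertebral artery, plus an `envelope` mask standing for the 8 karat cloud (for example, the union of the expert labelmaps). The label convention, the 6-connected neighborhood, the function names and the all-or-nothing pass criterion are illustrative assumptions, not the method of the described suite.

```python
import numpy as np

BACKGROUND = 0  # hypothetical label for unlabeled (e.g. connective) tissue


def classify_boundary(labels, target):
    """Split the boundary voxels of `target` in a 3-D multi-label volume into
    'certain' (face-adjacent to another labeled structure) and 'probable'
    (face-adjacent only to unlabeled tissue or the volume edge)."""
    labels = np.asarray(labels)
    inside = labels == target
    # Pad with background so that the volume edge counts as unlabeled tissue.
    padded = np.pad(labels, 1, mode="constant", constant_values=BACKGROUND)
    core = (slice(1, -1),) * 3
    touches_bg = np.zeros_like(inside)
    touches_other = np.zeros_like(inside)
    for axis in range(3):
        for shift in (-1, 1):
            # Cropping the padding discards any wrap-around from np.roll.
            neighbor = np.roll(padded, shift, axis=axis)[core]
            touches_bg |= neighbor == BACKGROUND
            touches_other |= (neighbor != BACKGROUND) & (neighbor != target)
    certain = inside & touches_other
    probable = inside & touches_bg & ~touches_other
    return certain, probable


def passes_14k_test(candidate, labels, target, envelope):
    """True if `candidate` stays inside the 8 karat `envelope` and claims no
    voxel that belongs to a structure other than `target`."""
    candidate = np.asarray(candidate, dtype=bool)
    labels = np.asarray(labels)
    envelope = np.asarray(envelope, dtype=bool)
    # "Impossible": voxels of a neighboring, well-defined structure.
    impossible = candidate & (labels != BACKGROUND) & (labels != target)
    # "Improbable": voxels outside the cloud spanned by the expert labelmaps.
    improbable = candidate & ~envelope
    return not impossible.any() and not improbable.any()
```

In this sketch a nerve voxel touching the artery label is "certain", and a candidate that extends into the artery fails regardless of how well it matches the expert labelmaps elsewhere.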
3. The 18 Karat Gold Standard
The space between the structures in Figure 4 still leaves room for uncertainty and makes us wish for an unattainable solid gold standard. The VHF data set has an isotropic resolution of 0.33 mm/voxel. The segmentation algorithms that we want to validate, on the other hand, are typically developed for radiologic data, mostly magnetic resonance imaging (MRI), with an in-plane resolution of 1-1.5 mm/pixel and a slice thickness of 2-4 mm. This translates into a loss of resolution by a factor of up to about 5 in plane (1.5/0.33 ≈ 4.5; Figure 5) and up to 12 in the z-direction (4/0.33 ≈ 12; Figure 6). Hence, we can greatly increase the relative accuracy of our 14 karat gold standard by creating it on the full-resolution data set and applying it to a reduced-resolution data set that approximates the image properties of clinical MRI.
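The resolution arithmetic above, and one possible way of carrying a full-resolution labelmap over to a clinical grid, can be sketched with NumPy. Rounding the ratios of voxel sizes gives integer block sizes; averaging a binary labelmap over each block then yields the fraction of every clinical voxel occupied by the structure (a partial-volume map). Block averaging is an assumed, simplified model of the acquisition: it ignores the slice profile and point-spread function of a real scanner, and the function names are hypothetical.

```python
import numpy as np

VHF_VOXEL_MM = 0.33  # isotropic VHF voxel size quoted in the text


def reduction_factors(in_plane_mm, slice_mm, source_mm=VHF_VOXEL_MM):
    """Integer block sizes (z, y, x) mapping source voxels to clinical voxels."""
    fz = round(slice_mm / source_mm)
    fxy = round(in_plane_mm / source_mm)
    return fz, fxy, fxy


def block_average(volume, factors):
    """Average non-overlapping (fz, fy, fx) blocks of a 3-D volume.
    Trailing voxels that do not fill a complete block are dropped."""
    fz, fy, fx = factors
    v = np.asarray(volume, dtype=float)
    nz, ny, nx = (s // f for s, f in zip(v.shape, factors))
    v = v[: nz * fz, : ny * fy, : nx * fx]
    return v.reshape(nz, fz, ny, fy, nx, fx).mean(axis=(1, 3, 5))
```

For example, `reduction_factors(1.5, 4.0)` gives `(12, 5, 5)`, so `block_average(mask, (12, 5, 5))` maps a 0.33 mm labelmap onto voxels of about 1.65 × 1.65 × 3.96 mm, the closest integer multiples of 0.33 mm to 1.5 × 1.5 × 4 mm.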
4. The 24 Karat Gold Standard
If a synthetic MRI data set generated from the VHF cryosections
can adequately simulate the image properties of a real data set, it
follows that segmentations of the high-resolution color cryosections
contain information an order of magnitude more detailed than the data
used for validation, and can therefore serve as a gold standard for
algorithms designed to operate on radiologic data.
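The text does not specify how the synthetic MRI data set is generated. Purely as an illustration, a very crude simulator can be sketched by assigning each labeled tissue a fixed intensity, block-averaging to a clinical voxel size to mimic partial-volume effects, and adding Rician noise, the noise distribution of magnitude MR images. The tissue table `TISSUE_INTENSITY`, the noise level `sigma` and the use of a labelmap rather than the color cryosections themselves are all assumptions; a realistic simulator would model the pulse sequence and the relaxation properties of each tissue.

```python
import numpy as np

# Hypothetical tissue-to-intensity table (arbitrary units). Real values would
# depend on the pulse sequence being simulated.
TISSUE_INTENSITY = {0: 0.0, 1: 0.6, 2: 0.9, 3: 0.2}


def synthetic_mri(labels, factors, sigma=0.02, seed=0):
    """Crude synthetic MR image from a high-resolution integer labelmap whose
    labels must all appear in TISSUE_INTENSITY."""
    labels = np.asarray(labels)
    # Look-up table: tissue label -> intensity.
    lut = np.zeros(max(TISSUE_INTENSITY) + 1)
    for label, value in TISSUE_INTENSITY.items():
        lut[label] = value
    signal = lut[labels]
    # Partial-volume effect: average over clinical-size blocks.
    fz, fy, fx = factors
    nz, ny, nx = (s // f for s, f in zip(signal.shape, factors))
    signal = signal[: nz * fz, : ny * fy, : nx * fx]
    signal = signal.reshape(nz, fz, ny, fy, nx, fx).mean(axis=(1, 3, 5))
    # Rician noise: magnitude of a complex signal with Gaussian noise per channel.
    rng = np.random.default_rng(seed)
    real = signal + rng.normal(0.0, sigma, signal.shape)
    imag = rng.normal(0.0, sigma, signal.shape)
    return np.sqrt(real ** 2 + imag ** 2)
```

Because the same labelmap produces both the synthetic image and, via block averaging, its reference segmentation, the ground truth for the simulated data is known exactly, which is the premise of the 24 karat standard.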