IMAGING & THERAPEUTIC TECHNOLOGY
Three-dimensional Visualization and Analysis Methodologies: A Current Perspective

Jayaram K. Udupa, PhD

Three-dimensional (3D) imaging was developed to provide both qualitative and quantitative information about an object or object system from images obtained with multiple modalities including digital radiography, computed tomography, magnetic resonance imaging, positron emission tomography, single photon emission computed tomography, and ultrasonography. Three-dimensional imaging operations may be classified under four basic headings: preprocessing, visualization, manipulation, and analysis. Preprocessing operations (volume of interest, filtering, interpolation, registration, segmentation) are aimed at extracting or improving the extraction of object information in given images. Visualization operations facilitate seeing and comprehending objects in their full dimensionality and may be either scene-based or object-based. Manipulation may be either rigid or deformable and allows alteration of object structures and of relationships between objects. Analysis operations, like visualization operations, may be either scene-based or object-based and deal with methods of quantifying object information. There are many challenges involving matters of precision, accuracy, and efficiency in 3D imaging. Nevertheless, 3D imaging is an exciting technology that promises to offer an expanding number and variety of applications.

Abbreviations: MIP = maximum intensity projection, PD = proton density, PET = positron emission tomography, 3D = three-dimensional, 2D = two-dimensional

Index terms: Computed tomography (CT) • Computers • Computers, simulation • Images, analysis • Images, display • Images, processing • Magnetic resonance (MR) • Single-photon emission tomography (SPECT) • Ultrasound (US)

RadioGraphics 1999; 19:783–806
INTRODUCTION
The main purpose of three-dimensional (3D) imaging is to provide both qualitative and quantitative information about an object or object system from images obtained with multiple modalities including digital radiography, computed tomography (CT), magnetic resonance (MR) imaging, positron emission tomography (PET), single photon emission computed tomography (SPECT), and ultrasonography (US). Objects that are studied may be rigid (eg, bones), deformable (eg, muscles), static (eg, skull), dynamic (eg, heart, joints), or conceptual (eg, activity regions in PET, SPECT, and functional MR imaging; isodose surfaces in radiation therapy). At present, it is possible to acquire medical images in two, three, four, or even five dimensions. For example, two-dimensional (2D) images might include a digital radiograph or a tomographic section obtained with CT, MR imaging, PET, SPECT, or US; a 3D image might be used to demonstrate a volume of tomographic sections of a static object; a time sequence of 3D images of a dynamic object would be displayed in four dimensions; and an image of a dynamic object for a range of parameters (eg, MR spectroscopic images of a dynamic object) would be displayed in five dimensions.
From the Department of Radiology, University of Pennsylvania, 423 Guardian Dr, Philadelphia, PA 19104-6021. Received April 21, 1998; revision requested May 21 and received December 14; accepted December 21. Address reprint requests to the author.
©RSNA, 1999
Figure 1. Schematic illustrates a typical 3D imaging system.
It is not currently feasible to acquire truly realistic-looking four- and five-dimensional images; consequently, approximations are made. In most applications, the object system being investigated consists of only a few static objects. For example, a 3D MR imaging study of the head may focus on white matter, gray matter, and cerebrospinal fluid. A textbook with a systematic presentation of 3D imaging is not currently available. However, edited works may be helpful for readers unfamiliar with the subject (1–3). The reference list at the end of this article is representative but not exhaustive. In this article, we provide an overview of the current status of the science of 3D imaging, identify the primary challenges now being encountered, and point out the opportunities available for advancing the science. We describe and illustrate the main 3D imaging operations currently being used. In addition, we delineate major concepts and attempt to clear up some common misconceptions. Our intended audience includes developers of 3D imaging methods and software as well as developers of 3D imaging applications and clinicians interested in these applications. We assume the reader has some familiarity with medical imaging modalities and a knowledge of the rudimentary concepts related to digital images.
CLASSIFICATION OF 3D IMAGING OPERATIONS
Three-dimensional imaging operations can be broadly classified into the following categories: (a) preprocessing (defining the object system to create a geometric model of the objects under investigation), (b) visualization (viewing and comprehending the object system), (c) manipulation (altering the objects [eg, virtual surgery]), and (d) analysis (quantifying information about the object system). These operations are highly interdependent.
For example, some form of visualization is essential to facilitate the other three classes of operations. Similarly, object definition through an appropriate set of preprocessing operations is vital to the effective visualization, manipulation, and analysis of the object system. We use the phrase "3D imaging" to collectively refer to these four classes of operations.
A monoscopic or stereoscopic video display monitor of a computer workstation is the most commonly used viewing medium for images. However, other media such as holography and head-mounted displays are also available. Unlike the 2D computer monitor, holography offers a 3D medium for viewing. The head-mounted display basically consists of two tiny monitors positioned in front of the eyes as part of a helmetlike device worn by the user. This arrangement creates the sensation of being free from one's natural surroundings and immersed in an artificial environment. However, the computer monitor is by far the most commonly used viewing medium, mainly because of its superior flexibility, speed of interaction, and resolution compared with other media.
A generic 3D imaging system is represented in Figure 1. A workstation with appropriate software implementing 3D imaging operations forms the core of the system. A wide variety of input and output devices are used depending on the application. On the basis of the core of the system (ie, independent of input or output), 3D imaging systems may be categorized as those having (a) physician display consoles provided by imaging equipment vendors, (b) image processing and visualization workstations supplied by other independent vendors, (c) 3D imaging software supplied independent of the workstation, and (d) university-based 3D imaging software (often freely available via the Internet). Systems produced by scanner manufacturers and workstation vendors usually provide effective solutions but may cost $50,000–$150,000. For users with expertise in accessing, installing, and running the software, university-based 3D imaging software is available that can provide very effective, inexpensive solutions.
Frequently Used Terms in 3D Imaging

Scene: Multidimensional image; a rectangular array of voxels with assigned values.
Scene domain: Anatomic region represented by the scene.
Scene intensity: Values assigned to the voxels in a scene.
Pixel size: Length of a side of the square cross section of a voxel.
Scanner coordinate system: Origin and orthogonal axes system affixed to the imaging device.
Scene coordinate system: Origin and orthogonal axes system affixed to the scene (origin usually assumed to be the upper left corner of the first section of the scene; axes are the edges of the scene domain that converge at the origin).
Object coordinate system: Origin and orthogonal axes system affixed to the object or object system.
Display coordinate system: Origin and orthogonal axes system affixed to the display device.
Rendition: 2D image depicting the object information captured in a scene or object system.
For example, for under $5,000 it is possible to configure a complete system running on modern personal computers (eg, Pentium 300; Intel, Santa Clara, Calif) that performs as well as or even better than the costly systems described in the other three categories.
Terminology
Some frequently used terms in 3D imaging are defined in the Table and illustrated in Figure 2. The region of the image corresponding to the anatomic region of interest is divided into rectangular elements (Fig 2). These elements are usually referred to as pixels for 2D images and voxels for 3D images; in this article, however, we refer to them as voxels for all images. In 2D imaging, the voxels are usually squares, whereas in 3D imaging they are cuboids with a square cross section.
Figure 2. Drawing provides graphic representation of the basic terminology used in 3D imaging. abc = scanner coordinate system, rst = display coordinate system, uvw = object coordinate system, xyz = scene coordinate system.
Object Characteristics in Images
There are two important object characteristics whose careful management is vital in all 3D imaging operations: graded composition and "hanging-togetherness."
Graded Composition.—Most objects in the body have a heterogeneous material composition. In addition, imaging devices introduce blurring into acquired images due to various limitations. As a result, regions corresponding to the same object display a gradation of scene intensities. On the knee CT scan shown in Figure 3, both the patella and the femur exhibit this property (ie, the region corresponding to bone in these anatomic locations has not just one CT value but a gradation of values).
Hanging-togetherness (Gestalt).—Despite the gradation described in the preceding paragraph, when one views an image, voxels seem to "hang together" (form a gestalt) to form an object. For example, the high-intensity voxels of the patella do not (and should not) hang together with similar voxels in the femur, although voxels with dissimilar intensities in the femur hang together (Fig 3).
PREPROCESSING
The aim of preprocessing operations is to take a given set of scenes and output either computer object models or another set of scenes, derived from the given set, that facilitates the creation of computer object models. The most commonly used operations are volume of interest, filtering, interpolation, registration, and segmentation.
Volume of Interest
Volume of interest converts a given scene into another scene. Its purpose is to reduce the amount of data by specifying a region of interest and a range of intensity of interest. A region of interest is specified by creating a rectangular box that delimits the scene domain in all dimensions (Fig 4a). A range of intensity of interest is specified by designating an intensity interval. Within this interval, scene intensities are transferred unaltered to the output. Outside the interval, they are set to the lower and upper limits of the interval, respectively. The range of intensity of interest is indicated as an interval on a histogram of the scene (Fig 4b). The corresponding section in the output scene is shown in Figure 4c. This operation can often reduce storage requirements for scenes by a factor of 2–5. It is advisable to use the volume of interest operation first in any sequence of 3D imaging operations. The challenge in making use of the volume of interest operation is to automate it completely and to do so in an optimal fashion, which requires explicit delineation of objects at the outset.
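The volume of interest operation described above amounts to cropping the scene array and clamping its intensities. A minimal sketch with NumPy is given below; the array, box coordinates, and intensity interval are hypothetical values that would in practice be specified interactively.

```python
import numpy as np

def volume_of_interest(scene, box, intensity_interval):
    """Crop a 3D scene to a rectangular box and clamp intensities.

    scene              -- 3D NumPy array of voxel intensities (z, y, x)
    box                -- ((z0, z1), (y0, y1), (x0, x1)) region of interest
    intensity_interval -- (low, high) range of intensity of interest
    """
    (z0, z1), (y0, y1), (x0, x1) = box
    low, high = intensity_interval
    cropped = scene[z0:z1, y0:y1, x0:x1]
    # Intensities inside the interval pass unaltered; values outside are
    # set to the lower and upper limits, as described in the text.
    return np.clip(cropped, low, high)

# Hypothetical usage: a CT-like scene, a box around the region of interest,
# and an intensity-of-interest interval.
scene = np.random.randint(-1000, 2000, size=(60, 256, 256), dtype=np.int16)
voi = volume_of_interest(scene, ((10, 50), (40, 200), (40, 200)), (-200, 1500))
```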
Filtering
Filtering converts a given scene into another scene. Its purpose is to enhance wanted (object) information and suppress unwanted (noise, background, other object) information in the output scene. Two kinds of filters are available: suppressing filters and enhancing filters. Ideally, unwanted information is suppressed without affecting wanted information, and wanted information is enhanced without affecting unwanted information. The most commonly used suppressing filter is a smoothing operation used mainly for suppressing noise (Fig 5a, 5b). In this operation, a voxel v in the output scene is assigned an intensity that represents a weighted average of the intensities of voxels in the neighborhood of v in the input scene (4). Methods differ as to how neighborhoods are determined and how weights are assigned (5). Another commonly used method is median filtering, in which the voxel v in the output scene is assigned the middle value (median) of the intensities of the voxels in the neighborhood of v in the input scene when those intensities are arranged in ascending order.
Figure 3. Graded composition and hanging-togetherness. CT scan of the knee illustrates graded composition of intensities and hanging-togetherness. Voxels within the same object (eg, the femur) are assigned considerably different values. Despite this gradation of values, however, it is not difficult to identify the voxels as belonging to the same object (hanging-togetherness).
In another method (5), often used in processing MR images, a process of diffusion and flow is considered to govern the nature and extent of smoothing. The idea is that in regions of voxels with a low rate of change in intensity, voxel intensities diffuse and flow into neighboring regions. This process is blocked by voxels with a high rate of change in intensity. Certain parameters control the extent of diffusion that takes place and the limits of the magnitude of the rate of change in scene intensity that are considered "low" and "high." This method is quite effective in overcoming noise yet sensitive enough not to suppress subtle details or blur edges.
The most commonly used enhancing filter is an edge enhancer (Fig 5c) (4). With this filter, the intensity of a voxel v in the output is the rate of change in the intensity at v in the input. If we think of the input scene as a function, then this rate of change is given by the magnitude of the gradient of the function. Because this function is not known in analytic form, various digital approximations are used for this operation. The gradient has a magnitude (rate of change) and a direction in which this change is maximal. For filtering, the direction is usually ignored, although it is used in operations that create renditions.
Figure 5. Preprocessing with suppressing and enhancing filters. (a) Head CT scan illustrates the appearance of an image prior to filtering. (b) Same image as in a after application of a smoothing filter. Note that noise is suppressed in regions of uniform intensity, but edges are also blurred. (c) Same image as in a after application of an edge-enhancing filter. Note that regions of uniform intensity are unenhanced because the gradient in these regions is small. However, the boundaries (especially of skin and bone) are enhanced.
Figure 4. Preprocessing with a volume of interest operation. (a) Head CT scan includes a specified region of interest (rectangle). (b) Histogram depicts the intensities of the scene designated in a and includes a specified intensity of interest. (c) Resulting image corresponds to the specified region of interest in a.
Figure 6. Shape-based interpolation of a binary CT scene created by designating a threshold. (a) CT scene after shape-based interpolation at a "coarse" resolution and subsequent surface rendering. (b) The same scene after interpolation at a "fine" resolution and surface rendering.
Methods differ as to how the digital approximation is determined, a question that has been studied extensively in computer vision (6). Unfortunately, most existing suppressing filters also suppress some object information, and most enhancing filters also enhance some unwanted information. Explicit incorporation of object knowledge into these operations is necessary to minimize these effects.
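As a concrete illustration of the two filter classes discussed above, the sketch below applies a Gaussian smoothing filter, a median filter, and a gradient-magnitude edge enhancer to a scene with SciPy; the filter widths are hypothetical, not values from the article.

```python
import numpy as np
from scipy import ndimage

def smooth(scene, sigma=1.0):
    """Suppressing filter: weighted average of each voxel's neighborhood."""
    return ndimage.gaussian_filter(scene.astype(np.float32), sigma=sigma)

def median_smooth(scene, size=3):
    """Suppressing filter: median of each voxel's neighborhood."""
    return ndimage.median_filter(scene, size=size)

def edge_enhance(scene):
    """Enhancing filter: magnitude of the intensity gradient at each voxel.

    The gradient direction is ignored here, as noted in the text; it is
    used later for shading in rendering operations.
    """
    grads = np.gradient(scene.astype(np.float32))
    return np.sqrt(sum(g * g for g in grads))

# Hypothetical usage on a synthetic 3D scene.
scene = np.random.rand(32, 64, 64)
smoothed = smooth(scene, sigma=1.5)
edges = edge_enhance(smoothed)
```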
Interpolation
Like filtering, interpolation converts a given scene into another scene. Its purpose is to change the level of discretization (sampling) of the input scene. Interpolation becomes necessary when the objective is (a) to change the nonisotropic discretization of the input scene to isotropic discretization or to a desired level of discretization, (b) to represent longitudinal scene acquisitions in a registered common coordinate system, (c) to represent multimodality scene acquisitions in a registered coordinate system, or (d) to re-section the given scene. Two types of interpolation are currently available: scene-based interpolation and object-based interpolation.
Scene-based Interpolation.—The intensity of a voxel v in the output scene is determined on the basis of the intensities of voxels in the neighborhood of v in the input scene. Methods differ as to how the neighborhoods are determined and what function of the neighboring intensities is used to estimate the intensity of v (3,6,7).
In 3D interpolation, the simplest solution is to estimate new sections between the sections of the input scene, keeping the pixel size of the output scene the same as that of the input scene. This leads to a one-dimensional interpolation problem: estimating the scene intensity of any voxel v in the output scene from the intensities of voxels in the input scene on the two sides of v in the z direction (the direction orthogonal to the sections). In "nearest neighbor" interpolation, v is assigned the value of the voxel that is closest to v in the input scene. In linear interpolation, two voxels v1 and v2 (one on either side of v) are considered, and the value of v is determined with the assumption that the input scene intensity changes linearly from the intensity at v1 to that at v2. In higher-order (eg, cubic) interpolations, more neighboring voxels are considered. When the size of v in the output scene differs in all dimensions from that of voxels in the input scene, the situation becomes more general, and intensities are assumed to vary linearly or as a higher-order polynomial in each of the three directions in the input scene.
Object-based Interpolation.—Object information derived from scenes is used to guide the interpolation process. At one extreme (8), the given scene is first converted to a "binary" scene (ie, a scene with only two intensities, 0 and 1) with a segmentation operation (see "Segmentation"). The voxels with a value of 1 represent the object of interest, whereas the voxels with a value of 0 represent the rest of the scene domain.
Figure 7. Scene-based registration. (a) Three-dimensional scenes corresponding to proton-density (PD)–weighted MR images of the head obtained in a patient with multiple sclerosis demonstrate a typical "preregistration" appearance. The scenes were acquired at four different times. (b) Same scenes as in a after 3D registration. The progression of the disease (hyperintense lesions around the ventricles) is now readily apparent. At registration, the scenes were re-sectioned with a scene-based interpolation method to obtain sections at the same location.
The "shape" of the region represented by the "1" voxels (the object) is then used to create an output binary scene with a similar shape (9,10) by way of interpolation. This is done by first converting the binary scene back into a (gray-valued) scene by assigning every voxel a value that represents the shortest distance between the voxel and the boundary between the "0" voxels and the "1" voxels. The "0" voxels are assigned a negative distance, whereas the "1" voxels are assigned a positive distance. This scene is then interpolated with a scene-based technique and is subsequently converted back to a binary scene by setting a threshold at 0. At the other extreme, the shape of the intensity profile of the input scene is itself considered an "object" to be used to guide interpolation so that this shape is retained as faithfully as possible in the output scene (11). For example, in the interpolation of a 2D scene with this method, the scene is converted into a 3D surface of intensity profile wherein the height of the surface represents pixel intensities. This (binary) object is then interpolated with a shape-based method. Several methods exist between these two extremes (12,13). The shape-based methods have been shown to produce more accurate results (8–11) than most of the commonly used scene-based methods. Figure 6 demonstrates binary shape-based interpolation of an image derived from CT data at coarse and fine levels of discretization.
The original 3D scene was first thresholded to create a binary scene. This binary scene was then interpolated at coarse (Fig 6a) and fine (Fig 6b) levels and surface rendered. The challenge in interpolation is to identify specific object information and incorporate it into the process. With such information, the accuracy of interpolation can be improved.
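A minimal sketch of the shape-based interpolation idea described above is given below, assuming SciPy is available: each binary section is converted to a signed distance map, the distance maps are interpolated linearly between sections, and the result is thresholded at 0. The section contents and the interpolation position are hypothetical.

```python
import numpy as np
from scipy import ndimage

def signed_distance(binary_section):
    """Positive distances inside the object ("1" voxels), negative outside."""
    inside = ndimage.distance_transform_edt(binary_section)
    outside = ndimage.distance_transform_edt(1 - binary_section)
    return inside - outside

def shape_based_interpolate(section_a, section_b, t=0.5):
    """Estimate a binary section a fraction t of the way from section_a to section_b."""
    da, db = signed_distance(section_a), signed_distance(section_b)
    d = (1.0 - t) * da + t * db          # scene-based (linear) interpolation of distances
    return (d >= 0).astype(np.uint8)     # threshold at 0 to recover a binary section

# Hypothetical usage with two synthetic binary sections.
a = np.zeros((64, 64), np.uint8); a[20:40, 20:40] = 1
b = np.zeros((64, 64), np.uint8); b[25:50, 25:50] = 1
mid = shape_based_interpolate(a, b, t=0.5)
```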
Registration
Registration takes two scenes or objects as input and outputs a transformation that, when applied to the second scene or object, matches it as closely as possible to the first. Its purpose is to combine scene or object information from multiple modalities and protocols so as to determine change, growth, motion, and displacement of objects as well as to aid in object identification. Registration may be either scene-based or object-based.
Scene-based Registration.—To match two scenes, a rigid transformation made up of translation and rotation (and often scaling) is calculated for one scene S2 such that the intensity pattern of the transformed scene S2′ matches that of the first scene S1 as closely as possible (Fig 7) (14). Methods differ with respect to the matching criterion used and the means of determining which of the infinite number of possible translations and rotations is optimal (15).
Figure 8. Rigid object-based registration. Sequence of 3D MR imaging scenes of the foot allows kinematic analysis of the midtarsal joints. The motion (ie, translation and rotation) of the talus, calcaneus, and navicular and cuboid bones from one position to the other is determined by registering the bone surfaces in the two different positions.
Scene-based registration methods are also available for cases in which objects undergo elastic (nonrigid) deformation (16).
Object-based Registration.—In object-based registration, two scenes are registered on the basis of object information extracted from the scenes. Ideally, the two objects should be as similar as possible. For example, to match 3D scenes of the head obtained with MR imaging and PET, one may use the outer skin surface of the head as computed from each scene and match the two surfaces (17). Alternatively (or in addition), landmarks such as points, curves, or planes that are observable in and computable from both scenes, as well as implanted objects, may be used (18–20). Optimal translation and rotation parameters for matching the two objects are determined by minimizing some measure of "distance" between the two (sets of) objects. Methods differ as to how distances are defined and how optimal solutions are computed. Rigid object-based registration is illustrated in Figure 8. In contrast, deformable matching operations can also be used on objects (21,22). These operations may be more appropriate than rigid matching for nonrigid soft-tissue structures. Typically, a global approximate rigid matching operation is performed, followed by local deformations for more precise matching. Deformable registration is also used to match computerized brain atlases to brain scene data obtained in a given patient (23). Initially, some object information has to be identified in the scene. This procedure has several potential applications in functional imaging, neurology, and neurosurgery as well as in object definition per se.
The challenge in registration is that scene-based methods require that the intensity patterns in the two scenes be similar, which is often not the case. Converting scenes into fuzzy (nonbinary) object descriptions that retain object gradation can potentially overcome this difficulty while retaining the strength of scene-based methods.
Deformable fuzzy object matching seems natural and appropriate in most situations but will require the development of fuzzy mechanics theory and algorithms.
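The scene-based registration described above searches over rigid transformations for the one that best matches the intensity patterns of two scenes. The sketch below, assuming SciPy, performs a crude in-plane search over rotation and translation using mean squared intensity difference as the matching criterion; real systems optimize all six rigid parameters in 3D with far more efficient strategies, and all parameter values here are illustrative.

```python
import numpy as np
from scipy import ndimage

def mse(a, b):
    """Matching criterion: mean squared intensity difference."""
    return float(np.mean((a - b) ** 2))

def register_2d(fixed, moving, angles, shifts):
    """Exhaustive search over a small grid of rotations (degrees) and shifts (pixels)."""
    best_params, best_score = None, np.inf
    for angle in angles:
        rotated = ndimage.rotate(moving, angle, reshape=False, order=1)
        for dy in shifts:
            for dx in shifts:
                candidate = ndimage.shift(rotated, (dy, dx), order=1)
                score = mse(fixed, candidate)
                if score < best_score:
                    best_params, best_score = (angle, dy, dx), score
    return best_params, best_score

# Hypothetical usage: recover a known rotation and shift of a synthetic section.
fixed = np.zeros((64, 64)); fixed[20:44, 24:40] = 1.0
moving = ndimage.shift(ndimage.rotate(fixed, 5, reshape=False, order=1), (2, -3), order=1)
params, score = register_2d(fixed, moving, angles=range(-8, 9), shifts=range(-4, 5))
```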
Segmentation
From a given set of scenes, segmentation outputs computer models of the object information captured in the scenes. Its purpose is to identify and delineate objects. Segmentation consists of two related tasks: recognition and delineation.
Recognition.—Recognition consists of roughly determining the whereabouts of an object in the scene. In Figure 3, for example, recognition involves determining that "this is the femur and this is the patella." This task does not involve the precise specification of the region occupied by the object. Recognition may be accomplished either automatically or with human assistance. In automatic (knowledge- and atlas-based) recognition, artificial intelligence methods are used to represent knowledge about objects and their relationships (24–26). Preliminary delineation is usually needed in these methods to extract object components and to form and test hypotheses related to whole objects. A carefully created "atlas" consisting of a complete description of the geometry and interrelationships of objects is used (16,27,28). Some delineation of object components in the given scene is necessary. This information is used to determine the mapping necessary to transform voxels or other geometric elements from the scene space to the atlas. Conversely, the information is also used to deform the atlas so that it matches the delineated object components in the scene.
In human-assisted recognition, simple assistance is often sufficient to help solve a segmentation problem. This assistance may take several forms: for example, specification of several
“seed” voxels inside the 3D region occupied by the object or on its boundary, creation of a box (or some other simple geometric shape) that just encloses the object and can be quickly specified, or a click of a mouse button to accept a real object (eg, a lesion) or reject a false object. Delineation.—Delineation involves determining the precise spatial extent and composition of an object including gradation. In Figure 3, if bone is the object system of interest, then delineation consists of defining the spatial extent of the femur and patella separately and specifying an “objectness” value for each voxel in each object. Once the objects are defined separately, the femur and the patella can be visualized, manipulated, and analyzed individually. Delineation may be accomplished with a variety of methods. Often, delineation is itself considered to be the entire segmentation problem, in which case related solutions are considered to be solutions to the segmentation problem. However, it is helpful to distinguish between recognition and delineation to understand and help solve the difficulties encountered in segmentation. Approaches to delineation can be broadly classified as boundary-based or region-based. In boundary-based delineation, an object description is output in the form of a boundary surface that separates the object from the background. The boundary description may take the form of a hard set of primitives (eg, points, polygons, surface patches, voxels) or a fuzzy set of primitives such that each primitive has an assigned grade of “boundariness.” In region-based delineation, an object description is output in the form of the region occupied by the object. The description may take the form of a hard set of voxels or of a fuzzy set such that each voxel in the set has an assigned grade of “objectness.” With the former method, each voxel in the set is considered to contain 100% object material; with the latter method, this value may be anywhere from 0% to 100%. Object knowledge usually facilitates recognition and delineation of that object. Paradoxically, this implies that segmentation is required for effective object segmentation. As we have noted, segmentation is needed to perform most of the preprocessing operations in an optimal fashion. It will be seen later that segmentation is essential for most visualization, manipulation, and analysis tasks. Thus, segmentation is the most crucial among all 3D imaging operations and also the most challenging.
Knowledgeable human beings usually outperform computer algorithms in the high-level task of recognition. However, carefully designed computer algorithms outperform human beings in achieving precise, accurate, and efficient delineation. Clearly, human delineation cannot account for graded object composition. Most of the challenges in completely automating segmentation may be attributed to shortcomings in computerized recognition techniques and to the lack of delineation techniques that can handle graded composition and hanging-togetherness.
There are eight possible combinations of approaches to recognition and delineation, resulting in eight different methods of segmentation. In hard, boundary-based, automatic segmentation, thresholding and isosurfacing are most commonly used (29–32). In these techniques, a scene intensity threshold is specified, and the surface that separates voxels with an intensity above the threshold from those with an intensity below the threshold is computed. Methods differ as to how the surface is represented and computed and whether surface connectedness is taken into account. The surface may be represented in terms of voxels, voxel faces, points, triangles, or other surface elements. If connectedness is not used, the surface obtained from a scene will combine discrete objects (eg, the femur and the patella in Fig 3); with connectedness, each of the objects can be represented as a separate surface (assuming they are separated in the 3D scene). In Figure 6, the isosurface is connected and is represented as a set of faces of voxels (29). In addition to scene intensity thresholds, intensity gradients have also been used in defining boundaries (33).
Another method of segmentation is fuzzy, boundary-based, automatic segmentation. Concepts related to fuzzy boundaries (eg, connectedness, closure, orientedness) that are well established for hard boundaries are difficult to formulate and as yet undeveloped. However, computational methods have been developed that identify only those voxels that are in the vicinity of the object boundary and that assign each voxel a grade of "boundariness" (34,35). These methods use scene intensity or intensity gradient to determine boundary gradation (Fig 9).
In hard, boundary-based, human-assisted segmentation, the degree of human assistance ranges from tracing the boundary entirely by hand (manual recognition and delineation) to specifying only a single point inside the object or on its boundary (manual recognition and automatic delineation) (36–41).
Figure 9. Fuzzy, boundary-based automatic segmentation. Rendition created with both intensity and gradient criteria shows the fuzzy boundaries of “bone” detected in the CT data from Figure 3.
In routine clinical applications, manual boundary tracing is perhaps the most commonly used method. On the other hand, boundary detection methods requiring simple user assistance based on intensity (36,37) and gradient criteria (38–41) have been developed. However, these methods cannot be guaranteed to always work correctly in large applications. There are many user-assisted methods besides those just described that require different degrees of human assistance for segmentation of each scene (42–48). In view of the inadequacy of the minimally user-assisted methods mentioned earlier, much effort is currently being directed toward developing methods that take a largely manual approach to recognition and a more automatic approach to delineation. These methods go under various names: active contours or snakes (42–44), active surfaces (45,46), and live-wire (live-lane) (47,48).
In active contour and active surface methods, a boundary is first specified (eg, by creating a rectangle or a rectangular box close to the boundary of interest). The boundary is considered to have certain stiffness properties. In addition, the given scene is considered to exert forces on the boundary whose strength depends on the intensity gradients. For example, a voxel exerts a strong attractive force on the boundary if the rate of change in intensity at the voxel is high. Within this static mechanical system, the initial boundary deforms and eventually assumes a shape for which the combined potential energy is at a minimum. Unfortunately, the steady-state shape is usually impossible to compute. Furthermore, whatever shape is accepted as an alternative may not match the desired boundary, in which case further correction of the boundary is needed.
Figure 10. Live-wire segmentation. Section created on the basis of data from an MR image of the foot shows a live-wire segment representing a portion of the boundary of interest, which in this case outlines the talus.
In assessing the effectiveness of these segmentation methods, it is important to evaluate their precision (repeatability) and efficiency (defined in terms of the number of scenes that can be segmented per unit time). Such evaluations have not been performed for the methods described in the literature.
The principles underlying live-wire (live-lane) methods (47,48) are different from those for active boundary methods. In live-wire methods, every pixel edge is considered to represent two directed edges whose orientations are opposite each other. The "inside" of the boundary is considered to be to the left of a directed edge, and its outside to the right. Each directed edge is assigned a cost that is inversely related to the "boundariness" of the edge. Several local features are used to determine the cost, including the intensity to the left (inside) and right (outside) as well as the intensity gradient and its direction. In the 2D live-wire method, the user initially selects a point (pixel vertex) vo on the boundary of interest. The computer then shows a "live-wire" segment from vo to the current mouse cursor position v. This segment is an oriented path consisting of a connected sequence of directed pixel edges that represents the shortest possible (minimum total cost) path from vo to v. As the user changes v through mouse movement, the optimal path is computed and displayed in real time. If v is on or close to the boundary, the live wire "snaps" onto the boundary (Fig 10); v is then deposited and becomes the new starting point, and the process continues. Typically, two to five points are sufficient to segment a boundary (Fig 10).
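A minimal sketch of the live-wire idea described above is given below: pixels are treated as graph nodes, each step to a neighboring pixel is assigned a cost that is low where the gradient magnitude is high, and Dijkstra's algorithm returns the minimum-cost path between two user-selected points. This is a simplification (the published method uses directed pixel edges and several local features, not pixel nodes and gradient alone), and all names and parameters are illustrative.

```python
import heapq
import numpy as np

def step_cost(gradmag):
    """Low cost where boundariness (gradient magnitude) is high."""
    return 1.0 / (1.0 + gradmag)

def live_wire_path(image, start, end):
    """Dijkstra shortest path from start to end over 4-connected pixels."""
    gy, gx = np.gradient(image.astype(np.float32))
    cost = step_cost(np.hypot(gy, gx))
    rows, cols = image.shape
    dist = np.full((rows, cols), np.inf)
    prev = {}
    dist[start] = 0.0
    heap = [(0.0, start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == end:
            break
        if d > dist[r, c]:
            continue
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost[nr, nc]
                if nd < dist[nr, nc]:
                    dist[nr, nc] = nd
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(heap, (nd, (nr, nc)))
    # Trace the optimal path back from end to start.
    path, node = [end], end
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1]

# Hypothetical usage: follow the edge of a bright square in a synthetic section.
img = np.zeros((64, 64), np.float32); img[16:48, 16:48] = 100.0
path = live_wire_path(img, start=(16, 16), end=(16, 47))
```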
Figure 11. Hard, region-based, automatic segmentation with use of thresholding. Once the desired scene is selected (a), an intensity interval is specified on a histogram (b). The segmented object is then depicted as a binary scene (c).
Figure 12. Clustering. (a) Sections from an MR imaging scene with T2 (top) and PD (bottom) values assigned to voxels. (b) Scatter plot of the sections in a. A cluster outline for cerebrospinal fluid is indicated. (c) Segmented binary section demonstrates cerebrospinal fluid.
This method and its derivatives have been shown to be two to three times faster and statistically significantly more repeatable than manual tracing (47). Its 3D version (48) is about 3–15 times faster than manual tracing. Note that, in this method, recognition is manual but delineation is automatic. To our knowledge, no fuzzy, boundary-based, human-assisted methods have been described in the literature.
The most commonly used hard, region-based, automatic method of segmentation is thresholding (Fig 11). A voxel is considered to belong to the object region if its intensity lies at or between a lower and an upper threshold. If the object is the brightest in the scene (eg, bone in CT scans), then only the lower threshold needs to be specified. The threshold interval is specified with a scene intensity histogram in Figure 11b, and the segmented object is shown as a binary scene in Figure 11c.
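The following sketch, assuming SciPy, illustrates hard thresholding and shows how a connected-component step can separate objects that a threshold alone would combine (as noted earlier for the femur and patella); the threshold values and seed location are hypothetical.

```python
import numpy as np
from scipy import ndimage

def threshold(scene, lower, upper=None):
    """Hard, region-based thresholding: output a binary scene."""
    mask = scene >= lower
    if upper is not None:
        mask &= scene <= upper
    return mask.astype(np.uint8)

def connected_object(binary_scene, seed):
    """Keep only the connected component containing the seed voxel.

    Assumes the seed lies inside the thresholded object.
    """
    labels, _ = ndimage.label(binary_scene)
    return (labels == labels[seed]).astype(np.uint8)

# Hypothetical usage: a "bone" threshold on a synthetic CT-like scene,
# then isolation of the object containing a user-specified seed voxel.
scene = np.random.randint(-1000, 2000, size=(40, 128, 128), dtype=np.int16)
bone = threshold(scene, lower=200)
one_bone = connected_object(bone, seed=(20, 64, 64))
```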
Another commonly used method is clustering (Fig 12). If, for example, multiple values associated with each voxel are determined (eg, T2 and PD values), then a 2D histogram (also known as a scatter plot) represents a plot of the number of voxels in the given 3D scene for each possible value pair. The 2D histogram of all possible value pairs is usually referred to as a feature space. The idea in clustering is that feature values corresponding to the objects of interest cluster together in the feature space. Therefore, to segment an object, one need only identify and delineate this cluster. In other words, the problem of segmenting the scene becomes the problem of segmenting the 2D scene representing the 2D histogram. In addition to T2 and PD values, it is possible to use computed values such as the rate of change in T2 and PD for
every voxel. In this case, the feature space would be four-dimensional. There are several well-developed techniques in the area of pattern recognition (49) for automatically identifying clusters, and these techniques have been extensively applied to medical images (50–56). One of the popular cluster identification methods is the k-nearest neighbor (kNN) technique (49). Assume, for example, that the problem is to segment the white matter (WM) region in a 3D MR imaging scene in which the T2 and PD values have been determined for each voxel. The first step would be to identify two sets XWM and XNWM of points in the 2D feature space that correspond to white matter and non–white matter regions. These sets of points will be used to determine whether a voxel in the given scene belongs to white matter. The sets are determined with use of a "training" set. Suppose that one or more scenes were previously segmented manually. Each voxel in the white matter and non–white matter regions in each scene contributes a point to set XWM or set XNWM. The next step would be to assign a value to k (eg, k = 7), which is a fixed parameter. The location P in the feature space is determined for each voxel v in the scene to be segmented, and the seven points from sets XWM and XNWM that are "closest" to P are found. If a majority (>4) of these points are from XWM, then v is considered to be in white matter; otherwise, v does not belong to white matter. Note that the step of obtaining XWM and XNWM need not be repeated for every scene to be segmented. Note also that thresholding is essentially clustering in a one-dimensional feature space.
All clustering methods have parameters whose values must be determined somehow. If these parameters are fixed in an application, the effectiveness of the method in routine processing cannot be guaranteed, and some user assistance usually becomes necessary eventually. Examples of other nonclustering methods have been described by Kamber (57) and Wells (58).
The simplest of the fuzzy, region-based, automatic methods of segmentation is fuzzy thresholding, which represents a generalization of the thresholding concept (Fig 13) (59). Fuzzy thresholding requires the specification of four intensity thresholds (t1–t4). If the intensity of a voxel v is less than t1 or greater than t4, the objectness of v is 0. If the intensity is between t2 and t3, its objectness is 1 (100%). For other intensity values, objectness varies between 0% and 100%. Other functional forms have also been used. Figure 14 shows a rendition of bone and soft tissue identified with fuzzy thresholding on the basis of the CT data from Figure 3.
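A minimal sketch of the trapezoidal membership function implied by the four thresholds t1–t4 described above is given below; the threshold values in the example are hypothetical.

```python
import numpy as np

def fuzzy_threshold(scene, t1, t2, t3, t4):
    """Map scene intensities to an objectness value between 0 and 1.

    Objectness is 0 below t1 and above t4, 1 between t2 and t3, and
    ramps linearly on the intervals [t1, t2] and [t3, t4].
    """
    s = scene.astype(np.float32)
    rising = (s - t1) / max(t2 - t1, 1e-6)
    falling = (t4 - s) / max(t4 - t3, 1e-6)
    return np.clip(np.minimum(rising, falling), 0.0, 1.0)

# Hypothetical usage: a soft "bone" membership for a CT-like scene.
scene = np.random.randint(-1000, 2000, size=(40, 128, 128), dtype=np.int16)
bone_objectness = fuzzy_threshold(scene, t1=150, t2=300, t3=1200, t4=1800)
```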
Figure 13. Diagram illustrates fuzzy thresholding.
Many of the clustering methods can be generalized to output fuzzy object information. For example, in the kNN method described previously, if a number m of the k points closest to P are from XWM, then the objectness ("white matter–ness") of v is m/k. Note that the fuzzy thresholding described earlier is a form of fuzzy clustering. One approach to more generalized fuzzy clustering is the fuzzy c-means method (60). The application of this method has been investigated for segmenting brain tissue components in MR images (50). The idea is as follows: Suppose there are two types of tissue, white matter and gray matter, to be segmented in a 3D MR imaging scene, and that the feature space is 2D (composed of T2 and PD values). Actually, three classes must be considered: white matter, gray matter, and everything else. The task is to define three clusters in the 2D scatter plot of the given scene that correspond to these three classes. The set X of points to which the given scene maps in the feature space can be partitioned into three clusters in a large (although finite) number of ways. In the hard c-means method, the objective is to choose the particular cluster arrangement for which the sum (over all clusters) of the squared distances between the points in each cluster and the cluster center is smallest. In the fuzzy c-means method, each point in X is allowed to have an objectness value between 0 and 1, making the number of cluster arrangements infinite, and the distance in the criterion to be minimized is weighted by the objectness value. Algorithms have been described for both methods that are designed to find clusters that approximately minimize the pertinent criterion. As with hard clustering methods, the effectiveness of fuzzy clustering methods in routine applications cannot be guaranteed because some user assistance on a per-scene basis is usually needed.
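As an illustration of the hard and fuzzy kNN labeling described in the preceding paragraphs, the sketch below classifies voxels in a 2D (T2, PD) feature space against hypothetical training sets XWM and XNWM; with k = 7, a majority vote gives the hard decision and the fraction m/k gives the fuzzy objectness. All data values here are synthetic.

```python
import numpy as np

def knn_objectness(features, x_wm, x_nwm, k=7):
    """Return the fraction m/k of the k nearest training points that are white matter.

    features -- (n, 2) array of (T2, PD) feature vectors, one per voxel
    x_wm     -- (a, 2) training points from white matter regions
    x_nwm    -- (b, 2) training points from non-white matter regions
    """
    training = np.vstack([x_wm, x_nwm])
    labels = np.concatenate([np.ones(len(x_wm)), np.zeros(len(x_nwm))])
    objectness = np.empty(len(features))
    for i, p in enumerate(features):
        # Distances from P to every training point; take the k closest.
        d = np.linalg.norm(training - p, axis=1)
        nearest = labels[np.argsort(d)[:k]]
        objectness[i] = nearest.sum() / k        # fuzzy objectness m/k
    return objectness

# Hypothetical training sets and voxels to classify.
rng = np.random.default_rng(0)
x_wm = rng.normal([60, 80], 5, size=(200, 2))
x_nwm = rng.normal([30, 40], 8, size=(200, 2))
voxels = rng.normal([55, 75], 10, size=(1000, 2))
m_over_k = knn_objectness(voxels, x_wm, x_nwm, k=7)
hard_wm = m_over_k > 0.5                          # majority (>4 of 7) vote
```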
Figure 14. Fuzzy thresholding. Rendition of CT data from Figure 3 with fuzzy thresholding depicts bone and soft tissue.
The simplest of the hard, region-based, human-assisted methods of segmentation is manual painting of regions with a mouse-driven paintbrush (61). This method is an alternative to manual boundary tracing. In contrast to this completely manual recognition and delineation scheme, there are methods in which recognition is manual but delineation is automatic. Region growing is a popular technique in this group (62–64). At the outset, the user specifies a seed voxel within the object region with use of (for example) a mouse pointer on a section display. A set of criteria for inclusion of a voxel in the object is also specified; for example, (a) the scene intensity of the voxel should be within an interval t1 to t2, (b) the mean intensity of the voxels included in the growing region at any time during the growing process should be within an interval t3 to t4, and (c) the intensity variance of the voxels included in the growing region at any time during the growing process should be within an interval t5 to t6. Starting with the seed voxel, the algorithm examines its 3D neighbors (usually the closest six, 18, or 26 neighbors) for inclusion. Those that are included are marked so that they will not be reconsidered for inclusion later. The neighbors of the voxels selected for inclusion are in turn examined, and the process continues until no more voxels can be selected for inclusion.
If only criterion a in the preceding paragraph is used and t1 and t2 are fixed during the growing process, this method outputs essentially a connected component of voxels satisfying a hard threshold interval. Note also that for any combination of criteria a and b, or if t1 and t2 are not fixed, it is not possible to guarantee that the set of voxels (object) O(v1) obtained with a seed voxel v1 is the same as the object O(v2), where v2 ≠ v1 is a voxel in O(v1). This lack of robustness constitutes a problem with most region-based methods.
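A minimal sketch of region growing under the first criterion (a fixed intensity interval) is given below; the seed location and interval are hypothetical, and the additional mean and variance criteria described above would simply add tests at the point marked in the code.

```python
from collections import deque
import numpy as np

def region_grow(scene, seed, t1, t2):
    """Grow a 6-connected region from a seed voxel using a fixed intensity interval."""
    grown = np.zeros(scene.shape, dtype=np.uint8)
    if not (t1 <= scene[seed] <= t2):
        return grown
    grown[seed] = 1
    queue = deque([seed])
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in offsets:
            nz, ny, nx = z + dz, y + dy, x + dx
            if (0 <= nz < scene.shape[0] and 0 <= ny < scene.shape[1]
                    and 0 <= nx < scene.shape[2] and not grown[nz, ny, nx]):
                # Criterion (a); criteria (b) and (c) on the running mean and
                # variance of the grown region would be tested here as well.
                if t1 <= scene[nz, ny, nx] <= t2:
                    grown[nz, ny, nx] = 1
                    queue.append((nz, ny, nx))
    return grown

# Hypothetical usage on a synthetic scene with a bright block.
scene = np.zeros((32, 64, 64), np.int16)
scene[10:20, 20:40, 20:40] = 1000
mask = region_grow(scene, seed=(15, 30, 30), t1=500, t2=1500)
```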
Because the fuzzy, region-based methods of segmentation described earlier eventually entail human assistance, they fall into the category of fuzzy, region-based, human-assisted methods. A recent technique that was designed to make use of human assistance is the fuzzy connected technique (65). In this method, recognition is manual and involves pointing at an object in a section display. Delineation is automatic and takes both graded composition and hanging-togetherness into account. It has been effectively applied in several applications, including quantification of multiple sclerosis lesions (66–69), MR angiography (70), and soft-tissue display for planning of craniomaxillofacial surgery (71).
In the fuzzy connected technique, nearby voxels in the voxel array are thought of as having a fuzzy adjacency relation that indicates their spatial nearness. This relation, which varies in strength from 0 to 1, is independent of any scene intensity values and is a nonincreasing function of the intervening distance. Fuzzy adjacency roughly captures the blurring characteristic of imaging devices. Similarly, nearby voxels in a scene are thought of as having a fuzzy affinity relation that indicates how they hang together locally in the same object. The strength of this relation (varying from 0 to 1) between any two voxels is a function of their fuzzy adjacency as well as their scene intensity values. For example, this function may be the product of the strength of their adjacency and (1 − |I[v1] − I[v2]|), where I[v1] and I[v2] are the intensity values of voxels v1 and v2 scaled in some appropriate way to the range between 0 and 1. Affinity expresses the degree to which voxels hang together locally to form a fuzzy object; voxels that are far apart have negligible affinity as defined in this function.
The global "hanging-togetherness" of voxels is captured through a fuzzy relation called fuzzy connectedness. A strength of connectedness is assigned to each pair of voxels (v1, v2) as follows: There are numerous possible paths between v1 and v2, each consisting of a sequence of voxels starting at v1 and ending at v2, in which successive voxels are nearby and have a certain degree of adjacency. The strength of a path is the smallest of the affinities associated with pairs of successive voxels along the path. The strength of connectedness between v1 and v2 is the largest of the strengths associated with all possible paths between v1 and v2. A fuzzy object is a pool of voxels together with a membership value (between 0 and 1) assigned to each voxel that represents its objectness.
Figure 15. Fuzzy connected segmentation. (a, b) Sections from an MR imaging scene with T2 (a) and PD (b) values assigned to voxels. (c–e) Sections created with 3D fuzzy connected segmentation demonstrate the union of white matter and gray matter objects (c), the cerebrospinal fluid object (d), and the union of multiple sclerosis lesions (e) detected from the scene in a and b.
The pool is such that the strength of connectedness between any two voxels in the pool is greater than a small threshold value (typically about 0.1), and the strength between any two voxels of which only one is in the pool is less than that threshold. Obviously, computing fuzzy objects even for this simple affinity function is computationally impractical if we proceed straight from the definitions. However, the theory allows the complexity to be simplified considerably for a wide variety of affinity relations, so that fuzzy object computation can be done in practical time (about 15–20 minutes for a 256 × 256 × 64 3D scene with 16 bits per voxel on a SPARCstation 20 workstation [Sun Microsystems, Mountain View, Calif]). A wide spectrum of application-specific knowledge of image characteristics can be incorporated into the affinity relation.
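The sketch below illustrates the fuzzy connectedness computation described above on a small scene: affinity between adjacent voxels is the product of adjacency and intensity similarity, and the strength of connectedness to a seed is found with a Dijkstra-like max-min propagation rather than by enumerating paths. The affinity form and the 0.1 threshold follow the text; everything else is illustrative.

```python
import heapq
import numpy as np

def affinity(i1, i2):
    """Affinity of two adjacent voxels from intensities scaled to [0, 1]."""
    return 1.0 - abs(i1 - i2)

def fuzzy_connectedness(scene, seed):
    """Strength of connectedness of every voxel to the seed (max over paths of min affinity)."""
    lo, hi = float(scene.min()), float(scene.max())
    s = (scene - lo) / max(hi - lo, 1e-6)                # scale intensities to [0, 1]
    conn = np.zeros(scene.shape, dtype=np.float32)
    conn[seed] = 1.0
    heap = [(-1.0, seed)]                                # max-heap via negated strengths
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    while heap:
        neg, (z, y, x) = heapq.heappop(heap)
        strength = -neg
        if strength < conn[z, y, x]:
            continue
        for dz, dy, dx in offsets:
            nz, ny, nx = z + dz, y + dy, x + dx
            if 0 <= nz < s.shape[0] and 0 <= ny < s.shape[1] and 0 <= nx < s.shape[2]:
                # Path strength is the smallest affinity along the path.
                cand = min(strength, affinity(s[z, y, x], s[nz, ny, nx]))
                if cand > conn[nz, ny, nx]:
                    conn[nz, ny, nx] = cand
                    heapq.heappush(heap, (-cand, (nz, ny, nx)))
    return conn

# Hypothetical usage: the fuzzy object is the pool of voxels whose strength
# of connectedness to the seed exceeds a small threshold (about 0.1).
scene = np.random.rand(16, 32, 32).astype(np.float32)
conn = fuzzy_connectedness(scene, seed=(8, 16, 16))
fuzzy_object = conn > 0.1
```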
Figure 15 shows an example of 3D fuzzy connected segmentation of white matter, gray matter, cerebrospinal fluid, and multiple sclerosis lesions in a T2, PD scene pair. Figure 16a shows an MIP rendition of an MR angiography data set, whereas Figure 16b demonstrates a rendition of a 3D fuzzy connected vessel tree detected from a point specified on the vessel.
There are a number of challenges associated with segmentation, including (a) developing general segmentation methods that can be easily and quickly adapted to a given application, (b) keeping the human assistance required on a per-scene basis to a minimum, (c) developing fuzzy methods that can realistically handle uncertainties in data, and (d) assessing the efficacy of segmentation methods.
VISUALIZATION
Visualization operations create renditions of given scenes or object systems. Their purpose is to create, from a given set of scenes or objects, renditions that facilitate the visual perception of object information. Two approaches are available: scene-based visualization and object-based visualization.
Figure 16. Fuzzy connected segmentation. (a) Three-dimensional maximum-intensity-projection (MIP) rendition of an MR angiography scene. (b) MIP rendition of the 3D fuzzy connected vessels detected from the scene in a. Fuzzy connectedness has been used to remove the clutter that obscures the vasculature.
Figure 17. Montage display of a 3D CT scene of the head.
Scene-based Visualization
In scene-based visualization, renditions are created directly from given scenes. Within this approach, two further subclasses may be identified: section mode and volume mode.
Section Mode.—Methods differ as to what constitutes a "section" and how this information is displayed. Natural sections may be axial, coronal, or sagittal; oblique or curved sections are also possible. Information is displayed as a montage or with roam-through (fly-through), in gray scale or pseudocolor. Figure 17 shows a montage display of the natural sections of a CT scene.
Figures 18, 19. (18) Three-dimensional display–guided extraction of an oblique section from CT data obtained in a patient with a craniofacial disorder. A plane is selected interactively by means of the 3D display to indicate the orientation of the section plane (left). The section corresponding to the oblique plane is shown on the right. (19) Pseudocolor display. (a) Head MR imaging sections obtained at different times are displayed in green and red, respectively. Where there is a match, the composite image appears yellow. Green and red areas indicate regions of mismatch. (b) On the same composite image displayed after 3D scene-based registration, green and red areas indicate either a registration error or a change in an object (eg, a lesion) over the time interval between the two acquisitions.
Figure 18 demonstrates a 3D display–guided extraction of an oblique section from a CT scene of a pediatric patient's head. This re-sectioning operation illustrates how visualization is needed to perform visualization itself. Figure 19 illustrates pseudocolor display with two sections from a brain MR imaging study in a patient with multiple sclerosis. The two sections, which represent approximately the same location in the patient's head, were taken from 3D scenes that were obtained at different times and subsequently registered. The sections are assigned red and green hues. The display shows yellow (produced by a combination of red and green hues) where the sections match perfectly or where there has been no change (for example, in the lesions). At other places, either red or green is demonstrated.
Volume Mode.—In volume mode visualization, information may be displayed as surfaces, interfaces, or intensity distributions with use of surface rendering, volume rendering, or MIP. A projection technique is always needed to move from the higher-dimensional scene to the 2D screen of the monitor. For scenes of four or more dimensions, 3D "cross sections" must first be determined, after which a projection technique can be applied to move from 3D to 2D.
Figure 20. Schematic illustrates projection techniques for volume mode visualization. Projections are created for rendition either by ray casting from the viewing plane to the scene or by projecting voxels from the scene to the viewing plane.
Two approaches may be used: ray casting (34), which consists of tracing a line perpendicular to the viewing plane from every pixel in the viewing plane into the scene domain, or voxel projection (72), which consists of directly projecting voxels encountered along the projection line from the scene onto the viewing plane (Fig 20). Voxel projection is generally considerably faster than ray casting; however, either projection method may be used with any of the three rendering techniques (MIP, surface rendering, volume rendering).
In MIP, the intensity assigned to a pixel in the rendition is simply the maximum scene intensity encountered along the projection line (Fig 16a) (73,74). MIP is the simplest of all 3D rendering techniques. It is most effective when the objects of interest are the brightest in the scene and have a simple 3D morphology and a minimal gradation of intensity values. Contrast material–enhanced CT angiography and MR angiography are ideal applications for this method; consequently, MIP is commonly used in these applications (75,76). Its main advantage is that it requires no segmentation. However, the ideal conditions mentioned earlier frequently go unfulfilled, due (for example) to the presence of other bright objects such as clutter from surface coils in MR angiography, bone in CT angiography, or other obscuring vessels that may not be of interest. Consequently, some segmentation eventually becomes necessary.
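For an orthographic view along one of the scene axes, the MIP operation just described reduces to taking the maximum intensity along that axis; the sketch below also shows an oblique view produced by resampling the scene before projection. The rotation angle and the synthetic "vessel" are illustrative.

```python
import numpy as np
from scipy import ndimage

def mip(scene, axis=0):
    """Maximum intensity projection along one scene axis (orthographic view)."""
    return scene.max(axis=axis)

def rotated_mip(scene, angle_deg, axes=(1, 2)):
    """MIP after rotating the scene in-plane, for an oblique viewing direction."""
    rotated = ndimage.rotate(scene, angle_deg, axes=axes, reshape=False, order=1)
    return rotated.max(axis=axes[1])

# Hypothetical usage on an MR angiography-like scene with one bright vessel.
scene = np.zeros((64, 128, 128), np.float32)
scene[:, 60:68, 20:110] = 100.0
frontal = mip(scene, axis=1)
oblique = rotated_mip(scene, angle_deg=30)
```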
In surface rendering (77), object surfaces are portrayed in the rendition. A threshold interval must be specified to indicate the object of interest in the given scene. Speed is of the utmost importance in surface rendering because the idea is that object renditions are created interactively, directly from the scene, as the threshold is changed. Instead of thresholding, any automatic, hard, boundary- or region-based method can be used. In such cases, however, the parameters of the method will have to be specified interactively, and the speed of segmentation and rendition must be sufficient to make this mode of visualization useful. Although rendering based on thresholding can presently be accomplished in about 0.03–0.25 seconds on a Pentium 300 with use of appropriate algorithms in software (61), more sophisticated segmentation methods (eg, kNN) may not offer interactive speed.
The actual rendering process consists of three basic steps: projection, hidden part removal, and shading. These steps are needed to impart a sense of three-dimensionality to the rendered image. Additional cues for three-dimensionality may be provided by techniques such as stereoscopic display, motion parallax by rotation of the objects, shadowing, and texture mapping.
If ray casting is used as the method of projection, hidden part removal is performed by stopping at the first voxel encountered along each ray that satisfies the threshold criterion (78). The value (shading) assigned to the pixel in the viewing plane that corresponds to the ray is determined as described later. If voxel projection is used, hidden parts can be removed by projecting voxels from the farthest to the closest (with respect to the viewing plane) and always overwriting the shading value, which can be achieved in a number of computationally efficient ways (72,79–81).
The shading value assigned to a pixel p in the viewing plane depends on the voxel v that is eventually projected onto p. The faithfulness with which this value reflects the shape of the surface around v largely depends on the surface normal vector estimated at v. Two classes of methods are available for this purpose: object-based methods and scene-based methods. In object-based methods (82,83), the vector is determined purely from the geometry of the shape of the surface in the vicinity of v. In scene-based methods (78), the vector is taken to be the gradient of the given scene at v; that is, the direction of the vector is the direction in which scene intensity changes most rapidly at v. Given the normal vector N at v, the shading assigned to p is usually determined as [fd(v, N, L) + fs(v, N, L, V)] fD(v), where fd is the diffuse component of reflection, fs is the specular component, fD is a component that depends on the distance of v from the viewing plane, and L and V are unit vectors indicating the direction of the incident light and of the viewing rays. The diffuse component is independent of the viewing direction and depends solely on L (as a cosine of the angle between L and N). It captures the scattering property of the surface, whereas the specular component captures surface shininess. The specular component is highest in the direction of ideal reflection R, whose angle with N is equal to the angle between L and N, and decreases as a cosine function on either side of R. By weighting the three components in different ways, different shading effects can be achieved.
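The sketch below evaluates the shading expression [fd + fs] fD for one surface voxel, using the cosine forms described above (a Phong-style model); the weights, exponent, and vectors are hypothetical.

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def shade(normal, light, view, depth, w_d=0.7, w_s=0.3, shininess=20, depth_scale=200.0):
    """Shading of a surface voxel: (diffuse + specular) attenuated by distance.

    normal, light, view -- vectors N, L, V (normalized inside)
    depth               -- distance of the voxel from the viewing plane
    """
    n, l, v = normalize(normal), normalize(light), normalize(view)
    f_d = w_d * max(np.dot(n, l), 0.0)               # diffuse: cosine of angle between L and N
    r = 2.0 * np.dot(n, l) * n - l                   # direction of ideal reflection R
    f_s = w_s * max(np.dot(r, v), 0.0) ** shininess  # specular: falls off away from R
    f_D = 1.0 / (1.0 + depth / depth_scale)          # distance-dependent attenuation
    return (f_d + f_s) * f_D

# Hypothetical usage for a single projected voxel.
value = shade(normal=np.array([0.0, 0.0, 1.0]),
              light=np.array([0.3, 0.3, 1.0]),
              view=np.array([0.0, 0.0, 1.0]),
              depth=50.0)
```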
Figure 21. Scene-based volume rendering with voxel projection. Rendition of knee CT data from Figure 3 shows bone, fat, and soft tissue.
Every voxel is now considered to transmit, emit, and reflect light. The goal is to determine the amount of light reaching every pixel in the viewing plane. The amount of light transmitted depends on the opacity of the voxel. Light emission depends on objectness and hence on opacity: The greater the objectness, the greater the emission. Similarly, reflection depends on the strength of the surface that is present: The greater the strength, the greater the reflection.
Like surface rendering, volume rendering consists of three basic steps: projection, hidden part removal, and shading or compositing. The principles underlying projection are identical to those described for surface rendering. Hidden part removal is much more complicated for volume rendering than for surface rendering. In ray casting, a common method is to discard all voxels along the ray from the viewing plane beyond a point at which the “cumulative opacity” is above a high threshold (eg, 90%) (84). In voxel projection, a voxel can also be discarded if the voxels surrounding it in the direction of the viewing ray have “high” opacity (35). The shading operation, which is more appropriately termed compositing, is also more complicated for volume rendering than for surface rendering. Compositing must take into account all three components: transmission, reflection, and emission. One may start from the voxel farthest from the viewing plane along each ray and work toward the front, calculating the output light for each voxel. The net light output by the voxel closest to the viewing plane is assigned to the pixel associated with the ray. Instead of using this back-to-front strategy, one may also make the calculations from front to back, which has actually been shown to be faster (35).
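The sketch below pairs an opacity function of the kind shown in Figure 13 with front-to-back compositing along a single ray, stopping once the cumulative opacity exceeds 90%. It is a simplified, grey-only illustration under assumed names and parameters, not a prescribed algorithm; in a full renderer, the emission term would typically be the shaded value computed in the previous step rather than the raw intensity.

```python
import numpy as np

def trapezoidal_opacity(intensity, t1, t2, t3, t4):
    """Opacity function in the style of Figure 13: ramps up between t1 and t2,
    is 100% between t2 and t3, and ramps down between t3 and t4. Choosing
    t1 = t2 and t3 = t4 corresponds conceptually to hard thresholding."""
    return np.interp(intensity, [t1, t2, t3, t4], [0.0, 1.0, 1.0, 0.0])

def composite_ray_front_to_back(intensities, t1, t2, t3, t4, stop=0.9):
    """Composite one ray front to back; terminate early once the cumulative
    opacity exceeds the stop threshold, since deeper voxels are hidden."""
    light, cumulative_opacity = 0.0, 0.0
    for value in intensities:              # ordered from the viewing plane inward
        alpha = trapezoidal_opacity(value, t1, t2, t3, t4)
        emission = value                   # stand-in for the shaded/emitted value
        light += (1.0 - cumulative_opacity) * alpha * emission
        cumulative_opacity += (1.0 - cumulative_opacity) * alpha
        if cumulative_opacity > stop:
            break
    return light
```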
Figure 22. Object-based visualization of the skull in a child with agnathia. (a) Surface-rendered image. (b) Subsequent volume-rendered image was preceded by the acquisition of a fuzzy object representation with use of fuzzy thresholding (cf Fig 13).
In volume rendering (as in surface rendering), voxel projection is substantially faster than ray casting. Figure 21 shows the CT knee data set illustrated in Figure 3 as rendered with this method. Three types of tissue—bone, fat, and soft tissue—have been identified.
Object-based Visualization
In object-based visualization, objects are first explicitly defined and then rendered. In difficult segmentation situations, or when segmentation is time consuming or involves too many parameters, it is impractical to perform direct scene-based rendering. The intermediate step of completing object definition then becomes necessary.
Surface Rendering.—Surface rendering methods take hard object descriptions as input and create renditions. The methods of projection, hidden part removal, and shading are similar to those described for scene-based surface rendering, except that a variety of surface description methods have been investigated using voxels (72,79,81), points, voxel faces (29,80,85,86), triangles (30,37,87), and other surface patches. Therefore, projection methods that are appropriate for specific surface elements have been developed. Figure 22a shows a rendition, created with use of voxel faces on the basis of CT data, of the craniofacial skeleton in a patient with agnathia. Figure 8 shows renditions of the bones of the foot created by way of the same method on the basis of MR imaging data.
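To make the voxel-face form of surface description concrete, the sketch below (an illustrative helper, not the algorithms of the cited references) collects the faces of a hard (binary) object that border the background; each such face could then be projected and shaded as described above.

```python
import numpy as np

def boundary_faces(obj):
    """Return (voxel coordinate, outward face normal) pairs for a binary object.
    A voxel face belongs to the surface if the voxel is inside the object and
    its neighbor across that face is not."""
    padded = np.pad(obj.astype(bool), 1)   # pad with background so edges are handled
    normals = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    faces = []
    for x, y, z in zip(*np.nonzero(obj)):
        for n in normals:
            # +1 offsets account for the one-voxel padding.
            if not padded[x + 1 + n[0], y + 1 + n[1], z + 1 + n[2]]:
                faces.append(((x, y, z), n))
    return faces

# A single object voxel has all six of its faces on the surface.
obj = np.zeros((3, 3, 3), dtype=bool)
obj[1, 1, 1] = True
assert len(boundary_faces(obj)) == 6
```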
Volume Rendering.—Volume rendering methods take as input fuzzy object descriptions, which are in the form of a set of voxels wherein values for objectness and a number of other parameters (eg, gradient magnitude) are associated with each voxel (35). Because the object description is more compact than the original scene and additional information for increasing computation speed can be stored as part of the object description, volume rendering based on a fuzzy object description can be performed at interactive speeds even on personal computers such as the Pentium 300, entirely in software. In fact, the rendering speed (2–15 sec) is now comparable to that of scene-based volume rendering with specialized hardware engines. Figure 22b shows a fuzzy object rendition of the data set in Figure 22a. Figure 23a shows a rendition of craniofacial bone and soft tissue, both of which were defined separately with use of the fuzzy connected methods described earlier. Note that if one uses a direct scene-based volume rendering method with the opacity function illustrated in Figure 13, the skin becomes indistinguishable from other soft tissues and always obscures the rendition of the muscles (Fig 23b).
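The compactness argument can be illustrated with a simple sketch (an assumption-laden simplification in the spirit of, but not identical to, the shell rendering structure of reference 35): only voxels whose objectness exceeds a small cutoff are kept, together with the per-voxel attributes a renderer needs.

```python
import numpy as np

def fuzzy_object_description(objectness, cutoff=0.05):
    """Keep only voxels with non-negligible objectness, along with the
    attributes needed for rendering; the rest of the scene is discarded."""
    gx, gy, gz = np.gradient(objectness.astype(float))
    grad_mag = np.sqrt(gx ** 2 + gy ** 2 + gz ** 2)
    keep = objectness > cutoff
    return {
        "coords": np.argwhere(keep),             # (N, 3) voxel coordinates
        "objectness": objectness[keep],          # fuzzy membership per voxel
        "gradient_magnitude": grad_mag[keep],    # surface strength used in shading
    }
```

Because typically only a small fraction of the scene survives the cutoff, rendering from such a description can proceed much faster than rendering from the full scene.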
Misconceptions in Visualization
Several inaccurate statements concerning visualization frequently appear in the literature. The following statements are seen most often:
Figure 23. Visualization with volume rendering. (a) Object-based volume-rendered image demonstrates bone and soft-tissue structures (muscles) that had been detected earlier as separate fuzzy connected objects in a 3D craniofacial CT scene. The skin is essentially “peeled away” because of its weak connectedness to the muscles. (b) Scene-based volume-rendered version of the scene in a was acquired with use of the opacity function (cf Fig 13) separately for bone and soft tissue. The skin has become indistinguishable from the muscles because they have similar CT numbers and hence obscures the rendition of the muscles.
“Surface rendering is the same as thresholding.”—Clearly, thresholding is only one—indeed, the simplest—of the many available hard region- and boundary-based segmentation methods, the output of any of which can be surface rendered.
“Volume rendering does not require segmentation.”—Although volume rendering is a general term and is used in different ways, the statement is false. The only useful volume rendering or visualization technique that requires no segmentation is MIP. The opacity assignment schemes illustrated in Figure 13 and described in the section entitled “Scene-based Visualization” are clearly fuzzy segmentation strategies and involve the same problems that are encountered with any segmentation method. It is untenable to hold that opacity functions such as the one shown in Figure 13 do not represent segmentation while maintaining that the special case that results when t1 = t2 and t3 = t4 (corresponding to thresholding) does represent segmentation.
“The term volume rendering may be used to refer to any scene-based rendering technique as well as object-based rendering techniques.”—The meaning of the term as used in the literature varies widely. In one sense, it can also apply to the section mode of visualization. It is better to use volume rendering to refer to fuzzy object rendering, whether performed with scene-based or object-based methods, but not to hard object rendering methods.
There are many challenges associated with visualization. First, preprocessing operations
(and, therefore, visualization operations) can be applied in many different sequences to achieve the desired result. For example, the filtering-interpolation-segmentation-rendering sequence may produce renditions that are significantly different from those produced by the interpolation-segmentation-filtering-rendering sequence. With the large number of different methods possible for each operation and the various parameters associated with each operation, there are myriad ways of achieving the desired results. Figure 24 shows five images derived from CT data that were created by performing different operations. Systematic study is needed to determine which combination of operations is optimal for a given application. Normally, the fixed combination provided by the 3D imaging system is assumed to be the best for that application. Second, objective comparison of visualization methods becomes an enormous task in view of the vast number of ways one may reach the desired goal. A third challenge is achieving realistic tissue display that includes color, texture, and surface properties.
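One way to see why the ordering matters is to compose the operations as functions of the scene; in the toy example below (illustrative operations only: Gaussian filtering and thresholding), the two orderings generally produce different hard objects and hence different renditions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def filter_then_segment(scene, sigma=1.0, threshold=100):
    """Smooth the scene first, then segment by thresholding."""
    return gaussian_filter(scene.astype(float), sigma) >= threshold

def segment_then_filter(scene, sigma=1.0, threshold=100):
    """Segment first, then smooth the resulting binary object."""
    return gaussian_filter((scene >= threshold).astype(float), sigma) >= 0.5

scene = np.random.randint(0, 200, size=(32, 32, 32))
# Voxels on which the two pipelines disagree.
disagreement = np.logical_xor(filter_then_segment(scene), segment_then_filter(scene)).sum()
```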
MANIPULATION
Manipulation operations are used primarily to create a second object system from a given object system by changing objects or their relations. The main goal of these operations is to simulate surgical procedures on the basis of patient-specific scene data and to develop aids for interventional and therapy procedures. Compared with preprocessing and visualization, manipulation is in its infancy; consequently, it will not be discussed in as much detail. Two classes of operations are being developed: rigid manipulation and deformable manipulation.
Figure 24. Preprocessing and visualization operations. Renditions from CT data were created with use of five different preprocessing and visualization operations.
Figure 25. Rigid manipulation. Rendition created from CT data obtained in a child demonstrates rigid manipulation for use in surgical planning. This “virtual surgery” mimics an osteotomy procedure used in craniomaxillofacial surgery to advance the frontal bone.
Rigid Manipulation
Operations to cut, separate, add, subtract, move, and mirror objects and their components have been developed with use of both hard and fuzzy object definitions (81,88–91). In rigid manipulation, the user interacts directly with an object-based surface or volume rendition of the object system (Fig 25). Clearly, an operation must be executable at interactive speeds to be practical. This is possible with both hard and fuzzy object definitions (81,91) on a personal computer such as a Pentium 300.
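A rigid move of a separated object component amounts to applying a rotation and a translation to its voxel coordinates. The sketch below (purely illustrative; the rotation angle, axis, and translation are assumptions) advances a cut segment in the manner of the bone-advancement simulation of Figure 25.

```python
import numpy as np

def rigid_move(coords, rotation, translation):
    """Apply a rigid transformation (3x3 rotation matrix plus translation
    vector) to an (N, 3) array of object voxel coordinates."""
    return coords @ rotation.T + translation

# Rotate a cut segment 10 degrees about the z axis and advance it 5 voxels along x.
theta = np.deg2rad(10.0)
rotation = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                     [np.sin(theta),  np.cos(theta), 0.0],
                     [0.0,            0.0,           1.0]])
segment = np.argwhere(np.ones((4, 4, 4), dtype=bool)).astype(float)  # stand-in segment
moved = rigid_move(segment, rotation, np.array([5.0, 0.0, 0.0]))
```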
Deformable Manipulation
Operations to stretch, compress, bend, and so on are being developed. Mechanical modeling of soft-tissue structures including muscles, tendons, ligaments, and capsules is complicated because the forces that they generate and their behavior under external forces are difficult to determine, especially in a patient-specific fashion. Therefore, in past attempts at modeling, properties of soft tissues have been taken into consideration in a generic but not patient-specific fashion. Generic models based on local deformations are used as an aid in facial plastic surgery (92) quite independently of the underlying bone and muscle, treating only the skin surface. The use of multiple layers—skin, fatty tissue, muscle, and bone that does not move—has been explored in different combinations to model facial expression animation (93–95). Although attempts have been made to model soft tissue in this fashion and to duplicate its mechanical properties (96), no attempt seems to have been made to integrate hard-tissue (bone) changes with soft-tissue modifications in a model. Reasons include the lack of adequate visualization tools and the lack of tools to simulate osteotomy procedures or integrate soft-tissue models with hard-tissue changes. The area of deformable manipulation is open for further research. Because most of the tissues in the body are deformable and movable and object information in images is inherently fuzzy, basic fuzzy mechanics theories and algorithms need to be developed to carry out patient-specific object manipulation and analysis.
ANALYSIS
The main purpose of analysis operations is to generate a quantitative description of the morphology, architecture, and function of the object system from a given set of scenes or a given object system.
The goal of many 3D imaging applications is analysis of an object system. Although visualization is used as an aid, it may not be an end in itself. As such, many of the current application-driven works are related to analysis. Like other operations, analysis operations may be classified into two groups: scene-based analysis and object-based analysis.
Scene-based Analysis
In scene-based analysis, quantitative descriptions are based directly on scene intensities and include region-of-interest statistics as well as measurements of density, activity, perfusion, and flow. Object structure information derived from another modality is often used to guide the selection of regions for these measurements.
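A minimal example of such a measurement (illustrative only; it assumes a binary region-of-interest mask that has been defined elsewhere, for example from a segmentation performed on a registered scene of another modality):

```python
import numpy as np

def roi_statistics(scene, roi_mask):
    """Region-of-interest statistics computed directly from scene intensities."""
    values = scene[roi_mask]
    return {
        "count": int(values.size),
        "mean": float(values.mean()),
        "std": float(values.std()),
        "min": float(values.min()),
        "max": float(values.max()),
    }
```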
Object-based Analysis
In object-based analysis, quantitative description is obtained from an object on the basis of morphology, architecture, change over time, relationships with other objects in the system, and changes in these relationships. Examples of measurements obtained in this manner include distance, length, curvature, area, volume, kinematics, kinetics, and mechanics. Object information in images is fuzzy; therefore, the challenge is to develop fuzzy morphometry and mechanics theories and algorithms to enable realistic analysis of the object information.
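As one simple object-based measurement, the volume of a fuzzy object can be estimated by summing the objectness (membership) values over all voxels and scaling by the voxel volume; the sketch below is an illustrative formulation, not a method prescribed here.

```python
import numpy as np

def fuzzy_volume(objectness, voxel_size_mm):
    """Volume (mm^3) of a fuzzy object: each voxel contributes in proportion
    to its objectness value in [0, 1]. A hard (binary) object is the special
    case in which every objectness value is 0 or 1."""
    return float(objectness.sum()) * float(np.prod(voxel_size_mm))
```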
DIFFICULTIES IN 3D IMAGING
There are two main types of difficulties associated with 3D imaging: those related to object definition and those related to validation. The former have already been discussed in detail (see “Preprocessing”). Difficulties related to validation are discussed in this section. Validation may be either qualitative or quantitative. The purpose of qualitative validation is to compare visualization methods for a given task. Observer studies and receiver operating characteristic analysis may be used (97). A major challenge is how to select a small number of methods for formal receiver operating characteristic analysis from among the numerous combinations of operations and methods and their parameter settings.
The purpose of quantitative validation is to assess the precision, accuracy, and efficiency of the measurement process. Precision refers to the reliability of the method and is usually easy to establish by repeating the measurement process and assessing variation using the coefficient of variation, correlation coefficient, κ statistic, and analysis of variance. All steps that involve subjectivity must be repeated (including, for example, how the patient is positioned in the scanner).
Accuracy refers to how well the measurement agrees with truth. Establishing recognition accuracy requires histologic assessment of object presence or absence for small objects. For large (anatomic) objects, an expert reader can provide truth. Receiver operating characteristic analysis can then be applied. Establishing delineation accuracy requires a point-by-point assessment of object grade. Truth is very difficult to establish. Typically, the following surrogates of truth are used: physical phantoms, mathematic phantoms, manual (expert) delineation, simulation of mathematical objects that are then embedded in actual images, and comparison with a process whose accuracy is known.
Efficiency refers to the practical viability of the method—for example, in terms of the number of studies that can be processed per hour. This variable has two components: computer time and operator time. Computer time is not crucial so long as it remains within the bounds of practicality. However, operator time is crucial in determining whether a method is practically viable, regardless of its precision or accuracy. This aspect of validation is usually ignored, but it should be conducted and its statistical variation analyzed and reported.
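For example, the precision of a repeated volume measurement could be summarized by its coefficient of variation (an illustrative computation; the correlation coefficient, κ statistic, and analysis of variance mentioned above serve the same purpose for other study designs).

```python
import numpy as np

def coefficient_of_variation(repeated_measurements):
    """Coefficient of variation (%) of repeated measurements of the same
    quantity; lower values indicate a more repeatable (precise) method."""
    m = np.asarray(repeated_measurements, dtype=float)
    return 100.0 * m.std(ddof=1) / m.mean()

# Five repeated volume measurements (mL) of the same structure by one method.
cv = coefficient_of_variation([101.2, 99.8, 100.5, 102.0, 98.9])
```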
CONCLUSIONS
Although 3D imaging poses numerous challenges for mathematicians, engineers, physicists, and physicians, it is an exciting technology that promises to make important contributions in a wide range of disciplines in years to come.
REFERENCES
1. Kaufman A. A tutorial on volume visualization. Los Alamitos, Calif: IEEE Computer Society, 1991.
2. Höhne K, Fuchs H, Pizer S. 3D imaging in medicine: algorithms, systems, applications. Berlin, Germany: Springer-Verlag, 1990.
3. Udupa J, Herman G. 3D imaging in medicine. Boca Raton, Fla: CRC, 1991.
4. Udupa JK, Goncalves R. Imaging transforms for visualizing surfaces and volumes. J Digit Imaging 1993; 6:213–236.
5. Gerig G, Kübler O, Kikinis R, Jolesz FA. Nonlinear anisotropic filtering of MRI data. IEEE Trans Med Imaging 1992; 11:221–232.
6. Pratt WK. Digital image processing. New York, NY: Wiley, 1991.
7. Stytz MR, Parrott RW. Using kriging for 3-D medical imaging. Comput Med Imaging Graph 1993; 17:421–442.
8. Raya SP, Udupa JK. Shape-based interpolation of multidimensional objects. IEEE Trans Med Imaging 1990; 9:32–42.
9. Higgins WE, Morice C, Ritman EL. Shape-based interpolation of tree-like structures in three-dimensional images. IEEE Trans Med Imaging 1993; 12:439–450.
10. Herman GT, Zheng J, Bucholtz CA. Shape-based interpolation. IEEE Comput Graph Appl 1992; 12:69–79.
11. Grevera GJ, Udupa JK. Shape-based interpolation of multidimensional grey-level images. IEEE Trans Med Imaging 1996; 15:882–892.
12. Puff DT, Eberly D, Pizer SM. Object-based interpolation via cores. Image Process 1994; 2167:143–150.
13. Goshtasby A, Turner DA, Ackerman LV. Matching tomographic slices for interpolation. IEEE Trans Med Imaging 1992; 11:507–516.
14. Woods RP, Mazziotta JC, Cherry SR. MRI-PET registration with automated algorithm. J Comput Assist Tomogr 1993; 17:536–546.
15. Studholme C, Hill DLG, Hawkes DJ. Automated 3D registration of MR and CT images of the head. Med Image Anal 1996; 1:163–175.
16. Miller MI, Christensen GE, Amit Y, Grenander U. Mathematical textbook of deformable neuroanatomies. Proc Natl Acad Sci U S A 1993; 90:11944–11948.
17. Pelizzari CA, Chen GTY, Spelbring DR, Weichselbaum RR, Chen CT. Accurate three-dimensional registration of CT, PET, and/or MR images of the brain. J Comput Assist Tomogr 1989; 13:20–26.
18. Toennies KD, Udupa JK, Herman GT, Wornom IL, Buchman SR. Registration of three-dimensional objects and surfaces. IEEE Comput Graph Appl 1990; 10:52–62.
19. Arun KS, Huang TS, Blostein SD. Least-squares fitting of two 3-D point sets. IEEE Trans Pattern Anal Machine Intell 1987; 9:698–700.
20. Maintz JBA, van den Elsen PA, Viergever MA. Evaluation of ridge seeking operators for multimodality medical image matching. IEEE Trans Pattern Anal Machine Intell 1996; 18:353–365.
21. Bajcsy R, Kovacic S. Multiresolution elastic matching. Comput Vision Graph Image Process 1989; 46:1–21.
22. Gee JC, Barillot C, Le Briquer L, Haynor DR, Bajcsy R. Matching structural images of the human brain using statistical and geometrical image features. SPIE Proc 1994; 2359:191–204.
23. Evans AC, Dai W, Collins L, Neelin P, Marrett S. Warping of a computerized 3-D atlas to match brain image volumes for quantitative neuroanatomical and functional analysis. SPIE Proc 1991; 1445:236–246.
24. Gong L, Kulikowski C. Comparison of image analysis processes through object-centered hierarchical planning. IEEE Trans Pattern Anal Machine Intell 1995; 17:997–1008.
25. Raya S. Low-level segmentation of 3-D magnetic resonance brain images: a rule-based system. IEEE Trans Med Imaging 1990; 9:327–337.
26. Collins D, Peters T. Model-based segmentation of individual brain structures from MRI data. SPIE Proc 1992; 1808:10–23.
27. Gee J, Reivich M, Bajcsy R. Elastically deforming 3D atlas to match anatomical brain images. J Comput Assist Tomogr 1993; 17:225–236.
28. Christensen GE, Rabbitt RD, Miller MI. 3-D brain mapping using a deformable neuroanatomy. Phys Med Biol 1994; 39:609–618.
29. Udupa J, Srihari S, Herman G. Boundary detection in multidimensions. IEEE Trans Pattern Anal Machine Intell 1982; 4:41–50.
30. Wyvill G, McPheeters C, Wyvill B. Data structures for soft objects. Visual Comput 1986; 2:227–234.
31. Lorensen W, Cline H. Marching cubes: a high resolution 3D surface construction algorithm. Comput Graph 1989; 23:185–194.
32. Doi A, Koide A. An efficient method of triangulating equi-valued surfaces by using tetrahedral cells. IEICE Trans Commun Electron Inf Syst 1991; 74:214–224.
33. Liu H. Two- and three-dimensional boundary detection. Comput Graph Image Process 1977; 6:123–134.
34. Levoy M. Display of surfaces from volume data. IEEE Comput Graph Appl 1988; 8:29–37.
35. Udupa J, Odhner D. Shell rendering. IEEE Comput Graph Appl 1993; 13:58–67.
36. Udupa J, Hung H, Chuang K. Surface and volume rendering in 3D imaging: a comparison. J Digit Imaging 1990; 4:159–168.
37. Kalvin A. Segmentation and surface-based modeling of objects in three-dimensional biomedical images. Thesis. New York University, New York, NY, 1991.
38. Herman GT, Liu HK. Dynamic boundary surface detection. Comput Graph Image Process 1978; 7:130–138.
39. Pope D, Parker D, Gustafson D, Clayton P. Dynamic search algorithms in left ventricular border recognition and analysis of coronary arteries. IEEE Proc Comput Cardiol 1984; 9:71–75.
40. Amini A, Weymouth T, Jain R. Using dynamic programming for solving variational problems in vision. IEEE Trans Pattern Anal Machine Intell 1990; 12:855–867.
41. Gieger D, Gupta A, Costa L, Vlontzos J. Dynamic programming for detecting, tracking and matching deformable contours. IEEE Trans Pattern Anal Machine Intell 1995; 17:294–302.
42. Kass M, Witkin A, Terzopoulos D. Snakes: active contour models. Int J Comput Vision 1987; 1:321–331.
43. Lobregt S, Viergever M. A discrete dynamic contour model. IEEE Trans Med Imaging 1995; 14:12–24.
44. Cohen L. On active contour models. Comput Vision Graph Image Process 1991; 53:211–218.
45. Cohen I, Cohen LD, Ayache N. Using deformable surfaces to segment 3-D images and infer differential structures. Comput Vision Graph Image Process 1992; 56:242–263.
46. McInerney T, Terzopoulos D. A dynamic finite element surface model for segmentation and tracking in multidimensional medical images with application to cardiac 4D image analysis. Comput Med Imaging Graph 1995; 19:69–93.
47. Falcao AX, Udupa JK, Samarasekera S, Hirsch BE. User-steered image boundary segmentation. SPIE Proc 1996; 2710:279–288.
48. Falcao A, Udupa JK. Segmentation of 3D objects using live wire. SPIE Proc 1997; 3034:228–235.
49. Duda RO, Hart PE. Pattern classification and scene analysis. New York, NY: Wiley, 1973.
50. Bezdek JC, Hall LO, Clarke LP. Review of MR image segmentation techniques using pattern recognition. Med Phys 1993; 20:1033–1048.
51. Clarke L, Velthuizen RP, Camacho MA, et al. MRI segmentation: methods and applications. JMRI 1995; 13:343–368.
52. Vannier M, Butterfield R, Jordan D, Murphy W. Multispectral analysis of magnetic resonance images. Radiology 1985; 154:221–224.
53. Cline H, Lorensen W, Kikinis R, Jolesz F. Three-dimensional segmentation of MR images of the head using probability and connectivity. J Comput Assist Tomogr 1990; 14:1037–1045.
54. Kikinis R, Shenton M, Jolesz F, et al. Quantitative analysis of brain and cerebrospinal fluid spaces with MR images. JMRI 1992; 2:619–629.
55. Mitchell JR, Karlick SJ, Lee DH, Fenster A. Computer-assisted identification and quantification of multiple sclerosis lesions in MR imaging volumes in the brain. JMRI 1994; 4:197–208.
56. Kohn M, Tanna N, Herman G, et al. Analysis of brain and cerebrospinal fluid volumes with MR imaging. I. Methods, reliability and validation. Radiology 1991; 178:115–122.
57. Kamber M, Shingal R, Collins DL, Francis GS, Evans AC. Model-based 3-D segmentation of multiple sclerosis lesions in magnetic resonance brain images. IEEE Trans Med Imaging 1995; 14:442–453.
58. Wells WM III, Grimson WEL, Kikinis R, Jolesz FA. Adaptive segmentation of MRI data. IEEE Trans Med Imaging 1996; 15:429–442.
59. Drebin R, Carpenter L, Hanrahan P. Volume rendering. Comput Graph 1988; 22:65–74.
60. Bezdek JC. Pattern recognition with fuzzy objective function algorithms. New York, NY: Plenum, 1981.
61. Udupa JK, Odhner D, Samarasekera S, et al. 3DVIEWNIX: an open, transportable, multidimensional, multimodality, multiparametric imaging software system. SPIE Proc 1994; 2164:58–73.
62. Burt PJ, Hong TH, Rosenfeld A. Segmentation and estimation of region properties through co-operative hierarchical computation. IEEE Trans Syst Man Cybernet 1981; 11:802–809.
63. Hong TH, Rosenfeld A. Compact region extraction using weighted pixel linking in a pyramid. IEEE Trans Pattern Anal Machine Intell 1984; 6:222–229.
64. Dellepiane S, Fontana F. Extraction of intensity connectedness for image processing. Pattern Recogn Lett 1995; 16:313–324.
65. Udupa J, Samarasekera S. Fuzzy connectedness and object definition: theory, algorithms and applications in image segmentation. Graph Models Image Process 1996; 58:246–261.
66. Udupa JK, Wei L, Samarasekera S, Miki Y, van Buchem MA, Grossman RI. Multiple sclerosis lesion quantification using fuzzy connectedness principles. IEEE Trans Med Imaging 1997; 16:598–609.
67. Samarasekera S, Udupa JK, Miki Y, Grossman RI. A new computer-assisted method for enhancing lesion quantification in multiple sclerosis. J Comput Assist Tomogr 1997; 21:145–151.
68. Miki Y, Grossman RI, Samarasekera S, et al. Clinical correlation of computer-assisted enhancing lesion quantification in multiple sclerosis. AJNR 1997; 18:705–710.
69. van Buchem MA, Udupa JK, Heyning FH, et al. Quantitation of macroscopic and microscopic cerebral disease burden in multiple sclerosis. AJNR 1997; 18:1287–1290.
70. Udupa JK, Odhner D, Tian J, Holland G, Axel L. Automatic clutter-free volume rendering for MR angiography using fuzzy connectedness. SPIE Proc 1997; 3034:114–119.
71. Udupa JK, Tian J, Hemmy DC, Tessier P. A Pentium-based craniofacial 3D imaging and analysis system. J Craniofac Surg 1997; 8:333–339.
72. Frieder G, Gordon D, Reynolds R. Back-to-front display of voxel-based objects. IEEE Comput Graph Appl 1985; 5:52–60.
73. Brown DG, Riederer SJ. Contrast-to-noise ratios in maximum intensity projection images. Magn Reson Med 1992; 23:130–137.
74. Schreiner S, Paschal CB, Galloway RL. Comparison of projection algorithms used for the construction of maximum intensity projection images. J Comput Assist Tomogr 1996; 20:56–67.
75. Napel S, Marks MP, Rubin GD, et al. CT angiography with spiral CT and maximum intensity projection. Radiology 1992; 185:607–610.
76. Hertz SM, Baum RA, Owen RS, Holland GA, Logan DR, Carpenter JP. Comparison of magnetic resonance angiography and contrast arteriography in peripheral arterial stenosis. Am J Surg 1993; 166:112–116.
77. Goldwasser S, Reynolds R. Real-time display and manipulation of 3-D medical objects: the voxel machine architecture. Comput Vision Graph Image Process 1987; 39:1–27.
78. Höhne KH, Bernstein R. Shading 3D images from CT using gray-level gradients. IEEE Trans Med Imaging 1986; 5:45–47.
79. Reynolds R, Gordon D, Chen L. A dynamic screen technique for shaded graphics display of slice-represented objects. Comput Vision Graph Image Process 1987; 38:275–298.
80. Herman G, Liu L. Display of three-dimensional information in computed tomography. J Comput Assist Tomogr 1977; 1:155–160.
81. Udupa J, Odhner D. Fast visualization, manipulation, and analysis of binary volumetric objects. IEEE Comput Graph Appl 1991; 11:53–62.
82. Chen LS, Herman GT, Reynolds RA, et al. Surface rendering in the cuberille environment. IEEE Comput Graph Appl 1985; 5:33–43.
83. Gordon D, Reynolds RA. Image-space shading of three-dimensional objects. Comput Vision Graph Image Process 1985; 29:361–376.
84. Levoy M. Display of surfaces from volume data. ACM Trans Graph 1990; 9:245–271.
85. Udupa JK. Multidimensional digital boundaries. Graph Models Image Process 1994; 50:311–323.
86. Artzy E, Frieder G, Herman G. The theory, design, implementation and evaluation of a three-dimensional surface detection algorithm. Comput Graph Image Process 1981; 15:1–24.
87. Chuang JH, Lee WC. Efficient generation of isosurfaces in volume rendering. Comput Graph 1995; 19:805–812.
88. Cutting C, Grayson B, Bookstein F, Fellingham L, McCarthy J. Computer-aided planning and evaluation of facial and orthognathic surgery. Comput Plast Surg 1986; 13:449–461.
89. Trivedi S. Interactive manipulation of three-dimensional binary images. Visual Comput 1986; 2:209–218.
90. Patel VV, Vannier MW, Marsh JL, Lo LJ. Assessing craniofacial surgical simulation. IEEE Comput Graph Appl 1996; 16:46–54.
91. Odhner D, Udupa JK. Shell manipulation: interactive alteration of multiple-material fuzzy structures. SPIE Proc 1995; 2431:35–42.
92. Paouri A, Thalmann NM, Thalmann D. Creating realistic three-dimensional human shape characters for computer-generated films. In: Proceedings of Computer Animation ’91, Tokyo. Berlin, Germany: Springer-Verlag, 1991; 89–100.
93. Waters K. A muscle model for animating three-dimensional facial expression. Proceedings of SIGGRAPH ’87, 1987; 21:17–24.
94. Platt S, Badler N. Animating facial expressions. Proceedings of SIGGRAPH ’81, 1981; 245–252.
95. Terzopoulos D, Waters K. Techniques for realistic facial modeling and animation. In: Proceedings of Computer Animation ’91, Tokyo. Berlin, Germany: Springer-Verlag, 1991; 59–74.
96. Jianhua S, Thalmann MN, Thalmann D. Muscle-based human body deformations. In: Proceedings of CAD/Graphics ’93, Beijing, China, 1993; 95–100.
97. Vannier MW, Hildebolt CF, Marsh JL, et al. Craniosynostosis: diagnostic value of three-dimensional CT reconstruction. Radiology 1989; 173:669–673.