Algorithms |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
In this
Section we will describe operations that are fundamental to digital
image processing. These operations can be divided into four categories:
operations based on the image histogram, on simple mathematics, on
convolution, and on mathematical morphology. Further, these operations
can also be described in terms of their implementation as a point
operation, a local operation, or a global operation as described in
Section 2.2.1.
Histogram-based Operations
An important class of point operations is based upon the manipulation of an image histogram or a region histogram. The most important examples are described below. Contrast stretchingFrequently, an image is scanned in such a way that the resulting brightness values do not make full use of the available dynamic range. This can be easily observed in the histogram of the brightness values shown in Figure 6. By stretching the histogram over the available dynamic range we attempt to correct this situation. If the image is intended to go from brightness 0 to brightness 2B-1 (see Section 2.1), then one generally maps the 0% value (or minimum as defined in Section 3.5.2) to the value 0 and the 100% value (or maximum) to the value 2B-1. The appropriate transformation is given by: This
formula, however, can be somewhat sensitive to outliers and a less
sensitive and more general version is given by: In
this second version one might choose the 1% and 99% values for plow%
and phigh%, respectively, instead of the 0% and
100% values represented by eq. . It is also possible to apply the
contrast-stretching operation on a regional basis using the histogram
from a region to determine the appropriate limits for the algorithm.
Note that in eqs. and it is possible to suppress the term 2B-1
and simply normalize the brightness range to 0 <= b[m,n]
<= 1. This means representing the final pixel brightnesses as reals
instead of integers but modern computer speeds and RAM capacities make
this quite feasible. EqualizationWhen one wishes to compare two or more images on a specific basis, such as texture, it is common to first normalize their histograms to a "standard" histogram. This can be especially useful when the images have been acquired under different circumstances. The most common histogram normalization technique is histogram equalization where one attempts to change the histogram through the use of a function b = f (a) into a histogram that is constant for all brightness values. This would correspond to a brightness distribution where all values are equally probable. Unfortunately, for an arbitrary image, one can only approximate this result. For
a "suitable" function f (a) the relation between the
input probability density function, the output probability density
function, and the function f (a) is given by:
From
eq. we see that "suitable" means that f (a) is
differentiable and that df/da >= 0. For histogram equalization
we desire that pb(b) = constant and this means
that:
where
P(a) is the probability distribution function
defined in Section 3.5.1 and illustrated in Figure 6a. In other words,
the quantized probability distribution function normalized from 0
to 2B-1 is the look-up table required for histogram
equalization. Figures 21a-c illustrate the effect of contrast stretching
and histogram equalization on a standard image. The histogram
equalization procedure can also be applied on a regional basis.
Figure 21a Figure 21b Figure 21c Original Contrast Stretched istogram Equalized Other histogram-based operations
The histogram derived from a local region can also be used to drive local filters that are to be applied to that region. Examples include minimum filtering, median filtering, and maximum filtering . The concepts minimum, median, and maximum were introduced in Figure 6. The filters based on these concepts will be presented formally in Sections 9.4.2 and 9.6.10.
Mathematics-based OperationsWe
distinguish in this section between binary arithmetic and ordinary
arithmetic. In the binary case there are two brightness values
"0" and "1". In the ordinary case we begin with 2B
brightness values or levels but the processing of the image can easily
generate many more levels. For this reason many software systems
provide 16 or 32 bit representations for pixel brightnesses in order to
avoid problems with arithmetic overflow. Binary operations Operations based on binary (Boolean) arithmetic form the basis for a powerful set of tools that will be described here and extended in Section 9.6, mathematical morphology. The operations described below are point operations and thus admit a variety of efficient implementations including simple look-up tables. The standard notation for the basic set of binary operations is:
The
implication is that each operation is applied on a pixel-by-pixel basis.
For example,
These
operations are illustrated in Figure 22 where the binary value
"1" is shown in black and the value "0" in white.
c)
NOT(b) =
Figure 22: Examples of the various binary point operations. The
SUB(*) operation can be particularly useful when the image a
represents a region-of-interest that we want to analyze systematically
and the image b represents objects that, having been analyzed,
can now be discarded, that is subtracted, from the region. Arithmetic-based operationsThe gray-value point operations that form the basis for image processing are based on ordinary mathematics and include:
Convolution-based OperationsConvolution,
the mathematical, local operation defined in Section 3.1 is
central to modern image processing. The basic idea is that a window of
some finite size and shape--the support--is scanned across the
image. The output pixel value is the weighted sum of the input pixels
within the window where the weights are the values of the filter
assigned to every pixel of the window itself. The window with its
weights is called the convolution kernel. This leads
directly to the following variation on eq. . If the filter h[j,k]
is zero outside the (rectangular) window {j=0,1,...,J-1; k=0,1,...,K-1},
then, using eq. , the convolution can be written as the following finite
sum:
This
equation can be viewed as more than just a pragmatic mechanism for
smoothing or sharpening an image. Further, while eq. illustrates the
local character of this operation, eqs. and suggest that the operation
can be implemented through the use of the Fourier domain which requires
a global operation, the Fourier transform. Both of these aspects will be
discussed below. BackgroundIn a variety of image-forming
systems an appropriate model for the transformation of the physical
signal a(x,y) into an electronic signal c(x,y)
is the convolution of the input signal with the impulse response of the
sensor system. This system might consist of both an optical as well as
an electrical sub-system. If each of these systems can be treated as a
linear, shift-invariant (LSI) system then the convolution model
is appropriate. The definitions of these two, possible, system
properties are given below: Linearity
-
Shift-Invariance
-
where
w1 and w2 are arbitrary complex
constants and xo and yo are
coordinates corresponding to arbitrary spatial translations. Two
remarks are appropriate at this point. First, linearity implies (by
choosing w1 = w2 = 0) that
"zero in" gives "zero out". The offset described in
eq. means that such camera signals are not the output of a linear system
and thus (strictly speaking) the convolution result is not applicable.
Fortunately, it is straightforward to correct for this non-linear
effect. (See Section 10.1.) Second,
optical lenses with a magnification, M, other than 1x
are not shift invariant; a translation of 1 unit in the input image a(x,y)
produces a translation of M units in the output image c(x,y).
Due to the Fourier property described in eq. this case can still be
handled by linear system theory. If
an impulse point of light d(x,y) is imaged through an LSI
system then the impulse response of that system is called the point
spread function (PSF). The output image then
becomes the convolution of the input image with the PSF. The
Fourier transform of the PSF is called the optical transfer
function (OTF). For optical systems that are
circularly-symmetric, aberration-free, and diffraction-limited the PSF
is given by the Airy disk shown in Table 4-T.5. The OTF of the
Airy disk is also presented in Table 4-T.5. If
the convolution window is not the diffraction-limited PSF of the lens
but rather the effect of defocusing a lens then an appropriate model for
h(x,y) is a pill box of radius a as
described in Table 4-T.3. The effect on a test pattern is illustrated in
Figure 23.
Figure 23:
Convolution of test pattern with a pill box of radius a=4.5
pixels. The
effect of the defocusing is more than just simple blurring or smoothing.
The almost periodic negative lobes in the transfer function in
Table 4-T.3 produce a 180deg. phase shift in which black turns to white and vice-versa. The phase
shift is clearly visible in Figure 23b. Convolution in the spatial domain
In describing filters based on
convolution we will use the following convention. Given a filter h[j,k]
of dimensions J x K, we will
consider the coordinate [j=0,k=0] to be in the center of
the filter matrix, h. This is illustrated in Figure 24. The
"center" is well-defined when J and K are odd;
for the case where they are even, we will use the approximations (J/2,
K/2) for the "center" of the matrix.
Figure 24:
Coordinate system for describing h[j,k] When
we examine the convolution sum (eq. ) closely, several issues become
evident. *
Evaluation of formula for m=n=0 while rewriting the limits
of the convolution sum based on the "centering" of h[j,k]
shows that values of a[j,k] can be required that
are outside the image boundaries:
The
question arises - what values should we assign to the image a[m,n]
for m<0, m>=M, n<0, and n>=N?
There is no "answer" to this question. There are only
alternatives among which we are free to choose assuming we understand
the possible consequences of our choice. The standard alternatives are
a) extend the images with a constant (possibly zero) brightness value,
b) extend the image periodically, c) extend the image by mirroring it at
its boundaries, or d) extend the values at the boundaries indefinitely.
These alternatives are illustrated in Figure 25.
(a)
(b) (c) (d) Figure
25:
Examples of various alternatives to extend an image outside its formal
boundaries. See text for explanation. *
When the convolution sum is written in the standard form (eq. ) for an
image a[m,n] of size M x N:
we
see that the convolution kernel h[j,k] is mirrored
around j=k=0 to produce *
The computational complexity for a K x
K convolution kernel implemented in the spatial domain on an
image of N x
N is O(K2) where the complexity is
measured per pixel on the basis of the number of
multiplies-and-adds (MADDs). *
The value computed by a convolution that begins with integer
brightnesses for a[m,n] may produce a rational
number or a floating point number in the result c[m,n].
Working exclusively with integer brightness values will, therefore,
cause roundoff errors. *
Inspection of eq. reveals another possibility for efficient
implementation of convolution. If the convolution kernel h[j,k]
is separable, that is, if the kernel can be written as:
then
the filtering can be performed as follows:
This
means that instead of applying one, two-dimensional filter it is
possible to apply two, one-dimensional filters, the first one in the k
direction and the second one in the j direction. For an N x
N image this, in general, reduces the computational complexity
per pixel from O(J* K) to O(J+K). An
alternative way of writing separability is to note that the convolution
kernel (Figure 24) is a matrix h and, if separable, h can
be written as:
where
"t" denotes the matrix transpose operation.
In other words, h can be expressed as the outer product
of a column vector [hcol] and a row vector [hrow].
*
For certain filters it is possible to find an incremental implementation
for a convolution. As the convolution window moves over the image (see
eq. ), the leftmost column of image data under the window is shifted out
as a new column of image data is shifted in from the right. Efficient
algorithms can take advantage of this and, when combined with separable
filters as described above, this can lead to algorithms where the
computational complexity per pixel is O(constant). Convolution in the frequency domain
In Section 3.4 we indicated that
there was an alternative method to implement the filtering of images
through convolution. Based on eq. it appears possible to achieve the
same result as in eq. by the following sequence of operations: i)
Compute A(
ii)
Multiply A(
iii)
Compute the result c[m,n] = F-1{A(
*
While it might seem that the "recipe" given above in eq.
circumvents the problems associated with direct convolution in the
spatial domain--specifically, determining values for the image outside
the boundaries of the image--the Fourier domain approach, in fact,
simply "assumes" that the image is repeated periodically
outside its boundaries as illustrated in Figure 25b. This phenomenon is
referred to as circular convolution. If
circular convolution is not acceptable then the other possibilities
illustrated in Figure 25 can be realized by embedding the image a[m,n]
and the filter (
*
The computational complexity per pixel of the Fourier approach for an
image of N x
N and for a convolution kernel of K x
K is O(logN) complex MADDs independent
of K. ere we assume that N > K and that N
is a highly composite number such as a power of two. (See also 2.1.)
This latter assumption permits use of the computationally-efficient Fast
Fourier Transform (FFT) algorithm. Surprisingly then, the
indirect route described by eq. can be faster than the direct route
given in eq. . This requires, in general, that K2
>> logN. The range of K and N for which this
holds depends on the specifics of the implementation. For the machine on
which this manuscript is being written and the specific image processing
package that is being used, for an image of N = 256 the Fourier
approach is faster than the convolution approach when K >= 15.
(It should be noted that in this comparison the direct convolution
involves only integer arithmetic while the Fourier domain approach
requires complex floating point arithmetic.)
Smoothing Operations
These
algorithms are applied in order to reduce noise and/or to prepare images
for further processing such as segmentation. We distinguish between
linear and non- linear algorithms where the former are amenable to
analysis in the Fourier domain and the latter are not. We also
distinguish between implementations based on a rectangular support for
the filter and implementations based on a circular support for the
filter. Linear Filters
Several filtering algorithms
will be presented together with the most useful supports. *
Uniform filter - The output image is based on a local averaging
of the input filter where all of the values within the filter support
have the same weight. In the continuous spatial domain (x,y)
the PSF and transfer function are given in Table 4-T.1 for
the rectangular case and in Table 4-T.3 for the circular (pill box)
case. For the discrete spatial domain [m,n] the filter
values are the samples of the continuous domain case. Examples for the
rectangular case (J=K=5) and the circular case (R=2.5)
are shown in Figure 26.
Figure 26:
Uniform filters for image smoothing Note
that in both cases the filter is normalized so that
*
Triangular filter - The output image is based on a local
averaging of the input filter where the values within the filter support
have differing weights. In general, the filter can be seen as the
convolution of two (identical) uniform filters either rectangular or
circular and this has direct consequences for the computational
complexity . (See Table 13.) In the continuous spatial domain the PSF
and transfer function are given in Table 4-T.2 for the
rectangular support case and in Table 4-T.4 for the circular (pill box)
support case. As seen in Table 4 the transfer functions of these
filters do not have negative lobes and thus do not exhibit phase
reversal. Examples
for the rectangular support case (J=K=5) and the circular
support case (R=2.5) are shown in Figure 27. The filter is again
normalized so that
(a)
Pyramidal filter (J=K=5) (b) Cone filter (R=2.5)
Figure 27:
Triangular filters for image smoothing *
Gaussian filter - The use of the Gaussian kernel for smoothing
has become extremely popular. This has to do with certain properties of
the Gaussian (e.g. the central limit theorem, minimum space-bandwidth
product) as well as several application areas such as edge finding and
scale space analysis. The PSF and transfer function for
the continuous space Gaussian are given in Table 4-T6. The Gaussian
filter is separable:
There
are four distinct ways to implement the Gaussian: -
Convolution using a finite number of samples (No) of
the Gaussian as the convolution kernel. It is common to choose No
=
-
Repetitive convolution using a uniform filter as the convolution kernel.
The
actual implementation (in each dimension) is usually of the form:
This
implementation makes use of the approximation afforded by the central
limit theorem. For a desired
-
Multiplication in the frequency domain. As the Fourier transform of a
Gaussian is a Gaussian (see Table -T.6), this means that it is
straightforward to prepare a filter (
-
Use of a recursive filter implementation. A recursive filter has an
infinite impulse response and thus an infinite support. The separable
Gaussian filter can also be implemented by applying the following recipe
in each dimension when
i)
Choose the
The
relation between the desired
The
filter coefficients {b0,b1,b2,b3,B}
are defined by:
The
one-dimensional forward difference equation takes an input row
(or column) a[n] and produces an intermediate output
result w[n] given by:
The
one-dimensional backward difference equation takes
the intermediate result w[n] and produces the output c[n]
given by:
The
forward equation is applied from n = 0 up to n = N
- 1 while the backward equation is applied from n = N
- 1 down to n = 0. The
relative performance of these various implementation of the Gaussian
filter can be described as follows. Using the root-square error
The
root-square error measure is extremely conservative and thus all
filters, with the exception of "Uniform 3x"
for large
a)
Accuracy comparison b) Speed comparison Figure
28:
Comparison of various Gaussian algorithms with N=256. The legend
is spread across both graphs *
Other - The Fourier domain approach offers the opportunity to
implement a variety of smoothing algorithms. The smoothing filters will
then be lowpass filters. In general it is desirable to use
a lowpass filter that has zero phase so as not to produce phase
distortion when filtering the image. The importance of phase was
illustrated in Figures 5 and 23. When the frequency domain
characteristics can be represented in an analytic form, then this can
lead to relatively straightforward implementations of (
Non-Linear Filters
A variety of smoothing filters
have been developed that are not linear. While they cannot, in general,
be submitted to Fourier analysis, their properties and domains of
application have been studied extensively. *
Median filter - The median statistic was described in Section
3.5.2. A median filter is based upon moving a window over an image (as
in a convolution) and computing the output pixel as the median value of
the brightnesses within the input window. If the window is J x
K in size we can order the J*K pixels in brightness
value from smallest to largest. If J*K is odd then the
median will be the (J*K+1)/2 entry in the list of ordered
brightnesses. Note that the value selected will be exactly equal to one
of the existing brightnesses so that no roundoff error will be involved
if we want to work exclusively with integer brightness values. The
algorithm as it is described above has a generic complexity per pixel of
O(J*K*log(J*K)). Fortunately, a fast
algorithm (due to uang et al. ) exists that reduces the complexity to O(K)
assuming J >= K. A
useful variation on the theme of the median filter is the percentile
filter. ere the center pixel in the window is replaced not by the
50% (median) brightness value but rather by the p% brightness
value where p% ranges from 0% (the minimum filter)
to 100% (the maximum filter). Values other then (p=50)%
do not, in general, correspond to smoothing filters. *
Kuwahara filter - Edges play an important role in our perception
of images (see Figure 15) as well as in the analysis of images. As such
it is important to be able to smooth images without disturbing the
sharpness and, if possible, the position of edges. A filter that
accomplishes this goal is termed an edge-preserving filter
and one particular example is the Kuwahara filter . Although this filter
can be implemented for a variety of different window shapes, the
algorithm will be described for a square window of size J = K
= 4L + 1 where L is an integer. The window is partitioned
into four regions as shown in Figure 29.
Figure 29:
Four, square regions defined for the Kuwahara filter. In this example L=1
and thus J=K=5. Each region is [(J+1)/2] x
[(K+1)/2]. In
each of the four regions (i=1,2,3,4), the mean brightness, mi
in eq. , and the variancei, si2
in eq. , are measured. The output value of the center pixel in the
window is the mean value of that region that has the smallest variance. Summary of Smoothing Algorithms
The following table summarizes
the various properties of the smoothing algorithms presented above. The
filter size is assumed to be bounded by a rectangle of J x
K where, without loss of generality, J >= K. The
image size is N x N.
Table
13: Characteristics of smoothing filters.
ªSee text for additional explanation. Examples
of the effect of various smoothing algorithms are shown in Figure 30.
a)
Original b) Uniform 5 x
5 c) Gaussian (
|