Biosignal and
Biomedical Image
Processing
MATLAB-Based Applications
JOHN L. SEMMLOW
Robert Wood Johnson Medical School
New Brunswick, New Jersey, U.S.A.
Rutgers University
Piscataway, New Jersey, U.S.A.
Copyright 2004 by Marcel Dekker, Inc. All Rights Reserved.
Although great care has been taken to provide accurate and current information, neither
the author(s) nor the publisher, nor anyone else associated with this publication, shall be
liable for any loss, damage, or liability directly or indirectly caused or alleged to be
caused by this book. The material contained herein is not intended to provide specific
advice or recommendations for any specific situation.
Trademark notice: Product or corporate names may be trademarks or registered trade-
marks and are used only for identification and explanation without intent to infringe.
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress.
ISBN: 0-8247-4803-4
This book is printed on acid-free paper.
Headquarters
Marcel Dekker, Inc., 270 Madison Avenue, New York, NY 10016, U.S.A.
tel: 212-696-9000; fax: 212-685-4540
Distribution and Customer Service
Marcel Dekker, Inc., Cimarron Road, Monticello, New York 12701, U.S.A.
tel: 800-228-1160; fax: 845-796-1772
Eastern Hemisphere Distribution
Marcel Dekker AG, Hutgasse 4, Postfach 812, CH-4001 Basel, Switzerland
tel: 41-61-260-6300; fax: 41-61-260-6333
World Wide Web
http://www.dekker.com
The publisher offers discounts on this book when ordered in bulk quantities. For more
information, write to Special Sales/Professional Marketing at the headquarters address
above.
Neither this book nor any part may be reproduced or transmitted in any form or by any
means, electronic or mechanical, including photocopying, microfilming, and recording,
or by any information storage and retrieval system, without permission in writing from
the publisher.
Current printing (last digit):
10 9 8 7 6 5 4 3 2 1
PRINTED IN THE UNITED STATES OF AMERICA
To Lawrence Stark, M.D., who has shown me the many possibilities . . .
Series Introduction
Over the past 50 years, digital signal processing has evolved as a major engi-
neering discipline. The fields of signal processing have grown from the origin
of fast Fourier transform and digital filter design to statistical spectral analysis
and array processing, image, audio, and multimedia processing, and shaped de-
velopments in high-performance VLSI signal processor design. Indeed, there
are few fields that enjoy so many applications—signal processing is everywhere
in our lives.
When one uses a cellular phone, the voice is compressed, coded, and
modulated using signal processing techniques. As a cruise missile winds along
hillsides searching for the target, the signal processor is busy processing the
images taken along the way. When we are watching a movie in HDTV, millions
of audio and video data are being sent to our homes and received with unbeliev-
able fidelity. When scientists compare DNA samples, fast pattern recognition
techniques are being used. On and on, one can see the impact of signal process-
ing in almost every engineering and scientific discipline.
Because of the immense importance of signal processing and the fast-
growing demands of business and industry, this series on signal processing
serves to report up-to-date developments and advances in the field. The topics
of interest include but are not limited to the following:
• Signal theory and analysis
• Statistical signal processing
• Speech and audio processing
• Image and video processing
• Multimedia signal processing and technology
• Signal processing for communications
• Signal processing architectures and VLSI design
We hope this series will provide the interested audience with high-quality,
state-of-the-art signal processing literature through research monographs, edited
books, and rigorously written textbooks by experts in their fields.
Preface
Signal processing can be broadly defined as the application of analog or digital
techniques to improve the utility of a data stream. In biomedical engineering
applications, improved utility usually means the data provide better diagnostic
information. Analog techniques are applied to a data stream embodied as a time-
varying electrical signal while in the digital domain the data are represented as
an array of numbers. This array could be the digital representation of a time-
varying signal, or an image. This text deals exclusively with signal processing
of digital data, although Chapter 1 briefly describes analog processes commonly
found in medical devices.
This text should be of interest to a broad spectrum of engineers, but it
is written specifically for biomedical engineers (also known as bioengineers).
Although the applications are different, the signal processing methodology used
by biomedical engineers is identical to that used by other engineers such as electri-
cal and communications engineers. The major difference for biomedical engi-
neers is in the level of understanding required for appropriate use of this technol-
ogy. An electrical engineer may be required to expand or modify signal
processing tools, while for biomedical engineers, signal processing techniques
are tools to be used. For the biomedical engineer, a detailed understanding of
the underlying theory, while always of value, may not be essential. Moreover,
considering the broad range of knowledge required to be effective in this field,
encompassing both medical and engineering domains, an in-depth understanding
of all of the useful technology is not realistic. It is important to know what
tools are available, have a good understanding of what they do (if not how they
do it), be aware of the most likely pitfalls and misapplications, and know how
to implement these tools given available software packages. The basic concept
of this text is that, just as the cardiologist can benefit from an oscilloscope-type
display of the ECG without a deep understanding of electronics, so a biomedical
engineer can benefit from advanced signal processing tools without always un-
derstanding the details of the underlying mathematics.
As a reflection of this philosophy, most of the concepts covered in this
text are presented in two sections. The first part provides a broad, general under-
standing of the approach sufficient to allow intelligent application of the con-
cepts. The second part describes how these tools can be implemented and relies
primarily on the MATLAB software package and several of its toolboxes.
This text is written for a single-semester course combining signal and
image processing. Classroom experience using notes from this text indicates
that this ambitious objective is possible for most graduate formats, although
eliminating a few topics may be desirable. For example, some of the introduc-
tory or basic material covered in Chapters 1 and 2 could be skipped or treated
lightly for students with the appropriate prerequisites. In addition, topics such
as advanced spectral methods (Chapter 5), time-frequency analysis (Chapter 6),
wavelets (Chapter 7), advanced filters (Chapter 8), and multivariate analysis
(Chapter 9) are pedagogically independent and can be covered as desired with-
out affecting the other material.
Although much of the material covered here will be new to most students,
the book is not intended as an “introductory” text since the goal is to provide a
working knowledge of the topics presented without the need for additional
course work. The challenge of covering a broad range of topics at a useful,
working depth is motivated by current trends in biomedical engineering educa-
tion, particularly at the graduate level where a comprehensive education must
be attained with a minimum number of courses. This has led to the development
of “core” courses to be taken by all students. This text was written for just such
a core course in the Graduate Program of Biomedical Engineering at Rutgers
University. It is also quite suitable for an upper-level undergraduate course and
would be of value for students in other disciplines who would benefit from a
working knowledge of signal and image processing.
It would not be possible to cover such a broad spectrum of material to a
depth that enables productive application without heavy reliance on MATLAB-
based examples and problems. In this regard, the text assumes the student
has some knowledge of MATLAB programming and has available the basic
MATLAB software package including the Signal Processing and Image Process-
ing Toolboxes. (MathWorks also produces a Wavelet Toolbox, but the section on
wavelets is written so as not to require this toolbox, primarily to keep the num-
ber of required toolboxes to a minimum.) The problems are an essential part of
this text and often provide a discovery-like experience regarding the associated
topic. A few peripheral topics are introduced only through the problems. The
code used for all examples is provided in the CD accompanying this text. Since
many of the problems are extensions or modifications of examples given in the
chapter, some of the coding time can be reduced by starting with the code of a
related example. The CD also includes support routines and data files used in
the examples and problems. Finally, the CD contains the code used to generate
many of the figures. For instructors, there is a CD available that contains the
problem solutions and PowerPoint presentations from each of the chapters.
These presentations include figures, equations, and text slides related to the chapter.
Presentations can be modified by the instructor as desired.
In addition to heavy reliance on MATLAB problems and examples, this
text makes extensive use of simulated data. Except for the section on image
processing, examples involving biological signals are rarely used. In my view,
examples using biological signals provide motivation, but they are not generally
very instructive. Given the wide range of material to be presented at a working
depth, emphasis is placed on learning the tools of signal processing; motivation
is left to the reader (or the instructor).
Organization of the text is straightforward. Chapters 1 through 4 are fairly
basic. Chapter 1 covers topics related to analog signal processing and data acqui-
sition while Chapter 2 includes topics that are basic to all aspects of signal and
image processing. Chapters 3 and 4 cover classical spectral analysis and basic
digital filtering, topics fundamental to any signal processing course. Advanced
spectral methods, covered in Chapter 5, are important due to their widespread
use in biomedical engineering. Chapter 6 and the first part of Chapter 7 cover
topics related to spectral analysis when the signal’s spectrum is varying in time,
a condition often found in biological signals. Chapter 7 also covers both contin-
uous and discrete wavelets, another popular technique used in the analysis of
biomedical signals. Chapters 8 and 9 feature advanced topics. In Chapter 8,
optimal and adaptive filters are covered; the latter’s inclusion is also motivated
by the time-varying nature of many biological signals. Chapter 9 introduces
multivariate techniques, specifically principal component analysis and indepen-
dent component analysis, two analysis approaches that are experiencing rapid
growth with regard to biomedical applications. The last four chapters cover
image processing, with the first of these, Chapter 10, covering the conventions
used by MATLAB’s Image Processing Toolbox. Image processing is a vast
area and the material covered here is limited primarily to areas associated with
medical imaging: image acquisition (Chapter 13); image filtering, enhancement,
and transformation (Chapter 11); and segmentation and registration (Chapter 12).
Many of the chapters cover topics that can be adequately covered only in
a book dedicated solely to these topics. In this sense, every chapter represents
a serious compromise with respect to comprehensive coverage of the associated
topics. My only excuse for any omissions is that, in classroom experience, this
approach seems to work: students end up with a working knowledge of a vast
array of signal and image processing tools. A few of the classic or major books
on these topics are cited in an annotated bibliography at the end of the book.
No effort has been made to construct an extensive bibliography or reference list
since more current lists would be readily available on the Web.
TEXTBOOK PROTOCOLS
In most early examples that feature MATLAB code, the code is presented in
full, while in the later examples some of the routine code (such as for plotting,
display, and labeling operations) is omitted. Nevertheless, I recommend that stu-
dents carefully label (and scale when appropriate) all graphs done in the prob-
lems. Some effort has been made to use consistent notation as described in
Table 1. In general, lower-case letters n and k are used as data subscripts, and
capital letters N and K are used to indicate the length (or maximum subscript
value) of a data set. In two-dimensional data sets, lower-case letters m and n
are used to indicate the row and column subscripts of an array, while capital
letters M and N are used to indicate vertical and horizontal dimensions, respec-
tively. The letter m is also used as the index of a variable produced by a transfor-
mation, or as an index indicating a particular member of a family of related
functions.* While it is common to use brackets to enclose subscripts of discrete
variables (e.g., x[n]), ordinary parentheses are used here. Brackets are reserved
to indicate vectors (e.g., [x1, x2, x3, . . .]) following MATLAB convention.
Other notation follows standard conventions.
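As a brief illustration of these conventions, the following MATLAB fragment is offered as a sketch only. The data values and the variable names x3 and A are arbitrary choices made for this illustration and are not taken from the examples in the text.

```matlab
% Illustration of the notation conventions (all values are arbitrary)
N = 8;                        % N: length (maximum subscript) of the data set
n = 1:N;                      % n: data subscript; MATLAB subscripts begin at 1
x = [2 4 6 8 10 12 14 16];    % Brackets define a vector of N samples
x3 = x(3);                    % Parentheses select one sample, x(n) with n = 3
M = 4;                        % M: vertical dimension of a two-dimensional array
A = zeros(M,N);               % Array with M rows and N columns
A(2,5) = 1;                   % Row subscript m = 2, column subscript n = 5
```

Here x(3) returns the third sample of x, and A(2,5) addresses the element in the second row and fifth column.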
Italics are used to introduce important new terms that should be incor-
porated into the reader’s vocabulary. If the meaning of these terms is not obvi-
ous from their use, they are explained where they are introduced. All MATLAB
commands, routines, variables, and code are shown in the Courier typeface.
Single quotes are used to highlight MATLAB filenames or string variables.
Textbook protocols are summarized in Table 1.
I wish to thank Susanne Oldham, who managed to edit this book and
provided strong, continuing encouragement and support. I would also like to
acknowledge the patience and support of Peggy Christ and Lynn Hutchings.
Professor Shankar Muthu Krishnan of Singapore provided a very thoughtful
critique of the manuscript which led to significant improvements. Finally, I
thank my students who provided suggestions and whose enthusiasm for the
material provided much needed motivation.
*For example, m would be used to indicate the harmonic number of a family of harmonically related
sine functions; i.e., fm(t) = sin(2πmt).
TABLE 1 Textbook Conventions
Symbol          Description/General usage
x(t), y(t)      General functions of time, usually a waveform or signal
k, n            Data indices, particularly for digitized time data
K, N            Maximum index or size of a data set
x(n), y(n)      Waveform variable, usually digitized time variables (i.e., a
                discrete variable)
m               Index of variable produced by transformation, or the index
                specifying the member number of a family of functions (i.e.,
                fm(t))
X(f), Y(f)      Frequency representation (complex) of a time function
X(m), Y(m)      Frequency representation (complex) of a discrete variable
h(t)            Impulse response of a linear system
h(n)            Discrete impulse response of a linear system
b(n)            Digital filter coefficients representing the numerator of the
                discrete Transfer Function; hence the same as the impulse
                response
a(n)            Digital filter coefficients representing the denominator of the
                discrete Transfer Function
Courier font    MATLAB command, variable, routine, or program
'Courier font'  MATLAB filename or string variable
John L. Semmlow
Contents
Preface
1 Introduction
Typical Measurement Systems
Transducers
Further Study: The Transducer
Analog Signal Processing
Sources of Variability: Noise
Electronic Noise
Signal-to-Noise Ratio
Analog Filters: Filter Basics
Filter Types
Filter Bandwidth
Filter Order
Filter Initial Sharpness
Analog-to-Digital Conversion: Basic Concepts
Analog-to-Digital Conversion Techniques
Quantization Error
Further Study: Successive Approximation
Time Sampling: Basics
Further Study: Buffering and Real-Time Data Processing
Data Banks
Problems
2 Basic Concepts
Noise
Ensemble Averaging
MATLAB Implementation
Data Functions and Transforms
Convolution, Correlation, and Covariance
Convolution and the Impulse Response
Covariance and Correlation
MATLAB Implementation
Sampling Theory and Finite Data Considerations
Edge Effects
Problems
3 Spectral Analysis: Classical Methods
Introduction
The Fourier Transform: Fourier Series Analysis
Periodic Functions
Symmetry
Discrete Time Fourier Analysis
Aperiodic Functions
Frequency Resolution
Truncated Fourier Analysis: Data Windowing
Power Spectrum
MATLAB Implementation
Direct FFT and Windowing
The Welch Method for Power Spectral Density Determination
Window Functions
Problems
4 Digital Filters
The Z-Transform
Digital Transfer Function
MATLAB Implementation
Finite Impulse Response (FIR) Filters
FIR Filter Design
Derivative Operation: The Two-Point Central Difference
Algorithm
MATLAB Implementation
Infinite Impulse Response (IIR) Filters
Filter Design and Application Using the MATLAB Signal
Processing Toolbox
FIR Filters
Two-Stage FIR Filter Design
Three-Stage Filter Design
IIR Filters
Two-Stage IIR Filter Design
Three-Stage IIR Filter Design: Analog Style Filters
Problems
5 Spectral Analysis: Modern Techniques
Parametric Model-Based Methods
MATLAB Implementation
Non-Parametric Eigenanalysis Frequency Estimation
MATLAB Implementation
Problems
6 Time–Frequency Methods
Basic Approaches
Short-Term Fourier Transform: The Spectrogram
Wigner-Ville Distribution: A Special Case of Cohen’s Class
Choi-Williams and Other Distributions
Analytic Signal
MATLAB Implementation
The Short-Term Fourier Transform
Wigner-Ville Distribution
Choi-Williams and Other Distributions
Problems
7 The Wavelet Transform
Introduction
The Continuous Wavelet Transform
Wavelet Time–Frequency Characteristics
MATLAB Implementation
The Discrete Wavelet Transform
Filter Banks
The Relationship Between Analytical Expressions and
Filter Banks
MATLAB Implementation
Denoising
Discontinuity Detection
Feature Detection: Wavelet Packets
Problems
8 Advanced Signal Processing Techniques:
Optimal and Adaptive Filters
Optimal Signal Processing: Wiener Filters
MATLAB Implementation
Adaptive Signal Processing
Adaptive Noise Cancellation
MATLAB Implementation
Phase Sensitive Detection
AM Modulation
Phase Sensitive Detectors
MATLAB Implementation
Problems
9 Multivariate Analyses: Principal Component Analysis
and Independent Component Analysis
Introduction
Principal Component Analysis
Order Selection
MATLAB Implementation
Data Rotation
Principal Component Analysis Evaluation
Independent Component Analysis
MATLAB Implementation
Problems
10 Fundamentals of Image Processing: MATLAB Image
Processing Toolbox
Image Processing Basics: MATLAB Image Formats
General Image Formats: Image Array Indexing
Data Classes: Intensity Coding Schemes
Data Formats
Data Conversions
Image Display
Image Storage and Retrieval
Basic Arithmetic Operations
Advanced Protocols: Block Processing
Sliding Neighborhood Operations
Distinct Block Operations
Problems
11 Image Processing: Filters, Transformations,
and Registration
Spectral Analysis: The Fourier Transform
MATLAB Implementation
Linear Filtering
MATLAB Implementation
Filter Design
Spatial Transformations
MATLAB Implementation
Affine Transformations
General Affine Transformations
Projective Transformations
Image Registration
Unaided Image Registration
Interactive Image Registration
Problems
12 Image Segmentation
Pixel-Based Methods
Threshold Level Adjustment
MATLAB Implementation
Continuity-Based Methods
MATLAB Implementation
Multi-Thresholding
Morphological Operations
MATLAB Implementation
Edge-Based Segmentation
MATLAB Implementation
Problems
13 Image Reconstruction
CT, PET, and SPECT
Fan Beam Geometry
MATLAB Implementation
Radon Transform
Inverse Radon Transform: Parallel Beam Geometry
Radon and Inverse Radon Transform: Fan Beam Geometry
Magnetic Resonance Imaging
Basic Principles
Data Acquisition: Pulse Sequences
Functional MRI
MATLAB Implementation
Principal Component and Independent Component Analysis
Problems
Annotated Bibliography
Annotated Bibliography
The following is a very selective list of books or articles that will be of value in
providing greater depth and mathematical rigor to the material presented in this text.
Comments regarding the particular strengths of the reference are included.
Akansu, A. N. and Haddad, R. A., Multiresolution Signal Decomposition: Transforms,
Subbands, Wavelets. Academic Press, San Diego, CA, 1992. A modern classic that
presents, among other things, some of the underlying theoretical aspects of wavelet
analysis.
Aldroubi, A. and Unser, M. (eds.) Wavelets in Medicine and Biology, CRC Press, Boca
Raton, FL, 1996. Presents a variety of applications of wavelet analysis to biomedical
engineering.
Boashash, B. Time-Frequency Signal Analysis, Longman Cheshire Pty Ltd., 1992. Early
chapters provide a very useful introduction to time–frequency analysis followed by a
number of medical applications.
Boashash, B. and Black, P.J. An efficient real-time implementation of the Wigner-Ville
Distribution, IEEE Trans. Acoust. Speech Sig. Proc. ASSP-35:1611–1618, 1987.
Practical information on calculating the Wigner-Ville distribution.
Boudreaux-Bartels, G. F. and Murray, R. Time-frequency signal representations for bio-
medical signals. In: The Biomedical Engineering Handbook. J. Bronzino (ed.) CRC
Press, Boca Raton, Florida and IEEE Press, Piscataway, N.J., 1995. This article pres-
ents an exhaustive, or very nearly so, compilation of Cohen’s class of time-frequency
distributions.
Bruce, E. N. Biomedical Signal Processing and Signal Modeling, John Wiley and Sons,
New York, 2001. Rigorous treatment with more of an emphasis on linear systems
than signal processing. Introduces nonlinear concepts such as chaos.
Cichocki, A. and Amari, S. Adaptive Blind Signal and Image Processing: Learning Algo-
rithms and Applications, John Wiley and Sons, Inc. New York, 2002. Rigorous,
somewhat dense, treatment of a wide range of principal component and independent
component approaches. Includes disk.
Cohen, L. Time-frequency distributions—A review. Proc. IEEE 77:941–981, 1989.
Classic review article on the various time-frequency methods in Cohen’s class of
time–frequency distributions.
Ferrara, E. and Widrow, B. Fetal Electrocardiogram enhancement by time-sequenced
adaptive filtering. IEEE Trans. Biomed. Engr. BME-29:458–459, 1982. Early appli-
cation of adaptive noise cancellation to a biomedical engineering problem by one of
the founders of the field. See also Widrow below.
Friston, K. Statistical Parametric Mapping. On-line at:
http://www.fil.ion.ucl.ac.uk/spm/course/note02/ Thorough discussion of practical aspects of fMRI analysis including
pre-processing, statistical methods, and experimental design. Based around SPM anal-
ysis software capabilities.
Haykin, S. Adaptive Filter Theory (2nd ed.), Prentice-Hall, Inc., Englewood Cliffs, N.J.,
1991. The definitive text on adaptive filters including Wiener filters and gradient-
based algorithms.
Hyvärinen, A., Karhunen, J., and Oja, E. Independent Component Analysis, John Wiley
and Sons, Inc. New York, 2001. Fundamental, comprehensive, yet readable book on
independent component analysis. Also provides a good review of principal compo-
nent analysis.
Hubbard, B. B. The World According to Wavelets (2nd ed.), A.K. Peters, Ltd., Natick, MA,
1998. Very readable introductory book on wavelets including an excellent section
on the Fourier transform. Can be read by a non-signal processing friend.
Ingle, V.K. and Proakis, J. G. Digital Signal Processing with MATLAB, Brooks/Cole,
Inc. Pacific Grove, CA, 2000. Excellent treatment of classical signal processing meth-
ods including the Fourier transform and both FIR and IIR digital filters. Brief, but
informative section on adaptive filtering.
Jackson, J. E. A User’s Guide to Principal Components, John Wiley and Sons, New
York, 1991. Classic book providing everything you ever want to know about principal
component analysis. Also covers linear modeling and introduces factor analysis.
Johnson, D.D. Applied Multivariate Methods for Data Analysis, Brooks/Cole, Pacific
Grove, CA, 1988. Careful, detailed coverage of multivariate methods including prin-
cipal components analysis. Good coverage of discriminant analysis techniques.
Kak, A. C. and Slaney, M. Principles of Computerized Tomographic Imaging. IEEE Press,
New York, 1988. Thorough, understandable treatment of algorithms for reconstruc-
tion of tomographic images including both parallel and fan-beam geometry. Also
includes techniques used in reflection tomography as occurs in ultrasound imaging.
Marple, S.L. Digital Spectral Analysis with Applications, Prentice-Hall, Englewood
Cliffs, NJ, 1987. Classic text on modern spectral analysis methods. In-depth, rigorous
treatment of Fourier transform, parametric modeling methods (including AR and
ARMA), and eigenanalysis-based techniques.
Rao, R.M. and Bopardikar, A.S. Wavelet Transforms: Introduction to Theory and Appli-
cations, Addison-Wesley, Inc., Reading, MA, 1998. Good development of wavelet
analysis including both the continuous and discrete wavelet transforms.
Shiavi, R. Introduction to Applied Statistical Signal Analysis (2nd ed.), Academic Press,
San Diego, CA, 1999. Emphasizes spectral analysis of signals buried in noise. Excel-
lent coverage of Fourier analysis, and autoregressive methods. Good introduction to
statistical signal processing concepts.
Sonka, M., Hlavac, V., and Boyle, R. Image Processing, Analysis, and Machine Vision.
Chapman and Hall Computing, London, 1993. A good description of edge-based and
other segmentation methods.
Strang, G. and Nguyen, T. Wavelets and Filter Banks, Wellesley-Cambridge Press,
Wellesley, MA, 1997. Thorough coverage of wavelet filter banks including extensive
mathematical background.
Stearns, S. D. and David, R. A. Signal Processing Algorithms in MATLAB, Prentice Hall,
Upper Saddle River, NJ, 1996. Good treatment of the classical Fourier transform and
digital filters. Also covers the LMS adaptive filter algorithm. Disk enclosed.
Wickerhauser, M.V. Adapted Wavelet Analysis from Theory to Software, A.K. Peters,
Ltd. and IEEE Press, Wellesley, MA, 1994. Rigorous, extensive treatment of wavelet
analysis.
Widrow, B. Adaptive noise cancelling: Principles and applications. Proc IEEE 63:1692–
1716, 1975. Classic original article on adaptive noise cancellation.
Wright, S. Nuclear Magnetic Resonance and Magnetic Resonance Imaging. In: Introduc-
tion to Biomedical Engineering (Enderle, Blanchard and Bronzino, Eds.) Academic
Press, San Diego, CA, 2000. Good mathematical development of the physics of MRI
using classical concepts.
1
Introduction
TYPICAL MEASUREMENT SYSTEMS
FIGURE 1.1 Schematic representation of typical bioengineering measurement
system.
A schematic representation of a typical biomedical measurement system is
shown in Figure 1.1. Here we use the term measurement in the most general
sense to include image acquisition or the acquisition of other forms of diagnostic
information. The physiological process of interest is converted into an electric
signal via the transducer (Figure 1.1). Some analog signal processing is usually
required, often including amplification and lowpass (or bandpass) filtering.
Since most signal processing is easier to implement using digital methods, the
analog signal is converted to digital format using an analog-to-digital converter.
Once converted, the signal is often stored, or buffered, in memory to facilitate
subsequent signal processing. Alternatively, in some real-time* applications, the
incoming data must be processed as quickly as possible with minimal buffering,
and may not need to be permanently stored. Digital signal processing algorithms
can then be applied to the digitized signal. These signal processing techniques
can take a wide variety of forms and various levels of sophistication, and they
make up the major topic area of this book. Some sort of output is necessary in
any useful system. This usually takes the form of a display, as in imaging sys-
tems, but may be some type of an effector mechanism such as in an automated
drug delivery system.
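The stages described above can be sketched with simulated data in a few lines of MATLAB. This is a minimal illustration under stated assumptions, not a model of any particular instrument: the sampling frequency, signal amplitude, amplifier gain, converter resolution, input range, and the five-point moving average are all values assumed for this sketch, as are the variable names.

```matlab
% Minimal sketch of the measurement chain in Figure 1.1 (all values assumed)
fs = 1000;                       % Assumed sampling frequency (Hz)
t = (0:fs-1)/fs;                 % One second of sample times (sec)
x = 0.0005*sin(2*pi*5*t);        % Simulated transducer output: 0.5 mV, 5 Hz
gain = 1000;                     % Assumed analog amplifier gain
xa = gain*x;                     % Amplified signal, now +/-0.5 V
nbits = 12;                      % Assumed converter resolution (bits)
vrange = 2;                      % Assumed converter input range: -1 V to +1 V
q = vrange/2^nbits;              % Quantization step size (V)
xd = q*round(xa/q);              % Analog-to-digital conversion (quantization)
buffer = xd;                     % Store (buffer) the digitized samples in memory
b = ones(1,5)/5;                 % Coefficients of a 5-point moving average
y = filter(b, 1, buffer);        % Digital signal processing stage
fprintf('Peak processed output: %6.4f V\n', max(abs(y)));   % Output stage
```

In a real system the output stage would normally be a display; replacing the last line with plot(t,y) gives a simple one. The difference xd - xa shows the error introduced by quantization, which in this sketch never exceeds half the step size q.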
With the exception of this chapter, this book is limited to digital signal
and image processing concerns. To the extent possible, each topic is introduced
with the minimum amount of information required to use and understand the
approach, and enough information to apply the methodology in an intelligent
manner. The strengths and weaknesses of the various methods are
also covered, particularly through discovery in the problems at the end of each
chapter. Hence, the problems at the end of each chapter, most of which utilize
the MATLAB™ software package (Waltham, MA), constitute an integral part
of the book: a few topics are introduced only in the problems.
A fundamental assumption of this text is that an in-depth mathematical
treatment of signal processing methodology is not essential for effective and
appropriate application of these tools. Thus, this text is designed to develop
skills in the application of signal and image processing technology, but may not
provide the skills necessary to develop new techniques and algorithms. Refer-
ences are provided for those who need to move beyond application of signal
and image processing tools to the design and development of new methodology.
In subsequent chapters, each major section is followed by a section on imple-
mentation using the MATLAB software package. Fluency with the MATLAB
language is assumed and is essential for the use of this text. Where appropriate,
a topic area may also include a more in-depth treatment including some of the
underlying mathematics.
*Learning the vocabulary is an important part of mastering a discipline. In this text we highlight,
using italics, terms commonly used in signal and image processing. Sometimes the highlighted term
is described when it is introduced, but occasionally determination of its definition is left to the
reader. Real-time processing and buffering are described in the section on analog-to-
digital conversion.
TRANSDUCERS
A transducer is a device that converts energy from one form to another. By this
definition, a light bulb or a motor is a transducer. In signal processing applications,
the purpose of energy conversion is to transfer information, not to transform
energy as with a light bulb or a motor. In measurement systems, all transducers
are so-called input transducers: they convert non-electrical energy into
an electronic signal. An exception to this is the electrode, a transducer that
converts electrical energy from ionic to electronic form. Usually, the output of
a biomedical transducer is a voltage (or current) whose amplitude is proportional
to the measured energy.
The energy that is converted by the input transducer may be generated by
the physiological process itself, indirectly related to the physiological process,
or produced by an external source. In the last case, the externally generated
energy interacts with, and is modified by, the physiological process, and it is
this alteration that produces the measurement. For example, when externally
produced x-rays are transmitted through the body, they are absorbed by the
intervening tissue, and a measurement of this absorption is used to construct an
image. Many diagnostically useful imaging systems are based on this external
energy approach.
In addition to passing external energy through the body, some images are
generated using the energy of radioactive emissions of radioisotopes injected
into the body. These techniques make use of the fact that selected, or tagged,
molecules will collect in specific tissue. The areas where these radioisotopes
collect can be mapped using a gamma camera, or with certain short-lived iso-
topes, better localized using positron emission tomography (PET).
Many physiological processes produce energy that can be detected directly.
For example, cardiac internal pressures are usually measured using a
pressure transducer placed on the tip of a catheter introduced into the appropriate
chamber of the heart. The measurement of electrical activity in the heart, mus-
cles, or brain provides other examples of the direct measurement of physiologi-
cal energy. For these measurements, the energy is already electrical and only
needs to be converted from ionic to electronic current using an electrode. These
sources are usually given the term ExG, where the ‘x’ represents the physiological
process that produces the electrical energy: ECG–electrocardiogram;
EEG–electroencephalogram; EMG–electromyogram; EOG–electrooculogram;
ERG–electroretinogram; and EGG–electrogastrogram. An exception to this terminology
is the electrical activity generated by the skin, which is termed the galvanic skin
response, GSR. Typical physiological energies and the applications that use
these energy forms are shown in Table 1.1.
TABLE 1.1 Energy Forms and Related Direct Measurements
Energy                              Measurement
Mechanical
  length, position, and velocity    muscle movement, cardiovascular pressures,
                                    muscle contractility
  force and pressure                valve and other cardiac sounds
Heat                                body temperature, thermography
Electrical                          EEG, ECG, EMG, EOG, ERG, EGG, GSR
Chemical                            ion concentrations
The biotransducer is often the most critical element in the system since it
constitutes the interface between the subject or life process and the rest of the
system. The transducer establishes the risk, or invasiveness, of the overall
system. For example, an imaging system based on differential absorption of
x-rays, such as a CT (computed tomography) scanner, is considered more invasive
than an imaging system based on ultrasonic reflection since CT uses
ionizing radiation that may have an associated risk. (The actual risk of ionizing
radiation is still an open question and imaging systems based on x-ray absorption
are considered minimally invasive.) Both ultrasound and x-ray imaging
would be considered less invasive than, for example, monitoring internal cardiac
pressures through cardiac catheterization in which a small catheter is threaded into
the heart chambers. Indeed, many of the outstanding problems in biomedical
measurement, such as noninvasive measurement of internal cardiac pressures,
or the noninvasive measurement of intracranial pressure, await an appropriate
(and undoubtedly clever) transducer mechanism.
Further Study: The Transducer
The transducer often establishes the major performance criterion of the system.
In a later section, we list and define a number of criteria that apply to measure-
ment systems; however, in practice, measurement resolution, and to a lesser
extent bandwidth, are generally the two most important and troublesome measurement
criteria. In fact, it is usually possible to trade off between these two
criteria. Both of these criteria are usually established by the transducer. Hence,
although it is not the topic of this text, good system design usually calls for care
in the choice or design of the transducer element(s). An efficient, low-noise
transducer design can often reduce the need for extensive subsequent signal
processing and still produce a better measurement.
Input transducers use one of two different fundamental approaches: the
input energy causes the transducer element to generate a voltage or current, or
the input energy creates a change in the electrical properties (i.e., the resistance,
inductance, or capacitance) of the transducer element. Most optical transducers
use the first approach. Photons strike a photosensitive material producing free
electrons (or holes) that can then be detected as an external current flow. Piezo-
electric devices used in ultrasound also generate a charge when under mechani-
cal stress. Many examples can be found of the use of the second category, a
change in some electrical property. For example, metals (and semiconductors)
undergo a consistent change in resistance with changes in temperature, and most
temperature transducers utilize this feature. Other examples include the strain
gage, which measures mechanical deformation using the small change in resis-
tance that occurs when the sensing material is stretched.
Many critical problems in medical diagnosis await the development of
new approaches and new transducers. For example, coronary artery disease is a
major cause of death in developed countries, and its treatment would greatly
benefit from early detection. To facilitate early detection, a biomedical instru-
mentation system is required that is inexpensive and easy to operate so that it
could be used for general screening. In coronary artery disease, blood flow to
the arteries of the heart (i.e., coronaries) is reduced due to partial or complete
blockage (i.e., stenoses). One conceptually simple and inexpensive approach is
to detect the sounds generated by turbulent blood flow through partially occluded
coronary arteries (called bruits when detected in other arteries such as
the carotids). This approach requires a highly sensitive transducer(s), in this case
a cardiac microphone, as well as advanced signal processing methods. Efforts
based on this approach are ongoing, and the problem of noninvasive
detection of coronary artery disease is not yet fully solved.
Other holy grails of diagnostic cardiology include noninvasive measure-
ment of cardiac output (i.e., volume of blood flow pumped by the heart per unit
time) and noninvasive measurement of internal cardiac pressures. The former
has been approached using Doppler ultrasound, but this technique has not yet
been accepted as reliable. Financial gain and modest fame await the biomedical
engineer who develops instrumentation that adequately addresses any of these
three outstanding measurement problems.
ANALOG SIGNAL PROCESSING
While the most extensive signal processing is usually performed on digitized
data using algorithms implemented in software, some analog signal processing
is usually necessary. The first analog stage depends on the basic transducer
operation. If the transducer is based on a variation in electrical property, the
first stage must convert that variation in electrical property into a variation in
voltage. If the transducer element is single ended, i.e., only one element changes,
then a constant current source can be used and the detector equation follows
Ohm’s law:
Vout = I(Z + ∆Z) where ∆Z = f(input energy). (1)
Figure 1.2 shows an example of a single transducer element used in an operational
amplifier circuit that provides constant current operation. The transducer
element in this case is a thermistor, an element that changes its resistance with
temperature. Using circuit analysis, it is easy to show that the thermistor is
driven by a constant current of VS /R amps. The output, Vout, is [(RT + ∆RT)/R]VS.
Alternatively, an approximate constant current source can be generated using a
voltage source and a large series resistor, RS, where RS >> ∆R.
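As a rough numerical illustration of Eq. (1) and the thermistor circuit of Figure 1.2, the sketch below evaluates Vout = [(RT + ∆RT)/R]VS. The function name and the component values are hypothetical, chosen only to show the arithmetic; they are not taken from the figure.

```python
def constant_current_output(v_s, r, r_t, delta_r_t):
    """Output of the constant-current circuit: Vout = ((RT + dRT) / R) * VS."""
    current = v_s / r                   # constant current through the thermistor, VS/R amps
    return current * (r_t + delta_r_t)  # Ohm's law applied to the transducer element


# Hypothetical values: 5 V source, 10 kOhm resistor R, 10 kOhm thermistor
v_nominal = constant_current_output(5.0, 10e3, 10e3, 0.0)
v_changed = constant_current_output(5.0, 10e3, 10e3, -400.0)  # resistance drops by 400 ohms
print(f"nominal: {v_nominal:.3f} V, after a -400 ohm change: {v_changed:.3f} V")
```

Because the current is fixed at VS/R, the output voltage changes in direct proportion to the change in thermistor resistance.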
If the transducer can be configured differentially so that one element in-
creases with increasing input energy while the other element decreases, the
bridge circuit is commonly used as a detector. Figure 1.3 shows a device made
to measure intestinal motility using strain gages. A bridge circuit detector is
used in conjunction with a pair of differentially configured strain gages: when
the intestine contracts, the end of the cantilever beam moves downward and the
upper strain gage (visible) is stretched and increases in resistance while the
lower strain gage (not visible) compresses and decreases in resistance. The output
of the bridge circuit can be found from simple circuit analysis to be: Vout =
VS∆R/(2R), where VS is the value of the source voltage, R is the unstrained
resistance of each gage, and ∆R is the change in resistance. If the transducer operates
based on a change in inductance or capacitance, the above techniques are still
useful except that a sinusoidal voltage source must be used.
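The bridge output can be checked numerically with a short sketch. It assumes a particular bridge arrangement that the text does not spell out: the two gages form one voltage divider and two equal fixed resistors form the other. Under that assumption the output is half the source voltage times the fractional resistance change, ∆R/R. The function name and values are hypothetical.

```python
def half_bridge_output(v_s, r_nominal, delta_r):
    """Differential output of a bridge with two oppositely changing gages (assumed layout)."""
    r_stretched = r_nominal + delta_r    # gage whose resistance increases
    r_compressed = r_nominal - delta_r   # gage whose resistance decreases
    # Voltage divider formed by the two gages, measured across the stretched gage
    v_gage_arm = v_s * r_stretched / (r_stretched + r_compressed)
    # Reference divider formed by two equal fixed resistors
    v_reference_arm = v_s / 2.0
    return v_gage_arm - v_reference_arm


# Hypothetical values: 5 V source, 350 ohm gages, 0.7 ohm change (0.2 percent)
v_out = half_bridge_output(5.0, 350.0, 0.7)
print(f"bridge output: {v_out * 1e3:.3f} mV")
```

The output is linear in ∆R for this differential arrangement because the sum of the two gage resistances stays constant.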
If the transducer element is a voltage generator, the first stage is usually
an amplifier. If the transducer produces a current output, as is the case in many
electromagnetic detectors, then a current-to-voltage amplifier (termed a
transconductance amplifier in this text, although transimpedance amplifier is the
more common name) is used to produce a voltage output.
FIGURE 1.2 A thermistor (a semiconductor that changes resistance as a function
of temperature) used in a constant current configuration.
FIGURE 1.3 A strain gage probe used to measure motility of the intestine. The
bridge circuit is used to convert differential change in resistance from a pair of
strain gages into a change in voltage.
Figure 1.4 shows a photodiode transducer used with a transconductance
amplifier. The output voltage is proportional to the current through the photodi-
ode: Vout = RfIdiode. Bandwidth can be increased at the expense of added noise by
reverse biasing the photodiode with a small voltage.* More sophisticated detec-
tion systems such as phase sensitive detectors (PSD) can be employed in some
cases to improve noise rejection. A software implementation of PSD is de-
scribed in Chapter 8. In a few circumstances, additional amplification beyond
the first stage may be required.
SOURCES OF VARIABILITY: NOISE
In this text, noise is a very general and somewhat relative term: noise is what
you do not want and signal is what you do want. Noise is inherent in most
measurement systems and often the limiting factor in the performance of a medi-
cal instrument. Indeed, many signal processing techniques are motivated by the
*A bias voltage improves movement of charge through the diode, decreasing the response time.
Bias voltages from −10 to −50 volts are used, except in the case of avalanche photodiodes where a higher voltage
is required.
FIGURE 1.4 Photodiode used in a transconductance amplifier.
desire to minimize the variability in the measurement. In biomedical measure-
ments, variability has four different origins: (1) physiological variability; (2) en-
vironmental noise or interference; (3) transducer artifact; and (4) electronic noise.
Physiological variability is due to the fact that the information you desire is based
on a measurement subject to biological influences other than those of interest.
For example, assessment of respiratory function based on the measurement of
blood pO2 could be confounded by other physiological mechanisms that alter
blood pO2. Physiological variability can be a very difficult problem to solve,
sometimes requiring a totally different approach.
Environmental noise can come from sources external or internal to the
body. A classic example is the measurement of fetal ECG where the desired
signal is corrupted by the mother’s ECG. Since it is not possible to describe the
specific characteristics of environmental noise, typical noise reduction tech-
niques such as filtering are not usually successful. Sometimes environmental
noise can be reduced using adaptive techniques such as those described in Chap-
ter 8 since these techniques do not require prior knowledge of noise characteris-
tics. Indeed, one of the approaches described in Chapter 8, adaptive noise can-
cellation, was initially developed to reduce the interference from the mother in
the measurement of fetal ECG.
Transducer artifact is produced when the transducer responds to energy
modalities other than that desired. For example, recordings of electrical poten-
tials using electrodes placed on the skin are sensitive to motion artifact, where
the electrodes respond to mechanical movement as well as the desired electrical
signal. Transducer artifacts can sometimes be successfully addressed by modifi-
cations in transducer design. Aerospace research has led to the development of
electrodes that are quite insensitive to motion artifact.
Unlike the other sources of variability, electronic noise has well-known
sources and characteristics. Electronic noise falls into two broad classes: thermal
or Johnson noise, and shot noise. The former is produced primarily in resistor
or resistance materials while the latter is related to voltage barriers associated
with semiconductors. Both sources produce noise with a broad range of frequencies,
often extending from DC to $10^{12}$–$10^{13}$ Hz. Such broad-spectrum noise is
referred to as white noise since it contains energy at all frequencies (or at least
all the frequencies of interest to biomedical engineers). Figure 1.5 shows a plot
of power density versus frequency for white noise calculated from a noise waveform
(actually an array of random numbers) using the spectral analysis methods
described in Chapter 3. Note that its energy is fairly constant across the spectral
range.
The various sources of noise or variability along with their causes and
possible remedies are presented in Table 1.2 below. Note that in three out of
four instances, appropriate transducer design is useful in the reduction of the
variability or noise. This demonstrates the important role of the transducer in
the overall performance of the instrumentation system.
FIGURE 1.5 Power density (power spectrum) of digitized white noise showing a
fairly constant value over frequency.
TABLE 1.2 Sources of Variability
Source                    Cause                            Potential Remedy
Physiological             Measurement only indirectly      Modify overall approach
variability               related to variable of interest
Environmental             Other sources of similar         Noise cancellation
(internal or external)    energy form                      Transducer design
Artifact                  Transducer responds to           Transducer design
                          other energy sources
Electronic                Thermal or shot noise            Transducer or electronic
                                                           design
Electronic Noise
Johnson or thermal noise is produced by resistance sources, and the amount of
noise generated is related to the resistance and to the temperature:
$V_J = \sqrt{4kTRB}$ volts (2)
where R is the resistance in ohms, T is the temperature in kelvins, and k
is Boltzmann’s constant ($k = 1.38 \times 10^{-23}$ J/K).* B is the bandwidth, or range of
frequencies, that is allowed to pass through the measurement system. The sys-
tem bandwidth is determined by the filter characteristics in the system, usually
the analog filtering in the system (see the next section).
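Eq. (2) is easy to evaluate directly. The following minimal sketch computes the Johnson noise voltage for a resistor; the function name and the example resistance, temperature, and bandwidth are illustrative assumptions rather than values from the text.

```python
import math

K_BOLTZMANN = 1.38e-23  # Boltzmann's constant in J/K, as given in the text


def johnson_noise_voltage(r_ohms, t_kelvin, bandwidth_hz):
    """RMS Johnson (thermal) noise voltage from Eq. (2): sqrt(4 k T R B)."""
    return math.sqrt(4.0 * K_BOLTZMANN * t_kelvin * r_ohms * bandwidth_hz)


# Hypothetical example: 10 kOhm resistor at 310 K with a 1 kHz bandwidth
v_j = johnson_noise_voltage(10e3, 310.0, 1e3)
print(f"Johnson noise: {v_j * 1e6:.3f} microvolts RMS")
```

Since the noise grows with the square root of both R and B, halving the bandwidth reduces the noise voltage by a factor of about 1.41, not 2.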
If noise current is of interest, the equation for Johnson noise current can
be obtained from Eq. (2) in conjunction with Ohm’s law:
$I_J = \sqrt{4kTB/R}$ amps (3)
Since Johnson noise is spread evenly over all frequencies (at least in the-
ory), it is not possible to calculate a noise voltage or current without specifying
B, the frequency range. Since the bandwidth is not always known in advance, it
is common to describe a relative noise; specifically, the noise that would occur
if the bandwidth were 1.0 Hz. Such relative noise specification can be identified
by the unusual units required: volts/√Hz or amps/√Hz.
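The idea of a relative noise specification can be sketched as follows: compute the noise for a 1.0 Hz bandwidth, then scale by the square root of the actual bandwidth once it is known. The function names and example values are assumptions made for illustration.

```python
import math

K_BOLTZMANN = 1.38e-23  # Boltzmann's constant in J/K


def johnson_voltage_density(r_ohms, t_kelvin):
    """Johnson noise voltage for B = 1 Hz, in volts per root hertz (Eq. 2)."""
    return math.sqrt(4.0 * K_BOLTZMANN * t_kelvin * r_ohms)


def johnson_current_density(r_ohms, t_kelvin):
    """Johnson noise current for B = 1 Hz, in amps per root hertz (Eq. 3)."""
    return math.sqrt(4.0 * K_BOLTZMANN * t_kelvin / r_ohms)


def scale_to_bandwidth(density, bandwidth_hz):
    """Convert a per-root-hertz noise density to an RMS value over a given bandwidth."""
    return density * math.sqrt(bandwidth_hz)


# Hypothetical example: 1 kOhm resistor at 310 K
e_n = johnson_voltage_density(1e3, 310.0)
i_n = johnson_current_density(1e3, 310.0)
print(f"{e_n * 1e9:.3f} nV/sqrt(Hz), {i_n * 1e12:.3f} pA/sqrt(Hz)")
print(f"over 100 Hz: {scale_to_bandwidth(e_n, 100.0) * 1e9:.3f} nV RMS")
```

The ratio of the voltage density to the current density equals R, which is Ohm's law applied to the two noise expressions.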
*A temperature of 310 K (approximately body temperature) is often used, in which case $4kT = 1.7 \times 10^{-20}$ J.
Shot noise is defined as a current noise and is proportional to the baseline
current through a semiconductor junction:
$I_s = \sqrt{2 q I_d B}$ amps (4)
where q is the charge on an electron ($1.602 \times 10^{-19}$ coulomb), and $I_d$ is the
baseline semiconductor current. In photodetectors, the baseline current that generates
shot noise is termed the dark current, hence, the symbol $I_d$ in Eq. (4).
Again, since the noise is spread across all frequencies, the bandwidth, B, must
be specified to obtain a specific value, or a relative noise can be specified in
amps/√Hz.
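A minimal sketch of Eq. (4) is given below. The elementary charge is taken as approximately 1.602 × 10⁻¹⁹ coulomb; the function name, dark current, and bandwidth are hypothetical values used only to show the calculation.

```python
import math

Q_ELECTRON = 1.602e-19  # magnitude of the charge on an electron, in coulombs


def shot_noise_current(i_d_amps, bandwidth_hz):
    """RMS shot noise current from Eq. (4): sqrt(2 q Id B)."""
    return math.sqrt(2.0 * Q_ELECTRON * i_d_amps * bandwidth_hz)


# Hypothetical example: 1 nA dark current and a 1 kHz bandwidth
i_s = shot_noise_current(1e-9, 1e3)
print(f"shot noise: {i_s * 1e12:.3f} pA RMS")
```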
When multiple noise sources are present, as is often the case, their voltage
or current contributions to the total noise add as the square root of the sum of
the squares, assuming that the individual noise sources are independent. For
voltages:
$V_T = (V_1^2 + V_2^2 + V_3^2 + \cdots + V_N^2)^{1/2}$ (5)
A similar equation applies to current. Noise properties are discussed fur-
ther in Chapter 2.
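Eq. (5) can be sketched in a few lines. The function name is an assumption, and the example voltages are chosen so the result is easy to verify by hand.

```python
import math


def total_noise(*sources):
    """Combine independent RMS noise sources as the root of the sum of squares (Eq. 5)."""
    return math.sqrt(sum(v * v for v in sources))


# Two independent sources of 3 and 4 microvolts RMS combine to 5 microvolts, not 7
v_t = total_noise(3e-6, 4e-6)
print(f"total noise: {v_t * 1e6:.3f} microvolts RMS")
```

One practical consequence is that the largest source dominates: reducing a small noise source has little effect on the total.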
Signal-to-Noise Ratio
Most waveforms consist of signal plus noise mixed together. As noted pre-
viously, signal and noise are relative terms, relative to the task at hand: the
signal is that portion of the waveform of interest while the noise is everything
else. Often the goal of signal processing is to separate out signal from noise, to
identify the presence of a signal buried in noise, or to detect features of a signal
buried in noise.
The relative amount of signal and noise present in a waveform is usually
quantified by the signal-to-noise ratio, SNR. As the name implies, this is simply
the ratio of signal to noise, both measured in RMS (root-mean-squared) ampli-
tude. The SNR is often expressed in "db" (short for decibels) where:
$\mathrm{SNR} = 20 \log_{10}\left(\frac{\text{Signal}}{\text{Noise}}\right)$ (6)
To convert from db scale to a linear scale:
$\mathrm{SNR}_{linear} = 10^{db/20}$ (7)
For example, a ratio of 20 db means that the RMS value of the signal is
10 times the RMS value of the noise ($10^{20/20} = 10$), +3 db indicates a ratio of
approximately 1.41 ($10^{3/20} \approx 1.41$), 0 db means the signal and noise are equal in RMS value,
−3 db means that the ratio is 1/1.41, and −20 db means the signal is 1/10 of
the noise in RMS units. Figure 1.6 shows a sinusoidal signal with various
amounts of white noise. Note that it is difficult to detect the presence of the signal
visually when the SNR is −3 db, and impossible when the SNR is −10 db. The
ability to detect signals with low SNR is the goal and motivation for many of
the signal processing tools described in this text.
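The conversions in Eqs. (6) and (7) can be sketched as below, together with a simple RMS calculation. The function names are assumptions introduced for this illustration.

```python
import math


def rms(samples):
    """Root-mean-squared value of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def snr_db(signal_rms, noise_rms):
    """Signal-to-noise ratio in db from RMS amplitudes (Eq. 6)."""
    return 20.0 * math.log10(signal_rms / noise_rms)


def db_to_linear(db):
    """Convert a db value back to a linear amplitude ratio (Eq. 7)."""
    return 10.0 ** (db / 20.0)


# One full cycle of a unit-amplitude sine wave has an RMS value of about 0.707
sine = [math.sin(2.0 * math.pi * n / 1000) for n in range(1000)]
print(f"RMS of unit sine: {rms(sine):.4f}")
print(f"signal 10x the noise: {snr_db(10.0, 1.0):.1f} db")
print(f"-20 db as a linear ratio: {db_to_linear(-20.0):.3f}")
```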
ANALOG FILTERS: FILTER BASICS
The analog signal processing circuitry shown in Figure 1.1 will usually contain
some filtering, both to remove noise and appropriately condition the signal for
FIGURE 1.6 A 30 Hz sine wave with varying amounts of added noise. The sine
wave is barely discernible when the SNR is −3 db and not visible when the SNR
is −10 db.
analog-to-digital conversion (ADC). It is this filtering that usually establishes
the bandwidth of the system for noise calculations [the bandwidth used in Eqs.
(2)–(4)]. As shown later, accurate conversion of the analog signal to digital
format requires that the signal contain frequencies no greater than one-half the sampling
frequency. This rule applies to the analog waveform as a whole, not just
the signal of interest. Since all transducers and electronics produce some noise
and since this noise contains a wide range of frequencies, analog lowpass filter-
ing is usually essential to limit the bandwidth of the waveform to be converted.
Waveform bandwidth and its impact on ADC will be discussed further in Chap-
ter 2. Filters are defined by several properties: filter type, bandwidth, and attenu-
ation characteristics. The last can be divided into initial and final characteristics.
Each of these properties is described and discussed in the next section.
Filter Types
Analog filters are electronic devices that remove selected frequencies. Filters
are usually named according to the range of frequencies they do not suppress.
Thus, lowpass filters allow low frequencies to pass with minimum attenuation
while higher frequencies are attenuated. Conversely, highpass filters pass high
frequencies, but attenuate low frequencies. Bandpass filters reject frequencies
above and below a passband region. An exception to this terminology is the
bandstop filter, which passes frequencies on either side of a range of attenuated
frequencies.
Within each class, filters are also defined by the frequency ranges that
they pass, termed the filter bandwidth, and the sharpness with which they in-
crease (or decrease) attenuation as frequency varies. Spectral sharpness is speci-
fied in two ways: as an initial sharpness in the region where attenuation first
begins and as a slope further along the attenuation curve. These various filter
properties are best described graphically in the form of a frequency plot (some-
times referred to as a Bode plot), a plot of filter gain against frequency. Filter
gain is simply the ratio of the output voltage divided by the input voltage, Vout/
Vin, often taken in db. Technically this ratio should be defined for all frequencies
for which it is nonzero, but practically it is usually stated only for the frequency
range of interest. To simplify the shape of the resultant curves, frequency plots
sometimes plot gain in db against the log of frequency.* When the output/input
ratio is given analytically as a function of frequency, it is termed the transfer
function. Hence, the frequency plot of a filter’s output/input relationship can be
*When gain is plotted in db, it is in logarithmic form, since the db operation involves taking the
log [Eq. (6)]. Plotting gain in db against log frequency puts the two variables in similar metrics and
results in straighter line plots.
viewed as a graphical representation of the transfer function. Frequency plots
for several different filter types are shown in Figure 1.7.
Filter Bandwidth
The bandwidth of a filter is defined by the range of frequencies that are not
attenuated. These unattenuated frequencies are also referred to as passband frequencies.
Figure 1.7A shows the frequency plot of an ideal filter, a filter
that has a perfectly flat passband region and an infinite attenuation slope. Real
filters may indeed be quite flat in the passband region, but will attenuate with a
FIGURE 1.7 Frequency plots of ideal and realistic filters. The frequency plots
shown here have a linear vertical axis, but often the vertical axis is plotted in db.
The horizontal axis is in log frequency. (A) Ideal lowpass filter. (B) Realistic low-
pass filter with a gentle attenuation characteristic. (C) Realistic lowpass filter with
a sharp attenuation characteristic. (D) Bandpass filter.
more gentle slope, as shown in Figure 1.7B. In the case of the ideal filter, Figure
1.7A, the bandwidth or region of unattenuated frequencies is easy to determine;
specifically, it is between 0.0 and the sharp attenuation at fc Hz. When the
attenuation begins gradually, as in Figure 1.7B, defining the passband region is
problematic. To specify the bandwidth in this filter we must identify a frequency
that defines the boundary between the attenuated and non-attenuated portion of
the frequency characteristic. This boundary has been somewhat arbitrarily defined
as the frequency at which the attenuation is 3 db.* In Figure 1.7B, the filter
would have a bandwidth of 0.0 to fc Hz, or simply fc Hz. The filter in Figure
1.7C has a sharper attenuation characteristic, but still has the same bandwidth
( fc Hz). The bandpass filter of Figure 1.7D has a bandwidth of fh − fl Hz.
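The 3 db definition of bandwidth can be illustrated with a first-order lowpass filter. The gain expression used here, 1/sqrt(1 + (f/fc)²), is the standard magnitude response of a single-pole lowpass filter and is an assumption brought in for the example; the text does not give a specific transfer function. The function names and frequencies are also hypothetical.

```python
import math


def first_order_lowpass_gain(f_hz, fc_hz):
    """Magnitude of Vout/Vin for a single-pole lowpass filter (assumed form)."""
    return 1.0 / math.sqrt(1.0 + (f_hz / fc_hz) ** 2)


def gain_db(gain):
    """Express a linear gain in db."""
    return 20.0 * math.log10(gain)


def bandpass_bandwidth(f_low_hz, f_high_hz):
    """Bandwidth of a bandpass filter: the distance between its two 3 db points."""
    return f_high_hz - f_low_hz


fc = 100.0  # hypothetical cutoff frequency in Hz
g = first_order_lowpass_gain(fc, fc)
print(f"gain at fc: {g:.4f} ({gain_db(g):.2f} db), power ratio: {g * g:.2f}")
print(f"bandpass with 3 db points at 20 and 500 Hz: {bandpass_bandwidth(20.0, 500.0):.0f} Hz")
```

At the cutoff frequency the amplitude is about 0.707 of its passband value, so the power is halved.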
Filter Order
The slope of a filter’s attenuation curve is related to the complexity of the filter:
more complex filters have a steeper slope better approaching the ideal. In analog
filters, complexity is proportional to the number of energy storage elements in
the circuit (which could be either inductors or capacitors, but are generally ca-
pacitors for practical reasons). Using standard circuit analysis, it can be shown
that each energy storage device leads to an additional order in the polynomial
of the denominator of the transfer function that describes the filter. (The denom-
inator of the transfer function is also referred to as the characteristic equation.)
As with any polynomial equation, the number of roots of this equation will
depend on the order of the equation; hence, filter complexity (i.e., the number
of energy storage devices) is equivalent to the number of roots in the denominator
of the transfer function. In electrical engineering, it has long been common
to call the roots of the denominator equation poles. Thus, the complexity of the
filter is also equivalent to the number of poles in the transfer function. For
example, a second-order or two-pole filter has a transfer function with a second-
order polynomial in the denominator and would contain two independent energy
storage elements (very likely two capacitors).
Applying asymptote analysis to the transfer function, it is not difficult to
show that the slope of a second-order lowpass filter (the slope for frequencies
much greater than the cutoff frequency, fc) is 40 db/decade specified in log-log
terms. (The unusual units, db/decade are a result of the log-log nature of the
typical frequency plot.) That is, the attenuation of this filter increases linearly
on a log-log scale by 40 db (a factor of 100 on a linear scale) for every order
of magnitude increase in frequency. Generalizing, for each filter pole (or order)
*This defining point is not entirely arbitrary because when the signal is attenuated 3 db, its amplitude
is 0.707 ($10^{-3/20}$) of what it was in the passband region and it has half the power of the unattenuated
signal (since $0.707^2 = 1/2$). Accordingly this point is also known as the half-power point.
the downward slope (sometimes referred to as the rolloff) is increased by 20
db/decade. Figure 1.8 shows the frequency plot of a second-order (two-pole
with a slope of 40 db/decade) and a 12th-order lowpass filter, both having the
same cutoff frequency, fc, and hence, the same bandwidth. The steeper slope or
rolloff of the 12-pole filter is apparent. In principle, a 12-pole lowpass filter
would have a slope of 240 db/decade (12 × 20 db/decade). In fact, this fre-
quency characteristic is theoretical because in real analog filters parasitic com-
ponents and inaccuracies in the circuit elements limit the actual attenuation that
can be obtained. The same rationale applies to highpass filters except that the
frequency plot decreases with decreasing frequency at a rate of 20 db/decade
for each highpass filter pole.
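The 20 db/decade-per-pole rule can be checked numerically. The sketch below assumes a Butterworth lowpass magnitude response, 1/sqrt(1 + (f/fc)^(2n)), which is one common filter family and is used here only as a convenient example; the slope is measured well above the cutoff frequency, between 100 fc and 1000 fc. The function names are hypothetical.

```python
import math


def butterworth_gain_db(f_hz, fc_hz, order):
    """Gain in db of an n-pole Butterworth lowpass filter (assumed response)."""
    return -10.0 * math.log10(1.0 + (f_hz / fc_hz) ** (2 * order))


def rolloff_db_per_decade(order, fc_hz=1.0):
    """Change in gain over one decade of frequency, measured from 100 fc to 1000 fc."""
    return (butterworth_gain_db(1000.0 * fc_hz, fc_hz, order)
            - butterworth_gain_db(100.0 * fc_hz, fc_hz, order))


for n_poles in (1, 2, 12):
    print(f"{n_poles:2d} poles: {rolloff_db_per_decade(n_poles):8.2f} db/decade")
```

These are theoretical slopes; as noted above, real analog filters cannot sustain the attenuation implied by a 12-pole response.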
Filter Initial Sharpness
As shown in Figure 1.8, both the slope and the initial sharpness increase with
filter order (number of poles), but increasing filter order also increases the com-
FIGURE 1.8 Frequency plot of a second-order (2-pole) and a 12th-order lowpass
filter with the same cutoff frequency. The higher order filter more closely ap-
proaches the sharpness of an ideal filter.
plexity, hence the cost, of the filter. It is possible to increase the initial sharpness
of the filter’s attenuation characteristics without increasing the order of the filter,
if you are willing to accept some unevenness, or ripple, in the passband. Figure
1.9 shows two lowpass, 4th-order filters, differing in the initial sharpness of the
attenuation. The one marked Butterworth has a smooth passband, but the initial
attenuation is not as sharp as the one marked Chebychev, which has a passband
that contains ripples. This property of analog filters is also seen in digital filters
and will be discussed in detail in Chapter 4.
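The trade-off between passband ripple and initial sharpness can be sketched using the standard textbook magnitude responses of the two filter families. These formulas, the 1 db ripple value, and the function names are assumptions for illustration and are not taken from Figure 1.9. Note also that in this sketch the Chebyshev "cutoff" is the edge of the ripple band rather than the 3 db point.

```python
import math


def chebyshev_poly(n, x):
    """Chebyshev polynomial T_n(x) computed by the three-term recurrence."""
    if n == 0:
        return 1.0
    t_prev, t_curr = 1.0, x
    for _ in range(n - 1):
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
    return t_curr


def butterworth_gain(f_hz, fc_hz, order):
    """Butterworth lowpass magnitude: smooth, monotonic passband."""
    return 1.0 / math.sqrt(1.0 + (f_hz / fc_hz) ** (2 * order))


def chebyshev_gain(f_hz, fc_hz, order, ripple_db=1.0):
    """Chebyshev (type I) lowpass magnitude with the given passband ripple."""
    eps = math.sqrt(10.0 ** (ripple_db / 10.0) - 1.0)
    return 1.0 / math.sqrt(1.0 + (eps * chebyshev_poly(order, f_hz / fc_hz)) ** 2)


# Compare two 4-pole filters in the passband and one octave above cutoff
for f in (0.0, 0.5, 0.9, 1.0, 2.0):
    b = butterworth_gain(f, 1.0, 4)
    c = chebyshev_gain(f, 1.0, 4)
    print(f"f/fc = {f:3.1f}: Butterworth {b:.4f}, Chebyshev {c:.4f}")
```

In the passband the Chebyshev gain moves up and down within the ripple band, while above cutoff it falls faster than the Butterworth gain of the same order.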
FIGURE 1.9 Two filters having the same order (4-pole) and cutoff frequency, but
differing in the sharpness of the initial slope. The filter marked Chebychev has a
steeper initial slope or rolloff, but contains ripples in the passband.
ANALOG-TO-DIGITAL CONVERSION: BASIC CONCEPTS
The last analog element in a typical measurement system is the analog-to-digital
converter (ADC), Figure 1.1. As the name implies, this electronic component
converts an analog voltage to an equivalent digital number. In the process of
analog-to-digital conversion an analog or continuous waveform, x(t), is converted
into a discrete waveform, x(n), a sequence of numbers defined
only at integer values of n. To convert a continuous waveform to digital format
requires slicing the signal in two ways: slicing in time and slicing in amplitude
(Figure 1.10).
Slicing the signal into discrete points in time is termed time sampling or
simply sampling. Time slicing samples the continuous waveform, x(t), at discrete
points in time, nTs, where Ts is the sample interval. The consequences of
time slicing are discussed in the next chapter. The same concept can be applied
to images wherein a continuous image such as a photograph that has intensities
that vary continuously across spatial distance is sampled at distances of S mm.
In this case, the digital representation of the image is a two-dimensional array.
The consequences of spatial sampling are discussed in Chapter 11.
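Time sampling can be sketched by evaluating a continuous function x(t) only at the instants nTs. The function name, the 10 Hz sine wave, and the sample interval are hypothetical choices for this illustration.

```python
import math


def sample(x, ts, n_samples):
    """Return x(n) = x(n * Ts) for n = 0, 1, ..., n_samples - 1."""
    return [x(n * ts) for n in range(n_samples)]


def sine_10hz(t):
    """A stand-in for the continuous waveform x(t): a 10 Hz unit sine wave."""
    return math.sin(2.0 * math.pi * 10.0 * t)


ts = 0.025  # sample interval in seconds (a sampling frequency of 40 Hz)
x_n = sample(sine_10hz, ts, 8)
print([round(v, 3) for v in x_n])
```

With this interval there are four samples per cycle of the sine wave, so x(n) lands on the zero crossings and the peaks.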
Since the binary output of the ADC is a discrete integer while the analog
signal has a continuous range of values, analog-to-digital conversion also re-
quires the analog signal to be sliced into discrete levels, a process termed quanti-
zation, Figure 1.10. The equivalent number can only approximate the level of
FIGURE 1.10 Converting a continuous signal (solid line) to discrete format re-
quires slicing the signal in time and amplitude. The result is a series of discrete
points (X’s) that approximate the original signal.
the analog signal, and the degree of approximation will depend on the range of
binary numbers and the amplitude of the analog signal. For example, if the
output of the ADC is an 8-bit binary number capable of $2^8$ or 256 discrete states,
and the input amplitude range is 0.0–5.0 volts, then the quantization interval
will be 5/256 or 0.0195 volts. If, as is usually the case, the analog signal is time
varying in a continuous manner, it must be approximated by a series of binary
numbers representing the approximate analog signal level at discrete points in
time (Figure 1.10). The errors associated with amplitude slicing, or quantization,
are described in the next section, and the potential error due to sampling is
covered in Chapter 2. The remainder of this section briefly describes the hard-
ware used to achieve this approximate conversion.
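The quantization arithmetic in the example above can be sketched as follows. The function name is hypothetical, and the choice to truncate to the level below (rather than round to the nearest level) is an assumption; actual converters differ in this detail.

```python
def quantize(v, v_min=0.0, v_max=5.0, bits=8):
    """Map an analog voltage to an integer code and the voltage that code represents."""
    levels = 2 ** bits                       # 256 states for an 8-bit converter
    interval = (v_max - v_min) / levels      # quantization interval in volts
    code = int((v - v_min) / interval)       # truncate to the level at or below v
    code = max(0, min(levels - 1, code))     # clip to the converter's range
    return code, v_min + code * interval


interval = 5.0 / 2 ** 8
print(f"quantization interval: {interval:.4f} volts")
for v in (1.0, 2.5, 5.0):
    code, v_q = quantize(v)
    print(f"{v:.3f} V -> code {code:3d} -> {v_q:.4f} V (error {v - v_q:.4f} V)")
```

For inputs within the converter's range, the error introduced by this scheme is always less than one quantization interval.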
Analog-to-Digital Conversion Techniques
Various conversion rules have been used, but the most common is to convert
the voltage into a proportional binary number. Different approaches can be used
to implement the conversion electronically; the most common is the successive
approximation technique described at the end of this section. ADC’s differ in
conversion range, speed of conversion, and resolution. The range of analog volt-
ages that can be converted is frequently software selectable, and may, or may
not, include negative voltages. Typical ranges are from 0.0–10.0 volts or less,
or if negative values are possible ± 5.0 volts or less. The speed of conversion
is specified in terms of samples per second, or conversion time. For example,
an ADC with a conversion time of 10 µsec should, logically, be able to operate
at up to 100,000 samples per second (or simply 100 kHz). Typical conversion
rates run up to 500 kHz for moderate cost converters, but off-the-shelf converters
can be obtained with rates up to 10–20 MHz. Except for image processing
systems, lower conversion rates are usually acceptable for biological signals.
Even image processing systems may use downsampling techniques to reduce
the required ADC conversion rate and, hence, the cost.
A typical ADC system involves several components in addition to the
actual ADC element, as shown in Figure 1.11. The first element is an N-to-1
analog switch that allows multiple input channels to be converted. Typical ADC
systems provide up to 8 to 16 channels, and the switching is usually software-
selectable. Since a single ADC is doing the conversion for all channels, the
conversion rate for any given channel is reduced in proportion to the number of
channels being converted. Hence, an ADC system with a converter element that
has a conversion rate of 50 kHz would be able to sample each of eight channels
at a theoretical maximum rate of 50/8 = 6.25 kHz.
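The arithmetic linking conversion time, conversion rate, and per-channel rate can be sketched in a few lines of MATLAB. The 20 µsec conversion time is an assumed value chosen to give the 50 kHz rate used in the text.

```matlab
% Sketch: per-channel sampling rate of a multiplexed ADC system.
conv_time = 20e-6;               % Assumed conversion time in seconds
f_adc = 1/conv_time;             % Maximum conversion rate in Hz (50 kHz)
nchan = 8;                       % Number of channels sharing the converter
f_chan = f_adc/nchan;            % Theoretical maximum rate per channel in Hz
fprintf('Rate per channel: %.2f kHz\n', f_chan/1000);
```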
The Sample and Hold is a high-speed switch that momentarily records the
input signal, and retains that signal value at its output. The time the switch is
closed is termed the aperture time. Typical values range around 150 ns, and,
except for very fast signals, can be considered basically instantaneous. This
FIGURE 1.11 Block diagram of a typical analog-to-digital conversion system.
instantaneously sampled voltage value is held (as a charge on a capacitor) while
the ADC element determines the equivalent binary number. Again, it is the
ADC element that determines the overall speed of the conversion process.
Quantization Error
Resolution is given in terms of the number of bits in the binary output with the
assumption that the least significant bit (LSB) in the output is accurate (which
may not always be true). Typical converters feature 8-, 12-, and 16-bit output
with 12 bits presenting a good compromise between conversion resolution and
cost. In fact, most signals do not have a sufficient signal-to-noise ratio to justify
a higher resolution; you are simply obtaining a more accurate conversion of the
noise. For example, assuming that converter resolution is equivalent to the LSB,
then the minimum voltage that can be resolved is the same as the quantization
voltage described above: the voltage range divided by $2^N$, where N is the number
of bits in the binary output. The resolution of a 5-volt, 12-bit ADC is $5.0/2^{12}$ =
5/4096 = 0.0012 volts. The dynamic range of a 12-bit ADC, the range from the
smallest to the largest voltage it can convert, is from 0.0012 to 5 volts: in db
this is $20 \log_{10}(2^{12})$ = 72 db. Since typical signals, especially those of biological
origin, have dynamic ranges rarely exceeding 60 to 80 db, a 12-bit converter
with a dynamic range of 72 db is well matched to most applications. Having
this resolution also means that not all of the range need be used, and since 12-bit
ADCs are only marginally more expensive than 8-bit ADCs they are often
used even when an 8-bit ADC (with a dynamic range of about 48 db) would be
adequate. A 12-bit output does require two bytes to store and will double the
memory requirements over an 8-bit ADC.
The number of bits used for conversion sets a lower limit on the resolu-
tion, and also determines the quantization error (Figure 1.12). This error can be
thought of as a noise process added to the signal. If a sufficient number of
quantization levels exist (say N > 64), the distortion produced by quantization
error may be modeled as additive independent white noise with zero mean with
the variance determined by the quantization step size, $\delta = V_{MAX}/2^N$. Assuming
that the error is uniformly distributed between −δ/2 and +δ/2, the variance, σ², is:
$$\sigma^2 = \int_{-\delta/2}^{\delta/2} \frac{\eta^2}{\delta}\, d\eta = \frac{\delta^2}{12} = \frac{V_{MAX}^2\, 2^{-2N}}{12} \qquad (8)$$
Since the error has zero mean, the RMS value of the noise is simply
the standard deviation, σ = δ/√12.
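The uniform-error model can be checked numerically. The sketch below quantizes a slow ramp (chosen so that the error sweeps evenly across each quantization step) and compares the measured error variance with the model value, which in terms of the step size is δ²/12. The 8-bit, 5-volt converter, the rounding quantizer, and all variable names are assumptions for the example; clipping at the top code is ignored.

```matlab
% Sketch: compare measured quantization noise with the delta^2/12 model.
N = 8;                           % Number of bits (assumed for the example)
Vmax = 5.0;                      % Input range in volts
delta = Vmax/2^N;                % Quantization step size
x = Vmax*(0:99999)/100000;       % Ramp covering the full input range
xq = delta*round(x/delta);       % Ideal rounding quantizer (no clipping)
err = x - xq;                    % Quantization error, within +/- delta/2
var_meas  = mean(err.^2);        % Measured variance (the error has zero mean)
var_model = delta^2/12;          % Variance predicted by the uniform model
ratio = var_meas/var_model;      % Should be very close to 1
```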
Further Study: Successive Approximation
The most popular analog-to-digital converters use a rather roundabout strategy
to find the binary number most equivalent to the input analog voltage: a digital-to-analog
converter (DAC) is placed in a feedback loop. As shown in Figure
1.13, an initial binary number stored in the buffer is fed to a DAC to produce a
FIGURE 1.12 Quantization (amplitude slicing) of a continuous waveform. The
lower trace shows the error between the quantized signal and the input.
FIGURE 1.13 Block diagram of an analog-to-digital converter. The input analog
voltage is compared with the output of a digital-to-analog converter. When the
two voltages match, the number held in the binary buffer is equivalent to the input
voltage with the resolution of the converter. Different strategies can be used to
adjust the contents of the binary buffer to attain a match.
proportional voltage, VDAC. This DAC voltage, VDAC, is then compared to the
input voltage, and the binary number in the buffer is adjusted until the desired
level of match between VDAC and Vin is obtained. This approach raises the question
“How are DACs constructed?” In fact, DACs are relatively easy to construct
using a simple ladder network and the principle of current superposition.
The controller adjusts the binary number based on whether or not the
comparator finds the voltage out of the DAC, VDAC, to be greater or less than
the input voltage, Vin. One simple adjustment strategy is to increase the binary
number by one each cycle if VDAC < Vin, or decrease it otherwise. This so-called
tracking ADC is very fast when Vin changes slowly, but can take many cycles
when Vin changes abruptly (Figure 1.14). Not only can the conversion time be
quite long, but it is variable since it depends on the dynamics of the input signal.
This strategy would not easily allow for sampling an analog signal at a fixed
rate due to the variability in conversion time.
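The behavior of the tracking strategy can be simulated in a few lines. The following is a minimal sketch assuming an ideal 6-bit, 0 to 5 volt DAC and a step change in the input; the step size, timing, and variable names are all hypothetical.

```matlab
% Sketch: tracking ADC responding to a step change in input voltage.
nbits = 6;  Vmax = 5.0;          % Assumed 6-bit converter, 0-5 volt range
q = Vmax/2^nbits;                % DAC voltage per count
ncycles = 80;                    % Number of clock cycles simulated
Vin = [1.0*ones(1,20) 4.0*ones(1,ncycles-20)];   % Step from 1 to 4 volts
count = round(Vin(1)/q);         % Assume the buffer starts near a match
Vdac = zeros(1,ncycles);
for k = 1:ncycles
    Vdac(k) = count*q;           % DAC output for the current count
    if Vdac(k) < Vin(k)
        count = min(count + 1, 2^nbits - 1);     % Increase by one count
    else
        count = max(count - 1, 0);               % Decrease by one count
    end
end
% Number of cycles after the step before VDAC first reaches the new input
lag = find(Vdac(21:end) >= Vin(end), 1) - 1;
```

Before the step, VDAC alternates between the two counts that bracket the input; after the step, it climbs one count per cycle, so the catch-up time grows with the size of the change.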
An alternative strategy termed successive approximation allows the con-
version to be done at a fixed rate and is well-suited to digital technology. The
successive approximation strategy always takes the same number of cycles irre-
spective of the input voltage. In the first cycle, the controller sets the most
significant bit (MSB) of the buffer to 1; all others are cleared. This binary
number is half the maximum possible value (which occurs when all the bits are
FIGURE 1.14 Voltage waveform of an ADC that uses a tracking strategy. The
ADC voltage (solid line) follows the input voltage (dashed line) fairly closely when
the input voltage varies slowly, but takes many cycles to “catch up” to an abrupt
change in input voltage.
1), so the DAC should output a voltage that is half its maximum voltage—that
is, a voltage in the middle of its range. If the comparator tells the controller that
Vin > VDAC, then the input voltage, Vin, must be greater than half the maximum
range, and the MSB is left set. If Vin < VDAC, then the input voltage is in the
lower half of the range and the MSB is cleared (Figure 1.15). In the next cycle,
the next most significant bit is set, and the same comparison is made and the
same bit adjustment takes place based on the results of the comparison (Figure
1.15).
After N cycles, where N is the number of bits in the digital output, the
voltage from the DAC, VDAC, converges to the best possible fit to the input
voltage, Vin. Since Vin ≈ VDAC, the number in the buffer, which is proportional
to VDAC, is the best representation of the analog input voltage within the resolu-
tion of the converter. To signal the end of the conversion process, the ADC puts
FIGURE 1.15 Vin and VDAC in a 6-bit ADC using the successive approximation
strategy. In the first cycle, the MSB is set (solid line) since Vin > VDAC . In the next
two cycles, the bit being tested is cleared because Vin < VDAC when this bit was
set. For the fourth and fifth cycles the bit being tested remained set and for the
last cycle it was cleared. At the end of the sixth cycle a conversion complete flag
is set to signify the end of the conversion process.
out a digital signal or flag indicating that the conversion is complete (Figure
1.15).
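The successive approximation cycle described above can be sketched as a short loop. This is an idealized simulation, not a description of any particular device: the 6-bit resolution follows Figure 1.15, while the input voltage, the ideal DAC model, and the variable names are assumptions. The comparison treats an exact match as "greater than or equal."

```matlab
% Sketch: successive approximation conversion of a single input voltage.
nbits = 6;  Vmax = 5.0;          % Assumed 6-bit converter, 0-5 volt range
Vin = 3.2;                       % Example input voltage (arbitrary)
code = 0;                        % Binary buffer, initially cleared
for b = nbits-1:-1:0             % Test bits from the MSB down to the LSB
    trial = code + 2^b;          % Set the bit under test
    Vdac = Vmax*trial/2^nbits;   % Ideal DAC output for the trial code
    if Vin >= Vdac               % Comparator: input at or above DAC voltage
        code = trial;            % Leave the bit set
    end                          % Otherwise the bit remains cleared
end
Vout = Vmax*code/2^nbits;        % Voltage represented by the final code
```

The loop always runs exactly nbits times, which is why the conversion time is fixed regardless of the input voltage.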
TIME SAMPLING: BASICS
Time sampling transforms a continuous analog signal into a discrete time signal,
a sequence of numbers denoted as x(n) = [x1, x2, x3, . . . xN],* Figure 1.16 (lower
trace). Such a representation can be thought of as an array in computer memory.
(It can also be viewed as a vector as shown in the next chapter.) Note that the
array position indicates a relative position in time, but to relate this number
sequence back to an absolute time both the sampling interval and sampling onset
time must be known. However, if only the time relative to conversion onset is
important, as is frequently the case, then only the sampling interval needs to be
*In many textbooks brackets, [ ], are used to denote digitized variables; i.e., x[n]. Throughout this
text we reserve brackets to indicate a series of numbers, or vector, following the MATLAB format.
FIGURE 1.16 A continuous signal (upper trace) is sampled at discrete points in
time and stored in memory as an array of proportional numbers (lower trace).
known. Converting back to relative time is then achieved by multiplying the
sequence number, n, by the sampling interval, Ts: x(t) = x(nTs).
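In MATLAB, the relative time of each sample is conveniently generated as a vector. The sketch below assumes a 500 Hz sampling frequency and takes the first sample to occur at conversion onset (n = 0), so that array position k corresponds to n = k − 1; both choices are assumptions for the example.

```matlab
% Sketch: relating sample number to time relative to conversion onset.
fs = 500;                        % Assumed sampling frequency in Hz
Ts = 1/fs;                       % Sampling interval in seconds
N = 1000;                        % Number of samples in the array
n = 0:N-1;                       % Sample numbers, taking onset as n = 0
t = n*Ts;                        % Relative time of each sample in seconds
x = sin(2*pi*5*t);               % Example: samples of a 5-Hz sinusoid
```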
Sampling theory is discussed in the next chapter and states that a sinusoid
can be uniquely reconstructed providing it has been sampled by at least two
equally spaced points over a cycle. Since Fourier series analysis implies that
any signal can be represented as a series of sine waves (see Chapter 3), then by
extension, a signal can be uniquely reconstructed providing the sampling frequency
is at least twice that of the highest frequency in the signal. Note that this highest
frequency component may come from a noise source and could be well above
the frequencies of interest. The converse of this rule is that any signal that contains
frequency components greater than half the sampling frequency cannot
be reconstructed, and, hence, its digital representation is in error. Since this error
is introduced by undersampling, it is inherent in the digital representation and
no amount of digital signal processing can correct this error. The specific nature
of this under-sampling error is termed aliasing and is described in a discussion
of the consequences of sampling in Chapter 2.
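A small numerical experiment shows why undersampling cannot be corrected after the fact. In the sketch below, a 90-Hz cosine sampled at 100 Hz produces exactly the same sample values as a 10-Hz cosine, so the two are indistinguishable once digitized. The frequencies and variable names are arbitrary choices for the illustration.

```matlab
% Sketch: an undersampled sinusoid is indistinguishable from a lower one.
fs = 100;                        % Sampling frequency in Hz
n = 0:49;                        % Sample numbers
t = n/fs;                        % Sample times in seconds
x_high = cos(2*pi*90*t);         % 90-Hz cosine, well above fs/2 = 50 Hz
x_low  = cos(2*pi*10*t);         % 10-Hz cosine, below fs/2
max_diff = max(abs(x_high - x_low));   % Largest difference between samples
```

The difference is zero to within floating point precision: after sampling, the 90-Hz component appears as a 10-Hz component.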
From a practical standpoint, aliasing must be avoided either by the use of
very high sampling rates—rates that are well above the bandwidth of the analog
system—or by filtering the analog signal before analog-to-digital conversion.
Since excessive sampling rates have an associated cost, both in terms of the
ADC required and memory costs, the latter approach is generally preferable.
Also note that the sampling frequency must be at least twice the highest frequency
present in the input signal, not to be confused with the bandwidth of the analog
signal. All frequencies in the sampled waveform greater than one half the sam-
pling frequency (one-half the sampling frequency is sometimes referred to as
the Nyquist frequency) must be essentially zero, not merely attenuated. Recall
that the bandwidth is defined as the frequency for which the amplitude is re-
duced by only 3 db from the nominal value of the signal, while the sampling
criterion requires that the value be reduced to zero. Practically, it is sufficient
to reduce the signal to be less than quantization noise level or other acceptable
noise level. The relationship between the sampling frequency, the order of the
anti-aliasing filter, and the system bandwidth is explored in a problem at the
end of this chapter.
Example 1.1. An ECG signal of 1 volt peak-to-peak has a bandwidth of
0.01 to 100 Hz. (Note this frequency range has been established by an official
standard and is meant to be conservative.) Assume that broadband noise may
be present in the signal at about 0.1 volts (i.e., 20 db below the nominal signal
level). This signal is filtered using a four-pole lowpass filter. What sampling
frequency is required to ensure that the error due to aliasing is less than −60 db
(0.001 volts)?
Solution. The noise at the sampling frequency must be reduced another
40 db (20 * log (0.1/0.001)) by the four-pole filter. A four-pole filter with a
cutoff of 100 Hz (required to meet the fidelity requirements of the ECG signal)
would attenuate the waveform at a rate of 80 db per decade. For a four-pole
filter the asymptotic attenuation is given as:
Attenuation = 80 log(f2/fc) db
To achieve the required additional 40 db of attenuation required by the
problem from a four-pole filter:
80 log(f2/fc) = 40;  log(f2/fc) = 40/80 = 0.5
f2/fc = 10^0.5 = 3.16;  f2 = 3.16 × 100 = 316 Hz
Thus to meet the sampling criterion, the sampling frequency must be at
least 632 Hz, twice the frequency at which the noise is adequately attenuated.
The solution is approximate and ignores the fact that the initial attenuation of
the filter will be gradual. Figure 1.17 shows the frequency response characteris-
tics of an actual 4-pole analog filter with a cutoff frequency of 100 Hz. This
figure shows that the attenuation is 40 db at approximately 320 Hz. Note the
high sampling frequency that is required for what is basically a relatively low
frequency signal (the ECG). In practice, a filter with a sharper cutoff, perhaps
FIGURE 1.17 Detailed frequency plot (on a log-log scale) of a 4-pole and 8-pole
filter, both having a cutoff frequency of 100 Hz.
an 8-pole filter, would be a better choice in this situation. Figure 1.17 shows
that the frequency response of an 8-pole filter with the same 100 Hz cutoff frequency
provides the necessary attenuation at less than 200 Hz. Using this filter, the
sampling frequency could be lowered to under 400 Hz.
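The calculation in Example 1.1 is easily repeated for other filter orders. The sketch below evaluates the asymptotic estimate used in the solution for 4-pole and 8-pole filters and, for comparison, the frequency at which a Butterworth response reaches the same attenuation. The Butterworth form is an assumption; the text does not state which filter type is plotted in Figure 1.17.

```matlab
% Sketch: frequency at which 40 db of attenuation is reached.
fc = 100;                        % Filter cutoff frequency in Hz
atten_db = 40;                   % Additional attenuation required in db
poles = [4 8];                   % Filter orders to compare
% Asymptotic estimate: attenuation = 20*poles*log10(f2/fc)
f_asym = fc*10.^(atten_db./(20*poles));
% Assumed Butterworth magnitude: 1/sqrt(1 + (f/fc)^(2*poles))
f_butt = fc*(10^(atten_db/10) - 1).^(1./(2*poles));
fs_min = 2*f_asym;               % Minimum sampling frequency in Hz
disp([poles; f_asym; f_butt; fs_min]);
```

Under these assumptions the 4-pole filter requires a sampling frequency of about 632 Hz, while the 8-pole filter reaches 40 db below 200 Hz, allowing a sampling frequency under 400 Hz.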
FURTHER STUDY: BUFFERING
AND REAL-TIME DATA PROCESSING
Real-time data processing simply means that the data is processed and results
obtained in sufficient time to influence some ongoing process. This influence
may come directly from the computer or through human intervention. The pro-
cessing time constraints naturally depend on the dynamics of the process of
interest. Several minutes might be acceptable for an automated drug delivery
system, while information on the electrical activity of the heart needs to be
immediately available.
The term buffer, when applied to digital technology, usually describes a set
of memory locations used to temporarily store incoming data until enough data
is acquired for efficient processing. When data is being acquired continuously,
a technique called double buffering can be used. Incoming data is alternately
sent to one of two memory arrays, and the one that is not being filled is processed
(which may involve simply transferring it to disk storage). Most ADC software
packages provide a means for determining which element in an array has most
recently been filled to facilitate buffering, and frequently the ability to determine
which of two arrays (or which half of a single array) is being filled to facilitate
double buffering.
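The logic of double buffering can be sketched with a stored data stream standing in for the ADC. In a real system, filling and processing proceed concurrently under control of the acquisition software; here they are simply interleaved in a loop. The buffer length, the RMS "processing" step, and all names are hypothetical.

```matlab
% Sketch: double buffering simulated with a stored data stream.
stream = sin(2*pi*(0:799)/50);   % Stand-in for data arriving from an ADC
buflen = 100;                    % Samples per buffer (assumed)
buf = zeros(2, buflen);          % The two memory arrays
nblocks = length(stream)/buflen; % Number of buffer loads in the stream
block_rms = zeros(1, nblocks);   % Result of "processing" each buffer
ifill = 1;                       % Index of the buffer currently being filled
for k = 1:nblocks
    buf(ifill,:) = stream((k-1)*buflen + (1:buflen));   % Fill one buffer
    if k > 1                     % Meanwhile, process the other (full) buffer
        block_rms(k-1) = sqrt(mean(buf(3-ifill,:).^2));
    end
    ifill = 3 - ifill;           % Swap roles: 1 becomes 2, 2 becomes 1
end
block_rms(nblocks) = sqrt(mean(buf(3-ifill,:).^2));     % Final buffer
```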
DATA BANKS
With the advent of the World Wide Web it is not always necessary to go through
the analog-to-digital conversion process to obtain digitized data of physiological
signals. A number of data banks exist that provide physiological signals such as
ECG, EEG, gait, and other common biosignals in digital form. Given the volatil-
ity and growth of the Web and the ease with which searches can be made, no
attempt will be made to provide a comprehensive list of appropriate Websites.
However, a good source of several common biosignals, particularly the ECG, is
the PhysioNet data bank maintained by MIT: http://www.physionet.org. Some
data banks are specific to a given set of biosignals or a given signal processing
approach. An example of the latter is the ICALAB Data Bank in Japan
(http://www.bsp.brain.riken.go.jp/ICALAB/), which includes data that can be used to
evaluate independent component analysis (see Chapter 9) algorithms.
Numerous other data banks containing biosignals and/or images can be
found through a quick search of the Web, and many more are likely to come
online in the coming years. This is also true for some of the signal processing
algorithms as will be described in more detail later. For example, the ICALAB
Website mentioned above also has algorithms for independent component analy-
sis in MATLAB m-file format. A quick Web search can provide both signal
processing algorithms and data that can be used to evaluate a signal processing
system under development. The Web is becoming an ever more useful tool in
signal and image processing, and a brief search of the Web can save consider-
able time in the development process, particularly if the signal processing sys-
tem involves advanced approaches.
PROBLEMS
1. A single sinusoidal signal is contained in noise. The RMS value of the noise
is 0.5 volts and the SNR is 10 db. What is the peak-to-peak amplitude of the
sinusoid?
2. A resistor produces 10 µV noise when the room temperature is 310 K and
the bandwidth is 1 kHz. What current noise would be produced by this resistor?
3. The noise voltage out of a 1 MΩ resistor was measured using a digital
voltmeter as 1.5 µV at a room temperature of 310 K. What is the effective
bandwidth of the voltmeter?
4. The photodetector shown in Figure 1.4 has a sensitivity of 0.3µA/µW (at a
wavelength of 700 nm). In this circuit, there are three sources of noise. The
photodetector has a dark current of 0.3 nA, the resistor is 10 MΩ, and the
amplifier has an input current noise of 0.01 pA/√Hz. Assume a bandwidth of
10 kHz. (a) Find the total noise current input to the amplifier. (b) Find the
minimum light flux signal that can be detected with an SNR = 5.
5. A lowpass filter is desired with the cutoff frequency of 10 Hz. This filter
should attenuate a 100 Hz signal by a factor of 85. What should be the order of
this filter?
6. You are given a box that is said to contain a highpass filter. You input a
series of sine waves into the box and record the following output:
Frequency (Hz)    Vout (volts rms)
2                 0.15 × 10^−7
10                0.1 × 10^−3
20                0.002
60                0.2
100               1.5
125               3.28
150               4.47
200               4.97
300               4.99
400               5.0
What is the cutoff frequency and order of this filter?
7. An 8-bit ADC that has an input range of ±5 volts is used to
convert a signal that varies between ± 2 volts. What is the SNR of the input if
the input noise equals the quantization noise of the converter?
8. As elaborated in Chapter 2, time sampling requires that the maximum fre-
quency present in the input be less than fs/2 for proper representation in digital
format. Assume that the signal must be attenuated by a factor of 1000 to be
considered “not present.” If the sampling frequency is 10 kHz and a 4th-order
lowpass anti-aliasing filter is used prior to analog-to-digital conversion, what
should be the bandwidth of the sampled signal? That is, what must the cutoff
frequency be of the anti-aliasing lowpass filter?
10
Fundamentals of Image Processing:
MATLAB Image Processing Toolbox
IMAGE PROCESSING BASICS: MATLAB IMAGE FORMATS
Images can be treated as two-dimensional data, and many of the signal process-
ing approaches presented in the previous chapters are equally applicable to im-
ages: some can be directly applied to image data while others require some
modification to account for the two (or more) data dimensions. For example,
both PCA and ICA have been applied to image data treating the two-dimen-
sional image as a single extended waveform. Other signal processing methods
including Fourier transformation, convolution, and digital filtering are applied to
images using two-dimensional extensions. Two-dimensional images are usually
represented by two-dimensional data arrays, and MATLAB follows this tradi-
tion;* however, MATLAB offers a variety of data formats in addition to the
standard format used by most MATLAB operations. Three-dimensional images
can be constructed using multiple two-dimensional representations, but these
multiple arrays are sometimes treated as a single volume image.
General Image Formats: Image Array Indexing
Irrespective of the image format or encoding scheme, an image is always repre-
sented in one, or more, two dimensional arrays, I(m,n). Each element of the
*Actually, MATLAB considers image data arrays to be three-dimensional, as described later in this
chapter.
variable, I, represents a single picture element, or pixel. (If the image is being
treated as a volume, then the element, which now represents an elemental vol-
ume, is termed a voxel.) The most convenient indexing protocol follows the
traditional matrix notation, with the horizontal pixel locations indexed left to
right by the second integer, n, and the vertical locations indexed top to bottom
by the first integer m (Figure 10.1). This indexing protocol is termed pixel coor-
dinates by MATLAB. A possible source of confusion with this protocol is that
the vertical axis positions increase from top to bottom and also that the second
integer references the horizontal axis, the opposite of conventional graphs.
MATLAB also offers another indexing protocol that accepts non-integer
indexes. In this protocol, termed spatial coordinates, the pixel is considered to
be a square patch, the center of which has an integer value. In the default coordi-
nate system, the center of the upper left-hand pixel still has a reference of (1,1),
but the upper left-hand corner of this pixel has coordinates of (0.5,0.5) (see
Figure 10.2). In this spatial coordinate system, the locations of image coordi-
nates are positions on a (discrete) plane and are described by general variables
x and y. There are two sources of potential confusion with this system. As with
the pixel coordinate system, the vertical axis increases downward. In addition,
the positions of the vertical and horizontal indexes (now better thought of as
coordinates) are switched: the horizontal index is first, followed by the vertical
coordinate, as with conventional x,y coordinate references. In the default spatial
coordinate system, integer coordinates correspond with their pixel coordinates,
remembering the position swap, so that I(5,4) in pixel coordinates references
the same pixel as I(4.0,5.0) in spatial coordinates. Most routines expect a
specific pixel coordinate system and produce outputs in that system. Examples
of spatial coordinates are found primarily in the spatial transformation routines
described in the next chapter.
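The relationship between the two indexing protocols can be illustrated with a small test array. The sketch below uses only standard matrix indexing; the rounding step that maps a spatial coordinate to its enclosing pixel is an illustration of the default convention described above, not a call to any toolbox routine.

```matlab
% Sketch: pixel coordinates versus default spatial coordinates.
I = reshape(1:48, 6, 8);         % Small test "image": 6 rows by 8 columns
m = 5;  n = 4;                   % Pixel coordinates: row 5, column 4
p_pixel = I(m,n);                % Standard matrix indexing
x = 4.0;  y = 5.0;               % Spatial coordinates of the same pixel center
% In the default spatial system, x runs along columns and y along rows,
% so the enclosing pixel is found by rounding and swapping the order.
p_spatial = I(round(y), round(x));
same_pixel = (p_pixel == p_spatial);   % True: both reference the same pixel
```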
It is possible to change the baseline reference in the spatial coordinate
FIGURE 10.1 Indexing format for MATLAB images using the pixel coordinate sys-
tem. This indexing protocol follows the standard matrix notation.
FIGURE 10.2 Indexing in the spatial coordinate system.
system as certain commands allow you to redefine the coordinates of the refer-
ence corner. This option is described in context with related commands.
Data Classes: Intensity Coding Schemes
There are four different data classes, or encoding schemes, used by MATLAB
for image representation. Moreover, each of these data classes can store the data
in a number of different formats. This variety reflects the variety in image types
(color, grayscale, and black and white), and the desire to represent images as
efficiently as possible in terms of memory storage. The efficient use of memory
storage is motivated by the fact that images often require a large number of
array locations: an image of 400 by 600 pixels will require 240,000 data points,
each of which will need one or more bytes depending on the data format.
The four different image classes or encoding schemes are: indexed images,
RGB images, intensity images, and binary images. The first two classes are used
to store color images. In indexed images, the pixel values are, themselves, in-
dexes to a table that maps the index value to a color value. While this is an
efficient way to store color images, the data sets do not lend themselves to
arithmetic operations (and, hence, most image processing operations) since the
results do not always produce meaningful images. Indexed images also need an
associated matrix variable that contains the colormap, and this map variable
needs to accompany the image variable in many operations. Colormaps are N
by 3 matrices that function as lookup tables. The indexed data variable points
to a particular row in the map and the three columns associated with that row
contain the intensity of the colors red, green, and blue. The values of the three
columns range between 0 and 1 where 0 is the absence of the related color and
1 is the strongest intensity of that color. MATLAB convention suggests that
indexed arrays use variable names beginning in x.. (or simply x) and the sug-
gested name for the colormap is map. While indexed variables are not very
useful in image processing operations, they provide a compact method of storing
color images, and can produce effective displays. They also provide a conve-
nient and flexible method for colorizing grayscale data to produce a pseudocolor
image.
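The lookup performed by an indexed image can be made explicit using ordinary matrix indexing. The following sketch builds a hypothetical four-row colormap and a small indexed array in double format, in which the index values start at 1, then looks up the color of every pixel. The map contents and variable names are invented for the example.

```matlab
% Sketch: how an indexed image and its colormap define pixel colors.
map = [0 0 0;                    % Row 1: black
       1 0 0;                    % Row 2: red
       0 1 0;                    % Row 3: green
       0 0 1];                   % Row 4: blue (a 4 by 3 colormap)
X = [1 2 2;                      % Indexed image: each value selects a map row
     3 4 1];
[nrows, ncols] = size(X);
% Look up the red, green, and blue intensities of every pixel
RGB = reshape(map(X(:),:), nrows, ncols, 3);
pixel_color = squeeze(RGB(2,2,:))';    % Color at row 2, column 2: [0 0 1]
```

Replacing map with a different 4 by 3 matrix changes every displayed color without altering X, which is what makes colormaps convenient for pseudocolor displays.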
The MATLAB Image Processing Toolbox provides a number of useful
prepackaged colormaps. These colormaps can be implemented with any number of
rows, but the default is 64 rows. Hence, if any of these standard colormaps are
used with the default value, the indexed data should be scaled to range between
0 and 64 to prevent saturation. An example of the application of a MATLAB
colormap is given in Example 10.3. An extension of that example demonstrates
methods for colorizing grayscale data using a colormap.
The other method for coding color images is the RGB coding scheme in
which three different, but associated arrays are used to indicate the intensity of
the three color components of the image: red, green, and blue. This coding
scheme produces what is known as a truecolor image. As with the encoding used
in indexed data, the larger the pixel value, the brighter the respective color. In
this coding scheme, each of the color components can be operated on separately.
Obviously, this color coding scheme will use more memory than indexed im-
ages, but this may be unavoidable if extensive processing is to be done on a
color image. By MATLAB convention the variable name RGB, or something
similar, is used for variables of this data class. Note that these variables are
actually three-dimensional arrays having dimensions N by M by 3. While we
have not used such three dimensional arrays thus far, they are fully supported
by MATLAB. These arrays are indexed as RGB(n,m,i) where i = 1,2,3. In fact,
all image variables are conceptualized in MATLAB as three-dimensional arrays,
except that for non-RGB images the third dimension is simply 1.
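A brief sketch of how such three-dimensional arrays are built and indexed is given below. The 2 by 2 component arrays are arbitrary values in double format, and the variable names are chosen only for the example.

```matlab
% Sketch: building and indexing a small RGB (truecolor) array.
R = [1 0; 0 1];                  % Red component (double format, 0 to 1)
G = [0 1; 0 1];                  % Green component
B = [0 0; 1 1];                  % Blue component
RGB = cat(3, R, G, B);           % Stack into a 2 by 2 by 3 array
sz = size(RGB);                  % Returns [2 2 3]
RGB(:,:,1) = 0.5*RGB(:,:,1);     % Operate on the red component alone
pixel = squeeze(RGB(1,1,:))';    % Color of the upper left pixel: [0.5 0 0]
I = [0.2 0.4; 0.6 0.8];          % An intensity image for comparison
sz3 = size(I, 3);                % Third dimension of a non-RGB image is 1
```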
Grayscale images are stored as intensity class images where the pixel
value represents the brightness or grayscale value of the image at that point.
MATLAB convention suggests variable names beginning with I for variables
in class intensity. If an image is only black or white (not intermediate grays),
then the binary coding scheme can be used where the representative array is a
logical array containing either 0’s or 1’s. MATLAB convention is to use BW for
variable names in the binary class. A common problem working with binary
images is the failure to define the array as logical which would cause the image
variable to be misinterpreted by the display routine. Binary class variables can
be specified as logical (set the logical flag associated with the array) using the
command BW = logical(A), assuming A consists of only zeros and ones. A
logical array can be converted to a standard array using the unary plus operator:
A = +BW. Since all binary images are of the form “logical,” it is possible to
check if a variable is logical using the routine: isa(I, 'logical'); which will
return a 1 if true and zero otherwise.
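The conversions to and from the logical form can be tried on a small array. This sketch uses only standard MATLAB operations; the array contents are arbitrary.

```matlab
% Sketch: setting, testing, and removing the logical attribute.
A = [0 1 1; 1 0 0];              % Standard (double) array of zeros and ones
BW = logical(A);                 % Convert to a logical (binary class) array
is_bw = isa(BW, 'logical');      % Returns 1: BW is logical
is_a  = isa(A, 'logical');       % Returns 0: A is a standard array
A2 = +BW;                        % Unary plus converts back to double
class_A2 = class(A2);            % Returns 'double'
```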
Data Formats
In an effort to further reduce image storage requirements, MATLAB provides
three different data formats for most of the classes mentioned above. The uint8
and uint16 data formats provide 1 or 2 bytes, respectively, for each array ele-
ment. Binary images do not support the uint16 format. The third data format,
the double format, is the same as used in standard MATLAB operations and,
hence, is the easiest to use. Image arrays that use the double format can be treated
as regular MATLAB matrix variables subject to all the power of MATLAB and
its many functions. The problem is that this format uses 8 bytes for each array
element (i.e., pixel) which can lead to very large data storage requirements.
In all three data formats, a zero corresponds to the lowest intensity value,
i.e., black. For the uint8 and uint16 formats, the brightest intensity value (i.e.,
white, or the brightest color) is taken as the largest possible number for that
coding scheme: for uint8, $2^8 - 1$, or 255; and for uint16, $2^{16} - 1$, or 65,535. For the
double format, the brightest value corresponds to 1.0.
The isa routine can also be used to test the format of an image. The
routine, isa(I,'type') will return a 1 if I is encoded in the format type, and
a zero otherwise. The variable type can be: uint8, uint16, or double. There
are a number of other assessments that can be made with the isa routine that
are described in the associated help file.
Multiple images can be grouped together as one variable by adding an-
other dimension to the variable array. Since image arrays are already considered
three-dimensional, the additional images are added to the fourth dimension.
Multi-image variables are termed multiframe variables and each two-dimen-
sional (or three-dimensional) image of a multiframe variable is termed a frame.
Multiframe variables can be generated within MATLAB by incrementing along
the fourth index as shown in Example 10.2, or by concatenating several images
together using the cat function:
IMF = cat(4, I1, I2, I3,...);
The first argument, 4, indicates that the images are to be concatenated along
the fourth dimension, and the other arguments are the variable names of the
images. All images in the list must be the same type and size.
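The structure of a multiframe variable can be examined with a short sketch. Three small constant arrays stand in for intensity images here; their size and values are arbitrary.

```matlab
% Sketch: assembling and indexing a multiframe variable.
I1 = zeros(4,6);                 % Three intensity "images" of the same size
I2 = 0.5*ones(4,6);
I3 = ones(4,6);
IMF = cat(4, I1, I2, I3);        % Concatenate along the fourth dimension
sz = size(IMF);                  % Returns [4 6 1 3]: third dimension is 1
nframes = size(IMF, 4);          % Number of frames (3)
frame2 = IMF(:,:,:,2);           % Extract the second frame
```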
Data Conversions
The variety of coding schemes and data formats complicates even the simplest
of operations, but is necessary for efficient memory use. Certain operations
require a given data format and/or class. For example, standard MATLAB oper-
ations require the data be in double format, and will not work correctly with
Indexed images. Many MATLAB image processing functions also expect a spe-
cific format and/or coding scheme, and generate an output usually, but not al-
ways, in the same format as the input. Since there are so many combinations of
coding and data type, there are a number of routines for converting between
different types. For converting format types, the most straightforward procedure
is to use the im2xxx routines given below:
I_uint8 = im2uint8(I); % Convert to uint8 format
I_uint16 = im2uint16(I); % Convert to uint16 format
I_double = im2double(I); % Convert to double format
These routines accept any data class as input; however if the class is
indexed, the input argument, I, must be followed by the term indexed. These
routines also handle the necessary rescaling except for indexed images. When
converting indexed images, variable range can be a concern: for example, to
convert an indexed variable to uint8, the variable range must be between 0 and
255.
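The rescaling performed by these routines can be seen in a small example. This is a sketch with made-up data, assuming the conversion routines are installed; the indexed values in x are hypothetical.

```matlab
I_d    = [0 0.25 0.5 1];            % Intensity values in double format
I_8    = im2uint8(I_d);             % Rescaled to the range 0 to 255
I_16   = im2uint16(I_d);            % Rescaled to the range 0 to 65535
I_back = im2double(I_8);            % Rescaled back to the range 0 to 1
x      = uint8([0 10 63]);          % Hypothetical indexed data
x_16   = im2uint16(x, 'indexed');   % Format changed, values not rescaled
```

The round trip through uint8 is not exact: I_back differs from I_d by up to half a quantization step.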
Converting between different image encoding schemes can sometimes be
done by scaling. To convert a grayscale image in uint8, or uint16 format to an
indexed image, select an appropriate grayscale colormap from MATLAB’s
established colormaps, then scale the image variable so the values lie within the
range of the colormap; i.e., the data range should lie between 0 and N, where N
is the depth of the colormap (MATLAB’s colormaps have a default depth of
64, but this can be modified). This approach is demonstrated in Example 10.3.
However, an easier solution is simply to use MATLAB’s gray2ind function
listed below. This function, as with all the conversion functions, will scale the
input data appropriately, and in the case of gray2ind will also supply an appro-
priate grayscale colormap (although alternate colormaps of the same depth can
be substituted). The routines that convert to indexed data are:
[x, map] = gray2ind(I, N); % Convert from grayscale to
% indexed
% Convert from truecolor to indexed
[x, map] = rgb2ind(RGB, N or map);
Both these routines accept data in any format, including logical, and produce
an output of type uint8 if the associated map length is less than or equal
to 256, or uint16 if greater than 256. N specifies the colormap depth and must be
less than 65,536. For gray2ind the colormap is gray with a depth of N, or the
default value of 64 if N is omitted. For RGB conversion using rgb2ind, a
colormap of N levels is generated to best match the RGB data. Alternatively, a
colormap can be provided as the second argument, in which case rgb2ind will
generate an output array, x, with values that best match the colors given in map.
The rgb2ind function has a number of options that affect the image conversion,
options that allow trade-offs between color accuracy and image resolution. (See
the associated help file).
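The following sketch applies gray2ind to a small double-format array and inspects the result. The data and the depth of 16 are arbitrary choices made for illustration.

```matlab
I = [0 0.5 1; 0.25 0.75 1];     % Small double-format intensity image
N = 16;                         % Colormap depth chosen for illustration
[x, map] = gray2ind(I, N);      % Indexed data plus a gray colormap
fmt = class(x);                 % Output format selected by gray2ind
map_size = size(map);           % One row of [R G B] per level
```

The indexed values run from 0 to N-1 because integer-format indexed data are zero-based.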
An alternative method for converting a grayscale image to indexed values
is the routine grayslice which converts using thresholding:
x = grayslice(I, N or V); % Convert grayscale to indexed using
% thresholding
where any input format is acceptable. This function slices the image into N
levels using an equal-step thresholding process. Each slice is then assigned a
specific level on whatever colormap is selected. This process allows some inter-
esting color representations of grayscale images, as described in Example 10.4.
If the second argument is a vector, V, then it contains the threshold levels (which
can now be unequal) and the number of slices corresponds to the length of this
vector. The output format is either uint8 or uint16 depending on the number
of slices, similar to the two conversion routines above.
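A minimal sketch of equal-step slicing, using a made-up array whose values fall in each of five slices:

```matlab
I = [0 0.1 0.3; 0.5 0.7 0.95];   % Small double-format intensity image
N_slice = 5;                     % Number of slices
x = grayslice(I, N_slice);       % Thresholds at 1/5, 2/5, 3/5, and 4/5
n_levels = numel(unique(x));     % Number of distinct slice values found
```

Each element of x is the zero-based index of the slice containing the corresponding pixel, so any colormap of depth N_slice can be used for display.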
Two conversion routines convert from indexed images to other encoding
schemes:
I = ind2gray(x, map); % Convert to grayscale intensity
% encoding
RGB = ind2rgb(x, map); % Convert to RGB (“truecolor”)
% encoding
Both functions accept any format. Function ind2gray produces
outputs in the same format as its input, while function ind2rgb produces outputs formatted as
double. Function ind2gray removes the hue and saturation information while
retaining the luminance, while function ind2rgb produces a truecolor RGB
variable.
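As a sketch of these two conversions, the lines below use a hypothetical 2-by-2 indexed image and a four-color map invented for the purpose:

```matlab
x   = uint8([0 1; 2 3]);                  % Indexed image (zero-based)
map = [0 0 0; 1 0 0; 0 1 0; 1 1 1];       % Black, red, green, white
RGB = ind2rgb(x, map);                    % 2-by-2-by-3 truecolor array
I   = ind2gray(x, map);                   % Luminance only, 2-by-2
fmt_rgb = class(RGB);                     % Format of the truecolor output
```

The pixel with index 1 takes its color from the second row of map, which is pure red here.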
To convert an image to binary coding use:
BW = im2bw(I, Level); % Convert to binary logical encoding
where Level specifies the threshold that will be used to determine if a pixel is
white (1) or black (0). The input image, I, can be either intensity, RGB, or
indexed,* and in any format (uint8, uint16, or double). While most functions
output binary images in uint8 format, im2bw outputs the image in logical format.
In this format, the image values are either 0 or 1, but each element is the same
size as the double format (8 bytes). This format can be used in standard MATLAB
operations, but does use a great deal of memory. The dither function can also
be used to generate binary images, as described in the associated help file.
*As with all conversion routines, and many other routines, when the input image is in indexed
format it must be followed by the colormap variable.
A final conversion routine does not really change the data class, but does
scale the data and can be very useful. This routine converts general class double
data to intensity data, scaled between 0 and 1:
I = mat2gray(A, [Amin Amax]); % Scale matrix to intensity
% encoding, double format.
where A is a matrix and the optional second term specifies the values of A to be
scaled to zero, or black (Amin), or 1, or white (Amax). Since a matrix is already
in double format, this routine provides only scaling. If the second argument is
missing, the matrix is scaled so that its highest value is 1 and its lowest value
is zero. Using the default scaling can be a problem if the image contains a few
irrelevant pixels having large values. This can occur after certain image process-
ing operations due to border (or edge) effects. In such cases, other scaling must
be imposed, usually determined empirically, to achieve a suitable range of im-
age intensities.
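The effect of a single irrelevant large value on the default scaling can be sketched as follows. The matrix and the limits [2 10] are made up for illustration.

```matlab
A = [2 4 6; 8 10 1000];            % One irrelevant large value
I_default = mat2gray(A);           % Scaled using the min and max of A
I_limited = mat2gray(A, [2 10]);   % Scaled using imposed limits
```

With the default scaling, all but the outlier are compressed to values near zero and would display as black. With imposed limits the useful values span the full range, and the outlier is clipped to 1.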
The various data classes, their conversion routines, and the data formats
they support are summarized in Table 10.1 below. The output format of the various
conversion routines is indicated by the superscript: (1) uint8 or uint16 depending
on the number of levels requested (N); (2) Double; (3) No format change
(output format equals input format); and (4) Logical (size double).
TABLE 10.1 Summary of Image Classes, Data Formats, and Conversion Routines
Class       Formats supported   Conversion routines
Indexed     All                 gray2ind(1), grayslice(1), rgb2ind(1)
Intensity   All                 ind2gray(2), mat2gray(2,3), rgb2gray(3)
RGB         All                 ind2rgb(2)
Binary      uint8, double       im2bw(4), dither(1)
Image Display
There are several options for displaying an image, but the most useful and easiest
to use is the imshow function. The basic calling format of this routine is:
imshow(I,arg)
where I is the image array and arg is an argument, usually optional, that de-
pends on the data format. For indexed data, the variable name must be followed
by the colormap, map. This holds for all display functions when indexed data
are involved. For intensity class image variables, arg can be a scalar, in which
case it specifies the number of levels to use in rendering the image, or, if arg
is a vector, [low high], arg specifies the values to be taken to readjust the
range limits of a specific data format.* If the empty matrix, [ ], is given as arg,
the minimum and maximum values in array I are taken as the low and high values;
if arg is omitted, the default range limits of the data format are used.
The imshow function has a number of other options
that make it quite powerful. These options can be found with the help command.
When I is an indexed variable, it should be followed by the map variable.
There are two functions designed to display multiframe variables. The
function montage(MFW) displays the various images in a grid-like pattern as
shown in Example 10.2. Alternatively, multiframe variables can be displayed as
a movie using the immovie and movie commands:
mov = immovie(MFW); % Generate movie variable
movie(mov); % Display movie
Unfortunately a movie cannot be displayed in a textbook, but one
is presented in one of the problems at the end of the chapter, and several amusing
examples are presented in the problems at the end of the next chapter. The
immovie function requires multiframe data to be in either Indexed or RGB
format. Again, if MFW is an indexed variable, it must be followed by a colormap
variable.
The basic features of the MATLAB Image Processing Toolbox are
illustrated in the examples below.
Example 10.1 Generate an image of a sinewave grating having a spatial
frequency of 1 cycle/inch. A sinewave grating is a pattern that is constant in
the vertical direction, but varies sinusoidally in the horizontal direction. It is
used as a visual stimulus in experiments dealing with visual perception. Assume
the figure will be 4 inches square; hence, the overall pattern should contain 4
cycles. Assume the image will be placed in a 400-by-400 pixel array (i.e., 100
pixels per inch) using a uint8 format.
*Recall the default minimum and maximum values for the three non-indexed classes were: [0, 255]
for uint8; [0, 65535] for uint16; and [0, 1] for double arrays.
Solution Sinewave gratings usually consist of sines in the horizontal direction
and constant intensity in the vertical direction. Since this will be a grayscale
image, we will use the intensity coding scheme. As most reproductions
have limited grayscale resolution, a uint8 data format will be used. However,
the sinewave will be generated in the double format, as this is MATLAB’s
standard format. To save memory, we first generate a 1-by-400
image line in double format, then convert it to uint8 format using the conversion
routine im2uint8. The uint8 image can then be extended vertically to 400 pixels.
% Example 10.1 and Figure 10.3
% Generate a sinewave grating 400 by 400 pixels
% The grating should vary horizontally with 4 cycles across
% the 4-inch image.
% Assume the horizontal and vertical dimensions are 4 inches
%
clear all; close all;
N = 400; % Vertical and horizontal size
Nu_cyc = 4; % Produce 4 cycle grating
x = (1:N)*Nu_cyc/N; % Spatial (time equivalent) vector
%
% Generate a single horizontal line of the image in a vector of
% 400 points
%
% Generate sin; scale between 0 & 1
I_sin(1,:) = .5 * sin(2*pi*x) + .5;
I_8 = im2uint8(I_sin); % Convert to a uint8 vector
%
for i = 1:N % Extend to N (400) vertical lines
I(i,:) = I_8;
end
%
imshow(I); % Display image
title('Sinewave Grating');
FIGURE 10.3 A sinewave grating generated by Example 10.1. Such images are
often used as stimuli in experiments on vision.
The output of this example is shown as Figure 10.3. As with all images
shown in this text, there is a loss in both detail (resolution) and grayscale variation
due to losses in reproduction. To get the best images, these figures, and all
figures in this section, can be reconstructed on screen using the code from the
examples provided on the CD.
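The loop that extends the image line vertically can also be replaced by a single call to repmat, a standard MATLAB function not used in the example. The following is a sketch of that alternative, not the listing from the text, and it omits the display commands:

```matlab
N = 400;                                % Vertical and horizontal size
Nu_cyc = 4;                             % Number of cycles across the image
x = (1:N)*Nu_cyc/N;                     % Spatial vector
I_8 = im2uint8(.5*sin(2*pi*x) + .5);    % One image line in uint8 format
I = repmat(I_8, N, 1);                  % Copy the line into N rows
```

Because the line is converted to uint8 before it is replicated, the full-size array is never held in double format.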
Example 10.2 Generate a multiframe variable consisting of a series of
sinewave gratings having different phases. Display these images as a montage.
Border the images with black for separation on the montage plot. Generate 12
frames, but reduce the image to 100 by 100 to save memory.
% Example 10.2 and Figure 10.4
% Generate a multiframe array consisting of sinewave gratings
% that vary in phase from 0 to 2 * pi across 12 images
%
% The gratings should be similar to those in Example 10.1 except
% with fewer pixels (100 by 100) to conserve memory.
%
clear all; close all;
N = 100; % Vertical and horizontal points
Nu_cyc = 2; % Produce 2 cycle grating
M = 12; % Produce 12 images
x = (1:N)*Nu_cyc/N; % Generate spatial vector
I = uint8(zeros(N,N+20,1,M)); % Initialize multiframe array
% (20 extra columns for borders)
%
for j = 1:M % Generate M (12) images
phase = 2*pi*(j-1)/M; % Shift phase through 360 (2*pi)
% degrees
% Generate sine; scale to be 0 & 1
I_sin = .5 * sin(2*pi*x + phase) + .5;
% Add black at left and right borders
I_sin = [zeros(1,10) I_sin(1,:) zeros(1,10)];
I_8 = im2uint8(I_sin); % Convert to a uint8 vector
%
for i = 1:N % Extend to N (100) vertical lines
if i < 10 || i > 90 % Insert black space at top and
% bottom
I(i,:,1,j) = 0;
else
I(i,:,1,j) = I_8;
end
end
end
montage(I); % Display image as montage
title('Sinewave Grating');
FIGURE 10.4 Montage of sinewave gratings created by Example 10.2.
The montage created by this example is shown in Figure 10.4.
The multiframe data set was constructed one frame at a time and the
frame was placed in I using the frame index, the fourth index of I.* Zeros are
inserted at the beginning and end of the sinewave and, in the image construction
loop, for the first and last 9 points. This is to provide a dark band between the
images. Finally the sinewave was phase shifted through 360 degrees over the
12 frames.
*Recall, the third index is reserved for referencing the color plane. For non-RGB variables, this
index will always be 1. For images in RGB format the third index would vary between 1 and 3.
Example 10.3 Construct a multiframe variable with 12 sinewave grating
images. Display these data as a movie. Since the immovie function requires the
multiframe image variable to be in either RGB or indexed format, convert the
intensity data to indexed format. This can be done by the gray2ind(I,N) function.
This function simply scales the data to be between 0 and N, where N is the
depth of the colormap. If N is unspecified, gray2ind defaults to 64 levels.
MATLAB colormaps can also be specified to be of any depth, but as with
gray2ind the default level is 64.
% Example 10.3
% Generate a movie of a multiframe array consisting of sinewave
% gratings that vary in phase from 0 to pi across 12 images
% Since function 'immovie' requires either RGB or indexed data
% formats scale the data for use as Indexed with 64 gray levels.
% Use a standard MATLAB grayscale ('gray');
%
% The gratings should be the same as in Example 10.2.
%
clear all;
close all;
% Assign parameters
N = 100; % Vertical and horizontal points
Nu_cyc = 2; % Produce 2 cycle grating
M = 12; % Produce 12 images
%
x = (1:N)*Nu_cyc/N; % Generate spatial vector
for j = 1:M % Generate M (12) images
% Generate sine; scale between 0 and 1
phase = pi*j/M; % Shift phase 180 (pi) over 12 images
I_sin(1,:) = .5 * sin(2*pi*x + phase) + .5;
for i = 1:N % Extend to 100 vertical lines to
Mf(i,:,1,j) = I_sin; % create 1 frame of the multiframe
% image
end
end
%
%
[Mf, map] = gray2ind(Mf); % Convert to indexed image
mov = immovie(Mf,map); % Make movie, use default colormap
movie(mov,10); % and show 10 times
To fully appreciate this example, the reader will need to run this program
under MATLAB. The 12 frames are created as in Example 10.2, except the
code that adds the border was removed and the data scaling was added. The second
argument in immovie, is the colormap matrix and this example uses the map
generated by gray2ind. This map has the default level of 64, the same as all
of the other MATLAB supplied colormaps. Other standard maps that are appro-
priate for grayscale images are ‘bone’ which has a slightly bluish tint, ‘pink’
which has a decidedly pinkish tint, and ‘copper’ which has a strong rust tint.
Of course any colormap can be used, often producing interesting pseudocolor
effects from grayscale data. For an interesting color alternative, try running
Example 10.3 using the prepackaged colormap jet as the second argument of
immovie. Finally, note that the size of the multiframe array, Mf, is
(100,100,1,12) or 1.2 × 10^5 × 2 bytes. The variable mov generated by immovie
is even larger!
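The storage required by a multiframe array of this size depends on its data format, and can be checked with the whos command. The sketch below builds three placeholder arrays (all zeros, with names invented for illustration) and reads the byte counts that whos reports:

```matlab
Mf_8  = zeros(100, 100, 1, 12, 'uint8');     % 1 byte per element
Mf_16 = zeros(100, 100, 1, 12, 'uint16');    % 2 bytes per element
Mf_d  = zeros(100, 100, 1, 12);              % double: 8 bytes per element
s_8 = whos('Mf_8'); s_16 = whos('Mf_16'); s_d = whos('Mf_d');
bytes = [s_8.bytes s_16.bytes s_d.bytes];    % Storage used by each array
```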
Image Storage and Retrieval
Images may be stored on disk using the imwrite command:
imwrite(I, filename.ext, arg1, arg2, ...);
where I is the array to be written into file filename. There are a large variety of
file formats for storing image data and MATLAB supports the most popular for-
mats. The file format is indicated by the filename’s extension, ext, which may be:
.bmp (Microsoft bitmap), .gif (graphic interchange format), .jpeg (Joint photographic
experts group), .pcx (Paintbrush), .png (portable network graphics), and
.tif (tagged image file format). The arguments are optional and may be used to
specify image compression or resolution, or other format dependent information.
The specifics can be found in the imwrite help file. The imwrite routine can be
used to store any of the data formats or data classes mentioned above; however, if
the data array, I, is an indexed array, then it must be followed by the colormap
variable, map. Most image formats actually store uint8 formatted data, but the necessary
conversions are done by imwrite.
The imread function is used to retrieve images from disk. It has the call-
ing structure:
[I map] = imread('filename.ext',fmt or frame);
where filename is the name of the image file and .ext is any of the extensions
listed above. The optional second argument, fmt, only needs to be specified if
the file format is not evident from the filename. The alternative optional argu-
ment frame is used to specify which frame of a multiframe image is to be read
in I. An example that reads multiframe data is found in Example 10.4. As most
file formats store images in uint8 format, I will often be in that format. File
formats .tif and .png support uint16 format, so imread may generate data
arrays in uint16 format for these file types. The output class depends on the
manner in which the data is stored in the file. If the file contains grayscale
image data, then the output is encoded as an intensity image; if truecolor, then
as RGB. For both these cases the variable map will be empty, which can be
checked with the isempty(map) command (see Example 10.4). If the file contains
indexed data, then both outputs, I and map, will contain data.
The type of data format used by a file can also be obtained by querying a
graphics file using the function imfinfo.
information = imfinfo('filename.ext')
where information will contain text providing the essential information about
the file including the ColorType, FileSize, and BitDepth. Alternatively, the image
data and map can be loaded using imread and the image data format determined
from the MATLAB whos command. The whos command will also give
the structure of the data variable (uint8, uint16, or double).
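The storage and retrieval routines can be exercised together in a short round trip. The following is a sketch under stated assumptions: the test image is synthetic, and the file is written to a temporary name generated by the standard MATLAB function tempname, neither of which comes from the text.

```matlab
I = uint8(repmat(0:255, 64, 1));     % Synthetic 64-by-256 grayscale ramp
fname = [tempname '.png'];           % Temporary file name (illustrative)
imwrite(I, fname);                   % File format taken from extension
info = imfinfo(fname);               % Query the file without loading it
[J, map] = imread(fname);            % Read the image back
is_indexed = ~isempty(map);          % Empty map: intensity or RGB data
same = isequal(I, J);                % PNG storage is lossless
delete(fname);                       % Remove the temporary file
```

For a lossy format such as .jpeg, the retrieved array would generally not equal the original.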
Basic Arithmetic Operations
If the image data are stored in the double format, then all MATLAB standard
mathematical and operational procedures can be applied directly to the image
variables. However, the double format requires 4 times as much memory as the
uint16 format and 8 times as much memory as the uint8 format. To reduce the
reliance on the double format, MATLAB has supplied functions to carry out
some basic mathematics on uint8- and uint16-format arrays. These routines will
work on either format; they actually carry out the operations in double precision
on an element by element basis then convert back to the input format. This
reduces roundoff and overflow errors. The basic arithmetic commands are:
I_diff = imabsdiff(I, J); % Subtracts J from I on a pixel
% by pixel basis and returns
% the absolute difference
I_comp = imcomplement(I) % Complements image I
I_add = imadd(I, J); % Adds image I and J (images and/
% or constants) to form image
% I_add
I_sub = imsubtract(I, J); % Subtracts J from image I
I_divide = imdivide(I, J) % Divides image I by J
I_multiply = immultiply(I, J) % Multiply image I by J
For the last four routines, J can be either another image variable, or a
constant. Several arithmetical operations can be combined using the imlincomb
function. The function essentially calculates a weighted sum of images. For
example, to add 0.5 of image I1 to 0.3 of image I2, to 0.75 of image I3, use:
% Linear combination of images
I_combined = imlincomb(.5, I1, .3, I2, .75, I3);
The arithmetic operations of multiplication and addition by constants are
easy methods for increasing the contrast or brightness of an image. Some of
these arithmetic operations are illustrated in Example 10.4.
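The behavior of these routines at the limits of the uint8 range can be seen with a few hand-picked values. This is a minimal sketch using two three-element arrays in place of real images:

```matlab
I = uint8([50 100 200]);             % Two small uint8 "images"
J = uint8([100 100 100]);
I_add  = imadd(I, J);                % Sums above 255 saturate at 255
I_sub  = imsubtract(I, J);           % Negative results saturate at 0
I_diff = imabsdiff(I, J);            % Absolute difference
I_comp = imcomplement(I);            % 255 minus each pixel for uint8
I_mix  = imlincomb(.5, I, .5, J);    % Weighted sum returned as uint8
```

Results that fall outside the uint8 range are limited to 0 or 255 rather than wrapping around.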
Example 10.4 This example uses a number of the functions described
previously. The program first loads a set of MRI (magnetic resonance imaging)
images of the brain from the MATLAB Image Processing Toolbox’s set of stock
images. This image is actually a multiframe image consisting of 27 frames as
can be determined from the command imfinfo. One of these frames is selected
by the operator and this image is then manipulated in several ways: the
contrast is increased; it is inverted; it is sliced into 5 levels (N_slice); it is
modified horizontally and vertically by a Hamming window function, and it is
thresholded and converted to a binary image.
% Example 10.4 and Figures 10.5 and 10.6
% Demonstration of various image functions.
% Load all frames of the MRI image in mri.tif from the MATLAB
% Image Processing Toolbox (in subdirectory imdemos).
% Select one frame based on a user input.
% Process that frame by: contrast enhancement of the image,
% inverting the image, slicing the image, windowing, and
% thresholding the image
% Display original and all modifications on the same figure
%
clear all; close all;
N_slice = 5; % Number of slices for
% sliced image
Level = .75; % Threshold for binary
% image
%
% Initialize an array to hold 27 frames of mri.tif
% Since this image is stored in tif format, it could be in either
% uint8 or uint16.
% In fact, the specific input format will not matter, since it
% will be converted to double format in this program.
mri = uint8(zeros(128,128,1,27)); % Initialize the image
% array for 27 frames
for frame = 1:27 % Read all frames into
% variable mri
[mri(:,:,:,frame), map ] = imread('mri.tif', frame);
end
montage(mri, map); % Display images as a
% montage
% Include map in case
% Indexed
%
frame_select = input('Select frame for processing: ');
I = mri(:,:,:,frame_select); % Select frame for
% processing
%
% Now check to see if image is Indexed (in fact 'whos' shows it
% is).
if isempty(map) == 0 % Check to see if
% indexed data
I = ind2gray(I,map); % If so, convert to
% intensity image
end
I1 = im2double(I); % Convert to double
% format
%
I_bright = immultiply(I1,1.2); % Increase the contrast
I_invert = imcomplement(I1); % Complement image
x_slice = grayslice(I1,N_slice); % Slice image in 5 equal
% levels
%
[r c] = size(I1); % Multiply
for i = 1:r % horizontally by a
% Hamming window
I_window(i,:) = I1(i,:) .* hamming(c)';
end
for i = 1:c % Multiply vertically
% by same window
I_window(:,i) = I_window(:,i) .* hamming(r);
end
I_window = mat2gray(I_window); % Scale windowed image
BW = im2bw(I1,Level); % Convert to binary
%
figure;
subplot(3,2,1); % Display all images in
% a single plot
imshow(I1); title('Original');
subplot(3,2,2);
imshow(I_bright), title('Brightened');
subplot(3,2,3);
imshow(I_invert); title('Inverted');
subplot(3,2,4);
I_slice = ind2rgb(x_slice, jet(N_slice)); % Convert to RGB (see
% text)
imshow(I_slice); title('Sliced'); % Display color slices
subplot(3,2,5);
imshow(I_window); title('Windowed');
subplot(3,2,6);
imshow(BW); title('Thresholded');
FIGURE 10.5 Montage display of 27 frames of magnetic resonance images of
the brain plotted in Example 10.4. These multiframe images were obtained from
MATLAB’s mri.tif file in the images section of the Image Processing Toolbox.
Used with permission from MATLAB, Inc. Copyright 1993–2003, The Math
Works, Inc. Reprinted with permission.
FIGURE 10.6 Figure showing various signal processing operations on frame 17
of the MRI images shown in Figure 10.5. Original from the MATLAB Image Processing
Toolbox. Copyright 1993–2003, The Math Works, Inc. Reprinted with permission.
Since the image file might be indexed (in fact it is), the imread function
includes map as an output. If the image is not indexed, then map will be empty.
Note that imread reads only one frame at a time, the frame specified as the
second argument of imread. To read in all 27 frames, it is necessary to use a
loop. All frames are then displayed in one figure (Figure 10.5) using the mon-
tage function. The user is asked to select one frame for further processing.
Since montage can display any input class and format, it is not necessary to
determine these data characteristics at this time.
After a particular frame is selected, the program checks if the map variable
is empty (function isempty). If it is not (as is the case for these data), then the
image data is converted to grayscale using function ind2gray, which produces
an intensity image. In either case, the image variable is then converted to
double format using im2double. The program then performs the various
signal processing operations. Brightening is done by multiplying the image by
a constant greater than 1.0, in this case 1.2 (Figure 10.6). Inversion is done using
imcomplement, and the image is sliced into N_slice (5) levels using grayslice.
Since grayslice produces an indexed image, a colormap is needed to display it.
Rather than a grayscale map, the jet colormap
is substituted to produce a color image, with the color being used to enhance
certain features of the image.* The Hamming window is applied to the image in
both the horizontal and vertical directions (Figure 10.6). Since the image, I1, is in
double format, the multiplication can be carried out directly on the image array;
however, the resultant array, I_window, has to be rescaled using mat2gray to
ensure it has the correct range for imshow. Recall that if called without the
second argument, mat2gray scales the array to take up the full intensity range (i.e., 0
to 1). To place all the images in the same figure, subplot is used just as with
other graphs (Figure 10.6). One potential problem with this approach is that
Indexed data may plot incorrectly due to limited display memory allocated to
the map variables. (This problem actually occurred in this example when the
sliced array was displayed as an Indexed variable.) The easiest solution to this
potential problem is to convert the image to RGB before calling imshow as was
done in this example.
*More accurately, the image should be termed a pseudocolor image since the original data was
grayscale. Unfortunately the image printed in this text is in grayscale; however the example can be
rerun by the reader to obtain the actual color image.
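Windowing the rows and then the columns, as in Example 10.4, is equivalent to multiplying the image by the outer product of two one-dimensional windows. The sketch below shows this with a random array standing in for the image. So that it runs with core MATLAB alone, the window is written out from the standard Hamming formula rather than obtained from the hamming routine used in the example.

```matlab
I1 = rand(8, 6);                              % Stand-in for a double image
[r, c] = size(I1);
% Hamming window written out explicitly (symmetric form)
w_r = 0.54 - 0.46*cos(2*pi*(0:r-1)'/(r-1));   % Column vector, length r
w_c = 0.54 - 0.46*cos(2*pi*(0:c-1)'/(c-1));   % Column vector, length c
W = w_r * w_c';                               % r-by-c two-dimensional window
I_window = I1 .* W;                           % Apply the window in one step
```

The two-dimensional window W is largest at the center of the image and tapers toward all four edges.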
Many images that are grayscale can benefit from some form of color cod-
ing. With the RGB format, it is easy to highlight specific features of a grayscale
image by placing them in a specific color plane. The next example illustrates
the use of color planes to enhance features of a grayscale image.
Example 10.5 In this example, brightness levels of a grayscale image
that are 50% or less are coded into shades of blue, and those above are coded
into shades of red. The grayscale image is first put in double format so that the
maximum range is 0 to 1. Then each pixel is tested to be greater than 0.5. Pixel
values less than 0.5 are placed into the blue image plane of an RGB image (i.e.,
the third plane). These pixel values are multiplied by two so they take up the
full range of the blue plane. Pixel values above 0.5 are placed in the red plane
(plane 1) after scaling to take up the full range of the red plane. This image is
displayed in the usual way. While it is not reproduced in color here, a homework
problem based on these same concepts will demonstrate pseudocolor.
% Example 10.5 and Figure 10.7 Example of the use of pseudocolor
% Load frame 17 of the MRI image (mri.tif)
% from the Image Processing Toolbox in subdirectory 'imdemos'.
% Display a pseudocolor image in which all values less than 50%
% maximum are in shades of blue and values above are in shades
% of red.
%
clear all; close all;
frame = 17;
[I(:,:,1,1), map ] = imread('mri.tif', frame);
% Now check to see if image is Indexed (in fact 'whos' shows it is).
if isempty(map) == 0 % Check to see if Indexed data
I = ind2gray(I,map); % If so, convert to Intensity image
end
I = im2double(I); % Convert to double
[M N] = size(I);
RGB = zeros(M,N,3); % Initialize RGB array
for i = 1:M
for j = 1:N % Fill RGB planes
if I(i,j) > .5
RGB(i,j,1) = (I(i,j)-.5)*2;
else
RGB(i,j,3) = I(i,j)*2;
end
end
end
%
subplot(1,2,1); % Display images in a single plot
imshow(I); title('Original');
subplot(1,2,2);
imshow(RGB); title('Pseudocolor');
FIGURE 10.7 Frame 17 of the MRI image given in Figure 10.5 plotted directly and
in pseudocolor using the code in Example 10.5. (Original image from MATLAB).
Copyright 1993–2003, The Math Works, Inc. Reprinted with permission.
The pseudocolor image produced by this code is shown in Figure 10.7.
Again, it will be necessary to run the example to obtain the actual color image.
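The pixel-by-pixel loops of Example 10.5 can also be written with logical indexing. The following sketch is an alternative formulation, not the listing from the text; it uses a small random array in place of the MRI frame, and the names hi, R, and B are illustrative.

```matlab
I = rand(4, 5);                     % Stand-in for a double-format image
hi = I > .5;                        % Logical mask of the brighter pixels
R = zeros(size(I));                 % Red plane
B = zeros(size(I));                 % Blue plane
R(hi)  = (I(hi) - .5)*2;            % Brighter pixels go to the red plane
B(~hi) = I(~hi)*2;                  % Remaining pixels go to the blue plane
RGB = zeros([size(I) 3]);           % Green plane is left at zero
RGB(:,:,1) = R;
RGB(:,:,3) = B;
```

Each pixel is nonzero in at most one plane, so the image contains only shades of red and blue.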
ADVANCED PROTOCOLS: BLOCK PROCESSING
Many of the signal processing techniques presented in previous chapters oper-
ated on small, localized groups of data. For example, both FIR and adaptive
filters used data samples within the same general neighborhood. Many image
processing techniques also operate on neighboring data elements, except the
neighborhood now extends in two dimensions, both horizontally and vertically.
Given this extension into two dimensions, many operations in image processing
are quite similar to those in signal processing. In the next chapter, we examine
both two-dimensional filtering using two-dimensional convolution and the two-dimensional
Fourier transform. While many image processing operations are
conceptually the same as those used in signal processing, the implementation
is somewhat more involved due to the additional bookkeeping required to operate
on data in two dimensions. The MATLAB Image Processing Toolbox simplifies
much of the tedium of working in two dimensions by introducing functions
that facilitate two-dimensional block, or neighborhood operations. These
block processing operations fall into two categories: sliding neighborhood operations
and distinct block operations. In sliding neighborhood operations, the
block slides across the image as in convolution; however, the block must slide
in both horizontal and vertical directions. Indeed, two-dimensional convolution
described in the next chapter is an example of one very useful sliding neighbor-
hood operation. In distinct block operations, the image area is divided into a
number of fixed groups of pixels, although these groups may overlap. This is
analogous to the overlapping segments used in the Welch approach to the Fou-
rier transform described in Chapter 3. Both of these approaches to dealing with
blocks of localized data in two dimensions are supported by MATLAB routines.
Sliding Neighborhood Operations
The sliding neighborhood operation alters one pixel at a time based on some
operation performed on the surrounding pixels; specifically those pixels that lie
within the neighborhood defined by the block. The block is placed as symmetri-
cally as possible around the pixel being altered, termed the center pixel (Figure
10.8). The center pixel will only be in the center if the block is odd in both
FIGURE 10.8 A 3-by-2 pixel sliding neighborhood block. The block (gray area),
is shown in three different positions. Note that the block sometimes falls off the
picture and padding (usually zero padding) is required. In actual use, the block
slides, one element at a time, over the entire image. The dot indicates the center
pixel.
dimensions, otherwise the center pixel position favors the left and upper sides
of the block (Figure 10.8).* Just as in signal processing, there is a problem that
occurs at the edge of the image when a portion of the block will extend beyond
the image (Figure 10.8, upper left block). In this case, most MATLAB sliding
block functions automatically perform zero padding for these pixels. (An exception
is the imfilter routine described in the next chapter.)
The MATLAB routines conv2 and filter2 are both sliding neighborhood
operators that are directly analogous to the one dimensional convolution routine,
conv, and filter routine, filter. These functions will be discussed in the next
chapter on image filtering. Other two-dimensional functions that are directly anal-
ogous to their one-dimensional counterparts include: mean2, std2, corr2, and
fft2. Here we describe a general sliding neighborhood routine that can be used
to implement a wide variety of image processing operations. Since these opera-
tions can be—but are not necessarily—nonlinear, the function has the name
nlfilter, presumably standing for nonlinear filter. The calling structure is:
I1 = nlfilter(I, [M N], func, P1, P2, ...);
where I is the input image array, M and N are the dimensions of the neighborhood
block (vertical and horizontal), and func specifies the function that will
operate over the block. The optional parameters P1, P2, . . . , will be passed to
the function if it requires input parameters. The function should take an M by
N input and must produce a single, scalar output that will be used for the value
of the center pixel. The input can be of any class or data format supported by
the function, and the output image array, I1, will depend on the format provided
by the routine’s output.
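The mechanics that nlfilter automates can be sketched with explicit loops. The following is only an illustration, not the toolbox implementation: it assumes zero padding at the edges and places the center pixel at floor(([M N]+1)/2) within the block, and the test image and the function func are arbitrary choices made for the sketch.

```matlab
% Sliding neighborhood operation written out with explicit loops.
% Assumptions: zero padding at the edges; center pixel located at
% floor(([M N]+1)/2) within the M-by-N block.
I = magic(6);                 % small test "image" (arbitrary)
M = 3; N = 2;                 % block size: M rows (vertical), N columns (horizontal)
func = @(x) max(x(:));        % any function that returns a scalar

[rows, cols] = size(I);
c = floor(([M N] + 1)/2);     % center pixel position within the block
top = c(1) - 1;               % rows of padding needed above the image
left = c(2) - 1;              % columns of padding needed to the left

% Zero-padded copy large enough to hold the block at every pixel
Ipad = zeros(rows + M - 1, cols + N - 1);
Ipad(top+1:top+rows, left+1:left+cols) = I;

I1 = zeros(rows, cols);
for r = 1:rows
    for q = 1:cols
        block = Ipad(r:r+M-1, q:q+N-1);  % M-by-N neighborhood of pixel (r,q)
        I1(r,q) = func(block);           % scalar result becomes the output pixel
    end
end
```

With a 3 by 2 block the center falls in the left column of the block, so each output pixel depends on the pixel itself, its right-hand neighbor, and the rows immediately above and below.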
The function may be specified in one of three ways: as a string containing
the desired operation, as a function handle to an M-file, or as a function
established by the routine inline. The first approach is straightforward: simply
embed the function operation, which could be any appropriate MATLAB
statement(s), within single quotes. For example:
I1 = nlfilter(I, [3 3], 'mean2');
This command will slide a 3 by 3 moving average across the image pro-
ducing a lowpass filtered version of the original image (analogous to an FIR
filter of [1/3 1/3 1/3] ). Note that this could be more effectively implemented
using the filter routines described in the next chapter, but more complicated,
perhaps nonlinear, operations could be included within the quotes.
*In MATLAB notation, the center pixel of an M by N block is located at:
floor(([M N] + 1)/2).
The use of a function handle is shown in the code:
I1 = nlfilter(I, [3 3], @my_function);
where my_function is the name of an M-file function. The function handle
@my_function contains all the information required by MATLAB to execute
the function. Again, this file should produce a single, scalar output from an M
by N input, and it has the possibility of containing input arguments in addition
to the block matrix.
The inline routine has the ability to take string text and convert it into
a function for use in nlfilter as in this example string:
F = inline(['2*x(2,2) - sum(x(1:3,1))/3 - sum(x(1:3,3))/3 ' ...
    '- x(1,2) - x(3,2)']);
I1 = nlfilter(I, [3 3], F);
Function inline assumes that the input variable is x, but it also can find
other variables based on the context and it allows for additional arguments, P1,
P2, . . . (see associated help file). The particular function shown above would
take the difference between the center point and its 8 surrounding neighbors,
performing a differentiator-like operation. There are better ways to perform spa-
tial differentiation described in the next chapter, but this form will be demon-
strated as one of the operations in Example 10.6 below.
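Because the operation in F is a weighted sum of the pixels in the block, it can equally be written as a 3 by 3 weight matrix applied to the block. The sketch below checks this on an arbitrary test block; the anonymous function handle is used here as an assumed alternative to inline, and the matrix name W is introduced only for illustration.

```matlab
% The differencing function from the text, written as an anonymous function
F = @(x) 2*x(2,2) - sum(x(1:3,1))/3 - sum(x(1:3,3))/3 - x(1,2) - x(3,2);

% The same operation expressed as weights applied to the 3-by-3 block
W = [-1/3  -1  -1/3;
     -1/3   2  -1/3;
     -1/3  -1  -1/3];

x = magic(3);              % arbitrary 3-by-3 test block
y1 = F(x);                 % result from the function
y2 = sum(sum(W .* x));     % result from the weighted sum
```

Writing the operation as a weight matrix makes the link to the filter kernels of the next chapter explicit: the same weights could be applied to a whole image with a filtering routine.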
Example 10.6 Load the image of blood cells in blood1.tif in
MATLAB’s image files. Convert the image to class intensity and double format.
Perform the following sliding neighborhood operations: averaging over a 5 by
5 sliding block; differencing (spatial differentiation) using the function, F,
above; and vertical boundary detection using a 3 by 2 vertical differencer. This
differencer operator subtracts a vertical set of three left hand pixels from the
three adjacent right hand pixels. The result will be a brightening of vertical
boundaries that go from dark to light and a darkening of vertical boundaries
that go from light to dark. Display all the images in the same figure including
the original. Also include binary images of the vertical boundary image thresh-
olded at two different levels to emphasize the left and right boundaries.
% Example 10.6 and Figure 10.9
% Demonstration of sliding neighborhood operations
% Load image of blood cells, blood1.tif, from the Image Processing
% Toolbox in subdirectory imdemos.
% Use a sliding 3 by 3 element block to perform several sliding
% neighborhood operations including taking the average over the
% block, implementing the function 'F' in the example
% above, and implementing a function that enhances vertical
% boundaries.
% Display the original and all modifications on the same plot
%
clear all; close all;
[I, map] = imread('blood1.tif'); % Input image
% Since image is stored in tif format, it could be in either uint8
% or uint16 format (although the 'whos' command shows it is in
% uint8).
% The specific data format will not matter since the format will
% be converted to double either by 'ind2gray', if it is an
% indexed image, or by 'im2double' if it is not.
%
if ~isempty(map) % Check to see if indexed data
    I = ind2gray(I,map); % If so, convert to intensity image
end
I = im2double(I); % Convert to double and scale if not already
%
% Perform the various sliding neighborhood operations.
% Averaging
I_avg = nlfilter(I,[5 5], 'mean2');
%
% Differencing (the string is split across lines by concatenation)
F = inline(['x(2,2) - sum(x(1:3,1))/3 - sum(x(1:3,3))/3 ' ...
    '- x(1,2) - x(3,2)']);
I_diff = nlfilter(I, [3 3], F);
%
% Vertical boundary detection
F1 = inline('sum(x(1:3,2)) - sum(x(1:3,1))');
I_vertical = nlfilter(I,[3 2], F1);
%
% Rescale all arrays
I_avg = mat2gray(I_avg);
I_diff = mat2gray(I_diff);
I_vertical = mat2gray(I_vertical);
%
subplot(3,2,1); % Display all images in a single plot
imshow(I);
title('Original');
subplot(3,2,2);
imshow(I_avg);
title('Averaged');
subplot(3,2,3);
imshow(I_diff);
title('Differentiated');
subplot(3,2,4);
imshow(I_vertical);
title('Vertical boundaries');
subplot(3,2,5);
bw = im2bw(I_vertical,.6); % Threshold data, low threshold
imshow(bw);
title('Left boundaries');
subplot(3,2,6);
bw1 = im2bw(I_vertical,.8); % Threshold data, high threshold
imshow(bw1);
title('Right boundaries');
FIGURE 10.9 A variety of sliding neighborhood operations carried out on an image
of blood cells. (Original reprinted with permission from The Image Processing
Handbook, 2nd ed. Copyright CRC Press, Boca Raton, Florida.)
The code in Example 10.6 produces the images in Figure 10.9. These
operations are quite time consuming: Example 10.6 took about 4 minutes to run
on a 500 MHz PC. Techniques for increasing the speed of Sliding Operations
can be found in the help file for colfilt. The vertical boundaries produced by
the 3 by 2 sliding block are not very apparent in the intensity image, but become
quite evident in the thresholded binary images. The averaging has improved
contrast, but the resolution is reduced so that edges are no longer distinct.
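As an illustration of why the filter routines are preferred for linear operations such as averaging, the sketch below compares an explicit 5 by 5 sliding mean with a single call to conv2. It assumes zero padding at the edges and uses a random array in place of the blood cell image. Because the averaging weights are symmetric, convolution and a direct weighted sum give the same result here.

```matlab
I = rand(20, 30);               % stand-in for an intensity image
h = ones(5,5)/25;               % 5-by-5 averaging weights

% Fast version: one call to two-dimensional convolution
I_fast = conv2(I, h, 'same');   % zero padded, output same size as I

% Reference version: explicit sliding mean with zero padding
Ipad = zeros(size(I) + 4);      % two extra rows/columns on every side
Ipad(3:end-2, 3:end-2) = I;
I_slow = zeros(size(I));
for r = 1:size(I,1)
    for c = 1:size(I,2)
        blk = Ipad(r:r+4, c:c+4);      % 5-by-5 block centered on (r,c)
        I_slow(r,c) = mean(blk(:));
    end
end

max_diff = max(abs(I_fast(:) - I_slow(:)));   % should be at round-off level
```

The loop version calls the block function once per pixel, which is where most of the running time of a general sliding neighborhood routine goes.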
Distinct Block Operations
All of the sliding neighborhood options can also be implemented using configu-
rations of fixed blocks (Figure 10.10). Since these blocks do not slide, but are
FIGURE 10.10 A 7-by-3 pixel distinct block. As with the sliding neighborhood
block, these fixed blocks can fall off the picture and require padding (usually zero
padding). The dot indicates the center pixel although this point usually has little
significance in this approach.
fixed with respect to the image (although they may overlap), they will produce
very different results. The MATLAB function for implementing distinct block
operations is similar in format to the sliding neighborhood function:
I1 = blkproc(I, [M N], [Vo Ho], func);
where M and N specify the vertical and horizontal size of the block, Vo and Ho
are optional arguments that specify the vertical and horizontal overlap of the
block, func is the function that operates on the block, I is the input array, and
I1 is the output array. As with nlfilter the data format of the output will
depend on the output of the function. The function is specified in the same
manner as described for nlfilter; however the function output will be dif-
ferent.
Function outputs for sliding neighborhood operations had to be single sca-
lars that then became the value of the center pixel. In distinct block operations,
the block does not move, so the function output will normally produce values
for every pixel in the block. If the function produces a single output, then only the
center pixel of each block will contain a meaningful value. If the function is an
operation that normally produces a single value, the output of this routine can
be expanded by multiplying it by an array of ones that is the same size as the
block. This will place that single output in every pixel in the block:
I1 = blkproc(I, [4 5], 'std2(x) * ones(4,5)');
In this example the output of the MATLAB function std2 is placed into
a 4 by 5 array and this becomes the output of the function, an array the same
size as the block. It is also possible to use the inline function to describe the
function:
F = inline('std2(x) * ones(size(x))');
I1 = blkproc(I, [4 5], F);
Of course, it is possible that certain operations could produce a different
output for each pixel in the block. An example of block processing is given in
Example 10.7.
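The bookkeeping behind a distinct block operation can be sketched with explicit loops. This is only an illustration under stated assumptions: the blocks do not overlap, the image is zero padded to a whole number of blocks, the output is cropped back to the original size, and the block mean is used in place of std2 so that the sketch needs no toolbox functions.

```matlab
I = magic(6);                   % small test "image" (arbitrary)
M = 4; N = 5;                   % block size: M rows, N columns

[rows, cols] = size(I);
nr = ceil(rows/M);              % number of blocks vertically
nc = ceil(cols/N);              % number of blocks horizontally

% Zero pad so the image divides into whole blocks
Ipad = zeros(nr*M, nc*N);
Ipad(1:rows, 1:cols) = I;

Iout = zeros(size(Ipad));
for br = 1:nr
    for bc = 1:nc
        ri = (br-1)*M + (1:M);  % row indices of this block
        ci = (bc-1)*N + (1:N);  % column indices of this block
        blk = Ipad(ri, ci);
        % A single value is expanded to fill every pixel of the block
        Iout(ri, ci) = mean(blk(:)) * ones(M, N);
    end
end
I1 = Iout(1:rows, 1:cols);      % crop back to the original image size
```

Blocks that extend past the image include padded zeros in their average, which is why values near the right and lower edges come out darker.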
Example 10.7 Load the blood cell image used in Example 10.6 and
perform the following distinct block processing operations: 1) Display the aver-
age for a block size of 8 by 8; 2) For a 3 by 3 block, perform the differentiator
operation used in Example 10.6; and 3) Apply the vertical boundary detector
from Example 10.6 to a 3 by 3 block. Display all the images including the
original in a single figure.
% Example 10.7 and Figure 10.11
% Demonstration of distinct block operations
% Load image of blood cells used in Example 10.6
% Use a 8 by 8 distinct block to get averages for the entire block
% Apply the 3 by 3 differentiator from Example 10.6 as a distinct
% block operation.
% Apply a 3 by 3 vertical edge detector as a block operation
% Display the original and all modifications on the same plot
%
..... Image load, same as in Example 10.6.......
%
% Perform the various distinct block operations.
% Average of the image
I_avg = blkproc(I,[8 8], 'mean2(x) * ones(8,8)');
%
% Differentiator: place result in all blocks
F = inline(['(x(2,2) - sum(x(1:3,1))/3 - sum(x(1:3,3))/3 ' ...
    '- x(1,2) - x(3,2)) * ones(size(x))']);
I_diff = blkproc(I, [3 3], F);
%
% Vertical edge detector: place results in all blocks
F1 = inline(['(sum(x(1:3,2)) - sum(x(1:3,1))) ' ...
    '* ones(size(x))']);
I_vertical = blkproc(I, [3,2], F1);
.........Rescale and plotting as in Example 10.6.......
FIGURE 10.11 The blood cell image of Example 10.6 processed using three distinct
block operations: block averaging, block differentiation, and block vertical
edge detection. (Original image reprinted from The Image Processing Handbook,
2nd edition. Copyright CRC Press, Boca Raton, Florida.)
Figure 10.11 shows the images produced by Example 10.7. The “differentiator”
and edge detection operators look similar to those produced by the sliding
neighborhood operation because they operate on fairly small block sizes. The
averaging operator shows images that appear to have large pixels since the
neighborhood average is placed in blocks of 8 by 8 pixels.
The topics covered in this chapter provide a basic introduction to image
processing and basic MATLAB formats and operations. In subsequent chapters
we use this foundation to develop some useful image processing techniques
such as filtering, Fourier and other transformations, and registration (alignment)
of multiple images.
PROBLEMS
1. (A) Following the approach used in Example 10.1, generate an image that
is a sinusoidal grating in both horizontal and vertical directions (it will look
somewhat like a checkerboard). (Hint: This can be done with very few addi-
tional instructions.) (B) Combine this image with its inverse as a multiframe
image and show it as a movie. Use multiple repetitions. The movie should look
like a flickering checkerboard. Submit the two images.
2. Load the x-ray image of the spine (spine.tif) from the MATLAB Image
Processing Toolbox. Slice the image into 4 different levels then plot in pseudo-
color using yellow, red, green, and blue for each slice. The 0 level slice should
be blue and the highest level slice should be yellow. Use grayslice and construct
your own colormap. Plot original and sliced image in the same figure. (If
the “original” image also displays in pseudocolor, it is because the computer
display is using the same 3-level colormap for both images. In this case, you
should convert the sliced image to RGB before displaying.)
3. Load frame 20 from the MRI image (mri.tif) and code it in pseudocolor
by coding the image into green and the inverse of the image into blue. Then
take a threshold and plot pixels over 80% maximum as red.
4. Load the image of a cancer cell (from rat prostate, courtesy of Alan W.
Partin, M.D., Johns Hopkins University School of Medicine) cell.tif and
apply a correction to the intensity values of the image (a gamma correction
described in later chapters). Specifically, modify each pixel in the image by a
function that is a quarter wave sine wave. That is, the corrected pixels are the
output of the sine function of the input pixels: Out(m,n) = f(In(m,n)) (see plot
below).
FIGURE PROB. 10.4 Correction function to be used in Problem 4. The input pixel
values are on the horizontal axis, and the output pixels values are on the vertical
axis.
5. Load the blood cell image in blood1.tif. Write a sliding neighborhood
function to enhance horizontal boundaries that go from dark to light. Write a
second function that enhances boundaries that go from light to dark. Threshold
both images so as to enhance the boundaries. Use a 3 by 2 sliding block. (Hint:
This program may require several minutes to run. You do not need to rerun the
program each time to adjust the threshold for the two binary images.)
6. Load the blood cells in blood1.tif. Apply a distinct block function that
replaces all of the values within a block by the maximum value in that block.
Use a 4 by 4 block size. Repeat the operation using a function that replaces all
the values by the minimum value in the block.
11
Image Processing:
Filters, Transformations,
and Registration
SPECTRAL ANALYSIS: THE FOURIER TRANSFORM
The Fourier transform and the efficient algorithm for computing it, the fast
Fourier transform, extend in a straightforward manner to two (or more) dimen-
sions. The two-dimensional version of the Fourier transform can be applied to
images providing a spectral analysis of the image content. Of course, the result-
ing spectrum will be in two dimensions, and usually it is more difficult to inter-
pret than a one-dimensional spectrum. Nonetheless, it can be a very useful anal-
ysis tool, both for describing the contents of an image and as an aid in the
construction of imaging filters as described in the next section. When applied
to images, the spatial directions are equivalent to the time variable in the one-
dimensional Fourier transform, and this analogous spatial frequency is given in
terms of cycles/unit length (i.e., cycles/cm or cycles/inch) or normalized to cy-
cles per sample. Many of the concerns raised with sampled time data apply to
sampled spatial data. For example, undersampling an image will lead to aliasing.
In such cases, the spatial frequency content of the original image is greater than
fS/2, where fS now is 1/(pixel size). Figure 11.1 shows an example of aliasing in
the frequency domain. The upper left-hand image contains a chirp signal
increasing in spatial frequency from left to right. The high frequency elements
on the right side of this image are adequately sampled in the left-hand image.
The same pattern is shown in the upper right-hand image except that the sam-
pling frequency has been reduced by a factor of 6. The right side of this image
also contains sinusoidally varying intensities, but at additional frequencies as
FIGURE 11.1 The influence of aliasing due to undersampling on two images with
high spatial frequency. The aliased images show additional sinusoidal frequencies
in the upper right image and jagged diagonals in the lower right image. (Lower
original image from file ‘testpat1.png’ from the MATLAB Image Processing Toolbox.
Copyright 1993–2003, The MathWorks, Inc. Reprinted with permission.)
the aliasing folds other sinusoids on top of those in the original pattern. The
lower figures show the influence of aliasing on a diagonal pattern. The jagged
diagonals are characteristic of aliasing as are moire patterns seen in other im-
ages. The problem of determining an appropriate sampling size is even more
acute in image acquisition since oversampling can quickly lead to excessive
memory storage requirements.
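The folding of spatial frequencies can be checked numerically. The sketch below is an illustration under assumed values, not the code used for Figure 11.1: it builds an image whose rows are a sinusoid at 0.4 cycles/sample, keeps every second column, and locates the spectral peak before and after. On the coarser grid the pattern would need 0.8 cycles/sample, which exceeds fS/2 = 0.5 and therefore appears at 0.2 cycles/sample.

```matlab
n = 0:239;
row = 0.5 + 0.5*cos(2*pi*0.4*n);   % 0.4 cycles/sample, adequately sampled
I = repmat(row, 50, 1);            % image that varies only horizontally

I_sub = I(:, 1:2:end);             % undersample: keep every second column

% Magnitude spectra of one row of each image (mean removed)
S = abs(fft(I(1,:) - mean(I(1,:))));
S_sub = abs(fft(I_sub(1,:) - mean(I_sub(1,:))));

% Locate the spectral peak below fS/2 in each case
[~, k] = max(S(1:120));
[~, k_sub] = max(S_sub(1:60));
f_orig = (k - 1)/240;              % cycles/sample on the original grid
f_alias = (k_sub - 1)/120;         % cycles/sample on the undersampled grid
```

The undersampled image still looks like a clean sinusoidal grating, which is what makes aliasing hard to detect after the fact.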
The two-dimensional Fourier transform in continuous form is a direct ex-
tension of the equation given in Chapter 3:
$$F(\omega_1,\omega_2) = \int_{m=-\infty}^{\infty} \int_{n=-\infty}^{\infty} f(m,n)\, e^{-j\omega_1 m}\, e^{-j\omega_2 n}\, dm\, dn \tag{1}$$
The variables ω1 and ω2 are still frequency variables, although they define
spatial frequencies and their units are in radians per sample. As with the time
domain spectrum, F(ω1,ω2) is a complex-valued function that is periodic in both
ω1 and ω2. Usually only a single period of the spectral function is displayed, as
was the case with the time domain analog.
The inverse two-dimensional Fourier transform is defined as:
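$$f(m,n) = \frac{1}{4\pi^2} \int_{\omega_1=-\pi}^{\pi} \int_{\omega_2=-\pi}^{\pi} F(\omega_1,\omega_2)\, e^{j\omega_1 m}\, e^{j\omega_2 n}\, d\omega_1\, d\omega_2 \tag{2}$$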
As with the time domain equivalent, this statement is a reflection of the
fact that any two-dimensional function can be represented by a series (possibly
infinite) of sinusoids, but now the sinusoids extend over the two dimensions.
The discrete form of Eqs. (1) and (2) is again similar to their time domain
analogs. For an image size of M by N, the discrete Fourier transform becomes:
$$F(p,q) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} f(m,n)\, e^{-j(2\pi/M)pm}\, e^{-j(2\pi/N)qn} \tag{3}$$
$$p = 0, 1, \ldots, M-1; \quad q = 0, 1, \ldots, N-1$$
The values F(p,q) are the Fourier Transform coefficients of f(m,n). The
discrete form of the inverse Fourier Transform becomes:
$$f(m,n) = \frac{1}{MN} \sum_{p=0}^{M-1} \sum_{q=0}^{N-1} F(p,q)\, e^{j(2\pi/M)pm}\, e^{j(2\pi/N)qn} \tag{4}$$
$$m = 0, 1, \ldots, M-1; \quad n = 0, 1, \ldots, N-1$$
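The discrete transform of Eq. (3) can be evaluated directly with nested loops and compared with MATLAB's fft2 on a small array. This is a check of the definition only; the direct sum is far too slow for real images, and the test array is arbitrary.

```matlab
f = magic(4);
f = f(1:3, :);                   % 3-by-4 test array (M = 3, N = 4)
[M, N] = size(f);

% Direct evaluation of Eq. (3)
F_direct = zeros(M, N);
for p = 0:M-1
    for q = 0:N-1
        acc = 0;
        for m = 0:M-1
            for n = 0:N-1
                acc = acc + f(m+1,n+1) * exp(-1j*(2*pi/M)*p*m) ...
                                       * exp(-1j*(2*pi/N)*q*n);
            end
        end
        F_direct(p+1,q+1) = acc;   % MATLAB indices start at 1, not 0
    end
end

err_fwd = max(max(abs(F_direct - fft2(f))));      % compare with fft2
err_inv = max(max(abs(ifft2(fft2(f)) - f)));      % inverse recovers f
```

Both errors should be at round-off level, confirming that fft2 and ifft2 use the same conventions as the discrete transform pair.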
MATLAB Implementation
Both the Fourier transform and inverse Fourier transform are supported in two
(or more) dimensions by MATLAB functions. The two-dimensional Fourier
transform is invoked as:
F = fft2(x,M,N);
where F is the output matrix and x is the input matrix. M and N are optional
arguments that specify padding for the vertical and horizontal dimensions, re-
spectively. In the time domain, the frequency spectrum of simple waveforms
can usually be anticipated and the spectra of even relatively complicated wave-
forms can be readily understood. With two dimensions, it becomes more diffi-
cult to visualize the expected Fourier transform even of fairly simple images. In
Example 11.1 a simple thin rectangular bar is constructed, and the Fourier trans-
form of the object is constructed. The resultant spatial frequency function is
plotted both as a three-dimensional function and as an intensity image.
Example 11.1 Determine and display the two-dimensional Fourier trans-
form of a thin rectangular object. The object should be 2 by 10 pixels in size
and solid white against a black background. Display the Fourier transform as
both a function (i.e., as a mesh plot) and as an image plot.
% Example 11.1 Two-dimensional Fourier transform of a simple
% object.
% Construct a simple 2 by 10 pixel rectangular object, or bar.
% Take the Fourier transform padded to 128 by 128 and plot the
% result as a 3-dimensional function (using mesh) and as an
% intensity image.
%
close all; clear all;
% Construct the rectangular object
f = zeros(22,30); % Original figure can be small since it
f(10:11,10:19) = 1; % will be padded; bar is 2 by 10 pixels
%
F = fft2(f,128,128); % Take FT; pad to 128 by 128
F = abs(fftshift(F)); % Shift center; get magnitude
%
imshow(f,'notruesize'); % Plot object (newer releases use
                        % 'InitialMagnification','fit')
.....labels..........
figure;
mesh(F); % Plot Fourier transform as function
.......labels..........
figure;
F = log(F); % Take log function
I = mat2gray(F); % Scale as intensity image
imshow(I); % Plot Fourier transform as image
FIGURE 11.2A The rectangular object (2 pixels by 10 pixels) used in Example
11.1. The Fourier transform of this image is shown in Figure 11.2B and C.
FIGURE 11.2B Fourier transform of the rectangular object in Figure 11.2A plotted
as a function. More energy is seen, particularly at the higher frequencies, along
the vertical axis because the object’s vertical cross sections appear as a narrow
pulse. The broader horizontal cross sections produce frequency characteristics
that fall off rapidly at higher frequencies.
Note that in the above program the image size was kept small (22 by 30)
since the image will be padded (with zeros, i.e., black) by ‘fft2.’ The fft2
routine places the DC component in the upper-left corner. The fftshift routine
is used to shift this component to the center of the image for plotting purposes.
The log of the function was taken before plotting as an image to improve the
grayscale quality in the figure.
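The placement of the DC component before and after fftshift can be confirmed with a few lines. The sketch assumes the 2 by 10 pixel bar described in Example 11.1 and a 128 by 128 padded transform; the variable names are chosen for illustration.

```matlab
f = zeros(22, 30);
f(10:11, 10:19) = 1;            % 2-by-10 pixel bar
F = fft2(f, 128, 128);          % zero padded to 128 by 128

dc = F(1,1);                    % DC term sits in the upper-left corner
                                % and equals the sum of all pixel values

Fs = fftshift(F);               % move DC to the center for display
[~, idx] = max(abs(Fs(:)));     % the largest magnitude is the DC term
[r, c] = ind2sub(size(Fs), idx);
```

For an even transform length of 128, the shifted DC term lands at row 65, column 65, one sample past the geometric middle of the array.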
FIGURE 11.2C The Fourier transform of the rectangular object in Figure 11.2A
plotted as an image. The log of the function was taken before plotting to improve
the details. As in the function plot, more high frequency energy is seen in the
vertical direction as indicated by the dark vertical band.
The horizontal chirp signal plotted in Figure 11.1 also produces an easily
interpretable Fourier transform as shown in Figure 11.3. The fact that this image
changes in only one direction, the horizontal direction, is reflected in the Fourier
transform. The linear increase in spatial frequency in the horizontal direction
produces an approximately constant spectral curve in that direction.
The two-dimensional Fourier transform is also useful in the construction
and evaluation of linear filters as described in the following section.
LINEAR FILTERING
The techniques of linear filtering described in Chapter 4 can be directly ex-
tended to two dimensions and applied to images. In image processing, FIR fil-
ters are usually used because of their linear phase characteristics. Filtering an
image is a local, or neighborhood, operation just as it was in signal filtering,
although in this case the neighborhood extends in two directions around a given
pixel. In image filtering, the value of a filtered pixel is determined from a linear
combination of surrounding pixels. For the FIR filters described in Chapter 4,
FIGURE 11.3 Fourier transform of the horizontal chirp signal shown in Figure
11.1. The spatial frequency characteristics of this image are zero in the vertical
direction since the image is constant in this direction. The linear increase in spa-
tial frequency in the horizontal direction is reflected in the more or less constant
amplitude of the Fourier transform in this direction.
the linear combination for a given FIR filter was specified by the impulse re-
sponse function, the filter coefficients, b(n). In image filtering, the filter function
exists in two dimensions, h(m,n). These two-dimensional filter weights are ap-
plied to the image using convolution in an approach analogous to one-dimen-
sional filtering.
The equation for two-dimensional convolution is a straightforward exten-
sion of the one-dimensional form (Eq. (15), Chapter 2):
$$y(m,n) = \sum_{k_1=-\infty}^{\infty} \sum_{k_2=-\infty}^{\infty} x(k_1,k_2)\, b(m-k_1,\, n-k_2) \tag{5}$$
While this equation would not be difficult to implement using MATLAB
statements, MATLAB has a function that implements two-dimensional convolu-
tion directly.
Using convolution to perform image filtering parallels its use in signal
processing: the image array is convolved with a set of filter coefficients. However,
in image analysis, the filter coefficients are defined in two dimensions, h(m,n).
A classic example of a digital image filter is the Sobel filter, a set of coefficients
that perform a horizontal spatial derivative operation for enhancement of hori-
zontal edges (or vertical edges if the coefficients are rotated using transposition):
$$h(m,n)_{\text{Sobel}} = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix}$$
These two-dimensional filter coefficients are sometimes referred to as the
convolution kernel. An example of the application of a Sobel filter to an image
is provided in Example 11.2.
When convolution is used to apply a series of weights to either image or
signal data, the weights represent a two-dimensional impulse response, and, as
with a one-dimensional impulse response, the weights are applied to the data in
reverse order as indicated by the negative sign in the one- and two-dimensional
convolution equations (Eq. (15) from Chapter 2 and Eq. (5)).* This can become
a source of confusion in two-dimensional applications. Image filtering is easier
to conceptualize if the weights are applied directly to the image data in the same
orientation. This is possible if digital filtering is implemented using correlation
rather than convolution. Image filtering using correlation is a sliding neighborhood
operation, where the value of the center pixel is just the weighted sum of
neighboring pixels with the weighting given by the filter coefficients. When
correlation is used, the set of weighting coefficients is termed the correlation
kernel to distinguish it from the standard filter coefficients. In fact, the opera-
tions of correlation and convolution both involve weighted sums of neighboring
pixels, and the only difference between correlation kernels and convolution ker-
nels is a 180-degree rotation of the coefficient matrix. MATLAB filter routines
use correlation kernels because their application is easier to conceptualize.
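The 180-degree relationship between the two kernel types can be verified directly. In the sketch below, filter2 applies the weights by correlation and conv2 applies them by convolution; the test array is arbitrary and the Sobel coefficients are taken from the text.

```matlab
x = magic(5);                          % arbitrary test "image"
h = [1 2 1; 0 0 0; -1 -2 -1];          % Sobel coefficients

y_corr = filter2(h, x, 'same');        % correlation: weights applied as given
y_conv = conv2(x, rot90(h, 2), 'same');% convolution with the kernel rotated 180 degrees

diff_same = max(abs(y_corr(:) - y_conv(:)));   % the two should agree

% Without the rotation the results differ. For the Sobel kernel, which
% changes sign under a 180-degree rotation, they differ only in sign.
y_conv_plain = conv2(x, h, 'same');
diff_sign = max(abs(y_conv_plain(:) + y_corr(:)));
```

For kernels that are symmetric under a 180-degree rotation, such as an averaging or Gaussian kernel, correlation and convolution give identical outputs and the distinction can be ignored.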
MATLAB Implementation
Two-dimensional convolution is implemented using the routine ‘conv2’:
I2 = conv2(I1, h, shape)
where I1 and h are image and filter coefficients (or two images, or simply two
matrices) to be convolved and shape is an optional argument that controls the
size of the output image. If shape is ‘full’, the default, then the size of the
output matrix follows the same rules as in one-dimensional convolution: each
*In one dimension, this is equivalent to applying the weights in reverse order. In two dimensions,
this is equivalent to rotating the filter matrix by 180 degrees before multiplying corresponding pixels
and coefficients.
dimension of the output is the sum of the two matrix lengths along that dimension
minus one. Hence, if the two matrices have sizes I1(M1, N1) and h(M2,
N2), the output size is: I2(M1 + M2 − 1, N1 + N2 − 1). If shape is ‘valid’,
then any pixel evaluation that requires image padding is ignored and the size of
the output image is: I2(M1 − M2 + 1, N1 − N2 + 1). Finally, if shape is ‘same’
the size of the output matrix is the same size as I1; that is: I2(M1, N1). These
options allow a great deal of flexibility and can simplify the use of two-dimensional
convolution; for example, the ‘same’ option can eliminate the need for
dealing with the additional points generated by convolution.
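The three output sizes can be confirmed on a small example. The array sizes below are arbitrary choices for illustration (M1 = 10, N1 = 12, M2 = 3, N2 = 5).

```matlab
I1 = rand(10, 12);                          % image: M1 = 10, N1 = 12
h = ones(3, 5)/15;                          % coefficients: M2 = 3, N2 = 5

size_full = size(conv2(I1, h, 'full'));     % [M1+M2-1, N1+N2-1] = [12 16]
size_valid = size(conv2(I1, h, 'valid'));   % [M1-M2+1, N1-N2+1] = [8 8]
size_same = size(conv2(I1, h, 'same'));     % [M1, N1] = [10 12]
```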
Two-dimensional correlation is implemented with the routine ‘imfilter’
that provides even greater flexibility and convenience in dealing with size and
boundary effects. The calling structure of this routine is:
I2 = imfilter(I1, h, options);
where again I1 and h are the input matrices and options can include up to
three separate control options. One option controls the size of the output array
using the same terms as in ‘conv2’ above: ‘same’ and ‘full’ (‘valid’ is
not valid in this routine!). With ‘imfilter’ the default output size is ‘same’
(not ‘full’), since this is the more likely option in image analysis. The second
possible option controls how the edges are treated. If a constant is given, then
the edges are padded with the value of that constant. The default is to use a
constant of zero (i.e., standard zero padding). The boundary option ‘symmet-
ric’ uses a mirror reflection of the end points as shown in Figure 2.10. Simi-
larly the option ‘circular’ uses periodic extension also shown in Figure 2.10.
The last boundary control option is ‘replicate’, which pads using the nearest
edge pixel. When the image is large, the influence of the various border control
options is subtle, as shown in Example 11.4. A final option specifies the use of
convolution instead of correlation. If this option is activated by including the
argument ‘conv’, imfilter is redundant with ‘conv2’ except for the options
and defaults. The imfilter routine will accept all of the data format and types
defined in the previous chapter and produces an output in the same format;
however, filtering is not usually appropriate for indexed images. In the case of
RGB images, imfilter operates on all three image planes.
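The effect of the boundary options is easiest to see on a constant image, where any departure from the constant value is caused by the padding alone. The sketch below requires the Image Processing Toolbox; the 8 by 8 image of ones and the 3 by 3 averaging kernel are arbitrary choices for illustration.

```matlab
I1 = ones(8, 8);                          % constant test image
h = ones(3, 3)/9;                         % 3-by-3 averaging kernel

I_zero = imfilter(I1, h);                 % defaults: zero padding, 'same', correlation
I_rep = imfilter(I1, h, 'replicate');     % pad with the nearest edge pixel
I_sym = imfilter(I1, h, 'symmetric');     % pad with a mirror reflection
I_circ = imfilter(I1, h, 'circular');     % pad with a periodic extension
I_full = imfilter(I1, h, 'full', 'conv'); % full-size output, using convolution

corner_zero = I_zero(1,1);                % only 4 of the 9 weights see image data
corner_rep = I_rep(1,1);                  % padding preserves the constant value
```

With zero padding the corner pixel drops to 4/9 of the true value, while the other three boundary options leave the constant image unchanged.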
Filter Design
The MATLAB Image Processing Toolbox provides considerable support for
generating the filter coefficients.* A number of filters can be generated using
MATLAB’s fspecial routine:
*Since MATLAB’s preferred implementation of image filters is through correlation, not convolu-
tion, MATLAB’s filter design routines generate correlation kernels. We use the term “filter coeffi-
cient” for either kernel format.
h = fspecial(type, parameters);
where type specifies a specific filter and the optional parameters are related to
the filter selected. Filter type options include: ‘gaussian’, ‘disk’, ‘sobel’,
‘prewitt’, ‘laplacian’, ‘log’, ‘average’, and ‘unsharp’. The ‘gauss-
ian’ option produces a Gaussian lowpass filter. The equation for a Gaussian
filter is similar to the equation for the Gaussian distribution:
$$h(m,n) = e^{-(d/\sigma)^2/2} \quad \text{where } d = \sqrt{m^2 + n^2}$$
This filter has particularly desirable properties when applied to an image:
it provides an optimal compromise between smoothness and filter sharpness.
The MATLAB routine for this filter accepts two parameters: the first specifies
the filter size (the default is 3) and the second the value of sigma. The value of
sigma will influence the cutoff frequency while the size of the filter determines
the number of pixels over which the filter operates. In general, the size should
be 3–5 times the value of sigma.
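As a check on the Gaussian equation, the coefficients of a 3 by 3 filter with sigma = 0.5 can be computed directly in base MATLAB. This is a sketch under one assumption that the equation itself does not state: the weights are normalized so that they sum to 1, which is what a lowpass filter with unity gain at zero frequency requires. The variable names are illustrative.

```matlab
% Sketch: 3 by 3 Gaussian coefficients computed from the equation
sigma = 0.5;                            % Value of sigma
[n, m] = meshgrid(-1:1, -1:1);          % Pixel offsets from the kernel center
d = sqrt(m.^2 + n.^2);                  % Distance from the center
h_gaussian = exp(-(d/sigma).^2 / 2);    % Gaussian equation
% Assumed normalization: scale the weights so that they sum to 1
h_gaussian = h_gaussian / sum(h_gaussian(:));
disp(round(h_gaussian*1e4)/1e4);        % Show to four decimal places
```

With this normalization the center weight dominates because the filter size (3) is large relative to sigma (0.5), so the smoothing is mild.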
Both the ‘sobel’ and ‘prewitt’ options produce a 3 by 3 filter that
enhances horizontal edges (or vertical if transposed). The ‘unsharp’ filter pro-
duces a contrast enhancement filter. This filter is also termed unsharp masking
because it actually suppresses low spatial frequencies where the low frequencies
are presumed to be the unsharp frequencies. In fact, it is a special highpass
filter. This filter has a parameter that specifies the shape of the highpass charac-
teristic. The ‘average’ filter simply produces a constant set of weights each of
which equals 1/N, where N = the number of elements in the filter (the default
size of this filter is 3 by 3, in which case the weights are all 1/9 = 0.1111). The
filter coefficients for a 3 by 3 Gaussian lowpass filter (sigma = 0.5) and the
unsharp filter (alpha = 0.2) are shown below:
$$h_{unsharp} = \begin{bmatrix} -0.1667 & -0.6667 & -0.1667 \\ -0.6667 & 4.3333 & -0.6667 \\ -0.1667 & -0.6667 & -0.1667 \end{bmatrix}; \quad h_{gaussian} = \begin{bmatrix} 0.0113 & 0.0838 & 0.0113 \\ 0.0838 & 0.6193 & 0.0838 \\ 0.0113 & 0.0838 & 0.0113 \end{bmatrix}$$
The Laplacian filter is used to take the second derivative of an image:
$\partial^2/\partial x^2$. The ‘log’ filter is the Laplacian of Gaussian filter: it applies the
Laplacian to a Gaussian-smoothed image, so it also takes a second derivative, but
with less sensitivity to noise.
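The second-derivative behavior of the Laplacian, and its connection to the unsharp filter, can be illustrated in base MATLAB. The sketch below makes two assumptions that are not stated in the text: the 3 by 3 kernel [0 1 0; 1 -4 1; 0 1 0] is used as the basic discrete Laplacian, and the unsharp kernel is taken to be an identity kernel minus a Laplacian kernel whose shape depends on alpha. The second assumption is adopted only because it reproduces the alpha = 0.2 coefficients given in the text.

```matlab
% Laplacian kernel applied to a quadratic test surface
[x, y] = meshgrid(1:8, 1:8);            % Pixel coordinates
I = x.^2 + y.^2;                        % Second derivative is 2 in x and 2 in y
h_lap = [0 1 0; 1 -4 1; 0 1 0];         % Basic 3 by 3 Laplacian kernel
% 'valid' keeps only pixels where the kernel lies entirely inside the
% image, so padding does not influence the result
I_lap = conv2(I, h_lap, 'valid');
disp(I_lap(1,1));                       % Sum of the two second derivatives
%
% Assumed construction of the unsharp kernel: identity minus a
% Laplacian kernel whose shape is set by alpha
alpha = 0.2;
h_lap_a = (4/(alpha + 1)) * [alpha/4, (1-alpha)/4, alpha/4; ...
                             (1-alpha)/4, -1, (1-alpha)/4; ...
                             alpha/4, (1-alpha)/4, alpha/4];
h_unsharp = [0 0 0; 0 1 0; 0 0 0] - h_lap_a;
disp(round(h_unsharp*1e4)/1e4);         % Compare with the values in the text
```

Since the Laplacian weights sum to zero, the unsharp weights sum to 1: uniform regions pass unchanged while rapid changes are amplified, which is the highpass emphasis described above.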
MATLAB also provides a routine to transform one-dimensional FIR fil-
ters, such as those described in Chapter 4, into two-dimensional filters. This
approach is termed the frequency transform method and preserves most of the
characteristics of the one-dimensional filter including the transition band-
width and ripple features. The frequency transformation method is implemented
using:
h = ftrans2(b);
where h are the output filter coefficients (given in correlation kernel format),
and b are the filter coefficients of a one-dimensional filter. The latter could be
produced by any of the FIR routines described in Chapter 4 (e.g., fir1, fir2,
or remez). The function ftrans2 can take an optional second argument that
specifies the transformation matrix, the matrix that converts the one-dimensional
coefficients to two dimensions. The default transformation is the McClellan
transformation that produces a nearly circular pattern of filter coefficients. This
approach brings a great deal of power and flexibility to image filter design since
it couples all of the FIR filter design approaches described in Chapter 4 to image
filtering.
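A minimal sketch of the frequency transform method is given below. It assumes that both the Signal Processing Toolbox (for fir1) and the Image Processing Toolbox (for ftrans2) are available; the order and cutoff are illustrative values. Note that ftrans2 expects a symmetric one-dimensional filter with an odd number of coefficients, which an even filter order provides.

```matlab
% Sketch: extend a 1-D FIR lowpass filter to two dimensions
N = 32;                         % Filter order (even, so N+1 coefficients is odd)
wc = 0.125;                     % Cutoff frequency relative to fs/2
b = fir1(N, wc);                % 1-D lowpass filter coefficients
h = ftrans2(b);                 % 2-D coefficients (default transformation)
%
disp(size(h));                  % Size of the 2-D coefficient array
disp(sum(h(:)));                % Gain of the 2-D filter at zero frequency
```

The two-dimensional array has N+1 coefficients along each side, and its gain at zero frequency equals that of the one-dimensional filter, which is one way to confirm that the lowpass characteristic carried over.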
The two-dimensional Fourier transform described above can be used to
evaluate the frequency characteristics of a given filter. In addition, MATLAB
supplies a two-dimensional version of freqz, termed freqz2, that is slightly
more convenient to use since it also handles the plotting. The basic
call is:
[H fx fy] = freqz2(h, Ny, Nx);
where h contains the two-dimensional filter coefficients and Nx and Ny specify
the size of the desired frequency plot. The output argument, H, contains the two-
dimensional frequency spectra and fx and fy are plotting vectors; however, if
freqz2 is called with no output arguments then it generates the frequen-
cy plot directly. The examples presented below do not take advantage of this
function, but simply use the two-dimensional Fourier transform for filter evalua-
tion.
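The Fourier transform approach to filter evaluation can be sketched with the 3 by 3 averaging filter, using only base MATLAB. The number of frequency points (32) is an illustrative choice that matches the examples in this section.

```matlab
% Frequency characteristics of a 3 by 3 averaging filter using fft2
h = ones(3,3)/9;                        % Same weights as the 'average' filter
Nf = 32;                                % Frequency points along each axis
H = fftshift(abs(fft2(h, Nf, Nf)));     % Zero pad, transform, center zero freq.
%
dc = H(Nf/2 + 1, Nf/2 + 1);             % After fftshift, zero frequency is here
hf = H(1, 1);                           % Highest frequency in both directions
disp([dc hf]);
```

The magnitude falls from 1 at zero frequency to 1/9 at the highest spatial frequency, confirming the lowpass behavior. The array H can be passed to mesh to produce a surface plot such as those in Figure 11.4B.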
Example 11.2 This is an example of linear filtering using two of the
filters in fspecial. Load one frame of the MRI image set (mri.tif) and apply
the sharpening filter, hunsharp, described above. Apply a horizontal Sobel filter,
hSobel, to detect horizontal edges. Then apply the Sobel filter
to detect the vertical edges and combine the two edge detectors. Plot both the
horizontal and combined edge detectors.
Solution To generate the vertical Sobel edge detector, simply transpose
the horizontal Sobel filter. While the two Sobel images could be added together
using imadd, the program below first converts both images to binary then com-
bines them using a logical or. This produces a more dramatic black and white
image of the boundaries.
% Example 11.2 and Figure 11.4A and B
% Example of linear filtering using selected filters from the
% MATLAB 'fspecial' function.
% Load one frame of the MRI image and apply the 3 by 3 'unsharp'
% contrast enhancement filter shown in the text. Also apply two
% 3 by 3 Sobel edge detector filters to enhance horizontal and
% vertical edges.
% Combine the two edge detected images
%
clear all; close all;
%
frame = 17; % Load MRI frame 17
[I(:,:,:,1), map ] = imread('mri.tif', frame);
if isempty(map) == 0 % Usual check and
  I = ind2gray(I,map); % conversion if
  % necessary.
else
  I = im2double(I);
end
%
h_unsharp = fspecial('unsharp',.5); % Generate 'unsharp'
I_unsharp = imfilter(I,h_unsharp); % filter coef. and
% apply
%
h_s = fspecial('Sobel'); % Generate basic Sobel
% filter.
I_sobel_horin = imfilter(I,h_s); % Apply to enhance
I_sobel_vertical = imfilter(I,h_s'); % horizontal and
% vertical edges
%
% Combine by converting to binary and or-ing together
I_sobel_combined = im2bw(I_sobel_horin) | ...
  im2bw(I_sobel_vertical);
%
subplot(2,2,1); imshow(I); % Plot the images
title('Original');
subplot(2,2,2); imshow(I_unsharp);
title('Unsharp');
subplot(2,2,3); imshow(I_sobel_horin);
title('Horizontal Sobel');
subplot(2,2,4); imshow(I_sobel_combined);
title('Combined Image'); figure;
%
% Now plot the unsharp and Sobel filter frequency
% characteristics
F = fftshift(abs(fft2(h_unsharp,32,32)));
subplot(1,2,1); mesh(1:32,1:32,F);
title('Unsharp Filter'); view([-37,15]);
%
F = fftshift(abs(fft2(h_s,32,32)));
subplot(1,2,2); mesh(1:32,1:32,F);
title('Sobel Filter'); view([-37,15]);
FIGURE 11.4A MRI image of the brain before and after application of two filters
from MATLAB’s fspecial routine. Upper right: Image sharpening using the filter
unsharp. Lower images: Edge detection using the sobel filter for horizontal
edges (left) and for both horizontal and vertical edges (right). (Original image
from MATLAB. Image Processing Toolbox. Copyright 1993–2003, The Math
Works, Inc. Reprinted with permission.)
FIGURE 11.4B Frequency characteristics of the unsharp and Sobel filters used
in Example 11.2.
The images produced by this example program are shown below along
with the frequency characteristics associated with the ‘unsharp’ and ‘sobel’
filter. Note that the ‘unsharp’ filter has the general frequency characteristics
of a highpass filter, that is, a positive slope with increasing spatial frequencies
(Figure 11.4B). The double peaks of the Sobel filter that produce edge enhance-
ment are evident in Figure 11.4B. Since this is a magnitude plot, both peaks
appear as positive.
In Example 11.3, routine ftrans2 is used to construct two-dimensional
filters from one-dimensional FIR filters. Lowpass and highpass filters are con-
structed using the filter design routine fir1 from Chapter 4. This routine gener-
ates filter coefficients based on the window approach described in that chapter
(note that fir1 applies a Hamming window by default rather than a rectangular
window). Example 11.3 also illustrates the use of an alternate padding
technique to reduce the edge effects caused by zero padding. Specifically, the
‘replicate’ option of imfilter is used to pad by repetition of the last (i.e.,
image boundary) pixel value. This eliminates the dark border produced by zero
padding, but the effect is subtle.
Example 11.3 Example of the application of standard one-dimensional
FIR filters extended to two dimensions. The blood cell images (blood1.tif)
are loaded and filtered using a 32 by 32 lowpass and highpass filter. The one-
dimensional filter is based on the rectangular window filter (Eq. (10), Chapter
4), and is generated by fir1. It is then extended to two dimensions using
ftrans2.
% Example 11.3 and Figure 11.5A and B
% Linear filtering. Load the blood cell image
% Apply a 32nd order lowpass filter having a bandwidth of .125
% fs/2, and a highpass filter having the same order and band-
% width. Implement the lowpass filter using 'imfilter' with the
% zero padding (the default) and with replicated padding
% (extending the final pixels).
% Plot the filter characteristics of the high and low pass filters
%
% Load the image and transform if necessary
clear all; close all;
N = 32; % Filter order
w_lp = .125; % Lowpass cutoff frequency
w_hp = .125; % Highpass cutoff frequency
.......load image blood1.tif and convert as in Example 11.2 ......
%
b = fir1(N,w_lp); % Generate the lowpass filter
h_lp = ftrans2(b); % Convert to 2-dimensions
I_lowpass = imfilter(I,h_lp); % and apply with,
% and without replication
I_lowpass_rep = imfilter(I,h_lp,'replicate');
b = fir1(N,w_hp,'high'); % Repeat for highpass
h_hp = ftrans2(b);
I_highpass = imfilter(I, h_hp);
I_highpass = mat2gray(I_highpass);
%
........plot the images and filter characteristics as in Example 11.2.......
FIGURE 11.5A Image of blood cells before and after lowpass and highpass filter-
ing. The upper lowpass image (upper right) was filtered using zero padding,
which produces a slight black border around the image. Padding by extending
the edge pixel eliminates this problem (lower left). (Original Image reprinted with
permission from The Image Processing Handbook, 2nd edition. Copyright CRC
Press, Boca Raton, Florida.)
FIGURE 11.5B Frequency characteristics of the lowpass (left) and highpass (right)
filters used in Figure 11.5A.
The figures produced by this program are shown below (Figure 11.5A and
B). Note that there is little difference between the image filtered using zero
padding and the one that uses extended (‘replicate’) padding. The highpass
filtered image shows a slight derivative-like characteristic that enhances edges.
In the plots of frequency characteristics, Figure 11.5B, the lowpass and highpass
filters appear to be circular, symmetrical, and near opposites.
The problem of aliasing due to downsampling was discussed above and
demonstrated in Figure 11.1. Such problems could occur whenever an image is
displayed in a smaller size that will require fewer pixels, for example when the
size of an image is reduced during reshaping of a computer window. Lowpass
filtering can be, and is, used to prevent aliasing when an image is downsized.
In fact, MATLAB automatically performs lowpass filtering when downsizing
an image. Example 11.4 demonstrates the ability of lowpass filtering to reduce
aliasing when downsampling is performed.
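The effect can first be illustrated in one dimension with base MATLAB alone. The sketch below is not the filter used in Example 11.4: the test frequency (0.45 cycles/sample) and the crude moving-average lowpass filter are assumptions chosen only to keep the illustration short.

```matlab
% 1-D sketch of aliasing and its reduction by lowpass filtering
dwn = 6;                            % Downsampling factor
n = 0:599;                          % Sample index
x = cos(2*pi*0.45*n);               % 0.45 cycles/sample: well above the new
                                    % Nyquist limit of 1/(2*dwn) cycles/sample
b = ones(1,dwn)/dwn;                % Crude moving-average lowpass filter
x_lp = conv(x, b, 'valid');         % Filter before downsampling
%
x_dwn = x(1:dwn:end);               % Downsampled without filtering
x_lp_dwn = x_lp(1:dwn:end);         % Downsampled after filtering
%
% Peak amplitudes of the two downsampled sequences
disp([max(abs(x_dwn)) max(abs(x_lp_dwn))]);
```

Without filtering, the downsampled sequence retains its full amplitude but at a false, lower frequency. With filtering, the offending component is strongly attenuated before it can alias.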
Example 11.4 Use lowpass filtering to reduce aliasing due to downsam-
pling. Load the radial pattern (‘testpat1.png’) and downsample by a factor
of six as was done in Figure 11.1. In addition, downsample that image by the
same amount, but after it has been lowpass filtered. Plot the two downsampled
images side-by-side. Use a 32 by 32 FIR rectangular window lowpass filter. Set
the cutoff frequency to be as high as possible and still eliminate most of the
aliasing.
% Example 11.4 and Figure 11.6
% Example of the ability of lowpass filtering to reduce aliasing.
% Downsample the radial pattern with and without prior lowpass
% filtering.
% Use a cutoff frequency sufficient to reduce aliasing.
%
clear all; close all;
N = 32; % Filter order
w = .5; % Cutoff frequency (see text)
dwn = 6; % Downsampling coefficient
b = fir1(N,w); % Generate the lowpass filter
h = ftrans2(b); % Convert to 2-dimensions
%
[I, map] = imread('testpat1.png'); % Load image
I_lowpass = imfilter(I,h); % Lowpass filter image
[M,N] = size(I);
%
I = I(1:dwn:M,1:dwn:N); % Downsample unfiltered image
subplot(1,2,1); imshow(I); % and display
title('No Filtering');
% Downsample filtered image and display
I_lowpass = I_lowpass(1:dwn:M,1:dwn:N);
subplot(1,2,2); imshow(I_lowpass);
title('Lowpass Filtered');
FIGURE 11.6 Two images of the radial pattern shown in Figure 11.1 after down-
sampling by a factor of 6. The right-hand image was filtered by a lowpass filter
before downsampling.
The lowpass cutoff frequency used in Example 11.4 was determined empirically.
Although the cutoff frequency was fairly high (fs/4), this filter still
produced substantial reduction in aliasing in the downsampled image.
SPATIAL TRANSFORMATIONS
Several useful transformations take place entirely in the spatial domain. Such
transformations include image resizing, rotation, cropping, stretching, shearing,
and image projections. Spatial transformations perform a remapping of pixels
and often require some form of interpolation in addition to possible anti-aliasing.
The primary approach to anti-aliasing is lowpass filtering, as demonstrated
above. For interpolation, there are three methods popularly used in image pro-
cessing, and MATLAB supports all three. All three interpolation strategies use
the same basic approach: the interpolated pixel in the output image is the
weighted sum of pixels in the vicinity of the original pixel after transformation.
The methods differ primarily in how many neighbors are considered.
As mentioned above, spatial transforms involve a remapping of one set of
pixels (i.e., image) to another. In this regard, the original image can be
considered as the input to the remapping process and the transformed image is the
output of this process. If images were continuous, then remapping would not
require interpolation, but the discrete nature of pixels usually necessitates
interpolation.* The simplest interpolation method is the nearest neighbor method in
which the output pixel is assigned the value of the closest pixel in the
transformed image, Figure 11.7. If the transformed image is larger than the original
and involves more pixels, then a remapped input pixel may fall into two or
more output pixels. In the bilinear interpolation method, the output pixel is the
weighted average of transformed pixels in the nearest 2 by 2 neighborhood, and
in bicubic interpolation the weighted average is taken over a 4 by 4
neighborhood.
*A few transformations may not require interpolation such as rotation by 90 or 180 degrees.
FIGURE 11.7 A rotation transform using the nearest neighbor interpolation
method. Pixel values in the output image (solid grid) are assigned values from
the nearest pixel in the transformed input image (dashed grid).
Computational complexity and accuracy increase with the number of pix-
els that are considered in the interpolation, so there is a trade-off between quality
and computational time. In MATLAB, the functions that require interpolation
have an optional argument that specifies the method. For most functions, the
default method is nearest neighbor. This method produces acceptable results on
all image classes and is the only method appropriate for indexed images. The
method is also the most appropriate for binary images. For RGB and intensity
image classes, the bilinear or bicubic interpolation method is recommended
since they lead to better results.
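The three interpolation methods can be compared at a single non-integer location using the base MATLAB routine interp2. This is only a sketch: interp2 is a general-purpose interpolation routine, not one of the image transformation functions discussed here, and the test array and query point are arbitrary.

```matlab
% Nearest neighbor, bilinear, and bicubic estimates at one location
V = magic(4);                       % Arbitrary 4 by 4 array of "pixel" values
xq = 2.25; yq = 2.25;               % Query point (x = column, y = row)
%
v_nearest = interp2(V, xq, yq, 'nearest');  % Value of the closest pixel
v_linear = interp2(V, xq, yq, 'linear');    % Weighted 2 by 2 neighborhood
v_cubic = interp2(V, xq, yq, 'cubic');      % Weighted 4 by 4 neighborhood
disp([v_nearest v_linear v_cubic]);
```

The nearest neighbor result is always one of the original pixel values, which is why it is the only method suited to indexed images, where an averaged index would point to an unrelated color.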
MATLAB provides several routines that can be used to generate a variety
of complex spatial transformations such as image projections or specialized dis-
tortions. These transformations can be particularly useful when trying to overlay
(register) images of the same structure taken at different times or with different
modalities (e.g., PET scans and MRI images). While MATLAB’s spatial
transformation routines allow for any imaginable transformation, only two types
of transformation will be discussed here: affine transformations and projective
transformations. Affine transformations are defined as transformations in which
straight lines remain straight and parallel lines remain parallel, but rectangles
may become parallelograms. These transformations include rotation, scaling,
stretching, and shearing. In projective transformations, straight lines still remain
straight, but parallel lines often converge toward vanishing points. These
transformations are discussed in the following MATLAB implementation section.
MATLAB Implementation
Affine Transformations
MATLAB provides a procedure described below for implementing any affine
transformation; however, some of these transformations are so popular they are
supported by separate routines. These include image resizing, cropping, and
rotation. Image resizing and cropping are both techniques to change the dimen-
sions of an image: the latter is interactive using the mouse and display while
the former is under program control. To change the size of an image, MATLAB
provides the ‘imresize’ command given below.
I_resize = imresize(I, arg or [M N], method);
where I is the original image and I_resize is the resized image. If the second
argument is a scalar arg, then it gives a magnification factor, and if it is a vector,
[M N], it indicates the desired new dimensions in vertical and horizontal pixels,
M, N. If arg > 1, then the image is increased (magnified) in size and if
arg < 1, it is reduced in size (minified); in both cases the image proportions
are preserved. If the vector [M N] is used to specify the output size, image
proportions can be modified: the image can be stretched or compressed along a
given dimension. The argument method specifies the type of interpolation to be
used and can be either ‘nearest’, ‘bilinear’, or ‘bicubic’, referring to the
three interpolation methods described above. The nearest neighbor (nearest) is
the default. If image size is reduced, then imresize automatically applies an
anti-aliasing, lowpass filter unless the interpolation method is nearest; i.e., the
default. The logic of this is that the nearest neighbor interpolation method would
usually only be used with indexed images, and lowpass filtering is not really
appropriate for these images.
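The two ways of specifying the output size can be sketched as follows. The test image is a made-up random array, and the interpolation method is given explicitly so that the result does not depend on the default.

```matlab
% Sketch of scalar and vector size specification in imresize
I = rand(40, 60);                               % Made-up 40 by 60 test image
I_mag = imresize(I, 2, 'bilinear');             % Scalar: magnify by 2
I_min = imresize(I, 0.5, 'bilinear');           % Scalar: minify by 2
I_str = imresize(I, [40 75], 'bilinear');       % Vector: stretch columns by 25%
%
disp(size(I_mag)); disp(size(I_min)); disp(size(I_str));
```

Only the vector form changes the proportions of the image; both scalar calls preserve the 2:3 ratio of rows to columns.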
Image cropping is an interactive command:
I_resize = imcrop;
The imcrop routine waits for the operator to draw an on-screen cropping
rectangle using the mouse. The current image is resized to include only the
image within the rectangle.
Image rotation is straightforward using the imrotate command:
I_rotate = imrotate(I, deg, method, bbox);
where I is the input image, I_rotate is the rotated image, deg is the degrees
of rotation (counterclockwise if positive, and clockwise if negative), and method
describes the interpolation method as in imresize. Again, the nearest neighbor
method is the default even though the other methods are preferred except for
indexed images. After rotation, the image will not, in general, fit into the same
rectangular boundary as the original image. In this situation, the rotated image
can be cropped to fit within the original boundaries or the image size can be
increased to fit the rotated image. Specifying the bbox argument as ‘crop’ will
produce a cropped image having the dimensions of the original image, while
setting bbox to ‘loose’ will produce a larger image that contains the entire
rotated image. The loose option is the default. In either case, additional
pixels will be required to fit the rotated image into a rectangular space
(except for orthogonal rotations), and imrotate pads these with zeros, producing
a black background to the rotated image (see Figure 11.8).
Application of the imresize and imrotate routines is shown in Example
11.5 below. Application of imcrop is presented in one of the problems at the
end of this chapter.
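The difference between the two bbox options is easiest to see in the size of the output array. The following sketch uses a made-up uniform test image rather than one of the images in this chapter.

```matlab
% Sketch of the 'loose' and 'crop' options of imrotate
I = ones(100, 200);                             % Made-up uniform test image
I_loose = imrotate(I, -45, 'bilinear');         % Default bbox is 'loose'
I_crop = imrotate(I, -45, 'bilinear', 'crop');  % Keep the original dimensions
%
disp(size(I_loose)); disp(size(I_crop));
disp(I_loose(1,1));                             % Corner pixel added by padding
```

With the loose option the output grows in both dimensions to hold the rotated rectangle, and the corners of the enlarged array lie outside the rotated image, so they take the zero padding value.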
FIGURE 11.8 Two spatial transformations (horizontal stretching and rotation) ap-
plied to an image of bone marrow. The rotated images are cropped either to
include the full image (lower left), or to have the same dimensions as the original
image (lower right). Stained image courtesy of Alan W. Partin, M.D., Ph.D., Johns
Hopkins University School of Medicine.
Example 11.5 Demonstrate resizing and rotation spatial transformations.
Load the image of stained tissue (hestain.png) and transform it so that the
horizontal dimension is 25% longer than in the original, keeping the vertical di-
mension unchanged. Rotate the original image 45 degrees clockwise, with and
without cropping. Display the original and transformed images in a single figure.
% Example 11.5 and Figure 11.8
% Example of various Spatial Transformations
% Input the image of stained tissue (hestain.png) and perform
% two spatial transformations:
% 1) Stretch the object by 25% in the horizontal direction;
% 2) Rotate the image clockwise by 45 deg. with and without
% cropping.
% Display the original and transformed images.
%
.......read image and convert if necessary .......
%
% Rotate image with and without cropping
I_rotate = imrotate(I,-45, 'bilinear');
I_rotate_crop = imrotate(I, -45, 'bilinear', 'crop');
%
[M N] = size(I);
% Stretch by 25% horiz. (round so the new size is an integer)
I_stretch = imresize(I,[M round(N*1.25)], 'bilinear');
%
.......display the images .........
The images produced by this code are shown in Figure 11.8.
General Affine Transformations
In the MATLAB Image Processing Toolbox, both affine and projective spatial
transformations are defined by a Tform structure which is constructed using one
of two routines: the routine maketform uses parameters supplied by the user to
construct the transformation while cp2tform uses control points, or landmarks,
placed on different images to generate the transformation. Both routines are
very flexible and powerful, but that also means they are quite involved. This
section describes aspects of the maketform routine, while the cp2tform routine
will be presented in context with image registration.
Irrespective of the way in which the desired transformation is specified, it
is implemented using the routine imtransform. This routine is only slightly
less complicated than the transformation specification routines, and only some
of its features will be discussed here. (The associated help file should be con-
sulted for more detail.) The basic calling structure used to implement the spatial
transformation is:
B = imtransform(A, Tform, 'Param1', value1, 'Param2', value2,....);
where A and B are the input and output arrays, respectively, and Tform provides
the transformation specifications as generated by maketform or cp2tform. The
additional arguments are optional. The optional parameters are specified as pairs
of arguments: a string containing the name of the optional parameter (i.e.,
‘Param1’) followed by the value.* These parameters can (1) specify the pixels
to be used from the input image (the default is the entire image), (2) permit a
change in pixel size, (3) specify how to fill any extra background pixels gener-
ated by the transformation, and (4) specify the size and range of the output
array. Only the parameters that specify output range will be discussed here, as
they can be used to override the automatic rescaling of image size performed
by imtransform. To specify output image range and size, parameters ‘XData’
and ‘YData’ are followed by a two-variable vector that gives the x or y coordi-
nates of the first and last elements of the output array, B. To keep the size and
range in the output image the same as the input image, simply specify the hori-
zontal and vertical size of the input array, i.e.:
[M N] = size(A);
...
B = imtransform(A, Tform, 'XData', [1 N], 'YData', [1 M]);
As with the transform specification routines, imtransform uses the spatial
coordinate system described at the beginning of Chapter 10. In this
system, the first dimension is the x coordinate while the second is the y, the
reverse of the matrix subscripting convention used by MATLAB. (However the
y coordinate still increases in the downward direction.) In addition, non-integer
values for x and y indexes are allowed.
The routine maketform can be used to generate the spatial transformation
descriptor, Tform. There are two alternative approaches to specifying the trans-
formation, but the most straightforward uses simple geometrical objects to de-
fine the transformation. The calling structure under this approach is:
Tform = maketform('type', U, X);
where ‘type’ defines the type of transformation and U and X are vectors that
define the specific transformation by defining the input (U) and output (X)
geometries. While maketform supports a variety of transformation types, including
custom, user-defined types, only the affine and projective transformations will
be discussed here. These are specified by the type parameters ‘affine’ and
‘projective’.
*This is a common approach used in many MATLAB routines when a large number of arguments
are possible, especially when many of these arguments are optional. It allows the arguments to be
specified in any order.
Only three points are required to define an affine transformation, so, for
this transformation type, U and X define corresponding vertices of input and
output triangles. Specifically, U and X are 3 by 2 matrices where each 2-column
row defines a corresponding vertex that maps input to output geometry. For
example, to stretch an image vertically, define an output triangle that is taller
than the input triangle. Assuming an input image of size M by N, to increase
the vertical dimension by 50% define input (U) and output (X) triangles as:
U = [1, 1; 1, M; N, M]; X = [1, 1-.5*M; 1, M; N, M];
In this example, the input triangle, U, is simply the upper left, lower left,
and lower right corners of the image. The output triangle, X, has its top, left
vertex raised by 50% of the image height. (Recall the coordinate pairs are given
as x,y and y increases in the downward direction, so raising a vertex means
decreasing its y coordinate. Note that negative coordinates are acceptable.) To
increase the vertical dimension symmetrically, change X to:
X = [1, 1-.25*M; 1, 1.25*M; N, 1.25*M];
In this case, the upper vertex is raised by only 25%, and the two lower
vertexes are lowered in the y direction by increasing the y coordinate value by
25%. This transformation could be done with imresize, but this would also
change the dimension of the output image. When this transform is implemented
with imtransform, it is possible to control output size as described above.
Hence this approach, although more complicated, allows greater control of the
transformation. Of course, if output image size is kept the same, the contents of
the original image, when stretched, may exceed the boundaries of the image and
will be lost. An example of the use of this approach to change image proportions
is given in Problem 6.
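Three corresponding points are sufficient because an affine transformation has only six free parameters. The sketch below solves for them directly in base MATLAB using the symmetric stretch triangles given above. It is an illustration of the geometry, not a description of how maketform works internally; the image size and the row-vector convention [x y 1]*T are assumptions.

```matlab
% Sketch: affine matrix implied by corresponding triangles U and X
M = 128; N = 128;                           % Assumed image size (illustrative)
U = [1, 1; 1, M; N, M];                     % Input triangle, one x,y pair per row
X = [1, 1-.25*M; 1, 1.25*M; N, 1.25*M];     % Output: symmetric vertical stretch
%
% Solve [U 1]*T = [X 1] for the 3 by 3 matrix T
T = [U ones(3,1)] \ [X ones(3,1)];
disp(T);
%
% Map the center of the image through the transformation
c_in = [(N+1)/2, (M+1)/2, 1];
c_out = c_in * T;
disp(c_out);
```

The x coordinates are unchanged, the y coordinates are scaled, and the center of the image maps onto itself, which is what makes the stretch symmetric.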
The maketform routine can be used to implement other affine transforma-
tions such as shearing. For example, to shear an image to the left, define an
output triangle that is skewed by the desired amount with respect to the input
triangle, Figure 11.9. In Figure 11.9, the input triangle is specified as: U = [N/
2 1; 1 M; N M], (solid line) and the output triangle as X = [1 1; 1 M; N M] (solid
line). This shearing transform is implemented in Example 11.6.
FIGURE 11.9 An affine transformation can be defined by three points. The trans-
formation shown here is defined by an input (left) and output (right) triangle and
produces a sheared image. M,N are indicated in this figure as row, column, but
are actually specified in the algorithm in reverse order, as x,y. (Original image
from the MATLAB Image Processing Toolbox. Copyright 1993–2003, The Math
Works, Inc. Reprinted with permission.)
Projective Transformations
In projective transformations, straight lines remain straight but parallel lines
may converge. Projective transformations can be used to give objects
perspective. Projective transformations require four points for definition; hence, the
defining geometrical objects are quadrilaterals. Figure 11.10 shows a projective
transformation in which the original image would appear to be tilted back. In
this transformation, vertical lines in the original image would converge in the
transformed image. In addition to adding perspective, these transformations are
of value in correcting for relative tilts between image planes during image
registration. In fact, most of these spatial transformations will be revisited in the
section on image registration. Example 11.6 illustrates the use of these general
image transformations for affine and projective transformations.
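Four point pairs are needed because a projective transformation has eight free parameters, two more than the affine case. The sketch below solves for them in base MATLAB using quadrilaterals like those in the example that follows. It is only an illustration of why four points suffice: the image size, the variable names, and the column-vector convention are assumptions, and the toolbox routine maketform should be used in practice.

```matlab
% Sketch: projective matrix implied by corresponding quadrilaterals
M = 128; N = 128; offset = .25*N;           % Assumed image size and offset
U = [1 1; 1 M; N M; N 1];                   % Input quadrilateral (image corners)
X = [1+offset 1+offset; 1-offset M-offset; ...
     N+offset M-offset; N-offset 1+offset]; % Output quadrilateral
u = U(:,1); v = U(:,2); x = X(:,1); y = X(:,2);
%
% Each point pair gives two linear equations in the eight unknowns
A = [u v ones(4,1) zeros(4,3) -u.*x -v.*x; ...
     zeros(4,3) u v ones(4,1) -u.*y -v.*y];
p = A \ [x; y];
Hm = [p(1) p(2) p(3); p(4) p(5) p(6); p(7) p(8) 1];
%
% Check: map the input corners and divide by the third coordinate
Q = Hm * [u'; v'; ones(1,4)];
X_check = (Q(1:2,:) ./ Q([3 3],:))';
disp(X_check);
```

The two entries in the bottom row of the matrix, p(7) and p(8), are what distinguish this transformation from an affine one: when they are not both zero, the divisor varies across the image and parallel lines converge.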
Example 11.6 General spatial transformations. Apply the affine and pro-
jective spatial transformation to one frame of the MRI image in mri.tif. The
affine transformation should skew the top of the image to the left, just as shown
in Figure 11.9. The projective transformation should tilt the image back as
shown in Figure 11.10. This example will also use projective transformation to
tilt the image forward, or opposite to that shown in Figure 11.10.
After the image is loaded, the affine input triangle is defined as an
equilateral triangle inscribed within the full image. The output triangle is defined by
shifting the top point to the left side, so the output triangle is now a right triangle
(see Figure 11.9). In the projective transformation, the input quadrilateral is a
rectangle the same size as the input image. The output quadrilateral is generated
by moving the upper points inward and down by an equal amount while the lower
points are moved outward and up, also by a fixed amount. The second projective
transformation is achieved by reversing the operations performed on the corners.
FIGURE 11.10 Specification of a projective transformation by defining two quadri-
laterals. The solid lines define the input quadrilateral and the dashed line defines
the desired output quadrilateral.
% Example 11.6 General Spatial Transformations
% Load a frame of the MRI image (mri.tif)
% and perform three spatial transformations
% 1) An affine transformation that shears the image to the left
% 2) A projective transformation that tilts the image backward
% 3) A projective transformation that tilts the image forward
clear all; close all;
%
% .......load frame 18 .......
%
% Define affine transformation
U1 = [N/2 1; 1 M; N M]; % Input triangle
X1 = [1 1; 1 M; N M]; % Output triangle
% Generate transform
Tform1 = maketform('affine', U1, X1);
% Apply transform
I_affine = imtransform(I, Tform1,'Size', [M N]);
%
% Define projective transformation vectors
offset = .25*N;
U = [1 1; 1 M; N M; N 1]; % Input quadrilateral
X = [1-offset 1+offset; 1+offset M-offset; ...
  N-offset M-offset; N+offset 1+offset];
%
% Define transformation based on vectors U and X
Tform2 = maketform('projective', U, X);
I_proj1 = imtransform(I,Tform2,'XData',[1 N],'YData', ...
  [1 M]);
%
% Second transformation. Define new output quadrilateral
X = [1+offset 1+offset; 1-offset M-offset; ...
  N+offset M-offset; N-offset 1+offset];
% Generate transform
Tform3 = maketform('projective', U, X);
% Apply transform
I_proj2 = imtransform(I,Tform3,'XData',[1 N], ...
  'YData',[1 M]);
%
.......display images .......
The images produced by this code are shown in Figure 11.11.
Of course, a great many other transforms can be constructed by redefining
the output (or input) triangles or quadrilaterals. Some of these alternative trans-
formations are explored in the problems.
All of these transforms can be applied to produce a series of images hav-
ing slightly different projections. When these multiple images are shown as a
movie, they will give an object the appearance of moving through space, per-
haps in three dimensions. The last three problems at the end of this chapter
explore these features. The following example demonstrates the construction of
such a movie.
Example 11.7 Construct a series of projective transformations that,
when shown as a movie, give the appearance of the image tilting backward in
space. Use one of the frames of the MRI image.
Solution The code below uses the projective transformation to generate
a series of images that appear to tilt back because of the geometry used. The
approach is based on the second projective transformation in Example 11.6, but
adjusts the transformation to produce a slightly larger apparent tilt in each
frame. The program fills 24 frames in such a way that the first 12 have increas-
ing angles of tilt and the last 12 frames have decreasing tilt. When shown as a
movie, the image will appear to rock backward and forward. This same ap-
proach will also be used in Problem 7. Note that as the images are being gener-
ated by imtransform, they are converted to indexed images using gray2ind
since this is the format required by immovie. The grayscale map generated by
gray2ind is used (at the default level of 64), but any other map could be
substituted in immovie to produce a pseudocolor image.
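The frame ordering described above can be checked without any image data. The following is a minimal sketch that only computes the tilt offset assigned to each of the 24 frames; the image width N = 128 is an assumed value, and the variable offset is stored as a vector here purely for inspection.

```matlab
% Frame-ordering sketch for a rocking movie (no toolbox functions needed).
Nu_frame = 12;     % frames in each direction
Max_tilt = 0.5;    % maximum tilt as a fraction of image width
N = 128;           % assumed image width in pixels (hypothetical)
offset = zeros(1, 2*Nu_frame);
for i = 1:Nu_frame
    offset(i) = Max_tilt*N*i/Nu_frame;     % tilt grows over frames 1 to 12
    offset(2*Nu_frame+1-i) = offset(i);    % mirrored copy in frames 24 down to 13
end
disp(offset);
```

Because frame 2*Nu_frame+1-i reuses the image computed for frame i, only 12 transformations are needed to fill all 24 frames.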
FIGURE 11.11 Original MR image and three spatial transformations. Upper right:
An affine transformation that shears the image to the left. Lower left: A projective
transform in which the image is made to appear tilted forward. Lower right: A
projective transformation in which the image is made to appear tilted backward.
(Original image from the MATLAB Image Processing Toolbox, Copyright 1993–
2003, The Math Works, Inc. Reprinted with permission.)
% Example 11.7 Spatial Transformation movie
% Load a frame of the MRI image (mri.tif). Use the projective
% transformation to make a movie of the image as it tilts
% horizontally.
%
clear all; close all;
Nu_frame = 12; % Number of frames in each direction
Max_tilt = .5; % Maximum tilt achieved
% .......load MRI frame 12 as in previous examples .......
%
U = [1 1; 1 M; N M; N 1]; % Input quadrilateral
for i = 1:Nu_frame % Construct Nu_frame * 2 movie frames
% Define projective transformation. Vary offset up to Max_tilt
offset = Max_tilt*N*i/Nu_frame;
X = [1+offset 1+offset; 1-offset M-offset; N+offset ...
M-offset; N-offset 1+offset];
Tform2 = maketform('projective', U, X);
[I_proj(:,:,1,i), map] = gray2ind(imtransform(I,Tform2,...
'Xdata',[1 N],'Ydata',[1 M]));
% Make image tilt back and forth
I_proj(:,:,1,2*Nu_frame+1-i) = I_proj(:,:,1,i);
end
%
% Display first 12 images as a montage
montage(I_proj(:,:,:,1:12),map);
mov = immovie(I_proj,map); % Display as movie
movie(mov,5);
While it is not possible to show the movie that is produced by this code,
the various frames are shown as a montage in Figure 11.12. The last three
problems in the problem set explore the use of spatial transformations used in
combination to make movies.
IMAGE REGISTRATION
Image registration is the alignment of two or more images so they best superim-
pose. This task has become increasingly important in medical imaging as it is
used for merging images acquired using different modalities (for example, MRI
and PET). Registration is also useful for comparing images taken of the same
structure at different points in time. In functional magnetic resonance imaging
(fMRI), image alignment is needed for images taken sequentially in time as
well as between images that have different resolutions. To achieve the best
alignment, it may be necessary to transform the images using any or all of the
transformations described previously. Image registration can be quite challenging
even when the images are identical or very similar (as will be the case in the
examples and problems given here). Frequently the images to be aligned are not
that similar, perhaps because they have been acquired using different modalities.
The difficulty in accurately aligning images that are only moderately similar pres-
ents a significant challenge to image registration algorithms, so the task is often
aided by human intervention or the use of embedded markers for reference.
FIGURE 11.12 Montage display of the movie produced by the code in Example
11.7. The various projections give the appearance of the brain slice tilting and
moving back in space. Only half the 24 frames are shown here as the rest are
the same, just presented in reverse order to give the appearance of the brain
rocking back and forth. (Original image from the MATLAB Image Processing
Toolbox. Copyright 1993–2003, The Math Works, Inc. Reprinted with permission.)
Approaches to image registration can be divided into two broad catego-
ries: unassisted image registration where the algorithm generates the alignment
without human intervention, and interactive registration where a human operator
guides or aids the registration process. The former approach usually relies on
some optimization technique to maximize the correlation between the images.
In the latter approach, a human operator may aid the alignment process by
selecting corresponding reference points in the images to be aligned: corre-
sponding features are identified by the operator and tagged using some interac-
tive graphics procedure. This approach is well supported in MATLAB’s Image
Processing Toolbox. Both of these approaches are demonstrated in the examples
and problems.
Unaided Image Registration
Unaided image registration usually involves the application of an optimization
algorithm to maximize the correlation, or other measure of similarity, between
the images. In this strategy, the appropriate transformation is applied to one of
the images, the input image, and a comparison is made between this transformed
image and the reference image (also termed the base image). The optimization
routine seeks to vary the transformation in some manner until the comparison
is the best possible. The problem with this approach is the same as with all
optimization techniques: the optimization process may converge on a sub-optimal
solution (a so-called local maximum), not the optimal solution (the global
maximum). Often the solution achieved depends on the starting values of the
transformation variables. An example of convergence to a sub-optimal solution
and dependency on initial variables is found in Problem 8.
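The dependence on starting values can be illustrated with a one-variable toy problem. The sketch below is not taken from the registration examples: the function f is a hypothetical similarity measure with a small peak near x = 1 and a larger peak near x = 6, negated so that fminsearch can minimize it. The anonymous-function syntax assumes a reasonably recent MATLAB release.

```matlab
% Toy illustration of local versus global optima with fminsearch.
% f is a made-up (hypothetical) negated similarity measure with two peaks.
f = @(x) -(exp(-(x-1).^2) + 2*exp(-(x-6).^2));
[x_a, f_a] = fminsearch(f, 0);   % start near the smaller peak
[x_b, f_b] = fminsearch(f, 5);   % start near the larger peak
disp([x_a f_a; x_b f_b]);        % compare the two solutions
```

With these starting values the first search would be expected to settle on the smaller peak and the second on the larger one, so the same routine returns different answers for the same function.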
Example 11.8 below uses the optimization routine that is part of the basic
MATLAB package, fminsearch (formerly fmins). This routine is based on the
simplex (direct search) method, and will adjust any number of parameters to
minimize a function specified through a user routine. To maximize the
correspondence between the reference image and the input image, the negative of the
correlation between the two images is minimized. The routine fminsearch will
automatically adjust the transformation variables to achieve this minimum (re-
member that this may not be the absolute minimum).
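As a rough sketch of the quantity being minimized, the two-dimensional correlation coefficient can be written out in base MATLAB. The small matrices A and B below are stand-ins for the reference and trial images (assumed values, chosen only so the result is easy to verify); the Image Processing Toolbox routine corr2 used later in this section computes the same coefficient.

```matlab
% Sketch: negative 2-D correlation as an error measure (base MATLAB only).
A = magic(4);                 % stand-in for the reference image
B = 0.8*A + 3;                % stand-in trial image: same shape, lower contrast
a = A - mean(A(:));           % remove the mean from each image
b = B - mean(B(:));
r = sum(a(:).*b(:)) / sqrt(sum(a(:).^2) * sum(b(:).^2));
err = -abs(r);                % value an optimizer would try to minimize
disp([r err]);
```

Because the coefficient is normalized, a change in contrast or brightness alone leaves it at 1, which is why a reduction in contrast does not by itself prevent a correlation-based search from finding the alignment.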
To implement an optimization search, a routine is required that applies
the transformation variables supplied by fminsearch, performs an appropriate
trial transformation on the input image, then compares the trial image with the
reference image. Following convergence, the optimization routine returns the
values of the transformation variables that produce the best comparison. These
can then be applied to produce the final aligned image. Note that the program-
mer must specify the actual structure of the transformation since the optimiza-
tion routine works blindly and simply seeks a set of variables that produces a
minimum output. The transformation selected should be based on the possible
mechanisms for misalignment: translations, size changes, rotations, skewness,
projective misalignment, or other more complex distortions. For efficiency, the
transformation should be one that requires the fewest defining variables.
Reducing the number of variables increases the likelihood of optimal
convergence and substantially reduces computation time. To minimize the num-
ber of transformation variables, the simplest transformation that will compensate
for the possible mechanisms of distortions should be used.*
Example 11.8 This is an example of unaided image registration requir-
ing an affine transformation. The input image, the image to be aligned, is a
distorted version of the reference image. Specifically, it has been stretched hori-
zontally, compressed vertically, and tilted, all using a single affine transforma-
tion. The problem is to find a transformation that will realign this image with
the reference image.
Solution MATLAB’s optimization routine fminsearch will be used to
determine an optimized transformation that will make the two images as similar
as possible. MATLAB’s fminsearch routine calls the user routine rescale to
perform the transformation and make the comparison between the two images.
The rescale routine assumes that an affine transformation is required and that
only the horizontal, vertical, and tilt dimensions need to be adjusted. (It does
not, for example, take into account possible translations between the two images,
although this would not be too difficult to incorporate.) The fminsearch routine
requires as input arguments, the name of the routine whose output is to be mini-
mized (in this example, rescale), and the initial values of the transformation
variables (in this example, all 1’s). The routine uses the size of the initial value
vector to determine how many variables it needs to adjust (in this case, three
variables). Any additional input arguments following an optional vector specify-
ing operational features are passed to rescale immediately following the trans-
formation variables. The optimization routine will continue to call rescale au-
tomatically until it has found an acceptable minimum for the error (or until
some maximum number of iterations is reached, see the associated help file).
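The calling pattern can be tried on a problem small enough to check by hand. In the sketch below, cost is a hypothetical stand-in for a user routine such as rescale, and target plays the part of the extra data that must reach that routine. Here the extra argument is bound with an anonymous function, which is the form usually seen in more recent MATLAB code; it is offered as an alternative sketch, not as the form used in this example.

```matlab
% Sketch: passing extra data to the function minimized by fminsearch.
target = [2 -1 0.5];                    % hypothetical "true" parameter values
cost = @(p, t) sum((p - t).^2);         % stand-in for a user routine
p0 = [1 1 1];                           % three initial values: three variables
[p_opt, fval] = fminsearch(@(p) cost(p, target), p0);
disp(p_opt);                            % should be close to target
disp(fval);                             % should be close to zero
```

The length of p0 tells the routine how many variables to adjust, just as the three initial values do in the registration example.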
*The number of defining variables depends on the transformation. For example, rotation alone
requires only one variable, linear transformations require two variables, affine transformations require 3
variables, while projective transformations require 4 variables. Two additional variables are required
for translations.
FIGURE 11.13 Unaided image registration requiring several affine transforma-
tions. The left image is the original (reference) image and the distorted center
image is to be aligned with that image. After a transformation determined by opti-
mization, the right image is quite similar to the reference image. (Original image
from the same source as Figure 11.12.)
% Example 11.8 and Figure 11.13
% Image registration after spatial transformation
% Load a frame of the MRI image (mri.tif). Transform the original
% image by increasing it horizontally, decreasing it vertically,
% and tilting it to the right. Also decrease image contrast
% slightly
% Use MATLAB's basic optimization routine, 'fminsearch' to find
% the transformation that restores the original image shape.
%
clear all; close all;
H_scale = .25; % Define distorting parameters
V_scale = .2; % Horizontal, vertical, and tilt
tilt = .2; % in percent
% .......load mri.tif, frame 18.......
[M N]= size(I);
H_scale = H_scale * N/2; % Convert percent scale to pixels
V_scale = V_scale * M;
tilt = tilt * N;
%
% Construct distorted image.
U = [1 1; 1 M; N M]; % Input triangle
X = [1-H_scale+tilt 1+V_scale; 1-H_scale M; N+H_scale M];
Tform = maketform('affine', U, X);
I_transform = (imtransform(I,Tform,'Xdata',[1 N], ...
'Ydata', [1 M]))*.8;
%
% Now find transformation to realign image
initial_scale = [1 1 1]; % Set initial values
[scale,Fval] = fminsearch('rescale',initial_scale,[ ], ...
I, I_transform);
disp(Fval) % Display final value (negative correlation)
%
% Realign image using optimized transform
X = [1+scale(1)+scale(3) 1+scale(2); 1+scale(1) M; ...
N-scale(1) M];
Tform = maketform('affine', U, X);
I_aligned = imtransform(I_transform,Tform,'Xdata',[1 N], ...
'Ydata',[1 M]);
%
subplot(1,3,1); imshow(I); %Display the images
title('Original Image');
subplot(1,3,2); imshow(I_transform);
title('Transformed Image');
subplot(1,3,3); imshow(I_aligned);
title('Aligned Image');
The rescale routine is used by fminsearch. This routine takes in the
transformation variables supplied by fminsearch, performs a trial transforma-
tion, and compares the trial image with the reference image. The routine then
returns the error to be minimized calculated as the negative of the correlation
between the two images.
function err = rescale(scale, I, I_transform);
% Function used by 'fminsearch' to rescale an image
% horizontally, vertically, and with tilt.
% Performs transformation and computes correlation between
% original and newly transformed image.
% Inputs:
% scale Current scale factor (from 'fminsearch')
% I original image
% I_transform image to be realigned
% Outputs:
% Negative correlation between original and transformed image.
%
[M N]= size(I);
U = [1 1; 1 M; N M]; % Input triangle
%
% Perform trial transformation
X = [1+scale(1)+scale(3) 1+scale(2); 1+scale(1) M; ...
N-scale(1) M];
Tform = maketform('affine', U, X);
I_aligned = imtransform(I_transform,Tform,'Xdata', ...
[1 N], 'Ydata',[1 M]);
%
% Calculate negative correlation
err = -abs(corr2(I_aligned,I));
The results achieved by this registration routine are shown in Figure
11.13. The original reference image is shown on the left, and the input image
is in the center. As noted above, this image is the same as the reference except
that it has been distorted by several affine transformations (horizontal stretching,
vertical compression, and a tilt). The aligned image achieved by the optimization
is shown on the right. This image is very similar to the reference image.
This optimization was fairly robust: it converged to a correlation of 0.99 from
both positive and negative initial values. However, in many cases, convergence
can depend on the initial values as demonstrated in Problem 8. This program
took about 1 minute to run on a 1 GHz PC.
Interactive Image Registration
Several strategies may be used to guide the registration process. In the example
used here, registration will depend on reference marks provided by a human
operator. Interactive image registration is well supported by the MATLAB Im-
age Processing Toolbox and includes a graphically based program, cpselect,
that automates the process of establishing corresponding reference marks. Under
this procedure, the user interactively identifies a number of corresponding fea-
tures in the reference and input image, and a transform is constructed from these
pairs of reference points. The program must specify the type of transformation
to be performed (linear, affine, projective, etc.), and the minimum number of
reference pairs required will depend on the type of transformation. The number
of reference pairs required is the same as the number of variables needed to
define a transformation: an affine transformation will require a minimum of
three pairs of reference points, while a projective transformation requires four.
Linear transformations require only two pairs, while other more complex trans-
formations may require six or more point pairs. In most cases, the alignment is
improved if more than the minimal number of point pairs is given.
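One way to see why extra point pairs can help is to treat the transformation as a least-squares fit. The sketch below fits an affine matrix to six point pairs with the backslash operator in base MATLAB; it illustrates the idea and does not describe how cp2tform is implemented. The point coordinates and the matrix T_true are invented for the example, and the points are noise-free so that the answer can be checked exactly.

```matlab
% Sketch: least-squares affine fit from more than the minimum three pairs.
T_true = [1.2 0.1; -0.2 0.9; 5 -3];      % hypothetical affine map: [x y 1]*T
in_pts = [10 10; 10 100; 100 100; 100 10; 55 20; 30 70];   % six input points
n = size(in_pts, 1);
base_pts = [in_pts ones(n,1)] * T_true;  % matching points in the reference
% Overdetermined system: 6 equations per output coordinate, 3 unknowns each
T_fit = [in_pts ones(n,1)] \ base_pts;
disp(T_fit);                             % should reproduce T_true
```

With points selected by hand, each pair carries some positioning error; an overdetermined fit of this kind spreads that error across all pairs instead of passing it straight into the transformation.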
In Example 11.9, an alignment requiring a projective transformation is pre-
sented. This Example uses the routine cp2tform to produce a transformation in
Tform format, based on point pairs obtained interactively. The cp2tform routine
has a large number of options, but the basic calling structure is:
Tform = cp2tform(input_points, base_points, 'type');
where input_points is an m by 2 matrix consisting of x,y coordinates of the
reference points in the input image; base_points is a matrix containing the
same information for the reference image. This routine assumes that the points
are entered in the same order, i.e., that corresponding rows in the two matrices
describe corresponding points. The type variable is the same as in maketform
and specifies the type of transform ('affine', 'projective', etc.). The use
of this routine is demonstrated in Example 11.9.
Example 11.9 An example of interactive image registration. In this ex-
ample, an input image is generated by transforming the reference image with a
projective transformation including vertical and horizontal translations. The pro-
gram then opens two windows displaying the reference and input image, and takes
in eight reference points for each image from the operator using the MATLAB
ginput routine. As each point is taken, it is displayed as an '*' overlaid on
the image. Once all 16 points have been acquired (eight from each image), a
transformation is constructed using cp2tform. This transformation is then ap-
plied to the input image using imtransform. The reference, input, and realigned
images are displayed.
% Example 11.9 Interactive Image Registration
% Load a frame of the MRI image (mri.tif) and perform a spatial
% transformation that tilts the image backward and displaces
% it horizontally and vertically.
% Uses interactive registration and the MATLAB function
% 'cp2tform' to realign the image
%
clear all; close all;
nu_points = 8; % Number of reference points
% .......Load mri.tif, frame 18 .......
[M N]= size(I);
%
% Construct input image. Perform projective transformation
U = [1 1; 1 M; N M; N 1];
offset = .15*N; % Projection offset
H = .2 * N; % Horizontal translation
V = .15 * M; % Vertical translation
X = [1-offset+H 1+offset-V; 1+offset+H M-offset-V; ...
N-offset+H M-offset-V; N+offset+H 1+offset-V];
Tform1 = maketform('projective', U, X);
I_transform = imtransform(I,Tform1,'Xdata',[1 N], ...
'Ydata', [1 M]);
%
% Acquire reference points
% First open two display windows
fig(1) = figure;
imshow(I);
fig(2) = figure;
imshow(I_transform);
%
%
for i = 1:2 % Get reference points: both
% images
figure(fig(i)); % Open window i
hold on;
title('Enter eight reference points');
for j = 1:nu_points
[x(j,i), y(j,i)] = ginput(1); % Get reference point
plot(x(j,i), y(j,i),'*'); % Mark reference point
% with *
end
end
%
% Construct transformation with cp2tform and implement with
% imtransform
%
[Tform2, inpts, base_pts] = cp2tform([x(:,2) y(:,2)], ...
[x(:,1) y(:,1)],'projective');
I_aligned = imtransform(I_transform,Tform2,'Xdata', ...
[1 N],'Ydata',[1 M]);
%
figure;
subplot(1,3,1); imshow(I); % Display the images
title('Original');
subplot(1,3,2); imshow(I_transform);
title('Transformation');
subplot(1,3,3); imshow(I_aligned);
title('Realigned');
The reference and input windows are shown along with the reference
points selected in Figure 11.14A and B. Eight points were used rather than the
minimal four, because this was found to produce a better result. The influence
of the number of reference points used is explored in Problem 9. The result
of the transformation is presented in Figure 11.15. This figure shows that the
realignment was less than perfect, and, in fact, the correlation after alignment
was only 0.78. Nonetheless, the primary advantage of this method is that it
couples into the extraordinary abilities of human visual identification and,
hence, can be applied to images that are only vaguely similar when correlation-
based methods would surely fail.
FIGURE 11.14A A reference image used in Example 11.9 showing the reference
points as black. (Original image from the MATLAB Image Processing Toolbox.
Copyright 1993–2003, The Math Works, Inc. Reprinted with permission.)
FIGURE 11.14B Input image showing reference points corresponding to those
shown in Figure 11.14A.
FIGURE 11.15 Image registration using a transformation developed interactively.
The original (reference) image is seen on the left, and the input image in the
center. The image after transformation is similar, but not identical to the reference
image. The correlation between the two is 0.79. (Original image from the
MATLAB Image Processing Toolbox. Copyright 1993–2003, The Math Works,
Inc. Reprinted with permission.)
PROBLEMS
1. Load the MATLAB test pattern image testpat1.png used in Example
11.5. Generate and plot the Fourier transform of this image. First plot only the
25 points on either side of the center of this transform, then plot the entire
function, but first take the log for better display.
2. Load the horizontal chirp pattern shown in Figure 11.1 (found on the disk
as imchirp.tif) and take the Fourier transform as in the above problem. Then
multiply the Fourier transform (in complex form) in the horizontal direction by
a half-wave sine function of the same length. Now take the inverse Fourier
transform of this windowed function and plot alongside the original image. Also
apply the window in the vertical direction, take the inverse Fourier transform,
and plot the resulting image. Do not apply fftshift to the Fourier transform
as the inverse Fourier transform routine, ifft2, expects the DC component to
be in the upper left corner as fft2 presents it. Also you should take the absolute
value of the inverse Fourier transform before display, to eliminate any imaginary
components. (The chirp image is square, so you do not have to recompute the
half-wave sine function; however, you may want to plot the sine wave to verify
that you have a correct half-wave sine function.) You should be able to explain
the resulting images. (Hint: Recall the frequency characteristics of the two-point
central difference algorithm used for taking the derivative.)
3. Load the blood cell image (blood1.tif). Design and implement your own
3 by 3 filter that enhances vertical edges that go from dark to light. Repeat for
a filter that enhances horizontal edges that go from light to dark. Plot the two
images along with the original. Convert the first image (vertical edge enhance-
ment) to a binary image and adjust the threshold to emphasize the edges. Plot
this image with the others in the same figure. Plot the three-dimensional fre-
quency representations of the two filters together in another figure.
4. Load the chirp image (imchirp.tif) used in Problem 2. Design a one-
dimensional 64th-order narrowband bandpass filter with cutoff frequencies of
0.1 and 0.125 Hz and apply it to the chirp image. Plot the modified image with
the original. Repeat for a 128th-order filter and plot the result with the others.
(This may take a while to run.) In another figure, plot the three-dimensional
frequency representation of a 64th-order filter.
5. Produce a movie of the rotating brain. Load frame 16 of the MRI image
(mri.tif). Make a multiframe image of the basic image by rotating that image
through 360 degrees. Use 36 frames (10 degrees per frame) to cover the
complete 360 degrees. (If your resources permit, you could use 72 frames with 5
degrees per frame.) Submit a montage plot of those frames that cover the first
90 degrees of rotation; i.e., the first nine images (or 18, if you use 72 frames).
6. Back in the 1960’s, people were into “expanding their minds” through med-
itation, drugs, rock and roll, or other “mind-expanding” experiences. In this
problem, you will expand the brain in a movie using an affine transformation.
(Note: imresize will not work because it changes the number of pixels in the
image and immovie requires that all images have the same dimensions.) Load
frame 18 of the MRI image (mri.tif). Make a movie where the brain stretches
in and out horizontally from 75% to 150% of normal size. The image will
probably exceed the frame size during its larger excursions, but this is accept-
able. The image should grow symmetrically about the center (i.e., in both direc-
tions.) Use around 24 frames with the latter half of the frames being the reverse
of the first as in Example 11.7, so the brain appears to grow then shrink. Submit
a montage of the first 12 frames. Note: use some care in getting the range of
image sizes to be between 75% and 150%. (Hint: to simplify the computation
of the output triangle, it is best to define the input triangle at three of the image
corners. Note that all three triangle vertices will have to be modified to stretch
the image in both directions, symmetrically about the center.)
7. Produce a spatial transformation movie using a projective transformation.
Load a frame of the MRI image (mri.tif, your choice of frame). Use the projec-
tive transformation to make a movie of the image as it tilts vertically. Use 24
frames as in Example 11.7: the first 12 will tilt the image back while the rest tilt
the image back to its original position. You can use any reasonable transformation
that gives a vertical tilt or rotation. Submit a montage of the first 12 images.
8. Load frame 12 of mri.tif and use imrotate to rotate the image by 15
degrees clockwise. Also reduce image contrast of the rotated image by 25%. Use
MATLAB’s basic optimization program fminsearch to align the image that has
been rotated. (You will need to write a function similar to rescale in Example
11.8 that rotates the image based on the first input parameter, then computes the
negative correlation between the rotated image and the original image.)
9. Load a frame of the MRI image (mri.tif) and perform a spatial transfor-
mation that first expands the image horizontally by 20% then rotates the image
by 20 degrees. Use interactive registration and the MATLAB function cp2tform
to transform the image. Use (A) the minimum number of points and (B)
twice the minimum number of points. Compare the correlation between the
original and the realigned image using the two different numbers of reference
points.
12
Image Segmentation
Image segmentation is the identification and isolation of an image into regions
that—one hopes—correspond to structural units. It is an especially important
operation in biomedical image processing since it is used to isolate physiological
and biological structures of interest. The problems associated with segmentation
have been well studied and a large number of approaches have been developed,
many specific to a particular image. General approaches to segmentation can be
grouped into three classes: pixel-based methods, regional (continuity-based)
methods, and edge-based methods. Pixel-based methods are the easiest to
understand and to implement, but are also the least powerful and, since they
operate on one element at a time, are particularly susceptible to noise.
Continuity-based and edge-based methods approach the segmentation problem from
opposing sides: edge-based methods search for differences while continuity-based
methods search for similarities.
PIXEL-BASED METHODS
The most straightforward and common of the pixel-based methods is threshold-
ing in which all pixels having intensity values above, or below, some level are
classified as part of the segment. Thresholding is an integral part of converting
an intensity image to a binary image as described in Chapter 10. Thresholding
is usually quite fast and can be done in real time allowing for interactive setting
of the threshold. The basic concept of thresholding can be extended to include
both upper and lower boundaries, an operation termed slicing since it isolates a
specific range of pixels. Slicing can be generalized to include a number of dif-
ferent upper and lower boundaries, each encoded into a different number. An
example of multiple slicing was presented in Chapter 10 using the MATLAB
grayslice routine. Finally, when RGB color or pseudocolor images are involved,
thresholding can be applied to each color plane separately. The resulting
image could be either a thresholded RGB image, or a single image composed
of a logical combination (AND or OR) of the three image planes after threshold-
ing. An example of this approach is seen in the problems.
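Thresholding and slicing reduce to simple relational operations on the intensity array. The following minimal sketch uses a small made-up matrix in place of an image and arbitrary threshold values (0.45 for the single threshold; 0.3 and 0.6 for the slice); none of these numbers come from the examples in this chapter.

```matlab
% Sketch: thresholding and slicing with relational operators (base MATLAB).
I = [0.1 0.4 0.8; 0.3 0.6 0.9; 0.2 0.5 0.7];   % stand-in intensity image
BW_thresh = I > 0.45;                    % single threshold: keep bright pixels
BW_slice  = (I >= 0.3) & (I <= 0.6);     % slicing: keep a range of intensities
disp(BW_thresh);
disp(BW_slice);
```

The same logical operators (& for AND, | for OR) could be used to combine thresholded color planes into a single binary image.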
A technique that can aid in all image analysis, but is particularly useful in
pixel-based methods, is intensity remapping. In this global procedure, the pixel
values are rescaled so as to extend over different maximum and minimum val-
ues. Usually the rescaling is linear, so each point is adjusted proportionally
with a possible offset. MATLAB supports rescaling with the routine imadjust
described below, which also provides a few common nonlinear rescaling op-
tions. Of course, any rescaling operation is possible using MATLAB code if the
intensity images are of class double, or the image arithmetic routines described
in Chapter 10 are used.
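For an image of class double, linear rescaling can be sketched in a few lines of base MATLAB. The example below stretches a made-up intensity matrix so that its values span the full range from 0 to 1; the variable names lo and hi are illustrative only.

```matlab
% Sketch: linear intensity remapping to the range 0 to 1 (class double).
I = [0.2 0.3; 0.4 0.6];            % stand-in image with a narrow intensity range
lo = min(I(:));                    % current minimum intensity
hi = max(I(:));                    % current maximum intensity
I_rescaled = (I - lo) / (hi - lo); % subtract offset, then scale proportionally
disp(I_rescaled);
```

Each pixel is adjusted proportionally after an offset is removed, which is the linear case described above.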
Threshold Level Adjustment
A major concern in these pixel-based methods is setting the threshold or slicing
level(s) appropriately. Usually these levels are set by the program, although in
some situations they can be set interactively by the user.
Finding an appropriate threshold level can be aided by a plot of pixel
intensity distribution over the whole image, regardless of whether you adjust
the pixel level interactively or automatically. Such a plot is termed the intensity
histogram and is supported by the MATLAB routine imhist detailed below.
Figure 12.1 shows an x-ray image of the spine with its associated intensity
histogram. Figure 12.1 also shows the binary image obtained by applying a
threshold at a specific point on the histogram. When RGB color images are
being analyzed, intensity histograms can be obtained from all three color planes
and different thresholds established for each color plane with the aid of the
corresponding histogram.
Intensity histograms can be very helpful in selecting threshold levels, not
only for the original image, but for images produced by various segmentation
algorithms described later. Intensity histograms can also be useful in evaluating
the efficacy of different processing schemes: as the separation between structures
improves, histogram peaks should become more distinctive. This relationship
between separation and histogram shape is demonstrated in Figure 12.2
and, more dramatically, in Figures 12.3 and 12.4.
FIGURE 12.1 An image of bone marrow, upper left, and its associated intensity
histogram, lower plot. The upper right image is obtained by thresholding the origi-
nal image at a value corresponding to the vertical line on the histogram plot.
(Original image from the MATLAB Image Processing Toolbox. Copyright 1993–
2003, The Math Works, Inc. Reprinted with permission.)
Intensity histograms contain no information on position, yet it is spatial
information that is of prime importance in problems of segmentation, so some
strategies have been developed for determining threshold(s) from the histogram
(Sonka et al., 1993). If the intensity histogram is, or can be assumed to be,
bimodal (or multi-modal), a common strategy is to search for low points, or
minima, in the histogram. This is the strategy used in Figure 12.1, where the
threshold was set at 0.34, the intensity value at which the histogram shows an
approximate minimum. Such points represent the smallest number of pixels and
should produce minimal classification errors; however, the histogram minima are
often difficult to determine due to variability.
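The minimum-seeking strategy described above can be sketched in a few lines. This is only an illustration under stated assumptions: the image is synthetic (two Gaussian intensity populations), the histogram is smoothed with a short moving average to reduce variability, the two modes are assumed to lie on either side of 0.5, and all variable names are hypothetical rather than taken from the examples in this chapter.

```matlab
% Sketch: place a threshold at the histogram minimum between two modes.
% Assumes a synthetic bimodal image; all names here are illustrative.
rng(0);                                   % Repeatable random numbers
I = [0.25 + 0.05*randn(100,50), ...       % Dark region (mean 0.25)
     0.70 + 0.05*randn(100,50)];          % Bright region (mean 0.70)
I = min(max(I,0),1);                      % Clip to the range 0 to 1
%
[counts, x] = imhist(I, 64);              % 64-bin intensity histogram
counts_s = conv(counts, ones(5,1)/5, 'same');  % Smooth to reduce variability
%
% Assumption: one mode lies below 0.5 and the other above it
[~, i_dark]   = max(counts_s(x < 0.5));   % Peak of the darker mode
[~, i_bright] = max(counts_s(x >= 0.5));  % Peak of the brighter mode
i_bright = i_bright + sum(x < 0.5);       % Convert to full-histogram index
%
[~, i_min] = min(counts_s(i_dark:i_bright));  % Minimum between the peaks
thresh = x(i_min + i_dark - 1);           % Threshold intensity
BW = I > thresh;                          % Apply threshold
```

On real images the split point between the modes is not known in advance, so the peak search would need to be adapted; the smoothing length is likewise a judgment call.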
FIGURE 12.2 Image of blood cells before (upper) and after (lower) removal of
intermediate boundary points. The associated histograms (right side) show
improved separability when the boundaries are eliminated. The code that
generated these images is given in Example 12.1. (Original image reprinted with
permission from the Image Processing Handbook, 2nd edition. Copyright CRC Press,
Boca Raton, Florida.)
An approach to improve the determination of histogram minima is based
on the observation that many boundary points carry values intermediate to the
values on either side of the boundary. These intermediate values will be associ-
ated with the region between the actual boundary values and may mask the
optimal threshold value. However, these intermediate points also have the high-
est gradient, and it should be possible to identify them using a gradient-sensitive
filter, such as the Sobel or Canny filter. After these boundary points are identi-
fied, they can be eliminated from the image, and a new histogram is computed
with a distribution that is possibly more definitive. This strategy is used in
FIGURE 12.3 Thresholded blood cell images. Optimal thresholds were applied to
the blood cell images in Figure 12.2 without (left) and with (right) boundary
pixels masked. Fewer inappropriate pixels are seen in the right image.
Example 12.1, and Figure 12.2 shows images and associated histograms before
and after removal of boundary points as identified using Canny filtering. The
reduction in the number of intermediate points can be seen in the middle of the
histogram (around 0.45). As shown in Figure 12.3, this leads to slightly better
segmentation of the blood cells.
Another histogram-based strategy that can be used if the distribution is
bimodal is to assume that each mode is the result of a unimodal, Gaussian
distribution. An estimate is then made of the underlying distributions, and the
point at which the two estimated distributions intersect should provide the opti-
mal threshold. The principal problem with this approach is that the distributions
are unlikely to be truly Gaussian.
A threshold strategy that does not rely on locating histogram minima is based
on the concept of minimizing the variance within the presumed foreground and
background elements. Although the method assumes two different gray levels, it
works well even when the distribution is not bimodal (Sonka et al., 1993). The
approach uses an iterative process to find a threshold that minimizes the
combined variance of the intensity values within the two groups on either side
of the threshold level (Otsu’s method). This approach is implemented using the
MATLAB routine graythresh (see Example 12.1).
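The minimum variance idea can be illustrated with a direct search over candidate thresholds. The following is a minimal sketch, not the toolbox implementation: it assumes a synthetic two-region image, uses hypothetical variable names, and simply tries every histogram bin as a split point, keeping the one with the smallest weighted within-class variance. The toolbox routine graythresh is called at the end only for comparison.

```matlab
% Sketch: exhaustive search for the threshold that minimizes the
% weighted within-class variance, compared with graythresh.
% The test image is synthetic and all names are illustrative.
rng(1);
I = [0.3 + 0.08*randn(64,64), 0.7 + 0.08*randn(64,64)];
I = min(max(I,0),1);                      % Clip to the range 0 to 1
%
[counts, x] = imhist(I, 256);             % Histogram (256 bins)
p = counts / sum(counts);                 % Bin probabilities
best_var = inf;                           % Smallest variance found so far
t_search = NaN;
for k = 1:255                             % Candidate split after bin k
    w0 = sum(p(1:k));                     % Probability of lower class
    w1 = sum(p(k+1:end));                 % Probability of upper class
    if w0 == 0 || w1 == 0, continue; end  % Skip empty classes
    m0 = sum(p(1:k).*x(1:k)) / w0;        % Class means
    m1 = sum(p(k+1:end).*x(k+1:end)) / w1;
    v0 = sum(p(1:k).*(x(1:k) - m0).^2) / w0;          % Class variances
    v1 = sum(p(k+1:end).*(x(k+1:end) - m1).^2) / w1;
    within_var = w0*v0 + w1*v1;           % Weighted within-class variance
    if within_var < best_var
        best_var = within_var;
        t_search = x(k);                  % Best threshold so far
    end
end
t_matlab = graythresh(I);                 % Toolbox result for comparison
```

The two values, t_search and t_matlab, should be close for an image like this one, although small differences can arise from how ties and bin edges are handled.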
A pixel-based technique that provides a segment boundary directly is con-
tour mapping. Contours are lines of equal intensity, and in a continuous image
they are necessarily continuous: they cannot end within the image, although
FIGURE 12.4 Contour maps drawn from the blood cell image of Figures 12.2 and
12.3. The right image was pre-filtered with a Gaussian lowpass filter (alpha = 3)
before the contour lines were drawn. The contour values were set manually to
provide good images.
they can branch or loop back on themselves. In digital images, these same prop-
erties exist but the value of any given contour line will not generally equal the
values of the pixels it traverses. Rather, it usually reflects values intermediate
between adjacent pixels. To use contour mapping to identify image structures
requires accurate setting of the contour levels, and this carries the same burdens
as thresholding. Nonetheless, contour maps do provide boundaries directly, and,
if subpixel interpolation is used in establishing the contour position, they may
be spatially more accurate. Contour maps are easy to implement in MATLAB,
as shown in the next section on MATLAB Implementation. Figure 12.4 shows
contour maps for the blood cell images shown in Figure 12.2. The right image
was pre-filtered with a Gaussian lowpass filter which reduces noise slightly and
improves the resultant contour image.
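As a rough illustration of the subpixel property described above, the sketch below computes a single contour line on a synthetic image using the base MATLAB routine contourc, which returns contour coordinates without plotting. The image (a smooth Gaussian blob), the contour level, and the variable names are assumptions made for this illustration only.

```matlab
% Sketch: contour line at a chosen intensity level on a synthetic image.
% contourc (base MATLAB) returns contour coordinates without plotting.
[X, Y] = meshgrid(1:64, 1:64);
I = exp(-((X-32).^2 + (Y-32).^2) / (2*10^2));   % Smooth bright blob
level = 0.5;                                    % Contour level (set manually)
C = contourc(I, [level level]);                 % Contour matrix
n_pts = C(2,1);                                 % Points in the first contour
xc = C(1, 2:n_pts+1);                           % Column (x) coordinates
yc = C(2, 2:n_pts+1);                           % Row (y) coordinates
r = sqrt((xc-32).^2 + (yc-32).^2);              % Distance from blob center
% To draw the contour over the image:
% imshow(I); hold on; plot(xc, yc, 'r');
```

The coordinates in xc and yc generally fall between pixel centers, which is why a contour can locate a boundary more finely than a thresholded binary image. For this blob the 0.5 contour should lie close to a circle of radius sqrt(200*log(2)), about 11.8 pixels.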
Pixel-based approaches can lead to serious errors, even when the average
intensities of the various segments are clearly different, due to noise-induced
intensity variation within the structure. Such variation could be introduced
during image acquisition, but could also be inherent in the structure itself.
Figure 12.5 shows two regions with quite different average intensities. Even
with optimal threshold selection, many inappropriate pixels are found in both
segments due to intensity variations within the segments (Figure 12.5, right).
Techniques for improving separation in such images are explored in the sections
on continuity-based approaches.
FIGURE 12.5 An image with two regions having different average gray levels.
The two regions are clearly distinguishable; however, using thresholding alone, it
is not possible to completely separate the two regions because of noise.
MATLAB Implementation
Some of the routines for implementing pixel-based operations, such as im2bw
and grayslice, have been described in preceding chapters. The image intensity
histogram is produced by the routine imhist:
[counts, x] = imhist(I, N);
where counts is the histogram value at a given x, I is the image, and N is an
optional argument specifying the number of histogram bins (the default is 256
for grayscale images). The routine imhist is usually invoked without the output
arguments, counts and x, to produce a plot directly.
The rescale routine is:
I_rescale = imadjust(I, [low high], [bottom top], gamma);
where I_rescale is the rescaled output image and I is the input image. The range
between low and high in the input image is rescaled to be between bottom and
top in the output image. The optional argument gamma sets the shape of the
mapping curve; the default, gamma = 1, gives a linear mapping.
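A short usage sketch of these two operations follows. It calls imhist with output arguments, so no plot is produced, and uses the toolbox rescaling routine imadjust, whose arguments follow the pattern given above. The low-contrast ramp image and the variable names are assumptions made for illustration.

```matlab
% Sketch: numeric histogram output and intensity rescaling.
% Uses a synthetic low-contrast ramp image; names are illustrative.
I = repmat(linspace(0.3, 0.6, 128), 128, 1);  % Values between 0.3 and 0.6
[counts, x] = imhist(I, 32);                  % 32-bin histogram, no plot
I_rescale = imadjust(I, [0.3 0.6], [0 1]);    % Stretch 0.3-0.6 to 0-1
[counts_r, x_r] = imhist(I_rescale, 32);      % Histogram after rescaling
% imhist(I_rescale) with no output arguments would plot the histogram
```

Before rescaling, the counts are confined to the bins between 0.3 and 0.6; afterward they are spread over the full range.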
Several pixel-based techniques are presented in Example 12.1.
Example 12.1 An example of segmentation using pixel-based methods.
Load the image of blood cells, and display along with the intensity histogram.
Remove the edge pixels from the image and display the histogram of this modified
image. Determine thresholds using the minimal variance iterative technique
described above, and apply this approach to threshold both images. Display the
resultant thresholded images.
FIGURE 12.6A Histogram of the image shown in Figure 12.5 before (upper) and
after (lower) lowpass filtering. Before filtering, the two regions overlap to
such an extent that they cannot be identified. After lowpass filtering, the two
regions are evident, and the boundary found by minimum variance is shown. The
application of this boundary to the filtered image results in perfect separation
as shown in Figure 12.6B.
Solution To remove the edge boundaries, first identify these boundaries
using an edge detection scheme. While any of the edge detection filters de-
scribed previously can be used, this application will use the Canny filter as it is
most robust to noise. This filter is implemented as an option of MATLAB’s
edge routine, which produces a binary image of the boundaries. This binary
image will be converted to a boundary mask by inverting the image using
imcomplement. After inversion, the edge pixels will be zero while all other
pixels will be one. Multiplying the original image by the boundary mask will
produce an image in which the boundary points are removed (i.e., set to zero,
or black). All the images involved in this process, including the original image,
will then be plotted.
FIGURE 12.6B Left side: The same image shown in Figure 12.5 after lowpass
filtering. Right side: This filtered image can now be perfectly separated by thresh-
olding.
% Example 12.1 and Figure 12.2 and Figure 12.3
% Lowpass filter blood cell image, then display histograms
% before and after edge point removal.
% Applies "optimal" threshold routine to both original and
% "masked" images and displays the results
%
........input image and convert to double.......
h = fspecial('gaussian',12,2);          % Construct gaussian
                                        % filter
I_f = imfilter(I,h,'replicate');        % Filter image
%
I_edge = edge(I_f,'canny',.3);          % To remove edge
I_rem = I_f .* imcomplement(I_edge);    % points, find edge,
                                        % complement and use
                                        % as mask
%
subplot(2,2,1); imshow(I_f);            % Display images and
                                        % histograms
title('Original Figure');
subplot(2,2,2); imhist(I_f); axis([0 1 0 1000]);
title('Filtered histogram');
subplot(2,2,3); imshow(I_rem);
title('Edge Removed');
subplot(2,2,4); imhist(I_rem); axis([0 1 0 1000]);
title('Edge Removed histogram');
%
figure;                                 % Threshold and
                                        % display images
t1 = graythresh(I_f);                   % Use minimum variance
                                        % thresholds
t2 = graythresh(I_rem);                 % Threshold from masked image
subplot(1,2,1); imshow(im2bw(I_f,t1));
title('Threshold Original Image');
subplot(1,2,2); imshow(im2bw(I_rem,t2));
title('Threshold Masked Image');
The results have been shown previously in Figures 12.2 and 12.3, and the
improvement in the histogram and threshold separation has been mentioned.
While the change in the histogram is fairly small (Figure 12.2), it does lead to
a reduction in artifacts in the thresholded image, as shown in Figure 12.3. This
small improvement could be quite significant in some applications. Methods
for removing the small remaining artifacts will be described in the section on
morphological operations.
CONTINUITY-BASED METHODS
These approaches look for similarities or consistency in the search for structural
units. As demonstrated in the examples below, these approaches can be very
effective in segmentation tasks, but they all suffer from a lack of edge definition.
This is because they are based on neighborhood operations and these tend to
blur edge regions, as edge pixels are combined with structural segment pixels.
The larger the neighborhood used, the more poorly edges will be defined. Unfor-
tunately, increasing neighborhood size usually improves the power of any given
continuity-based operation, setting up a compromise between identification abil-
ity and edge definition. One easy technique that is based on continuity is low-
pass filtering. Since a lowpass filter is a sliding neighborhood operation that
takes a weighted average over a region, it enhances consistent characteristics.
Figure 12.6A shows histograms of the image in Figure 12.5 before and after
filtering with a Gaussian lowpass filter (alpha = 1.5). Note the substantial im-
provement in separability suggested by the associated histograms. Applying a
threshold to the filtered image results in perfectly isolated segments as shown
in Figure 12.6B. The thresholded images in both Figures 12.5 and 12.6B used
the same minimum variance technique to set the threshold, yet the improvement
brought about by simple lowpass filtering is remarkable.
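The effect of lowpass filtering on threshold-based separation can be checked numerically on a synthetic image. The sketch below is an illustration only: the two-region image, the noise level, the filter size, and the variable names are all assumptions, and the error measure is simply the fraction of pixels assigned to the wrong region.

```matlab
% Sketch: thresholding a noisy two-region image before and after
% Gaussian lowpass filtering. Synthetic data; names are illustrative.
rng(2);
truth = [false(64,32), true(64,32)];           % Right half is the bright region
I = 0.4 + 0.2*truth + 0.15*randn(64,64);       % Means 0.4 and 0.6, heavy noise
I = min(max(I,0),1);                           % Clip to the range 0 to 1
%
h = fspecial('gaussian', 9, 1.5);              % Gaussian lowpass filter
I_f = imfilter(I, h, 'replicate');             % Filtered image
%
BW_raw = im2bw(I, graythresh(I));              % Threshold noisy image
BW_f   = im2bw(I_f, graythresh(I_f));          % Threshold filtered image
err_raw = mean(BW_raw(:) ~= truth(:));         % Fraction of misclassified pixels
err_f   = mean(BW_f(:)   ~= truth(:));
```

With these settings the misclassification rate should drop substantially after filtering, at the cost of some blurring at the boundary between the two regions.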
Image features related to texture can be particularly useful in segmentation.
Figure 12.7 shows three regions that have approximately the same average
intensity values, but are readily distinguished visually because of differences
in texture. Several neighborhood-based operations can be used to distinguish
textures: the small segment Fourier transform, local variance (or standard
deviation), the Laplacian operator, the range operator (the difference between
maximum and minimum pixel values in the neighborhood), the Hurst operator
(maximum difference as a function of pixel separation), and the Haralick
operator (a measure of distance moment). Many of these approaches are either
directly supported in MATLAB, or can be implemented using the nlfilter routine
described in Chapter 10.
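Two of the operators listed above, local range and local standard deviation, can be sketched with the toolbox routines rangefilt and stdfilt, and the range operator can also be written with nlfilter for comparison. The synthetic image (two regions with the same mean but different variability), the neighborhood size, and the variable names are assumptions made for illustration.

```matlab
% Sketch: local range and local standard deviation as texture measures.
% rangefilt and stdfilt are toolbox routines; the image is synthetic.
rng(3);
I = [0.5 + 0.02*randn(64,32), 0.5 + 0.15*randn(64,32)];  % Same mean, two textures
I = min(max(I,0),1);                           % Clip to the range 0 to 1
%
nhood = true(7);                               % 7-by-7 neighborhood
I_range = rangefilt(I, nhood);                 % Local max minus local min
I_std   = stdfilt(I, nhood);                   % Local standard deviation
%
% Equivalent range operation using nlfilter (slower)
range_fn = @(x) max(x(:)) - min(x(:));
I_range_nl = nlfilter(I, [7 7], range_fn);
%
inner = 4:61;                                  % Pixels away from the border
diff_max = max(max(abs(I_range(inner,inner) - I_range_nl(inner,inner))));
```

Away from the image border the two range images should agree; near the border they can differ because the routines pad the image differently.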
MATLAB Implementation
Example 12.2 attempts to separate the three regions shown in Figure 12.7 by
applying one of these operators to convert the texture pattern to a difference in
intensity that can then be separated using thresholding.
Example 12.2 Separate out the three segments in Figure 12.7 that differ
only in texture. Use one of the texture operators described above and demon-
strate the improvement in separability through histogram plots. Determine ap-
propriate threshold levels for the three segments from the histogram plot.
FIGURE 12.7 An image containing three regions having approximately the same
intensity, but different textures. While these areas can be distinguished visually,
separation based on intensity or edges will surely fail. (Note the single peak in
the intensity histogram in Figure 12.9–upper plot.)
Solution Use the nonlinear range filter to convert the textural patterns
into differences in intensity. The range operator is a sliding neighborhood
procedure that takes the difference between the maximum and minimum pixel values
within a neighborhood. Implement this operation using MATLAB’s nlfilter
routine with a 7-by-7 neighborhood.
% Example 12.2 Figures 12.8, 12.9, and 12.10
% Load image 'texture3.tif' which contains three regions having
% the same average intensities, but different textural patterns.
% Apply the "range" nonlinear operator using 'nlfilter'
% Plot original and range histograms and filtered image
%
clear all; close all;
[I] = imread('texture3.tif');           % Load image and
I = im2double(I);                       % Convert to double
%
range = @(x) max(x(:)) - min(x(:));     % Define Range function
I_f = nlfilter(I,[7 7], range);         % Compute local range
I_f = mat2gray(I_f);                    % Rescale intensities
FIGURE 12.8 The texture pattern shown in Figure 12.7 after application of the
nonlinear range operation. This operator converts the textural properties in the
original figure into a difference in intensities. The three regions are now clearly
visible as intensity differences and can be isolated using thresholding.
FIGURE 12.9 Histogram of original texture pattern before (upper) and after non-
linear filtering using the range operator (lower). After filtering, the three intensity
regions are clearly seen. The thresholds used to isolate the three segments are
indicated.
%
imshow(I_f);                            % Display results
title('"Range" Image');
figure;
subplot(2,1,1); imhist(I);              % Display both histograms
title('Original Histogram')
subplot(2,1,2); imhist(I_f);
title('"Range" Histogram');
figure;
subplot(1,3,1); imshow(im2bw(I_f,.22));         % Display three segments
subplot(1,3,2); imshow(islice(I_f,.22,.54));    % Uses 'islice' (see below)
subplot(1,3,3); imshow(im2bw(I_f,.54));
The image produced by the range filter is shown in Figure 12.8, and a
clear distinction in intensity level can now be seen between the three regions.
This is also demonstrated in the histogram plots of Figure 12.9. The histogram
of the original figure (upper plot) shows a single Gaussian-like distribution with
no evidence of the three patterns.* After filtering, the three patterns emerge as
three distinct distributions. Using this distribution, two thresholds were chosen
at minima between the distributions (at 0.22 and 0.54: the solid vertical lines in
Figure 12.9) and the three segments isolated based on these thresholds. The two
end patterns could be isolated using im2bw, but the center pattern used a special
routine, islice. This routine sets pixels to one whose values fall between an
upper and lower boundary; if the pixel has values above or below these bound-
aries, it is set to zero. (This routine is on the disk.) The three fairly well sepa-
rated regions are shown in Figure 12.10. A few artifacts remain in the isolated
images, and subsequent methods can be used to eliminate or reduce these erro-
neous pixels.
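Since the islice routine is supplied separately, the following hypothetical stand-in shows one way the behavior described above could be written. It is an assumption based only on that description, not the book's routine, and it may differ in detail (for example, in how pixels exactly equal to a boundary value are treated).

```matlab
% Hypothetical stand-in for the islice routine described in the text:
% ones where lower <= pixel <= upper, zeros elsewhere. The book's own
% routine may differ in detail (for example, at the boundary values).
islice_sketch = @(I, lower, upper) (I >= lower) & (I <= upper);
%
I_test = [0.10 0.22 0.30; 0.54 0.60 0.40];     % Small test array
BW_mid  = islice_sketch(I_test, 0.22, 0.54);   % Center segment
BW_high = im2bw(I_test, 0.54);                 % Pixels above 0.54
```

In this sketch a pixel exactly at a boundary value is included in the center segment; whether that is the desired convention depends on the application.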
Occasionally, segments will have similar intensities and textural proper-
ties, except that the texture differs in orientation. Such patterns can be distin-
guished using a variety of filters that have orientation-specific properties. The
local Fourier transform can also be used to distinguish orientation. Figure 12.11
shows a pattern with texture regions that are different only in terms of their
orientation. In this figure, also given in Example 12.3, orientation was identified
FIGURE 12.10 Isolated regions of the texture pattern in Figure 12.7. Although
there are some artifacts, the segmentation is quite good considering the original
image. Methods for reducing the small artifacts will be given in the section on
edge detection.
*In fact, the distribution is Gaussian since the image patterns were generated by filtering an array
filled with Gaussianly distributed numbers generated by randn.
FIGURE 12.11 Textural pattern used in Example 12.3. The horizontal and vertical
patterns have the same textural characteristics except for their orientation. As in
Figure 12.7, the three patterns have the same average intensity.
by application of a direction operator that operates only in the horizontal direc-
tion. This is followed by a lowpass filter to improve separability. The intensity
histograms in Figure 12.12 shown at the end of the example demonstrate the
intensity separations achieved by the directional range operator and the improve-
ment provided by the lowpass filter. The different regions are then isolated using
threshold techniques.
Example 12.3 Isolate segments from a texture pattern that includes two
patterns with the same textural characteristics except for orientation. Note that
the approach used in Example 12.2 will fail: the similarity in the statistical
properties of the vertical and horizontal patterns will give rise to similar intensi-
ties following a range operation.
Solution Apply a filter that has directional sensitivity. A Sobel or Prewitt
filter could be used, followed by the range or similar operator, or the operations
could be done in a single step by using a directional range operator. The choice
made in this example is to use a horizontal range operator implemented with
nlfilter. This is followed by a lowpass filter (Gaussian, alpha = 4) to improve
separation by removing intensity variation. Two segments are then isolated us-
ing standard thresholding. In this example, the third segment was constructed
FIGURE 12.12 Images produced by a directional range operator applied to the
image in Figure 12.11 before (upper) and after (lower) lowpass filtering. The
histograms demonstrate the improved separability of the filtered image, showing
deeper minima in the filtered histogram.
by applying a logical operation to the other two segments. Alternatively, the
islice routine could have been used as in Example 12.2.
% Example 12.3 and Figures 12.11, 12.12, and 12.13
% Analysis of texture pattern having similar textural
% characteristics but with different orientations. Use a
% direction-specific filter.
%
clear all; close all;
I = imread('texture4.tif');             % Load "orientation" texture
I = im2double(I);                       % Convert to double
FIGURE 12.13 Isolated segments produced by thresholding the lowpass filtered
image in Figure 12.12. The rightmost segment was found by applying logical op-
erations to the other two images.
%
% Define filters and functions: 1-D range function
range = @(x) max(x) - min(x);
h_lp = fspecial('gaussian', 20, 4);
%
% Directional nonlinear filter
I_nl = nlfilter(I, [9 1], range);
I_h = imfilter(I_nl*2, h_lp);           % Average (lowpass filter)
%
subplot(2,2,1); imshow(I_nl*2);         % Display image and histogram
                                        % before lowpass filtering
title('Modified Image');                % and after lowpass filtering
subplot(2,2,2); imhist(I_nl);
title('Histogram');
subplot(2,2,3); imshow(I_h*2);          % Display modified image
title('Modified Image');
subplot(2,2,4); imhist(I_h);
title('Histogram');
%
figure;
BW1 = im2bw(I_h,.08);                   % Threshold to isolate segments
BW2 = ~im2bw(I_h,.29);
BW3 = ~(BW1 & BW2);                     % Find third image from other
                                        % two
subplot(1,3,1); imshow(BW1);            % Display segments
subplot(1,3,2); imshow(BW2);
subplot(1,3,3); imshow(BW3);
The image produced by the horizontal range operator with, and without,
lowpass filtering is shown in Figure 12.12. Note the improvement in separation
produced by the lowpass filtering as indicated by a better defined histogram.
The thresholded images are shown in Figure 12.13. As in Example 12.2, the
separation is not perfect, but is quite good considering the challenges posed by
the original image.
Multi-Thresholding
The results of several different segmentation approaches can be combined either
by adding the images together or, more commonly, by first thresholding the
images into separate binary images and then combining them using logical oper-
ations. Either the AND or OR operator would be used depending on the charac-
teristics of each segmentation procedure. If each procedure identified all of the
segments, but also included non-desired areas, the AND operator could be used
to reduce artifacts. An example of the use of the AND operation was found
in Example 12.3 where one segment was found using the inverse of a logical
AND of the other two segments. Alternatively, if each procedure identified
some portion of the segment(s), then the OR operator could be used to com-
bine the various portions. This approach is illustrated in Example 12.4 where
first two, then three, thresholded images are combined to improve segment iden-
tification. The structure of interest is a cell which is shown on a gray back-
ground. Threshold levels above and below the gray background are combined
(after one is inverted) to provide improved isolation. Including a third binary
image obtained by thresholding a texture image further improves the identifica-
tion.
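The two ways of combining binary results can be seen on a toy case. The masks below are made up for illustration and the variable names are hypothetical; in practice the true segment is, of course, not known.

```matlab
% Sketch: combining binary segmentation results with logical operators.
% The small masks are made up for illustration.
target = logical([0 1 1 1 0]);   % True segment (unknown in practice)
A = logical([0 1 1 1 1]);        % Finds all of the segment plus one extra pixel
B = logical([1 1 1 1 0]);        % Finds all of the segment plus a different extra
C = logical([0 1 0 0 0]);        % Finds only part of the segment
D = logical([0 0 1 1 0]);        % Finds the remaining part
%
BW_and = A & B;                  % AND removes the non-shared extras
BW_or  = C | D;                  % OR merges the partial results
```

Here both combinations recover the target exactly; with real images the AND operation trades missed pixels for fewer artifacts, and the OR operation does the reverse.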
Example 12.4 Isolate the cell structures from the image of a cell shown
in Figure 12.14.
Solution Since the cell is projected against a gray background it is possi-
ble to isolate some portions of the cell by thresholding above and below the
background level. After inversion of the lower threshold image (the one that is
below the background level), the images are combined using a logical OR. Since
the cell also shows some textural features, a texture image is constructed by
taking the regional standard deviation (Figure 12.14). After thresholding, this
texture-based image is also combined with the other two images.
% Example 12.4 and Figures 12.14 and 12.15
% Analysis of the image of a cell using texture and intensity
% information then combining the resultant binary images
% with a logical OR operation.
clear all; close all;
I = imread('cell.tif');                 % Load cell image
FIGURE 12.14 Image of cells (left) on a gray background. The textural image
(right) was created based on local variance (standard deviation) and shows
somewhat more definition. (Cancer cell from rat prostate, courtesy of Alan W.
Partin, M.D., Ph.D., Johns Hopkins University School of Medicine.)
I = im2double(I);                       % Convert to double
%
h = fspecial('gaussian', 20, 2);        % Gaussian lowpass filter
%
subplot(1,2,1); imshow(I);              % Display original image
title('Original Image');
I_std = (nlfilter(I,[3 3],@std2))*6;    % Texture operation
I_lp = imfilter(I_std, h);              % Average (lowpass filter)
%
subplot(1,2,2); imshow(I_lp*2);         % Display texture image
title('Filtered image');
%
figure;
BW_th = im2bw(I,.5);                    % Threshold image
BW_thc = ~im2bw(I,.42);                 % and its complement
BW_std = im2bw(I_std,.2);               % Threshold texture image
BW1 = BW_th | BW_thc;                   % Combine two thresholded
                                        % images (logical OR)
BW2 = BW_std | BW_th | BW_thc;          % Combine all three images
subplot(2,2,1); imshow(BW_th);          % Display thresholded and
subplot(2,2,2); imshow(BW_thc);         % combined images
subplot(2,2,3); imshow(BW1);
subplot(2,2,4); imshow(BW2);
FIGURE 12.15 Isolated portions of the cells shown in Figure 12.14. The upper
images were created by thresholding the intensity. The lower left image is a com-
bination (logical OR) of the upper images and the lower right image adds a
thresholded texture-based image.
The original and texture images are shown in Figure 12.14. Note that the
texture image has been scaled up, first by a factor of six, then by an additional
factor of two, to bring it within a nominal image range. The intensity thresh-
olded images are shown in Figure 12.15 (upper images; the upper right image
has been inverted). These images are combined in the lower left image. The
lower right image shows the combination of both intensity-based images with
the thresholded texture image. This method of combining images can be ex-
tended to any number of different segmentation approaches.
MORPHOLOGICAL OPERATIONS
Morphological operations have to do with processing shapes. In this sense they
are continuity-based techniques, but in some applications they also operate on
edges, making them useful in edge-based approaches as well. In fact, morpho-
logical operations have many image processing applications in addition to seg-
mentation, and they are well represented and supported in the MATLAB Image
Processing Toolbox.
The two most common morphological operations are dilation and erosion.
In dilation the rich get richer and in erosion the poor get poorer. Specifically,
in dilation, the center or active pixel is set to the maximum of its neighbors,
and in erosion it is set to the minimum of its neighbors. Since these operations
are often performed on binary images, dilation tends to expand edges, borders,
or regions, while erosion tends to decrease or even eliminate small regions.
Obviously, the size and shape of the neighborhood used will have a very strong
influence on the effect produced by either operation.
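The maximum and minimum interpretation can be verified on a small binary array. This is a minimal sketch with a made-up image and hypothetical variable names; the neighborhood is passed directly as an array of ones, and the nlfilter line is included only to show that dilation matches a sliding neighborhood maximum.

```matlab
% Sketch: dilation and erosion as neighborhood maximum and minimum.
BW = false(7,7);
BW(3:5,3:5) = true;              % 3-by-3 white square
BW(1,7) = true;                  % Isolated white pixel
nhood = true(3);                 % 3-by-3 neighborhood
BW_d = imdilate(BW, nhood);      % Each pixel becomes the max of its neighbors
BW_e = imerode(BW, nhood);       % Each pixel becomes the min of its neighbors
%
% The same dilation written as a sliding neighborhood maximum
BW_d_nl = nlfilter(double(BW), [3 3], @(x) max(x(:))) > 0;
%
n_orig = sum(BW(:));             % White pixels in the original
n_dil  = sum(BW_d(:));           % Square and lone pixel both grow
n_ero  = sum(BW_e(:));           % Only the center of the square survives
```

The isolated pixel is enlarged by dilation and removed entirely by erosion, which is the behavior exploited by the opening and closing operations.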
The two processes can be done in tandem, over the same area. Since both
erosion and dilation are nonlinear operations, they are not invertible transforma-
tions; that is, one followed by the other will not generally result in the original
image. If erosion is followed by dilation, the operation is termed opening. If the
image is binary, this combined operation will tend to remove small objects
without changing the shape and size of larger objects. Basically, the initial ero-
sion tends to reduce all objects, but some of the smaller objects will disappear
altogether. The subsequent dilation will restore those objects that were not elimi-
nated by erosion. If the order is reversed and dilation is performed first followed
by erosion, the combined process is called closing. Closing connects objects
that are close to each other, tends to fill up small holes, and smooths an object’s
outline by filling small gaps. As with the more fundamental operations of dila-
tion and erosion, the size of objects removed by opening or filled by closing
depends on the size and shape of the neighborhood that is selected.
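The contrasting effects of opening and closing can be seen on a small test image containing one large object with a one-pixel hole and one single-pixel artifact. The image and variable names are assumptions for illustration, and the objects are kept away from the image border so that border handling does not affect the result.

```matlab
% Sketch: opening removes a small object; closing fills a small hole.
BW = false(16,16);
BW(4:10,4:10) = true;            % Large 7-by-7 object
BW(7,7) = false;                 % One-pixel hole inside the object
BW(14,14) = true;                % Small one-pixel artifact
nhood = true(3);                 % 3-by-3 neighborhood
BW_open  = imdilate(imerode(BW, nhood), nhood);   % Erosion then dilation
BW_close = imerode(imdilate(BW, nhood), nhood);   % Dilation then erosion
```

In this case opening removes the artifact but leaves the hole, while closing fills the hole but leaves the artifact, so the choice between them depends on which kind of defect is to be removed.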
An example of the opening operation is shown in Figure 12.16 including
the erosion and dilation steps. This is applied to the blood cell image after
thresholding, the same image shown in Figure 12.3 (left side). Since we wish
to eliminate black artifacts in the background, we first invert the image as shown
in Figure 12.16. As can be seen in the final, opened image, there is a reduction
in the number of artifacts seen in the background, but there is also now a gap
created in one of the cell walls. The opening operation would be more effective
on the image in which intermediate values were masked out (Figure 12.3, right
side), and this is given as a problem at the end of the chapter.
Figure 12.17 shows an example of closing applied to the same blood cell
image. Again the operation was performed on the inverted image. This operation
tends to fill the gaps in the center of the cells; but it also has filled in gaps
between the cells. A much more effective approach to filling holes is to use the
imfill routine described in the section on MATLAB implementation.
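The difference between closing and hole filling can be illustrated on a synthetic image with two nearby objects, one of which contains a hole. This is a sketch under stated assumptions: the image, the 5-by-5 neighborhood, and the variable names are made up, and imfill is called with the 'holes' option.

```matlab
% Sketch: filling holes with imfill instead of closing.
BW = false(16,16);
BW(3:9,3:9) = true;                           % First object
BW(5:7,5:7) = false;                          % 3-by-3 hole in the first object
BW(3:9,12:14) = true;                         % Second object, 2 pixels away
BW_fill  = imfill(BW, 'holes');               % Fill enclosed holes only
BW_close = imclose(BW, true(5));              % Closing, 5-by-5 neighborhood
gap_filled_by_close = all(all(BW_close(3:9,10:11)));  % Gap between objects
gap_filled_by_fill  = any(any(BW_fill(3:9,10:11)));
```

Both operations fill the hole, but closing also bridges the narrow gap between the two objects, whereas imfill leaves the gap untouched.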
Other MATLAB morphological routines provide local maxima and minima, and
allow for manipulating the image’s maxima and minima to implement various
fill-in effects.
FIGURE 12.16 Example of the opening operation to remove small artifacts. Note
that the final image has fewer background spots, but now one of the cells has a
gap in the wall.
MATLAB Implementation
The erosion and dilation could be implemented using the nonlinear filter routine
nlfilter, although this routine limits the shape of the neighborhood to a rect-
angle. The MATLAB routines imdilate and imerode provide for a variety of
neighborhood shapes and are much faster than nlfilter. As mentioned above,
opening consists of erosion followed by dilation and closing is the reverse.
MATLAB also provides routines for implementing these two operations in one
statement.
To specify the neighborhood used by all of these routines, MATLAB uses
a structuring element.* A structuring element can be defined by a binary array,
where the ones represent the neighborhood and the zeros are irrelevant. This
allows for easy specification of neighborhoods that are nonrectangular, indeed
that can have any arbitrary shape. In addition, MATLAB makes a number of
popular shapes directly available, just as the fspecial routine makes a number
*Not to be confused with a similar term, structural unit, used in the beginning of this chapter. A
structural unit is the object of interest in the image.
FIGURE 12.17 Example of closing to fill gaps. In the closed image, some of the
cells are now filled, but some of the gaps between cells have been erroneously
filled in.
of popular two-dimensional filter functions available. The routine to specify the
structuring element is strel and is called as:
structure = strel(shape, NH, arg);
where shape is the type of shape desired, NH usually specifies the size of the
neighborhood, and arg is an argument, frequently optional, that depends on
shape. If shape is ‘arbitrary’, or simply omitted, then NH is an array that
specifies the neighborhood in terms of ones as described above. Prepackaged
shapes include:
'disk'       a circle of radius NH (in pixels)
'line'       a line of length NH and angle arg in degrees
'rectangle'  a rectangle where NH is a two-element vector specifying rows and
             columns
'diamond'    a diamond where NH is the distance from the center to each corner
'square'     a square with linear dimensions NH
For many of these shapes, the routine strel produces a decomposed
structure that runs significantly faster.
Based on the structure, the statements for dilation, erosion, opening, and
closing are:
I1 = imdilate(I, structure);
I1 = imerode(I, structure);
I1 = imopen(I, structure);
I1 = imclose(I, structure);
where I1 is the output image, I is the input image and structure is the neigh-
borhood specification given by strel, as described above. In all cases, struc-
ture can be replaced by an array specifying the neighborhood as ones, bypass-
ing the strel routine. In addition, imdilate and imerode have optional
arguments that provide packing and unpacking of the binary input or output
images.
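The calling forms above can be exercised on a small synthetic image. The sketch below is illustrative only: the image and variable names are assumptions, and the cross-shaped neighborhood is an arbitrary choice used to show that a structuring element, a two-step erosion and dilation, and a plain array of ones and zeros give the same opening.

```matlab
% Sketch: structuring elements and the one-statement opening.
BW = false(20,20);
BW(5:14,5:14) = true;                    % 10-by-10 object
BW(17,17) = true;                        % One-pixel artifact
%
SE_disk = strel('disk', 2);              % Disk of radius 2 pixels
SE_line = strel('line', 5, 0);           % Horizontal line, 5 pixels long
nhood   = [0 1 0; 1 1 1; 0 1 0];         % Cross-shaped neighborhood
SE_arb  = strel('arbitrary', nhood);     % Structuring element from an array
%
BW_open1 = imopen(BW, SE_arb);                      % Single statement
BW_open2 = imdilate(imerode(BW, SE_arb), SE_arb);   % Two-step equivalent
BW_open3 = imopen(BW, nhood);                       % Array in place of strel
BW_open_disk = imopen(BW, SE_disk);      % Opening with the disk
BW_dil_line  = imdilate(BW, SE_line);    % Dilation along the horizontal
```

All of the openings remove the single-pixel artifact; the shape of the structuring element determines how the corners of the larger object are treated.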
Example 12.5 Apply opening and closing to the thresholded blood cell
images of Figure 12.3 in an effort to remove small background artifacts and to
fill holes. Use a circular structure with a radius of four pixels.
% Example 12.5 and Figures 12.16 and 12.17
% Demonstration of morphological opening to eliminate small
% artifacts and of morphological closing to fill gaps
% These operations will be applied to the thresholded blood cell
% images of Figure 12.3 (left image).
% Uses a circular or disk shaped structure with a radius of 4 pixels
%
clear all; close all;
I = imread('blood1.tif');               % Get image and threshold
I = im2double(I);
BW = ~im2bw(I,graythresh(I));
%
SE = strel('disk',4);                   % Define structure: disk of radius
                                        % 4 pixels
BW1 = imerode(BW,SE);                   % Opening operation: erode
BW2 = imdilate(BW1,SE);                 % image first, then dilate
%
.......display images.....
%
BW3 = imdilate(BW,SE);                  % Closing operation, dilate image
BW4 = imerode(BW3,SE);                  % first then erode
%
.......display images.....
This example produced the images in Figures 12.16 and 12.17.
Example 12.6 Apply an opening operation to remove the dark patches
seen in the thresholded cell image of Figure 12.15.
% Example 12.6 and Figure 12.18
% Use opening to remove the dark patches in the thresholded cell
% image of Figure 12.15
%
close all; clear all;
%
SE = strel('square',5); % Define opening structure:
% square 5 pixels on a side
load fig12_15; % Get data of Figure 12.15 (BW2)
BW1 = ~imopen(~BW2,SE); % Opening operation
.......Display images.....
The result of this operation is shown in Figure 12.18. In this case, the
opening operation, applied to the inverted image, is able to remove completely
the dark patches in the center of
the cell image. A 5-by-5 pixel square structural element was used. The size (and
shape) of the structural element controlled the size of artifact removed, and no
attempt was made to optimize its shape. The size was set here as the minimum
that would still remove all of the dark patches. The opening operation in this
example used the single statement imopen. Again, the opening operation operates
on activated (i.e., white) pixels, so to remove dark artifacts it is necessary
to invert the image (using the logical NOT operator, ~) before performing the
opening operation. The opened image is then inverted again before display.
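The invert, open, invert sequence can be tried on a synthetic image without the data file of Figure 12.15. The sketch below is only an illustration: the image, the position of the dark patch, and the variable names are assumptions, and the 5-by-5 neighborhood is passed directly as an array of ones instead of through strel.

```matlab
% Sketch: remove a dark patch inside a white object (synthetic data, assumed)
BW = false(40,40);
BW(11:30,11:30) = true;        % White "cell" on a black background
BW(19:21,19:21) = false;       % 3-by-3 dark patch inside the cell
SE = ones(5);                  % 5-by-5 square neighborhood given as ones
BW1 = ~imopen(~BW,SE);         % Invert, open, then invert back
target = false(40,40);
target(11:30,11:30) = true;    % The cell with the patch filled in
patch_removed = all(all(BW1(19:21,19:21)));
unchanged_elsewhere = isequal(BW1,target);
```

Because the 3-by-3 patch cannot contain the 5-by-5 neighborhood, it is eliminated from the inverted image, whereas the much larger background region is left as it was.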
FIGURE 12.18 Application of the open operation to remove the dark patches in
the binary cell image in Figure 12.15 (lower right). Using a 5 by 5 square struc-
tural element resulted in eliminating all of the dark patches.
MATLAB morphology routines also allow for manipulation of maxima
and minima in an image. This is useful for identifying objects, and for filling.
Of the many other morphological operations supported by MATLAB, only the
imfill operation will be described here. This operation begins at a designated
pixel and changes connected background pixels (0’s) to foreground pixels (1’s),
stopping only when a boundary is reached. For grayscale images, imfill brings
the intensity levels of the dark areas that are surrounded by lighter areas up to
the same intensity level as surrounding pixels. (In effect, imfill removes re-
gional minima that are not connected to the image border.) The initial pixel can
be supplied to the routine or obtained interactively. Connectivity can be defined
as either four connected or eight connected. In four connectivity, only the four
pixels bordering the four edges of the pixel are considered, while in eight connectivity
all pixels that touch, including those that touch only at the corners, are
considered connected.
The basic imfill statement is:
I_out = imfill(I, [r c], con);
where I is the input image, I_out is the output image, [r c] is a two-element
vector specifying the beginning point, and con is an optional argument that is
set to 8 for eight connectivity (four connectivity is the default). (See the help
file to use imfill interactively.) A special option of imfill is available specifi-
cally for filling holes. If the image is binary, a hole is a set of background pixels
that cannot be reached by filling in the background from the edge of the image.
If the image is an intensity image, a hole is an area of dark pixels surrounded
by lighter pixels. To invoke this option, the argument following the input image
should be 'holes'. Figure 12.19 shows the operation performed on the blood cell
image by the statement:
I_out = imfill(I, 'holes');
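Both forms of the imfill statement can be compared on a small synthetic ring, as in the sketch below. The ring geometry and the seed location [6 6] are assumptions made for illustration only.

```matlab
% Sketch: hole filling on a synthetic ring (geometry is assumed)
BW = false(12,12);
BW(3:9,3:9) = true;            % Solid 7-by-7 square
BW(5:7,5:7) = false;           % 3-by-3 hole in its center
filled = imfill(BW,'holes');   % Fill all holes automatically
filled_seed = imfill(BW,[6 6]);   % Fill starting from a pixel inside the hole
                                  % (four connectivity, the default)
n_white = nnz(filled);         % Number of white pixels after filling
```

Since the hole is completely enclosed by the ring, the seeded fill cannot leak into the outer background, and in this case the two statements give the same image.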
EDGE-BASED SEGMENTATION
Historically, edge-based methods were the first set of tools developed for seg-
mentation. To move from edges to segments, it is necessary to group edges into
chains that correspond to the sides of structural units, i.e., the structural bound-
aries. Approaches vary in how much prior information they use, that is, how
much is used of what is known about the possible shape. False edges and missed
edges are two of the more obvious, and more common, problems associated
with this approach.
The first step in edge-based methods is to identify edges which then be-
come candidates for boundaries. Some of the filters presented in Chapter 11
FIGURE 12.19 Hole filling operation produced by imfill. Note that neither the
edge cell (at the upper image boundary) nor the overlapped cell in the center is
filled since they are not actually holes. (Original image reprinted with permission
from the Image Processing Handbook 2nd edition. Copyright CRC Press, Boca
Raton, Florida.)
perform edge enhancement, including the Sobel, Prewitt, and LoG filters. In
addition, the Laplacian, which takes the spatial second derivative, can be used
to find edge candidates. The Canny filter is the most advanced edge detector
supported by MATLAB, but it necessarily produces a binary output while many
of the secondary operations require a graded edge image.
Edge relaxation is one approach used to build chains from individual edge
candidate pixels. This approach takes into account the local neighborhood: weak
edges positioned between strong edges are probably part of the edge, while
strong edges in isolation are likely spurious. The Canny filter incorporates a
type of edge relaxation. Various formal schemes have been devised under this
category. A useful method is described in Sonka et al. (1995) that establishes edges
between pixels (so-called crack edges) based on the pixels located at the end
points of the edge.
Another method for extending edges into chains is termed graph search-
ing. In this approach, the endpoints (which could both be the same point in a
closed boundary) are specified, and the edge is determined based on minimizing
some cost function. Possible pathways between the endpoints are selected from
candidate pixels, those that exceed some threshold. The actual path is selected
based on a minimization of the cost function. The cost function could include
features such as the strength of an edge pixel and total length, curvature, and
proximity of the edge to other candidate borders. This approach allows for a
great deal of flexibility. Finally, dynamic programming, which is also based on
minimizing a cost function, can be used.
The methods briefly described above use local information to build up the
boundaries of the structural elements. Details of these methods can be found in
Sonka et al. (1995). Model-based edge detection methods can be used to exploit
prior knowledge of the structural unit. For example, if the shape and size of the
object are known, then a simple matching approach based on correlation can be
used (matched filtering). When the general shape is known, but not the size, the
Hough transform can be used. This approach was originally designed for identi-
fying straight lines and curves, but can be expanded to other shapes provided
the shape can be described analytically.
The basic idea behind the Hough transform is to transform the image into
a parameter space that is constructed specifically to describe the desired shape
analytically. Maxima in this parameter space then correspond to the presence of
the desired shape in image space. For example, if the desired object is a straight
line (the original application of the Hough transform), one analytic representa-
tion for this shape is y = mx + b,* and such shapes can be completely defined
by a two-dimensional parameter space of m and b parameters. All straight lines
in image space map to points in parameter space (also known as the accumula-
tor array for reasons that will become obvious). Operating on a binary image
of edge pixels, all possible lines through a given pixel are transformed into m,b
combinations, which then increment the accumulator array. Hence, the accumu-
lator array accumulates the number of potential lines that could exist in the
image. Any active pixel will give rise to a large number of possible line slopes,
m, but only a limited number of m,b combinations. If the image actually contains
a line, then the accumulator element that corresponds to that particular line’s
m,b parameters will have accumulated a large number. The accumulator array
is searched for maxima, or supra threshold locations, and these locations identify
a line or lines in the image.
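The accumulation process described above can be sketched in a few lines of basic MATLAB. The fragment below is an illustration only, not a toolbox routine: the synthetic edge pixels lie on the line y = 2x + 3, and the candidate slopes, the integer intercept bins, and all variable names are assumptions.

```matlab
% Sketch: m,b accumulator for a straight-line Hough transform (assumed setup)
x = 0:9;                       % Edge pixel coordinates lying on
y = 2*x + 3;                   % the line y = 2x + 3
m_vals = -3:0.5:3;             % Candidate slopes
b_vals = -40:40;               % Candidate intercepts (integer bins)
acc = zeros(numel(m_vals),numel(b_vals));   % Accumulator array
for k = 1:numel(x)             % For every edge pixel
    for i = 1:numel(m_vals)    % and every candidate slope
        b = round(y(k) - m_vals(i)*x(k));   % Intercept of line through pixel
        j = b - b_vals(1) + 1;              % Column index of that intercept
        if j >= 1 && j <= numel(b_vals)
            acc(i,j) = acc(i,j) + 1;        % Increment accumulator
        end
    end
end
[peak,idx] = max(acc(:));      % Search accumulator for its maximum
[i_max,j_max] = ind2sub(size(acc),idx);
m_hat = m_vals(i_max);         % Slope of the strongest line
b_hat = b_vals(j_max);         % Intercept of the strongest line
```

Each pixel votes for one intercept per candidate slope, but only the cell at the true slope and intercept receives a vote from every pixel, so the maximum count equals the number of pixels on the line.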
This concept can be generalized to any shape that can be described analyt-
ically, although the parameter space (i.e., the accumulator) may have to include
several dimensions. For example, to search for circles note that a circle can be
defined in terms of three parameters, a, b, and r, for the equation given below.
$$(y - a)^2 + (x - b)^2 = r^2 \tag{1}$$
where a and b define the center point of the circle and r is the radius. Hence
the accumulator space must be three-dimensional to represent a, b, and r.
*This representation of a line will not be able to represent vertical lines since m → ∞ for a vertical
line. However, lines can also be represented in two dimensions using polar coordinates, r and
θ: r = x cos θ + y sin θ.
MATLAB Implementation
Of the techniques described above, only the Hough transform is supported by
MATLAB image processing routines, and then only for straight lines. It is sup-
ported as the Radon transform which computes projections of the image along
a straight line, but this projection can be done at any angle.* This results in a
projection matrix that is the same as the accumulator array for a straight line
Hough transform when expressed in polar coordinates.
The Radon transform is implemented by the statement:
[R, xp] = radon(BW, theta);
where BW is a binary input image and theta is the projection angle in degrees,
usually a vector of angles. If not specified, theta defaults to (0:179). R is the
projection array where each column is the projection at a specific angle. (R is a
column vector if theta is a constant.) Hence, maxima in R correspond to the
positions of lines (encoded as an angle and distance) in the image. An example of the
use of radon to perform the Hough transformation is given in Example 12.7.
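Before working with a real image, the relationship between a line and the maximum of R can be checked on a synthetic image that contains a single line. In the sketch below the image size and line position are assumptions; the line is horizontal and lies 21 rows from the center row of the image.

```matlab
% Sketch: locate a single synthetic line using radon (assumed test image)
BW = zeros(101,101);
BW(30,:) = 1;                  % Horizontal line, 21 rows from center row 51
theta = 0:179;                 % Projection angles in degrees
[R,xp] = radon(BW,theta);      % Hough (Radon) transform
[~,idx] = max(R(:));           % Find maximum element of accumulator
[r,c] = ind2sub(size(R),idx);
theta_max = theta(c);          % Angle perpendicular to the line
dist_max = xp(r);              % Signed distance of the line from the center
```

The maximum should occur at 90 deg, the direction perpendicular to a horizontal line, with a distance of magnitude close to 21 pixels. The sign of the distance depends on the axis convention used by radon.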
Example 12.7 Find the strongest line in the image of Saturn in image
file ‘saturn.tif’. Plot that line superimposed on the image.
Solution First convert the image to an edge array using MATLAB’s
edge routine. Use the Hough transform (implemented for straight lines using
radon) to build an accumulator array. Find the maximum point in that array
(using max) which will give theta, the angle perpendicular to the line, and the
distance along that perpendicular line of the intersection. Convert that line to
rectangular coordinates, then plot the line superimposed on the image.
% Example 12.7 Example of the Hough transform
% (implemented using 'radon') to identify lines in an image.
% Use the image of Saturn in 'saturn.tif'
%
clear all; close all;
radians = 2*pi/360; % Convert from degrees to radians
I = imread('saturn.tif'); % Get image of Saturn
theta = 0:179; % Define projection angles
BW = edge(I,.02); % Threshold image, threshold set
[R,xp] = radon(BW,theta); % Hough (Radon) transform
% Convert to indexed image
[X, map] = gray2ind(mat2gray(R));
%
subplot(1,2,1); imshow(BW) % Display results
title('Saturn ~ Thresholded');
subplot(1,2,2); imshow(X, hot);
% The hot colormap gives better
% reproduction
%
[M, c] = max(max(R)); % Find maximum element
[M, r] = max(R(:,c));
% Convert to rectangular coordinates
[ri ci] = size(BW); % Size of image array
[ra ca] = size(R); % Size of accumulator array
m = tan((c-90)*radians); % Slope from theta
b = -r/cos((c-90)*radians); % Intercept from basic
% trigonometry
x = (0:ci);
y = m*x + b; % Construct line
subplot(1,2,1); hold on;
plot(x,-y,'r'); % Plot line on graph
subplot(1,2,2); hold on;
plot(c, ra-r,'*k'); % Plot maximum point in
% accumulator
*The Radon transform is an important concept in computed tomography (CT) as described in a
following section.
This example produces the images shown in Figure 12.20. The broad
white line superimposed is the line found as the most dominant using the Hough
transform. The location of this in the accumulator or parameter space array is
shown in the right-hand image. Other points nearly as strong (i.e., bright) can
be seen in the parameter array which represent other lines in the image. Of
course, it is possible to identify these lines as well by searching for maxima
other than the global maximum. This is done in a problem below.
PROBLEMS
1. Load the blood cell image (blood1.tif). Filter the image with two lowpass
filters, one having a weak cutoff (for example, Gaussian with an alpha of 0.5)
and the other having a strong cutoff (alpha > 4). Threshold the two filtered
images using the maximum variance routine (graythresh). Display the original
and filtered images along with their histograms. Also display the thresholded
images.
2. The Laplacian filter which calculates the second derivative can also be used
to find edges. In this case edges will be located where the second derivative is
near zero. Load the image of the spine (‘spine.tif’) and filter using the
Laplacian filter (use the default constant). Then threshold this image using
FIGURE 12.20 Thresholded image of Saturn (from MATLAB’s saturn.tif) with
the dominant line found by the Hough transform. The right image is the accumula-
tor array with the maximum point indicated by an ‘*’. (Original image is a public
domain image courtesy of NASA, Voyager 2 image, 1981-08-24.)
islice. The threshold values should be on either side of zero and should be
quite small (< 0.02) since you are interested in values quite close to zero.
3. Load image ‘texture3.tif’ which contains three regions having the same
average intensities but different textural patterns. Before applying the nonlinear
range operator used in Example 12.2, preprocess with a Laplacian filter (alpha =
0.5). Apply the range operator as in Example 12.2 using nlfilter. Plot original
and range images along with their histograms. Threshold the range image to
isolate the segments and compare with the figures in the book. (Hint: You may
have to adjust the thresholds slightly, but you do not have to rerun the time-
consuming range operator to adjust these thresholds.) You should observe a
modest improvement: one of the segments can now be perfectly separated.
4. Load the texture orientation image texture4.tif. Separate the segments
as well as possible by using a Sobel operator followed by a standard deviation
operator implemented using nlfilter. (Note you will have to multiply the
standard deviation image by around 4 to get it into an appropriate range.) Plot
the histogram and use it to determine the best boundaries for separating the
three segments. Display the three segments as white objects.
5. Load the thresholded image of Figure 12.5 (found as Fig12_5.tif on the
disk) and use opening to eliminate as many points as possible in the upper field
without affecting the lower field. Then use closing to try to blacken as many
points as possible in the lower field without affecting the upper field. (You
should be able to blacken the lower field completely except for edge effects.)
13
Image Reconstruction
Medical imaging utilizes several different physical principles or imaging modalities.
Common modalities used clinically include x-ray, computed tomography
(CT), positron emission tomography (PET), single photon emission computed
tomography (SPECT), and ultrasound. Other approaches under development include
optical imaging* and impedance tomography. Except for simple x-ray
images which provide a shadow of intervening structures, some form of image
processing is required to produce a useful image. The algorithms used for image
reconstruction depend on the modality. In magnetic resonance imaging (MRI),
reconstruction techniques are fairly straightforward, requiring only a two-dimen-
sional inverse Fourier transform (described later in this chapter). Positron emis-
sion tomography (PET) and computed tomography use projections from colli-
mated beams and the reconstruction algorithm is critical. The quality of the
image is strongly dependent on the image reconstruction algorithm.†
*Of course, optical imaging is used in microscopy, but because of scattering it presents serious
problems when deep tissues are imaged. A number of advanced image processing methods are under
development to overcome problems due to scattering and provide useful images using either coher-
ent or noncoherent light.
†CT may be the first instance where the analysis software is an essential component of medical
diagnosis and comes between the physician and patient: the physician has no recourse but to trust
the software.
CT, PET, AND SPECT
Reconstructed images from PET, SPECT, and CT all use collimated beams
directed through the target, but they vary in the mechanism used to produce
these collimated beams. CT is based on x-ray beams produced by an external
source that are collimated by the detector: the detector includes a collimator,
usually a long tube that absorbs diagonal or off-axis photons. A similar approach
is used for SPECT, but here the photons are produced by the decay of a radioac-
tive isotope within the patient. Because of the nature of the source, the beams
are not as well collimated in SPECT, and this leads to an unavoidable reduction
in image resolution. Although PET is also based on photons emitted from a
radioactive isotope, the underlying physics provide an opportunity to improve
beam collimation through so-called electronic collimation. In PET, the radioac-
tive isotope emits a positron. Positrons are short lived, and after traveling only
a short distance, they interact with an electron. During this interaction, their
masses are annihilated and two photons are generated traveling in opposite di-
rections, 180 deg. from one another. If two separate detectors are activated at
essentially the same time, then it is likely a positron annihilation occurred some-
where along a line connecting these two detectors. This coincident detection
provides an electronic mechanism for establishing a collimated path that traverses
the original positron emission. Note that since the positron does not annihilate
immediately, but may travel several millimeters in any direction before annihilation,
there is an inherent limitation on resolution.
In all three modalities, the basic data consists of measurements of the
absorption of x-rays (CT) or concentrations of radioactive material (PET and
SPECT), along a known beam path. From this basic information, the reconstruc-
tion algorithm must generate an image of either the tissue absorption character-
istics or isotope concentrations. The mathematics are fairly similar for both
absorption and emission processes and will be described here in terms of absorp-
tion processes; i.e., CT. (See Kak and Slaney (1988) for a mathematical descrip-
tion of emission processes.)
In CT, the intensity of an x-ray beam is dependent on the intensity of the
source, Io, the absorption coefficient, µ, and length, R, of the intervening tissue:
$$I(x,y) = I_o e^{-\mu R} \tag{1}$$
where I(x,y) is the beam intensity (proportional to number of photons) at posi-
tion x,y. If the beam passes through tissue components having different absorp-
tion coefficients then, assuming the tissue is divided into equal sections ∆R, Eq.
(1) becomes:
$$I(x,y) = I_o \exp\left(-\sum_i \mu_i(x,y)\,\Delta R\right) \tag{2}$$
The projection p(x,y), is the log of the intensity ratio, and is obtained by
dividing out Io and taking the natural log:
$$p(x,y) = \ln\left(\frac{I_o}{I(x,y)}\right) = \sum_i \mu_i(x,y)\,\Delta R \tag{3}$$
Eq. (3) is also expressed as a continuous equation where it becomes the
line integral of the attenuation coefficients from the source to the detector:
$$p(x,y) = \int_{\text{Source}}^{\text{Detector}} \mu(x,y)\,dR \tag{4}$$
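The relationship between beam intensity and projection in Eqs. (2) and (3) can be confirmed numerically for a single beam. In the sketch below, the source intensity, the absorption coefficients, and the segment length are arbitrary assumed values.

```matlab
% Sketch: attenuation of one beam and the resulting projection (assumed values)
Io = 1000;                     % Source intensity (arbitrary units)
mu = [0.2 0.5 0.1 0.3];        % Absorption coefficients along the beam path
dR = 0.5;                      % Length of each tissue section
I = Io*exp(-sum(mu*dR));       % Detected intensity, Eq. (2)
p = log(Io/I);                 % Projection from the intensity ratio, Eq. (3)
p_sum = sum(mu*dR);            % Projection as the sum of absorptions, Eq. (3)
```

The two projection values agree to within rounding error, which is why taking the log of the intensity ratio converts the measured data into a simple sum (or line integral) of the absorption coefficients.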
Figure 13.1A shows a series of collimated parallel beams traveling
through tissue.* All of these beams are at the same angle, θ, with respect to the
reference axis. The output of each beam is just the projection of absorption
characteristics of the intervening tissue as defined in Eq. (4). The projections of
all the individual parallel beams constitute a projection profile of the intervening
FIGURE 13.1 (A) A series of parallel beam paths at a given angle, θ, is projected
through biological tissue. The net absorption of each beam can be plotted as a
projection profile. (B) A large number of such parallel paths, each at a different
angle, is required to obtain enough information to reconstruct the image.
*In modern CT scanners, the beams are not parallel, but dispersed in a spreading pattern from a
single source to an array of detectors, a so-called fan beam pattern. To simplify the analysis pre-
sented here, we will assume a parallel beam geometry. Kak and Slaney (1988) also cover the
derivation of reconstruction algorithms for fan beam geometry.
tissue absorption coefficients. With only one projection profile, it is not possible
to determine how the tissue absorptions are distributed along the paths. How-
ever, if a large number of projections are taken at different angles through the
tissue, Figure 13.1B, it ought to be possible, at least in principle, to estimate the
distribution of absorption coefficients from some combined analysis applied to
all of the projections. This analysis is the challenge given to the CT reconstruc-
tion algorithm.
If the problem were reversed, that is, if the distribution of tissue absorption
coefficients was known, determining the projection profile produced by a set of
parallel beams would be straightforward. As stated in Eq. (4), the output of
each beam is the line integral over the beam path through the tissue. If the beam
is at an angle, θ (Figure 13.2), then the equation for a line passing through the
origin at angle θ is:
$$x \cos\theta + y \sin\theta = 0 \tag{5}$$
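and the projection for that single line at a fixed angle, pθ, becomes:
$$p_\theta = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} I(x,y)\,\delta(x\cos\theta + y\sin\theta)\,dx\,dy \tag{6}$$
where I(x,y) now denotes the distribution of absorption coefficients (written as
µ(x,y) in Eq. (2)) and the Dirac delta, δ, restricts the integral to points on the
line of Eq. (5). If the beam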
is displaced a distance, r, from the axis in a direction perpendicular to θ, Figure
13.2, the equation for that path is:
FIGURE 13.2 A single beam path is defined mathematically by the equation given
in Eq. (5).
$$x \cos\theta + y \sin\theta - r = 0 \tag{7}$$
The whole family of parallel paths can be mathematically defined using
Eqs. (6) and (7) combined with the Dirac delta distribution, δ, to represent the
discrete parallel beams. The equation describing the entire projection profile,
pθ(r), becomes:
$$p_\theta(r) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} I(x,y)\,\delta(x\cos\theta + y\sin\theta - r)\,dx\,dy \tag{8}$$
This equation is known as the Radon transform, ℛ. It is the same as the
Hough transform (Chapter 12) for the case of straight lines. The expression for
pθ(r) can also be written succinctly as:
$$p_\theta(r) = \mathcal{R}[I(x,y)] \tag{9}$$
The forward Radon transform can be used to generate raw CT data from
image data, useful in problems, examples, and simulations. This is the approach
that is used in some of the examples given in the MATLAB Implementation
section, and also to generate the CT data used in the problems.
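A minimal sketch of generating such simulated CT data is given below. The test image (a white square on a black background) and its size are assumptions. As a simple check, the sum of each projection profile should match the sum of the image, since every pixel contributes to every profile.

```matlab
% Sketch: simulated CT projection data from a synthetic image (assumed)
I = zeros(128,128);
I(44:84,44:84) = 1;            % White square on a black background
theta = 0:179;                 % Projection angles in degrees
[p,xp] = radon(I,theta);       % One projection profile per column of p
n_profiles = size(p,2);        % Number of projection profiles
total_0  = sum(p(:,1));        % Sum of the profile at 0 deg
total_45 = sum(p(:,46));       % Sum of the profile at 45 deg
image_total = sum(I(:));       % Total absorption in the image
```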
The Radon transform is helpful in understanding the problem, but does
not help in the actual reconstruction. Reconstructing the image from the projec-
tion profiles is a classic inverse problem. You know what comes out—the pro-
jection profiles—but want to know the image (or, in the more general case, the
system), that produced that output. From the definition of the Radon transform
in Eq. (9), the image should result from the application of an inverse Radon
transform, ℛ⁻¹, to the projection profiles, pθ(r):
$$I(x,y) = \mathcal{R}^{-1}[p_\theta(r)] \tag{10}$$
While the Radon transform (Eqs. (8) and (9)) and inverse Radon trans-
form (Eq. (10)) are expressed in terms of continuous variables, in imaging sys-
tems the absorption coefficients are given in terms of discrete pixels, I(n,m),
and the integrals in the above equations become summations. In the discrete
situation, the absorption of each pixel is an unknown, and each beam path pro-
vides a single projection ratio that is the solution to a multi-variable equation.
If the image contains N by M pixels, and there are N × M different projections
(beam paths) available, then the system is adequately determined, and the recon-
struction problem is simply a matter of solving a large number of simultaneous
equations. Unfortunately, the number of simultaneous equations that must be
solved is generally so large that a direct solution becomes unworkable. The early
attempts at CT reconstruction used an iterative approach called the algebraic
reconstruction technique, or ART. In this algorithm, each pixel was updated
based on errors between projections that would be obtained from the current
pixel values and the actual projections. When many pixels were involved, convergence
was slow and the algorithm was computationally intensive and time-
consuming. Current approaches can be classified as either transform methods or
series expansion methods. The filtered back-projection method described below
falls into the first category and is one of the most popular of CT reconstruction
approaches.
Filtered back-projection can be described in either the spatial or spatial
frequency domain. While often implemented in the latter, the former is more
intuitive. In back-projection, each pixel absorption coefficient is set to the sum
(or average) of the values of all projections that traverse the pixel. In other
words, each projection that traverses a pixel contributes its full value to the
pixel, and the contributions from all of the beam paths that traverse that pixel
are simply added or averaged. Figure 13.3 shows a simple 3-by-3 pixel grid
with a highly absorbing center pixel (absorption coefficient of 8) against a background
of less absorbing pixels. Three projection profiles are shown traversing
the grid horizontally, vertically, and diagonally. The lower grid shows the image
that would be reconstructed using back-projection alone. Each pixel contains the
average of the projections through that pixel. This reconstructed image resembles
the original with a large central value surrounded by smaller values, but the
background is no longer constant. This background variation is the result of
blurring or smearing the central image over the background.
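The smearing effect can be reproduced with a few lines of basic MATLAB. The sketch below is a simplified version of the situation in Figure 13.3: it uses the same 3-by-3 grid (center 8, background 2), but, as an assumption made to keep the code short, only the horizontal and vertical projections are back-projected, so the numbers differ from those in the figure.

```matlab
% Sketch: back-projection on a 3-by-3 grid using two projections (simplified)
I = [2 2 2; 2 8 2; 2 2 2];     % Original absorption coefficients
p_h = sum(I,2);                % Horizontal beams: one value per row
p_v = sum(I,1);                % Vertical beams: one value per column
% Each pixel is set to the average of the beams that cross it
bp = (repmat(p_h,1,3) + repmat(p_v,3,1))/2;
```

The result is [6 9 6; 9 12 9; 6 9 6]. The center value is still the largest, but the originally uniform background now takes two different values, which is the smearing described above.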
To correct the blurring or smoothing associated with the back-projection
method, a spatial filter can be used. Since the distortion is in the form of a
blurring or smoothing, spatial differentiation is appropriate. The most common
filter is a pure derivative up to some maximum spatial frequency. In the fre-
quency domain, this filter, termed the Ram-Lak filter, is a ramp up to some
maximum cutoff frequency. As with all derivative filters, high-frequency noise
will be increased, so this filter is often modified by the addition of a lowpass
filter. Lowpass filters that can be used include the Hamming window, the Han-
ning window, a cosine window, or a sinc function window (the Shepp-Logan
filter). (The frequency characteristics of these filters are shown in Figure 13.4).
Figure 13.5 shows a simple image of a light square on a dark background. The
projection profiles produced by the image are also shown (calculated using the
Radon transform).
The back-projection reconstruction of this image shows a blurred version
of the basic square form with indistinct borders. Application of a highpass filter
sharpens the image (Figure 13.5). The MATLAB implementation of the inverse
Radon transform, iradon described in the next section, uses the filtered back-
projection method and also provides for all of the filter options.
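The effect of the filter can be seen by reconstructing the same projections with and without it. The fragment below is a sketch under assumed conditions (a synthetic square image, 180 projection angles, linear interpolation, and an output size forced to that of the original image); it compares back-projection alone, requested with the 'none' filter option, against the Ram-Lak filter.

```matlab
% Sketch: back-projection with and without filtering (assumed test image)
I = zeros(128,128);
I(44:84,44:84) = 1;            % White square on a black background
theta = 0:179;                 % Projection angles in degrees
p = radon(I,theta);            % Simulated projection profiles
I_bp  = iradon(p,theta,'linear','none',1,128);     % Back-projection only
I_fbp = iradon(p,theta,'linear','Ram-Lak',1,128);  % Filtered back-projection
err_bp  = sqrt(mean((I_bp(:)  - I(:)).^2));   % RMS error, unfiltered
err_fbp = sqrt(mean((I_fbp(:) - I(:)).^2));   % RMS error, filtered
```

The RMS error of the filtered reconstruction should be much smaller than that of the unfiltered one, which is both blurred and incorrectly scaled.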
Filtered back-projection is easiest to implement in the frequency domain.
The Fourier slice theorem states that the one-dimensional Fourier transform of
a projection profile forms a single, radial line in the two-dimensional Fourier
transform of the image. This radial line will have the same angle in the spatial
FIGURE 13.3 Example of back-projection on a simple 3-by-3 pixel grid. The up-
per grid represents the original image which contains a dark (absorption 8) center
pixel surrounded by lighter (absorption 2) pixels. The projections are taken as the
linear addition of all intervening pixels. In the lower reconstructed image, each
pixel is set to the average of all beams that cross that pixel. (Normally the sum
would be taken over a much larger set of pixels.) The center pixel is still higher
in absorption, but the background is no longer the same. This represents a
smearing of the original image.
frequency domain as the projection angle (Figure 13.6). Once the two-dimen-
sional Fourier transform space is filled from the individual one-dimensional
Fourier transforms of the projection profiles, the image can be constructed by
applying the inverse two-dimensional Fourier transform to this space. Before
the inverse transform is done, the appropriate filter can be applied directly in
the frequency domain using multiplication.
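A discrete version of the Fourier slice theorem can be verified directly for the simplest case, a projection taken along the vertical direction. The sketch below uses only basic MATLAB routines; the test image is an assumption, and the projection is computed as a simple column sum rather than with radon.

```matlab
% Sketch: discrete check of the Fourier slice theorem for one angle (assumed image)
I = zeros(64,64);
I(20:40,25:45) = 1;            % Simple rectangular object
p0 = sum(I,1);                 % Projection along the vertical direction
P0 = fft(p0);                  % One-dimensional Fourier transform of profile
F = fft2(I);                   % Two-dimensional Fourier transform of image
slice = F(1,:);                % Line through the origin (zero vertical freq.)
max_diff = max(abs(P0 - slice));   % Should be zero to within rounding error
```

For other angles the radial line does not fall on the rectangular frequency grid, so a practical implementation must interpolate when filling the two-dimensional transform space.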
As with other images, reconstructed CT images can suffer from aliasing
if they are undersampled. Undersampling can be the result of an insufficient
FIGURE 13.4 Magnitude frequency characteristics of four common filters used in
filtered back-projection. They all show highpass characteristics at lower frequen-
cies. The cosine filter has the same frequency characteristics as the two-point
central difference algorithm.
number of parallel beams in the projection profile or too few rotation angles.
The former is explored in Figure 13.7 which shows the square pattern of Figure
13.5 sampled with one-half (left-hand image) and one-quarter (right-hand im-
age) the number of parallel beams used in Figure 13.5. The images have been
multiplied by a factor of 10 to enhance the faint aliasing artifacts. One of the
problems at the end of this chapter explores the influence of undersampling by
reducing the number of angular rotations as well as reducing the number of
parallel beams.
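The influence of too few rotation angles can be previewed with a short sketch. The test image, the two angle spacings (1 deg and 10 deg), and the use of RMS error as a quality measure are all assumptions made for illustration.

```matlab
% Sketch: reconstruction with many and with few rotation angles (assumed setup)
I = zeros(128,128);
I(44:84,44:84) = 1;            % White square on a black background
theta_fine = 0:179;            % Projection every 1 deg
theta_coarse = 0:10:170;       % Projection every 10 deg
rec_fine = iradon(radon(I,theta_fine),theta_fine, ...
    'linear','Ram-Lak',1,128);
rec_coarse = iradon(radon(I,theta_coarse),theta_coarse, ...
    'linear','Ram-Lak',1,128);
err_fine = sqrt(mean((rec_fine(:) - I(:)).^2));      % RMS error, 180 angles
err_coarse = sqrt(mean((rec_coarse(:) - I(:)).^2));  % RMS error, 18 angles
```

With only 18 angles the reconstruction typically shows streak artifacts radiating from the edges of the square, and the RMS error is correspondingly larger.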
Fan Beam Geometry
For practical reasons, modern CT scanners use fan beam geometry. This geome-
try usually involves a single source and a ring of detectors. The source rotates
around the patient while those detectors in the beam path acquire the data. This
allows very high speed image acquisition, as short as half a second. The source
fan beam is shaped so that the beam hits a number of detectors simultaneously,
Figure 13.8. MATLAB provides several routines that provide the Radon and
inverse Radon transform for fan beam geometry.
FIGURE 13.5 Image reconstruction of a simple white square against a black
background. Back-projection alone produces a smeared image which can be cor-
rected with a spatial derivative filter. These images were generated using the
code given in Example 13.1.
MATLAB Implementation
Radon Transform
The MATLAB Image Processing Toolbox contains routines that perform both
the Radon and inverse Radon transforms. The Radon transform routine has al-
ready been introduced as an implementation of the Hough transform for straight
line objects. The procedure here is essentially the same, except that an intensity
image is used as the input instead of the binary image used in the Hough trans-
form.
[p, xp] = radon(I, theta);
where I is the image of interest and theta is the projection angle in degrees,
usually a vector of angles. If not specified, theta defaults to (0:179). The output
parameter p is the projection array, where each column is the projection profile
at a specific angle. The optional output parameter, xp, gives the radial
coordinates for each row of p and can be used in displaying the projection data.
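For beams that run parallel to one of the image axes, a projection profile reduces to a sum of the image along that axis, which gives a quick way to see what one column of p contains. The following is a minimal sketch in base MATLAB that does not call radon; the names p_vert and p_horiz are illustrative, and radon itself additionally pads the profile and handles arbitrary angles.

```matlab
% Sketch: projection profiles of the white square for axis-aligned beams
I = zeros(128,128);          % black background
I(44:84,44:84) = 1;          % central white square (41 x 41 pixels)
p_vert  = sum(I,1)';         % beams running down the columns
p_horiz = sum(I,2);          % beams running along the rows
% Each profile is flat (value 41) across the square and zero elsewhere
```

Plotting either profile, for example with plot(p_vert), shows the rectangular pulse that the square produces at these two angles.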
FIGURE 13.6 Schematic representation of the steps in filtered back-projection
using frequency domain techniques. The steps shown are for a single projection
profile and would be repeated for each projection angle.
FIGURE 13.7 Image reconstructions of the same simple pattern shown in Figure
13.5, but undersampled by a factor of two (left image) or four (right image). The
contrast has been increased by a factor of ten to enhance the relatively
low-intensity aliasing patterns.
FIGURE 13.8 A series of beams is projected from a single source in a fan-like
pattern. The beams fall upon a number of detectors arranged in a ring around
the patient. Fan beams typically range between 30 to 60 deg. In the most recent
CT scanners (so-called fourth-generation machines) the detectors completely en-
circle the patient, and the source can rotate continuously.
Inverse Radon Transform: Parallel Beam Geometry
MATLAB’s inverse Radon transform is based on filtered back-projection and
uses the frequency domain approach illustrated in Figure 13.6. A variety of
filtering options are available and are implemented directly in the frequency
domain.
The calling structure of the inverse Radon transform is:
[I,h] = iradon(p,theta,interp,filter,d,n);
where p is the only required input argument and is a matrix where each column
contains one projection profile. The angle of the projection profiles is specified
by theta in one of two ways: if theta is a scalar, it specifies the angular
spacing (in degrees) between projection profiles (the profiles are assumed to lie
at multiples of this spacing, from zero to the number of columns − 1); if theta
is a vector, it specifies the angles themselves, which must be evenly spaced.
The default theta is 180 deg. divided by
the number of columns. During reconstruction, iradon assumes that the center
of rotation is half the number of rows (i.e., the midpoint of the projection pro-
file: ceil(size (p,1)/2)).
The optional argument interp is a string specifying the back-projection
interpolation method: ‘nearest’, ‘linear’ (the default), or ‘spline’. The
filter option is also specified as a string. The ‘Ram-Lak’ option is the default
and consists of a ramp in frequency (i.e., an ideal derivative) up to some
maximum frequency (Figure 13.4). Since this filter is prone to high-frequency
noise, other options multiply the ramp function by a lowpass function.
These lowpass functions are the same as described above: Hamming window
(‘Hamming’), Hanning window (‘Hann’), cosine (‘cosine’), and sinc
(‘Shepp-Logan’) function. Frequency plots of several of these filters are shown
in Figure 13.4. The filter’s frequency characteristics can be modified by the
optional parameter, d, which scales the frequency axis: if d is less than one (the
default value is one) then filter transfer function values above d, in normalized
frequency, are set to 0. Hence, decreasing d increases the lowpass filter effect.
The optional input argument, n, can be used to rescale the image. These filter
options are explored in several of the problems.
The image is contained in the output matrix I (class double), and the
optional output vector, h, contains the filter’s frequency response. (This output
vector was used to generate the filter frequency curves of Figure 13.4.) An
application of the inverse Radon transform is given in Example 13.1.
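The way a lowpass window and the parameter d reshape the ramp can be sketched on a normalized frequency axis. The code below is an illustration in base MATLAB, not the toolbox's internal code: the number of samples N, the value of d, and the assumption that the cosine window is compressed so that it reaches zero at d are all choices made here for the example.

```matlab
% Sketch: ramp (Ram-Lak) and cosine-windowed ramp on a normalized frequency axis
N = 64;                          % number of frequency samples (hypothetical)
f = (0:N-1)'/(N-1);              % normalized frequency: 0 to 1
d = 0.4;                         % cutoff, playing the role of iradon's d
H_ramp = f;                      % ideal derivative: gain proportional to f
H_cos  = f .* cos(pi*f/(2*d));   % ramp times a cosine lowpass compressed to d
H_cos(f > d) = 0;                % values above d are set to 0
H_ramp_d = H_ramp;               % truncated ramp for comparison
H_ramp_d(f > d) = 0;
```

Plotting H_ramp, H_ramp_d, and H_cos against f shows why a smaller d strengthens the lowpass effect: the high-frequency gain that amplifies noise is removed, at the cost of sharpness.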
Example 13.1 Example of the use of back-projection and filtered back-
projection. After a simple image of a white square against a dark background is
generated, the CT projections are constructed using the forward Radon trans-
form. The original image is reconstructed from these projections using both
the filtered and unfiltered back-projection algorithm. The original image, the
projections, and the two reconstructed images are displayed in Figure 13.5 on
page 385.
% Example 13.1 and Figure 13.5.
% Image Reconstruction using back-projection and filtered
% back-projection.
% Uses MATLAB's 'iradon' for filtered back-projection and
% 'i_back' for unfiltered back-projection.
% (This routine is a version of 'iradon' modified to eliminate
% the filter.)
% Construct a simple image consisting of a white square against
% a black background. Then apply back-projection without
% filtering and with the derivative (Ram-Lak) filter.
% Display the original and reconstructed images along with the
% projections.
%
clear all; close all;
%
I = zeros(128,128);               % Construct image: black
I(44:84,44:84) = 1;               % background with a central
                                  % white square
%
% Generate the projections using 'radon'
theta = (1:180);                  % Angle between projections
                                  % is 1 deg.
[p,xp] = radon(I, theta);
%
% Now reconstruct the image
I_back = i_back(p,theta);         % Back-projection alone
I_back = mat2gray(I_back);        % Convert to grayscale
I_filter_back = iradon(p,theta);  % Filtered back-projection
%
.......Display images.......
The display generated by this code is given in Figure 13.5. Example 13.2
explores the effect of filtering on the reconstructed images.
Example 13.2 The inverse Radon transform filters. Generate CT data by
applying the Radon transform to an MRI image of the brain (an unusual exam-
ple of mixed modalities!). Reconstruct the image using the inverse Radon trans-
form with the Ram-Lak (derivative) filter and the cosine filter with a maximum
relative frequency of 0.4. Display the original and reconstructed images.
% Example 13.2 and Figure 13.9 Image Reconstruction using
% filtered back-projection
% Uses MATLAB's 'iradon' for filtered back-projection
% Load a frame of the MRI image (mri.tif) and construct the CT
% projections using 'radon'. Then apply back-projection with
% two different filters: Ram-Lak and cosine (with 0.4 as
% highest frequency)
%
clear all; close all;
frame = 18;                       % Use MR image slice 18
[I(:,:,:,1), map ] = imread('mri.tif',frame);
if ~isempty(map)                  % Check to see if Indexed data
   I = ind2gray(I,map);           % If so, convert to Intensity
                                  % image
end
I = im2double(I);                 % Convert to double and scale
%
% Construct projections of MR image
delta_theta = (1:180);            % Angle between projections
                                  % is 1 deg.
[p,xp] = radon(I,delta_theta);
%
% Reconstruct image using Ram-Lak filter
I_RamLak = iradon(p,delta_theta,'Ram-Lak');
%
.......Display images.......
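The listing above shows only the Ram-Lak reconstruction. The cosine-filtered reconstruction called for in the example statement might be written along the following lines. This is a sketch that assumes the argument order given earlier for iradon (interp, filter, d, n), uses a synthetic square in place of the MR image so that it stands alone, and requires the Image Processing Toolbox; the variable name I_cos and the output size of 128 are illustrative choices.

```matlab
% Sketch: filtered back-projection with the cosine filter limited to d = 0.4
I = zeros(128,128);                          % stand-in test image:
I(44:84,44:84) = 1;                          % white square on black
delta_theta = (1:180);                       % projection angles in deg.
p = radon(I,delta_theta);                    % parallel-beam projections
I_RamLak = iradon(p,delta_theta,'linear','Ram-Lak',1,128);
I_cos    = iradon(p,delta_theta,'linear','cosine',0.4,128);
```

Displaying I_RamLak and I_cos side by side with imshow should show the softer edges expected from the reduced bandwidth.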
Radon and Inverse Radon Transform: Fan Beam Geometry
The MATLAB routines for performing the Radon and inverse Radon transform
using fan beam geometry are termed fanbeam and ifanbeam, respectively, and
have the form:
fan = fanbeam(I,D)
where I is the input image and D is a scalar that specifies the distance between
the beam vertex and the center of rotation of the beams. The output, fan, is a
matrix containing the fan beam projection profiles, where each column contains
the sensor samples at one rotation angle. It is assumed that the sensors have a
one-deg. spacing and the rotation angles are spaced equally over 0 to 359 deg.
A number of optional input variables specify different geometries, sensor
spacing, and rotation increments.
The inverse Radon transform for fan beam projections is specified as:
I = ifanbeam(fan,D)
where fan is the matrix of projections and D is the distance between beam
vertex and the center of rotation. The output, I, is the reconstructed image.
Again there are a number of optional input arguments specifying the same type
of information as in fanbeam. This routine first converts the fan beam geometry
into a parallel geometry, then applies filtered back-projection as in iradon.
During the filtered back-projection stage, it is possible to specify filter options
as in iradon. To specify a filter, the string ‘Filter’ should precede the filter name
(‘Hamming’, ‘Hann’, ‘cosine’, etc.).
FIGURE 13.9 Original MR image and reconstructed images using the inverse
Radon transform with the Ram-Lak derivative and the cosine filter. The cosine
filter’s lowpass cutoff has been modified by setting its maximum relative
frequency to 0.4. The Ram-Lak reconstruction is not as sharp as the original image
and sharpness is reduced further by the cosine filter with its lowered bandwidth.
(Original image from the MATLAB Image Processing Toolbox. Copyright 1993–
2003, The Math Works, Inc. Reprinted with permission.)
Example 13.3 Fan beam geometry. Apply the fan beam and parallel
beam Radon transform to a simple pattern of four squares of different
intensities. Reconstruct the image using the inverse Radon transform for both
geometries.
% Example 13.3 and Figure 13.10
% Example of reconstruction using the Fan Beam Geometry
% Reconstructs a pattern of 4 squares of different intensities
% using parallel beam and fan beam approaches.
%
clear all; close all;
D = 150;                     % Distance between fan beam vertex
                             % and center of rotation
theta = (1:180);             % Angle between parallel
                             % projections is 1 deg.
%
I = zeros(128,128);          % Generate image
I(22:52,22:52) = .25;        % Four squares of different shades
I(76:106,22:52) = .5;        % against a black background
I(22:52,76:106) = .75;
I(76:106,76:106) = 1;
%
% Construct projections: Fan and parallel beam
[F,Floc,Fangles] = fanbeam(I,D,'FanSensorSpacing',.5);
[R,xp] = radon(I,theta);
%
% Reconstruct images. Use Shepp-Logan filter
I_rfb = ifanbeam(F,D,'FanSensorSpacing',.5,'Filter', ...
   'Shepp-Logan');
I_filter_back = iradon(R,theta,'Shepp-Logan');
%
% Display images
subplot(1,2,1);
imshow(I_rfb); title('Fan Beam')
subplot(1,2,2);
imshow(I_filter_back); title('Parallel Beam')
The images generated by this example are shown in Figure 13.10. There
are small artifacts due to the distance between the beam source and the center
of rotation. The effect of this distance is explored in one of the problems.
MAGNETIC RESONANCE IMAGING
Basic Principles
MRI images can be acquired in a number of ways using different image acquisi-
tion protocols. One of the more common protocols, the spin echo pulse sequence,
will be described with the understanding that a fair number of alternatives are
commonly used. In this sequence, the image is constructed on a slice-by-slice
basis, although the data are obtained on a line-by-line basis. For each slice, the
raw MRI data encode the image as a variation in signal frequency in one dimension,
and in signal phase in the other. Reconstructing the image requires only
the application of a two-dimensional inverse Fourier transform to this
frequency/phase encoded data. If desired, spatial filtering can be implemented in
the frequency domain before applying the inverse Fourier transform.
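The reconstruction step can be illustrated with a synthetic slice. In the sketch below the frequency/phase encoded data are simulated by taking the two-dimensional Fourier transform of a test image (an assumption made only so the example is self-contained; a scanner supplies these data directly), and the image is recovered with ifft2, with and without a lowpass mask. The image, the mask radius of 16, and all variable names are hypothetical.

```matlab
% Sketch: simulate frequency/phase encoded data and reconstruct with ifft2
I = zeros(64,64);                           % synthetic "slice" (hypothetical)
I(20:44,24:40) = 1;
K = fft2(I);                                % stand-in for the raw MRI data
[u,v] = meshgrid(0:63,0:63);                % frequency indices
u = min(u,64-u);                            % distance from zero frequency
v = min(v,64-v);
LP = double(sqrt(u.^2 + v.^2) <= 16);       % optional lowpass mask
I_rec    = real(ifft2(K));                  % direct reconstruction
I_smooth = real(ifft2(K .* LP));            % reconstruction after filtering
```

I_rec matches the original image to within rounding error, while I_smooth is a blurred version with the same mean intensity.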
The physics underlying MRI is involved and requires quantum mechanics
for a complete description. However, most descriptions are approximations that
use classical mechanics. The description provided here will be even more
abbreviated than most. (For a detailed classical description of the MRI physics see
Wright’s chapter in Enderle et al., 2000.) Nuclear magnetism occurs in nuclei
with an odd number of nucleons (protons and/or neutrons). In the presence of a
magnetic field such nuclei possess a magnetic dipole due to a quantum mechanical
property known as spin.* In MRI lingo, the nucleus and/or the associated
magnetic dipole is termed a spin. For clinical imaging, the hydrogen proton is
used because it occurs in large numbers in biological tissue. Although there are
a large number of hydrogen protons, or spins, in biological tissue (1 mm^3 of
water contains 6.7 × 10^19 protons), the net magnetic moment that can be
produced, even if they were all aligned, is small due to the near balance between
spin-up (+1/2) and spin-down (−1/2) states. When they are placed in a magnetic
field, the magnetic dipoles are not static, but rotate around the axis of the applied
magnetic field like spinning tops (Figure 13.11A); hence, the spins themselves
spin. A group of these spins produces a net moment in the direction of the
magnetic field, z, but since they are not in phase, any horizontal moment in the
x and y direction tends to cancel (Figure 13.11B).
FIGURE 13.10 Reconstruction of an image of four squares at different intensities
using parallel beam and fan beam geometry. Some artifact is seen in the fan
beam geometry due to the distance between the beam source and object (see
Problem 3).
While the various spins do not have the same relative phase, they do all
rotate at the same frequency, a frequency given by the Larmor equation:
ωo = γH (11)
FIGURE 13.11 (A) A single proton has a magnetic moment which rotates in the
presence of an applied magnetic field, Bz. This dipole moment could be up or down
with a slight favoritism towards up, as shown. (B) A group of upward dipoles
create a net moment in the same direction as the magnetic field, but any horizon-
tal moments (x or y) tend to cancel. Note that all of these dipole vectors should
be rotating, but for obvious reasons they are shown as stationary with the as-
sumption that they rotate, or more rigorously, that the coordinate system is ro-
tating.
*Nuclear spin is not really a spin, but another one of those mysterious quantum mechanical proper-
ties. Nuclear spin can take on values of ±1/2, with +1/2 slightly favored in a magnetic field.
where ωo is the frequency in radians per second, H is the magnitude of the magnetic field,
and γ is a constant termed the gyromagnetic constant. Although γ is primarily a
function of the type of nucleus it also depends slightly on the local chemical
environment. As shown below, this equation contains the key to spatial localiza-
tion in MRI: variations in local magnetic field will encode as variations in rota-
tional frequency of the protons.
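A short calculation shows the scale of the frequencies involved in Eq. (11). The sketch below uses the value of γ/2π for hydrogen, 42.58 MHz per tesla; the field strength, gradient, and position are illustrative values chosen for the example, not taken from the text.

```matlab
% Sketch: Larmor frequency of hydrogen protons, Eq. (11)
gamma_bar = 42.58e6;          % gamma/(2*pi) for hydrogen, Hz per tesla
gamma = 2*pi*gamma_bar;       % gyromagnetic constant, rad/s per tesla
H  = 1.5;                     % field strength in tesla (illustrative)
w0 = gamma*H;                 % Larmor frequency, rad/s
f0 = w0/(2*pi);               % same frequency in Hz (about 63.9 MHz)
% Frequency shift produced by a field gradient at a given position
G  = 10e-3;                   % gradient strength, T/m (illustrative)
x  = 0.01;                    % distance from the gradient center, m
df = gamma_bar*G*x;           % frequency offset in Hz (about 4.26 kHz)
```

The second calculation is the basis of spatial localization: spins 1 cm apart along the gradient differ in rotational frequency by a few kilohertz out of tens of megahertz.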
If these rotating spins are exposed to electromagnetic energy at the rota-
tional or Larmor frequency specified in Eq. (11), they will absorb this energy
and rotate further and further from their equilibrium position near the z axis:
they are tipped away from the z axis (Figure 13.12A). They will also be syn-
chronized by this energy, so that they now have a net horizontal moment. For
protons, the Larmor frequency is in the radio frequency (rf) range, so an rf
pulse of the appropriate frequency in the xy-plane will tip the spins away from
the z-axis an amount that depends on the length of the pulse:
θ = γHTp (12)
where θ is the tip angle and Tp is the pulse time. Usually Tp is adjusted to tip the
spins either 90 or 180 deg. As described subsequently, a 90 deg. tip is used to generate
the strongest possible signal and a 180 deg. tip, which changes the sign of the
moment, is used to generate an echo signal. Note that a given 90 or 180 deg. Tp
will only flip those spins that are exposed to the appropriate local magnetic
field, H.
FIGURE 13.12 (A) After an rf pulse that tips the spins 90 deg., the net magnetic
moment looks like a vector, Mxy, rotating in the xy-plane. The net vector in the z
direction is zero. (B) After the rf energy is removed, all of the spins begin to relax
back to their equilibrium position, increasing the z component, Mz, and
decreasing the xy component, Mxy. The xy component also decreases as the spins
de-synchronize.
When all of the spins in a region are tipped 90 deg. and synchronized,
there will be a net magnetic moment rotating in the xy-plane, but the component
of the moment in the z direction will be zero (Figure 13.12A). When the rf
pulse ends, the rotating magnetic field will generate its own rf signal, also at
the Larmor frequency. This signal is known as the free induction decay (FID)
signal. It is this signal that induces a small voltage in the receiver coil, and it is
this signal that is used to construct the MR image. Immediately after the pulse
ends, the signal generated is given by:
S(t) = ρ sin (θ) cos(ωot) (13)
where ωo is the Larmor frequency, θ is the tip angle, and ρ is the density of
spins. Note that a tip angle of 90 deg. produces the strongest signal.
Over time the spins will tend to relax towards the equilibrium position
(Figure 13.12B). This process is known as longitudinal or spin-lattice
relaxation and is approximately exponential with a time constant denoted
as “T1.” As seen in Figure 13.12B, it has the effect of increasing the vertical
moment, Mz, and decreasing the xy moment, Mxy. The xy moment is decreased
even further, and much faster, by a loss of synchronization of the collective
spins, since they are all exposed to a slightly different magnetic environment
from neighboring atoms (Figure 13.12B). This so-called transverse or spin-spin
relaxation is also exponential and decays with a time constant termed “T2.”
The spin-spin relaxation time is always less than the spin-lattice relaxation time,
so that by the time the net moment returns to equilibrium position along the z
axis the individual spins are completely de-phased. Local inhomogeneities in
the applied magnetic field cause an even faster de-phasing of the spins. When
the de-phasing time constant is modified to include this effect, it is termed T2*
(pronounced tee two star). This time constant also includes the T2 influences.
When these relaxation processes are included, the equation for the FID signal
becomes:
S(t) = \rho \cos(\omega_o t)\, e^{-t/T_2^*}\, e^{-t/T_1}   (14)
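The decaying FID described by Eq. (14) can be generated directly. The sketch below uses hypothetical parameter values throughout; in particular the oscillation frequency is set far below any real Larmor frequency so that individual cycles are visible when plotted.

```matlab
% Sketch: FID signal of Eq. (14) with hypothetical parameter values
rho = 1;                      % spin density (arbitrary units)
T1  = 0.8;                    % spin-lattice time constant, s (illustrative)
T2s = 0.05;                   % T2* time constant, s (illustrative)
f0  = 200;                    % demonstration frequency, Hz (far below Larmor)
w0  = 2*pi*f0;
t   = (0:1e-4:0.2)';          % time axis, s
env = rho * exp(-t/T2s) .* exp(-t/T1);   % decay envelope
S   = env .* cos(w0*t);                  % FID signal
```

A plot of S and env against t shows that the initial amplitude equals ρ and that, with these values, the decay is dominated by the much shorter T2* term.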
While frequency dependence (i.e., the Larmor equation) is used to achieve
localization, the various relaxation times as well as proton density are used to
achieve image contrast. Proton density, ρ, for any given collection of spins is a
relatively straightforward measurement: it is proportional to FID signal
amplitude as shown in Eq. (14). Measuring the local T1 and T2 (or T2*) relaxation
times is more complicated and is done through clever manipulations of the rf
pulse and local magnetic field gradients, as briefly described in the next section.
Data Acquisition: Pulse Sequences
A combination of rf pulses, magnetic gradient pulses, delays, and data
acquisition periods is termed a pulse sequence. One of the clever manipulations used
in many pulse sequences is the spin echo technique, a trick for eliminating the
de-phasing caused by local magnetic field inhomogeneities and related artifacts
(the T2* decay). One possibility might be to sample immediately after the rf
pulse ends, but this is not practical. The alternative is to sample a realigned
echo. After the spins have begun to spread out, if their direction is suddenly
reversed they will come together again after a known delay. The classic example
is that of a group of runners who are told to reverse direction at the same time,
say one minute after the start. In principle, they all should get back to the start
line at the same time (one minute after reversing) since the fastest runners will
have the farthest to go at the time of reversal. In MRI, the reversal is
accomplished by a phase-reversing 180 deg. rf pulse. The realignment will occur with the
same time constant, T2*, as the misalignment. This echo approach will only
cancel the de-phasing due to magnetic inhomogeneities, not the variations due
to the sample itself: i.e., those that produce the T2 relaxation. That is actually
desirable because the sample variations that cause T2 relaxation are often of
interest.
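The runner analogy can be simulated with a set of spins that each rotate at a slightly different, fixed frequency offset. The sketch below is a simplified model made for illustration: it includes only the de-phasing caused by field inhomogeneity (no T1 or T2 decay), treats the 180 deg. pulse as an instantaneous sign reversal of every phase, and uses hypothetical values for the offsets and timing.

```matlab
% Sketch: de-phasing and echo formation for spins with fixed frequency offsets
dw  = linspace(-50,50,101)';   % offsets from inhomogeneity, rad/s (hypothetical)
tau = 0.05;                    % time of the 180 deg. pulse, s (hypothetical)
t   = 0:0.001:0.1;             % time axis from 0 to 2*tau, s
M   = zeros(size(t));          % normalized net transverse moment
for n = 1:length(t)
    if t(n) <= tau
        phase = dw*t(n);                   % spins fan out
    else
        phase = -dw*tau + dw*(t(n)-tau);   % phases reversed at tau, then continue
    end
    M(n) = abs(mean(exp(1i*phase)));
end
M_tau  = abs(mean(exp(1i*dw*tau)));                        % just before the pulse
M_echo = abs(mean(exp(1i*(-dw*tau + dw*(2*tau - tau)))));  % at t = 2*tau
```

Plotting M against t shows the net moment falling as the spins spread out and returning to its full value at twice the pulse time, where the echo would be sampled.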
As mentioned above, the Larmor equation (Eq. (11)) is the key to
localization. If each position in the sample is subjected to a different magnetic field
strength, then the locations are tagged by their resonant frequencies. Two
approaches could be used to identify the signal from a particular region. One is to
use an rf pulse with only one frequency component: if each location has a unique
magnetic field strength, then only the spins in one region will be excited, those
whose magnetic field correlates with the rf frequency (by the Larmor equation).
The alternative is to excite a broader region, then vary the magnetic field strength so
that different regions are given different resonant frequencies. In clinical MRI,
both approaches are used.
Magnetic field strength is varied by gradient fields
applied by electromagnets, so-called gradient coils, in the three dimensions. The
gradient fields provide a linear change in magnetic field strength over a limited
area within the MR imager. The gradient field in the z direction, Gz, can be used
to isolate a specific xy slice in the object, a process known as slice selection.*
In the absence of any other gradients, the application of a linear gradient in the
z direction will mean that only the spins in one xy-plane will have a resonant
frequency that matches a specific rf pulse frequency. Hence, by adjusting the
gradient, different xy-slices will be associated with (by the Larmor equation),
and excited by, a specific rf frequency. Since the rf pulse is of finite duration it
cannot consist of a single frequency, but rather has a range of frequencies, i.e.,
a finite bandwidth. The thickness of the slice, that is, the region in the
z-direction over which the spins are excited, will depend on the steepness of the
gradient field and the bandwidth of the rf pulse:
\Delta z \propto \frac{\Delta\omega}{\gamma G_z}   (15)
Very thin slices, ∆z, would require a very narrowband pulse, ∆ω, in
combination with a steep gradient field, Gz.
*Selected slices can be in any plane, x, y, z, or any combination, by appropriate activation of the
gradients during the rf pulse. For simplicity, this discussion assumes the slice is selected by the
z-gradient so spins in an xy-plane are excited.
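The dependence of slice thickness on bandwidth and gradient strength can be checked numerically. The sketch below assumes the usual relation between the two, ∆ω = γ Gz ∆z, written here in hertz using γ/2π for hydrogen; the gradient strength and rf bandwidth are illustrative values, not taken from the text.

```matlab
% Sketch: slice thickness from rf bandwidth and gradient strength
gamma_bar = 42.58e6;             % gamma/(2*pi) for hydrogen, Hz per tesla
Gz = 10e-3;                      % slice-select gradient, T/m (illustrative)
bw = 1000;                       % rf pulse bandwidth, Hz (illustrative)
dz = bw/(gamma_bar*Gz);          % slice thickness, m (about 2.3 mm)
dz_steep = bw/(gamma_bar*2*Gz);  % doubling the gradient halves the thickness
```

Halving the bandwidth has the same effect on thickness as doubling the gradient, which is why thin slices call for both a narrowband pulse and a steep gradient.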
If all three gradients, Gx, Gy, and Gz, were activated prior to the rf pulse
then only the spins in one unique volume would be excited. However, only one
data point would be acquired for each pulse repetition, and to acquire a large
volume would be quite time-consuming. Other strategies allow the acquisition
of entire lines, planes, or even volumes with one pulse excitation. One popular
pulse sequence, the spin-echo pulse sequence, acquires one line of data in the
spatial frequency domain. The sequence begins with a shaped rf pulse in con-
junction with a Gz pulse that provides slice selection (Figure 13.13). The Gz
includes a reversal at the end to cancel a z-dependent phase shift. Next, a y-
gradient pulse of a given amplitude is used to phase encode the data. This is
followed by a second rf/Gz combination to produce the echo. As the echo
regroups the spins, an x-gradient pulse frequency encodes the signal. The
re-formed signal constitutes one line in the frequency domain (termed k-space in
MRI), and is sampled over this period. Since the echo signal duration is several
hundred microseconds, high-speed data acquisition is necessary to sample up to
256 points during this signal period.
As with slice thickness, the ultimate pixel size will depend on the strength
of the magnetic gradients. Pixel size is directly related to the number of pixels
in the reconstructed image and the actual size of the imaged area, the so-called
field-of-view (FOV). Most modern imagers are capable of a 2 cm FOV with
samples up to 256 by 256 pixels, giving a pixel size of 0.078 mm. In practice,
image resolution is usually limited by signal-to-noise considerations since, as
pixel area decreases, the number of spins available to generate a signal
diminishes proportionately. In some circumstances special receiver coils can be used
to increase the signal-to-noise ratio and improve image quality and/or
resolution. Figure 13.14A shows an image of the Shepp-Logan phantom and the same
image acquired with different levels of detector noise.* As with other forms of
signal processing, MR image noise can be reduced by averaging. Figure
13.14D shows the noise reduction resulting from averaging four of the images
taken under the same noise conditions as Figure 13.14C. Unfortunately, this
strategy increases scan time in direct proportion to the number of images
averaged.
*The Shepp-Logan phantom was developed to demonstrate the difficulty of identifying a tumor in
a medical image.
FIGURE 13.13 The spin-echo pulse sequence. Events are timed with respect to
the initial rf pulse. See text for explanation.
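The pixel size quoted above and the trade-off involved in averaging follow from simple arithmetic. The sketch below assumes that the noise in each acquired image is independent, in which case averaging reduces the noise level by the square root of the number of images; the variable names are illustrative.

```matlab
% Sketch: pixel size from field of view, and noise reduction by averaging
FOV  = 20;                     % field of view in mm (the 2 cm quoted above)
Npix = 256;                    % pixels per side
pixel_size = FOV/Npix;         % mm per pixel (0.078 mm)
Navg = 4;                      % number of images averaged
noise_factor = 1/sqrt(Navg);   % relative noise after averaging, assuming
                               % independent noise in each image
scan_factor  = Navg;           % scan time grows in proportion to Navg
```

Averaging four images therefore halves the noise level while quadrupling the scan time.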
Functional Magnetic Resonance Imaging
Image processing for MR images is generally the same as that used on other
images. In fact, MR images have been used in a number of examples and
problems in previous chapters. One application of MRI does have some unique
image processing requirements: the area of functional magnetic resonance imaging
(fMRI). In this approach, neural areas that are active in specific tasks are
identified by increases in local blood flow. MRI can detect cerebral blood changes
using an approach known as BOLD: blood oxygenation level dependent. Special
pulse sequences have been developed that can acquire images very quickly, and
these images are sensitive to the BOLD phenomenon. However, the effect is
very small: changes in signal level are only a few percent.
FIGURE 13.14 (A) MRI reconstruction of a Shepp-Logan phantom. (B) and (C)
Reconstruction of the phantom with detector noise added to the frequency
domain signal. (D) Frequency domain average of four images taken with noise
similar to C. Improvement in the image is apparent. (Original image from the MATLAB
Image Processing Toolbox. Copyright 1993–2003, The Math Works, Inc.
Reprinted with permission.)
During a typical fMRI experiment, the subject is given a task which is
either physical (such as finger tapping), purely sensory (such as a flashing visual
stimulus), purely mental (such as performing mathematical calculations), or
involves sensorimotor activity (such as pushing a button whenever a given image
appears). In single-task protocols, the task alternates with a non-task or baseline
activity period. Task periods are usually 20–30 seconds long, but can be shorter
and can even be single events under certain protocols. Multiple task protocols
are possible and increasingly popular. During each task a number of MR images
are acquired. The primary role of the analysis software is to identify pixels that
have some relationship to the task/non-task activity.
There are a number of software packages available that perform fMRI
analysis, some written in MATLAB such as SPM (statistical parametric
mapping), others in the C language such as AFNI (analysis of functional neuroimages). Some
packages can be obtained at no charge off the Web. In addition to identifying
the active pixels, these packages perform various preprocessing functions such
as aligning the sequential images and reshaping the images to conform to
standard models of the brain.
Following preprocessing, there are a number of different approaches to
identifying regions where local blood flow correlates with the task/non-task
timing. One approach is simply to use correlation, that is, to correlate the change
in signal level, on a pixel-by-pixel basis, with a task-related function. This
function could represent the task by a one and the non-task by a zero, producing a
square wave-like function. More complicated task functions account for the
dynamics of the BOLD process which has a 4 to 6 second time constant. Finally,
some new approaches based on independent component analysis (ICA, Chapter
9) can be used to extract the task function from the data itself. The use of
correlation and ICA is explored in the MATLAB Implementation
section and in the problems. Other univariate statistical techniques are common
such as t-tests and F-tests, particularly in the multi-task protocols (Friston, 2002).
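The two kinds of task function mentioned above can be compared with a few lines of base MATLAB. In the sketch below, the BOLD dynamics are approximated by a first-order lowpass filter; this model, the frame interval TR, and the time constant of 5 seconds are assumptions made for illustration rather than values given in the text.

```matlab
% Sketch: square-wave task function and a first-order model of the BOLD response
TR   = 3;                      % time between frames, s (hypothetical)
tau  = 5;                      % BOLD time constant, s (within the 4 to 6 s range)
stim = [zeros(6,1); ones(6,1); zeros(6,1); ones(6,1)];   % off-on task function
a    = exp(-TR/tau);           % pole of the discrete first-order lowpass
bold = filter(1-a, [1 -a], stim);   % smoothed, delayed task function
r    = corrcoef(stim, bold);   % 2 x 2 correlation matrix
r_sb = r(2,1);                 % correlation between the two task functions
```

Because bold rises and falls gradually after each transition, correlating pixel data with it rather than with stim gives less weight to the frames immediately after the task starts or stops.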
MATLAB Implementation
Techniques for fMRI analysis can be implemented using standard MATLAB
routines. The identification of active pixels using correlation with a task protocol
function will be presented in Example 13.4. Several files have been created on
the disk that simulate regions of activity in the brain. The variations in pixel
intensity are small, and noise and other artifacts have been added to the image
data, as would be the case with real data. The analysis presented here is done
on each pixel independently. In most fMRI analyses, the identification proce-
dure might require activity in a number of adjoining pixels for identification.
Lowpass filtering can also be used to smooth the image.
Example 13.4 Use correlation to identify potentially active areas from
MRI images of the brain. In this experiment, 24 frames were taken (typical
fMRI experiments would contain at least twice that number): the first 6 frames
were acquired during baseline activity and the next 6 during the task. This
off-on cycle was then repeated for the next 12 frames. Load the image in MATLAB
file fmri1, which contains all 24 frames. Generate a function that represents the
off-on task protocol and correlate this function with each pixel’s variation over
the 24 frames. Identify pixels that have correlation above a given threshold and
mark the image where these pixels occur. (Usually this would be done in color
with higher correlations given brighter color.) Finally display the time sequence
of one of the active pixels. (Most fMRI analysis packages can display the time
variation of pixels or regions, usually selected interactively.)
% Example 13.4 Example of identification of active area
% using correlation.
% Load the 24 frames of the image stored in fmri1.mat.
% Construct a stimulus profile.
% In this fMRI experiment the first 6 frames were taken during
% no-task conditions, the next six frames during the task
% condition, and this cycle was repeated.
% Correlate each pixel's variation over the 24 frames with the
% task profile. Pixels that correlate above a certain threshold
% (use 0.5) should be identified in the image by a pixel
% whose intensity is the same as the correlation values
%
clear all; close all
thresh = .5;                  % Correlation threshold
load fmri1;                   % Get data
i_stim2 = ones(24,1);         % Construct task profile
i_stim2(1:6) = 0;             % First 6 frames are no-task
i_stim2(13:18) = 0;           % Frames 13 through 18
                              % are also no-task
%
% Do correlation: pixel by pixel over the 24 frames
I_fmri_marked = I_fmri;
active = [0 0];
for i = 1:128
   for j = 1:128
      for k = 1:24
         temp(k) = I_fmri(i,j,1,k);
      end
      cor_temp = corrcoef([temp' i_stim2]);
      corr(i,j) = cor_temp(2,1);   % Get correlation value
      if corr(i,j) > thresh
         I_fmri_marked(i,j,:,1) = I_fmri(i,j,:,1) + corr(i,j);
         active = [active; i,j];   % Save supra-threshold
                                   % locations
      end
   end
end
%
% Display marked image
imshow(I_fmri_marked(:,:,:,1)); title('fMRI Image');
figure;
% Display one of the active areas
for i = 1:24                  % Plot one of the active areas
   active_neuron(i) = I_fmri(active(2,1),active(2,2),:,i);
end
plot(active_neuron); title('Active neuron');
The marked image produced by this program is shown in Figure 13.15.
The actual active area is the rectangular area on the right side of the image
slightly above the centerline. However, a number of other error pixels are pres-
ent due to noise that happens to have a sufficiently high correlation with the
task profile (a correlation of 0.5 in this case). In Figure 13.16, the correlation
threshold has been increased to 0.7 and most of the error pixels have been
eliminated, but now the active region is only partially identified. An
intermediate threshold might result in a better compromise, and this is
explored in one of the problems.
FIGURE 13.15 White pixels were identified as active based on correlation with
the task profile. The actual active area is the rectangle on the right side slightly
above the center line. Due to inherent noise, false pixels are also identified, some
even outside of the brain. The correlation threshold was set at 0.5 for this image.
(Original image from the MATLAB Image Processing Toolbox. Copyright 1993–
2003, The MathWorks, Inc. Reprinted with permission.)
FIGURE 13.16 The same image as in Figure 13.15 with a higher correlation
threshold (0.7). Fewer errors are seen, but the active area is only partially
identified.
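Because the correlation values do not depend on the threshold, they can be
computed once and the threshold varied afterward. The following is a minimal
sketch of that idea on synthetic data. The array I_syn, its size, the location
of the simulated active block, and the noise level are all assumptions made for
illustration; they are not the contents of fmri1.mat.

```matlab
% Sketch: compute the correlation map once, then try several thresholds.
% I_syn is synthetic (hypothetical) data, not the fmri1.mat image.
rng(1);                                  % Repeatable noise
nr = 32; nc = 32; nf = 24;               % Assumed image size and frame count
stim = ones(nf,1);                       % Task profile as in Example 13.4
stim([1:6 13:18]) = 0;                   % No-task frames
I_syn = randn(nr,nc,1,nf);               % Noise-only background
for k = 1:nf                             % Add a task-related signal to a block
    I_syn(10:15,20:25,1,k) = I_syn(10:15,20:25,1,k) + 2*stim(k);
end
%
% Correlate every pixel with the task profile (done only once)
X = reshape(I_syn, nr*nc, nf)';          % Frames in rows, pixels in columns
Xz = X - repmat(mean(X), nf, 1);         % Remove each pixel's mean
sz = stim - mean(stim);                  % Zero-mean task profile
cmap = (sz'*Xz) ./ (sqrt(sum(Xz.^2)) * sqrt(sum(sz.^2)));
cmap = reshape(cmap, nr, nc);            % Back to image format
%
% Apply several thresholds to the same correlation map
truth = false(nr,nc); truth(10:15,20:25) = true;
for thresh = [0.5 0.6 0.7]
    active = cmap > thresh;
    fprintf('thresh %.1f: %d hits, %d false positives\n', thresh, ...
        sum(active(truth)), sum(active(~truth)));
end
```

Since the true active block is known in this synthetic case, the trade-off
between missed active pixels and error pixels can be counted directly for each
threshold, which is not possible with real data.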
Functional MRI software packages allow isolation of specific regions of
interest (ROI), usually through interactive graphics. Pixel values in these regions
of interest can be plotted over time and subsequent processing can be done on
the isolated region. Figure 13.17 shows the variation over time (actually, over
the number of frames) of one of the active pixels. Note the very approximate
correlation with the square wave-like task profile also shown. The poor
correlation is due to noise and other artifacts, and is fairly typical of fMRI
data. Identifying the very small signal within the background noise is one of
the major challenges for fMRI image processing algorithms.
FIGURE 13.17 Variation in intensity of a single pixel within the active area of
Figures 13.15 and 13.16. A correlation with the task profile is seen, but consider-
able noise is also present.
Principal Component and Independent Component Analysis
In the above analysis, active pixels were identified by correlation with the task
profile. However, the neuronal response would not be expected to follow the
task temporal pattern exactly because of the dynamics of the blood flow re-
sponse (i.e., blood hemodynamics) which requires around 4 to 6 seconds to
reach its peak. In addition, there may be other processes at work that systemati-
cally affect either neural activity or pixel intensity. For example, respiration can
alter pixel intensity in a consistent manner. Identifying the actual dynamics of
the fMRI process and any consistent artifacts might be possible by a direct
analysis of the data. One approach would be to search for components related
to blood flow dynamics or artifacts using either principal component analysis
(PCA) or independent component analysis (ICA).
Regions of interest are first identified using either standard correlation or
other statistical methods so that the new tools need not be applied to the entire
image. Then the isolated data from each frame is re-formatted so that it is one-
dimensional by stringing the image rows, or columns, together. The data from
each frame are now arranged as a single vector. ICA or PCA is applied to the
transposed ensemble of frame vectors so that each pixel is treated as a different
source and each frame is an observation of that source. If there are pixels whose
intensity varies in a non-random manner, this should produce one or more com-
ponents in the analyses. The component that is most like the task profile can
then be used as a more accurate estimate of blood flow hemodynamics in the
correlation analysis: the isolated component is used for the comparison instead
of the task profile. An example of this approach is given in Example 13.5.
Example 13.5 Select a region of interest from the data of Figure 13.16,
specifically an area that surrounds and includes the potentially active pixels.
Normally this area would be selected interactively by an operator. Reformat the
images so that each frame is a single row vector and constitutes one row of an
ensemble composed of the different frames. Perform both an ICA and PCA
analysis and plot the resulting components.
% Example 13.5 and Figure 13.18 and 13.19
% Example of the use of PCA and ICA to identify signal
% and artifact components in a region of interest
% containing some active neurons.
% Load the region of interest, then re-format the images so that
% each of the 24 frames is a row, then transpose this ensemble
% so that the rows are pixels and the columns are frames.
% Apply PCA and ICA analysis. Plot the first four principal
% components and the first two independent components.
%
close all; clear all;
nu_comp = 2;                    % Number of independent components
load roi2;                      % Get ROI data
% Find number of frames
[r c dummy frames] = size(ROI);
% Convert each image frame to a row and construct an
% ensemble where each row is a different frame
%
for i = 1:frames
    for j = 1:r
        row = ROI(j,:,:,i);     % Get one image row of the frame
        if j == 1
            temp = row;
        else
            temp = [temp row];  % Concatenate rows
        end
    end
    if i == 1
        data = temp;
    else
        data = [data;temp];     % Add this frame to the ensemble
    end
end
%
% Now apply PCA analysis
[U,S,pc] = svd(data',0);        % Use singular value decomposition
eigen = diag(S).^2;
for i = 1:length(eigen)
    pc(:,i) = pc(:,i) * sqrt(eigen(i));
end
%
% Determine the independent components
w = jadeR(data',nu_comp);
ica = (w* data');
%
.......Display components.......
FIGURE 13.18 First four components from a principal component analysis
applied to a region of interest in Figure 13.15 that includes the active area. A
function similar to the task is seen in the second component. The third component
also has a possible repetitive structure that could be related to respiration.
FIGURE 13.19 Two components found by independent component analysis. The
task-related function and the respiration artifact are now clearly identified.
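The row-by-row concatenation used in Example 13.5 to build the ensemble can
also be written with permute and reshape. The sketch below compares the two
approaches on a small synthetic array; ROI_syn and its dimensions are
hypothetical stand-ins for the variable ROI in roi2.mat, which is not
reproduced here.

```matlab
% Sketch: build the frames-by-pixels ensemble with permute/reshape.
% ROI_syn is a synthetic (hypothetical) stand-in for the ROI variable.
r = 4; c = 5; frames = 24;               % Assumed ROI size
ROI_syn = reshape(1:r*c*frames, r, c, 1, frames);
%
% Loop version, as in Example 13.5
for i = 1:frames
    temp = [];
    for j = 1:r
        temp = [temp ROI_syn(j,:,1,i)];  % String the rows together
    end
    data_loop(i,:) = temp;               % One frame per row
end
%
% Vectorized version: swap rows and columns so that reshape,
% which reads down columns, strings the image rows together
data_vec = reshape(permute(ROI_syn,[2 1 3 4]), r*c, frames)';
disp(isequal(data_loop, data_vec));      % Check that the two agree
```

The permute step matters: reshape alone would string the image columns
together rather than the rows. Either ordering is acceptable for PCA or ICA as
long as it is applied consistently to every frame.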
The principal components produced by this analysis are shown in Figure
13.18. A waveform similar to the task profile is seen in the second plot down.
Since this waveform is derived from the data, it should more closely represent the
actual blood flow hemodynamics. The third waveform shows a regular pattern,
possibly due to respiration artifact. The other two components may also contain
some of that artifact, but do not show any other obvious pattern.
The two patterns in the data are better separated by ICA. Figure 13.19
shows the first two independent components and both the blood flow hemody-
namics and the artifact are clearly shown. The former can be used instead of
the task profile in the correlation analysis. The results of using the profile ob-
tained through ICA are shown in Figure 13.20A and B. Both activity maps were
obtained from the same data using the same correlation threshold. In Figure
13.20A, the task profile function was used, while in Figure 13.20B the
hemodynamic profile (the function in the lower plot of Figure 13.19) was used in
the correlation. The improvement in identification is apparent. When the task
function is used, very few of the areas actually active are identified and a number
of error pixels are identified. Figure 13.20B contains about the same number of
errors, but all of the active areas are identified. Of course, the number of active
areas identified using the task profile could be improved by lowering the
threshold of correlation, but this would also increase the errors.
FIGURE 13.20A Activity map obtained by correlating pixels with the square-wave
task function. The correlation threshold was 0.55. (Original image from the
MATLAB Image Processing Toolbox. Copyright 1993–2003, The MathWorks,
Inc. Reprinted with permission.)
FIGURE 13.20B Activity map obtained by correlating pixels with the estimated
hemodynamic profile obtained from ICA. The correlation threshold was 0.55.
PROBLEMS
1. Load slice 13 of the MR image used in Example 13.3 (mri.tif). Construct
parallel beam projections of this image using the Radon transform with two
different angular spacings between rotations: 5 deg. and 10 deg. In addition,
reduce spacing of the 5 deg. data by a factor of two. Reconstruct the three
images (5 deg. unreduced, 5 deg. reduced, and 10 deg.) and display along with
the original image. Multiply the images by a factor of 10 to enhance any varia-
tions in the background.
2. The data file data_prob_13_2 contains projections of the test pattern im-
age, testpat1.png with noise added. Reconstruct the image using the inverse
Radon transform with two filter options: the Ram-Lak filter (the default), and
the Hamming filter with a maximum frequency of 0.5.
3. Load the image squares.tif. Use fanbeam to construct fan beam projec-
tions and ifanbeam to produce the reconstructed image. Repeat for two different
beam distances: 100 and 300 (pixels). Plot the reconstructed images. Use a
FanSensorSpacing of 1.
4. The rf-pulse used in MRI is a shaped pulse consisting of a sinusoid at the
base frequency that is amplitude modulated by some pulse shaping waveform.
The sinc waveform (sin(x)/x) is commonly used. Construct a shaped pulse
consisting of cos(ω1) modulated by sinc(ω2). Pulse duration should be such that ω2
ranges between ±2π: −2π ≤ ω2 ≤ 2π. The sinusoidal frequency, ω1, should be 10
ω2. Use the inverse Fourier transform to plot the magnitude frequency spectrum
of this slice selection pulse. (Note: the MATLAB sinc function is normalized
to π, so the range of the vector input to this function should be ±2. In this case,
the cos function will need to be multiplied by 2π, as well as by 10.)
5. Load the 24 frames of image fmri3.mat. This contains the 4-D variable,
I_fmri, which has 24 frames. Construct a stimulus profile. Assume the same
task profile as in Example 13.4: the first 6 frames were taken during no-task
conditions, the next six frames during the task condition, then the cycle was
repeated. Rearrange Example 13.4 so that the correlation coefficients are
computed first, then the thresholds are applied (so each new threshold value does
not require another calculation of correlation coefficients). Search for the optimal
threshold. Note these images contain more noise than those used in Example
13.4, so even the best thresholded image will contain error pixels.
6. Example of identification of active area using correlation. Repeat Problem
5 except filter the matrix containing the pixel correlations before applying the
threshold. Use a 4 by 4 averaging filter. (fspecial can be helpful here.)
7. Example of using principal component analysis and independent component
analysis to identify signal and artifact. Load the region of interest file roi4.mat
which contains variable ROI. This variable contains 24 frames of a small region
around the active area of fmri3.mat. Reformat to a matrix as in Example 13.5
and apply PCA and ICA analysis. Plot the first four principal components and
the first two independent components. Note the very slow time constant of the
blood flow hemodynamics.
2
Basic Concepts
NOISE
In Chapter 1 we observed that noise is an inherent component of most measure-
ments. In addition to physiological and environmental noise, electronic noise
arises from the transducer and associated electronics and is intermixed with the
signal being measured. Noise is usually represented as a random variable, x(n).
Since the variable is random, describing it as a function of time is not very
useful. It is more common to discuss other properties of noise such as its proba-
bility distribution, range of variability, or frequency characteristics. While noise
can take on a variety of different probability distributions, the Central Limit
Theorem implies that most noise will have a Gaussian or normal distribution*.
The Central Limit Theorem states that when noise is generated by a large num-
ber of independent sources it will have a Gaussian probability distribution re-
gardless of the probability distribution characteristics of the individual sources.
Figure 2.1A shows the distribution of 20,000 uniformly distributed random
numbers between −1 and +1. The distribution is approximately flat between the
limits of ±1 as expected. When the data set consists of 20,000 numbers, each
of which is the average of two uniformly distributed random numbers, the
distribution is much closer to Gaussian (Figure 2.1B, upper right). The distribution
constructed from 20,000 numbers that are averages of only 8 random numbers
appears close to Gaussian, Figure 2.1D, even though the numbers being
averaged have a uniform distribution.
*Both terms are used and the reader should be familiar with both. We favor the term
“Gaussian” to avoid the value judgement implied by the word “normal.”
FIGURE 2.1 (A) The distribution of 20,000 uniformly distributed random numbers.
(B) The distribution of 20,000 numbers, each of which is the average of two
uniformly distributed random numbers. (C) and (D) The distribution obtained when
3 and 8 random numbers, still uniformly distributed, are averaged together.
Although the underlying distribution is uniform, the averages of these uniformly
distributed numbers tend toward a Gaussian distribution (dotted line). This is an
example of the Central Limit Theorem at work.
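The behavior described for Figure 2.1 can be reproduced with a few lines of
MATLAB. The following is a minimal sketch, not the code used to generate the
figure: the number of histogram bins and the use of excess kurtosis as a
numerical measure of "closeness to Gaussian" are assumptions added here for
illustration.

```matlab
% Sketch: averages of uniform random numbers tend toward a Gaussian.
rng(0);                                  % Repeatable random numbers
N = 20000;                               % Numbers in each data set
n_avg = [1 2 3 8];                       % Numbers averaged, as in Fig. 2.1
for i = 1:length(n_avg)
    u = 2*rand(n_avg(i),N) - 1;          % Uniform between -1 and +1
    x = mean(u,1);                       % Average down each column
    subplot(2,2,i);
    hist(x,40);                          % Distribution of the averages
    title(sprintf('Average of %d', n_avg(i)));
    % Excess kurtosis is zero for a Gaussian and -1.2 for a
    % uniform distribution
    k_ex(i) = mean((x - mean(x)).^4) / mean((x - mean(x)).^2)^2 - 3;
end
disp(k_ex);                              % Should move toward zero
```

The excess kurtosis starts near the uniform-distribution value and moves
toward zero as more numbers are averaged, which is the numerical counterpart
of the histograms approaching the Gaussian shape.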
The probability of a Gaussianly distributed variable, x, is specified in the
well-known normal or Gaussian distribution equation:
$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2} \tag{1}$$
Two important properties of a random variable are its mean, or average
value, and its variance, the term σ² in Eq. (1). The arithmetic quantities of
mean and variance are frequently used in signal processing algorithms, and their
computation is well-suited to discrete data.
The mean value of a discrete array of N samples is evaluated as:
$$\bar{x} = \frac{1}{N}\sum_{k=1}^{N} x_k \tag{2}$$
Note that the summation in Eq. (2) is made between 1 and N as opposed
to 0 and N − 1. This protocol will commonly be used throughout the text to be
compatible with MATLAB notation where the first element in an array has an
index of 1, not 0.
Frequently, the mean will be subtracted from the data sample to provide
data with zero mean value. This operation is particularly easy in MATLAB as
described in the next section. The sample variance, σ², is calculated as shown in
Eq. (3) below, and the standard deviation, σ, is just the square root of the variance.
$$\sigma^2 = \frac{1}{N-1}\sum_{k=1}^{N} (x_k - \bar{x})^2 \tag{3}$$
Normalizing the standard deviation or variance by 1/(N − 1) as in Eq. (3)
produces the best estimate of the variance, if x is a sample from a Gaussian
distribution. Alternatively, normalizing the variance by 1/N produces the second
moment of the data around x̄. Note that the square root of this second moment is
equivalent to the RMS value of the data if the data have zero mean.
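As a quick check on Eqs. (2) and (3), both can be evaluated directly for a
short array. The data values below are arbitrary and chosen only so that the
arithmetic is easy to follow by hand.

```matlab
% Sketch: evaluate Eqs. (2) and (3) directly for a short data array.
x = [2 4 4 4 5 5 7 9];                   % Arbitrary example data
N = length(x);
xm = sum(x)/N;                           % Mean, Eq. (2)
xvar = sum((x - xm).^2)/(N - 1);         % Variance, Eq. (3)
xmom = sum((x - xm).^2)/N;               % Normalized by N instead
xstd = sqrt(xvar);                       % Standard deviation
disp([xm xvar xmom xstd]);
```

Here the mean is 5 and the squared deviations sum to 32, so the variance is
32/7 when normalized by N − 1 and 4 when normalized by N.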
When multiple measurements are made, multiple random variables can be
generated. If these variables are combined or added together, the means add so
that the resultant random variable is simply the mean, or average, of the individ-
ual means. The same is true for the variance—the variances add and the average
variance is the mean of the individual variances:
$$\sigma^2 = \frac{1}{N}\sum_{k=1}^{N} \sigma_k^2 \tag{4}$$
However, the standard deviation is the square root of the variance, so the
standard deviations add as √N times the average standard deviation [Eq.
(5)]. Accordingly, the mean standard deviation is the average of the individual
standard deviations divided by √N [Eq. (6)].
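From Eq. (4):
$$\sum_{k=1}^{N} \sigma_k^2 = N\sigma^2, \quad \text{hence:} \quad \sum_{k=1}^{N} \sigma_k = \sqrt{N\sigma^2} = \sqrt{N}\,\sigma \tag{5}$$
$$\text{Mean Standard Deviation} = \frac{1}{N}\sum_{k=1}^{N} \sigma_k = \frac{1}{N}\sqrt{N}\,\sigma = \frac{\sigma}{\sqrt{N}} \tag{6}$$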
In other words, averaging noise from different sensors, or multiple obser-
vations from the same source, will reduce the standard deviation of the noise
by the square root of the number of averages.
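The square-root relationship is easy to confirm numerically. The sketch below
averages increasing numbers of independent Gaussian noise records and compares
the measured standard deviation with the value predicted by Eq. (6). The noise
level, record length, and numbers of records are assumed values.

```matlab
% Sketch: averaging N noise records reduces the standard deviation
% by approximately sqrt(N).
rng(0);                                  % Repeatable noise
L = 10000;                               % Samples in each record
sigma = 2;                               % Noise standard deviation (assumed)
N = [1 4 16 64];                         % Number of records averaged
for i = 1:length(N)
    noise = sigma*randn(N(i),L);         % Each row is one noise record
    avg = mean(noise,1);                 % Average across the records
    sd_meas(i) = std(avg);               % Measured standard deviation
    sd_pred(i) = sigma/sqrt(N(i));       % Predicted by Eq. (6)
end
disp([N; sd_meas; sd_pred]);
```

Each fourfold increase in the number of records should roughly halve the
standard deviation of the averaged noise.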
In addition to a mean and standard deviation, noise also has a spectral
characteristic—that is, its energy distribution may vary with frequency. As shown
below, the frequency characteristics of the noise are related to how well one
instantaneous value of noise correlates with the adjacent instantaneous values;
for digitized data, this is how much one data point is correlated with its neighbors. If
the noise has so much randomness that each point is independent of its neigh-
bors, then it has a flat spectral characteristic and vice versa. Such noise is called
white noise since it, like white light, contains equal energy at all frequencies
(see Figure 1.5). The section on Noise Sources in Chapter 1 mentioned that
most electronic sources produce noise that is essentially white up to many mega-
hertz. When white noise is filtered, it becomes bandlimited and is referred to as
colored noise since, like colored light, it only contains energy at certain frequen-
cies. Colored noise shows some correlation between adjacent points, and this
correlation becomes stronger as the bandwidth decreases and the noise becomes
more monochromatic. The relationship between bandwidth and correlation of adja-
cent points is explored in the section on autocorrelation.
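The link between bandwidth and the correlation of adjacent points can be seen
in a short numerical experiment. In the sketch below, white noise is lowpass
filtered by moving averages of increasing length (an assumed, very simple
filter, chosen only because it needs no toolbox), and the correlation between
each point and its immediate neighbor is measured.

```matlab
% Sketch: correlation between adjacent samples of white and
% bandlimited (colored) noise.
rng(0);                                  % Repeatable noise
N = 20000;
white = randn(N,1);                      % White Gaussian noise
M = [1 2 4 8 16];                        % Moving-average lengths (assumed)
for i = 1:length(M)
    b = ones(M(i),1)/M(i);               % Lowpass filter; a longer average
    colored = filter(b,1,white);         % gives a narrower bandwidth
    c = corrcoef(colored(1:end-1), colored(2:end));
    r_adj(i) = c(2,1);                   % Correlation of adjacent points
end
disp([M; r_adj]);
```

With M = 1 the noise is unfiltered and the adjacent-point correlation is near
zero; as the averaging length grows and the bandwidth narrows, the correlation
rises toward one.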
ENSEMBLE AVERAGING
Eq. (6) indicates that averaging can be a simple, yet powerful signal processing
technique for reducing noise when multiple observations of the signal are possi-
ble. Such multiple observations could come from multiple sensors, but in many
biomedical applications, the multiple observations come from repeated responses
to the same stimulus. In ensemble averaging, a group, or ensemble, of time re-
sponses are averaged together on a point-by-point basis; that is, an average
signal is constructed by taking the average, for each point in time, over all
signals in the ensemble (Figure 2.2). A classic biomedical engineering example
of the application of ensemble averaging is the visual evoked response (VER)
in which a visual stimulus produces a small neural signal embedded in the EEG.
Usually this signal cannot be detected in the EEG signal, but by averaging
hundreds of observations of the EEG, time-locked to the visual stimulus, the
visually evoked signal emerges.
There are two essential requirements for the application of ensemble aver-
aging for noise reduction: the ability to obtain multiple observations, and a
reference signal closely time-linked to the response. The reference signal shows
how the multiple observations are to be aligned for averaging. Usually a time
signal linked to the stimulus is used. An example of ensemble averaging is
shown in Figure 2.2, and the code used to produce this figure is presented in
the following MATLAB implementation section.
FIGURE 2.2 Upper traces: An ensemble of individual (vergence) eye movement
responses to a step change in stimulus. Lower trace: The ensemble average,
displaced downward for clarity. The ensemble average is constructed by averaging
the individual responses at each point in time. Hence, the value of the average
response at time T1 (vertical line) is the average of the individual responses at that
time.
MATLAB IMPLEMENTATION
In MATLAB the mean, variance, and standard deviation are implemented as
shown in the code lines below.
xm = mean(x);      % Evaluate mean of x
xvar = var(x);     % Evaluate the variance of x normalizing by
                   % N-1
xnorm = var(x,1);  % Evaluate the variance of x normalizing by N
xstd = std(x);     % Evaluate the standard deviation of x
If x is an array (also termed a vector for reasons given later) the output
of these function calls is a scalar representing the mean, variance, or standard
deviation. If x is a matrix then the output is a row vector resulting from applying
the appropriate calculation (mean, variance, or standard deviation) to each col-
umn of the matrix.
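A short sketch of the vector and matrix behavior just described follows. The
data values are arbitrary; the last line also shows one way to subtract the
mean from each column to produce zero-mean data.

```matlab
% Sketch: mean, var, and std applied to a vector and to a matrix.
x = [1 2 3 4];                           % A vector gives a scalar result
X = [1 2; 3 4; 5 12];                    % A matrix is treated by column
xm = mean(x);                            % Scalar
Xm = mean(X);                            % Row vector: one mean per column
Xv = var(X);                             % Column variances, N-1 normalization
Xs = std(X);                             % Column standard deviations
Xrow = mean(X,2);                        % Second argument selects rows instead
Xzero = X - repmat(Xm, size(X,1), 1);    % Remove the mean of each column
```

For this matrix the column means are [3 6] and the column variances are
[4 28]; after the subtraction each column of Xzero has a mean of zero.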
Example 2.1 below shows the implementation of ensemble averaging that
produced the data in Figure 2.2. The program first loads the eye movement data
(load verg1), then plots the ensemble. The ensemble average is determined
using the MATLAB mean routine. Note that the data matrix, data_out, must
be in the correct orientation (the responses must be in rows) for routine mean.
If that were not the case (as in Problem 1 at the end of this chapter), the matrix
transposition operation should be performed*. The ensemble average, avg, is
then plotted displaced by 3 degrees to provide a clear view. Otherwise it would
overlay the data.
Example 2.1 Compute and display the ensemble average of an ensemble
of vergence eye movement responses to a step change in stimulus. These re-
sponses are stored in MATLAB file verg1.mat.
% Example 2.1 and Figure 2.2 Load eye movement data, plot
% the data then generate and plot the ensemble average.
%
close all; clear all;
load verg1; % Get eye movement data;
Ts = .005; % Sample interval = 5 msec
[nu,N] = size(data_out); % Get data length (N)
t = (1:N)*Ts; % Generate time vector
%
% Plot ensemble data superimposed
plot(t,data_out,'k');
hold on;
%
% Construct and plot the ensemble average
avg = mean(data_out); % Calculate ensemble average
plot(t,avg-3,'k'); % and plot, separated from
% the other data
xlabel('Time (sec)'); % Label axes
ylabel('Eye Position');
plot([.43 .43],[0 5],'-k'); % Plot vertical line
text(1,1.2,'Averaged Data'); % Label data average
*In MATLAB, matrix or vector transposition is indicated by an apostrophe following the variable.
For example if x is a row vector, x' is a column vector and vice versa. If X is a matrix, X' is that
matrix with rows and columns switched.
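Example 2.1 depends on the data file verg1.mat. For readers without that file,
the same steps can be tried on a synthetic ensemble. In the sketch below the
noise-free response shape, the noise level, and the ensemble size are all
hypothetical; only the averaging step mirrors Example 2.1.

```matlab
% Sketch: ensemble averaging of synthetic responses (not verg1.mat).
rng(0);                                  % Repeatable noise
Ts = .005;                               % Sample interval (assumed)
N = 400;                                 % Samples per response
nu = 50;                                 % Responses in the ensemble
t = (1:N)*Ts;                            % Time vector
resp = 4*(1 - exp(-t/0.2));              % Hypothetical noise-free response
data_syn = repmat(resp,nu,1) + randn(nu,N);   % Each row is one response
avg = mean(data_syn);                    % Ensemble average (over rows)
%
err_single = std(data_syn(1,:) - resp);  % Noise in a single response
err_avg = std(avg - resp);               % Noise remaining in the average
disp([err_single err_avg sqrt(nu)]);
plot(t,data_syn,'k'); hold on;
plot(t,avg-3,'k');                       % Average displaced downward
xlabel('Time (sec)'); ylabel('Response');
```

Because the underlying response is known here, the noise remaining after
averaging can be measured directly; the ratio of the two error values should
be close to the square root of the number of responses, as Eq. (6) predicts.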
DATA FUNCTIONS AND TRANSFORMS
To mathematicians, the term function can take on a wide range of meanings. In
signal processing, most functions fall into two categories: waveforms, images,
or other data; and entities that operate on waveforms, images, or other data
(Hubbard, 1998). The latter group can be further divided into functions that
modify the data, and functions used to analyze or probe the data. For example,
the basic filters described in Chapter 4 use functions (the filter coefficients) that
modify the spectral content of a waveform while the Fourier Transform detailed
in Chapter 3 uses functions (harmonically related sinusoids) to analyze the spec-
tral content of a waveform. Functions that modify data are also termed opera-
tions or transformations.
Since most signal processing operations are implemented using digital
electronics, functions are represented in discrete form as a sequence of numbers:
x(n) = [x(1),x(2),x(3), . . . ,x(N)] (5)
Discrete data functions (waveforms or images) are usually obtained through
analog-to-digital conversion or other data input, while analysis or modifying
functions are generated within the computer or are part of the computer pro-
gram. (The consequences of converting a continuous time function into a dis-
crete representation are described in the section below on sampling theory.)
In some applications, it is advantageous to think of a function (of whatever
type) not just as a sequence, or array, of numbers, but as a vector. In this conceptu-
alization, x(n) is a single vector defined by a single point, the endpoint of the
vector, in N-dimensional space, Figure 2.3. This somewhat curious and highly
mathematical concept has the advantage of unifying some signal processing
operations and fits well with matrix methods. It is difficult for most people to
imagine higher-dimensional spaces and even harder to present them graphically,
so operations and functions in higher-dimensional space are usually described
in 2 or 3 dimensions, and the extension to higher dimensional space is left to
the imagination of the reader. (This task can sometimes be difficult for non-
mathematicians: try and imagine a data sequence of even a 32-point array repre-
sented as a single vector in 32-dimensional space!)
FIGURE 2.3 The data sequence x(n) = [1.5,2.5,2] represented as a vector in
three-dimensional space.
A transform can be thought of as a re-mapping of the original data into a
function that provides more information than the original.* The Fourier
Transform described in Chapter 3 is a classic example as it converts the original time
data into frequency information which often provides greater insight into the
nature and/or origin of the signal. Many of the transforms described in this text
are achieved by comparing the signal of interest with some sort of probing
function. This comparison takes the form of a correlation (produced by
multiplication) that is averaged (or integrated) over the duration of the waveform, or
some portion of the waveform:
$$X(m) = \int_{-\infty}^{\infty} x(t)\, f_m(t)\, dt \tag{7}$$
where x(t) is the waveform being analyzed, fm(t) is the probing function and m
is some variable of the probing function, often specifying a particular member
in a family of similar functions. For example, in the Fourier Transform fm(t) is
a family of harmonically related sinusoids and m specifies the frequency of an
individual sinusoid in that family (e.g., sin(mft)). A family of probing functions
is also termed a basis. For discrete functions, a probing function consists of a
sequence of values, or vector, and the integral becomes a summation over a finite
range:
$$X(m) = \sum_{n=1}^{N} x(n)\, f_m(n) \tag{8}$$
where x(n) is the discrete waveform and fm(n) is a discrete version of the family
of probing functions. This equation assumes the probe and waveform functions
are the same length. Other possibilities are explored below.
*Some definitions would be more restrictive and require that a transform be bilateral; that is, it
must be possible to recover the original signal from the transformed data. We will use the looser
definition and reserve the term bilateral transform to describe reversible transformations.
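Eq. (8) translates almost directly into MATLAB. The sketch below uses a family
of sinusoids as the probing functions, with m setting the number of cycles in
the probe. The test waveform, its two component frequencies, and the range of
m are assumptions chosen so that the result is easy to predict.

```matlab
% Sketch: Eq. (8) with a family of sinusoidal probing functions.
N = 256;                                 % Length of waveform and probes
n = 1:N;
x = sin(2*pi*5*n/N) + 0.5*sin(2*pi*12*n/N);   % Test waveform (assumed)
for m = 1:20                             % m selects the family member
    fm = sin(2*pi*m*n/N);                % Probing function with m cycles
    X(m) = sum(x .* fm);                 % Multiply and sum, Eq. (8)
end
stem(1:20,X); xlabel('m'); ylabel('X(m)');
```

Only the family members that match a component of the waveform (m = 5 and
m = 12) give a substantial output, N/2 times the component amplitude; the
other members give outputs that are zero to within rounding error.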
When either x(t) or fm(t) are of infinite length, they must be truncated in
some fashion to fit within the confines of limited memory storage. In addition,
if the length of the probing function, fm(n), is shorter than the waveform, x(n),
then x(n) must be shortened in some way. The length of either function can be
shortened by simple truncation or by multiplying the function by yet another
function that has zero value beyond the desired length. A function used to
shorten another function is termed a window function, and its action is shown
in Figure 2.4. Note that simple truncation can be viewed as multiplying the
function by a rectangular window, a function whose value is one for the portion
of the function that is retained, and zero elsewhere. The consequences of this
artificial shortening will depend on the specific window function used. Conse-
quences of data windowing are discussed in Chapter 3 under the heading Win-
dow Functions. If a window function is used, Eq. (8) becomes:
$$X(m) = \sum_{n=1}^{N} x(n)\, f_m(n)\, W(n) \tag{9}$$
where W(n) is the window function. In the Fourier Transform, the length of
W(n) is usually set to be the same as the available length of the waveform, x(n),
but in other applications it can be shorter than the waveform. If W(n) is a
rectangular function, then W(n) = 1 over the length of the summation (1 ≤ n ≤ N), and
it is usually omitted from the equation. The rectangular window is implemented
implicitly by the summation limits.
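The action of a window function can be sketched in a few lines. The example
below shortens a sinusoid with a rectangular window and with a tapered window.
A Hamming shape, written out explicitly so that no toolbox routine is needed,
is used here as an assumed stand-in for the Kaiser window of Figure 2.4; the
waveform, window length, and window position are also assumed values.

```matlab
% Sketch: truncating a waveform with a rectangular and a tapered window.
N = 512;                                 % Length of the waveform
n = 1:N;
x = sin(2*pi*8*n/N);                     % Waveform to be shortened (assumed)
L = 200; n1 = 101;                       % Window length and starting point
W_rect = zeros(1,N);
W_rect(n1:n1+L-1) = 1;                   % Rectangular: simple truncation
W_hamm = zeros(1,N);                     % Hamming shape, written out here
W_hamm(n1:n1+L-1) = 0.54 - 0.46*cos(2*pi*(0:L-1)/(L-1));
x_rect = x .* W_rect;                    % Abrupt edges
x_hamm = x .* W_hamm;                    % Tapered edges
subplot(3,1,1); plot(n,x,'k');
subplot(3,1,2); plot(n,W_rect,'k',n,W_hamm,':k');
subplot(3,1,3); plot(n,x_rect,':k',n,x_hamm,'k');
```

Both windows set the waveform to zero outside the retained section; they
differ only in how abruptly the retained section begins and ends.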
FIGURE 2.4 A waveform (upper plot) is multiplied by a window function (middle
plot) to create a truncated version (lower plot) of the original waveform. The
window function is shown in the middle plot. This particular window function is called
the Kaiser Window, one of many popular window functions.
If the probing function is of finite length (in mathematical terms such a
function is said to have finite support) and this length is shorter than the
waveform, then it might be appropriate to translate or slide it over the signal and
perform the comparison (correlation, or multiplication) at various relative
positions between the waveform and probing function. In the example shown in
Figure 2.5, a single probing function is shown (representing a single family
member), and a single output function is produced. In general, the output would
be a family of functions, or a two-variable function, where one variable
corresponds to the relative position between the two functions and the other to the
specific family member. This sliding comparison is similar to convolution
described in the next section, and is given in discrete form by the equation:
$$X(m,k) = \sum_{n=1}^{N} x(n)\, f_m(n-k) \tag{10}$$
where the variable k indicates the relative position between the two functions
and m is the family member as in the above equations. This approach will be
used in the filters described in Chapter 4 and in the Continuous Wavelet
Transform described in Chapter 7. A variation of this approach can be used for
long, or even infinite, probing functions, provided the probing function itself
is shortened by windowing to a length that is less than the waveform. Then the
shortened probing function can be translated across the waveform in the same
manner as a probing function that is naturally short. The equation for this
condition becomes:
$$X(m,k) = \sum_{n=1}^{N} x(n)\, [W(n-k)\, f_m(n)] \tag{11}$$
where fm(n) is a longer function that is shortened by the sliding window function,
W(n − k), and the variables m and k have the same meaning as in Eq. (10).
This is the approach taken in the Short-Term Fourier Transform described in
Chapter 6.
FIGURE 2.5 The probing function slides over the waveform of interest (upper
panel) and at each position generates the summed, or averaged, product of the
two functions (lower panel), as in Eq. (10). In this example, the probing function
is one member of the “Mexican Hat” family (see Chapter 7) and the waveform is
a sinusoid that increases its frequency linearly over time (known as a chirp). The
summed product (lower panel), also known as the scalar product, shows the
relative correlation between the waveform and the probing function as it slides across
the waveform. Note that this relative correlation varies sinusoidally as the phase
between the two functions varies, but reaches a maximum around 2.5 sec, the
time when the waveform is most like the probing function.
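The sliding comparison of Eq. (10) can be sketched with a simple loop. The
example below slides a single "Mexican Hat" shaped probing function across a
chirp, in the spirit of Figure 2.5. It is not the code that produced that
figure: the sample interval, chirp frequencies, and probe width are assumed
values, so the time of maximum response will not necessarily match the figure.

```matlab
% Sketch: slide a short probing function across a chirp, as in Eq. (10).
Ts = 0.002;                              % Sample interval (assumed)
t = (0:Ts:5-Ts);                         % 5 sec of data
x = sin(2*pi*(0.5*t + 0.3*t.^2));        % Chirp: frequency rises with time
tp = -1:Ts:1;                            % Support of the probing function
s = 0.1;                                 % Width of the probe (assumed)
f = (1 - (tp/s).^2) .* exp(-(tp/s).^2/2);    % "Mexican Hat" shape
Np = length(f); N = length(x);
for k = 1:N-Np+1                         % k is the relative position
    X(k) = sum(x(k:k+Np-1) .* f);        % Multiply and sum at each position
end
tk = t(1:N-Np+1) + 1;                    % Time at the center of the probe
subplot(2,1,1); plot(t,x,'k'); ylabel('x(t)');
subplot(2,1,2); plot(tk,X,'k'); xlabel('Time (sec)'); ylabel('X(k)');
```

The output oscillates as the phase between the probe and the chirp changes,
and its envelope is largest where the chirp frequency is closest to the
dominant frequency of the probe. Only positions where the probe lies entirely
within the waveform are evaluated, so the output is shorter than the waveform.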
All of the equations above, Eqs. (7) to (11), have one thing in
common: they all feature the multiplication of two (or sometimes three)
functions and the summation (or integration) of the product over some interval.
Returning to the vector conceptualization for data sequences mentioned above
(see Figure 2.3), this multiplication and summation is the same as the scalar
product of the two vectors.*
The scalar product is defined as:
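$$\text{Scalar product of } a \text{ and } b \equiv \langle a,b\rangle = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix} = a_1b_1 + a_2b_2 + \cdots + a_nb_n \tag{12}$$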
Note that the scalar product results in a single number (i.e., a scalar), not
a vector. The scalar product can also be defined in terms of the magnitude of
the two vectors and the angle between them:
$$\text{Scalar product of } a \text{ and } b \equiv \langle a,b \rangle = |a|\,|b| \cos\theta \tag{13}$$
where θ is the angle between the two vectors. If the two vectors are perpendicular
to one another, i.e., they are orthogonal, then θ = 90°, and their scalar product
will be zero. Eq. (13) demonstrates that the scalar product between waveform
and probe function is mathematically the same as a projection of the waveform
vector onto the probing function vector (after normalizing by probe vector
length). When the probing function consists of a family of functions, then the
scalar product operations in Eqs. (7)–(11) can be thought of as projecting the
waveform vector onto vectors representing the various family members. In this
vector-based conceptualization, the probing function family, or basis, can be
thought of as the axes of a coordinate system. This is the motivation behind the
development of probing functions that have family members that are orthogonal,
or orthonormal:† the scalar product computations (or projections) can be done
on each axis (i.e., on each family member) independently of the others.
*The scalar product is also termed the inner product, the standard inner product, or the dot
product.
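As a small illustration of Eqs. (12) and (13), the sketch below computes the scalar product of two short vectors in three equivalent ways, recovers the angle between them, and checks that a sine and a cosine sampled over exactly one period are orthogonal. The vectors and all variable names are arbitrary choices made for this illustration, not values taken from the text.

```matlab
% Scalar product of two vectors, Eq. (12), computed three equivalent ways
a = [1; 2; 3];                 % arbitrary example vectors
b = [4; -5; 6];
sp_sum = sum(a .* b);          % multiply point by point, then sum
sp_mat = a' * b;               % row vector times column vector
sp_dot = dot(a, b);            % built-in dot product

% Angle between the two vectors, from Eq. (13)
theta = acos(sp_sum / (norm(a) * norm(b)));   % in radians

% Projection of a onto b: scalar product normalized by the length of b
proj = sp_sum / norm(b);

% Orthogonality check: sine and cosine over exactly one period
N = 256;
n = (0:N-1)' / N;              % N equally spaced samples of one period
s = sin(2*pi*n);
c = cos(2*pi*n);
sp_orth = s' * c;              % zero to within round-off error

fprintf('scalar product = %g, angle = %.3f rad, <sin,cos> = %.2e\n', ...
    sp_sum, theta, sp_orth);
```

All three forms give the same scalar product; the matrix form, a' * b, is usually the most compact when the waveforms are already stored as column vectors.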
CONVOLUTION, CORRELATION, AND COVARIANCE
Convolution, correlation, and covariance are similar-sounding terms and are
similar in the way they are calculated. This similarity is somewhat misleading—at
least in the case of convolution—since the areas of application and underlying
concepts are not the same.
Convolution and the Impulse Response
Convolution is an important concept in linear systems theory, solving the need
for a time domain operation equivalent to the Transfer Function. Recall that the
Transfer Function is a frequency domain concept that is used to calculate the
output of a linear system to any input. Convolution can be used to define a
general input–output relationship in the time domain analogous to the Transfer
Function in the frequency domain. Figure 2.6 demonstrates this application of
convolution. The input, x(t), the output, y(t), and the function linking the two
through convolution, h(t), are all functions of time; hence, convolution is a time
domain operation. (Ironically, convolution algorithms are often implemented in
the frequency domain to improve the speed of the calculation.)
The basic concept behind convolution is superposition. The first step is to
determine a time function, h(t), that tells how the system responds to an infi-
nitely short segment of the input waveform. If superposition holds, then the
output can be determined by summing (integrating) all the response contribu-
tions calculated from the short segments. The way in which a linear system
responds to an infinitely short segment of data can be determined simply by
noting the system’s response to an infinitely short input, an infinitely short
pulse. An infinitely short pulse (or one that is at least short compared to the
dynamics of the system) is termed an impulse or delta function (commonly
denoted δ(t)), and the response it produces is termed the impulse response, h(t).
FIGURE 2.6 Convolution as a linear process.
†Orthonormal vectors are orthogonal, but also have unit length.
Given that the impulse response describes the response of the system to an
infinitely short segment of data, and any input can be viewed as an infinite
string of such infinitesimal segments, the impulse response can be used to deter-
mine the output of the system to any input. The response produced by an infi-
nitely small data segment is simply this impulse response scaled by the magni-
tude of that data segment. The contribution of each infinitely small segment can
be summed, or integrated, to find the response created by all the segments.
The convolution process is shown schematically in Figure 2.7. The left
graph shows the input, x(n) (dashed curve), to a linear system having an impulse
response of h(n) (solid line). The right graph of Figure 2.7 shows three partial
responses (solid curves) produced by three different infinitely small data segments
at N1, N2, and N3. Each partial response is an impulse response scaled by the
associated input segment and shifted to the position of that segment. The output
of the linear process (right graph, dashed line) is the summation of the individual
impulse responses produced by each of the input data segments. (The output is
scaled down to produce a readable plot.)
FIGURE 2.7 (A) The input, x(n), to a linear system (dashed line) and the impulse
response of that system, h(n) (solid line). Three points on the input data sequence
are shown: N1, N2, and N3. (B) The partial contributions from the three
input data points to the output are impulse responses scaled by the value of the
associated input data point (solid line). The overall response of the system, y(n)
(dashed line, scaled to fit on the graph), is obtained by summing the contributions
from all the input points.
Stated mathematically, the output y(t), to any input, x(t) is given by:
$$y(t) = \int_{-\infty}^{+\infty} h(\tau)\,x(t-\tau)\,d\tau = \int_{-\infty}^{+\infty} h(t-\tau)\,x(\tau)\,d\tau \tag{14}$$
To determine the response to each infinitely small data segment, the impulse
response is shifted a time τ with respect to the input, then scaled (i.e.,
multiplied) by the magnitude of the input at that point in time. It does not matter
which function, the input or the impulse response, is shifted.* Shifting and
multiplication is sometimes referred to as the lag product. For most systems, h(τ)
is finite, so the limit of integration is finite. Moreover, a real system can only
respond to past inputs, so h(τ) must be 0 for τ < 0 (negative τ implies future times
in Eq. (14)), although for computer-based operations, where future data may be
available in memory, τ can be negative.
For discrete signals, the integration becomes a summation and the convo-
lution equation becomes:
$$y(n) = \sum_{k=1}^{N} h(n-k)\,x(k) \quad \text{or} \quad y(n) = \sum_{k=1}^{N} h(k)\,x(n-k) \equiv h(n) * x(n) \tag{15}$$
Again either h(n) or x(n) can be shifted. Also for discrete data, both h(n)
and x(n) must be finite (since they are stored in finite memory), so the summa-
tion is also finite (where N is the length of the shorter function, usually h(n)).
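The convolution sum of Eq. (15) can be sketched directly with two loops and checked against the conv routine. The same output is also built here by superposition, adding together impulse responses that have been scaled and shifted by each input point, as in Figure 2.7. The data values and variable names are arbitrary and are used only for this illustration; the index offset of +1 is needed because MATLAB arrays begin at index 1.

```matlab
% Direct evaluation of the convolution sum, Eq. (15), compared with conv
x = [1 2 3 4 5];               % input sequence (arbitrary example values)
h = [1 0.5 0.25];              % impulse response (arbitrary example values)
Nx = length(x);
Nh = length(h);

% Method 1: lag product summed over k for each output point n
y = zeros(1, Nx + Nh - 1);     % output is Nx + Nh - 1 points long
for n = 1:length(y)
    for k = 1:Nx
        m = n - k + 1;         % index into h (+1 for 1-based arrays)
        if m >= 1 && m <= Nh   % h is taken as zero outside its stored range
            y(n) = y(n) + h(m) * x(k);
        end
    end
end

% Method 2: superposition of scaled and shifted impulse responses
y_sup = zeros(1, Nx + Nh - 1);
for k = 1:Nx
    y_sup(k:k+Nh-1) = y_sup(k:k+Nh-1) + x(k) * h;
end

% Built-in routine for comparison
y_ref = conv(x, h);
max_err = max(abs(y - y_ref));
max_err_sup = max(abs(y_sup - y_ref));
```

Both loops give the same seven output points as conv; the second form makes the superposition argument explicit, since each pass of the loop adds one partial response.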
In signal processing, convolution can be used to implement some of the
basic filters described in Chapter 4. Like their analog counterparts, digital filters
are just linear processes that modify the input spectra in some desired way (such
as reducing noise). As with all linear processes, the filter’s impulse response,
h(n), completely describes the filter. The process of sampling used in analog-to-digital
conversion can also be analyzed with the help of convolution: the sampled
output x(n) is the product of the analog signal, x(t), and a train of very short
pulses (i.e., impulse functions) that is periodic with the sampling frequency, and
this product in the time domain corresponds to a convolution in the frequency domain.
Convolution has signal processing implications that extend beyond the determination
of input-output relationships. We will show later that convolution in the
time domain is equivalent to multiplication in the frequency domain, and vice
versa. The latter has particular significance to sampling theory as described
later in this chapter.
*Of course, shifting both would be redundant.
Covariance and Correlation
The word correlation connotes similarity: how one thing is like another.
Mathematically, correlations are obtained by multiplying and normalizing. Both
covariance and correlation use multiplication to compare the linear relationship
between two variables, but in correlation the coefficients are normalized to fall
between −1 and +1. This makes the correlation coefficients insensitive to
variations in the gain of the data acquisition process or the scaling of the vari-
ables. However, in many signal processing applications, the variable scales are
similar, and covariance is appropriate. The operations of correlation and covari-
ance can be applied to two or more waveforms, to multiple observations of the
same source, or to multiple segments of the same waveform. These comparisons
between data sequences can also result in a correlation or covariance matrix as
described below.
Correlation/covariance operations can not only be used to compare different
waveforms at specific points in time, they can also make comparisons over
a range of times by shifting one signal with respect to the other. The crosscorrelation
function is an example of this process. The correlation function is the lagged
product of two waveforms, and the defining equation, given here in both continuous
and discrete form, is quite similar to the convolution equation above (Eqs.
(14) and (15)):
$$r_{xy}(\tau) = \int_{0}^{T} y(t+\tau)\,x(t)\,dt \tag{16a}$$
$$r_{xy}(n) = \sum_{k=1}^{M} y(k+n)\,x(k) \tag{16b}$$
Eqs. (16a) and (16b) show that the only difference in the computation of
the crosscorrelation versus convolution is the direction of the shift. In convolu-
tion the waveforms are shifted in opposite directions. This produces a causal
output: the output function is the creation of past values of the input function
(the output is caused by the input). This form of shifting is reflected in the
negative sign in Eq. (15). Crosscorrelation shows the similarity between two
waveforms at all possible relative positions of one waveform with respect to
the other, and it is useful in identifying segments of similarity. The output of
Eq. (16) is sometimes termed the raw correlation since there is no normalization
involved. Various scalings can be used (such as dividing by N, the number
of points in the sum), and these are described in the section on MATLAB
implementation.
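The lagged product of Eq. (16b) can be sketched with two loops. In the example below, a pulse and a delayed copy of the same pulse are crosscorrelated, and the lag at which the raw correlation peaks recovers the delay. The pulse shape, the delay d, the lag range, and all variable names are assumptions made for this illustration. The last lines show the close relationship to convolution: reversing one waveform in time before calling conv gives the same values.

```matlab
% Raw crosscorrelation, Eq. (16b), by direct lagged products
N = 64;
k = (1:N)';
x = exp(-0.5 * ((k - 20) / 3).^2);        % reference pulse centered at sample 20
d = 7;                                     % delay in samples (assumed for this demo)
y = exp(-0.5 * ((k - 20 - d) / 3).^2);    % same pulse delayed by d samples

maxlag = 15;
lags = -maxlag:maxlag;
r = zeros(size(lags));
for i = 1:length(lags)
    n = lags(i);
    for kk = 1:N
        if kk + n >= 1 && kk + n <= N      % samples outside the record count as zero
            r(i) = r(i) + y(kk + n) * x(kk);
        end
    end
end
[~, imax] = max(r);
est_delay = lags(imax);                    % lag at which the waveforms match best

% Same values from conv after time-reversing x (lags run -(N-1) to N-1)
r_full = conv(y, flipud(x));
r_conv = r_full(N + lags);                 % pick out the lags computed above
max_diff = max(abs(r(:) - r_conv(:)));
```

No scaling is applied here, so r is the raw correlation described above.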
A special case of the correlation function occurs when the comparison is
between two waveforms that are one and the same; that is, a function is correlated
with different shifts of itself. This is termed the autocorrelation function and it
provides a description of how similar a waveform is to itself at various time
shifts, or time lags. The autocorrelation function will naturally be maximum for
zero lag (n = 0) because at zero lag the comparison is between identical wave-
forms. Usually the autocorrelation is scaled so that the correlation at zero lag is
1. The function must be symmetric about n = 0, since shifting one version of
the same waveform in the negative direction is the same as shifting the other
version in the positive direction.
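These two properties, a maximum at zero lag and symmetry about zero lag, can be checked on a short sequence. The sketch below forms the raw autocorrelation by convolving the sequence with a time-reversed copy of itself, then scales it so the zero-lag value is 1. The data values and variable names are arbitrary choices for this illustration.

```matlab
% Autocorrelation of a short sequence: symmetric, with its peak at zero lag
x = [2; -1; 3; 0; 1];                 % arbitrary example data
N = length(x);
rxx = conv(x, flipud(x));             % raw autocorrelation, lags -(N-1) to N-1
lags = (-(N-1):(N-1))';               % lag value for each element of rxx
rxx_norm = rxx / rxx(N);              % scale so the zero-lag value equals 1
is_symmetric = isequal(rxx, flipud(rxx));
[~, ipk] = max(rxx);
peak_lag = lags(ipk);                 % lag of the largest value
```

The zero-lag term, rxx(N), is simply the sum of the squared data points.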
The autocorrelation function is related to the bandwidth of the waveform.
The sharper the peak of the autocorrelation function the broader the bandwidth.
For example, in white noise, which has infinite bandwidth, adjacent points are
uncorrelated, and the autocorrelation function will be nonzero only for zero lag
(see Problem 2). Figure 2.8 shows the autocorrelation functions of noise that
has been filtered to have two different bandwidths. In statistics, the crosscorrelation
and autocorrelation sequences are derived from the expectation operation
applied to infinite data. In signal processing, data lengths are finite, so the
expectation operation becomes summation (with or without normalization), and the
crosscorrelation and autocorrelation functions are necessarily estimations.
FIGURE 2.8 Autocorrelation functions of a random time series with a narrow
bandwidth (left) and broader bandwidth (right). Note the inverse relationship
between the autocorrelation function and the spectrum: the broader the bandwidth
the narrower the first peak. These figures were generated using the code in
Example 2.2 below.
The crosscovariance function is the same as the crosscorrelation function
except that the means have been removed from the data before calculation.
Accordingly, the equation is a slight modification of Eq. (16b), as shown below:
$$\text{Cov}(n) = \sum_{k=1}^{M} [y(k+n) - \bar{y}]\,[x(k) - \bar{x}] \tag{17}$$
The terms correlation and covariance, when used alone (i.e., without the
term function), imply operations similar to those described in Eqs. (16) and
(17), but without the lag operation. The result will be a single number. For
example, the covariance between two functions is given by:
$$\text{Cov} = \sigma_{x,y} = \sum_{k=1}^{M} [y(k) - \bar{y}]\,[x(k) - \bar{x}] \tag{18}$$
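Eq. (18) amounts to a scalar product of the two sequences after their means have been removed. The sketch below evaluates it for two short sequences and compares the result with the cov routine, which additionally divides by M − 1. The data values and variable names are arbitrary choices for this illustration.

```matlab
% Covariance as the scalar product of mean-removed data, Eq. (18)
x = [1; 2; 3; 4; 5];            % arbitrary example data
y = [2; 4; 5; 4; 5];
M = length(x);
xd = x - mean(x);               % remove the means
yd = y - mean(y);
cov_raw = sum(yd .* xd);        % Eq. (18), no normalization
S = cov(x, y);                  % 2-by-2 covariance matrix, normalized by M - 1
cov_builtin = S(1, 2);          % off-diagonal term is the x,y covariance
```

Dividing cov_raw by M − 1 reproduces the off-diagonal value returned by cov.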
Of particular interest are the covariance and correlation matrices. These
analysis tools can be applied to multivariate data where multiple responses, or
observations, are obtained from a single process. A representative example in
biosignals is the EEG where the signal consists of a number of related wave-
forms taken from different positions on the head. The covariance and correla-
tion matrices assume that the multivariate data are arranged in a matrix where
the columns are different variables and the rows are different observations of
those variables. In signal processing, the rows are the waveform time samples,
and the columns are the different signal channels or observations of the signal.
The covariance matrix gives the variance of the columns of the data ma-
trix in the diagonals while the covariance between columns is given by the
off-diagonals:
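$$S = \begin{bmatrix} \sigma_{1,1} & \sigma_{1,2} & \cdots & \sigma_{1,N} \\ \sigma_{2,1} & \sigma_{2,2} & \cdots & \sigma_{2,N} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{N,1} & \sigma_{N,2} & \cdots & \sigma_{N,N} \end{bmatrix} \tag{19}$$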
An example of the use of the covariance matrix to compare signals is
given in the section on MATLAB implementation.
In its usual signal processing definition, the correlation matrix is a normalized
version of the covariance matrix. Specifically, the correlation matrix is
related to the covariance matrix by the equation:
$$R_{xx}(i,j) = \frac{S(i,j)}{\sqrt{S(i,i)\,S(j,j)}} \tag{20}$$
where S is the covariance matrix of Eq. (19).
The correlation matrix is a set of correlation coefficients between wave-
form observations or channels and has a similar positional relationship as in the
covariance matrix:
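$$R_{xx} = \begin{bmatrix} r_{xx}(0) & r_{xx}(1) & \cdots & r_{xx}(L) \\ r_{xx}(1) & r_{xx}(0) & \cdots & r_{xx}(L-1) \\ \vdots & \vdots & \ddots & \vdots \\ r_{xx}(L) & r_{xx}(L-1) & \cdots & r_{xx}(0) \end{bmatrix} \tag{21}$$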
Since the diagonals in the correlation matrix give the correlation of a
given variable or waveform with itself, they will all equal 1 (rxx(0) = 1), and the
off-diagonals will vary between ± 1.
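The normalization in Eq. (20) can be sketched in a few lines: each element of the covariance matrix is divided by the square root of the product of the two corresponding diagonal terms. The three-channel data matrix and the variable names below are assumptions made for this illustration; the result is compared with the corrcoef routine.

```matlab
% Correlation matrix obtained by normalizing the covariance matrix, Eq. (20)
N = 256;
t = (1:N)' / N;                           % 1 sec of data
X = [sin(2*pi*t), 2*cos(2*pi*t), 2.5*sin(3*pi*t)];   % columns are channels
S = cov(X);                               % covariance matrix, as in Eq. (19)
sd = sqrt(diag(S));                       % square root of each diagonal term
R = S ./ (sd * sd');                      % S(i,j) / sqrt(S(i,i)*S(j,j))
R_ref = corrcoef(X);                      % built-in routine for comparison
max_diff = max(abs(R(:) - R_ref(:)));
```

The diagonal of R is 1 regardless of the channel amplitudes, which is the insensitivity to scaling noted earlier.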
MATLAB Implementation
MATLAB has specific functions for performing convolution, crosscorrelation/
autocorrelation, crosscovariance/autocovariance, and construction of the correlation
and covariance matrices. To implement convolution in MATLAB, the code
is straightforward using the conv function:
y = conv(x,h)
where x and h are vectors containing the waveforms to be convolved and y is
the output waveform. The length of the output waveform is equal to the length
of x plus the length of h minus 1. This will produce additional data points, and
methods for dealing with these extra points are presented at the end of this
chapter, along with other problems associated with finite data. Frequently, the
additional data points can simply be discarded. An example of the use of this
routine is given in Example 2.2. Although the algorithm performs the process
defined in Eq. (15), it actually operates in the frequency domain to
improve the speed of the operation.
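The output length and two simple ways of discarding the extra points can be sketched as follows. The moving-average impulse response, the random input, and the variable names are assumptions made for this illustration. The 'same' option of conv returns the central part of the output, the same length as the first input.

```matlab
% Length of the conv output and two ways of discarding the extra points
x = randn(1, 100);                 % input waveform (random example data)
h = ones(1, 5) / 5;                % 5-point moving average impulse response
y = conv(x, h);                    % full output
len_full = length(y);              % length(x) + length(h) - 1 = 104
y_first = y(1:length(x));          % discard the extra points at the end
y_same = conv(x, h, 'same');       % central part, same length as x
max_diff = max(abs(y_same - y(3:102)));   % 'same' drops 2 points from each end
```

Keeping the first length(x) points leaves the filter delay in the output, while the 'same' option removes points symmetrically from both ends.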
The crosscorrelation and autocorrelation operations are both performed
with the same MATLAB routine, with autocorrelation being treated as a special
case:
[c,lags] = xcorr(x,y,maxlags,'options')
Only the first input argument, x, is required. If no y variable is specified,
autocorrelation is performed. The optional argument maxlags specifies the shifting
range. The shifted waveform is shifted between ± maxlags, or the default
value which is −N + 1 to N − 1 where N is the length of the input vector, x. If a y
vector is specified then crosscorrelation is performed, and the same shifting
range applies. If one of the waveforms is shorter than the other (as is usually
the case), it is zero padded (defined and described at the end of this chapter) to
be the same length as the longer segment; hence, N would be the length of the
longer waveform. A number of scaling operations can be specified by the argument
options. If options equals biased, the output is scaled by 1/N, which
gives a biased estimate of the crosscorrelation/autocorrelation function. If options
equals unbiased, the output at lag m is scaled by 1/(N − |m|). Setting
options to coeff is used in autocorrelation and scales the autocorrelation
function so that the zero lag autocorrelation has a value equal to one. Finally,
options equals none indicates no scaling, which is the default.
The xcorr function produces an output argument, c, that is a vector of
length 2 maxlags + 1 if maxlags is specified or 2N − 1 if the default range is
used. The optional output argument, lags, is simply a vector containing the lag
values (i.e., a vector of integers ranging between ±maxlags) and is useful in
plotting.
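A brief sketch of these calling options follows. It assumes the Signal Processing Toolbox is installed, since xcorr is part of that toolbox; the white noise data and variable names are chosen only for this illustration.

```matlab
% Using xcorr with a limited lag range (requires the Signal Processing Toolbox)
N = 200;
x = randn(N, 1);                         % white noise example data
maxlags = 20;
[c, lags] = xcorr(x, maxlags, 'coeff');  % autocorrelation, zero-lag value scaled to 1
len_c = length(c);                       % 2*maxlags + 1 = 41 points
zero_lag = c(lags == 0);                 % equals 1 with 'coeff' scaling
[c_full, lags_full] = xcorr(x);          % default lag range and no scaling
len_full = length(c_full);               % 2*N - 1 = 399 points
```

A call such as plot(lags, c) then displays the autocorrelation against the lag values.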
Autocovariance or crosscovariance is obtained using the xcov function:
[c,lags] = xcov(x,y,maxlags,'options')
The arguments are identical to those described above for the xcorr func-
tion.
Correlation or covariance matrices are calculated using the corrcoef or
cov functions respectively. Again, the calls are similar for both functions:
Rxx = corrcoef(x)
S = cov(x), or S = cov(x,1);
Without the additional 1 in the calling argument, cov normalizes by N −
1, which provides the best unbiased estimate of the covariance matrix if the
observations are from a Gaussian distribution. When the second argument is
present, cov normalizes by N which produces the second moment of the obser-
vations about their mean.
Example 2.2 shows the use of both the convolution and autocorrelation
functions. The program produces autocorrelation functions of noise bandlimited
at two different frequencies. To generate the bandlimited (i.e., colored) noise
used for the autocorrelation, an impulse response function is generated in the
form of sin(x)/x (i.e., the sinc function). We will see in Chapter 4 that this is
the impulse response of one type of lowpass filter. Convolution of this impulse
response with a white noise sequence is used to generate bandlimited noise. A
vector containing Gaussian white noise is produced using the randn routine and
the lowpass filter is implemented by convolving the noise with the filter’s
impulse response. The result is noise bandlimited by the cutoff frequency of the
filter. The output of the filter is then processed by the autocorrelation routine to
produce the autocorrelation curves shown in Figure 2.8 above. The two figures
were obtained for bandlimited noise having bandwidths of π/20 rad/sec and π/8
rad/sec. The variable wc specifies the cutoff frequency of the lowpass filter in
the code below. The theory and implementation of a lowpass filter such as used
below are presented in Chapter 4.
Example 2.2 Generate bandlimited noise and compute and plot the auto-
correlation function for two different bandwidths.
% Example 2.2 and Figure 2.8
% Generate colored noise having two different bandwidths
% and evaluate using autocorrelation.
%
close all; clear all;
N = 1024;                      % Size of arrays
L = 100;                       % FIR filter length
noise = randn(N,1);            % Generate noise
%
% Compute the impulse response of a lowpass filter
% This type of filter is covered in Chapter 4
%
wn = pi*[1/20 1/8];            % Use cutoff frequencies of pi/20 and pi/8
hn = zeros(L+1,1);             % Filter impulse response (column vector)
for k = 1:2                    % Repeat for two different cutoff frequencies
    wc = wn(k);                % Assign filter cutoff frequency
    for i = 1:L+1              % Generate sin(x)/x function
        n = i - (L/2 + 1);     % and make symmetrical about n = 0
        if n == 0
            hn(i) = wc/pi;     % Limiting value at n = 0
        else
            hn(i) = sin(wc*n)/(pi*n);   % Filter impulse response
        end
    end
    out = conv(hn,noise);      % Filter
    [cor,lags] = xcorr(out,'coeff');   % Calculate autocorrelation, normalized
    % Plot the autocorrelation functions
    subplot(1,2,k);
    plot(lags,cor,'k');        % Plot using 'lags' vector
    axis([-50 50 -.5 1.1]);    % Define axes scale
    ylabel('Rxx');             % Labels
    xlabel('Lags(n)');
    title(['Bandwidth = ' num2str(wc)]);
end
Example 2.3 evaluates the covariance and correlation of sinusoids that are,
and are not, orthogonal. Specifically, this example demonstrates the lack of
correlation and covariance between sinusoids that are orthogonal such as a sine
and cosine at the same frequency and harmonically related sinusoids (i.e., those
whose frequencies are integer multiples of one another). It also shows correlation
and covariance for sinusoids that are not orthogonal such as sines that are not at
harmonically related frequencies.
Example 2.3 Generate a data matrix where the columns consist of or-
thogonal and non-orthogonal sinusoids. Specifically, the data matrix should con-
sist of a 1 Hz sine and a cosine, a 2 Hz sine and cosine, and a 1.5 Hz sine and
cosine. The six sinusoids should all be at different amplitudes. The first four
sinusoids are orthogonal and should show negligible correlation while the two
1.5 Hz sinusoids should show some correlation with the other sinusoids.
% Example 2.3
% Application of the correlation and covariance matrices to
% sinusoids that are orthogonal and non-orthogonal
%
clear all; close all;
N = 256;                       % Number of data points in each waveform
fs = 256;                      % Sample frequency
n = (1:N)/fs;                  % Generate 1 sec of data
%
% Generate the sinusoids as columns of the matrix
x(:,1) = sin(2*pi*n)';         % Generate a 1 Hz sin
x(:,2) = 2*cos(2*pi*n)';       % Generate a 1 Hz cos
x(:,3) = 1.5*sin(4*pi*n)';     % Generate a 2 Hz sin
x(:,4) = 3*cos(4*pi*n)';       % Generate a 2 Hz cos
x(:,5) = 2.5*sin(3*pi*n)';     % Generate a 1.5 Hz sin
x(:,6) = 1.75*cos(3*pi*n)';    % Generate a 1.5 Hz cos
%
S = cov(x)                     % Print covariance matrix
Rxx = corrcoef(x)              % and correlation matrix
The output from this program is a covariance and correlation matrix. The
covariance matrix is:
S =
0.5020 0.0000 0.0000 0.0000 0.0000 -0.4474
0.0000 2.0078 -0.0000 -0.0000 1.9172 -0.0137
0.0000 -0.0000 1.1294 0.0000 -0.0000 0.9586
0.0000 -0.0000 0.0000 4.5176 -2.0545 -0.0206
0.0000 1.9172 -0.0000 -2.0545 2.8548 0.0036
-0.4474 -0.0137 0.9586 -0.0206 0.0036 1.5372
In the covariance matrix, the diagonals which give the variance of the six
signals vary since the amplitudes of the signals are different. The covariance
between the first four signals is zero, demonstrating the orthogonality of these
signals. The correlation between the 5th and 6th signals and the other sinusoids
can be best observed from the correlation matrix:
Rxx =
1.0000 0.0000 0.0000 0.0000 0.0000 -0.5093
0.0000 1.0000 -0.0000 -0.0000 0.8008 -0.0078
0.0000 -0.0000 1.0000 0.0000 -0.0000 0.7275
0.0000 -0.0000 0.0000 1.0000 -0.5721 -0.0078
0.0000 0.8008 -0.0000 -0.5721 1.0000 0.0017
-0.5093 -0.0078 0.7275 -0.0078 0.0017 1.0000
In the correlation matrix, the correlation of each signal with itself is, of
course, 1.0. The 1.5 Hz sine (the 5th column of the data matrix) shows good
correlation with the 1.0 and 2.0 Hz cosines (2nd and 4th rows) but not with the
sine waves, while the 1.5 Hz cosine (the 6th column) shows the opposite: it
correlates with the 1.0 and 2.0 Hz sines (1st and 3rd rows) but not with the
cosines. Hence, sinusoids that are not harmonically related are not orthogonal
and do show some correlation.
SAMPLING THEORY AND FINITE DATA CONSIDERATIONS
To convert an analog waveform into a digitized version residing in memory
requires two operations: sampling the waveform at discrete points in time,* and,
if the waveform is longer than the computer memory, isolating a segment of the
analog waveform for the conversion. The waveform segmentation operation is
windowing as mentioned previously, and the consequences of this operation are
discussed in the next chapter. If the purpose of sampling is to produce a digitized
copy of the original waveform, then the critical issue is how well does this
copy represent the original? Stated another way, can the original be reconstructed
from the digitized copy? If so, then the copy is clearly adequate. The
answer to this question depends on the frequency at which the analog waveform
is sampled relative to the frequencies that it contains.
*As described in Chapter 1, this operation involves both time slicing, termed sampling, and
amplitude slicing, termed quantization.
The question of what sampling frequency should be used can be best
addressed assuming a simple waveform, a single sinusoid.* All finite, continu-
ous waveforms can be represented by a series of sinusoids (possibly an infinite
series), so if we can determine the appropriate sampling frequency for a single
sinusoid, we have also solved the more general problem. The “Shannon Sampling
Theorem” states that any sinusoidal waveform can be uniquely reconstructed
provided it is sampled more than twice in one period. (Equally spaced
samples are assumed.) That is, the sampling frequency, fs, must be > 2fsinusoid. In
other words, just over two equally spaced samples per cycle are required to
uniquely specify a sinusoid, and these can be taken anywhere over the cycle.
Extending this to a general analog waveform, Shannon’s Sampling Theorem states
that a continuous waveform can be reconstructed without loss of information
provided the sampling frequency is greater than twice the highest frequency in
the analog waveform:
$$f_s > 2f_{max} \tag{22}$$
As mentioned in Chapter 1, in practical situations, fmax is usually taken as
the highest frequency in the analog waveform for which more than a negligible
amount of energy exists.
The sampling process is equivalent to multiplying the analog waveform
by a repeating series of short pulses. This repeating series of short pulses is
sometimes referred to as the sampling function. Recall that the ideal short pulse
is called the impulse function, δ(t). In theory, the impulse function is infinitely
short, but is also infinitely tall, so that its total area equals 1. (This must be
justified using limits, but any pulse that is very short compared to the dynamics
of the sampled waveform will do. Recall the sampling pulse produced in most
modern analog-to-digital converters, termed the aperture time, is typically less
than 100 nsec.) The sampling function can be stated mathematically using the
impulse function:
$$\text{Samp}(n) = \sum_{k=-\infty}^{\infty} \delta(n - kT_s) \tag{23}$$
where Ts is the sample interval and equals 1/fs.
For an analog waveform, x(t), the sampled version, x(n), is given by multiplying
x(t) by the sampling function in Eq. (23):
$$x(n) = \sum_{k=-\infty}^{\infty} x(nT_s)\,\delta(n - kT_s) \tag{24}$$
*A sinusoid has a straightforward frequency domain representation: only a single complex point at
the frequency of the sinusoid. Classical methods of frequency analysis described in the next chapter
make use of this fact.
The frequency spectrum of the sampling process represented by Eq. (24)
can be determined by taking advantage of the fact that multiplication in the time
domain is equivalent to convolution in the frequency domain (and vice versa).
Hence, the frequency characteristic of a sampled waveform is just the convolution
of the analog waveform spectrum with the sampling function spectrum.
Figure 2.9A shows the spectrum of a sampling function having a sample interval
of Ts, and Figure 2.9B shows the spectrum of a hypothetical signal that has a
well-defined maximum frequency, fmax. Figure 2.9C shows the spectrum of the
sampled waveform assuming fs = 1/Ts > 2fmax. Note that the frequency characteristic
of the sampled waveform is the same as the original for the lower frequencies,
but the sampled spectrum now has a repetition of the original spectrum
reflected on either side of fs and at multiples of fs. Nonetheless, it would be
possible to recover the original spectrum simply by filtering the sampled data
by an ideal lowpass filter with a bandwidth > fmax as shown in Figure 2.9E.
Figure 2.9D shows the spectrum that results if the digitized data were sampled
at fs < 2fmax, in this case fs = 1.5fmax. Note that the reflected portion of the
spectrum has become intermixed with the original spectrum, and no filter can
unmix them.* When fs < 2fmax, the sampled data suffers from spectral overlap,
better known as aliasing. The sampled data no longer provides a unique
representation of the analog waveform, and recovery is not possible.
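The loss of uniqueness can be seen directly in the time domain with a small sketch. A 7 Hz sinusoid sampled at 10 Hz, which is below the required rate of more than 14 Hz, produces exactly the same samples as an inverted 3 Hz sinusoid, so the two cannot be told apart once sampled. The frequencies and variable names are assumptions chosen for this illustration.

```matlab
% Aliasing: a 7 Hz sinusoid sampled at 10 Hz gives the same samples
% as a 3 Hz sinusoid of opposite sign
fs = 10;                          % sampling frequency in Hz (less than 2*7 Hz)
n = (0:49)';                      % sample index
t = n / fs;                       % sample times, 5 sec of data
x_high = sin(2*pi*7*t);           % undersampled 7 Hz sinusoid
x_alias = -sin(2*pi*3*t);         % 3 Hz sinusoid, sign inverted
max_diff = max(abs(x_high - x_alias));   % zero to within round-off error
```

The alias appears at fs − 7 = 3 Hz, which is the reflection about fs described above.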
When correctly sampled, the original spectrum can be recovered by applying
an ideal lowpass filter (digital filter) to the digitized data. In Chapter 4, we
show that an ideal lowpass filter has an impulse response given by:
$$h(n) = \frac{\sin(2\pi f_c T_s n)}{\pi n} \tag{25}$$
where Ts is the sample interval and fc is the filter’s cutoff frequency.
Unfortunately, in order for this impulse response to produce an ideal filter,
it must be infinitely long. As demonstrated in Chapter 4, truncating h(n) results
in a filter that is less than ideal. However, if fs >> fmax, as is often the case, then
any reasonable lowpass filter would suffice to recover the original waveform
(Figure 2.9F). In fact, using sampling frequencies much greater than required is
the norm, and often the lowpass filter is provided only by the response
characteristics of the output or display device, which is sufficient to reconstruct
an adequate-looking signal.
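A truncated version of Eq. (25) can be sketched as follows. The sampling frequency, cutoff frequency, and filter length are assumed values chosen for this illustration, as are the variable names. The point n = 0 must be handled separately, since Eq. (25) is undefined there; its limiting value is 2 fc Ts.

```matlab
% Truncated version of the ideal lowpass impulse response, Eq. (25)
fs = 100;                         % sampling frequency in Hz (assumed)
Ts = 1 / fs;                      % sample interval
fc = 10;                          % cutoff frequency in Hz (assumed)
L = 50;                           % half-length; the filter has 2*L + 1 points
n = (-L:L)';
h = zeros(size(n));
nz = (n ~= 0);                    % all points except n = 0
h(nz) = sin(2*pi*fc*Ts*n(nz)) ./ (pi*n(nz));
h(~nz) = 2*fc*Ts;                 % limiting value of Eq. (25) as n approaches 0
dc_gain = sum(h);                 % close to, but not exactly, 1 after truncation
```

The small departure of dc_gain from 1 is one visible consequence of truncating the infinitely long ideal response.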
*You might argue that you could recover the original spectrum if you knew exactly the spectrum
of the original analog waveform, but with this much information, why bother to sample the wave-
form in the first place!
FIGURE 2.9 Consequences of sampling expressed in the frequency domain. (A)
Frequency spectrum of a repetitive impulse function sampling at 6 Hz. (B) Fre-
quency spectrum of a hypothetical time signal that has a maximum frequency,
fmax, around 2 Hz. (Note negative frequencies occur with complex representation).
(C) Frequency spectrum of sampled waveform when the sampling frequency was
greater than twice the highest frequency component in the sampled waveform.
(D) Frequency spectrum of sampled waveform when the sampling frequency was
less than twice the highest frequency component in the sampled waveform. Note the
overlap. (E) Recovery of correctly sampled waveform using an ideal lowpass filter
(dotted line). (F) Recovery of a waveform when the sampling frequency is much
greater than twice the highest frequency in the sampled waveform (fs = 10fmax).
In this case, the lowpass filter (dotted line) need not have as sharp a cutoff.
Edge Effects
An advantage of dealing with infinite data is that one need not be concerned
with the end points since there are no end points. However, finite data consist
of numerical sequences having a fixed length with fixed end points at the begin-
ning and end of the sequence. Some operations, such as convolution, may pro-
duce additional data points while some operations will require additional data
points to complete their operation on the data set. The question then becomes
how to add or eliminate data points, and there are a number of popular strategies
for dealing with these edge effects.
There are three common strategies for extending a data set when addi-
tional points are needed: extending with zeros (or a constant), termed zero pad-
ding; extending using periodicity or wraparound; and extending by reflection,
also known as symmetric extension. These options are illustrated in Figure 2.10.
In the zero padding approach, zeros are added to the end or beginning of the
data sequence (Figure 2.10A). This approach is frequently used in spectral anal-
ysis and is justified by the implicit assumption that the waveform is zero outside
of the sample period anyway. A variant of zero padding is constant padding,
where the data sequence is extended using a constant value, often the last (or
first) value in the sequence. If the waveform can be reasonably thought of as
one cycle of a periodic function, then the wraparound approach is clearly
justified (Figure 2.10B). Here the data are extended by tacking on the initial data
sequence to the end of the data set and vice versa. This is quite easy to
implement numerically: simply make all operations involving the data sequence index
modulo N, where N is the initial length of the data set. Zero padding and wraparound
will, in general, produce a discontinuity at the beginning or end of the data set,
which can lead to artifacts in certain situations. The symmetric reflection approach
eliminates this discontinuity by tacking on the end points in reverse order (or
beginning points if extending the beginning of the data sequence) (Figure 2.10C).*
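The extension strategies described above can be sketched in a few lines of base MATLAB. This is only one possible arrangement: the data values, the number of added points P, and all variable names are assumptions made for illustration, and P points are added at both ends of the sequence. Because MATLAB indices begin at 1, the modulo N indexing is applied after shifting to a zero-based index.

```matlab
% Sketch: extending a finite sequence by P points at each end.
x = [1 2 3 4 5 6];        % Example data (illustrative values)
N = length(x);            % Initial length of the data set
P = 3;                    % Points added at each end (assumes P <= N-1)

% Zero padding
x_zero = [zeros(1,P), x, zeros(1,P)];

% Constant padding: repeat the first and last values
x_const = [x(1)*ones(1,P), x, x(end)*ones(1,P)];

% Periodic (wraparound) extension using index modulo N
k = (1-P):(N+P);          % Positions in the extended sequence
x_wrap = x(mod(k-1, N) + 1);

% Symmetric extension, edge points repeated
x_sym_rep = [x(P:-1:1), x, x(N:-1:N-P+1)];

% Symmetric extension, edge points not repeated
x_sym_norep = [x(P+1:-1:2), x, x(N-1:-1:N-P)];

disp(x_wrap);             % Compare the extensions by inspection
disp(x_sym_rep);
disp(x_sym_norep);
```

For this example the wraparound extension is [4 5 6 1 2 3 4 5 6 1 2 3], while the two symmetric versions are [3 2 1 1 2 3 4 5 6 6 5 4] and [4 3 2 1 2 3 4 5 6 5 4 3].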
*When using this extension, there is a question as to whether or not to repeat the last point in the
extension; either strategy will produce a smooth extension. The answer to this question will depend
on the type of operation being performed and the number of data points involved, and determining
the best approach may require empirical evaluation.
FIGURE 2.10 Three strategies for extending the length of a finite data set. (A)
Zero padding: Zeros are added at the ends of the data set. (B) Periodic or
wraparound: The waveform is assumed periodic so the end points are added at the
beginning, and beginning points are added at the end. (C) Symmetric: Points are
added to the ends in reverse order. Using this strategy the edge points may be
repeated as was done at the beginning of the data set, or not repeated as at the
end of the set.
To reduce the number of points in cases where an operation has generated
additional data, two strategies are common: simply eliminate the additional
points at the end of the data set, or eliminate data from both ends of the data
set, usually symmetrically. The latter is used when the data are considered
periodic and it is desired to retain the same period or when other similar concerns are
involved. An example of this is circular or periodic convolution. In this case,
the original data set is extended using the wraparound strategy, convolution is
performed on the extended data set, then the additional points are removed
symmetrically. The goal is to preserve the relative phase between waveforms
pre- and post-convolution. Periodic convolution is often used in wavelet analysis
where a data set may be operated on sequentially a number of times, and
examples are found in Chapter 7.
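The periodic convolution procedure can be sketched as follows. This is a minimal example under stated assumptions, not necessarily the implementation used in Chapter 7: the data and filter values are made up, the wraparound extension is applied only at the beginning of the data set, and the result is checked against the standard DFT-based circular convolution computed with the base MATLAB routines fft and ifft.

```matlab
% Sketch: periodic (circular) convolution by wraparound extension.
x = [1 2 3 4 5 6 7 8];    % Example data, N points (illustrative values)
h = [1 2 1];              % Example filter weights, M points (assumed)
N = length(x);
M = length(h);

% Wraparound extension: prepend the last M-1 points of x
xe = [x(N-M+2:N), x];

% Convolve the extended data set; the result has N + 2*(M-1) points
c = conv(xe, h);

% Remove M-1 points from each end so that N points remain
y = c(M : N+M-1);         % y = [24 12 8 12 16 20 24 28]

% Cross-check using the DFT: multiply spectra, then invert
y_fft = real(ifft(fft(x) .* fft(h, N)));
max_diff = max(abs(y - y_fft));
fprintf('Largest difference from DFT result: %g\n', max_diff);
```

The output has the same length as the original data set, so the operation can be applied repeatedly without the sequence growing at each pass.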
PROBLEMS
1. Load the data in ensemble_data.mat found on the CD. This file contains
a data matrix labeled data. The data matrix contains 100 responses of a
second-order system buried in noise. In this matrix each row is a separate response.
Plot several randomly selected samples of these responses. Is it possible to
evaluate the second-order response from any single record? Construct and plot the
ensemble average for these data. Also construct and plot the ensemble standard
deviation.
2. Use the MATLAB autocorrelation and random number routines to plot the
autocorrelation sequence of white noise. Use arrays of 2048 and 256 points to
show the effect of data length on this operation. Repeat for both uniform and
Gaussian (normal) noise. (Use the MATLAB routines rand and randn,
respectively.)
3. Construct a 512-point noise array then filter by aver