Statistical Models of Mammographic Texture and Appearance

A thesis submitted to the University of Manchester for the degree of Doctor of Philosophy in the Faculty of Medical and Human Sciences

2005

Christopher J. Rose
School of Medicine
Contents

1 Introduction
    1.1 Introduction
    1.2 Breast cancer
    1.3 Computer-aided mammography
    1.4 Novelty detection
    1.5 Generative models
    1.6 Overview of the thesis
    1.7 Summary

2 Breast cancer
    2.1 Introduction
    2.2 Anatomy of the breast
    2.3 Breast cancer
        2.3.1 What is breast cancer?
        2.3.2 Predictive factors
        2.3.3 Prevention
        2.3.4 Clinical detection
        2.3.5 Treatment
        2.3.6 Survival
    2.4 Breast imaging
        2.4.1 X-ray mammography
        2.4.2 Ultrasonography
        2.4.3 Magnetic resonance imaging
        2.4.4 Computed tomography
        2.4.5 Thermography
    2.5 Summary

3 Computer-aided mammography
    3.1 Introduction
    3.2 Computer-aided mammography
    3.3 Image enhancement
    3.4 Breast segmentation
    3.5 Breast density and risk estimation
    3.6 Microcalcification detection
    3.7 Masses
    3.8 Spiculated lesions
    3.9 Asymmetry
    3.10 Clinical decision support
    3.11 Evaluation of computer-based methods
    3.12 Image databases
    3.13 Commercial systems
    3.14 Prompting
    3.15 Discussion
    3.16 Summary

4 Scale-orientation pixel signatures
    4.1 Introduction
    4.2 Mathematical morphology
        4.2.1 Dilation and erosion
        4.2.2 Opening and closing
        4.2.3 M- and N-filters
    4.3 Pixel signatures
        4.3.1 Local scale-orientation descriptors
        4.3.2 Constructing pixel signatures
        4.3.3 Metric properties
    4.4 Analysis of the current implementation
        4.4.1 Structuring element length
        4.4.2 Local coverage
    4.5 An information theoretic measure of signature quality
        4.5.1 Aims
        4.5.2 Method
        4.5.3 Results
        4.5.4 Discussion
    4.6 Classification-based evaluation
        4.6.1 Aims
        4.6.2 Method
        4.6.3 Results
        4.6.4 Discussion
    4.7 Summary

5 Modelling distributions with mixtures of Gaussians
    5.1 Introduction
    5.2 Background
    5.3 Density estimation
    5.4 Gaussian mixture models
        5.4.1 Learning the parameters
        5.4.2 The k-means clustering algorithm
        5.4.3 The Expectation Maximisation algorithm for Gaussian mixtures
    5.5 Useful properties of multivariate normal distributions
        5.5.1 Marginal distributions
        5.5.2 Conditional distributions
        5.5.3 Sampling from a Gaussian mixture model
    5.6 Learning from large datasets
    5.7 Summary

6 Modelling mammographic texture for image synthesis and analysis
    6.1 Introduction
    6.2 Background
    6.3 Non-parametric sampling for texture synthesis
    6.4 A generative parametric model of texture
    6.5 Generating synthetic textures
        6.5.1 Pixel-wise texture synthesis
        6.5.2 Patch-wise texture synthesis
        6.5.3 The advantages and disadvantages of a parametric statistical approach
    6.6 Some texture models and synthetic textures
        6.6.1 A model of fractal mammographic texture
        6.6.2 A model of real mammographic texture
        6.6.3 The quality of the synthetic textures
        6.6.4 Time and space requirements of the parametric method
    6.7 Novelty detection
    6.8 Summary

7 Evaluating the texture model
    7.1 Introduction
    7.2 Psychophysical evaluation of synthetic textures
        7.2.1 Aims
        7.2.2 Method
        7.2.3 Results
        7.2.4 Discussion
    7.3 Initial validation of the novelty detection method
        7.3.1 Aim
        7.3.2 Method
        7.3.3 Results
        7.3.4 Discussion
    7.4 Evaluation of novelty detection performance
        7.4.1 Introduction
        7.4.2 Aims
        7.4.3 Method
        7.4.4 Results
        7.4.5 Discussion
    7.5 Summary

8 GMMs in principal components spaces and low-dimensional texture models
    8.1 Introduction
    8.2 Dimensionality reduction
    8.3 Gaussian mixtures in principal components spaces
        8.3.1 A numerical issue
    8.4 Texture synthesis in principal components spaces
    8.5 Discussion
    8.6 Summary

9 A generative statistical model of entire mammograms
    9.1 Introduction
    9.2 Background
        9.2.1 Why are mammograms hard to model?
        9.2.2 Approaches to modelling the appearance of entire mammograms
    9.3 Modelling and synthesising entire mammograms
        9.3.1 Breast shape and the correspondence problem
        9.3.2 Approximate appearance
        9.3.3 Detailed appearance
        9.3.4 Generating synthetic mammograms
    9.4 Example synthetic mammograms
    9.5 Summary

10 Evaluating the synthetic mammograms
    10.1 Introduction
    10.2 Qualitative evaluation by a mammography expert
    10.3 A quantitative psychophysical evaluation
        10.3.1 Aims
        10.3.2 Method
        10.3.3 Results
        10.3.4 Discussion
    10.4 Evaluating the detailing model
    10.5 Summary

11 Summary and conclusions
    11.1 Introduction
    11.2 Summary
    11.3 Conclusions
    11.4 Final statement

A The expectation maximisation algorithm
    A.1 Introduction
    A.2 The algorithm
    A.3 Proof of convergence
List of Figures

2.1 Basic anatomy of the normal developed female breast.
2.2 Incidence of breast cancer in England.
2.3 The mediolateral-oblique and cranio-caudal views.
3.1 An example microcalcification cluster.
3.2 An example circumscribed mass.
3.3 An example spiculated lesion.
4.1 Dilation.
4.2 A sieved mammographic image.
4.3 Example pixel signatures.
4.4 An illustration of the two limitations of the existing implementation.
4.5 Incremental approximations of the bow tie structuring element.
4.6 Rotating the "rectangular" structuring elements.
4.7 An "improved" pixel signature from the centre of a Gaussian blob.
4.8 Regions of increased Shannon entropy.
4.9 An example region of interest and its groundtruth.
5.1 An illustration of the expectation maximisation algorithm.
5.2 A two-dimensional distribution marginalised over one dimension.
5.3 A conditional distribution.
5.4 The divide-and-conquer clustering algorithm.
6.1 Unconditional samples from the fractal model.
6.2 Fractal training and synthetic textures.
6.3 Unconditional samples from the real mammographic texture model.
6.4 Real training and synthetic textures.
6.5 Examples of synthesis failure using patch-wise synthesis with a model of real mammographic appearance.
7.1 A screenshot of one of the trials.
7.2 Fractal and scrambled textures.
7.3 ROC curve for texture discrimination.
7.4 The circle chord attenuation function.
7.5 The sigmoid attenuation function.
7.6 Examples of simulated masses using the three methods.
7.7 Example log-likelihood image and ROC curve for simulated microcalcifications.
7.8 Example log-likelihood image and ROC curve for a simulated mass.
7.9 ROC curve for simulated masses and microcalcifications (combined).
7.10 Example log-likelihood image and ROC curve for a real microcalcification cluster.
7.11 ROC curve for real masses.
7.12 ROC curve for real microcalcifications and masses (combined).
8.1 Synthesis using a principal components model.
9.1 Examples of mammographic variation.
9.2 Overview of the Active Appearance Model.
9.3 Samples from two shape models, illustrating the need for good correspondences.
9.4 Values of the Kotcheff and Taylor objective function.
9.5 Values of the MDL objective function.
9.6 The initial and final correspondences for the mammogram shape model.
9.7 Block diagram for the steerable pyramid decomposition.
9.8 The coefficients in the top three levels of a steerable pyramid decomposition of a mammogram.
9.9 Synthetic mammograms generated using the model.
9.10 Real and synthetic mammograms.
10.1 Contributions of detailing coefficients to real and synthetic mammograms.
List of Algorithms

1 The non-iterative k-means algorithm.
2 The iterative k-means algorithm.
3 The EM algorithm for fitting a GMM with two components to one-dimensional data.
4 The EM algorithm for fitting a GMM with multiple components to multivariate data.
5 Efros and Leung's texture synthesis algorithm.
6 Pixel-wise texture synthesis with a Gaussian mixture model of local textural appearance.
7 Patch-wise texture synthesis with a Gaussian mixture model of local textural appearance.
8 Fractal mammographic texture algorithm.
9 Novelty detection using a Gaussian mixture model of texture.
10 Simulating microcalcification clusters.
11 Generating a synthetic mammogram.
List of Tables

4.1 Classification results for the two signature types.
7.1 Results for the psychophysical experiment.
Abstract

Breast cancer is the most common cancer in women. Many countries—including the UK—offer asymptomatic screening for the disease. The interpretation of mammograms is a visual task and is subject to human error. Computer-aided image interpretation has been proposed as a way of helping radiologists perform this difficult task. Shape and texture features are typically classified into true or false detections of specific signs of breast cancer. This thesis promotes an alternative approach where any deviation from normal appearance is marked as suspicious, automatically including all signs of breast cancer. This approach requires a model of normal mammographic appearance. Statistical models allow deviation from normality to be measured within a rigorous mathematical framework. Generative models make it possible to determine how and why a model is successful or unsuccessful. This thesis presents two generative statistical models. The first treats mammographic appearance as a stationary texture. The second models the appearance of entire mammograms. Psychophysical experiments were used to evaluate synthetic textures and mammograms generated using these models. A novelty detection experiment on real and simulated data shows how the model of local texture may be used to detect abnormal features.
Declaration

No portion of the work referred to in the thesis has been submitted in support of an application for another degree or qualification of this or any other university or other institute of learning.
Copyright

1. Copyright in text of this thesis rests with the Author. Copies (by any process) either in full, or of extracts, may be made only in accordance with instructions given by the Author and lodged in the John Rylands University Library of Manchester. Details may be obtained from the Librarian. This page must form part of any such copies made. Further copies (by any process) of copies made in accordance with such instructions may not be made without the permission (in writing) of the Author.

2. The ownership of any intellectual property rights which may be described in this thesis is vested in the University of Manchester, subject to any prior agreement to the contrary, and may not be made available for use by third parties without the written permission of the University, which will prescribe the terms and conditions of any such agreement.

3. Further information on the conditions under which disclosures and exploitation may take place is available from the Head of the School of Medicine.
Dedication

This thesis is dedicated to the memory of Gareth Jones.

In addition to being an excellent office mate, Gareth made a substantial contribution to my PhD research. With his dry sense of humour, willingness to help and pragmatic perfectionism—and despite his admirable unwillingness to bend to the stupidity of others—he motivated me to learn how to prepare documents using the LaTeX typesetting system, contributed to discussions on mathematical matters, helped me with various aspects of MATLAB and UNIX, and radically altered my view of computers and programming. It is a pleasure to have known him, and I wish I had known him better.

Friday 21 November 2003.
Acknowledgements

The author would like to thank the following people:

  • My mother, Anne, who has put myself and my brothers first in everything she has done.
  • My girlfriend, Chris, for uncountable reasons.
  • My PhD supervisor, Prof. Chris Taylor OBE, who is patient, supportive, giving and hard-working.
  • Anthony Holmes, for his generosity in getting me started.
  • Special thanks go to Andrew Bentley, who employed a spotty teenage geek and taught him electronics and computer programming. This thesis would not exist without his support—thank you! Thanks also to Richard, David, Keith and Martin for all their assistance.
  • My friends, for their support over the last few years: Stuart, Rick, Rob, Jimi, Elios, Alan, Caroline, Harpreet, Karen, Ruth and Siân.
  • My office mates: Gareth, Craig, Mike, Kaiyan, Basma, Tamader, John and Rob.
  • Other members of ISBE, including Tim Cootes, Carole Twining, Sue Astley, Paul Beatty, Jim Graham, Ian Scott and Tomos Williams, for their help at various times during my time as a PhD student.
  • The ISBE information technology support team for keeping things ticking.
  • Alexandre Nasrallah for proof-reading some of the chapters in this thesis.
Funding

The work described in this thesis was supported by the EPSRC as part of the MIAS-IRC project From Medical Images and Signals to Clinical Information (EPSRC GR/N14248/01 and UK Medical Research Council Grant No. D2025/31).
About the Author

In holiday time during his A-level studies and first degree, Chris Rose worked for Kraft Jacobs Suchard on a range of electronic and software projects. He graduated from The University of Manchester in 1999 with a 2.1 BEng (Hons) degree in Electronic Systems Engineering. He then worked for a small software house where he developed software and produced training materials for Ericsson. In 2000, he returned to The University of Manchester to begin a PhD in the Division of Imaging Science and Biomedical Engineering, under the supervision of Prof. Chris Taylor OBE. During this period he published the following papers related to the work in this thesis.

  • C. J. Rose and C. J. Taylor. An Improved Method of Computing Scale-Orientation Signatures. In Medical Image Understanding and Analysis, pages 5–8, July 2001.
  • C. J. Rose and C. J. Taylor. A Statistical Model of Texture for Medical Image Synthesis and Analysis. In Medical Image Understanding and Analysis, pages 1–4, July 2003.
  • C. J. Rose and C. J. Taylor. A Model of Mammographic Appearance. In British Journal of Radiology Congress Series: Proceedings of UK Radiological Congress 2004, pages 34–35, Manchester, United Kingdom, June 2004.
  • C. J. Rose and C. J. Taylor. A Statistical Model of Mammographic Appearance for Synthesis and Analysis. In International Workshop on Digital Mammography, 2004. (Accepted, pending.)
  • C. J. Rose and C. J. Taylor. A Generative Statistical Model of Mammographic Appearance. In D. Rueckert, J. Hajnal, and G.-Z. Yang, editors, Medical Image Understanding and Analysis 2004, pages 89–92, Imperial College London, UK, September 2004.
  • C. J. Rose and C. J. Taylor. A Holistic Approach to the Detection of Abnormalities in Mammograms. In British Journal of Radiology Congress Series: Proceedings of UK Radiological Congress 2005, page 29, Manchester, United Kingdom, June 2005.
  • A. S. Holmes, C. J. Rose, and C. J. Taylor. Measuring Similarity between Pixel Signatures. Image and Vision Computing, 20(5–6):331–340, April 2002.
  • A. S. Holmes, C. J. Rose, and C. J. Taylor. Transforming Pixel Signatures into an Improved Metric Space. Image and Vision Computing, 20(9–10):701–707, August 2002.
‘As many truths as men. Occasionally, I glimpse a truer Truth, hiding in imperfect simulacrums of itself, but as I approach, it bestirs itself and moves deeper into the thorny swamp of dissent.’

From Cloud Atlas by David Mitchell.
Chapter 1
Introduction

1.1 Introduction

Since work for this thesis began, approximately 64 000 British women have died from breast cancer [24]. Computer-aided X-ray mammography has been proposed as a way to help radiologists detect breast cancer at an early stage. This thesis describes work on generative statistical models of normal mammographic appearance. The ultimate aim of this strand of research is to be able to detect breast cancer as a deviation from normal appearance. The generative property enables insight into what has been modelled successfully and where improvement is needed. Two generative statistical models of mammographic appearance are described.

This chapter presents a brief overview of the main subjects and motivations of this thesis. The chapter presents:
  • An overview of breast cancer.
  • An overview of computer-aided mammography.
  • A description of novelty detection, the approach to breast cancer detection that motivates this thesis.
  • A description of generative models, and an explanation of why this property is vital to developing accurate models.
  • An overview of the organisation of the thesis.

1.2 Breast cancer

Approximately 11 500 women die from breast cancer each year in England and Wales and it is the most common cancer in women (both in the UK and worldwide) [82]. It is possible to detect breast cancer at an early stage using X-ray mammography; treatments are available and survival rates are good [82]. The UK National Health Service Breast Screening Programme (NHSBSP) was initiated in 1988 as a result of the Forrest report [66], published in 1987. All asymptomatic women aged 50–69 are invited for X-ray mammographic screening every three years. Radiologists visually inspect these X-ray images for signs of breast cancer and other problems. A more detailed background to breast cancer and screening is presented in Chapter 2.
1.3 Computer-aided mammography

Research into the use of computers to detect breast cancer in mammograms has been underway for about thirty years. In the most common approach, a computer automatically analyses a digitised mammogram and attempts to locate signs of cancer. Detections are displayed to clinicians as prompts on a computer screen or paper printout. Computer-aided mammography research has matured to the point where, in 1998, the US Food and Drug Administration (FDA) gave pre-market approval to the ImageChecker system, developed by R2 Technology Incorporated. Three other systems have since been given FDA approval. However, results from research into the effectiveness of these systems in the clinical environment are mixed. A large prospective study recently showed that expert screening radiologist performance in one academic practice was not improved by the use of a computer-aided mammography system [76] (see Section 3.14 for a more detailed discussion). Other studies have indicated that such systems can help radiologists detect breast cancer earlier [8]. Psychophysical experiments that have studied the effect of the false prompt rate (i.e. incorrect detections of cancer) on radiologist performance indicate that the number of true and false prompts must be approximately equal if radiologist performance is to be improved [95]. Only 5% of screening mammograms have any form of abnormality. This suggests that a target rate should be approximately 0.0125 false positives per image (see Chapter 3). Commercial systems operate at much higher false positive rates. For example, R2 Technology Incorporated claim that version 8.0 of their ImageChecker algorithm achieves ‘1.5 false positive marks per normal case at the 91 percent sensitivity level’ [149].
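One way to reconstruct the target figure, under assumptions that are mine rather than the thesis's (the thesis derives its target in Chapter 3): if roughly 5% of cases show an abnormality, each case comprises about four films, and a useful system produces about one true prompt per abnormal case, then matching false prompts to true prompts gives

\[
\frac{0.05 \ \text{true prompts per case}}{4 \ \text{films per case}} \approx 0.0125 \ \text{false positives per image}.
\]

On the same basis, 1.5 false marks per case corresponds to roughly 0.375 false positives per image, about thirty times this target.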
This perhaps explains why the commercial computer-aided mammography systems do not appear to improve radiologist performance. Research is needed to determine how computer-aided mammography systems can be improved and how the false positive rate can be reduced to the target level. It is likely that much more sophisticated approaches will be required. This thesis investigates one such approach, which is described briefly in the next section.

1.4 Novelty detection

Breast cancer, as imaged in mammograms, can manifest itself in a number of different ways. Masses appear as "blob"-like features, microcalcifications appear as very small specks, architectural distortions subtly change the appearance of the breast tissue and spiculated masses have radiating linear structures. Each of these can be extremely subtle. Current computer-aided mammography methods typically target only microcalcifications and masses (including spiculated masses), and treat each type of abnormality separately. A common approach is to locate candidate abnormalities (often using ad hoc methods), compute measurements of shape and texture (called features) and then use a classifier to classify the features into clinically meaningful classes (e.g. malignant or benign). The approach has a number of drawbacks:

  • Different features and classifiers are required for each type of abnormality.
  • The features and classifiers implicitly and incompletely model the appearance of normal and breast cancer tissue. These tissue types are subject to significant variation.
  • It is often difficult to justify why a particular measure of texture or shape is better than another and what it actually represents.
  • The use of ad hoc methods risks the accidental adoption of assumptions about the data.

The approach advocated in this thesis is novelty detection, which is motivated by the fact that signs indicative of breast cancer are not found in pathology-free mammograms. If deviation from normality could be detected, then all types of abnormality would automatically be detectable. This approach requires a model of what normal mammograms look like. Mammograms vary dramatically, both between women and between screening sessions, so such a model must be able to cope with this variability. Statistical models capture variability and are suited to novelty detection problems because deviation from normality can be measured in a meaningful way within a rigorous mathematical framework. Abnormal mammograms are relatively rare in the screening environment, so there is much more data with which to train a model of normality than there is to train a classifier that has an "abnormal" class.

1.5 Generative models

If novelty detection is to be used, then the underlying model must be able to "legally" represent any pathology-free instance and be unable to legally represent abnormal instances. The only way to verify this is to be able to generate instances from the model; thus the model must be generative.
Further, generative models make it relatively easy to visualise what has been modelled successfully and what has not. The generative property makes progress towards a model that accurately explains mammographic appearance tractable. The aim of the research presented in this thesis was to develop and evaluate generative statistical models of normal mammographic appearance with the ultimate aim of being able to detect breast cancer via novelty detection. Two models have been developed and evaluated. The first assumes that mammograms are textures and neglects the shape of the breast and the spatial variability in mammographic texture. The model allows synthetic textures to be generated and can be used in an analytical mode to perform novelty detection. The second is a generative statistical model of entire mammograms and addresses many of the problems associated with modelling mammographic appearance.
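To make the novelty detection idea concrete before the detailed chapters, the sketch below fits a Gaussian mixture model to patches of normal texture and flags test patches whose log-likelihood is unusually low. It follows the spirit of the approach developed in Chapters 5–7 but is not the thesis's implementation: the patch size, component count, threshold rule and use of scikit-learn are all illustrative assumptions.

    # Minimal sketch of GMM-based novelty detection (illustrative only).
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def extract_patches(image, size=9):
        """Return every size x size patch of the image, one flattened row each."""
        h, w = image.shape
        return np.array([image[i:i + size, j:j + size].ravel()
                         for i in range(h - size + 1)
                         for j in range(w - size + 1)])

    rng = np.random.default_rng(0)
    normal_image = rng.random((64, 64))  # stand-in for pathology-free texture
    test_image = rng.random((64, 64))    # stand-in for an image to screen

    # Model "normal" local appearance as a mixture of Gaussians.
    gmm = GaussianMixture(n_components=5, covariance_type="full",
                          random_state=0).fit(extract_patches(normal_image))

    # Low log-likelihood under the normal model signals deviation from normality.
    scores = gmm.score_samples(extract_patches(test_image))
    threshold = np.percentile(gmm.score_samples(extract_patches(normal_image)), 1)
    suspicious = scores < threshold  # candidate abnormal patches

Because such a model is trained only on normal appearance, any sign of disease, of whatever type, should in principle depress the likelihood; this is exactly the attraction of novelty detection over per-abnormality classifiers.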
1.6 Overview of the thesis

  • Chapter 2 presents background information on breast cancer, the clinical problem and the various imaging modalities that are used to diagnose the disease.
  • Chapter 3 presents a review of the computer-aided mammography literature.
  • Chapter 4 describes work on improving the way that scale-orientation pixel signatures (a type of texture feature) are computed. A measure of signature quality, based upon information theory, is developed and a simple classification experiment is presented.
  • Chapter 5 presents background information on the multivariate normal distribution and the Gaussian mixture model. These models are used extensively in this thesis.
  • Chapter 6 presents Efros and Leung's algorithm for texture synthesis and develops the method into a parametric statistical model of texture that can be used in both generative and analytical modes. Synthetic textures are presented.
  • Chapter 7 presents a psychophysical evaluation of synthetic mammographic textures produced by the model developed in Chapter 6. A novelty detection experiment using simulated and real data is presented.
  • Chapter 8 presents an investigation into how Gaussian mixture models (and hence the class of texture model presented in Chapter 6) may be learned in low-dimensional principal components spaces. Texture synthesis and analysis using such models is discussed.
  • Chapter 9 describes a generative statistical model of entire mammograms and shows how synthetic mammograms may be generated.
  • Chapter 10 presents three evaluations of the synthetic mammograms generated using the model of entire mammograms.
  • Chapter 11 summarises the work presented in the thesis.
1.7 Summary

This chapter presented a brief overview of the subjects, motivations and structure of this thesis. The next chapter presents an introduction to breast cancer and the imaging modalities used to detect the disease.
Chapter 2
Breast cancer

2.1 Introduction

This chapter introduces the clinical problem of breast cancer and describes how medical imaging is used to detect the disease. The chapter discusses:

  • The anatomy of the breast.
  • Breast cancer and its risk factors, prevention, detection, treatment and survival.
  • The various medical imaging modalities used to detect breast cancer, particularly X-ray mammography.
2.2 Anatomy of the breast

The main purpose of the female breast is to produce and deliver milk to offspring. Additionally, breasts are a secondary sexual characteristic and serve to indicate sexual maturity. A brief description of the basic anatomy of the breast follows, but the interested reader is directed to [172] for a comprehensive description within the context of mammography.

The breast itself is a modified sweat gland and is composed of several structures, illustrated in Figure 2.1. Above the ribcage is the pectoral muscle. At the front of the breast, and externally visible, is the nipple. Milk is produced in lobes and delivered to the nipple by ducts. These are collectively referred to as parenchymal or glandular tissue; they are the functional structures of the breast, as opposed to being connective or supporting tissues. The areola exposes glands that lubricate the nipple during breastfeeding. Circular radiating muscles behind the areola cause the nipple to become erect upon tactile stimulation, facilitating suckling.

The lymphatic system is responsible for protecting the body from infection from microorganisms and antigens. This is achieved by transporting the microorganisms and antigens to the lymph nodes, where they are dealt with by the body's cellular immune system. Blood is transported to and from the breast by the vasculature. Blood delivers oxygen and nutrients and removes waste products. The structure of the breast is supported by Cooper's ligaments and also contains adipose (fatty) tissue, neither of which is shown in Figure 2.1.
[Figure 2.1: Basic anatomy of the normal developed female breast. Key: A, pectoral muscle; B, vasculature; C, lobe; D, duct; E, lymph node and lymphatic system; F, nipple; G, areola.]
2.3 Breast cancer

Breast cancer is almost exclusively a disease that affects women: 11 491 women and 82 men died from breast cancer in England and Wales in 2002 [82]. We will now briefly examine the background to the disease.

2.3.1 What is breast cancer?

This section briefly discusses the cellular basis of cancer (the interested reader is directed to [36] for background material on cellular biology). Our bodies are composed of cells, which typically carry all of the genetic information required to determine how we will grow. Cancer is an umbrella term for a group of diseases that cause cells in the body to reproduce in an uncontrolled manner.

Cells have several abilities, one of which is reproduction. Reproduction is achieved via cell division. At each cell division, the genetic material contained within the mother cell is copied to the daughter cells via a robust mechanism. This robust mechanism can detect errors in the genetic material contained within the cell and can instruct the cell to "commit suicide" via programmed cell death (PCD, also referred to as apoptosis) to prevent the erroneous information from being propagated.

Recent cancer research has suggested that an enzyme called telomerase [77] plays an important role. At each normal cell division, genetic material at the ends of the chromosomes is lost. To prevent useful genetic material from being destroyed, the ends of chromosomes have redundant repeating genetic sequences called telomeres.
Parts of these sequences are lost at each cell division, but the genetic information specific to the organism is preserved. If telomeres become too short, or are deleted entirely, the body interprets the genetic sequence as being broken. In this situation, the cell can be instructed to perform PCD, or reparative mechanisms can be employed. These reparative mechanisms can introduce genetic mutations. Cancer cells are "immortal" in that they do not respond to PCD instructions. Telomerase—an enzyme that builds new telomeres—is expressed in approximately 90% of cancers, and the telomeres in cancer cells do not shorten. It is believed that telomerase may be the reason why cancer cells are immortal. Cancer cells divide rapidly until they are forcefully destroyed (e.g. by medical intervention or the death of the host organism). Cancer cells are therefore genetically abnormal, but the exact genetic nature of cancer is not yet fully understood.

Cancers are named after their originating organ (i.e. breast cancer originates in the breast and is composed of pathological breast tissue). Cancer cells can break away from their original location and travel through the vascular or lymphatic systems. These cells may lodge to form secondary cancers in other parts of the body. This process is called metastasis. The new cancer is named after the originating tissue and new location, for example secondary breast cancer of the brain. Breast cancer generally develops in the ducts (ductal cancer), but may also develop in the lobes (lobular cancer).

The terms cancer and tumour are not synonymous. A tumour may be benign or malignant. Benign tumours are abnormal growths, but do not grow uncontrollably or metastasise, and are not necessarily life-threatening.
The word cancer is synonymous with the phrase malignant tumour. Benign tumours can become malignant, but malignant tumours do not become benign. Cancer is caused by a number of factors that can act individually or in combination [5]. These include:

  • External factors, e.g. exposure to:
    – Chemicals—particularly tobacco use
    – Infectious organisms
    – Radiation
  • Internal factors, e.g.:
    – Inherited and metabolic genetic mutations
    – Hormones
    – Immunity responses

Breast cancers can be described as being in situ (i.e. they have not spread from their originating duct or lobule), and are often cured [4]. Alternatively, breast cancers can be described as being invasive or infiltrating (i.e. they have broken into the surrounding fatty tissue of the breast). The severity of an invasive breast cancer is related to the stage of the disease, which describes how far it has spread (e.g. it is confined to the breast, or surrounding tissue, or has metastasised to distant organs). The following terms are often used to describe the stage of the disease [37]; a schematic encoding in code follows the list:
  • Stage 1
    – The tumour is no larger than 2 cm in diameter.
    – The lymph nodes in the armpit are unaffected.
    – The cancer has not metastasised.
  • Stage 2
    – The tumour is between 2 cm and 5 cm in diameter, and/or the cancer has spread to the lymph nodes under the armpit.
    – The cancer has not spread elsewhere in the body.
  • Stage 3
    – The tumour is larger than 5 cm in diameter.
    – The cancer has spread to the lymph nodes under the armpit.
    – The cancer has not spread elsewhere in the body.
  • Stage 4
    – The tumour may be any size.
    – The lymph nodes in the armpit are often affected.
    – The cancer has spread to other parts of the body.
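As a schematic summary only, the four stages can be read as a simple decision procedure. The function below is a hypothetical illustration of the simplified list above, not a clinical tool (real staging, e.g. the TNM system, uses many more factors, and the list leaves some boundary cases unspecified):

    def breast_cancer_stage(tumour_cm, armpit_nodes_affected, metastasised):
        """Schematic restatement of the simplified four-stage summary."""
        if metastasised:
            return 4  # spread to other parts of the body
        if tumour_cm > 5:
            return 3  # the summary also lists armpit node involvement here
        if tumour_cm > 2 or armpit_nodes_affected:
            return 2  # 2-5 cm in diameter and/or armpit nodes affected
        return 1      # no larger than 2 cm, nodes clear, no spread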
2.3.2 Predictive factors

The risk of developing breast cancer increases with age, as Figure 2.2 illustrates.

[Figure 2.2: Incidence of breast cancer in England. The incidence of breast cancer in English women in 2001 per 100 000 population as a function of age. Linear interpolation is used between data points. Source of data: National Statistics [21].]

In the USA, 95% of new cases and 96% of breast cancer deaths in the period 1996–2000 occurred in women aged 40 and older [4].

Risk factors can be grouped by relative risk, defined as the ratio of the probability of the disease in the group exposed to the risk to the probability of the disease in a control group (formalised below) [4]:

  • Relative risk > 4.0
    – Inherited genetic mutations (particularly BRCA1 and/or BRCA2).
    – Two or more first-degree relatives (a mother, father, sister, brother, daughter or son) diagnosed with breast cancer at an early age.
    – Post-menopausal breast density.
  • Relative risk > 2.0 and ≤ 4.0
    – One first-degree relative with breast cancer.
    – High dose of radiation to the chest.
    – High post-menopausal bone density.
  • Relative risk > 1.0 and ≤ 2.0
    – Late age at first full-term pregnancy (> 30 years).
    – Early menarche (< 12 years).
    – Late menopause (> 55 years).
    – No full-term pregnancies.
    – Recent oral contraceptive use.
    – Recent and long-term hormone replacement therapy.
    – Tall.
    – High socioeconomic status.
    – Post-menopausal obesity.

Tobacco use is not necessarily linked to breast cancer. Some studies have shown that smoking is not associated with the disease, while others have indicated a link [43]. Effects due to smoking are confounded by alcohol use, which correlates with both tobacco use and increased breast cancer risk. Alcohol is the dietary factor most consistently associated with increased breast cancer risk [4], and breast cancer risk increases by about 7% per alcoholic drink consumed per day [112].
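The relative risk grouping above can be stated compactly. The compounding illustration is an assumption about how per-drink increments are usually combined, not a calculation taken from the thesis:

\[
\mathrm{RR} = \frac{P(\text{disease} \mid \text{exposed})}{P(\text{disease} \mid \text{control})},
\qquad \text{so that, e.g., } \mathrm{RR} \approx 1.07^{k} \text{ for } k \text{ alcoholic drinks per day}.
\]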
2.3.3 Prevention

Breast cancer cannot be prevented, due to the environmental and inherited risk factors. However, it should be possible to reduce the incidence of cancers that can be attributed to lifestyle factors via behavioural modification.

One of the most important lifestyle changes that can be made is the management of alcohol consumption: even moderate alcohol use is associated with increased breast cancer risk [4]. Moderate alcohol consumption has a cardio-protective effect, so advice on alcohol consumption must consider more than just breast cancer risk [182]. Women who are not known to have an increased risk of breast cancer are advised to adopt a healthy lifestyle by limiting alcohol, avoiding tobacco use and by maintaining a healthy weight through regular exercise and a diet that is low in fats and high in fruit and vegetables. However, this advice is not specific to breast cancer, and instead considers evidence for all common diseases [182]. Women who are known to have an increased risk of breast cancer should be advised accordingly.

There is debate within the clinical community about how women should be advised regarding tobacco use and its effect on breast cancer risk. Some favour honest advice that states that the balance of evidence shows no or little increased risk, while others favour advice that emphasises the evidence that indicates that there is an increased risk in some circumstances, and that women should be discouraged from smoking because of other associated risks (e.g. lung cancer) [43].

General practitioners should consider the risk of breast cancer when prescribing hormonal medications such as hormone replacement therapy or oral contraceptives.
Women at very high risk may be offered prophylactic mastectomy or treatment with a drug such as Tamoxifen [4].

2.3.4 Clinical detection

Breast cancer is most successfully treated at an early stage, and it has been recommended for the past 30 years or so that women perform regular breast self-examination (BSE). In recent years this advice has been challenged. A Canadian meta-analysis failed to find evidence that BSE reduces breast cancer mortality, but found that BSE results in more benign breast biopsies and increased patient distress [12]. The study recommended that women should not be taught BSE, but the author stresses the difference between BSE and breast self-awareness, and encourages the latter [118]. An American study found that women who had benign biopsies after performing BSE tended to perform BSE less frequently as a result [13]. Advice on BSE and breast self-awareness needs to be informed by evidence of the risks of increased biopsy rate and distress, weighed against the potential benefits. The American Cancer Society currently recommends that women optionally perform monthly BSE [4].

Some countries have implemented national screening programmes, where women are invited for asymptomatic X-ray imaging of the breast (mammography) to detect cancer at an early stage. The International Breast Cancer Screening Network currently has 27 member countries who have pilot or established national or subnational screening programmes [101]. These members are predominantly developed countries in North America, Western Europe and the Far East.
The UK National Health Service Breast Screening Programme (NHSBSP) was initiated in 1988 as a result of the Forrest report [66]. Women between the ages of 50 and 70 (formerly 65) are invited for screening every three years. Women now have two views of each of their breasts imaged at each screening session, resulting in 13% more breast cancers being detected in 2002/3 compared with the previous 12 months, when a single view was used [133]. A 14 year follow-up of the Edinburgh randomised trial of breast screening, published in 1999, showed that breast screening reduced breast cancer mortality by 13% [2]; the NHSBSP annual review for 2004 [133] claims that mortality dropped by 30% in the preceding decade, though this success cannot be attributed to breast screening alone.

The benefits of asymptomatic breast screening are disputed and some argue that screening may even be detrimental to the health of women. Gøtzsche and Olsen argue that there is no reliable evidence that screening mammography reduces mortality and that screening may result in distress and unnecessarily aggressive treatment [73, 136]. However, their conclusions are largely based upon meta-analyses which debunk studies that show that screening has a positive effect, rather than upon data that show that screening has a negative effect. Another criticism of screening mammography is economic. While the cost per woman screened is low (approximately £40 [134] in the UK), another picture emerges when one looks at the cost per life saved. The UK NHSBSP currently costs approximately £52M per year and is estimated to save approximately 300 lives per year [134]. This equates to an approximate average cost of £173 300 per life saved. By the year 2010, it is estimated that the NHSBSP will save 1 250 lives per year; this will bring the cost per life saved down to approximately £41 600 (assuming other factors do not change).
In 1995, the cost per life saved by the Ontario, Canada screening programme was estimated to be £558 000, based upon the cost of a single mammography examination and the estimated number of women who would need to be screened in order for one life to be saved [186]. Variation in the cost of screening can be attributed to the environment and manner in which screening and treatment are implemented. It is a matter for those responsible for public health policy to determine the best use of available resources given the evidence for and against screening mammography.

Molecular tests are now available that can detect some of the BRCA genetic mutations [4] and these may be used routinely in the future. Consideration is being given to a UK-wide programme to use magnetic resonance imaging to screen pre-menopausal women at high genetic risk of breast cancer [28].

2.3.5 Treatment

Treatment for breast cancer is dependent upon several factors: the stage of disease and its biological characteristics, patient age and the risks and benefits as determined by clinicians and the patient [4]. Surgery to remove the cancerous tissue is common, and the type of surgery is chosen to balance the need to remove the cancer with the disfigurement that the surgery will cause. Surgery may involve (in order of increasing disfigurement):

  • Lumpectomy—which can be employed when the cancer is localised—involves removing the "lump" and a border of "normal" tissue which is checked to ensure that all cancerous tissue has been removed.
  • Simple mastectomy (or total mastectomy) involves the removal of the entire breast.
  • Modified radical mastectomy involves the removal of the entire breast and underarm lymph nodes.
  • Radical mastectomy involves the removal of the breast, underarm lymph nodes and chest wall muscle. This type of surgery is now used less frequently, as less disfiguring approaches have proved to be effective [4].

Surgery is often used alongside chemotherapy, hormone therapy, biologic (also called immune and antibody) therapy or radiotherapy. Chemotherapy, hormone and biologic therapies are systemic treatments in that they are applied to the entire body—rather than a specific organ—with the intention of killing cancer cells that may have metastasised.

Chemotherapy is a drug treatment that kills rapidly dividing cells. This includes cancer cells as well as some types of normal cells, such as blood and hair cells. Chemotherapy, in combination with surgery, has been shown to deliver five year survival rates of between 50% and 70% [25].

Hormone therapy attempts to prevent the growth of metastasised cancer cells by blocking the effects of hormones (such as oestrogen) that can promote their growth. An anti-oestrogen drug called Tamoxifen has been used successfully, but recent research shows that the aromatase inhibitor anastrozole significantly increases disease-free survival over five years compared to Tamoxifen [75].
Trastuzumab (marketed under the name Herceptin) is a biologic therapy that targets cancer cells which produce an excess of a protein called HER2. When combined with chemotherapy, trastuzumab treatment can reduce the relative risk of mortality by 20%, but can increase the risk of heart failure [119].

In contrast to the systemic treatments, radiotherapy (also called radiation therapy) is targeted at specific locations. High energy radiation is focused on areas of the body affected with cancer (such as the breast, chest wall or underarm area). Alternatively, small radiation sources, called pellets, can be implanted into the cancer. There is no significant difference in survival between women who have small breast tumours removed by lumpectomy compared to those who also receive radiotherapy, but women who receive radiotherapy have a reduced risk of their cancer returning and therefore require less additional treatment [64].

2.3.6 Survival

The one and five year survival rates for English women diagnosed with breast cancer between 1993 and 2000 were 92.6% and 75.9% respectively [148]. For comparison, in the same period the mean one and five year survival rates in both sexes for lung cancer—the second most common cancer in women and most common cancer in men—were 21.6% and 5.5% respectively. In the USA, the five year survival rate for women with breast cancer is 87% [4]. There is also an association between low socioeconomic status, poor access to medical care and additional illness, and low survival rates [4].
2.4 Breast imaging

This section introduces X-ray mammography—the most common form of clinical imaging used to detect breast cancer—and briefly discusses the other imaging modalities that may be used.

2.4.1 X-ray mammography

X-rays were discovered by Wilhelm Conrad Röntgen in 1895, who was awarded the first Nobel prize for physics for his discovery. X-rays are high-frequency electromagnetic radiation (30 PHz–60 EHz) and are useful in diagnostic imaging because the dense tissues in the body are more likely to absorb X-rays (i.e. they are radio-opaque) while the soft tissues are less likely to absorb X-rays (i.e. they are radiolucent). X-rays are formed by accelerating electrons from a heated cathode filament towards an anode. The interaction of the high energy electrons with the anode emits radiation in the X-ray spectrum. This radiation is then directed towards the patient.

X-rays are detected using photographic film or digitally (e.g. using a charge-coupled device). By placing a body part between the X-ray source and detector, it is possible to form an image that spatially describes the X-ray absorption of the body part. This image will be a two-dimensional projection of the three-dimensional structure.
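The projection can be written down explicitly. Under the usual monochromatic-beam idealisation (a standard textbook model, not one developed in the thesis), the detected intensity follows the Beer–Lambert law:

\[
I(x, y) = I_0 \exp\!\left( -\int_{L(x,y)} \mu(s)\, \mathrm{d}s \right),
\]

where the integral runs along the ray L(x, y) from source to detector and \mu is the tissue's linear attenuation coefficient. Radio-opaque (dense) tissue has a large \mu, exposes the film less, and therefore appears brighter on the developed mammogram.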
X-rays were first used to investigate breast cancer almost a hundred years ago [160]. An X-ray mammogram is obtained by imaging the breast compressed between two parallel radiolucent plates. Different directions of compression allow clinicians to view the three-dimensional structure of the breast in more than one way. This allows ambiguities caused by occlusion or other perspective effects to be minimised. Two common views are the cranio-caudal view (CC—"head to tail") and the mediolateral-oblique view (MLO, where the compression is angled approximately 45° to the CC view). These are illustrated in Figure 2.3.

[Figure 2.3: The mediolateral-oblique and cranio-caudal views. The diagram illustrates the directions of compression used in the mediolateral-oblique (MLO) and cranio-caudal (CC) views; the MLO view is shown on the left in blue and the CC view on the right in red.]
and the UK National Health Service Breast Screening Programme generates hundreds of thousands of mammograms each year [133]. X-ray mammography is favoured because of its high resolution (required to image microcalcifications) and low cost (approximately £40 per woman screened [134]).

Fully digital systems are increasing in quality and popularity. The advantages of fully digital systems may include:

• Direct digital image acquisition.

• Increased sensitivity compared to film-based methods, permitting lower radiation dosage.

• Immediate image display and enhancement.

• Improved archival and transmission possibilities (including remote image analysis by human or computer).

It is expected that fully digital mammography will soon supersede film-based mammography. Fully digital mammography is likely to benefit the computer-aided mammography research community, as the digitisation step required for film-based mammography is an impediment to the collection of useful image data.

Although X-ray mammography remains the most useful imaging modality for breast cancer, it is dependent upon the use of radiation, which itself can cause cancer. It is likely that some cancers are caused by the screening programme. Efforts are made to monitor and minimise radiation dose.
Mammograms are most commonly read visually as X-ray films, although commercial computer-aided mammography and digital systems are being used—particularly in the USA (see Section 3.13 for a discussion of commercial systems). In the screening environment, dedicated viewing stations are loaded with a batch of mammograms. The mammograms are positioned so that left and right breasts—and CC and MLO views, if both are available—can be compared directly. Radiologists use strategies to try and ensure that 'danger zones' are always examined. In the UK screening environment, it is typical that a radiologist will take an average of 30 s to read each patient's mammograms. Radiologists record their assessments and difficult cases are likely to be discussed with colleagues. If double reading is used—where two radiologists independently read each mammogram—a protocol will be followed to combine the assessments of each radiologist.

Women for whom screening indicates abnormality are recalled for further investigation such as a magnification X-ray or ultrasound. The diagnosis of breast cancer may be confirmed by analysing a tissue sample extracted by biopsy. Because the interpretation of mammograms is a difficult task and is subject to human error, biopsies are sometimes performed on women who do not have cancer. The recall process is traumatic and biopsy—like any surgery—causes discomfort and worry. The benign biopsy rate in 2002/3 was 1.20 per 100 000 women screened [133]. The benign biopsy rate has improved with advances in diagnostic technique.

The radiological signs of breast cancer are described in Chapter 3; example images are given for the most common indicative signs.
2.4.2 Ultrasonography

Ultrasound imaging works by sending high-frequency sound pulses into the tissues of a patient using an array of transceivers that is placed on the patient's skin. When these sounds encounter tissue interfaces, some of the sound is reflected back to the array. The distances from the skin surface to the tissue interfaces are then computed based upon the time between the pulses being sent and received and the speed of the sound wave. One-dimensional transceiver arrays produce image slices, while two-dimensional arrays produce volumes. These are presented to the ultrasonographer on a computer display. Ultrasound images are generated in real-time and are useful in breast cancer investigation when a suspicious feature has been identified by X-ray mammography or when a patient has reported with symptoms [63]. Ultrasound is particularly useful for differentiating between cysts (which are benign) and malignant masses.

2.4.3 Magnetic resonance imaging

The human body is composed largely of water, which in turn is composed largely of hydrogen. A hydrogen atom has an unpaired proton, and so has a non-zero nuclear spin. In magnetic resonance imaging (MRI), the patient is placed in a strong uniform magnetic field (usually between 0.23 T and 3.0 T). This forces the spins of the protons in the hydrogen atoms to align with the field. Almost all protons will be paired, in that each member of a pair will be oriented at 180° to the other, but some will not. A radio frequency pulse can temporarily deflect the unpaired protons.
The imparted energy is released as electromagnetic radiation as the spins realign with the field. The realignment signal is characteristic of tissue type and can be measured. By applying an additional graduated magnetic field it is possible to localise the signals, since their frequency is related to their position in the graduated field. The received signals are recorded in a frequency space called K-space. An inverse Fourier transform is applied to form the corresponding spatial volumetric data. Voxel values represent tissue type and hence the patient's anatomy.

The spatial resolution of current clinical MRI systems is not as good as that of X-ray mammography, so microcalcifications cannot be imaged. However, MRI has several advantages over X-ray imaging: patients are exposed to little radiation, three-dimensional data can be acquired and contrast agents can be used. Unfortunately, MRI is currently too expensive for routine asymptomatic screening for breast cancer, but may be useful for screening younger women whose family history and/or genetic status suggest that they are at increased risk of breast cancer [28].

2.4.4 Computed tomography

In computed tomography (CT), an X-ray source is rotated around the patient's major axis. Whereas the beam of a conventional X-ray can be considered to be conical (i.e. 3-D), CT typically uses a "triangular" beam (i.e. a very thin cone). The attenuation of the beam as it passes through the patient is recorded by an X-ray detector positioned opposite to the source. The attenuation data from all
orientations can be combined to compute a 2-D image "slice", where each location in the slice represents the X-ray attenuation of the tissue to which it corresponds. By slowly passing the patient through the rotating mechanism, 3-D data can be acquired. Although it is possible to use CT for breast imaging, it is rarely used to diagnose breast cancer [117]. The technique can be useful for surgical planning and to assess the patient's response to treatment.

2.4.5 Thermography

Advanced cancers promote angiogenesis—the development of a blood supply to the tumour. Regions containing more blood are hotter than others, and this heat may be detectable on the skin surface. Thermography is an imaging technique that forms maps of the emission of infrared radiation [117]. These maps enable clinicians to look for asymmetries in the heat patterns on the breasts that may result from angiogenesis or the enhanced metabolic processes that occur in a tumour. Compared to X-ray mammography, thermography lacks specificity and resolution.

2.5 Summary

This chapter presented an introduction to breast cancer and the imaging modalities used to detect the disease. In summary:

• Breast cancer is a significant public health issue. While many countries
now have screening programmes to help detect the disease at an early and treatable stage, the image interpretation task is performed visually and is subject to human error.

• X-ray mammography is the most useful imaging modality because of its high image quality and low cost. X-ray mammography allows the anatomy of the breast to be imaged at very high resolution, allowing very small indicative signs of breast cancer—such as microcalcifications—to be seen.

• X-ray mammography does have some drawbacks (e.g. the use of radiation, 2-D projection of the 3-D structure, potential for poor patient positioning, potential for poor film exposure and development).

• Other imaging modalities have their uses in detecting and diagnosing breast cancer, but X-ray mammography for screening is unlikely to be replaced by any of the currently available imaging techniques.
Chapter 3

Computer-aided mammography

3.1 Introduction

This chapter presents a review of the computer-aided mammography literature. The chapter reviews:

• Image pre-processing.

• Automatic prediction of breast cancer risk.

• The appearance of common signs of breast cancer and approaches to their detection by computer.

• Methods of evaluating computer-aided mammography systems.

• Common image databases.

• Available commercial systems.
• Research on computer-aided prompting of radiologists.

We discuss the typical approach to computer-aided mammography and the problems associated with it, and propose how these problems might be solved.

3.2 Computer-aided mammography

Although screening mammography has been shown to reduce breast cancer mortality [133, 2], it suffers from some problems that computer vision systems might be able to solve, for example:

• Double reading improves the cancer detection rate [57], but it cannot always be performed in the screening environment due to human resource or economic limitations. Computer vision systems could act as a second reader.

• The interpretation of mammograms is a difficult task and human error does occur [97]. Computer vision systems could deliver a guaranteed minimum quality of screening and potentially catch some of the errors made by radiologists.

• Cancer is detected in less than 1% of women screened [133]. A computer vision system that could accurately dismiss mammograms that were normal could dramatically reduce radiologist workload.

The most commonly proposed approach to computer-aided mammography is prompting, in which a computer system automatically analyses a digitised mammogram and places prompts on a representation of the mammogram—e.g. an
image of the digitised mammogram displayed on screen or a paper printout—to indicate the presence and location of possible signs of abnormality. A radiologist would then consider these prompts alongside their own interpretation of the mammograms. Prompting is discussed further in Section 3.14. The following review of computer-aided mammography research generally assumes the prompting approach, but other paradigms are also discussed.

3.3 Image enhancement

Image enhancement describes approaches that change the characteristics of images to make them more amenable to other tasks (e.g. inspection by humans or further processing by computer). This includes noise suppression or equalisation, image magnification, grey-level manipulation (e.g. brightness and contrast improvement) and feature enhancement or suppression. Generic image enhancement techniques are well-established and are routinely used within more sophisticated algorithms.

A commonly-used algorithm is histogram equalisation [168]. Histogram equalisation attempts to modify the grey-level values in an input image such that the histogram of those values matches a specified histogram, which is often flat. If a flat target histogram is specified, the result will be an image that uses the entire range of grey-levels, with increased contrast near maxima in the original histogram, and decreased contrast near minima. A possible problem with the approach is that the image is modified based upon global image statistics, which might not be appropriate in local contexts. Local histogram modification techniques use local neighbourhoods, while adaptive histogram modification methods use local contextual information [168].
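For a flat target histogram, global equalisation reduces to building a look-up table from the normalised cumulative histogram. The following is a minimal sketch in Python with NumPy rather than code from this thesis; the function name and the assumption of integer grey-levels are ours:

    import numpy as np

    def equalise_histogram(image, n_levels=256):
        # Histogram of integer grey-levels in [0, n_levels).
        hist = np.bincount(image.ravel(), minlength=n_levels)
        # Normalised cumulative histogram maps each grey-level to [0, 1].
        cdf = np.cumsum(hist).astype(np.float64)
        cdf /= cdf[-1]
        # Scale back to the grey-level range and apply as a look-up table.
        lut = np.round(cdf * (n_levels - 1)).astype(image.dtype)
        return lut[image]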
Averaging filters replace pixel values with the average of those within a local neighbourhood. Using the mean tends to blur edges (as it is essentially a low-pass filter) while using the median does not. Bick et al. used median filtering to remove noise spikes [15, 168]. Lai et al. used a modified median filter where the set of pixels considered by the filter was restricted to exclude those that were too dissimilar to the pixel that the filter was centred on [116]. The approach achieved better edge preservation compared to the standard median filter. Such methods are "coarse" in that they rarely have any model of the domain in which they operate (e.g. such filters might mistake film noise for small microcalcifications because they have no "knowledge" about those two classes of image feature). Zwiggelaar et al. used a directional recursive median filter to construct mammographic feature descriptors [192] (see Section 3.8 and Chapter 4 for a more detailed description of such descriptors).
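The restriction used by Lai et al. can be illustrated as follows. This is a sketch of the general idea only, not the published algorithm; the neighbourhood radius and grey-level similarity threshold are hypothetical parameters:

    import numpy as np

    def restricted_median_filter(image, radius=2, max_diff=30):
        # Median over the neighbourhood, restricted to pixels within
        # max_diff grey-levels of the central pixel; border pixels are
        # left unchanged for simplicity.
        out = image.copy()
        h, w = image.shape
        for y in range(radius, h - radius):
            for x in range(radius, w - radius):
                window = image[y - radius:y + radius + 1,
                               x - radius:x + radius + 1].ravel()
                keep = np.abs(window.astype(int) - int(image[y, x])) <= max_diff
                out[y, x] = np.median(window[keep])  # centre pixel always kept
        return out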
Grey-level values can be manipulated via the Fourier domain. For example, image smoothing can be used in an attempt to suppress noise by attenuating high-frequency components [168]. However, methods that operate only in the Fourier domain lack spatial information, and so important context may not be available. Wavelets address this problem as they can be used to describe images in terms of both space and frequency, and are commonly used in mammography. Wavelet analysis was used by Qian et al. [147] to enhance microcalcifications by selectively reconstructing a subset of the wavelet sub-band images. Compared to Fourier methods, wavelets allow the characteristics of the signal(s) of interest to be specified more precisely. Wavelets were used in place of ad hoc texture features by Campanini et al. [35] and used to statistically model mammographic texture by Sajda et al. [159] (see Section 3.7 for a more detailed discussion of these methods).

In contrast to the frequency-based methods such as Fourier and wavelet analysis, mathematical morphology analyses images based upon the shape of image features. It can be used to remove image features of a given shape and size (e.g. Dengler et al. considered microcalcification candidates [55]). A possible problem with mathematical morphology is that a specification of shape is required: image features that vary dramatically in shape may require very many such specifications, leading to implementation issues. A detailed discussion of mathematical morphology can be found in Chapter 4.

Noise equalisation is important because machine learning systems are underpinned by statistical methods which often implicitly assume that the noise has particular characteristics. By equalising the noise, the properties of the image data are likely to be more closely matched to the assumptions made by the algorithms that operate on that data. Image noise in digitised mammograms may be considered to vary as a function of grey-level pixel value [106, 166]. Smith et al. used a radiopaque step-wedge phantom to estimate this relationship in order to correct the non-uniformity [166], but a phantom is likely to be a nuisance in a screening environment. Karssemeijer and Veldkamp described noise equalisation transforms where the noise is estimated from the image itself—rather than from a radiological phantom—using the standard deviation of local contrasts [106, 180]. It was demonstrated that equalising the noise using the approach improved the performance of detection algorithms. This is likely to be due to the explanation given above.
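The idea of estimating noise from the image itself can be sketched as below. This is an illustration rather than the transform of Karssemeijer and Veldkamp: local contrast is approximated by the difference between each pixel and the mean of its four neighbours, and its standard deviation is tabulated per grey-level bin, giving a curve that could be used to rescale contrasts so that noise is approximately uniform across grey-levels:

    import numpy as np

    def noise_std_by_grey_level(image, n_bins=32):
        img = image.astype(np.float64)
        # Local contrast: pixel minus the mean of its four neighbours.
        neigh = (np.roll(img, 1, 0) + np.roll(img, -1, 0) +
                 np.roll(img, 1, 1) + np.roll(img, -1, 1)) / 4.0
        contrast = img - neigh
        # Standard deviation of local contrast within each grey-level bin.
        bins = np.minimum((img / (img.max() + 1) * n_bins).astype(int),
                          n_bins - 1)
        return np.array([contrast[bins == b].std() if np.any(bins == b)
                         else 0.0 for b in range(n_bins)])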
Highnam and Brady [88, 86] proposed a physics-based model of the mammographic image acquisition process to convert digital mammograms to an image representation they call hint. In the hint representation pixel values represent the thickness of the "interesting" (non-fat) tissue. The technique relies upon knowing several parameters that describe the X-ray imaging process, such as the thickness of the compressed breast, tube voltage, film type and exposure time. By modelling the imaging process, the appearance under a set of "standard" imaging conditions can be predicted, leading to the Standard Mammogram Form (SMF) [88]. It is not always practical to measure the various imaging parameters during routine screening and radiologists do not train with such standardised mammograms. It seems likely that working with mammograms where pixel values represent tangible quantities will lead to better detection algorithms, but digital mammograms are not widely available in hint form. One of the goals of the eDiaMoND project was to make such data available to researchers (see Section 3.12) [23].

The identification of curvilinear structures is useful in detecting and classifying spiculated lesions (see Section 3.8). Cerneaz and Brady developed a physics-based model of the expected attenuation of curvilinear structures [39]. The authors assumed that such structures are elliptical in cross-section and so would appear to have strong second derivative components in the image. The second derivative was used to enhance candidate pixels and a skeletonisation algorithm [168] was used in further processing. Physics-based models would have to be extremely complex and specific to properly explain the appearance of mammograms. It therefore seems likely that approaches based upon image data itself have more potential. Most research on digital mammography has used this latter approach.
3.4 Breast segmentation

The identification of the breast border is a common task in digital mammography and the development of reliable automatic methods is important. Such information is required to limit the search for abnormalities to the breast area (particularly when algorithms are computationally expensive), or so that some form of breast shape analysis can be performed (see Chapter 9 for an example). Locating the breast border is a non-trivial task due to the variation both between women and inherent to the X-ray acquisition process.

Grey-level thresholding is a common approach to breast segmentation. Two thresholds are generally sought. The first discards pixels with low grey-levels, assuming them to belong to non-breast radiolucent objects (such as air). The second discards pixels with high grey-levels, assuming them to belong to non-breast radiopaque objects (such as film markers). The selection of these thresholds is generally non-trivial, and other information such as shape is often also used. Byng et al. determined these thresholds manually [33]. They can also be determined by analysing the shape of the image histogram [42]. Although thresholding can provide an initial estimate of the boundary, the approach is generally confounded by features such as film markers, and much more sophisticated approaches that have some model of what the segmented image should look like are generally used.
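A crude version of this double-thresholding scheme is sketched below in Python, with SciPy assumed available for connected-component labelling. The thresholds are illustrative (in practice they would be derived from the histogram) and the largest connected component is kept on the assumption that it is the breast:

    import numpy as np
    from scipy import ndimage

    def breast_mask(image, low=25, high=230):
        # Keep pixels that are neither background (air) nor radiopaque
        # artifacts such as film markers.
        candidate = (image > low) & (image < high)
        labels, n = ndimage.label(candidate)
        if n == 0:
            return candidate
        # Retain only the largest connected region.
        sizes = ndimage.sum(candidate, labels, index=range(1, n + 1))
        return labels == (int(np.argmax(sizes)) + 1)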
Chandrasekhar and Attikiouzel analysed the shape of the cumulative grey-level image histogram to identify a characteristic 'knee' which represents the boundary between background and breast tissue [42]. Adaptive thresholding yielded an initial segmentation which was then modelled by polynomials. This segmentation was subtracted from the original image and the result was thresholded, resulting in a binary image describing the breast and non-breast regions. Morphological operations were used to remove artifacts arising from film scratches. An implementation of Chandrasekhar and Attikiouzel's algorithm was subjectively good enough to approximately limit the operation of detection algorithms to the breast region, but was not good enough to allow the shape of entire mammograms to be modelled in the work described in Chapter 9.

Lou et al. [121] quantised mammograms using k-means clustering and inspected horizontal slices through the quantised images to determine the direction of a decrease in pixel value. The direction was used to estimate the left-right orientation of the breast. Pixel values on the skin-air border were found to lie in one of three quantised pixel values. This information was used to generate an initial estimate of the breast border. Actual mammogram pixel values were sampled along normals to the initial estimate. Pixel values along normals to the breast border will decrease from values associated with the edge of the breast to those associated with the non-breast region. Linear models of pixel value as a function of distance along the normals were used to refine the estimate of the breast border. A rule-based search was then used to further refine the breast border. Finally, a B-spline was used to link and smooth the located breast border points.
The approach is sensible because the skin-air border should be relatively easy to model. However, a common confounding feature is the placement of film markers. These would pose an occlusion problem to methods that do not also use a model of legal breast shape.

The active shape model (ASM) [48] has been used in a number of medical and non-medical applications. An ASM models the statistical variation of shape associated with a particular class of object and uses a statistical model of pixel values along normals to the shape boundary to legally deform the model to fit to an object in an image. Smith et al. used an ASM to locate the breast outline [165]. The ASM can therefore be viewed as a generalisation of the approach proposed by Lou et al. [121]. The two main problems with the ASM are that it does not use all the image information in its search strategy and it requires an initialisation that is already a good approximation to the final solution. The former was rectified by the Active Appearance Model [47]. A better approach to breast border segmentation might be to build a low resolution appearance model (similar to that described in Chapter 9) and then search over the model parameters to find those that best describe a low resolution version of the mammogram in question. This would provide a low resolution estimate of the boundary. The estimate could then be refined at high resolution using a model of the skin-air boundary transition. Refinements could be propagated upwards to the low resolution model where illegal (unlikely) refinements could be rejected.
3.5 Breast density and risk estimation

Post-menopausal breast density is a strong risk factor for breast cancer [4]. Also, because cancer develops from dense (glandular) tissue it may be masked in mammograms by normal dense tissue. Automatic assessment of the density of breasts and the risk associated with that density may be helpful to radiologists, particularly as automated methods can provide stable independent measurements, while there will be inherent variability in assessments made by humans.

Byng et al. proposed a simple interactive approach where users of their system selected grey-level thresholds to segment the breast region and dense tissue [33]. The proportion of dense to total area was used as a measure of breast density. The approach is reasonable because mammographic brightness indicates density, but it seems that a similar approach using the calibrated hint measure would be more stable. Additionally, the manual selection of thresholds will introduce variation between and within users; a fully automated system could avoid such problems.

Taylor et al. investigated sorting mammograms into fatty and dense sets using a multi-resolution non-overlapping tile-based method. A number of statistical and texture measures, computed for each tile, were evaluated and local skewness was found to best discriminate between the classes [175]. The reader is hereafter referred to Section 3.15 for a discussion of ad hoc texture descriptors.
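The density measure itself is simple: given a breast mask and a grey-level threshold for dense tissue (user-selected in Byng et al.'s system), the measure is the dense area divided by the total breast area. A sketch:

    import numpy as np

    def density_proportion(image, breast_mask, dense_threshold):
        # Fraction of breast pixels at or above the dense-tissue threshold.
        breast = image[breast_mask]
        return float(np.sum(breast >= dense_threshold)) / breast.size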
Wolfe proposed that parenchymal patterns are related to breast cancer risk [185] and developed a radiological lexicon for describing the dense and fatty characteristics of mammograms, known as Wolfe grades. The relationship between parenchymal pattern and breast cancer risk has been confirmed by Boyd et al. [22] and van Gils et al. [178]. Tahoces et al. statistically modelled various texture descriptors to predict Wolfe grades [173].

Caldwell et al. used fractal dimension—a measure of the complexity of a self-affine object—with mammographic images (considered as surfaces) to measure textural characteristics. They classified mammograms by Wolfe grade, based upon average fractal dimension and the difference between that average and the fractal dimension of a region near the nipple [34].

Karssemeijer divided the breast into radial regions so that the distances to the skin line were approximately equal. Grey-level histograms were computed for each region and the mean, standard deviation and skewness were used to classify mammograms by Wolfe grade using a k-nearest neighbour classifier [107]. The success of the method can probably be attributed to the statistical characterisation of the appearance of the mammograms.

Zhou et al. used a rule-based method that classified mammograms according to prototypical characteristics in their grey-level histograms. This classification was used to automatically select a threshold with which to segment the dense tissue. The proportion of dense to total breast area was then computed [189]. Detecting a well-understood feature in a 1-D function (the histogram) can be reasonably easy, although the approach is dependent upon the stability of these histogram characteristics.
A Gaussian mixture model of texture descriptors, learned using the Expectation-Maximisation (EM) algorithm, was used by Zwiggelaar et al. to segment mammograms into six tissue classes [193]. The area of dense tissue—as segmented by the model—as a proportion of total area was used in a k-nearest neighbour framework to classify mammograms into one of five density classes. Although learning the distribution of texture features allows a principled statistical approach to be used, it is not clear that the clustering produced by the EM algorithm would necessarily correspond to a clustering that an expert might produce. Further, the EM algorithm aims to find the best fit of a model of a probability density function to the data, rather than to partition the data (as Algorithm 3 in Chapter 5 explains, in the EM algorithm every data point belongs to every model component, so there is no actual partitioning). Dedicated clustering methods might have been more appropriate. Gaussian mixture models are discussed in some detail in Chapter 5 and a proof of the convergence property of the EM algorithm is presented in Appendix A.
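To make the distinction concrete, the following sketch fits a Gaussian mixture to per-pixel texture descriptors and shows that EM yields soft "responsibilities" rather than a partition; a hard labelling arises only if each pixel is assigned to its most responsible component. The use of scikit-learn and the random placeholder features are assumptions for illustration only:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    features = np.random.randn(10000, 8)       # placeholder texture descriptors
    gmm = GaussianMixture(n_components=6, covariance_type='full')
    gmm.fit(features)                          # parameters estimated by EM
    posteriors = gmm.predict_proba(features)   # soft memberships: rows sum to 1
    tissue_class = posteriors.argmax(axis=1)   # hard labelling, if one is needed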
3.6 Microcalcification detection

Microcalcifications are tiny (approximately 500 µm) specks of calcium. A cluster of microcalcifications can indicate the presence of an early cancer. Microcalcifications can sometimes be detected easily as they can be much brighter than the surrounding tissue. However, small microcalcifications may appear to be very similar to film or digitisation noise. Scratches on the mammographic film can sometimes be mistaken for bright microcalcifications, particularly by automated methods. A mammogram containing an obvious microcalcification cluster is shown in Figure 3.1.

Karssemeijer describes an iterative scheme for updating pixel labels, based upon three local image descriptors (local contrast at two spatial resolutions and an estimate of local orientation). Pre-processing was used to achieve noise equalisation using information from a radiological phantom. A Markov random field model was used to model the spatial constraints between four pixel classes (background, microcalcifications, lines or edges, and film emulsion errors) and a final labelling was achieved via iteration [105]. Local methods are appropriate for detecting individual microcalcifications because of their small size, but are inappropriate for cluster detection. Detecting clusters of microcalcifications is important because their form contains important information about the cause of the cluster (e.g. malignancy). In addition, it can be difficult to determine when Markov random field models have converged.

Veldkamp et al. [179] classified microcalcification clusters as being malignant or benign by estimating the likelihood of them being malignant. Individual microcalcifications were detected using Karssemeijer's method. Discs were then centred on each microcalcification and the boundaries of the intersection of the discs were computed. Microcalcifications were clustered according to which boundary they were located within. The procedure was performed for both mediolateral-oblique and cranio-caudal views, and correspondences were determined between clusters in each view. Features used for classification included the relative location of the cluster in the breast, measures of calcification distribution within the cluster and shape features. The likelihood of malignancy was computed as the ratio of the number of malignant to benign neighbours in the k-nearest neighbourhood.
Figure 3.1: An example microcalcification cluster. The location of the microcalcification cluster is indicated by the red circle. The bottom left image shows a magnification of the cluster; the bottom right image shows a histogram equalised version of the magnified cluster. Source: the Mammographic Image Analysis Society digital mammogram database [171].
The approach is sensible because it acknowledges that it is the clusters that are important, includes information about the form of clusters and delivers a statistical measure of the likelihood of malignancy.

Bocchi et al. [18] designed a matched filter to enhance microcalcifications by assuming a Gaussian model of microcalcifications and a fractal model of mammographic background. A region growing algorithm was used to segment candidate microcalcification clusters and to describe the location of each candidate microcalcification. An artificial neural network was used to discriminate between microcalcifications and artifacts of the filtering stage. Segmented regions were characterised by fractal descriptors and these were used in a second artificial neural network to identify true clusters. The underlying assumptions of the approach—a Gaussian model of microcalcifications and a fractal model of mammographic background—while being reasonable models, are not true. A more realistic model of these image features may have improved their results.

False positive elimination was addressed by Ema et al., who used edge gradients at signal-perimeter pixels to eliminate features such as noise or other artifacts [61]. Zhang et al. used a "shift-invariant" artificial neural network to segment candidate microcalcifications [188]. The size and "linearity" of candidate microcalcifications were analysed to reject false positives due to vessels. Both of these methods implicitly attempt to model the neighbourhood around true microcalcifications, and direct modelling of that neighbourhood—such as that described in Chapter 6—might be more appropriate.
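Under a Gaussian model of microcalcifications against a smoothly varying background, spot enhancement can be approximated by a difference-of-Gaussians filter. The sketch below is loosely in the spirit of Bocchi et al.'s matched filter but is not their published method; the scales are illustrative (a 500 µm speck digitised at 50 µm per pixel spans roughly ten pixels):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def enhance_spots(image, sigma_spot=1.5, sigma_background=8.0):
        # Narrow Gaussian matched to the expected spot size, with a wider
        # Gaussian subtracted to suppress the slowly varying background.
        img = image.astype(np.float64)
        return gaussian_filter(img, sigma_spot) - gaussian_filter(img, sigma_background)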
3.7 Masses

Masses are abnormal growths and may be malignant or benign. Masses may appear to be localised bright regions, but are often very similar in appearance to, and may be obscured by, normal glandular tissue. Detection and discrimination of masses can be difficult even for expert mammography radiologists. Malignant masses are often characterised by linear features radiating from the mass, called spicules, and we discuss methods for detecting and assessing spiculation in Section 3.8. A mammogram containing an obvious circumscribed mass is shown in Figure 3.2.

A common approach to the detection and classification of masses is to determine candidate mass regions and then compute descriptors for the region designed to allow discrimination between true and false detections. The problems that research addresses are how candidate mass locations are found, which features should be extracted and how they should be combined to yield a classification.

Karssemeijer and te Brake compared two methods for segmenting masses [177]. The first grew a region from a seed location, expanding the region if neighbouring pixels were above a certain threshold. The region growing was repeated using a number of thresholds and the "best" region was selected using a maximum likelihood method that considered the distribution of pixel grey-levels inside and outside the region. The second method was a dynamic contour defined by a set of connected vertices, similar to the method proposed by Kass et al. [111]. The vertices were accelerated towards the mass boundary using internal and external forces. The internal forces served to encourage compactness and circularity of the region, while the external forces served to encourage the boundary to converge on strong image gradients. A damping force was used to promote convergence. The authors report that the two methods produced segmentations that were similar to those of radiologists. The segmentations produced by the dynamic contour model allowed better discrimination between normal and abnormal regions when geometric and texture features were used within an artificial neural network classifier. Region growing methods generally only consider local neighbourhoods, and so segmentations can have illegal shapes. Dynamic contour methods depend upon the form of the forces used to constrain them. Equations relating the image content to the force applied to the vertices tend to be ad hoc in nature, and so it is easy for assumptions about the data to be implicitly included. It may be more appropriate to learn the form of the constraining forces than to choose them manually. Dynamic contour methods do not generally have any notion of the range of legal shapes that they may take. This is often a problem in cases where the objects of interest have prototypical characteristics (e.g. the shapes of people's hands), but is appropriate for objects such as mammographic masses where the shapes lack typical structure (i.e. have very high variability).
Figure 3.2: An example circumscribed mass. The location of the mass is indicated by the red circle. The bottom left image shows a magnification of the mass; the bottom right image shows a histogram equalised version of the mass. Source: the Mammographic Image Analysis Society digital mammogram database [171].
Haralick et al. used texture descriptors—spatial grey-level dependence (SGLD) matrices (also called co-occurrence matrices)—to compute texture features [79]. The (i, j)-th element of a SGLD matrix S_{d,θ} describes the number of pixels in the input image with grey-level i that have a pixel with grey-level j at a distance of d in direction θ. Petrosian et al. and Chan et al. computed statistics from these matrices to describe textural characteristics [141, 41]. These were used to discriminate between textures associated with mass and non-mass regions. As the number of grey-levels increases (i.e. as the number of bits used in the digitisation increases), so does the size of the SGLD matrices. This leads to a problem similar to the "curse of dimensionality" (described in Section 5.3), where a very large amount of data is required to estimate matrices that adequately characterise the texture. The bit-depth of the images can be reduced to make the estimation tractable, but this can lead to a poor description of the texture.
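Computing an SGLD matrix is straightforward. The sketch below handles the horizontal and vertical directions only and quantises the image first, which is exactly the bit-depth trade-off discussed above; the function and parameter names are ours and non-negative integer grey-levels are assumed:

    import numpy as np

    def sgld_matrix(image, d=1, theta=0, n_levels=16):
        # Quantise to n_levels to keep the matrix small (a coarser texture
        # description, but far fewer cells to estimate).
        q = (image.astype(np.float64) / (image.max() + 1) * n_levels).astype(int)
        s = np.zeros((n_levels, n_levels), dtype=np.int64)
        dy, dx = (0, d) if theta == 0 else (d, 0)   # 0 or 90 degrees only
        a = q[:q.shape[0] - dy, :q.shape[1] - dx]   # grey-level i
        b = q[dy:, dx:]                             # grey-level j at (d, theta)
        np.add.at(s, (a.ravel(), b.ravel()), 1)     # count co-occurrences
        return s

Haralick-style statistics, such as the contrast Σ_{i,j} (i − j)² p(i, j) of the normalised matrix p, are then computed from such matrices.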
Brzakovic et al. segmented mass candidates using a multi-scale fuzzy method. A textural descriptor was used within a hierarchy of classifiers that used thresholds and Bayesian methods to classify the candidates as malignant or benign [29].

Wavelets were used by Campanini et al. to detect malignant masses [35]. Wavelet decompositions were computed on square windows extracted by "scanning" mammograms over a range of scales. A support vector machine classifier was trained on the wavelet coefficients to classify the windows as malignant masses or normal regions. For a particular test mammogram, the initial output was a set of binary images, one at each scale. A majority voting scheme was employed to produce a final classification. Support vector machines have proved to perform well in high-dimensional spaces and the authors rely on the ability of the learning system to extract useful features from the full descriptions provided by the wavelet coefficients. This is reasonable because it removes the need to make explicit or implicit assumptions about which image characteristics are appropriate to extract.
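A much simpler summary than feeding every coefficient to a classifier, but one that still illustrates the space-frequency description wavelets provide, is to use sub-band energies from a multi-level decomposition. A sketch assuming the PyWavelets package:

    import numpy as np
    import pywt

    def wavelet_features(patch, wavelet='db4', levels=3):
        # Multi-level 2-D wavelet decomposition of an image patch.
        coeffs = pywt.wavedec2(patch.astype(np.float64), wavelet, level=levels)
        feats = [np.mean(coeffs[0] ** 2)]             # approximation energy
        for ch, cv, cd in coeffs[1:]:                 # detail sub-bands
            feats += [np.mean(ch ** 2), np.mean(cv ** 2), np.mean(cd ** 2)]
        return np.array(feats)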
Sajda et al. present a generative statistical model of the appearance of mammographic regions of interest. Wavelet coefficients, computed from mammographic patches, were statistically modelled using a tree-structured variant of a hidden Markov model. In addition to being able to generate synthetic mammographic textures and compress mammographic images, the model can be used in an analytical mode as an adjunct to a mass detection algorithm to reduce false positives. Models were trained on mass and non-mass regions of interest and used to compute likelihood ratios for test images [159, 169]. The method is discussed further in Section 6.2.

3.8 Spiculated lesions

The margin of a mass contains information that radiologists can use to characterise the mass. Margins can be described as circumscribed, obscured, lobulated, indistinct or spiculated. Spiculations (also called stellate distortions) are curvilinear radial features and a strong sign of malignancy. Automated methods seek to classify the mass margin using either features that describe properties of the margin or by detecting and classifying spiculations directly. A mammogram containing an obvious spiculated lesion is shown in Figure 3.3.

Scale-orientation pixel signatures (presented in detail in Chapter 4) corresponding to linear structures were statistically modelled by Zwiggelaar and Marti [191]. The model was then used to classify pixels as belonging to linear structures or not. Pixel signatures are a type of texture feature and describe pixel neighbourhoods in terms of scale and orientation. Signatures taken from blob-like features are dissimilar to those taken from linear features. Modelling signatures from linear features is sensible as it allows the presence of such structures to be analysed in a statistically meaningful way.
Figure 3.3: An example spiculated lesion. The location of the spiculated lesion is indicated by the red circle. The bottom left image shows a magnification of the spiculated lesion; the bottom right image shows a histogram equalised version of the spiculated lesion. Source: the Mammographic Image Analysis Society digital mammogram database [171].
Zwiggelaar et al. compared several approaches to the detection of linear structures [190]. Two variants of a line operator were investigated that compute an orientation and strength for each pixel from the mean pixel value along oriented lines centred on the pixel in question. Karssemeijer's method [110] and a ridge detector designed to minimise the response to "blobs" were also used. The line operators were found to perform best. An approach to spicule detection was also investigated: linear features were classified into their anatomical classes on the basis of their cross-sectional profiles. Noise was reduced using principal components analysis and classification was achieved by assuming Gaussian models of class conditional densities. However, in order to detect spiculated lesions, a method would be required to integrate knowledge from the classified linear structures.
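The line operator itself is simple to state: at each pixel, take the mean grey-level along short lines at several orientations; the largest oriented mean minus the mean over a local square neighbourhood gives the line strength, and the maximising orientation gives the direction. A direct (and slow) sketch with illustrative defaults:

    import numpy as np

    def line_operator(image, length=5, n_orientations=8):
        img = image.astype(np.float64)
        h, w = img.shape
        r = length // 2
        # Pixel offsets along a short line at each orientation.
        offsets = [[(int(round(t * np.sin(np.pi * k / n_orientations))),
                     int(round(t * np.cos(np.pi * k / n_orientations))))
                    for t in range(-r, r + 1)] for k in range(n_orientations)]
        strength = np.zeros((h, w))
        orientation = np.zeros((h, w), dtype=int)
        for y in range(r, h - r):            # borders ignored for simplicity
            for x in range(r, w - r):
                local_mean = img[y - r:y + r + 1, x - r:x + r + 1].mean()
                means = [np.mean([img[y + dy, x + dx] for dy, dx in offs])
                         for offs in offsets]
                k = int(np.argmax(means))
                strength[y, x] = means[k] - local_mean
                orientation[y, x] = k
        return strength, orientation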
Karssemeijer computed statistics for a circular region centred on each pixel in turn that described the concentration and radial uniformity of moderately strong gradients pointing towards the centre of the region. A continuous multi-resolution scheme was used to match the feature extraction to the scale of the image features. These features were used in an artificial neural network to predict the likelihood of suspiciousness [108]. Karssemeijer describes a method of computing the orientation and strength of linear features by combining the responses to three oriented Gaussian second derivative kernels [110]. While spicules do point in the general direction of the central mass, they are often curved, and so a method that could determine that a number of curvilinear structures—rather than just pixels with particular gradients—"point" towards a given area might be more successful.

A linear discriminant was used by Mudigonda et al. to classify masses as malignant or benign using texture and gradient features extracted from SGLD matrices computed from "ribbons" around mass borders [132]. One of the problems that such a method would have is determining the correct region around the border, as some spicules can be quite short, while others can be relatively long.

A spiculation descriptor was proposed by Huo et al. and evaluated for four types of mass region [94]. Let I denote the pixels inside a segmented mass region and O denote the pixels outside the segmented region but within the region of interest containing the mass candidate. Four types of region were investigated: I, O, O ∪ I and a region lying on the boundary of the segmented mass. The directions of maximal gradient were computed for each pixel in the region and compared to the direction defined by the line connecting the centre of gravity of the mass region to the location of the pixel in question. Statistics computed from these measures were used to describe the spiculation associated with the mass. The authors report that features computed from O and O ∪ I provided better estimates of the likelihood of malignancy than the other regions, but combining the measures from all regions yielded the best performance. This method is similar to that of Karssemeijer [108]; again, a method that could determine that a number of curvilinear structures—rather than just pixels with particular gradients—"point" towards a given area might be more successful.
  • Chapter 3—Spiculated lesions 82like structures. Shape description features were computed for non-intersectingcurvilinear features which were then projected into a principal components space.The distribution of points in the principal components space was modelled us-ing a Gaussian mixture model. Such a method could be used within a noveltydetection scheme to detect abnormal curvilinear features.Sahiner et al. classified masses as malignant or benign using morphological andtextural features extracted from a region on the mass periphery. They usedan active contour (see [111, 177]) to segment mass candidates. Morphologicalfeatures (e.g. a Fourier descriptor, convexity and rectangularity measures) andtexture features extracted from SGLD matrices were used in a linear discriminantclassifier [158].A wavelet decomposition was used by Liu et al. to detect and classify masses [120].Orientation and magnitude features were extracted from each sub-band image andused within a binary classification tree that processed the features in a coarse tofine order according to the scale of the sub-band images. This allowed imagesto be efficiently processed, as positive mass detections were propagated fromcoarse levels, eliminating the need to process all pixels in all sub-band images.Median filtering was used on the final response images to reduce false positives.While classifiers like the support vector machine are now more common thanclassification trees, the approach taken could allow definitely normal features tobe ignored at little computational expense. However, there is a risk of an increasedfalse negative and positive rate if some of the available evidence is ignored.
3.9 Asymmetry

Radiologists typically view mammograms as pairs of left and right breasts and use information in each to help understand the appearance of the other. An abnormality that is detected as a result of a difference between a pair of mammograms is called an asymmetry, although there is a distinction between a radiological asymmetry and a mathematical asymmetry. All pairs of mammograms are mathematically asymmetrical and this asymmetry may be quite marked while still being considered normal. Few computer-aided detection algorithms include asymmetry information and almost certainly suffer as a result.

Giger et al. used mathematical asymmetry to generate candidate mass locations by registering pairs of breasts and performing bilateral subtraction. Geometric and texture features were extracted and used within an artificial neural network. The authors improved performance using temporal subtraction, which can be considered as another form of asymmetry [71]. A potential problem with this approach is that the texture analysed is created from a bilateral subtraction that is due to a registration. If the behaviour of the registration algorithm is unstable (e.g. if it performs differently on different types of breast) then the texture discrimination task would be confounded. There is a fundamental problem in the assumption that dense correspondences can be obtained between a pair of mammograms, because structure may be missing from one or both mammograms (e.g. the pectoral muscle or nipple may not have been imaged).
Miller and Astley [129] note that mathematical asymmetry—typically obtained via image registration and bilateral subtraction, which may introduce artificial asymmetry—is not a good model of radiological asymmetry. They propose measures for three types of radiological asymmetry: shape, intensity and topology. Radiologists annotated the dense regions of mammograms and correspondence was assumed between the largest such regions in each pair of mammograms. Bilateral differences in shape descriptors were used as shape asymmetry measures. The authors also used the minimum cost of "transporting" grey-levels from one reduced-resolution mammogram to the other—using the transportation algorithm [89]—as a measure of intensity asymmetry. Topological asymmetry was measured using the difference between area and binary moments. A linear discriminant performed best when all three measures were combined. The assumption that there is a correspondence between the largest annotated dense regions may not be correct because it may be possible for a dense region in one breast to correspond to two (or more) such regions in the other. In addition, only considering the largest dense regions ignores the contribution to asymmetry from the other regions. Asymmetry can be a subtle sign of abnormality, and so applying the transportation algorithm at low resolution may miss the more subtle asymmetries.
Miller and Astley could only compute transportation cost for low-resolution mammograms because the solution to the transportation programming problem scales poorly with the number of pixels, and computing power was limited when judged by today's standards [129]. Board et al. revisited the transportation problem as an asymmetry measure and developed a multi-resolution transportation algorithm where solutions at low resolutions constrain the problems at higher resolutions, thus allowing only "plausible" transportations [17]. They used the mean transportation cost per pixel to discriminate between normal and abnormal asymmetries and the per-pixel transportation cost to localise asymmetries. While this work addresses the problem that Miller and Astley faced, it is not clear what transportation cost means in a statistical sense.

3.10 Clinical decision support

Clinical decision support refers to the use of computer technology to help clinicians make clinically relevant decisions. While computer-aided detection (CADe) is concerned with fully-automatic methods that aim to draw the attention of radiologists to abnormalities they may have missed or to act as substitute independent second readers, clinical decision support—which may also be referred to as computer-aided diagnosis (CADi)—is concerned with the independent evaluation of clinical information to help clinicians reach diagnoses. The clinical information is often provided by the radiologist, rather than being identified automatically by the computer.

Even simple clinically significant information can improve the performance of CADe systems. Kilday et al. included patient age with more conventional shape and texture features [113]. The inclusion of age increased the area under the ROC curve for the system from 0.72 to 0.82 (see Section 3.11 for background on ROC analysis). However, care must be taken when constructing such systems so that a priori information does not dominate other evidence (e.g. it would be undesirable for a mammogram from a young woman with breast cancer to be misclassified as normal on the basis that breast cancer is uncommon in that age group).
Wu et al. trained an artificial neural network on features (e.g. presence of a well-defined mass, presence of microcalcification, subtlety of distortions), rated by radiologists on a 10-point scale, from textbook cases. The trained system was evaluated on clinical cases and the authors report that the system could discriminate between malignant and benign cases more accurately than attending and resident radiologists [187]. A similar approach was used by Floyd et al. [65]. D'Orsi et al. developed a reporting scheme where radiologists recorded either the magnitude of a mammographic feature or a measure of their confidence in the presence of the feature. Discriminant analysis was used to provide an estimate of the likelihood of malignancy [58]. A significant problem with these approaches is that the decision provided by the computer system is dependent upon human input. There is likely to be both inter- and intra-user variation, so such systems must be constructed to be robust to such error. In contrast to CADe systems, where it should be possible to quote a guaranteed minimum level of performance, no such guarantees can be made for CADi. In addition, it is likely that clinicians would need to be trained to use such systems, and visual inspection and manual interaction are required (i.e. a CADi system could not act as an independent second reader).

3.11 Evaluation of computer-based methods
Computer-aided mammography systems are generally designed to produce a measure which can be used to make a binary decision about the presence (condition A) or absence (condition B) of some characteristic. Often, there is location information associated with the measure. Example outputs of computer-aided mammography systems are:

• A classification of a region of interest as malignant or benign.

• An estimate of the likelihood of malignancy in a mammogram.

• A pixel-wise classification of a mammogram into microcalcification and non-microcalcification classes.

• A pixel-wise estimate of the likelihood of the presence of a malignant mass.

• A pixel-wise segmentation of a mammogram into several tissue classes.

A simple evaluation measure that can be used when binary classifications are made is percent correct (e.g. 'the system correctly detected 75% of the malignant masses'). This measure describes the proportion of true positives (TP)—correct detections of condition A. However, the measure does not tell us the number of:

• false positives (FP)—incorrect detections of condition A;

• true negatives (TN)—correct detections of condition B;

• false negatives (FN)—incorrect detections of condition B.

Rather than being reported explicitly, these statistics are usually used to compute sensitivity (the proportion of cases of condition A that are correctly identified) and specificity (the proportion of cases of condition B that are correctly identified) [140].
Formally, if nTP denotes the number of true positives, nFP denotes the number of false positives, nTN denotes the number of true negatives and nFN denotes the number of false negatives, then sensitivity and specificity are defined as:

    sensitivity = nTP / (nTP + nFN)    (3.1)

    specificity = nTN / (nTN + nFP)    (3.2)

A perfect detection or classification algorithm would have both sensitivity and specificity equal to unity.

When an algorithm produces less coarse measurements about the presence or absence of the characteristic in question (e.g. on a continuous scale), richer descriptions of the performance of the algorithm can be produced. Sensitivity and specificity can be computed at each of a number of thresholds. These can be plotted on the unit plane to form a receiver operating characteristic (ROC) curve, with 1 − specificity plotted on the abscissa (i.e. the "x-axis") and sensitivity plotted on the ordinate (i.e. the "y-axis"). (Receiver operating characteristic analysis is named after the RADAR receiver operators of the second world war [46].) The diagonal line defined by

    sensitivity = 1 − specificity    (3.3)

(i.e. y = x) represents the performance of a random classifier. The ROC curve describes the trade-off between sensitivity and specificity when a particular threshold (an operating point) is selected to discriminate between the two classes.
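Given continuous scores and binary ground truth, the whole curve and Az can be computed in a few lines. The sketch below assumes at least one case of each condition and uses the trapezoidal rule for the area:

    import numpy as np

    def roc_curve(scores, labels):
        # Sort cases by decreasing score; sweeping a threshold down the
        # list then adds one case at a time to the "positive" side.
        scores, labels = np.asarray(scores), np.asarray(labels)
        order = np.argsort(-scores)
        labels = labels[order]
        tp = np.cumsum(labels)          # condition A cases detected so far
        fp = np.cumsum(1 - labels)      # condition B cases falsely detected
        sensitivity = tp / tp[-1]
        one_minus_specificity = fp / fp[-1]
        az = np.trapz(sensitivity, one_minus_specificity)
        return one_minus_specificity, sensitivity, az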
Ideally, the ROC curve would enclose the unit plane perfectly, and the area under the curve would be unity. The area under the ROC curve is commonly used to summarise the ROC curve, and is usually given the symbol Az. An example of a desirable ROC curve is shown in Figure 7.3 and an example of an undesirable ROC curve is shown in Figure 7.4.

Variants of ROC analysis were developed to allow localisation information to form part of the analysis (e.g. FROC [30], LROC [170], AFROC [40]). The abscissa of a FROC curve shows the number of false positives per image and the corresponding proportion of correct detections with correct localisation is plotted on the ordinate. FROC curves are highly sensitive to the criteria used to determine suitable localisation. ROC and FROC analysis are commonly used in the computer-aided mammography literature and the reader is directed to Metz for a detailed exposition on experimental design issues and performance evaluation in computer-aided mammography [126, 127].

3.12 Image databases

Comparing published results is not meaningful unless one can be sure that the same data and evaluation criteria were used. It is commonplace for authors to evaluate their algorithms using data, obtained from radiologist colleagues, which is not made available to other investigators. This is often perfectly justified, for example when data with particular characteristics is required, or when ethical approval or confidentiality agreements prohibit the dissemination of patient data. In the majority of cases, however, the use of publicly-available data should be
preferred, to more easily allow results to be compared and experiments to be replicated. Efforts were made to establish common datasets in the 1990s, when several research groups compiled databases and made them available to other investigators. Data was originally distributed via physical media (e.g. CD-ROM, magnetic tape) or the Internet, but as persistent storage capacity has increased and Internet connectivity is approaching ubiquity, the Internet has become the dominant means of distributing data to investigators.

The UK Mammographic Image Analysis Society's (MIAS) database [171, 128] contains 161 pairs of MLO views. The database contains examples of normal mammograms and common types of abnormality. The images were digitised at 50 µm per pixel at 8 bits per pixel. The images were obtained from a single UK screening centre, and the database includes all breast types (e.g. fatty, fatty-glandular, dense). Groundtruth was annotated by a radiologist and consists of location coordinates and radii which specify regions containing abnormalities. The authors say that the images were 'carefully selected ... to be of the highest quality of exposure and patient positioning'; most papers publicising digital mammogram databases make similar claims. A reduced-resolution version of the database—called the mini-MIAS database—is also available [130].

The University of South Florida's Digital Database for Screening Mammography (DDSM) [83] contains 2 620 cases with 4 films per case, taken from screening examinations. The images were obtained from a number of sites (the University of South Florida, Massachusetts General Hospital, Sandia National Laboratories and Washington University School of Medicine). The images were compressed using a lossless variant of the JPEG image format and software is provided to
decode data in this format. In addition to the image data, the database contains patient age, examination and digitisation dates, and American College of Radiology (ACR) and Breast Imaging Reporting and Data System (BI-RADS) annotations. The database is available via the Internet [52].

The Internet is the most suitable medium for advertising and distributing image databases because it allows data to be accessed on demand and at low cost by anyone in the world with a suitable Internet connection. A comparison of the databases publicised in the literature with those advertised or made available via the Internet reveals that several databases have not been adequately maintained. These include the Lawrence Livermore National Laboratory/University of California, San Francisco (LLNL/UCSF) database [123], the PRISM/PROMAM database, the University of Chicago/University of North Carolina (Chapel Hill) database [135] and the University of Washington database (although this appears to have been included in the DDSM).

The UK Diagnostic Mammography National Database (eDiaMoND) project [23] was a research collaboration between academia, clinicians and industry that aimed to investigate the use of "grid" technologies to improve the efficiency of the NHS Breast Screening Programme by enabling access to image data through digitisation, and to aid training, epidemiology and computer-aided detection efforts. The project aimed to make data available to its users in both traditional and Standard Mammogram Form (SMF) formats [88]. However, it appears that blanket ethical approval allowing researchers to use the data for arbitrary research has not been obtained, and so there is currently no open access to the data, although ethical approval may be given to specific projects. The European
MammoGrid project [3] has similar aims to the eDiaMoND project.

Although the DDSM is recognised as being the premier database for computer-aided mammography, the image data is compressed using a lossless variant of the JPEG format, which is not widely supported. A second problem is the relatively poor annotation: mass regions are outlined, but only the areas containing microcalcifications and spicules are given—individual microcalcifications and spicules are not annotated.

An "ideal" database of digital mammograms for computer-aided mammography research would have some or all of the following characteristics:

  • Ethical approval of, and patient consent for, all possible useful research in which the database could be used.
  • Safeguards to ensure patient confidentiality and anonymity.
  • Grouping by patient, with current and prior cases, with four views per case.
  • Enough cases that statistically significant results could be obtained.
  • Patients sampled from several clinical centres.
  • No exclusion of mammograms on the basis of substandard image acquisition (unless, perhaps, a radiologist would discard the mammogram and ask for the patient to be recalled for better mammograms to be obtained).
  • Representation of all classes of mammograms:
      – Normal and abnormal cases.
      – Inclusion of all clinically significant abnormalities (microcalcifications, masses, spiculated lesions, architectural distortions and asymmetries).
      – All types of breast (e.g. fatty, dense).
      – Data collected from both asymptomatic and symptomatic women.
  • Pixel-level annotation by several radiologists, so that a groundtruth likelihood of abnormality could be estimated.
  • Inclusion of clinical information relevant to breast cancer risk (e.g. patient age, family history of breast cancer, socioeconomic status).
  • Image acquisition and digitisation parameters.
  • Identified subsets of abnormality (e.g. mass subset, microcalcification subset), so that component algorithms could be tested separately.
  • Specifications and implementations of a set of common evaluation strategies, so that results in published work can be compared directly.

The "ideal" database described above would require significant resources to build and maintain; however, the lack of a database with these—or similar—characteristics (and the lack of standardised evaluation protocols) is an impediment to the field. With the recent advent of web and "grid" services, it should be possible not only to provide mammographic image data via the Internet, but also to facilitate standardised evaluation of computer-aided mammography algorithms. Digitised mammograms could be requested from a data provider, locally analysed, and algorithm
output submitted to an evaluation service provider, which would return the evaluation results (e.g. ROC curve data). These results would be directly comparable to others generated by the same service provider.

3.13 Commercial systems

There have been several attempts to develop and market commercial computer-aided mammography systems. The reader is directed to [72] for a detailed history of commercialisation efforts. It is common for medical devices to be marketed in the USA first, and doing so requires pre-market approval (PMA) from the US Food and Drug Administration (FDA). For devices such as CADe systems, PMA requires that the device does not significantly increase the callback rate (especially for biopsies) and is capable of correctly identifying areas associated with cancer. PMA is not concerned with value for money or the impact a device has on work-flow: it is a certificate of safety, rather than of clinical effectiveness or efficiency. FDA PMA is judged in terms of the mammography landscape in the USA, which differs from that of the UK (e.g. in the USA the age range of women undergoing screening is wider, the screening population is self-referred and the screening interval is one year [72]). Claims made about a CADe system with respect to FDA PMA therefore do not automatically apply to the UK. Nevertheless, we will restrict the discussion of commercial systems to those which have obtained FDA PMA (although VuCOMP expects FDA approval for its M-Vu system in 2005 [181]).

There are currently four commercial CADe systems for mammography that have
obtained FDA PMA: the ImageChecker by R2 Technology Incorporated [150], Second Look by iCAD Incorporated [99], the KODAK Mammography CAD System by the Eastman Kodak Company [114] and the Senographe 2000D system by the General Electric Company [69]. General Electric license the ImageChecker software for their Senographe system. The KODAK Mammography CAD System has only recently been given FDA PMA (late 2004) and no evaluations of the technology have been published in the literature. We will therefore restrict our discussion to the ImageChecker and Second Look systems.

The ImageChecker system obtained PMA in 1998 and the FDA has since granted PMA for several improvements to the system. The system uses algorithms developed by Nico Karssemeijer and collaborators and displays mass and microcalcification prompts on a computer monitor. There has been extensive evaluation of the system in the USA and Europe.

R2 Technology Incorporated claim that version 8.0 of their ImageChecker algorithm achieves '1.5 false positive marks per normal case at the 91 percent sensitivity level' [149]. The system costs approximately £108 000 and an annual service contract costs approximately £10 000 (ca. 2001, [72]).

The reader is referred to [72] for a discussion of evaluations performed on the ImageChecker system, and to Section 3.14 for a discussion of evaluations of the ImageChecker system for prompting. Astley et al. compared the ImageChecker system to non-medical readers [9] for pre-screening. 900 cases containing four films per case (10% containing cancers) were read by 6 trained but non-medical readers and by the ImageChecker system. The ImageChecker failed to mark 3 of the
cancers, while the non-medical readers failed to mark between 4 and 21 cancers. The best non-medical readers had false positive rates of 33% and 44%, while the ImageChecker system had a false positive rate of around 69%. It took the non-medical readers an average of 40 s to read a case, while the ImageChecker system took an average of 318 s.

The Second Look system obtained PMA in 2002 and the FDA has since granted PMA for several improvements to the system. The system uses algorithms developed by Steven Rogers, a retired airborne weapons specialist, and displays mass and microcalcification cluster prompts on a paper printout [72]. The Second Look 700 system costs $139 950 [98] (approximately £72 870).

Astley et al. evaluated the Second Look system in a UK screening environment [8]. They report that the false positive rate was 1.43 per image on normal mammograms and 1.22 per image when averaged over both normal and abnormal mammograms. The system correctly identified 73.8% of abnormalities (rising to 83.3% when both MLO and CC views were available). The authors simulated clinical use of the system: 790 cases were read by 3 radiologists and 1 radiographer, with and without prompting. No significant differences in recall rate or reading time were found (the radiographer was faster with prompting, while the radiologists took longer). The authors note that technical problems with the system (e.g. reduced throughput due to problems with stick-on film labels and failures caused by static electricity in the reading room) would require the employment of an additional administrator and would delay reading by a day.

An evaluation of the Second Look system's ability to detect early cancers was performed. Current and prior films were studied from a normal control group
and from a group for whom cancer was identified in the current films. The radiologists were asked to identify cancers in the prior films without, and then with, the current films. The radiologists identified 10% and then 14.4% of the cancers. The Second Look system identified 27.8% of the cancers in the prior films. This suggests that, on early cancers, the system can perform better than radiologists.

3.14 Prompting

The prompting model for computer-aided mammography is predicated on the assumption that prompts will help radiologists. Research into prompting seeks to determine if, and under what circumstances, this assumption is valid. There are essentially two types of prompting research:

  • The psychophysical aspects of prompting. Participants generally perform image interpretation tasks in synthetic environments.
  • Evaluation of radiologist performance when CADe systems are used.

Hutt et al. investigated the effect of erroneous prompts on radiologist performance. Seven radiologists viewed 48 digitised mammographic regions of interest with and without microcalcification clusters. Prompts were placed on the images and the error rate was varied. The authors report that prompting was only effective when the false positive rate was low (approximately 0.5 false prompts per image) [96]. A screening environment was simulated and 6 radiologists viewed 100 films containing normal and abnormal mammograms with single or multiple abnormalities. The mammograms were read with and without prompts,
with the false prompt rate set at approximately 1.1 per image. The radiologists performed better in the prompted condition. In prompted cases where the radiologists missed abnormalities, the films had no prompt on the real abnormality and a false prompt elsewhere. This work suggests that not only is the false positive rate important, but incorrect prompts can distract radiologists from real abnormalities.

Hutt's PhD thesis presents a larger version of the experiment reported by Hutt et al. [96]. Prompted and unprompted mammograms were read by 30 radiologists from 11 UK screening centres. The results suggest that prompting can be expected to be successful if the number of false positives does not exceed the number of true positives by more than 50%. Hutt suggests that, given the over-representation of abnormal mammograms in the test set, this relationship should be revised downwards to a true to false positive ratio of approximately unity. This ratio was confirmed by Astley et al. in a psychophysical experiment that used simulated abnormalities and non-medical readers [7]. Given that only 5% of screening mammograms have any form of abnormality, a prompting system that generates true and false positives with equal probability will on average generate a false positive no more than once in 20 cases. If we assume 4 images per case, this equates to 1 false positive in 80 images (i.e. 0.0125 false positives per image). By comparison, R2 Technology Incorporated claim that version 8.0 of their ImageChecker algorithm achieves '1.5 false positive marks per normal case at the 91 percent sensitivity level' [149]. However, it should be noted that the psychophysical experiments reported by Hutt et al. were not conducted in clinical settings and relatively few radiologists and images were used, limiting the validity of generalising their results.
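The 0.0125 figure follows directly from the stated assumptions (5% of cases abnormal, at most one true positive per abnormal case, a true to false positive ratio of unity, and four images per case); a few lines make the arithmetic explicit:

```python
prevalence = 0.05        # proportion of screening cases with an abnormality
images_per_case = 4

# With a true:false positive ratio of unity and at most one true positive
# per abnormal case, the system emits on average one false positive per
# 1/prevalence = 20 cases.
false_positives_per_case = prevalence * 1.0
false_positives_per_image = false_positives_per_case / images_per_case
print(false_positives_per_image)   # 0.0125
```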
Giger et al. evaluated the usefulness of an "intelligent search" mammography workstation [70]. Upon presentation of an unknown case, the workstation output an estimate of malignancy (based upon an automatic segmentation algorithm and an artificial neural network using geometric and texture features), images—from an atlas—of lesions that were deemed to be similar, and graphics illustrating the characteristics of the presented lesion relative to those in the atlas. The user could search for similar lesions using various criteria. Users could interactively alter the image contrast and magnify the mammograms. A set of 50 normal and 50 mass images was viewed by 5 radiologists with and without the workstation. The authors report an improvement when the workstation was used (A_z of 0.90 with the workstation compared to 0.86 without). This work suggests that allowing radiologists to manipulate the digital images and compare them to other cases can improve radiologist performance, but the paper does not analyse which aspects of the system were most effective.

Karssemeijer et al. investigated single and double reading by radiologists, and single reading with prompting [109]. A set of 10 expert radiologists read 500 cases, half of which contained cancers, and estimated the likelihood of malignancy. The images were also analysed using the ImageChecker system and the suspiciousness rating of each prompt was recorded. Double reading was simulated by combining annotations from each possible pairing of the 10 radiologists using a prompt proximity rule. Reading with CADe was simulated using a similar approach. The performance of the three types of reading was assessed using the mean sensitivity in the region of the ROC curve representing false positive rates lower than
10%. This figure was chosen because the false positive rate of screening in the USA is approximately 8%, and is between 1% and 4% in Europe. For single reading, the mean sensitivity was 39.4%. For simulated double reading, the mean sensitivity was 49.9%. For simulated reading with CADe, the mean sensitivity was 46.4%.

Gur et al. prospectively assessed the impact of CADe on patient recall and cancer detection rates in a clinical setting [76]. A set of 115 571 mammograms was divided into two almost equal sets, which were read by 24 radiologists with and without prompts generated by the ImageChecker system. No significant increase in recall or detection rates was found when CADe was used. However, the confidence intervals associated with recall and detection rates were large enough to be consistent with the possibility of large improvements when CADe is used, due to the relatively low number of cancers detected with and without CADe and the large inter-reader variability among the radiologists. Additionally, during the period of the study, the percentage of women who were screened for the first time decreased from 40% to 30%. On average, first screening rounds have higher recall rates than subsequent rounds, and so cancers detected in first rounds may be considered "easier". However, the authors found no statistically significant trend in detection rates over time. The authors conclude that, if their results were not due to chance, current CADe systems are not suitable for use by expert screening mammography radiologists.

Freer and Ulissey conducted a large prospective study of the effect of CADe on recall rate, positive predictive value for biopsy, cancer detection rate and the stage of detected cancers [67]. 12 860 screening mammograms were interpreted first without the assistance of CADe, and then immediately afterwards with the assistance
of the ImageChecker CADe system. The authors report that use of the CADe system for prompting resulted in an increase in recall rate (from 6.5% to 7.7%), no change in positive predictive value for biopsy, an increase of 19.5% in the number of cancers detected and an increase in the number of early stage cancers detected (from 73% to 78%). However, the authors caution that the relatively low median age of the screening population (49 years) imposes limitations on the statistical significance of these observations.

Warren Burhenne et al. retrospectively studied the ability of the ImageChecker system to identify cancers missed by radiologists [32]. 1 083 mammograms that led to biopsy-proved cancers, together with their available prior mammograms, were collected from 13 centres. The CADe system was able to identify 77% of the cancers that were originally missed by radiologists, without a statistically significant increase in recall rate. This research suggests that the ImageChecker system could have a dramatic effect on the early detection of breast cancer.

3.15 Discussion

One of the earliest papers on computer-aided mammography was written by Ackerman and Gose [1] in 1972. The authors aimed to classify low-resolution digitised photographs of regions of mammograms as malignant or benign using automatically-extracted features (measures of calcification, spiculation, roughness and the area-to-perimeter ratio). Classification was attempted using a multivariate Gaussian model and nearest neighbour classification, the latter of which was found to perform best. While computers, image digitisation technology and
machine learning algorithms have developed significantly since the paper was published, the approach to computer-aided mammography has not.

This approach can be stated as follows: ad hoc features are extracted from segmented regions and classified into clinically significant classes. The classification stage is informed by the wider pattern recognition, machine learning and statistical decision theory communities. The segmentation step typically uses ad hoc algorithms. Often, an attempt to insert human expertise into the system is made by choosing features that describe characteristics that radiologists report to be important. Whilst the above approach is usually reported to be successful, ad hoc methods risk the accidental adoption of assumptions about the data. The consequence may be that CADe methods perform well on the original investigator's data, but do not work as well on other data.

Many methods that use classifiers produce "probability" images, which are later thresholded to obtain a final classification. These images are typically not true probability or likelihood images, but simply images with probability-like values. This distinction is important because accurate quantitative descriptions may be more useful to clinicians than qualitative (classification) descriptions, and could be used in further statistical analyses. Measurements that describe the state of, or change in, anatomy may also be clinically useful. For example, it is probably more meaningful to report that a tumour in a mammogram has likely increased in volume by 20% since the last screening session than to simply say that an area is suspicious.

Evaluation criteria are often optimistic. In LROC analysis, for example, the
selection of forgiving localisation criteria can give an inaccurate assessment of the performance of algorithms. This could be rectified by the adoption of standard databases and assessment criteria, which would have the additional benefit of allowing meaningful comparison of results in the literature.

The research of Hutt et al. suggested that CADe would only result in a significant improvement in radiologist performance if the number of false positives were reduced to 0.0125 per image. Research has indicated that radiologist performance can be improved by CADe algorithms that have false prompt rates substantially higher than that target [67, 32]. However, it has been shown that current commercial CADe systems can fail to improve radiologist performance [76], so lowering the false positive rate to a level at which significant improvement can be expected is highly desirable. Reducing the false positive rate while maintaining sensitivity will be a significant challenge. The hypothesis promoted in this thesis is that this kind of improvement can only be achieved by systems that understand the appearance of mammograms.

Abnormality in mammograms manifests itself in a number of ways, but most CADe methods target only one of these classes of abnormality; microcalcification clusters, masses and spiculated lesions are most commonly chosen. A better approach would be to develop a single method that can identify all (or many) of the common types of abnormality. It is not immediately clear how this might be achieved, because the appearances of the various forms of abnormality are so different. However, there is commonality between all types of abnormal mammographic appearance: none of them are found in normal mammograms. A method that could detect deviation from normality should be able to identify all forms of
abnormality. This approach is called novelty detection.

Novelty detection uses a model of the class of interest that allows novel instances to be identified. Statistical models serve this purpose well, because deviation from normality can be measured in a meaningful way within a rigorous mathematical framework. Further, generative statistical models—such as active shape and appearance models [47]—allow synthetic instances of the class of interest to be generated. This allows the specificity and generality of the model to be assessed. A specific model is one that models only legal instances of the class of interest, and a general model is one that models all possible instances of the class of interest. A good model would be both specific and general. The aim of the work presented in this thesis is to investigate generative statistical models of mammographic appearance. The ultimate aim is to perform CADe by novelty detection.

Novelty detection has previously been applied to computer-aided mammography. Tarassenko et al. identified masses using a novelty detection method. Geometrical and textural features were extracted from pre-processed mammograms. A Parzen window density estimator (see Section 5.3) was used to model the distribution of feature vectors extracted from normal tissue. The method identified all masses in a test set of 40 images at a false positive rate of 1 per image [174]. Holmes used an adaptive kernel density estimator to learn the distribution of transformed scale-orientation pixel signatures taken from normal tissue (see Chapter 4 for a detailed discussion of pixel signatures). The transformation to a low-dimensional space allowed Euclidean distance to approximate a sophisticated robust metric. Holmes performed novelty detection by computing the likelihood of signatures
under the model to produce likelihood images. Subjectively, the likelihood values appeared to allow pixels belonging to normal tissue to be discriminated from those belonging to spiculated lesions, though no quantitative evaluation of the method was performed [90]. However, neither of these methods employed generative models.
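Neither Tarassenko's nor Holmes's implementation is reproduced here, but the underlying idea can be sketched with an isotropic Gaussian Parzen window estimate: the likelihood of a feature vector under a model built solely from normal-tissue training vectors is computed, and low-likelihood pixels are flagged as novel. The bandwidth `h`, the feature vectors and the decision threshold are all assumptions of the sketch.

```python
import numpy as np

def parzen_log_likelihood(x, training, h):
    """Log-likelihood of feature vector x under an isotropic Gaussian
    Parzen window estimate built from normal-tissue training vectors."""
    n, d = training.shape
    sq_dists = np.sum((training - x) ** 2, axis=1)
    log_kernels = (-sq_dists / (2 * h ** 2)
                   - d * np.log(h) - 0.5 * d * np.log(2 * np.pi))
    # Log of the mean kernel response, computed stably in the log domain.
    return np.logaddexp.reduce(log_kernels) - np.log(n)

def novelty_mask(features, training, h, threshold):
    """Flag feature vectors whose likelihood under the normal model is low."""
    log_liks = np.array([parzen_log_likelihood(f, training, h)
                         for f in features])
    return log_liks < threshold        # True where the appearance is novel
```

Because every form of abnormality is, by definition, absent from the normal training data, a single threshold on likelihood can stand in for a battery of sign-specific detectors.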
3.16 Summary

This chapter presented a review of the computer-aided mammography literature. In summary:

  • The typical approach taken to CADe is to classify shape and texture features, extracted from candidate locations, into clinically significant classes. It can be difficult to justify exactly why one set of features is better than another, or to explain what they correspond to in terms of the clinical situation. Features are typically tuned to a specific sign of abnormality, so each indicative sign requires a different algorithm.
  • The lack of standardised evaluation methods, training sets and test sets makes it very difficult to compare published results.
  • Commercial systems are available and have been shown to improve radiologist performance; however, they can also fail to improve performance. Psychophysical research has suggested that a false positive rate much lower than that achieved by current commercial systems is required for significant improvement in radiologist performance. Much more sophisticated approaches may be required to achieve such targets.
  • One such approach may be novelty detection, in which all forms of abnormality should be detectable and quantifiable within a rigorous mathematical framework. Novelty detection requires a model of the appearance of normal mammograms that allows deviation from normality to be measured.
Chapter 4

Scale-orientation pixel signatures

4.1 Introduction

This chapter presents work on improving an existing method for describing local image structure in terms of scale and orientation. The chapter presents:

  • Background information on mathematical morphology and its use in computing scale-orientation pixel signatures.
  • An analysis which identifies two flaws in an existing implementation and proposes how these problems can be rectified.
  • An information theoretic method for comparing the old and new pixel signatures.
  • A classification experiment to compare the two approaches.
4.2 Mathematical morphology

Image and signal processing has commonly been thought about in terms of frequency (e.g. Fourier analysis; wavelet analysis uses positional information in addition to frequency information [49, 122]). Mathematical morphology approaches image and signal processing in terms of shape.¹ One of the attractions of morphological processing is that image features can be targeted for processing without altering the rest of the image (e.g. small features can be removed from images, leaving edges and grey-levels untouched). We will present two fundamental morphological operators and show how they can be combined to perform two other classes of morphological operation.

Morphological operators can be defined for simple 1-D signals, 2-D images or more complex signals. We will restrict our discussion to the 2-D image plane. The operators we shall discuss are binary operators, meaning that they take two input objects and return a single output. One of these input objects is the image matrix to be processed; the other is an object called a structuring element, which allows the operations to be tuned to specific sizes and shapes of feature. A structuring element is simply a shape and can be represented by a set of vectors that specify offsets from some origin.² The structuring element can be visualised by plotting each offset in the set in an image plane. A simple structuring element

1 A thorough presentation is given by Serra and Matheron [161, 162, 124], though the reader is directed to Sonka et al. [168] for an introduction to mathematical morphology as it relates to the work here.
2 Although the grey-scale definitions of the following operators can use structuring elements that have associated grey-levels, this is not of interest in this work.
is shown in Figure 4.1(b) and corresponds to the following point set:³

    S = {(0, 0), (0, 1)}.        (4.1)

3 Note that we use (r, c)—row, column—indexing, as opposed to (x, y) indexing.

4.2.1 Dilation and erosion

Dilation and erosion are the fundamental morphological operators. Let f(x) be a function that describes a grey-level image. Further, let t_i be an offset and S = {t_i : i = 1, . . . , N} be a structuring element as described above. The dilation of f(x) by S is given by:

    f(x) ⊕ S = max_{t ∈ S} {f(x − t)}.        (4.2)

An example is given in Figure 4.1. Figure 4.1(a) shows a binary image of the letter E, where the background has a value of zero and the foreground has a value of one. Figure 4.1(b) shows the structuring element defined by Equation 4.1. Figure 4.1(c) shows the dilation of the image of the letter E by the structuring element. The figure illustrates how dilation removes intensity troughs that are smaller than the structuring element. Dilation increases the object size and can be used to fill gaps.

The dual of dilation is erosion, which is defined as:

    f(x) ⊖ S = min_{t ∈ S} {f(x − t)}.        (4.3)
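A naïve transcription of Equations 4.2 and 4.3 might look as follows; this is a sketch, not the efficient implementation of Soille [167] used later in the chapter. The structuring element is represented as the set of (row, column) offsets of Equation 4.1, and image borders are handled by wrap-around purely for brevity.

```python
import numpy as np

def dilate(f, offsets):
    """Grey-scale dilation (Equation 4.2): at each pixel, the maximum
    of f(x - t) over the offsets t in the structuring element."""
    out = np.full(f.shape, -np.inf)
    for (dr, dc) in offsets:
        # np.roll shifts the image so that out[x] sees f(x - t).
        out = np.maximum(out, np.roll(np.roll(f, dr, axis=0), dc, axis=1))
    return out

def erode(f, offsets):
    """Grey-scale erosion (Equation 4.3): the dual of dilation."""
    out = np.full(f.shape, np.inf)
    for (dr, dc) in offsets:
        out = np.minimum(out, np.roll(np.roll(f, dr, axis=0), dc, axis=1))
    return out

# The structuring element of Equation 4.1, in (row, column) indexing.
S = [(0, 0), (0, 1)]
```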
Figure 4.1: Dilation. A binary image matrix is shown in (a). It is dilated by the structuring element shown in (b). The result is shown in (c).

Erosion removes intensity peaks that are smaller than the structuring element.

4.2.2 Opening and closing

Dilation and erosion can be used to remove image features, but they change the global appearance of the image (the object in Figure 4.1 is made larger by dilation and would be made smaller by erosion). Dilation and erosion can be combined so that targeted features are removed without changing the global appearance of the image. These combinations are called the opening and closing operators, and are respectively defined as:

    f ◦ S = (f ⊖ S) ⊕ S,        (4.4)

    f • S = (f ⊕ S) ⊖ S,        (4.5)

where we drop the image indexing for simplicity. Opening and closing respectively remove intensity peaks and troughs that are smaller than the structuring element, without altering the global image appearance. They are idempotent operators,
which means that successive applications of the same operation do not alter the previous result.

4.2.3 M- and N-filters

Opening and closing allow intensity peaks and troughs to be removed without altering the global image appearance, but they are tuned to the polarity of the features on which they operate. Combining an opening and a closing is called sieving [11]. Sieves remove image features that are smaller than the structuring element, irrespective of the feature's polarity. Two sieves—called M- and N-filters—are respectively defined as:

    M(f, S) = (f ◦ S) • S,        (4.6)

    N(f, S) = (f • S) ◦ S.        (4.7)

An example of grey-level sieving is shown in Figure 4.2. A mammographic region of interest is sieved using a rectangular structuring element oriented at approximately 45°. The figure shows how image structure that is smaller than the oriented structuring element is removed.
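Given the dilate and erode sketches above, Equations 4.4 to 4.7 are pure compositions:

```python
def opening(f, S):     # Equation 4.4: removes peaks smaller than S
    return dilate(erode(f, S), S)

def closing(f, S):     # Equation 4.5: removes troughs smaller than S
    return erode(dilate(f, S), S)

def m_filter(f, S):    # Equation 4.6: opening followed by closing
    return closing(opening(f, S), S)

def n_filter(f, S):    # Equation 4.7: closing followed by opening
    return opening(closing(f, S), S)
```

Idempotence can be checked directly: sieving an already-sieved image with the same structuring element leaves it unchanged.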
Figure 4.2: A sieved mammographic image. Image (a) is a mammographic region of interest around a spiculated lesion. Image (b) shows the result of sieving image (a) with a rectangular structuring element, oriented at approximately 45°. The structuring element is shown in red in the top-right corner of (b).

4.3 Pixel signatures

4.3.1 Local scale-orientation descriptors

Pixel signatures are rich feature descriptors of local image structure that are expressed in terms of scale and orientation. Describing mammographic features in terms of scale and orientation is useful for a number of reasons. Mammograms contain features that have an associated orientation (e.g. curvilinear structures) and features which do not have a particular orientation (e.g. circumscribed masses); these features may exist over a range of scales. Radiologists often talk about mammographic features in terms of scale and orientation (e.g. features that 'point' towards the nipple or 'radiate' from a particular location). Further, it is known
that the mammalian primary visual cortex explicitly encodes visual information in terms of scale and orientation (see [183] for a discussion of the work of Hubel [93] and Wiesel [184]).

The pixel signatures discussed in this thesis are developed from those described by Holmes [90], which used M-filters; these were in turn developed from those described by Zwiggelaar et al. [192], which used directional recursive median filters.⁴

4.3.2 Constructing pixel signatures

For a given input image, a scale-orientation pixel signature is computed at each pixel location as follows. A set of sieved images is generated from the input image by sieving it with structuring elements at a number of scales and orientations. The pixel signatures used by Holmes et al. [91, 92, 90] were computed using a Bresenham line structuring element [26]. Each signature is a 2-D array in which the rows are measurements for the same scale and the columns are measurements for the same orientation (see Figure 4.3).

Formally, let f(x) be a grey-scale image. f(x) is sieved using a set of structuring elements {S_{σ,φ}}, where σ indexes scale and φ indexes orientation. The result is a set of grey-scale images {s_{σ,φ}(x)}. The value at (σ, φ) in the pixel signature

4 The principal advantage of using morphological operators is that there is an efficient way to perform erosion and dilation [167]; however, today's desktop computers can construct pixel signatures reasonably quickly using a naïve implementation.
associated with location x in f(x) is given by

    ρ(x, σ, φ) = s_{σ−1,φ}(x) − s_{σ,φ}(x).        (4.8)

Stated simply, for a particular image pixel and a given scale and orientation, the signature value is the grey-level difference between the pixel value in the sieved images at the previous and current scales.

Figure 4.3: Example pixel signatures. Pixel signatures taken from the centres of Gaussian blob and line images.

Figure 4.3 shows two pixel signatures taken from the centres of two synthetic images. One image is a Gaussian blob and the other is a Gaussian line. The signature for the Gaussian blob shows approximately uniform scale which is independent of orientation.⁵ The signature for the Gaussian line shows that as one looks across the line it appears to have a limited scale, but when one looks along the line it appears to be much larger. Pixel signatures from non-trivial images are not as simple to interpret and are intended to be used as feature vectors within a machine learning framework such as a classifier.

5 We will see later that a limitation in the implementation results in the non-ideal behaviour in the signature for the Gaussian blob in Figure 4.3.
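Equation 4.8 can be sketched as a double loop over scales and orientations, using the m_filter sieve sketched in Section 4.2.3. This is an illustration of the construction, not the thesis implementation: the crude line_structuring_element helper below merely rounds points on an ideal line to the pixel grid, standing in for the Bresenham line of [26], and each scale sieves the original image.

```python
import numpy as np

def line_structuring_element(length, angle):
    """Hypothetical stand-in for a Bresenham line structuring element:
    the (row, column) offsets of a digital line through the origin."""
    ts = np.arange(-(length // 2), length // 2 + 1)
    return {(int(round(t * np.sin(angle))), int(round(t * np.cos(angle))))
            for t in ts}

def pixel_signatures(f, scales, angles):
    """Scale-orientation signature of every pixel (Equation 4.8):
    sig[i, j, r, c] is the grey-level removed at scale i, orientation j
    for the pixel at (r, c)."""
    sig = np.empty((len(scales), len(angles)) + f.shape)
    for j, phi in enumerate(angles):
        previous = f.astype(float)          # s_{0,phi}: the unsieved image
        for i, sigma in enumerate(scales):
            current = m_filter(f, line_structuring_element(sigma, phi))
            sig[i, j] = previous - current  # s_{sigma-1,phi} - s_{sigma,phi}
            previous = current
    return sig
```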
In the previously reported work [91, 92], pixel signatures were generated for 12 regularly-spaced orientations and 10 scales—ranging from 150 µm to 2 cm. These scales encompass image features that we would like to measure, from microcalcifications to small masses. The scales increase logarithmically to give preferential sampling resolution to small features. We use the same scheme in the research presented in this chapter.

4.3.3 Metric properties

Although pixel signatures give a rich local description of image structure, the Euclidean distance between two pixel signatures treated as points in a vector space is an imperfect similarity measure. This is because responses to two similar image structures may appear in slightly different locations in the corresponding signatures. The work presented by Holmes et al. describes a sophisticated approach to dealing with this problem by treating signature similarity as a transportation problem, where similarity is measured by the cost of transforming one signature into another [91, 92, 90]. Further, an efficient way of computing this measure is described, in which signatures are transformed into a space where Euclidean distance approximates the transportation cost. This chapter deals with improving the raw signatures, and so the metric properties of pixel signatures will not be discussed further.
4.4 Analysis of the current implementation

In this section we analyse the existing implementation of pixel signatures and propose two improvements. The first addresses the length of the structuring element and the second addresses the coverage of the structuring element.

4.4.1 Structuring element length

Figure 4.3 shows a problem with the implementation of pixel signatures used in [91, 92, 90]: even though the Gaussian blob is circular (up to the image quantisation), the pixel signature for the central pixel shows the scale varying with orientation. This is caused by incorrect computation of the length of the structuring element, which should be invariant to orientation. If, for a particular scale, one were to plot the position of the ends of the structuring element as it is rotated about a pixel, it should trace a circle. Instead, the structuring element traces a square, being longer at the diagonal orientations than at the horizontal and vertical orientations. This is illustrated in Figure 4.4: as the structuring element moves from position A to B it "grows" in length (although all three structuring elements have the same number of pixels). This problem is corrected in our implementation, as Figure 4.7(b) shows.
4.4.2 Local coverage

The structuring elements in the existing implementation are 1-D (i.e. a single line of pixels), as illustrated in Figure 4.4. The area between rotations of the structuring element—the shaded region in Figure 4.4—does not contribute to the pixel signature. If we neglect quantisation, this region has an area of r²θ (i.e. two sectors of a circle), where r is the length of the structuring element and θ is the angle between adjacent structuring elements. This is a problem because there is likely to be useful information in the region that is not considered. While information in this area may contribute to nearby signatures, it should be contained in the signature for the pixel. The solution is contained within Figure 4.4: the structuring element should be shaped like a bow tie—i.e. like the shaded region in the figure.

Recall from Section 4.2 that our morphological operators are defined in terms of minima and maxima of areas under the structuring element. The bow tie-shaped structuring element is non-trivial to construct on the quantised image plane for arbitrary sizes and orientations. Further, computing the minimum or maximum value under such a shape—particularly for large images such as mammograms—is likely to be computationally demanding. We seek to improve the signatures by considering the relevant pixels using a suitable structuring element, but without incurring the computational penalty associated with a complex shape.

Figure 4.5 shows a series of approximations of the bow tie-shaped structuring element. Simplifying the shape of the structuring element yields the element shown in Figure 4.5(b), which has a shape that is easier to construct, but gives
Figure 4.4: An illustration of the two limitations of the existing implementation. Three rotations of a structuring element are shown. As the structuring element is rotated, it "grows" in length. The red shaded region illustrates the area not covered by the 1-D structuring elements of the existing implementation and the desired length of the diagonal structuring element.
Figure 4.5: Incremental approximations of the bow tie structuring element. The computationally expensive bow tie-shaped structuring element is shown in (a). An initial approximation is shown in (b), which is closely approximated by (c). The structuring element in (c) can be approximately decomposed as (d).
consideration to the regions either side of the centre. When quantised, this approximation is actually a rectangle for all but the largest structuring elements, and the additional regions either side of the centre are insignificant. Using a solid structuring element is expensive because of the number of pixels that need to be compared when computing the minimum or maximum.

It is possible to approximately decompose a sieving with an arbitrarily oriented rectangular structuring element into a sieving with two orthogonal 1-D structuring elements. The first structuring element has the same length and orientation as the longest side of the rectangular structuring element. The second structuring element has the same length and orientation as the shortest side of the rectangular structuring element. The input image is sieved using the first structuring element and the resulting image is then sieved using the second structuring element.⁶ For the majority of pixels (60%–90%), there is no difference between the full sieving and the approximation. Approximation errors are very rarely more than 10 grey-level values in magnitude (in 8-bit images) and are imperceptible. Because we have been able to decompose the sieving in terms of two structuring elements that are one pixel wide, Soille's algorithm can be used to perform the erosions and dilations efficiently [167].

The width of the rectangular structuring element—and hence the length of the second 1-D structuring element—needs to be such that the rectangle "fits" as it is rotated from one orientation to another. If the length of the first (longest) structuring element is r, then the length of the second structuring element is 2r sin(θ/2), where θ is the angle through which the elements are rotated when moving from one orientation column to another. This is illustrated in Figure 4.6.

6 Experimental work showed that reversing the order in which sieving was performed decreased the accuracy of the approximation.
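The two element lengths follow directly from this rule; a small sketch (the one-pixel floor on the width is an assumption, guarding against degenerate elements at the smallest scales):

```python
import numpy as np

def orthogonal_element_lengths(r, n_orientations):
    """Lengths of the two orthogonal 1-D structuring elements that
    approximate a rotated rectangle of length r (Section 4.4.2)."""
    theta = np.pi / n_orientations       # angle between adjacent orientations
    width = 2 * r * np.sin(theta / 2)    # adjacent rectangles just touch
    return r, max(width, 1.0)            # never narrower than one pixel

# For the 12-orientation scheme used in this chapter, theta is 15 degrees,
# so the second element is roughly 0.26 r pixels long.
```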
Figure 4.6: Rotating the "rectangular" structuring elements. The diagram shows how the width of the rectangle—and hence the length of the second approximating structuring element—needs to be selected so that correct coverage is achieved (i.e. the corners of adjacent structuring elements need to touch).

Our proposed new method of computing pixel signatures ensures that structuring element length is constant over orientation (allowing for the quantised image plane) and uses the orthogonal elements approximation to give consideration to pixels that the original method neglected. Figure 4.7 shows a pixel signature computed for the centre of a Gaussian blob using the new method. The non-linearity that remains is due to quantisation. Signatures from the centre of a Gaussian line are similar to those of the original method.
Figure 4.7: An "improved" pixel signature from the centre of a Gaussian blob. A Gaussian blob is shown in (a) and a pixel signature, computed using our method, is shown in (b). Note that the signature does not exhibit the non-linearity of the equivalent signature in Figure 4.3.

4.5 An information theoretic measure of signature quality

The most obvious way to compare the original and new methods of computing signatures would be to run a classification experiment. However, building an accurate picture of how well each performed would require large-scale experiments targeting the various different forms of abnormality. Consequently, we sought a more direct means of comparing their behaviour.

In producing pixel signatures, we hope to encapsulate useful information about local image appearance. A signature that contains more information than another is likely to be more useful. Shannon's entropy [163] is a measure of the average information carried by a discrete symbol emitted from some source. The entropy
measure is derived by considering the "uncertainty" associated with a symbol (or the "surprise" associated with the symbol). Given a symbol with probability p, selected from some alphabet A = {a_1, a_2, · · · , a_N}, the measure of the uncertainty associated with the symbol, u(p), is defined axiomatically:

  • u(1) = 0. We are certain of—or unsurprised by—the certain event.
  • u(p) > u(q) ⟺ p < q. We are more uncertain of—or more surprised by—less probable symbols.
  • u(pq) = u(p) + u(q). The uncertainty measure is additive for a sequence of symbols.
  • u(p) is continuous in p.

Shannon showed that the only function satisfying these axioms is u(p) = −K log_a p. The constant K is usually set to unity and the base of the logarithm is usually set to 2, in which case the uncertainty—usually interpreted as the information content—is measured in bits. The expected information content of a symbol emitted by a source is given by:

    H = −Σ_{i=1}^{N} p_i log_2 p_i.        (4.9)

Shannon's entropy can be illustrated as follows. Imagine two coins, each of which has an associated probability mass function. Assume that one coin is fair and the other is very heavily biased towards Heads. Further, imagine that a friend knows these models and has to guess the outcomes of coin tosses, given that they
can know which coin was tossed. Telling your friend that the unfair coin was tossed gives them a very good chance of correctly guessing the message, but the actual message itself ('The coin landed with the Head facing upwards.') contains little surprise (information). Conversely, if the friend is told that the fair coin was tossed, they have little information about what the message might be, and so the message carries more surprise (information) than the message for the unfair coin. In summary: on average, events from peaked distributions convey little information, while events from flat distributions convey more information.

An experiment to compare the two methods of computing scale-orientation pixel signatures using the information theoretic measure of signature quality is described below.

4.5.1 Aims

The aim of the experiment was to determine whether the modifications made to the pixel signatures increase the information content of the new signatures, relative to the original method.

4.5.2 Method

We would ideally treat each pixel signature as a symbol and compute the expected information that each of the two types of signature carries (i.e. treat the signature type as the source). However, because the pixel signatures we use are essentially points in a 120-D space, building a model of the probability mass function for
signatures is intractable; it is very unlikely that multiple identical signatures will be encountered, even in a large sample.⁷ If an equal number of original and new signatures were sampled, and each signature occurred only once, then the Shannon entropy of each source would be identical. Such a measure would not be useful. Instead, we consider each pixel signature to be a source, where the values of the signature elements are the message symbols. If all the elements in a signature had similar values, the signature would carry little useful information, whereas signatures in which different elements take on distinct values can carry useful information.

A set of 10 regions of interest, each approximately 400 mm², around spiculated lesions was pseudo-randomly selected from the Digital Database for Screening Mammography [83]. As well as containing the abnormal feature, the regions were large enough to contain pixels from tissue that a CADe system should label as normal. For each pixel in each image, pixel signatures were computed using both methods and the corresponding Shannon entropies were calculated. Despite the relatively small number of images, our sample size is actually very large (2 310 342 pixel locations). A more comprehensive study would look at all indicative signs of abnormality (e.g. the various types of mass, microcalcifications, architectural distortions), but such work was beyond the scope of this experiment.

7 Multiple identical signatures would only exist in a set of images that contained multiple identical regions.
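Treating each signature as a source whose symbols are its element values, Equation 4.9 reduces to a histogram computation per signature. A sketch follows; the number of bins used to discretise the grey-level differences is an assumption of the sketch, not the thesis's choice.

```python
import numpy as np

def signature_entropy(signature, n_bins=32):
    """Shannon entropy (Equation 4.9), in bits, of one pixel signature,
    with the signature's element values as the message symbols."""
    counts, _ = np.histogram(signature.ravel(), bins=n_bins)
    p = counts[counts > 0] / counts.sum()   # empirical symbol probabilities
    return -np.sum(p * np.log2(p))
```

A signature whose elements all fall in one bin scores zero bits; one whose elements spread across many bins scores close to log_2(n_bins) bits, matching the intuition that flat distributions convey more information.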
4.5.3 Results

Shannon entropy was computed for 2 310 342 pixel signatures. The total Shannon entropy was 6 426 499 bits for the original signatures and 7 638 189 bits for the new signatures: an average increase of over 0.5 bits per pixel, or nearly 19%. A t-test on the paired differences between the two sets of entropies at the 95% significance level showed that the new method yields a statistically significant increase in Shannon entropy. Figure 4.8 shows three regions of interest around spiculated lesions and illustrates where the additional information is distributed.

4.5.4 Discussion

The results show that our attempt to improve the way that pixel signatures are computed increases the information content of the signatures for spiculated lesions and surrounding "normal" tissue. Although pixel signatures for almost all types of tissue included see an increase in information content, the increase seems to be larger for regions around masses—particularly for spicules. Little increase in information, or a decrease, is seen in homogeneous regions. We cannot draw any conclusions for regions containing microcalcifications—as we did not include such images—but as inhomogeneous regions see the greatest increase in Shannon entropy, we would expect an increase for pixel signatures from such regions. The following experiment investigates whether our modifications yield better results when the new method of computing signatures is used in a practical application.
Figure 4.8: Regions of increased Shannon entropy. The left column shows three regions of interest (to scale). The right column shows the pixel-wise differences in Shannon entropy between the new and original methods (i.e. positive values illustrate where the new method has more information). Thresholding the difference images shows that almost all pixel signatures computed using the new method have more information than those computed with the original method.
4.6 Classification-based evaluation

4.6.1 Aims

The information theoretic evaluation demonstrated that the new signatures contain more information than those produced by the previous method. Although it is intuitive to expect that a more informative description will yield better results when used within a learning framework such as a classifier, we need to demonstrate that this is the case. The aim of this experiment is to determine whether the new signatures can be applied more successfully than those produced by the original method.

4.6.2 Method

An expert radiologist provided annotations for the images described in Section 4.5 (an example region of interest is shown in Figure 4.9). A set of just over 20 000 locations within the images was randomly sampled, such that half were sampled from the abnormal regions and half from the normal regions. Pixel signatures—computed using the two methods as described previously—were then extracted for these locations. The columns of the signatures were concatenated, converting the 2-D signatures into vectors that can be considered points in a 120-D space. For each type of signature—i.e. original and new—training and test sets were formed by randomly allocating signatures to either a training set or a test set.
Figure 4.9: An example region of interest and its groundtruth.

There are many pattern classification techniques—e.g. nearest neighbour classifiers, linear discriminant analysis, artificial neural networks—and the support vector machine classifier has become popular for its classification performance and ability to generalise. A support vector machine classifier [31] was trained using the training set for the original signatures. Suitable training parameters were selected by validating on the test set for the original signatures. A second classifier was then trained on the training set for the new signatures, using the same training parameters as were selected for the original signatures. This approach attempts to remove bias towards the new method of producing pixel signatures. The test set for the signatures produced using the new method was then classified using the second classifier.
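In outline, the experimental protocol might be transcribed as below. The sketch uses scikit-learn's SVC (an anachronism relative to the thesis, which cites [31]), and the arrays of 120-D signature vectors, the label vectors and the hyperparameter values are all placeholders; the essential point is that hyperparameters tuned on the original signatures are reused unchanged for the new ones.

```python
from sklearn.svm import SVC

# Hyperparameters validated on the *original* signatures only, then reused
# unchanged for the new signatures so the comparison is not biased towards
# the new method (the values here are placeholders, not the thesis's).
params = dict(kernel="rbf", C=10.0, gamma=0.01)

# X_*: rows are 120-D concatenated signatures; y_*: 1 = abnormal, 0 = normal.
clf_original = SVC(**params).fit(X_train_original, y_train)
clf_new = SVC(**params).fit(X_train_new, y_train)

accuracy_original = clf_original.score(X_test_original, y_test)
accuracy_new = clf_new.score(X_test_new, y_test)
```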
Table 4.1: Classification results for the two signature types.

                     Original signatures    New signatures
    n_TP                   3 729                3 932
    n_TN                   3 745                3 746
    n_FP                   1 276                1 266
    n_FN                   1 283                1 080
    Specificity            0.747                0.751
    Sensitivity            0.744                0.785

The table shows the number of true positives (n_TP), true negatives (n_TN), false positives (n_FP) and false negatives (n_FN), and the specificity and sensitivity, for the two signature types. See Section 3.11 for an explanation of these quantities.

4.6.3 Results

The results of the classification experiment are summarised in Table 4.1. Both the specificity and the sensitivity are improved when using the classifier trained on the new signatures.

4.6.4 Discussion

The results show that the new signatures can yield better results in classification experiments. It should be noted that the results for the classifier trained using the new signatures are pessimistic, since the classifier parameters were tuned to the original signatures. As stated previously, Euclidean distance is not a good metric for pixel signatures. Classifiers trained in a space where Euclidean distance approximates the transportation-based similarity measure perform better than those trained in the raw pixel signature space [90].
We could therefore expect classification performance to be improved if we used signatures in a more appropriate space and selected classifier parameters for the new signatures, rather than for the original signatures.

4.7 Summary

This chapter presented work on improving the way that scale-orientation pixel signatures are computed. In summary:

  • Mathematical morphology was introduced.
  • Scale-orientation pixel signatures were introduced and an existing implementation was analysed. Two flaws with the existing method were addressed, yielding a new way to compute pixel signatures. An efficient way of computing the new signatures was developed.
  • An information theoretic measure of signature quality was developed. Comparing pixel signatures computed on mammographic images using the old and new methods showed that the new method increased the information content of the signatures by approximately 19%.
  • A classification experiment was reported in which signatures computed using the two methods were used to discriminate between pixels belonging to normal and spiculated lesion tissues. The new signatures outperformed the original signatures in terms of both specificity and sensitivity. By tuning the classifier parameters to the new signatures—rather than the old ones—it is expected that even better performance could be achieved.
Although the pixel signature approach shows some promise as a method of modelling mammographic appearance, it does not lead to the generative approach advocated in the introduction to the thesis. The remainder of the thesis focuses on developing generative statistical models of mammographic appearance.
Chapter 5

Modelling distributions with mixtures of Gaussians

5.1 Introduction

This chapter presents background information on the multivariate normal distribution and a class of statistical model called the Gaussian mixture model, both of which are used extensively in the remainder of the thesis. The chapter presents:

  • A brief overview of the density estimation problem and a review of common approaches used to model distributions.
  • The Gaussian mixture model and two algorithms for learning the model parameters from training data.
  • Some useful properties of the multivariate normal distribution and Gaussian
    mixture models (computing marginal and conditional distributions).
  • A method of learning Gaussian mixture model parameters from large training sets using a variant of the k-means clustering algorithm.

5.2 Background

This thesis is largely concerned with statistical modelling, which is used to describe scenarios (experiments) that are governed by stochastic processes, or which can be assumed to be governed by such processes. One of the characteristics of randomness is variation, and this thesis deals with the variation of mammographic texture and appearance. We use statistical models to cope with this variation.

A random variable (e.g. X) is a function that maps every possible outcome of an experiment to a unique number.¹ In this way, the random variable is governed by the stochastic process. The probabilities of discrete events are usually described using a probability mass function (pmf), P(X = x), abbreviated as P(x). For an event x outside the possibility space (i.e. an impossible event), P(x) = 0. The certain event is assigned a probability of unity. The discrete cumulative distribution function (cdf) is defined as

    C(x) = P(X ≤ x) = Σ_{X ≤ x} P(x).        (5.1)

1 Experiments often do not have numerical outcomes, e.g. tossing a coin has outcomes Heads and Tails.
  • Chapter 5—Background 135Similarly, the continuous cdf is defined as x C(x) = P (X ≤ x) = p(α) dα, (5.2) −∞where p(x) is the probability density function (pdf). A pdf is simply the derivativeof the corresponding cdf, and is the continuous equivalent of the pmf. Probabilitymass and density functions are nonnegative and must sum or integrate to unity(because the probability of any event occurring in the possibility space is thecertain event).Events in continuous distributions are defined as being regions within the possibil-ity space and so the probability of an event is equal to the integral of the pdf withinthe region that delimits the event. In this thesis the possibility spaces are typi-cally measured on multiple axes, and so the pdfs are multivariate (i.e. are scalarfunctions of vectors). In the multivariate case, pdfs have a value at every pointin the possibility space, and probabilities are the integrals over hyper-volumes ofthe regions that delimit the events.The problem of density estimation can be stated as follows: given a trainingset, T = {xi ∈ Rd }, i ∈ {1, · · · , N }, of samples from a particular population,how do we compute the value of the associated pdf for an arbitrary point in thespace? Implicit in this question is the assumption that the pdf cannot be knowna priori. Further to estimating the pdf, it is often necessary to manipulate thepdf to determine further densities, such as marginal and conditional distributions(Section 5.5 presents some background on these topics), or to compute likelihoodsor probabilities by integrating the pdf.
5.3 Density estimation

A common approach to density estimation is to assume that the data in T follows a known trivial distribution, such as a uniform or normal distribution. The validity of this assumption can be assessed roughly by plotting the data on each pair of dimensions. If the data do not follow the assumed distribution, then more sophisticated approaches are required. We will now look at a few common density estimation techniques and consider how they support the following three tasks:

  1. Computing a marginal distribution.
  2. Computing a conditional distribution.
  3. Sampling from the underlying pdf.

(A full description of what the terms marginal and conditional mean is given in Section 5.5.)

A simple density estimator is the histogram. The possibility space is broken into discrete regions called bins, and each bin is assigned a value equal to the number of training data that lay within the associated region. When normalised to sum to unity, the histogram defines a pmf, and the situation changes from being continuous to being discrete. If there are ample data, the granularity of the estimate of the probability mass function can be such that it is a good approximation of the pdf. Addressing our three tasks:

  1. Computing a marginal distribution simply involves summing the histogram along the marginal dimensions.

  2. Computing a conditional distribution can be achieved by constructing a lower-dimensional histogram from the bins that intersect the conditions, and then normalising the resulting pmf to sum to unity.

  3. A multivariate histogram could be sampled as follows: construct an associative array that maps the probabilities to their bin locations in the possibility space and then sample one of these mappings according to the probabilities.

The histogram approach works well in low dimensions. However, the amount of data required to populate a space with a given density of data increases exponentially with the dimensionality of the space. Imagine that one can feasibly sample from 10 000 individuals. If we could only measure one attribute for each individual, on a scale of 1 to 100, then the density of data points in the possibility space would be 10 000/100 = 100. If one could measure two attributes for each individual (using the same scale), then the density of the possibility space would be 10 000/100² = 1. Measuring three attributes yields a density of 0.01, and so on. This effect is called the curse of dimensionality [14].

A naïve representation of a high-dimensional histogram would be a multidimensional array. To approximate a continuous density, each dimension requires a reasonable number of elements. The result is a multidimensional array with a^d elements, where a determines the quality of the approximation to the continuous density, which quickly becomes impractical. The curse of dimensionality implies that most of the histogram bins will be empty. A more practical implementation could exploit this redundancy and use a sparse representation, but this would lack the conceptual simplicity of the histogram and would make computing marginal and conditional distributions more difficult.

The k-nearest neighbour approach [56] uses the data directly to facilitate density estimation. The idea is to estimate the local density around a given location in the possibility space by considering the distance, r, to the k-th nearest neighbour. The assumption is that the content (hyper-volume) of a hypersphere of radius r around a point of interest will be smaller in densely populated regions of the possibility space than in sparsely populated regions. The density at a point x can be estimated as

    p(x) \approx \frac{k}{N} \frac{1}{v_d(r)}                               (5.3)

where k/N is an estimate of the probability represented by the k data points and v_d(r) is the content of a d-dimensional hypersphere with radius r = \|x - x_k\|_2, where x_k is the k-th nearest neighbour to x. The main problem with this method is that an efficient method of finding the nearest neighbours is required. Addressing our three tasks:

  1. Computing a marginal density simply involves modifying the nearest neighbour routine to neglect measurements from the non-marginal dimensions and using the appropriate value for d when Equation 5.3 is used.

  2. It is difficult to see how this method would allow conditional distributions to be computed.

  3. Since samples from a population are likely to be more common from dense regions of the pdf, "new" samples could be generated simply by choosing one of the original samples at random, but this would restrict samples to the observed set. One could consider the hypersphere defined by r to be of uniform density, and draw a sample from such a region around a randomly chosen point in the set of observed samples.

The Parzen window density estimator [56] is similar to the k-nearest neighbour approach in that it uses the training data points directly to help model the density. The method assumes that the underlying pdf to be estimated is nonzero at locations near the training points and that less can be inferred about the pdf, based on a particular training point, as one moves further away from it in the possibility space. The relationship between the inference that can be made about the pdf, based on a particular training point x_i, and the distance from that training point, is represented by a kernel function which has the form k(r, x_i), where r is the (often Euclidean) distance to x_i. It is the kernel that defines the contribution of the data point to the estimate of the pdf: a kernel is centred on each data point, and the pdf is defined as the sum of these kernels, normalised such that the integral of the pdf is unity. The particular form of the kernel (e.g. the Gaussian, boxcar or triangle functions), and its parameterisation, must be chosen to be suitable for the application at hand. While the Parzen window density estimator is reasonably simple, choosing the kernel function and its parameters can be difficult. Addressing our three tasks:

  1. Computing a marginal distribution simply involves ignoring measurements on the non-marginal axes and re-normalising the integral of the pdf to unity.

  2. Computing a conditional distribution will depend upon the form of kernel chosen. If a Gaussian kernel is chosen, then a closed-form solution exists (see Section 5.5.2).

  3. Similarly to the k-nearest neighbour approach, one could sample from a Parzen window representation by choosing a data point at random, and then sampling from its associated kernel as if it were a distribution.
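To make these two data-driven estimators concrete, the following is a minimal illustrative sketch (not the implementation used in this thesis) of the k-nearest neighbour estimate of Equation 5.3 and of a Parzen estimate with an isotropic Gaussian kernel; the function names and the kernel width h are our own assumptions.

    import numpy as np
    from math import gamma, pi

    def knn_density(x, data, k=10):
        # Distance from x to its k-th nearest neighbour in the (N, d) training set.
        r = np.sort(np.linalg.norm(data - x, axis=1))[k - 1]
        N, d = data.shape
        # Content of a d-dimensional hypersphere of radius r, v_d(r).
        v = pi ** (d / 2) / gamma(d / 2 + 1) * r ** d
        return (k / N) / v                      # Equation 5.3

    def parzen_density(x, data, h=1.0):
        # Mean of Gaussian kernels centred on the training points; the mean
        # (rather than the sum) normalises the estimate to unit integral.
        N, d = data.shape
        sq = np.sum((data - x) ** 2, axis=1)
        return np.mean(np.exp(-0.5 * sq / h ** 2) / ((2 * pi) ** (d / 2) * h ** d))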
In the remainder of the chapter we present the Gaussian mixture model, which can be viewed as a generalisation of the Parzen window density estimator. The Gaussian mixture model is an elegant and relatively simple density estimator that can be trained in a principled way. Further, there exist closed-form solutions for the marginal and conditional distributions. It is also easy to sample from the modelled distribution. We shall exploit these properties in much of this thesis and see that these properties are extremely useful for our image synthesis and analysis methods (see Chapter 6 and Chapter 9).

5.4 Gaussian mixture models

The GMM approximates an arbitrary pdf using a weighted sum of Gaussian (normal) basis functions, which we call components. In the univariate case, where observations are measured on a single axis, each component is parameterised by a mean and a variance. In the multivariate case, where observations are measured on multiple axes, each component is parameterised by a mean vector and a covariance matrix. In addition, each component has an associated probability ("weight"). We shall assume the multivariate case, but the same theory applies in the univariate case. The GMM has the following form:

    p(x) = \sum_{i=1}^{k} P(i) \, g(x, \mu_i, \Sigma_i)                     (5.4)

where x is a point in the possibility space, p(x) is the pdf, i indexes the k components, and \mu_i and \Sigma_i are the mean vector and covariance matrix for the i-th component. The probability of the i-th component is P(i), and g is the function that describes the pdf of a single component:

    g(x, \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^d \, |\Sigma|}} \, e^{-\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu)}    (5.5)

where d is the dimensionality of x.
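Equations 5.4 and 5.5 translate directly into code. The sketch below is illustrative rather than the thesis implementation; it assumes the component probabilities, means and covariances are held in plain arrays or lists.

    import numpy as np

    def gaussian_pdf(x, mu, cov):
        # Equation 5.5: multivariate normal density of a single component.
        d = mu.size
        diff = x - mu
        exponent = -0.5 * diff @ np.linalg.solve(cov, diff)
        return np.exp(exponent) / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))

    def gmm_pdf(x, weights, means, covs):
        # Equation 5.4: probability-weighted sum over the k components.
        return sum(w * gaussian_pdf(x, m, c)
                   for w, m, c in zip(weights, means, covs))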
5.4.1 Learning the parameters

To perform density estimation using the GMM, one has to find the model parameters that fit the model to the training data. This is an ill-posed problem, and the most common regularisation strategy is to use maximum likelihood estimation, where we seek the model parameters that maximise the likelihood that the model could generate the data. We shall present two solutions to the parameter selection problem shortly.

Finding the model parameters would be simpler if we did not have to worry about the parameter k, the number of model components, which effectively says that there is a countably infinite number of families of model. Unlike many unsupervised learning problems, where one of the aims is to discover the classes that exist within a mixed training set, all the data in T comes from the same class, so we do not need to determine the "correct" number of components; we simply want to model the distribution of the data. As the number of model components increases, so does the level of pdf detail that can be modelled. However, we must be able to support the choice of parameters for each component using data from T, so there is a practical upper bound on the number of components that a model can have. We shall see later that once we have determined the model parameters and want to use the model, we need to iterate over each component. This introduces a further constraint on the number of components, as the computational cost of using a GMM is related to this number. In short, provided that we have adequate support for the components, we can have as many as is practical.

We will now describe two approaches to fitting a GMM to training data. The k-means clustering algorithm is a simple and intuitive method, but was not designed to fit GMMs to data, while the Expectation-Maximisation (EM) algorithm is more principled.

5.4.2 The k-means clustering algorithm

The k-means clustering algorithm [125, 102] is a simple example of unsupervised learning. The problem is posed as follows: given a set of multivariate measurements, T = {t_i : i = 1, ..., N}, form k disjoint subsets (called clusters) such that all the elements of a particular cluster are similar. There are many variants of the algorithm, but we shall present two: the first clusters the data in a single pass (see Algorithm 1); the second is iterative in nature, giving each data point the opportunity to migrate (see Algorithm 2).
Algorithm 1 The non-iterative k-means algorithm.
    Randomly assign each t_i ∈ T to one of the k clusters.
    for i = 1, ..., k do
        Compute the i-th cluster centre: the mean of the elements assigned to cluster i.
    end for
    for each element t_i ∈ T do
        Using some metric, compute the distance from t_i to each cluster centre.
        if t_i is not assigned to the cluster with the closest centre then
            Assign t_i to the cluster with the closest centre.
            Recompute the means of the two clusters involved in the reassignment.
        end if
    end for

Algorithm 2 The iterative k-means algorithm.
    Randomly assign each t_i ∈ T to one of the k clusters.
    for i = 1, ..., k do
        Compute the i-th cluster centre: the mean of the elements assigned to cluster i.
    end for
    repeat
        for each t_i ∈ T do
            Using some metric, compute the distance from t_i to each cluster centre.
            if t_i is not assigned to the cluster with the closest centre then
                Assign t_i to the cluster with the closest centre.
            end if
        end for
        Recompute the cluster centres.
    until some stopping criterion is met (see text).
The metric used to measure similarity can be selected to be appropriate to the problem at hand, but Euclidean distance is often used. For the iterative algorithm, a range of stopping criteria can be used, but a common strategy is to stop iterating when no further reassignments occur.

Once a final clustering has been obtained, it is a simple matter to fit a GMM to the clustering: the means, {\mu_i : i = 1, ..., k}, are simply the cluster centres; the covariance matrices, {\Sigma_i : i = 1, ..., k}, are the covariance matrices computed from the elements assigned to each cluster; and the component probabilities, {P(i) : i = 1, ..., k}, are computed using the number of elements, n_i, assigned to each cluster:

    P(i) = \frac{n_i}{N}.                                                   (5.6)

The clustering scheme can easily be modified to remove clusters if the number of elements assigned to them falls to a level at which there is insufficient support for the corresponding Gaussian component.

The k-means algorithm is intuitive and simple to implement, but it was not designed to fit a GMM to data. In statistical terms, the k-means algorithm minimises the within-cluster variances. Due to the random initialisation, a given run of the algorithm will find one possible local minimum. Several runs of the algorithm give a reasonable chance of finding the global minimum or a suitable local minimum.
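The read-off from a clustering to GMM parameters is mechanical. The sketch below combines the iterative k-means of Algorithm 2 (with Euclidean distance) and Equation 5.6; it is an illustrative sketch that assumes every cluster retains enough members, whereas a practical implementation would prune weakly supported clusters as described above.

    import numpy as np

    def fit_gmm_by_kmeans(data, k, max_iter=100, rng=None):
        # data: (N, d) training matrix. Returns (weights, means, covariances).
        rng = rng or np.random.default_rng(0)
        labels = rng.integers(0, k, size=len(data))     # random initial assignment
        for _ in range(max_iter):
            means = np.stack([data[labels == i].mean(axis=0) for i in range(k)])
            d2 = ((data[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
            new_labels = d2.argmin(axis=1)              # closest centre, Euclidean
            if np.array_equal(new_labels, labels):      # stop: no reassignments
                break
            labels = new_labels
        weights = np.bincount(labels, minlength=k) / len(data)   # Equation 5.6
        covs = [np.cov(data[labels == i], rowvar=False) for i in range(k)]
        return weights, means, covs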
5.4.3 The Expectation-Maximisation algorithm for Gaussian mixtures

Although the k-means algorithm is intuitive, it was not designed to fit GMMs to data. The maximum likelihood formulation provides a more principled approach to this problem, where model parameters are sought that maximise the likelihood of the data having been generated. Unfortunately, there is no analytical solution to this optimisation problem, and so alternative approaches are used. The Expectation-Maximisation (EM) algorithm [137] is a general approach to simplifying maximum likelihood problems, and in this section we shall present the EM algorithm for fitting a GMM to training data. We will start with a simple one-dimensional problem with just two model components [81], and then generalise the algorithm to work in higher dimensions and with an arbitrary number of components. (The Expectation-Maximisation algorithm is presented in its abstract form in Appendix A, along with a proof that the algorithm converges to a local maximum of the objective function.)

We assume a training set, {x_i ∈ R : i = 1, ..., N}, that has been drawn from an underlying distribution that can reasonably be modelled using a GMM with two components. Using the random variables X, X_1 and X_2, we can describe our model as follows:

    X \sim (1 - \Delta) X_1 + \Delta X_2                                    (5.7)
    X_1 \sim N(\mu_1, \sigma_1^2)                                           (5.8)
    X_2 \sim N(\mu_2, \sigma_2^2)                                           (5.9)

where \Delta ∈ {0, 1} with P(\Delta = 1) = \pi.

Equation 5.7 can be viewed as a simple generative model: generate a \Delta with probability \pi; if \Delta = 0 then deliver X_1, otherwise deliver X_2. If g_\theta(x) is a normal distribution with parameters \theta = (\mu, \sigma^2), then we can write the pdf of X as:

    p(x) = (1 - \pi) g_{\theta_1}(x) + \pi g_{\theta_2}(x).                 (5.10)

The model is parameterised by a vector \Theta = (\pi, \theta_1, \theta_2) = (\pi, \mu_1, \sigma_1^2, \mu_2, \sigma_2^2). We want to select an optimal vector, \Theta', which is a maximiser of the likelihood of the data having been generated by the model. The log-likelihood of the parameters given the N data points is:

    \ell(\Theta; \mathcal{X}) = \sum_{i=1}^{N} \log p(x_i) = \sum_{i=1}^{N} \log \left[ (1 - \pi) g_{\theta_1}(x_i) + \pi g_{\theta_2}(x_i) \right].    (5.11)

Unfortunately, there is no closed-form solution to the maximisation of Equation 5.11, and so a numerical approach is required. If we knew the component from which each data point was drawn, then finding the optimum \Theta would be easy: the component means and variances could just be computed by considering each component separately, and \pi could be computed from the number of points assigned to each cluster. Because we do not know the membership of each data point, we consider unobserved latent variables, {\Delta_i ∈ {0, 1} : i = 1, ..., N}, as in Equation 5.7, and make soft (probabilistic) assignments. Given a current estimate, \hat{\Theta}, of the model parameters, we compute the expected value of each \Delta_i:

    \hat{\delta}_i = E(\Delta_i \mid \hat{\Theta}, \mathcal{X}) = P(\Delta_i = 1 \mid \hat{\Theta}, \mathcal{X})    (5.12)

and we can call \hat{\delta}_i the responsibility of X_2 for observation i. This is the expectation step of the EM algorithm. In the maximisation step, the estimates of the model parameters are updated using maximum-likelihood estimates weighted by the responsibilities. The EM algorithm for fitting a GMM with two components to one-dimensional data is described by Algorithm 3.

Just as the k-means algorithm requires an initial hard assignment of data points to clusters, the EM algorithm requires an initialisation. For example, the mixing proportion \hat{\pi} can be set to 0.5, two of the x_i may be chosen to be \hat{\mu}_1 and \hat{\mu}_2, and the component variances can be set to be the overall sample variance, \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2. If a Gaussian component with zero variance is placed upon one of the data points, then the likelihood of that data point becomes infinite, thus giving an unfortunate maximum for Equation 5.11. Therefore the variances must be constrained to be greater than zero. Dempster, Laird, and Rubin showed that an iteration of the EM algorithm cannot decrease the objective function [137]. In general, the objective function can have multiple optima, and several runs of the algorithm, using different initialisations, may be required. Algorithm 4 generalises Algorithm 3 to the case of multivariate data and multiple model components. Notice that with multiple components, the component responsibilities for the data points need to be computed in the computation of each type of model parameter, and so the expectation and maximisation steps are combined.
Algorithm 3 The EM algorithm for fitting a GMM with two components to one-dimensional data.
    Initialise the parameters (see text):

        \hat{\Theta} = (\hat{\pi}, \hat{\mu}_1, \hat{\sigma}_1^2, \hat{\mu}_2, \hat{\sigma}_2^2).    (5.13)

    repeat
        The expectation step: update the estimate of the responsibilities:

            \hat{\delta}_i = \frac{\hat{\pi} \, g_{\hat{\theta}_2}(x_i)}{(1 - \hat{\pi}) \, g_{\hat{\theta}_1}(x_i) + \hat{\pi} \, g_{\hat{\theta}_2}(x_i)}, \quad \forall i \in \{1, ..., N\}.    (5.14)

        The maximisation step: update the weighted maximum-likelihood estimates of the means and variances, and update the estimate of the mixing probability:

            \hat{\mu}_1 = \frac{\sum_{i=1}^{N} (1 - \hat{\delta}_i) x_i}{\sum_{i=1}^{N} (1 - \hat{\delta}_i)},    (5.15)

            \hat{\sigma}_1^2 = \frac{\sum_{i=1}^{N} (1 - \hat{\delta}_i)(x_i - \hat{\mu}_1)^2}{\sum_{i=1}^{N} (1 - \hat{\delta}_i)},

            \hat{\mu}_2 = \frac{\sum_{i=1}^{N} \hat{\delta}_i x_i}{\sum_{i=1}^{N} \hat{\delta}_i},

            \hat{\sigma}_2^2 = \frac{\sum_{i=1}^{N} \hat{\delta}_i (x_i - \hat{\mu}_2)^2}{\sum_{i=1}^{N} \hat{\delta}_i},

            \hat{\pi} = \frac{\sum_{i=1}^{N} \hat{\delta}_i}{N}.

    until convergence.
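A compact sketch of Algorithm 3 follows, assuming a one-dimensional NumPy array x. The variance floor of 1e-9 is our own guard against the zero-variance singularity discussed above, and the fixed iteration count stands in for a proper convergence test.

    import numpy as np

    def norm_pdf(x, mu, var):
        return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

    def em_two_components(x, n_iter=200, rng=None):
        rng = rng or np.random.default_rng(0)
        # Initialisation as suggested in the text: pi = 0.5, two data points
        # as the means, and the overall sample variance for both components.
        pi_hat = 0.5
        mu1, mu2 = rng.choice(x, size=2, replace=False)
        v1 = v2 = x.var()
        for _ in range(n_iter):
            # E-step (Equation 5.14): responsibilities of component 2.
            num = pi_hat * norm_pdf(x, mu2, v2)
            delta = num / ((1 - pi_hat) * norm_pdf(x, mu1, v1) + num)
            # M-step (Equation 5.15): responsibility-weighted estimates.
            mu1 = np.sum((1 - delta) * x) / np.sum(1 - delta)
            v1 = max(np.sum((1 - delta) * (x - mu1) ** 2) / np.sum(1 - delta), 1e-9)
            mu2 = np.sum(delta * x) / np.sum(delta)
            v2 = max(np.sum(delta * (x - mu2) ** 2) / np.sum(delta), 1e-9)
            pi_hat = delta.mean()
        return pi_hat, (mu1, v1), (mu2, v2)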
Figure 5.1: An illustration of the Expectation-Maximisation algorithm.

Figure 5.1 shows an illustration of the EM algorithm. The figure shows the joint distribution of model parameters and latent data for a pedagogic example. The vertical axis represents the model parameter space, and the horizontal axis represents the latent variable space. The horizontal lines in the diagram represent the E-steps and the vertical lines represent the M-steps. The procedure begins with an initial (poor) estimate of the model parameters. Keeping these constant, the E-step obtains an estimate of the latent data. Keeping the latent data constant, the M-step obtains a refined estimate for the model parameters. The two steps are iterated until the algorithm converges to a local maximum. Note that this particular run of the algorithm finds a local maximum that is not the global maximum.
Algorithm 4 The EM algorithm for fitting a GMM with multiple components to multivariate data.
    Initialise the parameters:

        \hat{\Theta} = \{\hat{P}(i), \hat{\mu}_i, \hat{\Sigma}_i\}, \quad \forall i \in \{1, ..., k\}.    (5.16)

    repeat
        Update the estimate of the mixing probabilities:

            \hat{P}(i) = \frac{1}{N} \sum_{j=1}^{N} P(i \mid x_j, \hat{\Theta}), \quad \forall i \in \{1, ..., k\},    (5.17)

        where the "responsibility" of component i for x_j is

            P(i \mid x_j, \hat{\Theta}) = \frac{p(x_j \mid i, \hat{\Theta}) \, P(i \mid \hat{\Theta})}{p(x_j \mid \hat{\Theta})}    (5.18)

        by Bayes' theorem.
        Update the estimate of the component means:

            \hat{\mu}_i = \frac{\sum_{j=1}^{N} P(i \mid x_j, \hat{\Theta}) \, x_j}{\sum_{j=1}^{N} P(i \mid x_j, \hat{\Theta})}, \quad \forall i \in \{1, ..., k\}.    (5.19)

        Update the estimate of the component covariance matrices:

            \hat{\Sigma}_i = \frac{\sum_{j=1}^{N} P(i \mid x_j, \hat{\Theta}) (x_j - \hat{\mu}_i)(x_j - \hat{\mu}_i)^T}{\sum_{j=1}^{N} P(i \mid x_j, \hat{\Theta})}, \quad \forall i \in \{1, ..., k\}.    (5.20)

    until convergence.
5.5 Useful properties of multivariate normal distributions

The multivariate normal distribution has the very useful property that there exist closed-form solutions to the problems of computing the marginal and conditional distributions. We will review what is meant by these terms, describe the closed-form solutions for the multivariate normal (i.e. a single component), and then generalise these results to the multivariate GMM.

5.5.1 Marginal distributions

Imagine that three measurements are made for a sample of individuals on a continuous scale (e.g. the height, weight and annual income of a number of people). We could fit a GMM to this data. Further, imagine that to answer a particular question we are only interested in the distribution of one of these measurements (e.g. height) and have no constraining information for the other two dimensions. The distribution we seek is called a marginal distribution. Intuitively, the marginal distribution is the projection of the full pdf onto the dimensions that we are interested in. Figure 5.2 illustrates a two-dimensional pdf marginalised over one dimension.

Figure 5.2: A two-dimensional distribution marginalised over one dimension.
The marginal distribution is the "shadow" at the back of the distribution.

Formally, if p(x) = p(x_1, ..., x_n) is a multivariate pdf, then p(x) marginalised over all dimensions except those indexed by F is:

    p(x_{f_1}, ..., x_{f_q}) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} p(x) \, dx_{d_1} \cdots dx_{d_m}    (5.21)

where D = \{d_i : i = 1, ..., m\}, m \le n, indexes the dimensions which are to be removed via marginalisation and F = \{f_i : i = 1, ..., q\} indexes the dimensions which are to be retained (i.e. we are interested in them). Sets D and F cannot share indices and so they are disjoint.

Although the definition of the marginal involves a series of integrals, there is a very simple general solution: we simply pretend that the dimensions that we want to marginalise over do not exist (and so no measurements could have been made for them) [103]. In the case of the Gaussian, the parameters that define the distribution, the mean vector and covariance matrix, are modified by removing the entries that correspond to the dimensions that we want to marginalise over. An example of this is shown below. If X \sim N(\mu, \Sigma) with

    \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{pmatrix}, \quad
    \Sigma = \begin{pmatrix} \Sigma_{1,1} & \Sigma_{1,2} & \Sigma_{1,3} \\ \Sigma_{2,1} & \Sigma_{2,2} & \Sigma_{2,3} \\ \Sigma_{3,1} & \Sigma_{3,2} & \Sigma_{3,3} \end{pmatrix}    (5.22)

then

    p(x_1, x_3) = \int_{-\infty}^{\infty} p(x) \, dx_2 = N(\mu_m, \Sigma_m)    (5.23)

where

    \mu_m = \begin{pmatrix} \mu_1 \\ \mu_3 \end{pmatrix}, \quad
    \Sigma_m = \begin{pmatrix} \Sigma_{1,1} & \Sigma_{1,3} \\ \Sigma_{3,1} & \Sigma_{3,3} \end{pmatrix}.    (5.24)

Note that the marginal Gaussian density is itself a Gaussian density. The procedure for computing the marginal distribution can be easily extended to the case of a GMM by applying the above procedure to each component.
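The "delete the marginalised rows and columns" rule is a single indexing operation in code. The sketch below is illustrative; the numbers in the example are arbitrary, and `keep` holds the retained dimensions F. For a GMM, the same function is applied to each component, with the component probabilities unchanged.

    import numpy as np

    def marginalise(mu, cov, keep):
        # Retain only the dimensions in F (cf. Equations 5.22-5.24): delete
        # the mean entries and covariance rows/columns of the marginalised
        # dimensions.
        keep = np.asarray(keep)
        return mu[keep], cov[np.ix_(keep, keep)]

    # Example mirroring Equations 5.22-5.24: keep x1 and x3 (indices 0 and 2),
    # marginalising over x2.
    mu = np.array([0.0, 1.0, 2.0])
    cov = np.array([[2.0, 0.3, 0.5],
                    [0.3, 1.0, 0.2],
                    [0.5, 0.2, 1.5]])
    mu_m, cov_m = marginalise(mu, cov, [0, 2])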
5.5.2 Conditional distributions

Imagine again a multivariate distribution. Also, imagine that we have made a measurement along one of the dimensions and want to know how this measurement constrains the distribution of values on the other dimensions. The distribution we seek is called the conditional distribution.

Figure 5.3: A conditional distribution.
A joint density is shown on the left. Applying a condition on one dimension constrains the distribution along the other.

In the general case of a multivariate pdf and multiple conditions, each condition defines a hyperplane through the full distribution. The hyperplanes are axis-aligned and mutually orthogonal. The conditional distribution is the function that describes the values of the pdf that lie on the intersection of these hyperplanes, normalised so that the function's integral equals unity (i.e. is a valid pdf). Figure 5.3 illustrates this concept.

We now derive the conditional distribution for the multivariate normal distribution [103]. We seek an expression for p(x_1 | x_2). We will partition the random vector X into X_1 and X_2; X_2 will be conditioned by X_2 = x_2. Our approach is to find a way of forcing independence between X_1 and X_2. Recall that if two distributions, p(a) and p(b), are independent, then p(a | b) = p(a). Let X \sim N(\mu, \Sigma). We partition \Sigma as follows:

    \Sigma = \begin{pmatrix} \Sigma_{1,1} & \Sigma_{1,2} \\ \Sigma_{2,1} & \Sigma_{2,2} \end{pmatrix}.    (5.25)

\Sigma_{1,1} can be linearly transformed so that the covariances shared with \Sigma_{2,2} are zero, and hence the two are independent. Assume X ∈ R^p and that there are q conditions. Let

    A = \begin{pmatrix} I_{q \times q} & -\Sigma_{1,2} \Sigma_{2,2}^{-1} \\ 0^T_{(p-q) \times q} & I_{(p-q) \times (p-q)} \end{pmatrix}    (5.26)

where the upper-right block is of size q × (p − q). Applying A to \Sigma yields:

    A \Sigma A^T = \begin{pmatrix} I & -\Sigma_{1,2} \Sigma_{2,2}^{-1} \\ 0^T & I \end{pmatrix}
                   \begin{pmatrix} \Sigma_{1,1} & \Sigma_{1,2} \\ \Sigma_{2,1} & \Sigma_{2,2} \end{pmatrix}
                   \begin{pmatrix} I & 0 \\ (-\Sigma_{1,2} \Sigma_{2,2}^{-1})^T & I \end{pmatrix}
                 = \begin{pmatrix} \Sigma_{1,1} - \Sigma_{1,2} \Sigma_{2,2}^{-1} \Sigma_{2,1} & 0 \\ 0^T & \Sigma_{2,2} \end{pmatrix}.    (5.27)

We see that the off-diagonal covariances are zero. Applying the same transformation to (X − \mu):

    A(X - \mu) = \begin{pmatrix} I & -\Sigma_{1,2} \Sigma_{2,2}^{-1} \\ 0^T & I \end{pmatrix}
                 \begin{pmatrix} X_1 - \mu_1 \\ X_2 - \mu_2 \end{pmatrix}
               = \begin{pmatrix} X_1 - \mu_1 - \Sigma_{1,2} \Sigma_{2,2}^{-1} (X_2 - \mu_2) \\ X_2 - \mu_2 \end{pmatrix},    (5.28)

which has the distribution

    \begin{pmatrix} N_q(0, \; \Sigma_{1,1} - \Sigma_{1,2} \Sigma_{2,2}^{-1} \Sigma_{2,1}) \\ N_{p-q}(0, \; \Sigma_{2,2}) \end{pmatrix}.    (5.29)

If we fix X_2 = x_2, then \mu_1 + \Sigma_{1,2} \Sigma_{2,2}^{-1} (x_2 - \mu_2) is constant. Because X_1 - \mu_1 - \Sigma_{1,2} \Sigma_{2,2}^{-1} (x_2 - \mu_2) and X_2 - \mu_2 are independent, the conditional distribution of X_1 - \mu_1 - \Sigma_{1,2} \Sigma_{2,2}^{-1} (x_2 - \mu_2) is the same as the unconditional distribution of X_1 - \mu_1 - \Sigma_{1,2} \Sigma_{2,2}^{-1} (X_2 - \mu_2), i.e. N_q(0, \Sigma_{1,1} - \Sigma_{1,2} \Sigma_{2,2}^{-1} \Sigma_{2,1}). Therefore, given X_2 = x_2,

    X_1 \sim N_q(\mu_1 + \Sigma_{1,2} \Sigma_{2,2}^{-1} (x_2 - \mu_2), \; \Sigma_{1,1} - \Sigma_{1,2} \Sigma_{2,2}^{-1} \Sigma_{2,1}).

Note that, as with the marginal distribution, the conditional is itself a Gaussian density. Note also that the conditional covariance is independent of the conditioning value x_2.

For clarity, we summarise the result obtained above. If X is a multivariate random variable, where X \sim N(\mu, \Sigma), then we can partition these as:

    X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix},    (5.30)

    \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix},    (5.31)

    \Sigma = \begin{pmatrix} \Sigma_{1,1} & \Sigma_{1,2} \\ \Sigma_{2,1} & \Sigma_{2,2} \end{pmatrix}.    (5.32)

The conditional distribution is p(x_1 | x_2) = N(\mu', \Sigma') with

    \mu' = \mu_1 + \Sigma_{1,2} \Sigma_{2,2}^{-1} (x_2 - \mu_2)    (5.33)

    \Sigma' = \Sigma_{1,1} - \Sigma_{1,2} \Sigma_{2,2}^{-1} \Sigma_{2,1}.    (5.34)

The dimensions of X_1 and X_2 do not have to be adjacent, which allows the distribution to be conditioned over arbitrary dimensions.

Computing the conditional distribution for a GMM involves computing the conditional means and covariances for each component as described above, and computing {P(i | x_2) : i = 1, ..., k}, the set of conditional component probabilities. These are computed using Bayes' theorem:

    P(i \mid x_2) = \frac{p(x_2 \mid i) \, P(i)}{p(x_2)}    (5.35)

where p(x_2 | i) is computed by marginalising each component over the unknown dimensions.
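Equations 5.33 and 5.34 in code, again as an illustrative sketch rather than the thesis implementation: idx1 indexes the free dimensions, idx2 the conditioned ones, and the symmetry of the covariance matrix gives \Sigma_{2,1} = \Sigma_{1,2}^T.

    import numpy as np

    def condition_gaussian(mu, cov, idx1, idx2, x2):
        # Equations 5.33 and 5.34 for a single Gaussian component.
        s11 = cov[np.ix_(idx1, idx1)]
        s12 = cov[np.ix_(idx1, idx2)]
        s22 = cov[np.ix_(idx2, idx2)]
        s22_inv = np.linalg.pinv(s22)   # pseudo-inverse; see "Numerical issues"
        mu_c = mu[idx1] + s12 @ s22_inv @ (x2 - mu[idx2])
        cov_c = s11 - s12 @ s22_inv @ s12.T
        return mu_c, cov_c

For a mixture, this is applied per component and the component probabilities are reweighted via Equation 5.35; a sketch of that step is given in Chapter 6, where conditioning drives the texture synthesis algorithms.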
Numerical issues

The quantity \Sigma_{2,2}^{-1} is required in order to compute a conditional distribution. In practice, covariance matrices can often be close to singular (numerically difficult to invert). An ad hoc approach to improving the condition of a covariance matrix is to add to the diagonal of the matrix. This essentially adds variance to the distribution represented by the matrix. A significant problem with this approach is that one does not usually know a priori how much variance should be added. We have experimented with a scheme where small amounts of variance are added incrementally until the matrix can be inverted. The method was reasonably successful, i.e. it could be used in the methods we describe in successive chapters, but computationally expensive.

Another approach, and one that we have found to be a good solution, is to compute the Moore-Penrose generalised inverse (commonly called the pseudo-inverse) of the covariance matrix instead [131, 139]. The Moore-Penrose inverse of the matrix A, which we will denote by A^+, has the following properties:

    A A^+ A = A    (5.36)
    A^+ A A^+ = A^+    (5.37)
    (A A^+)^T = A A^+    (5.38)
    (A^+ A)^T = A^+ A    (5.39)

and

    x = A^+ b    (5.40)

is the least squares solution to

    A x = b.    (5.41)

Although the Moore-Penrose generalised inverse is defined for any complex matrix, we shall restrict this discussion to covariance matrices, which are symmetric. The Moore-Penrose generalised inverse can be computed as follows. Note that the inverse of the matrix A can be written as

    A^{-1} = (P D P^{-1})^{-1} = P D^{-1} P^{-1}    (5.42)

where D is a diagonal matrix of the eigenvalues of A, and P is a matrix whose columns are the eigenvectors of A (i.e. Equation 5.42 represents a rotation of A to its principal axes). The matrix D^{-1} is trivial to compute, as it is simply a diagonal matrix where each diagonal element is the reciprocal of the corresponding element in D. For near-singular matrices, some of the eigenvalues will be small. We modify P by discarding the eigenvectors that have small corresponding eigenvalues, and remove the elements of D that correspond to the small eigenvalues (e.g. if eigenvalue 3 is small, row 3 and column 3 of D would be removed):

    A^+ = \hat{P} \hat{D}^{-1} \hat{P}^{-1}    (5.43)

where \hat{P} is the modified P and \hat{D} is the modified D. Since P is orthonormal, P^{-1} = P^T and hence \hat{P}^{-1} = \hat{P}^T. Although we generally use the Moore-Penrose generalised inverse for covariance matrices, we use the \Sigma^{-1} notation throughout this thesis, rather than \Sigma^+, because other techniques are occasionally used (see Section 8.3.1) and the \Sigma^{-1} notation implies intent rather than implementation detail.

Computing conditional Gaussians represents approximately 98% of the computations performed in the work presented in Chapter 6 (as determined by profiling our implementation) and a substantial proportion of those in Chapter 9, and so a hand-tuned implementation of the above algorithm was developed. This implementation allows Moore-Penrose generalised inverses of covariance matrices to be computed about 1.4 times faster than the implementation provided by MATLAB, which uses LAPACK routines [6], and is equally robust. A less portable version was much faster, being 1.5 to 2 times faster than the MATLAB implementation.
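The eigendecomposition route of Equations 5.42 and 5.43 is only a few lines with NumPy. The sketch below is illustrative, not the hand-tuned implementation described above; in particular, the relative tolerance used to decide which eigenvalues count as "small" is our own assumption.

    import numpy as np

    def pinv_covariance(cov, tol=1e-10):
        # Symmetric eigendecomposition: the columns of vecs are the
        # eigenvectors P, and vals holds the diagonal of D.
        vals, vecs = np.linalg.eigh(cov)
        keep = vals > tol * vals.max()          # discard small eigenvalues
        P_hat = vecs[:, keep]
        # Equation 5.43, using the orthonormality of P so that P^-1 = P^T.
        return P_hat @ np.diag(1.0 / vals[keep]) @ P_hat.T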
5.5.3 Sampling from a Gaussian mixture model

Sampling from an n-dimensional Gaussian mixture model is reasonably straightforward. Firstly, one of the model components is selected at random. The distribution used for this sampling is the set of component probabilities, {P(i) : i = 1, ..., k}. A sample is then drawn from the selected component, described by the covariance matrix \Sigma. If the component was aligned with the Cartesian axes, then sampling from the component would be easy because its covariance matrix would be diagonal and the dimensions would be independent: a set of n scalars could be sampled from univariate normal distributions with variances corresponding to each diagonal element of the covariance matrix. In general, components are not aligned with the Cartesian axes, and so we must first diagonalise the component's covariance matrix. This is achieved by performing an eigendecomposition which yields a matrix P, where each column is an eigenvector of the covariance matrix. P represents the transformation needed to diagonalise the covariance matrix. This is a Principal Components Analysis (PCA) [104]. The diagonalised covariance matrix is given by:

    \Sigma_D = P^T \Sigma P.    (5.44)

An n-dimensional vector, s_D, is then sampled from the diagonalised component, using the procedure described above. This vector is then transformed back to the original space by applying the inverse transformation to yield a sample, s':

    s' = P s_D.    (5.45)

s' is now in the space of our model, but is centred on the origin. We then translate this sample, using the component's mean vector, \mu:

    s = s' + \mu.    (5.46)

The vector s is a sample from the distribution represented by the model.
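The sampling procedure of this subsection, sketched with NumPy (illustrative only): pick a component according to its probability, sample in the diagonalised frame, rotate back and translate by the component mean. The weights are assumed to sum to unity.

    import numpy as np

    def sample_gmm(weights, means, covs, rng=None):
        rng = rng or np.random.default_rng()
        i = rng.choice(len(weights), p=weights)   # select a component by P(i)
        # Diagonalise the component covariance (Equation 5.44): a PCA.
        vals, vecs = np.linalg.eigh(covs[i])
        # Independent univariate samples in the diagonal frame...
        s_d = rng.standard_normal(len(vals)) * np.sqrt(np.clip(vals, 0.0, None))
        # ...rotated back (Equation 5.45) and translated (Equation 5.46).
        return vecs @ s_d + means[i]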
5.6 Learning from large datasets

Mammograms are digitised at high resolution, yielding images that contain a few million pixels. It seems reasonable that, in order to learn the variation in mammographic appearance, we will need to consider large quantities of data. We have therefore considered how the k-means algorithm could be adapted to process such volumes of data.

Jain et al. present a review of data clustering techniques in which they discuss clustering large datasets [102]. The most natural approach for problems where the entire dataset cannot be stored in primary memory is the divide-and-conquer algorithm, which is illustrated in Figure 5.4. This algorithm stores the full dataset, D_0, in a secondary memory (e.g. on hard disk or a large networked store) and randomly divides it into p subsets, S_i : i ∈ {1, ..., p}, of equal size. Each S_i is then processed by a clustering algorithm, yielding clusters C_{i,j}, where j ∈ {1, ..., k}. Each cluster C_{i,j} then contributes a number of representative data points to form a new data set, D_1. Let there be N_i data points in subset S_i and n_{i,j} data points in cluster C_{i,j}. The "probability" of C_{i,j} is n_{i,j}/N_i. If there are to be \eta data points contributed from each S_i to D_1, then cluster C_{i,j} contributes q_j data points, where:

    q_j = \frac{\eta \, n_{i,j}}{N_i}.    (5.47)

(Clusters which represent more data contribute appropriately, so that one "type" of data is not disproportionately represented in D_1.)

If there are still too many data points in D_1 for it to be clustered in primary memory, the above process can be repeated to create D_2, D_3 and so forth. The number of times the divide-and-conquer algorithm will be run is determined by the initial number of subsets, p, and the number of data points contributed from each subset, \eta.

In our work in Chapter 6, we set p and \eta such that D_1 can be clustered within primary memory (i.e. only one run through the divide-and-conquer algorithm is required). Since the data sets {S_i} need to be clustered as an intermediate step, we use the non-iterative variant of the k-means algorithm on each of these, and the iterative variant to yield the final clustering, from which the GMM parameters are computed.
Figure 5.4: The divide-and-conquer clustering algorithm.
This diagram illustrates how a large data set, D_0, can be divided into smaller ones, {S_i}, which can be clustered in primary memory. Each clustering then contributes some representative data to form a new data set, D_1, which can be clustered in primary memory to yield a final clustering.
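A sketch of the contribution step of the divide-and-conquer algorithm (Equation 5.47) follows. The rounding of q_j and the handling of very small clusters are our own illustrative choices; the thesis does not specify them.

    import numpy as np

    def representatives(subset, labels, eta, rng=None):
        # Cluster C_{i,j} contributes q_j = eta * n_{i,j} / N_i points to D_1,
        # so each "type" of data is represented in proportion to its support.
        rng = rng or np.random.default_rng()
        N_i = len(subset)
        picked = []
        for j in np.unique(labels):
            members = subset[labels == j]
            q_j = int(round(eta * len(members) / N_i))
            if q_j > 0:
                idx = rng.choice(len(members), size=min(q_j, len(members)),
                                 replace=False)
                picked.append(members[idx])
        return np.concatenate(picked)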
In much of the work presented in Chapter 6, we adopt the divide-and-conquer approach. However, although this variant of the k-means algorithm allows GMMs to be built from large datasets, we found there to be no appreciable difference between models built using the divide-and-conquer algorithm and those built simply by selecting a reasonable number of training points at random. The divide-and-conquer method simply makes it unlikely that the GMM will be built from biased data.

A better approach might be to implement an EM algorithm that can consider large datasets. The inner loop of the EM algorithm is written as an iteration over the data points; with an appropriate caching strategy, the EM algorithm extends naturally to deal with large data sets. Unlike the divide-and-conquer variant of the k-means algorithm, every data point would contribute to every model parameter.

5.7 Summary

This chapter presented an introduction to Gaussian mixture models and the multivariate normal distribution. In summary:

  • Gaussian mixture models are a flexible solution to the density estimation problem.

  • Gaussian mixture model parameters can be learned from training data using several approaches. This chapter described the k-means clustering and Expectation-Maximisation algorithms.

  • The k-means clustering algorithm is a simple and intuitive approach but was not designed to fit Gaussian mixture models to data.

  • The Expectation-Maximisation algorithm is a principled approach to learning Gaussian mixture model parameters from training data.

  • The multivariate normal distribution has two useful properties: the marginal and conditional distributions can be computed using closed-form solutions. Further, the marginal and conditional distributions are themselves multi- or univariate normal distributions.

  • It is possible to sample from a multivariate normal distribution.

  • These properties of the multivariate normal distribution can be used to define equivalent operations on the multivariate Gaussian mixture model.

  • The chapter described an approach to learning parameters for Gaussian mixture models from large datasets using a variant of the k-means clustering algorithm. The Expectation-Maximisation algorithm can be trivially extended to learn from large datasets.
Chapter 6

Modelling mammographic texture for image synthesis and analysis

6.1 Introduction

This chapter develops a generative parametric statistical model of stationary texture. The model is based upon Efros and Leung's non-parametric texture synthesis method [60]. The chapter presents:

  • Efros and Leung's texture synthesis algorithm.

  • A parametric model-based version of their algorithm that allows texture analysis as well as synthesis.
  • A way of synthesising textures using the parametric model, and some example synthetic images.

  • A novelty detection method that allows the parametric model to be used to analyse textures.

6.2 Background

Chapter 3 gave a brief overview of some computer-aided detection systems. These generally attempt to emulate radiologists' interpretation strategies using pattern recognition. The approach is very common in computer-aided mammography, but may not be the best way to approach the problem.

Instead of learning a classification rule that separates classes (e.g. malignant masses from benign masses), we should instead learn what pathology-free mammograms look like within a framework that allows illegal instances to be identified. If we can determine that the appearance of a particular mammogram is unlikely, given that it is supposed to be free of pathology, then we can label that mammogram as being novel (perhaps leaving an expert to determine exactly why it is novel). Another name for the approach is outlier detection.

To perform novelty detection on mammograms we need a model of the appearance of pathology-free mammograms that can be used in an analytical mode. That is to say that the model needs to be able to identify unlikely model instances by assigning likelihoods (or similar measures) to model instances.
The appearance of entire mammograms is difficult to model due to the nature of the imaging process and anatomical differences between women. In this chapter we make the problem more tractable by assuming that mammograms are stationary (i.e. the statistics of the texture do not vary over the image plane, and so local appearance does not vary across the breast). This is certainly not true; for example, the appearance of a pectoral muscle differs significantly from a fatty breast region, but the assumption allows us to concentrate on a manageable part of the problem. Because we are assuming stationarity, we do not need to worry about shape: our simplified mammogram is a texture on a potentially infinite plane. We address the problem of modelling the appearance of entire breasts in mammograms in Chapter 9.

When developing models of appearance, it is useful to be able to visualise instances of the model, in our case to be able to generate synthetic examples of mammographic textures. This will allow us to evaluate the generality and specificity of the model. Because this generative property is so useful, we make it a requirement for our model.

Sajda et al. [159, 169] used wavelet coefficients, computed from mammographic patches, to statistically model mammographic patches using a tree-structured variant of a hidden Markov model. To synthesise an image, coefficients in finer levels of the wavelet decomposition were sampled, conditional on those at coarser levels. Subjectively, this approach was reasonably successful at capturing local textural appearance of mammograms, and can be used in both generative and analytical modes. The model was used to filter false positives produced by another CADe system. Unfortunately, due to the way in which finer levels are conditioned upon coarser ones, the synthetic images produced using the model had an obvious grid structure corresponding to the coarsest level of the wavelet decomposition. The approach is similar to De Bonet and Viola's model of generic textures that allows both synthesis and analysis [53]. We present a model of entire mammograms that uses the hierarchical conditioning approach in Chapter 9.

Bochud et al. [19] developed an algorithm to generate mammographic texture using a strategy that placed basis functions at locations within a white noise image, according to a pdf. The white noise and kernel function were matched to the power spectrum of real mammographic texture. The method produced reasonable synthetic textures, but the images could easily be distinguished from real examples. Brettle et al. [27] evaluated several methods for generating medical textures (including mammographic textures) and found that the generic texture synthesis method developed by Efros and Leung produced the best results [60].

Heine et al. modelled mammographic texture using a random field model [85]. The authors assumed that mammographic texture could be modelled by a convolution of a random field with a kernel function. The choice of the form of the kernel was based in part on studies of the fractal nature of mammograms. The parameter governing the kernel function was learned from real mammographic data, as were the statistical characteristics of the random field. Subjectively, the approach allowed reasonably realistic synthetic textures to be generated. The authors analysed an obvious mass in a real mammogram by computing the random field that would have been required to generate the image under their model. The location of the mass was visible in the computed field, and the approach can be viewed as an example of novelty detection.
In addition to methods developed specifically to model mammographic texture, generic texture synthesis algorithms may also be useful for mammographic texture synthesis (e.g. [53, 84, 144]). The work presented in this chapter extends the Efros and Leung algorithm to a parametric statistical setting, which allows the method to be used to generate synthetic textures and perform novelty detection.

6.3 Non-parametric sampling for texture synthesis

Efros and Leung describe a method of replicating texture, based upon non-parametric sampling [60]¹. Their algorithm is based upon an idea from Shannon's paper that introduced information theory [163]. Their method extends Shannon's one-dimensional Markov chain approach to producing English-looking text to the image plane, to achieve texture synthesis. Though the method is simple, it produces some of the best results in the texture synthesis literature.

Assume an image containing a sample of the texture one wishes to replicate. This source image, I_S, is simply a matrix of grey-level values. The aim is to fill an unpopulated target matrix, I_T, with grey-level pixel values, such that the textures in the two images are similar (but not identical). Their algorithm is described in Algorithm 5.

Algorithm 5 Efros and Leung's texture synthesis algorithm.
    Select a region from I_S and insert it at some location in I_T. For example, any 3 × 3 pixel section could be used.
    Initially, let the set S = ∅.
    for each pixel p ∈ I_S do
        Extract a square window, s, of size n × n, centred around p.
        Add s to S.
    end for
    repeat
        Compute a list, U, of unpopulated pixel locations that are 8-connected to the populated area of I_T.
        Randomly choose a pixel location, u ∈ U.
        Examine a vector t, formed from a square window of size w × w pixels, centred on u. Some dimensions (pixels) will be populated and some will not.
        Find a small set of windows, S', from S that are similar to t.
        Randomly select one of the windows from S' and place its centre pixel value into I_T at location u.
    until all the pixels in I_T have been populated.

The method essentially has two parameters: the window size and the number of similar windows to place into S'. The first parameter is important for good texture synthesis and is a function of the actual texture. The authors say that the window size should be selected to be similar in size to the largest repeating feature in the texture. The second parameter is automatically adapted by finding the distance, δ, to the most similar window, and then including all windows in S' that are within a radius of (1 + ε)δ from t (Efros and Leung set ε to 0.1 [60]). The method also requires a metric that measures window similarity and takes into consideration the missing (unpopulated) pixels from t. The authors use a normalised sum of squared differences metric, which is weighted by a Gaussian kernel to give more weight to the pixels near the centre of the window and hence encourage local similarity; a sketch of such a measure is given below. The authors also trivially extend the method to work with colour images, although this is not useful for a mammography application.

¹ Efros and Leung's method was developed independently of a similar method presented in [68].
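The following sketch is our reading of the similarity metric described above, with an assumed kernel width sigma (the original paper's parameterisation may differ): a sum of squared differences over the populated pixels only, Gaussian-weighted towards the window centre and normalised by the total weight used.

    import numpy as np

    def window_distance(t, candidate, populated, sigma=2.0):
        # t, candidate: w-by-w grey-level windows; populated: w-by-w boolean
        # mask of the pixels of t that already have values.
        w = t.shape[0]
        ax = np.arange(w) - w // 2
        g = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2.0 * sigma ** 2))
        weights = g * populated                 # ignore unpopulated pixels
        return np.sum(weights * (t - candidate) ** 2) / np.sum(weights)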
Efros and Leung's method does not consider texture analysis, but Efros claims that it could be achieved using k-nearest neighbour classification². Such methods are problematic for two reasons: populating high-dimensional spaces is impractical [14] and finding the k nearest neighbours is computationally demanding.

In a subsequent paper [59], Efros and Freeman address the run-time efficiency of the texture synthesis algorithm, proposing that instead of populating I_T pixel-by-pixel, the texture is synthesised patch-by-patch. Although the fundamental idea of conditional sampling was preserved, the non-parametric approach meant that Efros and Leung's algorithm had to be significantly modified, and a different similarity measure and sampling method were used.

6.4 A generative parametric model of texture

In this section we propose an approach that unifies Efros and Leung's and Efros and Freeman's methods within a parametric statistical framework. Our method will not only enable novelty detection to be performed using statistical inference, but will address two of the problems of the non-parametric approaches: their sampling methods do not truly reflect the statistical distribution of window appearance, and the time complexity of the synthesis algorithms is a function of the number of pixels in I_S.

Our method is similar to those presented by Popat and Picard [142] and Grim and Haindl [74] in that we use a parametric model of the distribution of local textural appearance. However, Popat and Picard's model had a hierarchical configuration which was designed to capture overall texture structure. Grim and Haindl also modelled the distribution of local textural appearance, but a number of such models were used, one for each decorrelated component of the colour space.

We address the first of the above problems by using an explicit representation of the distribution of the appearance of the windows. We address the second problem by moving the burden of iterating over the "training" set to a training stage, meaning that the computational complexity of image synthesis and analysis becomes a function of the model parameters and is largely unrelated to the number of training points. We also address building the model from large training sets.

Our method assumes a training set of a number of images containing examples of the same texture. Centred on each pixel in the training set, we extract windows of size w × w, where w is odd. Windows that overlap the border of their image are discarded. The pixels in each window are concatenated so that the windows may be considered as points in a high-dimensional space (see the sketch below). We seek to model the distribution of these points. The divide-and-conquer algorithm (described in Section 5.6) and the k-means algorithms are used to build a GMM of the distribution. We use the fast non-iterative variant of k-means for the first stage of the divide-and-conquer algorithm, and then the iterative variant to produce the final clustering, and hence the model. We have also built models using the EM algorithm for Gaussian mixtures.

² Personal communication.
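Gathering the training vectors is mechanical; the illustrative sketch below uses explicit loops for clarity, whereas an efficient implementation would use a strided view of the image array.

    import numpy as np

    def extract_windows(images, w):
        # Every w-by-w window (w odd) lying fully inside an image becomes one
        # training point in R^(w*w); border-overlapping windows are discarded.
        rows = []
        for img in images:
            H, W = img.shape
            for r in range(H - w + 1):
                for c in range(W - w + 1):
                    rows.append(img[r:r + w, c:c + w].ravel())
        return np.asarray(rows)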
We now describe how the model can be used to generate new examples of the modelled texture.

6.5 Generating synthetic textures

We have developed two algorithms for texture synthesis, which are parametric analogues of the non-parametric methods used by Efros and Leung and by Efros and Freeman. Each assumes a Gaussian mixture model of a particular class of texture, parameterised by Θ, as described in Section 6.4. The algorithms are presented in Algorithm 6 and Algorithm 7. Like the Efros and Leung and Efros and Freeman algorithms, we define a target image, I_T, whose pixels are initially unpopulated. We describe the two algorithms in the following sections.

6.5.1 Pixel-wise texture synthesis

Algorithm 6 is analogous to the Efros and Leung synthesis algorithm.

6.5.2 Patch-wise texture synthesis

Algorithm 7 is analogous to the Efros and Freeman synthesis algorithm. The algorithm is the same as the pixel-wise algorithm, except that the marginalisation step is removed and the sample from the conditional model contains pixel values for the remainder of the window, rather than individual pixels. Because the image is filled patch-by-patch, synthesis is performed significantly faster than by the pixel-wise algorithm.
Algorithm 6 Pixel-wise texture synthesis with a Gaussian mixture model of local textural appearance.
    Select a region from one of the training images and insert it at some location in I_T. For example, any 3 × 3 pixel section could be used.
    repeat
        Compute a list, U, of unpopulated pixel locations that are 8-connected to the populated area of I_T.
        Randomly choose a pixel location, u ∈ U.
        Examine a vector t, formed from a square window of size w × w pixels, centred on u. Some dimensions (pixels) will be populated and some will not.
        Marginalise the GMM over all dimensions corresponding to unpopulated pixels (not including the centre pixel), as described in Section 5.5.1. This yields a Gaussian mixture model parameterised by Θ'.
        Condition the model with parameters Θ' on the values of the populated pixels, at the corresponding dimensions, as described in Section 5.5.2. This yields a Gaussian mixture model parameterised by Θ*. This conditional model represents the likely distribution of pixel values for the centre pixel, given the local populated pixels.
        Sample a value, p, from the model parameterised by Θ*.
        Insert the value p into I_T at location u.
    until all the pixels in I_T have been populated.
Algorithm 7 Patch-wise texture synthesis with a Gaussian mixture model of local textural appearance.
    Select a region from one of the training images and insert it at some location in I_T. For example, any 3 × 3 pixel section could be used.
    repeat
        Compute a list, U, of unpopulated pixel locations that are 8-connected to the populated area of I_T.
        Randomly choose a pixel location, u ∈ U.
        Examine a vector t, formed from a square window of size w × w pixels, centred on u. Some dimensions (pixels) will be populated and some will not.
        if the window overlaps the edge of the image then
            Marginalise the model over the dimensions that lie outside I_T.
        end if
        Condition the Gaussian mixture model on the values of the populated pixels, at the corresponding dimensions, as described in Section 5.5.2. This yields a Gaussian mixture model parameterised by Θ*. This conditional model represents the likely distribution of pixel values for the unpopulated pixels in the window around the pixel with location u.
        Sample a vector, p, from the model parameterised by Θ*. In the case that Θ* represents a univariate model, p will be a scalar.
        Insert the values p into I_T to populate the remainder of the window centred on location u.
    until all the pixels in I_T have been populated.
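The core of both algorithms is conditioning the mixture on the populated pixels. The sketch below reuses the hypothetical condition_gaussian, gaussian_pdf and sample_gmm helpers sketched in Chapter 5 (it is not self-contained without them, and is illustrative rather than the thesis implementation); Equation 5.35 supplies the conditional component probabilities.

    import numpy as np

    def conditional_gmm(weights, means, covs, idx_free, idx_pop, x_pop):
        # Condition each component on the populated pixel values (Section
        # 5.5.2) and reweight the components by p(x_pop | i) via Bayes'
        # theorem (Equation 5.35).
        new_w, new_m, new_c = [], [], []
        for w_i, mu, cov in zip(weights, means, covs):
            mu_c, cov_c = condition_gaussian(mu, cov, idx_free, idx_pop, x_pop)
            lik = gaussian_pdf(x_pop, mu[idx_pop], cov[np.ix_(idx_pop, idx_pop)])
            new_w.append(w_i * lik)
            new_m.append(mu_c)
            new_c.append(cov_c)
        new_w = np.asarray(new_w)
        return new_w / new_w.sum(), new_m, new_c

Sampling this conditional mixture (e.g. with sample_gmm) yields the centre-pixel value of Algorithm 6, or the whole unpopulated remainder of the window in Algorithm 7.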
6.5.3 The advantages and disadvantages of a parametric statistical approach

As described in Section 6.4, the Efros and Leung and Efros and Freeman sampling methods do not truly reflect the statistical distribution of window appearance. The parametric model that we propose can easily be sampled in a principled manner.

Because we have taken a statistical approach, we have been able to address the run-time efficiency of the pixel-wise algorithm, providing a natural extension of the method to a patch-wise algorithm. Thus, our approach is a parametric generalisation of the Efros and Leung and Efros and Freeman algorithms.

As we shall see in Section 6.7, the parametric approach allows us to perform novelty detection by assigning a likelihood to each pixel, which would be problematic and computationally expensive using a non-parametric approach.

One can view the aim of texture synthesis as follows: to produce original examples of an existing texture, which are both specific and general, i.e. the generated textures are similar to, and span the range of, the textural appearance observed in the set of example textures. The non-parametric methods use direct copying of pixel values in an attempt to achieve specificity, and ad hoc sampling methods in an attempt to achieve generality. In contrast, the parametric method we propose attempts to achieve both specificity and generality by sampling from the learned distribution.

A potential drawback of the parametric approach is that synthesised pixels or patches will not necessarily exist in the training set (they are sampled from a model, rather than copied from legal examples). While illegal pixel values or patches cannot be synthesised by the non-parametric approach, they are simply unlikely under the parametric method.

6.6 Some texture models and synthetic textures

Having described the theory behind our parametric texture model, we now show synthesis results for two mammographic textures. The first is a "fractal"³ texture which is generated using a simple procedure. This texture is similar to mammographic parenchymal patterns. Unlike real mammographic texture, these fractal textures are stationary, and so the key assumption of our model is well-matched to the properties of the texture. The fractal texture served as a useful "sanity check" while developing the model. The second set of textures comprises regions taken from real digitised mammograms.

6.6.1 A model of fractal mammographic texture

The recursive procedure for generating the fractal textures is shown in Algorithm 8⁴. For the training images used to build the model presented in this section, the initial size of the image was 4 × 4 pixels, and the algorithm was run until the image was 256 × 256 pixels. Example training textures are shown in Figure 6.2.

³ We refer to this type of texture as being fractal-like because of the generation process, which involves applying the same algorithm at a number of scales. It is the generative process that is self-similar, rather than the final texture.
⁴ Implementation by Arjun Viswanathan.
Algorithm 8 Fractal mammographic texture algorithm.

  An n × n grey-scale image matrix is initialised with random pixel values, sampled uniformly on [0, 1].
  repeat
    The function underlying the image is interpolated to form a new image matrix with four times the number of pixels (i.e. each of the pixels in the previous image corresponds to four pixels in the new image).
    Each pixel value is perturbed by adding uniform random noise, sampled uniformly on [0, 1] and scaled by 2^{-i}, where i is the iteration number.
  until the image reaches a predefined size.

A Gaussian mixture model of the fractal texture was built using the approach described in Section 6.4, using 10 training textures generated by 10 runs of Algorithm 8, a window size of 11 × 11 and 50 model components (though 10 were discarded by the k-means algorithm due to weak support). Some unconditional sampled patches are shown in Figure 6.1. Examples of synthetic textures generated from the model using both pixel- and patch-wise sampling (as described in Section 6.5) are shown in Figure 6.2.

6.6.2 A model of real mammographic texture

A Gaussian mixture model of the real texture was built using the approach described in Section 6.4, using 10 training textures that were manually selected from the Digital Database for Screening Mammography [83] to represent the range of real mammographic textural variation, a window size of 11 × 11 and 50 model components (again, 10 were discarded by the k-means algorithm due to weak support). Some unconditional sampled patches are shown in Figure 6.3.
Figure 6.1: Unconditional samples from the fractal model.
196 samples from the model of fractal texture. For this figure, all model components were equally likely to be sampled from.
Figure 6.2: Fractal training and synthetic textures.
Top row: Three training images. Middle row: Synthetic textures produced using the pixel-wise algorithm. Bottom row: Synthetic textures produced using the patch-wise algorithm.
Figure 6.3: Unconditional samples from the real mammographic texture model.
196 samples from the model of real mammographic texture. For this figure, all model components were equally likely to be sampled from.

Examples of synthetic textures generated from the model using both pixel- and patch-wise sampling (as described in Section 6.5) are shown in Figure 6.4.

6.6.3 The quality of the synthetic textures

It is relatively easy to qualitatively assess the quality of the synthetic textures, simply by comparing them to the training textures (a more quantitative evaluation is described in Section 7.2).
Figure 6.4: Real training and synthetic textures.
Top row: Three training images. Middle row: Synthetic textures produced using the pixel-wise algorithm. Bottom row: Synthetic textures produced using the patch-wise algorithm.
In the case of the fractal texture, it is subjectively clear that the pixel-wise method produces very good synthetic textures, while the patch-wise method produces much less convincing results. Except for the fractal mammographic texture, the synthetic textures that we have generated using the patch-wise algorithm have been subjectively very similar to those produced by the pixel-wise algorithm (where the same model was used by each algorithm). It is not clear why the fractal patch-wise textures are so poor, but detailed work to determine this is beyond the scope of this thesis.

In the case of the real mammographic textures, both the pixel-wise and patch-wise methods produce reasonable results, but the synthetic images are easily distinguishable from the training images. The synthetic real mammographic textures do capture the local textural appearance of the training images, but the overall appearances are subjectively quite dissimilar. The most likely explanation for this is that structure exists in mammograms on a number of levels; the texture will be determined by local tissue type (e.g. glandular, fatty) and higher-level structure (such as a duct). The high-level structure breaks the assumption of stationarity.

It is possible for two areas in synthetic images to develop independently before ultimately converging. If these two areas have different textural appearances, then pixels or patches that are synthesised where the areas meet are forced to merge one type of textural appearance into another. This can cause a discontinuity. We have not investigated strategies that may prevent this behaviour, but such work may yield better synthetic textures. More extreme examples of this type of failure are shown in Figure 6.5. When the texture appears to have been adequately
Figure 6.5: Examples of synthesis failure using patch-wise synthesis with a model of real mammographic appearance.

modelled, we estimate that failures of this type occur in fewer than 1 in 20 attempts. Because all parts of the appearance space have non-zero density, it is also possible for the synthesis procedure to transition to and "get stuck" in a part of the appearance space which is illegal. This results in incorrect texture being generated. The frequency of such failure may be reduced by learning a "better model"—this is possible because the k-means and EM algorithms converge to locally optimal solutions. Determining if a particular model is the best is still an open research question. We estimate that failures of this type occur in fewer than 1 in 10 attempts.

6.6.4 Time and space requirements of the parametric method

The time required to build a parametric model of textural appearance depends upon the number and size of the training images. Using all the training windows in a set of training images with the divide-and-conquer algorithm is computationally expensive. For example: building a model of 10 images, each an average
of 300 × 300 pixels, can take a few days. Using a subset of the training set containing 20 000 training windows and building the model using the EM algorithm, a model can be built in around 12 hours[5].

Storing the model is trivial on a modern workstation. The models of fractal and real mammographic texture have—by coincidence—the same number of model components, and each uses 11 × 11 pixel windows. To encode such a model requires storing 40 component probabilities, mean vectors and covariance matrices. The matrix of mean vectors has 40 × 121 elements and each covariance matrix has 121 × 121 elements. We therefore need to store 40 + (40 × 121) + 40 × (121 × 121) = 590 520 parameters. If double precision representation (IEEE Standard 754 [100]) is used to encode these parameters, then each parameter requires 8 bytes, and so the model can be stored in 4 724 160 bytes—less than 5MB—without compression.

Storing this model in an uncompressed form consumes more space than storing the original images (since pixel values are usually represented using relatively low precision and compression is commonly used). However, because the size of the model is fixed, synthesis or analysis of each pixel can essentially be performed in O(1) time, while the non-parametric methods are required to iterate over each possible window in the "training" set. Marginalisation is computationally cheap, while computing a conditional distribution—which must be done for each pixel or patch—is relatively expensive. Profiling reveals that approximately 98% of the parametric synthesis algorithms' time is spent computing Moore-Penrose generalised inverses.

[5] These figures are for a workstation with a 1.3GHz Intel Pentium 4 processor with 512MB of physical memory.
Using our optimised implementation (see Section 5.5.2), each pixel or patch takes approximately 0.22 seconds to generate on a computational server with a 2.8GHz Intel Xeon Hyperthreaded processor with 2GB of physical memory[6]. Using the pixel-wise algorithm, a 300 × 300 pixel image can be synthesised in 5.5 hours, while an image of the same size can be synthesised in a few minutes using the patch-wise algorithm.

6.7 Novelty detection

Because we have an explicit statistical model of the appearance of local texture, it is possible to assign likelihoods to pixels, based upon a local neighbourhood. Pixels marked as unlikely should be interpreted as being novel. The novelty detection algorithm is very similar to the pixel-wise synthesis algorithm and is described in Algorithm 9. We assume an unseen image IU which may contain texture that is not of the modelled class and a model of the expected texture with parameters Θ. We will form an image of log-likelihoods IL and a binary image indicating novel pixels IB.

Note that the likelihood of an event is the probability of the event had it not actually occurred (i.e. probabilities refer to future events, while likelihoods refer to past events). In order to compute true likelihoods (or log-likelihoods), the pdf defined by the conditional model would need to be integrated between suitable limits. Since the conditional pdf—given by p_{Θ*}(x)—is univariate, this is relatively simple:

    L(a) = \int_{a - r^-}^{a + r^+} p_{\Theta^*}(x) \, dx    (6.1)

[6] This machine was shared with one other large computational job.
where r^- and r^+ delimit the event and may be estimated from the expected noise on the pixel value a. If we assume that the noise is constant then we may set r^- = r^+ = Δ/2 (where Δ defines a region around the pixel value a). If Δ is suitably small, then Equation 6.1 can be approximated by

    L(a) \approx \Delta \, p_{\Theta^*}(a) .    (6.2)

We can use the conditional density at a as a proxy for the likelihood estimated by Equation 6.2, as it is simply a scaling of Equation 6.2. If actual likelihoods are required (for example by another system), then IL can be scaled using the estimate of Δ. In the novelty detection work in this thesis, we use the conditional density at a as our likelihood measure.

Algorithm 9 Novelty detection using a Gaussian mixture model of texture.

  for each pixel location p ∈ IU do
    Extract a square window, of size w × w, represented as a vector t, centred around the pixel at location p.
    if the window overlaps the edge of the image then
      Marginalise the model over the dimensions that lie outside IU.
    end if
    Condition the model upon all values in t, except for the centre pixel. Let the resulting univariate model be parameterised by Θ*.
    Compute the log-likelihood, l, of the centre pixel value under the model parameterised by Θ* (see the text for more details).
    Assign l to the pixel at location p in IL.
  end for
  In addition to the log-likelihood image, produce a binary image IB which identifies novel pixels using a threshold on the log-likelihoods (e.g. learned using an independent training set).
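A sketch of Algorithm 9, again for a single Gaussian window model and reusing the illustrative conditional_gaussian helper from the synthesis sketch earlier in this chapter (the mixture case sums the re-weighted conditional component densities). Border pixels, which the thesis handles by marginalisation, are simply skipped here:

    import numpy as np
    from scipy.stats import norm

    def log_likelihood_image(I, mu, Sigma, w):
        h = w // 2
        L = np.full(I.shape, np.nan)   # I_L; borders left unassigned
        centre = (w * w) // 2          # index of the centre pixel in t
        known = np.delete(np.arange(w * w), centre)
        for y in range(h, I.shape[0] - h):
            for x in range(h, I.shape[1] - h):
                t = I[y - h:y + h + 1, x - h:x + h + 1].ravel()
                _, mu_c, Sigma_c = conditional_gaussian(mu, Sigma, known, t[known])
                # univariate conditional density at the centre pixel value,
                # used as the likelihood proxy described in the text
                scale = np.sqrt(max(float(Sigma_c[0, 0]), 1e-12))
                L[y, x] = norm.logpdf(t[centre], mu_c[0], scale)
        return L

    # I_B: threshold learned using an independent training set
    # B = L < threshold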
It is natural to ask if an analogue of the patch-wise synthesis algorithm could be used to efficiently perform novelty detection. Unfortunately, the density at any point in a high dimensional space is vanishingly small, so a patch-wise novelty detection algorithm cannot be used[7].

6.8 Summary

This chapter presented a parametric statistical model of stationary texture, developed from the texture synthesis algorithm of Efros and Leung. In summary:

  • Efros and Leung's texture synthesis algorithm was described.
  • A parametric version of Efros and Leung's algorithm was developed.
  • Two methods of generating synthetic textures using the model were developed. Our model can be viewed as a parametric generalisation of the methods of Efros and Leung and Efros and Freeman.
  • Synthetic textures that were generated using our model were presented and discussed.
  • A novelty detection method was developed that allows the parametric model to be used to analyse textures.

[7] It may be possible to compute a few likelihoods at once, as a compromise between the two extremes, but this has not been investigated further.
Chapter 7

Evaluating the texture model

7.1 Introduction

This chapter presents an evaluation of the parametric texture model for texture synthesis and analysis. The chapter describes:

  • A psychophysical evaluation of synthetic mammographic textures produced using the model.
  • An evaluation of how well the model can detect abnormal features in simulated and real mammographic images.
7.2 Psychophysical evaluation of synthetic textures

It is relatively easy to make a personal qualitative assessment of whether a pair of textures are similar or not. However, this approach is subjective and qualitative; an objective and quantitative approach is preferred. Few of the most frequently cited papers in the texture modelling and synthesis literature present any such evaluation (e.g. [53, 59, 60, 84, 144]). Little rigorous evaluation appears to be attempted. Brettle et al. evaluated methods for synthesising textures from medical images (including mammographic textures) using several texture measures [27]. The synthetic images generated by Efros and Leung's original method [60] were found to be most realistic. Although texture features provide an objective and quantitative measure of textural properties, the best systems available to compare textures are evolved biological vision systems, such as the human visual system. Psychophysical experiments can allow the human visual system to be used objectively and quantitatively. We now present a psychophysical experiment that evaluates the synthetic textures produced using our model.

7.2.1 Aims

The primary aim of this experiment was to determine if textures generated using the parametric model of local texture can be differentiated from examples of the real texture. The secondary aim was to compare synthetic textures generated using the parametric model to those generated using Efros and Leung's method.
Since the patch-wise synthetic images can easily be differentiated from the real textures, we restrict ourselves to the pixel-wise images. We can therefore state our experimental hypotheses:

  1. Synthetic fractal mammographic textures generated by the parametric model are indistinguishable from real fractal mammographic textures.
  2. Synthetic real mammographic textures generated by the parametric model are indistinguishable from real mammographic patches.
  3. Synthetic fractal mammographic textures generated by the parametric model are more like real fractal mammographic texture than those produced using Efros and Leung's method.
  4. Synthetic real mammographic textures generated by the parametric model are more like real mammographic patches than those produced using Efros and Leung's method.

7.2.2 Method

The forced-choice paradigm is well-suited to the process of comparing a pair of textures. Participants were presented with a series of three textures which we shall call Image A, Image B and a reference image.

In the case of experiments 1 and 2, the reference image was an example of a real texture selected from the training set, Image A was a synthetic texture generated using the parametric model and Image B was an example of a real texture selected from the training set (though different to the reference image).
In the case of experiments 3 and 4, the reference image was an example of a real texture selected from the training set, Image A was a synthetic texture generated using the parametric model and Image B was a synthetic texture generated using Efros and Leung's method. Note that there was an important difference between the way that the Image A synthetic textures and Image B synthetic textures were generated in experiments 3 and 4: Image A textures were generated from a model that was trained on a number of mammographic images, while each Image B texture was generated from a single "training" image using Efros and Leung's method. This was necessary because the Efros and Leung algorithm scales poorly with the number of "training" pixels. It was expected that this would result in the Image B textures being highly specific to the image they were generated from, and so appearing more consistent and "plausible".

For each set of three images, all were of the same class of texture. The images were arranged in a row with the reference image in the centre. Image A would appear to the left of the reference image with probability 0.5, and the position of Image B was set accordingly. Trials corresponding to the four hypotheses were presented in a random order, so that participants could not easily guess the exact experimental design and introduce bias into their responses. The participants were asked to compare Image A and Image B to the reference image and choose the one they thought to be most similar to the reference image.

Each of the three images could be drawn from a set of 10 images (e.g. there were 10 synthetic reals generated using our method, 10 synthetic fractals generated using our method and 10 real mammographic images). The images in the training sets—and hence the reference image sets—were manually selected such that the
sets represented a broad range of textural appearance for the class of texture being investigated. No synthesised images were excluded (e.g. on the basis that synthesis failure occurred). Each participant was shown 10 image sets for each experiment. The number of images of each type was limited by the computational time required to synthesise the four classes of synthetic image, and the number of images presented to each participant was selected such that the experiment could be completed within a reasonably short space of time (approximately 5 minutes).

Ideally, the experiment would have been conducted using more reference and synthesised textures. This would minimise the probability of the participants seeing the same images (or combinations thereof) and could more accurately reflect the distribution of the various "types" of textural appearance. However, we believe that the design achieves these aims to the maximum extent possible under the time constraints imposed by the synthesis algorithms and the participants' patience.

The experiment was implemented as an Internet-based application, delivered via an XHTML [138] interface. Image A and Image B were hyperlinks that reported the participants' choices to the application. The names of the image files were disguised as "random" strings of text, so that web browser software could not disclose the "correct" image by displaying the image filenames on-screen. The hyperlink encoding of the participants' selections was similarly disguised. The responses were recorded in a database upon completion of the experiment (i.e. only results from those who completed the experiment were recorded). A screenshot of one of the trials is shown in Figure 7.1. The number of times Image A and Image B were chosen was recorded for each participant, allowing χ² analysis by
Figure 7.1: A screenshot of one of the trials.

pooling all participants.

The experiment was run twice. The aim of the first run was to test the application using a small number of participants. The experiment was advertised in an email to all members of the Division of Imaging Science and Biomedical Engineering at the University of Manchester. The first run of the experiment attracted 24 participants. The aim of the second run was to get as many people as possible to take part. The experiment was advertised to all students—undergraduates and postgraduates—of the University of Manchester via email. The second run of the experiment attracted 1 777 participants. Participants were therefore self-selecting, and we did not control for factors such as age and sex.
Table 7.1: Results for the psychophysical experiment.

  Experiment   Image A selection (small run)   Image A selection (large run)
  1            29% (of 240 trials)             34% (of 17 770 trials)
  2            27% (of 240 trials)             28% (of 17 770 trials)
  3            38% (of 240 trials)             41% (of 17 770 trials)
  4            25% (of 240 trials)             26% (of 17 770 trials)

Row 1: Synthetic fractal textures generated by our model versus real fractal textures. Row 2: Synthetic real mammographic textures generated by our model versus real mammographic textures. Row 3: Synthetic fractal textures generated by our model versus those generated using Efros and Leung's algorithm. Row 4: Real mammographic textures generated by our model versus those generated using Efros and Leung's algorithm.

7.2.3 Results

Results for the small and large runs are shown in Table 7.1. The table shows the number of "votes" for Image A (the synthetic textures generated using our parametric model) as a percentage of the total for each of the four experimental conditions (refer to Section 7.2). Image B was selected more often in all cases, and this result is statistically significant at the 95% significance level.
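As an illustration of the pooled χ² analysis mentioned above, the counts below are recomputed from the percentages in Table 7.1 for experiment 1, large run, and are therefore approximate rather than the thesis's exact per-participant tallies:

    from scipy.stats import chisquare

    n = 17770                            # trials in the large run
    n_a = round(0.34 * n)                # Image A selections (approximate)
    chi2, p = chisquare([n_a, n - n_a])  # expected counts default to 50:50
    print(chi2, p)                       # p << 0.05: the preference for
                                         # Image B is statistically significant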
  • Chapter 7—Initial validation of the novelty detection method 197about a third of the time. Subjectively, the simulated fractal mammographictextures appear to be modelled more successfully than the real mammographictexture, and the results support this observation.In experimental conditions 3 and 4, we would have liked Image A to have been se-lected in preference to Image B (i.e. more than 50%). This would have suggestedthat participants thought that the synthetic images generated using the paramet-ric model were more like the real textures than the synthetic images generatedusing Efros and Leung’s method. The results indicate that participants were ableto differentiate between the synthetic images generated using the two methods.As suspected, participants favoured the synthetic images generated using Efrosand Leung’s method, but the images generated using the parametric method werepreferred in 41% (fractal mammographic texture) and 26% (real mammographictexture) of cases. However, this experimental condition was heavily biased infavour of Efros and Leung’s method because of the difference in the way that thetraining set was utilised by the two methods. Efros and Leung’s method producesmore specific textures but it cannot be used to analyse textures.7.3 Initial validation of the novelty detection methodIn order to use the novelty detection method developed in Chapter 6, we need tobe confident that it can perform texture discrimination. To validate the method,a simple experiment was performed.
Figure 7.2: Fractal and scrambled textures.
A fractal texture is shown on the left. The right-hand texture is the left-hand fractal texture after being scrambled. The grey-level histograms of both textures are identical.

7.3.1 Aim

The aim of this experiment is to determine if the novelty detection method can discriminate between two textures with similar characteristics.

7.3.2 Method

A fractal mammographic image was generated. A second image was generated from the first by scrambling the pixel locations. The resulting image has exactly the same histogram as the fractal image, but has a different texture. An example is shown in Figure 7.2. Log-likelihood images were generated for each image by applying Algorithm 9. The log-likelihood values obtained for each image were then compared.
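The scrambling step amounts to a random permutation of the pixels, which preserves the grey-level histogram exactly while destroying the spatial structure. A one-line sketch, assuming a fractal image from the generator sketched in Chapter 6:

    import numpy as np

    rng = np.random.default_rng()
    scrambled = rng.permutation(fractal.ravel()).reshape(fractal.shape)
    # identical histograms: the sorted pixel values are equal
    assert np.array_equal(np.sort(scrambled.ravel()), np.sort(fractal.ravel()))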
Figure 7.3: ROC curve for texture discrimination.

7.3.3 Results

Figure 7.3 shows a ROC curve that was generated by varying a threshold on the log-likelihood values to classify pixels as belonging to the fractal image or the scrambled image. The ROC curve shows excellent discrimination (Az = 0.98). Analysis of the log-likelihood histograms shows that pixels in the scrambled image are considered to be less likely than those in the fractal image.
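The ROC analysis pools the per-pixel log-likelihoods from the two images and varies a single threshold. A sketch, assuming log-likelihood images from the Algorithm 9 sketch in Chapter 6 (scikit-learn is used here for the area under the curve; the thesis does not name its ROC implementation):

    import numpy as np
    from sklearn.metrics import roc_auc_score

    lf = L_fractal[~np.isnan(L_fractal)]       # fractal pixels, label 1
    ls = L_scrambled[~np.isnan(L_scrambled)]   # scrambled pixels, label 0
    scores = np.concatenate([lf, ls])
    labels = np.concatenate([np.ones(lf.size), np.zeros(ls.size)])
    Az = roc_auc_score(labels, scores)         # approximately 0.98 here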
7.3.4 Discussion

The results show that the novelty detection method can discriminate on the basis of local textural appearance (rather than just pixel intensity, for example). This is important because although mammographic abnormalities often appear to be brighter than their surrounding tissue, the pixel values are often within the range of those for normal tissue. The novelty detection method should also function as a "brightness detector"—regions that are unusually bright (or dim) should be considered unlikely by our model.

7.4 Evaluation of novelty detection performance

7.4.1 Introduction

We performed a number of novelty detection experiments on abnormal mammographic patches. Two types of mammographic texture were used: simulated abnormal mammographic textures (based on the fractal textures) and patches from real mammograms. Two classes of abnormality were used for each type of texture: masses and calcifications. In the case of the fractal textures, these abnormalities were simulated.

7.4.2 Aims

The aim of these experiments was to determine whether a single model of mammographic textural appearance (for a particular class of texture) can be used to detect different forms of abnormality (where a conventional pattern recognition algorithm approach would require multiple classifiers and probably multiple types of feature descriptor).

The experiments were designed to answer the following questions:
  1. How well can abnormalities be detected in simulated mammographic textures where the textures contain simulated calcifications?
  2. How well can abnormalities be detected in simulated mammographic textures where the textures contain simulated masses?
  3. How well can abnormalities be detected in simulated mammographic textures where the textures contain both simulated masses and calcifications?
  4. How well can abnormalities be detected in real mammographic textures where the textures contain real calcifications?
  5. How well can abnormalities be detected in real mammographic textures where the textures contain real masses?
  6. How well can abnormalities be detected in real mammographic textures where the textures contain both real masses and calcifications?

7.4.3 Method

For each experiment, a set of images was generated or collected which contained the desired type, or types, of abnormality. We describe below how the simulated abnormalities were generated. The real microcalcification patches were selected by a colleague from a local database (as pixel-level expert annotation was available) on the basis that the set should represent a broad range of appearances of that class of abnormality. The other real data was pseudo-randomly selected from the Digital Database for Screening Mammography (DDSM) [83].
Because the analysis process is computationally expensive, the real mammographic images were processed at low resolution (150 µm), using a model trained on 10 pathology-free patches from the DDSM (scaled to the same resolution), selected in the same way as above. Each test set contained 10–20 regions of interest. The size of the test sets was limited due to the computational time required to perform the analysis task.

In the case of the simulated abnormalities, groundtruth images were automatically generated. In the case of the real mammographic data, the groundtruth was provided by a digital mammography researcher[1]. Care was taken during the annotation to ensure that the groundtruth was as detailed as possible, rather than simply marking the centres of abnormalities or providing coarse indications such as circles that contain the abnormalities. In the case of the microcalcification images, for example, each microcalcification was individually annotated.

We did not include separate normal images for analysis alongside the simulated images or the real microcalcification images, for computational expediency. The groundtruth annotations were interpreted strictly: we considered "hits" on pixels labelled as abnormal to be true positive detections, "hits" on pixels labelled as normal to be false positive detections, and so on for the true negative and false negative possibilities. Relative to the majority of results published in the literature, this interpretation of groundtruth produces a pessimistic evaluation of a detection system, because a "hit" close to an abnormal feature would be likely to draw a clinician's attention to the area and so would be clinically useful[2].

[1] Michael Board, a third year digital mammography PhD student in the Division of Imaging Science and Biomedical Engineering at the University of Manchester.
[2] In the computer-aided mammography literature it is common to consider a single correct "hit" in a coarsely annotated abnormal region to be a true positive detection of that region, irrespective of its absolute location or incorrect "hits" or "misses" within that region.
However, we believe that our detection criteria are appropriate for experiments on relatively small test images because we want to measure how well the method detects specific indicative signs of abnormality, rather than measuring how well the method would alert clinicians to the presence of abnormality (which would presumably result from accurate detections of abnormal features). The strict interpretation of groundtruth treats each pixel in the test images as a separate data point, delivering a large sample from a seemingly small test set. This allows the area under the ROC curve to be accurately estimated (i.e. with small standard error—see Section 7.4.4). However, results from experiments that use a small number of images are obviously less representative than those from experiments that use a large number of images.

Each pixel in each test image was assigned a log-likelihood—using the appropriate model—as described in Section 6.7. ROC analysis was performed on each set of results by thresholding the log-likelihoods, and the resulting classifications were compared to the groundtruth annotation.

For the real mass images, we found that pixels labelled as being masses were given very similar log-likelihoods to the surrounding non-mass pixels, to the extent that no discrimination could be achieved. This was either because our approach fails on this class of image, or because it is unrealistic to consider tissue close to a mass to be normal (e.g. it may be distorted by the presence of the mass). For this class of image, we also analysed a set of 10 pathology-free images, and considered all pixels in an abnormal image to be abnormal, and all pixels in a normal image to
be normal.

Generating synthetic calcifications and masses

The simulation of mammographic abnormalities has been investigated previously. Highnam et al. investigated adding simulated and real masses to mammograms represented using hint (see Section 3.3 for details of hint) [87]. Simulated masses were generated by inferring 3-D models of 2-D mass shapes (obtained, for example, from annotations of real masses). The hint values of the real masses were estimated by subtracting the average hint value of the surrounding non-mass region from those of the mass region. In each case, the estimated mass hint values were then simply added to normal hint mammograms. Caulkin et al. modelled the appearance of spiculated masses by estimating the contribution of the normal tissue to the abnormal region and then learning statistical models of the contributions due to the central mass and spicules [38]. A model of spicule placement and number was also learned. Simulated spiculated lesions were generated by sampling from the models and adding the simulated abnormalities into normal mammograms. Claridge and Richter modelled the cross-sectional profile of masses by convolving a step-edge function with a Gaussian kernel and then rotating the resulting function to form a surface, where the height is proportional to the attenuation due to the mass [44] (a similar approach is described in more detail below). Bliznakova et al. modelled masses, spicules and microcalcifications within a 3-D breast model (see Section 9.2.2 for a more detailed description of the approach) [16].
In our work, both the simulated calcification and mass images were based upon the fractal mammographic textures described in Section 6.6.1. Fractal backgrounds were generated and simulated calcifications or masses were introduced using an additive process, mimicking the attenuation process. We describe how each type of abnormality was modelled below.

The simulated microcalcifications were modelled by ellipses, rotated to random angles and placed in clusters. The number of calcifications in each image was fixed at 30. Algorithm 10 describes how the microcalcifications were simulated and how the groundtruth was generated. Note that although the simulated microcalcification shape and spatial distribution were modelled in an ad hoc way (though broadly consistent with real data), microcalcification brightness was modelled to be consistent with real data.

Real mammographic masses can have well-defined borders, diffuse borders or spiculations. We decided to model masses with diffuse borders because well-defined borders may be too easy to detect, while spiculated lesions would be harder to model when only a simple simulation of abnormality is required.

One might model a mass in a breast as being a sphere of uniform density, situated within the normal breast tissue and distorted by the compression of the breast between two plates. While a detailed analytical model of the problem could be derived, a reasonable approximation that yields suitable test images would be acceptable. We experimented with three methods of simulating masses. All the methods modify the fractal background pixel values within a disc, and differ in how the abnormal pixel values are modelled.
Algorithm 10 Simulating microcalcification clusters.

  Generate a fractal texture using Algorithm 8.
  Determine the centre of the image and consider the image edges to be 6 standard deviations from the centre. Select locations for the calcifications using the resulting bivariate normal distribution.
  for each calcification to be generated do
    Generate a 100 × 100 pixel disc.
    Warp the disc to a random elliptical shape (with a mean eccentricity of 2 and associated variance of 0.5).
    Rotate the ellipse to a random angle.
    Convolve the ellipse with a Gaussian kernel (with a standard deviation of 10 pixels), to remove the hard edges.
    Resize the ellipse to be 4 pixels long along its major axis.
    Normalise the ellipse so that its maximum values are set to unity, and the other pixels are scaled accordingly.
    Scale the simulated microcalcification pixel values such that when added to the image, the ratio of the mean calcification pixel value to the mean fractal background pixel value is normalised to the same ratio for real mammograms containing microcalcifications.
    Insert the calcification into the fractal image by adding the pixel values of the calcification to the background image pixel values.
  end for
  Compute the groundtruth image by subtracting the calcification image from the original fractal image and then threshold at a low value to discard the effect of the convolution with the Gaussian kernel.
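For illustration, the cluster-placement step of Algorithm 10 might look as follows (the ellipse warping, rotation and scaling steps are omitted; the function and parameter names are hypothetical):

    import numpy as np

    def calcification_centres(shape, n=30, rng=np.random.default_rng()):
        centre = np.array(shape) / 2.0
        sd = np.array(shape) / 12.0   # image edges at 6 standard deviations
        pts = rng.normal(centre, sd, size=(n, 2))
        return np.clip(pts, 0, np.array(shape) - 1).astype(int)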
Our first method adds a two dimensional Gaussian to the fractal background. This method produces very diffuse borders. An example of such an image is shown in Figure 7.6a.

Our other two methods use concentric discs, where the central disc has uniform pixel values and the annulus transitions from the uniform value to zero. The first of these methods assumes that the compressed mass has a cross-section as illustrated in the top graph in Figure 7.4. The cross-section of the annular region is semi-circular with radius k. The function, f(d), that describes the X-ray attenuation in this model (i.e. the depth of the mass) is described by:

    f(d) = \begin{cases} 0 & : \; d > m \\ 1 & : \; d < m - k \\ \sqrt{1 - \left( \frac{d - (m - k)}{k} \right)^2} & : \; \text{otherwise} \end{cases}    (7.1)

where d is distance from the centre of the concentric discs, m is the distance along d of the simulated mass boundary and k is the difference between the radii that describe the two discs. The value of f(d) describes the thickness of the simulated mass, which is a chord that is perpendicular to the d axis when m − k < d < m. We call this method the circle chord method. An example of the simulated mass images produced using the circle chord method is shown in Figure 7.6b.
Figure 7.4: The circle chord attenuation function.
The top graph shows a cross-section of the model of the compressed mass. The bottom graph shows the attenuation function, f(d).

Figure 7.5: The sigmoid attenuation function.

The second variant of the concentric discs model uses a sigmoid function to describe the attenuation in the annular region:

    f(d) = \frac{1}{1 + e^{-\alpha \left( 1 - \frac{d - (m - k)}{k} \right)}}    (7.2)

where d, m and k are as before. The constant α determines the shape of the sigmoid and we use a value of 6. An illustration of the sigmoid attenuation function is shown in Figure 7.5. An example of the simulated mass images produced using the sigmoid method is shown in Figure 7.6c.
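Both annulus profiles are simple to implement. A sketch of Equations 7.1 and 7.2 follows; note that the square root in the circle chord profile is assumed here from the stated semicircular cross-section, and α = 6 as in the text:

    import numpy as np

    def circle_chord(d, m, k):
        u = (np.asarray(d, dtype=float) - (m - k)) / k
        f = np.sqrt(np.clip(1.0 - u ** 2, 0.0, 1.0))  # semicircular edge
        f = np.where(d < m - k, 1.0, f)               # uniform central disc
        return np.where(d > m, 0.0, f)                # outside the mass

    def sigmoid_profile(d, m, k, alpha=6.0):
        u = (np.asarray(d, dtype=float) - (m - k)) / k
        return 1.0 / (1.0 + np.exp(-alpha * (1.0 - u)))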
Figure 7.6: Examples of simulated masses using the three methods.
Gaussian method (a); circle chord method (b); sigmoid method (c).

For each of the methods, the magnitude of f(d) was scaled so that the ratio of the mean mass pixel value to the mean non-mass pixel value was equal to that found in real mass images (i.e. we attempt to accurately model mass brightness). We assessed by inspection that the sigmoid method produces acceptable test images. Because the novelty detection algorithm is computationally expensive, we limited the number of pixels to be analysed by cropping the mass images to contain an equal number of mass and non-mass pixels. The groundtruth was generated by computing the difference image between the original fractal images and the result after adding the synthetic mass.

Although the mass region can be easily identified by eye, the actual pixel values are not necessarily higher than those in the fractal backgrounds. This is the case when, for example, a mass pixel is added to a relatively dark background region. If all of the mass pixel values (after being added to the fractal background) were higher than those in the fractal image (without the presence of a simulated mass), then simply thresholding the images would identify the mass region. However,
this is not the case.

7.4.4 Results

Results for simulated microcalcifications

Figure 7.7(a) shows a simulated microcalcification image and Figure 7.7(b) the corresponding log-likelihood image. Figure 7.7(c) shows the ROC curve for all simulated microcalcification images. The area under the curve is approximately 0.92.

Results for simulated masses

Figure 7.8 shows one of the simulated masses and the corresponding log-likelihood image. Figure 7.8(c) shows the ROC curve for all simulated mass images. The area under the curve is approximately 0.64.

Results for simulated mass and microcalcifications (combined)

Figure 7.9 shows the ROC curve for the experiment where the single novelty detection method is used to detect both simulated masses and microcalcifications (in equal proportions). The area under the curve is approximately 0.75.
Figure 7.7: Example log-likelihood image and ROC curve for simulated microcalcifications.
An example simulated microcalcification image (a); the corresponding log-likelihood image (b); the ROC curve for the simulated calcification images (c). The log-likelihoods range from −276 to −0.5.
Figure 7.8: Example log-likelihood image and ROC curve for a simulated mass.
An example simulated mass image (a); the corresponding log-likelihood image (b); the ROC curve for the simulated mass images (c). The log-likelihoods range from −216 to −0.5.
Figure 7.9: ROC curve for simulated masses and microcalcifications (combined).
Results for real microcalcifications

Figure 7.10 shows a sample microcalcification cluster along with the corresponding groundtruth and log-likelihood images. Figure 7.10(d) shows the ROC curve for all the test images. The area under the curve is approximately 0.56.

Results for real masses

Figure 7.11 shows the ROC curve for the real mass experiment. The area under the curve is approximately 0.54. We do not show sample images (see the discussion of these results in Section 7.4.5).

Results for real mass and microcalcifications (combined)

Figure 7.12 shows the ROC curve for the combined real mass and microcalcification experiment. The area under the curve is approximately 0.53. A hypothesis test at the 95% significance level using the method described by Hanley and McNeil [78] showed that there was a statistically significant difference between the area under the ROC curve and the area under the curve corresponding to random discrimination (i.e. the diagonal line with area equal to 0.5)[3].

[3] It was assumed that the diagonal had the same number of data points as the ROC curve.
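For reference, the Hanley-McNeil standard error of a single area under a ROC curve can be sketched as below; the thesis's test compares two areas, so this is an assumption-labelled illustration of the underlying quantity only (n1 and n0 here denote the abnormal and normal pixel counts):

    import numpy as np

    def hanley_mcneil_se(A, n1, n0):
        Q1 = A / (2.0 - A)
        Q2 = 2.0 * A ** 2 / (1.0 + A)
        var = (A * (1.0 - A) + (n1 - 1.0) * (Q1 - A ** 2)
               + (n0 - 1.0) * (Q2 - A ** 2)) / (n1 * n0)
        return np.sqrt(var)

    # a simple large-sample comparison with the chance diagonal (not
    # necessarily the exact correlated test used in the thesis):
    # z = (0.53 - 0.5) / hanley_mcneil_se(0.53, n1, n0)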
Figure 7.10: Example log-likelihood image and ROC curve for a real microcalcification cluster.
An example real microcalcification image (a); the corresponding groundtruth image (b); the corresponding log-likelihood image (c); the ROC curve for the real microcalcification images (d). The log-likelihoods range from −25 to −0.3.
Figure 7.11: ROC curve for real masses.
Figure 7.12: ROC curve for real microcalcifications and masses (combined).
7.4.5 Discussion

Simulated microcalcifications

As Figure 7.7(b) shows, the synthetic microcalcifications are easily identified, which is reflected in the corresponding ROC curve. There are no false-positives in the normal regions; however, because of the strict pixel-wise evaluation criterion, there are a few false-positives at the microcalcification edges. This is where the sampled window is not centred on a pixel that is labelled as abnormal, but does border an abnormal pixel. The result is that the model is partially conditioned upon abnormal image data—which biases the conditional model—yielding lower log-likelihood values for the centre pixel. We will call this the local bias effect. Note that the log-likelihoods in the abnormal regions of the simulated microcalcifications are lower than for the simulated masses, which corresponds with subjective assertions that microcalcifications are easier to detect than masses.

Simulated masses

The results for the simulated masses are not as good as for the simulated microcalcifications, but Figure 7.8(b) shows that the mass is identified. The annular region of the simulated mass is marked as being more abnormal than the central region. This may be because the model was not trained on images with this sort of intensity change, while the model did see the more uniform texture of similar brightness from the centre of the simulated mass during training. The log-likelihoods for the central area are close to those of the normal background
texture, and this is reflected in the ROC curve in Figure 7.8(c).

Simulated masses and microcalcifications (combined)

Figure 7.9 shows the results of using the same model and method to detect both types of abnormality, and shows that this is possible. Although the data were simulated, this result is important because it shows that it is possible to identify more than one type of abnormality using a single method. Note that Figure 7.9 was constructed from data where the ratio of microcalcification to mass data was equal to unity, so that performance on one type of abnormality did not contribute disproportionately.

Real microcalcifications

The ROC curve in Figure 7.10(d) is disappointing and indicates that the method performs only slightly better than a random classifier (as indicated by the red diagonal line, which represents chance). Microcalcifications are considered to be easy to detect because they are often very bright against the mammographic background. However, as Figure 7.10(a) shows, this is not always true. It appears that the local bias effect may also contribute to the poor performance. The log-likelihoods in the calcified area tend to be lower, but the most "unlikely" pixels do not correspond exactly to the individual microcalcifications—instead they tend to be a few pixels away. There are pixels in the uncalcified tissue which have low log-likelihoods, and are essentially false positives. This may be because the model is not sufficiently specific to pathology-free appearance, because the tissue
really was abnormal (and unannotated) or because it is incorrect to label tissue so close to a microcalcification cluster as being normal.

Real masses

The results for real masses are similar to those for the real microcalcifications: the method performs only slightly better than a random classifier. Because we used separate normal and abnormal test sets, this result adds weight to the hypothesis that the model is not specific enough to pathology-free appearance. In the case of masses, it is unreasonable to expect a small local window to detect abnormality. A better approach might be to adopt a multi-scale approach where likelihoods are propagated downwards, such as was used by Liu et al. [120].

Real masses and microcalcifications (combined)

Although the performance on real data is relatively poor, this result indicates that some discrimination of more than one class of abnormality can be achieved using a single method.

7.5 Summary

This chapter presented an evaluation of the parametric texture model. In summary:
  • A psychophysical evaluation was reported. The experiment was deployed as an Internet-based application. The application was tested by a small number of participants and then advertised to all students at the University of Manchester.
  • The synthetic textures were not indistinguishable from the real textures, but were selected in approximately one third of trials.
  • The synthetic images generated by Efros and Leung's algorithm were considered more realistic than those generated by the parametric model. The textures generated using the parametric model were selected in 26% and 41% of trials. However, the images generated by the Efros and Leung algorithm used a more specific "training" set than was used to train the parametric model. Direct comparison of the two approaches should consider this experimental bias and the ability of the parametric model to analyse images via novelty detection.
  • A novelty detection experiment was reported. Simulated and real microcalcification and mass images were analysed using parametric models. Results for the simulated data show that the novelty detection approach can successfully detect multiple types of abnormality using a single method. Results for the real data show that some discrimination was possible, but significant improvement is needed. This may be achieved by improving the specificity of the model and the adoption of a hierarchical strategy.
Chapter 8

GMMs in principal components spaces and low-dimensional texture models

8.1 Introduction

This chapter presents a method for learning Gaussian mixture models in low-dimensional spaces and describes how the parametric texture model may be improved by doing so. The chapter describes:

  • The motivation for learning in low-dimensional spaces.
  • How principal components analysis can be used to build Gaussian mixture models—and hence our parametric texture model—in a low-dimensional space that approximates the natural space of the data.
  • How textures can be synthesised using such a model.

8.2 Dimensionality reduction

The dimensionality of the texture model described in Chapter 6 is reasonably high. With an 11 × 11 window, for example, the model has 121 dimensions. Given that there is likely to be a high degree of correlation between neighbouring pixels in the windows, it is sensible to ask if this redundancy can be exploited.

Dimensionality reduction can have a number of benefits in statistical modelling. Firstly, because the number of data points required to populate a space with fixed density increases exponentially with the number of dimensions, dimensionality reduction can allow one to populate the space to be modelled more densely for a given size of training set. Secondly, since computations often involve iteration over the number of dimensions in the modelled space, dimensionality reduction may allow us to develop more efficient algorithms.

In the following sections we describe how a Gaussian mixture model can be built in a low-dimensional space and how such a model may be used to perform texture synthesis and analysis.
8.3 Gaussian mixtures in principal components spaces

A set of multivariate measurements, X = {x_i : i = 1, . . . , N}, when thought of as a cloud of points in a vector space, can be considered to have a set of mutually orthogonal axes that describe the main directions of variation. In general, these axes will not be aligned with the regular Cartesian axes (the covariance matrix of the data is unlikely to be diagonal).

Principal Components Analysis (PCA) [104] is a technique that determines these axes (the principal components) and the variance associated with each. The principal components are simply the eigenvectors of the covariance matrix, and the variances are the associated eigenvalues. If we define P to be a matrix where each column is an eigenvector of the covariance matrix, then we can project x_i, a measurement in the natural data space, to a vector b_i in the principal components space and back again:

    b_i = P^T (x_i - \bar{x})    (8.1)
    x_i = \bar{x} + P b_i ,

where \bar{x} is the mean vector. (Note that since P has mutually orthogonal columns, and each is a unit vector, it is orthonormal. Hence P^{-1} = P^T.) Another way of thinking about P is that it is the transformation needed to diagonalise the
covariance matrix of the original data:

    \Sigma_b = P^T \Sigma_x P    (8.2)

where Σ_b is the (diagonal) covariance matrix of the data in the principal components space and Σ_x is the covariance matrix of the data in the natural space.

The total variance of the data can be computed by summing the eigenvalues. Since the eigenvalues describe the variance associated with each dimension of the principal components space, it is possible to discard eigenvectors with small associated variances. In this way, dimensionality reduction can be achieved: the original data can be transformed into a lower-dimensional space while retaining an arbitrarily large proportion of the total variance. If P is constructed in this way, then Equation 8.1 becomes approximate:

    b_i \approx P^T (x_i - \bar{x})    (8.3)
    x_i \approx \bar{x} + P b_i .

Building a Gaussian mixture model in a principal components space is simple. Compute Σ_x from X, perform an eigen decomposition to determine P (discarding eigenvectors to retain a given proportion of the total variance), and then project each x_i into the lower-dimensional principal components space to form B = {b_i : i = 1, . . . , N}. The Gaussian mixture model is then built using the data in B.
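A minimal sketch of this construction: eigendecomposition of the covariance, truncation to retain a chosen proportion of the total variance, and projection of the training vectors (the mixture is then fitted to the rows of B; function and parameter names are illustrative):

    import numpy as np

    def pca_project(X, keep=0.95):
        x_bar = X.mean(axis=0)
        Sigma_x = np.cov(X, rowvar=False)
        evals, evecs = np.linalg.eigh(Sigma_x)   # ascending eigenvalues
        order = np.argsort(evals)[::-1]
        evals, evecs = evals[order], evecs[:, order]
        q = int(np.searchsorted(np.cumsum(evals) / evals.sum(), keep)) + 1
        P = evecs[:, :q]                         # columns are eigenvectors
        B = (X - x_bar) @ P                      # b_i = P^T (x_i - x_bar)
        return x_bar, P, B                       # fit the GMM to B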
Once we have a low-dimensional model, we will need to compute conditional distributions in order to perform synthesis or analysis. It is not possible to apply conditions directly to the principal components model—as we have done so far—because the conditions and model exist in different spaces. Recall from Section 5.5.2 that the conditional distribution p(x_1 | x_2) = N(\mu', \Sigma') with

    \mu' = \mu_1 + \Sigma_{1,2} \Sigma_{2,2}^{-1} (x_2 - \mu_2)    (8.4)
    \Sigma' = \Sigma_{1,1} - \Sigma_{1,2} \Sigma_{2,2}^{-1} \Sigma_{2,1} .    (8.5)

We can partition the matrix P as:

    P = \begin{pmatrix} P_1 \\ P_2 \end{pmatrix} ,    (8.6)

where the rows of P_1 correspond to the unknown dimensions (i.e. x_1) and the rows of P_2 correspond to the known dimensions (i.e. x_2). We can write

    \Sigma_x = \begin{pmatrix} \Sigma_{1,1} & \Sigma_{1,2} \\ \Sigma_{2,1} & \Sigma_{2,2} \end{pmatrix} \approx P \Sigma_b P^T = \begin{pmatrix} P_1 \Sigma_b P_1^T & P_1 \Sigma_b P_2^T \\ P_2 \Sigma_b P_1^T & P_2 \Sigma_b P_2^T \end{pmatrix} .    (8.7)

Given Equation 8.7, it is straightforward to write approximations of the conditional mean vector and covariance matrix as:

    \mu' = \mu_1 + \Sigma_{1,2} \Sigma_{2,2}^{-1} (x_2 - \mu_2) \approx \mu_1 + (P_1 \Sigma_b P_2^T)(P_2 \Sigma_b P_2^T)^{-1} (x_2 - \mu_2) ,    (8.8)
    \Sigma' = \Sigma_{1,1} - \Sigma_{1,2} \Sigma_{2,2}^{-1} \Sigma_{2,1} \approx (P_1 \Sigma_b P_1^T) - (P_1 \Sigma_b P_2^T)(P_2 \Sigma_b P_2^T)^{-1} (P_2 \Sigma_b P_1^T) .    (8.9)
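A sketch of Equations 8.8 and 8.9 for a single component, where P1 and P2 hold the rows of P for the unknown and known dimensions respectively (a generalised inverse is used for the near-singular matrix; Section 8.3.1 below discusses this numerical issue):

    import numpy as np

    def pc_conditional(mu1, mu2, P1, P2, Sigma_b, x2):
        A = P1 @ Sigma_b @ P2.T                  # approximates Sigma_{1,2}
        B_inv = np.linalg.pinv(P2 @ Sigma_b @ P2.T)
        mu_c = mu1 + A @ B_inv @ (x2 - mu2)               # Equation 8.8
        Sigma_c = P1 @ Sigma_b @ P1.T - A @ B_inv @ A.T   # Equation 8.9
        return mu_c, Sigma_c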
It therefore seems likely that there is an elegant way to use Gaussian mixture models in low-dimensional spaces. However, we have not yet considered how to compute the conditional component probabilities, {P(i | x_2) : i = 1, . . . , k}. Unfortunately, in order to compute {P(i | x_2) : i = 1, . . . , k} (as shown in Equation 5.35), {P(x_2 | i) : i = 1, . . . , k} are required, which are computed by marginalising the model—in its natural space—over the dimensions corresponding to x_1. This means that two versions of the model are required: one in the principal components space, and one in the natural space. Though this seems awkward, it may be acceptable if working in the principal components space is advantageous in terms of computational efficiency or the more densely populated training space leads to better models.

8.3.1 A numerical issue

In practice, the P_2 Σ_b P_2^T matrices are close to singular (numerically difficult to invert). Common advice on dealing with this type of problem is to add a scaled identity matrix to the ill-conditioned matrix. This increases the variances in the corresponding distribution. The scalar is determined by the amount of variance to be added to the distribution. In the current setting, this advice essentially assumes that the P_2 Σ_b P_2^T matrices are close to singular because not all of the variance observed in the data was kept in the model as a result of the principal components approximation. In the case of multiple model components, it is not clear how to distribute the missing variance. We tried distributing the missing variance in two ways: evenly over the components and in proportion to the component probabilities. Neither approach performed satisfactorily.
While we have found computing the Moore-Penrose generalised inverse to be the most satisfactory way to solve the problem for covariance matrices in the natural data space, this approach does not work reliably in the principal components case. We address this problem by computing the generalised inverse of Σ_b. If we keep all of the eigenvectors in P, then Σ_x = P Σ_b P^T. By ignoring eigenvectors with small associated eigenvalues, Σ_b is a low rank approximation of Σ_x, and similarly P_2 Σ_b^{-1} P_2^T is an approximation of Σ_x^{-1}.

8.4 Texture synthesis in principal components spaces

The procedure for generating synthetic textures using a Gaussian mixture model that has been built in a principal components space is identical to the regular case, except that the computation of the conditional distribution is performed as described in Section 8.3. Figure 8.1 shows an example training image taken from the MeasTex database[1] and a synthetic texture generated using a Gaussian mixture model built in a principal components space. The model retained 95% of the total variance and the texture was produced using the patch-wise algorithm. The example shown is an example of successful synthesis. We were not able to synthesise mammographic textures to this level of quality using principal components models.

[1] Currently available at http://www.cssip.uq.edu.au/meastex/meastex.html
Figure 8.1: Synthesis using a principal components model. A training image is shown in (a) and a synthetic image is shown in (b).

8.5 Discussion

Our aims in building reduced dimensionality models were:

• To build better models by exploiting data redundancy to more densely populate the space of training examples.
• To achieve faster training, synthesis and analysis.

Although the second of these aims is partially achieved (training is faster, but the projections used to compute conditionals result in slower synthesis), it is generally at the expense of the quality of synthesis (and presumably analysis). Although the texture shown in Figure 8.1 is one of the best synthetic textures in this thesis, textures generated using principal components models were generally not as good as textures from models built in the natural space. The approximation used in the computation of the inverse of the $P_2\Sigma_b P_2^T$ matrices degrades the quality of synthesis (see below).

There is a way to benefit from the advantages of dimensionality reduction without suffering the disadvantages. It is possible to build the model in the low-dimensional principal components space and then project the entire model into the natural space for all subsequent processing (i.e. synthesis and analysis). Thus, model building is accelerated. Little degradation in the quality of the synthetic textures is observed (hence the loss in quality noted above can be attributed to the approximation used in the computation of the inverse of the $P_2\Sigma_b P_2^T$ matrices). Further, we do not need to project the component covariance matrices each time a conditional distribution needs to be computed, and so synthesis and analysis can be performed at normal speed. There does not appear to be any noticeable benefit from having a more densely populated training space in the models that we have built. The procedure is as follows: the model is built in the principal components space as described in Section 8.3, and then projected into the natural space using

$$\boldsymbol{\mu}_{x,i} = \bar{\mathbf{x}} + P\boldsymbol{\mu}_{b,i}, \quad \forall\, i \in \{1, \cdots, k\}, \tag{8.10}$$

$$\Sigma_{x,i} = P\,\Sigma_{b,i}\,P^T, \quad \forall\, i \in \{1, \cdots, k\}, \tag{8.11}$$

where $\boldsymbol{\mu}_{x,i}$ and $\boldsymbol{\mu}_{b,i}$ are the $i$-th component mean vectors in the natural and principal components spaces respectively, and $\Sigma_{x,i}$ and $\Sigma_{b,i}$ are the $i$-th component covariance matrices in the natural and principal components spaces respectively. The component probabilities, $P(i) : i = 1, \ldots, k$, are unaffected by the projection.
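A minimal numpy sketch of this projection (Equations 8.10 and 8.11; the list-based layout of the component parameters is an assumption made for illustration):

```python
import numpy as np

def project_gmm_to_natural_space(x_bar, P, mus_b, Sigmas_b):
    """Project a Gaussian mixture model learned in a principal components
    space into the natural data space (Equations 8.10 and 8.11). The
    component probabilities P(i) are unaffected by the projection."""
    mus_x = [x_bar + P @ mu_b for mu_b in mus_b]        # Equation 8.10
    Sigmas_x = [P @ S_b @ P.T for S_b in Sigmas_b]      # Equation 8.11
    return mus_x, Sigmas_x
```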
8.6 Summary

This chapter presented a method for learning the parameters of a Gaussian mixture model—and hence a parametric texture model—in low-dimensional spaces. In summary:

• There are two reasons why building Gaussian mixture models in low-dimensional spaces might be useful. First, compared to higher-dimensional spaces, fewer training points are required to populate a low-dimensional space at a given density, so more specific models can be built. Second, algorithms often iterate over the dimensions of the data, so working in a low-dimensional space is likely to yield more efficient algorithms.
• A method to learn the parameters of Gaussian mixture models in principal components spaces was developed. The closed-form method of computing conditional distributions was extended to the principal components model and a numerical issue arising from this was addressed.
• It is not straightforward to marginalise a principal components model over dimensions in the natural space. This problem makes working in a principal components space less attractive.
• A method for synthesising textures from a principal components parametric texture model was described.
• A synthetic texture, generated from a principal components model, was presented. Although it is possible to achieve excellent results using the approach, results for principal components models were much more variable than for the models built in the natural data space.
• Gaussian mixture models can be built in low-dimensional spaces and then projected into the natural data space. This allows models to be built in more densely populated spaces in less time, and used as if they had been built in the natural data space.
Chapter 9

A generative statistical model of entire mammograms

9.1 Introduction

This chapter presents a parametric statistical model of the appearance of entire mammograms. The chapter describes:

• Why mammograms are difficult to model.
• Approaches other authors have used to solve the problem.
• The structure of our model.
• How the model parameters are learned from training data.
• How synthetic mammograms can be generated using the model.
9.2 Background

9.2.1 Why are mammograms hard to model?

Mammograms are difficult to model because they vary dramatically in appearance and are digitised at high resolution; their appearance is highly detailed. In this section we discuss the sources of this variation and comment on how they affect the images. Figure 9.1 illustrates the effects of these sources of variation.

Figure 9.1: Examples of mammographic variation. These images are to scale.

Size and shape variation

It is apparent that women's breasts vary in size and shape. This variation is partly natural variation between individuals, but is also related to lifestyle (as breasts store fat, overweight or obese women are likely to have larger, more fatty breasts [54]). In addition to this natural variation, the apparent size and shape of breasts in mammograms varies due to the imaging process (e.g. the degree of compression). Compare Figure 9.1(b) and Figure 9.1(e).

Anatomical variation

Women's breasts also vary in their composition. The proportion of glandular to fatty tissue—the density—is variable, with post-menopausal women usually having almost entirely fatty breasts. The number and configuration of ducts varies between women, and the imaging process may capture them to varying degrees. Mammograms are digitised at high resolution and the resulting images are therefore large and contain a great deal of detail. Another form of variation that might be considered "anatomical" is that introduced by surgery (e.g. lumpectomy or augmentation mammoplasty), but we do not consider these types of variation in this work. Compare Figure 9.1(g), a breast with a well-defined fibro-glandular tissue region, to Figure 9.1(h), which is almost entirely fatty.

Variation in the imaging process

Due to the manual placement of the breast in the X-ray equipment, features such as the nipple or pectoral muscle may be absent, only partially imaged, or obscured. While such images can be interpreted by trained clinicians, these variations pose a significant problem to computer-based methods, which often rely upon reliable points of reference. Further, it is not feasible for radiographers to take more care in the acquisition process, because of the natural variation between women and because the compression part of the process is uncomfortable or painful. Compare Figure 9.1(k), where the pectoral muscle is not imaged, to Figure 9.1(a), where the pectoral muscle is included. The breast in Figure 9.1(h) has a poorly-defined border—probably due to the placement and compression of the breast—while the border in Figure 9.1(c) is better defined. Non-uniformity of the intensity of the X-ray illumination field can also result in a visible difference in density over an X-ray image (e.g. the anode heel effect). In the next section a review of research on modelling and synthesising the appearance of entire mammograms is presented.
9.2.2 Approaches to modelling the appearance of entire mammograms

The most common approach to modelling the appearance of entire mammograms for synthesis and analysis is physics-based. Bakic et al. [10] developed a 3-D model of the physical distribution of the various tissue types. They modelled compression of the breast and the X-ray image formation process to generate simulated X-rays.

Taylor et al. [176] developed a 3-D model of breast development to allow simulated mammograms to be generated. A voxel-based cellular automaton was initialised with a rudimentary ductal structure that represented a breast prior to maturation. Voxels contained a mixture of fatty and glandular tissue. The ductal structure was developed by allowing it to branch and grow. This development was driven by simulated branching and growth agents that had either promotive or inhibitive effects. A parameterised breast surface model was developed using data from real women. Synthetic mammograms were formed by simulating the compression of the breast and the projection of X-rays. The paper does not give examples of synthetic mammograms, but some examples are available on the author's website [146].

Bliznakova et al. developed a highly detailed model of the structure of the breast [16]. The authors modelled the breast surface, ductal system, terminal ductal lobular units, Cooper's ligaments, pectoral muscle, 3-D parenchymal texture and several forms of abnormality. They separately modelled the X-ray image formation process. The breast shape was modelled simply as two geometrical primitives.
Large, medium and small-sized breasts were modelled separately. The ductal system was modelled as a tree structure composed of cylindrical components, and a probabilistic model was used to characterise the branching. The 3-D parenchymal texture was simulated by mapping 2-D fractal textures into the volumetric space. Cooper's ligaments were modelled by thin ellipsoidal shells occurring in random locations within the breast. Masses were modelled by ellipsoids, spiculations by a series of connected cylinders, and microcalcification clusters by collections of small ellipsoids. The characteristics of the abnormalities were controlled by user input. The authors report that simulated mammograms could be generated in less than 5 minutes on a 2 GHz Intel Pentium 4 processor. Subjectively, their results are impressive. The authors conducted a psychophysical experiment to determine the extent to which expert radiologists could differentiate real from simulated regions of interest. Regions of interest of size 40 mm × 40 mm were inspected on screen, and the radiologists correctly identified 80% of the simulated normal patches, 67% of the real normal patches, 87% of the simulated calcification patches, 96% of the simulated mass patches and 100% of both the real calcification and mass patches.

Although physics-based models are useful for investigating the image acquisition process (e.g. patient positioning, radiation dose, breast compression and deformation), the synthetic images they produce are usually not particularly realistic subjectively, and these methods are not intended to support image analysis.

Another important approach to modelling the appearance of objects in images is the Active Appearance Model (AAM) [47], which models shape and shape-free appearance. The work in this chapter is closely related to the AAM, and we give a brief overview of the method below.
Overview of AAMs

An AAM is a model of the shape and appearance of a particular class of object in an image—e.g. a face or an anatomical structure—combined with a search strategy that allows instances of the object to be located in previously unseen images. The following discussion focuses on the model itself, rather than the search strategy.

An AAM consists of shape and appearance sub-models, which are statistically coupled. The shape sub-model is built by annotating a set of training images—which contain instances of the object of interest—with landmarks. These landmarks are typically positioned on salient features of the object being modelled and must correspond across the training set (e.g. when modelling faces, if the 5th landmark identifies the left corner of the left eye, then it must do so in all of the training images). Each landmark in a 2-D image is represented by an (x, y) coordinate. If each image contains N landmarks then the landmark coordinates for each image can be concatenated to form a vector with 2N elements, which is called a shape vector. Each of these vectors can be considered to be a point in a 2N-dimensional space. There is likely to be significant redundancy in each shape vector because the positions of landmarks will be correlated within each image and across the training set. This correlation is exploited using Principal Components Analysis (PCA, which was presented in Chapter 8). PCA allows each training shape vector to be projected into a low-dimensional space, so that each shape vector in the training set has a corresponding shape parameter in the principal components space.

Figure 9.2: Overview of the Active Appearance Model. The left-most image shows one of the training images with landmarks (in green). Nine samples from the AAM are shown on the right. The top row shows the mean appearance warped to three synthetic shapes sampled from the shape sub-model. The middle row shows three samples from the appearance sub-model warped to the mean shape. The bottom row shows three joint samples (i.e. from both model components), illustrating how the model can represent a range of legal instances of human faces.

The appearance sub-model is built by warping each object in the training set to the mean shape. This removes spatial variation from the training set—which has already been learned—leaving "textural" variation. Triangles can be defined between the landmarks in each training image, and image intensities are sampled within the triangles of each warped training image. The result is a set of vectors of intensities, one for each training image, with a dense correspondence between the elements of the vectors. PCA is applied to exploit the redundancy in these texture vectors, yielding a set of texture parameters in a low-dimensional space.

The shape and texture parameters for each training image are concatenated and a further PCA is performed. This couples shape and appearance and exploits any correlation between the two. The result is a set of low-dimensional vectors that describes both the shape and the appearance of the objects in the training set. The distribution of these vectors is modelled, typically using a multivariate normal distribution. The model can be sampled and the corresponding vector reconstructed to form a synthetic object in the image plane. The model can also be used to constrain the AAM search strategy. An example is shown in Figure 9.2, which illustrates an AAM of the human face. The figure shows landmarks for one training image and nine samples from the AAM.

AAMs rely on finding sufficient redundancy in the intensity information across a set of training images to dramatically reduce the dimensionality of the appearance space, thus making it possible to train the model with a reasonably small set of images (e.g. 30).
However, the highly detailed nature of mammograms would not be captured by the AAM approach.

9.3 Modelling and synthesising entire mammograms

We assume that the mammograms we will model are all of the same view, e.g. mediolateral oblique (MLO) or cranio-caudal (CC); in practice we have worked with the MLO view. We decompose the problem of modelling mammograms, combining an AAM-like model of global shape and appearance with a wavelet-based model of stationary texture, allowing us to bypass the curse of dimensionality [14]. The model is composed of three sub-models: a model of shape, a model of approximate appearance¹, and a model of local textural appearance. We have trained a model using a set of 36 mammograms from the Digital Database for Screening Mammography [83]. The number of training images was limited by a computational "bottleneck"—solving the shape correspondence problem—which we discuss further in the next section. After outlining a series of pre-processing steps, we describe each of these sub-models and show how they can be combined to synthesise mammograms.

¹ Shape and approximate appearance are jointly modelled.
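The shape sub-model (Section 9.3.1) and the approximate appearance sub-model (Section 9.3.2) both rest on the same linear PCA machinery. As a point of reference, here is a minimal numpy sketch of that machinery; the function and variable names are mine, and this is an illustration rather than the thesis implementation:

```python
import numpy as np

def fit_linear_model(X, retained_variance=0.90):
    """Fit a model x = x_bar + P @ b to the rows of X (one training
    example per row), keeping enough eigenvectors to retain the given
    proportion of the total variance."""
    x_bar = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - x_bar, full_matrices=False)
    var = s**2 / (len(X) - 1)          # eigenvalues of the data covariance
    d = int(np.searchsorted(np.cumsum(var) / var.sum(),
                            retained_variance)) + 1
    return x_bar, Vt[:d].T             # columns of P are eigenvectors

def to_parameters(x, x_bar, P):
    return P.T @ (x - x_bar)           # project an example into the PC space

def from_parameters(b, x_bar, P):
    return x_bar + P @ b               # reconstruct an example from b
```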
9.3.1 Breast shape and the correspondence problem

We assume a training set of images B, which are mammograms with the non-breast regions (e.g. markers) set to black, and normalised such that all breasts "point" to the right (i.e. the nipple is on the right). As in an AAM, a statistical shape model (SSM) is used to cope with size and shape variation. A set of landmark points is required to define the shape of the breast in each image, and these landmarks must correspond across the training set. In the SSM framework, landmarks are often manually placed and chosen to correspond to easily identifiable image features (e.g. the tips of the fingers when modelling hand shapes). As mammograms lack reliable features, we seek to automate annotation. A naïve approach is to use landmarks placed at regular intervals on the breast borders, starting at a reliable location (e.g. the right-most point of the breast, the approximate location of the nipple). Such landmarks are a good first approximation. However, synthetic shapes generated from a shape model built using landmarks with such correspondences are subjectively unrealistic. This is demonstrated in the top row of Figure 9.3. Better correspondences are required.

Figure 9.3: Samples from two shape models, illustrating the need for good correspondences. Top row: samples from a shape model built using regularly-spaced landmarks. Bottom row: samples from a shape model built using optimal correspondences. Examples of real mammogram shapes, taken from the training set, are shown in Figure 9.6.

Approaches proposed by Kotcheff and Taylor [115] and Davies et al. [51, 50] seek to improve correspondences across a set of training shapes. The idea is to search over parameterisations of the training shapes to find a model that best describes the training set. The training shapes are re-parameterised using a set of monotonic mappings which guarantee that the landmark ordering is preserved across the training set, and so the mapping is diffeomorphic². These methods aim to find an optimal set of re-parameterisations—according to some concept of goodness—and so the problem is posed as an optimisation.

The main difference between the two methods is the choice of objective function. Kotcheff and Taylor's method uses the determinant of the (re-parameterised) shape model's covariance matrix, which effectively measures the hyper-volume occupied by the training set in shape space; minimising the measure yields more compact models. The method proposed by Davies et al. uses an information theoretic measure of model quality. Their objective function computes the number of bits required to transmit the training set by encoding it using a (re-parameterised) shape model. In order for a receiver to understand the message, the model must also be transmitted, and so it too contributes to the objective function. The authors refine the piecewise-linear re-parameterisation method presented by Kotcheff and Taylor using the integral of a sum of Cauchy kernels, ensuring that the re-parameterisation functions are differentiable and hence better suited for use in an optimisation scheme. Additionally, an efficient optimisation scheme is presented.

Although the method of Davies et al. is rigorously justified, it is closely approximated by that of Kotcheff and Taylor, whose method generally finds a good solution to the correspondence problem more quickly. We initialise Kotcheff and Taylor's scheme with regularly-spaced landmarks and run their optimisation. We then refine the improved correspondences using the minimum description length (MDL) scheme of Davies et al. Figure 9.4 shows the value of the Kotcheff and Taylor objective function as a function of the iteration number for our training set, and Figure 9.5 shows the objective function values for the MDL algorithm³.

² A diffeomorphism is a mapping that does not tear or fold the manifold.
³ Note that the two methods' objective functions are expressed in different units.
Figure 9.4: Values of the Kotcheff and Taylor objective function.

Figure 9.6 shows selected points from the initialisation and final solution, and illustrates how the correspondences are improved. Although the difference is subtle, and the solution is not necessarily intuitive, Figure 9.3 shows the importance of good correspondences: the improved correspondences yield an SSM that successfully limits illegal variation.

The final shape model has the form:

$$\mathbf{s} = \bar{\mathbf{s}} + P_s \mathbf{b}_s \tag{9.1}$$

where $\mathbf{s}$ is a shape parameterised by $\mathbf{b}_s$, $\bar{\mathbf{s}}$ is the mean shape, and $P_s$ is a matrix whose columns are a set of eigenvectors of the shape data covariance matrix, sufficient for the model to retain a given proportion of the total variance of the original data. The number of retained eigenvectors, $d_s$, is typically much smaller than the dimensionality of the original space: $d_s \ll 2N_l$ (where there are $N_l$ landmarks for each image), so the distribution of shape parameters can be learned from a reasonably small training set.

Figure 9.5: Values of the MDL objective function. The solution found by running the Kotcheff and Taylor algorithm (see Figure 9.4) is refined in this run of the MDL algorithm.

The computational cost of optimising the correspondences is related to the number of training shapes. This limits the number of training images that can be used to build the model of entire mammographic appearance—this is the "bottleneck" mentioned in the previous section. It should, however, be possible to allow an arbitrarily large number of training images to be used: correspondences could be optimised for a relatively small set of training shapes, and an active shape or appearance model could then be built and used to locate the breast border in other training images [165]. Correspondences in these other training images would then be defined implicitly. However, this work is beyond the scope of this thesis.

Figure 9.6: The initial and final correspondences for the mammogram shape model. The figure shows every 10th point for 6 of the 36 training shapes. The top row shows the initial positions, and the bottom row shows the final solution.

9.3.2 Approximate appearance

We consider mammographic appearance to have two components: an approximate appearance (the general global appearance of the mammogram) and a detailed appearance (the local textural details; see Section 9.3.3). In this section we describe how approximate appearance is modelled. We address the appearance correspondence problem, describe how approximate appearance is separated from detailed appearance, and describe how approximate appearance is related to breast shape.

The appearance correspondence problem

To model the approximate appearance, we need to consider the appearance correspondence problem. If we could guarantee that all mammograms contain the same features, then we could define dense correspondences between the contents of a set of mammograms. This is not the case and, due to anatomical differences between women, there are no underlying correspondences that can be exploited. We choose to cope with this form of variation implicitly by approximately registering breasts to a canonical shape and then learning the variation in appearance (the mean shape $\bar{\mathbf{s}}$ provides a natural canonical reference shape). For each segmented breast in B, we use a thin plate spline [20] to warp it to the mean shape, yielding a set N of segmented breasts in a shape-normalised space.
The thin plate spline does not guarantee diffeomorphic transformations, but since we do not use control points within the breast region, and the order of control points is preserved by the correspondence optimisation algorithms, the resulting warps are well-behaved. An alternative approach would be to use a non-rigid registration algorithm to define the correspondences between points within the breast region, but we have not investigated this.

The steerable pyramid decomposition

In modelling mammographic appearance, we would like to be able to treat the appearance of each mammogram as a point in an appearance space so that the appearance can be modelled using straightforward statistical methods. Although the number of pixels in a mammogram is very large, if we could exploit redundancy in the shape-normalised appearance—for example by defining dense correspondences—we might be able to populate the appearance space sufficiently for density estimation to be successful. Unfortunately, this is not the case. To overcome this problem we use a hierarchical decomposition called the steerable pyramid [164]. This is a wavelet-like image decomposition developed for use in texture modelling and synthesis. Images are decomposed using directional derivative basis functions which range in scale and orientation, allowing the coarse and fine structure of the images to be treated separately within a single framework.
The steerable pyramid was selected because it decomposes images in terms of scale and orientation (which have been found to be useful in mammography applications; see Chapter 4); it has been used successfully in texture modelling and synthesis [143, 144]; the decomposition is motivated by knowledge of biological vision ([183] discusses the work of Hubel [93] and Wiesel [184]); and there is a freely available implementation⁴. See Section 9.3.3 for further notes on the use of the steerable pyramid.

⁴ The steerable pyramid software is currently available at http://www.cns.nyu.edu/~eero/steerpyr/

Figure 9.7 shows a block diagram of the decomposition. Analysis is shown on the left-hand side. The image is separated into high- and low-pass sub-bands using filters H0 and L0. The low-pass sub-band is then separated into a series of oriented bandpass sub-bands and another low-pass sub-band using filters {Bi} and L1. This low-pass sub-band image is then sub-sampled by a factor of two in each direction and the result passed recursively to {Bi} and L1, as indicated by the dark circle and shaded region in Figure 9.7. Synthesis is shown on the right-hand side of Figure 9.7, and involves reversing the analysis steps.

Figure 9.7: Block diagram for the steerable pyramid decomposition. Analysis is shown on the left and synthesis is shown on the right. The dark circle indicates the recursive computation of the shaded region. The {Bi} filters compute the oriented sub-band images.

We can think of the steerable pyramid decomposition as having a structure similar to a quad-tree. The pyramid has a number of levels which correspond to scale, and range from coarse (a few pixels square) to fine (the same size as the original image). Each level has a number of oriented sub-band images. In addition, there is a coarse low-pass sub-band and a fine high-pass sub-band. Although there are more coefficients in the pyramid than pixels in the original image, the hierarchical structure of the pyramid allows us to decompose our modelling problem further: we can consider the top part of the pyramid (the coarse levels) separately from the bottom part (the fine levels). Figure 9.8 shows the top three pyramid levels for a mammogram.

Figure 9.8: The coefficients in the top three levels of a steerable pyramid decomposition of a mammogram. The breast is oriented so that the nipple points downwards. The oriented sub-bands are shown as L-shaped sets of images and the final low-pass image is in the top-right corner of the image. The arrows indicate the orientation of the filters used to compute the five sub-band images. The high-pass image is not shown.

The approximate appearance model

We want to be able to represent the general appearance of a mammogram in a way that allows us to subject it to statistical analysis. We decompose each image in N to form the set of pyramids P. For each pyramid in P, we concatenate the coefficients in the top few pyramid levels into a vector $\mathbf{a}$. This vector describes the approximate appearance of the shape-normalised mammogram. We again perform PCA, yielding

$$\mathbf{a} = \bar{\mathbf{a}} + P_a \mathbf{b}_a. \tag{9.2}$$

Initially, the coefficients in each pyramid level are effectively measured on different intensity scales. In order to use a covariance matrix to model the distribution of such data—either for its own sake, or to perform PCA—it is best to use a common scale. We normalise the data in each dimension either to z-scores as described by Equation 9.3 [45], or to a common scale using a robust M-estimator of spread [145], depending upon the characteristics of the data. If $x_i$ is a data point from a sample with mean $\bar{x}$ and standard deviation $\sigma$, then $z_i$, the z-score for $x_i$, is given by:

$$z_i = \frac{x_i - \bar{x}}{\sigma}. \tag{9.3}$$

For simplicity, the conversion to and from these standard scales is assumed in the rest of this chapter.
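A short sketch of the per-dimension normalisation. The z-score branch follows Equation 9.3 directly; the robust branch uses the median and median absolute deviation as a stand-in for the unspecified M-estimator of spread [145], so treat that choice as an assumption:

```python
import numpy as np

def to_z_scores(X):
    """Normalise each column (dimension) of X to z-scores (Equation 9.3)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def to_robust_scores(X):
    # Assumed robust alternative: centre by the median and scale by a
    # normalised median absolute deviation. The thesis specifies only
    # "a robust M-estimator of spread"; MAD is used here for illustration.
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0)
    return (X - med) / (1.4826 * mad)
```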
A joint model of shape and approximate appearance

We have described how we can model mammographic appearance in the shape-normalised space. To perform synthesis we need to be able to warp to a plausible shape. A naïve approach would be to model the distribution of the shape parameters $\mathbf{b}_s$ and then sample from it. However, this would not take into account the fact that there may be a relationship between the appearance of a mammogram and its size and shape. For example, fatty breasts tend to be large, while glandular breasts tend to be small. The approach we take is to model the joint distribution of shape parameters and approximating appearance parameters, and then condition this model on the approximating parameters to yield a model of plausible shapes for the generated mammogram. We use a single multivariate Gaussian:

$$p(\mathbf{b}_s, \mathbf{b}_a) = p(\mathbf{b}_j) = N(\mathbf{m}_j, \Sigma_j). \tag{9.4}$$

9.3.3 Detailed appearance

The approximating model provides a first approximation to the mammographic appearance, but does not include any information from the lower pyramid levels. We call these the detailing levels. A parent vector is defined as the set of coefficients on a path through a pyramid at locations corresponding to a particular pixel in the original image [53]. The parent vector contains information about the local image behaviour at a particular location, from the coarsest level to the finest.
Using a notation similar to that in Figure 9.7, a parent vector $\mathbf{b}_t(x, y)$ corresponding to a particular $(x, y)$ location in the original image is given by:

$$\mathbf{b}_t(x, y) = \Big[\, H_0(x, y),\; B_0^1\big(\lfloor\tfrac{x}{2^1}\rfloor, \lfloor\tfrac{y}{2^1}\rfloor\big),\; B_1^1\big(\lfloor\tfrac{x}{2^1}\rfloor, \lfloor\tfrac{y}{2^1}\rfloor\big),\; \cdots, \tag{9.5}$$
$$B_0^2\big(\lfloor\tfrac{x}{2^2}\rfloor, \lfloor\tfrac{y}{2^2}\rfloor\big),\; B_1^2\big(\lfloor\tfrac{x}{2^2}\rfloor, \lfloor\tfrac{y}{2^2}\rfloor\big),\; \cdots,\; L_{M-1}\big(\lfloor\tfrac{x}{2^{M-1}}\rfloor, \lfloor\tfrac{y}{2^{M-1}}\rfloor\big) \,\Big]^T$$

where there are $M$ levels, $H_0(x, y)$ is the coefficient at $(x, y)$ in the high-pass band, $B_i^j(x, y)$ is the coefficient at $(x, y)$ in the $i$-th oriented sub-band at the $j$-th level, $L_{M-1}(x, y)$ is the coefficient at $(x, y)$ in the low-pass band, and the subscript $t$ in $\mathbf{b}_t$ indicates texture. The floor function—which returns the largest integer that is less than or equal to its argument—is denoted by $\lfloor \cdot \rfloor$; it serves here to ensure that the sub-bands are correctly indexed. In the remainder of this chapter we will drop the $(x, y)$ indexing notation, as we assume that the detailed textural component of mammographic appearance is stationary. This assumption makes the problem of modelling detail tractable. It is reasonable because we might expect local detail to depend only on tissue type, which is modelled implicitly by the approximating model.

We consider a parent vector $\mathbf{b}_t$ to be a point in a high-dimensional vector space. A suitable model of the distribution of parent vectors, $p(\mathbf{b}_t)$, would allow the detailing levels to be populated by sampling the model, conditioned upon the coefficients in the approximating levels. This approach is motivated by previous work on hierarchical texture modelling by De Bonet and Viola [53] and Sajda et al. [159]. Multivariate Gaussian, or mixture of multivariate Gaussian, representations are ideal for this purpose as there is a closed-form solution for the conditional Gaussian (see Section 5.5). $p(\mathbf{b}_t)$ is modelled as $p(\mathbf{b}_t) = N(\boldsymbol{\mu}_t, \Sigma_t)$.
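A sketch of parent-vector extraction under Equation 9.5, assuming the pyramid is stored as plain arrays with level $j$ subsampled by a factor of $2^j$ in each direction. The storage layout and names are assumptions for illustration; the exact subsampling factors must match the pyramid implementation used:

```python
import numpy as np

def parent_vector(x, y, highpass, oriented, lowpass):
    """Gather the coefficients on the path through the pyramid that
    corresponds to pixel (x, y) in the original image (Equation 9.5).

    highpass : 2-D array at the original image resolution
    oriented : list over levels (fine to coarse); each entry is a list of
               2-D arrays, one per oriented sub-band, with level j
               subsampled by 2**j in each direction
    lowpass  : 2-D array at the resolution of the coarsest level
    """
    coeffs = [highpass[y, x]]
    for j, bands in enumerate(oriented, start=1):
        xj, yj = x // 2**j, y // 2**j       # floor division, as in Eq 9.5
        coeffs.extend(band[yj, xj] for band in bands)
    L = len(oriented)
    coeffs.append(lowpass[y // 2**L, x // 2**L])
    return np.array(coeffs)
```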
Although we use the steerable pyramid, the choice of decomposition is probably not critical: other authors have reported success with hierarchical conditioning of wavelet coefficients for texture modelling, synthesis and analysis using different decompositions (e.g. [159, 53]).

9.3.4 Generating synthetic mammograms

Algorithm 11 describes how synthetic mammograms are generated.

Algorithm 11 Generating a synthetic mammogram
  Simultaneously sample an approximate appearance parameter, $\mathbf{b}_a$, and a shape parameter, $\mathbf{b}_s$, from the joint model of $p(\mathbf{b}_s, \mathbf{b}_a)$. This is equivalent to sampling one and then conditionally sampling the other.
  Reconstruct the approximating steerable pyramid coefficients by projecting $\mathbf{b}_a$ back to the natural space to yield the corresponding $\mathbf{a}$.
  for each $(x, y)$ location within the shape-normalised breast region do
    Extract the parent vector at the current location. The detailing coefficients will be unpopulated.
    Compute the distribution of detailing coefficients by conditioning the model of $p(\mathbf{b}_t)$ on the approximating coefficients in the extracted parent vector, using Equation 5.33.
    Sample from this conditional distribution, and place the sampled detailing coefficients into the parent vector at the current location.
    Note that because the steerable pyramid may not be a perfect quad-tree, the above two steps are implemented as iterations over the pyramid levels.
  end for
  Reconstruct the fully-populated pyramid to form the corresponding image in the shape-normalised space.
  Project the shape parameter $\mathbf{b}_s$ to its natural space, yielding the shape that corresponds to the parameter.
  Warp the reconstructed image to the sampled shape.
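A condensed sketch of the conditioning step inside the loop of Algorithm 11, with the parent vector split into a flat unknown (detailing) part and a known (approximating) part. The thesis notes that in practice this is iterated over pyramid levels, so the flat split here is a simplification:

```python
import numpy as np

def sample_detailing(b_known, mu_t, Sigma_t, n_detail, rng):
    """Sample detailing coefficients for one parent vector by conditioning
    p(b_t) = N(mu_t, Sigma_t) on its approximating (known) coefficients.
    The first n_detail entries of b_t are taken to be the unknowns."""
    mu1, mu2 = mu_t[:n_detail], mu_t[n_detail:]
    S11 = Sigma_t[:n_detail, :n_detail]
    S12 = Sigma_t[:n_detail, n_detail:]
    S22 = Sigma_t[n_detail:, n_detail:]
    S22_inv = np.linalg.pinv(S22)        # generalised inverse, cf. Sec. 8.3.1
    mu_c = mu1 + S12 @ S22_inv @ (b_known - mu2)
    Sigma_c = S11 - S12 @ S22_inv @ S12.T
    return rng.multivariate_normal(mu_c, Sigma_c)
```

Here `rng` would be a `numpy.random.Generator`; each sampled detailing vector is written back into the pyramid before the final reconstruction.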
9.4 Example synthetic mammograms

We selected 36 pathology-free MLO mammograms—ranging in size, shape and appearance—from the Digital Database for Screening Mammography (DDSM) [83] and built a model of mammographic appearance as described in Section 9.3. The training set had relatively few images because the optimisation of the breast boundary landmark correspondences is computationally expensive. Such a small training set cannot represent the full variation in mammographic appearance, although the synthetic images generated by the model are subjectively quite realistic (future work should investigate whether realistic results can be achieved with models trained on larger training sets).

One hundred (100) landmark points were used to define the breast boundary. We used 7 pyramid levels—including the high- and low-pass sub-bands—each with 5 orientations. The top three pyramid levels were included in the approximating model; these had 159 420 coefficients prior to PCA. We found that retaining 90% of the total variance in the shape model and 99% in the approximating appearance model yielded compact models that produced convincing results when sampled. One hundred thousand (100 000) locations within the breast regions were randomly selected and the corresponding parent vectors were extracted. Their distribution was modelled using a single multivariate Gaussian component, as described in Section 9.3.3.

Building the model took approximately 24 hours (most of this time was spent computing the optimal shape correspondences). Producing a synthetic mammogram takes approximately 2.5 hours (almost all of this time is spent sampling the conditional parent vectors)⁵.
Figure 9.9 shows some synthetic mammograms that were generated using our model, and Figure 9.10 shows some synthetic mammograms alongside a real mammogram.

9.5 Summary

This chapter presented a generative statistical model of the appearance of entire mammograms. In summary:

• The appearance of entire mammograms is difficult to model because of the variation between women, variability in the imaging process and the high resolution of the images.
• Our model is composed of components that model the breast shape, the approximate appearance and the detailed texture. Detailed texture is assumed to be stationary. The three model components are statistically coupled, so that plausible synthetic mammograms can be generated.
• The breast shape model can be learned by solving the shape boundary landmark correspondence problem. Algorithms developed by Kotcheff and Taylor and by Davies et al. were used to solve this problem.
• Synthetic mammograms can be generated by sampling from the joint model of shape and approximate appearance and sampling detailing coefficients using a hierarchical conditioning method.

⁵ Timings are for a computational server with a 2.8 GHz Intel Xeon processor and 2 GB of RAM.
Figure 9.9: Synthetic mammograms generated using the model.
Figure 9.10: Real and synthetic mammograms. A real mammogram is shown on the left and three synthetic mammograms are shown on the right.
Chapter 10

Evaluating the synthetic mammograms

10.1 Introduction

This chapter presents an evaluation of the synthetic mammograms produced by the model described in the previous chapter. The chapter describes:

• A qualitative evaluation of the synthetic mammograms by an expert mammography radiologist.
• A quantitative psychophysical evaluation of the synthetic mammograms.
• An evaluation of the detailing component of the model.
10.2 Qualitative evaluation by a mammography expert

An expert mammography radiologist evaluated our synthetic mammograms in a psychophysical experiment. We printed real and synthetic mammograms onto A4 paper using a high-quality laser printer. For the real class, we used the mammograms from N (i.e. without markers and other non-breast regions). While one should not generally test using a training set, we believed that the synthetic mammograms—though quite realistic—would not be good enough to convince an expert radiologist, and so the experiment would not be biased by testing with training data.

We presented a "shuffled" set of 13 real and 13 synthetic full-resolution mammograms to the radiologist and asked them to rank the mammograms according to how realistic they were. The radiologist was quickly able to sort the mammograms into the two sets. Although they were able to identify the synthetic mammograms, their feedback was positive, and the most useful feedback was obtained in informal discussion. The radiologist said that some of the synthetic mammograms looked 'quite realistic'. One of the ways they could identify the synthetic images was by the lack of blood vessels, lymph nodes and benign calcifications. Such structures exist at the boundary of the approximate appearance model and the detailing texture model, and are not captured by our current model. The radiologist pointed out that our synthetic mammograms were 'a little fuzzy' and lacked 'dark regions'; the latter criticism can probably be attributed to the relatively small training set. The radiologist said that one of the synthetic mammograms—a large, fatty breast—was unrealistic.
The radiologist was dismissive of the quality of the other examples of synthetic mammograms and mammographic textures in the literature, and considered our synthetic images to be superior (though it should be noted that realism is not the aim of some of these methods).

10.3 A quantitative psychophysical evaluation

10.3.1 Aims

Aware that the lack of blood vessels, lymph nodes and benign calcifications made the difference between the real and synthetic mammograms more obvious, we wanted to determine whether the two classes could be distinguished when such features could not be used as prompts.

10.3.2 Method

We formed sets of 7 real and 7 synthetic mammograms at low resolution. The real mammograms were manually selected such that the set did not contain any with very strong vascular clues. The synthetic set contained mammograms generated using our model. Some "fatty" synthetic mammograms were excluded because comparable real mammograms had often been removed from the real set for containing strong vascular clues¹. All selected mammograms were reduced in size such that the remaining vascular clues could not easily be perceived in the set of real mammograms.

¹ It is also the case that the fatty mammograms generated using our model were deemed to be less realistic by the expert mammography radiologist.
The resulting images were small (approximately 200 × 140 pixels), but the synthetic mammograms contained contributions from both the approximating and detailing models at this resolution. Each real mammogram in the set was paired with each synthetic mammogram to form a test set of 49 pairs. The number of images used was limited by the time available to synthesise the set of synthetic mammograms; moreover, we were not sufficiently confident that the synthetic mammograms would be realistic enough to be confused with the real mammograms, and so a larger experiment would not have been justifiable. We recruited five participants² and allowed them to study a training set of 6 real mammograms, scaled to fit within a 1024 × 768 pixel computer display. They then performed a forced-choice experiment, in which they were asked to identify the real mammogram in each of the 49 possible pairings of real and synthetic mammograms.

10.3.3 Results

At the end of the experiment, the participants were asked if they could tell the difference between the real and synthetic mammograms: none of the subjects believed that they had been able to identify the real mammograms reliably. A χ² analysis (see Section 7.2.2) showed that one participant did no better than random at the 95% significance level. The other participants differed significantly from random, but consistently mistook the synthetic mammograms for the real ones.

² Computer vision researchers from the Division of Imaging Science and Biomedical Engineering at the University of Manchester.
Between them, the participants correctly identified 75 real mammograms out of 245 (31%). If we allow the consistent misclassification to count as correct identification of the real mammograms, the participants collectively identified 191 real mammograms out of 245 (78%).

10.3.4 Discussion

These results show that, although the reduced-resolution synthetic images are not always indistinguishable from real mammograms, they are sufficiently convincing to make discrimination difficult. The fact that several subjects consistently selected the synthetic mammograms as the real ones implies that the differences were very subtle. It is interesting that the participants did not think they could tell the difference, even though the statistical analysis indicates otherwise. It is possible that the results can be attributed to the relatively small set of images used to train the model, to the small number of images used to "train" the non-expert readers, or to the selection of the real images used in the experiment. It would therefore be unwise to generalise the above result.

10.4 Evaluating the detailing model

It is difficult to show the contribution made by the detailing model—either on screen or in print—by examining entire mammograms, because of the high resolution of the images. Using a region of interest makes the contribution to the textural appearance visible.
Figure 10.1 shows the contribution made by the detailing levels to regions of interest from a real and a synthetic mammogram. The left-hand column shows contributions for a real mammogram and the right-hand column shows contributions for a synthetic mammogram. The top row shows the contributions made by the finest pyramid level, the second row shows the contributions made by the finest and next-finest pyramid levels, and so on; the bottom row shows the contributions made by all detailing levels. These contribution images were computed by taking the pixel-wise differences between regions that were reconstructed with and without the corresponding detailing levels. The real mammogram was selected to be subjectively similar in appearance to the synthetic mammogram (to allow comparison of the contribution images) and the regions of interest were extracted from approximately the same location in each image. The detailing model can be evaluated by comparing the textural characteristics of the real and synthetic contribution images.

Figure 10.1: Contributions of detailing coefficients to real and synthetic mammograms. Left-hand column: contributions for a real mammogram. Right-hand column: contributions for a synthetic mammogram. See text for details.

The images in the top row of Figure 10.1 subjectively have almost identical textures. The coefficients at this level are likely to represent high-frequency signals such as "noise". Subjectively, the images in the second row are texturally very similar, but the real mammographic data has a slightly larger range. The images in the third row are also subjectively similar, but the real data contains structure corresponding to curvilinear features, which leads the real data to have a larger range than the synthetic data. The images in the fourth row—showing all contributions made by the detailing coefficients—are subjectively similar, but there are large contributions made by curvilinear features in the real data. The histograms of the contribution images in the bottom row show that the distributions of the difference values are approximately normal. The standard deviation of the real data is approximately twice that of the synthetic data.

The mammograms that the contribution images in Figure 10.1 correspond to are different, but they allow us to draw some conclusions about how the detailing model works with the approximating model to synthesise mammographic texture. Given approximating coefficients for two similar mammograms, the detailing model is subjectively successful in capturing the characteristics of the finest two levels. Subjectively, the second-coarsest level is also modelled reasonably well. The coarsest level is not modelled particularly well. This is because the detailing model assumes stationarity, but in reality this level is dominated by curvilinear structures. These structures feed down to the second-coarsest detailing level to some extent, where some small curvilinear structures are also found. Similar results are obtained when detailing coefficients are sampled for the approximating coefficients of a real mammogram.

The contribution images show that the use of a single multivariate Gaussian component adequately models the detailed texture component of mammograms. There is little evidence to suggest that a more complex model (such as a mixture of Gaussians) would dramatically improve the stationary aspects of the detailed texture. However, it is clear that modelling curvilinear structures is of vital importance to the detailed texture. These long-range structures tend to be most evident in the coarsest detailing level, and the model cannot currently capture them. Learning legal configurations of curvilinear features within a statistical framework is likely to be a significant challenge. One approach to this problem would be to extract networks of curvilinear structures using a method such as that presented by Zwiggelaar and Marti, and to statistically model characteristics of curvilinear structure length, width, tortuosity and branching [191].
By learning the joint distribution of these features and the approximating parameters, it may be possible to determine and synthesise the correct types of curvilinear networks for a particular type of breast.

10.5 Summary

This chapter presented an evaluation of synthetic mammograms generated using the model developed in the previous chapter. In summary:

• An expert mammography radiologist could easily distinguish between real and synthetic mammograms. However, they commented that some of the synthetic mammograms looked 'quite realistic'. The lack of blood vessels, lymph nodes and benign calcifications allowed the synthetic mammograms to be identified.
• A quantitative psychophysical evaluation of reduced-resolution synthetic mammograms showed that, in general, the synthetic mammograms could be differentiated from real mammograms, but not very reliably. One participant could not distinguish between the two classes at all, and the other participants consistently misclassified the synthetic mammograms as real, reporting that they could not tell the difference between the two classes. The results indicate that, at low resolution, the synthetic mammograms are sufficiently realistic that differentiating real and synthetic mammograms is difficult.
• An evaluation of the contribution made by the detailing model shows that, while local textural detail is successfully captured, the model cannot capture the appearance of curvilinear structures. As the qualitative evaluation by the expert mammography radiologist showed, these structures allow the real and synthetic mammograms to be easily differentiated. A method of modelling these structures was proposed.
Chapter 11

Summary and conclusions

11.1 Introduction

This chapter presents:

• A summary of the work presented in this thesis.
• The conclusions that may be drawn from the work.
• A final statement.

11.2 Summary

• Chapter 2 presented background information on breast cancer, the clinical problem and the various imaging modalities that are used to diagnose the disease.
Breast cancer is a significant public health problem and many countries have X-ray mammography screening programmes. The image inspection task is performed visually and is subject to human error.

• Chapter 3 presented a review of the computer-aided mammography literature. CADe algorithms typically extract shape and texture features from candidate locations and use classifiers to differentiate between true and false detections of specific indicative signs of abnormality. Commercial systems are available and have been shown to improve radiologist performance; however, they can also fail to improve performance. Psychophysical research has suggested that a false positive rate much lower than that achieved by current commercial systems is required for significant improvement in radiologist performance. Much more sophisticated approaches may be required to achieve such targets. One such method is novelty detection, which requires a model of normal mammographic appearance that can measure deviation from normal appearance. Statistical models should allow this deviation to be measured within a rigorous mathematical framework. If novelty detection is to be used, then the underlying model must be able to "legally" represent any pathology-free instance and be unable to legally represent abnormal instances. The only way to verify this is to be able to generate instances from the model; thus the model must be generative. Further, generative models make it relatively easy to visualise what has been modelled successfully and what has not.

• Chapter 4 described work on improving the way that scale-orientation pixel signatures are computed. Two flaws with an existing implementation were identified and a new method of computing signatures was developed.
An information theoretic measure of signature quality showed that, compared to the original method of computing pixel signatures, the new method increased signature information content by approximately 19%. A classification experiment was reported in which signatures computed using the two methods were used to discriminate between pixels belonging to normal and spiculated lesion tissues. The new signatures outperformed the original signatures in terms of both specificity and sensitivity.

• Chapter 5 presented background information on the multivariate normal distribution and the Gaussian mixture model. The Gaussian mixture model is a flexible solution to the density estimation problem. Model parameters can be learned using the k-means and Expectation-Maximisation algorithms. Both the marginal and conditional distributions can be computed for a Gaussian mixture model in closed form; these distributions are themselves Gaussian mixture models. It is straightforward to sample from a Gaussian mixture model.

• Chapter 6 presented Efros and Leung's algorithm for texture synthesis and developed the method into a parametric statistical model of texture that can be used in both generative and analytical modes. Methods of synthesising and analysing textures were developed and synthetic images were presented.

• Chapter 7 presented a psychophysical evaluation of synthetic mammographic textures produced by the parametric model. The synthetic textures were not indistinguishable from the real textures, but were selected in approximately one third of trials.
The synthetic images generated by Efros and Leung's algorithm were considered more realistic than those generated by the parametric model; the textures generated using the parametric model were selected in 26% and 41% of trials. However, the images generated by the Efros and Leung algorithm used a more specific "training" set than was used to train the parametric model. Direct comparison of the two approaches should consider this experimental bias and the ability of the parametric model to analyse images via novelty detection.

Simulated and real microcalcification and mass images were analysed using parametric models. Results for the simulated data show that the novelty detection approach can successfully detect multiple types of abnormality using a single method. Results for the real data show that some discrimination was possible, but significant improvement is needed. This may be achieved by improving the specificity of the model and by adopting a hierarchical strategy.

• Chapter 8 presented an investigation into how Gaussian mixture models may be learned in low-dimensional principal components spaces. The closed-form method of computing conditional distributions was extended to the principal components model. The chapter described a method for synthesising textures from a parametric texture model built in a principal components space. It is not straightforward to marginalise a principal components model over dimensions from the natural space; this problem makes working in a principal components space less attractive. Although it is possible to achieve excellent results using the approach, results for principal components models were much more variable than for the models built in the natural data space.
• Chapter 9 described a generative statistical model of entire mammograms and showed how synthetic mammograms may be generated. The model has components that model the breast shape, the approximate appearance and the detailed texture. The breast shape model is learned by solving the shape boundary landmark correspondence problem using the approaches described by Kotcheff and Taylor and by Davies et al.

• Chapter 10 presented three evaluations of the synthetic mammograms generated using the model of entire mammograms. An expert mammography radiologist could easily distinguish between real and synthetic mammograms, but noted that some of the synthetic mammograms did look quite realistic. The lack of blood vessels, lymph nodes and benign calcifications allowed the synthetic mammograms to be identified. A quantitative psychophysical evaluation of reduced-resolution synthetic mammograms showed that, in general, the synthetic mammograms could be differentiated from real mammograms. However, one participant could not distinguish between the two classes, and the other participants consistently misclassified the synthetic mammograms as real, reporting that they could not tell the difference between the two classes. The results indicate that, at low resolution, the synthetic mammograms are sufficiently realistic that differentiating real and synthetic mammograms is difficult. An evaluation of the contribution made by the detailing model shows that, while local textural detail is successfully captured, the model cannot capture the appearance of curvilinear structures.
the appearance of curvilinear structures.

11.3 Conclusions

The work in this thesis should be considered in the correct context: while a great deal of research has been done on the traditional approach to CADe, almost no previous work has addressed generative statistical models for novelty detection.

As discussed in Chapter 3, one of the most significant problems that the computer-aided mammography community needs to address is the high false positive rate of CADe systems. We believe that this reduction can only be achieved by systems that have a much better "understanding" of mammographic appearance. In addition to reducing the false positive rate, it would be desirable if CADe systems could detect any indicative sign of abnormality, not just microcalcifications and masses; it would be more elegant still if a single algorithm could detect them all. We believe that the novelty detection approach is the most principled way to achieve these aims.

The results of the novelty detection experiment in Chapter 7 show that it is possible for a single algorithm to detect multiple types of abnormality within a novelty detection framework. Although the results for real mammographic data were a little disappointing, the approach does have potential. The generative property of the model developed in Chapter 6 was important as it allowed us to verify exactly what had been modelled successfully and what had not. This
was particularly useful during the development of the model and its implementation. Although the assumption underpinning the model—that mammographic appearance is a stationary texture—is obviously invalid, the development of this model allowed us to gain an understanding of the problems involved in modelling mammographic appearance.

The evaluation of the synthetic textures showed that they were good enough to be confused with the real textures about a third of the time and compared favourably with those produced using Efros and Leung's method, which is considered to be one of the best methods in the literature. The parametric model is competitive with the non-parametric method, but is much more flexible: synthetic textures can be generated; images can be analysed using the novelty detection algorithm; and the time and space complexity of the method scales well with the number of training pixels.

There is a significant lack of rigorous evaluation of texture synthesis algorithms in the literature. Psychophysical experiments allow the human visual system to be used objectively and quantitatively. Psychophysical experiments can be deployed relatively easily via the Internet, allowing large numbers of participants to be recruited. However, there are disadvantages to running experiments on-line: participants are self-selecting; participants may be unlikely to volunteer for experiments that take a long time to complete or that solicit personal information; and it is not possible to control the environment in which the experiment is conducted (e.g. distractions, viewing distance, ambient lighting).

The generative model developed in Chapter 9 represents a significant step towards
understanding how to statistically model the appearance of entire mammograms. We decomposed the problem into modelling shape, general appearance and detailed textural appearance. All three components were successfully modelled. Curvilinear structures were not considered and were therefore not captured by the model. Future work should consider how this important component of appearance can be combined with the other model components.

While the full synthetic mammograms can easily be differentiated from real mammograms by an expert mammography radiologist, computer vision researchers found discrimination at low resolution difficult. The aim of developing the model of entire mammograms was to further our understanding of how real mammograms may be statistically modelled, rather than to immediately solve the novelty detection problem; future research should pursue both goals.

Modelling the appearance of entire mammograms is extremely difficult, and we conclude with a suggestion for an alternative approach to the novelty detection problem. Consider a pair of mammograms taken of a particular patient. Each mammogram in that pair is a very specific model of what the other should look like. Asymmetry detection can therefore be considered a novelty detection approach. It may be possible to statistically learn the legal transformations that may be applied to a mammogram in a pair of normal mammograms; novelty would then correspond to an illegal transformation. It may be possible to generalise this idea to the case where both CC and MLO views are available, or to the temporal case.
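The novelty detection principle advocated throughout this chapter can be stated compactly in code. The following minimal sketch (illustrative only; it is not the implementation developed in this thesis) fits a Gaussian mixture density to feature vectors extracted from normal tissue and flags test vectors that are unusually unlikely under that model. The use of scikit-learn, the component count, the feature dimensionality and the threshold percentile are all assumptions made for the example.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Each row is a feature vector (e.g. a pixel signature or texture
# descriptor) extracted from known-normal tissue; stand-in data here.
rng = np.random.default_rng(0)
normal_train = rng.normal(0.0, 1.0, size=(5000, 16))

# Density model of normal appearance.
model = GaussianMixture(n_components=10, covariance_type="full",
                        random_state=0).fit(normal_train)

def novelty(model, x):
    """Novelty score: negative log-likelihood under the model of normality."""
    return -model.score_samples(x)

# Calibrate a threshold on normal data, then flag low-likelihood vectors.
threshold = np.percentile(novelty(model, normal_train), 99)

test = np.vstack([rng.normal(0.0, 1.0, size=(95, 16)),   # normal-like
                  rng.normal(4.0, 1.0, size=(5, 16))])   # "abnormal"
flagged = np.nonzero(novelty(model, test) > threshold)[0]
print("flagged test vectors:", flagged)                  # expect mostly 95..99
```

The appeal of this formulation is that nothing in it is specific to any one sign of abnormality: anything sufficiently unlike the training distribution is flagged.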
11.4 Final statement

This thesis proposed a new approach to detecting abnormalities in mammograms. Novelty detection requires a model of normal mammographic appearance that allows deviation from normality to be measured. Two generative statistical models of mammographic appearance have been developed and evaluated. A novelty detection experiment showed that it is possible to detect multiple types of abnormality using a model of normal appearance if that model is sufficiently specific. Psychophysical experiments demonstrated that significant progress has been made towards being able to realistically model both mammographic texture and the appearance of entire mammograms.
Appendix A

The expectation maximisation algorithm

A.1 Introduction

Maximum likelihood is an approach to finding "optimal" estimates for model parameters. A set of model parameters, $\theta^*$, is optimal in the maximum likelihood sense if it is the most likely given some observed data:
\[
\theta^* = \arg\max_{\theta} \, \mathcal{L}(\theta \mid \{\mathbf{y}_i\}) \tag{A.1}
\]
where $\mathcal{L}$ is the likelihood function and $\{\mathbf{y}_i\}$ are the observed data. The likelihood function is usually replaced by the log-likelihood function for computational convenience.
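As a concrete illustration of Equation A.1 (this worked example is not part of the original appendix), consider $n$ independent observations $y_1, \ldots, y_n$ drawn from a univariate normal distribution with unknown mean $\mu$ and known variance $\sigma^2$. The log-likelihood is
\[
\ell(\mu) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \mu)^2,
\]
and setting
\[
\frac{d\ell}{d\mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(y_i - \mu) = 0
\quad\Rightarrow\quad
\mu^* = \frac{1}{n}\sum_{i=1}^{n} y_i
\]
gives a closed-form maximiser, the sample mean. The EM algorithm presented below addresses problems, such as mixture models, where no such closed-form solution exists.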
The expectation maximisation (EM) algorithm is a general approach to solving maximum likelihood problems in the presence of missing data [80, 137]. One form of missing data is latent data, which is a contrivance that makes the parameter estimation problem tractable. Latent data can be assumed to exist—even if it cannot be measured—and in this way can be considered missing. We will now present the abstract form of the EM algorithm (see Section 5.4.3 for an example of an application of the algorithm). The presentation of the algorithm is based in part upon those of Ravishanker and Dey [151] and Hastie et al. [81].

A.2 The algorithm

The EM algorithm is named after its two steps, the expectation step and the maximisation step. These steps are iterated until the algorithm converges.

Let $\mathbf{Y}_O$ denote the observed data, let $\mathbf{Y}_L$ denote the latent data, and let the complete data be denoted by $\mathbf{Y} = (\mathbf{Y}_O, \mathbf{Y}_L)$. From conditional probability we can write
\[
P(\mathbf{Y}_O \mid \theta) = \frac{P(\mathbf{Y}_L, \mathbf{Y}_O \mid \theta)}{P(\mathbf{Y}_L \mid \mathbf{Y}_O, \theta)} = \frac{P(\mathbf{Y} \mid \theta)}{P(\mathbf{Y}_L \mid \mathbf{Y}_O, \theta)}. \tag{A.2}
\]
Taking logarithms:
\[
\ell(\theta; \mathbf{Y}_O) = \ell_0(\theta; \mathbf{Y}) - \ell_1(\theta; \mathbf{Y}_L \mid \mathbf{Y}_O) \tag{A.3}
\]
where $\ell_0$ is the complete-data log-likelihood and $\ell_1$ is based upon $P(\mathbf{Y}_L \mid \mathbf{Y}_O, \theta)$. Taking expectations, conditioned on $\mathbf{Y}_O$ and
the model parameters at the $m$-th iteration of the algorithm, $\theta^{(m)}$:
\[
\ell(\theta; \mathbf{Y}_O) = Q(\theta, \theta^{(m)}) - H(\theta, \theta^{(m)})
\stackrel{\text{def}}{=} E[\ell_0(\theta; \mathbf{Y}) \mid \mathbf{Y}_O, \theta^{(m)}] - E[\ell_1(\theta; \mathbf{Y}_L \mid \mathbf{Y}_O) \mid \mathbf{Y}_O, \theta^{(m)}]. \tag{A.4}
\]
Equation A.4 is the log-likelihood equivalent of the objective function we seek (Equation A.1). $Q(\theta, \theta^{(m)})$ is computed in the E-step; this is essentially a vertical slice through the density shown in Figure 5.1. The M-step obtains $\theta^{(m+1)}$ by maximising $Q$ over $\theta$:
\[
Q(\theta^{(m+1)}, \theta^{(m)}) \ge Q(\theta, \theta^{(m)}) \quad \forall \theta. \tag{A.5}
\]
The actual form that $Q$ takes is problem specific (see Section 5.4.3 for a more intuitive example). We shall now show why maximising $Q$ maximises $\ell(\theta; \mathbf{Y}_O)$, and prove that the EM algorithm converges by showing that each step of the EM algorithm is guaranteed not to decrease the objective function.

A.3 Proof of convergence

We will show that
\[
\ell(\theta^{(m+1)}; \mathbf{Y}_O) - \ell(\theta^{(m)}; \mathbf{Y}_O) \ge 0, \tag{A.6}
\]
with equality if $\theta^{(m+1)} = \theta^{(m)}$. Consider what happens to the objective function—in terms of $Q$ and $H$—as we move from one iteration of the EM algorithm to the
next:
\[
\ell(\theta^{(m+1)}; \mathbf{Y}_O) - \ell(\theta^{(m)}; \mathbf{Y}_O)
= \underbrace{\left[ Q(\theta^{(m+1)}, \theta^{(m)}) - Q(\theta^{(m)}, \theta^{(m)}) \right]}_{\text{A}}
- \underbrace{\left[ H(\theta^{(m+1)}, \theta^{(m)}) - H(\theta^{(m)}, \theta^{(m)}) \right]}_{\text{B}}. \tag{A.7}
\]
The M-step ensures that $Q(\theta^{(m+1)}, \theta^{(m)}) \ge Q(\theta^{(m)}, \theta^{(m)})$, and so part A of Equation A.7 will be non-negative. If part B of Equation A.7 is non-positive, then an iteration of the EM algorithm cannot decrease the objective function—i.e. we need to prove that:
\[
H(\theta, \theta^{(m)}) \le H(\theta^{(m)}, \theta^{(m)}) \quad \forall \theta, \tag{A.8}
\]
which can be read as '$H(\theta, \theta^{(m)})$ is maximised by $\theta = \theta^{(m)}$'.

From Equation A.4 and the definition of conditional expectation we can write $H$ as:
\[
H(\theta, \theta^{(m)}) = \int_{\mathbf{y}_L \in L} p(\mathbf{y}_L \mid \mathbf{Y}_O, \theta^{(m)}) \log p(\mathbf{y}_L \mid \mathbf{Y}_O, \theta) \, d\mathbf{y}_L. \tag{A.9}
\]
Note that $H$ has the form
\[
\int_{-\infty}^{\infty} p(x) \log q(x) \, dx \tag{A.10}
\]
where $p$ and $q$ are densities with associated models $\theta^{(p)}$ and $\theta^{(q)}$. Equation A.8 says that $H$ is maximised when $\theta^{(p)} = \theta^{(q)}$. Considering the discrete case:
\[
\sum_{i=1}^{n} p_i \log q_i, \tag{A.11}
\]
we can state the following. Since $\log x \le x - 1$ for all $x > 0$, with equality if and only if $x = 1$, applying this inequality to $\log(q_i / p_i)$ gives
\[
\sum_{i=1}^{n} p_i \log q_i - \sum_{i=1}^{n} p_i \log p_i
= \sum_{i=1}^{n} p_i \log \frac{q_i}{p_i}
\le \sum_{i=1}^{n} p_i \left( \frac{q_i}{p_i} - 1 \right)
= \sum_{i=1}^{n} q_i - \sum_{i=1}^{n} p_i = 0, \tag{A.12}
\]
because $p$ and $q$ are probability distributions, so $\sum_{i=1}^{n} p_i = \sum_{i=1}^{n} q_i = 1$. Equality holds if and only if $q_i / p_i = 1$ for every $i$. Equation A.12 is Gibbs' inequality: it shows that Equation A.11 is maximised when $p_i = q_i \; \forall i$. Generalising to the continuous case:
\[
\int_{-\infty}^{\infty} p(x) \log q(x) \, dx \le \int_{-\infty}^{\infty} p(x) \log p(x) \, dx, \tag{A.13}
\]
with equality when $p = q$ (i.e. when $\theta^{(p)} = \theta^{(q)}$), and so $H(\theta, \theta^{(m)})$ is maximised when $\theta = \theta^{(m)}$. Therefore part B of Equation A.7 is non-positive, and so an iteration of the EM algorithm cannot decrease the log-likelihood of the model parameters given the observed data. In summary, the EM algorithm finds a maximum of the objective function. However, there is no guarantee that the maximum will be the global maximum, and so several runs of the algorithm—starting from different initialisations—may be necessary to find a suitable solution.
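To connect the abstract presentation above to practice, the following is a minimal sketch of EM for a two-component univariate Gaussian mixture (the setting of Section 5.4.3). It is an illustrative sketch rather than the implementation used in this thesis; the initialisation and fixed iteration count, in particular, are deliberately naive. The E-step computes the responsibilities (the conditional expectation of the latent component labels given the current parameters) and the M-step re-estimates the parameters; the printed log-likelihood should never decrease, as proved above.

```python
import numpy as np

def em_gmm_1d(y, n_iter=20):
    """EM for a two-component univariate Gaussian mixture model."""
    pi = 0.5                              # mixing weight of component 1
    mu = np.array([y.min(), y.max()])     # crude initialisation
    var = np.array([y.var(), y.var()])

    def pdf(y, mu, var):
        return np.exp(-0.5 * (y - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

    for it in range(n_iter):
        # E-step: responsibility of component 1 for each observation.
        p0 = (1.0 - pi) * pdf(y, mu[0], var[0])
        p1 = pi * pdf(y, mu[1], var[1])
        gamma = p1 / (p0 + p1)

        # M-step: weighted maximum-likelihood re-estimates of the parameters.
        w0, w1 = np.sum(1.0 - gamma), np.sum(gamma)
        mu = np.array([np.sum((1.0 - gamma) * y) / w0, np.sum(gamma * y) / w1])
        var = np.array([np.sum((1.0 - gamma) * (y - mu[0]) ** 2) / w0,
                        np.sum(gamma * (y - mu[1]) ** 2) / w1])
        pi = np.mean(gamma)

        # Observed-data log-likelihood: guaranteed not to decrease.
        loglik = np.sum(np.log((1.0 - pi) * pdf(y, mu[0], var[0])
                               + pi * pdf(y, mu[1], var[1])))
        print(f"iteration {it:2d}: log-likelihood = {loglik:.3f}")

    return pi, mu, var

# Two well-separated clusters of synthetic data.
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 700)])
em_gmm_1d(y)
```

Running the sketch from several random initialisations, as suggested above, guards against convergence to a poor local maximum.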
Bibliography

[1] L. V. Ackerman and E. E. Gose. Breast lesion classification by computer and xeroradiography. Cancer, 30(4):1025–1035, October 1972.
[2] F. E. Alexander, T. J. Anderson, H. K. Brown, A. P. M. Forrest, W. Hepburn, A. E. Kirkpatrick, B. B. Muir, R. J. Prescott, and A. Smith. 14 years of follow-up from the Edinburgh randomised trial of breast-cancer screening. The Lancet, 353(9168):1903–1908, June 1999.
[3] S. R. Amendolia, F. Estrella, T. Hauer, D. Manset, D. McCabe, R. McClatchey, M. Odeh, T. Reading, D. Rogulin, D. Schottlander, and T. Solomonides. Grid Databases for Shared Image Analysis in the MammoGrid Project. In Proceedings of the International Database Engineering and Applications Symposium (IDEAS '04), pages 312–321. IEEE, July 2004.
[4] Breast Cancer Facts and Figures 2003–2004. Annual report, American Cancer Society, Atlanta, Georgia, USA, 2003.
[5] Cancer Facts and Figures 2004. Annual report, American Cancer Society, Atlanta, Georgia, USA, 2004.
  • Bibliography 287 [6] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. LAPACK Users’ Guide. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 3rd edition, 1999. [7] S. Astley, R. Zwiggelaar, C. Wolstenholme, K. Davies, T. Parr, and C. Tay- lor. Prompting in mammography: How accurate must the prompt gener- ators be? In N. Karssemeijer, M. A. O. Thijssen, J. H. C. L. Hendriks, and L. J. T. O. van Erning, editors, Digital Mammography, volume 13 of Computational Imaging and Vision, pages 347–354. Kluwer Academic Pub- lishers, November 1998. [8] S. M. Astley, C. R. M. Boggis, K. Walker, S. Wallace, S. Tomkinson, V. Hillier, and J. Morris. An Evaluation of a Commercial Prompting System in a Busy Screening Centre. In H.-O. Peitgen, editor, Digital Mammogra- phy: IWDM—6th International Workshop on Digital Mammography, pages 471–475. Springer-Verlag, March 2003. [9] S. M. Astley, T. C. Mistry, C. R. M. Boggis, and V. F. Hillier. Should we use humans or a machine to pre-screen mammograms. In H.-O. Peit- gen, editor, Digital Mammography: IWDM—6th International Workshop on Digital Mammography, pages 476–480. Springer-Verlag, March 2003. [10] P. R. Bakic, M. Albert, D. Brzakovic, and A. D. A. Maidment. Mammogram synthesis using 3D simulation. I. Breast tissue model and image acquisition simulation. Medical Physics, 29:2131–2139, 2002.
  • Bibliography 288 [11] J. A. Bangham, P. D. Ling, and R. Young. Multiscale recursive medians, scale-space and transforms with applications to image processing. IEEE Transactions on Image Processing, 5(6):1043–1048, 1996. [12] N. Baxter. Preventive health care, 2001 update: should women be routinely taught breast self-examination to screen for breast cancer? Canadian Med- ical Association Journal, 164(13):1837–1846, June 2001. [13] A. O. Beacham, J. S. Carpenter, and M. A. Andrykowski. Impact of benign breast biopsy upon breast self-examination. Preventive Medicine, 38(6):723–731, June 2004. [14] R. E. Bellman. Adaptive Control Processes. Princeton University Press, Princeton, NJ, USA, 1961. [15] U. Bick, M. L. Giger, R. A. Schmidt, R. M. Nishikawa, D. Wolverton, and K. Doi. Automated segmentation of digitized mammograms. Academic Radiology, 2:1–9, 1995. [16] K. Bliznakova, Z. Bliznakov, V. Bravou, Z. Kolitsi, and N. Pallikarakis. A three-dimensional breast software phantom for mammography simulation. Physics in Medicine and Biology, 48(22):3699–3719, 2003. [17] M. Board, S. Astley, and C. Boggis. Multi-resolution transportation for the detection of mammographic asymmetry. In International Workshop on Digital Mammography, 2004. (Accepted, pending.).
  • Bibliography 289 [18] L. Bocchi, G Coppini, J. Nori, and G. Valli. Detection of single and clus- tered microcalcifications in mammograms using fractals models and neural networks. Medical Engineering and Physics, 26(4):303–312, May 2004. [19] F. O. Bochud, C. K. Abbey, and M. P. Eckstein. Statistical texture syn- thesis of mammographic images with clustered lumpy backgrounds. Optics Express, 4(1):33–43, January 1999. [20] F. L. Bookstein. Principal Warps: Thin-Plate Splines and the Decompo- sition of Deformations. IEEE Transactions on Pattern Analysis Machine Intelligence, 11(6):567–585, 1989. [21] H. Booth, M. Gautrey, M. Sheldrake, N. Cooper, and M. Quinn. Cancer statistics registrations: Registrations of cancer diagnosed in 2001, England. Annual report series MB1 no. 32, National Statistics, 2004. Crown copy- right. [22] N. F. Boyd, J. W. Byng, R. A. Long, E. K. Fishell, L. E. Little, A. B. Miller, G. A. Lockwood, D. L. Tritchler, and M. J. Yaffe. Qualitative clas- sification of mammographic densities and breast cancer risk: results from the Canadian National Breast Screening Study. Journal of the National Cancer Institute, 87(9):670–675, May 1995. [23] M. Brady, F. Gilbert, S. Lloyd, M. Jirotka, D. Gavaghan, A. Simp- son, R. Highnam, T. Bowles, D. Schottlander, D. McCabe, D. Watson, B. Collins, J. Williams, A. Knox, M. Oevers, and P. Taylor. eDiaMoND: the UK’s Digital Mammography National Database. In International Work- shop on Digital Mammography, 2004. (Accepted, pending.).
[24] Breast Cancer Factsheet—February 2004. Online, February 2004. Accessed March 13 2005.
[25] J. L. Breau. Chemotherapy in the management of breast cancer (la chimiothérapie dans le traitement du cancer du sein). Chirurgie; Mémoires de l'Académie de Chirurgie, 120(6–7):354–356, 1994–1995.
[26] J. Bresenham. Algorithm for computer control of a digital plotter. IBM Systems Journal, 4:25–30, 1965.
[27] D. S. Brettle, E. Berry, and M. A. Smith. Synthesis of texture from clinical images. Image and Vision Computing, 21:433–445, May 2003.
[28] J. Brown, A. Coulthard, A. K. Dixon, J. M. Dixon, D. F. Easton, R. A. Eeles, D. G. R. Evans, F. G. Gilbert, C. Hayes, J. P. R. Jenkins, et al. Rationale for a national multi-centre study of magnetic resonance imaging screening in women at genetic risk of breast cancer. The Breast, 9(2):72–77, April 2000.
[29] D. Brzakovic, X. M. Luo, and P. Brzakovic. An Approach to Automated Detection of Tumors in Mammograms. IEEE Transactions on Medical Imaging, 9(3):233–241, September 1990.
[30] P. C. Bunch, J. F. Hamilton, G. K. Sanderson, and A. H. Simmons. A free response approach to measurement and characterization of radiographic observer performance. SPIE Proceedings, 127:124–135, 1977.
[31] C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Knowledge Discovery and Data Mining, 2(2):1–43, 1998.
[32] L. J. Warren Burhenne, S. A. Wood, C. J. D'Orsi, S. A. Feig, D. B. Kopans, K. F. O'Shaughnessy, E. A. Sickles, L. Tabár, C. J. Vyborny, and R. A. Castellino. Potential contribution of computer-aided detection to the sensitivity of screening mammography. Radiology, 215(2):554–562, May 2000.
[33] J. W. Byng, N. F. Boyd, E. Fishell, R. A. Jong, and M. J. Yaffe. The quantitative analysis of mammographic densities. Physics in Medicine and Biology, 39(10):1629–1638, October 1994.
[34] C. B. Caldwell, S. J. Stapleton, D. W. Holdsworth, R. A. Jong, W. J. Weiser, G. Cooke, and M. J. Yaffe. Characterization of mammographic parenchymal pattern by fractal dimension. Physics in Medicine and Biology, 35(2):235–247, February 1990.
[35] R. Campanini, D. Dongiovanni, E. Iampieri, N. Lanconelli, M. Masotti, G. Palermo, A. Riccardi, and M. Roffilli. A novel featureless approach to mass detection in digital mammograms based on Support Vector Machines. Physics in Medicine and Biology, 49(6):961–975, March 2004.
[36] N. A. Campbell and J. B. Reece. Biology. Benjamin Cummings, 7th edition, December 2004.
[37] The stages, http://www.cancerhelp.org.uk/help/default.asp?page=3315, accessed July 10 2005.
[38] S. J. Caulkin, S. M. Astley, A. Mills, and C. R. M. Boggis. Generating Realistic Spiculated Lesions in Digital Mammograms. In M. J. Yaffe, editor, Digital Mammography: IWDM 2000, 5th International Workshop, pages 713–720. Medical Physics Publishing, December 2001.
  • Bibliography 292 [39] N. Cerneaz and M. Brady. Finding curvilinear structures in mammograms. In N. Ayache, editor, Computer Vision, Virtual Reality and Robotics in Medicine, volume 905 of Lecture Notes in Computer Science, pages 372– 382. Springer, March 1995. [40] D. P. Chakraborty. Maximum likelihood analysis of free-response receiver operating characteristic (FROC) data. Medical Physics, 16(4):561–568, July 1989. [41] H.-P. Chan, D. Wei, M. A. Helvie, B. Sahiner, D. D. Adler, M. M. Goodsitt, and N. Petrick. Computer-aided classification of mammographic masses and normal tissue: linear discriminant analysis in texture feature space. Physics in Medicine and Biology, 40(5):857–875, May 1995. [42] R. Chandrasekhar and Y. Attikiouzel. Automatic Breast Border Segmen- tation by Background Modelling and Subtraction. In M. J. Yaffe, editor, Digital Mammography: IWDM 2000, 5th International Workshop, pages 560–565, Madison, Wisconsin, USA, December 2001. Medical Physics Pub- lishing. [43] P. Chaturvedi. Does smoking increase the risk of breast cancer? The Lancet Oncology, 4(11):657–658, November 2003. [44] E. Claridge and J. H. Richter. Characterisation of mammographic lesions. In A. G. Gale, S. M. Astley, D. R. Dance, and A. Y. Cairns, editors, Digital Mammography: Proceedings of the 2nd International Workshop on Digi- tal Mammography, York, UK, 10–12 July 1994, pages 241–250. Elsevier Science, September 1994.
  • Bibliography 293 [45] G. M. Clarke and D. Cooke. A Basic Course in Statistics. Arnold Publish- ers, 4th edition, October 1998. [46] P. Collinson. Of bombers, radiologists, and cardiologists: time to ROC. Heart, 80(3):236, February 1998. [47] T. F. Cootes, G. J. Edwards, and C. J. Taylor. Active Appearance Models. IEEE Transactions on Pattern Analysis Machine Intelligence, 23(6):681– 685, 2001. [48] T. F. Cootes, C. J. Taylor, and A. Lanitis. Active shape models: Evaluation of a multi-resolution method for improving image search. In E. Hancock, editor, Proceedings of the 5th British Machine Vision Conference, pages 327–336. BMVA Press, September 1994. [49] I. Daubechies. Ten Lectures on Wavelets. CBMS-NSF Conference Series in Applied Mathematics. Society for Industrial and Applied Mathematics, January 1992. [50] Rh. H. Davies. Learning Shape: Optimal Models for Analysing Shape Vari- ability. PhD thesis, The Victoria University of Manchester, Manchester, United Kingdom, 2002. [51] Rh. H. Davies, C. J. Twining, T. F. Cootes, J. C. Waterton, and C. J. Taylor. A Minimum Description Length Approach to Statistical Shape Modelling. IEEE Transactions on Medical Imaging, 2002. [52] USF Digital Mammography Home Page, http://marathon.csee. usf.edu/Mammography/Database.html, accessed January 2005.
[53] J. S. De Bonet and P. Viola. A Non-Parametric Multi-Scale Statistical Model for Natural Images. Advances in Neural Information Processing Systems, 10, 1997.
[54] I. den Tonkelaar, P. H. M. Peeters, and P. A. H. van Noord. Increase in breast size after menopause: prevalence and determinants. Maturitas, 48(1):51–57, May 2004.
[55] J. Dengler, S. Behrens, and J. F. Desaga. Segmentation of Microcalcifications in Mammograms. IEEE Transactions on Medical Imaging, 12(4):634–642, December 1993.
[56] P. A. Devijver and J. Kittler. Pattern Recognition: A Statistical Approach. Prentice Hall International, 1982.
[57] J. Dinnes, S. Moss, J. Melia, R. Blanks, F. Song, and J. Kleijnen. Effectiveness and cost-effectiveness of double reading of mammograms in breast cancer screening: findings of a systematic review. The Breast, 10(6):455–463, December 2001.
[58] C. J. D'Orsi, D. J. Getty, J. A. Swets, R. M. Pickett, S. E. Seltzer, and B. J. McNeil. Reading and Decision Aids for Improved Accuracy and Standardization of Mammographic Diagnosis. Radiology, 184:619–622, September 1992.
[59] A. A. Efros and W. T. Freeman. Image quilting for texture synthesis and transfer. In L. Pocock, editor, SIGGRAPH '01: Proceedings of the 28th annual conference on computer graphics and interactive techniques, pages 341–346, New York, USA, 2001. ACM Press.
[60] A. A. Efros and T. K. Leung. Texture Synthesis by Non-Parametric Sampling. In 7th International Conference on Computer Vision (ICCV '99), volume 2, pages 1033–1039. IEEE Computer Society Press, November 1999.
[61] T. Ema, K. Doi, R. M. Nishikawa, Y. Jiang, and J. Papaioannou. Image feature analysis and computer-aided diagnosis in mammography: reduction of false-positive clustered microcalcifications using local edge-gradient analysis. Medical Physics, 22(2):161–169, February 1995.
[62] C. Evans, K. Yates, and M. Brady. Statistical Characterization of Normal Curvilinear Structures in Mammograms. In H.-O. Peitgen, editor, Digital Mammography: IWDM—6th International Workshop on Digital Mammography, pages 285–291. Springer-Verlag, March 2003.
[63] A. Fenster, K. Surry, W. Smith, and D. B. Downey. The use of three-dimensional ultrasound imaging in breast biopsy and prostate therapy. Measurement, 36(3–4):245–256, October–December 2004.
[64] B. Fisher, J. Bryant, J. J. Dignam, D. L. Wickerham, E. P. Mamounas, E. R. Fisher, R. G. Margolese, L. Nesbitt, S. Paik, T. M. Pisansky, and N. Wolmark. Tamoxifen, Radiation Therapy, or Both for Prevention of Ipsilateral Breast Tumor Recurrence After Lumpectomy in Women With Invasive Breast Cancers of One Centimeter or Less. Journal of Clinical Oncology, 20(20):4141–4149, October 2002.
[65] C. E. Floyd, J. Y. Lo, A. J. Yun, D. C. Sullivan, and P. J. Kornguth. Prediction of Breast Cancer Malignancy Using an Artificial Neural Network. Cancer, 74(11):2944–2948, December 1994.
[66] P. Forrest. Breast cancer screening. Report to Health Ministers of England, Wales, Scotland and Northern Ireland by a Working Group chaired by Sir Patrick Forrest, 1987. HMSO.
[67] T. W. Freer and M. J. Ulissey. Screening mammography with computer-aided detection: prospective study of 12 860 patients in a community breast center. Radiology, 220(3):781–786, September 2001.
[68] D. D. Garber. Computational Models for Texture Analysis and Texture Synthesis. PhD thesis, University of Southern California, May 1981.
[69] GE Healthcare — Product Technology — Mammography — Senographe 2000D, http://www.gehealthcare.com/euen/mammography/products/senographe-2000d/2000d_cad.html, accessed July 20 2005.
[70] M. L. Giger, Z. Huo, C. J. Vyborny, L. Lan, R. M. Nishikawa, and I. Rosenbourgh. Results of an Observer Study with an Intelligent Mammographic Workstation for CAD. In H.-O. Peitgen, editor, Digital Mammography: IWDM—6th International Workshop on Digital Mammography, pages 297–303. Springer-Verlag, March 2003.
[71] M. L. Giger, P. Lu, and Z. Huo. CAD in mammography: Computerized detection and classification of masses. In A. G. Gale, S. M. Astley, D. R. Dance, and A. Y. Cairns, editors, Digital Mammography: Proceedings of the 2nd International Workshop on Digital Mammography, York, UK, 10–12 July 1994, page 281. Elsevier Science, September 1994.
[72] F. J. Gilbert, A. Kirkpatrick, C. Boggis, S. Astley, S. Field, A. Gale, C. Hancock, K. Young, J. Cooke, S. Moss, R. Blanks, and L. Garvican. Computer Aided Detection in Mammography: Working Party of the Radiologists Quality Assurance Coordinating Group. Technical report, NHS, NHS Cancer Screening Programmes, Sheffield, UK, January 2001. NHSBSP Publication No. 48.
[73] P. C. Gøtzsche and O. Olsen. Is screening for breast cancer with mammography justifiable? The Lancet, 355(9198):129–134, January 2000.
[74] J. Grim and M. Haindl. A Discrete Mixtures Colour Texture Model. In Texture 2002: The 2nd international workshop on texture analysis and synthesis, pages 59–63, 1 June 2002.
[75] ATAC Trialists' Group. Results of the ATAC (Arimidex, Tamoxifen, Alone or in Combination) trial after completion of 5 years' adjuvant treatment for breast cancer. The Lancet, 365(9453):60–62, January 2005.
[76] D. Gur, J. H. Sumkin, H. E. Rockette, M. Ganott, C. Hakim, L. Hardesty, W. R. Poller, R. Shah, and L. Wallace. Changes in Breast Cancer Detection and Mammography Recall Rates After the Introduction of a Computer-Aided Detection System. Journal of the National Cancer Institute, 96(3):185–190, 2004.
[77] W. C. Hahn. Telomerase and Cancer. Clinical Cancer Research, 7:2953–2954, October 2001.
  • Bibliography 298 [78] J. A. Hanley and B. J. McNeil. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1):29–36, April 1982. [79] R. M. Haralick, K. Shanmugan, and I. Dinstein. Texture features for image classification. IEEE Transactions on Systems, Man and Cybernetics, 3:610– 621, 1973. [80] H. Hartley. Maximum likelihood estimation from incomplete data. Biomet- rics, 14:174–194, 1958. [81] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer Series in Statistics. Springer, 2001. [82] Health Service Quarterly. Report 18, Office for National Statistics, London, UK, Summer 2003. [83] M. Heath, K. Bowyer, D. Kopans, R. Moore, and P. Kegelmeyer Jr. The Digital Database for Screening Mammography. In M. J. Yaffe, editor, Dig- ital Mammography: IWDM 2000, 5th International Workshop, pages 212– 218, Madison, Wisconsin, USA, December 2001. Medical Physics Publish- ing. [84] D. J. Heeger and J. R. Bergen. Pyramid-Based Texture Analysis/Synthesis. In SIGGRAPH 95: 22nd International ACM Conference on Computer Graphics and Interactive Techniques, pages 229–238. ACM Press, 1995.
  • Bibliography 299 [85] J. J. Heine, S. R. Deans, R. P. Velthuizen, and L. P. Clarke. On the statisti- cal nature of mammograms. Medical Physics, 26(11):2254–2265, November 1999. [86] R. Highnam and M. Brady. Mammographic Image Analysis. Computational Imaging and Vision Series. Kluwer, April 1999. [87] R. P. Highnam, J. M. Brady, and R. E. English. Simulating Disease in Mammography. In M. J. Yaffe, editor, Digital Mammography: IWDM 2000, 5th International Workshop, pages 727–731. Medical Physics Publishing, December 2001. [88] R. P. Highnam, J. M. Brady, and B. J. Shepstone. A representation for mammographic image processing. Medical Image Analysis, 1(1):1–18, March 1996. [89] F. L. Hitchcock. The distribution of a product from several sources to numerous localities. Journal of Mathematics and Physics, 20:224–230, 1941. [90] A. Holmes. Computer-aided Detection of Abnormalities in Mammograms. PhD thesis, The Victoria University of Manchester, Manchester, United Kingdom, 2001. [91] A. S. Holmes, C. J. Rose, and C. J. Taylor. Measuring Similarity between Pixel Signatures. Image and Vision Computing, 20(5–6):331–340, April 2002.
  • Bibliography 300 [92] A. S. Holmes, C. J. Rose, and C. J. Taylor. Transforming Pixel Signa- tures into an Improved Metric Space. Image and Vision Computing, 20(9– 10):701–707, August 2002. [93] D. H. Hubel. Exploration of the Primary Visual Cortex. Nature, 299:515– 524, 1982. [94] Z. Huo, M. L. Giger, C. V. Vyborny, U. Bick, P. Lu, D. E. Wolverton, and R. A. Schmidt. Analysis of spiculation in the computerized classification of mammographic masses. Medical Physics, 22(10):1569–1579, October 1995. [95] I. W. Hutt. The computer-aided detection of abnormalities in digital mam- mograms. PhD thesis, The Victoria University of Manchester, Manchester, United Kingdom, 1996. [96] I. W. Hutt, S. M. Astley, and C. R. M. Boggis. Prompting as an aid to Diagnosis in Mammography. In A. G. Gale, S. M. Astley, D. R. Dance, and A. Y. Cairns, editors, Digital Mammography: Proceedings of the 2nd International Workshop on Digital Mammography, York, UK, 10–12 July 1994, pages 389–398. Elsevier Science, September 1994. [97] P. T. Huynh, A. M. Jarolimek, and S. Daye. The false negative mammo- gram. Radiographics, 18:1137–1154, 1998. [98] Press release, http://www.icadmed.com, accessed January 2005. [99] iCAD Breast Cancer Detection, http://www.icadmed.com, accessed July 20 2005.
[100] IEEE Computer Society. IEEE Standard for Binary Floating-Point Arithmetic, IEEE Standard 754-1985. Standard, IEEE, 1985.
[101] International Breast Cancer Screening Network, http://appliedresearch.cancer.gov/ibsn/, accessed July 20 2005.
[102] A. K. Jain, M. N. Murty, and P. J. Flynn. Data Clustering: A Review. ACM Computing Surveys, 31(3), September 1999.
[103] R. A. Johnson and D. W. Wichern. Applied Multivariate Statistical Analysis. Prentice-Hall, 5th edition, 2002.
[104] I. T. Jolliffe. Principal Component Analysis. Springer Series in Statistics. Springer Verlag, New York, USA, 2nd edition, 2002.
[105] N. Karssemeijer. Adaptive noise equalization and recognition of microcalcification clusters in mammograms. International Journal of Pattern Recognition and Artificial Intelligence, 7(6):1357–1376, 1993.
[106] N. Karssemeijer. Adaptive Noise Equalization and Image Analysis in Mammography. In H. H. Barrett and A. F. Gmitro, editors, International Conference on Information Processing in Medical Imaging, volume 687 of Lecture Notes in Computer Science, pages 472–486, Flagstaff, Arizona, USA, June 14–18 1993. Springer.
[107] N. Karssemeijer. Automated classification of parenchymal patterns in mammograms. Physics in Medicine and Biology, 43(2):365–378, February 1998.
[108] N. Karssemeijer. Local orientation distribution as a function of spatial scale for detection of masses in mammograms. In A. Kuba, M. Šámal, and
  • Bibliography 302 A. Todd-Pokropek, editors, Information Processing in Medical Imaging: 16th International Conference, IPMI ’99, Visegrad, Hungary, June 28-July 2, 1999, volume 1613 of Lecture Notes in Computer Science, pages 280–293. Springer, June 1999.[109] N. Karssemeijer, J. D. M. Otten, A. L. M. Verbeek, J. H. Groenewoud, H. J. de Koning, J. H. C. L. Hendriks, and R. Holland. Computer-aided Detection versus Independent Double Reading of Masses on Mammograms. Radiology, 227:192–200, February 2003.[110] N. Karssemeijer and G. M. te Brake. Detection of stellate distortions in mammograms. IEEE Transactions on Medical Imaging, 15(5):611–619, Oc- tober 1996.[111] M. Kass, A. Witkin, and D. Terzopoulos. Snakes: Active contour models. International Journal of Computer Vision, 1(4):321–331, 1987.[112] T. J. Key, N. E. Allen, E. A. Spencer, and R. C. Travis. Nutrition and breast cancer. Breast (Edinburgh, Scotland), 12(6):412–416, December 2003.[113] J. Kilday, F. Palmieri, and M. D. Fox. Classifying Mammographic Le- sions Using Computerized Image Analysis. IEEE Transactions on Medical Imaging, 12(4):664–669, December 1993.[114] KODAK Mammography Computer-Aided Detection (CAD) System, http://www.kodak.com/global/en/health/productsByType/ medFilmSys/eqp/system/mamCad.jhtml?pq-path=6498, ac- cessed July 20 2005.
[115] A. C. W. Kotcheff and C. J. Taylor. Automatic Construction of Eigenspace Models by Direct Optimisation. Medical Image Analysis, 2:303–314, 1998.
[116] S. Lai, X. Li, and W. Bischof. On techniques for detecting circumscribed masses in mammograms. IEEE Transactions on Medical Imaging, 8(4):377–386, December 1989.
[117] J.-L. Lamarque. An Atlas of The Breast: Clinical Radiodiagnosis. Wolfe Medical Atlases. Wolfe Medical Publications, London, United Kingdom, 1981.
[118] M. Larkin. Breast self examination does more harm than good, says task force. The Lancet, 357(9274):2109, June 2001. News article.
[119] B. Leyland-Jones. Trastuzumab: hopes and realities. The Lancet Oncology, 3(3):137–144, March 2002.
[120] S. Liu, C. F. Babbs, and E. J. Delp. Multiresolution Detection of Spiculated Lesions in Digital Mammograms. IEEE Transactions on Image Processing, 10(6):874–884, June 2001.
[121] S. L. Lou, H. D. Lin, K. P. Lin, and D. Hoogstrate. Automatic breast region extraction from digital mammograms for PACS and telemammography applications. Computerized Medical Imaging and Graphics, 24(4):205–220, August 2000.
[122] S. G. Mallat. A Wavelet Tour of Signal Processing. Wavelet Analysis and Its Applications Series. Elsevier Science & Technology Books, 2nd edition, September 1999.
[123] L. N. Mascio, S. D. Frankel, J. M. Hernandez, and C. M. Logan. Building the LLNL/UCSF Digital Mammogram Library with image groundtruth. In K. Doi, M. L. Giger, R. M. Nishikawa, and R. A. Schmidt, editors, Digital Mammography '96: Proceedings of the 3rd International Workshop on Digital Mammography, International Congress Series, pages 427–430, Hillsborough, New Jersey, USA, December 1996. Excerpta Medica.
[124] G. Matheron. Random Sets and Integral Geometry. Probability and Statistics Series. Wiley, February 1975.
[125] J. MacQueen. Some Methods for Classification and Analysis of Multivariate Observations. In 5th Berkeley Symposium on Mathematical Statistics and Probability, 1967.
[126] C. E. Metz. ROC methodology in radiologic imaging. Investigative Radiology, 21(9):720–733, September 1986.
[127] C. E. Metz. Evaluation of digital mammography by ROC analysis. In K. Doi, M. L. Giger, R. M. Nishikawa, and R. A. Schmidt, editors, Digital Mammography '96: Proceedings of the 3rd International Workshop on Digital Mammography, International Congress Series, pages 61–68, Hillsborough, New Jersey, USA, December 1996. Excerpta Medica.
[128] The NEW MIAS Digital Mammogram Database, http://www.wiau.man.ac.uk/services/MIAS/MIASweb.html, accessed July 20 2005.
[129] P. Miller and S. Astley. Automated detection of breast asymmetries. In J. Illingworth, editor, British Machine Vision Conference, pages 519–528. BMVA Press, September 1993.
[130] The mini-MIAS database of mammograms, http://peipa.essex.ac.uk/info/mias.html, accessed July 20 2005.
[131] E. H. Moore. On the Reciprocal of the General Algebraic Matrix. (Abstract). Bulletin of the American Mathematical Society, 26:394–395, 1920.
[132] N. R. Mudigonda, R. M. Rangayyan, and J. E. L. Desautels. Gradient and Texture Analysis for the Classification of Mammographic Masses. IEEE Transactions on Medical Imaging, 19(10):1032–1043, October 2000.
[133] NHS Breast Screening Programme Annual Review 2004. Technical report, NHS, 2004.
[134] The NHS Breast Screening Programme, http://www.cancerscreening.nhs.uk/breastscreen/, accessed February 2005.
[135] R. M. Nishikawa, R. E. Johnston, D. E. Wolverton, R. A. Schmidt, E. D. Pisano, B. M. Hemminger, and J. Moody. A Common Database of Mammograms for Research in Digital Mammography. In K. Doi, M. L. Giger, R. M. Nishikawa, and R. A. Schmidt, editors, Digital Mammography '96: Proceedings of the 3rd International Workshop on Digital Mammography, International Congress Series, pages 435–438, Hillsborough, New Jersey, USA, December 1996. Excerpta Medica.
[136] O. Olsen and P. C. Gøtzsche. Cochrane review on screening for breast cancer with mammography. The Lancet, 358(9290):1340–1342, October 2001.
[137] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B, 39:1–38, 1977.
[138] S. Pemberton, D. Austin, J. Axelsson, T. Çelik, D. Dominiak, H. Elenbaas, B. Epperson, M. Ishikawa, S. Matsui, S. McCarron, A. Navarro, S. Peruvemba, R. Relyea, S. Schnitzenbaumer, and P. Stark. XHTML 1.0 The Extensible HyperText Markup Language (Second Edition). W3C Recommendation, World Wide Web Consortium (W3C), August 2002.
[139] R. Penrose. A Generalised Inverse for Matrices. Proceedings of the Cambridge Philosophical Society, 51:406–413, 1955.
[140] A. Petrie and C. Sabin. Medical Statistics at a Glance. At a Glance series. Blackwell Science, Oxford, UK, June 2000.
[141] A. Petrosian, H.-P. Chan, M. A. Helvie, M. M. Goodsitt, and D. D. Adler. Computer-aided diagnosis in mammography: classification of mass and normal tissue by texture analysis. Physics in Medicine and Biology, 39(12):2273–2288, December 1994.
[142] K. Popat and R. Picard. Novel Cluster-Based Probability Model for Texture Synthesis, Classification, and Compression. In B. G. Haskell and H.-M. Hang, editors, Visual Communications and Image Processing '93, volume 2094, pages 756–768, Bellingham, Washington, USA, October 1993. SPIE.
[143] J. Portilla and E. P. Simoncelli. Texture Modelling and Synthesis using Joint Statistics of Complex Wavelet Coefficients. In IEEE Workshop on
Statistical and Computational Theories of Vision, Fort Collins, Colorado, USA, June 1999.
[144] J. Portilla and E. P. Simoncelli. A Parametric Texture Model Based on Joint Statistics of Complex Wavelet Coefficients. International Journal of Computer Vision, 40(1):49–71, 2000.
[145] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, 1992.
[146] Digital Mammography Research, http://www.csse.uwa.edu.au/~ptaylor/digmam.html, accessed March 6 2005.
[147] W. Qian, M. Kallergi, L. P. Clarke, H.-D. Li, P. Venugopal, D. Song, and R. A. Clark. Tree structured wavelet transform segmentation of microcalcifications in digital mammography. Medical Physics, 22(8):1247–1254, August 1995.
[148] M. Quinn. Cancer survival, England, 1993-2000. National Statistics Press Release, January 2002.
[149] Press release: R2 Introduces Smarter CAD Algorithm and Workflow For Mammography Products, http://www.r2tech.com/main/company/news_one_up.php?prID=140, accessed June 2005.
[150] R2 Home, http://www.r2tech.com, accessed July 20 2005.
[151] N. Ravishanker and D. K. Dey. A First Course in Linear Model Theory. Chapman and Hall/CRC, 2002.
  • Bibliography 308[152] C. J. Rose and C. J. Taylor. An Improved Method of Computing Scale- Orientation Signatures. In Medical Image Understanding and Analysis, pages 5–8, July 2001.[153] C. J. Rose and C. J. Taylor. A Statistical Model of Texture for Medical Im- age Synthesis and Analysis. In Medical Image Understanding and Analysis, pages 1–4, July 2003.[154] C. J. Rose and C. J. Taylor. A Generative Statistical Model of Mammo- graphic Appearance. In D. Rueckert, J. Hajnal, and G.-Z. Yang, editors, Medical Image Understanding and Analysis 2004, pages 89–92, Imperial College London, UK, September 2004.[155] C. J. Rose and C. J. Taylor. A Model of Mammographic Appearance. In British Journal of Radiology Congress Series: Proceedings of UK Radio- logical Congress 2004, pages 34–35, Manchester, United Kingdom, June 2004.[156] C. J. Rose and C. J. Taylor. A Statistical Model of Mammographic Ap- pearance for Synthesis and Analysis. In International Workshop on Digital Mammography, 2004. (Accepted, pending.).[157] C. J. Rose and C. J. Taylor. A Holistic Approach to the Detection of Abnor- malities in Mammograms. In British Journal of Radiology Congress Series: Proceedings of UK Radiological Congress 2005, page 29, Manchester, United Kingdom, June 2005.[158] B. Sahiner, H.-P. Chan, N. Petrick, M. A. Helvie, and L. M. Hadjiiski. Im- provement of mammographic mass characterization using spiculation mea-
sures and morphological features. Medical Physics, 28(7):1455–1465, July 2001.
[159] P. Sajda, C. Spence, and L. Parra. A multi-scale probabilistic network model for detection, synthesis and compression in mammographic image analysis. Medical Image Analysis, 7(2):187–204, June 2003.
[160] A. Salomon. Beiträge zur Pathologie und Klinik der Mammakarzinome. Archiv für Klinische Chirurgie, 101:573–668, 1913.
[161] J. A. Serra, editor. Image Analysis and Mathematical Morphology, volume 1. Academic Press, April 1982.
[162] J. A. Serra, editor. Image Analysis and Mathematical Morphology: Theoretical Advances, volume 2. Academic Press, 1988.
[163] C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27:379–423 and 623–656, July and October 1948.
[164] E. P. Simoncelli and W. T. Freeman. The Steerable Pyramid: A Flexible Architecture for Multi-Scale Derivative Computation. In Second International Conference on Image Processing, volume 3, pages 444–447. IEEE Signal Processing Society, 1995.
[165] J. H. Smith. Prediction of the risk of breast cancer using computer vision techniques. PhD thesis, The Victoria University of Manchester, Manchester, United Kingdom, 1998.
[166] J. H. Smith, S. M. Astley, J. Graham, and A. P. Hufton. The calibration of grey-levels in mammograms. In K. Doi, M. L. Giger, R. M. Nishikawa,
and R. A. Schmidt, editors, Digital Mammography '96: Proceedings of the 3rd International Workshop on Digital Mammography, International Congress Series, pages 195–200, Hillsborough, New Jersey, USA, December 1996. Excerpta Medica.
[167] P. Soille, J. Breen, and R. Jones. Recursive Implementation of Erosions and Dilations along Discrete Lines at Arbitrary Angles. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(5):562–567, May 1996.
[168] M. Sonka, V. Hlavac, and R. Boyle. Image Processing, Analysis and Machine Vision. PWS (Brooks/Cole Publishing), International Thomson Publishing Europe, High Holborn, London, England, 2nd edition, 1999.
[169] C. Spence, L. Parra, and P. Sajda. Detection, Synthesis and Compression in Mammographic Image Analysis with a Hierarchical Image Probability Model. In L. Staib, editor, IEEE Workshop on Mathematical Methods in Biomedical Image Analysis, pages 3–10. IEEE, 2001.
[170] S. J. Starr, C. E. Metz, L. B. Lusted, and D. J. Goodenough. Visual detection and localization of radiographic images. Radiology, 116:533–538, 1975.
[171] J. Suckling, J. Parker, D. Dance, S. Astley, I. Hutt, C. Boggis, I. Ricketts, E. Stamatakis, N. Cerneaz, S. Kok, P. Taylor, D. Betal, and J. Savage. The mammographic image analysis society digital mammogram database. In A. G. Gale, S. M. Astley, D. R. Dance, and A. Y. Cairns, editors, Digital Mammography: Proceedings of the 2nd International Workshop on
Digital Mammography, York, UK, 10–12 July 1994, pages 375–378. Elsevier Science, September 1994.
[172] L. Tabár, P. B. Dean, and T. Tot. Teaching Atlas of Mammography. Thieme Medical Publishers, New York, USA, 3rd edition, January 2001.
[173] P. G. Tahoces, J. Correa, M. Soutu, L. Gomez, and J. J. Vidal. Computer-assisted diagnosis: the classification of mammographic breast parenchymal patterns. Physics in Medicine and Biology, 40(1):103–117, January 1995.
[174] L. Tarassenko, P. Hayton, N. Cerneaz, and M. Brady. Novelty detection for the identification of masses in mammograms. In Proceedings of the Fourth International Conference on Artificial Neural Networks, pages 442–447. IEEE, June 1995.
[175] P. Taylor, S. Hajnal, M.-H. Dilhuydy, and B. Barreau. Measuring image texture to separate "difficult" from "easy" mammograms. British Journal of Radiology, 67(797):456–463, 1994.
[176] P. Taylor, R. Owens, and D. Ingram. 3-D Fractal Modelling of Breast Growths. In M. J. Yaffe, editor, Digital Mammography: IWDM 2000, 5th International Workshop, pages 785–791, Madison, Wisconsin, USA, December 2001. Medical Physics Publishing.
[177] G. M. te Brake and N. Karssemeijer. Segmentation of suspicious densities in digital mammograms. Medical Physics, 28(2):259–266, February 2001.
[178] C. H. van Gils, J. H. C. L. Hendriks, R. Holland, N. Karssemeijer, J. D. M. Otten, H. Straatman, and A. L. M. Verbeek. Changes in mammographic
  • Bibliography 312 breast density and concomitant changes in breast cancer risk. European Journal of Cancer Prevention, 8(6):509–515, December 1999.[179] J. H. Veldkamp, N. Karssemeijer, J. D. M. Otten, and J. H. C. L. Hendriks. Automated classification of clustered microcalcifications into malignant and benign types. Medical Physics, 27(11):2600–2608, November 2000.[180] W. Veldkamp and N. Karssemeijer. Improved correction for signal de- pendent noise applied to automatic detection of microcalcifications. In N. Karssemeijer, M. A. O. Thijssen, J. H. C. L. Hendriks, and L. J. T. O. van Erning, editors, Digital Mammography, volume 13 of Computational Imaging and Vision, pages 169–176. Kluwer Academic Publishers, Novem- ber 1998.[181] VuCOMP—Redefining CAD, http://www.vucomp.com/, accessed July 20 2005.[182] R. Warren, M. Harvie, and A. Howell. Strategies for managing breast cancer risk after the menopause. Treat Endocrinol, 3(5):289–307, 2004.[183] A. P. Wickens. Foundations of Biopsychology. Pearson Education, Harlow, England, 2nd edition, 2005.[184] T. N. Wiesel. Postnatal development of the visual cortex and the influence of the environment. Nature, 299:583–591, 1982.[185] J. N. Wolfe. Risk for breast cancer development determined by mammo- graphic parenchymal pattern. Cancer, 37(5):2486–2492, May 1976.
[186] C. J. Wright and C. B. Mueller. Screening mammography and public health policy: the need for perspective. The Lancet, 346(8966):29–32, July 1995.
[187] Y. Wu, M. L. Giger, K. Doi, C. Vyborny, R. A. Schmidt, and C. E. Metz. Artificial Neural Networks in Mammography: Application to Decision Making in the Diagnosis of Breast Cancer. Radiology, 187:81–87, April 1993.
[188] W. Zhang, K. Doi, M. L. Giger, R. M. Nishikawa, and R. A. Schmidt. An improved shift-invariant artificial neural network for computerized detection of clustered microcalcifications in digital mammograms. Medical Physics, 23(4):595–601, April 1996.
[189] C. Zhou, H.-P. Chan, N. Petrick, M. A. Helvie, M. M. Goodsitt, B. Sahiner, and L. M. Hadjiiski. Computerized image analysis: Estimation of breast density on mammograms. Medical Physics, 28(6):1056–1069, June 2001.
[190] R. Zwiggelaar, S. M. Astley, C. R. M. Boggis, and C. J. Taylor. Linear Structures in Mammographic Images: Detection and Classification. IEEE Transactions on Medical Imaging, 23(9):1077–1087, September 2004.
[191] R. Zwiggelaar and R. Marti. Detecting Linear Structures In Mammographic Images. In M. J. Yaffe, editor, Digital Mammography: IWDM 2000, 5th International Workshop, pages 436–442, Madison, Wisconsin, USA, December 2001. Medical Physics Publishing.
[192] R. Zwiggelaar, T. C. Parr, J. E. Schumm, I. W. Hutt, S. M. Astley, C. J. Taylor, and C. R. M. Boggis. Model-based detection of spiculated lesions in mammograms. Medical Image Analysis, 3(1):39–62, 1999.
  • Bibliography 314[193] R. Zwiggelaar, P. Planiol, J. Marti, R. Marti, L. Blot, E. R. E. Denton, and C. M. E. Rubin. EM Texture Segmentation of Mammographic Im- ages. In H.-O. Peitgen, editor, Digital Mammography: IWDM—6th In- ternational Workshop on Digital Mammography, pages 223–227. Springer- Verlag, March 2003.