# Emerging Topics in Computer Vision


Emerging Topics in Computer Vision

Edited by Gérard Medioni and Sing Bing Kang
CONTENTS

PREFACE
CONTRIBUTORS

1 INTRODUCTION
  1.1 Organization
  1.2 How to Use the Book
  1.3 Contents of DVDs

SECTION I: FUNDAMENTALS IN COMPUTER VISION

2 CAMERA CALIBRATION (Zhengyou Zhang)
  2.1 Introduction
  2.2 Notation and Problem Statement
    2.2.1 Pinhole Camera Model
    2.2.2 Absolute Conic
  2.3 Camera Calibration with 3D Objects
    2.3.1 Feature Extraction
    2.3.2 Linear Estimation of the Camera Projection Matrix
    2.3.3 Recover Intrinsic and Extrinsic Parameters from P
    2.3.4 Refine Calibration Parameters Through a Nonlinear Optimization
    2.3.5 Lens Distortion
    2.3.6 An Example
  2.4 Camera Calibration with 2D Objects: Plane Based Technique
    2.4.1 Homography between the model plane and its image
    2.4.2 Constraints on the intrinsic parameters
    2.4.3 Geometric interpretation
    2.4.4 Closed-form solution
    2.4.5 Maximum likelihood estimation
    2.4.6 Dealing with radial distortion
    2.4.7 Summary
    2.4.8 Experimental Results
    2.4.9 Related Work
  2.5 Solving Camera Calibration with 1D Objects
    2.5.1 Setups with Free-Moving 1D Calibration Objects
    2.5.2 Setups with 1D Calibration Objects Moving Around a Fixed Point
    2.5.3 Basic Equations
    2.5.4 Closed-Form Solution
    2.5.5 Nonlinear Optimization
    2.5.6 Estimating the Fixed Point
    2.5.7 Experimental Results
  2.6 Self-Calibration
  2.7 Conclusion
  2.8 Appendix: Estimating Homography Between the Model Plane and Its Image
  Bibliography

3 MULTIPLE VIEW GEOMETRY (Anders Heyden and Marc Pollefeys)
  3.1 Introduction
  3.2 Projective Geometry
    3.2.1 The central perspective transformation
    3.2.2 Projective spaces
    3.2.3 Homogeneous coordinates
    3.2.4 Duality
    3.2.5 Projective transformations
  3.3 Tensor Calculus
  3.4 Modelling Cameras
    3.4.1 The pinhole camera
    3.4.2 The camera matrix
    3.4.3 The intrinsic parameters
    3.4.4 The extrinsic parameters
    3.4.5 Properties of the pinhole camera
  3.5 Multiple View Geometry
    3.5.1 The structure and motion problem
    3.5.2 The two-view case
    3.5.3 Multi-view constraints and tensors
  3.6 Structure and Motion I
    3.6.1 Resection
    3.6.2 Intersection
    3.6.3 Linear estimation of tensors
    3.6.4 Factorization
  3.7 Structure and Motion II
    3.7.1 Two-view geometry computation
    3.7.2 Structure and motion recovery
  3.8 Auto-calibration
  3.9 Dense Depth Estimation
    3.9.1 Rectification
    3.9.2 Stereo matching
    3.9.3 Multi-view linking
  3.10 Visual Modeling
    3.10.1 3D surface reconstruction
    3.10.2 Image-based rendering
    3.10.3 Match-moving
  3.11 Conclusion
  Bibliography

4 ROBUST TECHNIQUES FOR COMPUTER VISION (Peter Meer)
  4.1 Robustness in Visual Tasks
  4.2 Models and Estimation Problems
    4.2.1 Elements of a Model
    4.2.2 Estimation of a Model
    4.2.3 Robustness of an Estimator
    4.2.4 Definition of Robustness
    4.2.5 Taxonomy of Estimation Problems
    4.2.6 Linear Errors-in-Variables Regression Model
    4.2.7 Objective Function Optimization
  4.3 Location Estimation
    4.3.1 Why Nonparametric Methods
    4.3.2 Kernel Density Estimation
    4.3.3 Adaptive Mean Shift
    4.3.4 Applications
  4.4 Robust Regression
    4.4.1 Least Squares Family
    4.4.2 M-estimators
    4.4.3 Median Absolute Deviation Scale Estimate
    4.4.4 LMedS, RANSAC and Hough Transform
    4.4.5 The pbM-estimator
    4.4.6 Applications
    4.4.7 Structured Outliers
  4.5 Conclusion
  Bibliography

5 THE TENSOR VOTING FRAMEWORK (Gérard Medioni and Philippos Mordohai)
  5.1 Introduction
    5.1.1 Motivation
    5.1.2 Desirable descriptions
    5.1.3 Our approach
    5.1.4 Chapter Overview
  5.2 Related Work
  5.3 Tensor Voting in 2D
    5.3.1 Second Order Representation and Voting in 2-D
    5.3.2 First Order Representation and Voting in 2-D
    5.3.3 Voting Fields
    5.3.4 Vote analysis
    5.3.5 Results in 2-D
    5.3.6 Illusory Contours
  5.4 Tensor Voting in 3D
    5.4.1 Representation in 3-D
    5.4.2 Voting in 3-D
    5.4.3 Vote analysis
    5.4.4 Results in 3-D
  5.5 Tensor Voting in ND
    5.5.1 Computational Complexity
  5.6 Application to Computer Vision Problems
    5.6.1 Initial Matching
    5.6.2 Uniqueness
    5.6.3 Discrete Densification
    5.6.4 Discontinuity Localization
    5.6.5 Stereo
    5.6.6 Multiple View Stereo
    5.6.7 Visual Motion from Motion Cues
    5.6.8 Visual Motion on Real Images
  5.7 Conclusion and Future Work
  5.8 Acknowledgment
  Bibliography

SECTION II: APPLICATIONS IN COMPUTER VISION

6 IMAGE BASED LIGHTING (Paul E. Debevec)
  6.1 Basic Image Based Lighting
    6.1.1 Capturing Light
    6.1.2 Illuminating Synthetic Objects with Real Light
    6.1.3 Lighting Entire Environments with IBL
  6.2 Advanced Image Based Lighting
    6.2.1 Capturing a Light Probe in Direct Sunlight
    6.2.2 Compositing objects into the scene including shadows
    6.2.3 Image-Based Lighting in Fiat Lux
    6.2.4 Capturing and Rendering Spatially-Varying Illumination
  6.3 Image Based Relighting
  6.4 Conclusion
  Bibliography

7 COMPUTER VISION IN VISUAL EFFECTS (Doug Roble)
  7.1 Introduction
  7.2 Computer Vision Problems Unique to Film
    7.2.1 Welcome to the Set
  7.3 Feature Tracking
  7.4 Optical Flow
  7.5 Camera Tracking and Structure from Motion
  7.6 The Future
  Bibliography

8 CONTENT BASED IMAGE RETRIEVAL: AN OVERVIEW (Theo Gevers and Arnold W.M. Smeulders)
  8.1 Overview of the chapter
  8.2 Image Domains
    8.2.1 Search modes
    8.2.2 The sensory gap
    8.2.3 The semantic gap
    8.2.4 Discussion
  8.3 Image Features
    8.3.1 Color
    8.3.2 Shape
    8.3.3 Texture
    8.3.4 Discussion
  8.4 Representation and Indexing
    8.4.1 Grouping data
    8.4.2 Features accumulation
    8.4.3 Feature accumulation and image partitioning
    8.4.4 Salient features
    8.4.5 Shape and object features
    8.4.6 Structure and lay-out
    8.4.7 Discussion
  8.5 Similarity and Search
    8.5.1 Semantic interpretation
    8.5.2 Similarity between features
    8.5.3 Similarity of object outlines
    8.5.4 Similarity of object arrangements
    8.5.5 Similarity of salient features
    8.5.6 Discussion
  8.6 Interaction and Learning
    8.6.1 Interaction on a semantic level
    8.6.2 Classification on a semantic level
    8.6.3 Learning
    8.6.4 Discussion
  8.7 Conclusion
  Bibliography

9 FACE DETECTION, ALIGNMENT AND RECOGNITION (Stan Z. Li and Juwei Lu)
  9.1 Introduction
  9.2 Face Detection
    9.2.1 Appearance and Learning Based Approach
    9.2.2 Preprocessing
    9.2.3 Neural and Kernel Methods
    9.2.4 Boosting Based Methods
    9.2.5 Post-Processing
    9.2.6 Evaluation
  9.3 Face Alignment
    9.3.1 Active Shape Model
    9.3.2 Active Appearance Model
    9.3.3 Modeling Shape from Texture
    9.3.4 Dealing with Head Pose
    9.3.5 Evaluation
  9.4 Face Recognition
    9.4.1 Preprocessing
    9.4.2 Feature Extraction
    9.4.3 Pattern Classification
    9.4.4 Evaluation
  Bibliography

10 PERCEPTUAL INTERFACES (Matthew Turk and Mathias Kölsch)
  10.1 Introduction
  10.2 Perceptual Interfaces and HCI
  10.3 Multimodal Interfaces
  10.4 Vision Based Interfaces
    10.4.1 Terminology
    10.4.2 Elements of VBI
    10.4.3 Computer Vision Methods for VBI
    10.4.4 VBI Summary
  10.5 Brain-Computer Interfaces
  10.6 Summary
  Bibliography

SECTION III: PROGRAMMING FOR COMPUTER VISION

11 OPEN SOURCE COMPUTER VISION LIBRARY (Gary Bradski)
  11.1 Overview
    11.1.1 Installation
    11.1.2 Organization
    11.1.3 Optimizations
  11.2 Functional Groups: What's Good for What
    11.2.1 By Area
    11.2.2 By Task
    11.2.3 Demos and Samples
  11.3 Pictorial Tour
    11.3.1 Functional Groups
    11.3.2 Demo Tour
  11.4 Programming Examples Using C/C++
    11.4.1 Read Images from Disk
    11.4.2 Read AVIs from Disk, or Video from a Camera
  11.5 Other Interfaces
    11.5.1 Ch
    11.5.2 Matlab
    11.5.3 Lush
  11.6 Appendix A
  11.7 Appendix B
  Bibliography

12 SOFTWARE ARCHITECTURE FOR COMPUTER VISION (Alexandre R.J. François)
  12.1 Introduction
    12.1.1 Motivation
    12.1.2 Contribution
    12.1.3 Outline
  12.2 SAI: A Software Architecture Model
    12.2.1 Beyond Pipes and Filters
    12.2.2 The SAI Architectural Style
    12.2.3 Example Designs
    12.2.4 Architectural Properties
  12.3 MFSM: An Architectural Middleware
    12.3.1 MFSM overview
    12.3.2 A First Image Manipulation Example
    12.3.3 Custom Elements
    12.3.4 A Shared Memory Access Example
  12.4 Conclusion
    12.4.1 Summary
    12.4.2 Perspectives
  12.5 Acknowledgments
  Bibliography
PREFACE

One of the major changes instituted at the 2001 Conference on Computer Vision and Pattern Recognition (CVPR) in Kauai, HI was the replacement of the traditional tutorial sessions with a set of short courses. The topics of these short courses were carefully chosen to reflect the diversity in computer vision and to represent very promising areas. The response to these short courses was a very pleasant surprise, with more than 200 people attending a single short course. This overwhelming response was the inspiration for this book.

There are three parts in this book. The first part covers some of the more fundamental aspects of computer vision, the second describes a few interesting applications, and the third details specific approaches that facilitate programming for computer vision. This book is not intended to be a comprehensive coverage of computer vision; it can, however, be used as a complement to most computer vision textbooks.

A unique aspect of this book is the accompanying DVD set, which features videos of lectures by the contributors. We feel that these lectures will be very useful to readers as quick previews of the topics covered in the book. In addition, these lectures are much more effective than printed material in depicting results in the form of video or animations.

We would like to thank all the contributors for their hard work, and Bernard Goodwin for his support and enthusiasm for our book project. The USC Distance Education Network helped to tape and produce the lectures, and Bertran Harden tirelessly assembled all the multimedia content onto a DVD. We are also grateful to P. Anandan and Microsoft Corporation for the financial support used to defray some of the lecture production costs.

Gérard Medioni, University of Southern California
Sing Bing Kang, Microsoft Research
November, 2003
CONTRIBUTORS

Gary Bradski
Manager, Machine Learning Group
Intel Labs SC12-303
2200 Mission College Blvd.
Santa Clara, CA 95052-8119, USA
www.intel.com/research/mrl/research/opencv
www.intel.com/research/mrl/research/media-visual.htm

Paul Debevec
USC Institute for Creative Technologies
13274 Fiji Way, 5th Floor
Marina del Rey, CA 90292, USA
http://www.debevec.org/

Alexandre R.J. François
PHE-222 MC-0273
Institute for Robotics and Intelligent Systems
University of Southern California
Los Angeles, CA 90089-0273, USA
afrancoi@usc.edu
iris.usc.edu/~afrancoi

Theo Gevers
University of Amsterdam
Kruislaan 403
1098 SJ Amsterdam, The Netherlands
gevers@science.uva.nl
http://carol.science.uva.nl/~gevers/

Anders Heyden
Centre for Mathematical Sciences
Lund University
Box 118, SE-221 00 Lund, Sweden
heyden@maths.lth.se
www.maths.lth.se/matematiklth/personal/andersp/

Mathias Kölsch
Computer Science Department
University of California
Santa Barbara, CA 93106, USA
matz@cs.ucsb.edu
http://www.cs.ucsb.edu/~matz

Stan Z. Li
Microsoft Research Asia
5/F, Beijing Sigma Center
No. 49, Zhichun Road, Hai Dian District
Beijing, China 100080
szli@microsoft.com
www.research.microsoft.com/~szli

Juwei Lu
Bell Canada Multimedia Lab
University of Toronto
Bahen Centre for Information Technology
Room 4154, 40 St George Str.
Toronto, ON, M5S 3G4, Canada
juwei@dsp.utoronto.ca
www.dsp.utoronto.ca/~juwei/

Gérard Medioni
SAL 300, MC-0781
Computer Science Department
University of Southern California
Los Angeles, CA 90089-0781, USA
medioni@iris.usc.edu
iris.usc.edu/home/iris/medioni/User.html

Peter Meer
Electrical and Computer Engineering Department
Rutgers University
94 Brett Road
Piscataway, NJ 08854-8058, USA
meer@caip.rutgers.edu
www.caip.rutgers.edu/~meer

Philippos Mordohai
PHE 204, MC-0273
3737 Watt Way
Los Angeles, CA 90089-0273, USA
mordohai@usc.edu
iris.usc.edu/home/iris/mordohai/User.html

Marc Pollefeys
Department of Computer Science
University of North Carolina
Sitterson Hall, CB#3175
Chapel Hill, NC 27599-3175, USA
marc@cs.unc.edu
www.cs.unc.edu/~marc/

Doug Roble
Digital Domain
300 Rose Av
Venice, CA 90291, USA
www.d2.com

Arnold Smeulders
ISIS group, University of Amsterdam
Kruislaan 403
1098 SJ Amsterdam, The Netherlands
smeulder@science.uva.nl
www.science.uva.nl/isis/

Matthew Turk
Computer Science Department
University of California
Santa Barbara, CA 93106, USA
mturk@cs.ucsb.edu
www.cs.ucsb.edu/~mturk

Zhengyou Zhang
Microsoft Corporation
One Microsoft Way
Redmond, WA 98052, USA
zhang@microsoft.com
www.research.microsoft.com/~zhang/
Chapter 1

INTRODUCTION

The topics in this book were handpicked to showcase what we consider to be exciting and promising in computer vision. They are a mix of well-known, traditional topics (such as camera calibration, multi-view geometry, and face detection) and newer ones (such as vision for special effects and the tensor voting framework). All share a common denominator: either demonstrated longevity or the potential for endurance in computer vision, even as the popularity of a number of other areas has come and gone.

1.1 Organization

The book is organized into three sections, covering the fundamentals, applications, and programming aspects of computer vision.

The fundamentals section consists of four chapters. Two of the chapters deal with conventional but still popular areas: camera calibration and multi-view geometry. They cover the most fundamental operations associated with vision. The chapter on robust estimation techniques will be useful to researchers and practitioners of computer vision alike. There is also a chapter on a more recent tool (namely the tensor voting framework) that can be customized for a variety of problems.

The applications section covers two recent applications (image-based lighting and vision for visual effects) and three in more conventional areas (image search engines, face detection and recognition, and perceptual interfaces).

One of the more overlooked areas in computer vision is programming. While generic commercial packages can be used, there exist popular libraries and packages that are specifically geared toward computer vision. The final section of the book describes two different approaches to facilitate programming for computer vision.
1.2 How to Use the Book

The book is designed to be accompanying material to computer vision textbooks. Each chapter is self-contained and is written by well-known authorities in the area. We suggest that the reader watch the lecture first before reading the chapter, as the lecture (given by the contributor) provides an excellent overview of the topic.

1.3 Contents of DVDs

The two DVDs are organized by chapter as follows:

– Chap. 2: Camera Calibration (Z. Zhang) – VS
– Chap. 3: Multiple View Geometry (A. Heyden, M. Pollefeys) – VS
– Chap. 4: Robust Techniques for Computer Vision (P. Meer) – VS
– Chap. 5: The Tensor Voting Framework (G. Medioni, P. Mordohai) – VS
– Chap. 6: Image Based Lighting (P.E. Debevec) – VSC
– Chap. 7: Computer Vision in Visual Effects (D. Roble) – SC
– Chap. 8: Content Based Image Retrieval: An Overview (T. Gevers, A.W.M. Smeulders) – V
– Chap. 9: Face Detection, Alignment and Recognition (S.Z. Li, J. Lu) – V
– Chap. 10: Perceptual Interfaces (M. Turk, M. Kölsch) – VS
– Chap. 11: Open Source Computer Vision Library (G. Bradski) – SP
– Chap. 12: Software Architecture for Computer Vision (A.R.J. François) – VS

(Note: V = video presentation, S = slides in PDF format, C = color images in both BMP and PDF formats, P = project and source code.)
SECTION I: FUNDAMENTALS IN COMPUTER VISION

It is only fitting that we start with some of the more fundamental concepts in computer vision. The range of topics covered in this section is wide: camera calibration, structure from motion, dense stereo, 3D modeling, robust techniques for model fitting, and a more recently developed concept called tensor voting.

In Chapter 2, Zhang reviews the different techniques for calibrating a camera. More specifically, he describes calibration techniques that use 3D reference objects, 2D planes, and 1D lines, as well as self-calibration techniques.

One of the more popular (and difficult) areas in computer vision is stereo. In Chapter 3, Heyden and Pollefeys describe how camera motion and scene structure can be reliably extracted from image sequences. Once this is accomplished, dense depth distributions can be extracted for 3D surface reconstruction and image-based rendering applications.

A basic task in computer vision is hypothesizing models (e.g., 2D shapes) and using input data (typically image data) to corroborate and fit the models. In practice, however, robust techniques for model fitting must be used to handle input noise. In Chapter 4, Meer describes various robust regression techniques such as M-estimators, RANSAC, and the Hough transform. He also covers the mean shift algorithm for the location estimation problem.

The claim by Medioni and his colleagues that computer vision problems can be addressed within a Gestalt framework is the basis of their work on tensor voting. In Chapter 5, Medioni and Mordohai provide an introduction to the concept of tensor voting, which is a form of binning according to
proximity to ideal primitives such as edges and points. They show how this scheme can be applied to a variety of applications, such as curve and surface extraction from noisy 2D and 3D points (respectively), stereo matching, and motion-based grouping.
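As a small taste of the robust-fitting theme of Chapter 4, here is a minimal RANSAC line fit in NumPy. It is purely illustrative: the data is synthetic, and the thresholds and iteration count are hand-picked assumptions, not values from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: points near the line y = 2x + 1, plus 20% gross outliers.
x = rng.uniform(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(0, 0.05, 100)
y[:20] = rng.uniform(-20, 20, 20)
pts = np.column_stack([x, y])

best_inliers = None
for _ in range(200):
    # Draw a minimal sample (2 points) and hypothesize a line through it.
    i, j = rng.choice(len(pts), 2, replace=False)
    (x1, y1), (x2, y2) = pts[i], pts[j]
    if abs(x2 - x1) < 1e-9:
        continue
    a = (y2 - y1) / (x2 - x1)
    b = y1 - a * x1
    # Score the hypothesis by the size of its consensus set.
    residuals = np.abs(pts[:, 1] - (a * pts[:, 0] + b))
    inliers = residuals < 0.2
    if best_inliers is None or inliers.sum() > best_inliers.sum():
        best_inliers = inliers

# Re-fit by least squares on the winning consensus set only.
a, b = np.polyfit(pts[best_inliers, 0], pts[best_inliers, 1], 1)
print(a, b)  # close to (2, 1) despite the outliers
```

The key idea, which carries over to the vision problems in Chapter 4, is that the minimal random samples let gross outliers be outvoted rather than averaged in.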
Chapter 2

CAMERA CALIBRATION

Zhengyou Zhang

Camera calibration is a necessary step in 3D computer vision in order to extract metric information from 2D images. It has been studied extensively in computer vision and photogrammetry, and even recently new techniques have been proposed. In this chapter, we review the techniques proposed in the literature, including those using 3D apparatus (two or three planes orthogonal to each other, or a plane undergoing a pure translation, etc.), 2D objects (planar patterns undergoing unknown motions), 1D objects (a wand with dots), and unknown scene points in the environment (self-calibration). The focus is on presenting these techniques within a consistent framework.

2.1 Introduction

Camera calibration is a necessary step in 3D computer vision in order to extract metric information from 2D images. Much work has been done, starting in the photogrammetry community (see [3, 6] to cite a few), and more recently in computer vision ([12, 11, 33, 10, 37, 35, 22, 9] to cite a few). According to the dimension of the calibration objects, we can classify these techniques roughly into three categories.

3D reference object based calibration. Camera calibration is performed by observing a calibration object whose geometry in 3-D space is known with very good precision. Calibration can be done very efficiently [8]. The calibration object usually consists of two or three planes orthogonal to each other. Sometimes, a plane undergoing a precisely known translation is also used [33], which equivalently provides 3D reference points. This approach requires an expensive calibration apparatus and
an elaborate setup.

2D plane based calibration. Techniques in this category require observing a planar pattern shown at a few different orientations [42, 31]. Different from Tsai's technique [33], knowledge of the plane motion is not necessary. Because almost anyone can make such a calibration pattern themselves, the setup is easier for camera calibration.

1D line based calibration. Calibration objects used in this category are composed of a set of collinear points [44]. As will be shown, a camera can be calibrated by observing a line moving around a fixed point, such as a string of balls hanging from the ceiling.

Self-calibration. Techniques in this category do not use any calibration object, and can be considered a 0D approach because only image point correspondences are required. Just by moving a camera in a static scene, the rigidity of the scene provides, in general, two constraints [22, 21] on the camera's internal parameters from one camera displacement, using image information alone. Therefore, if images are taken by the same camera with fixed internal parameters, correspondences between three images are sufficient to recover both the internal and external parameters, which allows us to reconstruct 3-D structure up to a similarity [20, 17]. Although no calibration objects are necessary, a large number of parameters need to be estimated, resulting in a much harder mathematical problem.

Other techniques exist: vanishing points for orthogonal directions [4, 19], and calibration from pure rotation [16, 30].

Before going further, I'd like to point out that no single calibration technique is best for all situations. It really depends on the situation a user needs to deal with. Following are my few recommendations:

– Calibration with apparatus vs. self-calibration. Whenever possible, if we can pre-calibrate a camera, we should do it with a calibration apparatus.
Self-calibration cannot usually achieve an accuracy comparable with that of pre-calibration, because self-calibration needs to estimate a large number of parameters, resulting in a much harder mathematical problem. When pre-calibration is impossible (e.g., scene reconstruction from an old movie), self-calibration is the only choice.

– Partial vs. full self-calibration. Partial self-calibration refers to the case where only a subset of the camera intrinsic parameters are to be calibrated. Along the same line as the previous recommendation, whenever possible, partial self-calibration is preferred because the number of parameters to be estimated is smaller. Take the example of 3D reconstruction with a camera with variable focal length: it is preferable to pre-calibrate the pixel aspect ratio and the pixel skewness.

– Calibration with 3D vs. 2D apparatus. The highest accuracy can usually be obtained by using a 3D apparatus, so it should be used when accuracy is indispensable and when it is affordable to make and use one. From the feedback I have received from computer vision researchers and practitioners around the world in the last couple of years, calibration with a 2D apparatus seems to be the best choice in most situations because of its ease of use and good accuracy.

– Calibration with 1D apparatus. This technique is relatively new, and it is hard for the moment to predict how popular it will be. It should, however, be useful especially for calibration of a camera network. To calibrate the relative geometry between multiple cameras as well as their intrinsic parameters, it is necessary for all the cameras involved to simultaneously observe a number of points. It is hardly possible to achieve this with a 3D or 2D calibration apparatus¹ if one camera is mounted in the front of a room while another is in the back. This is not a problem for 1D objects. We can, for example, use a string of balls hanging from the ceiling.

This chapter is organized as follows. Section 2.2 describes the camera model and introduces the concept of the absolute conic, which is important for camera calibration. Section 2.3 presents the calibration techniques using a 3D apparatus. Section 2.4 describes a calibration technique based on observing a freely moving planar pattern (2D object). Its extension for stereo calibration is also addressed.
Section 2.5 describes a relatively new technique which uses a set of collinear points (1D object). Section 2.6 briefly introduces the self-calibration approach and provides references for further reading. Section 2.7 concludes the chapter with a discussion on recent work in this area.

2.2 Notation and Problem Statement

We start with the notation used in this chapter.

¹An exception is when those apparatus are made transparent; then the cost would be much higher.
2.2.1 Pinhole Camera Model

[Figure 2.1. Pinhole camera model]

A 2D point is denoted by $\mathbf{m} = [u, v]^T$. A 3D point is denoted by $\mathbf{M} = [X, Y, Z]^T$. We use $\tilde{\mathbf{x}}$ to denote the augmented vector obtained by adding 1 as the last element: $\tilde{\mathbf{m}} = [u, v, 1]^T$ and $\tilde{\mathbf{M}} = [X, Y, Z, 1]^T$. A camera is modeled by the usual pinhole model (see Figure 2.1): the image of a 3D point $\mathbf{M}$, denoted by $\mathbf{m}$, is formed by an optical ray from $\mathbf{M}$ passing through the optical center $C$ and intersecting the image plane. The three points $\mathbf{M}$, $\mathbf{m}$, and $C$ are collinear. In Figure 2.1, for illustration purposes, the image plane is positioned between the scene point and the optical center, which is mathematically equivalent to the physical setup in which the image plane is on the other side of the optical center.

The relationship between the 3D point $\mathbf{M}$ and its image projection $\mathbf{m}$ is given by
$$s\tilde{\mathbf{m}} = \mathbf{A}\,[\mathbf{R}\ \ \mathbf{t}]\,\tilde{\mathbf{M}} \equiv \mathbf{P}\tilde{\mathbf{M}}, \tag{2.1}$$
with
$$\mathbf{A} = \begin{bmatrix} \alpha & \gamma & u_0 \\ 0 & \beta & v_0 \\ 0 & 0 & 1 \end{bmatrix} \tag{2.2}$$
and
$$\mathbf{P} = \mathbf{A}\,[\mathbf{R}\ \ \mathbf{t}], \tag{2.3}$$
where $s$ is an arbitrary scale factor; $(\mathbf{R}, \mathbf{t})$, called the extrinsic parameters, are the rotation and translation which relate the world coordinate system to the camera coordinate system; and $\mathbf{A}$ is called the camera intrinsic matrix, with $(u_0, v_0)$ the coordinates of the principal point, and $\alpha$ and $\beta$ the scale factors
in the image $u$ and $v$ axes, and $\gamma$ the parameter describing the skew of the two image axes. The $3 \times 4$ matrix $\mathbf{P}$ is called the camera projection matrix, which mixes both intrinsic and extrinsic parameters. In Figure 2.1, the angle between the two image axes is denoted by $\theta$, and we have $\gamma = \alpha \cot \theta$. If the pixels are rectangular, then $\theta = 90°$ and $\gamma = 0$.

The task of camera calibration is to determine the parameters of the transformation between an object in 3D space and the 2D image observed by the camera from visual information (images). The transformation includes

– Extrinsic parameters (sometimes called external parameters): orientation (rotation) and location (translation) of the camera, i.e., $(\mathbf{R}, \mathbf{t})$;
– Intrinsic parameters (sometimes called internal parameters): characteristics of the camera, i.e., $(\alpha, \beta, \gamma, u_0, v_0)$.

The rotation matrix, although consisting of 9 elements, has only 3 degrees of freedom. The translation vector $\mathbf{t}$ obviously has 3 parameters. Therefore, there are 6 extrinsic parameters and 5 intrinsic parameters, leading to 11 parameters in total. We use the abbreviation $\mathbf{A}^{-T}$ for $(\mathbf{A}^{-1})^T$ or $(\mathbf{A}^T)^{-1}$.

2.2.2 Absolute Conic

Now let us introduce the concept of the absolute conic. For more details, the reader is referred to [7, 15].

A point $\mathbf{x}$ in 3D space has projective coordinates $\tilde{\mathbf{x}} = [x_1, x_2, x_3, x_4]^T$. The equation of the plane at infinity, $\Pi_\infty$, is $x_4 = 0$. The absolute conic $\Omega$ is defined by the set of points satisfying
$$x_1^2 + x_2^2 + x_3^2 = 0, \qquad x_4 = 0. \tag{2.4}$$

Let $\mathbf{x}_\infty = [x_1, x_2, x_3]^T$ be a point on the absolute conic (see Figure 2.2). By definition, we have $\mathbf{x}_\infty^T \mathbf{x}_\infty = 0$. We also have $\tilde{\mathbf{x}}_\infty = [x_1, x_2, x_3, 0]^T$ and $\tilde{\mathbf{x}}_\infty^T \tilde{\mathbf{x}}_\infty = 0$. This can be interpreted as a conic of purely imaginary points on $\Pi_\infty$. Indeed, let $x = x_1/x_3$ and $y = x_2/x_3$ be a point on the conic; then $x^2 + y^2 = -1$, which is an imaginary circle of radius $\sqrt{-1}$.

An important property of the absolute conic is its invariance to any rigid transformation.
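Before working through the algebra, this invariance is easy to verify numerically. The sketch below (not from the chapter; the rotation and translation values are made-up assumptions) takes a purely imaginary point on $\Omega$ and applies a rigid transformation to it:

```python
import numpy as np

# A purely imaginary point on the absolute conic Omega:
# x1^2 + x2^2 + x3^2 = 1 + i^2 + 0 = 0, and x4 = 0 (plane at infinity).
x = np.array([1.0, 1j, 0.0, 0.0])

# An arbitrary rigid transformation H = [[R, t], [0, 1]] (illustrative values).
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([3.0, -1.0, 2.0])
H = np.eye(4, dtype=complex)
H[:3, :3] = R
H[:3, 3] = t

y = H @ x
print(y[3])                # still 0: the point stays on the plane at infinity
print(abs(y[:3] @ y[:3]))  # ~0: the point stays on the absolute conic
```

Because $\Omega$ contains only imaginary points, the check uses a plain (non-conjugating) dot product, matching the condition $\mathbf{x}_\infty^T \mathbf{x}_\infty = 0$ above rather than a Hermitian norm.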
Let the rigid transformation be
$$\mathbf{H} = \begin{bmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}.$$
Let $\mathbf{x}_\infty$ be a point on $\Omega$. By definition, its projective coordinates are
$$\tilde{\mathbf{x}}_\infty = \begin{bmatrix} \mathbf{x}_\infty \\ 0 \end{bmatrix}$$
with
[Figure 2.2. Absolute conic and its image]

$\mathbf{x}_\infty^T \mathbf{x}_\infty = 0$. The point after the rigid transformation is denoted by $\mathbf{x}'_\infty$, and
$$\tilde{\mathbf{x}}'_\infty = \mathbf{H}\tilde{\mathbf{x}}_\infty = \begin{bmatrix} \mathbf{R}\mathbf{x}_\infty \\ 0 \end{bmatrix}.$$
Thus, $\mathbf{x}'_\infty$ is also on the plane at infinity. Furthermore, $\mathbf{x}'_\infty$ is on the same $\Omega$ because
$$\mathbf{x}'^T_\infty \mathbf{x}'_\infty = (\mathbf{R}\mathbf{x}_\infty)^T(\mathbf{R}\mathbf{x}_\infty) = \mathbf{x}_\infty^T(\mathbf{R}^T\mathbf{R})\mathbf{x}_\infty = 0.$$

The image of the absolute conic, denoted by $\omega$, is also an imaginary conic, and is determined only by the intrinsic parameters of the camera. This can be seen as follows. Consider the projection of a point $\mathbf{x}_\infty$ on $\Omega$, denoted by
$\mathbf{m}_\infty$, which is given by
$$\tilde{\mathbf{m}}_\infty = s\mathbf{A}[\mathbf{R}\ \ \mathbf{t}]\begin{bmatrix} \mathbf{x}_\infty \\ 0 \end{bmatrix} = s\mathbf{A}\mathbf{R}\mathbf{x}_\infty.$$
It follows that
$$\tilde{\mathbf{m}}_\infty^T \mathbf{A}^{-T}\mathbf{A}^{-1}\tilde{\mathbf{m}}_\infty = s^2\mathbf{x}_\infty^T \mathbf{R}^T\mathbf{R}\mathbf{x}_\infty = s^2\mathbf{x}_\infty^T\mathbf{x}_\infty = 0.$$
Therefore, the image of the absolute conic is an imaginary conic, defined by $\mathbf{A}^{-T}\mathbf{A}^{-1}$. It does not depend on the extrinsic parameters of the camera.

If we can determine the image of the absolute conic, then we can solve for the camera's intrinsic parameters, and the calibration is solved. We will show several ways in this chapter to determine $\omega$, the image of the absolute conic.

2.3 Camera Calibration with 3D Objects

The traditional way to calibrate a camera is to use a 3D reference object such as those shown in Figure 2.3. In Fig. 2.3a, the calibration apparatus used at INRIA [8] is shown, which consists of two orthogonal planes, on each of which a checker pattern is printed. A 3D coordinate system is attached to this apparatus, and the coordinates of the checker corners are known very accurately in this coordinate system. A similar calibration apparatus is a cube with a checker pattern painted on each face, so in general three faces are visible to the camera. Figure 2.3b illustrates the device used in Tsai's technique [33], which uses only one plane with a checker pattern, but the plane needs to be displaced at least once with known motion. This is equivalent to knowing the 3D coordinates of the checker corners.

A popular technique in this category consists of four steps [8]:
1. Detect the corners of the checker pattern in each image;
2. Estimate the camera projection matrix $\mathbf{P}$ using linear least squares;
3. Recover intrinsic and extrinsic parameters $\mathbf{A}$, $\mathbf{R}$ and $\mathbf{t}$ from $\mathbf{P}$;
4. Refine $\mathbf{A}$, $\mathbf{R}$ and $\mathbf{t}$ through a nonlinear optimization.

Note that it is also possible to first refine $\mathbf{P}$ through a nonlinear optimization, and then determine $\mathbf{A}$, $\mathbf{R}$ and $\mathbf{t}$ from the refined $\mathbf{P}$.

It is worth noting that using corners is not the only possibility. We can avoid corner detection by working directly in the image. In [25], calibration
is realized by maximizing the gradients around a set of control points that define the calibration object. Figure 2.4 illustrates the control points used in that work.

[Figure 2.3. 3D apparatus for calibrating cameras]

[Figure 2.4. Control points used in a gradient-based calibration technique]
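Step 3 of the four-step procedure above, recovering $\mathbf{A}$, $\mathbf{R}$ and $\mathbf{t}$ from $\mathbf{P}$, is detailed later in the chapter. One common route, sketched here as an assumption rather than the chapter's exact derivation, is an RQ decomposition of the left $3 \times 3$ block of $\mathbf{P}$; the sketch assumes a noise-free $\mathbf{P}$ known up to a positive scale:

```python
import numpy as np

def decompose_projection(P):
    """Recover A, R, t from a 3x4 projection matrix P = A [R t].

    Sketch only: assumes P is noise-free and scaled by a positive factor.
    """
    M = P[:, :3]
    # RQ decomposition M = A R, computed via QR of the reversed transpose.
    Qh, Rh = np.linalg.qr(np.flipud(M).T)
    A = np.flipud(Rh.T)[:, ::-1]        # upper-triangular factor
    R = np.flipud(Qh.T)                 # orthogonal factor
    # Fix signs so the intrinsic diagonal (alpha, beta, A[2,2]) is positive;
    # S is its own inverse, so A R is unchanged.
    S = np.diag(np.sign(np.diag(A)))
    A, R = A @ S, S @ R
    # P[:, 3] = A t, and the arbitrary scale of P cancels here.
    t = np.linalg.solve(A, P[:, 3])
    return A / A[2, 2], R, t
```

For example, building $\mathbf{P} = \mathbf{A}[\mathbf{R}\ \mathbf{t}]$ from known factors and feeding it (even multiplied by a positive constant) to `decompose_projection` returns the original $\mathbf{A}$, $\mathbf{R}$ and $\mathbf{t}$ up to the fixed sign and scale conventions.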
2.3.1 Feature Extraction

If one uses a generic corner detector, such as the Harris corner detector, to detect the corners in the checker pattern image, the result is usually not good because the detected corners have poor accuracy (about one pixel). A better solution is to leverage the known pattern structure by first estimating a line for each side of a square and then computing the corners by intersecting the fitted lines. There are two common techniques to estimate the lines. The first is to detect edges, and then fit a line to the edges on each side of the square. The second technique is to directly fit a line to each side of a square in the image such that the gradient on the line is maximized. One possibility is to represent the line by an elongated Gaussian, and estimate the parameters of the elongated Gaussian by maximizing the total gradient covered by the Gaussian. We should note that if the lens distortion is not severe, a better solution is to fit just one single line to all the collinear sides. This leads to a much more accurate estimate of the positions of the checker corners.

2.3.2 Linear Estimation of the Camera Projection Matrix

Once we extract the corner points in the image, we can easily establish their correspondences with the points in 3D space because of our knowledge of the pattern. Based on the projection equation (2.1), we are now able to estimate the camera parameters. However, the problem is quite nonlinear if we try to estimate A, R and t directly. If, on the other hand, we estimate the camera projection matrix P, a linear solution is possible, as we now show. Given each 2D-3D correspondence mi = (ui, vi) ↔ Mi = (Xi, Yi, Zi), we can write down 2 equations based on (2.1):

    Gi p = 0   with
    Gi = [ Xi Yi Zi 1  0  0  0  0  −uiXi −uiYi −uiZi −ui
           0  0  0  0  Xi Yi Zi 1  −viXi −viYi −viZi −vi ]

where p = [p11, p12, . . . , p34]^T and 0 = [0, 0]^T.
For n point matches, we can stack all equations together:

    Gp = 0   with   G = [G1^T, . . . , Gn^T]^T .

Matrix G is a 2n × 12 matrix. The projection matrix can now be solved by

    min_p ||Gp||²   subject to   ||p|| = 1 .
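The linear estimation above can be sketched in a few lines of NumPy (a sketch with a hypothetical helper name):

```python
import numpy as np

def estimate_projection_matrix(pts_3d, pts_2d):
    """DLT estimate of the 3x4 projection matrix P from n 2D-3D
    correspondences, minimizing ||G p|| subject to ||p|| = 1."""
    rows = []
    for (X, Y, Z), (u, v) in zip(pts_3d, pts_2d):
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u*X, -u*Y, -u*Z, -u])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v*X, -v*Y, -v*Z, -v])
    G = np.asarray(rows, dtype=float)          # 2n x 12
    # p is the right singular vector of G with the smallest singular value
    _, _, Vt = np.linalg.svd(G)
    return Vt[-1].reshape(3, 4)
```

Since p is recovered only up to scale, the returned P must be normalized (e.g., by one of its entries) before comparing against a ground-truth matrix.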
The solution is the eigenvector of G^T G associated with the smallest eigenvalue. In the above, in order to avoid the trivial solution p = 0, and considering the fact that p is defined up to a scale factor, we have set ||p|| = 1. Other normalizations are possible. In [1], p34 = 1, which, however, introduces a singularity when the correct value of p34 is close to zero. In [10], the constraint p31² + p32² + p33² = 1 was used, which is singularity free. In any case, the above linear technique minimizes an algebraic distance, and yields a biased estimate when the data are noisy. We will present an unbiased solution later.

2.3.3 Recover Intrinsic and Extrinsic Parameters from P

Once the camera projection matrix P is known, we can uniquely recover the intrinsic and extrinsic parameters of the camera. Let us denote the first 3×3 submatrix of P by B and the last column of P by b, i.e., P ≡ [B b]. Since P = A[R t], we have

    B = AR    (2.5)
    b = At    (2.6)

From (2.5), we have

    K ≡ BB^T = AA^T = [ ku  kc  u0
                        kc  kv  v0
                        u0  v0  1  ]

with ku = α² + γ² + u0², kc = γβ + u0v0, and kv = β² + v0². Because P is defined up to a scale factor, the last element of K = BB^T is usually not equal to 1, so we have to normalize it such that K33 (the last element) = 1. After that, we immediately obtain

    u0 = K13                      (2.7)
    v0 = K23                      (2.8)
    β = sqrt(kv − v0²)            (2.9)
    γ = (kc − u0v0)/β             (2.10)
    α = sqrt(ku − u0² − γ²)       (2.11)
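The recovery of A from P via K = BB^T can be sketched as follows (hypothetical helper name; the square roots of (2.9) and (2.11) are written explicitly, and α, β > 0 is assumed):

```python
import numpy as np

def intrinsics_from_P(P):
    """Recover the intrinsic matrix A from P = [B | b] via K = B B^T = A A^T."""
    B = P[:, :3]
    K = B @ B.T
    K = K / K[2, 2]                      # fix the arbitrary scale so K33 = 1
    u0, v0 = K[0, 2], K[1, 2]
    beta = np.sqrt(K[1, 1] - v0**2)      # K22 = beta^2 + v0^2
    gamma = (K[0, 1] - u0 * v0) / beta   # K12 = gamma*beta + u0*v0
    alpha = np.sqrt(K[0, 0] - u0**2 - gamma**2)
    return np.array([[alpha, gamma, u0],
                     [0.0,   beta,  v0],
                     [0.0,   0.0,   1.0]])
```

Because K is normalized by its last element, the result is insensitive to the unknown scale of P.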
The solution is unambiguous because α > 0 and β > 0. Once the intrinsic parameters, or equivalently matrix A, are known, the extrinsic parameters can be determined from (2.5) and (2.6) as:

    R = A⁻¹B     (2.12)
    t = A⁻¹b .   (2.13)

2.3.4 Refine Calibration Parameters Through a Nonlinear Optimization

The above solution is obtained by minimizing an algebraic distance which is not physically meaningful. We can refine it through maximum likelihood inference. We are given n 2D-3D correspondences mi = (ui, vi) ↔ Mi = (Xi, Yi, Zi). Assume that the image points are corrupted by independent and identically distributed noise. The maximum likelihood estimate can be obtained by minimizing the distances between the image points and their predicted positions, i.e.,

    min_P Σ_i ||mi − φ(P, Mi)||²    (2.14)

where φ(P, Mi) is the projection of Mi onto the image according to (2.1). This is a nonlinear minimization problem, which can be solved with the Levenberg-Marquardt algorithm as implemented in Minpack [23]. It requires an initial guess of P, which can be obtained using the linear technique described earlier. Note that since P is defined up to a scale factor, we can set the element having the largest initial value to 1 during the minimization. Alternatively, instead of estimating P as in (2.14), we can directly estimate the intrinsic and extrinsic parameters, A, R, and t, using the same criterion. The rotation matrix can be parameterized with three variables such as Euler angles or a scaled rotation vector.

2.3.5 Lens Distortion

Up to this point, we have used the pinhole model to describe the camera. It says that a point in 3D space, its corresponding image point, and the camera's optical center are collinear. This linear projective equation is sometimes not sufficient, especially for low-end cameras (such as WebCams) or wide-angle cameras; lens distortion has to be considered.
According to [33], there are four steps in camera projection including lens distortion:
Step 1: Rigid transformation from the world coordinate system (Xw, Yw, Zw) to the camera one (X, Y, Z):

    [X Y Z]^T = R [Xw Yw Zw]^T + t

Step 2: Perspective projection from 3D camera coordinates (X, Y, Z) to ideal image coordinates (x, y) under the pinhole camera model:

    x = f X/Z ,   y = f Y/Z

where f is the effective focal length.

Step 3: Lens distortion²:

    x̆ = x + δx ,   y̆ = y + δy

where (x̆, y̆) are the distorted or true image coordinates, and (δx, δy) are distortions applied to (x, y).

Step 4: Affine transformation from real image coordinates (x̆, y̆) to frame buffer (pixel) image coordinates (ŭ, v̆):

    ŭ = dx⁻¹ x̆ + u0 ,   v̆ = dy⁻¹ y̆ + v0 ,

where (u0, v0) are the coordinates of the principal point, and dx and dy are the distances between adjacent pixels in the horizontal and vertical directions, respectively.

There are two types of distortions:

Radial distortion: It is symmetric; ideal image points are distorted along radial directions from the distortion center. This is caused by imperfect lens shape.

Decentering distortion: This is usually caused by improper lens assembly; ideal image points are distorted in both radial and tangential directions.

The reader is referred to [29, 3, 6, 37] for more details.

² Note that the lens distortion described here is different from Tsai's treatment. Here, we go from ideal to real image coordinates, similar to [36].
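The four steps can be traced in a short sketch (hypothetical helper name; Step 3 keeps only the first two radial terms of the distortion model, and decentering distortion is omitted):

```python
import numpy as np

def project_point(Mw, R, t, f, k1, k2, u0, v0, dx, dy):
    """World point -> pixel coordinates, following Steps 1-4."""
    X, Y, Z = R @ Mw + t                      # Step 1: rigid transformation
    x, y = f * X / Z, f * Y / Z               # Step 2: perspective projection
    s = 1 + k1 * (x*x + y*y) + k2 * (x*x + y*y) ** 2
    xb, yb = x * s, y * s                     # Step 3: radial distortion only
    u = xb / dx + u0                          # Step 4: affine transformation
    v = yb / dy + v0
    return u, v
```

With k1 = k2 = 0, this reduces to the pure pinhole projection of (2.1).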
The distortion can be expressed as a power series in the radial distance r = sqrt(x² + y²):

    δx = x(k1 r² + k2 r⁴ + k3 r⁶ + · · · ) + [p1(r² + 2x²) + 2 p2 x y](1 + p3 r² + · · · ) ,
    δy = y(k1 r² + k2 r⁴ + k3 r⁶ + · · · ) + [2 p1 x y + p2(r² + 2y²)](1 + p3 r² + · · · ) ,

where the ki's are coefficients of radial distortion and the pj's are coefficients of decentering distortion. Based on reports in the literature [3, 33, 36], the distortion function is likely to be totally dominated by the radial components, and especially by the first term. It has also been found that more elaborate modeling not only would not help (its effect is negligible compared with sensor quantization), but would also cause numerical instability [33, 36].

Denote the ideal pixel image coordinates by u = x/dx + u0 and v = y/dy + v0. By combining Step 3 and Step 4, and using only the first two radial distortion terms, we obtain the following relationship between (ŭ, v̆) and (u, v):

    ŭ = u + (u − u0)[k1(x² + y²) + k2(x² + y²)²]     (2.15)
    v̆ = v + (v − v0)[k1(x² + y²) + k2(x² + y²)²] .   (2.16)

Following the same reasoning as in (2.14), camera calibration including lens distortion can be performed by minimizing the distances between the image points and their predicted positions, i.e.,

    min_{A,R,t,k1,k2} Σ_i ||mi − m̆(A, R, t, k1, k2, Mi)||²    (2.17)

where m̆(A, R, t, k1, k2, Mi) is the projection of Mi onto the image according to (2.1), followed by distortion according to (2.15) and (2.16).

2.3.6 An Example

Figure 2.5 displays an image of a 3D reference object, taken by a camera to be calibrated at INRIA. Each square has 4 corners, and in total 128 points are used for calibration.
Without considering lens distortion, the estimated camera projection matrix is

    P = [ 7.025659e−01  −2.861189e−02  −5.377696e−01  6.241890e+01
          2.077632e−01   1.265804e+00   1.591456e−01  1.075646e+01
          4.634764e−04  −5.282382e−05   4.255347e−04  1            ]

From P, we can calculate the intrinsic parameters: α = 1380.12, β = 2032.57, γ ≈ 0, u0 = 246.52, and v0 = 243.68. So, the angle between the two
Figure 2.5. An example of camera calibration with a 3D apparatus

image axes is 90°, and the aspect ratio of the pixels is α/β = 0.679. For the extrinsic parameters, the translation vector is t = [−211.28, −106.06, 1583.75]^T (in mm), i.e., the calibration object is about 1.5 m away from the camera; the rotation axis is [−0.08573, −0.99438, 0.0621]^T (i.e., almost vertical), and the rotation angle is 47.7°. Other notable work in this category includes [27, 38, 36, 18].

2.4 Camera Calibration with 2D Objects: Plane-Based Technique

In this section, we describe how a camera can be calibrated using a moving plane. We first examine the constraints on the camera's intrinsic parameters provided by observing a single plane.

2.4.1 Homography between the model plane and its image

Without loss of generality, we assume the model plane is on Z = 0 of the world coordinate system. Let's denote the ith column of the rotation matrix
R by ri. From (2.1), we have

    s [u v 1]^T = A [r1 r2 r3 t] [X Y 0 1]^T = A [r1 r2 t] [X Y 1]^T .

By abuse of notation, we still use M to denote a point on the model plane, but M = [X, Y]^T since Z is always equal to 0. In turn, M̃ = [X, Y, 1]^T. Therefore, a model point M and its image m are related by a homography H:

    s m̃ = H M̃   with   H = A [r1 r2 t] .   (2.18)

As is clear, the 3 × 3 matrix H is defined up to a scale factor.

2.4.2 Constraints on the intrinsic parameters

Given an image of the model plane, a homography can be estimated (see Appendix). Let's denote it by H = [h1 h2 h3]. From (2.18), we have

    [h1 h2 h3] = λ A [r1 r2 t] ,

where λ is an arbitrary scalar. Using the knowledge that r1 and r2 are orthonormal, we have

    h1^T A^{-T}A^{-1} h2 = 0                          (2.19)
    h1^T A^{-T}A^{-1} h1 = h2^T A^{-T}A^{-1} h2 .     (2.20)

These are the two basic constraints on the intrinsic parameters, given one homography. Because a homography has 8 degrees of freedom and there are 6 extrinsic parameters (3 for rotation and 3 for translation), we can only obtain 2 constraints on the intrinsic parameters. Note that A^{-T}A^{-1} actually describes the image of the absolute conic [20]. In the next subsection, we will give a geometric interpretation.

2.4.3 Geometric Interpretation

We now relate (2.19) and (2.20) to the absolute conic [22, 20]. It is not difficult to verify that the model plane, under our convention, is described in the camera coordinate system by the following equation:

    [ r3 ; r3^T t ]^T [x y z w]^T = 0 ,
where w = 0 for points at infinity and w = 1 otherwise. This plane intersects the plane at infinity at a line, and we can easily see that [r1^T 0]^T and [r2^T 0]^T are two particular points on that line. Any point on it is a linear combination of these two points, i.e.,

    x∞ = a [r1 ; 0] + b [r2 ; 0] = [a r1 + b r2 ; 0] .

Now, let's compute the intersection of the above line with the absolute conic. By definition, the point x∞, known as the circular point [26], satisfies x∞^T x∞ = 0, i.e.,

    (a r1 + b r2)^T (a r1 + b r2) = 0 ,   or   a² + b² = 0 .

The solution is b = ±a i, where i² = −1. That is, the two intersection points are

    x∞ = a [r1 ± i r2 ; 0] .

The significance of this pair of complex conjugate points lies in the fact that they are invariant to Euclidean transformations. Their projection in the image plane is given, up to a scale factor, by

    m̃∞ = A(r1 ± i r2) = h1 ± i h2 .

Point m̃∞ is on the image of the absolute conic, described by A^{-T}A^{-1} [20]. This gives

    (h1 ± i h2)^T A^{-T}A^{-1} (h1 ± i h2) = 0 .

Requiring that both the real and imaginary parts be zero yields (2.19) and (2.20).

2.4.4 Closed-form solution

We now provide the details of how to effectively solve the camera calibration problem. We start with an analytical solution. This initial estimation will be followed by a nonlinear optimization technique based on the maximum likelihood criterion, to be described in the next subsection. Let

    B = A^{-T}A^{-1} ≡ [ B11 B12 B13
                         B12 B22 B23
                         B13 B23 B33 ]                                            (2.21)

      = [ 1/α²               −γ/(α²β)                      (v0γ−u0β)/(α²β)
          −γ/(α²β)           γ²/(α²β²) + 1/β²              −γ(v0γ−u0β)/(α²β²) − v0/β²
          (v0γ−u0β)/(α²β)    −γ(v0γ−u0β)/(α²β²) − v0/β²    (v0γ−u0β)²/(α²β²) + v0²/β² + 1 ] .   (2.22)
Note that B is symmetric, and is defined by the 6D vector

    b = [B11, B12, B22, B13, B23, B33]^T .   (2.23)

Let the ith column vector of H be hi = [hi1, hi2, hi3]^T. Then, we have

    hi^T B hj = vij^T b   (2.24)

with

    vij = [hi1 hj1, hi1 hj2 + hi2 hj1, hi2 hj2, hi3 hj1 + hi1 hj3, hi3 hj2 + hi2 hj3, hi3 hj3]^T .

Therefore, the two fundamental constraints (2.19) and (2.20), from a given homography, can be rewritten as 2 homogeneous equations in b:

    [ v12^T ; (v11 − v22)^T ] b = 0 .   (2.25)

If n images of the model plane are observed, by stacking n such equations as (2.25) we have

    Vb = 0 ,   (2.26)

where V is a 2n×6 matrix. If n ≥ 3, we will in general have a unique solution b defined up to a scale factor. If n = 2, we can impose the skewless constraint γ = 0, i.e., [0, 1, 0, 0, 0, 0]b = 0, which is added as an additional equation to (2.26). (If n = 1, we can only solve for two camera intrinsic parameters, e.g., α and β, assuming u0 and v0 are known (e.g., at the image center) and γ = 0, and that is indeed what we did in [28] for head pose determination, based on the fact that eyes and mouth are reasonably coplanar. In fact, Tsai [33] already mentions that estimating the focal length from one plane is possible, but incorrectly says that the aspect ratio is not.) The solution to (2.26) is well known as the eigenvector of V^T V associated with the smallest eigenvalue (equivalently, the right singular vector of V associated with the smallest singular value).

Once b is estimated, we can compute all the camera intrinsic parameters as follows. The matrix B, as described in Sect. 2.4.4, is estimated up to a scale factor, i.e., B = λA^{-T}A^{-1} with λ an arbitrary scale. Without difficulty, we can uniquely extract the intrinsic parameters from matrix B:

    v0 = (B12 B13 − B11 B23)/(B11 B22 − B12²)
    λ = B33 − [B13² + v0(B12 B13 − B11 B23)]/B11
    α = sqrt(λ/B11)
    β = sqrt(λ B11/(B11 B22 − B12²))
    γ = −B12 α² β/λ
    u0 = γ v0/β − B13 α²/λ .
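The constraint stacking and the unpacking of b can be sketched as follows (hypothetical helper names; the square roots on α and β are written explicitly, and the u0 line reads γv0/β − B13α²/λ, which is what the entry B13 of (2.22) implies):

```python
import numpy as np

def vij(H, i, j):
    """Row vector v_ij of eq. (2.24), from homography columns i, j (0-based)."""
    hi, hj = H[:, i], H[:, j]
    return np.array([hi[0]*hj[0],
                     hi[0]*hj[1] + hi[1]*hj[0],
                     hi[1]*hj[1],
                     hi[2]*hj[0] + hi[0]*hj[2],
                     hi[2]*hj[1] + hi[1]*hj[2],
                     hi[2]*hj[2]])

def intrinsics_from_homographies(Hs):
    """Closed-form solution: stack two constraints per homography,
    solve Vb = 0 by SVD, and unpack the intrinsic matrix A from b."""
    V = np.vstack([np.vstack([vij(H, 0, 1),
                              vij(H, 0, 0) - vij(H, 1, 1)]) for H in Hs])
    b = np.linalg.svd(V)[2][-1]              # right singular vector, smallest s.v.
    B11, B12, B22, B13, B23, B33 = b
    v0 = (B12*B13 - B11*B23) / (B11*B22 - B12**2)
    lam = B33 - (B13**2 + v0*(B12*B13 - B11*B23)) / B11
    alpha = np.sqrt(lam / B11)
    beta = np.sqrt(lam * B11 / (B11*B22 - B12**2))
    gamma = -B12 * alpha**2 * beta / lam
    u0 = gamma * v0 / beta - B13 * alpha**2 / lam
    return np.array([[alpha, gamma, u0], [0., beta, v0], [0., 0., 1.]])
```

Note that the unknown sign and scale of b cancel in every extracted parameter, so no normalization of the SVD output is needed.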
Once A is known, the extrinsic parameters for each image are readily computed. From (2.18), we have

    r1 = λ A⁻¹ h1 ,   r2 = λ A⁻¹ h2 ,   r3 = r1 × r2 ,   t = λ A⁻¹ h3

with λ = 1/||A⁻¹h1|| = 1/||A⁻¹h2||. Of course, because of noise in the data, the so-computed matrix R = [r1, r2, r3] does not in general satisfy the properties of a rotation matrix. The best rotation matrix can then be obtained through, for example, singular value decomposition [13, 41].

2.4.5 Maximum likelihood estimation

The above solution is obtained by minimizing an algebraic distance which is not physically meaningful. We can refine it through maximum likelihood inference. We are given n images of a model plane and there are m points on the model plane. Assume that the image points are corrupted by independent and identically distributed noise. The maximum likelihood estimate can be obtained by minimizing the following functional:

    Σ_{i=1}^{n} Σ_{j=1}^{m} ||mij − m(A, Ri, ti, Mj)||² ,   (2.27)

where m(A, Ri, ti, Mj) is the projection of point Mj in image i, according to equation (2.18). A rotation R is parameterized by a vector of 3 parameters, denoted by r, which is parallel to the rotation axis and whose magnitude is equal to the rotation angle. R and r are related by the Rodrigues formula [8]. Minimizing (2.27) is a nonlinear minimization problem, which is solved with the Levenberg-Marquardt algorithm as implemented in Minpack [23]. It requires an initial guess of A, {Ri, ti | i = 1..n}, which can be obtained using the technique described in the previous subsection. Desktop cameras usually have visible lens distortion, especially the radial components. We have included these while minimizing (2.27). See my technical report [41] for more details.

2.4.6 Dealing with radial distortion

Up to now, we have not considered the lens distortion of the camera. However, a desktop camera usually exhibits significant lens distortion, especially radial distortion.
The reader is referred to Section 2.3.5 for distortion modeling. In this section, we only consider the ﬁrst two terms of radial distortion.
Estimating Radial Distortion by Alternation. As the radial distortion is expected to be small, one would expect to estimate the other five intrinsic parameters reasonably well, using the technique described in Sect. 2.4.5, by simply ignoring distortion. One strategy is then to estimate k1 and k2 after having estimated the other parameters, which give us the ideal pixel coordinates (u, v). Then, from (2.15) and (2.16), we have two equations for each point in each image:

    [ (u−u0)(x²+y²)   (u−u0)(x²+y²)²
      (v−v0)(x²+y²)   (v−v0)(x²+y²)² ] [k1 ; k2] = [ŭ−u ; v̆−v] .

Given m points in n images, we can stack all the equations together to obtain in total 2mn equations, or in matrix form Dk = d, where k = [k1, k2]^T. The linear least-squares solution is given by

    k = (D^T D)⁻¹ D^T d .   (2.28)

Once k1 and k2 are estimated, one can refine the estimates of the other parameters by solving (2.27) with m(A, Ri, ti, Mj) replaced by (2.15) and (2.16). We can alternate these two procedures until convergence.

Complete Maximum Likelihood Estimation. Experimentally, we found the convergence of the above alternation technique to be slow. A natural extension of (2.27) is then to estimate the complete set of parameters by minimizing the following functional:

    Σ_{i=1}^{n} Σ_{j=1}^{m} ||mij − m̆(A, k1, k2, Ri, ti, Mj)||² ,   (2.29)

where m̆(A, k1, k2, Ri, ti, Mj) is the projection of point Mj in image i according to equation (2.18), followed by distortion according to (2.15) and (2.16). This is a nonlinear minimization problem, which is solved with the Levenberg-Marquardt algorithm as implemented in Minpack [23]. A rotation is again parameterized by a 3-vector r, as in Sect. 2.4.5. An initial guess of A and {Ri, ti | i = 1..n} can be obtained using the technique described in Sect. 2.4.4 or in Sect. 2.4.5. An initial guess of k1 and k2 can be obtained with the technique described in the last paragraph, or simply by setting them to 0.
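The linear step (2.28) of the alternation can be sketched as follows (hypothetical helper name; numpy's least-squares routine is used rather than forming (D^T D)⁻¹ explicitly):

```python
import numpy as np

def estimate_k(uv_ideal, uv_obs, xy, u0, v0):
    """Linear least-squares for (k1, k2), given ideal pixel coordinates,
    observed (distorted) pixel coordinates, and ideal normalized image
    coordinates, all as n x 2 arrays."""
    r2 = (xy ** 2).sum(axis=1)               # r^2 = x^2 + y^2 per point
    du = uv_ideal[:, 0] - u0
    dv = uv_ideal[:, 1] - v0
    D = np.empty((2 * len(r2), 2))           # two stacked rows per point
    D[0::2, 0], D[0::2, 1] = du * r2, du * r2 ** 2
    D[1::2, 0], D[1::2, 1] = dv * r2, dv * r2 ** 2
    d = (uv_obs - uv_ideal).reshape(-1)      # interleaved (u-du, v-dv) residuals
    k, *_ = np.linalg.lstsq(D, d, rcond=None)
    return k                                 # [k1, k2]
```

In the full alternation, this call is interleaved with the nonlinear refinement of the other parameters until convergence.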
2.4.7 Summary

The recommended calibration procedure is as follows:
1. Print a pattern and attach it to a planar surface;
2. Take a few images of the model plane under different orientations by moving either the plane or the camera;
3. Detect the feature points in the images;
4. Estimate the five intrinsic parameters and all the extrinsic parameters using the closed-form solution as described in Sect. 2.4.4;
5. Estimate the coefficients of the radial distortion by solving the linear least-squares problem (2.28);
6. Refine all parameters, including the lens distortion parameters, by minimizing (2.29).

There is a degenerate configuration in my technique when planes are parallel to each other. See my technical report [41] for a more detailed description. In summary, this technique only requires the camera to observe a planar pattern from a few different orientations. Although the minimum number of orientations is two if pixels are square, we recommend 4 or 5 different orientations for better quality. We can move either the camera or the planar pattern. The motion does not need to be known, but should not be a pure translation. When the number of orientations is only 2, one should avoid positioning the planar pattern parallel to the image plane. The pattern could be anything, as long as we know the metric on the plane. For example, we can print a pattern with a laser printer and attach the paper to a reasonably planar surface such as a hard book cover. We can even use a book with known size, because the four corners are enough to estimate the plane homographies.

2.4.8 Experimental Results

The proposed algorithm has been tested on both computer-simulated data and real data. The closed-form solution involves finding the singular value decomposition of a small 2n × 6 matrix, where n is the number of images. The nonlinear refinement with the Levenberg-Marquardt algorithm takes 3 to 5 iterations to converge.
Due to space limitations, we describe in this section one set of experiments with real data, where the calibration pattern is at different distances from the camera. The reader is referred to [41] for more experimental results with both computer-simulated and real data, and to the following Web page:
Figure 2.6. Two sets of images taken at different distances from the calibration pattern. Each set contains five images. On the left, three images from the set taken at close range are shown. On the right, three images from the set taken at a larger distance are shown.
Table 2.1. Calibration results with the images shown in Figure 2.6

    image set    α        β        ϑ        u0       v0       k1        k2
    A            834.01   839.86   89.95°   305.51   240.09   −0.2235   0.3761
    B            836.17   841.08   89.92°   301.76   241.51   −0.2676   1.3121
    A+B          834.64   840.32   89.94°   304.77   240.59   −0.2214   0.3643

http://research.microsoft.com/~zhang/Calib/ for some experimental data and the software.

The example is shown in Fig. 2.6. The camera to be calibrated is an off-the-shelf PULNiX CCD camera with a 6 mm lens. The image resolution is 640×480. As can be seen in Fig. 2.6, the model plane contains 9 × 9 squares with 9 special dots, which are used to automatically identify the correspondence between reference points on the model plane and square corners in the images. It was printed on A4 paper with a 600 DPI laser printer and attached to a cardboard. In total, 10 images of the plane were taken (6 of them are shown in Fig. 2.6). Five of them (called Set A) were taken at close range, while the other five (called Set B) were taken at a larger distance. We applied our calibration algorithm to Set A, Set B, and also to the whole set (called Set A+B). The results are shown in Table 2.1. For intuitive understanding, we show the estimated angle between the image axes, ϑ, instead of the skew factor γ. We can see that the angle ϑ is very close to 90°, as expected with almost all modern CCD cameras. The camera parameters were estimated consistently for all three sets of images, except the distortion parameters with Set B. The reason is that the calibration pattern only occupies the central part of the image in Set B, where lens distortion is not significant and therefore cannot be estimated reliably.

2.4.9 Related Work

Almost at the same time, Sturm and Maybank [31], independently from us, developed the same technique. They assumed the pixels are square (i.e., γ = 0) and studied the degenerate configurations for plane-based camera calibration. Gurdjos et al.
[14] have re-derived the plane-based calibration technique from the center line constraint. My original implementation (only the executable) is available at
http://research.microsoft.com/~zhang/calib/. Bouguet has re-implemented my technique in Matlab, which is available at http://www.vision.caltech.edu/bouguetj/calib_doc/.

In many applications such as stereo, multiple cameras need to be calibrated simultaneously in order to determine the relative geometry between the cameras. In 2000, I extended (not published) this plane-based technique to stereo calibration for my stereo-based gaze-correction project [40, 39]. The formulation is similar to (2.29). Consider two cameras, and denote a quantity related to the second camera by a prime ('). Let (Rs, ts) be the rigid transformation between the two cameras such that (R', t') = (R, t) ◦ (Rs, ts), or more precisely: R' = R Rs and t' = R ts + t. Stereo calibration then solves for A, A', k1, k2, k1', k2', {(Ri, ti) | i = 1, . . . , n}, and (Rs, ts) by minimizing the following functional:

    Σ_{i=1}^{n} Σ_{j=1}^{m} [ δij ||mij − m̆(A, k1, k2, Ri, ti, Mj)||²
                            + δij' ||mij' − m̆(A', k1', k2', Ri', ti', Mj)||² ]   (2.30)

subject to

    Ri' = Ri Rs   and   ti' = Ri ts + ti .

In the above formulation, δij = 1 if point j is visible in the first camera in image i, and δij = 0 otherwise. Similarly, δij' = 1 if point j is visible in the second camera. This formulation thus does not require the same number of feature points to be visible over time or across cameras. Another advantage of this formulation is that the number of extrinsic parameters to be estimated is reduced from 12n, if the two cameras are calibrated independently, to 6n + 6. This is a reduction of 24 dimensions in the parameter space if 5 planes are used.

Obviously, this is a nonlinear optimization problem. To obtain the initial guess, we first run single-camera calibration independently for each camera, and compute Rs through SVD from Ri' = Ri Rs (i = 1, . . . , n) and ts through least-squares from ti' = Ri ts + ti (i = 1, . . . , n).
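The initialization of (Rs, ts) from the per-camera extrinsics can be sketched as follows (hypothetical helper name):

```python
import numpy as np

def relative_pose(Rs_cam1, ts_cam1, Rs_cam2, ts_cam2):
    """Initial guess of the inter-camera transform (Rs, ts):
    Rs via SVD from Ri' = Ri Rs, and ts via linear least squares
    from ti' = Ri ts + ti, given lists of per-view extrinsics."""
    M = sum(R.T @ Rp for R, Rp in zip(Rs_cam1, Rs_cam2))
    U, _, Vt = np.linalg.svd(M)                 # nearest rotation to M
    Rs = U @ Vt
    if np.linalg.det(Rs) < 0:                   # guard against a reflection
        Rs = U @ np.diag([1., 1., -1.]) @ Vt
    A = np.vstack(Rs_cam1)                      # stack all Ri into a 3n x 3 system
    d = np.concatenate([tp - t for t, tp in zip(ts_cam1, ts_cam2)])
    ts = np.linalg.lstsq(A, d, rcond=None)[0]
    return Rs, ts
```

With noisy per-view extrinsics, the SVD step returns the rotation closest (in the Frobenius sense) to the averaged relative rotation, which is exactly what is wanted for an initial guess.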
Recently, a closed-form initialization technique through factorization of homography matrices was proposed in [34].

2.5 Solving Camera Calibration With 1D Objects

In this section, we describe in detail how to solve the camera calibration problem from a number of observations of a 1D object consisting of 3 collinear points moving around one of them [43, 44]. We only consider this minimal
configuration, but it is straightforward to extend the result if a calibration object has four or more collinear points.

2.5.1 Setups With Free-Moving 1D Calibration Objects

We now examine possible setups with 1D objects for camera calibration. As already mentioned in the introduction, we need several observations of the 1D objects. Without loss of generality, we choose the camera coordinate system to define the 1D objects; therefore, R = I and t = 0 in (2.1).

Two points with known distance. This could be the two endpoints of a stick, and we take a number of images while freely waving the stick. Let A and B be the two 3D points, and a and b the observed image points. Because the distance between A and B is known, we only need 5 parameters to define A and B. For example, we need 3 parameters to specify the coordinates of A in the camera coordinate system, and 2 parameters to define the orientation of the line AB. On the other hand, each image point provides two equations according to (2.1), giving in total 4 equations. Given N observations of the stick, we have 5 intrinsic parameters and 5N parameters for the point positions to estimate, i.e., the total number of unknowns is 5 + 5N. However, we only have 4N equations. Camera calibration is thus impossible.

Three collinear points with known distances. By adding an additional point, say C, the number of unknowns for the point positions remains the same, i.e., 5 + 5N, because of the known distances of C to A and B. For each observation, we have three image points, yielding in total 6N equations. Calibration seems to be plausible, but is in fact not. This is because the three image points for each observation must be collinear, and collinearity is preserved by perspective projection. We therefore only have 5 independent equations for each observation. The total number of independent equations, 5N, is always smaller than the number of unknowns.
Camera calibration is still impossible. Four or more collinear points with known distances. As seen above, when the number of points increases from two to three, the number of independent equations (constraints) increases by one for each observation. If we have a fourth point, will we have in total 6N independent equations? If so, we would be able to solve the problem because the number of unknowns remains the same, i.e., 5 + 5N, and we would have more than enough constraints if N ≥ 5. The reality is that the addition of the fourth point or even more points does not increase the number of independent equations. It will always
be 5N for any four or more collinear points. This is because the cross ratio is preserved under perspective projection. With known cross ratios and three collinear points, whether they are in space or in images, the other points are determined exactly.

2.5.2 Setups With 1D Calibration Objects Moving Around a Fixed Point

From the above discussion, calibration is impossible with a free-moving 1D calibration object, no matter how many points are on the object. Now let us examine what happens if one point is fixed. In what follows, without loss of generality, point A is the fixed point, and a is the corresponding image point. We need 3 parameters, which are unknown, to specify the coordinates of A in the camera coordinate system, while image point a provides two scalar equations according to (2.1).

Two points with known distance. They could be the endpoints of a stick, and we move the stick around the fixed endpoint. Let B be the free endpoint and b its corresponding image point. For each observation, we need 2 parameters to define the orientation of the line AB, and therefore the position of B, because the distance between A and B is known. Given N observations of the stick, we have 5 intrinsic parameters, 3 parameters for A and 2N parameters for the free endpoint positions to estimate, i.e., the total number of unknowns is 8 + 2N. However, each observation of b provides two equations, so together with a we only have in total 2 + 2N equations. Camera calibration is thus impossible.

Three collinear points with known distances. As already explained in the last subsection, by adding an additional point, say C, the number of unknowns for the point positions remains the same, i.e., 8 + 2N. For each observation, b provides two equations, but c only provides one additional equation because of the collinearity of a, b and c. Thus, the total number of equations is 2 + 3N for N observations.
By counting the numbers, we see that if we have 6 or more observations, we should be able to solve camera calibration, and this is indeed the case, as we shall show in the next section.

Four or more collinear points with known distances. Again, as already explained in the last subsection, the number of unknowns and the number of independent equations remain the same because of the invariance of cross-ratios. That said, the more collinear points we have, the more accurate camera calibration will be in practice, because the data redundancy helps combat the noise in the image data.
2.5.3 Basic Equations

Figure 2.7. Illustration of 1D calibration objects

Refer to Figure 2.7. Point A is the fixed point in space, and the stick AB moves around A. The length of the stick AB is known to be L, i.e.,

    ||B − A|| = L .   (2.31)

The position of point C is also known with respect to A and B, and therefore

    C = λA A + λB B ,   (2.32)

where λA and λB are known. If C is the midpoint of AB, then λA = λB = 0.5. Points a, b and c on the image plane are the projections of space points A, B and C, respectively. Without loss of generality, we choose the camera coordinate system to define the 1D objects; therefore, R = I and t = 0 in (2.1). Let the unknown depths for A, B and C be zA, zB and zC, respectively. According to (2.1), we
have

A = z_A A⁻¹ a (2.33)
B = z_B A⁻¹ b (2.34)
C = z_C A⁻¹ c . (2.35)

Substituting them into (2.32) yields

z_C c = z_A λ_A a + z_B λ_B b (2.36)

after eliminating A⁻¹ from both sides. Taking the cross product of both sides of the above equation with c, we have

z_A λ_A (a × c) + z_B λ_B (b × c) = 0 .

In turn, we obtain

z_B = −z_A (λ_A (a × c) · (b × c)) / (λ_B (b × c) · (b × c)) . (2.37)

From (2.31), we have

‖A⁻¹ (z_B b − z_A a)‖ = L .

Substituting for z_B from (2.37) gives

z_A ‖A⁻¹ (a + (λ_A (a × c) · (b × c)) / (λ_B (b × c) · (b × c)) b)‖ = L .

This is equivalent to

z_A² hᵀ A⁻ᵀ A⁻¹ h = L² (2.38)

with

h = a + (λ_A (a × c) · (b × c)) / (λ_B (b × c) · (b × c)) b . (2.39)

Equation (2.38) contains the unknown intrinsic parameters A and the unknown depth z_A of the fixed point A. It is the basic constraint for camera calibration with 1D objects. The vector h, given by (2.39), can be computed from the image points and the known λ_A and λ_B. Since the total number of unknowns is 6, we need at least six observations of the 1D object for calibration. Note that A⁻ᵀA⁻¹ actually describes the image of the absolute conic [20].
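As a concrete sanity check on (2.37) through (2.39), the numpy sketch below builds h from three projected collinear points and verifies that the basic constraint (2.38) holds on noise-free synthetic data. The camera and stick values are ours, chosen arbitrarily for illustration:

```python
import numpy as np

# Compute the vector h of equation (2.39) from the three observed image
# points a, b, c (homogeneous 3-vectors) and the known weights
# lambda_A, lambda_B of equation (2.32).
def compute_h(a, b, c, lam_a=0.5, lam_b=0.5):
    axc = np.cross(a, c)
    bxc = np.cross(b, c)
    return a + (lam_a * axc.dot(bxc)) / (lam_b * bxc.dot(bxc)) * b

# Synthetic check: project collinear points with a known camera and verify
# the basic constraint z_A^2 h^T A^{-T} A^{-1} h = L^2 of equation (2.38).
A = np.array([[1000.0, 0.0, 320.0],
              [0.0, 1000.0, 240.0],
              [0.0, 0.0, 1.0]])
P_A = np.array([0.0, 35.0, 150.0])                       # fixed point
direction = np.array([0.3, 0.4, np.sqrt(1.0 - 0.25)])    # unit vector
P_B = P_A + 70.0 * direction                             # |AB| = L = 70
P_C = 0.5 * P_A + 0.5 * P_B                              # midpoint
a, b, c = (A @ P / P[2] for P in (P_A, P_B, P_C))
h = compute_h(a, b, c)
Ainv = np.linalg.inv(A)
lhs = P_A[2] ** 2 * h @ Ainv.T @ Ainv @ h
print(bool(np.isclose(lhs, 70.0 ** 2)))
```

The check works because z_A A⁻¹ h = A − B by the derivation above, so the left-hand side equals ‖A − B‖² = L² exactly on noise-free data.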
2.5.4 Closed-Form Solution

Let

B = A⁻ᵀ A⁻¹ ≡ [ B11 B12 B13 ; B12 B22 B23 ; B13 B23 B33 ] (2.40)

= [  1/α²                  −γ/(α²β)                       (v0γ − u0β)/(α²β)
     −γ/(α²β)              γ²/(α²β²) + 1/β²               −γ(v0γ − u0β)/(α²β²) − v0/β²
     (v0γ − u0β)/(α²β)     −γ(v0γ − u0β)/(α²β²) − v0/β²   (v0γ − u0β)²/(α²β²) + v0²/β² + 1 ] . (2.41)

Note that B is symmetric and can be defined by the 6D vector

b = [B11, B12, B22, B13, B23, B33]ᵀ . (2.42)

Let h = [h1, h2, h3]ᵀ and x = z_A² b; then equation (2.38) becomes

vᵀ x = L² (2.43)

with v = [h1², 2h1h2, h2², 2h1h3, 2h2h3, h3²]ᵀ. When N images of the 1D object are observed, stacking N such equations (2.43) gives

V x = L² 1 , (2.44)

where V = [v1, . . . , vN]ᵀ and 1 = [1, . . . , 1]ᵀ. The least-squares solution is then given by

x = L² (Vᵀ V)⁻¹ Vᵀ 1 . (2.45)

Once x is estimated, we can compute all the unknowns from x = z_A² b. Let x = [x1, x2, . . . , x6]ᵀ. Without difficulty, we can uniquely extract the intrinsic parameters and the depth z_A as

v0 = (x2 x4 − x1 x5)/(x1 x3 − x2²)
z_A = sqrt(x6 − [x4² + v0 (x2 x4 − x1 x5)]/x1)
α = sqrt(z_A²/x1)
β = sqrt(z_A² x1/(x1 x3 − x2²))
γ = −x2 α² β/z_A²
u0 = γ v0/β − x4 α²/z_A² .

At this point, we can compute z_B according to (2.37), so points A and B can be computed from (2.33) and (2.34), while point C follows from (2.32).
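The extraction formulas can be verified with a round trip: build x = z_A² b from a known camera and a known depth, then recover the parameters. Everything below is an illustrative sketch (the camera values are ours), applying the formulas exactly as written above, with the square roots and z_A² denominators implied by the form of B in (2.41):

```python
import numpy as np

def extract_intrinsics(x):
    """Recover (alpha, beta, gamma, u0, v0, z_A) from the least-squares
    solution x = z_A^2 * b of equation (2.45)."""
    x1, x2, x3, x4, x5, x6 = x
    v0 = (x2 * x4 - x1 * x5) / (x1 * x3 - x2 ** 2)
    zA = np.sqrt(x6 - (x4 ** 2 + v0 * (x2 * x4 - x1 * x5)) / x1)
    alpha = np.sqrt(zA ** 2 / x1)
    beta = np.sqrt(zA ** 2 * x1 / (x1 * x3 - x2 ** 2))
    gamma = -x2 * alpha ** 2 * beta / zA ** 2
    u0 = gamma * v0 / beta - x4 * alpha ** 2 / zA ** 2
    return alpha, beta, gamma, u0, v0, zA

# Round-trip check: build x from a known camera and a known depth z_A.
alpha, beta, gamma, u0, v0, zA = 1000.0, 950.0, 0.5, 320.0, 240.0, 150.0
A = np.array([[alpha, gamma, u0], [0.0, beta, v0], [0.0, 0.0, 1.0]])
Ainv = np.linalg.inv(A)
B = Ainv.T @ Ainv
x = zA ** 2 * np.array([B[0, 0], B[0, 1], B[1, 1], B[0, 2], B[1, 2], B[2, 2]])
est = extract_intrinsics(x)
print(np.allclose(est, (alpha, beta, gamma, u0, v0, zA)))
```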
2.5.5 Nonlinear Optimization

The above solution is obtained by minimizing an algebraic distance which is not physically meaningful. We can refine it through maximum likelihood inference.

We are given N images of the 1D calibration object, and there are 3 points on the object. Point A is fixed, and points B and C move around A. Assume that the image points are corrupted by independent and identically distributed noise. The maximum likelihood estimate can be obtained by minimizing the following functional:

Σ_{i=1}^{N} ( ‖a_i − φ(A, A)‖² + ‖b_i − φ(A, B_i)‖² + ‖c_i − φ(A, C_i)‖² ) , (2.46)

where φ(A, M) (M ∈ {A, B_i, C_i}) is the projection of point M onto the image, according to equations (2.33) to (2.35). More precisely, φ(A, M) = (1/z_M) A M, where z_M is the z-component of M. The unknowns to be estimated are:

– the 5 camera intrinsic parameters α, β, γ, u0 and v0 that define the matrix A;
– the 3 parameters for the coordinates of the fixed point A;
– 2N additional parameters to define points B_i and C_i at each instant (see below for more details).

Therefore, we have in total 8 + 2N unknowns. Regarding the parameterization of B and C, we use the spherical coordinates φ and θ to define the direction of the 1D calibration object, and point B is then given by

B = A + L [sin θ cos φ, sin θ sin φ, cos θ]ᵀ ,

where L is the known distance between A and B. In turn, point C is computed according to (2.32). We therefore only need 2 additional parameters for each observation.

Minimizing (2.46) is a nonlinear minimization problem, which is solved with the Levenberg-Marquardt algorithm as implemented in Minpack [23]. It requires an initial guess of A, A, {B_i, C_i | i = 1..N}, which can be obtained using the technique described in the last subsection.
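A hedged sketch of the cost (2.46): the function below packs the 8 + 2N unknowns into one vector and returns the stacked reprojection residuals that a Levenberg-Marquardt solver (e.g. MINPACK, or scipy.optimize.least_squares) would minimize. All names and the synthetic ground truth are ours; at the true parameters the residual vector vanishes:

```python
import numpy as np

L, lam_A, lam_B = 70.0, 0.5, 0.5   # known stick length and weights

def project(K, M):
    """phi(A, M): perspective projection (1/z_M) * K * M, pixel part."""
    m = K @ M
    return m[:2] / m[2]

def residuals(params, obs, n_obs):
    """params = [alpha, beta, gamma, u0, v0, A(3), (theta_i, phi_i)*N];
    obs[i] = observed pixel points (a_i, b_i, c_i) for observation i."""
    alpha, beta, gamma, u0, v0 = params[:5]
    K = np.array([[alpha, gamma, u0], [0.0, beta, v0], [0.0, 0.0, 1.0]])
    P_A = params[5:8]
    res = []
    for i in range(n_obs):
        theta, phi = params[8 + 2 * i: 10 + 2 * i]
        P_B = P_A + L * np.array([np.sin(theta) * np.cos(phi),
                                  np.sin(theta) * np.sin(phi),
                                  np.cos(theta)])
        P_C = lam_A * P_A + lam_B * P_B
        for m_obs, M in zip(obs[i], (P_A, P_B, P_C)):
            res.extend(project(K, M) - m_obs)
    return np.array(res)

# Synthetic ground truth: intrinsics, fixed point, two stick orientations.
truth = np.array([1000.0, 1000.0, 0.0, 320.0, 240.0,
                  0.0, 35.0, 150.0,
                  0.4, 3.5, 0.8, 4.0])
K = np.array([[1000.0, 0.0, 320.0], [0.0, 1000.0, 240.0], [0.0, 0.0, 1.0]])
obs = []
for i in range(2):
    theta, phi = truth[8 + 2 * i: 10 + 2 * i]
    P_A = truth[5:8]
    P_B = P_A + L * np.array([np.sin(theta) * np.cos(phi),
                              np.sin(theta) * np.sin(phi), np.cos(theta)])
    P_C = 0.5 * P_A + 0.5 * P_B
    obs.append([project(K, M) for M in (P_A, P_B, P_C)])
print(np.allclose(residuals(truth, obs, 2), 0.0))
```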
2.5.6 Estimating the Fixed Point

In the above discussion, we assumed that the image coordinates a of the fixed point A are known. We now describe how to estimate a, considering whether the fixed point A is visible in the image or not.

Invisible fixed point. The fixed point need not be visible in the image, and the calibration technique becomes more versatile without the visibility requirement. In that case, we can, for example, hang a string of small balls from the ceiling and calibrate multiple cameras in the room by swinging the string. The fixed point can then be estimated by intersecting lines from different images, as described below.

Each observation of the 1D object defines an image line. An image line can be represented by a 3D vector l = [l1, l2, l3]ᵀ, defined up to a scale factor, such that a point m = [u, v]ᵀ on the line satisfies lᵀ m̃ = 0, where m̃ = [u, v, 1]ᵀ. In the sequel, we also use (n, q) to denote the line l, where n = [l1, l2]ᵀ and q = l3. To remove the scale ambiguity, we normalize l such that ‖l‖ = 1. Furthermore, each l is associated with an uncertainty measure represented by a 3 × 3 covariance matrix Λ.

Given N images of the 1D object, we have N lines: {(l_i, Λ_i) | i = 1, . . . , N}. Let the fixed point be a in the image. Obviously, if there were no noise, we would have l_iᵀ ã = 0, or n_iᵀ a + q_i = 0, with ã = [aᵀ, 1]ᵀ. We can therefore estimate a by minimizing

F = Σ_{i=1}^{N} w_i (l_iᵀ ã)² = Σ_{i=1}^{N} w_i (n_iᵀ a + q_i)² = Σ_{i=1}^{N} w_i (aᵀ n_i n_iᵀ a + 2 q_i n_iᵀ a + q_i²) , (2.47)

where w_i is a weighting factor (see below). Setting the derivative of F with respect to a to 0 yields the solution

a = −( Σ_{i=1}^{N} w_i n_i n_iᵀ )⁻¹ Σ_{i=1}^{N} w_i q_i n_i .

The optimal weighting factor w_i in (2.47) is the inverse of the variance of l_iᵀ ã, which is w_i = 1/(ãᵀ Λ_i ã). Note that the weight w_i involves the unknown a. To overcome this difficulty, we can approximate w_i by 1/trace(Λ_i) for the first iteration, and re-compute w_i with the previously estimated a in subsequent iterations.
Usually two or three iterations are enough.

Visible fixed point. Since the fixed point is visible, we have N observations: {a_i | i = 1, . . . , N}. We can therefore estimate a by minimizing Σ_{i=1}^{N} ‖a − a_i‖², assuming that the image points are detected with the same accuracy. The solution is simply a = (Σ_{i=1}^{N} a_i)/N, i.e., the centroid of the observations.
The above estimate does not use the fact that the fixed point is also the intersection of the N observed lines of the 1D object. A better technique is therefore to estimate a by minimizing the following function:

F = Σ_{i=1}^{N} [ (a − a_i)ᵀ V_i⁻¹ (a − a_i) + w_i (l_iᵀ ã)² ] = Σ_{i=1}^{N} [ (a − a_i)ᵀ V_i⁻¹ (a − a_i) + w_i (n_iᵀ a + q_i)² ] , (2.48)

where V_i is the covariance matrix of the detected point a_i and ã = [aᵀ, 1]ᵀ. The derivative of the above function with respect to a is given by

∂F/∂a = 2 Σ_{i=1}^{N} [ V_i⁻¹ (a − a_i) + w_i n_i n_iᵀ a + w_i q_i n_i ] .

Setting it to 0 yields

a = ( Σ_{i=1}^{N} (V_i⁻¹ + w_i n_i n_iᵀ) )⁻¹ Σ_{i=1}^{N} (V_i⁻¹ a_i − w_i q_i n_i) .

If more than three points are visible in each image, the known cross ratio provides an additional constraint for determining the fixed point. For an accessible description of uncertainty manipulation, the reader is referred to [45, Chapter 2].

2.5.7 Experimental Results

The proposed algorithm has been tested on both computer-simulated data and real data.

Computer Simulations

The simulated camera has the following parameters: α = 1000, β = 1000, γ = 0, u0 = 320, and v0 = 240. The image resolution is 640 × 480. A stick of 70 cm is simulated with the fixed point A at [0, 35, 150]ᵀ. The other endpoint of the stick is B, and C is located halfway between A and B. We generated 100 random orientations of the stick by sampling θ in [π/6, 5π/6] and φ in [π, 2π] from uniform distributions. Points A, B, and C are then projected onto the image.

Gaussian noise with 0 mean and standard deviation σ is added to the projected image points a, b and c. The estimated camera parameters are compared with the ground truth, and we measure their relative errors with respect to the focal length α. Note that we measure the relative errors in
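The weighted normal equations above amount to a small linear solve. A minimal sketch of the invisible-fixed-point estimator, with uniform weights and synthetic lines through a known point (all names are illustrative):

```python
import numpy as np

def estimate_fixed_point(lines, weights=None):
    """Each observation gives a line l_i = (n_i, q_i) with n_i a 2-vector
    and q_i a scalar; solve a = -(sum w n n^T)^{-1} sum w q n."""
    if weights is None:
        weights = np.ones(len(lines))
    S = np.zeros((2, 2))
    t = np.zeros(2)
    for (n, q), w in zip(lines, weights):
        S += w * np.outer(n, n)
        t += w * q * n
    return -np.linalg.solve(S, t)

# Synthetic check: lines through a known point a0 in several directions.
a0 = np.array([297.0, 234.0])
rng = np.random.default_rng(0)
lines = []
for ang in rng.uniform(0.0, np.pi, 8):
    n = np.array([np.cos(ang), np.sin(ang)])   # unit line normal
    q = -n.dot(a0)                             # so that n . a0 + q = 0
    lines.append((n, q))
a_hat = estimate_fixed_point(lines)
print(np.allclose(a_hat, a0))
```

In a real implementation the weights would be updated iteratively as described above, starting from 1/trace(Λ_i).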
Table 2.2. Calibration results with real data.

Solution              α        β        γ (angle)           u0       v0
Closed-form           889.49   818.59   −0.1651 (90.01°)    297.47   234.33
Nonlinear             838.49   799.36    4.1921 (89.72°)    286.74   219.89
Plane-based           828.92   813.33   −0.0903 (90.01°)    305.23   235.17
Relative difference   1.15%    1.69%    0.52% (0.29°)       2.23%    1.84%

(u0, v0) with respect to α, as proposed by Triggs in [32]. He pointed out that the absolute errors in (u0, v0) are not geometrically meaningful, while the relative error is equivalent to measuring the angle between the true optical axis and the estimated one.

We vary the noise level from 0.1 pixels to 1 pixel. For each noise level, we perform 120 independent trials, and the results shown in Fig. 2.8 are the averages. Figure 2.8a displays the relative errors of the closed-form solution, while Figure 2.8b displays those of the nonlinear minimization. Errors increase almost linearly with the noise level. The nonlinear minimization refines the closed-form solution and produces significantly better results (with about 50% lower errors). At the 1-pixel noise level, the errors for the closed-form solution are about 12%, while those for the nonlinear minimization are about 6%.

Real Data

For the experiment with real data, I used three toy beads from my kids and strung them together with a stick. The beads are approximately 14 cm apart (i.e., L = 28). I then moved the stick around while trying to fix one end with the aid of a book. A video of 150 frames was recorded, and four sample images are shown in Fig. 2.9. A bead in the image is modeled as a Gaussian blob in the RGB space, and the centroid of each detected blob is the image point we use for camera calibration. The proposed algorithm was applied to the 150 observations of the beads, and the estimated camera parameters are provided in Table 2.2. The first row is the estimate from the closed-form solution, while the second row is the refined result after nonlinear minimization.
For the image skew parameter γ, we also provide the angle between the image axes in parentheses (it should be very close to 90°).

For comparison, we also used the plane-based calibration technique described in [42] to calibrate the same camera. Five images of a planar pattern were taken, and one of them is shown in Fig. 2.10. The calibration result is shown in the third row of Table 2.2. The fourth row displays the relative
Figure 2.8. Calibration errors with respect to the noise level of the image points. (a) Closed-form solution; (b) nonlinear optimization.
Figure 2.9. Sample images of a 1D object used for camera calibration (frames 10, 60, 90 and 140).

Figure 2.10. A sample image of the planar pattern used for camera calibration.
difference between the plane-based result and the nonlinear solution with respect to the focal length (we use 828.92). As we can observe, the difference is about 2%.

There are several sources contributing to this difference. Besides the image noise and the imprecision of the extracted data points, one source is our current rudimentary experimental setup:

– The supposed-to-be fixed point was not fixed; it slipped around on the surface.
– The positioning of the beads on the stick was done with a ruler by eye inspection.

Considering all these factors, the performance of the proposed algorithm is very encouraging.

2.6 Self-Calibration

Self-calibration is also called auto-calibration. Techniques in this category do not require any particular calibration object; they can be considered a 0D approach because only image point correspondences are required. Just by moving a camera in a static scene, the rigidity of the scene provides, in general, two constraints on the camera's internal parameters per camera displacement, using image information alone [22, 21, 20]. The absolute conic, described in Section 2.2.2, is an essential concept for understanding these constraints. If images are taken by the same camera with fixed internal parameters, correspondences between three images are sufficient to recover both the internal and external parameters, which allows us to reconstruct 3-D structure up to a similarity [20, 17]. Although no calibration objects are necessary, a large number of parameters need to be estimated, resulting in a much harder mathematical problem. We do not go further into the details of this approach because two recent books [15, 7] provide an excellent account of these techniques.

2.7 Conclusion

In this chapter, we have reviewed several camera calibration techniques.
We have classified them into four categories, depending on whether they use a 3D apparatus, 2D objects (planes), 1D objects, or just the surrounding scene (self-calibration). Recommendations on which technique to choose were given in the introduction.

The techniques described so far focus mostly on single-camera calibration. We briefly touched on stereo calibration in Section 2.4.9.
Camera calibration is still an active research area because more and more applications use cameras. In [2], spheres are used to calibrate one or more cameras, which can be considered a 2D approach since only the surface property is used. In [5], a technique is described to calibrate a camera network consisting of an omni-camera and a number of perspective cameras. In [24], a technique is proposed to calibrate a projector-screen-camera system.

2.8 Appendix: Estimating Homography Between the Model Plane and its Image

There are many ways to estimate the homography between the model plane and its image. Here, we present a technique based on the maximum likelihood criterion. Let M_i and m_i be the model and image points, respectively. Ideally, they should satisfy (2.18). In practice, they do not, because of noise in the extracted image points. Let us assume that m_i is corrupted by Gaussian noise with mean 0 and covariance matrix Λ_{m_i}. Then, the maximum likelihood estimate of H is obtained by minimizing the following functional:

Σ_i (m_i − m̂_i)ᵀ Λ_{m_i}⁻¹ (m_i − m̂_i) ,

where

m̂_i = (1/(h̄_3ᵀ M_i)) [h̄_1ᵀ M_i, h̄_2ᵀ M_i]ᵀ ,

with h̄_i the i-th row of H. In practice, we simply assume Λ_{m_i} = σ²I for all i. This is reasonable if the points are extracted independently with the same procedure. In this case, the above problem becomes a nonlinear least-squares one, i.e., min_H Σ_i ‖m_i − m̂_i‖². The nonlinear minimization is conducted with the Levenberg-Marquardt algorithm as implemented in Minpack [23]. This requires an initial guess, which can be obtained as follows.

Let x = [h̄_1ᵀ, h̄_2ᵀ, h̄_3ᵀ]ᵀ. Then equation (2.18) can be rewritten as

[ Mᵀ  0ᵀ  −u Mᵀ
  0ᵀ  Mᵀ  −v Mᵀ ] x = 0 .

When we are given n points, we have n such equations, which can be written in matrix form as L x = 0, where L is a 2n × 9 matrix.
As x is defined up to a scale factor, the solution is well known to be the right singular vector of L associated with the smallest singular value (or, equivalently, the eigenvector of LᵀL associated with the smallest eigenvalue).

In L, some elements are the constant 1, some are in pixels, some are in world coordinates, and some are products of both. This makes L poorly conditioned numerically. Much better results can be obtained by performing a simple data normalization prior to running the above procedure.
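The DLT initialization and the suggested normalization can be sketched as follows. The Hartley-style centering and isotropic scaling used here is one common choice of the "simple data normalization" the text alludes to, not necessarily the exact one the author used:

```python
import numpy as np

def normalize(pts):
    """Center the points and scale so the mean distance from the centroid
    is sqrt(2); return the scaled points and the matching 3x3 transform."""
    c = pts.mean(axis=0)
    s = np.sqrt(2) / np.mean(np.linalg.norm(pts - c, axis=1))
    T = np.array([[s, 0.0, -s * c[0]], [0.0, s, -s * c[1]], [0.0, 0.0, 1.0]])
    return (pts - c) * s, T

def dlt_homography(model, image):
    """Estimate H (model plane -> image) by SVD of the 2n x 9 matrix L."""
    Mn, Tm = normalize(model)
    mn, Ti = normalize(image)
    rows = []
    for (X, Y), (u, v) in zip(Mn, mn):
        M = [X, Y, 1.0]
        rows.append(M + [0.0, 0.0, 0.0] + [-u * x for x in M])
        rows.append([0.0, 0.0, 0.0] + M + [-v * x for x in M])
    _, _, Vt = np.linalg.svd(np.array(rows))
    Hn = Vt[-1].reshape(3, 3)          # right singular vector, smallest s.v.
    H = np.linalg.inv(Ti) @ Hn @ Tm    # undo the normalization
    return H / H[2, 2]

# Round-trip check with a known homography and six model points.
H_true = np.array([[1.2, 0.1, 5.0], [-0.05, 0.9, 2.0], [1e-3, 2e-3, 1.0]])
model = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [0.5, 0.2], [0.3, 0.8]],
                 dtype=float)
proj = np.c_[model, np.ones(len(model))] @ H_true.T
image = proj[:, :2] / proj[:, 2:]
H = dlt_homography(model, image)
print(np.allclose(H, H_true, atol=1e-6))
```

This linear estimate would then seed the Levenberg-Marquardt refinement described above.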
Bibliography

[1] Y.I. Abdel-Aziz and H.M. Karara. Direct linear transformation into object space coordinates in close-range photogrammetry. In Proceedings of the Symposium on Close-Range Photogrammetry, University of Illinois at Urbana-Champaign, Urbana, Illinois, pages 1–18, January 1971.
[2] M. Agrawal and L. Davis. Camera calibration using spheres: A semi-definite programming approach. In Proceedings of the 9th International Conference on Computer Vision, pages 782–789, Nice, France, October 2003. IEEE Computer Society Press.
[3] D. C. Brown. Close-range camera calibration. Photogrammetric Engineering, 37(8):855–866, 1971.
[4] B. Caprile and V. Torre. Using vanishing points for camera calibration. The International Journal of Computer Vision, 4(2):127–140, March 1990.
[5] X. Chen, J. Yang, and A. Waibel. Calibration of a hybrid camera network. In Proceedings of the 9th International Conference on Computer Vision, pages 150–155, Nice, France, October 2003. IEEE Computer Society Press.
[6] W. Faig. Calibration of close-range photogrammetry systems: Mathematical formulation. Photogrammetric Engineering and Remote Sensing, 41(12):1479–1486, 1975.
[7] O. Faugeras and Q.-T. Luong. The Geometry of Multiple Images. The MIT Press, 2001. With contributions from T. Papadopoulo.
[8] O. Faugeras. Three-Dimensional Computer Vision: a Geometric Viewpoint. MIT Press, 1993.
[9] O. Faugeras, T. Luong, and S. Maybank. Camera self-calibration: theory and experiments. In G. Sandini, editor, Proc. 2nd ECCV, volume 588 of Lecture Notes in Computer Science, pages 321–334, Santa Margherita Ligure, Italy, May 1992. Springer-Verlag.
[10] O. Faugeras and G. Toscani. The calibration problem for stereo. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 15–20, Miami Beach, FL, June 1986. IEEE.
[11] S. Ganapathy. Decomposition of transformation matrices for robot vision. Pattern Recognition Letters, 2:401–412, December 1984.
[12] D. Gennery. Stereo-camera calibration. In Proceedings of the 10th Image Understanding Workshop, pages 101–108, 1979.
[13] G.H. Golub and C.F. van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, Maryland, 3rd edition, 1996.
[14] P. Gurdjos, A. Crouzil, and R. Payrissat. Another way of looking at plane-based calibration: the centre circle constraint. In Proceedings of the 7th European Conference on Computer Vision, volume IV, pages 252–266, Copenhagen, May 2002.
[15] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2000.
[16] R. Hartley. Self-calibration from multiple views with a rotating camera. In J.-O. Eklundh, editor, Proceedings of the 3rd European Conference on Computer Vision, volume 800–801 of Lecture Notes in Computer Science, pages 471–478, Stockholm, Sweden, May 1994. Springer-Verlag.
[17] R. Hartley. An algorithm for self calibration from several views. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 908–912, Seattle, WA, June 1994. IEEE.
[18] J. Heikkilä and O. Silvén. A four-step camera calibration procedure with implicit image correction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1106–1112, San Juan, Puerto Rico, June 1997. IEEE Computer Society.
[19] D. Liebowitz and A. Zisserman. Metric rectification for perspective images of planes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 482–488, Santa Barbara, California, June 1998. IEEE Computer Society.
[20] Q.-T. Luong and O.D. Faugeras. Self-calibration of a moving camera from point correspondences and fundamental matrices. The International Journal of Computer Vision, 22(3):261–289, 1997.
[21] Q.-T. Luong. Matrice Fondamentale et Calibration Visuelle sur l'Environnement-Vers une plus grande autonomie des systèmes robotiques. PhD thesis, Université de Paris-Sud, Centre d'Orsay, December 1992.
[22] S. J. Maybank and O. D. Faugeras. A theory of self-calibration of a moving camera. The International Journal of Computer Vision, 8(2):123–152, August 1992.
[23] J.J. Moré. The Levenberg-Marquardt algorithm: implementation and theory. In G. A. Watson, editor, Numerical Analysis, Lecture Notes in Mathematics 630. Springer-Verlag, 1977.
[24] T. Okatani and K. Deguchi. Autocalibration of projector-screen-camera system: Theory and algorithm for screen-to-camera homography estimation. In Proceedings of the 9th International Conference on Computer Vision, pages 774–781, Nice, France, October 2003. IEEE Computer Society Press.
[25] L. Robert. Camera calibration without feature extraction. Computer Vision, Graphics, and Image Processing, 63(2):314–325, March 1995. Also INRIA Technical Report 2204.
[26] J.G. Semple and G.T. Kneebone.
Algebraic Projective Geometry. Oxford: Clarendon Press, 1952. Reprinted 1979.
[27] S.W. Shih, Y.P. Hung, and W.S. Lin. Accurate linear technique for camera calibration considering lens distortion by solving an eigenvalue problem. Optical Engineering, 32(1):138–149, 1993.
[28] I. Shimizu, Z. Zhang, S. Akamatsu, and K. Deguchi. Head pose determination from one image using a generic model. In Proceedings of the IEEE Third International Conference on Automatic Face and Gesture Recognition, pages 100–105, Nara, Japan, April 1998.
[29] C. C. Slama, editor. Manual of Photogrammetry. American Society of Photogrammetry, fourth edition, 1980.
[30] G. Stein. Accurate internal camera calibration using rotation, with analysis of sources of error. In Proc. Fifth International Conference on Computer Vision, pages 230–236, Cambridge, Massachusetts, June 1995.
[31] P. Sturm and S. Maybank. On plane-based camera calibration: A general algorithm, singularities, applications. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 432–437, Fort Collins, Colorado, June 1999. IEEE Computer Society Press.
[32] B. Triggs. Autocalibration from planar scenes. In Proceedings of the 5th European Conference on Computer Vision, pages 89–105, Freiburg, Germany, June 1998.
[33] R. Y. Tsai. A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses. IEEE Journal of Robotics and Automation, 3(4):323–344, August 1987.
[34] T. Ueshiba and F. Tomita. Plane-based calibration algorithm for multi-camera systems via factorization of homography matrices. In Proceedings of the 9th International Conference on Computer Vision, pages 966–973, Nice, France, October 2003. IEEE Computer Society Press.
[35] G.Q. Wei and S.D. Ma. A complete two-plane camera calibration method and experimental comparisons. In Proc. Fourth International Conference on Computer Vision, pages 439–446, Berlin, May 1993.
[36] G.Q. Wei and S.D. Ma. Implicit and explicit camera calibration: Theory and experiments. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(5):469–480, 1994.
[37] J. Weng, P. Cohen, and M. Herniou. Camera calibration with distortion models and accuracy evaluation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(10):965–980, October 1992.
[38] R. Willson. Modeling and Calibration of Automated Zoom Lenses. PhD thesis, Department of Electrical and Computer Engineering, Carnegie Mellon University, 1994.
[39] R. Yang and Z. Zhang. Eye gaze correction with stereovision for video teleconferencing. In Proceedings of the 7th European Conference on Computer Vision, volume II, pages 479–494, Copenhagen, May 2002. Also available as Technical Report MSR-TR-01-119.
[40] R. Yang and Z. Zhang.
Model-based head pose tracking with stereovision. In Proc. Fifth IEEE International Conference on Automatic Face and Gesture Recognition (FG2002), pages 255–260, Washington, DC, May 2002. Also available as Technical Report MSR-TR-01-102.
[41] Z. Zhang. A flexible new technique for camera calibration. Technical Report MSR-TR-98-71, Microsoft Research, December 1998. Available together with the software at http://research.microsoft.com/~zhang/Calib/.
[42] Z. Zhang. A flexible new technique for camera calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11):1330–1334, 2000.
[43] Z. Zhang. Camera calibration with one-dimensional objects. Technical Report MSR-TR-2001-120, Microsoft Research, December 2001.
[44] Z. Zhang. Camera calibration with one-dimensional objects. In Proc. European Conference on Computer Vision (ECCV'02), volume IV, pages 161–174, Copenhagen, Denmark, May 2002.
[45] Z. Zhang and O.D. Faugeras. 3D Dynamic Scene Analysis: A Stereo Based Approach. Springer, Berlin, Heidelberg, 1992.
Chapter 3

MULTIPLE VIEW GEOMETRY

Anders Heyden and Marc Pollefeys

3.1 Introduction

There exist intricate geometric relations between multiple views of a 3D scene. These relations are related to the camera motion and calibration as well as to the scene structure. In this chapter we introduce these concepts and discuss how they can be applied to recover 3D models from images.

In Section 3.2 a rather thorough description of projective geometry is given. Section 3.3 gives a short introduction to tensor calculus, and Section 3.4 describes in detail the camera model used. In Section 3.5 a modern approach to multiple view geometry is presented, and in Section 3.6 simple structure and motion algorithms are presented. In Section 3.7 more advanced algorithms are presented that are suited for automatic processing of real image data. Section 3.8 discusses the possibility of calibrating the camera from images. Section 3.9 describes how the depth can be computed for most image pixels, and Section 3.10 presents how the results of the previous sections can be combined to yield 3D models, render novel views, or combine real and virtual elements in video.
3.2 Projective Geometry

Projective geometry is a fundamental tool for dealing with structure from motion problems in computer vision, especially in multiple view geometry. The main reason is that the image formation process can be regarded as a projective transformation from a 3-dimensional to a 2-dimensional projective space. It is also a fundamental tool for dealing with auto-calibration problems and for examining critical configurations and critical motion sequences.

This section deals with the fundamentals of projective geometry, including the definitions of projective spaces, homogeneous coordinates, duality, projective transformations, and affine and Euclidean embeddings. For a traditional approach to projective geometry, see [9]; for more modern treatments, see [14], [15], [24].

3.2.1 The central perspective transformation

We will start the introduction of projective geometry by looking at a central perspective transformation, which is very natural from an image formation point of view, see Figure 3.1.

Figure 3.1. A central perspective transformation.

Definition 1. A central perspective transformation maps points, X,
on the object plane, Πo, to points on the image plane, Πi, by intersecting the line through E, called the centre, and X with Πi.

We can immediately see the following properties of the planar perspective transformation:

– All points on Πo map to points on Πi, except for points on l, where l is defined as the intersection of Πo with the plane incident with E and parallel to Πi.
– All points on Πi are images of points on Πo, except for points on h, called the horizon, where h is defined as the intersection of Πi with the plane incident with E and parallel to Πo.
– Lines in Πo are mapped to lines in Πi.
– The images of parallel lines intersect in a point on the horizon, see e.g. l1 and l2 in Figure 3.1.
– In the limit where the point E moves infinitely far away, the planar perspective transformation turns into a parallel projection.

Identify the planes Πo and Πi with R², with a standard Cartesian coordinate system Oe1e2 in Πo and Πi respectively. The central perspective transformation is nearly a bijective transformation between Πo and Πi, i.e. from R² to R². The problem is the lines l ⊂ Πo and h ⊂ Πi. If we remove these lines, we obtain a bijective transformation between R² \ l and R² \ h, but this is not the path that we will follow. Instead, we extend each R² with an extra line, defined as the images of the points on h and the points that map to l, in the natural way, i.e. maintaining continuity. Thus, by adding one artificial line to each plane, it is possible to make the central perspective transformation bijective from (R² + an extra line) to (R² + an extra line). These extra lines correspond naturally to directions in the other plane, e.g. the images of the lines l1 and l2 intersect in a point on h corresponding to the common direction of l1 and l2. The intersection point on h can be viewed as the limit of images of a point on l1 moving infinitely far away. Inspired by this observation, we make the following definition:

Definition 2.
Consider the set L of all lines parallel to a given line l in R², and assign a point to each such set, p_ideal, called an ideal point or point at infinity, cf. Figure 3.2.

3.2.2 Projective spaces

We are now ready to make a preliminary definition of the two-dimensional projective space, i.e. the projective plane.
Figure 3.2. The point at infinity corresponding to the set of lines L.

Definition 3. The projective plane, P², is defined according to

P² = R² ∪ {ideal points} .

Definition 4. The ideal line, l∞, or line at infinity in P² is defined according to

l∞ = {ideal points} .

The following constructions can easily be made in P²:

1. Two different points define a line (called the join of the points).
2. Two different lines intersect in a point,

with obvious interpretations for ideal points and the ideal line, e.g. the line defined by an ordinary point and an ideal point is the line incident with the ordinary point with the direction given by the ideal point.

Similarly, we define:

Definition 5. The projective line, P¹, is defined according to

P¹ = R¹ ∪ {ideal point} .
Observe that the projective line only contains one ideal point, which can be regarded as the point at infinity.

In order to define three-dimensional projective space, P³, we start with R³ and assign an ideal point to each set of parallel lines, i.e. to each direction.

Definition 6. The projective space, P³, is defined according to

P³ = R³ ∪ {ideal points} .

Observe that the ideal points in P³ constitute a two-dimensional manifold, which motivates the following definition.

Definition 7. The ideal points in P³ build up a plane, called the ideal plane or plane at infinity. The plane at infinity in turn contains lines, called lines at infinity: every set of parallel planes in R³ defines an ideal line, and all ideal lines build up the ideal plane.

A lot of geometrical constructions can be made in P³, e.g.

1. Two different points define a line (called the join of the two points).
2. Three different points define a plane (called the join of the three points).
3. Two different planes intersect in a line.
4. Three different planes intersect in a point.

3.2.3 Homogeneous coordinates

It is often advantageous to introduce coordinates in the projective spaces, so-called analytic projective geometry. Introduce a Cartesian coordinate system, Oexey, in R² and define the line l : y = 1, see Figure 3.3. We make the following simple observations:

Lemma 1. The vectors (p1, p2) and (q1, q2) determine the same line through the origin iff (p1, p2) = λ(q1, q2), λ ≠ 0.

Proposition 1. Every line, lp = {(p1, p2)t, t ∈ R}, through the origin, except for the x-axis, intersects the line l in one point, p.

We can now make the following definitions:
Figure 3.3. Definition of homogeneous coordinates in P¹.

Definition 8. The pairs of numbers (p1, p2) and (q1, q2) are said to be equivalent if

(p1, p2) = λ(q1, q2), λ ≠ 0 .

We write (p1, p2) ∼ (q1, q2).

There is a one-to-one correspondence between lines through the origin and points on the line l if we add an extra point on the line, corresponding to the line x = 0, i.e. the direction (1, 0). By identifying the line l, augmented with this extra point corresponding to the point at infinity, p∞, with P¹, we can make the following definitions:

Definition 9. The one-dimensional projective space, P¹, consists of pairs of numbers (p1, p2) (under the equivalence above), where (p1, p2) ≠ (0, 0). The pair (p1, p2) is called homogeneous coordinates for the corresponding point in P¹.

Theorem 1. There is a natural division of P¹ into two disjoint subsets,

P¹ = {(p1, 1) ∈ P¹} ∪ {(p1, 0) ∈ P¹} ,

corresponding to the ordinary points and the ideal point.

The introduction of homogeneous coordinates can easily be generalized to P² and P³ using three and four homogeneous coordinates, respectively. In the case of P², fix a Cartesian coordinate system Oexeyez in R³ and define the plane Π : z = 1, see Figure 3.4. The vectors (p1, p2, p3) and (q1, q2, q3)
Figure 3.4. Definition of homogeneous coordinates in P².

determine the same line through the origin iff (p1, p2, p3) = λ(q1, q2, q3), λ ≠ 0. Every line through the origin, except for those in the x-y-plane, intersects the plane Π in one point. Again, there is a one-to-one correspondence between lines through the origin and points on the plane Π if we add an extra line, corresponding to the lines in the plane z = 0, i.e. the line at infinity, l∞, built up by lines of the form (p1, p2, 0). We can now identify the plane Π, augmented with this extra line corresponding to the points at infinity, with P².

Definition 10. The triplets of numbers (p1, p2, p3) and (q1, q2, q3) are said to be equivalent iff

(p1, p2, p3) = λ(q1, q2, q3), λ ≠ 0 ,

written (p1, p2, p3) ∼ (q1, q2, q3).

Definition 11. The two-dimensional projective space P² consists of all triplets of numbers (p1, p2, p3) ≠ (0, 0, 0). The triplet (p1, p2, p3) is called homogeneous coordinates for the corresponding point in P².

Theorem 2. There is a natural division of P² into two disjoint subsets,

P² = {(p1, p2, 1) ∈ P²} ∪ {(p1, p2, 0) ∈ P²} ,

corresponding to ordinary points and ideal points (points at infinity).

The same procedure can be carried out to construct P³ (and even Pⁿ for any n ∈ N), but it is harder to visualize, since we would have to start with R⁴.
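Definitions 10 and 11 translate directly into code: two nonzero triplets name the same point of P² iff their cross product vanishes (i.e. they are scalar multiples), and a triplet with third coordinate 0 is an ideal point. A small illustrative sketch (function names are ours):

```python
import numpy as np

def same_point(p, q, tol=1e-12):
    """True iff the nonzero triplets p and q are homogeneous coordinates
    of the same point of P^2, i.e. p = lambda * q with lambda != 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.linalg.norm(np.cross(p, q)) < tol and p.any() and q.any()

def is_ideal(p):
    """True iff p lies on the line at infinity (third coordinate 0)."""
    return p[2] == 0

print(same_point([2, 4, 2], [1, 2, 1]))   # both represent (1, 2) in R^2
print(is_ideal([1, 2, 0]))                # the direction (1, 2)
```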