The document discusses camera models used in computer vision. It begins by defining a camera as a mapping from the 3D world to a 2D image. The basic pinhole camera model is then described, including the camera center, image plane, principal axis, and principal point. Central projection using homogeneous coordinates is shown. The camera calibration matrix K is introduced, which relates the camera coordinate system to pixel coordinates. Finally, the full camera matrix P is defined, which combines camera intrinsics K, rotation R, and translation -C to map 3D world points to 2D image points.
1. Computer Vision: Camera Models
IIT Kharagpur
Computer Science and Engineering,
Indian Institute of Technology
Kharagpur.
(IIT Kharagpur) Camera Models Jan ’10 1 / 52
2. What is a camera?
A camera is a mapping between the 3D world (object space) and
a 2D image.
A camera model is a matrix with particular properties that
represents the camera mapping (the camera matrix).
A general projective camera has specialized models:
Finite camera: This is a central projection camera having a finite
centre.
Centre at infinity: Camera with centre at infinity. For example: the
affine camera.
3. The basic pin-hole model
The centre of projection is called the camera centre.
The plane on which the image is formed is called the image
plane.
The line through the camera centre and perpendicular to the
image plane is called the principal axis of the camera.
The point where the principal axis meets the image plane is called
the principal point.
The plane through the camera centre parallel to the image plane
is called the principal plane of the camera.
5. Camera settings
Typical settings:
The camera centre is taken to be the origin of the Euclidean
coordinate frame.
The image plane is taken to be the plane z = f .
The central projection mapping from Euclidean space R^3 → R^2 is
given as:
(X, Y, Z)^T → (fX/Z, fY/Z)^T
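As a minimal sketch of the mapping above, the following Python function applies the central projection to a single point given in the camera frame. The function name `project_pinhole` and the numeric values are illustrative assumptions, not part of the slides.

```python
def project_pinhole(point, f):
    """Central projection (X, Y, Z) -> (f*X/Z, f*Y/Z), image plane at z = f.

    Hypothetical helper for illustration; the point is assumed to be
    expressed in the camera frame with the camera centre at the origin.
    """
    X, Y, Z = point
    if Z == 0:
        # Points with Z = 0 lie in the plane through the camera centre
        # parallel to the image plane and have no finite image.
        raise ValueError("point has Z = 0; no finite image point")
    return (f * X / Z, f * Y / Z)


image_point = project_pinhole((2.0, 1.0, 4.0), f=2.0)
print(image_point)  # (1.0, 0.5)
```

The division by Z is what makes the mapping non-linear in Euclidean coordinates.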
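6. Central projection using homogeneous coordinates
$$
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}
\mapsto
\begin{pmatrix} fX \\ fY \\ Z \end{pmatrix}
=
\begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}
$$
x = PX
P = diag(f, f, 1) [ I | 0 ]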
The measurements on the image plane assume that the principal point
is the origin of the image plane.
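The homogeneous form can be sketched in a few lines of Python. The helper names (`matvec`, `central_projection_matrix`, `dehomogenize`) and the sample point are assumptions made for this illustration. Plain lists are used so the sketch runs without any third-party library.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def central_projection_matrix(f):
    """P = diag(f, f, 1) [I | 0], written out as a 3 x 4 matrix."""
    return [
        [f, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
    ]


def dehomogenize(x):
    """Convert a homogeneous image point (x1, x2, x3) to (x1/x3, x2/x3)."""
    return (x[0] / x[2], x[1] / x[2])


P = central_projection_matrix(2.0)
X_h = [2.0, 1.0, 4.0, 1.0]   # world point (2, 1, 4) in homogeneous form
x_h = matvec(P, X_h)         # homogeneous image point (fX, fY, Z)
x = dehomogenize(x_h)
print(x_h, x)  # [4.0, 2.0, 4.0] (1.0, 0.5)
```

The projection itself is now a single linear map; the division by Z is deferred to the final dehomogenization step.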
7. Principal point offset
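If the principal point has general coordinates (px, py)^T then the
mapping changes to
(X, Y, Z)^T → (fX/Z + px, fY/Z + py)^T
$$
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}
\mapsto
\begin{pmatrix} fX + Z p_x \\ fY + Z p_y \\ Z \end{pmatrix}
=
\begin{bmatrix} f & 0 & p_x & 0 \\ 0 & f & p_y & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}
$$
$$
K = \begin{bmatrix} f & 0 & p_x \\ 0 & f & p_y \\ 0 & 0 & 1 \end{bmatrix}
$$
x = K [ I | 0 ] Xcam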
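A small Python sketch of x = K [ I | 0 ] Xcam with a principal point offset follows. The function names and the values f = 2, (px, py) = (3, 1.5) are hypothetical choices for illustration only.

```python
def calibration_matrix(f, px, py):
    """K for a pinhole camera with focal length f and principal point (px, py)."""
    return [
        [f, 0.0, px],
        [0.0, f, py],
        [0.0, 0.0, 1.0],
    ]


def project_camera_point(K, X_cam_h):
    """Apply x = K [I | 0] X_cam to a homogeneous point in the camera frame."""
    # Multiplying by [I | 0] simply drops the trailing homogeneous coordinate.
    X3 = X_cam_h[:3]
    x_h = [sum(k * v for k, v in zip(row, X3)) for row in K]
    return (x_h[0] / x_h[2], x_h[1] / x_h[2])


K = calibration_matrix(f=2.0, px=3.0, py=1.5)
x = project_camera_point(K, [2.0, 1.0, 4.0, 1.0])
print(x)  # (4.0, 2.0), i.e. (f*X/Z + px, f*Y/Z + py)
```

The result matches the Euclidean formula: 2·2/4 + 3 = 4 and 2·1/4 + 1.5 = 2.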
8. Camera Calibration matrix
x = K [ I | 0] Xcam
The matrix K is the camera calibration matrix.
The notation Xcam indicates that the world point is expressed in the
camera coordinate system, whose origin is the camera centre.
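As a small sketch of x = K [ I | 0 ] Xcam, assuming NumPy: the helper names `calibration_matrix` and `project_camera_frame` and the numeric values are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def calibration_matrix(f, px, py):
    """K with focal length f and principal point (px, py)."""
    return np.array([[f, 0.0, px],
                     [0.0, f, py],
                     [0.0, 0.0, 1.0]])

def project_camera_frame(K, X_cam):
    """Apply x = K [I | 0] X_cam and dehomogenise the result."""
    P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    x = P @ np.append(X_cam, 1.0)   # homogeneous image point
    return x[:2] / x[2]

K = calibration_matrix(f=2.0, px=0.5, py=-0.25)
xy = project_camera_frame(K, np.array([1.0, 2.0, 4.0]))
```

The result agrees with the inhomogeneous formula (f X/Z + px, f Y/Z + py) for the same point.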
9. Camera rotation and translation
In general, points in space will be expressed in terms of a different
Euclidean coordinate frame, known as the world coordinate frame.
The two coordinate frames are related via rotation and translation.
A point expressed in the world coordinate system as X can be
represented in the camera coordinate system as Xcam
Xcam = R(X − C)
C represents the coordinates of the camera centre in the world
coordinate frame. R is the rotation matrix.
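In homogeneous coordinates:
\[
X_{cam}
= \begin{bmatrix} R & -RC \\ 0^T & 1 \end{bmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}
= \begin{bmatrix} R & -RC \\ 0^T & 1 \end{bmatrix} X
\]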
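The change of frame Xcam = R(X − C) and its 4 × 4 homogeneous form can be checked against each other. A minimal sketch, assuming NumPy; the function names and the sample rotation (90 degrees about the Z axis) are illustrative assumptions.

```python
import numpy as np

def world_to_camera(R, C, X_world):
    """X_cam = R (X - C), with inhomogeneous 3-vectors."""
    return R @ (X_world - C)

def world_to_camera_matrix(R, C):
    """4x4 homogeneous form [[R, -RC], [0, 1]] of the same change of frame."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = -R @ C
    return T

R = np.array([[0.0, -1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])       # rotation by 90 degrees about Z
C = np.array([1.0, 2.0, 3.0])         # camera centre in world coordinates
X = np.array([2.0, 2.0, 5.0])         # a world point

X_cam = world_to_camera(R, C, X)
X_cam_h = world_to_camera_matrix(R, C) @ np.append(X, 1.0)
```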
10. Concatenating the matrices
Substituting
\[ X_{cam} = \begin{bmatrix} R & -RC \\ 0^T & 1 \end{bmatrix} X \]
into
\[ x = K\,[\, I \mid 0 \,]\, X_{cam} \]
gives
\[ x = K R\,[\, I \mid -C \,]\, X \]
11. Camera matrix
x = K R [ I | − C] X
Camera Matrix:
P = KR [ I | − C]
P is a 3 × 4 matrix. 9 degrees of freedom: 3 for K (elements
f , px , py ), 3 for R, 3 for C.
Parameters in K are the internal parameters.
Parameters in R and C are the external parameters.
A representation which hides the camera centre:
\[ P = K\,[\, R \mid t \,], \qquad t = -RC \]
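The two forms of the camera matrix can be compared numerically. This is a sketch assuming NumPy; the function names and the values of K, R and C are illustrative assumptions.

```python
import numpy as np

def camera_matrix(K, R, C):
    """P = K R [I | -C]."""
    return K @ R @ np.hstack([np.eye(3), -C.reshape(3, 1)])

def camera_matrix_from_t(K, R, t):
    """P = K [R | t], the form that hides the camera centre."""
    return K @ np.hstack([R, t.reshape(3, 1)])

K = np.array([[2.0, 0.0, 0.5],
              [0.0, 2.0, -0.25],
              [0.0, 0.0, 1.0]])
R = np.array([[0.0, -1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
C = np.array([1.0, 2.0, 3.0])

P = camera_matrix(K, R, C)
P_alt = camera_matrix_from_t(K, R, -R @ C)     # t = -RC

x = P @ np.array([2.0, 2.0, 5.0, 1.0])         # project a world point
xy = x[:2] / x[2]
```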
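12. CCD cameras: Non-uniform scaling
A CCD camera may have non-square pixels. This has the effect of
introducing unequal scale factors in the axial directions.
\[
K = \begin{bmatrix} f & 0 & p_x \\ 0 & f & p_y \\ 0 & 0 & 1 \end{bmatrix}
\quad \text{changes to} \quad
K = \begin{bmatrix} \alpha_x & 0 & x_0 \\ 0 & \alpha_y & y_0 \\ 0 & 0 & 1 \end{bmatrix}
\]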
mx and my denote the number of pixels per unit distance in image
coordinates in the x and y directions.
αx = fmx , αy = fmy
(x0 , y0 ) are coordinates of the principal point in terms of pixel
dimensions. x0 = mx px , y0 = my py
A CCD camera has 10 degrees of freedom.
13. Finite Projective Camera: Skew
If the coordinate system of the image plane is skewed then we have:
\[
K = \begin{bmatrix} \alpha_x & s & x_0 \\ 0 & \alpha_y & y_0 \\ 0 & 0 & 1 \end{bmatrix}
\]
s is the skew parameter.
P = K R [ I | − C]
A finite projective camera has 11 degrees of freedom.
The left 3 × 3 sub-matrix of P is denoted as M.
M = KR
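A calibration matrix in pixel units can be assembled from the quantities defined for the CCD camera (αx = f mx, αy = f my, x0 = mx px, y0 = my py) together with the skew s. The sketch below assumes NumPy; the function name and the sensor values are made up for illustration.

```python
import numpy as np

def ccd_calibration_matrix(f, mx, my, px, py, s=0.0):
    """K in pixel units: alpha_x = f*mx, alpha_y = f*my, x0 = mx*px, y0 = my*py, skew s."""
    return np.array([[f * mx, s, mx * px],
                     [0.0, f * my, my * py],
                     [0.0, 0.0, 1.0]])

# Hypothetical sensor: f and (px, py) in metres, mx and my in pixels per metre.
K = ccd_calibration_matrix(f=0.05, mx=20000.0, my=25000.0, px=0.016, py=0.012)
```

With s left at zero this is the 10 degree of freedom CCD case; a non-zero s gives the general finite projective camera.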
14. Finite Projective Camera
The camera matrix can be written as
\[ P = K R\,[\, I \mid -C \,] = [\, M \mid p_4 \,] \]
where $M = KR$ and $p_4 = -MC$ denotes the last column of the camera matrix.
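15. Camera Anatomy: Projective Camera
Camera centre: the camera centre C is the point satisfying
\[ PC = 0 \]
Consider a line containing C and any other point A in 3-space.
Points on this line can be represented as:
\[ X(\lambda) = \lambda A + (1 - \lambda) C \]
Under the mapping x = PX, points on this line are projected to
\[ x = PX(\lambda) = \lambda PA + (1 - \lambda) PC = \lambda PA \]
All points on the line therefore map to the same image point PA, so the
line is a ray through the camera centre.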
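The null-vector property PC = 0 and the collapse of a ray to one image point can be checked numerically. This sketch assumes NumPy and uses the SVD to find the right null vector; that choice, the name `camera_centre` and the sample camera are assumptions for illustration.

```python
import numpy as np

def camera_centre(P):
    """Right null vector of the 3x4 matrix P (so P C = 0), taken from the SVD."""
    _, _, Vt = np.linalg.svd(P)
    return Vt[-1]              # homogeneous 4-vector, defined up to scale

K = np.array([[2.0, 0.0, 0.5], [0.0, 2.0, -0.25], [0.0, 0.0, 1.0]])
R = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
C = np.array([1.0, 2.0, 3.0])
P = K @ R @ np.hstack([np.eye(3), -C[:, None]])

C_h = camera_centre(P)
C_unit = C_h / C_h[3]                      # scale so the last coordinate is 1
A = np.array([4.0, -1.0, 7.0, 1.0])        # any other point in 3-space
X_lam = 0.3 * A + 0.7 * C_unit             # a point on the line through A and C
```

Here P @ X_lam equals 0.3 * (P @ A), which is the same image point as PA in homogeneous coordinates.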
16. Column Vectors: Projective Camera
The columns of the projective camera are 3-vectors which have a
geometric meaning as particular image points.
The first 3 columns of P, i.e. p1, p2, p3, are the vanishing points of
the world coordinate X, Y and Z axes respectively.
The column p4 is the image of the world origin.
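The interpretation of the columns can be illustrated numerically: a point very far along the world X axis should project close to p1, and the world origin projects exactly to p4. A sketch assuming NumPy, with an arbitrary illustrative camera (rotation of 30 degrees about the Y axis).

```python
import numpy as np

theta = np.deg2rad(30.0)
c, s = np.cos(theta), np.sin(theta)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
K = np.array([[2.0, 0.0, 0.5], [0.0, 2.0, -0.25], [0.0, 0.0, 1.0]])
C = np.array([1.0, 2.0, -5.0])
P = K @ R @ np.hstack([np.eye(3), -C[:, None]])

def dehomogenise(x):
    return x[:2] / x[2]

vanishing_x = dehomogenise(P[:, 0])                              # column p1
far_point = dehomogenise(P @ np.array([1e9, 0.0, 0.0, 1.0]))     # far along the X axis
origin_image = dehomogenise(P @ np.array([0.0, 0.0, 0.0, 1.0]))  # image of the world origin
```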
17. Row Vectors
The rows of the projective camera are 4-vectors which are
interpreted geometrically as particular world planes.
\[
P = \begin{bmatrix}
p_{11} & p_{12} & p_{13} & p_{14} \\
p_{21} & p_{22} & p_{23} & p_{24} \\
p_{31} & p_{32} & p_{33} & p_{34}
\end{bmatrix}
= \begin{bmatrix} P^{1T} \\ P^{2T} \\ P^{3T} \end{bmatrix}
\]
The set of points X which lie on the plane $P^1$ will satisfy $P^{1T} X = 0$
The set of points X which lie on the plane $P^2$ will satisfy $P^{2T} X = 0$
The set of points X which lie on the plane $P^3$ will satisfy $P^{3T} X = 0$
18. Principal plane P3
The principal plane is the plane through the camera centre,
parallel to the image plane.
It consists of the set of points which are imaged on the line at
infinity of the image.
If a point X lies on the principal plane, then PX = (x, y , 0)T . Thus
a point lies on the principal plane if and only if P3T X = 0
20. Axis planes P1 , P2
The points on plane $P^1$ satisfy $P^{1T} X = 0$, and so are imaged at
$PX = (0, y, w)^T$. These are points on the image y axis.
Since PC = 0, in particular $P^{1T} C = 0$, so C also lies on the
plane $P^1$.
Plane P1 is defined by the camera centre and the line x = 0 in the
image.
Plane P2 is defined by the camera centre and the line y = 0 in the
image.
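The row planes can be checked on a simple camera P = K [ I | 0 ]. A sketch assuming NumPy; the matrix K and the sample points are illustrative assumptions.

```python
import numpy as np

K = np.array([[2.0, 0.0, 0.5], [0.0, 2.0, -0.25], [0.0, 0.0, 1.0]])
P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P1, P2, P3 = P                                  # rows of P, read as world planes

X_on_P1 = np.array([1.0, 3.0, -4.0, 1.0])       # chosen so that P1 . X = 0
X_on_P3 = np.array([5.0, 6.0, 0.0, 1.0])        # Z = 0: on the principal plane
C = np.array([0.0, 0.0, 0.0, 1.0])              # camera centre of K [I | 0]

x_axis_plane = P @ X_on_P1                      # expected form (0, y, w)
x_principal = P @ X_on_P3                       # expected form (x, y, 0)
```

The camera centre satisfies all three plane equations at once, since PC = 0.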
21. Orthographic Projection
The projection along the Z-axis in matrix form:
\[
P = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\]
The mapping takes a point $(X, Y, Z, 1)^T$ to the image point
$(X, Y, 1)^T$, dropping the Z coordinate.
For a general orthographic projection mapping, we precede this
map by a 3D Euclidean coordinate change of the form
\[
H = \begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix}
\]
H is a 4 × 4 homography in $\mathbb{P}^3$.
R is a 3 × 3 rotation matrix. t is a 3 × 1 translation vector.
22. Orthographic Projection
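Writing $t = (t_1, t_2, t_3)^T$, and the rows of the 3 × 3 rotation
matrix as $r^{1T}, r^{2T}, r^{3T}$, the Euclidean coordinate change is:
\[
H_{4 \times 4} = \begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix}
= \begin{bmatrix} r^{1T} & t_1 \\ r^{2T} & t_2 \\ r^{3T} & t_3 \\ 0^T & 1 \end{bmatrix}
\]
Aligning the world coordinate system and the camera coordinate
system gives a general orthographic camera:
\[
P\, H_{4 \times 4}
= \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} r^{1T} & t_1 \\ r^{2T} & t_2 \\ r^{3T} & t_3 \\ 0^T & 1 \end{bmatrix}
= \begin{bmatrix} r^{1T} & t_1 \\ r^{2T} & t_2 \\ 0^T & 1 \end{bmatrix}
\]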
23. Orthographic Projection
\[
P = \begin{bmatrix} r^{1T} & t_1 \\ r^{2T} & t_2 \\ 0^T & 1 \end{bmatrix}
\]
Five degrees of freedom: 3 for R and 2 for $t_1, t_2$.
The orthographic projection matrix P = [M | t] has the matrix M
with last row zero, the first two rows orthogonal and of unit
norm, and $t_3 = 1$.
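A general orthographic camera can be built directly from R and t and compared with the product of the basic orthographic projection and the Euclidean change of coordinates H. This is a sketch assuming NumPy; the function name and the sample R and t are illustrative assumptions.

```python
import numpy as np

def orthographic_camera(R, t):
    """Rows (r1^T, t1), (r2^T, t2), (0^T, 1): five degrees of freedom."""
    P = np.zeros((3, 4))
    P[:2, :3] = R[:2]
    P[:2, 3] = t[:2]
    P[2, 3] = 1.0
    return P

theta = np.deg2rad(30.0)
c, s = np.cos(theta), np.sin(theta)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([1.0, 2.0, 3.0])

H = np.eye(4)
H[:3, :3], H[:3, 3] = R, t                     # Euclidean change of coordinates
P_basic = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0],
                    [0.0, 0.0, 0.0, 1.0]])     # projection along the Z axis

P = orthographic_camera(R, t)
M = P[:, :3]
```

Note that t3 drops out of the product, which is why only t1 and t2 count as degrees of freedom.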
24. Scaled orthographic projection
Orthographic projection followed by isotropic scaling.
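\[
P = \begin{bmatrix} k & 0 & 0 \\ 0 & k & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} r^{1T} & t_1 \\ r^{2T} & t_2 \\ 0^T & 1 \end{bmatrix}
= \begin{bmatrix} r^{1T} & t_1 \\ r^{2T} & t_2 \\ 0^T & 1/k \end{bmatrix}
\]
(The second equality holds up to the overall scale factor k, which is
irrelevant for a homogeneous matrix.)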
Six degrees of freedom.
A scaled orthographic projection matrix P = [M | t] has matrix M
with last row zero, and the first two rows orthogonal and of equal
norm.
25. Weak perspective projection
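A weak perspective camera is a camera at infinity for which the scale
factors in the two axial image directions are not equal.
\[
P = \begin{bmatrix} \alpha_x & 0 & 0 \\ 0 & \alpha_y & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} r^{1T} & t_1 \\ r^{2T} & t_2 \\ 0^T & 1 \end{bmatrix}
\]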
Seven degrees of freedom.
A weak perspective projection matrix P = [M | t] has matrix M with
last row zero, and the first two rows orthogonal (they need not
have equal norm).
26. The affine camera
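\[
P_A = \begin{bmatrix} \alpha_x & s & 0 \\ 0 & \alpha_y & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} r^{1T} & t_1 \\ r^{2T} & t_2 \\ 0^T & 1 \end{bmatrix}
\equiv
\begin{bmatrix} m_{11} & m_{12} & m_{13} & t_1 \\ m_{21} & m_{22} & m_{23} & t_2 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\]
Eight degrees of freedom.
An affine projection matrix P = [M | t] has a matrix M whose last row
is zero and whose top two rows form a sub-matrix $M_{2 \times 3}$ of rank 2.
This arises from the requirement that the rank of P is 3.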
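The affine camera can be assembled as the product shown above and its structural properties checked. A minimal sketch, assuming NumPy; the function name `affine_camera` and the parameter values are illustrative assumptions.

```python
import numpy as np

def affine_camera(alpha_x, alpha_y, s, R, t):
    """P_A = [[alpha_x, s, 0], [0, alpha_y, 0], [0, 0, 1]] @ [[r1^T, t1], [r2^T, t2], [0^T, 1]]."""
    K_aff = np.array([[alpha_x, s, 0.0],
                      [0.0, alpha_y, 0.0],
                      [0.0, 0.0, 1.0]])
    P_orth = np.zeros((3, 4))
    P_orth[:2, :3] = R[:2]
    P_orth[:2, 3] = t[:2]
    P_orth[2, 3] = 1.0
    return K_aff @ P_orth

theta = np.deg2rad(30.0)
c, s_ = np.cos(theta), np.sin(theta)
R = np.array([[c, 0.0, s_], [0.0, 1.0, 0.0], [-s_, 0.0, c]])
P_A = affine_camera(alpha_x=3.0, alpha_y=2.0, s=0.1, R=R, t=np.array([1.0, 2.0, 0.0]))
```

Setting s = 0 gives the weak perspective camera, and additionally alpha_x = alpha_y gives the scaled orthographic camera.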
27. The affine camera PA
Projection under an affine camera is a linear mapping on
inhomogeneous coordinates composed with a translation:
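\[
\begin{pmatrix} x \\ y \end{pmatrix}
= \begin{bmatrix} m_{11} & m_{12} & m_{13} \\ m_{21} & m_{22} & m_{23} \end{bmatrix}
\begin{pmatrix} X \\ Y \\ Z \end{pmatrix}
+ \begin{pmatrix} t_1 \\ t_2 \end{pmatrix}
\]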
28. Properties of the affine camera PA
The plane at infinity in space is mapped to points at infinity in the
image.
\[ P_A (X, Y, Z, 0)^T = (x, y, 0)^T, \qquad (x, y)^T = M_{2 \times 3} (X, Y, Z)^T \]
The principal plane of the camera is the plane at infinity.
Parallel world lines are projected to parallel image lines.
The vector d satisfying M2×3 d = 0 is the direction of parallel
projection.
The camera centre is $(d^T, 0)^T$, since
\[ P_A \begin{pmatrix} d \\ 0 \end{pmatrix} = 0 \]
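These properties can be verified on a concrete affine camera. The sketch assumes NumPy; the matrix entries are arbitrary illustrative values, and the direction d is obtained as the cross product of the two rows of M2×3, which is one way (an assumption here) to get a null vector of a rank 2 matrix of that shape.

```python
import numpy as np

P_A = np.array([[2.0, 0.5, 1.0, 3.0],
                [0.0, 1.5, -1.0, 4.0],
                [0.0, 0.0, 0.0, 1.0]])       # illustrative affine camera
M23 = P_A[:2, :3]

d = np.cross(M23[0], M23[1])                 # direction of parallel projection
centre = np.append(d, 0.0)                   # camera centre, a point at infinity

def project(X):
    x = P_A @ np.append(X, 1.0)
    return x[:2] / x[2]

v = np.array([1.0, 2.0, 3.0])                # common direction of two world lines
a0, b0 = np.array([0.0, 0.0, 0.0]), np.array([5.0, -1.0, 2.0])
dir_a = project(a0 + v) - project(a0)        # image direction of the first line
dir_b = project(b0 + v) - project(b0)        # image direction of the second line
```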
30. Push Broom camera
The Linear Pushbroom (LP) camera is a commonly used type of
sensor for satellites.
A linear sensor array is used to capture a single line of imagery at
a time.
As the sensor moves the sensor plane sweeps out a region of
space, capturing the image a single line at a time.
The second dimension of the image is provided by the motion of
the sensor.
In the linear pushbroom model, the sensor is assumed to move in
a straight line at a constant velocity with respect to the ground.
31. Push Broom camera
In the direction of the sensor, the image is effectively a
perspective image.
In the direction of the sensor motion it is an orthographic
projection.
Like the general projective camera the mapping from the object
space to the image may be described with a 3 × 4 camera matrix.
The interpretation of the result changes.
Let $X = (X, Y, Z, 1)^T$ be an object point, and let P be the camera
matrix of the linear pushbroom camera. Suppose that $PX = (x, y, w)^T$.
Then the corresponding image point (represented as an inhomogeneous
2-vector) is $(x, y/w)^T$.
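The difference from the projective camera lies only in how the 3-vector PX is interpreted. A minimal sketch, assuming NumPy; the function name and the matrix P below are arbitrary illustrations, not a calibrated pushbroom model.

```python
import numpy as np

def pushbroom_project(P, X_world):
    """Linear pushbroom mapping: P X = (x, y, w) gives the image point (x, y / w)."""
    x, y, w = P @ np.append(X_world, 1.0)
    return np.array([x, y / w])      # only the second coordinate is divided by w

P = np.array([[1.0, 0.0, 0.0, 2.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])    # arbitrary 3x4 matrix for illustration
uv = pushbroom_project(P, np.array([3.0, 4.0, 2.0]))
```

The first coordinate is left undivided (orthographic along the direction of motion), while the second is divided by w (perspective along the sensor array).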
32. Cameras at infinity
A camera at infinity means that the camera center is at infinity.
The camera center is the 1-dimensional right null-space C of P,
i.e. PC = 0
Finite camera (M is not singular):
\[ C = \begin{pmatrix} -M^{-1} p_4 \\ 1 \end{pmatrix} \]
Camera at infinity (M is singular):
\[ C = \begin{pmatrix} d \\ 0 \end{pmatrix}, \quad \text{i.e. } Md = 0 \]
Md = 0 with d ≠ 0 means that d spans the one-dimensional right null
space of M. Hence M is singular.
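Both cases can be handled by one small routine. This is a sketch assuming NumPy; the function name `camera_centre` and the use of `matrix_rank` to separate the two cases are illustrative choices, not something the slides prescribe.

```python
import numpy as np

def camera_centre(P):
    """Homogeneous camera centre C with P C = 0, for a rank 3 camera matrix P."""
    M, p4 = P[:, :3], P[:, 3]
    if np.linalg.matrix_rank(M) == 3:
        # Finite camera: C = (-M^{-1} p4, 1)
        return np.append(-np.linalg.solve(M, p4), 1.0)
    # Camera at infinity: C = (d, 0) with M d = 0
    _, _, Vt = np.linalg.svd(M)
    return np.append(Vt[-1], 0.0)

K = np.array([[2.0, 0.0, 0.5], [0.0, 2.0, -0.25], [0.0, 0.0, 1.0]])
R = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
C = np.array([1.0, 2.0, -5.0])
P_finite = K @ R @ np.hstack([np.eye(3), -C[:, None]])
P_ortho = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0],
                    [0.0, 0.0, 0.0, 1.0]])

C_finite = camera_centre(P_finite)
C_inf = camera_centre(P_ortho)
```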
33. Cameras at infinity
Affine Camera
An affine camera is one that has the camera matrix P in which the last
row P3T is of the form (0 0 0 1).
Points at infinity are mapped to points at infinity.
Non-Affine Camera
The 3 × 3 matrix M is singular, but the last row of P is not of the
form (0 0 0 1).
34. Smooth transition
Projective camera to Affine camera
Consider what happens as we apply the cinematographic technique of
"tracking back" while "zooming in", in such a way as to keep objects of
interest the same size.
35. Projective to Affine Camera Model Transition
Tracking back implies that we are moving the camera centre away
from the object.
Zooming implies increasing the focal length.
We take the limit of the process of tracking back and zooming in
such that both the focal length and the distance of the camera
from the object go on increasing.
The initial camera model is:
\[
P_0 = K R\,[\, I \mid -C \,]
= K \begin{bmatrix} r^{1T} & -r^{1T} C \\ r^{2T} & -r^{2T} C \\ r^{3T} & -r^{3T} C \end{bmatrix}
\]
where $r^{iT}$ is the i-th row of the rotation matrix R.
36. Projective to Affine Camera Model Transition
The vector $r^3$ gives the direction of the principal ray.
$d_0 = -r^{3T} C$ is the distance of the world origin from the camera
centre in the direction of the principal ray.
Start moving the camera back:
The camera centre is moved backwards along the principal ray at unit
speed for a time t, so that the centre of the camera is moved to
$C - t\,r^3$.
Substitute for the updated centre in the camera matrix.
37. Projective to Affine Camera Model Transition
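\[
P_t = K \begin{bmatrix}
r^{1T} & -r^{1T} (C - t\,r^3) \\
r^{2T} & -r^{2T} (C - t\,r^3) \\
r^{3T} & -r^{3T} (C - t\,r^3)
\end{bmatrix}
\]
Terms $r^{iT} r^3$ are zero for i = 1, 2, and $r^{3T} r^3 = 1$, because R is
a rotation matrix.
\[
P_t = K \begin{bmatrix}
r^{1T} & -r^{1T} C \\
r^{2T} & -r^{2T} C \\
r^{3T} & d_t
\end{bmatrix}
\]
The scalar $d_t = -r^{3T} C + t$ is the depth of the world origin with
respect to the camera centre in the direction of the principal ray $r^3$ of
the camera.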
38. Projective to Affine Camera Model Transition
Effect of Tracking:
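\[
P_0 = K \begin{bmatrix}
r^{1T} & -r^{1T} C \\
r^{2T} & -r^{2T} C \\
r^{3T} & -r^{3T} C
\end{bmatrix},
\qquad
P_t = K \begin{bmatrix}
r^{1T} & -r^{1T} C \\
r^{2T} & -r^{2T} C \\
r^{3T} & d_t
\end{bmatrix}
\]
The effect of tracking along the principal ray is to replace the (3, 4)
entry of the matrix by the depth $d_t$ of the camera centre from the
world origin.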
39. Projective to Affine Camera Model Transition
Effect of Zooming:
The focal length is increased by a factor k, i.e. the calibration
matrix K is multiplied on the right by diag(k, k, 1):
\[
K \;\to\; K \begin{bmatrix} k & 0 & 0 \\ 0 & k & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]
40. Projective to Affine Camera Model Transition
Effect of TRACKING + ZOOMING:
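The focal length is increased by a factor $k = d_t / d_0$ so that the
image size remains fixed.
\[
P_t = K \begin{bmatrix} d_t/d_0 & 0 & 0 \\ 0 & d_t/d_0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix}
r^{1T} & -r^{1T} C \\
r^{2T} & -r^{2T} C \\
r^{3T} & d_t
\end{bmatrix}
\]
\[
P_t = \frac{d_t}{d_0}\, K \begin{bmatrix}
r^{1T} & -r^{1T} C \\
r^{2T} & -r^{2T} C \\
\frac{d_0}{d_t}\, r^{3T} & d_0
\end{bmatrix}
\]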
41. Projective to Affine Camera Model Transition
Effect of TRACKING + ZOOMING:
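\[
P_t = \frac{d_t}{d_0}\, K \begin{bmatrix}
r^{1T} & -r^{1T} C \\
r^{2T} & -r^{2T} C \\
\frac{d_0}{d_t}\, r^{3T} & d_0
\end{bmatrix}
\]
The factor $d_t / d_0$ can be ignored.
When t = 0 the camera matrix $P_t$ is the same as $P_0$.
In the limit as $d_t$ tends to ∞, this matrix becomes
\[
P_\infty = \lim_{t \to \infty} P_t
= K \begin{bmatrix}
r^{1T} & -r^{1T} C \\
r^{2T} & -r^{2T} C \\
0^T & d_0
\end{bmatrix}
\]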
42. Projective to Affine Camera Model Transition
Effect of TRACKING + ZOOMING:
\[
P_\infty = \lim_{t \to \infty} P_t
= K \begin{bmatrix}
r^{1T} & -r^{1T} C \\
r^{2T} & -r^{2T} C \\
0^T & d_0
\end{bmatrix}
\]
This is a subcategory of the affine camera:
the weak perspective camera.
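The limiting process can be followed numerically. The sketch below assumes NumPy; the function names `track_and_zoom` and `affine_limit`, the sample camera and the large finite time used to stand in for the limit are all illustrative assumptions.

```python
import numpy as np

def track_and_zoom(K, R, C, t):
    """Camera after tracking back for time t and zooming by k = d_t / d_0, with k divided out."""
    r3 = R[2]
    d0 = -r3 @ C                     # depth of the world origin at t = 0
    k = (d0 + t) / d0                # zoom factor d_t / d_0
    C_t = C - t * r3                 # centre moved back along the principal ray
    P_t = K @ np.diag([k, k, 1.0]) @ R @ np.hstack([np.eye(3), -C_t[:, None]])
    return P_t / k                   # the overall factor d_t / d_0 can be ignored

def affine_limit(K, R, C):
    """P_inf = K [[r1^T, -r1^T C], [r2^T, -r2^T C], [0^T, d_0]]."""
    d0 = -R[2] @ C
    top = np.hstack([R[:2], -(R[:2] @ C)[:, None]])
    return K @ np.vstack([top, [[0.0, 0.0, 0.0, d0]]])

theta = np.deg2rad(30.0)
c, s = np.cos(theta), np.sin(theta)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
K = np.array([[2.0, 0.0, 0.5], [0.0, 2.0, -0.25], [0.0, 0.0, 1.0]])
C = np.array([1.0, 2.0, -5.0])

P_0 = K @ R @ np.hstack([np.eye(3), -C[:, None]])
P_far = track_and_zoom(K, R, C, t=1e6)
P_inf = affine_limit(K, R, C)
```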
43. Error in employing an Affine Camera Model
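Any point on the plane through the world origin and perpendicular
to the principal axis direction $r^3$ can be written as
\[ X = \begin{pmatrix} \alpha r^1 + \beta r^2 \\ 1 \end{pmatrix} \]
One can verify that $P_0 X = P_t X = P_\infty X$ for all t (up to the
ignored factor $d_t / d_0$), where
\[
P_0 = K \begin{bmatrix}
r^{1T} & -r^{1T} C \\
r^{2T} & -r^{2T} C \\
r^{3T} & d_0
\end{bmatrix},
\qquad
P_t = \frac{d_t}{d_0}\, K \begin{bmatrix}
r^{1T} & -r^{1T} C \\
r^{2T} & -r^{2T} C \\
\frac{d_0}{d_t}\, r^{3T} & d_0
\end{bmatrix},
\qquad
P_\infty = K \begin{bmatrix}
r^{1T} & -r^{1T} C \\
r^{2T} & -r^{2T} C \\
0^T & d_0
\end{bmatrix}
\]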
44. Error in employing an Affine Camera Model
One can verify that P0 X = Pt X = P∞ X for all t, since
r3T (αr1 + βr2 ) = 0 .
This means that the image of the point X is unchanged by
combined zooming and backward tracking.
For points not on this plane, the images under P0 and P∞ differ.
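The invariance on the plane, and its failure off the plane, can be checked with a few sample values of t. This is a sketch assuming NumPy; the camera parameters, the helper names and the chosen points are illustrative assumptions.

```python
import numpy as np

theta = np.deg2rad(30.0)
c, s = np.cos(theta), np.sin(theta)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
K = np.array([[2.0, 0.0, 0.5], [0.0, 2.0, -0.25], [0.0, 0.0, 1.0]])
C = np.array([1.0, 2.0, -5.0])
r1, r2, r3 = R
d0 = -r3 @ C

def P_t(t):
    """Camera after tracking back for time t and zooming by d_t / d_0."""
    k = (d0 + t) / d0
    C_t = C - t * r3
    return K @ np.diag([k, k, 1.0]) @ R @ np.hstack([np.eye(3), -C_t[:, None]])

P_inf = K @ np.vstack([np.hstack([R[:2], -(R[:2] @ C)[:, None]]),
                       [[0.0, 0.0, 0.0, d0]]])

def image(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

X_on = 1.5 * r1 - 0.7 * r2       # on the plane through the origin, normal r3
X_off = X_on + 2.0 * r3          # displaced from that plane along r3
```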
How large is the error?
45. Error in employing an Affine Camera Model
Consider a point X which is at a perpendicular distance ∆ from
this plane.
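\[ X = \begin{pmatrix} \alpha r^1 + \beta r^2 + \Delta r^3 \\ 1 \end{pmatrix} \]
The point X is imaged by the cameras $P_0$ and $P_\infty$ as:
\[
x_{proj} = P_0 X = K \begin{pmatrix} \tilde{x} \\ \tilde{y} \\ d_0 + \Delta \end{pmatrix}
\quad \text{and} \quad
x_{affine} = P_\infty X = K \begin{pmatrix} \tilde{x} \\ \tilde{y} \\ d_0 \end{pmatrix}
\]
where $\tilde{x} = \alpha - r^{1T} C$ and $\tilde{y} = \beta - r^{2T} C$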
46. Error in employing an Affine Camera Model
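\[
x_{proj} = P_0 X = K \begin{pmatrix} \tilde{x} \\ \tilde{y} \\ d_0 + \Delta \end{pmatrix}
\quad \text{and} \quad
x_{affine} = P_\infty X = K \begin{pmatrix} \tilde{x} \\ \tilde{y} \\ d_0 \end{pmatrix}
\]
where $\tilde{x} = \alpha - r^{1T} C$ and $\tilde{y} = \beta - r^{2T} C$
Writing the calibration matrix K as
\[
K = \begin{bmatrix} K_{2 \times 2} & \tilde{\mathbf{x}}_0 \\ \mathbf{0}^T & 1 \end{bmatrix}
\]
and $\tilde{\mathbf{x}} = (\tilde{x}, \tilde{y})^T$, we obtain
\[
x_{proj} = \begin{pmatrix} K_{2 \times 2}\, \tilde{\mathbf{x}} + (d_0 + \Delta)\, \tilde{\mathbf{x}}_0 \\ d_0 + \Delta \end{pmatrix}
\quad \text{and} \quad
x_{affine} = \begin{pmatrix} K_{2 \times 2}\, \tilde{\mathbf{x}} + d_0\, \tilde{\mathbf{x}}_0 \\ d_0 \end{pmatrix}
\]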
47. Error in employing an Affine Camera Model
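\[
x_{proj} = \begin{pmatrix} K_{2 \times 2}\, \tilde{\mathbf{x}} + (d_0 + \Delta)\, \tilde{\mathbf{x}}_0 \\ d_0 + \Delta \end{pmatrix}
\quad \text{and} \quad
x_{affine} = \begin{pmatrix} K_{2 \times 2}\, \tilde{\mathbf{x}} + d_0\, \tilde{\mathbf{x}}_0 \\ d_0 \end{pmatrix}
\]
After dehomogenizing the two points $x_{proj}$ and $x_{affine}$ we have
\[
\tilde{\mathbf{x}}_{proj} = \tilde{\mathbf{x}}_0 + \frac{K_{2 \times 2}\, \tilde{\mathbf{x}}}{d_0 + \Delta},
\qquad
\tilde{\mathbf{x}}_{affine} = \tilde{\mathbf{x}}_0 + \frac{K_{2 \times 2}\, \tilde{\mathbf{x}}}{d_0}
\]
Error:
\[
\tilde{\mathbf{x}}_{affine} - \tilde{\mathbf{x}}_0
= \frac{d_0 + \Delta}{d_0} \left( \tilde{\mathbf{x}}_{proj} - \tilde{\mathbf{x}}_0 \right)
\]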
48. Error in employing an Affine Camera Model
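\[
\tilde{\mathbf{x}}_{affine} - \tilde{\mathbf{x}}_0
= \frac{d_0 + \Delta}{d_0} \left( \tilde{\mathbf{x}}_{proj} - \tilde{\mathbf{x}}_0 \right)
\]
The effect of the affine approximation $P_\infty$ to the true camera matrix
$P_0$ is to move the image of the point X radially towards or away from
the principal point $\tilde{\mathbf{x}}_0$ by a factor equal to $(d_0 + \Delta)/d_0$.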
49. Error in employing an Affine Camera Model
Rewriting the error as:
\[
\tilde{\mathbf{x}}_{affine} - \tilde{\mathbf{x}}_{proj}
= \frac{\Delta}{d_0} \left( \tilde{\mathbf{x}}_{proj} - \tilde{\mathbf{x}}_0 \right)
\]
The distance between the true perspective image position and the
position obtained using the affine camera approximation $P_\infty$ will
be small provided:
The depth relief (∆) is small compared to the average depth ($d_0$).
The distance of the point from the principal ray is small.
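The error relation can be confirmed numerically for a sample point. This sketch assumes NumPy; the camera parameters and the values of α, β and ∆ are arbitrary illustrative choices.

```python
import numpy as np

theta = np.deg2rad(30.0)
c, s = np.cos(theta), np.sin(theta)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
K = np.array([[2.0, 0.0, 0.5], [0.0, 2.0, -0.25], [0.0, 0.0, 1.0]])
C = np.array([1.0, 2.0, -5.0])
r1, r2, r3 = R
d0 = -r3 @ C                                   # average depth

P_0 = K @ R @ np.hstack([np.eye(3), -C[:, None]])
P_inf = K @ np.vstack([np.hstack([R[:2], -(R[:2] @ C)[:, None]]),
                       [[0.0, 0.0, 0.0, d0]]])

def image(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

alpha, beta, delta = 1.0, -2.0, 0.5            # delta is the depth relief
X = alpha * r1 + beta * r2 + delta * r3

x_proj = image(P_0, X)                         # true perspective image
x_affine = image(P_inf, X)                     # affine approximation
x0 = K[:2, 2]                                  # principal point
error = x_affine - x_proj
```

Halving `delta` or moving the point towards the principal ray shrinks `error` in proportion, which matches the two conditions listed above.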
50. Error in employing an Affine Camera Model
The second condition, that the point lies close to the principal ray, is
satisfied by a small field of view.
Images acquired using a lens with a longer focal length tend to satisfy
these conditions.
51. Error in employing an Affine Camera Model
For scenes in which there are many points at different depths, the
affine camera is not a good approximation.
If the scene contains close foreground as well as background
objects, the affine camera model should not be used.
52. Conclusion
We have discussed several types of camera projection matrices.
In the most general form the camera matrix P has 11 degrees of
freedom.
CCD camera (non-uniform scale + skew) −→ 11
Non-CCD −→ 9
Orthographic projection −→ 5
Scaled orthographic projection (uniform scale) −→ 6
Weak perspective projection (non-uniform scale) −→ 7
Affine projection (non-uniform scale + skew) −→ 8
Since we prefer to work with simple models, we also
explored what happens when we use a simple affine camera
model (6 dof) instead of a general camera model (9 dof). Our
analysis of imaging errors indicates that an affine camera can indeed
be used to approximate a projective camera under certain settings
of the scene.
53. What Next?
We now understand and appreciate the linear model P for the
projective mapping from the 3-D scene to the camera image
plane.
Who will provide us with the linear model?
Most of the time we work with the camera as a black box given to us.
Thankfully we have access to the acquired image.
We also have some knowledge about the settings of the scene.