Camera Matrix Estimation from Point Correspondences

COMPUTER VISION: COMPUTATION OF CAMERA MATRIX

IIT Kharagpur

Computer Science and Engineering,
Indian Institute of Technology
Kharagpur.

(IIT Kharagpur) Camera Matrix Feb ’10 1 / 47

OUTLINE

Computation of Camera matrix
Computation of camera matrix is known as resectioning.
We shall study numerical methods for estimating the camera
projection matrix from corresponding 3-space and image entities.
A 3D point X gets mapped to its image x under the unknown
camera mapping.
Given sufficiently many correspondences Xi ↔ xi the camera
matrix P can be identified.
P can also be determined from sufficiently many corresponding
world and image lines.


Given a number of point correspondences $\mathbf{X}_i \leftrightarrow \mathbf{x}_i$, we are required
to find the $3 \times 4$ camera matrix P such that $\mathbf{x}_i = P\mathbf{X}_i$ for all i.
This problem is similar to computing the 2D projective
transformation H.
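For each correspondence $\mathbf{X}_i \leftrightarrow \mathbf{x}_i$, with $\mathbf{x}_i = (x_i, y_i, w_i)^T$, the condition $\mathbf{x}_i \times P\mathbf{X}_i = \mathbf{0}$ gives the relation:
$$\begin{bmatrix} \mathbf{0}^T & -w_i\mathbf{X}_i^T & y_i\mathbf{X}_i^T \\ w_i\mathbf{X}_i^T & \mathbf{0}^T & -x_i\mathbf{X}_i^T \\ -y_i\mathbf{X}_i^T & x_i\mathbf{X}_i^T & \mathbf{0}^T \end{bmatrix} \begin{pmatrix} \mathbf{P}^1 \\ \mathbf{P}^2 \\ \mathbf{P}^3 \end{pmatrix} = \mathbf{0}$$
where each $\mathbf{P}^{jT}$ is a 4-vector, the j-th row of P.
Since these three equations are linearly dependent, we can keep only
the first two:
$$\begin{bmatrix} \mathbf{0}^T & -w_i\mathbf{X}_i^T & y_i\mathbf{X}_i^T \\ w_i\mathbf{X}_i^T & \mathbf{0}^T & -x_i\mathbf{X}_i^T \end{bmatrix} \begin{pmatrix} \mathbf{P}^1 \\ \mathbf{P}^2 \\ \mathbf{P}^3 \end{pmatrix} = \mathbf{0}$$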

For a set of n point correspondences we obtain a 2n × 12 matrix A
by stacking up the equations for each correspondence.
The projection matrix P is computed by solving the set of
equations Ap = 0 where p is the vector containing the entries of
matrix P.
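The stacking and solving steps can be sketched with NumPy. This is a minimal, unnormalized sketch assuming homogeneous inputs; the helper names `dlt_rows` and `dlt_camera` are hypothetical, and the synthetic camera `P_true` and points are made up for the check.

```python
import numpy as np

def dlt_rows(X, x):
    """The two DLT equations for one correspondence X (4-vector) <-> x = (x, y, w)."""
    xi, yi, wi = x
    zero = np.zeros(4)
    return np.array([
        np.concatenate([zero, -wi * X, yi * X]),   # [0^T, -w X^T, y X^T]
        np.concatenate([wi * X, zero, -xi * X]),   # [w X^T, 0^T, -x X^T]
    ])

def dlt_camera(Xs, xs):
    """Solve Ap = 0 with ||p|| = 1 via SVD; p holds the rows of P in order."""
    A = np.vstack([dlt_rows(X, x) for X, x in zip(Xs, xs)])   # shape (2n, 12)
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)   # singular vector of the smallest singular value

# Noise-free synthetic check (hypothetical camera and points).
rng = np.random.default_rng(0)
P_true = np.array([[800.0, 0.0, 320.0, 10.0],
                   [0.0, 800.0, 240.0, -5.0],
                   [0.0, 0.0, 1.0, 4.0]])
Xs = np.hstack([rng.uniform(-1, 1, (8, 3)), np.ones((8, 1))])
xs = Xs @ P_true.T                        # homogeneous image points (x, y, w)
P_est = dlt_camera(Xs, xs)
P_est *= P_true[2, 3] / P_est[2, 3]       # P is only defined up to scale (and sign)
```

With exact data the estimate matches `P_true` after fixing the scale; with real, noisy pixel data the normalization described later should be applied first.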


Minimal Solution Case 1
P has 12 entries but is defined only up to scale, so it has 11 degrees of
freedom; we need 11 equations to solve for P.
Given 11 equations (from 5½ point correspondences, i.e. both equations from
each of five points and one equation from a sixth), the solution
is exact, i.e. the space points are projected exactly onto their
measured images.
The solution is obtained by solving Ap = 0, where A is an $11 \times 12$ matrix
in this case.
In general A will have rank 11, and the solution vector p is the
1-dimensional right null-space of A.
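The minimal case can be checked numerically. The sketch below (synthetic, hypothetical camera and points) deliberately adds noise to the image points to show that the 11 chosen equations are still satisfied exactly: the first five points reproject exactly onto their noisy measurements, and the sixth point matches only in the coordinate its single equation constrains.

```python
import numpy as np

rng = np.random.default_rng(1)
P_true = np.array([[700.0, 5.0, 300.0, 20.0],
                   [0.0, 720.0, 250.0, 15.0],
                   [0.0, 0.0, 1.0, 6.0]])
Xs = np.hstack([rng.uniform(-1, 1, (6, 3)), np.ones((6, 1))])
h = Xs @ P_true.T
xs = h[:, :2] / h[:, 2:] + rng.normal(0, 1.0, (6, 2))   # noisy measurements, w = 1

rows = []
for X, (x, y) in zip(Xs, xs):
    rows.append(np.concatenate([np.zeros(4), -X, y * X]))   # [0^T, -X^T, y X^T]
    rows.append(np.concatenate([X, np.zeros(4), -x * X]))   # [X^T, 0^T, -x X^T]
A = np.array(rows)[:11]            # both equations from 5 points, one from the 6th
rank = np.linalg.matrix_rank(A)    # 11 for points in general position

p = np.linalg.svd(A)[2][-1]        # spans the 1-D right null-space of A
P = p.reshape(3, 4)
proj = Xs @ P.T
reproj = proj[:, :2] / proj[:, 2:]
# reproj[:5] equals xs[:5]; for the 6th point only the y-coordinate is constrained.
```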


Over-determined solution Case 2
Point measurements have noise.
Number of correspondences available n ≥ 6.
An exact solution of Ap = 0 generally does not exist.
A solution for P is obtained by minimizing an algebraic or
geometric error.
Minimize ||Ap|| subject to some normalization constraint.
1) ||p|| = 1
2) $\|\hat{\mathbf{p}}^3\| = 1$, where $\hat{\mathbf{p}}^3$ is the vector
$(p_{31}, p_{32}, p_{33})^T$ of the first three entries in the
last row of P.
The residual Ap is known as the algebraic error.


Objective: Gold Standard Algorithm: Stage 1

Given $n \geq 6$ world to image point correspondences $\{\mathbf{X}_i \leftrightarrow \mathbf{x}_i\}$, determine
the Maximum Likelihood estimate of the camera projection matrix P, i.e. the P
which minimizes $\sum_i d(\mathbf{x}_i, P\mathbf{X}_i)^2$.

Algorithm:
(i) Linear Solution: Compute an initial estimate of P using a linear algorithm (DLT):

(a) Normalization: Use a similarity transformation T to normalize the
image points, and a second similarity transformation U to normalize the
space points.
Normalized image points are $\tilde{\mathbf{x}}_i = T\mathbf{x}_i$,
normalized space points are $\tilde{\mathbf{X}}_i = U\mathbf{X}_i$.

(b) DLT: Form the $2n \times 12$ matrix A by stacking the equations generated
by each normalized correspondence $\tilde{\mathbf{x}}_i \leftrightarrow \tilde{\mathbf{X}}_i$. Write p for the vector containing the
entries of the matrix $\tilde{P}$. A solution of Ap = 0 subject to $\|\mathbf{p}\| = 1$ is obtained from the unit
singular vector of A corresponding to the smallest singular value.


Data Normalization
Data normalization must be carried out before running the DLT, just as
for 2-D homography estimation.
Points $\mathbf{x}_i$ on the image plane must be translated so that their
centroid is at the origin, and scaled so that their RMS
(root-mean-squared) distance from the origin is $\sqrt{2}$.

Normalization for 3D points
The centroid of the points is translated to the origin, and the
coordinates are scaled so that their RMS distance from the origin
is $\sqrt{3}$.
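Both normalizations are similarity transformations, so one helper can build T (image, target $\sqrt{2}$) and U (space, target $\sqrt{3}$). A minimal NumPy sketch; the function name `normalizing_similarity` and the sample data are hypothetical.

```python
import numpy as np

def normalizing_similarity(pts, target_rms):
    """Similarity that moves the centroid of pts (n, d) to the origin and
    scales the RMS distance from the origin to target_rms; returns (d+1, d+1)."""
    c = pts.mean(axis=0)
    rms = np.sqrt(((pts - c) ** 2).sum(axis=1).mean())
    s = target_rms / rms
    d = pts.shape[1]
    S = np.eye(d + 1)
    S[:d, :d] *= s          # isotropic scaling
    S[:d, d] = -s * c       # translation of the scaled centroid to the origin
    return S

rng = np.random.default_rng(2)
x = rng.uniform(0, 640, (10, 2))        # image points (pixels)
X = rng.uniform(-50, 50, (10, 3))       # compact cloud of space points
T = normalizing_similarity(x, np.sqrt(2))
U = normalizing_similarity(X, np.sqrt(3))
x_n = (np.c_[x, np.ones(10)] @ T.T)[:, :2]
X_n = (np.c_[X, np.ones(10)] @ U.T)[:, :3]
```

As the text notes, this isotropic scheme suits a compact point distribution; it is not appropriate for points at or near infinity.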
This approach is suitable for a compact distribution of points.


Data Normalization
Normalization for sparse 3D points
In the case of points at or near infinity in a plane, it is neither
reasonable nor feasible to normalize coordinates using the
isotropic (or non-isotropic) scaling schemes since the centroid and
scale are infinite or near infinite.
A method that seems to give good results is to normalize the set
of points $\mathbf{x}_i = (x_i, y_i, w_i)^T$ such that
$$\sum_i x_i = \sum_i y_i = 0; \quad \sum_i \left(x_i^2 + y_i^2\right) = 2\sum_i w_i^2; \quad x_i^2 + y_i^2 + w_i^2 = 1 \;\; \forall i$$


Using Line correspondences for computing P
A line in 3D may be represented by two points X0 and X1 through
which the line passes.
Suppose this line gets projected onto the image line l.
The plane formed by back-projecting the image line l is
$P^T\mathbf{l}$.
The condition that the point $\mathbf{X}_j$ lies on this plane is then

$\mathbf{l}^T P\mathbf{X}_j = 0$ for $j = 0, 1$.

Each choice of j gives a single linear equation in the entries of the
matrix P. Two such equations are obtained for each 3D to 2D line
correspondence.
These constraints can also be used along with the constraints
obtained by point correspondences.
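Each line constraint is linear in $\mathbf{p}$: since $\mathbf{l}^T P\mathbf{X} = \sum_{r,c} l_r P_{rc} X_c$, its coefficient row is the Kronecker product $\mathbf{l} \otimes \mathbf{X}$ when p lists the rows of P in order. A minimal NumPy sketch with a hypothetical camera and 3D line:

```python
import numpy as np

def line_rows(l, X0, X1):
    """Rows of the linear system in p = P.ravel() (row-major) from l^T P X_j = 0."""
    # the coefficient of P[r, c] is l[r] * X[c], i.e. np.kron(l, X)[4 * r + c]
    return np.array([np.kron(l, X0), np.kron(l, X1)])

# Hypothetical camera and a 3D line through X0 and X1.
P = np.array([[800.0, 0.0, 320.0, 10.0],
              [0.0, 800.0, 240.0, -5.0],
              [0.0, 0.0, 1.0, 4.0]])
X0 = np.array([0.5, -0.2, 1.0, 1.0])
X1 = np.array([-0.3, 0.4, 2.0, 1.0])
l = np.cross(P @ X0, P @ X1)      # image line through the two projected points
l /= np.linalg.norm(l)            # scale is irrelevant; unit norm keeps numbers small
rows = line_rows(l, X0, X1)       # 2 x 12, can be stacked under the point rows of A
```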


What is DLT trying to minimize?
DLT minimizes a sum-of-squares error, the algebraic error $\sum_i d_{alg}(\mathbf{x}_i, P\mathbf{X}_i)^2$.
Suppose that
$$\mathbf{X}_i = \begin{pmatrix} X_i \\ Y_i \\ Z_i \\ 1 \end{pmatrix}, \qquad \mathbf{x}_i = \begin{pmatrix} x_i \\ y_i \\ 1 \end{pmatrix}, \qquad P\mathbf{X}_i = \hat{w}_i \begin{pmatrix} \hat{x}_i \\ \hat{y}_i \\ 1 \end{pmatrix} = \hat{w}_i \hat{\mathbf{x}}_i$$


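Consider the homography estimation problem.
We are given (measured) point correspondences $\mathbf{x}_i \leftrightarrow \mathbf{x}'_i$:
$$\mathbf{x}_i = \begin{pmatrix} x_i \\ y_i \\ w_i \end{pmatrix}, \qquad \mathbf{x}'_i = \begin{pmatrix} x'_i \\ y'_i \\ w'_i \end{pmatrix}, \qquad H\mathbf{x}_i = \hat{\mathbf{x}}'_i = \begin{pmatrix} \hat{x}'_i \\ \hat{y}'_i \\ \hat{w}'_i \end{pmatrix}$$
DLT minimizes the residual of $\mathbf{x}'_i \times \hat{\mathbf{x}}'_i = \mathbf{0}$, formulated as $A_i\mathbf{h} = \mathbf{0}$. Keeping the first two components of the cross product, the error vector is
$$\boldsymbol{\epsilon}_i = A_i\mathbf{h} = \begin{pmatrix} y'_i\hat{w}'_i - w'_i\hat{y}'_i \\ w'_i\hat{x}'_i - x'_i\hat{w}'_i \end{pmatrix}$$
The norm $\|\boldsymbol{\epsilon}_i\|$ gives the algebraic error.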


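The algebraic distance between the measured point $\mathbf{x}'_i$ and the point $\hat{\mathbf{x}}'_i = H\mathbf{x}_i$ under the
homography H is
$$d_{alg}(\mathbf{x}'_i, \hat{\mathbf{x}}'_i)^2 = (y'_i\hat{w}'_i - w'_i\hat{y}'_i)^2 + (w'_i\hat{x}'_i - x'_i\hat{w}'_i)^2$$
The geometric distance is given as:
$$d_{geom}(\mathbf{x}'_i, \hat{\mathbf{x}}'_i)^2 = \left(\frac{x'_i}{w'_i} - \frac{\hat{x}'_i}{\hat{w}'_i}\right)^2 + \left(\frac{y'_i}{w'_i} - \frac{\hat{y}'_i}{\hat{w}'_i}\right)^2$$
Factoring $w'_i\hat{w}'_i$ out of each term of $d_{alg}$ gives
$$|w'_i\hat{w}'_i|\, d_{geom}(\mathbf{x}'_i, \hat{\mathbf{x}}'_i) = d_{alg}(\mathbf{x}'_i, \hat{\mathbf{x}}'_i)$$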


The algebraic distance is related to, but not the same as, the geometric
distance.
If $w'_i = \hat{w}'_i = 1$, then the two distances are identical.
For affine 2-D homographies, $\hat{w}'_i$ is always 1 when $w_i = 1$:
$$H_A = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ 0 & 0 & 1 \end{bmatrix}, \qquad \hat{\mathbf{x}}'_i = H_A\mathbf{x}_i \;\Rightarrow\; \hat{w}'_i = 1 \text{ if } w_i = 1$$
 

For affine homographies, geometric distances and algebraic distances
are identical. Hence geometric distances can be minimized by the linear
DLT algorithm based on algebraic distance.
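The relation $d_{alg} = |w'_i\hat{w}'_i|\, d_{geom}$ and its affine special case can be checked numerically. A small NumPy sketch; the homographies and points are hypothetical values chosen for illustration.

```python
import numpy as np

def d_alg(xm, xh):
    """Algebraic distance between measured x' = (x', y', w') and x_hat' = H x."""
    return np.hypot(xm[1] * xh[2] - xm[2] * xh[1],
                    xm[2] * xh[0] - xm[0] * xh[2])

def d_geom(xm, xh):
    """Euclidean distance between the corresponding inhomogeneous points."""
    return np.hypot(xm[0] / xm[2] - xh[0] / xh[2],
                    xm[1] / xm[2] - xh[1] / xh[2])

# Hypothetical projective homography and one correspondence.
H = np.array([[1.2, 0.1, 5.0],
              [-0.2, 0.9, 3.0],
              [1e-3, 2e-3, 1.0]])
x = np.array([10.0, 20.0, 1.0])
x_meas = np.array([24.0, 18.0, 2.0])   # the image point (12, 9) written with w' = 2
x_hat = H @ x
ratio = d_alg(x_meas, x_hat) / d_geom(x_meas, x_hat)   # equals |w' * w_hat'|

# Affine homography: last row (0, 0, 1), so w_hat' = 1 whenever w = 1.
H_A = np.array([[1.2, 0.1, 5.0], [-0.2, 0.9, 3.0], [0.0, 0.0, 1.0]])
x_meas1 = np.array([12.0, 9.0, 1.0])
same = np.isclose(d_alg(x_meas1, H_A @ x), d_geom(x_meas1, H_A @ x))
```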



Depth of Points
We next consider the depth of a 3D point imaged by a camera with
projection matrix P.


Depth of Points
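Consider a camera matrix P that projects a point X in 3-space to the
image point x.
$$P = [M \mid \mathbf{p}_4], \qquad \mathbf{C} = \begin{pmatrix} \tilde{\mathbf{C}} \\ 1 \end{pmatrix}, \qquad P\mathbf{C} = \mathbf{0}$$
$$P = [M \mid \mathbf{p}_4] = \begin{bmatrix} \mathbf{m}^{1T} & p_{14} \\ \mathbf{m}^{2T} & p_{24} \\ \mathbf{m}^{3T} & p_{34} \end{bmatrix} = \begin{bmatrix} \mathbf{P}^{1T} \\ \mathbf{P}^{2T} \\ \mathbf{P}^{3T} \end{bmatrix}$$
$$\mathbf{X} = \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} = \begin{pmatrix} \tilde{\mathbf{X}} \\ 1 \end{pmatrix}, \qquad \mathbf{x} = P\mathbf{X} = w\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}, \qquad w = \mathbf{P}^{3T}\mathbf{X}$$

What is w?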


Depth of Points: What is w?

$w = \mathbf{P}^{3T}\mathbf{X} = \mathbf{P}^{3T}(\mathbf{X} - \mathbf{C})$ since $P\mathbf{C} = \mathbf{0}$.
Because the last coordinate of $\mathbf{X} - \mathbf{C}$ is zero, $w = \mathbf{P}^{3T}(\mathbf{X} - \mathbf{C}) = \mathbf{m}^{3T}(\tilde{\mathbf{X}} - \tilde{\mathbf{C}})$.
$\mathbf{m}^3$ is the principal ray direction.
$w = \mathbf{m}^{3T}(\tilde{\mathbf{X}} - \tilde{\mathbf{C}})$ can be interpreted as the dot product of the ray
from the camera centre C to the point X with the principal ray
direction.

If the camera matrix is normalized so that $\det M > 0$
and $\|\mathbf{m}^3\| = 1$, then $\mathbf{m}^3$ is a unit vector pointing in the
positive axial direction.
w can then be interpreted as the depth of the point X from the camera
centre C in the direction of the principal ray.


The interpretation of w as the depth assumes that the camera has
been normalized by multiplying it with an appropriate factor.
It is also possible to compute the depth of a point X without having
to normalize the camera matrix:

Let $\mathbf{X} = (X, Y, Z, T)^T$ be a 3D point and $P = [M \mid \mathbf{p}_4]$ be a
camera matrix for a finite camera. Suppose $P(X, Y, Z, T)^T =
w(x, y, 1)^T$. Then
$$\mathrm{depth}(\mathbf{X}; P) = \frac{\mathrm{sign}(\det M)\, w}{T\,\|\mathbf{m}^3\|}$$
is the depth of the point X in front of the principal plane of
the camera.
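The depth formula translates directly into code. A minimal NumPy sketch; the camera `K`, centre and test point are hypothetical. It also shows that rescaling P (even by a negative factor) or X leaves the depth unchanged, which is the point of the sign and norm terms.

```python
import numpy as np

def depth(X, P):
    """depth(X; P) = sign(det M) * w / (T * ||m^3||) for X = (X, Y, Z, T), P = [M | p4]."""
    M = P[:, :3]
    w = P[2] @ X                      # third coordinate of P X
    return np.sign(np.linalg.det(M)) * w / (X[3] * np.linalg.norm(M[2]))

# Hypothetical finite camera P = K [I | -C~] with centre C~ = (0, 0, -2).
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
P = K @ np.hstack([np.eye(3), np.array([[0.0], [0.0], [2.0]])])
X = np.array([0.3, -0.1, 5.0, 1.0])   # 7 units in front of the camera centre

d1 = depth(X, P)               # 7.0
d2 = depth(0.5 * X, -3.0 * P)  # rescaling X and P leaves the depth unchanged
```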


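The depth of a point X with respect to a camera with projection matrix P is
$$\mathrm{depth}(\mathbf{X}; P) = \frac{\mathrm{sign}(\det M)\, w}{T\,\|\mathbf{m}^3\|}$$
This gives us
$$w = \frac{\mathrm{depth}(\mathbf{X}; P)\, T\,\|\mathbf{m}^3\|}{\mathrm{sign}(\det M)}$$
If $\det M > 0$, $T = 1$, and $\|\mathbf{m}^3\| = 1$, then
$$w = \mathrm{depth}(\mathbf{X}; P)$$
Thus the value of w can be interpreted as the depth of the point X
from the camera in the direction along the principal ray, provided
the camera is normalized so that $\det M > 0$ and $\|\mathbf{m}^3\| = 1$, and X is scaled so that $T = 1$.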


Returning to the DLT error: with $\mathbf{X}_i$ and $\mathbf{x}_i$ scaled so that their last coordinate is 1, write $P\mathbf{X}_i = \hat{w}_i\hat{\mathbf{x}}_i$ with $\hat{\mathbf{x}}_i = (\hat{x}_i, \hat{y}_i, 1)^T$.

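Minimizing the algebraic error
$$\sum_i d_{alg}(\mathbf{x}_i, P\mathbf{X}_i)^2 = \sum_i d_{alg}(\mathbf{x}_i, \hat{w}_i\hat{\mathbf{x}}_i)^2 = \sum_i \left(\hat{w}_i\, d_{geom}(\mathbf{x}_i, \hat{\mathbf{x}}_i)\right)^2$$

What is the geometric significance of $\hat{w}_i\, d_{geom}(\mathbf{x}_i, \hat{\mathbf{x}}_i)$?

If P is normalized as in the depth discussion, $\hat{w}_i$ is the depth of $\mathbf{X}_i$ from the camera, and by similar triangles
$$\hat{w}\, d = f\,\Delta$$
where d is the image distance $d_{geom}(\mathbf{x}_i, \hat{\mathbf{x}}_i)$, f is the focal length, and $\Delta = d_{geom}(\mathbf{X}_i, \mathbf{X}'_i)$ is the distance from $\mathbf{X}_i$ to the point $\mathbf{X}'_i$ that lies in the plane through $\mathbf{X}_i$ parallel to the image plane and maps to $\mathbf{x}_i$.
The algebraic error $\sum_i d_{alg}(\mathbf{x}_i, P\mathbf{X}_i)^2$ being minimized is therefore:
$$\sum_i \left(\hat{w}_i\, d_{geom}(\mathbf{x}_i, \hat{\mathbf{x}}_i)\right)^2 = f^2 \sum_i d_{geom}(\mathbf{X}_i, \mathbf{X}'_i)^2$$

The error term $f^2 \sum_i d_{geom}(\mathbf{X}_i, \mathbf{X}'_i)^2$ can be interpreted as a 3D geometric
error.
The distance $d_{geom}(\mathbf{X}_i, \mathbf{X}'_i)$ is the correction that needs to be made
to the measured 3D points in order to correspond precisely with
the measured image points $\mathbf{x}_i$.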



The correction $d_{geom}(\mathbf{X}_i, \mathbf{X}'_i)$ must be made in the direction
perpendicular to the principal axis of the camera.
The point $\mathbf{X}'_i$ is not the closest point $\hat{\mathbf{X}}_i$ to $\mathbf{X}_i$ that maps to $\mathbf{x}_i$.
For points $\mathbf{X}_i$ not too far from the principal ray of the camera, the
distance $d_{geom}(\mathbf{X}_i, \mathbf{X}'_i)$ is a reasonable approximation to the
distance $d_{geom}(\mathbf{X}_i, \hat{\mathbf{X}}_i)$.
For points farther away from the principal ray, the distance
$d_{geom}(\mathbf{X}_i, \mathbf{X}'_i)$ will be slightly larger than $d_{geom}(\mathbf{X}_i, \hat{\mathbf{X}}_i)$.
DLT will also tend to minimize the focal length f when it minimizes
$f^2 \sum_i d_{geom}(\mathbf{X}_i, \mathbf{X}'_i)^2$.



By minimizing $\|A\mathbf{p}\|$ subject to the constraint
$\|\hat{\mathbf{p}}^3\| = 1$, the solution obtained is trying to
minimize the 3D geometric distances.
The interpretation of minimizing geometric
distances is not affected by similarity
transformations (e.g. translation, scaling etc.) in
either 3D space or the image space.


Objective: Gold Standard Algorithm Stage 2

Given $n \geq 6$ world to image point correspondences $\{\mathbf{X}_i \leftrightarrow \mathbf{x}_i\}$, determine
the Maximum Likelihood estimate of the camera projection matrix P, i.e. the P
which minimizes $\sum_i d(\mathbf{x}_i, P\mathbf{X}_i)^2$.

Algorithm:
(i) Linear Solution: Compute an initial estimate of P using a linear algorithm
(DLT), with the data normalization described in Stage 1.

(ii) Minimize Geometric Error: Using the linear estimate as the starting
point, minimize the geometric error
$$\sum_i d(\tilde{\mathbf{x}}_i, \tilde{P}\tilde{\mathbf{X}}_i)^2 = \sum_i d(\tilde{\mathbf{x}}_i, \hat{\tilde{\mathbf{x}}}_i)^2$$
over $\tilde{P}$ using an iterative algorithm such as Levenberg-Marquardt.

(iii) Denormalization: The camera matrix for the original (unnormalized)
coordinates is obtained from $\tilde{P}$ as $P = T^{-1}\tilde{P}U$.
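Step (ii) can be sketched with SciPy's `least_squares`, whose `method="lm"` wraps the MINPACK Levenberg-Marquardt implementation. This is a minimal sketch on synthetic data: `P0` is a perturbed ground truth standing in for the DLT estimate, and the helper names are hypothetical. For brevity it runs on unnormalized points; in the Gold Standard algorithm the residuals would be evaluated on the normalized points and the result denormalized with $P = T^{-1}\tilde{P}U$.

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(p, X, x):
    """Stacked image errors x_i - proj(P X_i): a 2n-vector for LM."""
    h = X @ p.reshape(3, 4).T
    return (x - h[:, :2] / h[:, 2:]).ravel()

def refine_camera(P0, X, x):
    """Levenberg-Marquardt minimization of sum_i d(x_i, P X_i)^2 starting from P0."""
    sol = least_squares(residuals, P0.ravel(), method="lm", args=(X, x))
    return sol.x.reshape(3, 4)

# Hypothetical noisy data; P0 stands in for a linear (DLT) estimate.
rng = np.random.default_rng(3)
P_true = np.array([[800.0, 0.0, 320.0, 10.0],
                   [0.0, 800.0, 240.0, -5.0],
                   [0.0, 0.0, 1.0, 4.0]])
X = np.c_[rng.uniform(-1, 1, (20, 3)), np.ones(20)]
h = X @ P_true.T
x = h[:, :2] / h[:, 2:] + rng.normal(0, 0.5, (20, 2))    # 0.5-pixel noise
P0 = P_true + rng.normal(0, 0.01, (3, 4)) * np.abs(P_true).max(axis=1, keepdims=True)
P_ref = refine_camera(P0, X, x)
```

The 12 entries of P are refined directly; the overall scale is a free direction, which the damping in Levenberg-Marquardt tolerates. Note that `method="lm"` needs at least as many residuals (2n) as parameters (12), which holds for $n \geq 6$.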


Geometric Error
Error only in image measurements
If the world points are known accurately, then the measurement
errors are possible only in the image measurements.
The geometric error in the image is:
$$\sum_i d(\mathbf{x}_i, \hat{\mathbf{x}}_i)^2$$
where $\mathbf{x}_i$ is the measured point and $\hat{\mathbf{x}}_i = P\mathbf{X}_i$ is the
exact image of $\mathbf{X}_i$ under P.


Geometric Error
Error in the world points
If the world points are not known accurately, then we may choose
to estimate P by minimizing a 3D geometric error, or an image
geometric error, or both.
The 3D geometric error for the world points is:
$$\sum_i d(\mathbf{X}_i, \hat{\mathbf{X}}_i)^2$$
where $\hat{\mathbf{X}}_i$ is the closest point in space that maps exactly onto $\mathbf{x}_i$ via
$\mathbf{x}_i = P\hat{\mathbf{X}}_i$.


Geometric Error
Error in both world points and image points
We minimize a weighted sum of world and image errors.
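The weights are chosen to reflect the relative accuracy of
measurements of the image and 3D points.
Image and world points are typically measured in different units.
$$\gamma \sum_i d(\mathbf{X}_i, \hat{\mathbf{X}}_i)^2 + \xi \sum_i d(\mathbf{x}_i, \hat{\mathbf{x}}_i)^2$$
where $\gamma$ and $\xi$ are the weights and $\hat{\mathbf{x}}_i = P\hat{\mathbf{X}}_i$.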


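Estimation of an affine camera
An affine camera is one whose matrix P has last row $\mathbf{P}^{3T} = (0, 0, 0, 1)$.
$$\mathbf{X}_i = \begin{pmatrix} X_i \\ Y_i \\ Z_i \\ 1 \end{pmatrix}, \qquad \mathbf{x}_i = \begin{pmatrix} x_i \\ y_i \\ 1 \end{pmatrix}, \qquad \mathbf{x}_i \times P\mathbf{X}_i = \mathbf{0}$$
$$A\mathbf{p} = \begin{bmatrix} \mathbf{0}^T & -w_i\mathbf{X}_i^T & y_i\mathbf{X}_i^T \\ w_i\mathbf{X}_i^T & \mathbf{0}^T & -x_i\mathbf{X}_i^T \\ -y_i\mathbf{X}_i^T & x_i\mathbf{X}_i^T & \mathbf{0}^T \end{bmatrix} \begin{pmatrix} \mathbf{P}^1 \\ \mathbf{P}^2 \\ \mathbf{P}^3 \end{pmatrix} = \mathbf{0}$$
Substituting $\mathbf{P}^{3T} = (0, 0, 0, 1)$ and $w_i = 1$ (so that $\mathbf{X}_i^T\mathbf{P}^3 = 1$), the first two equations become
$$A\mathbf{p} = \begin{bmatrix} \mathbf{0}^T & -\mathbf{X}_i^T \\ \mathbf{X}_i^T & \mathbf{0}^T \end{bmatrix} \begin{pmatrix} \mathbf{P}^1 \\ \mathbf{P}^2 \end{pmatrix} + \begin{pmatrix} y_i \\ -x_i \end{pmatrix} = \mathbf{0}$$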



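Considering all point correspondences $\mathbf{X}_i \leftrightarrow \mathbf{x}_i$:
$$\sum_i d_{alg}(\mathbf{x}_i, \hat{\mathbf{x}}_i)^2 = \|A\mathbf{p}\|^2 = \sum_i \left[\left(x_i - \mathbf{P}^{1T}\mathbf{X}_i\right)^2 + \left(y_i - \mathbf{P}^{2T}\mathbf{X}_i\right)^2\right]$$

For affine cameras, the algebraic error and the geometric error are the
same:
$$d_{alg}(\mathbf{x}_i, \hat{\mathbf{x}}_i) = d_{geom}(\mathbf{x}_i, \hat{\mathbf{x}}_i)$$
Geometric image distances can therefore be minimized by a linear
algorithm.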


Objective: Gold Standard Algorithm: Affine Camera

Given $n \geq 4$ world to image point correspondences $\{\mathbf{X}_i \leftrightarrow \mathbf{x}_i\}$, determine
the Maximum Likelihood Estimate of the affine camera projection
matrix $P_A$, i.e. the P which minimizes $\sum_i d(\mathbf{x}_i, P\mathbf{X}_i)^2$ subject to the constraint
$\mathbf{P}^{3T} = (0, 0, 0, 1)$.

Algorithm:
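(i) Normalization: Normalized image points are $\tilde{\mathbf{x}}_i = T\mathbf{x}_i$, normalized
space points are $\tilde{\mathbf{X}}_i = U\mathbf{X}_i$.
(ii) DLT: Form the $2n \times 8$ matrix $A_8$ by stacking the equations generated
by each correspondence $\tilde{\mathbf{x}}_i \leftrightarrow \tilde{\mathbf{X}}_i$. Write $\mathbf{p}_8$ for the vector containing the
entries of the first two rows of $\tilde{P}_A$.
$$A_8\mathbf{p}_8 = \begin{bmatrix} \tilde{\mathbf{X}}_i^T & \mathbf{0}^T \\ \mathbf{0}^T & \tilde{\mathbf{X}}_i^T \end{bmatrix} \begin{pmatrix} \tilde{\mathbf{P}}^1 \\ \tilde{\mathbf{P}}^2 \end{pmatrix} = \begin{pmatrix} \tilde{x}_i \\ \tilde{y}_i \end{pmatrix} = \mathbf{b}$$
(iii) Solve: A solution of $A_8\mathbf{p}_8 = \mathbf{b}$ is obtained using the pseudo-inverse
of $A_8$, giving $\mathbf{p}_8 = A_8^+\mathbf{b}$, together with $\tilde{\mathbf{P}}^{3T} = (0, 0, 0, 1)$.
(iv) Denormalization: $P_A = T^{-1}\tilde{P}_A U$.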
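The four steps fit in a short NumPy sketch. The helper names and the test camera `P_A` are hypothetical; `np.linalg.pinv` supplies $A_8^+$. With noise-free data from an affine camera the original matrix is recovered exactly, since T and U preserve the last row $(0, 0, 0, 1)$.

```python
import numpy as np

def normalizing_similarity(pts, target_rms):
    """Translate the centroid to the origin and scale the RMS distance to target_rms."""
    c = pts.mean(axis=0)
    s = target_rms / np.sqrt(((pts - c) ** 2).sum(axis=1).mean())
    d = pts.shape[1]
    S = np.eye(d + 1)
    S[:d, :d] *= s
    S[:d, d] = -s * c
    return S

def affine_camera(X, x):
    """Affine camera from n >= 4 correspondences; X: (n, 3) space points, x: (n, 2) image points."""
    n = len(X)
    T = normalizing_similarity(x, np.sqrt(2))
    U = normalizing_similarity(X, np.sqrt(3))
    x_n = (np.c_[x, np.ones(n)] @ T.T)[:, :2]
    X_n = np.c_[X, np.ones(n)] @ U.T              # last coordinate stays 1
    A8 = np.zeros((2 * n, 8))
    A8[0::2, :4] = X_n                            # rows [X~_i^T, 0^T]
    A8[1::2, 4:] = X_n                            # rows [0^T, X~_i^T]
    b = x_n.ravel()                               # (x~_1, y~_1, x~_2, y~_2, ...)
    p8 = np.linalg.pinv(A8) @ b                   # p8 = A8^+ b
    P_n = np.vstack([p8.reshape(2, 4), [0.0, 0.0, 0.0, 1.0]])
    return np.linalg.inv(T) @ P_n @ U             # denormalize: P_A = T^-1 P~_A U

# Noise-free check with a hypothetical affine camera.
rng = np.random.default_rng(4)
P_A = np.array([[2.0, 0.1, 0.3, 5.0],
                [0.2, 1.8, -0.4, 7.0],
                [0.0, 0.0, 0.0, 1.0]])
X = rng.uniform(-10, 10, (8, 3))
x = (np.c_[X, np.ones(8)] @ P_A.T)[:, :2]
P_est = affine_camera(X, x)
```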


Restricted Camera Estimation
$$P = K[R \mid -R\tilde{\mathbf{C}}], \qquad K = \begin{bmatrix} \alpha_x & s & x_0 \\ 0 & \alpha_y & y_0 \\ 0 & 0 & 1 \end{bmatrix}$$
 

Find the best-fit camera P subject to restrictive conditions on the
camera parameters, for example:
The skew s is zero.
The pixels are square: $\alpha_x = \alpha_y$.
The principal point $(x_0, y_0)$ is known.
The complete camera calibration matrix K is known.

In some cases it is possible to estimate a restricted camera matrix with
a linear algorithm.
A restricted camera can be estimated by minimizing either geometric or
algebraic error.


Minimizing Geometric Error Restricted Camera
Geometric error can be minimized with respect to the set of
parameters using iterative minimization like Levenberg-Marquardt.

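If we want to minimize only the image errors, then the LM
minimization is minimizing a function $f: \mathbb{R}^9 \to \mathbb{R}^{2n}$. For the case $s = 0$, $\alpha_x = \alpha_y$, the 9 parameters are f, $x_0$, $y_0$, three for the rotation and three for the camera centre.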

If we want to minimize both the image errors (2D) and the space
point errors (3D), then the LM minimization is minimizing a function
$f: \mathbb{R}^{3n+9} \to \mathbb{R}^{5n}$, since the 3D points must be included among the
measurements and the minimization also includes estimation of the
true positions of the 3D points.
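The image-only cost $f: \mathbb{R}^9 \to \mathbb{R}^{2n}$ can be sketched as a parameter-to-camera map followed by stacked reprojection errors. The parameter layout below (focal length, principal point, axis-angle rotation, camera centre) and all names are hypothetical choices for illustration; the rotation uses Rodrigues' formula.

```python
import numpy as np

def rodrigues(r):
    """Rotation matrix from an axis-angle 3-vector r (Rodrigues' formula)."""
    theta = np.linalg.norm(r)
    if theta < 1e-12:
        return np.eye(3)
    k = r / theta
    Kx = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * Kx + (1.0 - np.cos(theta)) * Kx @ Kx

def camera_from_params(q):
    """Hypothetical layout q = (f, x0, y0, r1, r2, r3, C1, C2, C3); s = 0, alpha_x = alpha_y = f."""
    f, x0, y0 = q[:3]
    K = np.array([[f, 0.0, x0], [0.0, f, y0], [0.0, 0.0, 1.0]])
    R, C = rodrigues(q[3:6]), q[6:9]
    return K @ np.hstack([R, (-R @ C)[:, None]])   # P = K [R | -R C]

def image_residuals(q, X, x):
    """f : R^9 -> R^2n, the stacked image errors x_i - proj(P(q) X_i)."""
    h = X @ camera_from_params(q).T
    return (x - h[:, :2] / h[:, 2:]).ravel()

rng = np.random.default_rng(5)
q_true = np.array([800.0, 320.0, 240.0, 0.1, -0.05, 0.02, 0.0, 0.0, -10.0])
X = np.c_[rng.uniform(-1, 1, (15, 3)), np.ones(15)]
h = X @ camera_from_params(q_true).T
x = h[:, :2] / h[:, 2:]
r = image_residuals(q_true, X, x)   # length 2n = 30, all zeros for exact data
```

A function of this shape is what would be handed to an LM solver such as `scipy.optimize.least_squares(image_residuals, q0, method="lm", args=(X, x))`.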


Minimizing Geometric Error Restricted Camera
Use a linear algorithm such as DLT to find an initial camera matrix.
Formulate the cost function for minimizing geometric error.
Assume that our constraints are s = 0 and αx = αy
Enforce soft constraints by adding extra terms to the cost function.

$$\sum_i d_{geom}(\mathbf{x}_i, P\mathbf{X}_i)^2 + w s^2 + w(\alpha_x - \alpha_y)^2$$

The weights begin with low values and are increased at each
iteration of the estimation procedure.
The values of s and the aspect ratio are thus drawn gently to their desired
values.
Finally, these values may be clamped to their desired values for a
final estimation.
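The weight schedule can be sketched by appending $\sqrt{w}\,s$ and $\sqrt{w}(\alpha_x - \alpha_y)$ to the residual vector, so that its squared norm equals the cost above, and re-running LM with a growing w. This is a sketch under stated assumptions: the 11-parameter layout, the particular weight sequence and all names are hypothetical, and the final clamping step is omitted.

```python
import numpy as np
from scipy.optimize import least_squares

def rodrigues(r):
    """Rotation matrix from an axis-angle 3-vector r."""
    theta = np.linalg.norm(r)
    if theta < 1e-12:
        return np.eye(3)
    k = r / theta
    Kx = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * Kx + (1.0 - np.cos(theta)) * Kx @ Kx

def project(q, X):
    """Hypothetical layout q = (ax, ay, s, x0, y0, r(3), C(3)); P = K [R | -R C]."""
    ax, ay, s, x0, y0 = q[:5]
    K = np.array([[ax, s, x0], [0.0, ay, y0], [0.0, 0.0, 1.0]])
    R, C = rodrigues(q[5:8]), q[8:11]
    h = X @ (K @ np.hstack([R, (-R @ C)[:, None]])).T
    return h[:, :2] / h[:, 2:]

def residuals(q, X, x, w):
    """Squared norm = sum_i d(x_i, P X_i)^2 + w s^2 + w (ax - ay)^2."""
    soft = np.sqrt(w) * np.array([q[2], q[0] - q[1]])
    return np.concatenate([(x - project(q, X)).ravel(), soft])

def estimate_soft(q0, X, x, weights=(1e-4, 1e-2, 1.0, 1e2, 1e4)):
    """Re-run LM with an increasing weight, warm-starting from the previous solution."""
    q = np.asarray(q0, dtype=float)
    for w in weights:
        q = least_squares(residuals, q, method="lm", args=(X, x, w)).x
    return q

rng = np.random.default_rng(6)
q_true = np.array([800.0, 800.0, 0.0, 320.0, 240.0, 0.1, -0.05, 0.02, 0.0, 0.0, -10.0])
X = np.c_[rng.uniform(-1, 1, (30, 3)), np.ones(30)]
x = project(q_true, X) + rng.normal(0, 0.3, (30, 2))
q0 = q_true + np.r_[15.0, -10.0, 3.0, 5.0, -5.0, 0.01, 0.01, -0.01, 0.05, 0.05, 0.1]
q = estimate_soft(q0, X, x)
```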


Minimizing Algebraic Error Restricted Camera
Minimizing algebraic error is equivalent to minimizing $\|A\mathbf{p}\|$.
In the case of a restricted camera, we estimate only a subset of
parameters $\mathbf{q}$.
We have the map $\mathbf{p} = g(\mathbf{q})$.
Thus we minimize $\|Ag(\mathbf{q})\|$.
An initial estimate can be obtained from the DLT solution.
The minimization can then be done using Levenberg-Marquardt, with the
residual function $f(\mathbf{q}) = Ag(\mathbf{q})$ whose norm is minimized. Clearly $f: \mathbb{R}^9 \to \mathbb{R}^{2n}$, since
there are 2n constraints.

Is it possible to reduce the size of the minimization function f ?


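The $2n \times 12$ matrix A may have a very large number of rows.
It is possible to replace A by a square $12 \times 12$ matrix $\hat{A}$ such that
$$\|A\mathbf{p}\|^2 = \mathbf{p}^T A^T A\mathbf{p} = \mathbf{p}^T\hat{A}^T\hat{A}\mathbf{p} = \|\hat{A}\mathbf{p}\|^2$$
The matrix $\hat{A}$ is called the reduced measurement matrix.
With the (thin) SVD $A = UDV^T$, where $U^TU = I$:
$$A^TA = (VDU^T)(UDV^T) = (VD)(DV^T) = \hat{A}^T\hat{A}$$
If we define $\hat{A} = DV^T$, then minimizing $\|A\mathbf{p}\|$ is equivalent to
minimizing $\|\hat{A}\mathbf{p}\|$.
When using Levenberg-Marquardt, the residual function is
$f(\mathbf{q}) = \hat{A}\mathbf{p} = \hat{A}g(\mathbf{q})$. Hence $f: \mathbb{R}^9 \to \mathbb{R}^{12}$, i.e. $\mathbf{q} \mapsto \hat{A}g(\mathbf{q})$.
The size of the minimization problem $\|\hat{A}\mathbf{p}\|$ is independent of the number n of
point correspondences.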
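The reduction is a few lines with NumPy's thin SVD (`full_matrices=False`, so U has orthonormal columns). In this sketch a random matrix stands in for a real $2n \times 12$ DLT matrix.

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.normal(size=(200, 12))      # stands in for a 2n x 12 measurement matrix (n = 100)
U, d, Vt = np.linalg.svd(A, full_matrices=False)   # thin SVD: U is 200 x 12
A_hat = np.diag(d) @ Vt             # reduced measurement matrix, 12 x 12

p = rng.normal(size=12)
same_norm = np.isclose(np.linalg.norm(A @ p), np.linalg.norm(A_hat @ p))
```

Inside an LM loop the residual would then be `A_hat @ g(q)` (12 entries) rather than `A @ g(q)` (2n entries), where `g` is the hypothetical parameter-to-p map of the restricted camera.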


Summary:
Given a set of n correspondences $\mathbf{X}_i \leftrightarrow \mathbf{x}_i$, the problem of
finding a constrained camera matrix P that minimizes the sum
of algebraic distances $\sum_i d_{alg}(\mathbf{x}_i, P\mathbf{X}_i)^2$ reduces to the minimization of a function $f: \mathbb{R}^9 \to \mathbb{R}^{12}$, independent of n.


Radial Distortion

[Figure panels: short focal length; long focal length]


Radial Distortion
For real lenses, the pin-hole camera assumption does not hold.
Because of radial distortion, straight lines do not map to straight
lines.
The error becomes more significant as the focal length of the lens
decreases. Lenses which do not have radial distortion are very
costly.
A camera with a lens is not a linear device.


Radial Distortion

The cure for this distortion is to correct the image measurements
to those that would have been obtained under a perfect linear
camera action.
Radial distortion is measured with respect to the centre for radial
distortion.


Radial Distortion
Suppose a 3D point X projects to an image location $(\tilde{x}, \tilde{y})$ according to
linear projection.
$(\tilde{x}, \tilde{y})$ is the ideal (correct) image point position.
$(x_d, y_d)$ is the actual image point position after radial distortion.

$\tilde{r} = \sqrt{\tilde{x}^2 + \tilde{y}^2}$ is the radial distance from the centre for radial
distortion.
$L(\tilde{r})$ is a distortion factor, which is a function of the radius $\tilde{r}$.

The radial (lens) distortion is modeled as:
$$\begin{pmatrix} x_d \\ y_d \end{pmatrix} = L(\tilde{r}) \begin{pmatrix} \tilde{x} \\ \tilde{y} \end{pmatrix}$$


Radial Distortion: Correction of distortion
$$\begin{pmatrix} x_d \\ y_d \end{pmatrix} = L(\tilde{r}) \begin{pmatrix} \tilde{x} \\ \tilde{y} \end{pmatrix}$$

In pixel coordinates the correction is written as:
$$\hat{x} = x_c + L(r)(x - x_c), \qquad \hat{y} = y_c + L(r)(y - y_c)$$

$(x, y)$ are the measured coordinates and $(\hat{x}, \hat{y})$ are the corrected coordinates.
$(x_c, y_c)$ is the centre of radial distortion.
$r^2 = (x - x_c)^2 + (y - y_c)^2$ (assuming an aspect ratio of unity).

The corrected points $(\hat{x}, \hat{y})$ are related to the coordinates of the 3D
world point by a linear projective camera.



Choice of the distortion function
$L(r)$ is defined only for positive values of r.
$L(0) = 1$.
An arbitrary function $L(r)$ can be approximated by the Taylor expansion:
$$L(r) = 1 + \kappa_1 r + \kappa_2 r^2 + \kappa_3 r^3 + \ldots$$
The coefficients for radial correction $\{\kappa_1, \kappa_2, \kappa_3, \ldots, x_c, y_c\}$ are
considered part of the interior calibration of the camera.
The principal point is often used as the centre for radial distortion,
though the two need not coincide exactly.
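The polynomial $L(r)$ and the pixel-coordinate correction combine into a short function. A minimal NumPy sketch; the coefficient values and function names are hypothetical, chosen so the arithmetic is easy to follow.

```python
import numpy as np

def L(r, kappa):
    """Polynomial distortion function L(r) = 1 + k1 r + k2 r^2 + ...; kappa = (k1, k2, ...)."""
    return 1.0 + sum(k * r ** (i + 1) for i, k in enumerate(kappa))

def correct(x, y, xc, yc, kappa):
    """x_hat = xc + L(r)(x - xc), y_hat = yc + L(r)(y - yc), assuming unit aspect ratio."""
    r = np.hypot(x - xc, y - yc)
    Lr = L(r, kappa)
    return xc + Lr * (x - xc), yc + Lr * (y - yc)

kappa = (0.0, 1e-7)                 # hypothetical coefficients (k1 = 0, k2 = 1e-7)
xh, yh = correct(np.array([420.0, 320.0]), np.array([240.0, 240.0]), 320.0, 240.0, kappa)
# (420, 240) has r = 100, so L = 1 + 1e-7 * 100**2 = 1.001 and x_hat = 420.1;
# the centre (320, 240) has r = 0, L(0) = 1, and is left unchanged.
```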



Estimation of distortion function Approach 1:
Approach 1: The distortion function may be included as part of the
imaging process, and the parameters {κ1 , κ2 , κ3 , . . . , x c , y c }
computed together with P during the iterative minimization of the
geometric error.



Estimation of distortion function Approach 2:
Approach 2: A straight scene line should be imaged as straight
line.
A cost function is defined on the imaged lines after the corrective
mapping by $L(r)$, e.g. the distance between the line joining the
imaged line's ends and its mid-point.
The cost function is iteratively minimized over the parameters
{κ1 , κ2 , κ3 , . . . , x c , y c }.
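The example cost can be sketched as follows, assuming each imaged scene line is supplied as an ordered array of already-corrected points and taking the middle sample as its mid-point. The function name and test data are hypothetical; in practice the points would be re-corrected with candidate parameters $\{\kappa_1, \kappa_2, \ldots, x_c, y_c\}$ at each iteration and this cost minimized over them.

```python
import numpy as np

def straightness_cost(curves):
    """Sum over imaged lines of the squared distance from the middle sample
    to the chord joining the two end points.

    curves: list of (m, 2) arrays of corrected points along one imaged scene line.
    """
    cost = 0.0
    for pts in curves:
        a, b, mid = pts[0], pts[-1], pts[len(pts) // 2]
        d = b - a
        # perpendicular distance from mid to the line through a and b (2-D cross product)
        dist = abs(d[0] * (mid[1] - a[1]) - d[1] * (mid[0] - a[0])) / np.linalg.norm(d)
        cost += dist ** 2
    return cost

straight = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 10.0]])
bent = np.array([[0.0, 0.0], [5.0, 1.0], [10.0, 0.0]])
```

A perfectly straight imaged line contributes zero; the bent example contributes 1, since its mid-point lies one unit off the chord.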


[Figure: image obtained after correcting for the radial distortion]

