3d vision.pptx

Use of 3D vision
• Shape from X
• Shape from X is a generic name for techniques that aim
to extract shape from intensity images and other cues
such as focus.
• Some of these methods estimate local surface orientation
(e.g., surface normal) rather than absolute depth.
• Shape from motion

• 3D vision tasks
1 Marr’s theory
2 Other vision paradigms: Active and purposive vision
• Basics of projective geometry
1 Points and hyperplanes in projective space
2 Homography
3 Estimating homography from point correspondences

• Scene reconstruction from multiple views
1 Triangulation
2 Projective reconstruction
3 Matching constraints
4 Bundle adjustment
5 Upgrading the projective reconstruction, self-calibration

• Shape from X
1 Shape from motion
2 Shape from texture
3 Other shape from X techniques

• Full 3D objects
1 3D objects, models, and related issues
2 Line labeling
3 Volumetric representation, direct measurements
4 Volumetric modeling strategies
5 Surface modeling strategies
6 Registering surface patches and their fusion to get a full
3D model

• 2D view-based representations of a 3D scene
1 Viewing space
2 Multi-view representations and aspect graphs
• 3D reconstruction from an unorganized set of 2D
views, and Structure from Motion

There are many serious reasons why 3D vision using intensity
images as input is regarded as difficult.
1. The imaging system of a camera and of the human eye
performs perspective projection, which leads to considerable
loss of information.
2. The relationship between image intensity and the 3D
geometry of the corresponding scene point is very
complicated.
3. Mutual occlusion of objects in the scene, and even
self-occlusion of one object, further complicates the vision task.
4. Noise in images, and the high time complexity of many
algorithms, contribute further to the problem, although this is
not specific to 3D vision.

• Marr [Marr, 1982] defines 3D vision as ‘From an image (or
a series of images) of a scene, derive an accurate three-
dimensional geometric description of the scene and
quantitatively determine the properties of the object in the
scene’.

Marr’s theory
• Marr proposed that a computer vision system was just an example of an
information processing device that could be understood at three levels:
1. Computational theory. The theory describes what the device is
supposed to do: what information it provides from other information
provided as input. It should also describe the logic of the strategy that
performs this task.
2. Representation and algorithm. These address precisely how the
computation may be carried out; in particular, information representations
and algorithms to manipulate them.
3. Implementation. The physical realization of the algorithm; specifically,
programs and hardware.

• Having derived some such description, it is then
necessary to remove the dependence on the vantage
point and to transform the description into an object-
centered one.

• The requirement, then, is to move from pixels to surface
delineation, then to surface characteristic description
(orientation), then to a full 3D description. These
transformations are effected by moving from the 2D image
to a primal sketch, then to a 2.5D sketch, and thence to a
full 3D representation.

The primal sketch
• The primal sketch aims to capture, in as general a way
as possible, the significant intensity changes in an image.
Hitherto, such changes have been referred to as ‘edges’,
but Marr makes the observation that this word implies a
physical meaning that cannot be inferred at this stage.

The 2.5D sketch
• The 2.5D sketch reconstructs the relative distances from
the viewer of surfaces detected in the scene, and may be
called a depth map.

The 3D representation
• At this stage the Marr paradigm overlaps with top-down,
model-based approaches. It is required to take the
evidence derived so far and identify objects within it. This
can only be achieved with some knowledge about what
‘objects’ are, and, consequently, some means of describing
them. The important point is that this is a transition to an
object-centered coordinate system, allowing object
descriptions to be viewer independent.

The Marr paradigm advocates a set of relatively independent
modules: the low-level modules aim to recover a meaningful
description of the input intensity image, and the middle-level
modules use different cues such as intensity changes,
contours, texture, and motion to recover shape or location in space.
The Marr paradigm is an appealing theoretical framework, but
unfortunately it does not lead to successful vision applications
performing, e.g., recognition and navigation tasks.
It was shown later that most low-level and middle-level tasks
are ill-posed, with no unique solution.
One popular way, developed in the eighties, to make the task
well-posed is regularization: a constraint requiring continuity
and smoothness of the solution is often added.
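
As a minimal sketch of the regularization idea (an illustration, not a method taken from these slides), consider recovering a 1D depth profile z from noisy samples d by minimizing sum (z_i - d_i)^2 + lam * sum (z_{i+1} - z_i)^2. The smoothness weight `lam`, the function names, and the choice of Gauss-Seidel sweeps as the solver are all assumptions made for the example.

```python
def smooth_profile(data, lam=1.0, iterations=200):
    """Minimize sum((z[i]-data[i])**2) + lam*sum((z[i+1]-z[i])**2).

    Solved with Gauss-Seidel sweeps; lam is the (assumed) smoothness weight.
    """
    z = list(data)
    n = len(z)
    for _ in range(iterations):
        for i in range(n):
            neighbors = []
            if i > 0:
                neighbors.append(z[i - 1])
            if i < n - 1:
                neighbors.append(z[i + 1])
            # Setting the derivative of the cost w.r.t. z[i] to zero
            # gives this closed-form update for a single unknown.
            z[i] = (data[i] + lam * sum(neighbors)) / (1.0 + lam * len(neighbors))
    return z


def roughness(z):
    """Sum of squared first differences (the smoothness term)."""
    return sum((b - a) ** 2 for a, b in zip(z, z[1:]))
```

With `lam = 0` the data are returned unchanged; larger values trade fidelity to the data for a smoother solution, which is what makes the otherwise ambiguous problem well-posed.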

Other vision paradigms: Active and purposive vision
• When consistent geometric information has to be explicitly modeled (as
for manipulation of the object), an object-centered co-ordinate system
seems to be appropriate.
• Two schools are trying to explain the vision mechanism:
  • The first and older one tries to use explicit metric information in the
  early stages of the visual task (lines, curvatures, normals, etc.).
  Geometry is typically extracted in a bottom-up fashion without any
  information about the purpose of this representation.
  The output is a geometric model.
  • The second and younger school does not extract metric (geometric)
  information from visual data until needed for a specific task.

• A database or collection of intrinsic images (or views) is the model.
• Many traditional computer vision systems and theories capture data
with cameras with fixed characteristics, whereas active perception and
purposive vision may be more appropriate.
• Active vision system ... characteristics of the data acquisition are
dynamically controlled by the scene interpretation.
• Many visual tasks tend to be simpler if the observer is active and
controls its visual sensors.
• Controlled eye (or camera) movement is an example.
• If there is not enough data to interpret the scene, the camera can look at
it from another viewpoint.
• Active vision is intelligent data acquisition controlled by the
measured, partially interpreted scene parameters and their errors from
the scene.

• The active approach can make most ill-posed vision
tasks tractable.

• There is no established theory that provides a mathematical
(computational) model explaining the understanding aspects of
human vision.
• Two recent developments towards new vision theory are:
• Qualitative vision
  • Looks for a qualitative description of objects or scenes.
  • The motivation is not to represent geometry that is not needed
  for qualitative (non-geometric) tasks or decisions.
  • Qualitative information is more invariant to various unwanted
  transformations (e.g., slightly differing viewpoints) or noise than
  quantitative information.
  • Qualitativeness (or invariance) enables interpretation of observed
  events at several levels of complexity.

• Purposive paradigm
  • The key question is to identify the goal of the task, the
  motivation being to ease the task by making explicit just that
  piece of information that is needed.
  • Collision avoidance for autonomous vehicle navigation is an
  example where precise shape description is not needed.
  • The approach may be heterogeneous and a qualitative
  answer may be sufficient in some cases.
  • The paradigm does not yet have a solid theoretical basis, but
  the study of biological vision is a rich source of inspiration.

55:148 Digital Image Processing
Chapter 11
3D Vision, Geometry
Topics:
Basics of projective geometry
Points and hyperplanes in projective space
Homography
Estimating homography from point correspondence

Basics of projective geometry
Single or multiple view geometry deals with mathematics of relation between
• 3D geometric features (points, lines, corners) in the scene
• their camera projections
• relations among multiple camera projections of a 3D scene
Points and hyperplanes in projective space
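Scene: $(d+1)$-dimensional space excluding the origin, i.e., $\mathbb{R}^{d+1} \setminus \{\mathbf{0}\}$
Why is the origin excluded?
Origin ≈ pinhole ≈ optical center
An equivalence relation “≅” is defined as follows:
$$[x_1, \dots, x_{d+1}]^{\mathsf T} \cong [x_1', \dots, x_{d+1}']^{\mathsf T} \quad \text{iff} \quad \exists\, \alpha \neq 0 \ \text{s.t.}\ [x_1, \dots, x_{d+1}]^{\mathsf T} = \alpha\, [x_1', \dots, x_{d+1}']^{\mathsf T}$$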

This area developed from photogrammetry, which measures 3D distances from
photographs.
The mathematical vehicle for multiple view geometry is projective geometry.
We need to study perspective projection (also called central projection),
which describes image formation by a pinhole camera or a thin lens.

Projective space: $\mathcal{P}^d$ is the quotient space of this equivalence relation. It can be
imagined as the set of all lines in $\mathbb{R}^{d+1}$ passing through the origin.

Perspective projection of parallel lines

Homogeneous points
Each equivalence class of the relation “≅” generates an open line from the origin.
Note that the origin is not included in any of these lines, and thus the
disjointness property of equivalence classes is satisfied.
For each line (equivalence class), exactly one point is projected in the acquired
image: the point where the projective hyperplane intersects the line.
These points in the projective space are referred to as homogeneous points.
What is the property of homogeneous points?
Homogeneous points are coplanar, lying on the projection plane.
For simplicity, let us assume that our projection plane is $z = 1$.

Homogeneous points
Note that homogeneous points form the image hyperplane.
Thus, to determine the perspective projection of a scene point, we need to
determine the corresponding homogeneous point:
$$[x_1, \dots, x_{d+1}]^{\mathsf T} \xrightarrow{\;P\;} [x_1', \dots, x_d', x_{d+1}' = 1]^{\mathsf T}, \quad \text{where } x_i = \alpha x_i',\ \alpha \text{ constant}.$$
Note that the points $[x_1, \dots, x_d, 0]^{\mathsf T}$ do not have a Euclidean counterpart.
• Consider the limiting case $[x_1, \dots, x_d, \alpha]^{\mathsf T}$, which is projectively equivalent to
$[x_1/\alpha, \dots, x_d/\alpha, 1]^{\mathsf T}$, and assume that $\alpha \to 0$.
• This corresponds to a point on the projective hyperplane $\mathcal{P}^d$ going to infinity
in the direction of the radius vector $[x_1, \dots, x_d, 0]^{\mathsf T}$.
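
The relation between Euclidean points, their homogeneous representatives, and points at infinity can be sketched in a few lines. This is an illustrative sketch only; the function names and the tolerance `eps` are assumptions, and the projection hyperplane is taken to be the one where the last coordinate equals 1.

```python
def to_homogeneous(point):
    """Append 1 to a Euclidean point (x_1, ..., x_d)."""
    return list(point) + [1.0]


def to_euclidean(h, eps=1e-12):
    """Divide by the last coordinate to reach the hyperplane x_{d+1} = 1."""
    if abs(h[-1]) < eps:
        # Last coordinate 0: a point at infinity, no Euclidean counterpart.
        raise ValueError("point at infinity has no Euclidean counterpart")
    return [c / h[-1] for c in h[:-1]]


def equivalent(a, b, eps=1e-9):
    """True if a = alpha * b for some alpha != 0 (both vectors non-zero)."""
    # All 2x2 minors of the matrix [a; b] vanish exactly when a and b are parallel.
    n = len(a)
    return all(abs(a[i] * b[j] - a[j] * b[i]) < eps
               for i in range(n) for j in range(i + 1, n))
```

For example, `[2, 4, 2]` and `[1, 2, 1]` are equivalent and both project to the Euclidean point `(1, 2)`, while `[1, 2, 0]` raises an error because it lies at infinity.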

Properties of projection
A line in the scene space through (but
not including) the origin is mapped
onto a point in the projective plane.
A plane in the scene space through
(but not including) the origin is
mapped to a line on the projection
plane.

Homography
Homography ≈ Collineation ≈ Projective
transformation
is a mapping from one projection plane to
another projection plane for the same
$(d+1)$-dimensional scene and the common
origin:
$$\mathcal{P}^d \xrightarrow{\;H\;} \mathcal{P}^d.$$
Also expressed as
$$\mathbf{u}' \cong H\mathbf{u},$$
where $H$ is a $(d+1) \times (d+1)$ matrix.
Property:
Any three collinear points in $\mathcal{P}^d$ remain
collinear in $\mathcal{P}^d$ after the mapping.
Prove!
Satisfies the cross-ratio property (see the
figure).
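
One possible route to the collinearity exercise for $d = 2$ is sketched below. It relies on the standard fact that a line in the projective plane can be written as a non-zero vector $\mathbf{l}$ with $\mathbf{l}^{\mathsf T}\mathbf{u} = 0$ for every point $\mathbf{u}$ on it; the symbols $\mathbf{l}$ and $\mathbf{l}'$ are introduced here for illustration and do not appear elsewhere in the slides.

```latex
\begin{aligned}
&\text{Let } \mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3 \text{ be collinear: }
  \mathbf{l}^{\mathsf T}\mathbf{u}_i = 0,\quad i = 1, 2, 3,
  \text{ for some } \mathbf{l} \neq \mathbf{0}. \\
&\text{Define } \mathbf{l}' = H^{-\mathsf T}\mathbf{l}
  \quad (\text{well defined and non-zero because } \det H \neq 0). \\
&\text{Then } \mathbf{l}'^{\mathsf T}\,(H\mathbf{u}_i)
  = \mathbf{l}^{\mathsf T} H^{-1} H \mathbf{u}_i
  = \mathbf{l}^{\mathsf T}\mathbf{u}_i = 0,\quad i = 1, 2, 3, \\
&\text{so the mapped points } H\mathbf{u}_i \text{ all lie on the line } \mathbf{l}'.
\end{aligned}
```

The same argument applies in $\mathcal{P}^d$ with hyperplanes in place of lines.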

Matrix formulation for Homography
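$$\alpha \begin{bmatrix} u' \\ v' \\ 1 \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}$$
The scale factor $\alpha \neq 0$ and $\det H \neq 0$; otherwise the mapping is degenerate and the
plane collapses onto a line or a single point.
Eliminating the scale factor $\alpha$, we get
$$u' = \frac{h_{11}u + h_{12}v + h_{13}}{h_{31}u + h_{32}v + h_{33}} \quad \text{and} \quad v' = \frac{h_{21}u + h_{22}v + h_{23}}{h_{31}u + h_{32}v + h_{33}}$$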
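
The two rational expressions for u′ and v′ can be evaluated directly. The sketch below assumes d = 2 and a 3×3 matrix stored as a list of rows; the function name is hypothetical.

```python
def apply_homography(H, u, v):
    """Map the image point (u, v) by the 3x3 matrix H (list of rows)."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]  # common denominator
    if w == 0:
        raise ZeroDivisionError("point is mapped to infinity")
    return x / w, y / w
```

Multiplying every entry of `H` by the same non-zero constant leaves the result unchanged, which is why a homography in the plane has only 8 degrees of freedom although the matrix has 9 entries.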

Various linear transformations

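Subgroups of homographies
Any homography can be uniquely decomposed as
$$H = H_P H_A H_S,$$
where
$$H_P = \begin{bmatrix} I & \mathbf{0} \\ \mathbf{a}^{\mathsf T} & b \end{bmatrix}, \quad H_A = \begin{bmatrix} K & \mathbf{0} \\ \mathbf{0}^{\mathsf T} & 1 \end{bmatrix}, \quad H_S = \begin{bmatrix} R & -R\mathbf{t} \\ \mathbf{0}^{\mathsf T} & 1 \end{bmatrix}$$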

Estimating homography from point correspondence
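Given a set of ordered pairs of points $\{(\mathbf{u}_i, \mathbf{u}_i')\}_{i=1}^{m}$,
solve the homogeneous system of linear equations
$$\alpha_i \mathbf{u}_i' = H\mathbf{u}_i, \quad i = 1, \dots, m$$
for $H$ and $\alpha_i$.
Number of equations: $m(d+1)$
Number of unknowns: $m + (d+1)^2 - 1$
Hence $m(d+1) \geq m + (d+1)^2 - 1$, i.e., $m \geq d+2$ correspondences are needed.
Degenerate configuration: $H$ may not be uniquely determined even if $m \geq d+2$;
this is caused when $d+1$ or more of the points lie in a common hyperplane
(for $d = 2$, three or more collinear points).
Correspondences of more than the minimum number of points lead to the notion of
optimal fitting, which reduces the effect of noise.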
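
For d = 2 the minimal case is m = d + 2 = 4 correspondences. A minimal sketch of a linear solution is given below; it is not the maximum likelihood estimator and not necessarily the method intended by the slides. It fixes h33 = 1 to remove the scale ambiguity (an assumption that excludes homographies with h33 = 0), eliminates the α_i, and solves the resulting 8 linear equations by Gaussian elimination. All function names are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate configuration: system is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def homography_from_4_points(src, dst):
    """Estimate H (with h33 fixed to 1) from exactly four correspondences."""
    A, b = [], []
    for (u, v), (up, vp) in zip(src, dst):
        # u' * (h31 u + h32 v + 1) = h11 u + h12 v + h13, and likewise for v'.
        A.append([u, v, 1.0, 0.0, 0.0, 0.0, -up * u, -up * v])
        b.append(up)
        A.append([0.0, 0.0, 0.0, u, v, 1.0, -vp * u, -vp * v])
        b.append(vp)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]
```

With more than four noisy correspondences the system becomes overdetermined, which is where the optimal fitting mentioned above comes in.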

Maximum likelihood estimation
$[u_i, v_i]^{\mathsf T}$ and $[u_i', v_i']^{\mathsf T}$, $i = 1, \dots, m$, are identified corresponding points in two different
projection planes.
Principle: Find the homography (i.e., the transformation matrix $H$) that
maximizes the likelihood of mapping the points $[u_i, v_i]^{\mathsf T}$ on the first plane to
$[u_i', v_i']^{\mathsf T}$ on the second plane.
Model:
Ideal points are in the vicinity of the identified points, i.e., there is noise in the
process of locating the points $[u_i, v_i]^{\mathsf T}$ and $[u_i', v_i']^{\mathsf T}$.
Method to solve the problem
• Determine the ML function using a Gaussian model
• It contains several multiplicative terms
• Take log → multiplications are converted to additions
• Remove the minus sign (see the Gaussian expression)
• Maximization is converted to minimization
Final expression for maximum likelihood estimation
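$$\min_{\mathbf{h},\, \hat{u}_i,\, \hat{v}_i} \sum_{i=1}^{m} \left[ (\hat{u}_i - u_i)^2 + (\hat{v}_i - v_i)^2 + \left( \frac{h_{11}\hat{u}_i + h_{12}\hat{v}_i + h_{13}}{h_{31}\hat{u}_i + h_{32}\hat{v}_i + h_{33}} - u_i' \right)^2 + \left( \frac{h_{21}\hat{u}_i + h_{22}\hat{v}_i + h_{23}}{h_{31}\hat{u}_i + h_{32}\hat{v}_i + h_{33}} - v_i' \right)^2 \right]$$
Here $(\hat{u}_i, \hat{v}_i)$ denotes the unknown ideal point near the identified point $(u_i, v_i)$.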
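
The cost being minimized can be written as a small function: for each correspondence it sums the squared distance between the ideal and the identified point in the first plane and the squared distance between the mapped ideal point and the identified point in the second plane. This is a sketch for d = 2; the function and argument names are assumptions, and an actual estimator would pass this cost to a non-linear least-squares optimizer over H and the ideal points.

```python
def ml_cost(H, ideal, measured_src, measured_dst):
    """Sum of squared distances in both planes for a candidate H and ideal points."""
    total = 0.0
    for (uh, vh), (u, v), (up, vp) in zip(ideal, measured_src, measured_dst):
        w = H[2][0] * uh + H[2][1] * vh + H[2][2]
        mu = (H[0][0] * uh + H[0][1] * vh + H[0][2]) / w  # mapped ideal point, u'
        mv = (H[1][0] * uh + H[1][1] * vh + H[1][2]) / w  # mapped ideal point, v'
        total += (uh - u) ** 2 + (vh - v) ** 2 + (mu - up) ** 2 + (mv - vp) ** 2
    return total
```

The cost is zero only when the identified points are noise-free and H maps them exactly onto each other.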

Scene reconstruction from multiple views
• Triangulation

Matching constraints
• Matching constraints are relations satisfied by collections
of corresponding image points in n views. They have the
property that a multilinear function of homogeneous
image coordinates must vanish; the coefficients of these
functions form multiview tensors.

Bundle adjustment
• The non-linear least squares specialized for this task is
known from photogrammetry as bundle adjustment.

Upgrading the projective reconstruction, self-
calibration
• There are several kinds of additional knowledge that
permit the projective ambiguity to be refined to an
affine, similarity, or Euclidean one. Methods that use
additional knowledge to compute a similarity
reconstruction instead of a mere projective one are also
known as self-calibration, because this is in fact
equivalent to finding intrinsic camera parameters.

• Self-calibration methods can be divided into two groups:
constraints on the cameras and constraints on the
scene.

Shape from X
• Shape from X is a generic name for techniques that aim
to extract shape from intensity images and other cues
such as focus.
• Some of these methods estimate local surface orientation
(e.g., surface normal) rather than absolute depth.
• Shape may be extracted from motion, optical flow,
texture, focus/de-focus, vergence, and contour.
• Each of these techniques may be used to derive a 2.5D
sketch for Marr’s vision theory; they are also of practical
use on their own.

Shape from motion
• Motion is a primary property exploited by human
observers of the 3D world.
• The real world we see is dynamic in many respects, and
the relative movement of objects in view, their translation
and rotation relative to the observer, the motion of the
observer relative to other static and moving objects all
provide very strong clues to shape and depth.

• Extracting 3D information from moving scenes can be done as a
two-phase process:
1. Finding correspondences or calculating the nature of
the flow is a lower-level phase that operates on pixel arrays.
2. The shape extraction phase follows as a separate,
higher-level process. This phase is examined here.

Rigidity, and the structure from motion theorem
• Ullman’s success in this area was based on the psycho-physical observation that the human
visual system seems to assume that objects are rigid.
• This rigidity constraint prompted the proof of an elegant structure from motion theorem
saying that three orthographic projections of four non-co-planar points have a unique 3D
interpretation as belonging to one rigid body.
• First note that the body’s motion may be decomposed into translational and rotational
movement; the former gives the movement of a fixed point with respect to the observer, and
the latter relative rotation of the body (for example, about the chosen fixed point).
• Ullman’s result is the best possible in the sense that unique reconstruction of a rigid
body cannot be guaranteed with fewer than three projections of four points, or with
three projections of fewer than four points. It should also be remembered that the
result refers to orthographic projection, whereas in general image projections are
perspective.

Full 3D objects
• Volumetric modeling strategies include constructive solid geometry, super-quadrics,
and generalized cylinders.
• Surface modeling strategies include boundary representations, triangulated surfaces, and
quadric patches.
• Line labeling is an outmoded but accessible technique for reconstructing objects with planar
faces.
• Transitions to 3D objects need a co-ordinate system that is object centered.
• 3D objects may be measured mechanically, by computed tomography, by range finders, or by
shape from motion techniques.

3D model-based vision
• To create a full 3D model from a set of range images, the
surfaces must first be registered: rotations and translations
must be found that match one surface to another.
• Model-based vision uses a priori knowledge about an
object to ease its recognition.
• Techniques exist to locate curved objects from range
images.

2D view-based representations of a 3D scene
• 2D view-based representations of 3D scenes may be
achieved with multi-view representations.
• It is possible to select a few stored reference images, and
render any view from them.
• Interpolation of views is not enough and view extrapolation is
needed. This requires knowledge of geometry, and the view-
based approach does not differ significantly from 3D
geometry reconstruction.
• It is possible to perform a 3D reconstruction from an
unorganized set of 2D views. This approach has recently been
used widely, e.g., by Google StreetView.

Reconstructing scene geometry
• Large scale scene features such as plane parameters
may be recaptured from properties of known objects such
as straight lines and approximate size.
• Well known geometric results identify vanishing points
and ground orientation.
• Similar approaches may well work even if large scale
clues are unavailable.

Shape from optical flow
• In a continuous sequence, we are therefore interested in
the apparent movement of each pixel (x, y) which is given
by the optical flow field (dx/dt, dy/dt).
Determining shape from optical flow is mathematically
non-trivial, and here an early simplification of the subject is
presented as an illustration [Clocksin, 1980]. The
simplification is in two parts:

• Motion is due to the observer travelling in a straight line
through a static landscape. Without loss of generality, suppose
the motion is in the direction of the z axis of a viewer-centered
co-ordinate system (i.e., the observer is positioned at the origin).
• Rather than being projected onto a 2D plane, the image is seen
on the surface of a unit sphere, centered at the observer (a
‘spherical retina’). Points in 3D are represented in spherical polar
rather than Cartesian co-ordinates; spherical polar co-ordinates
(r, θ, ϕ) (see Figure 12.1) are related to (x, y, z) by the equations
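
One common convention, assumed here rather than taken from the slides, is r² = x² + y² + z², y = x tan θ, and z = r cos ϕ: θ is the azimuth in the xy-plane and ϕ is the angle measured from the z axis (the direction of travel). Under that assumption the conversion can be sketched as follows; the function names are illustrative.

```python
import math


def cartesian_to_spherical(x, y, z):
    """Return (r, theta, phi) under the convention assumed in the text above."""
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.atan2(y, x)  # azimuth in the xy-plane
    # Angle from the z axis; clamp guards against rounding just outside [-1, 1].
    phi = math.acos(max(-1.0, min(1.0, z / r))) if r > 0 else 0.0
    return r, theta, phi


def spherical_to_cartesian(r, theta, phi):
    """Inverse mapping back to (x, y, z)."""
    return (r * math.sin(phi) * math.cos(theta),
            r * math.sin(phi) * math.sin(theta),
            r * math.cos(phi))
```

On the spherical retina only (θ, ϕ) are observed directly; the distance r is the quantity that shape from optical flow tries to recover.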

Shape from texture
• The angle at which the surface is seen would cause a
(perspective) distortion of the texture primitive (texel), and
the relative size of the primitives would vary according to
distance from the observer.

• Considering a textured surface patterned with identical
texels which have been recovered by lower-level
processing, note that with respect to a viewer it has three
properties at any point projected onto a retinal image:
distance from the observer; slant, the angle at which the
surface is sloping away from the viewer (the angle between
the surface normal and the line of sight); and tilt, the
direction in which the slant takes place.
Attempts to re-capture some of this information are based on
the texture gradient, that is, the direction of maximum rate
of change of the perceived size of the texels, and a scalar
measurement of this rate.

• Texture is usually used as an additional or complementary
feature, augmenting another, stronger clue in shape
extraction.

Other shape from X techniques
• Shape from focus/de-focus techniques are based on the
fact that lenses have finite depth of field, and only objects at
the correct distance are in focus; others are blurred in
proportion to their distance.
• Two main approaches can be distinguished:
• Shape from focus measures depth in one location in an
active manner; this technique is used in 3D measuring
machines in mechanical engineering. The object to be
measured is fixed on a motorized table that moves along x,
y, z axes.

• Shape from de-focus typically estimates depth using two
input images captured at different depths. The relative
depth of the whole scene can be reconstructed from
image blur. The blurred image is modeled as a convolution of the
sharp image with a proper point spread function; the function is
either known from capturing setup parameters or
estimated.
• Shape from vergence uses two cameras fixed on a
common rod. Using two servo-mechanisms, the
cameras can change the direction of their optical axes
(verge) in the plane containing the line segment joining their
optical centers. Such devices are called stereo heads.

• Shape from contour aims to describe a 3D shape from
contours seen from one or more view directions. Objects
with smooth bounding surfaces are quite difficult to
analyze.
• The set of all points on the object surface where the surface
normal is perpendicular to the observer’s visual ray is
called a rim.

Assuming orthographic projection, the rim points generate a
silhouette of an object in the image. Silhouettes can be
easily and reliably captured if back-light illumination is used,
although there is a possible complication in the special case in
which two distinct rim points project to a single image point.

• The inherent difficulty in shape from contour comes from the
loss of information in projecting 3D to 2D.
Humans are surprisingly successful at perceiving clear 3D shapes from
contours, and it seems that tremendous background knowledge is used to
assist. Understanding this human ability is one of the major challenges for
computer vision.

Full 3D objects
3D objects, models, and related issues:
The notion of a 3D object allows us to consider a 3D
volume as a part of the entire 3D world.
This volume has a particular interpretation (semantics,
purpose) for the task in hand.
We have treated geometric and radiometric techniques that
provide intermediate 3D cues, and it was implicitly assumed
that such cues help to understand the nature of a 3D
object.
Shape is another informal concept that humans typically
connect with a 3D object.

• Computer vision aims at scientific methods for 3D object
description, but there are no mathematical tools yet
available to express shape in its general sense.
• Curvilinear surfaces with no restriction on surface shape
are called free-form surfaces.
• Roughly speaking, the 3D vision task distinguishes two
classes of approach:

1. Reconstruction of the 3D object model or representation
from real-world measurements with the aim of estimating a
continuous function representing the surface.
2. Recognition of an instance of a 3D object in the scene. It
is assumed that object classes are known in advance, and
that they are represented by a suitable 3D model.
Humans often meet and recognize deformable objects
that change their shape.

• Computer vision as well as computer graphics use 3D
models to encapsulate the shape of a 3D object.
• 3D models serve in computer graphics to generate detailed
surface descriptions used to render realistic 2D images.
• In computer vision, the model is used either for
reconstruction (copying, displaying an object from a different
viewpoint, modifying an object slightly during animation) or for
recognition purposes, where features are used that
distinguish objects from different classes.

• There are two main classes of models: volumetric and
surface.
• Volumetric models represent the ‘inside’ of a 3D object
explicitly, while surface models use only object surfaces,
as most vision-based measuring techniques can only see
the surface of a non-transparent solid.
• 3D models make a transition towards an object-centered
co-ordinate system, allowing object descriptions to be
viewer independent. This is the
most difficult phase within Marr’s paradigm.

• 3D models of objects are common in other areas besides
computer vision, notably computer-aided design (CAD)
and computer graphics, where image synthesis is
required, that is, an exact (2D) pictorial representation of
some modeled 3D object.
• Various representation schemes exist, with different
properties. A representation is called complete if two
different objects cannot correspond to the same model, so
a particular model is unambiguous.
• A representation is called unique if an object cannot
correspond to two different models.

• Most 3D representation methods sacrifice either the
completeness or the uniqueness property.
• Commercial CAD systems frequently sacrifice uniqueness.

Line labeling
• Blocks world approach.
• Line labeling is an outmoded but accessible technique for
reconstructing objects with planar faces.
• Independently, other researchers built on these ideas to develop what is now a very well known
line labeling algorithm.

• Line labeling is able to detect ‘impossible’ objects.

Volumetric representation, direct measurements
• An object is placed in some reference co-ordinate system
and its volume is subdivided into small volume elements
called voxels—it is usual for these to be cubes.
• The most straightforward representation of voxel-based
volumetric models is the 3D occupancy grid, which is
implemented as a 3D Boolean array

• The object is fixed to a measuring machine, and an
absolute co-ordinate system is attached to it. Points on the
object surface are touched by a measuring needle which
provides 3D co-ordinates.
• Another 3D measurement technique, computed tomography, looks
inside the object and thus yields more detailed information than the
binary occupancy grid.

Volumetric modeling strategies
• Constructive Solid Geometry
• The principal idea of Constructive Solid Geometry (CSG), which
has found some success, is to construct 3D bodies from a
selection of solid primitives.
• A CSG model is stored as a tree, with leaf nodes representing the
primitive solids and edges enforcing precedence among the
set-theoretic operations.
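
A CSG tree can be sketched as follows. In this illustrative version leaves hold a point-membership test for a primitive solid and internal nodes hold a set operation; the class names, the two primitives, and the `inside` query are assumptions made for the example rather than a standard API.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Point = Tuple[float, float, float]


@dataclass
class Primitive:
    contains: Callable[[Point], bool]  # leaf: membership test of a primitive solid


@dataclass
class Node:
    op: str  # "union", "intersection" or "difference"
    left: object
    right: object


def inside(tree, p):
    """Evaluate the tree bottom-up for a single query point p."""
    if isinstance(tree, Primitive):
        return tree.contains(p)
    a, b = inside(tree.left, p), inside(tree.right, p)
    if tree.op == "union":
        return a or b
    if tree.op == "intersection":
        return a and b
    if tree.op == "difference":
        return a and not b
    raise ValueError(f"unknown operation: {tree.op}")


def sphere(center, radius):
    return Primitive(lambda p: sum((pi - ci) ** 2 for pi, ci in zip(p, center)) <= radius ** 2)


def box(lo, hi):
    return Primitive(lambda p: all(a <= pi <= b for pi, a, b in zip(p, lo, hi)))
```

For instance, `Node("difference", box((0, 0, 0), (2, 2, 2)), sphere((1, 1, 1), 0.5))` describes a cube with a spherical cavity at its center.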

• Super-quadrics
• Super-quadrics are geometric bodies that can be understood as a
generalization of basic quadric solids, introduced in computer
graphics [Barr, 1981].
• Super-ellipsoids are instances of super-quadrics used in computer
vision.

• In the super-ellipsoid equation, a1, a2, and a3 define the
super-quadric size in the x, y, and z directions, respectively.
εvert is the squareness parameter in the latitude plane and
εhori is the squareness parameter in the longitude plane.
• The squareness values used in respective planes are 0 (i.e.,
square) ≤ ε ≤ 2 (i.e., deltoid), as only those are convex
bodies. If squareness parameters are greater than 2, the
body changes to a cross-like shape.
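
As a hedged illustration, the widely used inside-outside form of the super-ellipsoid from the computer graphics literature can be evaluated as below. The parameter names `eps1` and `eps2`, and their correspondence to the two squareness parameters named on the slide (taken here as `eps1` for the latitude direction and `eps2` for the longitude direction), are assumptions; this form also requires strictly positive squareness values.

```python
def superellipsoid_value(x, y, z, a1, a2, a3, eps1, eps2):
    """Inside-outside function: < 1 inside, 1 on the surface, > 1 outside."""
    if eps1 <= 0 or eps2 <= 0:
        raise ValueError("squareness parameters must be positive in this form")
    # Cross-section in the xy-plane, shaped by eps2.
    xy = (abs(x / a1) ** (2.0 / eps2) + abs(y / a2) ** (2.0 / eps2)) ** (eps2 / eps1)
    # Combination with the z direction, shaped by eps1.
    return xy + abs(z / a3) ** (2.0 / eps1)
```

With `eps1 = eps2 = 1` the function reduces to the ordinary ellipsoid equation; as both values approach 0 the body approaches a box with half-sides a1, a2, a3.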

Generalized cylinders
• Generalized cylinders, or generalized cones, are often also called
sweep representations.
• A cone is defined by a circle whose radius changes linearly with
distance traveled, moving along a straight line.

• These generalized cones turn out to be very good at
representing some classes of solid body.
• The advantage of symmetrical volumetric primitives, such
as generalized cylinders and super-quadrics, is their ability
to capture common symmetries and represent certain
shapes with few parameters.
• An influential early vision system called ACRONYM used
generalized cones as its modeling scheme.
• There is a modification of the sweep representation called
a skeleton representation, which stores only the spines of
the objects.

Surface modeling strategies
• A solid object can be represented by surfaces bounding it;
such a description can vary from simple triangular patches
to visually appealing structures such as non-uniform
rational B-splines (NURBS) popular in geometric modeling.
Computer vision solves two main problems with surfaces:
1. Reconstruction creates a surface description from sparse
depth measurements that are typically corrupted by outliers.
2. Segmentation aims to classify surfaces or surface patches
into surface types.

• Boundary representations (B-reps) can be viewed
conceptually as a triple:
• A set of surfaces of the object.
• A set of space curves representing intersections between
the surfaces.
• A graph describing the surface connectivity.
B-reps are an appealing and intuitively natural way of
representing 3D bodies in that they consist of an explicit list
of the bodies’ faces.
In the simplest case, ‘faces’ are taken to be planar, so
bodies are always polyhedral, and we are dealing
throughout with piecewise planar surfaces.

• Triangulation of irregular data points (e.g., a 3D point
cloud obtained from a range scanner) is an example of an
interpolation method.
• The best-known technique is called Delaunay
triangulation, which can be defined in two, three, or more
space dimensions.

Registering surface patches and their fusion
to get a full 3D model
• A range image represents distance measurements from
an observer to an object; it yields a partial 3D description
of the surface from one view only.
• Several range images are needed to capture the whole
surface of an object.
• Range image registration finds a rigid geometric
transformation between two range images of the same
object captured from two different viewpoints.

• The method automates the construction of a 3D model of a
3D free-form object from a set of range images as follows.
1. The object is placed on a turntable and a set of range
images from different viewpoints is measured by a
structured-light (laser-plane) range finder.
2. A triangulated surface is constructed over the range
images.
3. Large data sets are reduced by decimation of triangular
meshes in each view.
4. Surfaces are registered into a common object-centered
co-ordinate system and outliers in measurements are
removed.

• A 4-connected mesh cannot represent all objects; e.g., a
sphere cannot be covered by a four-sided polygon.
• By splitting each polygon by an edge, a triangulation of
the surface, which is able to represent any surface, is
easily obtained.
• A polygon may be split two ways; it is preferable to
choose the shorter of the two possible splitting edges
because this results in triangles with larger inner angles.
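
The splitting rule can be sketched for a single four-sided mesh polygon. The function name is hypothetical, and the vertices are assumed to be given in order around the polygon.

```python
import math


def split_quad(p0, p1, p2, p3):
    """Split the quadrilateral p0-p1-p2-p3 along its shorter diagonal."""
    if math.dist(p0, p2) <= math.dist(p1, p3):
        return [(p0, p1, p2), (p0, p2, p3)]
    return [(p1, p2, p3), (p1, p3, p0)]
```

Both returned triangles keep the vertex order (orientation) of the original polygon, so surface normals stay consistent across the mesh.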

2D view-based representations of a 3D scene
• Viewing space
• The trouble is that there is potentially an infinite number of
possible viewpoints that induce an infinite number of
object appearances.
• To cope with the huge number of viewpoints and
appearances it is necessary to sample a viewpoint space
and group together similar neighboring views.
• A simplified model is a viewing sphere model that is
often used in the orthographic projection case

Multi-view representations and aspect graphs
• Other representation methods attempt to combine all the
viewpoint-specific models into a single data structure.
One of them is the characteristic view technique in which
all possible 2D projections of the convex polyhedral object
are grouped into a finite number of topologically
equivalent classes.
• A similar approach is based on aspect, which is defined as
the topological structure of singularities in a single view of
an object; aspect has useful invariance properties.

• Most small changes in vantage point will not affect aspect,
and such vantage points (that is, most) are referred to as
stable.

3D reconstruction from an unorganized set of 2D
views, and Structure from Motion
