Computer Vision (21AM504) - Unit II & III
Topics not covered in PPT
Unit II
1 Affinity measures: Some models of segmentation simply require a weight on each edge of the
graph; these weights are usually called affinity measures.
Clearly, the affinity measure depends on the problem at hand. The weight of an arc connecting similar
nodes should be large, and the weight on an arc connecting very different nodes should be small. It is
fairly easy to come up with affinity measures with these properties for a variety of important cases, and
we can construct an affinity function for a combination of cues by forming a product of powers of these
affinity functions.
Example:
i) Affinity by Distance
Affinity should go down quite sharply with distance, once the distance is over some threshold. One
appropriate expression is an exponential in the squared distance,
aff(x, y) = exp{−||x − y||² / (2σ_d²)},
where x and y are the positions of the two pixels and σ_d controls how quickly the affinity falls off.
ii) Affinity by Intensity
Affinity should be large for similar intensities, and smaller as the difference increases. Again, an
exponential form suggests itself, and we can use
aff(x, y) = exp{−(I(x) − I(y))² / (2σ_I²)},
where I(x) is the intensity at pixel x.
iii) Affinity by Colour
We need a colour metric to construct a meaningful colour affinity function; an appropriate
expression has the form
aff(x, y) = exp{−dist(c(x), c(y))² / (2σ_c²)},
where c(x) is the colour at pixel x and dist is a distance in a uniform colour space.
iv) Affinity by Texture
The affinity should be large for similar textures and smaller as the difference increases. We adopt a
collection of filters f1, . . ., fn, and describe textures by the outputs of these filters, which should span a
range of scales and orientations. For most textures, the filter outputs will not be the same at each
point in the texture, so we compare vectors of filter outputs and again use an exponential form:
aff(x, y) = exp{−(f(x) − f(y))ᵀ(f(x) − f(y)) / (2σ_t²)},
where f(x) collects the filter responses at x.
v) Affinity by Motion
In the case of motion, the nodes of the graph are going to represent a pixel in a particular image in the
sequence. It is difficult to estimate the motion at a particular pixel accurately; instead, it makes sense to
construct a distribution over the possible motions. The quality of motion estimate available depends on
what the neighbourhood of the pixel looks like.
If we define a similarity measure for an image motion v at a pixel x to be, for example,
S(v, x) = exp{−Σ_u (I(x + u + v, t + 1) − I(x + u, t))² / (2σ_m²)}
(the sum ranging over a small window of offsets u around the pixel), we have a measure that will be
near one for a good value of the motion and near zero for a poor one. This can be massaged into a
probability distribution by ensuring that it sums to one, so we have
P(v; x) = S(v, x) / Σ_v′ S(v′, x).
Now we need to obtain an affinity measure from this. The arcs on the graph will connect pixels that are
“nearby” in space and in time. For each pair of pixels, the affinity should be high if the motion pattern
around the pixels could look similar, and low otherwise. This suggests using a correlation measure for
the affinity.
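As a concrete illustration of these exponential forms, here is a small Python sketch. The helper names, σ values, and the particular distance/intensity combination are my own illustrative choices; the notes only specify the general exponential forms and the product rule for combining cues.

```python
import numpy as np

# Helper names, sigma values, and the product combination are illustrative
# choices; the notes only specify the general exponential forms.
def affinity_distance(x, y, sigma_d=1.0):
    """Large for nearby pixels, falling off sharply with distance."""
    d2 = np.sum((np.asarray(x, float) - np.asarray(y, float)) ** 2)
    return np.exp(-d2 / (2.0 * sigma_d ** 2))

def affinity_intensity(ix, iy, sigma_i=0.1):
    """Large for similar intensities, smaller as the difference grows."""
    return np.exp(-(ix - iy) ** 2 / (2.0 * sigma_i ** 2))

def combined_affinity(x, y, ix, iy):
    """Combine cues by forming a product of the individual affinities."""
    return affinity_distance(x, y) * affinity_intensity(ix, iy)

print(combined_affinity([0, 0], [0, 0], 0.5, 0.5))  # identical pixels -> 1.0
print(combined_affinity([0, 0], [5, 5], 0.2, 0.9))  # far and different -> ~0
```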
2 Normalized Cuts
An approach to cut the graph into two connected components such that the cost of the cut is a
small fraction of the total affinity within each group.
We can formalise this as decomposing a weighted graph V into two components A and B, and
scoring the decomposition with
cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V)
(where cut(A, B) is the sum of weights of all edges in V that have one end in A and the other in
B, and assoc(A, V) is the sum of weights of all edges that have one end in A). This score will be
small if the cut separates two components that have very few, low-weight edges between
them and many internal edges of high weight.
We would like to find the cut with the minimum value of this criterion, called a normalized cut.
• This problem is too difficult to solve in this form, because we would need to look at
every graph cut — it’s a combinatorial optimization problem, so we can’t use continuity
arguments to reason about how good a neighbouring cut is given the value of a
particular cut.
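The standard way around this combinatorial difficulty (due to Shi and Malik) is to relax the problem to a generalized eigenvalue problem, (D − W)y = λDy, where W is the affinity matrix and D its diagonal degree matrix, and to threshold the eigenvector with the second-smallest eigenvalue. A minimal sketch on a made-up 4-node affinity matrix:

```python
import numpy as np

# Shi-Malik relaxation sketch: rather than enumerating cuts, solve
# (D - W) y = lambda * D y and threshold the second-smallest eigenvector.
# The 4-node affinity matrix W is a made-up example: nodes 0,1 and nodes
# 2,3 are strongly linked internally, weakly linked to each other.
W = np.array([[0., 9., 1., 0.],
              [9., 0., 0., 1.],
              [1., 0., 0., 9.],
              [0., 1., 9., 0.]])
D = np.diag(W.sum(axis=1))

# Solve the equivalent symmetric problem with the normalised Laplacian.
d_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D)))
L_sym = d_inv_sqrt @ (D - W) @ d_inv_sqrt
vals, vecs = np.linalg.eigh(L_sym)      # eigenvalues in ascending order
y = d_inv_sqrt @ vecs[:, 1]             # second-smallest eigenvector

labels = (y > np.median(y)).astype(int)
print(labels)  # nodes {0, 1} end up in one component, {2, 3} in the other
```

Thresholding the relaxed eigenvector recovers the intuitively correct cut here, at the cost of solving one eigenproblem rather than enumerating all 2ⁿ cuts.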
3 Human Vision: Stereopsis
Unlike the cameras rigidly attached to a passive stereo rig, the two eyes of a person can rotate
in their sockets. At each instant, they fixate on a particular point in space, i.e., they rotate so
that its two images form in the centers of the eyes’ foveas.
Below Figure illustrates a simplified, two-dimensional situation.
If l and r denote the angles between the vertical planes of symmetry of two eyes and two rays
passing through the same scene point, we define the corresponding disparity as d = r − l.
It is an elementary exercise in trigonometry to show that d = D − F, where D denotes the angle
between these rays, and F is the angle between the two rays passing through the fixated point.
Points with zero disparity lie on the Vieth-Müller circle that passes through the fixated point
and the anterior nodal points of the eyes.
Points lying inside this circle have a positive (or convergent) disparity, points lying outside it
have a negative (or divergent) disparity, and the locus of all points having a given disparity d
forms, as d varies, the pencil of all circles passing through the two eyes’ nodal points. This
property is clearly sufficient to rank-order in depth dots that are near the fixation point.
However, it is also clear that the vergence angles between the vertical median plane of
symmetry of the head and the two fixation rays must be known in order to reconstruct the
absolute position of scene points.
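The relation d = r − l = D − F is easy to check numerically. A toy sketch, in which the eye positions, the fixation point, the second scene point, and the sign conventions for the angles are all my own assumptions:

```python
import math

# Toy numeric check of d = r - l = D - F. Eye positions, the fixation
# point, and the angle sign conventions are my own assumptions.
def ray_angle(eye, p):
    """Direction angle of the ray from an eye to point p."""
    return math.atan2(p[1] - eye[1], p[0] - eye[0])

eye_l, eye_r = (-0.03, 0.0), (0.03, 0.0)   # nodal points, ~6 cm apart
fix = (0.0, 1.0)                           # fixated point
p = (0.12, 0.7)                            # another scene point

theta_lf, theta_rf = ray_angle(eye_l, fix), ray_angle(eye_r, fix)
theta_lp, theta_rp = ray_angle(eye_l, p), ray_angle(eye_r, p)

l = theta_lp - theta_lf    # left-eye angle to p, relative to fixation
r = theta_rp - theta_rf    # right-eye angle to p, relative to fixation
d = r - l                  # disparity

D = theta_rp - theta_lp    # angle between the two rays through p
F = theta_rf - theta_lf    # angle between the two fixation rays
print(abs(d - (D - F)))    # ~0: the two definitions agree
```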
4 Epipolar Geometry
5 Trinocular Stereo
Adding a third camera eliminates (in large part) the ambiguity inherent in two-view point
matching. In essence, the third image can be used to check hypothetical matches between the
first two pictures: as shown in the figure below, the three-dimensional point associated with such
a match is first reconstructed, then reprojected into the third image.
If no compatible point lies nearby, then the match must be wrong. In fact, the
reconstruction/reprojection process can be avoided by noting that, given three weakly (and a
fortiori strongly) calibrated cameras and two images of a point, one can always predict its
position in a third image by intersecting the corresponding epipolar lines.
The trifocal tensor can also be used to predict the tangent line to some image curve in one
image given the corresponding tangents in the other images: given matching tangents l2 and l3
in images 2 and 3, the tangent l1 in image 1 is recovered as
l1 = (l2ᵀ G1 l3, l2ᵀ G2 l3, l2ᵀ G3 l3)ᵀ,
where G1, G2, and G3 are the three 3 × 3 matrices that make up the trifocal tensor.
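The epipolar-line prediction described above can be sketched directly: the epipolar lines of the two observed points in the third image are F31 x1 and F32 x2, and homogeneous lines intersect at their cross product. The fundamental matrices and points below are made up for illustration:

```python
import numpy as np

# Predicting a point's position in the third image by intersecting its
# two epipolar lines. F31, F32, x1, x2 are made-up homogeneous quantities.
def predict_in_third_view(F31, F32, x1, x2):
    l3a = F31 @ x1            # epipolar line of x1 in image 3
    l3b = F32 @ x2            # epipolar line of x2 in image 3
    x3 = np.cross(l3a, l3b)   # homogeneous lines meet at their cross product
    return x3 / x3[2]         # normalise so the last coordinate is 1

F31 = np.array([[0., -1., 2.], [1., 0., -3.], [-2., 3., 0.]])
F32 = np.array([[0., 2., -1.], [-2., 0., 4.], [1., -4., 0.]])
x1 = np.array([1.0, 2.0, 1.0])
x2 = np.array([0.5, -1.0, 1.0])
x3 = predict_in_third_view(F31, F32, x1, x2)
print(x3)  # the prediction lies on both epipolar lines
```

By construction the returned point satisfies l3a · x3 = l3b · x3 = 0, which is exactly the compatibility check the text describes.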
Unit III
1. What is tracking? List and explain the various applications of tracking. Describe tracking
people.
Tracking:
Tracking is the problem of generating an inference about the motion of an object given a
sequence of images.
Good solutions to this problem have a variety of applications:
• Motion Capture: if we can track a moving person accurately, then we can make an accurate
record of their motions. Once we have this record, we can use it to drive a rendering process;
for example, we might control a cartoon character, thousands of virtual extras in a crowd
scene, or a virtual stunt avatar. Furthermore, we could modify the motion record to obtain
slightly different motions. This means that a single performer can produce sequences they
wouldn’t want to do in person.
• Recognition From Motion: the motion of objects is quite characteristic. We may be able to
determine the identity of the object from its motion; we should be able to tell what it’s doing.
• Surveillance: knowing what objects are doing can be very useful. For example, different kinds
of trucks should move in different, fixed patterns in an airport; if they do not, then something
is going very wrong. Similarly, there are combinations of places and patterns of motions that
should never occur (no truck should ever stop on an active runway, say). It could be helpful to
have a computer system that can monitor activities and give a warning if it detects a problem
case.
• Targeting: a significant fraction of the tracking literature is oriented towards (a) deciding
what to shoot and (b) hitting it. Typically, this literature describes tracking using radar or infra-
red signals (rather than vision), but the basic issues are the same — what do we infer about an
object’s future position from a sequence of measurements? (i.e. where should we aim?)
In typical tracking problems, we have a model for the object’s motion, and some set of
measurements from a sequence of images. These measurements could be the position of
some image points, the position and moments of some image regions, or pretty much anything
else. They are not guaranteed to be relevant, in the sense that some could come from the
object of interest and some might come from other objects, or from noise.
Tracking People
• People are typically modelled as a collection of body segments, connected with rigid
transformations.
• These segments can be modelled as cylinders (in which case we can ignore the top and
bottom of the cylinder and any variations in view, and represent the cylinder as an image
rectangle of fixed size) or as ellipsoids.
• The state of the tracker is then given by the rigid body transformations connecting these
body segments (and perhaps various velocities and accelerations associated with them).
• Both particle filters and (variants of) Kalman filters have been used to track people. Each
approach can be made to succeed, but neither is particularly robust.
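As an illustration of the state-estimation machinery involved, here is a minimal constant-velocity Kalman filter in one spatial dimension. All matrices and noise levels are illustrative choices; a real person tracker carries a far richer state (one rigid transform per body segment), but the predict/update cycle is the same.

```python
import numpy as np

# Minimal constant-velocity Kalman filter in one spatial dimension.
# Matrices and noise levels are illustrative; a person tracker carries
# a much richer state (one rigid transform per body segment).
dt = 1.0
A = np.array([[1., dt], [0., 1.]])   # state transition for [position, velocity]
H = np.array([[1., 0.]])             # we measure position only
Q = 1e-4 * np.eye(2)                 # process noise covariance
R = np.array([[1e-2]])               # measurement noise covariance

x = np.array([0., 0.])               # initial state estimate
P = np.eye(2)                        # initial state covariance

for z in [1.0, 2.0, 3.0, 4.0, 5.0]:  # noiseless object moving at unit speed
    x = A @ x                        # predict state forward one frame
    P = A @ P @ A.T + Q
    S = H @ P @ H.T + R              # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)   # Kalman gain
    x = x + K @ (np.array([z]) - H @ x)
    P = (np.eye(2) - K @ H) @ P

print(x)  # estimate settles near position 5, velocity 1
```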
2. Explain the vehicle tracking application in detail.
Vehicle tracking
Systems that can track cars using video from fixed cameras can be used to predict traffic
volume and flow; the ideal is to report on, and act to prevent, traffic problems as quickly as
possible. A number of systems can track vehicles successfully. The crucial issue is initiating a
track automatically.
• Sullivan et al. construct a set of regions of interest (ROIs) in each frame. Because the
camera is fixed, these regions of interest can be chosen to span each lane; this means that
almost all vehicles must pass directly through a region of interest in a known direction.
Their system then watches for characteristic edge signatures in the ROI that indicate
the presence of a vehicle. These signatures can alias slightly — typically, a track is initiated
when the front of the vehicle enters the ROI, another is initiated when the vehicle lies in the
ROI, and a third is initiated close to the vehicle’s leaving — because some of the vehicle’s
edges are easily mistaken for others.
Each initiated track is tracked for a sequence of frames, during which time it
accumulates a quality score — essentially, an estimate of the extent to which predictions of
future position were accurate.
If this quality score is sufficiently high, the track is accepted as a hypothesis. An
exclusion region in space and time is constructed around each hypothesis, such that there can
be only one track in this region; if the regions overlap, the track with the highest quality is
chosen.
The requirement that the exclusion regions do not overlap derives from the fact that
two cars can’t occupy the same region of space at the same time. Once a track has passed
these tests, the position in which and the time at which it will pass through another ROI can be
predicted. The track is finally confirmed or rejected by comparing this ROI at the appropriate
time with a template that predicts the car’s appearance. Typically, relatively few tracks that are
initiated reach this stage.
• An alternative method for initiating car tracks is to track individual features, and then
group those tracks into possible cars. Beymer et al. use this strategy rather successfully.
Because the road is planar and the camera is fixed, the homography connecting the road plane
and the camera can be determined. This homography can be used to determine the distance
between points on the road plane; points can lie together on a car only if this distance doesn’t
change with time.
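A sketch of this back-projection step, with an invented homography H (in the real system H is calibrated once from the fixed camera, not made up):

```python
import numpy as np

# Back-project image points to the road plane through a homography and
# measure ground-plane distance. This H is invented for illustration; in
# the real system it is calibrated once, since the camera is fixed.
H = np.array([[1.0, 0.2, 5.0],
              [0.0, 1.5, 2.0],
              [0.0, 0.001, 1.0]])    # road plane -> image
H_inv = np.linalg.inv(H)

def ground_point(u, v):
    """Image pixel (u, v) -> Euclidean road-plane coordinates."""
    p = H_inv @ np.array([u, v, 1.0])
    return p[:2] / p[2]

def ground_distance(p_img, q_img):
    """Distance on the road plane between two image points."""
    return float(np.linalg.norm(ground_point(*p_img) - ground_point(*q_img)))

# Two features on the same (rigid) car keep a constant ground distance
# from frame to frame; features on different cars generally do not.
```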
Their system tracks corner points, identified using a second moment matrix, with a
Kalman filter. Points are grouped using a simple algorithm based on a graph abstraction: each
feature track is a vertex, and edges represent a grouping relationship between the tracks.
When a new feature comes into view, and a track is thereby initiated, it is given an
edge joining it to every feature track that appears nearby in that frame. If, at some future time,
the distance between the points in two connected tracks changes by too much, the edge is discarded.
An exit region is defined near where vehicles will leave the frame. When tracks reach
this exit region, connected components are defined to be vehicles. This grouper is successful,
both in example images and in estimating traffic parameters over long sequences.
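The grouping step can be sketched as follows; the track data, tolerance, and helper name are synthetic illustrations of the idea, not Beymer et al.'s actual code:

```python
import numpy as np
from itertools import combinations

# Feature tracks are vertices; an edge survives only while the distance
# between the two tracks stays (nearly) constant; connected components
# become vehicles. Track data, tolerance, and names are synthetic.
def group_tracks(tracks, tol=0.5):
    """tracks: dict name -> (T, 2) array of positions over T frames."""
    names = list(tracks)
    adj = {n: set() for n in names}
    for a, b in combinations(names, 2):
        d = np.linalg.norm(tracks[a] - tracks[b], axis=1)
        if d.max() - d.min() < tol:          # rigid pair: keep the edge
            adj[a].add(b)
            adj[b].add(a)
    groups, seen = [], set()                 # connected components by DFS
    for n in names:
        if n in seen:
            continue
        comp, stack = set(), [n]
        while stack:
            m = stack.pop()
            if m in comp:
                continue
            comp.add(m)
            seen.add(m)
            stack.extend(adj[m] - comp)
        groups.append(sorted(comp))
    return groups

t = np.linspace(0, 10, 11)[:, None]
tracks = {
    "a": np.hstack([t, 0 * t]),            # car 1, feature 1
    "b": np.hstack([t + 2.0, 0 * t]),      # car 1, feature 2 (rigid with "a")
    "c": np.hstack([2 * t, 5 + 0 * t]),    # car 2, faster, in another lane
}
print(group_tracks(tracks))  # [['a', 'b'], ['c']]
```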
The ground plane to camera transformation can provide a great deal of information;
once an object has been tracked, we can use this transformation to reason about spatial layout
and occlusion.
• Remagnino et al. track vehicles and pedestrians. Pedestrians in coarse-scale images
are represented with a closed B-spline curve whose control points are tracked with a Kalman
filter; the B-spline tracks edge data, using a fairly narrow gate around a set of discrete points
along the spline, and spatial relations are then reconstructed using the ground-plane
homography. The advantage of this approach is that one can engage in explicit occlusion
reasoning, so that even pedestrians partially occluded by a car can be tracked. Another use of
the homography makes it possible to track cars from moving vehicles. In this case, there are
two issues to manage: firstly, the motion of the camera platform (so-called ego-motion); and
secondly, the motion of other vehicles.
• Maybank et al. estimate the ego-motion by matching views of the road to one another
from frame to frame. With an estimate of the homography and of the ego-motion, we can now
refer tracks of other moving vehicles into the road coordinate system to come up with
reconstructions of all vehicles visible on the road from a moving vehicle.
More Related Content

Similar to Unit II & III_uncovered topics.doc notes

An automatic algorithm for object recognition and detection based on asift ke...
An automatic algorithm for object recognition and detection based on asift ke...An automatic algorithm for object recognition and detection based on asift ke...
An automatic algorithm for object recognition and detection based on asift ke...
Kunal Kishor Nirala
 
iCAMPResearchPaper_ObjectRecognition (2)
iCAMPResearchPaper_ObjectRecognition (2)iCAMPResearchPaper_ObjectRecognition (2)
iCAMPResearchPaper_ObjectRecognition (2)
Moniroth Suon
 
Report bep thomas_blanken
Report bep thomas_blankenReport bep thomas_blanken
Report bep thomas_blanken
xepost
 

Similar to Unit II & III_uncovered topics.doc notes (20)

G04743943
G04743943G04743943
G04743943
 
Vehicle Tracking Using Kalman Filter and Features
Vehicle Tracking Using Kalman Filter and FeaturesVehicle Tracking Using Kalman Filter and Features
Vehicle Tracking Using Kalman Filter and Features
 
I010634450
I010634450I010634450
I010634450
 
Performance of Efficient Closed-Form Solution to Comprehensive Frontier Exposure
Performance of Efficient Closed-Form Solution to Comprehensive Frontier ExposurePerformance of Efficient Closed-Form Solution to Comprehensive Frontier Exposure
Performance of Efficient Closed-Form Solution to Comprehensive Frontier Exposure
 
visual realism in geometric modeling
visual realism in geometric modelingvisual realism in geometric modeling
visual realism in geometric modeling
 
Augmented reality session 4
Augmented reality session 4Augmented reality session 4
Augmented reality session 4
 
Practical Digital Image Processing 5
Practical Digital Image Processing 5Practical Digital Image Processing 5
Practical Digital Image Processing 5
 
Linear Image Processing
Linear Image Processing Linear Image Processing
Linear Image Processing
 
An automatic algorithm for object recognition and detection based on asift ke...
An automatic algorithm for object recognition and detection based on asift ke...An automatic algorithm for object recognition and detection based on asift ke...
An automatic algorithm for object recognition and detection based on asift ke...
 
02_atiqa ijaz khan_05_2014
02_atiqa ijaz khan_05_201402_atiqa ijaz khan_05_2014
02_atiqa ijaz khan_05_2014
 
Motion and tracking
Motion and trackingMotion and tracking
Motion and tracking
 
iCAMPResearchPaper_ObjectRecognition (2)
iCAMPResearchPaper_ObjectRecognition (2)iCAMPResearchPaper_ObjectRecognition (2)
iCAMPResearchPaper_ObjectRecognition (2)
 
An Assessment of Image Matching Algorithms in Depth Estimation
An Assessment of Image Matching Algorithms in Depth EstimationAn Assessment of Image Matching Algorithms in Depth Estimation
An Assessment of Image Matching Algorithms in Depth Estimation
 
A Survey on Approaches for Object Tracking
A Survey on Approaches for Object TrackingA Survey on Approaches for Object Tracking
A Survey on Approaches for Object Tracking
 
Object tracking with SURF: ARM-Based platform Implementation
Object tracking with SURF: ARM-Based platform ImplementationObject tracking with SURF: ARM-Based platform Implementation
Object tracking with SURF: ARM-Based platform Implementation
 
Report bep thomas_blanken
Report bep thomas_blankenReport bep thomas_blanken
Report bep thomas_blanken
 
Final Paper
Final PaperFinal Paper
Final Paper
 
427lects
427lects427lects
427lects
 
ROLE OF HYBRID LEVEL SET IN FETAL CONTOUR EXTRACTION
ROLE OF HYBRID LEVEL SET IN FETAL CONTOUR EXTRACTIONROLE OF HYBRID LEVEL SET IN FETAL CONTOUR EXTRACTION
ROLE OF HYBRID LEVEL SET IN FETAL CONTOUR EXTRACTION
 
Vision based non-invasive tool for facial swelling assessment
Vision based non-invasive tool for facial swelling assessment Vision based non-invasive tool for facial swelling assessment
Vision based non-invasive tool for facial swelling assessment
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 

Recently uploaded (20)

The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and Insight
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Navigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern EnterpriseNavigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern Enterprise
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
 
Quantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingQuantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation Computing
 
How to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfHow to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
JavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuideJavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate Guide
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Choreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software EngineeringChoreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software Engineering
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps Productivity
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
API Governance and Monetization - The evolution of API governance
API Governance and Monetization -  The evolution of API governanceAPI Governance and Monetization -  The evolution of API governance
API Governance and Monetization - The evolution of API governance
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 

Unit II & III_uncovered topics.doc notes

  • 1. Computer Vision (21AM504) - Unit II & III Topics not covered in PPT 1 Unit II 1 Affinity measures: Some model of segmentation simply requires a weight to place on each edge of the graph; these weights are usually called affinity measures. Clearly, the affinity measure depends on the problem at hand. The weight of an arc connecting similar nodes should be large, and the weight on an arc connecting very different nodes should be small. It is fairly easy to come up with affinity measures with these properties for a variety of important cases, and we can construct an affinity function for a combination of cues by forming a product of powers of these affinity functions. Example: i) Affinity by Distance Affinity should go down quite sharply with distance, once the distance is over some threshold. One appropriate expression has the form ii) Affinity by Intensity Affinity should be large for similar intensities, and smaller as the difference increases. Again, an exponential form suggests itself, and we can use: iii) Affinity by Colour We need a colour metric to construct a meaningful colour affinity function and an appropriate expression has the form iv) Affinity by Texture The affinity should be large for similar textures and smaller as the difference increases. We adopt a
  • 2. Computer Vision (21AM504) - Unit II & III Topics not covered in PPT 2 collection of filters f1, . . ., fn, and describe textures by the outputs of these filters, which should span a range of scales and orientations. Now for most textures, the filter outputs will not be the same at each point in the texture we use an exponential form: v) Affinity by Motion In the case of motion, the nodes of the graph are going to represent a pixel in a particular image in the sequence. It is difficult to estimate the motion at a particular pixel accurately; instead, it makes sense to construct a distribution over the possible motions. The quality of motion estimate available depends on what the neighbourhood of the pixel looks like. If we define a similarity measure for an image motion v at a pixel x to be We have a measure that will be near one for a good value of the motion and near zero for a poor one. This can be massaged into a probability distribution by ensuring that it somes to one, so we have Now we need to obtain an affinity measure from this. The arcs on the graph will connect pixels that are “nearby” in space and in time. For each pair of pixels, the affinity should be high if the motion pattern around the pixels could look similar, and low otherwise. This suggests using a correlation measure for the affinity. 2 Normalized Cuts An approach to cut the graph into two connected components such that the cost of the cut is a small fraction of the total affinity within each group. We can formalise this as decomposing a weighted graph V into two components A and B, and scoring the decomposition with (where cut(A,B) is the sum of weights of all edges in V that have one end in A and the other in B, and assoc(A, V ) is the sum of weights of all edges that have one end in A). This score will be small if the cut separates two components that have very few edges of low weight between them and many internal edges of high weight. 
We would like to find the cut with the minimum value of this criterion, called a normalized cut.  This problem is too difficult to solve in this form, because we would need to look at
  • 3. Computer Vision (21AM504) - Unit II & III Topics not covered in PPT 3 every graph cut — it’s a combinatorial optimization problem, so we can’t use continuity arguments to reason about how good a neighbouring cut is given the value of a particular cut. 3 Human Vision: Stereopsis Unlike the cameras rigidly attached to a passive stereo rig, the two eyes of a person can rotate in their sockets. At each instant, they fixate on a particular point in space, i.e., they rotate so that its two images form in the centers of the eyes’ foveas. Below Figure illustrates a simplified, two-dimensional situation. If l and r denote the angles between the vertical planes of symmetry of two eyes and two rays passing through the same scene point, we define the corresponding disparity as d = r − l. It is an elementary exercise in trigonometry to show that d = D −F, where D denotes the angle between these rays, and F is the angle between the two rays passing through the fixated point. Points with zero disparity lie on the ViethM¨uller circle that passes through the fixated point and the anterior nodal points of the eyes. Points lying inside this circle have a positive (or convergent) disparity, points lying outside it have, a negative (or divergent) disparity, and the locus of all points having a given disparity d forms, as d varies, the pencil of all circles passing through the two eyes’ nodal points. This property is clearly sufficient to rank-order in depth dots that are near the fixation point. However, it is also clear that the vergence angles between the vertical median plane of symmetry of the head and the two fixation rays must be known in order to reconstruct the absolute position of scene points. 4 Epipolar Geometry
  • 4. Computer Vision (21AM504) - Unit II & III Topics not covered in PPT 4 5 Trinocular Stereo Adding a third camera eliminates (in large part) the ambiguity inherent in two view point matching. In essence, the third image can be used to check hypothetical matches between the first two pictures (as shown in below figure, the three-dimensional point associated with such a match is first reconstructed then reprojected into the third image. If no compatible point lies nearby, then the match must be wrong. In fact, the reconstruction/reprojection process can be avoided by noting, that, given three weakly (and a
  • 5. Computer Vision (21AM504) - Unit II & III Topics not covered in PPT 5 fortiori strongly) calibrated cameras and two images of a point, one can always predict its position in a third image by intersecting the corresponding epipolar lines. The trifocal tensor can be used to also predict the tangent line to some image curve in one image given the corresponding tangents in the other images: given matching tangents l2 and l3 in images 2 and 3, we can reconstruct the tangent l1 in image number 1 is Unit III 1. What is tracking? List and explain the various applications of tracking. Describe tracking people. Tracking: Tracking is the problem of generating an inference about the motion of an object given a sequence of images. Good solutions to this problem have a variety of applications: • Motion Capture: if we can track a moving person accurately, then we can make an accurate record of their motions. Once we have this record, we can use it to drive a rendering process;
  • 6. Computer Vision (21AM504) - Unit II & III Topics not covered in PPT 6 for example, we might control a cartoon character, thousands of virtual extras in a crowd scene, or a virtual stunt avatar. Furthermore, we could modify the motion record to obtain slightly different motions. This means that a single performer can produce sequences they wouldn’t want to do in person. • Recognition From Motion: the motion of objects is quite characteristic. We may be able to determine the identity of the object from its motion; we should be able to tell what it’s doing. • Surveillance: knowing what objects are doing can be very useful. For example, different kinds of trucks should move in different, fixed patterns in an airport; if they do not, then something is going very wrong. Similarly, there are combinations of places and patterns of motions that should never occur (no truck should ever stop on an active runway, say). It could be helpful to have a computer system that can monitor activities and give a warning if it detects a problem case. • Targeting: a significant fraction of the tracking literature is oriented towards (a) deciding what to shoot and (b) hitting it. Typically, this literature describes tracking using radar or infra- red signals (rather than vision), but the basic issues are the same — what do we infer about an object’s future position from a sequence of measurements? (i.e. where should we aim?) In typical tracking problems, we have a model for the object’s motion, and some set of measurements from a sequence of images. These measurements could be the position of some image points, the position and moments of some image regions, or pretty much anything else. They are not guaranteed to be relevant, in the sense that some could come from the object of interest and some might come from other objects, or from noise. Tracking People  People are typically modelled as a collection of body segments, connected with rigid transformations. 
• These segments can be modelled as cylinders — in which case we can ignore the top and bottom of the cylinder and any variation in view, and represent the cylinder as an image rectangle of fixed size — or as ellipsoids.
• The state of the tracker is then given by the rigid-body transformations connecting these body segments (and perhaps various velocities and accelerations associated with them).
• Both particle filters and (variants of) Kalman filters have been used to track people. Each approach can be made to succeed, but neither is particularly robust.

2. Explain the vehicle tracking application in detail.

Vehicle tracking: Systems that can track cars using video from fixed cameras can be used to predict traffic volume and flow; the ideal is to report on, and act to prevent, traffic problems as quickly as possible. A number of systems can track vehicles successfully. The crucial issue is initiating a track automatically.
• Sullivan et al. construct a set of regions of interest (ROIs) in each frame. Because the camera is fixed, these regions of interest can be chosen to span each lane, which means that almost all vehicles must pass directly through a region of interest in a known direction.
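Whatever the application, the core of these trackers is the same predict/update loop: a motion model projects the state forward, and an image measurement corrects the prediction. A minimal sketch of that loop, as a one-dimensional constant-velocity Kalman filter (an illustrative toy with made-up noise levels, not the filter of any particular system cited here):

```python
import numpy as np

# State x = [position, velocity]; we observe position only.
F = np.array([[1.0, 1.0],   # state transition: pos += vel (dt = 1 frame)
              [0.0, 1.0]])
H = np.array([[1.0, 0.0]])  # measurement model

def predict(x, P, q=1e-3):
    """Project the state and its covariance one frame ahead."""
    x = F @ x
    P = F @ P @ F.T + q * np.eye(2)
    return x, P

def update(x, P, z, r=1e-2):
    """Correct the prediction with a noisy position measurement z."""
    S = H @ P @ H.T + r              # innovation covariance (1 x 1)
    K = P @ H.T / S                  # Kalman gain (2 x 1)
    x = x + (K * (z - H @ x)).ravel()
    P = (np.eye(2) - K @ H) @ P
    return x, P

# Track a target moving at 2 units/frame from noisy position measurements.
rng = np.random.default_rng(0)
x, P = np.array([0.0, 0.0]), np.eye(2)
for t in range(1, 50):
    x, P = predict(x, P)
    z = 2.0 * t + 0.1 * rng.standard_normal()
    x, P = update(x, P, z)

print(x)  # estimated [position, velocity]; should be close to [98, 2]
```

The predict step is exactly what lets a tracker like Sullivan et al.'s forecast where and when a vehicle will reach the next ROI, and the accuracy of those forecasts is what their quality score measures.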
Their system then watches for characteristic edge signatures in the ROI that indicate the presence of a vehicle. These signatures can alias slightly — typically, one track is initiated when the front of the vehicle enters the ROI, another when the vehicle lies in the ROI, and a third close to the vehicle's leaving — because some of the vehicle's edges are easily mistaken for others. Each initiated track is followed for a sequence of frames, during which time it accumulates a quality score — essentially, an estimate of the extent to which predictions of future position were accurate. If this quality score is sufficiently high, the track is accepted as a hypothesis. An exclusion region in space and time is constructed around each hypothesis, such that there can be only one track in this region; if two regions overlap, the track with the higher quality is chosen. The requirement that the exclusion regions do not overlap derives from the fact that two cars cannot occupy the same region of space at the same time. Once a track has passed these tests, the position in which, and the time at which, it will pass through another ROI can be predicted. The track is finally confirmed or rejected by comparing this ROI at the appropriate time with a template that predicts the car's appearance. Typically, relatively few of the tracks that are initiated reach this stage.
• An alternative method for initiating car tracks is to track individual features, and then group those tracks into possible cars. Beymer et al. use this strategy rather successfully. Because the road is planar and the camera is fixed, the homography connecting the road plane and the camera can be determined. This homography can be used to determine the distance between points on the road plane, and points can lie together on a car only if this distance doesn't change with time.
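This planarity constraint can be sketched directly: map image points through the road-plane homography and test whether the recovered ground-plane separation of two tracks stays constant over time. The homography, tolerance, and tracks below are invented values for illustration, not data from any calibrated camera:

```python
import numpy as np

def to_road(H, pts):
    """Map N x 2 image points to road-plane coordinates via homography H."""
    p = np.hstack([pts, np.ones((len(pts), 1))])   # homogeneous coordinates
    q = p @ H.T
    return q[:, :2] / q[:, 2:]                     # dehomogenise

def same_vehicle(H, track_a, track_b, tol=0.5):
    """Two feature tracks (T x 2 image positions each) can lie on the same
    rigid vehicle only if their road-plane separation is constant in time."""
    d = np.linalg.norm(to_road(H, track_a) - to_road(H, track_b), axis=1)
    return d.max() - d.min() < tol

# A made-up road-plane homography and its inverse (road -> image direction).
H = np.array([[1.0, 0.2, 5.0],
              [0.0, 1.5, 2.0],
              [0.0, 0.001, 1.0]])
Hinv = np.linalg.inv(H)

# Synthesise tracks: a and b translate rigidly on the road; c moves differently.
road_a = np.array([[t, 10.0] for t in range(5)], dtype=float)
road_b = road_a + [2.0, 0.0]                       # always 2 units from a
road_c = np.array([[2.0 * t + 2.0, 10.0 + 0.5 * t] for t in range(5)])
track_a, track_b, track_c = (to_road(Hinv, r) for r in (road_a, road_b, road_c))

print(same_vehicle(H, track_a, track_b))  # True: separation stays 2.0
print(same_vehicle(H, track_a, track_c))  # False: separation changes
```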
Their system tracks corner points, identified using a second moment matrix, with a Kalman filter. Points are grouped by a simple algorithm built on a graph abstraction: each feature track is a vertex, and edges represent a grouping relationship between tracks. When a new feature comes into view — and a track is thereby initiated — it is given an edge joining it to every feature track that appears nearby in that frame. If, at some future time, the distance between points in two connected tracks changes by too much, the edge is discarded. An exit region is defined near where vehicles leave the frame; when tracks reach this exit region, the connected components are defined to be vehicles. This grouper is successful, both on example images and in estimating traffic parameters over long sequences.

The ground-plane-to-camera transformation can provide a great deal of information; once an object has been tracked, we can use this transformation to reason about spatial layout and occlusion.
• Remagnino et al. track vehicles and pedestrians. Pedestrians in coarse-scale images are represented with a closed B-spline curve whose control points are tracked with a Kalman filter; the B-spline tracks edge data, using a fairly narrow gate around a set of discrete points along the spline, and spatial relations are then reconstructed using the homography. The advantage of this approach is that one can engage in explicit occlusion reasoning, so that even pedestrians partially occluded by a car can be tracked.

Another use of the homography makes it possible to track cars from moving vehicles. In this case, there are two issues to manage: firstly, the motion of the camera platform (so-called ego-motion); and secondly, the motion of the other vehicles.
• Maybank et al. estimate the ego-motion by matching views of the road to one another from frame to frame.
With an estimate of the homography and of the ego-motion, we can then refer the tracks of other moving vehicles into the road coordinate system, to come up with reconstructions of all the vehicles visible on the road from a moving vehicle.
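The grouping step described for Beymer et al.'s system (feature tracks as vertices, edges pruned when the separation changes, vehicles as the surviving connected components) can be sketched as follows; the track data, radius, and tolerance are toy values invented for illustration:

```python
from collections import defaultdict

def connected_components(vertices, edges):
    """Plain depth-first search over an undirected adjacency list."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b); adj[b].add(a)
    seen, groups = set(), []
    for v in vertices:
        if v in seen:
            continue
        stack, comp = [v], set()
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u); seen.add(u)
            stack.extend(adj[u] - comp)
        groups.append(comp)
    return groups

def group_tracks(tracks, init_radius=5.0, rigidity_tol=1.0):
    """tracks: dict name -> list of (x, y) positions, one per frame."""
    names = list(tracks)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            dists = [((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5
                     for (ax, ay), (bx, by) in zip(tracks[a], tracks[b])]
            if dists[0] > init_radius:          # edge only if tracks start nearby
                continue
            if max(dists) - min(dists) > rigidity_tol:  # edge discarded later
                continue
            edges.append((a, b))
    return connected_components(names, edges)

# Two features riding on one car, plus a track that starts nearby but drifts:
tracks = {
    "f1": [(0, 0), (2, 0), (4, 0)],
    "f2": [(3, 0), (5, 0), (7, 0)],    # rigid with f1: always 3 units apart
    "f3": [(1, 1), (1, 2), (1, 3)],    # separation from f1 and f2 changes
}
print(group_tracks(tracks))  # groups f1 with f2; f3 ends up on its own
```

A real system would build and prune edges incrementally as frames arrive, and only read off components in the exit region; batching it as above keeps the sketch short.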