The document discusses hierarchical models for 3D object recognition. It proposes a probabilistic representation called the "Voxel World" that models 3D scenes using volumetric units (voxels), with surface probability learned incrementally and appearance modeled by a mixture of Gaussians. The representation is compact, handles illumination variations, and is learned automatically from multiple images taken with calibrated cameras, without prior assumptions about the model, which makes it well suited to object recognition. The document also reviews classical recognition methods, outlines compositional hierarchy approaches, and presents experimental work as a proof of concept.
5. Compelling Characteristics
POWERFUL GEOMETRIC AND PHOTOMETRIC REPRESENTATION* OF SCENES
✤ It is a 3D, geometric representation that supports discovery of spatial relations
✤ Its appearance is modeled by a mixture of Gaussians (MOG) to handle illumination variations
✤ Appearance and geometry are automatically learned from multiple images with calibrated cameras
✤ It is faithful to the scenes: There are no prior assumptions about the model
THESE CHARACTERISTICS ARE IDEAL FOR OBJECT RECOGNITION
* [Pollard and Mundy, CVPR 2007] [Crispell]
Maria Isabel Restrepo
6. Outline
✤ Volumetric appearance model - The Voxel World
✤ Insights on classical recognition methods
✤ Compositional hierarchies
✤ Bienenstock, Geman, Potter, 97; Geman, Chi, 2002; Geman, Jin, CVPR 2006
✤ Fidler & Leonardis, CVPR’07; Fidler, Boben & Leonardis, CVPR 2008
✤ Mundy & Ozcanli, SPIE ’09
✤ Experimental work: Proof of concept
✤ Future work
7. The Voxel World
Probabilistic representation of 3-D scenes based on volumetric units (voxels).

Surface probability is given by incremental learning:

$$P^{N+1}(X \in S) = P^{N}(X \in S)\,\frac{p^{N}(I_x^{N+1} \mid X \in S)}{p^{N}(I_x^{N+1})}$$

Appearance is modeled as a Mixture of Gaussians:

$$p(I) = \frac{1}{W}\sum_{k=1}^{3}\frac{w_k}{\sqrt{2\pi\sigma_k^{2}}}\,e^{-\frac{(I-\mu_k)^{2}}{2\sigma_k^{2}}}$$

[Figure: plot of p(intensity) against intensity]
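The two formulas on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the marginal likelihood is passed in as a number rather than computed along a camera ray, and the final clamp is an added numerical guard that is not part of the slide.

```python
import math


def mog_density(intensity, weights, means, sigmas):
    """Mixture-of-Gaussians density p(I), normalized by W = sum of the weights."""
    total_weight = sum(weights)
    density = 0.0
    for w, mu, sigma in zip(weights, means, sigmas):
        norm = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
        density += w * norm * math.exp(-((intensity - mu) ** 2) / (2.0 * sigma ** 2))
    return density / total_weight


def update_surface_probability(prior, likelihood_given_surface, marginal_likelihood):
    """One incremental update: P^{N+1}(X in S) = P^N(X in S) * p(I | X in S) / p(I)."""
    posterior = prior * likelihood_given_surface / marginal_likelihood
    # Assumption: clamp to [0, 1] to guard against rounding error.
    return min(max(posterior, 0.0), 1.0)


# Hypothetical voxel with a three-component appearance model (k = 1..3).
weights = [0.6, 0.3, 0.1]
means = [0.2, 0.5, 0.8]
sigmas = [0.05, 0.10, 0.20]

observed = 0.22
likelihood = mog_density(observed, weights, means, sigmas)
# Assumed marginal p(I); in practice it depends on every voxel along the ray.
updated = update_surface_probability(0.3, likelihood, marginal_likelihood=2.0 * likelihood)
```

Each new image contributes one such update per voxel, so a voxel whose appearance model explains the observed intensity better than the ray as a whole gains surface probability, and one that explains it worse loses it.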
8. Outline
✤ Volumetric appearance model - The Voxel World
✤ Insights on classical recognition methods
✤ Compositional hierarchies
✤ Jin & Geman
✤ Fidler & Leonardis, CVPR’07; Fidler, Boben & Leonardis, CVPR 2008
✤ Mundy & Ozcanli, SPIE ’09
✤ Experimental work: Proof of concept
✤ Future work
9. Classical Recognition: Bag of Features
Pipeline: Feature descriptor (e.g. SIFT - Lowe, HOG - Dalal); Codeword, Feature space - Codebook; Classify (e.g. SVM, Naive Bayes, NN)
Drawbacks:
✤ Disregards spatial information
✤ Large number of features are needed
Many have proposed more complex representations of spatial object structure.
✤ Constellation Models [Weber and Welling et al, Fergus et al] - Complex, few parts
✤ Probabilistic voting [Leibe, Schiele] - Large codebook - complex matching
✤ Hierarchical representations
10. Hierarchical Representations
✤ Address the need for a representation that incorporates geometric coherence
✤ Allow for a more efficient representation
✤ Consistent with biological systems
[Figure: overlaid first pages and figures from prior work on hierarchical models. Labels on the slide:]
Sudderth, Torralba, Freeman & Willsky, "Learning Hierarchical Models of Scenes, Objects, and Parts", MIT, 2006
Mikolajczyk, Leibe, Schiele, UK, Switzerland, Germany, 2006
Jin and Geman, Brown University, CVPR 2006
Fidler, Boben, Leonardis, U. of Ljubljana, Slovenia, CVPR 2007, 2008
Chris Williams
11. Prior work by Geman: Efficient Discrimination
[Bienenstock, Geman, Potter, 97], [Geman, Chi, 2002], [Geman, Jin, CVPR 2006]
A COMPOSITIONAL MACHINE:
✤ Probabilistic framework
✤ Hierarchy and reusability
✤ It does not exclude the sharing of subparts
✤ Parts are everywhere, compositions are rare
✤ Need to model relative geometry of parts
[Figure: compositional hierarchy for license-plate reading. Node labels: license plates; license numbers; plate boundary; generic letter, generic number; characters, plate sides; parts of characters and plate sides]
Test set: 385 images, mostly from Logan Airport
Basic structures
Composition vs. Coincidence
Sampling
[Figure: Markovian distribution versus Compositional distribution. Caption from the paper: Figure 3. Samples from Markov backbone (upper panel, ‘4850’) and compositional distribution (lower panel, ‘8502’).]
Detection
Efficient discrimination: Markov versus Content-Sensitive dist.
[Figure panels: Original image; Zoomed license region; Top object under Markov distribution; Top object under content-sensitive distribution. The panels are overlaid on an excerpt of the paper text.]
12. Prior Work by Fidler and Leonardis
[Fidler, Berginc, Leonardis CVPR 2006], [Fidler, Leonardis, CVPR 2007], [Fidler, Boben, Leonardis CVPR 2008]
Compositionality and bottom-up learning
✤ Computational efficiency - Scalable
✤ Bottom-up learning: All classes in early layers, then class specific
✤ Models general and discriminative
✤ Sharing of parts
Have learned complete objects from simple edges
Example of learned whole-object shape models.
Fidler, M. Boben, A. Leonardis. Learning a Hierarchical Compositional Shape Vocabulary for Multi-class Object Representation. Submitted to a journal.
Images from Fidler webpage
13. Work by Mundy and Ozcanli
[Mundy, Ozcanli, SPIE 2009]
Composition of Parts
✤ Combine Geman’s and Leonardis’ work into a unified Bayesian framework
✤ Classification of foreground objects: Vehicles
✤ Domain: Low resolution, satellite images
Figure 6. An example of vehicle extrema operator responses (1, 0.5, 90°, dark). The spatial resolution is around 0.7 meters, with about 25 pixels on a vehicle. The operator response is indicated by the cyan dot. The operator kernel extent is indicated in blue. The original grey scale intensity is in the red channel.
Figure 7. The composition of extrema operators. The anisotropic dark operator is composed with a bright peak operator. The composition is characterized by distance $d_{\alpha\alpha}$ and relative orientation $\theta_{\alpha\alpha}$.
Figure 8. Three primitive extrema operators compose in a Layer 1 node. The central part is (2, 1, -45°, bright), and the second primitive part is (2, 1, -45°, dark). The peak responses of the operators are indicated by cyan pixels. The operator kernel is indicated in blue. The vehicle intensity is in the red channel.
Probabilistic Score:
$p(c^{i}_{\alpha\alpha} \mid d_{\alpha\alpha}, \theta_{\alpha\alpha}) = \frac{p(d_{\alpha\alpha}, \theta_{\alpha\alpha} \mid c^{i}_{\alpha\alpha})\,P(c^{i}_{\alpha\alpha})}{p(d_{\alpha\alpha}, \theta_{\alpha\alpha})}$
$p(d_{\alpha\alpha}, \theta_{\alpha\alpha}) = p(d_{\alpha\alpha}, \theta_{\alpha\alpha} \mid \bar{c}_{\alpha\alpha})\,P(\bar{c}_{\alpha\alpha}) + \sum_{j=0}^{k-1} p(d_{\alpha\alpha}, \theta_{\alpha\alpha} \mid c^{j}_{\alpha\alpha})\,P(c^{j}_{\alpha\alpha})$
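The probabilistic score is an application of Bayes' rule with one background (non-class) term in the evidence. A minimal Python sketch follows; the function name and argument layout are hypothetical, and the likelihood values in the example are made-up numbers, not results from the paper.

```python
def composition_posteriors(class_likelihoods, class_priors,
                           background_likelihood, background_prior):
    """Posterior of each class c^j given an observed distance/orientation pair.

    class_likelihoods[j] stands for p(d, theta | c^j) and class_priors[j] for
    P(c^j); the background arguments stand for p(d, theta | c-bar) and P(c-bar).
    """
    # Evidence p(d, theta): background term plus the sum over the k classes.
    evidence = background_likelihood * background_prior
    evidence += sum(l * p for l, p in zip(class_likelihoods, class_priors))
    return [l * p / evidence for l, p in zip(class_likelihoods, class_priors)]


# Example with two classes and illustrative numbers only.
posteriors = composition_posteriors([0.6, 0.2], [0.3, 0.2], 0.1, 0.5)
```

The posteriors of the classes do not sum to one on their own; the remainder is the posterior mass assigned to the background hypothesis.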
14. Hierarchical Composition for 3D Objects
Buildings, streets, trees, rivers...
Windows, street lines, roofs, leaves...
Junctions, curves...
Simple primitives, e.g. edges
Learn bottom-up
15. Outline
✤ Volumetric appearance model - The Voxel World
✤ Insights on classical recognition methods
✤ Compositional hierarchies
✤ Jin & Geman
✤ Fidler & Leonardis, CVPR’07; Fidler, Boben & Leonardis, CVPR 2008
✤ Mundy & Ozcanli, SPIE ’09
✤ Proof of concept: Construction of a simple hierarchy to find
windows in the voxel world
✤ Future Work
16. Data and Algorithm
Appearance simplification: $\min D_{KL}(f(x) \,\|\, \tilde{f}(x))$ or $f_1(x)$
$f(x) = \sum_{k=1}^{K} w_k f_k(x)$, $\tilde{f}(x) \sim N(\tilde{\mu}_f, \tilde{\sigma}_f^2)$
[Figure] Top: Mean appearance near wall surface. Bottom: occupancy
Algorithm Steps
1. For each orientation
✤ Apply corner kernel on appearance and occupancy grids
✤ Perform non-maxima suppression on kernel-specific region
2. Build a hierarchy to find windows
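For the mixture-to-Gaussian simplification on this slide, the single Gaussian that minimizes the KL divergence from the mixture to the Gaussian is the one that matches the mixture's mean and variance. The sketch below illustrates that moment-matching step; the function name and the `(weight, mean, sigma)` tuple layout are assumptions for the example, and it is not claimed to be the code used in the experiments.

```python
def collapse_mixture(components):
    """Moment-match a 1-D Gaussian mixture with a single Gaussian.

    components: list of (weight, mean, sigma) tuples (hypothetical layout).
    Returns (mean, variance) of the Gaussian minimizing D_KL(mixture || Gaussian).
    """
    total_weight = sum(w for w, _, _ in components)
    mean = sum((w / total_weight) * mu for w, mu, _ in components)
    # Second moment of the mixture: sum of w_k * (sigma_k^2 + mu_k^2).
    second_moment = sum((w / total_weight) * (sigma ** 2 + mu ** 2)
                        for w, mu, sigma in components)
    return mean, second_moment - mean ** 2


# Example: two equally weighted unit-variance components centered at 0 and 2.
mean, variance = collapse_mixture([(0.5, 0.0, 1.0), (0.5, 2.0, 1.0)])
```

The alternative named on the slide, using only the first component $f_1(x)$, avoids this computation at the cost of ignoring the other modes.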
17. The Primitives: Corner Kernel
Corner kernel in 2D: Every pixel has a label/weight
Corner kernel in 3D: Every voxel has a label/weight
[Figure: corner kernel with dimensions WIDTH, HEIGHT, DEPTH and two labeled regions, PLUS (+) REGION and MINUS (-) REGION]
18. The Primitives: Corner Kernel
PLUS (+) REGION - WHITE VOXELS
MINUS (-) REGION - BLACK VOXELS
19. The Primitives: Corner Kernel
Rotate kernel to create layer of primitives
[Figure: coordinate system of a corner kernel, with axes x, y, z and rotation angles φ, θ, ψ]
Layer 1: Primitives
3D Corners
21. Applying the Kernel
“Convolve” kernel with
appearance grid
22. Operator Response and Simplifications
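$I_{x_i}$: Intensity at voxel $x_i$
$K$: Kernel response
$K = \sum_{i: x_i \in R^+} I_{x_i} - \sum_{j: x_j \in R^-} I_{x_j}$
Distribution of the response: $K \sim N(\mu_k, \sigma_k^2)$
$\mu_k = \sum_{i: x_i \in R^+} \mu_{x_i} - \sum_{j: x_j \in R^-} \mu_{x_j}$
$\sigma_k^2 = \sum_{i: x_i \in R^+} \sigma_{x_i}^2 + \sum_{j: x_j \in R^-} \sigma_{x_j}^2$
This may be the first feature detector based on the spatial arrangement of appearance distributions
kernel response $= r_\alpha = \begin{cases} \mu_k, & \frac{1}{|R^+|} \sum_{i: x_i \in R^+} P(x_i \in S) > t \text{ and } \mu_k > 0 \\ 0, & \text{otherwise} \end{cases}$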
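The response computation on this slide can be sketched as follows. This is a simplified illustration, not the original implementation: the function name and the `(mean, variance, surface_probability)` tuple layout are hypothetical, each voxel is summarized by a single Gaussian, and both gating tests are assumed to use strict greater-than comparisons.

```python
def kernel_response(plus_voxels, minus_voxels, threshold):
    """Response of a plus/minus kernel placed on a grid of appearance distributions.

    Each voxel is a (mean, variance, surface_probability) tuple (hypothetical
    layout). Returns (response, mu_k, var_k).
    """
    # Mean of K: sum of means in R+ minus sum of means in R-.
    mu_k = sum(m for m, _, _ in plus_voxels) - sum(m for m, _, _ in minus_voxels)
    # Variance of K: variances add for a difference of independent Gaussians.
    var_k = sum(v for _, v, _ in plus_voxels) + sum(v for _, v, _ in minus_voxels)
    # Gate on the average surface probability over the plus region.
    mean_surface = sum(p for _, _, p in plus_voxels) / len(plus_voxels)
    response = mu_k if (mean_surface > threshold and mu_k > 0) else 0.0
    return response, mu_k, var_k


# Example: bright, likely-surface voxels in R+ against dark voxels in R-.
plus = [(0.8, 0.01, 0.9), (0.7, 0.02, 0.7)]
minus = [(0.2, 0.01, 0.1), (0.3, 0.03, 0.2)]
response, mu_k, var_k = kernel_response(plus, minus, threshold=0.5)
```

Treating the voxel intensities as independent is what lets the variances add; the slide's formula for $\sigma_k^2$ makes the same assumption implicitly.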
23. Experiment Setup:
1. Demonstrate Hierarchy on a small region
2. Show some results on the full grid
Experimental hierarchy
Object Layer: Window
Layer 3: Triplets of corners
Layer 2: Pairs of corners
Layer 1: Corner primitives
24. Algorithm Steps
Algorithm Steps
1. For each orientation
✤ Run a corner kernel
25. Layer 1: Simple Features
Algorithm Steps
1. For each orientation
✤ Run a corner kernel
✤ Perform non-maxima suppression
on kernel-specific region
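The non-maxima suppression step can be illustrated with a small sketch. The slides do not specify the shape of the kernel-specific region, so the code below assumes a cubic neighborhood of a given radius; the function name and the sparse dictionary representation of the response grid are also hypothetical.

```python
from itertools import product


def non_maxima_suppression(responses, radius=1):
    """Keep only voxels whose response is a local maximum.

    responses: dict mapping (x, y, z) to a kernel response; missing voxels
    count as 0. A voxel is kept if its response is positive and no neighbor
    within the cubic neighborhood has a strictly larger response.
    """
    offsets = [o for o in product(range(-radius, radius + 1), repeat=3)
               if o != (0, 0, 0)]
    kept = {}
    for (x, y, z), value in responses.items():
        if value <= 0:
            continue
        if all(value >= responses.get((x + dx, y + dy, z + dz), 0.0)
               for dx, dy, dz in offsets):
            kept[(x, y, z)] = value
    return kept


# Example: the weaker of two adjacent responses is suppressed.
grid = {(0, 0, 0): 1.0, (1, 0, 0): 2.0, (5, 5, 5): 0.5}
peaks = non_maxima_suppression(grid)
```

With this tie rule, two neighbors with exactly equal responses are both kept; a stricter rule would need an explicit tie-break.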
32. Summary
✤ Appealing characteristics of The Voxel World and Compositional Hierarchies
✤ Introduced volumetric feature detectors that operate on distribution functions of appearance
✤ Demonstrated, using a very simple instance of a compositional hierarchy, the efficiency of such a representation.
✤ Localized a large number of windows
33. Future Work
✤ Include other extrema operators in the hierarchy (e.g. edges)
✤ Use occupancy information
✤ Learn prior distributions to fully explain probability density of compositions
✤ Optimize source code: Search and storage of parts (e.g. octree)
✤ Learn parts automatically
✤ Learn whole-object hierarchies
34. The Principle of Compositionality
The meaning of a complex expression is determined by
its structure and the meanings of its constituents.
Stanford Encyclopedia of Philosophy
Questions?