This document discusses analyzing 3D human motion from 2D images. It covers:
1) The difficulties in inferring 3D information from 2D images, including depth ambiguities, self-occlusions, and data association challenges.
2) Two main approaches to modeling 3D human motion - generative/alignment-based methods that optimize the alignment of a body model with image features, and discriminative/predictive methods that predict state distributions directly from images.
3) Key techniques for temporal inference in tracking human motion over time, including generative methods like particle filtering and discriminative conditional models.
The document discusses understanding visual scenes through object recognition and scene recognition techniques. It provides an overview of early approaches to object recognition using templates, blocks worlds and generalized cylinders. It then discusses more modern techniques using histograms of oriented gradients, deformable parts models, and convolutional neural networks. The document notes that scenes can be understood through scene-level properties like spatial layout and emergent features rather than individual objects. Global image descriptors like GIST and bag-of-words models are discussed as alternatives to object-based recognition for scene classification. Large-scale datasets with thousands of scene categories are also mentioned.
This document discusses representation in low-level visual learning. It argues that most visual learning has used overly simplified models and that representation matters. It then reviews several approaches for tasks like image segmentation, optical flow estimation, and motion modeling that use more sophisticated representations such as Markov random fields, probabilistic graphical models, and layered models that explicitly model occlusion. The document argues that these approaches, which incorporate spatial context and multi-scale representations, have achieved better performance than early simplified models on those tasks.
This document summarizes scattering in computer graphics and computer vision, including:
- Types of scattering such as diffuse reflection, specular reflection, BRDF, subsurface scattering, single scattering, and multiple scattering.
- Models for subsurface scattering including diffuse approximation, plane-parallel approximation, and Donner's empirical BSSRDF model.
- Techniques for measuring scattering properties like BRDF and rendering effects of scattering in participating media and subsurface scattering.
This document summarizes Toru Tamaki's presentation on scattering in computer graphics (CG) and computer vision (CV). It discusses reflection models including diffuse/specular reflection and bidirectional reflectance distribution functions (BRDFs). It also covers subsurface scattering within materials, models for subsurface scattering including diffuse approximation and plane-parallel approximation, and measuring scattering properties including single and multiple scattering. Examples of subsurface scattering rendering from past CG papers are shown.
Machine learning methods are increasingly being used for computer vision tasks like object recognition. Bag-of-words models represent images as collections of local features or "visual words" to classify images based on their visual content. These models first detect local features, then cluster them into visual word codebooks. Images are then represented as histograms of visual word frequencies. Naive Bayes and hierarchical Bayesian models like pLSA and LDA can then be used for classification or topic modeling based on these image representations. While intuitive and computationally efficient, bag-of-words models lack explicit geometric information about object parts and have not been extensively tested for viewpoint and scale invariance.
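The detect-cluster-histogram pipeline described above can be sketched compactly. In this illustrative version, synthetic 2-D points stand in for local descriptors (e.g. SIFT), and `scipy.cluster.vq` supplies the k-means codebook and nearest-codeword assignment:

```python
import numpy as np
from scipy.cluster.vq import kmeans2, vq

rng = np.random.default_rng(0)
# Stand-in for local descriptors pooled from a training set:
# three synthetic descriptor clusters in a 2-D feature space.
train_desc = np.vstack([rng.normal(c, 0.1, size=(50, 2)) for c in (0.0, 1.0, 2.0)])

# 1. Cluster descriptors into a visual-word codebook.
codebook, _ = kmeans2(train_desc, k=3, seed=1, minit='++')

def bow_histogram(descriptors, codebook):
    """Represent an image as a normalized histogram of visual-word counts."""
    words, _ = vq(descriptors, codebook)          # nearest codeword per descriptor
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# 2. An "image" whose descriptors all fall near one cluster should
#    concentrate its histogram mass on a single visual word.
image_desc = rng.normal(2.0, 0.1, size=(30, 2))
hist = bow_histogram(image_desc, codebook)
```

The resulting histograms are exactly the representations fed to Naive Bayes, pLSA, or LDA in the models discussed above; note how the histogram discards all spatial arrangement, which is the geometric limitation the summary points out.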
This doctoral dissertation examines facial skin motion properties from video for modeling and applications. It presents two methods for computing strain patterns from video: a finite difference method and a finite element method. The finite element method incorporates material properties of facial tissues by modeling their Young's modulus values. Experiments show strain patterns are discriminative and stable features for facial expression recognition, age estimation, and person identification. The dissertation also develops a method for expression invariant face matching by modeling Young's modulus from multiple expressions.
Retrieving Information from Satellite Images by Detecting and Removing Shadow (IJTET Journal)
In accordance with the characteristics of remote sensing images, we put forward a color-intensity method of shadow detection and removal. Some approaches for shadow detection and removal use particular color and spectral properties of shadows. In this method, the color planes of the input satellite image are computed and the R, G, and B values are separated. The chromaticity is then calculated to determine the average value of the segmented region. The Color Intensity algorithm is adopted to remove the shadow and retrieve the corresponding information.
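The intuition behind chromaticity-based shadow detection can be sketched as follows: a shadow darkens a surface without changing its color much, so a pixel whose intensity is low but whose intensity-normalized color matches its surroundings is likely shadow. This is a rough illustration of the idea, not the paper's algorithm; the function name and thresholds are made up:

```python
import numpy as np

def detect_shadow_mask(rgb, intensity_frac=0.5):
    """Rough chromaticity-based shadow mask: flag pixels that are much
    darker than the image mean while keeping a similar chromaticity."""
    rgb = rgb.astype(float)
    intensity = rgb.sum(axis=-1)                           # R + G + B per pixel
    chroma = rgb / np.maximum(intensity[..., None], 1e-6)  # per-channel chromaticity
    mean_chroma = chroma.reshape(-1, 3).mean(axis=0)
    chroma_dist = np.linalg.norm(chroma - mean_chroma, axis=-1)
    dark = intensity < intensity_frac * intensity.mean()
    similar_color = chroma_dist < 0.2                      # illustrative threshold
    return dark & similar_color

# Synthetic 4x4 gray image with a darkened (shadowed) 2x2 corner.
img = np.full((4, 4, 3), 200, dtype=np.uint8)
img[:2, :2] = 80                                           # same hue, lower intensity
mask = detect_shadow_mask(img)
```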
Background subtraction is a technique used to separate foreground objects from backgrounds in video frames. It works by comparing each frame to a background model and detecting differences which indicate moving foreground objects. Recursive techniques like mixtures of Gaussians model the background pixel values over time using multiple Gaussian distributions, allowing the background model to adapt to changing lighting conditions. Adaptive background/foreground detection uses a background model that evolves over time to distinguish foreground objects from the background in a robust way.
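The recursive background-model update described above can be sketched with a heavily simplified version: one adaptive Gaussian per pixel instead of a full mixture. The class name, learning rate, and thresholds below are illustrative, but the recursive mean/variance update is the mechanism that lets the model absorb gradual lighting changes:

```python
import numpy as np

class RunningGaussianBackground:
    """Simplified per-pixel background model: one adaptive Gaussian per pixel.

    A stand-in for the mixture-of-Gaussians approach; a full MoG model keeps
    several weighted Gaussians per pixel, but the recursive update is the same.
    """

    def __init__(self, first_frame, alpha=0.05, k=2.5):
        self.mean = first_frame.astype(float)
        self.var = np.full(first_frame.shape, 15.0 ** 2)
        self.alpha, self.k = alpha, k

    def apply(self, frame):
        frame = frame.astype(float)
        diff = frame - self.mean
        foreground = diff ** 2 > (self.k ** 2) * self.var
        # Recursive update lets the model adapt to changing lighting.
        self.mean += self.alpha * diff
        self.var += self.alpha * (diff ** 2 - self.var)
        return foreground

bg = RunningGaussianBackground(np.full((4, 4), 100.0))
for _ in range(20):                      # static scene: the model settles
    bg.apply(np.full((4, 4), 100.0))
frame = np.full((4, 4), 100.0)
frame[1, 1] = 250.0                      # a bright moving object enters
mask = bg.apply(frame)
```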
Video stabilization is a process that smooths shaky camera motion in videos. It works by estimating and compensating for background image motion caused by camera movement. There are different algorithms used depending on the type of scene and motion. Feature-based methods extract and match features between frames to model global motion, while flow-based methods use optical flow. The stabilization process involves two phases: motion estimation and motion smoothing.
Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform (Fadwa Fouad)
This document provides an overview of a Masters thesis that proposes algorithms for human action recognition. It begins with an introduction that discusses the importance of human action recognition, challenges in the field, and differences between actions and activities. It then presents an agenda that outlines an introduction, overview, and details of two proposed algorithms: 2DHOOF/2DPCA contour-based optical flow and human gesture recognition using Radon transform/2DPCA. The overview section describes the general structure of action recognition systems from video capture to classification. Experimental results on benchmark datasets demonstrate the effectiveness of the proposed algorithms.
This document discusses compressive displays and related technologies for reducing the bandwidth requirements of multi-view and light field displays. It describes several technologies including layered 3D displays, polarization field displays, and high-rank 3D displays that decompose 4D light fields into lower dimensional representations. It also discusses using mathematical techniques like non-negative matrix factorization for further compressing display data. The document promotes open collaboration through the proposed Compressive Display Consortium to advance next generation displays.
Application of feature point matching to video stabilization (Nikhil Prathapani)
One of the significant applications of computer vision is stabilizing video that was captured from a jittery or moving platform. One way to stabilize a video is to track a prominent feature in the image and use it as an anchor point to cancel out all perturbations relative to it. This technique, however, must be bootstrapped with knowledge of where such a salient feature lies in the first video frame. The paper presents a method of video stabilization that works without any such prior knowledge. The method builds on Random Sample Consensus (RANSAC) with a few additions to existing methodologies. It automatically searches for the "background plane" in a video sequence and uses its observed distortion to correct for camera motion. All simulations were performed in MATLAB.
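The RANSAC step in such a pipeline can be illustrated on the simplest motion model, a global 2-D translation between matched feature points: repeatedly hypothesize a shift from a minimal sample, keep the hypothesis with the most inliers, then refit on the consensus set. This is a generic RANSAC sketch, not the paper's implementation:

```python
import numpy as np

def ransac_translation(src, dst, iters=200, thresh=2.0, rng=None):
    """Estimate a global 2-D translation between matched feature points
    with RANSAC, tolerating outlier matches (e.g. foreground motion).

    `src` and `dst` are (N, 2) arrays of matched point coordinates.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    best_shift, best_inliers = None, -1
    for _ in range(iters):
        i = rng.integers(len(src))            # minimal sample: one match
        shift = dst[i] - src[i]
        err = np.linalg.norm(dst - (src + shift), axis=1)
        inliers = int((err < thresh).sum())
        if inliers > best_inliers:
            best_shift, best_inliers = shift, inliers
    # Refit on the consensus set for a least-squares estimate.
    err = np.linalg.norm(dst - (src + best_shift), axis=1)
    mask = err < thresh
    return (dst[mask] - src[mask]).mean(axis=0)

rng = np.random.default_rng(1)
src = rng.uniform(0, 100, size=(40, 2))
dst = src + np.array([3.0, -2.0])             # true camera shift
dst[:8] += rng.uniform(20, 40, size=(8, 2))   # 20% outlier matches
shift = ransac_translation(src, dst)
```

Real stabilizers estimate an affine or homography model the same way; the inlier set then corresponds to the dominant "background plane" the paper searches for.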
This document discusses various feature detectors used in computer vision. It begins by describing classic detectors such as the Harris detector and Hessian detector that search scale space to find distinguished locations. It then discusses detecting features at multiple scales using the Laplacian of Gaussian and determinant of Hessian. The document also covers affine covariant detectors such as maximally stable extremal regions and affine shape adaptation. It discusses approaches for speeding up detection using approximations like those in SURF and learning to emulate detectors. Finally, it outlines new developments in feature detection.
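The Harris detector mentioned first above is compact enough to sketch directly: form the Gaussian-smoothed second-moment matrix of the image gradients and score each pixel with R = det(M) - k·trace(M)², which is large only where the gradient varies in two directions at once. A minimal numpy/scipy version (parameters illustrative):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def harris_response(img, sigma=1.0, k=0.05):
    """Harris corner response R = det(M) - k * trace(M)^2, where M is the
    Gaussian-smoothed second-moment matrix of the image gradients."""
    img = img.astype(float)
    ix = sobel(img, axis=1)                       # horizontal gradient
    iy = sobel(img, axis=0)                       # vertical gradient
    ixx = gaussian_filter(ix * ix, sigma)
    iyy = gaussian_filter(iy * iy, sigma)
    ixy = gaussian_filter(ix * iy, sigma)
    det = ixx * iyy - ixy ** 2
    trace = ixx + iyy
    return det - k * trace ** 2

# A white square on black: the strongest responses sit at its corners,
# since edges give det(M) near zero and flat regions give zero response.
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0
r = harris_response(img)
peak = np.unravel_index(np.argmax(r), r.shape)
```

The multi-scale detectors the document goes on to discuss apply the same idea while also searching over sigma (scale space).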
Yann LeCun discusses the architecture of convolutional neural networks and feature learning. He argues that most recognition systems use a similar architecture involving a filter bank, nonlinearity, pooling, and normalization. Convolutional networks directly learn these features through multiple stacked stages, rather than hand-engineering them. LeCun presents results showing convolutional networks outperform traditional systems on Caltech 101 by learning hierarchical representations through multiple stages with harsh nonlinearities, contrast normalization, and sparsity constraints.
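The shared architecture LeCun describes (filter bank, nonlinearity, pooling, normalization) can be written as one composable stage. This toy numpy version uses hand-picked 2x2 edge filters where a convolutional network would learn the filter bank; everything here is illustrative:

```python
import numpy as np

def conv2d_valid(img, kern):
    """Naive 2-D 'valid' correlation: the filter-bank building block."""
    kh, kw = kern.shape
    h, w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = (img[i:i + kh, j:j + kw] * kern).sum()
    return out

def recognition_stage(img, filters, pool=2):
    """One stage of the generic architecture:
    filter bank -> rectifying nonlinearity -> max pooling -> normalization."""
    maps = [np.abs(conv2d_valid(img, f)) for f in filters]       # filter + rectify
    h, w = maps[0].shape
    h, w = h - h % pool, w - w % pool                             # crop to pool grid
    pooled = [m[:h, :w].reshape(h // pool, pool, w // pool, pool).max(axis=(1, 3))
              for m in maps]
    stack = np.stack(pooled)
    return stack / (np.linalg.norm(stack) + 1e-8)                 # normalization

# Two tiny oriented edge filters stand in for a learned filter bank.
filters = [np.array([[1., -1.], [1., -1.]]),                      # vertical edges
           np.array([[1., 1.], [-1., -1.]])]                      # horizontal edges
img = np.random.default_rng(0).normal(size=(12, 12))
out = recognition_stage(img, filters)
```

Stacking several such stages, with the filters learned rather than fixed, gives the hierarchical representations the talk credits for the Caltech 101 results.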
1) The document discusses visual servoing in the presence of non-rigid objects. Classical visual servoing algorithms assume rigid objects and environments but this is often not valid.
2) The paper proposes an algorithm that extracts invariant features of the non-rigid object from a sequence of images to represent its state. These features are then used in the visual servoing process to position the robot end-effector despite the non-rigid motion.
3) The algorithm is tested using simulated non-rigid motion and is shown to generate a stable camera trajectory, unlike approaches that rely on features from single images.
C. Guyon, T. Bouwmans, E. Zahzah, “Foreground detection based on low-rank and block-sparse matrix decomposition”, IEEE International Conference on Image Processing, ICIP 2012, pages 1225-1228, Orlando, Florida, USA, September 2012.
Olga Senyukova - Machine Learning Applications in Medicine (Olsen222)
How can beautiful algorithmic findings be helpful in our everyday life? One of the answers to this question lies in the area of healthcare applications. Nowadays machine learning methods are becoming more and more useful in medicine. They are able not only to assist medical specialists in processing large amounts of data, but also to help in diagnostics and patient follow-up.
This course is devoted to the discussion of some interesting applications of machine learning methods to automatically analyse medical images and physiologic signals. Medical images acquired by means of special equipment represent internal structures of the human body and/or processes in it. The most modern technologies for acquisition of such images are magnetic resonance imaging (MRI) and computed tomography (CT). Physiologic signals usually refer to cardiologic time series such as electrocardiograms (ECG), but can also represent other physiological data, for example, stride intervals of human gait.
Several important problems will be highlighted along with successful solutions involving machine learning methods, including examples both from worldwide practice and the author's own research. Description of the basic principles of the algorithms used will provide a good opportunity to strengthen the knowledge acquired from the other courses of the school.
Presentation at International Advanced School on Knowledge Co-creation and Service Innovation 2012, Japan Advanced Institute of Science and Technology, March 1
Accelerating Optical Quadrature Microscopy Using GPUs (Perhaad Mistry)
1) Phase unwrapping is used in optical quadrature microscopy to determine viability of embryos by counting cells after unwrapping. It needs to be done at near real-time speeds to analyze sample changes.
2) The paper implements minimum Lp-norm phase unwrapping and affine transformations on a GPU to improve performance and reduce latency for optical microscopy research.
3) Performance results show a 5.24x speedup for total phase unwrapping time compared to a serial CPU implementation. Further optimizations like multi-GPU support could improve speeds for higher image acquisition rates.
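What "phase unwrapping" means can be shown in one dimension: measured phase is only known modulo 2π, and unwrapping restores the continuous signal by compensating jumps larger than π between neighbors (Itoh's method). The paper above solves the much harder 2-D minimum Lp-norm formulation on the GPU; this 1-D sketch only illustrates the problem:

```python
import numpy as np

def unwrap_1d(phase):
    """Itoh's 1-D phase unwrapping: whenever the wrapped phase jumps by
    more than pi between neighbors, add the compensating multiple of 2*pi."""
    out = np.array(phase, dtype=float)
    for i in range(1, len(out)):
        d = out[i] - out[i - 1]
        out[i:] -= 2 * np.pi * np.round(d / (2 * np.pi))
    return out

true_phase = np.linspace(0, 6 * np.pi, 50)           # steadily growing phase
wrapped = np.angle(np.exp(1j * true_phase))          # wrapped into (-pi, pi]
recovered = unwrap_1d(wrapped)
```

(numpy ships this as `np.unwrap`; in 2-D, noise makes naive path-following fail, which is why norm-minimizing formulations are used.)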
Shearlet Frames and Optimally Sparse Approximations (Jakob Lemvig)
Shearlet theory has become a central tool in analyzing and representing 2D and 3D data with anisotropic features. Shearlet systems are systems of functions generated by a single generator with parabolic scaling, shearing, and translation operators applied to it, in much the same way wavelet systems are dyadic scalings and translations of a single function, but including a directional parameter. The success of shearlets stems from an extensive list of desirable properties: shearlet systems can be generated by one function, they provide precise resolution of wavefront sets, they allow compactly supported analyzing elements, they are associated with fast decomposition algorithms, and they provide a unified treatment of the continuum and the digital world.
This talk gives an introduction to shearlet theory with a focus on separable and compactly supported shearlets in 2D and 3D. We will consider constructions of band-limited and compactly supported shearlet frames in those dimensions. Finally, we will show that compactly supported shearlet frames satisfying weak decay, smoothness, and directional moment conditions provide optimally sparse approximations of a generalized model of cartoon-like images consisting of piecewise C² functions that are smooth apart from piecewise C² discontinuity edges.
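For the 2D case, the system sketched above can be written out explicitly; the notation below follows the standard shearlet literature rather than the talk's slides:

```latex
% Parabolic scaling and shearing in 2D:
A_a = \begin{pmatrix} a & 0 \\ 0 & \sqrt{a} \end{pmatrix}, \qquad
S_s = \begin{pmatrix} 1 & s \\ 0 & 1 \end{pmatrix}, \qquad a > 0,\ s \in \mathbb{R}.

% The shearlet system generated by a single function \psi:
\mathrm{SH}(\psi) = \left\{ \psi_{a,s,t}(x) = a^{-3/4}\,
  \psi\!\left(A_a^{-1} S_s^{-1}(x - t)\right) : a > 0,\ s \in \mathbb{R},\ t \in \mathbb{R}^2 \right\}.

% Optimal sparse approximation of cartoon-like images f
% (f_N denotes the best N-term shearlet approximation):
\| f - f_N \|_2^2 \le C \, N^{-2} (\log N)^3 \quad \text{as } N \to \infty.
```

The N⁻²(log N)³ rate is what "optimally sparse" refers to: up to the log factor, no dictionary can do better on the cartoon-image class.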
This talk is based on joint work with G. Kutyniok and W.-Q Lim (TU Berlin).
The document summarizes several popular iris recognition algorithms: Daugman, Li Ma, Wildes, and Tisse. It describes the key steps and approaches for each algorithm: segmentation, normalization, feature extraction, and matching. It finds Daugman's algorithm to be the most accurate according to tests on the CASIA iris image database, with 0.01/0.09 FAR/FRR and 99.90% accuracy. The document provides references for further reading on iris recognition and the algorithms discussed.
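The matching step in Daugman-style iris recognition is a normalized Hamming distance between binary iris codes, counting only bits valid (unoccluded) in both masks. A minimal sketch with synthetic codes (the 2048-bit length and noise level are illustrative):

```python
import numpy as np

def hamming_distance(code_a, code_b, mask_a, mask_b):
    """Normalized Hamming distance between two binary iris codes,
    restricted to bits valid in both occlusion masks."""
    valid = mask_a & mask_b
    disagreements = (code_a ^ code_b) & valid
    return disagreements.sum() / valid.sum()

rng = np.random.default_rng(0)
code = rng.integers(0, 2, 2048, dtype=np.uint8).astype(bool)
mask = np.ones(2048, dtype=bool)

noisy = code.copy()
flip = rng.choice(2048, size=100, replace=False)   # ~5% bit noise: same eye
noisy[flip] ^= True
impostor = rng.integers(0, 2, 2048, dtype=np.uint8).astype(bool)

same_eye = hamming_distance(code, noisy, mask, mask)       # low distance
different = hamming_distance(code, impostor, mask, mask)   # near 0.5
```

Two codes from different eyes are statistically independent, so their distance clusters tightly around 0.5; a threshold well below that separates genuine matches from impostors, which is what drives the low FAR/FRR figures cited above.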
Recent Advances in Object-based Change Detection (grssieee)
This document summarizes recent advances in object-based change detection methods. It presents a new multiresolution segmentation approach adapted for change detection that segments images independently but checks for consistency between segments. Object correspondence is established through directed or intersected objects. Change detection is performed using IR-MAD. The methods were tested on aerial imagery and showed good overall accuracy rates above 98%, demonstrating the promise of object-based change detection techniques. Further developments are needed to improve consistency tests, segmentation parameter selection, and implementation.
SIGGRAPH ASIA 2012 Stereoscopic Cloning Presentation Slide (I-Chao Shen)
This is the presentation slide of the paper: Perspective-Aware Warping for Seamless Stereoscopic Image Cloning.
It was made by Sheng-Jie Luo, National Taiwan University.
Please refer to http://www.cmlab.csie.ntu.edu.tw/~forestking/research/SIGA12-StereoCloning/ for more detail.
This document discusses the process of web crawling and building a web crawler. It describes how crawlers work by starting with a set of URLs, fetching web pages from those URLs, extracting new URLs from the pages, adding them to the list to crawl, and repeating the process. It also discusses important considerations for building large-scale crawlers, such as handling concurrent requests efficiently, avoiding duplicate URLs, managing server load, and storing crawled pages reliably at a large scale.
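The fetch-extract-enqueue loop described above fits in a short function. In this sketch the network is injected as a `fetch` callable so the loop is testable; a real crawler would download pages politely (robots.txt, rate limits) and deduplicate by content as well as by URL:

```python
from collections import deque
from urllib.parse import urljoin

def crawl(seed_urls, fetch, max_pages=100):
    """BFS web-crawl skeleton. `fetch(url)` returns (page content, [hrefs]);
    returns a dict mapping each crawled URL to its page content."""
    frontier = deque(seed_urls)
    seen = set(seed_urls)          # avoid re-enqueueing duplicate URLs
    pages = {}
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        html, links = fetch(url)
        pages[url] = html
        for href in links:
            absolute = urljoin(url, href)   # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return pages

# A fake three-page site stands in for the network.
site = {
    "http://example.com/":  ("<a href=a>", ["a"]),
    "http://example.com/a": ("<a href=b>", ["b"]),
    "http://example.com/b": ("done", []),
}
pages = crawl(["http://example.com/"], lambda u: site[u])
```

At scale, the `seen` set and `frontier` queue become the hard parts: they must be sharded across machines and persisted, which is exactly the large-scale concern the summary raises.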
Background subtraction is a technique used to separate foreground objects from backgrounds in video frames. It works by comparing each frame to a background model and detecting differences which indicate moving foreground objects. Recursive techniques like mixtures of Gaussians model the background pixel values over time using multiple Gaussian distributions, allowing the background model to adapt to changing lighting conditions. Adaptive background/foreground detection uses a background model that evolves over time to distinguish foreground objects from the background in a robust way.
Video stabilization is a process that smooths shaky camera motion in videos. It works by estimating and compensating for background image motion caused by camera movement. There are different algorithms used depending on the type of scene and motion. Feature-based methods extract and match features between frames to model global motion, while flow-based methods use optical flow. The stabilization process involves two phases: motion estimation and motion smoothing.
Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon TransformFadwa Fouad
This document provides an overview of a Masters thesis that proposes algorithms for human action recognition. It begins with an introduction that discusses the importance of human action recognition, challenges in the field, and differences between actions and activities. It then presents an agenda that outlines an introduction, overview, and details of two proposed algorithms: 2DHOOF/2DPCA contour-based optical flow and human gesture recognition using Radon transform/2DPCA. The overview section describes the general structure of action recognition systems from video capture to classification. Experimental results on benchmark datasets demonstrate the effectiveness of the proposed algorithms.
This document discusses compressive displays and related technologies for reducing the bandwidth requirements of multi-view and light field displays. It describes several technologies including layered 3D displays, polarization field displays, and high-rank 3D displays that decompose 4D light fields into lower dimensional representations. It also discusses using mathematical techniques like non-negative matrix factorization for further compressing display data. The document promotes open collaboration through the proposed Compressive Display Consortium to advance next generation displays.
Application of feature point matching to video stabilizationNikhil Prathapani
One of the significant application of computer vision is Stabilizing a video that was captured from a jittery or moving platform. One way to stabilize a video is to track a prominent feature in the image and utilize it as an anchor point to cancel out all perturbations relative to it. This technique, however, must be bootstrapped with knowledge of where such a salient feature remains in the first video frame. The paper presents method of video stabilization that works without any such erstwhile knowledge. The method is built on the basis of Random Sampling and Consensus (RANSAC) and adding few additions to the existing methodologies. It instead automatically investigates for the "background plane" in a video sequence, and utilizes its observed distortion to precise for camera motion. All the simulations have been performed using MATLAB tool.
This document discusses various feature detectors used in computer vision. It begins by describing classic detectors such as the Harris detector and Hessian detector that search scale space to find distinguished locations. It then discusses detecting features at multiple scales using the Laplacian of Gaussian and determinant of Hessian. The document also covers affine covariant detectors such as maximally stable extremal regions and affine shape adaptation. It discusses approaches for speeding up detection using approximations like those in SURF and learning to emulate detectors. Finally, it outlines new developments in feature detection.
Yann LeCun discusses the architecture of convolutional neural networks and feature learning. He argues that most recognition systems use a similar architecture involving a filter bank, nonlinearity, pooling, and normalization. Convolutional networks directly learn these features through multiple stacked stages, rather than hand-engineering them. LeCun presents results showing convolutional networks outperform traditional systems on Caltech 101 by learning hierarchical representations through multiple stages with harsh nonlinearities, contrast normalization, and sparsity constraints.
1) The document discusses visual servoing in the presence of non-rigid objects. Classical visual servoing algorithms assume rigid objects and environments but this is often not valid.
2) The paper proposes an algorithm that extracts invariant features of the non-rigid object from a sequence of images to represent its state. These features are then used in the visual servoing process to position the robot end-effector despite the non-rigid motion.
3) The algorithm is tested using simulated non-rigid motion and is shown to generate a stable camera trajectory, unlike approaches that rely on features from single images.
C. Guyon, T. Bouwmans, E. Zahzah, “Foreground detection based on low-rank and block-sparse matrix decomposition”, IEEE International Conference on Image Processing, ICIP 2012, pages 1225-1228, Orlando, Florida, USA, September 2012.
Olga Senyukova - Machine Learning Applications in MedicineOlsen222
How can beautiful algorithmic findings be helpful in our everyday life? One of the answers to this question lies in the area of healthcare applications. Nowadays machine learning methods are becoming more and more useful in medicine. They are able not only to assist medical specialists in processing large amounts of data, but also to help in diagnostics and patient follow-up.
This course is devoted to the discussion of some interesting applications of machine learning methods to automatically analyse medical images and physiologic signals. Medical images acquired by means of special equipment represent internal structures of the human body and/or processes in it. The most modern technologies for acquisition of such images are magnetic resonance imaging (MRI) and computed tomography (CT). Physiologic signals usually refer to cardiologic time series such as electrocardiograms (ECG), but can also represent other physiological data, for example, stride intervals of human gait.
Several important problems will be highlighted along with successful solutions involving machine learning methods including examples both from the worldwide practice and the author’s own research. Description of the basic principles of the algorithms used will provide a good opprotunity to strengthen the knowledge acquired from the other courses of the school.
Presentation at International Advanced School on Knowledge Co-creation and Service Innovation 2012, Japan Advanced Institute of Science and Technology, March 1
Accelarating Optical Quadrature Microscopy Using GPUsPerhaad Mistry
1) Phase unwrapping is used in optical quadrature microscopy to determine viability of embryos by counting cells after unwrapping. It needs to be done at near real-time speeds to analyze sample changes.
2) The paper implements minimum LP norm phase unwrapping and affine transformations on a GPU to improve performance and latency for optical microscopy research.
3) Performance results show a 5.24x speedup for total phase unwrapping time compared to a serial CPU implementation. Further optimizations like multi-GPU support could improve speeds for higher image acquisition rates.
Shearlet Frames and Optimally Sparse ApproximationsJakob Lemvig
Shearlet theory has become a central tool in analyzing and representing 2D and 3D data with anisotropic features. Shearlet systems are systems of functions generated by one single generator with parabolic scaling, shearing, and translation operators applied to it, in much the same way wavelet systems are dyadic scalings and translations of a single function, but including a directional parameter. The success of shearlets owes to an extensive list of desirable properties: Shearlet systems can be generated by one function, they provide precise resolution of wavefront sets, they allow compactly supported analyzing elements, they are associated with fast decomposition algorithms, and they provide a unified treatment of the continuum and the digital world.
This talk gives an introduction to shearlet theory with focus on separable and compactly supported shearlets in 2D and 3D. We will consider constructions of band-limited and compactly supported shearlet frames in those dimensions. Finally, we will show that compactly supported shearlet frames satisfying weak decay, smoothness, and directional moment conditions provide optimally sparse approximations of a generalized model of cartoon-like images comprising of piecewise C² functions that are smooth apart from piecewise C² discontinuity edges.
This talk is based on joint with G. Kutyniok and W.-Q Lim (TU Berlin).
The document summarizes several popular iris recognition algorithms: Daugman, Li Ma, Wildes, and Tisse. It describes the key steps and approaches for each algorithm: segmentation, normalization, feature extraction, and matching. It finds Daugman's algorithm to be the most accurate according to tests on the CASIA iris image database, with 0.01/0.09 FAR/FRR and 99.90% accuracy. The document provides references for further reading on iris recognition and the algorithms discussed.
Recent Advances in Object-based Change Detection.pdfgrssieee
This document summarizes recent advances in object-based change detection methods. It presents a new multiresolution segmentation approach adapted for change detection that segments images independently but checks for consistency between segments. Object correspondence is established through directed or intersected objects. Change detection is performed using IR-MAD. The methods were tested on aerial imagery and showed good overall accuracy rates above 98%, demonstrating the promise of object-based change detection techniques. Further developments are needed to improve consistency tests, segmentation parameter selection, and implementation.
SIGGRAPH ASIA 2012 Stereoscopic Cloning Presentation SlideI-Chao Shen
This is the presentation slide of paper : Perspective-Aware Warping for Seamless Stereoscopic Image Cloning.
It is made by Sheng-Jie Luo, National Taiwan University.
Please refer to http://www.cmlab.csie.ntu.edu.tw/~forestking/research/SIGA12-StereoCloning/ for more detail.
This document discusses the process of web crawling and building a web crawler. It describes how crawlers work by starting with a set of URLs, fetching web pages from those URLs, extracting new URLs from the pages, adding them to the list to crawl, and repeating the process. It also discusses important considerations for building large-scale crawlers, such as handling concurrent requests efficiently, avoiding duplicate URLs, managing server load, and storing crawled pages reliably at a large scale.
This small kitchen project involved designing a new layout for a baker's kitchen. The document shows before and after photos of the baker's kitchen, with the after photos depicting an updated design and layout. The new design improved the workflow and organization of the space.
This document outlines a kitchen remodel project from November 2009 designed by Daniela Cottingham. Key aspects of the remodel included installing a new backsplash, island, oven, farm sink, and transforming an area into a laundry/mudroom. The document provides before and after photos and an overview of the remodel design.
This document promotes Campus Crusade for Christ in Portland, Oregon. It notes that Portland is a city of activists waiting to engage with something meaningful. It also states that over 100,000 students in the area will graduate without hearing about God's love. The document calls for transforming the campuses and city by reaching these students, and that graduates who know Christ will have a ripple effect. It encourages supporting their work by telling people about Jesus, launching movements, leveraging technology, and mentoring future leaders.
The document contains the lyrics to multiple Christmas songs and carols. It discusses various aspects of Christmas including Santa Claus and his reindeer, going to see baby Jesus in the manger, wishing others a Merry Christmas, and celebrating the Christmas season with family.
This document discusses analyzing 3D human motion from 2D images. It covers:
1) The difficulties in inferring 3D information from 2D images, including depth ambiguities, self-occlusions, and data association challenges.
2) Two main approaches to modeling 3D human motion - generative/alignment-based methods that predict state distributions from images, and discriminative/predictive methods that optimize alignment with image features.
3) Key techniques for temporal inference in tracking human motion over time, including generative methods like particle filtering and discriminative conditional models.
This document provides an overview of Oracle HRMS Payroll concepts and tables. It defines key payroll terms and describes important tables used in payroll processing and their relationships. These include tables to store employee and assignment data, element and input value definitions, element links and entries, payroll actions and results. Examples of useful queries to retrieve element entry and run result information are also provided.
The document describes a tutorial on using algebraic geometry and Gröbner bases to solve polynomial problems in computer vision. The tutorial will cover the theory of Gröbner bases and how it can be used to solve systems of polynomial equations. Several examples of problems in computer vision that have been solved using Gröbner bases will be presented, including the generalized 3-point problem, 5-point relative pose problem, and triangulation. The tutorial aims to explain both the theoretical foundations and practical applications of Gröbner bases.
The document summarizes several theorists who influenced the shaping of American education from the late 18th century to early 20th century. Some of the key theorists mentioned include John Dewey, who promoted progressive education, Horace Mann, who targeted problems in public schools, and Jane Addams, who was an activist and social reformer. The document provides brief biographies of these and other influential theorists noting their contributions to the development of American education.
Deep Learning for Pose Estimation, thesis defense (Wei Yang)
This document summarizes a thesis proposal on using deep learning for articulated human pose estimation. The proposed method uses a deep convolutional neural network (DCNN) as a front-end to extract local appearance features of body parts, combined with message passing layers to model spatial relationships between parts through pairwise constraints. This global pose model is trained end-to-end using a max-sum algorithm to maximize consistency across the entire human pose. Experimental results on standard pose estimation datasets demonstrate state-of-the-art performance.
Speaker: Bongsoo Choi (Chris Choy, PhD student at Stanford University)
Date: August 2017
Reviewer and workshop organizer for CVPR, ICCV, NIPS, TIP, ICRA, TPAMI, 3DV, etc.
Overview:
3D perception ranges broadly from simple geometric understanding of objects to high-level cognition such as semantic scene understanding and inferring the relationships between objects. In this talk, I'll present a broad class of works, from low-level to high-level cognition tasks, that encompass 3D perception.
Using reduced system models for vibration design and validation.
1) Equivalent models are used to reduce complexity while preserving key behaviors through homogenization and model updating.
2) Model reduction techniques like component mode synthesis represent the system with subspace bases to enable coupling of test and finite element models.
3) Energy coupling methods allow assembly of disjoint reduced component models through computation of interface energies.
Structure from motion is a computer vision technique used to recover the three-dimensional structure of a scene and the camera motion from a set of images. It involves detecting feature points in multiple images, matching corresponding points across images, estimating camera poses and orientations, and reconstructing the 3D geometry of scene points. Large-scale structure from motion can reconstruct scenes from thousands of images but requires solving very large optimization problems. Applications include 3D modeling, surveying, robot navigation, virtual reality, augmented reality, and simultaneous localization and mapping.
Structure from motion is a computer vision technique used to recover the three-dimensional structure of a scene and the camera motion from a set of images. It can be used to build 3D models of scenes without any prior knowledge of the camera parameters or 3D locations of the scene points. Structure from motion involves detecting feature points in multiple images, matching the features between images, estimating the fundamental matrices between image pairs, and then optimizing a bundle adjustment problem to simultaneously compute the 3D structure and camera motion parameters. Some applications of structure from motion include 3D modeling, surveying, robot navigation, virtual and augmented reality, and visual effects.
Super Resolution in the Deep Learning Era (Jaejun Yoo)
1) The document discusses super-resolution techniques in deep learning, including inverse problems, image restoration problems, and different deep learning models.
2) Early models like SRCNN used convolutional networks for super-resolution but were shallow, while later models incorporated residual learning (VDSR), recursive learning (DRCN), and became very deep and dense (SRResNet).
3) Key developments included EDSR which provided a strong backbone model and GAN-based approaches like SRGAN which aimed to generate more realistic textures but require new evaluation metrics.
Stereo vision allows computing depth from two images captured from different viewpoints. The key is measuring disparity, how much each pixel moves between the two images. Epipolar geometry constrains the possible matches. Basic stereo algorithms match pixels in conjugate epipolar lines. State-of-the-art methods formulate stereo as an energy minimization problem that balances match quality and smoothness in the depth map. Dynamic programming and graph cuts provide good approximations to solve this NP-hard problem. Structured light and laser scanning are alternatives to passive stereo that simplify correspondence.
The document discusses geometry processing and sparse matrices. It provides an overview of geometry processing, which uses applied math and computer science concepts to design algorithms for acquiring, reconstructing, analyzing, manipulating, simulating and transmitting complex 3D models. It then discusses representing shapes using triangle meshes and differential coordinates. It introduces discrete and continuous Laplace-Beltrami operators for geometry processing. Finally, it discusses representing the Laplacian matrix of a mesh as a sparse matrix using scipy and different sparse matrix storage formats like CSR and CSC for efficient computations.
Computational Tools for Extracting, Representing and Analyzing Facial Features (saulnml)
ABSTRACT: In this work, we present a computer aided system for interactive extraction of anthropometrical points (landmarks) on 3D human face meshes, and a methodology for a statistical analysis of the anthropometrical points. In the developed software, we employed real time rendering techniques and interactive picking through collisions detection and haptic feedback, to allow intuitive user interaction with the virtual model. We also exploit the geometric information of the meshes by computing and displaying the curvatures and shape index, to give to the user a better understanding of the 3D data by using color maps. The proposed method was tested on a database of 35 faces from healthy Mexican individuals, obtained with a low cost structured light stereovision system. One of the objectives of this study, was to determine the statistical variability of a set of 19 face landmarks, which define facial features of the eyes, mouth, nose, cheeks and chin. To validate the reliability of the hand-extracted landmarks, a Technical Error of Measurement (TEM) analysis was performed. After the points were extracted, a rigid registration of the landmarks, to those from a reference head model, was applied by determining an optimal rigid transformation consisting of a unit quaternion and a translation vector obtained from the cross-covariance matrix. To obtain the mean landmarks set and the modes of variation, a Principal Components Analysis (PCA) based on the covariance matrix was employed. Finally, we approximated the average facial shape of the population under study, by deforming the reference head model through cage-based registration. 
In such approach, the high resolution model is attached to a rough mesh or cage, which encloses the detailed model, by using Mean Value Coordinates (MVC) each vertex of the detailed mesh is represented as a linear combination of the cage mesh vertices, which allow detail preserving deformation of the high polygonal mesh by displacement of the vertices from the coarse mesh, then the cage is iteratively deformed through Laplacian deformation in order to minimize the squared distance between corresponding landmarks. Besides working with points, and linear and angular measurements, the developed software also allows interactively to select paths and contours on the mesh surface, obtaining geodesic distances and areas; and even working on images. Finally, our work can be extended for the analysis of other complex anatomical structures, and potentially it may have other applications such as facial recognition on forensics, random face generation for avateering in virtual environments, and building compact 3D face databases.
This presentation is an analysis of the paper,"SCRDet++: Detecting Small, Cluttered and Rotated Objects via Instance-Level Feature Denoising and Rotation Loss Smoothing"
Large Scale Parallel FDTD Simulation of Full 3D Photonic Crystal Structures (ayubimoak)
1) The document describes a 3D finite-difference time-domain (FDTD) simulator developed at Arizona State University to model large-scale 3D photonic crystal structures.
2) The simulator uses the FDTD method along with convolutional perfectly matched layer absorbing boundary conditions to minimize reflections at the simulation boundaries.
3) Large-scale parallel simulations are enabled through domain decomposition across multiple processors to model photonic crystal structures with over 10⁷ grid cells that cannot be simulated on a single processor.
Computer vision systems use digital image processing and analysis techniques to interpret and understand images. There are many approaches to computer vision including edge detection, segmentation, shape and texture description, as well as techniques for image enhancement, restoration, compression and more. While human vision is very good at tasks like recognition and navigation, computer vision is still limited and faces challenges from changes in lighting, occlusion, and ambiguity.
1) The document discusses using data mining techniques to build process models from full-scale plant data in order to optimize water and wastewater treatment processes.
2) Building accurate models is challenging due to issues scaling up from pilots, nonlinear and chaotic behaviors, and sensitivity to initial conditions. Data mining full-scale data can help address these challenges.
3) Case studies demonstrate using neural networks to model relationships between inputs like turbidity, temperature and outputs like disinfection byproducts. This allows predicting impacts of changes to optimize chemical use and meet regulations.
Various Object Detection and Tracking Methods (sujeeshkumarj)
This document provides an overview of various object detection and tracking methods. It discusses Scale-Invariant Feature Transform (SIFT) for object detection, which extracts keypoints that are invariant to scale, orientation and illumination changes. It also covers You Only Look Once (YOLO) for real-time object detection using a single neural network, and Histogram of Oriented Gradients (HOG) for human detection. For object tracking, it discusses point tracking using deterministic and statistical methods, kernel tracking using template matching, and silhouette tracking for complex object shapes. Applications mentioned include video surveillance, autonomous vehicles, and human-computer interaction.
The document provides an overview of the human visual system and digital cameras. It discusses the key components of the human eye, including the cornea, sclera, choroid, lens, and retina. It also describes how images are formed on the retina through the lens and light receptors. For digital cameras, the document outlines the basic components and image formation process, including the aperture, optical system, and imaging sensor. It also provides equations to convert between camera and image plane coordinates.
(1) This document summarizes two case studies on human activity and face recognition using Bayesian decision theory and computer vision techniques. (2) The first case study recognizes 10 human actions such as sitting, standing, bending etc. by tracking head motion trajectories using a Bayesian classifier with Gaussian likelihood functions. (3) The second case study detects and tracks faces in real-time by modeling skin color distribution with a 2D Gaussian and adapting the model to account for lighting and viewpoint changes.
This document summarizes research on using similarity and structure for visual cognition and decision-making. It discusses how reducing dimensionality through techniques like PCA and autoencoders can build more invariant representations while lowering computational costs. Kernels are proposed to map data to a higher dimensional space where linear classifiers can be used, while avoiding high computational costs. Experiments on artificial scenes show how objects and their locations can be encoded using exemplars, and novel scenes classified based on familiar or novel elements. Further work aims to model natural scenes and object hierarchies using a graph structure approach.
The document discusses real-time ray tracing of implicit surfaces on the GPU. It presents several methods for finding roots of implicit surface equations along rays, including analytical methods for low-order surfaces, Mitchell's interval-based method, and marching points techniques. Results show the marching points approaches achieve interactive frame rates for a variety of surfaces, outperforming Mitchell's method for higher-order surfaces. Adaptive stepping improves robustness and performance.
The document outlines the schedule for the International Conference on 3D Body Scanning Technologies held in Lugano, Switzerland from October 19-20, 2010. The conference included an exhibition, opening session, 14 technical sessions on topics related to 3D body scanning systems and their medical and apparel applications, and a closing session. Technical sessions were held in the morning and afternoon with coffee breaks and a lunch break in between. There was also a welcome cocktail on the evening of the 19th. The conference was organized by Hometrica Consulting and supported by several sponsors.
The document discusses real-time ray tracing of implicit surfaces on the GPU. It presents several contributions: analytical root finding for surfaces up to order 4 achieving frame rates of 1100-5821; applying Mitchell's interval method for surfaces up to order 5 at 60-965 FPS; and marching points methods for arbitrary implicits at 38-825 FPS. Adaptive marching points were also introduced for arbitrary implicits achieving 60-920 FPS. Traditional and recent methods for rendering implicit surfaces are reviewed.
This document discusses how online and offline lives are blurring together. It notes that people are spending more time online, with 120 million people logging into Facebook daily and 48% of women saying they would rather give up sex than the internet. However, there is also a recognition of the downsides of an online life. As a result, people are looking for ways to better integrate the online and offline worlds, such as taking the online experience into the real world and connecting physical objects to the internet.
The document proposes spring 2011 European tours for Quest Specialty Travel that focus on significant experiences while preserving the company's values and pricing tours reasonably. The tours aim to provide low-cost, educational experiences using local transportation and guides, with meaningful cultural exchanges and consideration for different physical abilities. Focus group data was analyzed and trends considered in developing a proposed Mediterranean Islands Bike Tour of Mallorca, Spain.
Jagmohan Singh is a Sikh man from Punjab, India who immigrated to Canada in 2005 seeking better economic opportunities. He currently works as a truck driver and lives in Vancouver with his wife Manpreet Kaur and their two children, daughter Navneet who is 10 and son Jasmeet who is 8. In his free time, Jagmohan enjoys cooking Indian food, listening to Punjabi music, and spending time with his family.
The document discusses how to create effective presentations using SlideShare. It recommends keeping presentations concise with 5-10 slides, using visuals like images and charts to enhance the content, and focusing on a clear message or story to engage the audience. Overall, the goal is to share knowledge through presentations on SlideShare in a way that is easy to understand and will inspire or educate viewers.
The cherry: its beauty, softness, and heart-shaped form have inspired artists since Antiquity. Cherries and strawberries were considered the fruits of paradise and thus represented the souls of men.
2. Inferring 3D from 2D
• History
• Monocular vs. multi-view analysis
• Difficulties
– structure of the solution and ambiguities
– static and dynamic ambiguities
• Modeling frameworks for inference and learning
– top-down (generative, alignment-based)
– bottom-up (discriminative, predictive, exemplar-based)
– Learning joint models
• Take-home points
3. History of Analyzing Humans in Motion
• Markers (Étienne-Jules Marey, 1882): chronophotograph
• Multiple Cameras (Eadweard Muybridge, 1884)
4. Human motion capture today
120 years and still fighting …
• VICON ~ $100,000
– Excellent performance, de-facto standard for special effects, animation, etc.
• But heavily instrumented
– Multiple cameras
– Markers in order to simplify the image correspondence
– Special room, simple background
Major challenge: Move from the laboratory to the real world
5. What is so different between multi-view and single-view analysis?
• Different emphasis on the relative importance of measurement and prior knowledge
– Depth ambiguities
– Self-occluded body parts
• Similar techniques, at least one way
– The transition monocular -> multi-view is straightforward
• Monocular as the 'robust limit' of multi-view
– Multiple cameras unavailable, or less effective in real-world environments due to occlusion from other people, objects, etc.
6. 3D Human Motion Capture Difficulties
• General poses
• Self-occlusions
• Difficult to segment the individual limbs
• Loss of 3D information in the monocular projection
• Different body sizes
• Partial views
• Several people, occlusions
• Reduced observability of body parts due to loose-fitting clothing
• Accidental alignments
• Motion blur
7. Levels of 3d Modeling
• This section: coarse body model; 30-35 d.o.f.; simple appearance (implicit texture map)
• Photo: complex body model; 50-60 d.o.f.; simple appearance (edge histograms)
• Synthetic: complex body model; ? (hundreds) d.o.f.; sophisticated modeling of clothing and lighting
8. Difficulties
• High-dimensional state space (30-60 dof)
• Complex appearance due to articulation, deformation,
clothing, body proportions
• Depth ambiguities and self-occlusion
• Fast motions, only vaguely known a priori
– External factors, objects, sudden intentions…
• Data association (what is a human part, what is background – see the Data Association section)
9. Difficulties, more concretely
• Depth ambiguities
• Occlusions (missing data), e.g. the left arm
• Data association: left / right leg ambiguities
• Preservation of physical constraints
10. Articulated 3d from 2d Joint Positions
Structure of the monocular solution: Lee and Chen, CVGIP 1985 (!)
• Characterizes the space of solutions, assuming
– 2d joint positions + limb lengths
– internal camera parameters
• Builds an interpretation tree of projection-consistent hypotheses (3d joint positions)
– obtained by forward-backward flips in depth
– O(2^(# of body parts)) solutions
– In principle, can prune some by physical reasoning
– But no procedure to compute joint angles, hence difficult to reason about physical constraints
• Not an automatic 3d reconstruction method
– select the true solution (out of many) manually
• Adapted for orthographic cameras (Taylor, CVIU 2000)
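The forward-backward flip structure above can be made concrete: under (scaled) orthographic projection, a limb of known length L whose endpoints project a distance d apart has an out-of-plane depth difference of ±√(L² − d²), and each limb flips independently. A minimal sketch with toy numbers (`depth_flips` is a hypothetical helper, not code from the paper):

```python
import itertools
import math

def depth_flips(proj_dists, limb_lengths):
    """Enumerate the 2^N forward/backward depth interpretations of a
    kinematic chain under orthographic projection (toy Lee & Chen /
    Taylor-style sketch).
    proj_dists[i]  : image distance between the endpoints of limb i
    limb_lengths[i]: known 3D length of limb i"""
    # Out-of-plane component of each limb: +/- sqrt(L^2 - d^2)
    dz = [math.sqrt(max(L * L - d * d, 0.0))
          for d, L in zip(proj_dists, limb_lengths)]
    # Each limb flips toward or away from the camera independently
    return [tuple(s * z for s, z in zip(signs, dz))
            for signs in itertools.product((1.0, -1.0), repeat=len(dz))]

solutions = depth_flips(proj_dists=[0.8, 0.5], limb_lengths=[1.0, 1.0])
print(len(solutions))  # 4 = 2^2 projection-consistent interpretations
```

Pruning by physical reasoning would then filter this exponential set, which is why the slide stresses that manual selection was needed.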
11. Why is 3D-from-monocular hard? <v>
Static, Kinematic, Pose Ambiguities
• Monocular static pose optima
– ~ 2^(Nr of Joints), some pruned by physical constraints
– Temporally persistent
Sminchisescu and Triggs ‘02
12. Trajectory Ambiguities <v>
General smooth dynamics
[Panels: model / image; filtered; smoothed]
Sminchisescu and Jepson ‘04
2 (out of several) plausible trajectories
13. Visual Inference in a 12d Space
6d rigid motion + 6d learned latent coordinate
Interpretation #1
Points at camera when
conversation ends
(before the turn)
Interpretation #2
Says `salut’ when
conversation ends
(before the turn)
14. Trajectory Ambiguities <v>
Learned latent space and smooth dynamics
Interpretation #1
Says `salut’ when conversation ends
(before the turn)
Interpretation #2
Points at camera when conversation ends
(before the turn)
• Image consistent, smooth, typically human…
Sminchisescu and Jepson ‘04
15. The Nature of 3D Ambiguities
[Figure: branching state trajectories x1, x2, x3 plotted against observation r1]
• Persistent over long time-scales (each S-branch)
• Loops (a, b, c) have limited time-scale support,
hence ambiguity cannot be resolved by extending it
16. Generative vs. Discriminative Modelling
x is the model state, r are image observations, θ are parameters to learn, given a training set of (r, x) pairs. Goal: p_θ(x | r).
Generative (G): p_θ(x | r) ∝ p_θ(r | x) · p(x)
• Optimize alignment with image features
• Can learn state representations, dynamics, observation models; but difficult to model human appearance
• State inference is expensive, need effective optimization
Discriminative (D): model p_θ(x | r) directly
• Predict state distributions from image features
• Learning to 'invert' perspective projection and kinematics is difficult and produces multiple solutions
– Multivalued mappings ≡ multimodal conditional state distributions
• Temporal extensions necessary
17. Temporal Inference (tracking)
• Generative (top-down) chain models (Kalman Filter, Extended KF, Condensation): model the observation
• Discriminative (bottom-up) chain models (Conditional Bayesian Mixture of Experts Markov Model - BM3E, Conditional Random Fields - CRF, Max. Entropy Models - MEMM): condition on the observation
18. Temporal Inference
• x_t : state at time t
• O_t = (o_1, o_2, …, o_t) : observations up to time t
From the posterior p(x_t | O_t) at time t, predict p(x_{t+1} | O_t) through the dynamics, then fold in the observation likelihood p(o_{t+1} | x_{t+1}) to obtain the posterior p(x_{t+1} | O_{t+1}) at time t+1.
cf. CONDENSATION, Isard and Blake, 1996
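The predict/update recursion on this slide is what CONDENSATION implements with a sample set. A minimal 1D particle-filter sketch, assuming Gaussian dynamics noise and a Gaussian observation likelihood (the real tracker uses a 30-60d pose state and image-based likelihoods):

```python
import numpy as np

rng = np.random.default_rng(0)

def condensation_step(particles, weights, obs, dyn_std=0.1, obs_std=0.2):
    """One predict/update cycle of a Condensation-style particle filter
    (1D toy state)."""
    n = len(particles)
    # Resample from p(x_t | O_t) according to the current weights
    particles = particles[rng.choice(n, size=n, p=weights)]
    # Predict: diffuse through the dynamics to represent p(x_{t+1} | O_t)
    particles = particles + rng.normal(0.0, dyn_std, size=n)
    # Update: reweight by the likelihood p(o_{t+1} | x_{t+1})
    w = np.exp(-0.5 * ((obs - particles) / obs_std) ** 2)
    return particles, w / w.sum()

particles = rng.normal(0.0, 1.0, size=500)
weights = np.full(500, 1.0 / 500)
for obs in (0.2, 0.4, 0.6):          # synthetic observation sequence
    particles, weights = condensation_step(particles, weights, obs)
print(float(np.sum(particles * weights)))  # posterior mean tracks the observations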
19. Generative / Alignment Methods
• Modeling
• Methods for temporal inference
• Learning low-dimensional representations
and parameters
20. Model-based Multiview Reconstruction
Kehl, Bray and van Gool ‘05
• Body represented as a textured 3D mesh
• Tracking by minimizing distance between 3d
points on the mesh and volumetric
reconstruction obtained from multiple cameras
21. Generative 3D Reconstruction: Annealed Particle Filter
(Deutscher, Blake and Reid, ’99-01)
monocular
Careful design:
• Dynamics
• Observation likelihood – edges + silhouettes
• Annealing-based search procedure, improves over particle filtering
• Simple background and clothing
Improved results (complex motions) when multiple cameras (3-6) were used
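The annealing idea can be illustrated by reweighting samples with a flattened likelihood p(o|x)^β that sharpens across layers: early layers explore broadly, later layers lock onto sharp optima. A toy 1D sketch (the β schedule, noise schedule, and bimodal score below are illustrative assumptions, not the published settings):

```python
import numpy as np

rng = np.random.default_rng(3)

def annealed_search(particles, log_lik, betas=(0.2, 0.5, 1.0), noise=0.3):
    """Layered annealed search: resample with weights p(o|x)^beta,
    then diffuse, with both beta rising and diffusion shrinking per layer."""
    n = len(particles)
    for beta in betas:
        w = np.exp(beta * log_lik(particles))   # flattened likelihood
        w /= w.sum()
        particles = particles[rng.choice(n, size=n, p=w)]
        # Diffuse less as the annealing sharpens
        particles = particles + rng.normal(0.0, noise * (1.1 - beta), size=n)
    return particles

# Toy bimodal score standing in for an edge + silhouette likelihood
log_lik = lambda x: -0.5 * np.minimum((x - 2.0) ** 2, (x + 2.0) ** 2) / 0.1
particles = annealed_search(rng.normal(0.0, 3.0, size=1000), log_lik)
print(np.mean(np.abs(np.abs(particles) - 2.0) < 0.5))  # most samples near a mode
```

A plain particle filter with the full (β = 1) likelihood from the start would starve on such a peaked, multimodal score, which is the motivation for the layered search.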
22. Generative 3D Reconstruction
Sidenbladh, Black and Fleet, ’00-02; Sigal et al ’04
Monocular:
• Condensation-based filter
• Dynamical models – walking, snippets
• Careful learning of observation likelihood distributions
Multi-camera:
• Non-parametric belief propagation, initialization by limb detection and triangulation
23. Kinematic Jump Sampling
Sminchisescu & Triggs ’01, ’03
For each mode m_i = (µ_i, Σ_i) of p_{t-1}:
• C = SelectSamplingChain(m_i), selected among candidate sampling chains C_1 … C_M
• s = CovarianceScaledSampling(m_i)
• T = BuildInterpretationTree(s, C)
• E = InverseKinematics(T)
• Prune and locally optimize E to obtain p_t
25. What can we learn?
• Low-dimensional perceptual representations,
dynamics (unsupervised)
– What is the intrinsic model dimensionality?
– How to preserve physical constraints?
– How to optimize efficiently?
• Parameters (typically supervised)
– Observation likelihood (noise variance, feature weighting)
– Can learn separately (easier), but how well do we do?
– Best to learn by doing (i.e. inference)
• Maximize the probability of the right answer on the training data,
hence learning = inference in a loop
• Need efficient inference methods
26. Intrinsic Dimension Estimation <v> and Latent Representation for Walking
Intrinsic dimension estimation: d = lim_{r→0} log N(r) / log(1/r)
• 2500 samples from motion capture
• The Hausdorff dimension (d) is effectively 1; lift to 3 for more flexibility
• Use non-linear embedding to learn the latent 3d space embedded in an ambient 30d human joint angle space
• Optimize in the 3d latent space
Sminchisescu and Jepson ‘04
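A finite-radius version of the estimate d = lim_{r→0} log N(r) / log(1/r) can be computed from neighbor counts at two scales. A sketch on synthetic data (a 1D curve embedded in a 30d ambient space standing in for joint-angle samples; the radii are chosen by hand and the data are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

def correlation_dimension(X, r1, r2):
    """Estimate intrinsic dimension from the scaling of pair counts:
    d ~ log(C(r2)/C(r1)) / log(r2/r1), a finite-r surrogate for
    d = lim_{r->0} log N(r) / log(1/r)."""
    sq = np.sum(X * X, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    iu = np.triu_indices(len(X), k=1)
    d = np.sqrt(D2[iu])                       # all pairwise distances
    c1, c2 = np.mean(d < r1), np.mean(d < r2)
    return np.log(c2 / c1) / np.log(r2 / r1)

# A 1D curve embedded in 30d, mimicking highly correlated joint-angle data
t = np.sort(rng.uniform(0.0, 2.0 * np.pi, size=800))
X = np.zeros((800, 30))
X[:, 0], X[:, 1], X[:, 2] = np.cos(t), np.sin(t), 0.5 * t
print(correlation_dimension(X, r1=0.05, r2=0.2))  # close to 1 for a curve
```

An estimate near 1 for data living in a 30d ambient space is exactly the situation on the slide: the walking manifold is essentially one-dimensional, lifted to 3d only for flexibility.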
27. 3D Model-Based Reconstruction
(Urtasun, Fleet, Hertzmann and Fua’05)
• Track human joints using the WSL tracker (Jepson et al’01)
• Optimize model joint re-projection error in a low-dimensional
space obtained using probabilistic PCA (Lawrence’04)
28. Learning Empirical Distribution of
Edge Filter Responses
(original slide courtesy of Michael Black)
(Figure: empirical histograms p_on(F) and p_off(F) of filter responses on and off edges.)
Likelihood ratio, p_on/p_off, used for edge detection
Geman & Jedynak; Konishi, Yuille & Coughlan
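The likelihood-ratio idea can be sketched directly: estimate the two response histograms from labeled pixels and score a new response by the log ratio. The Gaussian response distributions below are synthetic stand-ins for real filter responses on and off edges:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic filter responses: edge pixels give large responses, background small
F_on  = rng.normal(3.0, 1.0, 5000)   # responses at true edge pixels
F_off = rng.normal(0.5, 1.0, 5000)   # responses at non-edge pixels

bins = np.linspace(-3, 7, 41)
p_on,  _ = np.histogram(F_on,  bins, density=True)
p_off, _ = np.histogram(F_off, bins, density=True)

eps = 1e-6  # guard against empty histogram bins
def edge_score(f):
    """Log likelihood ratio log(p_on/p_off) for a filter response f."""
    i = min(max(np.digitize(f, bins) - 1, 0), len(p_on) - 1)
    return float(np.log((p_on[i] + eps) / (p_off[i] + eps)))
```

Thresholding edge_score at 0 is the Bayes decision under equal priors; a large positive score marks a likely edge.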
30. Learning Dependencies
(original slide courtesy of Michael Black); Roth, Sigal and Black ’04
Filter responses are not conditionally independent
Learning by Maximum Entropy
31. The effect of learning on the
trajectory distribution
(Figure: trajectory distribution before and after learning.)
• Learn body proportions + parameters of the observation model (weighting
of different feature types, variances, etc)
• Notice reduction in uncertainty
• The ambiguity diminishes significantly but does not disappear
Sminchisescu, Welling and Hinton ‘03
32. Conditional /Discriminative/ Indexing
Methods
• Nearest-neighbor, snippets
• Regression
• Mixture of neural networks
• Conditional mixtures of experts
• Probabilistic methods for temporal integration
33. Discriminative 3d: Nearest Neighbor
Parameter Sensitive Hashing (PSH)
Shakhnarovich, Viola and Darrell ’03
• Relies on a database of (observation, state) pairs rendered artificially
– Locates samples whose observation components are similar to the current image data (nearest neighbors) and uses their states as putative estimates
• Extension to multiple cameras and tracking by non-linear model optimization (PSH used for initialization; Demirdjian et al, ICCV’05)
– Foreground / background segmentation from stereo
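The database-lookup core of such methods is simple to sketch: render (observation, state) pairs offline, then answer a query by returning the states of the nearest stored observations. The pose dimensionality, the render stand-in, and the brute-force search below are all illustrative simplifications (PSH replaces the linear scan with parameter-sensitive hash functions):

```python
import numpy as np

rng = np.random.default_rng(1)
poses = rng.uniform(-1, 1, (2000, 3))            # synthetic joint-angle "states"

def render(p):
    """Stand-in for rendering image features (e.g. a silhouette descriptor)."""
    return np.concatenate([np.sin(p), np.cos(p)])

feats = np.array([render(p) for p in poses])     # the (observation, state) database

def nn_pose(obs, k=3):
    """Return the k database poses whose observations best match obs."""
    d = ((feats - obs)**2).sum(1)
    return poses[np.argsort(d)[:k]]

true = np.array([0.2, -0.4, 0.6])
est = nn_pose(render(true), k=1)[0]              # putative state estimate
```

Returning k > 1 neighbors keeps multiple putative estimates alive, which matters because the observation-to-state relation is multi-valued.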
34. Discriminative 3d: Nearest Neighbor Matching
2D->3D Pose + Annotation
Ramanan and Forsyth ‘03
(Figure: pipeline. Build a motion synthesizer from a 3D motion library plus user annotations {run, walk, wave, etc.}; detect and track the figure in 2D in the original video; match 1/2 second clips of motion to recover 3D pose and model annotation.)
36. Discriminative 3d: Regression Methods
Agarwal and Triggs ‘04, Elgammal & Lee ‘04
• (A&T) 3d pose recovery by non-linear
regression against silhouette
observations represented as shape
context histograms
– Emphasis on sparse, efficient predictions,
good generalization
• (A&T) Careful study of dynamical
regression-based predictors for walking
and extensions to mixture of regressors
(HCI’05)
• (E&L) pose from silhouette regression
where the dimensionality of the input is
reduced using non-linear embedding
– Latent (input) to joint angle (output) state
space map based on RBF networks
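An RBF-network map from a latent space to joint angles can be sketched as Gaussian basis functions with a ridge-regularized linear readout. The 1d latent "gait phase", the two synthetic target angles, and the center/width choices below are all illustrative assumptions, not the actual learned embedding:

```python
import numpy as np

rng = np.random.default_rng(4)
# Synthetic stand-ins: 1d latent gait phase -> two "joint angle" targets
z = rng.uniform(0, 2*np.pi, 200)[:, None]
Y = np.column_stack([np.sin(z[:, 0]), np.cos(2*z[:, 0])])

centers = np.linspace(0, 2*np.pi, 15)[:, None]

def phi(Z, width=0.5):
    """RBF features: Gaussian bumps at fixed centers along the latent axis."""
    return np.exp(-((Z - centers.T)**2) / (2*width**2))

# Ridge-regularized least squares for the linear readout (the latent -> pose map)
P = phi(z)
W = np.linalg.solve(P.T @ P + 1e-6*np.eye(15), P.T @ Y)

pred = phi(np.array([[1.0]])) @ W   # predicted joint angles at latent point 1.0
```

In the actual system the latent space is learned by non-linear embedding and the outputs are full joint-angle vectors, but the basis-expansion-plus-linear-readout structure is the same.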
37. Discriminative 3d:
Specialized Mappings Architecture
Rosales and Sclaroff ‘01
• Static 3D human pose
estimation from
silhouettes (Hu moments)
• Approximates the
observation-pose
mapping from training
data
– Mixture of neural networks
– Models the joint distribution
• Uses the forward model
(graphics rendering) to
verify solutions
38. Conditional Bayesian Mixtures of Experts
(Figure: three data clusters. A single expert fit to all the data vs. multiple experts with observation-dependent mixing proportions (gates), contrasted with uniform coefficients from the joint model.)
• A single expert cannot represent multi-valued relations
• Multiple experts can focus on representing parts of the data
• But the expert contribution (importance) is contextual
– Disregarding context introduces systematic error (invalid extrapolation)
• The experts need observation-sensitive mixing proportions
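The failure of a single expert on a multi-valued relation can be shown in a few lines: on data drawn from two branches, one least-squares regressor averages the branches, while two experts with context-dependent responsibilities recover both. The oracle responsibility below (the sign of x·y) is a deliberately simplified stand-in for gates computed from image observations:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 1000)
branch = rng.integers(0, 2, 1000)                  # latent branch: y = +x or y = -x
y = np.where(branch == 1, x, -x) + 0.05 * rng.normal(size=1000)

# A single linear expert fit by least squares averages the two branches (~0 slope),
# a systematically invalid prediction for every sample
w_single = (x @ y) / (x @ x)

# Two experts whose responsibilities depend on context; here an oracle feature
# stands in for the observation-sensitive gate
resp = (x * y > 0).astype(float)                   # responsibility of expert 1
w1 = ((resp * x) @ y) / ((resp * x) @ x)           # recovers slope +1
w0 = (((1 - resp) * x) @ y) / (((1 - resp) * x) @ x)  # recovers slope -1
```

Replacing the oracle with a learned gate conditioned on the observation gives exactly the conditional Bayesian mixture of experts above.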
39. Discriminative Temporal Inference
BM3E= Conditional Bayesian Mixture of Experts Markov Model
• `Bottom-up’ chain
Conditions on
the observation
p(x_t | R_t) = ∫_{x_{t−1}} p(x_t | x_{t−1}, r_t) p(x_{t−1} | R_{t−1}), where R_t = (r_1, ..., r_t)
(local conditional × temporal (filtered) prior)
• The temporal prior is a Gaussian mixture
• The local conditional is a Bayesian mixture of Gaussian experts
• Integrate pair-wise products of Gaussians analytically
Sminchisescu, Kanaujia, Li, Metaxas ‘05
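The "integrate pair-wise products of Gaussians analytically" step rests on the identity that the product of two Gaussians is a scaled Gaussian. The 1d mixtures below are invented for illustration; in BM3E the experts are conditioned on the previous state and the current observation, but the closed-form product structure is the same:

```python
import numpy as np

def gauss_product(m1, v1, m2, v2):
    """N(x; m1, v1) · N(x; m2, v2) = z · N(x; m, v), computed analytically (1d)."""
    v = 1.0 / (1.0/v1 + 1.0/v2)                       # precisions add
    m = v * (m1/v1 + m2/v2)                           # precision-weighted mean
    z = np.exp(-0.5*(m1 - m2)**2/(v1 + v2)) / np.sqrt(2*np.pi*(v1 + v2))
    return m, v, z

# One filtering step, 1d sketch: a 2-component Gaussian-mixture temporal prior
# combined with a 2-expert Gaussian conditional yields a 4-component posterior
prior   = [(0.6, -1.0, 0.5), (0.4, 1.5, 0.5)]        # (weight, mean, var)
experts = [(0.5,  0.0, 0.3), (0.5, 2.0, 0.3)]
posterior = []
for wp, mp, vp in prior:
    for we, me, ve in experts:
        m, v, z = gauss_product(mp, vp, me, ve)
        posterior.append((wp*we*z, m, v))
w = np.array([c[0] for c in posterior])
w /= w.sum()                                          # renormalized mixture weights
```

The M×K component blow-up per step is why such filters prune or merge mixture components in practice.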
44. Evaluation on artificially generated
silhouettes with 3d ground truth
(average error / average maximum error,
per joint angle, in degrees)
• NN = nearest neighbor
• RVM = relevance vector machine
• BME = conditional Bayesian mixture of experts
45. Evaluation, low-dimensional models
(average error / joint angle in degrees)
• KDE-RR=ridge regressor between low-dimensional spaces
• KDE-RVM=RVM between low-dimensional spaces
– Unimodal methods average competing solutions
• kBME=conditional Bayesian mixture between low-dimensional state and observation spaces
– Training and inference are about 10 times faster
46. Self-supervised Learning of a
Joint Generative-Recognition Model
• Maximize the probability of the (observed)
evidence (e.g. images of humans)
log p_θ(r) = log ∫_x Q_υ(x|r) [p_θ(x, r) / Q_υ(x|r)] ≥ ∫_x Q_υ(x|r) log [p_θ(x, r) / Q_υ(x|r)] = −KL(Q_υ(x|r) || p_θ(x, r))
log p_θ(r) − KL(Q_υ(x|r) || p_θ(x|r)) = −KL(Q_υ(x|r) || p_θ(x, r))
• Hence maximizing the bound minimizes the KL divergence between what the recognition model Q predicts and what the generative model p infers, with the bound tight at Q_υ(x|r) = p_θ(x|r)
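The bound and the identity behind it can be checked numerically on a discrete toy problem; the joint probability table and recognition distribution below are invented purely for illustration:

```python
import numpy as np

# Discrete toy: latent x in {0, 1}, one observed r; the joint p(x, r) is a table
p_joint = np.array([0.3, 0.2])        # p(x=0, r) and p(x=1, r) for the observed r
p_r = p_joint.sum()                   # evidence p(r)
p_post = p_joint / p_r                # exact posterior p(x | r)

Q = np.array([0.9, 0.1])              # an arbitrary recognition distribution Q(x | r)
elbo = (Q * np.log(p_joint / Q)).sum()            # the variational lower bound
kl_post = (Q * np.log(Q / p_post)).sum()          # KL(Q || p(x | r))
```

The identity log p(r) − KL(Q || p(x|r)) = ELBO holds for any Q, so raising the bound with respect to Q is exactly driving the recognition model toward the generative model's posterior.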
47. Self-supervised Learning of a
Joint Generative-Recognition Model
• Local optimum for parameters
• Recognition model is a conditional mixture of data-
driven mean field experts
– Fast expectations, dimensions decouple
49. Take home points
• Multi-view 3d reconstruction reliable in the lab
– Measurement-oriented
– Geometric, marker-based
• correspondence + triangulation
– Optimize multi-view alignment
• generative, model-based
– Data-association in the real world (occlusions) remains open
• Monocular 3d as robust limit of multi-view
– Difficulties: depth perception + self-occlusion
– Stronger dependency on efficient non-convex
optimization and good observation models
– Increased emphasis on prior vs. measurement
50. Take home points (contd.)
• Top-down / Generative / Alignment Models
– Flexible, but difficult to model human appearance
– Difficult optimization problems, local optima
– Can learn constrained representations and
parameters
• Can handle occlusion, faster search (low-d)
• Fewer local optima -- the best ones are more likely the true solutions
• Discriminative / Conditional / Exemplar–based
Models
– Need to model complex multi-valued relations
– Replace inference with indexing / prediction
– Good for initialization, recovery from failure, on-line
– Still need to deal with segmentation / data association
51. For more information
http://www.cs.toronto.edu/~crismin
http://ttic.uchicago.edu/~crismin
crismin@cs.toronto.edu
crismin@nagoya.uchicago.edu
For a tutorial on 3d human sensing from
monocular images, see:
http://www.cs.toronto.edu/~crismin/PAPERS/tutorial_iccv05.pdf
52. Cristian Sminchisescu thanks the
following collaborators on various
elements of the research described
in the 2d-3d relations section
• Geoffrey Hinton (Toronto)
• Allan Jepson (Toronto)
• Atul Kanaujia (Rutgers)
• Zhiguo Li (Rutgers)
• Dimitris Metaxas (Rutgers)
• Bill Triggs (INRIA)
• Max Welling (UC Irvine)