SlideShare a Scribd company logo
Foundations & Core

Workshop on Frontiers in Computer Vision

          MIT, Cambridge, MA

             Aug. 22, 2011

       Harpreet S. Sawhney

  Vision & Learning Technologies
     SRI-Sarnoff, Princeton, NJ
Inspirations

“Work on Important Problems, not just Interesting ones!”

 – Curt Carlson



“Vision is Hard, Vision is Fun, Vision is Probabilistic, and Vision Works!”

 – Andrew Blake at his ICCV Achievement Award acceptance




                                © SRI International Sarnoff.
Impressive Computer Vision Apps
… behind each lie fundamental advances in Theory & Practice of Vision…


    • Human-Machine Interaction with Kinect like sensors

    • “Building Rome in a Day” class of applications

    • MatchMove with 3D camera motion estimation

    • Visual Odometry for Robot Navigation

    • DARPA Rural and Urban Grand Challenges

    • SnapTell’s “Snap a Book Cover and Find the Book” app.

    • Geo-Registration of Aerial Video

    • First down line in Football

    • Emerging Augmented Reality Apps

                                            © SRI International Sarnoff.   3
Most Impressive Computer Vision Apps
… behind each lie fundamental advances in Theory & Practice of Vision…


    • Human-Machine Interaction with Kinect like sensors

    • “Building Rome in a Day” class of applications

    • MatchMove with 3D camera motion estimation

    • Visual Odometry for Robot Navigation

    • DARPA Rural and Urban Grand Challenges

    • SnapTell’s “Snap a Book Cover and Find the Book” app.

    • Geo-Registration of Aerial Video
                              Will Employ both Geometry &
    • First down line in Football     Recognition

    • Emerging Augmented Reality Apps

                                            © SRI International Sarnoff.   4
Fundamental Techniques at the Core of Computer Vision I
– Low-level image processing functions
  • Gradients, Edges, Histograms, etc.
– Aggregate feature computation
  • SIFT / HoG / MSER and their variants.
– Short-range video / image alignment
  • Optical flow, (Quasi-)Parametric image matching
– Image Warping:
  • Parametric, Quasi-parametric and Non-parametric warping.
– Image / Video Mosaicking:
  • Alignment, warping, stitching and blending.
– Long-range / Wide Baseline Matching:
  • Features, RANSAC and matching models.
– Stereo Processing:
  • Constraints, Matching Models and features, MRF like algorithms.
– Camera Calibration
  • Intrinsic and Extrinsic, Single camera, Stereo, and Multi-camera rigs.
– Two-/Three-frame Monocular Camera Motion Models and Algorithms
  • Fundamental / Essential Matrix, Tri-focal tensor, Closed-form methods, Optimization methods, Common /
    practical ambiguities, Numerical Considerations.

                                              © SRI International Sarnoff.                                  5
Fundamental Techniques at the Core of Computer Vision II
– Multi-frame Structure and Motion Estimation:
  • Camera and 3D Point initializations, Bundle-block adjustment, Practical considerations.
– Moving Object Detection and Tracking:
  • Statistical Background Modeling with Static and Moving Camera; Moving Object Detection with Static and
    Moving Cameras; Online models for objects: Appearance, Geometry, Motion, Bayesian Frameworks for
    Tracking: K-filter, Condensation, Layers, Online optimization, etc.
– Color/Texture/Shape/Motion Segmentation:
  • Bottom-up techniques; Segmentation by classification; Graph models and optimization.
– Non-linear Optimization Techniques:
  • Closed-form, Iterative methods, Numerical considerations, Convex Constrained Methods.
– Basic Statistical Learning Techniques:
  • Understanding of probabilities, distributions, marginals, conditionals, etc.; Regression, Linear / Non-linear
    classifiers, PCA / LDA, FLD, etc.; SVMs; Boosting, bagging and their variants; Practical considerations
– High-dimensional Indexing and Structures for Efficient / Accurate Processing:
  • Randomized trees and forests, and variants; Locality-Sensitive Hashing (LSH), and variants; Vocabulary trees
    and variants
– Basic Graph Modeling, Learning and Inferencing Techniques:
  • Formulating a model; Learning and Inferencing with HMMs, CRFs, MRFs, and their variants; Bayesian networks,
    their variants and Learning and Inferencing techniques.



                                              © SRI International Sarnoff.                                          6
Fundamental Techniques at the Core of Computer Vision II
– Multi-frame Structure and Motion Estimation:
  • Camera and 3D Point initializations, Bundle-block adjustment, Practical considerations.
– Moving Object Detection and Tracking:
  • Statistical Background Modeling with Static and Moving Camera; Moving Object Detection with Static and
    Moving Cameras; Online models for objects: Appearance, Geometry, Motion, Bayesian Jay
                                                                        Gould, Stephen Frameworks for
                                                            Misunderstanding of probability may
    Tracking: K-filter, Condensation, Layers, Online optimization, etc.
– Color/Texture/Shape/Motion Segmentation:be the greatest of all impediments to
                                                                    scientific literacy.
  • Bottom-up techniques; Segmentation by classification; Graph models and optimization.
– Non-linear Optimization Techniques:
  • Closed-form, Iterative methods, Numerical considerations, Convex Constrained Methods.
– Basic Statistical Learning Techniques:
  • Understanding of probabilities, distributions, marginals, conditionals, etc.; Regression, Linear / Non-linear
    classifiers, PCA / LDA, FLD, etc.; SVMs; Boosting, bagging and their variants; Practical considerations
– High-dimensional Indexing and Structures for Efficient / Accurate Processing:
  • Randomized trees and forests, and variants; Locality-Sensitive Hashing (LSH), and variants; Vocabulary trees
    and variants
– Basic Graph Modeling, Learning and Inferencing Techniques:
  • Formulating a model; Learning and Inferencing with HMMs, CRFs, MRFs, and their variants; Bayesian networks,
    their variants and Learning and Inferencing techniques.



                                              © SRI International Sarnoff.                                          7
Fundamental Techniques at the Core of Computer Vision III
– Object Recognition:
  • Representations and Models: Bag-of-Words, Appearance and Geometry, Part-based, Global, etc.
  • Features and processing
  • Optimization for Training and Inferencing.
– Action / Activity Recognition:
  • Representations and Models: Bag-of-Words, Motion, Appearance, and Geometry, Part-based,
    Global, etc.
  • Features and processing
  • Optimization for Training and Inferencing.
– Basic Applications:
  •   Image / Video Mosaicking
  •   Geo-registration: Alignment of images / videos to reference imagery
  •   Match Move: Insertion of Objects/Patches in Videos.
  •   Moving Object Detection and Tracking
  •   Frontal Face Recognition




                                         © SRI International Sarnoff.                             8
Challenges
• High Quality 3D Understanding / Modeling from Images/Videos
  – Mensuration, manipulation, recognition etc. need the world to be represented in terms of crisp
    boundaries of objects, surfaces and their connectivity and topology


• Large Scale (Classes and Scale of Data) Multi-Class Object / Scene / Material Detection
  & Classification
  – The world is getting used to applications that work at the scale of the Web and the Earth. This
    problem will require not just the extension of Bag-of-Words techniques and large-scale classifiers
    but insights into representations of classes of objects in terms of their geometry and appearance.


• Event Modeling, Detection and Classification
  – Videos are probably the single fastest growing data type on the Web that is most relevant to
    Computer Vision. Modeling of Events in videos is a relatively unexplored area.


• Computer Vision in Education and Training
  – We as teachers, mentors, and social beings routinely use our eyes and ears to sense and assess
    the level of engagement, proficiency and areas of deficiency in the people we interact with in our
    professions and life.
  – Converting our human prowess into science and engineering of education and training will
    require capturing, modeling, analyzing and reporting on human movements, actions, expressions,
    and interactions.
                                          © SRI International Sarnoff.                                   9
Hurdles

• Computer Vision needs to whole-heartedly embrace Applications and
  Systems as first class activities within the community and not as activities
  to do once you leave the fold of vision.
  – Learn from Graphics, Signal Processing, Medical communities


• Significantly greater effort needs to be devoted to collecting, organizing,
  annotating and disseminating large scale datasets that are representative
  of the diversity and complexity of the real world in terms of scenes, objects,
  activities, events, 3D structure, illumination and viewpoints.

• Computer Vision approaches need to have distinct Representation /
  Modeling, Algorithms and Implementation steps so that the strengths and
  weaknesses of approaches can be assessed at various levels.
  – Many current Input-BlackBox-Output approaches do not follow this discipline.
    Hence, it is hard to assess research advances and their impact on applications.

                                   © SRI International Sarnoff.                       10

More Related Content

Viewers also liked

Siwei lyu: natural image statistics
Siwei lyu: natural image statisticsSiwei lyu: natural image statistics
Siwei lyu: natural image statistics
zukun
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-software
zukun
 
Lecture9 camera calibration
Lecture9 camera calibrationLecture9 camera calibration
Lecture9 camera calibration
zukun
 
ETHZ CV2012: Information
ETHZ CV2012: InformationETHZ CV2012: Information
ETHZ CV2012: Information
zukun
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluation
zukun
 
ETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCV
zukun
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer vision
zukun
 
My lyn tutorial 2009
My lyn tutorial 2009My lyn tutorial 2009
My lyn tutorial 2009
zukun
 

Viewers also liked (8)

Siwei lyu: natural image statistics
Siwei lyu: natural image statisticsSiwei lyu: natural image statistics
Siwei lyu: natural image statistics
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-software
 
Lecture9 camera calibration
Lecture9 camera calibrationLecture9 camera calibration
Lecture9 camera calibration
 
ETHZ CV2012: Information
ETHZ CV2012: InformationETHZ CV2012: Information
ETHZ CV2012: Information
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluation
 
ETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCV
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer vision
 
My lyn tutorial 2009
My lyn tutorial 2009My lyn tutorial 2009
My lyn tutorial 2009
 

Similar to Fcv core sawhney

Computer vision and robotics
Computer vision and roboticsComputer vision and robotics
Computer vision and robotics
Biniam Asnake
 
Fcv core szeliski_zisserman
Fcv core szeliski_zissermanFcv core szeliski_zisserman
Fcv core szeliski_zisserman
zukun
 
Word
WordWord
Word
butest
 
Séminaire IA & VA- Yassine Ruichek, UTBM
Séminaire IA & VA- Yassine Ruichek, UTBMSéminaire IA & VA- Yassine Ruichek, UTBM
Séminaire IA & VA- Yassine Ruichek, UTBM
Mahdi Zarg Ayouna
 
PPT s01-machine vision-s2
PPT s01-machine vision-s2PPT s01-machine vision-s2
PPT s01-machine vision-s2
Binus Online Learning
 
ICS1020 CV
ICS1020 CVICS1020 CV
ICS1020 CV
Vanessa Camilleri
 
Application of image processing.ppt
Application of image processing.pptApplication of image processing.ppt
Application of image processing.ppt
Devesh448679
 
Techniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous DrivingTechniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous Driving
Yu Huang
 
ICS1020CV_2022.pdf
ICS1020CV_2022.pdfICS1020CV_2022.pdf
ICS1020CV_2022.pdf
Vanessa Camilleri
 
Lecture 1 computer vision introduction
Lecture 1 computer vision introductionLecture 1 computer vision introduction
Lecture 1 computer vision introduction
cairo university
 
Movie on face recognition in e attendace
Movie on face recognition in e attendaceMovie on face recognition in e attendace
Movie on face recognition in e attendace
sbk50000
 
Artificial Intelligence_Strategy.pptx
Artificial Intelligence_Strategy.pptxArtificial Intelligence_Strategy.pptx
Artificial Intelligence_Strategy.pptx
SureshMaddi1
 
HiPEAC 2019 Workshop - Real-Time Modelling Visual Scenes with Biological Insp...
HiPEAC 2019 Workshop - Real-Time Modelling Visual Scenes with Biological Insp...HiPEAC 2019 Workshop - Real-Time Modelling Visual Scenes with Biological Insp...
HiPEAC 2019 Workshop - Real-Time Modelling Visual Scenes with Biological Insp...
Tulipp. Eu
 
2022 COMP4010 Lecture4: AR Interaction
2022 COMP4010 Lecture4: AR Interaction2022 COMP4010 Lecture4: AR Interaction
2022 COMP4010 Lecture4: AR Interaction
Mark Billinghurst
 
Face Recongnition using Machine Learning
Face Recongnition using Machine LearningFace Recongnition using Machine Learning
Face Recongnition using Machine Learning
roysarthak272002
 
Object tracking final
Object tracking finalObject tracking final
Object tracking final
MrsShwetaBanait1
 
Object tracking presentation
Object tracking  presentationObject tracking  presentation
Object tracking presentation
MrsShwetaBanait1
 
Context is King: AR, AI, Salience, and the Constant Next Scenario
Context is King: AR, AI, Salience, and the Constant Next ScenarioContext is King: AR, AI, Salience, and the Constant Next Scenario
Context is King: AR, AI, Salience, and the Constant Next Scenario
Clark Dodsworth
 
[0122]seunghyeong
[0122]seunghyeong[0122]seunghyeong
[0122]seunghyeong
ivaderivader
 
Anits dip
Anits dipAnits dip

Similar to Fcv core sawhney (20)

Computer vision and robotics
Computer vision and roboticsComputer vision and robotics
Computer vision and robotics
 
Fcv core szeliski_zisserman
Fcv core szeliski_zissermanFcv core szeliski_zisserman
Fcv core szeliski_zisserman
 
Word
WordWord
Word
 
Séminaire IA & VA- Yassine Ruichek, UTBM
Séminaire IA & VA- Yassine Ruichek, UTBMSéminaire IA & VA- Yassine Ruichek, UTBM
Séminaire IA & VA- Yassine Ruichek, UTBM
 
PPT s01-machine vision-s2
PPT s01-machine vision-s2PPT s01-machine vision-s2
PPT s01-machine vision-s2
 
ICS1020 CV
ICS1020 CVICS1020 CV
ICS1020 CV
 
Application of image processing.ppt
Application of image processing.pptApplication of image processing.ppt
Application of image processing.ppt
 
Techniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous DrivingTechniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous Driving
 
ICS1020CV_2022.pdf
ICS1020CV_2022.pdfICS1020CV_2022.pdf
ICS1020CV_2022.pdf
 
Lecture 1 computer vision introduction
Lecture 1 computer vision introductionLecture 1 computer vision introduction
Lecture 1 computer vision introduction
 
Movie on face recognition in e attendace
Movie on face recognition in e attendaceMovie on face recognition in e attendace
Movie on face recognition in e attendace
 
Artificial Intelligence_Strategy.pptx
Artificial Intelligence_Strategy.pptxArtificial Intelligence_Strategy.pptx
Artificial Intelligence_Strategy.pptx
 
HiPEAC 2019 Workshop - Real-Time Modelling Visual Scenes with Biological Insp...
HiPEAC 2019 Workshop - Real-Time Modelling Visual Scenes with Biological Insp...HiPEAC 2019 Workshop - Real-Time Modelling Visual Scenes with Biological Insp...
HiPEAC 2019 Workshop - Real-Time Modelling Visual Scenes with Biological Insp...
 
2022 COMP4010 Lecture4: AR Interaction
2022 COMP4010 Lecture4: AR Interaction2022 COMP4010 Lecture4: AR Interaction
2022 COMP4010 Lecture4: AR Interaction
 
Face Recongnition using Machine Learning
Face Recongnition using Machine LearningFace Recongnition using Machine Learning
Face Recongnition using Machine Learning
 
Object tracking final
Object tracking finalObject tracking final
Object tracking final
 
Object tracking presentation
Object tracking  presentationObject tracking  presentation
Object tracking presentation
 
Context is King: AR, AI, Salience, and the Constant Next Scenario
Context is King: AR, AI, Salience, and the Constant Next ScenarioContext is King: AR, AI, Salience, and the Constant Next Scenario
Context is King: AR, AI, Salience, and the Constant Next Scenario
 
[0122]seunghyeong
[0122]seunghyeong[0122]seunghyeong
[0122]seunghyeong
 
Anits dip
Anits dipAnits dip
Anits dip
 

More from zukun

Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptors
zukun
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectors
zukun
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-intro
zukun
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video search
zukun
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video search
zukun
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video search
zukun
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learning
zukun
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer vision
zukun
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick start
zukun
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysis
zukun
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structures
zukun
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities
zukun
 
Icml2012 learning hierarchies of invariant features
Icml2012 learning hierarchies of invariant featuresIcml2012 learning hierarchies of invariant features
Icml2012 learning hierarchies of invariant features
zukun
 
ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...
ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...
ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...
zukun
 
Quoc le tera-scale deep learning
Quoc le   tera-scale deep learningQuoc le   tera-scale deep learning
Quoc le tera-scale deep learning
zukun
 
Deep Learning workshop 2010: Deep Learning of Invariant Spatiotemporal Featur...
Deep Learning workshop 2010: Deep Learning of Invariant Spatiotemporal Featur...Deep Learning workshop 2010: Deep Learning of Invariant Spatiotemporal Featur...
Deep Learning workshop 2010: Deep Learning of Invariant Spatiotemporal Featur...
zukun
 
Lecun 20060816-ciar-01-energy based learning
Lecun 20060816-ciar-01-energy based learningLecun 20060816-ciar-01-energy based learning
Lecun 20060816-ciar-01-energy based learning
zukun
 
Lecun 20060816-ciar-02-deep learning for generic object recognition
Lecun 20060816-ciar-02-deep learning for generic object recognitionLecun 20060816-ciar-02-deep learning for generic object recognition
Lecun 20060816-ciar-02-deep learning for generic object recognition
zukun
 
P05 deep boltzmann machines cvpr2012 deep learning methods for vision
P05 deep boltzmann machines cvpr2012 deep learning methods for visionP05 deep boltzmann machines cvpr2012 deep learning methods for vision
P05 deep boltzmann machines cvpr2012 deep learning methods for vision
zukun
 
P06 motion and video cvpr2012 deep learning methods for vision
P06 motion and video cvpr2012 deep learning methods for visionP06 motion and video cvpr2012 deep learning methods for vision
P06 motion and video cvpr2012 deep learning methods for vision
zukun
 

More from zukun (20)

Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptors
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectors
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-intro
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video search
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video search
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video search
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learning
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer vision
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick start
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysis
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structures
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities
 
Icml2012 learning hierarchies of invariant features
Icml2012 learning hierarchies of invariant featuresIcml2012 learning hierarchies of invariant features
Icml2012 learning hierarchies of invariant features
 
ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...
ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...
ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...
 
Quoc le tera-scale deep learning
Quoc le   tera-scale deep learningQuoc le   tera-scale deep learning
Quoc le tera-scale deep learning
 
Deep Learning workshop 2010: Deep Learning of Invariant Spatiotemporal Featur...
Deep Learning workshop 2010: Deep Learning of Invariant Spatiotemporal Featur...Deep Learning workshop 2010: Deep Learning of Invariant Spatiotemporal Featur...
Deep Learning workshop 2010: Deep Learning of Invariant Spatiotemporal Featur...
 
Lecun 20060816-ciar-01-energy based learning
Lecun 20060816-ciar-01-energy based learningLecun 20060816-ciar-01-energy based learning
Lecun 20060816-ciar-01-energy based learning
 
Lecun 20060816-ciar-02-deep learning for generic object recognition
Lecun 20060816-ciar-02-deep learning for generic object recognitionLecun 20060816-ciar-02-deep learning for generic object recognition
Lecun 20060816-ciar-02-deep learning for generic object recognition
 
P05 deep boltzmann machines cvpr2012 deep learning methods for vision
P05 deep boltzmann machines cvpr2012 deep learning methods for visionP05 deep boltzmann machines cvpr2012 deep learning methods for vision
P05 deep boltzmann machines cvpr2012 deep learning methods for vision
 
P06 motion and video cvpr2012 deep learning methods for vision
P06 motion and video cvpr2012 deep learning methods for visionP06 motion and video cvpr2012 deep learning methods for vision
P06 motion and video cvpr2012 deep learning methods for vision
 

Recently uploaded

Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
Zilliz
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 

Recently uploaded (20)

Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 

Fcv core sawhney

  • 1. Foundations & Core Workshop on Frontiers in Computer Vision MIT, Cambridge, MA Aug. 22, 2011 Harpreet S. Sawhney Vision & Learning Technologies SRI-Sarnoff, Princeton, NJ
  • 2. Inspirations “Work on Important Problems, not just Interesting ones!” – Curt Carlson “Vision is Hard, Vision is Fun, Vision is Probabilistic, and Vision Works!” – Andrew Blake at his ICCV Achievement Award acceptance © SRI International Sarnoff.
  • 3. Impressive Computer Vision Apps … behind each lie fundamental advances in Theory & Practice of Vision… • Human-Machine Interaction with Kinect like sensors • “Building Rome in a Day” class of applications • MatchMove with 3D camera motion estimation • Visual Odometry for Robot Navigation • DARPA Rural and Urban Grand Challenges • SnapTell’s “Snap a Book Cover and Find the Book” app. • Geo-Registration of Aerial Video • First down line in Football • Emerging Augmented Reality Apps © SRI International Sarnoff. 3
  • 4. Most Impressive Computer Vision Apps … behind each lie fundamental advances in Theory & Practice of Vision… • Human-Machine Interaction with Kinect like sensors • “Building Rome in a Day” class of applications • MatchMove with 3D camera motion estimation • Visual Odometry for Robot Navigation • DARPA Rural and Urban Grand Challenges • SnapTell’s “Snap a Book Cover and Find the Book” app. • Geo-Registration of Aerial Video Will Employ both Geometry & • First down line in Football Recognition • Emerging Augmented Reality Apps © SRI International Sarnoff. 4
  • 5. Fundamental Techniques at the Core of Computer Vision I – Low-level image processing functions • Gradients, Edges, Histograms, etc. – Aggregate feature computation • SIFT / HoG / MSER and their variants. – Short-range video / image alignment • Optical flow, (Quasi-)Parametric image matching – Image Warping: • Parametric, Quasi-parametric and Non-parametric warping. – Image / Video Mosaicking: • Alignment, warping, stitching and blending. – Long-range / Wide Baseline Matching: • Features, RANSAC and matching models. – Stereo Processing: • Constraints, Matching Models and features, MRF like algorithms. – Camera Calibration • Intrinsic and Extrinsic, Single camera, Stereo, and Multi-camera rigs. – Two-/Three-frame Monocular Camera Motion Models and Algorithms • Fundamental / Essential Matrix, Tri-focal tensor, Closed-form methods, Optimization methods, Common / practical ambiguities, Numerical Considerations. © SRI International Sarnoff. 5
  • 6. Fundamental Techniques at the Core of Computer Vision II – Multi-frame Structure and Motion Estimation: • Camera and 3D Point initializations, Bundle-block adjustment, Practical considerations. – Moving Object Detection and Tracking: • Statistical Background Modeling with Static and Moving Camera; Moving Object Detection with Static and Moving Cameras; Online models for objects: Appearance, Geometry, Motion, Bayesian Frameworks for Tracking: K-filter, Condensation, Layers, Online optimization, etc. – Color/Texture/Shape/Motion Segmentation: • Bottom-up techniques; Segmentation by classification; Graph models and optimization. – Non-linear Optimization Techniques: • Closed-form, Iterative methods, Numerical considerations, Convex Constrained Methods. – Basic Statistical Learning Techniques: • Understanding of probabilities, distributions, marginals, conditionals, etc.; Regression, Linear / Non-linear classifiers, PCA / LDA, FLD, etc.; SVMs; Boosting, bagging and their variants; Practical considerations – High-dimensional Indexing and Structures for Efficient / Accurate Processing: • Randomized trees and forests, and variants; Locality-Sensitive Hashing (LSH), and variants; Vocabulary trees and variants – Basic Graph Modeling, Learning and Inferencing Techniques: • Formulating a model; Learning and Inferencing with HMMs, CRFs, MRFs, and their variants; Bayesian networks, their variants and Learning and Inferencing techniques. © SRI International Sarnoff. 6
  • 7. Fundamental Techniques at the Core of Computer Vision II – Multi-frame Structure and Motion Estimation: • Camera and 3D Point initializations, Bundle-block adjustment, Practical considerations. – Moving Object Detection and Tracking: • Statistical Background Modeling with Static and Moving Camera; Moving Object Detection with Static and Moving Cameras; Online models for objects: Appearance, Geometry, Motion, Bayesian Jay Gould, Stephen Frameworks for Misunderstanding of probability may Tracking: K-filter, Condensation, Layers, Online optimization, etc. – Color/Texture/Shape/Motion Segmentation:be the greatest of all impediments to scientific literacy. • Bottom-up techniques; Segmentation by classification; Graph models and optimization. – Non-linear Optimization Techniques: • Closed-form, Iterative methods, Numerical considerations, Convex Constrained Methods. – Basic Statistical Learning Techniques: • Understanding of probabilities, distributions, marginals, conditionals, etc.; Regression, Linear / Non-linear classifiers, PCA / LDA, FLD, etc.; SVMs; Boosting, bagging and their variants; Practical considerations – High-dimensional Indexing and Structures for Efficient / Accurate Processing: • Randomized trees and forests, and variants; Locality-Sensitive Hashing (LSH), and variants; Vocabulary trees and variants – Basic Graph Modeling, Learning and Inferencing Techniques: • Formulating a model; Learning and Inferencing with HMMs, CRFs, MRFs, and their variants; Bayesian networks, their variants and Learning and Inferencing techniques. © SRI International Sarnoff. 7
  • 8. Fundamental Techniques at the Core of Computer Vision III – Object Recognition: • Representations and Models: Bag-of-Words, Appearance and Geometry, Part-based, Global, etc. • Features and processing • Optimization for Training and Inferencing. – Action / Activity Recognition: • Representations and Models: Bag-of-Words, Motion, Appearance, and Geometry, Part-based, Global, etc. • Features and processing • Optimization for Training and Inferencing. – Basic Applications: • Image / Video Mosaicking • Geo-registration: Alignment of images / videos to reference imagery • Match Move: Insertion of Objects/Patches in Videos. • Moving Object Detection and Tracking • Frontal Face Recognition © SRI International Sarnoff. 8
  • 9. Challenges • High Quality 3D Understanding / Modeling from Images/Videos – Mensuration, manipulation, recognition etc. need the world to be represented in terms of crisp boundaries of objects, surfaces and their connectivity and topology • Large Scale (Classes and Scale of Data) Multi-Class Object / Scene / Material Detection & Classification – The world is getting used to applications that work at the scale of the Web and the Earth. This problem will require not just the extension of Bag-of-Words techniques and large-scale classifiers but insights into representations of classes of objects in terms of their geometry and appearance. • Event Modeling, Detection and Classification – Videos are probably the single fastest growing data type on the Web that is most relevant to Computer Vision. Modeling of Events in videos is a relatively unexplored area. • Computer Vision in Education and Training – We as teachers, mentors, and social beings routinely use our eyes and ears to sense and assess the level of engagement, proficiency and areas of deficiency in the people we interact with in our professions and life. – Converting our human prowess into science and engineering of education and training will require capturing, modeling, analyzing and reporting on human movements, actions, expressions, and interactions. © SRI International Sarnoff. 9
  • 10. Hurdles • Computer Vision needs to whole-heartedly embrace Applications and Systems as first class activities within the community and not as activities to do once you leave the fold of vision. – Learn from Graphics, Signal Processing, Medical communities • Significantly greater effort needs to be devoted to collecting, organizing, annotating and disseminating large scale datasets that are representative of the diversity and complexity of the real world in terms of scenes, objects, activities, events, 3D structure, illumination and viewpoints. • Computer Vision approaches need to have distinct Representation / Modeling, Algorithms and Implementation steps so that the strengths and weaknesses of approaches can be assessed at various levels. – Many current Input-BlackBox-Output approaches do not follow this discipline. Hence, it is hard to assess research advances and their impact on applications. © SRI International Sarnoff. 10