Dog Breed Classification Using Part Localization
Jiongxin Liu¹, Angjoo Kanazawa², David Jacobs², and Peter Belhumeur¹
¹ Columbia University   ² University of Maryland
Fine-grained classification
[Nilsback and Zisserman ’08]   [Branson et al ’10]
[Parkhi et al ’12]   [Kumar et al ’12]
Related work
• Dense feature extraction:
  – Mine discriminative region with random forests [Yao et al ’11]
  – Multiple Kernel Learning [Nilsback and Zisserman ’08]
  – Post-segmentation [Parkhi and Zisserman ’12]
• Pose-normalized appearance:
  – Birdlets [Farrell et al ’11]
Related work
• Dense feature extraction:
  – Mine discriminative region with random forests [Yao et al ’11]
  – Multiple Kernel Learning [Nilsback and Zisserman ’08]
  – Post-Segmentation [Parkhi and Zisserman ’12]
  Generic sampling of features contains more noise than useful information for fine-grained classification!
• Pose-normalized appearance:
  – Birdlets [Farrell et al ’11]
Same breed or not?              NO!!
Entlebucher Mountain Dog   Greater Swiss Mountain Dog
Key insight: Differences in common parts are more informative
Entlebucher Mountain Dog   Greater Swiss Mountain Dog
Localize parts based on a non-parametric method by [Belhumeur et al ’11]
“Columbia dogs with parts” dataset
       133 breeds, 8351 images
Low inter-breed variation
   Norfolk Terrier or Cairn Terrier?
High intra-breed variation
      Both are Labrador Retrievers
Innumerable Poses
Diverse Appearances
Varying geometry of parts
Overview of the system
1. Face Detection
2. Part Detection
3. Feature Extraction and ear localization
4. One vs All classification
Pipeline 1: Dog Face Detection
Keep the 5 highest scoring windows
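
The window-selection step can be illustrated with a short Python sketch. The `Window` type, its fields, and the toy scores are hypothetical; the slide only states that the 5 highest scoring detection windows are kept, and says nothing about how the face detector produces its scores.

```python
import heapq
from typing import List, NamedTuple


class Window(NamedTuple):
    """A candidate face window (hypothetical layout, for illustration only)."""
    x: int
    y: int
    size: int
    score: float  # detector confidence; higher means more face-like


def keep_top_windows(windows: List[Window], k: int = 5) -> List[Window]:
    """Return the k highest-scoring windows, best first."""
    return heapq.nlargest(k, windows, key=lambda w: w.score)


# Toy candidates with made-up scores.
candidates = [Window(10 * i, 5 * i, 64, s)
              for i, s in enumerate([0.1, 0.9, 0.4, 0.8, 0.3, 0.7, 0.2])]
top = keep_top_windows(candidates)
```

Keeping several windows rather than only the best one leaves room for later stages to reject a false face detection.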
Pipeline 2: Localize Parts
Part locations    Detector responses
Idea: From the “fit” to K most similar exemplars weighted by the detector output, take the most probable part location
Review: Consensus of Exemplars
Local Part Detectors   Exemplar Selection   Part Localization
Slide from Neeraj Kumar
RANSAC-like Exemplar Selection
1. Repeat r times:
   a. Choose random exemplar k
   b. Choose 2 random modes of local detector outputs D = {d^i} on query
   c. Find similarity transform t that aligns exemplar to these points
   d. Evaluate match of all i face parts for this (k,t) pair:
        P(X_{k,t} | D) = C \prod_{i=1}^{n} P(x^i_{k,t} | d^i)
      Left side: probability of this configuration given detector outputs
      Each factor: part detector probability at this (aligned) location
   e. Add (k,t) pair to list of possible exemplars, ranked by score

2. Take top M (k,t) pairs for determining global configuration
                                                                          Slide from Neeraj Kumar
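
The selection loop on this slide can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: 2D points are stored as complex numbers so that a similarity transform is `z -> a*z + b`, the detector output for each part is modelled as a few modes with Gaussian falloff (`detector_prob`, `sigma` and the toy data are all hypothetical), and the constant C is dropped because it does not affect the ranking.

```python
import math
import random
from typing import Dict, List, Tuple

Point = complex  # 2D point stored as x + 1j*y
# part index -> list of (mode location, peak probability); an assumed format
Modes = Dict[int, List[Tuple[Point, float]]]


def detector_prob(modes: Modes, part: int, z: Point, sigma: float = 3.0) -> float:
    """Assumed detector model: Gaussian falloff around the nearest strong mode."""
    return max(p * math.exp(-abs(z - m) ** 2 / (2 * sigma ** 2))
               for m, p in modes[part])


def similarity_from_two_points(p1: Point, p2: Point, q1: Point, q2: Point):
    """Return (a, b) such that a*p1 + b == q1 and a*p2 + b == q2."""
    a = (q2 - q1) / (p2 - p1)
    return a, q1 - a * p1


def select_exemplars(exemplars: List[List[Point]], modes: Modes,
                     r: int = 200, top_m: int = 5, seed: int = 0):
    """RANSAC-like search; returns the top_m (score, k, (a, b)) triples."""
    rng = random.Random(seed)
    n = len(exemplars[0])
    ranked = []
    for _ in range(r):
        k = rng.randrange(len(exemplars))        # a. random exemplar
        i, j = rng.sample(range(n), 2)           # b. two parts, one mode each
        q1, _ = rng.choice(modes[i])
        q2, _ = rng.choice(modes[j])
        p1, p2 = exemplars[k][i], exemplars[k][j]
        if p1 == p2 or q1 == q2:
            continue                             # degenerate pair, no transform
        a, b = similarity_from_two_points(p1, p2, q1, q2)   # c. align
        aligned = [a * p + b for p in exemplars[k]]
        # d. product of per-part detector probabilities at aligned locations
        score = math.prod(detector_prob(modes, part, z)
                          for part, z in enumerate(aligned))
        ranked.append((score, k, (a, b)))        # e. remember the (k, t) pair
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked[:top_m]                        # 2. top M pairs


# Toy query: exemplar 0 scaled by 2 and shifted, plus one distractor mode per part.
shape = [0 + 0j, 10 + 0j, 5 + 8j]
other = [0 + 0j, 10 + 0j, 5 - 20j]
query = [2 * p + (100 + 50j) for p in shape]
modes = {
    0: [(query[0], 0.9), (30 + 200j, 0.5)],
    1: [(query[1], 0.9), (250 + 10j, 0.5)],
    2: [(query[2], 0.9), (-40 + 120j, 0.5)],
}
ranked = select_exemplars([shape, other], modes)
```

In this toy setup the best-ranked pair recovers exemplar 0 with scale 2 and translation (100, 50), since only that alignment places all three parts on strong detector modes.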
Final Part Localization
For each face part i:
   a. Compute distribution of this part from all M aligned exemplars
   b. For each of the top M aligned exemplars [(k,t) pairs]:
      Multiply normalized local detector outputs with global distribution of part computed from
      exemplars to get scores at each pixel location
   c. Add all scores together to get final scores at each pixel and choose max




                                                                           Slide from Neeraj Kumar
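
A rough Python sketch of steps a to c for a single part follows. It assumes the distribution contributed by each aligned exemplar is an isotropic Gaussian centred on that exemplar's predicted part location; the slide does not specify the form of the distribution, so `sigma`, the grid size, and the toy detector map are illustrative choices only.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]  # grid[y][x], a small response map


def normalize(grid: Grid) -> Grid:
    """Scale a non-negative map so its entries sum to 1."""
    total = sum(map(sum, grid))
    return [[v / total for v in row] for row in grid]


def localize_part(detector: Grid, predictions: List[Tuple[float, float]],
                  sigma: float = 1.5) -> Tuple[Tuple[int, int], Grid]:
    """Combine the local detector map with exemplar predictions for one part.

    predictions holds the (x, y) location of this part in each of the
    top M aligned exemplars.
    """
    det = normalize(detector)
    h, w = len(det), len(det[0])
    scores = [[0.0] * w for _ in range(h)]
    for px, py in predictions:                   # b. one term per exemplar
        for y in range(h):
            for x in range(w):
                prior = math.exp(-((x - px) ** 2 + (y - py) ** 2)
                                 / (2 * sigma ** 2))
                scores[y][x] += det[y][x] * prior  # detector times exemplar prior
    # c. scores are summed over exemplars above; pick the best pixel
    best = max(((x, y) for y in range(h) for x in range(w)),
               key=lambda p: scores[p[1]][p[0]])
    return best, scores


# Toy 7x7 detector map: a strong false peak at (0, 0), the true part at (4, 3).
detector = [[0.01] * 7 for _ in range(7)]
detector[0][0] = 1.0
detector[3][4] = 0.8
best, scores = localize_part(detector, [(4, 3), (5, 3), (4, 4)])
```

Here the detector alone would choose the false peak at (0, 0), while the exemplar consensus pulls the final estimate to (4, 3).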
Pipeline 2: Localize Parts
Part locations    Detector responses
Difference between current part location and that of exemplar
From K most similar exemplars and the detector output, take the most probable part location
Pipeline 3: Infer ears using detected parts




     With r(=10) exemplars from each breed
Pipeline 4: Classification




Extract SIFT at part locations for each breed + color histogram → one vs all linear SVM classifier
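
The feature assembly and the one vs all decision rule might look like the Python sketch below. It is a sketch under assumptions: the part descriptors are 2-D stand-ins for real SIFT vectors, the histogram binning is an arbitrary choice, and the per-breed weights are made up rather than trained. Training the linear SVMs is not shown.

```python
from typing import Dict, List, Sequence, Tuple


def color_histogram(pixels: Sequence[Tuple[int, int, int]],
                    bins: int = 4) -> List[float]:
    """Normalized per-channel histogram of 8-bit RGB pixels (3 * bins values)."""
    hist = [0.0] * (3 * bins)
    for rgb in pixels:
        for channel, value in enumerate(rgb):
            hist[channel * bins + min(value * bins // 256, bins - 1)] += 1.0
    n = max(len(pixels), 1)
    return [h / n for h in hist]


def build_feature(part_descriptors: Sequence[Sequence[float]],
                  pixels: Sequence[Tuple[int, int, int]],
                  bins: int = 4) -> List[float]:
    """Concatenate descriptors taken at part locations with a color histogram."""
    feature = [v for descriptor in part_descriptors for v in descriptor]
    return feature + color_histogram(pixels, bins)


def one_vs_all_predict(feature: Sequence[float],
                       classifiers: Dict[str, Tuple[List[float], float]]) -> str:
    """Score the feature with each breed's linear classifier; return the best breed."""
    def score(name: str) -> float:
        weights, bias = classifiers[name]
        return sum(w * x for w, x in zip(weights, feature)) + bias
    return max(classifiers, key=score)


# Toy inputs: two placeholder part descriptors and two reddish pixels.
descriptors = [[1.0, 0.0], [0.0, 1.0]]
pixels = [(255, 0, 0), (200, 10, 10)]
feature = build_feature(descriptors, pixels, bins=2)

# Hypothetical weights: breed_a responds to bright red, breed_b to dark red.
classifiers = {
    "breed_a": ([0, 0, 0, 0, 0, 1, 0, 0, 0, 0], 0.0),
    "breed_b": ([0, 0, 0, 0, 1, 0, 0, 0, 0, 0], 0.0),
}
prediction = one_vs_all_predict(feature, classifiers)
```

With one classifier per breed, adding a breed means training one more scorer rather than retraining a single multi-class model.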
Qualitative Results: Successful
Qualitative Results: Failures
Results: ROC curves
Available in iTunes now
Take a Picture
By tapping the nose
Get the breed!
Browse Dog Breeds
Thank you!!


Editor's Notes

  1. This is joint work with Jiongxin Liu, Peter Belhumeur, and David Jacobs.
  2. Fine-grained classification: problems in which instances from different classes share common parts but have wide variation in shape and appearance. Examples are identifying species of .. These problems lie between the two extremes of identifying individuals, such as face identification, and basic-level categories, such as Caltech-256. Motivation: a vision system that can do things that humans aren’t very good at. Applications: education (examples such as Leafsnap) and the domain of automatic species identification, which is extremely useful for biodiversity studies and general education. (Success in the dog domain will certainly lead to further success in broader ...) It is a very challenging problem to solve. We chose dogs as our test domain. (Highlight dogs)
  3. Birdlets (derived from poselets): find 3D volumetric primitives and describe classes based on their variations. Our work is complementary to theirs in that Birdlets focuses on using large, articulated parts, while we utilize parts describable at point locations. We also use a hierarchical approach in which the face and the more rigid parts of the face are discovered first and then used to find class-specific parts such as ears. Built on top of recent methods for visual object recognition, related work addresses the problem of fine-grained categorization mainly by mining discriminative features via randomized sampling, with a multiple kernel learning framework, or by extracting dense features over a segmented image. Most relevant to our approach is the work by Farrell et al, which uses the poselet framework to localize the head and body of birds, enabling part-based feature extraction.
  4. Dense feature extraction is often very powerful for object recognition and general visual classification tasks. However, this is not the case for fine-grained categorization, since categories are so visually similar, many regions contain more noise than useful information, and such generic sampling can miss fine details that are needed for correct classificationIn this work, we argue and demonstrate that fine-grained classification can be significantly improved if the features are localized at corresponding object parts.There is a vast literature on face detection and localization parts of human faces. We localize parts of the dog face built on the consensus of models approach by Belhumeur et al, which originally is a non-parametric face parts detector
  5. Here is an example that demonstrates this insight.
  6. Subordinate categories such as dogs and leaves all share semantic parts (legs for chairs, stems for leaves, ears for cats and dogs), and the differences in those parts are more informative than generic sampling of features. These two dogs are of different breeds. The texture of their fur and their color distribution are strikingly similar. But in general, Entlebucher Mountain Dogs have a shorter snout, rounder nostrils, and more pendant, V-shaped, flatter ears, while Greater Swiss Mountain Dogs have a longer snout, nostrils that cut to the side with a visible septum (a line between the nostrils), and folded ears that hang on the side of the head. In this work, we argue and demonstrate that fine-grained classification can be significantly improved if the features are localized at corresponding object parts. There is a vast literature on face detection and on localizing parts of human faces. We localize parts of the dog face by building on the consensus of exemplars approach by Belhumeur et al., a non-parametric face parts detector. We extend their method, which has previously been applied only to part detection, to perform object classification.
  7. (All the dogs face the camera; dog images are from the dataset.) We chose dog breed identification as a test case to demonstrate our method. Dogs are an excellent domain for fine-grained categorization. After humans, dogs are possibly the most photographed species (perhaps after cats) on the internet. Determining dog breeds is a very challenging task that shares many of the challenges seen in fine-grained classification, and success in this domain will certainly lead to further success in the broader domain of automatic species identification, which is extremely useful for biodiversity studies and general education. Since we focus on localizing dog parts, we have annotated 8 parts on all dogs in our dataset. The parts are the 2 eyes, the nose, the ear tips, the ear bases, and the top of the head. Because we only look at these parts, all of the dogs in our dataset are facing the camera, but with varying pose, scale, and rotation, so detecting face parts is far from a trivial task. Now I will go over the challenges in recognizing the breed of a dog from a single picture. The first challenge, as you can see, is that there are many classes. In this work we deal with 133 breeds of dogs. (As a side note, all of the pictures you see on these slides are images of dogs from our dataset.)
  8. Many subsets of dog breeds are quite similar in appearance.
  9. On top of that, there is also great variation within breeds. These two factors make identification of breeds very challenging, especially for humans without expert knowledge. (Try to go back to slide 7 and point to the Lakeland terrier.)
  10. They come in innumerable poses, considerably more than human faces do.
  11. Dogs are very diverse in their visual appearance.
  12. The geometry of their faces is also very deformable, again far more so than human faces. This is especially true of their ear tips: breeds like beagles have hanging ears, whereas breeds like the Akita have pointy, upright ears. (Also note how the nose has greater DoF than the nose of humans, as in this picture, where the eyes and nose are almost collinear because dogs' faces are more 3D (less flat) than ours.) These factors make localization of parts very challenging.
  13. Here is the overview of our pipeline: first we detect the dog face, then we localize three parts and extract features at those places to find the most similar exemplars, which we use to detect the rest of the face parts. Then, using all the parts, we do breed classification, and here is a sample result. The green border indicates the correct breed.
  14. We use a sliding-window RBF-SVM regressor to detect dog faces. Each window has eight SIFT descriptors, indicated by these boxes, concatenated into a 1024-dimensional feature vector. We also experimented with a cascaded AdaBoost detector with Haar-like features, which works very well for human faces. Perhaps due to the extreme variability in the geometry and appearance of dog faces, the cascaded AdaBoost detector produced far too many false detections. For details, please refer to the paper. We keep the 10 highest-scoring face detection windows and generate hypotheses of part locations for each of them. In the next step we keep the face window with the highest score.
  15. We want the part locations that maximize the probability of those locations given the detector responses. We want to impose geometric constraints on the detector outputs by combining low-level detectors with labeled exemplars. The exemplars help create conditional independence between different parts, since we assume that each part is generated by one of the exemplars, so we can re-write… We include the exemplars in the calculation of (1) and marginalize them out. Intuitively, the K exemplars that are most similar to the locations of the modes of the detector output are selected. They are then transformed to fit the current query image. The P(delta) term is modeled as a 2D Gaussian, and the difference between the current part and the exemplar gives how well the model fits the location p_i. We pick the part location that has the highest fit to all $K$ models, weighted by the confidence of the detector output. To localize face parts, we first train sliding-window linear-SVM detectors for each dog part using a single SIFT feature. If we denote by C the detector responses for parts in image I, and by p^I the ground-truth locations of the parts in the image, our goal is to compute (1). Using exemplars (labeled training samples), we can write the above for each ith part as (2). The t stands for the similarity transformation of model $k$. The K models are selected by a RANSAC-like procedure. K=100?
  16. This is a different approach to part detection compared to DPM, but basically they both do the same MAP estimation. DPM enforces geometric constraints between parts by parameterizing the deformation between connected parts. CoE enforces geometric constraints non-parametrically (although not latent, and part labels are necessary).
  17. We want the part locations that maximize the probability of those locations given the detector responses. We want to impose geometric constraints on the detector outputs by combining low-level detectors with labeled exemplars. The exemplars help create conditional independence between different parts, since we assume that each part is generated by one of the exemplars, so we can re-write… We include the exemplars in the calculation of (1) and marginalize them out. Intuitively, the K exemplars that are most similar to the locations of the modes of the detector output are selected. They are then transformed to fit the current query image. The P(delta) term is modeled as a 2D Gaussian, and the difference between the current part and the exemplar gives how well the model fits the location p_i. We pick the part location that has the highest fit to all $K$ models, weighted by the confidence of the detector output. To localize face parts, we first train sliding-window linear-SVM detectors for each dog part using a single SIFT feature. If we denote by C the detector responses for parts in image I, and by p^I the ground-truth locations of the parts in the image, our goal is to compute (1). Using exemplars (labeled training samples), we can write the above for each ith part as (2). The t stands for the similarity transformation of model $k$. The K models are selected by a RANSAC-like procedure. K=100?
  18. Similarly, we infer the ears by an extension of the consensus of models approach. The equations are demonstrated by the animation here. Assuming that the three parts detected in the previous stage are accurate, from each breed we find the $R$ closest exemplars, apply a similarity transform, and find the parts that are most probable.
  19. Again, we do this for each breed. We take this hierarchical approach to detecting ears because the geometry of ears is very breed dependent. So in the end we'll have 133 hypothesized ear locations. R = 10.
  20. Only a 1440-dimensional feature vector (11 parts + kmeans). Finally, for each of the 133 part hypotheses, we extract SIFT features at those part locations, concatenate them with a color histogram of the face window, and send the result to a linear one-vs-all SVM. One may wonder whether we are missing a lot of information from body features or fur, which are discriminative for dogs like the dachshund. But it is much harder to accurately localize parts of the dog body because of their deformability and occlusion, and if two dogs are easily discriminated by their fur, those breeds have low similarity in appearance and are easier to classify. The real problem is when features such as fur color and texture are very similar and not discriminative enough, and in these cases looking at the rest of the dog is not so useful. One of our contributions is that we get a very good result just by considering faces.
  21. Note the similarity between the query and the incorrect first guesses.
  22. Look at the magenta curve: our first guess achieves 67% accuracy, and within the first 10 guesses we achieve 93% accuracy. The green curve uses a bag-of-words approach on dense SIFT features extracted in a face detection window, a baseline method for object recognition. The cyan and blue curves are state-of-the-art approaches used earlier for fine-grained categorization. The cyan curve uses LLC (locality-constrained linear coding) to encode the dictionary for BoW, and the blue curve uses the MKL framework, which is extremely inefficient. The second ROC curve gives quantitative justification of our steps. The pink curve is our proposed method; the red curve is what we get if we only use the highest-scoring face detection window. Note that we keep 10. The green curve shows how, without part localization (features extracted on a grid within the face detection window), the accuracy is much lower.
  23. Speaking of efficiency, our system runs in real time, and we have a working iPhone application available on iTunes now.
  24. Bare layout for ECCV 2012 video preparation. You may submit the .pptx file, or use “File->Save and Send->Create a Video”.Remember: Author names and title will be added above the video by us.
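
The window bookkeeping described in note 14 (eight SIFT descriptors concatenated into a 1024-dimensional vector, then the 10 highest-scoring windows kept) can be sketched in Python as follows. This is only an illustration: `window_feature`, `top_windows`, and `score_fn` are hypothetical names, and `score_fn` stands in for the RBF-SVM regressor, which is not reproduced here.

```python
import heapq

def window_feature(descriptors):
    """Concatenate eight 128-d SIFT descriptors into one 1024-d vector."""
    if len(descriptors) != 8 or any(len(d) != 128 for d in descriptors):
        raise ValueError("expected eight 128-dimensional descriptors")
    return [v for d in descriptors for v in d]

def top_windows(windows, score_fn, n=10):
    """Keep the n highest-scoring windows as (score, window) pairs, best first.

    score_fn is a placeholder for the trained face regressor.
    """
    scored = [(score_fn(w), w) for w in windows]
    return heapq.nlargest(n, scored, key=lambda sw: sw[0])
```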
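
The intuition in notes 15 and 17 (transform exemplars into the query image, score each candidate location with a 2D Gaussian on its offset from the exemplars, and weight by detector confidence) can be sketched in Python. This is a heavily simplified sketch, not equations (1) and (2) from the talk. It scores one part at a time, omits the marginalization over transformations and the RANSAC-like exemplar selection, and assumes an isotropic Gaussian with an arbitrary `sigma`. All function names are hypothetical.

```python
import math

def fit_similarity(src, dst):
    """Similarity transform z -> a*z + b mapping two source points onto two targets."""
    s0, s1 = (complex(*p) for p in src)
    d0, d1 = (complex(*p) for p in dst)
    a = (d1 - d0) / (s1 - s0)  # rotation and uniform scale
    b = d0 - a * s0            # translation
    return a, b

def apply_similarity(t, point):
    """Apply a transform returned by fit_similarity to an (x, y) point."""
    a, b = t
    z = a * complex(*point) + b
    return (z.real, z.imag)

def gaussian2d(delta, sigma):
    """Isotropic 2D Gaussian density of an offset (dx, dy)."""
    dx, dy = delta
    return math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma)) / (2 * math.pi * sigma * sigma)

def localize_part(candidates, exemplar_points, sigma=5.0):
    """Pick the candidate (x, y, detector_score) best supported by the exemplars.

    exemplar_points are this part's locations in exemplars that have
    already been transformed into the query image.
    """
    def support(candidate):
        x, y, score = candidate
        fit = sum(gaussian2d((x - ex, y - ey), sigma) for ex, ey in exemplar_points)
        return score * fit  # geometric fit weighted by detector confidence
    best = max(candidates, key=support)
    return best[:2]
```

In this sketch a strong detector response far from every exemplar loses to a weaker response that the exemplars agree on, which is the geometric constraint the notes describe.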
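
The curves discussed in note 22 report how often the correct breed appears within the first k guesses. A minimal Python sketch of that measure, assuming each query comes with a list of breed guesses ordered best first (the function name and the toy labels in the usage note are hypothetical):

```python
def top_k_accuracy(ranked_guesses, true_labels, k):
    """Fraction of queries whose true label is among the first k guesses."""
    if not true_labels:
        raise ValueError("need at least one query")
    hits = sum(
        1 for guesses, truth in zip(ranked_guesses, true_labels)
        if truth in guesses[:k]
    )
    return hits / len(true_labels)
```

Evaluating this for k = 1 through 10 on a test set gives one curve of the kind shown on the results slide. For example, with three toy queries ranked as `[["a", "b"], ["b", "a"], ["b", "c"]]` and true labels `["a", "a", "a"]`, the function counts one hit at k = 1 and two hits at k = 2.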