SlideShare a Scribd company logo
1 of 30
Scale Invariant Feature Transform
(SIFT)
Why do we care about matching features?
• Camera calibration
• Stereo
• Tracking/SFM
• Image moiaicing
• Object/activity Recognition
• …
Objection representation and recognition
• Image content is transformed into local feature
coordinates that are invariant to translation, rotation,
scale, and other imaging parameters
• Automatic Mosaicing
• http://www.cs.ubc.ca/~mbrown/autostitch/autostitch.html
We want invariance!!!
• To illumination
• To scale
• To rotation
• To affine
• To perspective projection
Types of invariance
• Illumination
Types of invariance
• Illumination
• Scale
Types of invariance
• Illumination
• Scale
• Rotation
Types of invariance
• Illumination
• Scale
• Rotation
• Affine (view point
change)
Types of invariance
• Illumination
• Scale
• Rotation
• Affine
• Full Perspective
How to achieve illumination invariance
• The easy way
(normalized)
• Difference based metrics
(random tree, Haar, and
sift, gradient)
How to achieve scale invariance
• Pyramids
• Scale Space (DOG method)
Pyramids
– Divide width and height by 2
– Take average of 4 pixels for
each pixel (or Gaussian blur
with different s)
– Repeat until image is tiny
– Run filter over each size
image and hope its robust
How to achieve scale invariance
• Scale Space: Difference of Gaussian (DOG)
– Take DOG features from differences of these
images-producing the gradient image at different
scales.
– If the feature is repeatedly present in between
Difference of Gaussians, it is Scale Invariant and
should be kept.
Differences Of Gaussians
Rotation Invariance
• Compute histogram of gradient directions to
identify the most dominant direction. Rotate
image to the most dominant direction.
• Rotate all features to go the same way in a
determined manner
Rotation Invariance
SIFT algorithm overview
• Scale-space extrema detection
– Get tons of points from maxima+minima of DOGS
• Keypoint localization
– Threshold on simple contrast (low contrast is generally
less reliable than high for feature points)
– Threshold based on principal curvatures to remove
linear features such as edges
– Orientation assignment
• Keypoint descriptor
– Construct histograms of gradients (HOG)
Scale-space extrema detection
• Find the points, whose surrounding patches (with
some scale) are distinctive
• An approximation to the scale-normalized
Difference of Gaussian
Extreme Point Detection
Convolve with
Gaussian
Downsample
Find extrema
in 3D DoG space
Maxima and
minima in a
3*3*3
neighborhood
At each image size, convolve with DoG of different scales.
Keypoint localization
• Eliminating extreme points with low
contrast
• Eliminating edge points
– Similar to Harris corner detector
Eliminating edge points
• Such a point has large principal curvature across
the edge but a small one in the perpendicular
direction
• The principal curvatures can be calculated from a
Hessian function or covariance matrix of gradient
(Harris detector)
• The eigenvalues of H or C are proportional to the
principal curvatures, so two eigenvalues shouldn’t
differ too much.
Finding Keypoints – Orientation
• Create histogram of local
gradient directions computed
at selected scale
• Assign canonical orientation at
peak of smoothed histogram,
achieving invariance to image
rotation
• Each key point specifies stable
2D coordinates (x, y, scale,
orientation) 0 2 
SIFT Key Feature descriptor
Take a 16x16 window of "in-between" pixels around the
key point. Split that window into sixteen 4x4 windows.
From each 4x4 window, generate a histogram of 8 bins,
producing a total of 4x4x8=128 feature vector.
SIFT Key Feature descriptor
Create the features for each SIFT key point. Given 8
orientations and 4x4 histograms, each SIFT key is described
by a vector of 128 (8x4x4) as shown below.
4 histograms with 8 direction bins
for each histogram, producing 4x8 features.
Actual SIFT stage output
After remove low contrast features After removing linear features (e.g. window edges)
e.g. in the sky region
Image matching with SIFT features
• Nearest neighbor matching
– Distance could be L2 norm on histograms
– Match by (nearest neighbor distance)/(2nd nearest
neighbor distance) ratio
• Use Hough transform like voting: each SIFT key
point votes for an object and the object
receives the most votes is the recognized
object
Application: object recognition
• The SIFT key features of training images are
extracted and stored
• For a query image
1. Extract SIFT key points and their features
2. Nearest neighbor matching or HT voting
Conclusion
• A novel method for detecting interest points.
The most successful feature in computer
vision
• Histogram of Oriented Gradients are
becoming more popular
• SIFT may not be optimal for general object
classification

More Related Content

Similar to SIFT.ppt

Computer Vision invariance
Computer Vision invarianceComputer Vision invariance
Computer Vision invarianceWael Badawy
 
COMP 4010 Lecture10: AR Tracking
COMP 4010 Lecture10: AR TrackingCOMP 4010 Lecture10: AR Tracking
COMP 4010 Lecture10: AR TrackingMark Billinghurst
 
11 cie552 image_featuresii_sift
11 cie552 image_featuresii_sift11 cie552 image_featuresii_sift
11 cie552 image_featuresii_siftElsayed Hemayed
 
affine transformation for computer graphics
affine transformation for computer graphicsaffine transformation for computer graphics
affine transformation for computer graphicsDrSUGANYADEVIK
 
Pattern_Recognition_via_Character_Recogn.pptx
Pattern_Recognition_via_Character_Recogn.pptxPattern_Recognition_via_Character_Recogn.pptx
Pattern_Recognition_via_Character_Recogn.pptxEngRSMY2
 
Passive stereo vision with deep learning
Passive stereo vision with deep learningPassive stereo vision with deep learning
Passive stereo vision with deep learningYu Huang
 
Introduction to Real Time Rendering
Introduction to Real Time RenderingIntroduction to Real Time Rendering
Introduction to Real Time RenderingKoray Hagen
 
Comp4010 Lecture4 AR Tracking and Interaction
Comp4010 Lecture4 AR Tracking and InteractionComp4010 Lecture4 AR Tracking and Interaction
Comp4010 Lecture4 AR Tracking and InteractionMark Billinghurst
 
Michal Erel's SIFT presentation
Michal Erel's SIFT presentationMichal Erel's SIFT presentation
Michal Erel's SIFT presentationwolf
 
Lecture 12 localization and navigation
Lecture 12 localization and navigationLecture 12 localization and navigation
Lecture 12 localization and navigationVajira Thambawita
 
Fingerprint Images Enhancement ppt
Fingerprint Images Enhancement pptFingerprint Images Enhancement ppt
Fingerprint Images Enhancement pptMukta Gupta
 
XNA L02–Basic Matrices and Transformations
XNA L02–Basic Matrices and TransformationsXNA L02–Basic Matrices and Transformations
XNA L02–Basic Matrices and TransformationsMohammad Shaker
 
Semantic-Aware Sky Replacement (SIGGRAPH 2016)
Semantic-Aware Sky Replacement (SIGGRAPH 2016)Semantic-Aware Sky Replacement (SIGGRAPH 2016)
Semantic-Aware Sky Replacement (SIGGRAPH 2016)Yi-Hsuan Tsai
 
Computer Vision harris
Computer Vision harrisComputer Vision harris
Computer Vision harrisWael Badawy
 
Lecture 9-online
Lecture 9-onlineLecture 9-online
Lecture 9-onlinelifebreath
 

Similar to SIFT.ppt (20)

Computer Vision invariance
Computer Vision invarianceComputer Vision invariance
Computer Vision invariance
 
COMP 4010 Lecture10: AR Tracking
COMP 4010 Lecture10: AR TrackingCOMP 4010 Lecture10: AR Tracking
COMP 4010 Lecture10: AR Tracking
 
11 cie552 image_featuresii_sift
11 cie552 image_featuresii_sift11 cie552 image_featuresii_sift
11 cie552 image_featuresii_sift
 
affine transformation for computer graphics
affine transformation for computer graphicsaffine transformation for computer graphics
affine transformation for computer graphics
 
Cbir ‐ features
Cbir ‐ featuresCbir ‐ features
Cbir ‐ features
 
Image processing.pdf
Image processing.pdfImage processing.pdf
Image processing.pdf
 
project_PPT_final
project_PPT_finalproject_PPT_final
project_PPT_final
 
PPT s06-machine vision-s2
PPT s06-machine vision-s2PPT s06-machine vision-s2
PPT s06-machine vision-s2
 
Pattern_Recognition_via_Character_Recogn.pptx
Pattern_Recognition_via_Character_Recogn.pptxPattern_Recognition_via_Character_Recogn.pptx
Pattern_Recognition_via_Character_Recogn.pptx
 
Passive stereo vision with deep learning
Passive stereo vision with deep learningPassive stereo vision with deep learning
Passive stereo vision with deep learning
 
Introduction to Real Time Rendering
Introduction to Real Time RenderingIntroduction to Real Time Rendering
Introduction to Real Time Rendering
 
Comp4010 Lecture4 AR Tracking and Interaction
Comp4010 Lecture4 AR Tracking and InteractionComp4010 Lecture4 AR Tracking and Interaction
Comp4010 Lecture4 AR Tracking and Interaction
 
Michal Erel's SIFT presentation
Michal Erel's SIFT presentationMichal Erel's SIFT presentation
Michal Erel's SIFT presentation
 
Lecture 12 localization and navigation
Lecture 12 localization and navigationLecture 12 localization and navigation
Lecture 12 localization and navigation
 
Fingerprint Images Enhancement ppt
Fingerprint Images Enhancement pptFingerprint Images Enhancement ppt
Fingerprint Images Enhancement ppt
 
3 D texturing
 3 D texturing 3 D texturing
3 D texturing
 
XNA L02–Basic Matrices and Transformations
XNA L02–Basic Matrices and TransformationsXNA L02–Basic Matrices and Transformations
XNA L02–Basic Matrices and Transformations
 
Semantic-Aware Sky Replacement (SIGGRAPH 2016)
Semantic-Aware Sky Replacement (SIGGRAPH 2016)Semantic-Aware Sky Replacement (SIGGRAPH 2016)
Semantic-Aware Sky Replacement (SIGGRAPH 2016)
 
Computer Vision harris
Computer Vision harrisComputer Vision harris
Computer Vision harris
 
Lecture 9-online
Lecture 9-onlineLecture 9-online
Lecture 9-online
 

Recently uploaded

AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 

Recently uploaded (20)

AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 

SIFT.ppt

  • 1. Scale Invariant Feature Transform (SIFT)
  • 2. Why do we care about matching features? • Camera calibration • Stereo • Tracking/SFM • Image moiaicing • Object/activity Recognition • …
  • 3. Objection representation and recognition • Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters
  • 4. • Automatic Mosaicing • http://www.cs.ubc.ca/~mbrown/autostitch/autostitch.html
  • 5. We want invariance!!! • To illumination • To scale • To rotation • To affine • To perspective projection
  • 6. Types of invariance • Illumination
  • 7. Types of invariance • Illumination • Scale
  • 8. Types of invariance • Illumination • Scale • Rotation
  • 9. Types of invariance • Illumination • Scale • Rotation • Affine (view point change)
  • 10. Types of invariance • Illumination • Scale • Rotation • Affine • Full Perspective
  • 11. How to achieve illumination invariance • The easy way (normalized) • Difference based metrics (random tree, Haar, and sift, gradient)
  • 12. How to achieve scale invariance • Pyramids • Scale Space (DOG method)
  • 13. Pyramids – Divide width and height by 2 – Take average of 4 pixels for each pixel (or Gaussian blur with different s) – Repeat until image is tiny – Run filter over each size image and hope its robust
  • 14. How to achieve scale invariance • Scale Space: Difference of Gaussian (DOG) – Take DOG features from differences of these images-producing the gradient image at different scales. – If the feature is repeatedly present in between Difference of Gaussians, it is Scale Invariant and should be kept.
  • 16. Rotation Invariance • Compute histogram of gradient directions to identify the most dominant direction. Rotate image to the most dominant direction. • Rotate all features to go the same way in a determined manner
  • 18. SIFT algorithm overview • Scale-space extrema detection – Get tons of points from maxima+minima of DOGS • Keypoint localization – Threshold on simple contrast (low contrast is generally less reliable than high for feature points) – Threshold based on principal curvatures to remove linear features such as edges – Orientation assignment • Keypoint descriptor – Construct histograms of gradients (HOG)
  • 19. Scale-space extrema detection • Find the points, whose surrounding patches (with some scale) are distinctive • An approximation to the scale-normalized Difference of Gaussian
  • 20. Extreme Point Detection Convolve with Gaussian Downsample Find extrema in 3D DoG space Maxima and minima in a 3*3*3 neighborhood At each image size, convolve with DoG of different scales.
  • 21. Keypoint localization • Eliminating extreme points with low contrast • Eliminating edge points – Similar to Harris corner detector
  • 22. Eliminating edge points • Such a point has large principal curvature across the edge but a small one in the perpendicular direction • The principal curvatures can be calculated from a Hessian function or covariance matrix of gradient (Harris detector) • The eigenvalues of H or C are proportional to the principal curvatures, so two eigenvalues shouldn’t differ too much.
  • 23. Finding Keypoints – Orientation • Create histogram of local gradient directions computed at selected scale • Assign canonical orientation at peak of smoothed histogram, achieving invariance to image rotation • Each key point specifies stable 2D coordinates (x, y, scale, orientation) 0 2 
  • 24. SIFT Key Feature descriptor Take a 16x16 window of "in-between" pixels around the key point. Split that window into sixteen 4x4 windows. From each 4x4 window, generate a histogram of 8 bins, producing a total of 4x4x8=128 feature vector.
  • 25. SIFT Key Feature descriptor Create the features for each SIFT key point. Given 8 orientations and 4x4 histograms, each SIFT key is described by a vector of 128 (8x4x4) as shown below. 4 histograms with 8 direction bins for each histogram, producing 4x8 features.
  • 27. After remove low contrast features After removing linear features (e.g. window edges) e.g. in the sky region
  • 28. Image matching with SIFT features • Nearest neighbor matching – Distance could be L2 norm on histograms – Match by (nearest neighbor distance)/(2nd nearest neighbor distance) ratio • Use Hough transform like voting: each SIFT key point votes for an object and the object receives the most votes is the recognized object
  • 29. Application: object recognition • The SIFT key features of training images are extracted and stored • For a query image 1. Extract SIFT key points and their features 2. Nearest neighbor matching or HT voting
  • 30. Conclusion • A novel method for detecting interest points. The most successful feature in computer vision • Histogram of Oriented Gradients are becoming more popular • SIFT may not be optimal for general object classification