5. INTRODUCTION TO COMPUTER VISION
• Branch of Machine Learning which deals with Images
• Unstructured Data
• Everywhere
• Captured from cameras
• Created by software like MSPaint, CorelDRAW, Adobe Photoshop etc.
• Created by software like AutoCAD, CATIA, Adobe Acrobat, MS Word, PowerPoint
• Can contain text, regular shapes, irregular shapes
• Contain a treasure of information
7. COLOUR SPACES
RGB Red, Green and Blue (each channel 0-255)
CMYK Cyan, Magenta, Yellow and Key (black)
HSV Hue, Saturation and Value
Grayscale Shades of grey from black to white
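The relationship between these colour spaces can be sketched in a few lines. This is a minimal illustration using only Python's standard `colorsys` module; the helper names `rgb_to_hsv_degrees` and `rgb_to_gray` are made up for this example, and the grey weights are the ITU-R BT.601 luma coefficients, which is one common choice among several.

```python
import colorsys


def rgb_to_hsv_degrees(r, g, b):
    """Convert 8-bit RGB (0-255 per channel) to (hue in degrees, saturation, value)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return round(h * 360.0, 1), round(s, 3), round(v, 3)


def rgb_to_gray(r, g, b):
    """Collapse RGB to one grey level with a weighted sum (BT.601 weights, an assumption)."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)


print(rgb_to_hsv_degrees(255, 0, 0))   # pure red
print(rgb_to_hsv_degrees(0, 0, 255))   # pure blue
print(rgb_to_gray(255, 255, 255))      # white
```

The same pixel is described by three numbers in RGB and HSV but by a single number in grayscale, which is why grayscale conversion loses colour information.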
8. COLOUR SPACES
Traditional Colors
•Described by Isaac Newton in 1672.
•Primary colors are Red, Yellow and Blue
•Commonly referred to as "Painter's Colors".
•Not all colors can be generated by mixing them.
Subtractive Colors
•Called "Printer's Colors".
•The colour we see is the part of white light that is not absorbed, i.e. the rest has been subtracted
•Primary colors are Cyan, Yellow, and Magenta.
Additive Colors
•Adds the primary colours (Red, Green and Blue) together to get the desired colour
•Displays work like this
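The link between additive (display) and subtractive (print) colours can be sketched as a conversion from RGB to CMYK. This is the naive textbook formula, not what a real printer driver does (those rely on colour profiles); the function name `rgb_to_cmyk` is hypothetical.

```python
def rgb_to_cmyk(r, g, b):
    """Naive conversion of 8-bit RGB to CMYK fractions in [0, 1]."""
    if (r, g, b) == (0, 0, 0):
        return 0.0, 0.0, 0.0, 1.0  # pure black: key ink only

    # Each subtractive primary is the complement of an additive primary.
    c = 1 - r / 255
    m = 1 - g / 255
    y = 1 - b / 255

    # Darkness shared by all three inks is moved into the key (black) channel.
    k = min(c, m, y)
    c, m, y = ((x - k) / (1 - k) for x in (c, m, y))
    return round(c, 3), round(m, 3), round(y, 3), round(k, 3)


print(rgb_to_cmyk(255, 0, 0))      # red on screen = magenta + yellow ink
print(rgb_to_cmyk(255, 255, 255))  # white = no ink at all
```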
9. WHAT IS IMAGE PROCESSING?
• Extract quantifiable and meaningful
information out of an image
• Objects present in the image
• Location in the image
• Background or Foreground
• Distance from the viewer
10. IS IMAGE PROCESSING NEW TO COMPUTERS?
No. My grandmother used it without ever seeing a computer.
Remember the days before the internet?
Features in a Cathode Ray Tube Television
Brightness
Contrast
Colour
Sharpness
How was this done?
Convolution
11. CONVOLUTION IN DIGITAL WORLD
• Process of adding each element of an image to its local neighbours, weighted by a kernel (a small matrix of weights)
• NOT the same as matrix multiplication
• Used for blurring, sharpening, up/down sampling, spherical distortion, de-noising, noise filtering etc.
12. CONVOLUTION IN DIGITAL WORLD
• Depending on the convolution matrix (kernel), step size (stride) and operation chosen, the resulting image will vary.
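A minimal sketch of 2D convolution in plain Python follows, to show how the choice of kernel changes the result. It assumes a grayscale image stored as a list of lists, a stride of 1 and no padding ("valid" mode); `convolve2d`, `BOX_BLUR` and `SHARPEN` are names invented for this example.

```python
def convolve2d(image, kernel):
    """'Valid' 2D convolution of a list-of-lists image with a small kernel."""
    kh, kw = len(kernel), len(kernel[0])
    # True convolution flips the kernel in both axes before sliding it.
    flipped = [row[::-1] for row in kernel[::-1]]
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            # Weighted sum of the pixel neighbourhood under the kernel.
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * flipped[di][dj]
            row.append(acc)
        out.append(row)
    return out


BOX_BLUR = [[1 / 9] * 3 for _ in range(3)]        # averages the neighbourhood
SHARPEN = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]   # exaggerates local differences

image = [
    [10, 10, 10, 10],
    [10, 50, 50, 10],
    [10, 50, 50, 10],
    [10, 10, 10, 10],
]
print(convolve2d(image, BOX_BLUR))
print(convolve2d(image, SHARPEN))
```

The same image gives a smoothed result with one kernel and an exaggerated one with the other, and each output value depends only on a small neighbourhood, which is what separates convolution from matrix multiplication.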
19. COMPUTER VISION – THE (AGE) OLD PROBLEMS
• What should a robot do in “Scene understanding”?
• Identify colours, brightness etc
• Identify objects a.k.a Image Segmentation
• Different things
• Multiple occurrences of the same thing
• Stuff other than things
• Distance of things and stuff
• Relative and absolute
20. COLOUR AND BRIGHTNESS
Colour spaces
• Grayscale, RGB, CMY
• Transparency/Opacity using a fourth attribute
Limitations
• Do not represent all colours in nature
• Colour perception highly susceptible to lighting changes
New Solutions
• Colour spaces have been expanded greatly.
• With micro and macro level differences, ~250 colour spaces are in vogue
• HSV, HSL/HSI, YUV, YPbPr, YCbCr etc.
22. OLD PROBLEM – IMAGE SEGMENTATION
An image is a matrix of numbers.
How to identify the edges of each object?
How to recognize the object correctly?
Differentiate between “things” (foreground) and “stuff” (background)
23. IMAGE SEGMENTATION – OLD SOLUTIONS
Solution Family: Algorithms and Drawbacks
Thresholding
• Algorithms: Otsu thresholding; Adaptive local thresholding (Mean, Gaussian)
• Drawbacks: For reasonably simple scenarios only
Edges and Corners
• Algorithms: Canny, Sobel, Hough and Laplace algorithms; Harris corner detection; Convolution of kernels
• Drawbacks: Unsuitable for noisy/blurry images
Region Growing
• Algorithms: Watershed
• Remarks: Relatively strong at detecting overlapping/touching objects
Super Pixels
• Algorithms: SLIC algorithm
• Drawbacks: Susceptible to noise; Steep increase in algorithmic complexity
Clustering
• Algorithms: K-means; Fuzzy C-Means (FCM); Expectation Maximization (EM)
• Drawbacks: Relies on low level features like colour etc.; Poor performance on complicated images
Clustering
• Algorithms: Image Pyramid
• Drawbacks: Carefully controlled environments only; Cannot handle transformations like rotation, reflection etc.; Occlusions are a big no-no; Compute intensive
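As an illustration of the thresholding family, here is a minimal sketch of Otsu thresholding in plain Python. It assumes a flat list of integer grey levels in 0-255 and picks the threshold that maximises the between-class variance; `otsu_threshold` is a name chosen for this example, not a library function.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t that maximises between-class variance.

    Pixels with value <= t form one class, all others form the second class.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the lower class so far
    sum0 = 0  # sum of grey levels in the lower class so far
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break             # upper class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


pixels = [10, 12, 11, 200, 205, 198]   # dark background, bright object
t = otsu_threshold(pixels)
mask = [p > t for p in pixels]         # True marks the bright class
print(t, mask)
```

This works because the toy data has two clearly separated grey levels. That is exactly the "reasonably simple scenarios only" limitation: with uneven lighting or several overlapping objects, a single global threshold no longer separates them.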
24. IMAGE SEGMENTATION – CONVOLUTIONAL NEURAL NETWORKS
• Specialized kind of neural networks
• Process data in known grid-like spatial structures
• Composed of a large number of layers such as convolution, pooling and fully connected layers
• Usually very deep, i.e. lots of layers and lots of weight parameters
• Non-linear activation functions are mandatory for learning complex features
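The layer types listed above can be sketched as one tiny forward pass: convolution, then a non-linear activation (ReLU), then pooling. This is a toy illustration in plain Python and not a trainable network. The kernel `VERTICAL_EDGE` is hand-picked here as an assumption; in a real CNN the weights are learned from data.

```python
def conv2d_valid(image, kernel):
    """Sliding-window weighted sum (no kernel flip, as CNN libraries usually compute it)."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(len(image[0]) - kw + 1)
        ]
        for i in range(len(image) - kh + 1)
    ]


def relu(fmap):
    """Non-linear activation: negative responses are clipped to zero."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool_2x2(fmap):
    """Keep the strongest response in each non-overlapping 2x2 block."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]) - 1, 2)]
        for i in range(0, len(fmap) - 1, 2)
    ]


# 5x5 image: dark on the left, bright on the right, so it holds one vertical edge.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
VERTICAL_EDGE = [[-1, 1], [-1, 1]]  # hypothetical hand-picked weights

features = max_pool_2x2(relu(conv2d_valid(image, VERTICAL_EDGE)))
print(features)
```

The pooled feature map is non-zero only where the edge lies. Stacking many such layers, with learned kernels, is what lets a CNN build up from edges to more complex features.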
26. EVOLUTION OF CNN CLASSIFIERS
2014
• Regions with CNN Features (R-CNN)
2015
• Fast R-CNN
• Faster R-CNN
• Inception V3
2016
• YOLO
• SSD
• UberNet
2017
• Mask R-CNN
• Pixel wise Instance Segmentation
27. SOME SALIENT POINTS
R-CNN (Regions with CNN Features)
•Uses Selective Search
•Significantly reduced the search space to ~2000 region proposals
•Very slow and very complicated
Fast R-CNN (designed to solve the problems with R-CNN)
•Region Of Interest is treated as a pooling layer
•Jointly trains feature extractor, classifier and bounding box regression as a single model
•Almost 25 times faster than R-CNN
Faster R-CNN (replaces Selective Search with a Region Proposal Network)
•10 times faster than Fast R-CNN
YOLO (You Only Look Once)
•Detection is considered as a regression problem
•Extremely fast but less accurate. Struggles with small objects that appear in groups
SSD (Single Shot MultiBox Detector)
•Faster than YOLO and more accurate as well.
Mask R-CNN (extension of Faster R-CNN)
•Predicts the object masks as well as the bounding box
•Impressive results
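The idea of treating a Region Of Interest as a pooling layer can be sketched as follows: whatever the size of the region, it is max-pooled into a fixed-size grid so that the later layers always see the same shape. This is a simplified illustration on a single 2D feature map with integer bin edges, not the exact Fast R-CNN implementation; `roi_max_pool` and its `(top, left, bottom, right)` convention are assumptions made for this example.

```python
def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool the rectangle roi = (top, left, bottom, right), end-exclusive,
    into a fixed out_h x out_w grid regardless of the rectangle's size."""
    top, left, bottom, right = roi
    h, w = bottom - top, right - left
    if h < out_h or w < out_w:
        raise ValueError("this sketch needs the RoI to be at least as large as the output grid")

    pooled = []
    for i in range(out_h):
        # Integer bin edges along the rows of the RoI.
        r0 = top + (i * h) // out_h
        r1 = top + ((i + 1) * h) // out_h
        row = []
        for j in range(out_w):
            c0 = left + (j * w) // out_w
            c1 = left + ((j + 1) * w) // out_w
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


fm = [
    [1, 2, 3, 4, 5, 6],
    [7, 8, 9, 10, 11, 12],
    [13, 14, 15, 16, 17, 18],
    [19, 20, 21, 22, 23, 24],
]
print(roi_max_pool(fm, (0, 0, 4, 6), 2, 2))  # whole map, pooled to 2x2
print(roi_max_pool(fm, (1, 1, 4, 4), 2, 2))  # smaller region, still 2x2
```

Both regions come out as 2x2 grids, so the same classifier and bounding box regressor can be applied to every proposal.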
29. OLD SOLUTIONS - DEPTH PERCEPTION
• Stereo cameras spaced a fixed distance apart capture the same scene.
• Remember trigonometry?
• Algorithm Families
• Triangulation
• Interferometry
• Time of Flight
• Many Limitations
• Cost
• Complexity
• Controlled environments only
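The trigonometry behind stereo triangulation reduces to one formula under idealised assumptions: two identical, parallel (rectified) pinhole cameras. The depth is Z = f · B / d, where f is the focal length in pixels, B the distance between the cameras and d the disparity, i.e. the horizontal shift of the same point between the two images. The function name and parameter names below are made up for this sketch.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth in metres of a point seen by two rectified, parallel cameras."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive; zero would mean a point at infinity")
    return focal_px * baseline_m / disparity_px


# Hypothetical rig: 800 px focal length, cameras 0.5 m apart.
print(depth_from_disparity(800, 0.5, 40))  # large shift, so the point is near
print(depth_from_disparity(800, 0.5, 4))   # small shift, so the point is far
```

Because depth is inversely proportional to disparity, a one-pixel matching error matters little for near objects but a lot for distant ones, which is one reason these setups work best in controlled environments.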
30. NEW SOLUTIONS - DEPTH PERCEPTION
• Furious research in progress
• Single camera moving between two fixed positions
• Monocular Depth perception
• Some interesting proposals
• Train NN with depth information and semantically segmented images
• Use the models for predicting depth in new images
• Solutions are almost mainstream
• Anyone heard of Kinect?
32. OLD PROBLEM - PROGRAMMER'S DILEMMA
• Which image format should I use?
• Which image file format should I code for? Do I have to learn to read and write image files?
• Matlab is expensive
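To make the dilemma concrete, here is a sketch of what hand-coding even one of the simplest image formats involves: binary PGM (P5), a plain header followed by raw grey bytes. It handles only the variant it writes itself (no comment lines, maximum value 255); `encode_pgm` and `decode_pgm` are names invented for this example.

```python
def encode_pgm(pixels):
    """Serialise a list-of-lists of 0-255 grey values as binary PGM (P5) bytes."""
    height, width = len(pixels), len(pixels[0])
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + bytes(v for row in pixels for v in row)


def decode_pgm(data):
    """Parse P5 bytes as produced by encode_pgm (no comments, maxval 255)."""
    # Split off exactly three header lines; the rest is raw pixel data.
    magic, dims, maxval, body = data.split(b"\n", 3)
    if magic != b"P5" or maxval != b"255":
        raise ValueError("unsupported PGM variant")
    width, height = map(int, dims.split())
    return [list(body[r * width:(r + 1) * width]) for r in range(height)]


image = [[0, 10], [128, 255]]
raw = encode_pgm(image)
print(raw)
print(decode_pgm(raw))
```

Compressed formats such as JPEG or PNG are far more involved than this, which is why writing a reader and writer for each format by hand is a poor use of a programmer's time.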
33. NEW SOLUTION - OPENCV, PYTHON, PILLOW ETC
• OpenCV
• Democratized image processing
• A large number of functionalities provided as APIs
• Impressive Python bindings; native C++ API, with Java bindings as well
• Python
• Pillow and many other libraries for reading images
• Vectorization and Numpy Arrays
35. NEURAL NETWORKS
• Data hungry. Lots and lots of training data.
• Resource hungry and compute intensive.
• Overfitting, Underfitting, Stochasticity
• Black box
36. SOME SOLUTIONS
• Transfer Learning to reduce training time
• Hyper parameter tuning
• Hardware based solutions for improving performance
• On-going research for explainability
• On-going research for reducing the training data requirement
• 3rd generation neural networks