GOPI KRISHNA NUTI
VICE PRESIDENT, MUST RESEARCH
VP@MUST.CO.IN, NGOPIKRISHNA@GMAIL.COM
COMPUTER VISION AND IMAGE ANALYTICS – A PRIMER
INTRODUCTION
© 2021 MUST Research
MUST Research – Our publications
• Masters in Data Science from State University of New York at Buffalo, MBA from Amrita University, Bangalore
• A book introducing Machine Learning from basics through Supervised and Unsupervised learning for beginners
https://www.amazon.in/Machine-Learning-Engineers-Gopi-Krishna/dp/9389024870/ref=sr_1_2?dchild=1&keywords=machine+learning+for+engineers&qid=1616195333&sr=8-2
• Multiple publications and patents
• https://www.linkedin.com/in/ngopikrishna/
MUST Research
MUST Research is dedicated to promoting excellence and competence in the fields of data science, cognitive computing, artificial intelligence, machine learning and advanced analytics for the benefit of mankind - it’s a must.
Our vision is to build an ecosystem that enables interaction between academia and enterprise, helps them resolve problems and makes them aware of the latest developments in the cognitive era; to provide solutions, guidance or training; to organize lectures, seminars and workshops; and to collaborate on scientific programs and societal missions.
• India’s largest AI community with 500+ data scientists
• Award winning robots – Softie built in collaboration with Microsoft®
https://www.youtube.com/watch?v=jQ8Gq2HWxiA
• Multiple demonstrations of our robots MUSTie and MUSTani
https://www.youtube.com/watch?v=AewM3TsjoBk
• Letter of appreciation from Govt of Telangana for our contributions
INTRODUCTION TO COMPUTER VISION
• Branch of Machine Learning which deals with images
• Unstructured data
• Everywhere
• Captured from cameras
• Created by software like MSPaint, CorelDRAW, Adobe Photoshop etc.
• Created by software like AutoCAD, CATIA, Adobe Acrobat, MS Word, PowerPoint
• Can contain text, regular shapes, irregular shapes
• Contain a treasure of information
IMAGE BASICS - IMAGE REPRESENTATION
• Unstructured Data
• Array of Pixels
0 255 0 0 0 0 0
0 255 0 0 0 0 0
0 255 0 0 0 0 0
0 255 0 0 0 0 0
0 255 0 0 0 0 0
0 255 0 0 0 0 0
0 255 0 0 0 0 0
0 255 255 255 255 255 255
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0
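The grid of numbers above can be held directly as a nested list. A minimal sketch in plain Python, assuming only the first eight rows of the grid (seven pixels wide) and treating 0 as black and 255 as white; the names `image`, `white_pixels` and `render` are illustrative.

```python
# Grayscale image as rows of pixel intensities (0 = black, 255 = white).
# Assumption: the first eight rows of the grid above, seven pixels wide.
image = [[0, 255, 0, 0, 0, 0, 0] for _ in range(7)]
image.append([0, 255, 255, 255, 255, 255, 255])

height = len(image)
width = len(image[0])

# (row, column) of every white pixel; together they trace an "L" shape.
white_pixels = [(r, c) for r in range(height) for c in range(width)
                if image[r][c] == 255]


def render(img):
    """Draw the image as text: '#' for bright pixels, '.' for dark ones."""
    return "\n".join("".join("#" if v > 127 else "." for v in row) for row in img)


print(f"{width} x {height} pixels, {len(white_pixels)} white")
print(render(image))
```

Printing the rendered text makes the point of the slide visible: the "picture" is nothing more than the pattern of numbers.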
COLOUR SPACES
RGB: Red, Green and Blue (each channel 0-255)
CMYK: Cyan, Magenta, Yellow and Key (black)
HSV: Hue, Saturation and Value
Grayscale: shades from black to white
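The same pixel can be expressed in each of these colour spaces. A rough sketch using Python's standard `colorsys` module for HSV. The CMYK formula is the naive, profile-free conversion, and the grayscale weights are the common ITU-R BT.601 luma weights; both are assumptions made for illustration, not something the slides specify.

```python
import colorsys


def rgb_to_hsv_255(r, g, b):
    """Convert 0-255 RGB to HSV with hue in degrees and S, V in [0, 1]."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return h * 360, s, v


def rgb_to_cmyk(r, g, b):
    """Naive RGB -> CMYK (no colour profile); returns fractions in [0, 1]."""
    k = 1 - max(r, g, b) / 255
    if k == 1:  # pure black: avoid dividing by zero
        return 0.0, 0.0, 0.0, 1.0
    c = (1 - r / 255 - k) / (1 - k)
    m = (1 - g / 255 - k) / (1 - k)
    y = (1 - b / 255 - k) / (1 - k)
    return c, m, y, k


def rgb_to_gray(r, g, b):
    """Weighted average of the channels (BT.601 weights), rounded to 0-255."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)


red = (255, 0, 0)
print(rgb_to_hsv_255(*red))
print(rgb_to_cmyk(*red))
print(rgb_to_gray(*red))
```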
COLOUR SPACES
Traditional Colours
• Described by Isaac Newton in 1672.
• Primary colours are Red, Yellow and Blue
• Commonly referred to as "Painter's Colours".
• Not all colours can be generated.
Subtractive Colours
• Called "Printer's Colours".
• The colour we see is the part of white light that is not absorbed, i.e. the rest has been subtracted
• Primary colours are Cyan, Yellow and Magenta.
Additive Colours
• Adds the primary colours (Red, Green and Blue) together to get the desired colour
• Displays work like this
WHAT IS IMAGE PROCESSING?
• Extract quantifiable and meaningful information out of an image
• Objects present in the image
• Location in the image
• Background or Foreground
• Distance from the viewer
IS IMAGE PROCESSING NEW TO COMPUTERS?
No. My grandmother used it without ever seeing a computer.
Remember the days before the internet?
Features in a Cathode Ray Tube television
• Brightness
• Contrast
• Colour
• Sharpness
• How was this done? Convolution
CONVOLUTION IN DIGITAL WORLD
• Process of adding each element of an image to its local neighbours, weighted by a kernel (a small matrix of weights)
• NOT the same as matrix multiplication
• Used for blurring, sharpening, up/down sampling, spherical distortion, de-noising, noise filtering etc.
• Depending on the convolution matrix, step size and operation chosen, the resulting image will vary.
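A minimal sketch of 2D convolution in plain Python, to show how the choice of kernel and step size changes the result. The function name `convolve2d`, the `stride` parameter, the toy 4x4 image and the two kernels (a 3x3 box blur and a common sharpening kernel) are assumptions chosen for illustration. Only the "valid" region is computed, so the output is smaller than the input.

```python
def convolve2d(image, kernel, stride=1):
    """'Valid' 2D convolution of a grayscale image (list of rows) with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    # True convolution flips the kernel; for symmetric kernels this changes nothing.
    flipped = [row[::-1] for row in kernel[::-1]]
    out = []
    for r in range(0, len(image) - kh + 1, stride):
        out_row = []
        for c in range(0, len(image[0]) - kw + 1, stride):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * flipped[i][j]
            out_row.append(total)
        out.append(out_row)
    return out


box_blur = [[1 / 9] * 3 for _ in range(3)]           # average of the 3x3 neighbourhood
sharpen = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]      # boosts the centre against its neighbours

image = [
    [10, 10, 10, 10],
    [10, 50, 50, 10],
    [10, 50, 50, 10],
    [10, 10, 10, 10],
]
blurred = convolve2d(image, box_blur)
sharpened = convolve2d(image, sharpen)
print(blurred)
print(sharpened)
```

Each output value is a weighted sum over a small neighbourhood, which is why this differs from multiplying the image matrix by the kernel matrix.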
WHERE IS WALDO?
Locate this gentleman in the next slide.
WHERE IS THE NUMBER PLATE?
HOW MUCH DAMAGE?
ON TO COMPUTER VISION ALGORITHMS
COMPUTER VISION – THE (AGE) OLD PROBLEMS
• What should a robot do in “Scene understanding”?
• Identify colours, brightness etc
• Identify objects a.k.a Image Segmentation
• Different things
• Multiple occurrences of the same thing
• Stuff other than things
• Distance of things and stuff
• Relative and absolute
COLOUR AND BRIGHTNESS
Colour spaces
• Grayscale, RGB, CMY
• Transparency/Opacity using a fourth attribute
Limitations
• Do not represent all colours in nature
• Colour perception is highly susceptible to lighting changes.
New Solutions
• Colour spaces have been expanded greatly.
• With micro and macro level differences, ~250 colour spaces are in vogue
• HSV, HSL/HSI, YUV, YPbPr, YCbCr etc.
OLD PROBLEM – IMAGE SEGMENTATION
• Panoptic Segmentation – not a technique, but a task definition evaluated with its own metric (Panoptic Quality)
An image is a matrix of numbers.
How to identify the edges of each object?
How to recognize the object correctly?
How to differentiate between “things” (foreground) and “stuff” (background)?
IMAGE SEGMENTATION – OLD SOLUTIONS
Solution Family | Algorithm | Drawbacks
Thresholding | Otsu thresholding; Adaptive local thresholding (Mean, Gaussian) | For reasonably simple scenarios only
Edges and Corners | Canny edges, Sobel, Hough, Laplace algorithms; Harris corner detection; Convolution of kernels | Unsuitable for noisy/blurry images
Region Growing | Watershed | Relatively strong at detecting overlapping/touching objects
Super Pixels | SLIC algorithm | Susceptible to noise; Steep increase in algorithmic complexity
Clustering | K-means; Fuzzy C-Means (FCM); Expectation Maximization (EM) | Relies on low level features like colour etc.; Poor performance on complicated images
Clustering | Image Pyramid | Carefully controlled environments only; Cannot handle transformations like rotation, reflection etc.; Occlusions are a big no-no; Compute intensive
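As an illustration of the thresholding family, here is a sketch of Otsu thresholding in plain Python. It picks the intensity that maximises the between-class variance of the "dark" and "bright" pixel groups. The function name and the synthetic pixel list are assumptions; a library such as OpenCV offers a ready-made version.

```python
def otsu_threshold(pixels):
    """Return the threshold t (0-255) maximising between-class variance.

    Pixels with value <= t form the dark class, the rest the bright class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(256))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class so far
    sum0 = 0  # sum of intensities in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # no dark pixels yet
        w1 = total - w0
        if w1 == 0:
            break               # every pixel is already in the dark class
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


# Synthetic "image": a dark background around 10 and a bright object around 200.
pixels = [10] * 50 + [12] * 50 + [200] * 60 + [205] * 40
t = otsu_threshold(pixels)
mask = [1 if p > t else 0 for p in pixels]   # 1 = object, 0 = background
print(t, sum(mask))
```

This works here because the histogram has two clean peaks, which is exactly the "reasonably simple scenario" the table refers to.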
IMAGE SEGMENTATION – CONVOLUTIONAL NEURAL NETWORKS
• Specialized kind of neural network
• Process data with a known grid-like spatial structure
• Composed of a large number of layers such as convolution, pooling and fully connected layers
• Usually very deep, i.e. many layers and many weight parameters
• Non-linear activation functions are essential for learning complex features
http://cs231n.github.io/convolutional-networks/#overview
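The layer types listed above can be chained by hand on a tiny input. This is a sketch of a single forward pass only, in plain Python. The kernel and the fully connected weights are hand-picked (hypothetical), whereas a real network learns them from data and stacks many such layers.

```python
def conv2d(x, k):
    """'Valid' sliding-window filter (cross-correlation, as in most CNN libraries)."""
    kh, kw = len(k), len(k[0])
    return [[sum(x[r + i][c + j] * k[i][j] for i in range(kh) for j in range(kw))
             for c in range(len(x[0]) - kw + 1)]
            for r in range(len(x) - kh + 1)]


def relu(x):
    """Non-linear activation: negative responses become zero."""
    return [[max(0, v) for v in row] for row in x]


def max_pool2x2(x):
    """Keep the strongest response in each non-overlapping 2x2 window."""
    return [[max(x[r][c], x[r][c + 1], x[r + 1][c], x[r + 1][c + 1])
             for c in range(0, len(x[0]) - 1, 2)]
            for r in range(0, len(x) - 1, 2)]


def fully_connected(features, weights, bias):
    """One output score per row of weights."""
    return [sum(f * w for f, w in zip(features, row)) + b
            for row, b in zip(weights, bias)]


# 6x6 input with a vertical edge: dark on the left, bright on the right.
image = [[0, 0, 0, 9, 9, 9] for _ in range(6)]
# Hand-picked (not learned) kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

feature_map = relu(conv2d(image, kernel))    # 4x4 map of edge responses
pooled = max_pool2x2(feature_map)            # 2x2 after pooling
flat = [v for row in pooled for v in row]    # 4 features
# Hypothetical weights for two output classes ("edge" vs "no edge").
scores = fully_connected(flat,
                         [[0.1, 0.1, 0.1, 0.1], [-0.1, -0.1, -0.1, -0.1]],
                         [0.0, 0.0])
print(feature_map[0], pooled, scores)
```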
EVOLUTION OF CNN CLASSIFIERS
2014
• Regions with CNN Features (R-CNN)
2015
• Fast R-CNN
• Faster R-CNN
• Inception V3
2016
• YOLO
• SSD
• UberNet
2017
• Mask R-CNN
• Pixel-wise Instance Segmentation
SOME SALIENT POINTS
R-CNN: Regions with CNN Features
• Uses Selective Search
• Significantly reduced the search space to ~2000 region proposals
• Very slow and very complicated
Fast R-CNN: Designed to solve the problems with R-CNN
• Region of Interest (RoI) extraction is treated as a pooling layer
• Jointly trains the feature extractor, classifier and bounding box regression in a single model
• Almost 25 times faster than R-CNN
Faster R-CNN: Replaces Selective Search with a Region Proposal Network
• About 10 times faster than Fast R-CNN
YOLO: You Only Look Once
• Detection is considered as a regression problem
• Extremely fast but less accurate. Struggles with small objects that appear in groups
SSD: Single Shot MultiBox Detector
• Reported as faster and more accurate than the original YOLO.
Mask R-CNN: Extension of Faster R-CNN
• Predicts the object masks as well as the bounding boxes
• Impressive results
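The "Region of Interest as a pooling layer" idea can be sketched as follows: whatever the size of a proposed region, it is max-pooled into a fixed grid so that the later layers always receive the same number of features. This is a simplified illustration in plain Python, not Fast R-CNN's exact implementation. The function name, the (top, left, bottom, right) box convention and the 2x2 output size are assumptions.

```python
def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool the region roi = (top, left, bottom, right) into out_h x out_w bins.

    bottom and right are exclusive; the region must be at least out_h x out_w.
    """
    top, left, bottom, right = roi
    h, w = bottom - top, right - left
    pooled = []
    for i in range(out_h):
        # Integer bin edges; bins may differ in size by one row or column.
        r0 = top + (i * h) // out_h
        r1 = top + ((i + 1) * h) // out_h
        row = []
        for j in range(out_w):
            c0 = left + (j * w) // out_w
            c1 = left + ((j + 1) * w) // out_w
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1) for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# Toy 6x6 feature map whose value encodes its position (row * 10 + column).
feature_map = [[r * 10 + c for c in range(6)] for r in range(6)]
small = roi_max_pool(feature_map, (0, 0, 2, 2))   # 2x2 region
large = roi_max_pool(feature_map, (1, 1, 6, 5))   # 5x4 region
print(small, large)
```

Both regions come out as 2x2 grids, which is what lets a single network score proposals of different sizes.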
OLD PROBLEM - DEPTH PERCEPTION
• Normal vision and depth perception expectation
• Relative depth
• Optical illusion based on depth
• Picture of a picture. All pixels have the same depth
OLD SOLUTIONS - DEPTH PERCEPTION
• Stereo cameras spaced a fixed distance apart capture the same scene.
• Remember trigonometry?
• Algorithm Families
• Triangulation
• Interferometry
• Time of Flight
• Many Limitations
• Cost
• Complexity
• Controlled environments only
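The trigonometry behind stereo triangulation reduces to one formula when the two cameras are parallel and rectified: depth = focal length x baseline / disparity, where disparity is how far a point shifts between the left and right images. A minimal sketch; the function name and the rig values (700 px focal length, 0.12 m baseline) are hypothetical.

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Depth in metres of a point seen by two parallel, rectified cameras."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive (zero means the point is at infinity)")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical rig: 700 px focal length, cameras 0.12 m apart.
near = depth_from_disparity(700, 0.12, 42)   # large shift between views: close object
far = depth_from_disparity(700, 0.12, 7)     # small shift between views: distant object
print(near, far)
```

The inverse relationship also explains a practical limitation: for distant objects the disparity is only a few pixels, so small matching errors produce large depth errors.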
NEW SOLUTIONS - DEPTH PERCEPTION
• Intense research in progress
• Single camera moving between two fixed positions
• Monocular depth perception
• Some interesting proposals
• Train a neural network with depth information and semantically segmented images
• Use the models for predicting depth in new images
• Solutions are almost mainstream
• Anyone heard of Kinect?
OLD PROBLEM - PROGRAMMER’S DILEMMA
• Which image format should I use?
• Which image file format should I code for? Do I have to learn reading and writing image files?
• Matlab is expensive
NEW SOLUTION - OPENCV, PYTHON, PILLOW ETC
• OpenCV
• Democratized image processing
• A large number of functionalities provided as APIs
• Native C++ library with impressive Python and Java bindings
• Python
• Pillow and many other libraries for reading images
• Vectorization and NumPy arrays
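To illustrate what vectorization buys, the sketch below brightens an image twice: once with explicit Python loops and once as a single NumPy array operation. It assumes NumPy is installed and uses a small synthetic array instead of an image file; the function names are illustrative.

```python
import numpy as np

# Synthetic 4x4 grayscale image (values 0, 16, ..., 240) so no image file is needed.
image = np.arange(16, dtype=np.uint8).reshape(4, 4) * 16


def brighten_loop(img, amount):
    """Pixel-by-pixel brightening with explicit Python loops."""
    out = img.copy()
    for r in range(img.shape[0]):
        for c in range(img.shape[1]):
            out[r, c] = max(0, min(int(img[r, c]) + amount, 255))
    return out


def brighten_vectorized(img, amount):
    """Same operation on the whole array at once, clipped to the 0-255 range."""
    # Widen to int16 first so the addition cannot wrap around in uint8.
    return np.clip(img.astype(np.int16) + amount, 0, 255).astype(np.uint8)


same = np.array_equal(brighten_loop(image, 40), brighten_vectorized(image, 40))
print(same)
```

With Pillow or OpenCV installed, a real file can be loaded into the same kind of array, for example with `np.asarray(Image.open(path))` or `cv2.imread(path)`, and the vectorized function applies unchanged.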
NEW SOLUTIONS – NEW PROBLEMS
NEURAL NETWORKS
• Data hungry. Lots and lots of training data.
• Resource hungry and compute intensive.
• Overfitting, Underfitting, Stochasticity
• Black box
SOME SOLUTIONS
• Transfer Learning to reduce training time
• Hyperparameter tuning
• Hardware-based solutions for improving performance
• Ongoing research on explainability
• Ongoing research on reducing the training data requirement (3rd generation neural networks)
THANKS
