I had the opportunity to teach a computer vision lab for the amazing women studying in one of the AllWomen courses.
Here I share the slides used for teaching this 2h lab, where I tried to cover at a very high level some of the basic concepts of computer vision.
2. hello!
I am Karenne Mata
You can find the material for this lab on GitHub:
https://github.com/karenne/ComputerVisionLab
3. “Computer vision and machine learning have really started to take off, but for most people, the whole idea of what is a computer seeing when it's looking at an image is relatively obscure.”
Mike Krieger
4. What is computer vision?
“Computer vision is an interdisciplinary scientific field that deals with how computers can be made to gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to automate tasks that the human visual system can do.” - Wikipedia
5. Still a little obscure, isn’t it?
I am the machine looking at this image.
A black box: actually, my brain.
I am probably seeing a 5!
But I need some training before getting so good at seeing numbers.
6. HISTORY OF COMPUTER VISION
or how we have built the black box
Optical Character Recognition
1870 – 1974: Development of the technology that allows the translation from image to text
1974 – 2000: Massive commercialization
2000 – Now: Open software released (Adobe, Google Drive, WebOCR)
The fathers of computer vision
Larry Roberts (1963); “Machine Perception of Three-Dimensional Solids”; MIT PhD thesis.
David Marr (1982); “Vision: A Computational Investigation into the Human Representation and Processing of Visual Information”; MIT Press
Convolutional Nets
1980: Fukushima, the Neocognitron
1998: Yann LeCun et al., LeNet-5
2012: AlexNet
2015: ResNet-152
7. Some necessary maths… What is a neural network?
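[Diagram: a network with inputs x_1, …, x_n, nodes a_11, …, a_mn and outputs y_1, …, y_m]
$$X = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$$
Hidden layer:
$$z_h = w^T X + b$$
$$a_h = g(z_h)$$
Output layer:
$$z_o = w^T a_h + b$$
$$a_o = g(z_o)$$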
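The four equations on this slide can be traced step by step in a few lines of code. The following is only a minimal sketch: the function names (`layer`, `forward`), the layer sizes, the weight values and the choice of a sigmoid for g are all assumptions made for illustration, since the slide does not fix any of them. Plain Python lists are used to keep it dependency-free.

```python
import math

def sigmoid(z):
    """One common choice for the activation g (assumed here; the slide leaves g open)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(weights, bias, inputs, g):
    """Compute g(w^T x + b) for every unit; weights[j] is the weight vector of unit j."""
    outputs = []
    for w_j, b_j in zip(weights, bias):
        z = sum(w * x for w, x in zip(w_j, inputs)) + b_j
        outputs.append(g(z))
    return outputs

def forward(x, w_h, b_h, w_o, b_o, g=sigmoid):
    """Forward pass through one hidden layer and one output layer."""
    a_h = layer(w_h, b_h, x, g)    # z_h = w^T X + b, then a_h = g(z_h)
    a_o = layer(w_o, b_o, a_h, g)  # z_o = w^T a_h + b, then a_o = g(z_o)
    return a_o

# Hypothetical example: 3 inputs, 2 hidden units, 1 output.
x = [1.0, 0.5, -1.0]
w_h = [[0.2, -0.4, 0.1], [0.7, 0.3, -0.5]]
b_h = [0.0, 0.1]
w_o = [[1.0, -1.0]]
b_o = [0.0]
a_o = forward(x, w_h, b_h, w_o, b_o)
print(a_o)
```

Each layer has its own weights and bias here, even though the slide writes both as w and b. Training would then adjust those values so that the output gets closer to the expected label.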
9. Opening convolutional neural networks
* Figure 4 from Zeiler and Fergus (2013), “Visualizing and Understanding Convolutional Networks”.
10. Business stories
1. Autonomous vehicles
Self-driving cars constantly capture their surroundings so that they can react to traffic signals, other cars and people.
2. Google Translate app
Google translates text embedded in images, so the user doesn’t need to type the words.
3. Facial recognition
Security systems can use advanced computer vision algorithms to recognize a person from the face alone.
4. Healthcare
Diagnosis from X-rays and other image-based sources in medicine.
5. Real-time sports tracking
Follow a player's performance and the deployment of a strategy in real time.
6. Manufacturing
Packaging and labeling quality assessments.
* Bernard Marr, “7 Amazing Examples Of Computer And Machine Vision In Practice”, Forbes (8 Apr 2019)
11. Another business story:
Emotion AI
1. Medical diagnosis
Diagnosis of some mental health conditions such as depression or anxiety.
2. Truth detector
Emotion AI could serve as a truth detection device, e.g. in fraud detection.
3. Marketing
Knowing how customers feel about a product, or what mood they are in, can improve the quality of campaigns.
* Susan Moore, “13 Surprising Uses For Emotion AI Technology”, Smarter With Gartner (September 11, 2018)
“Emotion AI is a subset of artificial intelligence (the broad term for machines replicating the way humans think) that measures, understands, simulates, and reacts to human emotions.” Meredith Somers, “Emotion AI, explained” (March 8, 2019)
12. An example
Check this GitHub repository to use an API for facial expression recognition:
https://github.com/justadudewhohacks/face-api.js
This is fun!
Larry Roberts: how to go from 2D image arrays to 3D representations using topology and algebra, representing the image itself rather than trying to convert it to text. David Marr created the bottom-up approach: low-level image processing, meaning the detection of corners, edges and motion.
Fukushima: “Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position”
Yann LeCun et al.: “Gradient-Based Learning Applied to Document Recognition”
1989: LeCun trains a CNN with backpropagation
Stride is the number of pixels by which the filter shifts over the input matrix at each step.
Pad the picture with zeros (zero-padding) so that the filter fits at the borders.
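Stride and padding together determine the size of the output: for an input of side n, a filter of side k, padding p and stride s, the output side is floor((n + 2p − k) / s) + 1. The sketch below illustrates this with plain Python lists. The helper names `zero_pad` and `conv2d` are hypothetical, the filter is assumed to be square, and the loop is written for readability rather than speed.

```python
def zero_pad(image, pad):
    """Surround a 2D list with `pad` rows and columns of zeros."""
    width = len(image[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]
    for row in image:
        padded.append([0] * pad + list(row) + [0] * pad)
    padded.extend([0] * width for _ in range(pad))
    return padded

def conv2d(image, kernel, stride=1, pad=0):
    """Slide a square kernel over the image, moving `stride` pixels per step."""
    img = zero_pad(image, pad)
    k = len(kernel)
    out_h = (len(img) - k) // stride + 1
    out_w = (len(img[0]) - k) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            r, c = i * stride, j * stride  # top-left corner of the current window
            row.append(sum(img[r + a][c + b] * kernel[a][b]
                           for a in range(k) for b in range(k)))
        out.append(row)
    return out

# Hypothetical 4x4 image of ones and a 3x3 filter of ones.
image = [[1] * 4 for _ in range(4)]
kernel = [[1] * 3 for _ in range(3)]
print(conv2d(image, kernel))                   # 2x2 output: no padding, stride 1
print(conv2d(image, kernel, pad=1))            # 4x4 output: padding keeps the size
print(conv2d(image, kernel, stride=2, pad=1))  # 2x2 output: stride 2 skips positions
```

With a padding of 1 and a 3x3 filter the output keeps the 4x4 size of the input, which is the usual reason for zero-padding.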
ReLU stands for Rectified Linear Unit, a non-linear operation whose output is ƒ(x) = max(0, x).
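As a small illustration of ƒ(x) = max(0, x), the sketch below applies ReLU to every value of a feature map: negative values become zero and positive values pass through unchanged. The names `relu` and `relu_map` and the sample values are made up for this example.

```python
def relu(x):
    """Rectified Linear Unit: f(x) = max(0, x)."""
    return max(0.0, x)

def relu_map(feature_map):
    """Apply ReLU to every value of a 2D feature map."""
    return [[relu(v) for v in row] for row in feature_map]

# Hypothetical feature map with positive and negative values.
feature_map = [[-2.0, 3.0],
               [0.5, -0.1]]
print(relu_map(feature_map))  # [[0.0, 3.0], [0.5, 0.0]]
```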
Pooling layers reduce the number of parameters when the images are too large. Spatial pooling, also called subsampling or downsampling, reduces the dimensionality of each feature map but retains important information. Spatial pooling can be of different types:
Max Pooling
Average Pooling
Sum Pooling
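The three pooling types differ only in how each window of values is reduced to a single number. The following sketch, with a hypothetical `pool2d` helper and a made-up 4x4 feature map, assumes a 2x2 window moved with stride 2, which is a common but not the only possible setting.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Reduce each size x size window to one value using max, average or sum."""
    reducers = {
        "max": max,
        "average": lambda window: sum(window) / len(window),
        "sum": sum,
    }
    reduce_window = reducers[mode]
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            window = [feature_map[i * stride + a][j * stride + b]
                      for a in range(size) for b in range(size)]
            row.append(reduce_window(window))
        out.append(row)
    return out

# Hypothetical 4x4 feature map.
fm = [[1, 3, 2, 4],
      [5, 6, 1, 2],
      [7, 2, 9, 0],
      [1, 4, 3, 8]]
print(pool2d(fm, mode="max"))      # [[6, 4], [7, 9]]
print(pool2d(fm, mode="average"))  # [[3.75, 2.25], [3.5, 5.0]]
print(pool2d(fm, mode="sum"))      # [[15, 9], [14, 20]]
```

In all three cases the 4x4 map shrinks to 2x2, so later layers have four times fewer values to process.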