Seeing is Perceiving
… Unless You’re a Machine
Scott Thibault, PhD
“Data Science” Applications
• Marketing
• Social media profiling
• Media coverage
• In-store behavior
• Digital asset management
• OCR
• Keyword tagging
• Face tagging
• Remote sensing
• DeepSolar
• Plant health
• Military
• Internet of things
• Surveillance
• Manufacturing
• Smart homes
What is a Digital Image? Video?
Abstractly
• 2-D array of pixels
• Pixels
  • Single number (grayscale)
  • Triplet (e.g. RGB)
  • N values
• Channels
Concretely
• File representations
  • PNG, JPG, etc.
  • MP4, MPEG, etc.
• Memory representations
  • Row-major
  • Column-major
  • Integer, float, etc.
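The abstract and concrete views above can be made tangible with NumPy arrays. This is a minimal sketch; the array sizes and pixel values are made up for illustration and do not come from the slides.

```python
import numpy as np

# Grayscale: one number per pixel, shape (height, width).
gray = np.zeros((4, 6), dtype=np.uint8)

# Color: a triplet per pixel, shape (height, width, channels).
rgb = np.zeros((4, 6, 3), dtype=np.uint8)
rgb[1, 2] = (255, 128, 0)  # set the pixel at row 1, column 2

# Row-major (C order) stores each row contiguously;
# column-major (Fortran order) stores each column contiguously.
row_major = np.arange(6, dtype=np.uint8).reshape(2, 3)  # C order by default
col_major = np.asfortranarray(row_major)

# Same pixel values, different byte layout in memory.
row_bytes = row_major.tobytes(order="A")
col_bytes = col_major.tobytes(order="A")

# Integer pixels in 0..255 are often converted to floats in 0..1.
as_float = rgb.astype(np.float32) / 255.0
```

Both layouts index the same way (`row_major[r, c] == col_major[r, c]`); only the order of bytes in memory differs, which matters when passing buffers between libraries.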
Image Filters
Image Filtering
Input Image
0 0 0 9 37 82 83 86 173 173
0 1 9 38 89 170 171 172 211 212
0 13 37 90 171 212 210 211 197 199
2 23 85 170 210 198 195 196 133 127
11 41 89 214 197 133 128 125 117 111
30 88 173 213 195 122 111 112 103 100
81 172 214 195 134 115 102 101 100 100
80 176 212 199 123 112 100 100 100 100
0 0 4 19 45 60 88 127 129 173
0 4 19 49 104 130 171 191 192 211
Output Image (partial)
6 18 51 104 151
out = f(neighborhood), for any function f
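The idea that each output pixel is some function f of a neighborhood can be sketched as a sliding window. The slides do not say which f or which border handling produced the output values, so the 3x3 window, the "only windows that fit" border rule, and the two example functions here are assumptions.

```python
import numpy as np

def filter_image(image, f, size=3):
    """Apply f to every size x size neighborhood that fits inside the image."""
    h, w = image.shape
    out = np.zeros((h - size + 1, w - size + 1), dtype=np.float64)
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = f(image[r:r + size, c:c + size])
    return out

# Top-left corner of the input image shown on the slide.
image = np.array([
    [0, 0, 0, 9, 37],
    [0, 1, 9, 38, 89],
    [0, 13, 37, 90, 171],
    [2, 23, 85, 170, 210],
])

local_max = filter_image(image, np.max)    # brightest pixel in each window
local_mean = filter_image(image, np.mean)  # box blur
```

Any callable that maps a small array to a number can be passed as `f`, which is the point of the slide.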
Convolution Filter
Input Image
0 0 0 9 37 82 83 86 173 173
0 1 9 38 89 170 171 172 211 212
0 13 37 90 171 212 210 211 197 199
2 23 85 170 210 198 195 196 133 127
11 41 89 214 197 133 128 125 117 111
30 88 173 213 195 122 111 112 103 100
81 172 214 195 134 115 102 101 100 100
80 176 212 199 123 112 100 100 100 100
Neighborhood
38 89 170
90 171 212
170 210 198
Kernel
1 2 1
2 4 2
1 2 1
Neighborhood × Kernel (element-wise)
38 178 170
180 684 424
170 420 198
∑ 2462
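The arithmetic on this slide can be reproduced in a few lines. The neighborhood and kernel values are taken from the slide; the final division by the kernel sum is an added assumption (a common way to keep the result in the original pixel range), since the slide stops at the raw sum.

```python
import numpy as np

neighborhood = np.array([
    [38, 89, 170],
    [90, 171, 212],
    [170, 210, 198],
])
kernel = np.array([
    [1, 2, 1],
    [2, 4, 2],
    [1, 2, 1],
])

products = neighborhood * kernel      # element-wise multiply
response = int(products.sum())        # 2462, matching the slide
normalized = response / kernel.sum()  # assumption: divide by 16 to stay in 0..255
```

Because this kernel is symmetric, flipping it (as strict convolution requires) gives the same result as the unflipped multiply-and-sum shown here.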
Feature Extraction
Convolutional Neural Networks
Input
Expected Output
Deep Learning
Model training
• Learn weights (e.g. convolution coefficients)
• Thousands/millions of examples
• Backpropagation
• Hours to days
Inference
• Model file + weight file
• Few MB to hundreds of MB
• Milliseconds to seconds
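As a toy illustration of "learning weights from examples", the sketch below recovers the nine coefficients of a 3x3 kernel by gradient descent on a single linear layer. It is a stand-in for backpropagation through a full network, not the training procedure of any model in this talk; the target kernel, the number of examples, the learning rate, and the step count are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "ground truth" kernel the model should recover.
true_kernel = np.array([1, 2, 1, 2, 4, 2, 1, 2, 1], dtype=np.float64) / 16.0

# Training examples: random flattened 3x3 patches and the filter's output.
patches = rng.random((500, 9))
targets = patches @ true_kernel

weights = np.zeros(9)
learning_rate = 0.1  # illustrative value
for step in range(2000):
    predictions = patches @ weights
    error = predictions - targets
    # Gradient of the mean squared error with respect to the weights.
    gradient = 2.0 * patches.T @ error / len(patches)
    weights -= learning_rate * gradient

max_abs_error = float(np.max(np.abs(weights - true_kernel)))
```

Inference then amounts to `patch @ weights`, which is why it is so much cheaper than training.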
Image Classification
Class    Probability
0-2      0.000011
4-6      0.000008
8-13     0.000187
15-20    0.001444
25-32    0.313656
38-43    0.683580
48-53    0.001012
60+      0.000101
Age Classification
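A classifier's output is a probability per class, and the prediction is usually the class with the highest one. The sketch below uses the probabilities from the table above; the variable names are illustrative.

```python
classes = ["0-2", "4-6", "8-13", "15-20", "25-32", "38-43", "48-53", "60+"]
probabilities = [0.000011, 0.000008, 0.000187, 0.001444,
                 0.313656, 0.683580, 0.001012, 0.000101]

# The predicted class is the one with the highest probability.
best = max(range(len(classes)), key=lambda i: probabilities[i])
predicted_class = classes[best]
confidence = probabilities[best]

# Rank the top two, useful when the model is torn between neighbors.
ranked = sorted(zip(classes, probabilities), key=lambda p: p[1], reverse=True)
top_two = ranked[:2]
total = sum(probabilities)
```

Here the model favors 38-43 but gives 25-32 nearly a third of the probability mass, so reporting the runner-up can be as informative as the winner.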
Image Classification
• ImageNet – 1000 classes
• COCO – 80 classes
• Pascal VOC – 20 classes (person, animals, vehicles, indoor objects)
• Places365
• Demographics (gender/age/race)
• Emotions (facial expression)
• Activities, e.g. sports
• Plants
• Medical diagnosis
• Marketing profiles
• Security camera event capture
• Quality control
• Policy verification
• Smart home
Object Detection
• Text, Signs
• Cars, Bikes, Trains, …
• Pedestrians, Faces, Hands
• License plates
• Logos
YOLOv3 / COCO
Video Object Tracking
• Smart cameras
• Face tracking (e.g. auto)
• Surveillance event detection
• Pedestrian/vehicle counting
• Drones
People Counting Demo – Take I
Locality Sensitive Hashing
• High-dimension data to low-dimension vector
• Euclidean distance ~ similarity
• OpenFace
• Deep Neural Network
• Face → 128-d vector
• Compare to a signature
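Comparing a face vector to a stored signature comes down to a Euclidean distance and a threshold. The sketch below uses random unit vectors as stand-ins for real 128-d face embeddings, and the 0.6 threshold is a hypothetical value; neither comes from the slides or from OpenFace itself.

```python
import numpy as np

def same_person(embedding, signature, threshold=0.6):
    """Return True when two face vectors are closer than the threshold."""
    distance = np.linalg.norm(embedding - signature)
    return bool(distance < threshold)

rng = np.random.default_rng(1)

# Stand-ins for 128-d face vectors; a real system would get these from a network.
signature = rng.normal(size=128)
signature /= np.linalg.norm(signature)

same_face = signature + rng.normal(scale=0.01, size=128)  # small perturbation
other_face = rng.normal(size=128)
other_face /= np.linalg.norm(other_face)

match = same_person(same_face, signature)
mismatch = same_person(other_face, signature)
```

In practice the threshold would be tuned on labeled pairs of faces rather than fixed in advance.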
Clustering
• Machine learning algorithm
• Create groups of similar objects
• Need similarity function
Clustering
• OpenFace + Clustering = Deduplication
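Deduplication by clustering can be sketched as follows: group face vectors that lie close together and count the groups. This uses a simple greedy threshold clustering chosen for brevity, not necessarily the algorithm used in the demo; the synthetic vectors and the 0.6 threshold are assumptions.

```python
import numpy as np

def count_unique(vectors, threshold=0.6):
    """Greedy clustering: join the nearest cluster within threshold, else start one."""
    centroids = []  # running mean vector of each cluster
    sizes = []      # number of members in each cluster
    for v in vectors:
        if centroids:
            distances = [np.linalg.norm(v - c) for c in centroids]
            nearest = int(np.argmin(distances))
            if distances[nearest] < threshold:
                sizes[nearest] += 1
                centroids[nearest] += (v - centroids[nearest]) / sizes[nearest]
                continue
        centroids.append(v.astype(np.float64))
        sizes.append(1)
    return len(centroids)

rng = np.random.default_rng(2)
people = rng.normal(size=(3, 128))
people /= np.linalg.norm(people, axis=1, keepdims=True)

# Each of the 3 hypothetical people is detected in 5 different video frames.
detections = np.repeat(people, 5, axis=0) + rng.normal(scale=0.01, size=(15, 128))
rng.shuffle(detections)

unique_people = count_unique(detections)
```

Counting clusters instead of raw detections is what keeps the same person from being counted once per frame.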
People Counting Demo – Take II
Programming: OpenCV
Many neural network frameworks
• TensorFlow
• Caffe
• CNTK
• Torch
• MXNet
OpenCV support
• Caffe
• TensorFlow
• Torch
• Darknet
Programming: Python + OpenCV
import numpy as np
import cv2

# load pre-trained model
print("loading model...")
model = "res10_300x300_ssd.prototxt"
weights = "res10_300x300_ssd.caffemodel"
net = cv2.dnn.readNetFromCaffe(model, weights)

# load image and construct an input array
# by resizing to a fixed 300x300 pixels
# and then normalizing (subtracting the per-channel mean)
image = cv2.imread("Katherine.jpg")
if image is None:
    raise FileNotFoundError("could not read Katherine.jpg")
(h, w) = image.shape[:2]
resized = cv2.resize(image, (300, 300))
blob = cv2.dnn.blobFromImage(resized, 1.0,
                             (300, 300), (104.0, 177.0, 123.0))

print("running model...")
net.setInput(blob)
detections = net.forward()

for i in range(0, detections.shape[2]):
    # confidence score for this box
    confidence = detections[0, 0, i, 2]
    if confidence > 0.5:
        # get coordinates of bounding box, scaled back to the original image size
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        print(confidence, box)


Editor's Notes

  1. Brain ~ 13 ms. Picture worth 1000 words <-> efficient human communication. 1 MB vs 1 KB <-> inefficient digital communication.
  2. Machine learning. Deep learning.