Training Drone Image Models with Grand Theft Auto
Video Learning for Analysis from Deep Embeddings
Timothy Emerick, PhD Sue He Alexander Polis Monica Rajendiran
[Figure labels: truck, truck, bicyclist]
A green truck is crossing an intersection.
A group of people are crossing the street.
★ Machine vision models often require large amounts of labeled data to
train well
★ Existing labeled datasets can be too generic, with too broad a concept
space for our purposes
ImageNet
14 million+ images of 21K+ class entities
YouTube-8M
450K+ hours of 4700+ class entities
Olga Russakovsky*, Jia Deng*, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma,
Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg
and Li Fei-Fei. (* = equal contribution) ImageNet Large Scale Visual Recognition
Challenge. IJCV, 2015.
Abu-El-Haija, Sami, et al. "YouTube-8M: A large-scale video classification
benchmark." arXiv preprint arXiv:1609.08675 (2016).
★ Graphics have become
extremely realistic over the
years
★ Games are codeable, enabling
complex simulations
★ Simulating in-game lets you
skip low-level tasks like
movement animations and
routing
★ Rockstar Advanced Game
Engine (RAGE) renders highly
realistic graphics
★ Huge modding community
provides lots of customization
★ Programmatically configurable
options
★ Programmatically configurable
options
○ Script Hook V is a library that
lets you write scripts that run in-game
○ Thousands of function calls available
★ Programmatically configurable
options
○ We can generate entities of choice
in-game and have them perform
complex actions
○ Vehicles: driving, turning, waiting at
stoplights
○ People: entering/exiting vehicles,
waiting to cross the street, parking
○ Environment: weather, time of day,
camera elevation, zoom
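The configurable options listed above can be thought of as a scenario specification that is sampled before each recording. The sketch below is only an illustration of that idea: every name, value list, and numeric range in it is a hypothetical placeholder, not a Script Hook V identifier or a setting taken from the talk.

```python
import random
from dataclasses import dataclass

# Hypothetical option lists, loosely following the bullets on the slide.
VEHICLE_ACTIONS = ["driving", "turning", "waiting at stoplight"]
PEDESTRIAN_ACTIONS = ["entering vehicle", "exiting vehicle", "waiting to cross"]
WEATHER = ["clear", "rain", "fog"]  # assumed values, not from the slides


@dataclass
class Scenario:
    vehicle_action: str
    pedestrian_action: str
    weather: str
    hour_of_day: int           # 0-23
    camera_elevation_m: float  # assumed unit: metres
    zoom: float


def sample_scenario(rng: random.Random) -> Scenario:
    """Draw one random scenario from the hypothetical option lists."""
    return Scenario(
        vehicle_action=rng.choice(VEHICLE_ACTIONS),
        pedestrian_action=rng.choice(PEDESTRIAN_ACTIONS),
        weather=rng.choice(WEATHER),
        hour_of_day=rng.randrange(24),
        camera_elevation_m=rng.uniform(20.0, 120.0),
        zoom=rng.uniform(1.0, 4.0),
    )


# Seeding each draw makes a recorded scenario reproducible.
scenarios = [sample_scenario(random.Random(seed)) for seed in range(3)]
for scenario in scenarios:
    print(scenario)
```

Sampling many such specifications is one way to vary weather, time of day, and camera settings across the generated footage.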
★ Grand Theft Auto Dataset:
○ Video footage
○ Objects of interest per frame
(vehicles and pedestrians)
○ Object location information
(bounding box information)
○ Text Descriptions
(e.g. a white truck is turning left)
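As a rough illustration of what one annotated frame in such a dataset might look like, the sketch below defines a per-frame record holding object labels, bounding boxes, and a text description. The field names, the pixel-based box layout, and the sample values are assumptions made for this example; the slides do not specify the actual storage format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class BoundingBox:
    # Hypothetical layout: top-left corner plus size, in pixels.
    x: int
    y: int
    width: int
    height: int


@dataclass
class ObjectAnnotation:
    label: str        # e.g. "vehicle" or "pedestrian"
    box: BoundingBox


@dataclass
class FrameAnnotation:
    frame_index: int
    objects: List[ObjectAnnotation] = field(default_factory=list)
    description: str = ""


frame = FrameAnnotation(
    frame_index=0,
    objects=[ObjectAnnotation("vehicle", BoundingBox(x=412, y=230, width=96, height=54))],
    description="a white truck is turning left",
)

# asdict() converts nested dataclasses and lists into plain dicts for JSON.
record = json.dumps(asdict(frame))
print(record)
```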
CNNs
★ Extract features from the input image and
distill them down to class predictions
★ Preserve the spatial relationships between
pixels
Bird
Airplane
Superman
Car
Input Image
0 2 2 3 0
2 5 3 3 7
8 7 4 5 3
5 4 4 0 2
8 7 3 8 5

Filter (Weights)
1 0 1
0 1 0
0 0 0

Feature Map (Input Image * Filter)
7 8 5
12 12 15
16 16 7

★ The filter slides over the input image one position at a time; at each
position the overlapping pixels are multiplied by the weights (x1 or x0)
and summed to give one feature map value
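The sliding-window computation in this slide sequence can be reproduced in a few lines. The sketch below uses plain Python lists and a helper named `correlate2d_valid`, a name chosen for this example. Like most CNN libraries, it applies the filter without flipping it (strictly a cross-correlation), which is what the numbers on the slides correspond to.

```python
def correlate2d_valid(image, kernel):
    """Slide `kernel` over `image` with no padding and stride 1.

    Each output value is the sum of element-wise products between the
    kernel and the image window it currently covers.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


input_image = [
    [0, 2, 2, 3, 0],
    [2, 5, 3, 3, 7],
    [8, 7, 4, 5, 3],
    [5, 4, 4, 0, 2],
    [8, 7, 3, 8, 5],
]
filter_weights = [
    [1, 0, 1],
    [0, 1, 0],
    [0, 0, 0],
]

feature_map = correlate2d_valid(input_image, filter_weights)
print(feature_map)  # [[7, 8, 5], [12, 12, 15], [16, 16, 7]]
```

A 5x5 input with a 3x3 filter and no padding yields a 3x3 feature map, matching the slides.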
3 feature maps
produced from 3
filters
Bird
Airplane
Superman
Car
-1 -1 -1
-1 8 -1
-1 -1 -1
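The slides do not name the 3x3 filter above, but its weights sum to zero, which is characteristic of an edge-detection style filter: uniform regions cancel out, while changes in intensity produce a non-zero response. The small sketch below checks this on two hand-made 3x3 patches (the patch values are made up for illustration).

```python
EDGE_KERNEL = [
    [-1, -1, -1],
    [-1, 8, -1],
    [-1, -1, -1],
]


def respond(patch, kernel):
    """Sum of element-wise products between a 3x3 patch and the kernel."""
    return sum(
        p * k
        for patch_row, kernel_row in zip(patch, kernel)
        for p, k in zip(patch_row, kernel_row)
    )


flat_patch = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]   # uniform brightness
edge_patch = [[0, 5, 5], [0, 5, 5], [0, 5, 5]]   # dark-to-bright vertical edge

flat_response = respond(flat_patch, EDGE_KERNEL)
edge_response = respond(edge_patch, EDGE_KERNEL)
print(flat_response, edge_response)  # 0 15
```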
★ YOLO9000 (YOLO v2) is a real-time object
detection convolutional neural network
architecture
★ Redmon, Joseph and Farhadi, Ali. "YOLO9000:
better, faster, stronger." arXiv (2017).
[Diagram: Game Engine pipeline with the components Action Generation,
Camera Control, Environment Control, Annotations, and Text Extraction,
acting on Pedestrians/Vehicles, Camera, and Environment]
RNNs
★ Work well with sequential input (e.g. words in
a sentence or a vector of numbers representing
an image)
★ For each input, a “feedback” loop carries
forward the information received and the
decision made at the previous input in the
sequence
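The feedback loop described above can be sketched as a single recurrent update, where the new hidden state depends on both the current input and the previous hidden state. This is a generic vanilla RNN step written with plain lists; the weight values are arbitrary numbers chosen for illustration, not parameters from the talk.

```python
import math


def rnn_step(x, h_prev, w_xh, w_hh, b_h):
    """One recurrent update: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
    h_next = []
    for k in range(len(h_prev)):
        total = b_h[k]
        total += sum(w_xh[k][i] * x[i] for i in range(len(x)))
        total += sum(w_hh[k][j] * h_prev[j] for j in range(len(h_prev)))
        h_next.append(math.tanh(total))
    return h_next


# Arbitrary illustrative weights for a 2-input, 2-unit hidden layer.
w_xh = [[0.5, -0.3], [0.1, 0.8]]
w_hh = [[0.2, 0.0], [0.0, 0.2]]
b_h = [0.0, 0.0]

x = [1.0, 0.0]
h0 = [0.0, 0.0]
h1 = rnn_step(x, h0, w_xh, w_hh, b_h)
h2 = rnn_step(x, h1, w_xh, w_hh, b_h)  # same input, different history
print(h1, h2)
```

Feeding the same input twice gives two different hidden states, because the second step also sees what the network computed at the first.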
Vocabulary of 4 letters:
h e l o
Letters could be encoded as:
h [1 0 0 0]
e [0 1 0 0]
l [0 0 1 0]
o [0 0 0 1]
Input → Neural Network → Output, one letter at a time:
Input “h” → Output “e”
Input “e” → Output “l”
Input “l” → Output “l”
Input “l” → Output “o”
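The letter encoding and the input/output pairs from this example can be written out as follows. This is a minimal sketch of the data preparation only, with no network involved; the helper name `one_hot` is chosen for this example.

```python
VOCAB = ["h", "e", "l", "o"]


def one_hot(letter):
    """Return a 4-element vector with a 1 at the letter's vocabulary position."""
    return [1 if letter == v else 0 for v in VOCAB]


word = "hello"

# Each letter is paired with the letter that follows it in the word.
pairs = [(word[i], word[i + 1]) for i in range(len(word) - 1)]
encoded_pairs = [(one_hot(inp), one_hot(out)) for inp, out in pairs]

for (inp, out), (inp_vec, out_vec) in zip(pairs, encoded_pairs):
    print(inp, inp_vec, "->", out, out_vec)
```

Note that the input "l" appears twice with different targets ("l", then "o"), which is why the network needs the context carried over from earlier letters.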
[Figure: chain of LSTM cells emitting a caption one word at a time,
labeled with the words “A”, “car”, “white”, “driving”]
LSTM
★ Long Short-Term Memory: a variation of RNNs
★ LSTMs use additional units of “memory” to maintain longer-range
connections across sequence inputs
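To make the idea of an extra "memory" unit concrete, here is a sketch of the standard LSTM cell update reduced to scalars. The parameter values are hand-picked for illustration (forget gate held open, input gate held nearly shut) so that the cell state visibly persists over many steps; they are not trained weights and do not come from the talk.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One scalar LSTM update using the standard gate equations."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    c = f * c_prev + i * g       # memory cell: keep old content, add new
    h = o * math.tanh(c)         # hidden state exposed to the next step
    return h, c


# Illustrative parameters: large positive forget bias, large negative input bias.
params = {
    "wf": 0.5, "uf": 0.5, "bf": 10.0,
    "wi": 0.5, "ui": 0.5, "bi": -10.0,
    "wo": 0.5, "uo": 0.5, "bo": 0.0,
    "wg": 0.5, "ug": 0.5, "bg": 0.0,
}

h, c = 0.0, 1.0  # start with a value already stored in the memory cell
for _ in range(20):
    h, c = lstm_step(0.0, h, c, params)

print(c)  # remains close to 1.0 after 20 steps
```

With these gate settings the stored value survives 20 steps almost unchanged, which is the behaviour that lets an LSTM connect inputs far apart in a sequence.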
Attention
★ Train model to focus on salient objects in
the image
★ Instead of feeding features from the
entire image to an RNN, just feed the
salient region’s features
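One common way to realise this idea is soft attention: each image region gets a saliency score, the scores are normalised with a softmax, and the RNN receives the weighted sum of region features, so the most salient region dominates. The slides do not say which attention variant was used, so the sketch below is an assumption, and the region features and scores are made-up numbers.

```python
import math


def softmax(scores):
    """Normalise scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, scores):
    """Return attention weights and the weighted sum of region features."""
    weights = softmax(scores)
    dim = len(region_features[0])
    context = [
        sum(w * feat[d] for w, feat in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, context


# Three hypothetical regions with 2-dimensional features; region 1 is salient.
region_features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
saliency_scores = [0.1, 5.0, 0.2]

weights, context = attend(region_features, saliency_scores)
print(weights)
print(context)  # close to the features of region 1
```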
“A man in a white shirt is walking”
“A white service vehicle is parked”
Search: “red truck”
Search by Text in Video
★ Extract captions from video and store
them in an index
★ Fast video search by text query over large
amounts of video
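A simple way to picture the caption index is an inverted index that maps each caption word to the frames whose caption contains it. The sketch below is an assumption about how such an index could be built, not the system described in the talk; the clip names and frame numbers are hypothetical, while the caption strings are taken from earlier slides.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and strip simple punctuation from each word."""
    return [w.strip(".,").lower() for w in text.split() if w.strip(".,")]


def build_index(captions):
    """Map each caption word to the set of (clip, frame) keys containing it."""
    index = defaultdict(set)
    for key, caption in captions.items():
        for token in tokenize(caption):
            index[token].add(key)
    return index


def search(index, query):
    """Return the keys whose caption contains every word of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


# Hypothetical (clip, frame) keys paired with captions from the slides.
captions = {
    ("clip_a", 10): "A green truck is crossing an intersection.",
    ("clip_a", 42): "A group of people are crossing the street.",
    ("clip_b", 7): "A white service vehicle is parked",
}

index = build_index(captions)
print(search(index, "green truck"))
print(search(index, "crossing"))
print(search(index, "red truck"))
```

Lookups touch only the index entries for the query words, which is what keeps the search fast as the amount of video grows.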
Search by Example in Video
★ The user draws a bounding box on a video
frame
★ Query for similar objects of interest
across an entire video dataset, at the frame
level
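Search by example is commonly implemented by comparing embedding vectors: the region inside the user's bounding box is embedded, and stored per-frame object embeddings are ranked by similarity to it. The sketch below assumes cosine similarity and uses tiny made-up 3-dimensional embeddings with hypothetical clip, frame, and object identifiers; the talk does not specify the similarity measure or embedding size.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_by_example(query_embedding, frame_objects, top_k=2):
    """Rank stored object embeddings by similarity to the query embedding."""
    scored = [
        (cosine_similarity(query_embedding, embedding), key)
        for key, embedding in frame_objects.items()
    ]
    scored.sort(reverse=True)
    return [key for _, key in scored[:top_k]]


# Hypothetical (clip, frame, object) keys with made-up embeddings.
frame_objects = {
    ("clip_a", 10, "obj0"): [0.9, 0.1, 0.0],
    ("clip_a", 42, "obj1"): [0.0, 0.2, 0.9],
    ("clip_b", 7, "obj0"): [0.8, 0.3, 0.1],
}

# Embedding of the region inside the user's bounding box (also made up).
query = [1.0, 0.2, 0.0]

matches = search_by_example(query, frame_objects)
print(matches)
```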
★ GTA V allows us to create fully annotated, custom-tailored,
photorealistic datasets
★ We can use this dataset to train models that are good at object
detection/localization, captioning, and search by example or text for
overhead video
★ Models trained on GTA data are also applicable to areas
such as real-time security camera alerting and self-driving cars
www.ccri.com
mrajendiran@ccri.com

Training Drone Image Models with Grand Theft Auto

  • 2. Video Learning for Analysis from Deep Embeddings Timothy Emerick, PhD Sue He Alexander Polis Monica Rajendiran
  • 3.
  • 6. A green truck is crossing an intersection.
  • 7. A group of people are crossing the street.
  • 8. ★ Machine vision models often require large amounts of labeled data to train well ★ Existing labelled datasets can be too generic and have a broad concept space for our purposes
  • 9. ★ Machine vision models often require large amounts of labeled data to train well ★ Existing labelled datasets can be too generic and have a broad concept space for our purposes
  • 10. ImageNet 14 million+ images of 21K+ class entities YouTube-8M 450K+ hours of 4700+ class entities Olga Russakovsky*, Jia Deng*, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg and Li Fei-Fei. (* = equal contribution) ImageNet Large Scale Visual Recognition Challenge. IJCV, 2015. Abu-El-Haija, Sami, et al. "YouTube-8M: A large-scale video classification benchmark." arXiv preprint arXiv:1609.08675 (2016).
  • 11. ImageNet 14 million+ images of 21K+ class entities YouTube-8M 450K+ hours of 4700+ class entities Olga Russakovsky*, Jia Deng*, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg and Li Fei-Fei. (* = equal contribution) ImageNet Large Scale Visual Recognition Challenge. IJCV, 2015. Abu-El-Haija, Sami, et al. "YouTube-8M: A large-scale video classification benchmark." arXiv preprint arXiv:1609.08675 (2016).
  • 12. ★ Graphics have become extremely realistic over the years ★ Games are codeable, enabling complex simulations ★ Simulating in-game helps you ignore low level tasks like movement animations and routing
  • 13. ★ Graphics have become extremely realistic over the years ★ Games are codeable, enabling complex simulations ★ Simulating in-game helps you ignore low level tasks like movement animations and routing
  • 14. ★ Graphics have become extremely realistic over the years ★ Games are codeable, enabling complex simulations ★ Simulating in-game helps you ignore low level tasks like movement animations and routing
  • 15. ★ Rockstar Advanced Game Engine’s (RAGE) super realistic graphics ★ Huge modding community provides lots of customization ★ Programmatically configurable options
  • 18. ★ Programmatically configurable options ○ Script Hook V is a library that lets you write scripts that run in-game ○ Exposes thousands of function calls
  • 19. ★ Programmatically configurable options ○ We can generate entities of choice in-game and have them perform complex actions ○ Vehicles: driving, turning, waiting at stoplights ○ People: entering/exiting vehicles, waiting to cross the street, parking ○ Environment: weather, time of day, camera elevation, zoom
  • 20. ★ Grand Theft Auto Dataset: ○ Video footage ○ Objects of interest per frame (vehicles and pedestrians) ○ Object location information (bounding box information) ○ Text Descriptions (e.g. a white truck is turning left)
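One way to picture a single record of such a dataset is sketched below. The class names (`BoundingBox`, `ObjectAnnotation`, `FrameAnnotation`), the field names, the pixel-coordinate convention, and the sample values are all assumptions for illustration; the slides do not specify the actual storage format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BoundingBox:
    # Assumed convention: pixel coordinates with the origin at the top-left.
    x_min: int
    y_min: int
    x_max: int
    y_max: int

    def area(self) -> int:
        width = max(0, self.x_max - self.x_min)
        height = max(0, self.y_max - self.y_min)
        return width * height


@dataclass
class ObjectAnnotation:
    label: str        # object of interest, e.g. "truck" or "pedestrian"
    box: BoundingBox  # object location in the frame


@dataclass
class FrameAnnotation:
    video_id: str
    frame_index: int
    objects: List[ObjectAnnotation] = field(default_factory=list)
    description: str = ""  # text description of the frame

    def labels(self) -> List[str]:
        return [obj.label for obj in self.objects]


# Hypothetical record; identifiers and coordinates are made up.
frame = FrameAnnotation(
    video_id="clip_0001",
    frame_index=42,
    objects=[
        ObjectAnnotation("truck", BoundingBox(100, 50, 220, 130)),
        ObjectAnnotation("pedestrian", BoundingBox(300, 80, 320, 140)),
    ],
    description="a white truck is turning left",
)
print(frame.labels(), frame.objects[0].box.area())
```

Keeping the per-frame objects, boxes, and description in one record means the same data could feed detection, captioning, and search experiments.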
  • 21. CNNs ★ Extract features from the input image and distill them down to class predictions ★ Preserve the spatial relationships between pixels (example classes: Bird, Airplane, Superman, Car)
  • 22-32. Convolution walkthrough (animated build). Input image (5×5): [0 2 2 3 0; 2 5 3 3 7; 8 7 4 5 3; 5 4 4 0 2; 8 7 3 8 5]. Filter weights (3×3): [1 0 1; 0 1 0; 0 0 0]. At each position the filter weights (x1 x0 x1; x0 x1 x0; x0 x0 x0) multiply the underlying pixels and the products are summed, filling in the feature map (3×3) one value at a time: [7 8 5; 12 12 15; 16 16 7].
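The sliding-window computation in this walkthrough can be reproduced with a few lines of plain Python. This is a minimal sketch, assuming stride 1 and no padding. The function name `convolve2d_valid` is made up for illustration. As in most deep learning libraries, the filter is applied without flipping (strictly speaking, a cross-correlation).

```python
def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and sum the products."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    feature_map = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            # Multiply each filter weight by the pixel underneath it.
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Input image and filter weights taken from the slides.
image = [
    [0, 2, 2, 3, 0],
    [2, 5, 3, 3, 7],
    [8, 7, 4, 5, 3],
    [5, 4, 4, 0, 2],
    [8, 7, 3, 8, 5],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [0, 0, 0],
]

feature_map = convolve2d_valid(image, kernel)
for row in feature_map:
    print(row)
```

The printed rows match the feature map built up on the slides: 7 8 5, 12 12 15, 16 16 7.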
  • 33. 3 feature maps produced from 3 filters (class labels: Bird, Airplane, Superman, Car)
  • 34. Filter (3×3): [-1 -1 -1; -1 8 -1; -1 -1 -1]
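Read row by row as a 3×3 filter, the nine values on this slide sum to zero. A filter with that property gives no response on a uniform patch and a strong response where the center pixel differs from its neighbors, which is why kernels of this shape are commonly used for edge or outline detection. The sketch below checks that on three small patches. The patch values and the helper name `filter_response` are made up for illustration.

```python
EDGE_KERNEL = [
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1],
]


def filter_response(patch, kernel):
    """Element-wise multiply a patch by the kernel and sum the products."""
    size = len(kernel)
    return sum(patch[i][j] * kernel[i][j] for i in range(size) for j in range(size))


# Illustrative 3x3 patches (not from the slides).
flat_patch = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]  # uniform brightness
spot_patch = [[5, 5, 5], [5, 9, 5], [5, 5, 5]]  # bright center pixel
edge_patch = [[0, 0, 9], [0, 0, 9], [0, 0, 9]]  # bright column on the right

flat_response = filter_response(flat_patch, EDGE_KERNEL)
spot_response = filter_response(spot_patch, EDGE_KERNEL)
edge_response = filter_response(edge_patch, EDGE_KERNEL)
print(flat_response, spot_response, edge_response)
```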
  • 36. ★ YOLO9000 (YOLO v2) is a real-time object detection convolutional neural network architecture ★ Redmon, Joseph and Farhadi, Ali. "YOLO9000: better, faster, stronger." arXiv (2017).
  • 42. RNNs ★ Work well with sequential input (e.g. words in a sentence or a vector of numbers representing an image) ★ For each input, incorporate a “feedback” loop that carries forward the information received and the decision made for the previous input in the sequence (diagram labels: Input, Neural Network, Output)
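The feedback loop can be sketched as a recurrence in which each new hidden state depends on the current input and on the previous hidden state. The example below is a generic "vanilla" RNN step, not necessarily the model used in the talk. The weights, sizes, and input vectors are arbitrary values chosen for illustration.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def rnn_step(x, h_prev, w_xh, w_hh, bias):
    """One recurrent step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)."""
    from_input = matvec(w_xh, x)         # contribution of the current input
    from_memory = matvec(w_hh, h_prev)   # feedback from the previous step
    return [math.tanh(a + b + c) for a, b, c in zip(from_input, from_memory, bias)]


# Arbitrary illustrative weights: input size 3, hidden size 2.
W_XH = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
W_HH = [[0.1, 0.4], [-0.3, 0.2]]
BIAS = [0.0, 0.0]

# The first and third inputs are identical on purpose.
sequence = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]

hidden = [0.0, 0.0]
history = []
for x in sequence:
    hidden = rnn_step(x, hidden, W_XH, W_HH, BIAS)
    history.append(hidden)

for step, state in enumerate(history):
    print(step, [round(value, 3) for value in state])
```

Although the first and third inputs are the same, their hidden states differ, because the third step also sees what came before it.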
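  • 43. Vocabulary of 4 letters: h, e, l, o. Letters could be encoded as: h = [1 0 0 0], e = [0 1 0 0], l = [0 0 1 0], o = [0 0 0 1]. Example: given input “h”, the network should output “e”. Input/output letters shown for “hello”: h → e, e → l, l → l, l → o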
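The letter encoding on this slide is a one-hot encoding over the four-letter vocabulary. A minimal sketch follows. Reading the slide's letter sequence as next-letter training pairs for the word "hello" is an assumption, and the helper names `one_hot` and `next_letter_pairs` are made up for illustration.

```python
VOCAB = ["h", "e", "l", "o"]


def one_hot(letter, vocab=VOCAB):
    """Return a vector with a 1 at the letter's position and 0 elsewhere."""
    return [1 if letter == symbol else 0 for symbol in vocab]


def next_letter_pairs(word):
    """Pair each letter with the letter that follows it."""
    return list(zip(word[:-1], word[1:]))


encoded = {letter: one_hot(letter) for letter in VOCAB}
pairs = next_letter_pairs("hello")

# Each training example maps an encoded input letter to an encoded target letter.
training_examples = [(one_hot(current), one_hot(following)) for current, following in pairs]

for letter, vector in encoded.items():
    print(letter, vector)
print(pairs)
```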
  • 47. LSTM ★ Long Short-Term Memory: a variation of RNNs ★ LSTMs use additional units of “memory” for longer connections across sequence inputs (diagram: a chain of LSTM cells with caption words “A”, “car”, “white”, “driving”)
  • 49. Attention ★ Train model to focus on salient objects in the image ★ Instead of feeding features from the entire image to an RNN, just feed the salient region’s features
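The second point can be illustrated by comparing features pooled over the whole image with features pooled over one region only. This is a simplified sketch: the region is given by hand rather than learned, and the feature grid, its values, and the helper names are assumptions for illustration, not the model from the talk.

```python
def region_features(feature_grid, top, left, bottom, right):
    """Collect feature vectors for grid cells in rows [top, bottom) and columns [left, right)."""
    return [feature_grid[r][c] for r in range(top, bottom) for c in range(left, right)]


def mean_pool(vectors):
    """Average a list of equally sized feature vectors."""
    count = len(vectors)
    dims = len(vectors[0])
    return [sum(vector[d] for vector in vectors) / count for d in range(dims)]


# Made-up 3x3 grid of 2-dimensional feature vectors; the object sits bottom-right.
feature_grid = [
    [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [4.0, 2.0], [2.0, 2.0]],
    [[0.0, 0.0], [2.0, 4.0], [4.0, 4.0]],
]

whole_image = mean_pool(region_features(feature_grid, 0, 0, 3, 3))
salient_region = mean_pool(region_features(feature_grid, 1, 1, 3, 3))

print("whole image:", [round(value, 3) for value in whole_image])
print("salient region:", salient_region)
```

Pooling over the whole grid dilutes the object's features with background cells, while the region-only vector keeps them intact before they are passed on to the RNN.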
  • 52. “A man in a white shirt is walking”
  • 53. “A white service vehicle is parked”
  • 54. Search by Text in Video ★ Extract captions from video and store them in an index ★ Fast video search by text query over large amounts of video (example search: “red truck”)
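A caption index of this kind can be sketched as an inverted index from words to the frames whose captions contain them. The slides do not say which index or search engine was used, so the structure below is an assumption, as are the function names, video identifiers, and frame numbers. Two of the sample captions are taken from the slides and the other two are made up.

```python
from collections import defaultdict


def build_caption_index(captions):
    """Map each caption word to the set of (video_id, frame_index) pairs containing it."""
    index = defaultdict(set)
    for video_id, frame_index, caption in captions:
        for word in caption.lower().split():
            index[word.strip(".,!?\"'")].add((video_id, frame_index))
    return index


def search(index, query):
    """Return frames whose caption contains every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


# Hypothetical captions as (video_id, frame_index, caption).
captions = [
    ("clip_a", 10, "A red truck is turning left."),
    ("clip_a", 55, "A man in a white shirt is walking."),
    ("clip_b", 3, "A white service vehicle is parked."),
    ("clip_b", 90, "A red truck is parked."),
]

index = build_caption_index(captions)
results = search(index, "red truck")
print(results)
```

Because the lookup touches only the entries for the query words, search time does not grow with the total amount of video the way a frame-by-frame scan would.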
  • 55. Search by Example in Video ★ A user-defined bounding box on a video frame ★ Query for similar objects of interest in the entirety of a video dataset, at the frame level
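One plausible way to implement search by example is to compare an embedding of the user's bounding-box crop against embeddings of objects detected in every frame. The talk title mentions deep embeddings, but the slides do not describe the method, so the approach below (cosine similarity over made-up 3-dimensional vectors) is an assumption. All names, identifiers, and values are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_by_example(query_embedding, indexed_objects, top_k=2):
    """Return the keys of the top_k indexed objects most similar to the query."""
    scored = [
        (cosine_similarity(query_embedding, embedding), key)
        for key, embedding in indexed_objects
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [key for _, key in scored[:top_k]]


# Hypothetical index: ((video_id, frame_index), embedding of a detected object).
indexed_objects = [
    (("clip_a", 10), [0.9, 0.1, 0.0]),
    (("clip_a", 55), [0.0, 1.0, 0.2]),
    (("clip_b", 90), [0.8, 0.2, 0.1]),
]

# Embedding of the crop inside the user-defined bounding box (made up).
query_embedding = [1.0, 0.0, 0.0]

matches = search_by_example(query_embedding, indexed_objects)
print(matches)
```

In practice the embeddings would come from a trained network, and a large dataset would need an approximate nearest-neighbor index rather than this linear scan.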
  • 57. ★ GTA V allows us to create fully annotated, custom-tailored, photorealistic datasets ★ We can use this dataset to train models that are good at object detection/localization, captioning, and search by example or text for overhead video ★ Models trained on GTA data are also applicable in areas such as real-time security camera alerting and self-driving cars

Editor's Notes

  1. Vehicles: color, type, damage. People: clothing color, gender, number. Buildings: type.
  2. Video captioning