© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Soji Adeshina, Machine Learning Engineer, Amazon AI
Computer Vision 101 - Gluon CV
Computer Vision Architectures for Image Classification: A Brief Timeline
Convolution
• Ideal for picking up on spatial patterns in data
• Applied over and over again (layer after layer), it creates increasingly abstract spatial features
• Inspired by experiments on the visual cortex of a cat
• Can be run in parallel for very fast computation
http://colah.github.io/posts/2014-07-Understanding-Convolutions/
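To make the "spatial patterns" point concrete, here is a minimal pure-Python sketch of a single convolution (strictly a cross-correlation, which is what deep learning libraries usually compute). The function name `conv2d`, the toy image and the edge kernel are illustrative assumptions, not part of the original slides.

```python
def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation of a list-of-lists image with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the patch whose top-left corner is (i, j).
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Toy 4x4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical 1x2 kernel that responds to a left-to-right intensity increase.
edge_kernel = [[-1, 1]]

edge_map = conv2d(image, edge_kernel)
for row in edge_map:
    print(row)
```

Every output position depends only on its own patch, which is why the computation parallelizes so well.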
LeNet 1995
• Challenge: Multiple convolutions blow up dimensionality
• Solution: Pooling
• AvgPooling/Subsampling - average over patches (works OK)
• MaxPooling - pick the maximum over patches (much better)
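The two pooling variants can be sketched as follows. This is a simplified illustration that assumes non-overlapping square patches; the helper names `pool2d` and `average` and the toy feature map are assumptions made for the example.

```python
def pool2d(feature_map, size, reduce_fn):
    """Apply reduce_fn to each non-overlapping size x size patch."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            patch = [feature_map[i + di][j + dj]
                     for di in range(size) for dj in range(size)]
            row.append(reduce_fn(patch))
        pooled.append(row)
    return pooled


def average(values):
    return sum(values) / len(values)


feature_map = [
    [1, 3, 0, 0],
    [5, 7, 0, 8],
    [2, 2, 4, 4],
    [2, 2, 4, 4],
]

max_pooled = pool2d(feature_map, 2, max)      # keeps the strongest response
avg_pooled = pool2d(feature_map, 2, average)  # smooths responses together
print(max_pooled)
print(avg_pooled)
```

Both variants shrink the 4x4 map to 2x2. Note how the isolated strong activation (the 8) survives max pooling but is diluted by averaging.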
AlexNet (Krizhevsky et al., 2012)
• More convolutional layers
• More channels
• More filters
• More data
• More computation
VGG (2014)
Deep and Narrow or Wide and Shallow?
• Want to reach a receptive field of size k
• Option 1: use one large filter (linear mix of many inputs, then a nonlinearity)
• Option 2: use several small filters (many linear mixes of a few inputs); this has fewer parameters
• Simonyan & Zisserman (2014) find that deep and narrow wins
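The "fewer parameters" claim can be checked with a little arithmetic. The sketch below assumes stride-1 convolutions with the same number of input and output channels and ignores biases; under those assumptions each 3 x 3 layer grows the receptive field by 2, so (k - 1) / 2 stacked layers cover a k x k field. The function names are illustrative.

```python
def params_single_filter(k, channels):
    """Weights in one k x k convolution layer (channels in, channels out, no bias)."""
    return k * k * channels * channels


def params_stacked_3x3(k, channels):
    """Weights in a stack of 3 x 3 layers whose receptive field is k x k."""
    if k < 3 or k % 2 == 0:
        raise ValueError("k must be an odd integer >= 3")
    num_layers = (k - 1) // 2  # each 3 x 3 layer adds 2 to the receptive field
    return num_layers * 3 * 3 * channels * channels


for k in (3, 5, 7):
    print(k, params_single_filter(k, 64), params_stacked_3x3(k, 64))
```

For a 7 x 7 receptive field with 64 channels, one large filter needs 49 * 64 * 64 weights while three stacked 3 x 3 layers need 27 * 64 * 64, and the stack also applies a nonlinearity between layers.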
Inception (2014)
Fancy structures - Networks of networks
• Compute different filters
• Compose one big vector from all of them
• Layer them iteratively
Szegedy et al., arxiv.org/pdf/1409.4842v1.pdf
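The three bullets can be illustrated with a deliberately simplified 1D sketch: several filters of different widths are applied to the same input, and their outputs are stacked as channels. This is not the actual Inception module (which works on 2D feature maps and includes 1 x 1 convolutions and pooling branches); the function names and kernels are assumptions chosen for illustration.

```python
def conv1d_same(signal, kernel):
    """1D cross-correlation with zero padding; assumes an odd-length kernel."""
    pad = len(kernel) // 2
    padded = [0] * pad + list(signal) + [0] * pad
    return [
        sum(padded[i + j] * kernel[j] for j in range(len(kernel)))
        for i in range(len(signal))
    ]


def inception_like_block(signal, kernels):
    """Run every kernel on the same input and stack the results as channels."""
    return [conv1d_same(signal, kernel) for kernel in kernels]


signal = [1, 2, 3, 4]
kernels = [[1], [1, 1, 1], [1, 1, 1, 1, 1]]  # widths 1, 3 and 5

channels = inception_like_block(signal, kernels)
for channel in channels:
    print(channel)
```

Because every branch preserves the input length, the branch outputs can be concatenated, and the combined result can feed another block of the same shape.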
Batch Norm (Ioffe et al., 2015)
• The loss occurs at the last layer
• Last layers learn quickly
• Data is inserted at the bottom layer
• When bottom layers change, everything changes
• Last layers need to relearn many times
• Slow convergence
• This is like covariate shift
Can we avoid changing last layers while learning first layers?
Batch Norm (Ioffe et al., 2015)
• Can we avoid changing last layers while learning first layers?
• Fix the mean and variance of each layer's inputs, and adjust them separately
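A minimal sketch of the idea for a single feature at training time: the batch is normalized to zero mean and unit variance, then a separately adjusted scale and shift are applied. The parameter names `gamma`, `beta` and `eps` follow common convention and are assumptions here; running statistics for inference are omitted.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a batch, then rescale with gamma and shift with beta."""
    mean = sum(batch) / len(batch)
    variance = sum((x - mean) ** 2 for x in batch) / len(batch)
    # eps guards against division by zero when the batch has no spread.
    normalized = [(x - mean) / math.sqrt(variance + eps) for x in batch]
    return [gamma * x_hat + beta for x_hat in normalized]


activations = [2.0, 4.0, 6.0, 8.0]
normalized = batch_norm(activations, gamma=2.0, beta=1.0)

out_mean = sum(normalized) / len(normalized)
out_var = sum((x - out_mean) ** 2 for x in normalized) / len(normalized)
print(out_mean, out_var)
```

Whatever the earlier layers produce, the next layer sees inputs with mean close to `beta` and variance close to `gamma ** 2`, which is the sense in which the statistics are fixed and adjusted separately.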
ResNet (He et al., 2015)
• In a regular layer, the simple function (all weights zero) is f(x) = 0
• Key idea - ‘Taylor expansion’: f(x) = x + g(x), so the simple function becomes the identity
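The difference between the two "simple functions" can be shown with a toy layer. In this sketch a layer is reduced to a single scalar weight, which is an assumption made purely to keep the example short; real residual blocks wrap convolutions, batch norm and nonlinearities.

```python
def plain_layer(x, weight):
    """Toy layer g(x): scale every element by a single weight."""
    return [weight * value for value in x]


def residual_layer(x, weight):
    """Residual form f(x) = x + g(x)."""
    return [value + g for value, g in zip(x, plain_layer(x, weight))]


x = [1.0, -2.0, 3.0]

# With all weights at zero, the plain layer destroys its input ...
print(plain_layer(x, 0.0))
# ... while the residual layer passes it through unchanged.
print(residual_layer(x, 0.0))
```

Starting from the identity means that adding a layer cannot easily make things worse, which is one common explanation for why very deep residual networks remain trainable.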
DenseNet (Huang et al., 2016)
• Simple Function
• In ResNet ‘Taylor expansion’ ends after one term
• In DenseNet use multiple steps
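One way to read "multiple steps": instead of adding a single correction to x, each layer's output is concatenated to everything computed so far, and the next layer sees the whole collection. The sketch below uses two hypothetical toy layers (`total` and `largest`) in place of real convolutional layers.

```python
def dense_block(x, layers):
    """Feed each layer all previous features and append its output to them."""
    features = list(x)
    for layer in layers:
        new_features = layer(features)
        features = features + new_features  # concatenate rather than add
    return features


# Hypothetical toy layers, each producing one new feature.
def total(features):
    return [sum(features)]


def largest(features):
    return [max(features)]


block_output = dense_block([1, 2], [total, largest])
print(block_output)
```

The original input is still present at the end of the block, alongside one extra feature per layer.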
Gluon CV: Deep Learning Toolkit for Computer Vision
Why GluonCV?
What is the biggest challenge you have ever encountered
with deep learning?
“reproducing the best claimed results”
Real-world Stories
Back in 2016, the same ImageNet models trained with MXNet achieved on average 1% worse accuracy than those trained with Torch.
We tried almost everything to debug it, and even developed a plugin to run Torch code inside MXNet so that the results were easier to compare.
Transcoding the training images at JPEG quality 95 rather than 85 solved the problem.
Real-world Stories
Using another open source DL framework, a similar problem happened: trained model accuracies could not match those of a previous internal version.
Months were spent trying to figure out why, with no clue.
The cause: the order of data augmentation was different from the previous version.
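Why the order matters can be seen with two very simple augmentations that do not commute. This is only an illustration of the general effect, not a reconstruction of the actual pipeline from the story; `crop_left` and `flip_horizontal` are hypothetical helpers.

```python
def crop_left(image, width):
    """Keep the leftmost `width` columns of a list-of-lists image."""
    return [row[:width] for row in image]


def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
]

crop_then_flip = flip_horizontal(crop_left(image, 2))
flip_then_crop = crop_left(flip_horizontal(image), 2)

print(crop_then_flip)
print(flip_then_crop)
```

The two orders keep different halves of the image, so a model trained with one pipeline sees a different data distribution from a model trained with the other, even though both use "the same" augmentations.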
Starting from scratch can be hard
• Even the most talented researchers get blocked by trivial things.
• Experience and instinct can be your enemies in certain circumstances.
• Training is time-consuming, initialization and augmentation are randomized, and tons of implementation details need to be taken care of. Debugging deep models is extremely difficult.
Embracing open source solutions can be difficult
• The quality of open-source implementations varies.
• Languages, code styles, project structures, and DL frameworks are mixed.
• Personal projects tend to focus on a specific task with specific datasets. Adapting them to your use case requires significant engineering effort.
• Community projects are often abandoned.
What does GluonCV provide?
• Reproductions of important papers from recent years
• Training scripts (as well as tuned hyperparameters) to reproduce the results
• Carefully designed APIs and modules that are easy to follow and understand, so that experiments based on existing algorithms are less frustrating
• Community support: feel free to ask and discuss
What’s in GluonCV
Image Classification
• More than 20 pre-trained ImageNet models (ResNet, MobileNet, …)
• We achieved the best accuracy with some of the most popular models (e.g., ResNet), compared with other frameworks
What’s in GluonCV
• Object Detection
• SSD and YOLOv3: fastest solutions
• Faster-RCNN, RFCN and FPN: slower but more accurate, especially for tiny objects
• Mask-RCNN: simultaneous object detection and semantic segmentation
What’s in GluonCV
Semantic Segmentation
• FCN
• PSPNet
• Mask-RCNN
• DeepLab
Instance Segmentation
• Mask-RCNN
What’s in GluonCV
• Style Transfer
• MSGNet
• Generative Adversarial Networks (GAN)
• CycleGAN
Like GluonCV?
https://gluon-cv.mxnet.io
https://github.com/dmlc/gluon-cv

Editor's Notes

  1. Won a Nobel Prize for this in 19