© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Soji Adeshina, Machine Learning Engineer, Amazon AI
Computer Vision 101 - Gluon CV
Computer Vision Architectures for Image Classification: A Brief Timeline
Convolution
• Ideal for picking up on spatial patterns in data
• Applied over and over again (layer after layer), it creates more abstract spatial features
• Inspired by experiments on the visual cortex of a cat
• Can be run in parallel for really fast computations
http://colah.github.io/posts/2014-07-Understanding-Convolutions/
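As a minimal sketch of the idea above, the following pure-Python function slides a small filter over a 2-D grid and sums the products at each position (strictly a cross-correlation, which is what deep learning libraries usually call convolution). The name `conv2d_valid`, the toy image, and the edge filter are assumptions made for illustration; they do not come from the slides.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Toy 4x4 image: dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical 1x2 filter that responds to a left-to-right increase in brightness.
edge_filter = [[-1, 1]]

feature_map = conv2d_valid(image, edge_filter)
print(feature_map)
```

Each output value depends only on its own patch of the input, which is why the positions can be computed in parallel.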
LeNet 1995
• Challenge: Multiple convolutions blow up dimensionality
• Solution: Pooling
• AvgPooling/Subsampling - average over patches (works OK)
• MaxPooling - pick the maximum over patches (much better)
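The two pooling variants can be sketched as follows. This is an illustrative pure-Python version that assumes non-overlapping square patches; `pool2d`, `average`, and the sample values are hypothetical names and data, not part of the original deck.

```python
def pool2d(feature_map, size, reduce_fn):
    """Apply `reduce_fn` to each non-overlapping size x size patch."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            patch = [feature_map[i + di][j + dj]
                     for di in range(size) for dj in range(size)]
            row.append(reduce_fn(patch))
        pooled.append(row)
    return pooled


def average(values):
    return sum(values) / len(values)


feature_map = [
    [1, 3, 0, 0],
    [5, 7, 0, 8],
    [2, 2, 4, 4],
    [2, 2, 4, 4],
]
max_pooled = pool2d(feature_map, 2, max)      # keeps the strongest response per patch
avg_pooled = pool2d(feature_map, 2, average)  # smooths each patch into its mean
```

Either way a 4x4 map shrinks to 2x2, which is how pooling keeps dimensionality in check between convolutions.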
AlexNet (Krizhevsky et al., 2012)
• More convolutional layers
• More channels
• More filters
• More data
• More computation
VGG (2014)
Deep and Narrow or Wide and Shallow?
• Want to reach a receptive field of size k
• Use one large filter (linear mix of many inputs, then a nonlinearity), vs.
• Use several small filters (many linear mixes of few inputs), which has fewer parameters
• Simonyan & Zisserman (2014) find that deep and narrow wins
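A rough count makes the parameter argument concrete. The sketch below assumes stride-1 convolutions, the same number of input and output channels in every layer, and no biases; the helper names and the channel width of 64 are chosen only for illustration.

```python
def receptive_field(num_layers, kernel_size):
    """Receptive field of stacked stride-1 convolutions with the same kernel size."""
    return num_layers * (kernel_size - 1) + 1


def conv_weights(num_layers, kernel_size, channels):
    """Weight count (biases ignored) with `channels` inputs and outputs per layer."""
    return num_layers * kernel_size * kernel_size * channels * channels


channels = 64  # assumed channel width, chosen only for illustration

# Both options cover a 7x7 receptive field.
wide_rf = receptive_field(1, 7)
deep_rf = receptive_field(3, 3)

one_large = conv_weights(1, 7, channels)    # one 7x7 layer
three_small = conv_weights(3, 3, channels)  # three stacked 3x3 layers

print(wide_rf, deep_rf, one_large, three_small)
```

Under these assumptions the stack of three 3x3 layers needs 27·C² weights against 49·C² for the single 7x7 layer, and it also applies a nonlinearity after each layer.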
Inception (2014)
Fancy structures - Networks of networks
• Compute different filters
• Compose one big vector from all of them
• Layer them iteratively
Szegedy et al., arxiv.org/pdf/1409.4842v1.pdf
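The "compute different filters, then compose one big vector" step can be sketched in one dimension. This is a deliberately simplified stand-in: a real Inception block uses learned 2-D filters and a pooling branch, whereas the all-ones filters, the widths, and the function names below are assumptions for illustration.

```python
def moving_sum(signal, width):
    """1-D all-ones filter with zero padding, so the output length matches the input."""
    pad = width // 2
    padded = [0] * pad + list(signal) + [0] * pad
    return [sum(padded[i:i + width]) for i in range(len(signal))]


def inception_like_block(signal, widths):
    """Run one filter per width on the same input and stack the results as channels."""
    return [moving_sum(signal, w) for w in widths]


signal = [1, 2, 3, 4]
channels = inception_like_block(signal, widths=(1, 3, 5))
for channel in channels:
    print(channel)
```

Because every branch keeps the input length, the outputs line up and can be stacked as channels, and the stacked result can be fed to another block of the same kind.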
Batch Norm (Ioffe et al., 2015)
[Figure: network layers with data entering at the bottom and loss computed at the top]
• Loss occurs at last layer
• Last layers learn quickly
• Data is inserted at bottom layer
• Bottom layers change - everything changes
• Last layers need to relearn many times
• Slow convergence
• This is like covariate shift
Can we avoid changing last layers while learning first layers?
Batch Norm (Ioffe et al., 2015)
• Can we avoid changing last layers while learning first layers?
• Fix the mean and variance of each layer's input and adjust them separately
[Equations: mean, variance]
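A minimal sketch of the normalization step, assuming the standard training-time formulation: compute the mini-batch mean and variance, normalize, then apply a separately learned scale and shift. The parameter names `gamma`, `beta`, and `eps` follow common usage and are not taken from the slides; running averages for inference are left out.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch, then rescale and shift it."""
    mean = sum(batch) / len(batch)
    variance = sum((x - mean) ** 2 for x in batch) / len(batch)
    normalized = [(x - mean) / math.sqrt(variance + eps) for x in batch]
    # gamma and beta are the separately adjusted (learned) scale and shift.
    return [gamma * x_hat + beta for x_hat in normalized]


activations = [2.0, 4.0, 6.0, 8.0]
outputs = batch_norm(activations)

mean_out = sum(outputs) / len(outputs)
var_out = sum((y - mean_out) ** 2 for y in outputs) / len(outputs)
print(mean_out, var_out)
```

Whatever the earlier layers do to the scale of their outputs, the next layer sees inputs with roughly zero mean and unit variance, unless `gamma` and `beta` say otherwise.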
ResNet (He et al., 2015)
• In a regular layer, the simple function is given by f(x) = 0
• Key idea: ‘Taylor expansion’
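One way to read the bullets above, offered as an interpretation rather than the speaker's exact formulation: with all weights at zero a regular layer outputs 0, while a residual layer of the form f(x) = x + g(x) outputs x, so the identity becomes the easy default. The toy layers below use a single scalar weight for g, which is an assumption made purely for illustration.

```python
def plain_layer(x, weight):
    """Regular layer: with zero weight the output collapses to 0."""
    return [weight * v for v in x]


def residual_layer(x, weight):
    """Residual layer: output is x + g(x), so zero weight gives the identity."""
    return [v + weight * v for v in x]


x = [1, -2, 3]
plain_out = plain_layer(x, weight=0)        # the input signal is lost
residual_out = residual_layer(x, weight=0)  # the input passes through unchanged
print(plain_out, residual_out)
```

Seen this way, x is the leading term and g(x) is a learned correction on top of it, which is the loose sense in which the slide speaks of a 'Taylor expansion'.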
DenseNet (Huang et al., 2016)
• Simple Function
• In ResNet, the ‘Taylor expansion’ ends after one term
• In DenseNet, use multiple steps
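A sketch of the dense connectivity pattern, assuming the usual description of DenseNet in which each layer receives the concatenation of the block input and all earlier layer outputs. The toy layers (a sum and a max) are hypothetical placeholders for learned convolutions.

```python
def dense_block(x, layers):
    """Each layer sees the concatenation of the input and all earlier layer outputs."""
    features = list(x)
    for layer in layers:
        # Concatenate instead of adding: earlier features stay available as they are.
        features = features + layer(features)
    return features


# Hypothetical toy layers: each returns a single new feature.
layers = [
    lambda feats: [sum(feats)],
    lambda feats: [max(feats)],
]
output = dense_block([1, 2], layers)
print(output)
```

Where a residual layer adds one correction to x, the dense block keeps appending new terms, so later layers can reuse every earlier feature directly.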
Gluon CV: Deep Learning Toolkit for Computer Vision
Why GluonCV?
What is the biggest challenge you have ever encountered with deep learning?
“reproducing the best claimed results”
Real-world Stories
Back in 2016, the same ImageNet models trained with MXNet achieved on average 1% worse accuracy than those trained with Torch.
We tried almost everything to debug this, and even developed a plugin to run Torch code inside MXNet so that the results were easier to compare.
Transcoding the training images at JPEG quality 95 rather than 85 solved the problem.
Real-world Stories
Using another open source DL framework, a similar problem happened: trained model accuracies could not match the previous internal version.
Months were spent trying to figure out why, with no clue.
The cause: the order of data augmentation was different from the previous version.
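A toy illustration of why augmentation order matters: a crop and a horizontal flip do not commute. The one-row "image", the fixed crop, and the function names are assumptions for illustration and do not describe the pipeline from the story.

```python
def crop_left(row, width):
    """Keep the leftmost `width` pixels."""
    return row[:width]


def flip(row):
    """Mirror the row horizontally."""
    return row[::-1]


pixels = [10, 20, 30, 40]

crop_then_flip = flip(crop_left(pixels, 2))  # keeps 10, 20 and then reverses them
flip_then_crop = crop_left(flip(pixels), 2)  # reverses first, then keeps 40, 30

print(crop_then_flip, flip_then_crop)
```

The two pipelines use the same operations but show the model different parts of the image, so a reordering can shift the training distribution without any visible code error.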
Starting from scratch can be hard
• Even the most talented researchers will get blocked by trivial things.
• Experience and instinct might be your enemies in certain circumstances.
• Training is time-consuming, initialization and augmentation are randomized, and tons of implementation details need to be taken care of. Debugging deep models is extremely difficult.
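Randomized initialization is one of the details that makes runs hard to compare. A common mitigation, sketched here with Python's standard `random` module rather than any framework API, is to fix the seed so that two runs start from identical weights. The function name, the weight count, and the standard deviation are hypothetical.

```python
import random


def init_weights(count, seed):
    """Draw `count` small Gaussian weights from a generator with a fixed seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 0.01) for _ in range(count)]


run_a = init_weights(3, seed=42)
run_b = init_weights(3, seed=42)  # same seed: identical starting point
run_c = init_weights(3, seed=7)   # different seed: different starting point

print(run_a == run_b, run_a == run_c)
```

Deep learning frameworks have their own seeding calls, and some GPU operations remain nondeterministic even with a fixed seed, so this narrows the search rather than guaranteeing identical results.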
Embracing open source solutions can be difficult
• The quality of open-source implementations varies.
• Languages, code styles, project structures, and DL frameworks are mixed.
• Personal projects tend to focus on a specific task with specific datasets. Adapting them to your use case requires significant engineering effort.
• Community projects are frequently abandoned.
What does GluonCV provide?
• Reproductions of important papers from recent years
• Training scripts (as well as tuned hyperparameters) to reproduce the results
• Carefully designed APIs and modules that are easy to follow and understand, so that experiments based on existing algorithms are less frustrating
• Community support: feel free to ask questions and discuss
What’s in GluonCV
Image Classification
• More than 20 pre-trained ImageNet models (ResNet, MobileNet…)
• For some of the most popular models (e.g., ResNet), we achieved better accuracy than other frameworks
What’s in GluonCV
• Object Detection
• SSD and YOLOv3: fastest solutions
• Faster-RCNN, RFCN and FPN: slower but more accurate, especially for tiny objects
• Mask-RCNN: simultaneous object detection and instance segmentation
What’s in GluonCV
Semantic Segmentation
• FCN
• PSPNet
• Mask-RCNN
• DeepLab
Instance Segmentation
• Mask-RCNN
What’s in GluonCV
• Style Transfer
• MSGNet
• Generative Adversarial Networks (GAN)
• CycleGAN
Like GluonCV?
https://gluon-cv.mxnet.io
https://github.com/dmlc/gluon-cv

Editor's Notes

  1. Won a Nobel Prize for this in 19