This document discusses deep learning for computer vision tasks. It begins with an overview of image classification using convolutional neural networks and how they have achieved superhuman performance on ImageNet. It then covers the key layers and concepts in CNNs, including convolutions, max pooling, and transferring learning to new problems. Finally, it discusses more advanced computer vision tasks that CNNs have been applied to, such as semantic segmentation, style transfer, visual question answering, and combining images with other data sources.
PyConZA'17 Deep Learning for Computer VisionAlex Conway
Slides from my talk on deep learning for computer vision at PyConZA on 2017/10/06.
Description:
The state-of-the-art in image classification has skyrocketed thanks to the development of deep convolutional neural networks and increases in the amount of data and computing power available to train them. The top-5 error rate in the ImageNet competition to predict which of 1000 classes an image belongs to has plummeted from 28% error in 2010 to just 2.25% in 2017 (human level error is around 5%).
In addition to being able to classify objects in images (including not hotdogs), deep learning can be used to automatically generate captions for images, convert photos into paintings, detect cancer in pathology slide images, and help self-driving cars ‘see’.
The talk will give an overview of the cutting edge and some of the core mathematical concepts and will also include a short code-first tutorial to show how easy it is to get started using deep learning for computer vision in python…
PyDresden 20170824 - Deep Learning for Computer VisionAlex Conway
Slides from my talk at PyDresden
The state-of-the-art in image classification has skyrocketed thanks to the development of deep convolutional neural networks and increases in the amount of data and computing power available to train them. The top-5 error rate in the international ImageNet competition to predict which of 1000 classes an image belongs to has plummeted from 28% error in 2010 before deep learning to just 2.25% in 2017 (human level error is around 5%).
In addition to being able to classify objects in images (including not hotdogs), deep learning can be used to automatically generate captions for images, convert photos into paintings, detect cancer in pathology slide images, and help self-driving cars ‘see’.
The talk will give an overview of the cutting edge in the field and some of the core mathematical concepts behind the models. It will also include a short code-first tutorial to show how easy it is to get started using deep learning for computer vision in python…
Convolutional Neural Networks for Image Classification (Cape Town Deep Learni...Alex Conway
Slides for my talk on:
"Convolutional Neural Networks for Image Classification"
...at the Cape Town Deep Learning Meet-up 20170620
https://www.meetup.com/Cape-Town-deep-learning/events/240485642/
Machine Learning Tokyo - Deep Neural Networks for Video - NumberBoostAlex Conway
Slides from a talk I gave at the Machine Learning Tokyo meetup group on 20190318.
More info here: https://www.meetup.com/Machine-Learning-Tokyo/events/259467268/
Feel free to reach out if you ever need to build a computer vision system or need data labelled to train machine learning models :)
www.numberboost.com
Deep Neural Networks for Video Applications at the EdgeAlex Conway
Slides from my talk about deep learning for video applications and edge computing on 16 May 2019 at the Oslo Machine Learning Meetup on sponsored by InMeta!
More details about the event here:
https://www.meetup.com/Oslo-Maskinlaering/events/261318845/
I covered the following topics:
* Neural network crash course
* Convolutional neural networks
* Recurrent neural networks
* Object detection
* Counting people getting in/out transport using deep learning
* Edge computing
* Creating labelled training datasets using sebenz.ai
PyConZA'17 Deep Learning for Computer VisionAlex Conway
Slides from my talk on deep learning for computer vision at PyConZA on 2017/10/06.
Description:
The state-of-the-art in image classification has skyrocketed thanks to the development of deep convolutional neural networks and increases in the amount of data and computing power available to train them. The top-5 error rate in the ImageNet competition to predict which of 1000 classes an image belongs to has plummeted from 28% error in 2010 to just 2.25% in 2017 (human level error is around 5%).
In addition to being able to classify objects in images (including not hotdogs), deep learning can be used to automatically generate captions for images, convert photos into paintings, detect cancer in pathology slide images, and help self-driving cars ‘see’.
The talk will give an overview of the cutting edge and some of the core mathematical concepts and will also include a short code-first tutorial to show how easy it is to get started using deep learning for computer vision in python…
PyDresden 20170824 - Deep Learning for Computer VisionAlex Conway
Slides from my talk at PyDresden
The state-of-the-art in image classification has skyrocketed thanks to the development of deep convolutional neural networks and increases in the amount of data and computing power available to train them. The top-5 error rate in the international ImageNet competition to predict which of 1000 classes an image belongs to has plummeted from 28% error in 2010 before deep learning to just 2.25% in 2017 (human level error is around 5%).
In addition to being able to classify objects in images (including not hotdogs), deep learning can be used to automatically generate captions for images, convert photos into paintings, detect cancer in pathology slide images, and help self-driving cars ‘see’.
The talk will give an overview of the cutting edge in the field and some of the core mathematical concepts behind the models. It will also include a short code-first tutorial to show how easy it is to get started using deep learning for computer vision in python…
Convolutional Neural Networks for Image Classification (Cape Town Deep Learni...Alex Conway
Slides for my talk on:
"Convolutional Neural Networks for Image Classification"
...at the Cape Town Deep Learning Meet-up 20170620
https://www.meetup.com/Cape-Town-deep-learning/events/240485642/
Machine Learning Tokyo - Deep Neural Networks for Video - NumberBoostAlex Conway
Slides from a talk I gave at the Machine Learning Tokyo meetup group on 20190318.
More info here: https://www.meetup.com/Machine-Learning-Tokyo/events/259467268/
Feel free to reach out if you ever need to build a computer vision system or need data labelled to train machine learning models :)
www.numberboost.com
Deep Neural Networks for Video Applications at the EdgeAlex Conway
Slides from my talk about deep learning for video applications and edge computing on 16 May 2019 at the Oslo Machine Learning Meetup on sponsored by InMeta!
More details about the event here:
https://www.meetup.com/Oslo-Maskinlaering/events/261318845/
I covered the following topics:
* Neural network crash course
* Convolutional neural networks
* Recurrent neural networks
* Object detection
* Counting people getting in/out transport using deep learning
* Edge computing
* Creating labelled training datasets using sebenz.ai
A practical talk by Anirudh Koul aimed at how to run Deep Neural Networks to run on memory and energy constrained devices like smartphones. Highlights some frameworks and best practices.
Details of Lazy Deep Learning for Images Recognition in ZZ Photo appPAY2 YOU
В докладе представлена тема глубокого обучения (Deep Learning) для распознавания изображений. Рассматриваются практические аспекты обучения глубоких сверточных сетей на GPU, обсуждается личный опыт портирования обученных нейросетей в приложение на основе библиотеки OpenCV, проводится сравнение полученного детектора домашних животных на основе подхода Lazy Deep Learning с детектором Виолы-Джонса.
Докладчики: Артем Чернодуб – эксперт в области искусственных нейронных сетей и систем искусственного интеллекта. В 2007 году закончил Московский физико-технический институт. Руководит направлением Computer Vision в компании ZZ Wolf, а также по совместительству работает научным сотрудником в Институте проблем математических машин и систем НАНУ.
Юрий Пащенко – специалист в области систем машинного зрения и машинного обучения, магистр НТУУ «Киевский Политехнический Институт», факультет прикладной математики (2014). Работает в компании ZZ Wolf на должности R&D Engineer.
Presentation of few recent papers on Deep Learning ... in particular Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding, Song Han, Huizi Mao, William J. Dally International Conference on Learning Representations ICLR2016
Deep Learning with Python: Getting started and getting from ideas to insights in minutes.
PyData Seattle 2015
Alex Korbonits (@korbonits)
This presentation was given July 25, 2015 at the PyData Seattle conference hosted by PyData and NumFocus.
This presentation focuses on Deep Learning (DL) concepts, such as neural networks, backprop, activation functions, and Convolutional Neural Networks. You'll also learn how to incorporate Deep Learning in Android applications. Basic knowledge of matrices is helpful for this session, which is targeted primarily to beginners.
Synthetic dialogue generation with Deep LearningS N
A walkthrough of a Deep Learning based technique which would generate TV scripts using Recurrent Neural Network. The model will generate a completely new TV script for a scene, after being training from a dataset. One will learn the concepts around RNN, NLP and various deep learning techniques.
Technologies to be used:
Python 3, Jupyter, TensorFlow
Source code: https://github.com/syednasar/talks/tree/master/synthetic-dialog
Deep learning is making news across the country as one of the most promising techniques in machine learning research. However, these methods are complex to implement, finicky to tune, and state-of-the-art accuracy is only achieved by a few experts in the field. In this session, we give a beginner-friendly explanation of deep learning using neural networks—what it is, what it does, and how; and introduce the concept of deep features, which allows you to obtain great performance with reduced running times and data set sizes. We then show how these methods can easily be deployed on GPU instances (G2) on Amazon EC2.
We trained a large, deep convolutional neural network to classify the 1.2 million
high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 dif-
ferent classes. On the test data, we achieved top-1 and top-5 error rates of 37.5%
and 17.0% which is considerably better than the previous state-of-the-art. The
neural network, which has 60 million parameters and 650,000 neurons, consists
of five convolutional layers, some of which are followed by max-pooling layers,
and three fully-connected layers with a final 1000-way softmax. To make train-
ing faster, we used non-saturating neurons and a very efficient GPU implemen-
tation of the convolution operation. To reduce overfitting in the fully-connected
layers we employed a recently-developed regularization method called “dropout”
that proved to be very effective. We also entered a variant of this model in the
ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%,
compared to 26.2% achieved by the second-best entry.
Yinyin Liu presents at SD Robotics Meetup on November 8th, 2016. Deep learning has made great success in image understanding, speech, text recognition and natural language processing. Deep Learning also has tremendous potential to tackle the challenges in robotic vision, and sensorimotor learning in a robotic learning environment. In this talk, we will talk about how current and future deep learning technologies can be applied for robotic applications.
Deep learning on mobile - 2019 Practitioner's GuideAnirudh Koul
The 2019 Guide to Deep Learning on Mobile, from Inference to Training on iOS and Android smartphones. Featuring CoreML, Tensorflow Lite, MLKit, Fritz, AutoML Approaches (Hardware Aware Neural Architecture Search) to make models more efficient, and lots of videos. Presented by Anirudh Koul, Siddha Ganju and Meher Anand Kasam. More details at PracticalDL.ai in the upcoming O'Reilly Book 'Practical Deep Learning on Cloud & Mobile'
Separating Hype from Reality in Deep Learning with Sameer FarooquiDatabricks
Deep Learning is all the rage these days, but where does the reality of what Deep Learning can do end and the media hype begin? In this talk, I will dispel common myths about Deep Learning that are not necessarily true and help you decide whether you should practically use Deep Learning in your software stack.
I’ll begin with a technical overview of common neural network architectures like CNNs, RNNs, GANs and their common use cases like computer vision, language understanding or unsupervised machine learning. Then I’ll separate the hype from reality around questions like:
• When should you prefer traditional ML systems like scikit learn or Spark.ML instead of Deep Learning?
• Do you no longer need to do careful feature extraction and standardization if using Deep Learning?
• Do you really need terabytes of data when training neural networks or can you ‘steal’ pre-trained lower layers from public models by using transfer learning?
• How do you decide which activation function (like ReLU, leaky ReLU, ELU, etc) or optimizer (like Momentum, AdaGrad, RMSProp, Adam, etc) to use in your neural network?
• Should you randomly initialize the weights in your network or use more advanced strategies like Xavier or He initialization?
• How easy is it to overfit/overtrain a neural network and what are the common techniques to ovoid overfitting (like l1/l2 regularization, dropout and early stopping)?
Transfer learning (TL) is a research problem in machine learning (ML) that focuses on applying knowledge gained while solving one task to a related task
A practical talk by Anirudh Koul aimed at how to run Deep Neural Networks to run on memory and energy constrained devices like smartphones. Highlights some frameworks and best practices.
Details of Lazy Deep Learning for Images Recognition in ZZ Photo appPAY2 YOU
В докладе представлена тема глубокого обучения (Deep Learning) для распознавания изображений. Рассматриваются практические аспекты обучения глубоких сверточных сетей на GPU, обсуждается личный опыт портирования обученных нейросетей в приложение на основе библиотеки OpenCV, проводится сравнение полученного детектора домашних животных на основе подхода Lazy Deep Learning с детектором Виолы-Джонса.
Докладчики: Артем Чернодуб – эксперт в области искусственных нейронных сетей и систем искусственного интеллекта. В 2007 году закончил Московский физико-технический институт. Руководит направлением Computer Vision в компании ZZ Wolf, а также по совместительству работает научным сотрудником в Институте проблем математических машин и систем НАНУ.
Юрий Пащенко – специалист в области систем машинного зрения и машинного обучения, магистр НТУУ «Киевский Политехнический Институт», факультет прикладной математики (2014). Работает в компании ZZ Wolf на должности R&D Engineer.
Presentation of few recent papers on Deep Learning ... in particular Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding, Song Han, Huizi Mao, William J. Dally International Conference on Learning Representations ICLR2016
Deep Learning with Python: Getting started and getting from ideas to insights in minutes.
PyData Seattle 2015
Alex Korbonits (@korbonits)
This presentation was given July 25, 2015 at the PyData Seattle conference hosted by PyData and NumFocus.
This presentation focuses on Deep Learning (DL) concepts, such as neural networks, backprop, activation functions, and Convolutional Neural Networks. You'll also learn how to incorporate Deep Learning in Android applications. Basic knowledge of matrices is helpful for this session, which is targeted primarily to beginners.
Synthetic dialogue generation with Deep LearningS N
A walkthrough of a Deep Learning based technique which would generate TV scripts using Recurrent Neural Network. The model will generate a completely new TV script for a scene, after being training from a dataset. One will learn the concepts around RNN, NLP and various deep learning techniques.
Technologies to be used:
Python 3, Jupyter, TensorFlow
Source code: https://github.com/syednasar/talks/tree/master/synthetic-dialog
Deep learning is making news across the country as one of the most promising techniques in machine learning research. However, these methods are complex to implement, finicky to tune, and state-of-the-art accuracy is only achieved by a few experts in the field. In this session, we give a beginner-friendly explanation of deep learning using neural networks—what it is, what it does, and how; and introduce the concept of deep features, which allows you to obtain great performance with reduced running times and data set sizes. We then show how these methods can easily be deployed on GPU instances (G2) on Amazon EC2.
We trained a large, deep convolutional neural network to classify the 1.2 million
high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 dif-
ferent classes. On the test data, we achieved top-1 and top-5 error rates of 37.5%
and 17.0% which is considerably better than the previous state-of-the-art. The
neural network, which has 60 million parameters and 650,000 neurons, consists
of five convolutional layers, some of which are followed by max-pooling layers,
and three fully-connected layers with a final 1000-way softmax. To make train-
ing faster, we used non-saturating neurons and a very efficient GPU implemen-
tation of the convolution operation. To reduce overfitting in the fully-connected
layers we employed a recently-developed regularization method called “dropout”
that proved to be very effective. We also entered a variant of this model in the
ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%,
compared to 26.2% achieved by the second-best entry.
Yinyin Liu presents at SD Robotics Meetup on November 8th, 2016. Deep learning has made great success in image understanding, speech, text recognition and natural language processing. Deep Learning also has tremendous potential to tackle the challenges in robotic vision, and sensorimotor learning in a robotic learning environment. In this talk, we will talk about how current and future deep learning technologies can be applied for robotic applications.
Deep learning on mobile - 2019 Practitioner's GuideAnirudh Koul
The 2019 Guide to Deep Learning on Mobile, from Inference to Training on iOS and Android smartphones. Featuring CoreML, Tensorflow Lite, MLKit, Fritz, AutoML Approaches (Hardware Aware Neural Architecture Search) to make models more efficient, and lots of videos. Presented by Anirudh Koul, Siddha Ganju and Meher Anand Kasam. More details at PracticalDL.ai in the upcoming O'Reilly Book 'Practical Deep Learning on Cloud & Mobile'
Separating Hype from Reality in Deep Learning with Sameer FarooquiDatabricks
Deep Learning is all the rage these days, but where does the reality of what Deep Learning can do end and the media hype begin? In this talk, I will dispel common myths about Deep Learning that are not necessarily true and help you decide whether you should practically use Deep Learning in your software stack.
I’ll begin with a technical overview of common neural network architectures like CNNs, RNNs, GANs and their common use cases like computer vision, language understanding or unsupervised machine learning. Then I’ll separate the hype from reality around questions like:
• When should you prefer traditional ML systems like scikit learn or Spark.ML instead of Deep Learning?
• Do you no longer need to do careful feature extraction and standardization if using Deep Learning?
• Do you really need terabytes of data when training neural networks or can you ‘steal’ pre-trained lower layers from public models by using transfer learning?
• How do you decide which activation function (like ReLU, leaky ReLU, ELU, etc) or optimizer (like Momentum, AdaGrad, RMSProp, Adam, etc) to use in your neural network?
• Should you randomly initialize the weights in your network or use more advanced strategies like Xavier or He initialization?
• How easy is it to overfit/overtrain a neural network and what are the common techniques to ovoid overfitting (like l1/l2 regularization, dropout and early stopping)?
Transfer learning (TL) is a research problem in machine learning (ML) that focuses on applying knowledge gained while solving one task to a related task
NIT Silchar ML Hackathon 2019 Session on Computer Vision with Deep Learning.
Targeted Audience: Pre-requisite: Basic knowledge on Machine Learning and Deep Learning
Getting your hands dirty with deep learning in javaDave Snowdon
Slides to accompany the introduction to deep learning workshop and exercises on github: https://github.com/davesnowdon/devoxxuk2018-dl-workshop
Workshop held at Devoxx UK 2018, London
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...StampedeCon
In this session, we’ll discuss approaches for applying convolutional neural networks to novel computer vision problems, even without having millions of images of your own. Pretrained models and generic image data sets from Google, Kaggle, universities, and other places can be leveraged and adapted to solve industry and business specific problems. We’ll discuss the approaches of transfer learning and fine tuning to help anyone get started on using deep learning to get cutting edge results on their computer vision problems.
We trained a large, deep convolutional neural network to classify the 1.2 million
high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different
classes. On the test data, we achieved top-1 and top-5 error rates of 37.5%
and 17.0% which is considerably better than the previous state-of-the-art. The
neural network, which has 60 million parameters and 650,000 neurons, consists
of five convolutional layers, some of which are followed by max-pooling layers,
and three fully-connected layers with a final 1000-way softmax. To make training
faster, we used non-saturating neurons and a very efficient GPU implementation
of the convolution operation. To reduce overfitting in the fully-connected
layers we employed a recently-developed regularization method called “dropout”
that proved to be very effective. We also entered a variant of this model in the
ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%,
compared to 26.2% achieved by the second-best entry
Part 2 of the Deep Learning Fundamentals Series, this session discusses Tuning Training (including hyperparameters, overfitting/underfitting), Training Algorithms (including different learning rates, backpropagation), Optimization (including stochastic gradient descent, momentum, Nesterov Accelerated Gradient, RMSprop, Adaptive algorithms - Adam, Adadelta, etc.), and a primer on Convolutional Neural Networks. The demos included in these slides are running on Keras with TensorFlow backend on Databricks.
Similar to Deep Learning for Computer Vision - PyconDE 2017 (20)
Techniques to optimize the pagerank algorithm usually fall in two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before pagerank computation to improve performance. Final ranks of chain nodes can be easily calculated. This could reduce both the iteration time, and the number of iterations. If a graph has no dangling nodes, pagerank of each strongly connected component can be computed in topological order. This could help reduce the iteration time, no. of iterations, and also enable multi-iteration concurrency in pagerank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
Adjusting OpenMP PageRank : SHORT REPORT / NOTESSubhajit Sahu
For massive graphs that fit in RAM, but not in GPU memory, it is possible to take
advantage of a shared memory system with multiple CPUs, each with multiple cores, to
accelerate pagerank computation. If the NUMA architecture of the system is properly taken
into account with good vertex partitioning, the speedup can be significant. To take steps in
this direction, experiments are conducted to implement pagerank in OpenMP using two
different approaches, uniform and hybrid. The uniform approach runs all primitives required
for pagerank in OpenMP mode (with multiple threads). On the other hand, the hybrid
approach runs certain primitives in sequential mode (i.e., sumAt, multiply).
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeWalaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
Analysis insight about a Flyball dog competition team's performanceroli9797
Insight of my analysis about a Flyball dog competition team's last year performance. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs.This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
21. Deep learning is MagicDeep learning is MagicDeep learning is EASY!
22. 1. What is a neural network?
2. What is a convolutional neural network?
3. How to use a convolutional neural
network
4. More advanced Methods
5. Case studies & applications 22
24. What is a neuron?
25
• 3 inputs [x1,x2,x3]
• 3 weights [w1,w2,w3]
• 1 bias term b
• Element-wise multiply and sum: inputs & weights
• Add bias term
• Apply activation function f
• Output of neuron is just a number (like the in
25. What is an Activation Function?
26
Sigmoid Tanh ReLU
Nonlinearities … “squashing functions” … transform neuron’s
output
NB: sigmoid output in [0,1]
27. What is a
28
Inputs outputs
hidden
layer 1
hidden
layer 2
hidden
layer 3
Note: Outputs of one layer are inputs into the next layer
This (non-convolutional) architecture is called a “multi-layered
perceptron”
Neural Network?(Deep)
36. How does a neural network learn?
37
• We need labelled examples “training data”
• We initialize network weights randomly and initially get random predictions
• For each labelled training data point, we calculate the error (cross-entropy
error for classification or mean-squared error for regression) between the
network’s predictions and the ground-truth labels
• Use ‘backpropagation’ (chain rule), to update the network parameters
(weights + convolutional filters ) in the opposite direction to the error
37. “How much
error increases
when we increase
this weight”
How does a neural network learn?
38
New
weight =
Old
weight
Learning
rate-
Gradient of
Error with
respect to Weight
( )x
40. What is a Neural Network?
For much more detail, see:
1. Michael Nielson’s Neural Networks &
Deep Learning free online book
http://neuralnetworksanddeeplearning.com/chap1.html
2. Anrej Karpathy’s CS231n Notes
http://neuralnetworksanddeeplearning.com/chap1.html
41
42. What is a Convolutional Neural
Network?
43
“like a ordinary neural network but with
special types of layers that work well
on images”
(math works on numbers)
• Pixel = 3 colour channels (R, G, B)
• Pixel intensity ∈[0,255]
• Image has width w and height h
• Therefore image is w x h x 3 numbers
43. First of all why do we need special
layers?
44
1. Locality
– objects tend to have a local spatial support
2. Translation Invariance
– objects can appear anywhere in the image
3. Parameter counts explode in fully connected nets
– recall 3 layer 28x28 pixel MNIST net had 13’002 parameters…
if we instead had 500x500 pixel images, we’d need 4’000’458
4. We can get better results ;)
– Keras MLP (98.4% accuracy = 1.6% error) vs CNN (99.25% = 0.75
error)
So more than half the error rate
https://www.youtube.com/watch?v=t_TY_5bG9J8
44. 45
This is VGGNet – don’t panic, we’ll break it down piece
by piece
Example Architecture
45. 46
This is VGGNet – don’t panic, we’ll break it down piece
by piece
Example Architecture
48. New Layer Type: Convolutional Layer
49
• 2-d weighted average when multiply kernel over pixel patches
• We slide the kernel over all pixels of the image (handle
borders)
• Kernel starts off with “random” values and network updates
(learns) the kernel values (using backpropagation) to try
minimize loss
• Kernels shared across the whole image (parameter
49. Many Kernels = Many “Activation Maps” = Volume
50http://cs231n.github.io/convolutional-networks/
57. New Layer Type: Max Pooling
• Reduces dimensionality from one layer to next
• …by replacing NxN sub-area with max value
• Makes the network less sensitive to local perturbations
• Makes network “look” at larger areas of the image at a time
• e.g. Instead of identifying fur, identify cat
• Reduces overfitting since losing information helps the network
generalize
59http://cs231n.github.io/convolutional-networks/
59. Stack Conv + Pooling Layers and Go Deep
61
Convolution + max pooling + fully connected +
softmax
60. 62
Stack Conv + Pooling Layers and Go Deep
Convolution + max pooling + fully connected +
softmax
61. 63
Stack These Layers and Go Deep
Convolution + max pooling + fully connected +
softmax
62. 64
Stack These Layers and Go Deep
Convolution + max pooling + fully connected +
softmax
63. 65
Flatten the Final “Bottleneck” layer
Convolution + max pooling + fully connected +
softmax
Flatten the final 7 x 7 x 512 max pooling layer
Add fully-connected dense layer on top
64. 66
Bringing it all together
Convolution + max pooling + fully connected +
softmax
65. Softmax
Convert scores ∈ ℝ to probabilities ∈ [0,1]
Final output prediction = highest probability class
67
66. Bringing it all together
68
Convolution + max pooling + fully connected +
softmax
73. Using a Pre-Trained ImageNet-Winning
CNN
75
• We’ve been looking at “VGGNet”
• Oxford Visual Geometry Group (VGG)
• ImageNet 2014 Runner-up
• Network is 16 layers (deep!)
• Trained on 4 GPUs for 2-3 weeks (data split across
GPUs)
• Easy to fine-tune
80. Fine-tuning A CNN To Solve A New Problem
• Cut off last layer of pre-trained Imagenet winning
CNN
• Keep learned feature detectors (convolutions)
• …BUT replace the dense and softmax layers with
new ones!
• Can learn to predict new (completely different)
classes
• Fine-tuning is re-training the new final layer(s)
82
84. Fine-tuning A CNN To Solve A New Problem
• Load pre-trained VGG network
• Remove final dense & softmax layers that predict 1000 ImageNet
classes
• Replace with new dense & softmax layer to predict 8 categories
86
96.3% accuracy in under 2 minutes for
classifying products into categories
(WITH ONLY 3467 TRAINING IMAGES!!1!)
Fine-tuning is awesome!
Insert obligatory brain analogy
87. Visual Similarity
89
• Chop off VGG dense layers and keep bottleneck
features
• Add single dense layer with 256 activations
• “Predict” to get activation for each image
• Compute nearest neighbours in the space of these
https://memeburn.com/2017/06/spree-image-search/
96. Long et al. “Fully Convolutional Networks for Semantic Segmentation” CVPR 2015
Noh et al. Learning Deconvolution Network for Semantic Segmentation. IEEE on Computer Vision 2016
Semantic Segmentation
107. king + woman – man ≈ queen
111
Frome et al. (2013) ‘DeViSE: A Deep Visual-Semantic Embedding Model’,
Advances in Neural Information Processing Systems, pp. 2121–2129
CNN + Word2Vec = AWESOME
108. DeViSE: A Deep Visual-Semantic Embedding Model
XXX
112
Before:
Encode labels as 1-hot vector
109. DeViSE: A Deep Visual-Semantic Embedding Model
XXX
113
After:
Encode labels as word2vec vectors
(FROM A SEPARATE MODEL)
Can look these up for all the nouns in ImageNet
300-d
word2vec
vectors
110. DeViSE: A Deep Visual-Semantic Embedding Model
wv(fish)+ wv(net)
114
https://www.youtube.com/watch?v=uv0gmrXSXVg
2
wv* =
…get nearest neighbours to wv*
111.
112. Multi-Modal Modeling – Mix & Match!
Structured
data e.g.
#beds,
#garages, m2,
etc.
Features
extracted from
description e.g.
with an LSTM
Features
extracted from
photos using
convolutional
network
Price
114. Estimating Accident Repair Cost from
Photos
TODO
118
Prototype for large
S.A. insurer
Detect car make &
model from license
sticker on
windshield
Predict repair cost
using learned
model
115. Image & Video Moderation
TODO
119
Large international gay dating app with tens of
millions of users
uploading hundreds-of-thousands of photos per day
119. “Using the final model that has been trained on survey data, we can
estimate per capita consumption expenditure for any location
where we have daytime satellite imagery ”
http://sustain.stanford.edu/predicting-poverty/
“…features visible in daytime
satellite imagery, such as roofing
material and distance to urban
areas, vary roughly linearly with
expenditure”