NEURAL NETWORKS AND DEEP
LEARNING
ASIM JALIS
GALVANIZE
INTRO
ASIM JALIS
Galvanize/Zipfian, Data
Engineering
Cloudera, Microsoft,
Salesforce
MS in Computer Science
from University of
Virginia
GALVANIZE PROGRAMS
Program                      Duration
Data Science Immersive       12 weeks
Data Engineering Immersive   12 weeks
Web Developer Immersive      6 months
Galvanize U                  1 year
TALK OVERVIEW
WHAT IS THIS TALK ABOUT?
Using Neural Networks
and Deep Learning
To recognize images
By the end of the class
you will be able to
create your own deep
learning systems
HOW MANY PEOPLE HERE HAVE
USED NEURAL NETWORKS?
HOW MANY PEOPLE HERE HAVE
USED MACHINE LEARNING?
HOW MANY PEOPLE HERE HAVE
USED PYTHON?
DEEP LEARNING
WHAT IS MACHINE LEARNING?
Self-driving cars
Voice recognition
Facial recognition
HISTORY OF DEEP LEARNING
HISTORY OF MACHINE LEARNING
Input    Features  Algorithm  Output
Machine  Human     Human      Machine
Machine  Human     Machine    Machine
Machine  Machine   Machine    Machine
FEATURE EXTRACTION
Traditionally, data scientists had to define features by hand
Deep learning systems are able to extract features
themselves
DEEP LEARNING MILESTONES
Years Theme
1980s Backpropagation invented, enabling multi-layer neural networks
2000s SVMs, Random Forests and other classifiers overtook NNs
2010s Deep Learning reignited interest in NNs
IMAGENET
AlexNet, submitted to the ImageNet ILSVRC challenge in
2012, is partly responsible for the renaissance.
Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton used
Deep Learning techniques.
They combined this with GPUs and some other techniques.
The result was a neural network that could classify images
of cats and dogs.
It had an error rate of 16%, compared to 26% for the runner-up.
Ilya Sutskever, Alex Krizhevsky, Geoffrey Hinton
INDEED.COM/SALARY
MACHINE LEARNING
MACHINE LEARNING AND DEEP
LEARNING
Deep Learning fits inside Machine Learning
Deep Learning is a Machine Learning technique
They share techniques for evaluating and optimizing models
WHAT IS MACHINE LEARNING?
Inputs: vectors or points in a high-dimensional space
Outputs: Either binary vectors or continuous vectors
Machine Learning finds the relationship between them
Uses statistical techniques
SUPERVISED VS UNSUPERVISED
Supervised: Data needs to be labeled
Unsupervised: Data does not need to be labeled
TECHNIQUES
Classification
Regression
Clustering
Recommendations
Anomaly detection
CLASSIFICATION EXAMPLE:
EMAIL SPAM DETECTION
CLASSIFICATION EXAMPLE:
EMAIL SPAM DETECTION
Start with large collection of emails, labeled spam/not-spam
Convert email text into vectors of 0s and 1s: 1 if a word
occurs, 0 if it does not
These are called inputs or features
Split data set into training set (70%) and test set (30%)
Use algorithm like Random Forest to build model
Evaluate model by running it on test set and capturing
success rate
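A minimal sketch of this workflow using scikit-learn; the few placeholder emails and labels here are hypothetical stand-ins for a real labeled collection:

    # Spam-detection workflow sketch (placeholder data, not a real corpus).
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier

    emails = ["win a free prize now", "meeting at 3pm tomorrow",
              "claim your free reward", "quarterly report attached"]
    labels = [1, 0, 1, 0]   # 1 = spam, 0 = not spam

    # Convert email text into 0/1 word-occurrence vectors (the features)
    vectorizer = CountVectorizer(binary=True)
    X = vectorizer.fit_transform(emails)

    # Split into training set (70%) and test set (30%)
    X_train, X_test, y_train, y_test = train_test_split(
        X, labels, test_size=0.3, random_state=42)

    # Build a Random Forest model, then evaluate it on the test set
    model = RandomForestClassifier()
    model.fit(X_train, y_train)
    print("success rate:", model.score(X_test, y_test))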
CLASSIFICATION ALGORITHMS
Neural Networks
Random Forest
Support Vector Machines (SVM)
Decision Trees
Logistic Regression
Naive Bayes
CHOOSING AN ALGORITHM
Evaluate different models on data
Look at the relative success rates
Use rules of thumb: some algorithms work better on some
kinds of data
CLASSIFICATION EXAMPLES
Is this tumor benign or cancerous?
Is this lead profitable or not?
Who will win the presidential elections?
CLASSIFICATION: POP QUIZ
Is classification supervised or unsupervised learning?
Supervised because you have to label the data.
CLUSTERING EXAMPLE: LOCATE
CELL PHONE TOWERS
Start with GPS
coordinates of all cell
phone users
Represent data as
vectors
Locate towers in biggest
clusters
CLUSTERING EXAMPLE: T-SHIRTS
What size should a t-shirt be?
Everyone’s real t-shirt
size is different
Lay out all sizes and
cluster
Target large clusters
with XS, S, M, L, XL
CLUSTERING: POP QUIZ
Is clustering supervised or unsupervised?
Unsupervised because no labeling is required
RECOMMENDATIONS EXAMPLE:
AMAZON
Model looks at user
ratings of books
Viewing a book triggers an implicit rating
Recommends new books to the user
RECOMMENDATION: POP QUIZ
Are recommendation systems supervised or unsupervised?
Unsupervised
REGRESSION
Like classification
Output is continuous instead of one of k choices
REGRESSION EXAMPLES
How many units of product will sell next month?
What will a student score on the SAT?
What is the market price of this house?
How long before this engine needs repair?
REGRESSION EXAMPLE:
AIRCRAFT PART FAILURE
Cessna collects data
from airplane sensors
Predict when part needs
to be replaced
Ship part to customer’s
service airport
REGRESSION: QUIZ
Is regression supervised or unsupervised?
Supervised
ANOMALY DETECTION EXAMPLE:
CREDIT CARD FRAUD
Train model on good
transactions
Anomalous activity
indicates fraud
Can pass transaction
down to human for
investigation
ANOMALY DETECTION EXAMPLE:
NETWORK INTRUSION
Train model on network
login activity
Anomalous activity
indicates threat
Can initiate alerts and
lockdown procedures
ANOMALY DETECTION: QUIZ
Is anomaly detection supervised or unsupervised?
Unsupervised because we only train on normal data
FEATURE EXTRACTION
Converting data to feature vectors
Natural Language Processing
Principal Component Analysis
Auto-Encoders
FEATURE EXTRACTION: QUIZ
Is feature extraction supervised or unsupervised?
Unsupervised
MACHINE LEARNING WORKFLOW
DEEP LEARNING USED FOR
Feature Extraction
Classification
Regression
HISTORY OF MACHINE LEARNING
Input    Features  Algorithm  Output
Machine  Human     Human      Machine
Machine  Human     Machine    Machine
Machine  Machine   Machine    Machine
DEEP LEARNING FRAMEWORKS
DEEP LEARNING FRAMEWORKS
TensorFlow: NN library from Google
Theano: Low-level GPU-enabled tensor library
Torch7: NN library, uses Lua for binding, used by Facebook
and Google
Caffe: NN library from the Berkeley Vision and Learning Center (BVLC)
Nervana: Fast GPU-based machines optimized for deep
learning
DEEP LEARNING FRAMEWORKS
Keras, Lasagne, Blocks: NN libraries that make Theano
easier to use
CUDA: Programming model for using GPUs in general-purpose programming
cuDNN: NN library by Nvidia based on CUDA, can be used
with Torch7, Caffe
Chainer: NN library that uses CUDA
DEEP LEARNING PROGRAMMING
LANGUAGES
All the frameworks support Python
Except Torch7, which uses Lua as its binding language
TENSORFLOW
TensorFlow originally
developed by Google
Brain Team
Allows using GPUs for
deep learning
algorithms
Single processor version
released in 2015
Multiple processor
version released in
March 2016
KERAS
Supports Theano and TensorFlow as back-ends
Provides deep learning
API on top of TensorFlow
TensorFlow provides
low-level matrix
operations
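For flavor, a minimal Keras model sketch; the layer sizes and the 100-dimensional input are arbitrary illustrative choices:

    # Minimal Keras sketch: a two-layer network for 10-class classification.
    from keras.models import Sequential
    from keras.layers import Dense

    model = Sequential()
    model.add(Dense(64, activation='relu', input_dim=100))  # hidden layer
    model.add(Dense(10, activation='softmax'))              # output layer
    model.compile(optimizer='sgd', loss='categorical_crossentropy',
                  metrics=['accuracy'])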
TENSORFLOW: GEOFFREY
HINTON, JEFF DEAN
KERAS: FRANCOIS CHOLLET
NEURAL NETWORKS
WHAT IS A NEURON?
Receives signals at its synapses
When triggered, sends a signal along its axon
MATHEMATICAL NEURON
Mathematical abstraction, inspired by biological neuron
Either on or off, based on the sum of its inputs
MATHEMATICAL FUNCTION
Neuron is a mathematical function
Adds up (weighted) inputs and applies sigmoid (or other
function)
This determines if it fires or not
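A sketch of such a neuron in numpy (the input and weight values are arbitrary):

    # A single mathematical neuron: weighted sum of inputs plus bias,
    # passed through a sigmoid to decide how strongly it "fires".
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def neuron(inputs, weights, bias):
        return sigmoid(np.dot(weights, inputs) + bias)

    print(neuron(np.array([0.5, 0.9]), np.array([0.4, -0.2]), 0.1))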
WHAT ARE NEURAL NETWORKS?
Biologically inspired machine learning algorithm
Mathematical neurons arranged in layers
Accumulate signals from the previous layer
Fire when signal reaches threshold
NEURAL NETWORKS
NEURON INCOMING
Each neuron receives
signals from neurons in
previous layer
Signal affected by
weight
Some are more
important than others
Bias is the base signal
that the neuron receives
NEURON OUTGOING
Each neuron sends its
signal to the neurons in
the next layer
Signals affected by
weight
LAYERED NETWORK
Each layer looks at features identified by previous layer
US ELECTIONS
ELECTIONS
Consider the elections
This is a gated system
A way to aggregate
different views
HIGHEST LEVEL: STATES
NEXT LEVEL: COUNTIES
ELECTIONS
Is this a Neural Network?
How many layers does it
have?
NEURON LAYERS
The nomination is the
last layer, layer N
States are layer N-1
Counties are layer N-2
Districts are layer N-3
Individuals are layer N-4
Individual brains have
even more layers
GRADIENT DESCENT
TRAINING: HOW DO WE
IMPROVE?
Calculate error from desired goal
Increase weights of neurons that voted right
Decrease weights of neurons that voted wrong
This will reduce error
GRADIENT DESCENT
This algorithm is called gradient descent
Think of error as function of weights
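A one-dimensional sketch: treat the error as a function of a single weight w, here a toy error E(w) = (w - 3)^2 with its minimum at w = 3:

    # Gradient descent on a toy error function E(w) = (w - 3)^2.
    # The derivative is dE/dw = 2 * (w - 3).
    w = 0.0
    learning_rate = 0.1
    for step in range(100):
        gradient = 2 * (w - 3)
        w -= learning_rate * gradient   # step opposite the gradient
    print(w)  # converges toward 3.0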
FEED FORWARD
Also called forward
propagation or forward
prop
Initialize inputs
Calculate activation of
each layer
Calculate activation of
output layer
BACK PROPAGATION
Use forward prop to
calculate the error
Error is function of all
network weights
Adjust weights using
gradient descent
Repeat with next record
Keep going over training
set until convergence
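A compact numpy sketch of this loop for a tiny 2-3-1 network; the XOR data, layer sizes, and learning rate are illustrative choices, not from the slides:

    # Forward prop + back prop for a tiny one-hidden-layer network.
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Toy training set: XOR, a classic non-linearly-separable problem
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = np.array([[0.], [1.], [1.], [0.]])

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)   # 2 inputs -> 3 hidden
    W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)   # 3 hidden -> 1 output
    lr = 1.0

    for epoch in range(5000):
        # Forward prop: calculate the activation of each layer
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        # Back prop: error is a function of all the network weights
        d_out = (out - y) * out * (1 - out)
        d_h = (d_out @ W2.T) * h * (1 - h)
        # Gradient descent: adjust every weight against its gradient
        W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

    print(out.round(2))  # approaches [[0], [1], [1], [0]] as training converges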
HOW DO YOU FIND THE MINIMUM
IN AN N-DIMENSIONAL SPACE?
Take a step in the steepest direction.
The steepest direction is the gradient: the vector of partial
derivatives of the error with respect to each weight.
PUTTING ALL THIS TOGETHER
Use forward prop to
activate
Use back prop to train
Then use forward prop
to test
TYPES OF NEURONS
SIGMOID
TANH
RELU
BENEFITS OF RELU
Popular
Accelerates convergence
by 6x (Krizhevsky et al.)
The operation is faster since
it is linear, not exponential
Can die by going to zero
Pro: Sparse activations
Con: Network can die
LEAKY RELU
Pro: Does not die
Con: Activations are not sparse
SOFTMAX
Final layer of network
used for classification
Turns output into
probability distribution
Normalizes output of
neurons to sum to 1
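The neuron types above, sketched in numpy:

    # The activation functions discussed above.
    import numpy as np

    def sigmoid(z):                      # squashes to (0, 1)
        return 1.0 / (1.0 + np.exp(-z))

    def tanh(z):                         # squashes to (-1, 1)
        return np.tanh(z)

    def relu(z):                         # linear above 0, zero (dead) below
        return np.maximum(0, z)

    def leaky_relu(z, alpha=0.01):       # small slope below 0, never dies
        return np.where(z > 0, z, alpha * z)

    def softmax(z):                      # normalizes outputs to sum to 1
        e = np.exp(z - np.max(z))        # subtract max for numerical stability
        return e / e.sum()

    print(softmax(np.array([2.0, 1.0, 0.1])))  # a probability distribution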
HYPERPARAMETER TUNING
PROBLEM: OIL EXPLORATION
Drilling holes is
expensive
We want to find the
biggest oilfield without
wasting money on duds
Where should we place
our next oil derrick?
PROBLEM: NEURAL NETWORKS
Testing
hyperparameters is
expensive
We have an N-dimensional grid of parameters
How can we quickly zero
in on the best
combination of
hyperparameters?
HYPERPARAMETER EXAMPLE
How many layers should
we have?
How many neurons
should we have in
hidden layers?
Should we use Sigmoid,
Tanh, or ReLU?
How should we initialize
the weights?
ALGORITHMS
Grid
Random
Bayesian Optimization
GRID
Systematically search
entire grid
Remember best found
so far
RANDOM
Randomly search the grid
Remember the best found so far
Bergstra and Bengio’s result and Alice Zheng’s
explanation (see References)
60 random samples get you within the top 5% of grid search
with 95% probability
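A sketch of random search; the grid values are arbitrary, and build_and_score is a hypothetical placeholder for training and evaluating one model:

    # Random hyperparameter search sketch.
    import random

    grid = {
        'layers':     [1, 2, 3, 4],
        'neurons':    [32, 64, 128, 256],
        'activation': ['sigmoid', 'tanh', 'relu'],
    }

    def build_and_score(params):      # placeholder objective function
        return random.random()

    best_score, best_params = -1.0, None
    for _ in range(60):               # 60 samples per Bergstra and Bengio
        params = {name: random.choice(values) for name, values in grid.items()}
        score = build_and_score(params)
        if score > best_score:        # remember the best found so far
            best_score, best_params = score, params

    print(best_params, best_score)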
BAYESIAN OPTIMIZATION
Balance between
explore and exploit
Exploit: test spots within
explored perimeter
Explore: test new spots
in random locations
Balance the trade-off
SIGOPT
YC-backed SF startup
Founded by Scott Clark
Raised $2M
Sells cloud-based
proprietary variant of
Bayesian Optimization
BAYESIAN OPTIMIZATION PRIMER
Bayesian Optimization Primer by Ian Dewancker, Michael
McCourt, Scott Clark
See References
OPEN SOURCE VARIANTS
Open source alternatives:
Spearmint
Hyperopt
SMAC
MOE
PRODUCTION
DEPLOYING
Phases: training,
deployment
Training phase runs on
back-end servers
Optimize hyperparameters
on the back-end
Deploy model to front-end
servers, browsers, devices
Front-end only uses
forward prop and is fast
SERIALIZING/DESERIALIZING
MODEL
Back-end: Serialize model + weights
Front-end: Deserialize model + weights
HDF5
Keras serializes model architecture to JSON
Keras serializes weights to HDF5
HDF5 is a serialization format for hierarchical data
APIs for C++, Python, Java, etc.
https://www.hdfgroup.org
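Roughly, the round trip in Keras; the tiny untrained model and file names here are illustrative only:

    # Serialize on the back-end: JSON for architecture, HDF5 for weights.
    from keras.models import Sequential, model_from_json
    from keras.layers import Dense

    model = Sequential()  # stand-in for an already-trained model
    model.add(Dense(10, activation='softmax', input_dim=100))
    with open('model.json', 'w') as f:
        f.write(model.to_json())          # architecture -> JSON
    model.save_weights('weights.h5')      # weights -> HDF5

    # Deserialize on the front-end: forward prop only from here.
    with open('model.json') as f:
        restored = model_from_json(f.read())
    restored.load_weights('weights.h5')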
DEPLOYMENT EXAMPLE: CANCER
DETECTION
Rhobota.com’s cancer-detecting iPhone app
Developed by Bryan Shaw after his son’s illness
Model built on back-end,
deployed on iPhone
iPhone detects retinal
cancer
DEEP LEARNING
WHAT IS DEEP LEARNING?
Deep Learning is a learning method that can train systems
with more than 2 or 3 non-linear hidden layers.
WHAT IS DEEP LEARNING?
Machine learning techniques which enable unsupervised
feature learning and pattern analysis/classification.
The essence of deep learning is to compute
representations of the data.
Higher-level features are defined from lower-level ones.
HOW IS DEEP LEARNING
DIFFERENT FROM REGULAR
NEURAL NETWORKS?
Training neural networks requires applying gradient
descent on millions of dimensions.
This is intractable for large networks.
Deep learning places constraints on neural networks.
This allows them to be solvable iteratively.
The constraints are generic.
AUTO-ENCODERS
WHAT ARE AUTO-ENCODERS?
An auto-encoder is a learning algorithm
It applies backpropagation and sets the target values to
be equal to its inputs
In other words, it trains itself to do the identity
transformation
WHY DOES IT DO THIS?
Auto-encoder places constraints on itself
E.g. it restricts the number of hidden neurons
This allows it to find a good representation of the data
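A minimal Keras sketch of such a constrained auto-encoder; the 784-dimensional input (e.g. 28x28 pixels) and 32-neuron bottleneck are illustrative choices:

    # Auto-encoder sketch: the small hidden layer is the constraint
    # that forces a compressed representation of the input.
    from keras.models import Sequential
    from keras.layers import Dense

    autoencoder = Sequential()
    autoencoder.add(Dense(32, activation='relu', input_dim=784))   # encoder
    autoencoder.add(Dense(784, activation='sigmoid'))              # decoder
    autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

    # Note the targets equal the inputs: the identity transformation.
    # autoencoder.fit(X, X, epochs=10)   # X: rows of 784 values in [0, 1]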
IS THE AUTO-ENCODER
SUPERVISED OR UNSUPERVISED?
It is unsupervised.
The data is unlabeled.
WHAT ARE CONVOLUTIONAL
NEURAL NETWORKS?
Feedforward neural networks
Connection pattern inspired by visual cortex
CONVOLUTIONAL NEURAL
NETWORKS
CNNS
The convolutional layer’s parameters are a set of
learnable filters
Every filter is small along width and height
During the forward pass, each filter slides across the width
and height of the input, producing a 2-dimensional
activation map
As we slide across the input we compute the dot product
between the filter and the input
CNNS
Intuitively, the network learns filters that activate when
they see a specific type of feature anywhere in the input
In this way it creates translation invariance
CONVNET EXAMPLE
Zero-Padding: the boundaries are padded with 0s
Stride: how much the filter moves in the convolution
Parameter sharing: all filters share the same parameters
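A naive numpy sketch of a single filter's convolution, showing zero-padding and stride; the 5x5 image and all-ones filter are arbitrary:

    # 2-D convolution: slide the filter across the input, computing
    # the dot product at each position to build a 2-D activation map.
    import numpy as np

    def conv2d(image, kernel, stride=1, pad=1):
        image = np.pad(image, pad)                     # zero-padding
        kh, kw = kernel.shape
        oh = (image.shape[0] - kh) // stride + 1
        ow = (image.shape[1] - kw) // stride + 1
        out = np.zeros((oh, ow))                       # activation map
        for i in range(oh):
            for j in range(ow):
                patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
                out[i, j] = np.sum(patch * kernel)     # dot product
        return out

    image = np.arange(25.0).reshape(5, 5)
    kernel = np.ones((3, 3))                           # one shared filter
    print(conv2d(image, kernel, stride=2, pad=1).shape)  # (3, 3)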
CONVNET EXAMPLE
From http://cs231n.github.io/convolutional-networks/
WHAT IS A POOLING LAYER?
The pooling layer reduces the resolution of the image
further
It tiles the output area with a 2x2 mask and takes the
maximum activation value in each tile
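A numpy sketch of that 2x2 max-pooling operation on an arbitrary 4x4 input:

    # Max pooling: keep the maximum activation in each 2x2 tile.
    import numpy as np

    def max_pool_2x2(x):
        h, w = x.shape[0] // 2, x.shape[1] // 2
        return x[:h*2, :w*2].reshape(h, 2, w, 2).max(axis=(1, 3))

    a = np.array([[1, 3, 2, 0],
                  [4, 2, 1, 5],
                  [0, 1, 8, 2],
                  [3, 2, 4, 6]])
    print(max_pool_2x2(a))   # [[4 5] [3 8]]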
REVIEW
keras/examples/mnist_cnn.py
Recognizes hand-written digits
By combining different layers
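A condensed sketch in the spirit of that example, using current Keras layer names; filter counts and layer sizes are simplified from the original script:

    # conv -> pool -> dense -> softmax over the 10 digit classes.
    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu',
                     input_shape=(28, 28, 1)))       # learnable filters
    model.add(MaxPooling2D(pool_size=(2, 2)))        # reduce resolution
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(10, activation='softmax'))       # digit probabilities
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])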
RECURRENT NEURAL NETWORKS
RNNS
RNNs capture patterns
in time series data
Constrained by shared
weights across neurons
Each neuron observes a
different time step
LSTMS
Long Short Term Memory networks
RNNs cannot handle long time lags between events
LSTMs can pick up patterns separated by big lags
Used for speech recognition
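A minimal Keras LSTM sketch for sequence classification; the sequence length (50 steps) and feature count (16) are arbitrary assumptions:

    # LSTM over a time series, ending in a single binary label.
    from keras.models import Sequential
    from keras.layers import LSTM, Dense

    model = Sequential()
    model.add(LSTM(64, input_shape=(50, 16)))   # weights shared across time
    model.add(Dense(1, activation='sigmoid'))   # one label per sequence
    model.compile(optimizer='adam', loss='binary_crossentropy')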
RNN EFFECTIVENESS
Andrej Karpathy uses
LSTMs to generate text
Generates Shakespeare,
Linux Kernel code,
mathematical proofs.
See
http://karpathy.github.io/
RNN INTERNALS
LSTM INTERNALS
CONCLUSION
REFERENCES
"Bayesian Optimization Primer" by Dewancker et al. (http://sigopt.com)
"Random Search for Hyper-Parameter Optimization" by Bergstra and Bengio (http://jmlr.org)
"Evaluating Machine Learning Models" by Alice Zheng (http://www.oreilly.com)
REFERENCES
"Dropout" by Hinton et al. (http://cs.utoronto.edu)
"Understanding LSTM Networks" by Chris Olah (http://colah.github.io)
"Multi-scale Deep Learning for Gesture Detection and Localization" by Neverova et al. (http://uoguelph.ca)
"The Unreasonable Effectiveness of Recurrent Neural Networks" by Karpathy (http://karpathy.github.io)
QUESTIONS
