Introduction to Deep Learning
UCSC Meetup (Santa Clara)
Monday 08/12/2019
Oswald Campesato
ocampesato@yahoo.com
High-Level List of Topics
Intro to AI/ML/DL/ANNs
Hidden layers/initialization/neurons per layer
Activation functions
Cost function/gradient descent/learning rate
Dropout rate
What is linear regression
What are CNNs
Topics We Won’t Cover
RNNs, LSTMs, and VAEs
Transformers et al
Reinforcement Learning (RL)
Deep Reinforcement Learning (DRL)
Natural Language Processing (NLP)
The Data/AI Landscape
The Official Start of AI (1956)
Use Cases for Deep Learning
Computer vision
Speech recognition
Image processing
Bioinformatics
Social network filtering
Drug design
Customer relationship management
Recommendation systems
Mobile advertising
Many others
What is a Deep Neural Network?
An input layer, multiple hidden layers, and an output layer
Nonlinear processing via activation functions
Layers perform transformations and feature extraction
Trained via gradient descent with backpropagation
Each layer receives the output of the previous layer
Results can be comparable or superior to human experts on some tasks
Types of Deep Learning
Supervised learning (labeled data: you know the answer)
Unsupervised learning (unlabeled data: you don't know the answer)
Semi-supervised learning (a mix of labeled and unlabeled data)
Reinforcement learning (such as games)
Types of algorithms:
Classifiers (detect images, spam, fraud, etc.)
Regression (predict stock prices, housing prices, etc.)
Clustering (unsupervised classifiers)
Deep Learning Architectures
Unsupervised Pre-trained Networks (UPNs)
Convolutional Neural Networks (CNNs)
Recurrent Neural Networks (RNNs), including LSTMs
The most popular architectures:
CNNs (and capsule networks) for images
LSTMs for NLP and audio
Neural Network: 3 Hidden Layers
NN: 2 Hidden Layers (Regression)
Titanic Dataset (portion)
Classification and Deep Learning
What is Linear Regression
One of the simplest models in ML
Fits a line (y = m*x + b) to 2D data
Finds the best line by minimizing the MSE (mean squared error)
m (slope): the value that minimizes the sum of squared errors
b (intercept) also has a closed-form solution
(a minimal NumPy sketch follows)
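A minimal NumPy sketch (not from the original slides) of the closed-form least-squares fit for m and b; the x and y arrays are made-up sample values:

import numpy as np

# made-up sample data: square feet vs. price (in $1000s), for illustration only
x = np.array([1000.0, 1500.0, 2000.0, 2500.0, 3000.0])
y = np.array([200.0, 280.0, 370.0, 450.0, 540.0])

# closed-form least-squares solution for y = m*x + b
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - m * x.mean()
print("slope m =", m, "intercept b =", b)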
Linear Regression in 2D: example
Linear Regression in 2D: example
Sample Cost Function #1 (MSE)
Linear Regression: example #1
One feature (independent variable):
X = number of square feet
Predicted value (dependent variable):
Y = cost of a house
A very "coarse-grained" model
We can devise a much better model
Linear Regression: example #2
Multiple features:
X1 = # of square feet
X2 = # of bedrooms
X3 = # of bathrooms (dependency?)
X4 = age of house
X5 = cost of nearby houses
X6 = corner lot (or not): Boolean
A much better model (6 features)
Linear Multivariate Analysis
General form of multivariate equation:
Y = w1*x1 + w2*x2 + . . . + wn*xn + b
w1, w2, ..., wn are numeric weights (coefficients)
x1, x2, ..., xn are variables (features)
Properties of variables:
Can be independent (as assumed by Naïve Bayes)
Weak or strong dependencies can also exist
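Purely as an illustration (not part of the original slides), the weights w1..wn and the bias b can be estimated with ordinary least squares in NumPy; the feature matrix X and target y below are made-up values:

import numpy as np

# made-up data: 5 houses, 3 features (square feet, bedrooms, age)
X = np.array([[1000, 2, 30],
              [1500, 3, 20],
              [2000, 3, 15],
              [2500, 4, 10],
              [3000, 4,  5]], dtype=float)
y = np.array([200, 280, 370, 450, 540], dtype=float)  # price in $1000s

# append a column of ones so the last coefficient acts as the bias b
X1 = np.hstack([X, np.ones((X.shape[0], 1))])
coeffs, *_ = np.linalg.lstsq(X1, y, rcond=None)
w, b = coeffs[:-1], coeffs[-1]
print("weights:", w, "bias:", b)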
Neural Network with 3 Hidden Layers
Neural Networks (general)
Multiple hidden layers:
Layer composition is your decision
Activation functions: sigmoid, tanh, ReLU
https://en.wikipedia.org/wiki/Activation_function
Backpropagation (1980s)
https://en.wikipedia.org/wiki/Backpropagation
=> Initial weights: small random numbers
Euler's Number (e ≈ 2.71828...)
The sigmoid Activation Function
The tanh Activation Function
The ReLU Activation Function
The softmax Activation Function
Gaussian Functions
Gaussian Functions
Activation Functions in Python
import numpy as np
# W: weight matrix, x: input vector (assumed to be defined)
...
# Python sigmoid example:
z = 1/(1 + np.exp(-np.dot(W, x)))
...
# Python tanh example:
z = np.tanh(np.dot(W, x))
# Python ReLU example:
z = np.maximum(0, np.dot(W, x))
What’s the “Best” Activation Function?
Initially: sigmoid was popular
Then: tanh became popular
Now: ReLU is generally preferred (often better results)
Softmax: used in the final FC (fully connected) layer of a classifier
NB: sigmoid and tanh are still used inside LSTMs
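As a small illustration (not from the original slides), a numerically stable softmax in NumPy; the logits are made-up values for a 3-class classifier:

import numpy as np

def softmax(logits):
    # subtract the max for numerical stability, then normalize
    exps = np.exp(logits - np.max(logits))
    return exps / np.sum(exps)

print(softmax(np.array([2.0, 1.0, 0.1])))  # probabilities that sum to 1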
Types of Cost/Error Functions
MSE (mean-squared error)
Cross-entropy
others
Sample Cost Function #1 (MSE)
Sample Cost Function #2
Sample Cost Function #3
How to Select a Cost Function
Mean squared error (MSE):
for a regression problem
Binary cross-entropy (or MSE):
for a two-class classification problem
Categorical cross-entropy:
for a many-class classification problem
(a hedged Keras sketch follows the optimizer list below)
Types of Optimizers
SGD
RMSprop
Adagrad
Adam
Others
http://cs229.stanford.edu/notes/cs229-notes1.pdf
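A hedged Keras sketch (not from the original slides) tying the loss and optimizer choices together; the layer sizes and input shape are made up, and the commented-out lines show the two classification losses:

from keras.models import Sequential
from keras.layers import Dense

# a tiny illustrative model (made-up sizes)
model = Sequential([Dense(16, activation='relu', input_shape=(8,)),
                    Dense(1)])

# regression: MSE loss
model.compile(optimizer='adam', loss='mse')

# two-class classification (sigmoid output): binary cross-entropy
# model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])

# many-class classification (softmax output): categorical cross-entropy
# model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])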
Setting up Data & the Model
Standardize the data:
subtract the mean and divide by the standard deviation
Initial weight values for NNs:
uniform(0,1), N(0,1), or N(0, 1/n)
More details:
http://cs231n.github.io/neural-networks-2/#losses
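A minimal NumPy sketch (not from the original slides) of standardizing a feature matrix; the data values are made up:

import numpy as np

# made-up feature matrix: rows = samples, columns = features
X = np.array([[1000.0, 2.0], [1500.0, 3.0], [2000.0, 3.0], [2500.0, 4.0]])

# standardize each column: subtract the mean and divide by the standard deviation
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_std.mean(axis=0))  # ~0 per feature
print(X_std.std(axis=0))   # ~1 per feature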
Hyperparameters (examples)
# of hidden layers in a neural network
the learning rate (in many models)
the dropout rate
# of leaves or depth of a tree
# of latent factors in a matrix factorization
# of clusters in a k-means clustering
Hyperparameter: dropout rate
"dropout" refers to dropping out units (both hidden
and visible) in a neural network
a regularization technique for reducing overfitting in
neural networks
prevents complex co-adaptations on training data
a very efficient way of performing model averaging
with neural networks
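As a small illustration (not in the original slides), the dropout rate is the argument to Keras's Dropout layer; the 0.5 rate and layer sizes here are made up:

from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(20,)))
model.add(Dropout(0.5))   # dropout rate: randomly drop 50% of this layer's units during training
model.add(Dense(10, activation='softmax'))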
How Many Layers in a DNN?
Algorithm #1 (from Geoffrey Hinton):
1) add layers until you start overfitting your training set
2) then add dropout or another regularization method
Algorithm #2 (Yoshua Bengio):
"Add layers until the test error does not improve anymore."
How Many Hidden Nodes in a DNN?
Based on a relationship between:
# of input and # of output nodes
Amount of training data available
Complexity of the cost function
The training algorithm
TF playground home page:
http://playground.tensorflow.org
CNNs versus RNNs
CNNs (Convolutional NNs):
Good for image processing
Circa 2000: CNNs processed an estimated 10-20% of all checks
=> Approximately 60% of all NNs in use are CNNs
RNNs (Recurrent NNs):
Good for NLP and audio
Used in hybrid networks
CNNs: Convolution, ReLU, and Max Pooling
CNNs: Convolution Calculations
https://docs.gimp.org/en/plug-in-convmatrix.html
CNNs: Convolution Matrices (examples)
Sharpen:
Blur:
CNNs: Convolution Matrices (examples)
Edge detect:
Emboss:
CNNs: Convolution Matrices (examples)
CNNs: Convolution Matrices (examples)
CNNs: Convolution Matrices (examples)
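An illustrative sketch (not from the slides) of applying a 3x3 sharpen kernel to a small made-up grayscale image with a plain NumPy convolution:

import numpy as np

def convolve2d(image, kernel):
    # "valid" convolution: slide the flipped kernel over the image
    k = np.flipud(np.fliplr(kernel))
    kh, kw = k.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * k)
    return out

sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]])
image = np.random.rand(5, 5)        # made-up 5x5 grayscale image
print(convolve2d(image, sharpen))   # 3x3 "sharpened" output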
CNN Filter Terminology
Stride:
how many units you "shift" the filter (1, 2, 3, etc.)
Padding:
'same' padding keeps the feature map the same size as the input
Kernel size:
the dimensions of the filter (1x1, 3x3, 5x5, etc.)
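A hedged Keras sketch (not from the slides) showing where kernel size, stride, and padding appear in a Conv2D layer; the filter count and input shape are made up:

from keras.models import Sequential
from keras.layers.convolutional import Conv2D

model = Sequential()
# 32 filters, 3x3 kernel, stride 1, 'same' padding keeps the 32x32 spatial size
model.add(Conv2D(32, kernel_size=(3, 3), strides=(1, 1), padding='same',
                 input_shape=(32, 32, 3)))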
CNNs: Max Pooling Example
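A small illustration (not from the slides) of 2x2 max pooling with stride 2 on a made-up 4x4 feature map, in NumPy:

import numpy as np

feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 1, 2],
                        [7, 2, 8, 1],
                        [3, 4, 9, 0]])

# 2x2 max pooling with stride 2: take the max of each non-overlapping 2x2 block
pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)   # [[6 4]
                #  [7 9]]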
Components of a CNN (again)
1) Specify the input layer
2) Add a convolution to create feature maps
3) Perform ReLU on the feature maps
4) Repeat steps 2) and 3)
5) Add an FC (fully connected) layer
6) Connect the FC layer to the output layer via softmax
CNN pseudocode
Specify an optimizer
Specify a cost function
Specify a learning rate
Specify desired metrics (accuracy/precision/etc.)
Specify the # of batches per training epoch
For each epoch:
  For each batch:
    Extract the batch data
    Run the optimizer + cross-entropy operations
    Add to the average cost
  Calculate the current test accuracy
  Print out some results
Calculate the final test accuracy and print it
CNN in Python/Keras (fragment)
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Flatten, Activation
from keras.layers.convolutional import Conv2D, MaxPooling2D
from keras.optimizers import Adadelta

input_shape = (3, 32, 32)   # channels-first: 3-channel 32x32 images
nb_classes = 10

model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same', input_shape=input_shape))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
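The fragment stops after the first dropout layer; a hedged sketch (not from the original slide) of how such a model is typically completed and compiled, with illustrative layer sizes (Flatten, Dense, Dropout, and Adadelta are already imported above):

model.add(Flatten())                      # flatten the feature maps for the FC layers
model.add(Dense(512, activation='relu'))  # fully connected layer
model.add(Dropout(0.5))
model.add(Dense(nb_classes, activation='softmax'))  # output layer via softmax

model.compile(optimizer=Adadelta(), loss='categorical_crossentropy',
              metrics=['accuracy'])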
GANs: Generative Adversarial Networks
GANs: Generative Adversarial Networks
Make imperceptible changes to images
Can consistently fool NN image classifiers
The fooled classifier can have an extremely high error rate
Some images create optical illusions
https://www.quora.com/What-are-the-pros-and-cons-of-using-generative-adversarial-networks-a-type-of-neural-network
GANs: Generative Adversarial Networks
Create your own GANs:
https://www.oreilly.com/learning/generative-adversarial-networks-for-beginners
https://github.com/jonbruner/generative-adversarial-networks
GANs from MNIST:
http://edwardlib.org/tutorials/gan
GANs and Capsule networks?
What is TensorFlow 2?
 Support for Python, Java, C++
 Desktop, Server, Mobile, Web
 CPU/GPU/TPU support
 Visualization via TensorBoard
 Can be embedded in Python scripts
 Installation: pip install tensorflow
 Ex: pip install tensorflow==2.0.0-beta1
TensorFlow Use Cases (Generic)
Image recognition
Computer vision
Voice/sound recognition
Time series analysis
Language detection
Language translation
Text-based processing
Handwriting Recognition
Major Changes in TF 2
 TF 2 Eager Execution: default mode
 @tf.function decorator: instead of tf.Session()
 AutoGraph: graph generated in @tf.function()
 Generators: used with tf.data.Dataset (TF 2)
 Differentiation: calculates gradients
 tf.GradientTape: automatic differentiation
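A minimal hedged sketch (not from the original slides) of TF 2 eager execution, tf.GradientTape, and @tf.function; the variable and functions are made up:

import tensorflow as tf

# eager execution is the default in TF 2: ops run immediately
x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x * x                        # y = x^2
print(tape.gradient(y, x).numpy())   # dy/dx = 2*x -> 6.0

# @tf.function traces the Python function into a graph (AutoGraph)
@tf.function
def square(t):
    return t * t

print(square(tf.constant(4.0)).numpy())  # 16.0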
Removed from TensorFlow 2
tf.Session()
tf.placeholder()
tf.global_variables_initializer()
feed_dict
Variable scopes
tf.contrib code
tf.flags
Global variables
=> TF 1.x functions moved to tf.compat.v1
Deep Learning and Art/"Stuff"
“Convolutional Blending” images:
=> 19-layer Convolutional Neural Network
www.deepart.io
Bots created their own language:
https://www.recode.net/2017/3/23/14962182/ai-learning-language-open-ai-research
https://www.fastcodesign.com/90124942/this-google-engineer-taught-an-algorithm-to-make-train-footage-and-its-hypnotic
Upcoming Classes at UCSC
About Me: Recent Books
 1) Angular and Machine Learning (2020)
 2) Angular and Deep Learning (2020)
 3) AI/ML/DL: Concepts and Code (2020)
 4) Shell Programming in Bash (2020)
 5) TensorFlow 2 Pocket Primer (2019)
 6) TensorFlow 1.x Pocket Primer (2019)
 7) Python for TensorFlow (2019)
 8) C Programming Pocket Primer (2019)
 9) RegEx Pocket Primer (2018)
 10) Data Cleaning Pocket Primer (2018)
 11) Angular Pocket Primer (2017)
 12) Android Pocket Primer (2017)
 13) CSS3 Pocket Primer (2016)
About Me: Less Recent Books
 14) SVG Pocket Primer (2016)
 15) Python Pocket Primer (2015)
 16) D3 Pocket Primer (2015)
 17) HTML5 Mobile Pocket Primer (2014)
 18) jQuery, CSS3, and HTML5 (2013)
 19) HTML5 Pocket Primer (2013)
 20) jQuery Pocket Primer (2013)
 21) HTML5 Canvas (2012)
 22) Flash on Android (2011)
 23) Web 2.0 Fundamentals (2010)
 24) MS Silverlight Graphics (2008)
 25) Fundamentals of SVG (2003)
 26) Java Graphics Library (2002)
