Deep Learning and Android
Android Meetup SF 09/27/2017
Google LaunchPad SF
Oswald Campesato
ocampesato@yahoo.com
Overview
intro to AI/ML/DL
linear regression
activation functions
cost functions
gradient descent
back propagation
hyper-parameters
what are CNNs
Android and DL
The Data/AI Landscape
Gartner 2016: Where is Deep Learning?
Gartner 2017: Deep Learning (YES!)
The Official Start of AI (1956)
Neural Network with 3 Hidden Layers
AI/ML/DL: How They Differ
Traditional AI (20th century):
based on collections of rules
Led to expert systems in the 1980s
The era of LISP and Prolog
AI/ML/DL: How They Differ
Machine Learning:
Started in the 1950s (approximate)
Alan Turing and “learning machines”
Data-driven (not rule-based)
Many types of algorithms
Involves optimization
AI/ML/DL: How They Differ
Deep Learning:
Started in the 1950s (approximate)
The “perceptron” (basis of NNs)
Data-driven (not rule-based)
large (even massive) data sets
Involves neural networks (CNNs: ~1970s)
Lots of heuristics
Heavily based on empirical results
The Rise of Deep Learning
Massive and inexpensive computing power
Huge volumes of data/Powerful algorithms
The “big bang” in 2009:
"deep-learning neural networks and NVidia GPUs"
Google Brain used NVidia GPUs (2009)
AI/ML/DL: Commonality
All of them involve a model
A model represents a system
Goal: a good predictive model
The model is based on:
Many rules (for AI)
data and algorithms (for ML)
large sets of data (for DL)
A Basic Model in Machine Learning
Let’s perform the following steps:
1) Start with a simple model (2 variables)
2) Generalize that model (n variables)
3) See how it might apply to a NN
Linear Regression
One of the simplest models in ML
Fits a line (y = m*x + b) to data in 2D
Finds the best line by minimizing MSE
m (slope) and b (intercept) both have
closed-form (least-squares) solutions
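A minimal numpy sketch of the closed-form fit (the data points are hypothetical):
import numpy as np
# hypothetical 2D data points
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
# closed-form least-squares solution for y = m*x + b
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b = y.mean() - m * x.mean()
mse = np.mean((y - (m*x + b))**2)   # mean squared error of the fit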
Linear Regression in 2D: example
Sample Cost Function #1 (MSE)
Linear Regression: example #1
One feature (independent variable):
X = number of square feet
Predicted value (dependent variable):
Y = cost of a house
A very “coarse grained” model
We can devise a much better model
Linear Regression: example #2
Multiple features:
X1 = # of square feet
X2 = # of bedrooms
X3 = # of bathrooms (dependency?)
X4 = age of house
X5 = cost of nearby houses
X6 = corner lot (or not): Boolean
a much better model (6 features)
Linear Multivariate Analysis
General form of multivariate equation:
Y = w1*x1 + w2*x2 + . . . + wn*xn + b
w1, w2, . . . , wn are numeric values
x1, x2, . . . , xn are variables (features)
Properties of variables:
Can be independent (Naïve Bayes)
weak/strong dependencies can exist
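In vector form this is a single dot product; a tiny numpy sketch with hypothetical values:
import numpy as np
w = np.array([0.5, -0.2, 0.1])       # w1..wn (weights)
x = np.array([1200.0, 3.0, 2.0])     # x1..xn (features)
b = 10.0
Y = np.dot(w, x) + b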
Neural Network with 3 Hidden Layers
Neural Networks: equations
Node “values” in first hidden layer:
N1 = w11*x1+w21*x2+…+wn1*xn
N2 = w12*x1+w22*x2+…+wn2*xn
N3 = w13*x1+w23*x2+…+wn3*xn
. . .
Nn = w1n*x1+w2n*x2+…+wnn*xn
Similar equations for other pairs of layers
Neural Networks: Matrices
From inputs to first hidden layer:
Y1 = W1*X + B1 (X/Y1/B1: vectors; W1: matrix)
From first to second hidden layers:
Y2 = W2*Y1 + B2 (Y1/Y2/B2: vectors; W2: matrix)
From second to third hidden layers:
Y3 = W3*Y2 + B3 (Y2/Y3/B3: vectors; W3: matrix)
Apply an "activation function" to each Y before feeding it to the next layer
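A minimal numpy sketch of these layer-to-layer equations (shapes and random weights are hypothetical; tanh is used as the activation):
import numpy as np
X  = np.random.rand(4)             # input vector (4 features)
W1 = np.random.rand(5, 4)          # weights: input -> first hidden layer
B1 = np.random.rand(5)             # biases for first hidden layer
W2 = np.random.rand(3, 5)          # weights: first -> second hidden layer
B2 = np.random.rand(3)
Y1 = np.tanh(np.dot(W1, X) + B1)   # first hidden layer (activation applied)
Y2 = np.tanh(np.dot(W2, Y1) + B2)  # second hidden layer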
Neural Networks (general)
Multiple hidden layers:
Layer composition is your decision
Activation functions: sigmoid, tanh, ReLU
https://en.wikipedia.org/wiki/Activation_function
Back propagation (1980s)
https://en.wikipedia.org/wiki/Backpropagation
=> Initial weights: small random numbers
Euler’s Function
The sigmoid Activation Function
The tanh Activation Function
The ReLU Activation Function
The softmax Activation Function
Activation Functions in Python
import numpy as np

# hypothetical weights and input vector
W = np.array([[0.1, 0.2], [0.3, 0.4]])
x = np.array([1.0, 2.0])

# Python sigmoid example:
z = 1/(1 + np.exp(-np.dot(W, x)))

# Python tanh example:
z = np.tanh(np.dot(W, x))

# Python ReLU example:
z = np.maximum(0, np.dot(W, x))
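A possible softmax example in the same style (not in the original snippet), reusing the W and x above:
# Python softmax example (the max is subtracted for numerical stability):
def softmax(v):
    e = np.exp(v - np.max(v))
    return e / np.sum(e)

z = softmax(np.dot(W, x))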
What’s the “Best” Activation Function?
Initially: sigmoid was popular
Then: tanh became popular
Now: ReLU is preferred (better results)
Softmax: for FC (fully connected) layers
NB: sigmoid and tanh are used in LSTMs
Even More Activation Functions!
https://stats.stackexchange.com/questions/115258/comprehensive-list-of-activation-functions-in-neural-networks-with-pros-cons
https://medium.com/towards-data-science/activation-functions-and-its-types-which-is-better-a9a5310cc8f
https://medium.com/towards-data-science/multi-layer-neural-networks-with-sigmoid-function-deep-learning-for-rookies-2-bf464f09eb7f
Sample Cost Function #1 (MSE)
Sample Cost Function #2
Sample Cost Function #3
How to Select a Cost Function
1) Depends on the learning type:
=> supervised/unsupervised/RL
2) Depends on the activation function
3) Other factors
Example:
cross-entropy cost function for supervised
learning on multiclass classification
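A minimal numpy sketch of the cross-entropy cost for a single multiclass example (the values are hypothetical):
import numpy as np
y_true = np.array([0.0, 0.0, 1.0])   # one-hot label (class 2)
y_pred = np.array([0.1, 0.2, 0.7])   # softmax output of the model
cross_entropy = -np.sum(y_true * np.log(y_pred + 1e-12))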
GD versus SGD
SGD (Stochastic Gradient Descent):
+ involves a SUBSET of the dataset
+ aka Minibatch Stochastic Gradient Descent
GD (Gradient Descent):
+ involves the ENTIRE dataset
More details:
http://cs229.stanford.edu/notes/cs229-notes1.pdf
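One rough way to contrast the two: a single weight update for MSE on a linear model (hypothetical data; batch size 16 for the SGD step):
import numpy as np
X = np.random.rand(100, 3)    # 100 samples, 3 features
y = np.random.rand(100)
w = np.zeros(3)
lr = 0.01                     # learning rate (a hyper-parameter)

# GD: gradient computed over the ENTIRE dataset
grad = -2 * X.T.dot(y - X.dot(w)) / len(y)
w -= lr * grad

# SGD (minibatch): gradient computed over a random SUBSET
idx = np.random.choice(len(y), 16, replace=False)
Xb, yb = X[idx], y[idx]
grad = -2 * Xb.T.dot(yb - Xb.dot(w)) / len(yb)
w -= lr * grad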
Setting up Data & the Model
Normalize the data (DL only):
Subtract the ‘mean’ and divide by stddev
[Central Limit Theorem]
Initial weight values for NNs:
Random numbers between -1 and 1
More details:
http://cs231n.github.io/neural-networks-2/#losses
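A minimal sketch of both steps (column-wise normalization is an assumption; the data is hypothetical):
import numpy as np
data = np.random.rand(200, 10)                          # hypothetical training data
data = (data - data.mean(axis=0)) / data.std(axis=0)    # zero mean, unit stddev
# initial weights: small random numbers (here uniform in [-1, 1])
W = np.random.uniform(-1, 1, size=(10, 5))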
What are Hyper Parameters?
Higher-level properties of the model, such as its
complexity or capacity to learn
Cannot be learned directly from the data during
standard model training
Must be set (predefined) before training
Hyper Parameters (examples)
# of hidden layers in a neural network
the learning rate (in many models)
the dropout rate
# of leaves or depth of a tree
# of latent factors in a matrix factorization
# of clusters in a k-means clustering
Hyper Parameter: dropout rate
"dropout" refers to dropping out units (both hidden
and visible) in a neural network
a regularization technique for reducing overfitting in
neural networks
prevents complex co-adaptations on training data
a very efficient way of performing model averaging
with neural networks
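A minimal Keras sketch of a dropout layer (layer sizes are hypothetical); the CNN fragment later in these slides also uses Dropout:
from keras.models import Sequential
from keras.layers.core import Dense, Dropout

model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(20,)))
model.add(Dropout(0.5))    # randomly drop 50% of these activations during training
model.add(Dense(10, activation='softmax'))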
How Many Layers in a DNN?
Algorithm #1 (from Geoffrey Hinton):
1) add layers until you start overfitting your
training set
2) then add dropout or some other
regularization method
Algorithm #2 (Yoshua Bengio):
"Add layers until the test error does not improve
anymore.”
How Many Hidden Nodes in a DNN?
Based on a relationship between:
# of input and # of output nodes
Amount of training data available
Complexity of the cost function
The training algorithm
CNNs versus RNNs
CNNs (Convolutional NNs):
Good for image processing
2000: CNNs processed 10-20% of all checks
=> Approximately 60% of all NNs
RNNs (Recurrent NNs):
Good for NLP and audio
CNNs: Convolution Calculations
https://docs.gimp.org/en/plug-in-convmatrix.html
CNNs: Convolution Matrices (examples)
Sharpen:
Blur:
CNNs: Convolution Matrices (examples)
Edge detect:
Emboss:
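As a rough sketch, a 3x3 kernel can be applied with a plain sliding window; the edge-detect kernel below is a common example chosen here (the slide's matrices are images):
import numpy as np
image  = np.random.rand(8, 8)            # hypothetical grayscale image
kernel = np.array([[-1, -1, -1],
                   [-1,  8, -1],
                   [-1, -1, -1]])        # simple edge-detect kernel
h, w = image.shape
out = np.zeros((h - 2, w - 2))
for i in range(h - 2):
    for j in range(w - 2):
        # sliding window (strictly a cross-correlation, which is what CNN layers compute)
        out[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)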
CNNs: Sample Convolutions/Filters
CNNs: Max Pooling Example
CNNs: convolution-pooling (1)
CNNs: convolution and pooling (2)
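A minimal 2x2 max-pooling sketch in numpy (the feature map is hypothetical):
import numpy as np
fmap = np.random.rand(4, 4)                            # hypothetical feature map
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))     # 2x2 max pooling, stride 2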
Sample CNN in Keras (fragment)
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Flatten, Activation
from keras.layers.convolutional import Conv2D, MaxPooling2D
from keras.optimizers import Adadelta

input_shape = (3, 32, 32)
nb_classes = 10

model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same', input_shape=input_shape))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
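One possible way to complete the fragment (not from the original slide): flatten, add a fully connected layer, and compile with the cross-entropy cost:
model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer=Adadelta(),
              metrics=['accuracy'])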
GANs: Generative Adversarial Networks
GANs: Generative Adversarial Networks
Make imperceptible changes to images
Can consistently defeat all NNs
Can have extremely high error rate
Some images create optical illusions
https://www.quora.com/What-are-the-pros-and-cons-of-using-generative-adversarial-networks-a-type-of-neural-network
GANs: Generative Adversarial Networks
Create your own GANs:
https://www.oreilly.com/learning/generative-adversarial-networks-for-beginners
https://github.com/jonbruner/generative-adversarial-networks
GANs from MNIST:
http://edwardlib.org/tutorials/gan
GANs: Generative Adversarial Networks
GANs, Graffiti, and Art:
https://thenewstack.io/camouflaged-graffiti-road-signs-can-fool-machine-learning-models/
GANs and audio:
https://www.technologyreview.com/s/608381/ai-shouldnt-believe-everything-it-hears
Houdini algorithm: https://arxiv.org/abs/1707.05373
Deep Learning Playground
TF playground home page:
http://playground.tensorflow.org
Demo #1:
https://github.com/tadashi-aikawa/typescript-playground
Converts playground to TypeScript
Android and Deep Learning (1)
Option #1: generate the model outside of Android
(use a Python script)
Option #2: use a pre-trained model
Option #3: use an existing apk with DL
Option #4: use TensorFlow Lite APIs (when?)
Android and Deep Learning (2)
Generate the model outside of Android
Perform the following steps:
Create an app in Android Studio
generate a (“.pb”) model (via Python script)
Copy the model into the assets folder
Compile and deploy to a device
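A rough TensorFlow 1.x sketch of the "generate a .pb model via Python script" step; the graph and the node name 'output' are hypothetical placeholders, not from the slide:
import tensorflow as tf
from tensorflow.python.framework import graph_util

# hypothetical minimal graph: y = W*x + b, output node named 'output'
x = tf.placeholder(tf.float32, shape=[None, 3], name='input')
W = tf.Variable(tf.random_normal([3, 1]))
b = tf.Variable(tf.zeros([1]))
y = tf.add(tf.matmul(x, W), b, name='output')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # (training would go here)
    frozen = graph_util.convert_variables_to_constants(
        sess, sess.graph_def, ['output'])
    # writes model.pb; copy it into the app's assets folder afterwards
    tf.train.write_graph(frozen, '.', 'model.pb', as_text=False)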
Android and Deep Learning (3)
Android app with pre-configured model
Download/uncompress this sample:
http://nilhcem.com/android/custom-tensorflow-classifier
Open the project in Android Studio
Compile and deploy to an Android device
Android and Deep Learning (4)
TensorFlow Lite: Google I/O (release date?)
A subset of the TensorFlow APIs (which ones?)
Provides “regular” TensorFlow APIs for apps
Does not require Python scripts (?)
Deep Learning and Art
“Convolutional Blending” images:
=> 19-layer Convolutional Neural Network
www.deepart.io
Bots created their own language:
https://www.recode.net/2017/3/23/14962182/ai-learning-language-open-ai-research
https://www.fastcodesign.com/90124942/this-google-engineer-taught-an-algorithm-to-make-train-footage-and-its-hypnotic
What Do I Learn Next?
PGMs (Probabilistic Graphical Models)
MC (Markov Chains)
MCMC (Markov Chain Monte Carlo)
HMMs (Hidden Markov Models)
RL (Reinforcement Learning)
Hopfield Nets
Neural Turing Machines
Autoencoders
Hypernetworks
Pixel Recurrent Neural Networks
Bayesian Neural Networks
SVMs
Some Recent Books
1) HTML5 Canvas and CSS3 Graphics (2013)
2) jQuery, CSS3, and HTML5 for Mobile (2013)
3) HTML5 Pocket Primer (2013)
4) jQuery Pocket Primer (2013)
5) HTML5 Mobile Pocket Primer (2014)
6) D3 Pocket Primer (2015)
7) Python Pocket Primer (2015)
8) SVG Pocket Primer (2016)
9) CSS3 Pocket Primer (2016)
10) Android Pocket Primer (2017)
11) Angular Pocket Primer (2017)
FREE Kotlin Online Course
About Me
I provide training for the following:
=> Deep Learning/TensorFlow/Keras
=> Android
=> Angular 4
