- 1. Introduction to Deep Learning UCSC Meetup (Santa Clara) Monday 08/12/2019 Oswald Campesato ocampesato@yahoo.com
- 2. High-Level List of Topics
  - Intro to AI/ML/DL/ANNs
  - Hidden layers/initialization/neurons per layer
  - Activation function
  - Cost function/gradient descent/learning rate
  - Dropout rate
  - What is linear regression
  - What are CNNs
- 3. Topics We Won’t Cover
  - RNNs, LSTMs, and VAEs
  - Transformers et al.
  - Reinforcement Learning (RL)
  - Deep Reinforcement Learning (DRL)
  - Natural Language Processing (NLP)
- 5. The Official Start of AI (1956)
- 6. Use Cases for Deep Learning
  - Computer vision
  - Speech recognition
  - Image processing
  - Bioinformatics
  - Social network filtering
  - Drug design
  - Customer relationship management
  - Recommendation systems
  - Mobile advertising
  - Many others
- 7. What is a Deep Neural Network?
  - Input layer, multiple hidden layers, and output layer
  - Nonlinear processing via activation functions
  - Layers perform transformation and feature extraction
  - Trained with the gradient descent algorithm and backpropagation
  - Each layer receives the output from the previous layer
  - On some tasks, results are comparable or superior to those of human experts
- 8. Types of Deep Learning
  - Supervised learning (you know the answer)
  - Unsupervised learning (you don’t know the answer)
  - Semi-supervised learning (mixed dataset)
  - Reinforcement learning (such as games)
  - Types of algorithms:
    - Classifiers (detect images, spam, fraud, etc)
    - Regression (predict stock price, housing price, etc)
    - Clustering (unsupervised classifiers)
- 9. Deep Learning Architectures
  - Unsupervised Pre-trained Networks (UPNs)
  - Convolutional Neural Networks (CNNs)
  - Recurrent Neural Networks (RNNs, such as LSTMs)
  - The most popular architectures:
    - CNNs (and capsule networks) for images
    - LSTMs for NLP and audio
- 10. Neural Network: 3 Hidden Layers
- 11. NN: 2 Hidden Layers (Regression)
- 13. Classification and Deep Learning
- 14. What is Linear Regression
  - One of the simplest models in ML
  - Fits a line (y = m*x + b) to data in 2D
  - Finds the best line by minimizing the MSE (the mean of the squared errors)
  - m: the slope that minimizes the sum of squared errors (closed-form solution)
  - b also has a closed-form solution
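The closed-form fit mentioned on this slide can be sketched in NumPy. This is a minimal sketch: the data points are made up for illustration, and `fit_line` is a hypothetical helper name, not something from the slides.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares slope m and intercept b for y ~ m*x + b (closed form)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by the variance of x
    m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the best line passes through the point of means
    b = y_mean - m * x_mean
    return m, b

# Illustrative data lying exactly on y = 2x + 1
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
m, b = fit_line(x, y)
mse = np.mean((m * np.asarray(x) + b - np.asarray(y)) ** 2)
print(m, b, mse)
```

Because the sample points lie exactly on a line, the recovered slope and intercept match it and the MSE is zero; with noisy data the same formulas give the line with the smallest MSE.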
- 15. Linear Regression in 2D: example
- 16. Linear Regression in 2D: example
- 17. Sample Cost Function #1 (MSE)
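As a rough illustration of how gradient descent and the learning rate interact with the MSE cost, here is a minimal sketch that fits a line by repeatedly stepping against the gradient. The data, the learning rate of 0.05, and the step count are assumptions chosen for the example, and `gradient_descent` is a hypothetical helper name.

```python
import numpy as np

def gradient_descent(x, y, learning_rate=0.05, steps=2000):
    """Fit y ~ m*x + b by gradient descent on the MSE cost."""
    m, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        error = m * x + b - y                    # residuals of the current line
        grad_m = (2.0 / n) * np.sum(error * x)   # partial derivative of MSE w.r.t. m
        grad_b = (2.0 / n) * np.sum(error)       # partial derivative of MSE w.r.t. b
        m -= learning_rate * grad_m              # step against the gradient
        b -= learning_rate * grad_b
    return m, b

# Illustrative data lying exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])
m, b = gradient_descent(x, y)
print(round(m, 3), round(b, 3))
```

A learning rate that is too large makes the updates diverge, while one that is too small needs many more steps; the value here is simply one that converges for this tiny dataset.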
- 18. Linear Regression: example #1
  - One feature (independent variable): X = number of square feet
  - Predicted value (dependent variable): Y = cost of a house
  - A very “coarse grained” model
  - We can devise a much better model
- 19. Linear Regression: example #2
  - Multiple features:
    - X1 = # of square feet
    - X2 = # of bedrooms
    - X3 = # of bathrooms (dependency?)
    - X4 = age of house
    - X5 = cost of nearby houses
    - X6 = corner lot (or not): Boolean
  - A much better model (6 features)
- 20. Linear Multivariate Analysis
  - General form of multivariate equation: Y = w1*x1 + w2*x2 + . . . + wn*xn + b
  - w1, w2, . . . , wn are numeric values
  - x1, x2, . . . , xn are variables (features)
  - Properties of variables:
    - Can be independent (Naïve Bayes)
    - Weak/strong dependencies can exist
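The multivariate form Y = w1*x1 + ... + wn*xn + b can be fitted with a least-squares solver. The sketch below uses `np.linalg.lstsq`; the feature values and the "true" weights are invented so that the result can be checked by eye.

```python
import numpy as np

# Hypothetical data: 5 samples with 3 features each (values invented for illustration)
X = np.array([
    [1.0, 2.0, 0.0],
    [2.0, 0.0, 1.0],
    [3.0, 1.0, 1.0],
    [4.0, 3.0, 0.0],
    [5.0, 1.0, 2.0],
])
true_w = np.array([2.0, -1.0, 0.5])
true_b = 3.0
Y = X @ true_w + true_b   # Y = w1*x1 + w2*x2 + w3*x3 + b

# Append a column of ones so the bias b is learned as one extra weight
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
solution, *_ = np.linalg.lstsq(X_aug, Y, rcond=None)
w, b = solution[:-1], solution[-1]
print(w, b)
```

Folding the bias into the weight vector is a common convenience; it is a design choice of this sketch, not something the slides prescribe.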
- 21. Neural Network with 3 Hidden Layers
- 22. Neural Networks (general)
  - Multiple hidden layers: layer composition is your decision
  - Activation functions: sigmoid, tanh, ReLU https://en.wikipedia.org/wiki/Activation_function
  - Backpropagation (1980s) https://en.wikipedia.org/wiki/Backpropagation
  - => Initial weights: small random numbers
- 23. The Exponential Function e^x (Euler’s number e ≈ 2.71828. . .)
- 24. The sigmoid Activation Function
- 25. The tanh Activation Function
- 26. The ReLU Activation Function
- 27. The softmax Activation Function
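Since the slide itself only carries a title, here is a minimal NumPy sketch of softmax, which turns a vector of raw scores into probabilities. The input scores are arbitrary example values, and subtracting the maximum is a standard numerical-stability trick rather than something taken from the slides.

```python
import numpy as np

def softmax(z):
    """Turn a vector of scores into probabilities that sum to 1."""
    z = np.asarray(z, dtype=float)
    shifted = z - np.max(z)       # subtracting the max avoids overflow in exp
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z)

probs = softmax([2.0, 1.0, 0.1])
print(probs, probs.sum())
```

The largest score always receives the largest probability, which is why softmax is typically paired with an argmax when a single class label is needed.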
- 30. Activation Functions in Python

      import numpy as np
      ...
      # W (weights) and x (inputs) are assumed to be defined in the elided code
      # Python sigmoid example:
      z = 1 / (1 + np.exp(-np.dot(W, x)))
      ...
      # Python tanh example:
      z = np.tanh(np.dot(W, x))
      # Python ReLU example:
      z = np.maximum(0, np.dot(W, x))

- 31. What’s the “Best” Activation Function?
  - Initially: sigmoid was popular
  - Then: tanh became popular
  - Now: ReLU is often preferred for hidden layers (better results)
  - Softmax: applied to the output of the final FC (fully connected) layer in classifiers
  - NB: sigmoid and tanh are used in LSTMs
- 32. Types of Cost/Error Functions
  - MSE (mean-squared error)
  - Cross-entropy
  - Others
- 33. Sample Cost Function #1 (MSE)
- 34. Sample Cost Function #2
- 35. Sample Cost Function #3
- 36. How to Select a Cost Function
  - Mean-squared error: for a regression problem
  - Binary cross-entropy (or MSE): for a two-class classification problem
  - Categorical cross-entropy: for a many-class classification problem
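The three cost functions named above can be written in a few lines of NumPy. This is a sketch under stated assumptions: the function names are illustrative, and the small `eps` clipping is added only to avoid taking log(0).

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean-squared error, typically used for regression."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, p, eps=1e-12):
    """Two-class cost: y_true holds 0/1 labels, p the predicted P(class 1)."""
    y_true = np.asarray(y_true, float)
    p = np.clip(np.asarray(p, float), eps, 1.0 - eps)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

def categorical_cross_entropy(one_hot, probs, eps=1e-12):
    """Many-class cost: rows of one_hot are labels, rows of probs sum to 1."""
    one_hot = np.asarray(one_hot, float)
    probs = np.clip(np.asarray(probs, float), eps, 1.0)
    return -np.mean(np.sum(one_hot * np.log(probs), axis=1))

print(mse([1.0, 2.0], [1.0, 4.0]))
print(binary_cross_entropy([1, 0], [0.5, 0.5]))
print(categorical_cross_entropy([[1, 0, 0]], [[0.5, 0.25, 0.25]]))
```

A prediction of 0.5 for the correct class costs log(2) in both cross-entropy variants, which is a handy sanity check when wiring up a new model.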
- 38. Setting up Data & the Model
  - Standardize the data: subtract the mean and divide by the stddev
  - Initial weight values for NNs: random(0,1) or N(0,1) or N(0, 1/n)
  - More details: http://cs231n.github.io/neural-networks-2/#losses
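A minimal sketch of both setup steps follows. It assumes that N(0, 1/n) means a normal distribution with variance 1/n, where n is the number of inputs to the layer; the raw data, the layer width of 8, and the random seed are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical raw data: 100 samples, 4 features on very different scales
X = rng.normal(loc=[10.0, -5.0, 0.0, 200.0],
               scale=[1.0, 2.0, 0.5, 50.0],
               size=(100, 4))

# Standardize each feature: subtract its mean, divide by its standard deviation
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Weight initialization from N(0, 1/n): variance 1/n means stddev sqrt(1/n)
n_inputs, n_hidden = X_std.shape[1], 8
W = rng.normal(0.0, np.sqrt(1.0 / n_inputs), size=(n_inputs, n_hidden))

print(X_std.mean(axis=0).round(6), X_std.std(axis=0).round(6), W.shape)
```

In practice the mean and standard deviation are computed on the training set only and then reused for validation and test data.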
- 39. Hyper Parameters (examples)
  - # of hidden layers in a neural network
  - The learning rate (in many models)
  - The dropout rate
  - # of leaves or depth of a tree
  - # of latent factors in a matrix factorization
  - # of clusters in a k-means clustering
- 40. Hyper Parameter: dropout rate
  - "Dropout" refers to dropping out units (both hidden and visible) in a neural network
  - A regularization technique for reducing overfitting in neural networks
  - Prevents complex co-adaptations on training data
  - A very efficient way of performing model averaging with neural networks
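To make the dropout rate concrete, here is a small sketch of the common "inverted dropout" variant, which rescales the surviving units so the expected activation is unchanged. The slides do not say which variant they have in mind, so treat that choice, the helper name `dropout`, and the rate of 0.25 as assumptions.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Zero out a fraction `rate` of units and rescale the rest (inverted dropout)."""
    if not training or rate == 0.0:
        return activations            # dropout is disabled at inference time
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob   # True for units that survive
    return activations * mask / keep_prob

rng = np.random.default_rng(42)
a = np.ones((1000, 10))
dropped = dropout(a, rate=0.25, rng=rng)
print((dropped == 0).mean(), dropped.mean())
```

Roughly a quarter of the units come out as zero, and the mean activation stays close to 1 because the kept units are scaled by 1/0.75.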
- 41. How Many Layers in a DNN?
  - Algorithm #1 (from Geoffrey Hinton):
    1) add layers until you start overfitting your training set
    2) now add dropout or some other regularization method
  - Algorithm #2 (Yoshua Bengio): "Add layers until the test error does not improve anymore."
- 42. How Many Hidden Nodes in a DNN?
  - Based on a relationship between:
    - # of input and # of output nodes
    - Amount of training data available
    - Complexity of the cost function
    - The training algorithm
  - TF playground home page: http://playground.tensorflow.org
- 43. CNNs versus RNNs
  - CNNs (Convolutional NNs):
    - Good for image processing
    - By 2000: CNNs processed 10-20% of all checks
    - => Approximately 60% of all NNs
  - RNNs (Recurrent NNs):
    - Good for NLP and audio
    - Used in hybrid networks
- 44. CNNs: Convolution, ReLU, and Max Pooling
- 46. CNNs: Convolution Matrices (examples) Sharpen: Blur:
- 47. CNNs: Convolution Matrices (examples) Edge detect: Emboss:
- 48. CNNs: Convolution Matrices (examples)
- 49. CNNs: Convolution Matrices (examples)
- 50. CNNs: Convolution Matrices (examples)
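The matrices on these slides were images and are not recoverable here, so the sketch below uses commonly cited sharpen, blur, and edge-detect kernels as stand-ins. The helper `convolve2d` is a hypothetical name; it applies the kernel the way CNN layers do (no kernel flip, stride 1, no padding).

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding, no kernel flip)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the kernel with the patch under it, then sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Commonly used 3x3 kernels (assumed values, not copied from the slides)
sharpen = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=float)
blur = np.ones((3, 3)) / 9.0
edge = np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]], dtype=float)

# A smooth 5x5 ramp image: it has no edges, so edge detection returns zeros
image = np.arange(25, dtype=float).reshape(5, 5)
print(convolve2d(image, sharpen))
print(convolve2d(image, blur))
print(convolve2d(image, edge))
```

On this ramp the sharpen and blur kernels return the central 3x3 region unchanged, because a linear gradient has no detail to sharpen or smooth; real photographs show the visible effects.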
- 51. CNN Filter Terminology
  - Stride: how many units you “shift” the filter (1/2/3/etc)
  - Padding: a border (usually zeros) added around the image; “same” padding makes the feature map the same size as the image (at stride 1)
  - Kernel size: the dimensions of the filter (1x1, 3x3, 5x5, etc)
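Stride, padding, and kernel size together determine the size of the feature map. The sketch below uses the standard formula floor((n + 2p - k) / s) + 1 along one dimension; the function names are illustrative.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Feature-map size along one dimension: floor((n + 2p - k) / s) + 1."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

def same_padding(kernel_size):
    """Padding per side that preserves the size for stride 1 and an odd kernel."""
    return (kernel_size - 1) // 2

print(conv_output_size(32, 3))                           # no padding
print(conv_output_size(32, 3, padding=same_padding(3)))  # size preserved
print(conv_output_size(32, 5, stride=2, padding=2))      # stride 2 halves the size
```

With a 32-pixel input these calls give 30, 32, and 16 respectively, which shows how an unpadded 3x3 filter shrinks the map by two pixels per dimension.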
- 52. CNNs: Max Pooling Example
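The slide's picture is missing, so here is a small worked example of 2x2 max pooling with stride 2. The 4x4 feature map is made up, and `max_pool_2x2` is a hypothetical helper name.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 block."""
    h, w = feature_map.shape
    # Drop an odd trailing row/column, then view the map as a grid of 2x2 blocks
    trimmed = feature_map[:h - h % 2, :w - w % 2]
    blocks = trimmed.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

fm = np.array([
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 1],
    [3, 4, 5, 8],
])
pooled = max_pool_2x2(fm)
print(pooled)   # [[6 4]
                #  [7 9]]
```

Each output value is the maximum of one 2x2 block, so the feature map shrinks by half in each dimension while the strongest responses are kept.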
- 53. Components of a CNN (again)
  1) Specify the input layer
  2) Add a convolution to create feature maps
  3) Perform ReLU on the feature maps
  4) Repeat 2) and 3)
  5) Add a FC (fully connected) layer
  6) Connect FC to output layer via softmax
- 54. CNN pseudocode

      Specify an optimiser
      Specify a cost function
      Specify a learning rate
      Specify desired metrics (accuracy/precision/etc)
      Specify # of batch runs in a training epoch
      For each epoch:
          For each batch:
              Extract the batch data
              Run the optimiser + cross-entropy operations
              Add to the average cost
          Calculate the current test accuracy
          Print out some results
      Calculate the final test accuracy and print

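The epoch/batch structure of the pseudocode can be sketched in plain NumPy. To keep the example short and dependency-free, a logistic-regression model stands in for the CNN and plain gradient descent stands in for the optimiser; the synthetic data, learning rate, batch size, and epoch count are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class data, invented for illustration
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
X_train, y_train = X[:160], y[:160]
X_test, y_test = X[160:], y[160:]

def predict_proba(X, w, b):
    """Sigmoid of a linear score: a stand-in for the CNN's forward pass."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

def accuracy(X, y, w, b):
    return np.mean((predict_proba(X, w, b) > 0.5) == (y > 0.5))

w, b = np.zeros(2), 0.0
learning_rate, epochs, batch_size = 0.5, 20, 16
n_batches = len(X_train) // batch_size

for epoch in range(epochs):
    avg_cost = 0.0
    for i in range(n_batches):
        # Extract the batch data
        xb = X_train[i * batch_size:(i + 1) * batch_size]
        yb = y_train[i * batch_size:(i + 1) * batch_size]
        # Cross-entropy cost and one gradient-descent step
        p = predict_proba(xb, w, b)
        cost = -np.mean(yb * np.log(p + 1e-12) + (1 - yb) * np.log(1 - p + 1e-12))
        grad = p - yb
        w = w - learning_rate * (xb.T @ grad) / batch_size
        b = b - learning_rate * grad.mean()
        # Add to the average cost
        avg_cost += cost / n_batches
    # Calculate and print the current test accuracy
    print(f"epoch {epoch + 1}: cost={avg_cost:.4f} "
          f"test accuracy={accuracy(X_test, y_test, w, b):.3f}")

final_accuracy = accuracy(X_test, y_test, w, b)
print(f"final test accuracy: {final_accuracy:.3f}")
```

A real CNN would replace `predict_proba` and the manual gradient step with a framework's model and optimiser, but the loop skeleton stays the same.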
- 55. CNN in Python/Keras (fragment)

      from keras.models import Sequential
      from keras.layers import Dense, Dropout, Flatten, Activation
      from keras.layers import Conv2D, MaxPooling2D
      from keras.optimizers import Adadelta

      # 32x32 RGB images; assumes the default channels-last data format
      input_shape = (32, 32, 3)
      nb_classes = 10

      model = Sequential()
      # First convolution: 32 filters of size 3x3, output size preserved
      model.add(Conv2D(32, (3, 3), padding='same', input_shape=input_shape))
      model.add(Activation('relu'))
      model.add(Conv2D(32, (3, 3)))
      model.add(Activation('relu'))
      # Downsample by 2, then drop 25% of the units to reduce overfitting
      model.add(MaxPooling2D(pool_size=(2, 2)))
      model.add(Dropout(0.25))

- 56. GANs: Generative Adversarial Networks
- 57. GANs: Generative Adversarial Networks
  - Make imperceptible changes to images
  - Can consistently defeat all NNs
  - Can have extremely high error rate
  - Some images create optical illusions
  - https://www.quora.com/What-are-the-pros-and-cons-of-using-generative-adversarial-networks-a-type-of-neural-network
- 58. GANs: Generative Adversarial Networks
  - Create your own GANs:
    - https://www.oreilly.com/learning/generative-adversarial-networks-for-beginners
    - https://github.com/jonbruner/generative-adversarial-networks
  - GANs from MNIST: http://edwardlib.org/tutorials/gan
  - GANs and Capsule networks?
- 59. What is TensorFlow 2?
  - Support for Python, Java, C++
  - Desktop, Server, Mobile, Web
  - CPU/GPU/TPU support
  - Visualization via TensorBoard
  - Can be embedded in Python scripts
  - Installation: pip install tensorflow
  - Ex: pip install tensorflow==2.0.0-beta1
- 60. TensorFlow Use Cases (Generic)
  - Image recognition
  - Computer vision
  - Voice/sound recognition
  - Time series analysis
  - Language detection
  - Language translation
  - Text-based processing
  - Handwriting recognition
- 61. Major Changes in TF 2
  - TF 2 Eager Execution: default mode
  - @tf.function decorator: instead of tf.Session()
  - AutoGraph: graph generated in @tf.function
  - Generators: used with tf.data.Dataset (TF 2)
  - Differentiation: calculates gradients
  - tf.GradientTape: automatic differentiation
- 62. Removed from TensorFlow 2
  - tf.Session()
  - tf.placeholder()
  - tf.global_variables_initializer()
  - feed_dict
  - Variable scopes
  - tf.contrib code
  - tf.app, tf.flags, and tf.logging
  - Global variables
  - => TF 1.x functions moved to compat.v1
- 63. Deep Learning and Art/“Stuff”
  - “Convolutional Blending” images: => 19-layer Convolutional Neural Network www.deepart.io
  - Bots created their own language: https://www.recode.net/2017/3/23/14962182/ai-learning-language-open-ai-research
  - https://www.fastcodesign.com/90124942/this-google-engineer-taught-an-algorithm-to-make-train-footage-and-its-hypnotic
- 64. Upcoming Classes at UCSC
- 65. About Me: Recent Books 1) Angular and Machine Learning (2020) 2) Angular and Deep Learning (2020) 3) AI/ML/DL: Concepts and Code (2020) 4) Shell Programming in Bash (2020) 5) TensorFlow 2 Pocket Primer (2019) 6) TensorFlow 1.x Pocket Primer (2019) 7) Python for TensorFlow (2019) 8) C Programming Pocket Primer (2019) 9) RegEx Pocket Primer (2018) 10) Data Cleaning Pocket Primer (2018) 11) Angular Pocket Primer (2017) 12) Android Pocket Primer (2017) 13) CSS3 Pocket Primer (2016)
- 66. About Me: Less Recent Books 14) SVG Pocket Primer (2016) 15) Python Pocket Primer (2015) 16) D3 Pocket Primer (2015) 17) HTML5 Mobile Pocket Primer (2014) 18) jQuery, CSS3, and HTML5 (2013) 19) HTML5 Pocket Primer (2013) 20) jQuery Pocket Primer (2013) 21) HTML5 Canvas (2012) 22) Flash on Android (2011) 23) Web 2.0 Fundamentals (2010) 24) MS Silverlight Graphics (2008) 25) Fundamentals of SVG (2003) 26) Java Graphics Library (2002)