
Neural Networks and Deep Learning

Introduction to Neural Networks, Deep Learning, TensorFlow, and Keras.
For code see https://github.com/asimjalis/tensorflow-quickstart


  1. NEURAL NETWORKS AND DEEP LEARNING. ASIM JALIS, GALVANIZE
  2. INTRO
  3. ASIM JALIS Galvanize/Zipfian, Data Engineering. Cloudera, Microsoft, Salesforce. MS in Computer Science from the University of Virginia.
  4. GALVANIZE PROGRAMS Data Science Immersive: 12 weeks. Data Engineering Immersive: 12 weeks. Web Developer Immersive: 6 months. Galvanize U: 1 year.
  5. TALK OVERVIEW
  6. WHAT IS THIS TALK ABOUT? Using neural networks and deep learning to recognize images. By the end of the class you will be able to create your own deep learning systems.
  7. HOW MANY PEOPLE HERE HAVE USED NEURAL NETWORKS?
  8. HOW MANY PEOPLE HERE HAVE USED MACHINE LEARNING?
  9. HOW MANY PEOPLE HERE HAVE USED PYTHON?
  10. DEEP LEARNING
  11. WHAT IS MACHINE LEARNING? Self-driving cars, voice recognition, facial recognition.
  12. HISTORY OF DEEP LEARNING
  13. HISTORY OF MACHINE LEARNING (Input / Features / Algorithm / Output): Machine / Human / Human / Machine. Machine / Human / Machine / Machine. Machine / Machine / Machine / Machine.
  14. FEATURE EXTRACTION Traditionally data scientists had to define features. Deep learning systems are able to extract features themselves.
  15. DEEP LEARNING MILESTONES 1980s: Backpropagation invented, enabling multi-layer neural networks. 2000s: SVMs, Random Forests, and other classifiers overtook NNs. 2010s: Deep learning reignited interest in NNs.
  16. IMAGENET AlexNet, submitted to the ImageNet ILSVRC challenge in 2012, is partly responsible for the renaissance. Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton used deep learning techniques, combined with GPUs and some other techniques. The result was a neural network that could classify images of cats and dogs. It had an error rate of 16% compared to 26% for the runner-up.
  17. Ilya Sutskever, Alex Krizhevsky, Geoffrey Hinton
  18. INDEED.COM/SALARY
  19. MACHINE LEARNING
  20. MACHINE LEARNING AND DEEP LEARNING Deep Learning fits inside Machine Learning: Deep Learning is a Machine Learning technique. The two share techniques for evaluating and optimizing models.
  21. WHAT IS MACHINE LEARNING? Inputs: vectors or points of high dimension. Outputs: either binary or continuous vectors. Machine learning finds the relationship between them using statistical techniques.
  22. SUPERVISED VS UNSUPERVISED Supervised: data needs to be labeled. Unsupervised: data does not need to be labeled.
  23. TECHNIQUES Classification, regression, clustering, recommendations, anomaly detection.
  24. CLASSIFICATION EXAMPLE: EMAIL SPAM DETECTION
  25. CLASSIFICATION EXAMPLE: EMAIL SPAM DETECTION Start with a large collection of emails, labeled spam/not-spam. Convert the email text into vectors of 0s and 1s: 1 if a word occurs, 0 if it does not. These are called inputs or features. Split the data set into a training set (70%) and a test set (30%). Use an algorithm like Random Forest to build a model. Evaluate the model by running it on the test set and capturing the success rate.
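This workflow maps almost line for line onto scikit-learn. Here is a minimal sketch with a made-up toy data set (the four emails and labels are invented for illustration): CountVectorizer(binary=True) builds the 1/0 word-occurrence features, and the 70/30 split and Random Forest follow the slide.

    # Spam-detection workflow from the slide, sketched in scikit-learn.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.model_selection import train_test_split

    emails = ["win a free prize now", "meeting at noon tomorrow",
              "free money click here", "lunch with the team"]
    labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam

    # Features: 1 if a word occurs in the email, 0 if it does not
    X = CountVectorizer(binary=True).fit_transform(emails)

    # Training set (70%) and test set (30%)
    X_train, X_test, y_train, y_test = train_test_split(
        X, labels, test_size=0.3, random_state=0)

    # Build the model and evaluate it on the test set
    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
    print("success rate:", model.score(X_test, y_test))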
  26. CLASSIFICATION ALGORITHMS Neural Networks, Random Forest, Support Vector Machines (SVM), Decision Trees, Logistic Regression, Naive Bayes.
  27. CHOOSING AN ALGORITHM Evaluate different models on the data. Look at the relative success rates. Use rules of thumb: some algorithms work better on some kinds of data.
  28. CLASSIFICATION EXAMPLES Is this tumor benign or cancerous? Is this lead profitable or not? Who will win the presidential elections?
  29. CLASSIFICATION: POP QUIZ Is classification supervised or unsupervised learning? Supervised, because you have to label the data.
  30. CLUSTERING EXAMPLE: LOCATE CELL PHONE TOWERS Start with the GPS coordinates of all cell phone users. Represent the data as vectors. Locate towers in the biggest clusters.
  31. CLUSTERING EXAMPLE: T-SHIRTS What size should a t-shirt be? Everyone's real t-shirt size is different. Lay out all sizes and cluster. Target large clusters with XS, S, M, L, XL.
  32. CLUSTERING: POP QUIZ Is clustering supervised or unsupervised? Unsupervised, because no labeling is required.
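The cell-tower example is a natural fit for k-means. A minimal sketch with synthetic GPS coordinates (the three population centers and the cluster count are invented for illustration):

    # Locate cell towers at the centers of the biggest user clusters.
    import numpy as np
    from sklearn.cluster import KMeans

    # Synthetic user coordinates scattered around three population centers
    rng = np.random.RandomState(0)
    centers = [(37.77, -122.42), (37.34, -121.89), (37.87, -122.27)]
    users = np.vstack([rng.normal(c, 0.05, size=(100, 2)) for c in centers])

    # Cluster the points; each cluster center is a candidate tower site
    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(users)
    print("tower locations:\n", kmeans.cluster_centers_)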
  33. RECOMMENDATIONS EXAMPLE: AMAZON The model looks at user ratings of books. Viewing a book triggers an implicit rating. Recommend new books to the user.
  34. RECOMMENDATIONS: POP QUIZ Are recommendation systems supervised or unsupervised? Unsupervised.
  35. REGRESSION Like classification, except the output is continuous instead of one of k choices.
  36. REGRESSION EXAMPLES How many units of product will sell next month? What will a student score on the SAT? What is the market price of this house? How long before this engine needs repair?
  37. REGRESSION EXAMPLE: AIRCRAFT PART FAILURE Cessna collects data from airplane sensors. Predict when a part needs to be replaced. Ship the part to the customer's service airport.
  38. REGRESSION: QUIZ Is regression supervised or unsupervised? Supervised.
  39. ANOMALY DETECTION EXAMPLE: CREDIT CARD FRAUD Train a model on good transactions. Anomalous activity indicates fraud. Can pass the transaction to a human for investigation.
  40. ANOMALY DETECTION EXAMPLE: NETWORK INTRUSION Train a model on network login activity. Anomalous activity indicates a threat. Can initiate alerts and lockdown procedures.
  41. ANOMALY DETECTION: QUIZ Is anomaly detection supervised or unsupervised? Unsupervised, because we only train on normal data.
  42. FEATURE EXTRACTION Converting data to feature vectors: Natural Language Processing, Principal Component Analysis, Auto-Encoders.
  43. FEATURE EXTRACTION: QUIZ Is feature extraction supervised or unsupervised? Unsupervised.
  44. MACHINE LEARNING WORKFLOW
  45. DEEP LEARNING USED FOR Feature extraction, classification, regression.
  46. HISTORY OF MACHINE LEARNING (Input / Features / Algorithm / Output): Machine / Human / Human / Machine. Machine / Human / Machine / Machine. Machine / Machine / Machine / Machine.
  47. DEEP LEARNING FRAMEWORKS
  48. DEEP LEARNING FRAMEWORKS TensorFlow: NN library from Google. Theano: low-level GPU-enabled tensor library. Torch7: NN library that uses Lua as its binding language, used by Facebook and Google. Caffe: NN library from Berkeley (BVLC). Nervana: fast GPU-based machines optimized for deep learning.
  49. DEEP LEARNING FRAMEWORKS Keras, Lasagne, Blocks: NN libraries that make Theano easier to use. CUDA: programming model for using GPUs in general-purpose programming. cuDNN: NN library by Nvidia based on CUDA; can be used with Torch7 and Caffe. Chainer: NN library that uses CUDA.
  50. DEEP LEARNING PROGRAMMING LANGUAGES All the frameworks support Python, except Torch7, which uses Lua as its binding language.
  51. TENSORFLOW TensorFlow was originally developed by the Google Brain Team. It allows using GPUs for deep learning algorithms. The single-processor version was released in 2015; the multiple-processor version was released in March 2016.
  52. KERAS Supports Theano and TensorFlow as back-ends. Provides a deep learning API on top of TensorFlow, which itself provides low-level matrix operations.
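To make that concrete, here is a minimal sketch of the Keras API; the layer sizes, activations, and optimizer are arbitrary choices for illustration, and a recent Keras with the TensorFlow backend is assumed.

    # A tiny feedforward network in Keras: a few layer objects stand in
    # for the low-level matrix operations TensorFlow runs underneath.
    from keras.models import Sequential
    from keras.layers import Dense, Input

    model = Sequential([
        Input(shape=(100,)),             # 100-dimensional input vectors
        Dense(64, activation='relu'),    # hidden layer
        Dense(1, activation='sigmoid'),  # binary output
    ])
    model.compile(optimizer='sgd', loss='binary_crossentropy',
                  metrics=['accuracy'])
    model.summary()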
  53. TENSORFLOW: GEOFFREY HINTON, JEFF DEAN
  54. KERAS: FRANCOIS CHOLLET
  55. NEURAL NETWORKS
  56. WHAT IS A NEURON? Receives signals on its synapses. When triggered, sends a signal on its axon.
  57. MATHEMATICAL NEURON A mathematical abstraction, inspired by the biological neuron. Either on or off based on the sum of its inputs.
  58. MATHEMATICAL FUNCTION A neuron is a mathematical function. It adds up its (weighted) inputs and applies a sigmoid (or other function). This determines whether it fires or not.
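In code a neuron is one line of arithmetic. A NumPy sketch (the example inputs, weights, and bias are arbitrary):

    # A mathematical neuron: weighted sum of inputs plus bias,
    # squashed by a sigmoid into a firing strength between 0 and 1.
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def neuron(x, w, b):
        return sigmoid(np.dot(w, x) + b)

    x = np.array([0.5, 0.1, 0.9])   # incoming signals
    w = np.array([0.4, -0.2, 0.7])  # synapse weights
    b = -0.3                        # bias
    print(neuron(x, w, b))          # ~0.62: the neuron fires moderately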
  59. WHAT ARE NEURAL NETWORKS? A biologically inspired machine learning algorithm. Mathematical neurons are arranged in layers. They accumulate signals from the previous layer and fire when the signal reaches a threshold.
  60. NEURAL NETWORKS
  61. NEURON INCOMING Each neuron receives signals from neurons in the previous layer. Each signal is affected by a weight: some are more important than others. The bias is the base signal that the neuron receives.
  62. NEURON OUTGOING Each neuron sends its signal to the neurons in the next layer. The signals are affected by weights.
  63. LAYERED NETWORK Each layer looks at features identified by the previous layer.
  64. US ELECTIONS
  65. ELECTIONS Consider the elections. This is a gated system: a way to aggregate different views.
  66. HIGHEST LEVEL: STATES
  67. NEXT LEVEL: COUNTIES
  68. ELECTIONS Is this a neural network? How many layers does it have?
  69. NEURON LAYERS The nomination is the last layer, layer N. States are layer N-1, counties are layer N-2, districts are layer N-3, and individuals are layer N-4. Individual brains have even more layers.
  70. GRADIENT DESCENT
  71. TRAINING: HOW DO WE IMPROVE? Calculate the error from the desired goal. Increase the weights of neurons that voted right; decrease the weights of neurons that voted wrong. This will reduce the error.
  72. GRADIENT DESCENT This algorithm is called gradient descent. Think of the error as a function of the weights.
  73. FEED FORWARD Also called forward propagation or forward prop. Initialize the inputs, calculate the activation of each layer, then calculate the activation of the output layer.
  74. BACK PROPAGATION Use forward prop to calculate the error. The error is a function of all the network weights. Adjust the weights using gradient descent, then repeat with the next record. Keep going over the training set until convergence.
  75. HOW DO YOU FIND THE MINIMUM IN AN N-DIMENSIONAL SPACE? Take a step in the steepest downhill direction. The steepest direction is the gradient: the vector of partial derivatives of the error with respect to each weight.
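A toy illustration of the descent itself, stripped down to one weight and a quadratic error surface (both invented for the example). The derivative points uphill, so stepping against it walks the weight toward the minimum:

    # Gradient descent on the toy error surface E(w) = (w - 3)^2.
    w = 0.0
    learning_rate = 0.1
    for step in range(50):
        gradient = 2 * (w - 3)         # dE/dw
        w -= learning_rate * gradient  # step against the gradient
    print(w)  # close to the minimum at w = 3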
  76. PUTTING ALL THIS TOGETHER Use forward prop to activate, use back prop to train, then use forward prop to test.
  77. TYPES OF NEURONS
  78. SIGMOID
  79. TANH
  80. RELU
  81. BENEFITS OF RELU Popular. Accelerates convergence by 6x (Krizhevsky et al). The operation is faster since it is linear, not exponential. Pro: sparse activations. Con: neurons can die by going to zero.
  82. LEAKY RELU Pro: does not die. Con: activations are not sparse.
  83. SOFTMAX The final layer of a network used for classification. It turns the output into a probability distribution by normalizing the outputs of the neurons to sum to 1.
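The neuron types above, plus softmax, fit in a few lines of NumPy. A sketch:

    import numpy as np

    def sigmoid(z):                  # squashes to (0, 1)
        return 1.0 / (1.0 + np.exp(-z))

    def tanh(z):                     # squashes to (-1, 1), zero-centered
        return np.tanh(z)

    def relu(z):                     # linear above 0, zero (dead) below
        return np.maximum(0.0, z)

    def leaky_relu(z, alpha=0.01):   # small slope below 0, never dies
        return np.where(z > 0, z, alpha * z)

    def softmax(z):                  # normalizes outputs to sum to 1
        e = np.exp(z - np.max(z))    # shift for numerical stability
        return e / e.sum()

    print(softmax(np.array([2.0, 1.0, 0.1])))  # a probability distribution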
  84. HYPERPARAMETER TUNING
  85. PROBLEM: OIL EXPLORATION Drilling holes is expensive. We want to find the biggest oilfield without wasting money on duds. Where should we place our next derrick?
  86. PROBLEM: NEURAL NETWORKS Testing hyperparameters is expensive. We have an N-dimensional grid of parameters. How can we quickly zero in on the best combination of hyperparameters?
  87. HYPERPARAMETER EXAMPLES How many layers should we have? How many neurons should we have in the hidden layers? Should we use sigmoid, tanh, or ReLU? How should we initialize the weights?
  88. ALGORITHMS Grid search, random search, Bayesian Optimization.
  89. GRID Systematically search the entire grid. Remember the best found so far.
  90. RANDOM Randomly search the grid. Remember the best found so far. By Bergstra and Bengio's result, with Alice Zheng's explanation (see References), 60 random samples get you within the top 5% of grid search with 95% probability.
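A sketch of random search over a small hyperparameter grid. The search space and the evaluate() placeholder are invented for illustration; in practice evaluate() would train a network with those hyperparameters and return its test-set score.

    # Random search: sample 60 points, keep the best found so far.
    import random

    def evaluate(params):  # placeholder for a full train-and-score run
        return random.random()

    space = {"layers": [1, 2, 3],
             "neurons": [32, 64, 128, 256],
             "activation": ["sigmoid", "tanh", "relu"]}

    best_score, best_params = -1.0, None
    for _ in range(60):  # 60 samples, per Bergstra and Bengio
        params = {k: random.choice(v) for k, v in space.items()}
        score = evaluate(params)
        if score > best_score:
            best_score, best_params = score, params
    print(best_params, best_score)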
  91. BAYESIAN OPTIMIZATION Balance between explore and exploit. Exploit: test spots within the explored perimeter. Explore: test new spots in random locations. Balance the trade-off.
  92. SIGOPT A YC-backed SF startup founded by Scott Clark. Raised $2M. Sells a cloud-based proprietary variant of Bayesian Optimization.
  93. BAYESIAN OPTIMIZATION PRIMER Bayesian Optimization Primer by Ian Dewancker, Michael McCourt, and Scott Clark. See References.
  94. OPEN SOURCE VARIANTS Open source alternatives: Spearmint, Hyperopt, SMAC, MOE.
  95. PRODUCTION
  96. DEPLOYING Two phases: training and deployment. The training phase runs on back-end servers, where the hyperparameters are optimized. The model is then deployed to front-end servers, browsers, and devices. The front-end only uses forward prop and is fast.
  97. SERIALIZING/DESERIALIZING THE MODEL Back-end: serialize model + weights. Front-end: deserialize model + weights.
  98. HDF5 Keras serializes the model architecture to JSON and the weights to HDF5, a serialization format for hierarchical data with APIs for C++, Python, Java, etc. https://www.hdfgroup.org
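A sketch of the back-end/front-end hand-off using the JSON + HDF5 API described on the slide. The tiny untrained model is a stand-in for a real trained one, and the method names reflect the Keras-2-era API; newer Keras versions also offer a single model.save().

    import numpy as np
    from keras.models import Sequential, model_from_json
    from keras.layers import Dense, Input

    # Back-end: build (and normally train) the model, then serialize it
    model = Sequential([Input(shape=(4,)),
                        Dense(8, activation='relu'),
                        Dense(1, activation='sigmoid')])
    with open("model.json", "w") as f:
        f.write(model.to_json())            # architecture as JSON
    model.save_weights("model.weights.h5")  # weights as HDF5

    # Front-end: deserialize and use forward prop only
    with open("model.json") as f:
        deployed = model_from_json(f.read())
    deployed.load_weights("model.weights.h5")
    print(deployed.predict(np.zeros((1, 4))))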
  99. DEPLOYMENT EXAMPLE: CANCER DETECTION Rhobota.com's cancer-detecting iPhone app, developed by Bryan Shaw after his son's illness. The model is built on the back-end and deployed on the iPhone, which detects retinal cancer.
  100. DEEP LEARNING
  101. WHAT IS DEEP LEARNING? Deep Learning is a learning method that can train systems with more than 2 or 3 non-linear hidden layers.
  102. WHAT IS DEEP LEARNING? Machine learning techniques that enable unsupervised feature learning and pattern analysis/classification. The essence of deep learning is to compute representations of the data, with higher-level features defined from lower-level ones.
  103. HOW IS DEEP LEARNING DIFFERENT FROM REGULAR NEURAL NETWORKS? Training neural networks requires applying gradient descent on millions of dimensions, which is intractable for large networks. Deep learning places constraints on neural networks, which allows them to be solved iteratively. The constraints are generic.
  104. AUTO-ENCODERS
  105. WHAT ARE AUTO-ENCODERS? An auto-encoder is a learning algorithm. It applies backpropagation and sets the target values to be equal to its inputs. In other words, it trains itself to do the identity transformation.
  106. WHY DOES IT DO THIS? The auto-encoder places constraints on itself, e.g. it restricts the number of hidden neurons. This forces it to find a good representation of the data.
  107. IS THE AUTO-ENCODER SUPERVISED OR UNSUPERVISED? It is unsupervised: the data is unlabeled.
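A sketch of those slides in Keras. The 784/32 sizes are an assumption (flattened 28x28 images with a 32-neuron bottleneck) and the random array stands in for real data; the key line is fit(x, x), where the targets are the inputs themselves.

    import numpy as np
    from keras.models import Sequential
    from keras.layers import Dense, Input

    autoencoder = Sequential([
        Input(shape=(784,)),
        Dense(32, activation='relu'),      # bottleneck: the representation
        Dense(784, activation='sigmoid'),  # reconstruction of the input
    ])
    autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

    x = np.random.rand(100, 784)  # stand-in for flattened image data
    autoencoder.fit(x, x, epochs=1, verbose=0)  # targets == inputs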
  108. WHAT ARE CONVOLUTIONAL NEURAL NETWORKS? Feedforward neural networks whose connection pattern is inspired by the visual cortex.
  109. CONVOLUTIONAL NEURAL NETWORKS
  110. CNNS The convolutional layer's parameters are a set of learnable filters. Every filter is small along width and height. During the forward pass, each filter slides across the width and height of the input, producing a 2-dimensional activation map. As we slide across the input we compute the dot product between the filter and the input.
  111. CNNS Intuitively, the network learns filters that activate when they see a specific type of feature anywhere in the input. In this way it creates translation invariance.
  112. CONVNET EXAMPLE Zero-padding: the boundaries are padded with zeros. Stride: how much the filter moves in the convolution. Parameter sharing: a filter uses the same weights at every position in the input.
  113. CONVNET EXAMPLE From http://cs231n.github.io/convolutional-networks/
  114. WHAT IS A POOLING LAYER? The pooling layer reduces the resolution of the image further. It tiles the output area with a 2x2 mask and takes the maximum activation value of each tile.
  115. REVIEW keras/examples/mnist_cnn.py recognizes hand-written digits by combining different layers, as sketched below.
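A condensed sketch in the spirit of that example; the layer sizes follow common MNIST convnet choices rather than the exact file, and data loading and training are omitted.

    from keras.models import Sequential
    from keras.layers import Conv2D, Dense, Flatten, Input, MaxPooling2D

    model = Sequential([
        Input(shape=(28, 28, 1)),               # 28x28 grayscale digits
        Conv2D(32, (3, 3), activation='relu'),  # 32 learnable 3x3 filters
        Conv2D(64, (3, 3), activation='relu'),
        MaxPooling2D(pool_size=(2, 2)),         # 2x2 max-pooling mask
        Flatten(),
        Dense(128, activation='relu'),
        Dense(10, activation='softmax'),        # probability per digit
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])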
  116. RECURRENT NEURAL NETWORKS
  117. RNNS RNNs capture patterns in time-series data. They are constrained by weights shared across time steps; each step observes the input at a different time.
  118. LSTMS Long Short-Term Memory networks. Plain RNNs cannot handle long time lags between events; LSTMs can pick up patterns separated by big lags. Used for speech recognition.
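A sketch of an LSTM sequence classifier in Keras; the sequence length, feature count, and layer sizes are arbitrary choices for illustration.

    from keras.models import Sequential
    from keras.layers import LSTM, Dense, Input

    model = Sequential([
        Input(shape=(50, 16)),           # 50 time steps, 16 features each
        LSTM(32),                        # weights shared across time steps
        Dense(1, activation='sigmoid'),  # one binary label per sequence
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy')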
  119. RNN EFFECTIVENESS Andrej Karpathy uses LSTMs to generate text: Shakespeare, Linux kernel code, mathematical proofs. See http://karpathy.github.io/
  120. RNN INTERNALS
  121. LSTM INTERNALS
  122. CONCLUSION
  123. REFERENCES Bayesian Optimization Primer by Dewancker et al, http://sigopt.com. Random Search for Hyper-Parameter Optimization by Bergstra and Bengio, http://jmlr.org. Evaluating Machine Learning Models by Alice Zheng, http://www.oreilly.com.
  124. REFERENCES Dropout by Hinton et al, http://cs.utoronto.edu. Understanding LSTM Networks by Chris Olah, http://colah.github.io. Multi-scale Deep Learning for Gesture Detection and Localization by Neverova et al, http://uoguelph.ca. The Unreasonable Effectiveness of Recurrent Neural Networks by Karpathy, http://karpathy.github.io.
  125. QUESTIONS
