Introduction to Deep Learning
Vishwas Lele (@vlele)
Applied Information Sciences
Agenda
• Introduction to Deep Learning
  • From ML to Deep Learning
• Neural Network Basics
  • Dense, CNN, RNN
  • Key Concepts
• CNN Basics
• X-ray Image Analysis
  • Optimization (using pre-trained models)
  • Visualizing the learning
Demo
• Why do I need to learn neural networks when I have Cognitive and other APIs?
What is Deep Learning?
Deep Learning Examples
• Near-human-level image classification
• Near-human-level speech recognition
• Near-human-level handwriting transcription
• Digital assistants (Google Now, Cortana)
• Near-human-level autonomous driving
Why now? What about past AI winters?
• Hardware
• Datasets and benchmarks
• Algorithmic advances
What about other ML techniques?
• Probabilistic modeling (Naïve Bayes)
• Kernel Methods
• Support Vector Machines
• Decision Trees
• Random Forests
• Gradient Boosting Machines
“Deep” in deep learning
How can a network help us?
[Figure: sample handwritten digits 0–9]
Digit Classification
Network of layers
• Layers are like LEGO bricks of deep learning
• A data-processing module that takes as input one or more tensors and outputs one or more tensors
Tensor
shape (128, 256, 256, 1): a batch of 128 grayscale images, each 256×256 pixels with 1 channel
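A minimal NumPy sketch of such a tensor (assuming Keras's channels-last convention):

import numpy as np

# A batch of 128 grayscale images, each 256x256 pixels with 1 channel
batch = np.zeros((128, 256, 256, 1), dtype="float32")
print(batch.ndim)   # 4 -> a rank-4 tensor
print(batch.shape)  # (128, 256, 256, 1)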
How do layers transform data?
output = relu(dot(W, input) + b)
W - weight matrix
b - bias vector
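A minimal NumPy sketch of this transformation; the layer sizes are illustrative assumptions:

import numpy as np

def dense_layer(x, W, b):
    # output = relu(dot(W, input) + b)
    return np.maximum(np.dot(W, x) + b, 0.)

x = np.random.rand(784)        # e.g. a flattened 28x28 digit image
W = np.random.rand(512, 784)   # learned weight matrix
b = np.random.rand(512)        # learned bias vector
output = dense_layer(x, W, b)  # 512 activations, all >= 0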
Rectified Linear Unit (ReLU)
Power of parallelization
import numpy as np

def naive_relu(x):
    for i in range(x.shape[0]):        # loop over rows
        for j in range(x.shape[1]):    # loop over columns
            x[i, j] = max(x[i, j], 0)  # clamp negatives to zero
    return x

# Vectorized equivalent; NumPy's optimized routine exploits parallelism:
z = np.maximum(z, 0.)
Sigmoid function
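A minimal sketch; the sigmoid squashes any real value into (0, 1), which suits probability-like outputs:

import numpy as np

def sigmoid(x):
    # 1 / (1 + e^-x) maps (-inf, inf) onto (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(0.0))  # 0.5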
How does the network learn?
Training Loop
• Draw a batch of training samples x and corresponding targets y.
• Run the network on x to obtain predictions
• Compute the loss of the network on this batch
• Update all weights of the network in a way that slightly reduces the loss on this batch (sketched below)
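A sketch of one pass through this loop; batches, network, loss_fn, and the helper methods are hypothetical names, not Keras API:

for x_batch, y_batch in batches(x_train, y_train, batch_size=128):
    y_pred = network.forward(x_batch)          # run the network on x
    loss = loss_fn(y_pred, y_batch)            # compute the loss on this batch
    grads = network.backward(loss)             # gradients via backpropagation
    network.update_weights(grads, step=0.01)   # slightly reduce the loss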
Key Concepts
• Derivative
• Gradient
• Stochastic Gradient
Stochastic gradient descent
• Draw a batch of training samples x and corresponding targets y.
• Run the network on x to obtain predictions y_pred.
• Compute the loss of the network on the batch, a measure of the mismatch between y_pred and y.
Stochastic gradient descent
• Compute the gradient of the loss with regard to the network’s parameters (a backward pass).
• Move the parameters a little in the opposite direction from the gradient, for example W -= step * gradient, thus reducing the loss on the batch a bit (see the sketch below).
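A minimal NumPy sketch of the update step for one weight matrix; the gradient itself is assumed to come from the backward pass:

import numpy as np

step = 0.01                        # learning rate
W = np.random.rand(512, 784)       # current parameters (illustrative shape)
grad_W = np.random.rand(512, 784)  # placeholder for d(loss)/d(W)

W -= step * grad_W                 # move against the gradient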
Backpropagation
[Figure: sample handwritten digits 0–9]
Demo
• Digit Classification
• Keras Library
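A minimal sketch of the kind of model this demo builds, following the standard Keras MNIST example; the actual demo code may differ:

from tensorflow import keras
from tensorflow.keras import layers

(x_train, y_train), _ = keras.datasets.mnist.load_data()
x_train = x_train.reshape((60000, 28 * 28)).astype("float32") / 255

model = keras.Sequential([
    layers.Dense(512, activation="relu", input_shape=(28 * 28,)),
    layers.Dense(10, activation="softmax"),  # one probability per digit
])
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=128)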
Training and validation loss
Overfitting
Weight Regularization
• L1 regularization: the cost added is proportional to the absolute value of the weight coefficients
• L2 regularization: the cost added is proportional to the square of the value of the weight coefficients
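In Keras, both penalties attach per layer via kernel_regularizer (a sketch; the 0.001 factor and layer size are illustrative assumptions):

from tensorflow.keras import layers, regularizers

# L1: adds 0.001 * |w| to the loss for every weight coefficient w
layers.Dense(16, activation="relu",
             kernel_regularizer=regularizers.l1(0.001))

# L2: adds 0.001 * w**2 to the loss for every weight coefficient w
layers.Dense(16, activation="relu",
             kernel_regularizer=regularizers.l2(0.001))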
Dropout
• Randomly dropping out (setting to zero) a number of output features of the layer during training.
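In Keras, dropout is its own layer applied to the outputs of the layer before it (a sketch; the 0.5 rate is a common choice, not from the deck):

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(512, activation="relu", input_shape=(28 * 28,)),
    layers.Dropout(0.5),   # zero out half of these features during training
    layers.Dense(10, activation="softmax"),
])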
Feature Engineering
• The process of using your own knowledge to make the algorithm work better by applying hardcoded (nonlearned) transformations to the data before it goes into the model
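An illustrative sketch (a hypothetical example, not from the deck): to predict the time shown on a clock image, hand-code the hands' angles instead of feeding raw pixels:

import numpy as np

def engineered_features(hour_hand_xy, minute_hand_xy):
    # Hardcoded, nonlearned transformation: (x, y) tip coordinates -> angles
    hour_angle = np.arctan2(hour_hand_xy[1], hour_hand_xy[0])
    minute_angle = np.arctan2(minute_hand_xy[1], minute_hand_xy[0])
    return np.array([hour_angle, minute_angle])  # 2 features instead of pixels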
Feature engineering
How does the network learn?
Enter CNNs (Convolutional Neural Networks)
Convolution Operation
Spatial Hierarchy
How does convolution work?
Sliding
Padding
Stride
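A naive NumPy sketch tying the three ideas together: the kernel slides over the image, stride sets the step size, and padding adds a zero border (names and shapes are illustrative):

import numpy as np

def naive_conv2d(image, kernel, stride=1, padding=0):
    if padding:
        image = np.pad(image, padding)       # zero border around the input
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):                   # slide the window down...
        for j in range(out_w):               # ...and across the image
            patch = image[i * stride:i * stride + kh,
                          j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out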
Max-Pooling
• Extracting windows from the input feature maps and outputting the max value of each channel.
• Conceptually similar to convolution, except that instead of transforming local patches via a learned linear transformation, it applies a hardcoded max operation (sketched below).
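A naive sketch of 2×2 max-pooling over a single channel (assumes the input dimensions divide evenly):

import numpy as np

def naive_max_pool2d(feature_map, size=2):
    h, w = feature_map.shape
    out = np.zeros((h // size, w // size))
    for i in range(0, h, size):
        for j in range(0, w, size):
            # Hardcoded max over each window; nothing is learned
            out[i // size, j // size] = feature_map[i:i + size, j:j + size].max()
    return out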
Demo
• Digit Classification (using CNN)
• Keras Library
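A minimal sketch of a small convnet for the same task, mirroring the canonical Keras MNIST convnet; the demo's exact architecture may differ:

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])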
Demo
• X-ray Image Classification
• Dealing with overfitting
• Visualizing the learning
Visualizing the learning
Overfitting
With dropout regularization
With data augmentation
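A sketch of Keras data augmentation, which fights overfitting by generating plausible variants of the training images; the parameter values are illustrative:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=40,       # random rotations up to 40 degrees
    width_shift_range=0.2,   # random horizontal shifts
    height_shift_range=0.2,  # random vertical shifts
    zoom_range=0.2,          # random zooms
    horizontal_flip=True,    # random mirroring
)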
Demo
• Activation heatmap
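One standard way to produce such a heatmap is Grad-CAM: weight the last convolutional layer's channels by the gradient of the winning class score (a sketch; the model and layer name are assumptions):

import numpy as np
import tensorflow as tf
from tensorflow import keras

def activation_heatmap(model, image, last_conv_layer_name):
    # Map the input to (last conv activations, final predictions)
    grad_model = keras.Model(
        model.inputs,
        [model.get_layer(last_conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        top_score = preds[:, tf.argmax(preds[0])]
    grads = tape.gradient(top_score, conv_out)        # d(score)/d(activation)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))   # one weight per channel
    heatmap = tf.nn.relu(tf.reduce_sum(conv_out[0] * weights, axis=-1))
    return heatmap / (tf.reduce_max(heatmap) + 1e-8)  # normalize to [0, 1]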