
- 1. Introduction to Deep Learning Vishwas Lele (@vlele) Applied Information Sciences
- 2. Agenda • Introduction to Deep Learning • From ML to Deep Learning • Neural Network Basics • Dense, CNN, RNN • Key Concepts • CNN Basics • X-ray Image Analysis • Optimization (using pre-trained models) • Visualizing the learning
- 3. Demo • Why do I need to learn neural networks when I have Cognitive Services and other APIs?
- 4. What is Deep Learning?
- 5. Deep Learning Examples • Near-human-level image classification • Near-human-level speech recognition • Near-human-level handwriting transcription • Digital assistants Google Now, Cortana • Near-human-level autonomous driving
- 6. Why now? What about past AI winters? • Hardware • Datasets and benchmarks • Algorithmic advances
- 7. What about other ML techniques? • Probabilistic modeling (Naïve Bayes) • Kernel Methods • Support Vector Machines • Decision Trees • Random Forests • Gradient Boosting Machines
- 8. “Deep” in deep learning
- 9. How can a network help us? (handwritten digit images, 0 through 9)
- 10. Digit Classification
- 11. Network of layers • Layers are like LEGO bricks of deep learning • A data-processing module that takes as input one or more tensors and that outputs one or more tensors
- 12. Tensor shape (128, 256, 256, 1)
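One common reading of the shape on this slide (an interpretation; the deck does not spell out the axes) is (samples, height, width, channels): a batch of 128 grayscale 256x256 images, i.e. a rank-4 tensor. A NumPy sketch:

```python
import numpy as np

# Assumed axis order: (samples, height, width, channels)
# 128 single-channel (grayscale) images of 256x256 pixels
batch = np.zeros((128, 256, 256, 1))

print(batch.ndim)   # 4 axes, i.e. a rank-4 tensor
print(batch.shape)  # (128, 256, 256, 1)
```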
- 13. How do layers transform data? output = relu(dot(W, input) + b), where W is the weight matrix and b is the bias vector
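The slide's transform can be sketched directly in NumPy (the weight and bias values below are made up for illustration; in a trained network they are learned):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.)

def dense_forward(W, x, b):
    # The slide's formula: a linear map (dot product plus bias), then ReLU
    return relu(np.dot(W, x) + b)

W = np.array([[1.0, -1.0],
              [0.5,  2.0]])   # weights (learned in practice)
b = np.array([0.0, -3.0])     # bias (learned in practice)
x = np.array([2.0, 1.0])      # input vector

print(dense_forward(W, x, b))  # [1. 0.]
```

Note how the second output is clamped to zero by ReLU: dot(W, x) + b gives [1, 0] here, and any negative component would have been zeroed.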
- 14. Rectified Linear Unit (RELU)
- 15. Power of parallelization def naive_relu(x): for i in range(x.shape[0]): for j in range(x.shape[1]): x[i, j] = max(x[i, j], 0) return x — versus the vectorized one-liner z = np.maximum(z, 0.)
- 16. Sigmoid function
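For contrast with ReLU, the sigmoid squashes any real input into the interval (0, 1), which is why it is used for binary-classification outputs. A minimal sketch:

```python
import math

def sigmoid(x):
    # Maps any real number into (0, 1); sigmoid(0) sits exactly at 0.5
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0.0))    # 0.5
print(sigmoid(-10.0))  # very close to 0
print(sigmoid(10.0))   # very close to 1
```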
- 17. How does the network learn?
- 18. Training Loop • Draw a batch of training samples x and corresponding targets y. • Run the network on x • Compute the loss of the network • Update all weights of the network in a way that slightly reduces the loss on this batch.
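The four steps above can be sketched as a toy training loop (an illustration with a hand-derived gradient, not the deck's Keras demo): fit y = 2*x with a single weight w, using mini-batches of 4 samples and a mean-squared-error loss.

```python
import random

data = [(x, 2.0 * x) for x in range(10)]  # samples x with targets y = 2x
w, step = 0.0, 0.01

for epoch in range(50):
    random.shuffle(data)                  # draw batches in random order
    for i in range(0, len(data), 4):
        batch = data[i:i + 4]
        # run the "network" (w * x), compute the MSE loss gradient on the batch
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= step * grad                  # slightly reduce the loss on this batch
print(round(w, 2))  # close to 2.0
```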
- 19. Key Concepts • Derivative • Gradient • Stochastic Gradient
- 20. Stochastic gradient descent • Draw a batch of training samples x and corresponding targets y. • Run the network on x to obtain predictions y_pred. • Compute the loss of the network on the batch, a measure of the mismatch between y_pred and y.
- 21. Stochastic gradient descent • Compute the gradient of the loss with regard to the network’s parameters (a backward pass). • Move the parameters a little in the opposite direction from the gradient (for example, W -= step * gradient), thus reducing the loss on the batch a bit.
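The update rule on this slide can be seen in isolation on a toy loss L(w) = (w - 3)^2, whose gradient is dL/dw = 2(w - 3):

```python
w = 0.0
step = 0.1

for _ in range(100):
    gradient = 2 * (w - 3)   # analytic gradient of (w - 3)**2
    w -= step * gradient     # move opposite the gradient, as on the slide
print(round(w, 4))           # approaches the minimum at w = 3
```

Each step shrinks the distance to the minimum by a constant factor, so after 100 iterations w is numerically indistinguishable from 3.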
- 22. Backpropagation (handwritten digit images, 0 through 9)
- 23. Demo • Digit Classification • Keras Library
- 24. Training and validation loss
- 25. Overfitting
- 26. Weight Regularization • L1 regularization: the cost added is proportional to the absolute value of the weight coefficients • L2 regularization: the cost added is proportional to the square of the value of the weight coefficients
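On a hypothetical weight vector, the two penalties look like this (in practice each sum is scaled by a small coefficient before being added to the loss):

```python
weights = [0.5, -2.0, 1.5]  # made-up weight coefficients

l1_penalty = sum(abs(w) for w in weights)  # proportional to |w|
l2_penalty = sum(w * w for w in weights)   # proportional to w**2

print(l1_penalty, l2_penalty)  # 4.0 6.5
```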
- 27. Dropout • Randomly dropping out (setting to zero) a number of output features of the layer during training.
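A minimal NumPy sketch of training-time dropout (the 1/(1-rate) rescaling is the standard "inverted dropout" convention, which the slide does not spell out):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, rate=0.5):
    # Zero out a fraction `rate` of the features at random, then scale the
    # survivors by 1/(1 - rate) so the expected sum stays unchanged
    mask = rng.random(activations.shape) >= rate
    return activations * mask / (1.0 - rate)

x = np.ones((2, 4))
out = dropout(x, rate=0.5)
print(out)  # each entry is either 0.0 (dropped) or 2.0 (kept and rescaled)
```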
- 28. Feature Engineering • The process of using your own knowledge about the data to make the algorithm work better, by applying hardcoded (nonlearned) transformations to the data before it goes into the model
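A classic illustration (hypothetical helper, not from the deck): predicting the time from a clock image is hard on raw pixels, but easy if you hand-engineer the hand-tip coordinates into an angle feature.

```python
import math

def hand_angle(tip_x, tip_y, center_x=0.0, center_y=0.0):
    # Engineered feature: angle of a clock hand, measured clockwise
    # from the 12 o'clock direction, in degrees
    return math.degrees(math.atan2(tip_x - center_x, tip_y - center_y)) % 360

print(hand_angle(0.0, 1.0))  # 0.0  -> hand pointing at 12
print(hand_angle(1.0, 0.0))  # 90.0 -> hand pointing at 3
```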
- 29. Feature engineering
- 30. How does the network learn?
- 31. Enter CNNs (Convolutional Neural Networks)
- 32. Convolution Operation
- 33. Spatial Hierarchy
- 34. How does convolution work?
- 35. Sliding
- 36. Padding
- 37. Stride
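The sliding, padding, and stride slides combine into one standard formula for the spatial size of a convolution's output along each dimension:

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    # Number of positions a kernel of `kernel_size` can take while sliding
    # over a (padded) input, moving `stride` pixels at a time
    return (input_size + 2 * padding - kernel_size) // stride + 1

print(conv_output_size(28, 3))             # 26 ("valid": no padding shrinks the map)
print(conv_output_size(28, 3, padding=1))  # 28 ("same": padding preserves the size)
print(conv_output_size(28, 3, stride=2))   # 13 (striding downsamples)
```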
- 38. Max-Pooling • Extracting windows from the input feature maps and outputting the max value of each channel. • Conceptually similar to convolution, except that instead of transforming local patches via a learned linear transformation, patches are transformed via a hardcoded max operation.
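A 2x2 max-pooling with non-overlapping windows (the usual configuration) can be sketched in NumPy with a reshape trick:

```python
import numpy as np

def max_pool_2x2(x):
    # Split a (h, w) map into non-overlapping 2x2 windows and keep the
    # maximum of each window, halving both spatial dimensions
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 1, 2, 3],
              [4, 5, 6, 7]])
print(max_pool_2x2(x))  # [[4 8]
                        #  [9 7]]
```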
- 39. Demo • Digit Classification (using CNN) • Keras Library
- 40. Demo • X-ray Image Classification • Dealing with overfitting • Visualizing the learning
- 41. Visualizing the learning
- 42. Overfitting
- 43. With dropout regularization
- 44. With data augmentation
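A minimal sketch of one augmentation used for images, random horizontal flips (an illustration in NumPy, not the deck's actual Keras demo pipeline):

```python
import numpy as np

rng = np.random.default_rng(42)

def augment(batch):
    # batch shape: (samples, height, width, channels); each image is
    # mirrored left-to-right with probability 0.5
    out = batch.copy()
    for i in range(out.shape[0]):
        if rng.random() < 0.5:
            out[i] = out[i, :, ::-1, :]
    return out

batch = np.arange(2 * 4 * 4 * 1, dtype=float).reshape(2, 4, 4, 1)
flipped = augment(batch)
```

Flipping never changes which pixel values are present, only their arrangement, so the model sees "new" training samples at no labeling cost.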
- 45. Demo • Activation heatmap
