For code see https://github.com/asimjalis/tensorflow-quickstart


- 1. NEURAL NETWORKS AND DEEP LEARNING ASIM JALIS GALVANIZE
- 2. INTRO
- 3. ASIM JALIS Galvanize/Zipfian, Data Engineering Cloudera, Microsoft, Salesforce MS in Computer Science from University of Virginia
- 4. GALVANIZE PROGRAMS Data Science Immersive (12 weeks), Data Engineering Immersive (12 weeks), Web Developer Immersive (6 months), Galvanize U (1 year)
- 5. TALK OVERVIEW
- 6. WHAT IS THIS TALK ABOUT? Using Neural Networks and Deep Learning To recognize images By the end of the class you will be able to create your own deep learning systems
- 7. HOW MANY PEOPLE HERE HAVE USED NEURAL NETWORKS?
- 8. HOW MANY PEOPLE HERE HAVE USED MACHINE LEARNING?
- 9. HOW MANY PEOPLE HERE HAVE USED PYTHON?
- 10. DEEP LEARNING
- 11. WHAT IS MACHINE LEARNING? Self-driving cars, voice recognition, facial recognition
- 12. HISTORY OF DEEP LEARNING
- 13. HISTORY OF MACHINE LEARNING (Input, Features, Algorithm, Output): (Machine, Human, Human, Machine) → (Machine, Human, Machine, Machine) → (Machine, Machine, Machine, Machine)
- 14. FEATURE EXTRACTION Traditionally data scientists had to define features by hand Deep learning systems are able to extract features themselves
- 15. DEEP LEARNING MILESTONES 1980s: Backpropagation invented, allowing multi-layer Neural Networks. 2000s: SVMs, Random Forests, and other classifiers overtook NNs. 2010s: Deep Learning reignited interest in NNs.
- 16. IMAGENET AlexNet, submitted to the ImageNet ILSVRC challenge in 2012, is partly responsible for the renaissance. Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton used Deep Learning techniques, combined with GPUs and some other techniques. The result was a neural network that could classify images of cats and dogs. It had an error rate of 16% compared to 26% for the runner-up.
- 17. Ilya Sutskever, Alex Krizhevsky, Geoffrey Hinton
- 18. INDEED.COM/SALARY
- 19. MACHINE LEARNING
- 20. MACHINE LEARNING AND DEEP LEARNING Deep Learning fits inside Machine Learning Deep Learning is a Machine Learning technique The two share techniques for evaluating and optimizing models
- 21. WHAT IS MACHINE LEARNING? Inputs: Vectors or points of high dimensions Outputs: Either binary vectors or continuous vectors Machine Learning finds the relationship between them Uses statistical techniques
- 22. SUPERVISED VS UNSUPERVISED Supervised: Data needs to be labeled Unsupervised: Data does not need to be labeled
- 23. TECHNIQUES Classification Regression Clustering Recommendations Anomaly detection
- 24. CLASSIFICATION EXAMPLE: EMAIL SPAM DETECTION
- 25. CLASSIFICATION EXAMPLE: EMAIL SPAM DETECTION Start with a large collection of emails, labeled spam/not-spam Convert email text into vectors of 0s and 1s: 1 if a word occurs, 0 if it does not These are called inputs or features Split the data set into a training set (70%) and a test set (30%) Use an algorithm like Random Forest to build a model Evaluate the model by running it on the test set and capturing the success rate
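A minimal sketch of this workflow, assuming scikit-learn (the deck does not name a library for this step); the emails and labels below are hypothetical placeholders:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical labeled data: email texts with spam (1) / not-spam (0) labels.
emails = ["win a free prize now", "meeting agenda attached",
          "cheap pills online", "quarterly report draft"]
labels = [1, 0, 1, 0]

# Convert text into binary word-occurrence vectors: 1 if a word occurs, 0 if not.
features = CountVectorizer(binary=True).fit_transform(emails)

# Split into a 70% training set and a 30% test set.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.3, random_state=0)

# Build a Random Forest model and evaluate its success rate on the test set.
model = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
print("Accuracy:", model.score(X_test, y_test))
```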
- 26. CLASSIFICATION ALGORITHMS Neural Networks Random Forest Support Vector Machines (SVM) Decision Trees Logistic Regression Naive Bayes
- 27. CHOOSING ALGORITHM Evaluate different models on data Look at the relative success rates Use rules of thumb: some algorithms work better on some kinds of data
- 28. CLASSIFICATION EXAMPLES Is this tumor benign or cancerous? Is this lead profitable or not? Who will win the presidential elections?
- 29. CLASSIFICATION: POP QUIZ Is classification supervised or unsupervised learning? Supervised because you have to label the data.
- 30. CLUSTERING EXAMPLE: LOCATE CELL PHONE TOWERS Start with GPS coordinates of all cell phone users Represent data as vectors Locate towers in biggest clusters
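A minimal sketch of this idea, assuming k-means from scikit-learn (the slide does not name a clustering algorithm); the coordinates are made up:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical GPS coordinates (latitude, longitude) of cell phone users.
coords = np.array([[37.77, -122.42], [37.78, -122.41], [37.76, -122.43],
                   [34.05, -118.24], [34.06, -118.25]])

# Cluster users into 2 groups; each cluster center is a candidate tower site.
kmeans = KMeans(n_clusters=2, random_state=0).fit(coords)
print("Tower locations:", kmeans.cluster_centers_)
```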
- 31. CLUSTERING EXAMPLE: T-SHIRTS What size should a t-shirt be? Everyone's real t-shirt size is different Lay out all sizes and cluster Target large clusters with XS, S, M, L, XL
- 32. CLUSTERING: POP QUIZ Is clustering supervised or unsupervised? Unsupervised because no labeling is required
- 33. RECOMMENDATIONS EXAMPLE: AMAZON Model looks at user ratings of books Viewing a book triggers an implicit rating Recommend new books to the user
- 34. RECOMMENDATION: POP QUIZ Are recommendation systems supervised or unsupervised? Unsupervised
- 35. REGRESSION Like classification Output is continuous instead of one from k choices
- 36. REGRESSION EXAMPLES How many units of product will sell next month What will student score on SAT What is the market price of this house How long before this engine needs repair
- 37. REGRESSION EXAMPLE: AIRCRAFT PART FAILURE Cessna collects data from airplane sensors Predict when part needs to be replaced Ship part to customer’s service airport
- 38. REGRESSION: QUIZ Is regression supervised or unsupervised? Supervised
- 39. ANOMALY DETECTION EXAMPLE: CREDIT CARD FRAUD Train model on good transactions Anomalous activity indicates fraud Can pass transaction down to human for investigation
- 40. ANOMALY DETECTION EXAMPLE: NETWORK INTRUSION Train model on network login activity Anomalous activity indicates threat Can initiate alerts and lockdown procedures
- 41. ANOMALY DETECTION: QUIZ Is anomaly detection supervised or unsupervised? Unsupervised because we only train on normal data
- 42. FEATURE EXTRACTION Converting data to feature vectors Natural Language Processing Principal Component Analysis Auto-Encoders
- 43. FEATURE EXTRACTION: QUIZ Is feature extraction supervised or unsupervised? Unsupervised
- 44. MACHINE LEARNING WORKFLOW
- 45. DEEP LEARNING USED FOR Feature Extraction Classification Regression
- 46. HISTORY OF MACHINE LEARNING (Input, Features, Algorithm, Output): (Machine, Human, Human, Machine) → (Machine, Human, Machine, Machine) → (Machine, Machine, Machine, Machine)
- 47. DEEP LEARNING FRAMEWORKS
- 48. DEEP LEARNING FRAMEWORKS TensorFlow: NN library from Google Theano: Low-level GPU-enabled tensor library Torch7: NN library, uses Lua for binding, used by Facebook and Google Caffe: NN library from Berkeley (BVLC) Nervana: Fast GPU-based machines optimized for deep learning
- 49. DEEP LEARNING FRAMEWORKS Keras, Lasagne, Blocks: NN libraries that make Theano easier to use CUDA: Programming model for using GPUs in general-purpose programming cuDNN: NN library by Nvidia based on CUDA, can be used with Torch7, Caffe Chainer: NN library that uses CUDA
- 50. DEEP LEARNING PROGRAMMING LANGUAGES All the frameworks support Python Except Torch7 which uses Lua for its binding language
- 51. TENSORFLOW TensorFlow originally developed by Google Brain Team Allows using GPUs for deep learning algorithms Single processor version released in 2015 Multiple processor version released in March 2016
- 52. KERAS Supports Theano and TensorFlow as back-ends Provides a deep learning API on top of TensorFlow TensorFlow provides low-level matrix operations
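A minimal sketch of the Keras API on a TensorFlow back-end; the layer sizes here are arbitrary assumptions, not from the talk:

```python
from keras.models import Sequential
from keras.layers import Dense

# A small fully connected network: 100-dimensional input, one hidden
# layer, and a 10-way softmax output.
model = Sequential()
model.add(Dense(64, activation='relu', input_dim=100))
model.add(Dense(10, activation='softmax'))

# Keras compiles this down to low-level TensorFlow (or Theano) operations.
model.compile(optimizer='sgd', loss='categorical_crossentropy',
              metrics=['accuracy'])
```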
- 53. TENSORFLOW: GEOFFREY HINTON, JEFF DEAN
- 54. KERAS: FRANCOIS CHOLLET
- 55. NEURAL NETWORKS
- 56. WHAT IS A NEURON? Receives signals on synapses When triggered, sends a signal on its axon
- 57. MATHEMATICAL NEURON Mathematical abstraction, inspired by the biological neuron Either on or off based on the sum of its inputs
- 58. MATHEMATICAL FUNCTION Neuron is a mathematical function Adds up (weighted) inputs and applies sigmoid (or other function) This determines if it fires or not
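A minimal sketch of such a neuron in NumPy; the inputs, weights, and bias below are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs plus a bias, passed through the sigmoid.
    return sigmoid(np.dot(weights, inputs) + bias)

# Made-up example: three inputs with their weights and a bias.
print(neuron(np.array([0.5, 0.1, 0.9]), np.array([0.4, -0.2, 0.7]), -0.3))
```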
- 59. WHAT ARE NEURAL NETWORKS? Biologically inspired machine learning algorithm Mathematical neurons arranged in layers Accumulate signals from the previous layer Fire when signal reaches threshold
- 60. NEURAL NETWORKS
- 61. NEURON INCOMING Each neuron receives signals from neurons in the previous layer Each signal is affected by a weight Some are more important than others Bias is the base signal that the neuron receives
- 62. NEURON OUTGOING Each neuron sends its signal to the neurons in the next layer Signals are affected by weights
- 63. LAYERED NETWORK Each layer looks at features identified by previous layer
- 64. US ELECTIONS
- 65. ELECTIONS Consider the elections This is a gated system A way to aggregate different views
- 66. HIGHEST LEVEL: STATES
- 67. NEXT LEVEL: COUNTIES
- 68. ELECTIONS Is this a Neural Network? How many layers does it have?
- 69. NEURON LAYERS The nomination is the last layer, layer N States are layer N-1 Counties are layer N-2 Districts are layer N-3 Individuals are layer N-4 Individual brains have even more layers
- 70. GRADIENT DESCENT
- 71. TRAINING: HOW DO WE IMPROVE? Calculate the error from the desired goal Increase the weight of neurons that voted right Decrease the weight of neurons that voted wrong This will reduce the error
- 72. GRADIENT DESCENT This algorithm is called gradient descent Think of error as function of weights
- 73. FEED FORWARD Also called forward propagation or forward prop Initialize inputs Calculate activation of each layer Calculate activation of output layer
- 74. BACK PROPAGATION Use forward prop to calculate the error Error is function of all network weights Adjust weights using gradient descent Repeat with next record Keep going over training set until convergence
- 75. HOW DO YOU FIND THE MINIMUM IN AN N-DIMENSIONAL SPACE? Take a step in the steepest downhill direction. The steepest direction is the gradient: the vector of the partial derivatives with respect to each weight.
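A minimal sketch of gradient descent on a one-dimensional error function; the function, starting point, and learning rate are illustrative assumptions:

```python
# Minimize error(w) = (w - 3)^2, whose derivative is 2 * (w - 3).
def grad(w):
    return 2.0 * (w - 3.0)

w = 0.0              # initial weight
learning_rate = 0.1  # step size
for _ in range(100):
    # Step in the direction of steepest descent (the negative gradient).
    w -= learning_rate * grad(w)
print(w)  # converges toward the minimum at w = 3
```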
- 76. PUTTING ALL THIS TOGETHER Use forward prop to activate Use back prop to train Then use forward prop to test
- 77. TYPES OF NEURONS
- 78. SIGMOID
- 79. TANH
- 80. RELU
- 81. BENEFITS OF RELU Popular Accelerates convergence by 6x (Krizhevsky et al) Operation is faster since it is linear, not exponential Pro: Sparse activation matrix Con: Neurons can die by going to zero
- 82. LEAKY RELU Pro: Does not die Con: Matrix is not sparse
- 83. SOFTMAX Final layer of network used for classification Turns output into probability distribution Normalizes output of neurons to sum to 1
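Minimal NumPy sketches of these neuron types:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))      # squashes to (0, 1)

def tanh(z):
    return np.tanh(z)                     # squashes to (-1, 1)

def relu(z):
    return np.maximum(0.0, z)             # zero for negative input

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)  # small slope instead of dying

def softmax(z):
    e = np.exp(z - np.max(z))             # shift for numerical stability
    return e / e.sum()                    # outputs sum to 1

print(softmax(np.array([1.0, 2.0, 3.0])))  # a probability distribution
```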
- 84. HYPERPARAMETER TUNING
- 85. PROBLEM: OIL EXPLORATION Drilling holes is expensive We want to find the biggest oilfield without wasting money on duds Where should we plant our next oilfield derrick?
- 86. PROBLEM: NEURAL NETWORKS Testing hyperparameters is expensive We have an N-dimensional grid of parameters How can we quickly zero in on the best combination of hyperparameters?
- 87. HYPERPARAMETER EXAMPLE How many layers should we have? How many neurons should we have in the hidden layers? Should we use Sigmoid, Tanh, or ReLU? How should we initialize the weights?
- 88. ALGORITHMS Grid Random Bayesian Optimization
- 89. GRID Systematically search entire grid Remember best found so far
- 90. RANDOM Randomly search the grid Remember the best found so far Bergstra and Bengio's result and Alice Zheng's explanation (see References): 60 random samples get you within the top 5% of grid search with 95% probability
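A minimal sketch of random search; the grid and the scoring function are placeholders (in practice the score would come from training and validating a network):

```python
import random

# Hypothetical hyperparameter grid.
grid = {"layers": [1, 2, 3], "neurons": [32, 64, 128],
        "activation": ["sigmoid", "tanh", "relu"]}

def score(params):
    # Placeholder: train a network with these parameters and
    # return its validation accuracy.
    return random.random()

best, best_score = None, float("-inf")
for _ in range(60):  # 60 samples: within the top 5% with 95% probability
    params = {key: random.choice(values) for key, values in grid.items()}
    s = score(params)
    if s > best_score:
        best, best_score = params, s
print(best, best_score)
```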
- 91. BAYESIAN OPTIMIZATION Balance between explore and exploit Exploit: test spots within the explored perimeter Explore: test new spots in random locations Balance the trade-off
- 92. SIGOPT YC-backed SF startup Founded by Scott Clark Raised $2M Sells cloud-based proprietary variant of Bayesian Optimization
- 93. BAYESIAN OPTIMIZATION PRIMER Bayesian Optimization Primer by Ian Dewancker, Michael McCourt, Scott Clark See References
- 94. OPEN SOURCE VARIANTS Open source alternatives: Spearmint Hyperopt SMAC MOE
- 95. PRODUCTION
- 96. DEPLOYING Phases: training, deployment Training phase runs on back-end servers Optimize hyper-parameters on the back-end Deploy model to front-end servers, browsers, devices Front-end only uses forward prop and is fast
- 97. SERIALIZING/DESERIALIZING MODEL Back-end: Serialize model + weights Front-end: Deserialize model + weights
- 98. HDF5 Keras serializes the model architecture to JSON Keras serializes the weights to HDF5 HDF5 is a serialization format for hierarchical data with APIs for C++, Python, Java, etc. https://www.hdfgroup.org
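A minimal sketch of the round trip in Keras; the tiny model and the file names are stand-ins for a real trained network:

```python
from keras.models import Sequential, model_from_json
from keras.layers import Dense

# A tiny model standing in for the real trained network.
model = Sequential()
model.add(Dense(10, activation='softmax', input_dim=100))

# Back-end: serialize the architecture to JSON and the weights to HDF5.
with open("model.json", "w") as f:
    f.write(model.to_json())
model.save_weights("weights.h5")

# Front-end: deserialize both, then use forward prop only (model.predict).
with open("model.json") as f:
    restored = model_from_json(f.read())
restored.load_weights("weights.h5")
```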
- 99. DEPLOYMENT EXAMPLE: CANCER DETECTION Rhobota.com's cancer-detecting iPhone app Developed by Bryan Shaw after his son's illness Model built on the back-end, deployed on the iPhone iPhone detects retinal cancer
- 100. DEEP LEARNING
- 101. WHAT IS DEEP LEARNING? Deep Learning is a method for training systems with more than two or three non-linear hidden layers.
- 102. WHAT IS DEEP LEARNING? Machine learning techniques which enable unsupervised feature learning and pattern analysis/classification. The essence of deep learning is to compute representations of the data. Higher-level features are defined from lower-level ones.
- 103. HOW IS DEEP LEARNING DIFFERENT FROM REGULAR NEURAL NETWORKS? Training neural networks requires applying gradient descent over millions of dimensions. This is intractable for large networks. Deep learning places constraints on neural networks, which makes them tractable to train iteratively. The constraints are generic.
- 104. AUTO-ENCODERS
- 105. WHAT ARE AUTO-ENCODERS? An auto-encoder is a learning algorithm It applies backpropagation and sets the target values to be equal to its inputs In other words it trains itself to do the identity transformation
- 106. WHY DOES IT DO THIS? Auto-encoder places constraints on itself E.g. it restricts the number of hidden neurons This allows it to find a good representation of the data
- 107. IS THE AUTO-ENCODER SUPERVISED OR UNSUPERVISED? It is unsupervised. The data is unlabeled.
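A minimal sketch of an under-complete auto-encoder in Keras; the 784→32→784 sizes are illustrative assumptions (e.g. flattened 28x28 images):

```python
from keras.models import Sequential
from keras.layers import Dense

# Constrain 784-dimensional input through a 32-neuron bottleneck, then
# train the network to reproduce its own input.
autoencoder = Sequential()
autoencoder.add(Dense(32, activation='relu', input_dim=784))   # encoder
autoencoder.add(Dense(784, activation='sigmoid'))              # decoder
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

# Targets equal inputs: autoencoder.fit(x_train, x_train, ...)
```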
- 108. WHAT ARE CONVOLUTIONAL NEURAL NETWORKS? Feedforward neural networks Connection pattern inspired by the visual cortex
- 109. CONVOLUTIONAL NEURAL NETWORKS
- 110. CNNS The convolutional layer’s parameters are a set of learnable filters Every filter is small along width and height During the forward pass, each filter slides across the width and height of the input, producing a 2-dimensional activation map As we slide across the input we compute the dot product between the filter and the input
- 111. CNNS Intuitively, the network learns filters that activate when they see a specific type of feature anywhere In this way it creates translation invariance
- 112. CONVNET EXAMPLE Zero-padding: the boundaries are padded with 0s Stride: how far the filter moves in each step of the convolution Parameter sharing: each filter uses the same weights at every position of the input
- 113. CONVNET EXAMPLE From http://cs231n.github.io/convolutional-networks/
- 114. WHAT IS A POOLING LAYER? The pooling layer reduces the resolution of the image further It tiles the output area with a 2x2 mask and takes the maximum activation value of each area
- 115. REVIEW keras/examples/mnist_cnn.py Recognizes hand-written digits By combining different layers
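A condensed sketch in the spirit of that example, not the verbatim script; the layer sizes follow common MNIST defaults:

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Convolution learns local filters, pooling halves the resolution with a
# 2x2 max mask, and a final softmax layer classifies the 10 digits.
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
```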
- 116. RECURRENT NEURAL NETWORKS
- 117. RNNS RNNs capture patterns in time series data Constrained by shared weights across neurons Each neuron observes a different time step
- 118. LSTMS Long Short Term Memory networks RNNs cannot handle long time lags between events LSTMs can pick up patterns separated by big lags Used for speech recognition
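A minimal sketch of an LSTM for sequence classification in Keras; the sequence length and layer sizes are illustrative assumptions:

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

# Read sequences of 100 timesteps with 20 features each; the LSTM's gated
# memory cell can carry information across long time lags.
model = Sequential()
model.add(LSTM(64, input_shape=(100, 20)))
model.add(Dense(1, activation='sigmoid'))  # e.g. binary sequence label
model.compile(optimizer='adam', loss='binary_crossentropy')
```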
- 119. RNN EFFECTIVENESS Andrej Karpathy uses LSTMs to generate text Generates Shakespeare, Linux Kernel code, mathematical proofs. See http://karpathy.github.io/
- 120. RNN INTERNALS
- 121. LSTM INTERNALS
- 122. CONCLUSION
- 123. REFERENCES Bayesian Optimization Primer by Dewancker et al (http://sigopt.com) Random Search for Hyper-Parameter Optimization by Bergstra and Bengio (http://jmlr.org) Evaluating Machine Learning Models by Alice Zheng (http://www.oreilly.com)
- 124. REFERENCES Dropout by Hinton et al (http://cs.utoronto.edu) Understanding LSTM Networks by Chris Olah (http://github.io) Multi-scale Deep Learning for Gesture Detection and Localization by Neverova et al (http://uoguelph.ca) The Unreasonable Effectiveness of RNNs by Karpathy (http://karpathy.github.io)
- 125. QUESTIONS
