
Machine Learning 101

Slides for an introductory talk on Machine Learning, given at Instituto Tecnológico de Tepic during an academic week.


  1-2. Machine Learning 101. Edwin Jiménez, October 2017.
  3. A.I. Image from: https://i0.wp.com/dailypremiere.com/wp-content/uploads/2016/11/AI_Poster.jpg?resize=1024%2C641
  4. What is intelligence? What is learning?
  5. “A very general mental capability that, [...], involves the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly and learn from experience.” Intelligence definition, from "Mainstream Science on Intelligence" (1994).
  6. “We define learning as the transformative process of taking in information that—when internalized and mixed with what we have experienced—changes what we know and builds on what we do. It’s based on input, process, and reflection.” Learning definition, from The New Social Learning by Tony Bingham and Marcia Conner.
  7. A.I. The term was coined in 1956 by John McCarthy, then at Dartmouth College. “It is the science and engineering of making intelligent machines, especially intelligent computer programs.”
  8. What is A.I. (nowadays)?
  9. What is not A.I. (nowadays)?
  10-12. Image from: http://legalexecutiveinstitute.com/wp-content/uploads/2016/02/AI-Graphic-NEW.jpg
  13. M.L. Arthur Samuel (1959): the field of study that gives computers the ability to learn without being explicitly programmed.
  14. ‘Machine’ is a term we use to denote a mathematical model that aims to optimize a given function.
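To make slide 14 concrete, here is a minimal sketch of "optimizing a given function": gradient descent on the toy function f(w) = (w - 3)^2. The function, starting point, and learning rate are illustrative choices, not from the talk.

    # Minimize f(w) = (w - 3)^2 by gradient descent.
    w = 0.0              # initial guess (illustrative)
    learning_rate = 0.1

    for step in range(50):
        grad = 2 * (w - 3)         # f'(w), the derivative of f at w
        w -= learning_rate * grad  # step against the gradient

    print(w)  # approaches 3, the minimizer of f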
  15. To learn, it needs data.
  16. Data input. Whether the input is a picture, text, or a table of records, it is represented as a vector of values:

     ID  Name  Age  Sex     Student  Happy
     01  Marc  25   Male    Yes      No
     02  Ana   18   Female  Yes      Yes

  17. Data input. Each value in the data is called a feature (ID, Name, Age, Sex, Student, and Happy in the table above).
  18. Machine Learning. Unsupervised: we only have the data and its features. Supervised: we have labeled data and we learn from those labels.
  19. How does it work? Image from: https://cdn0.tnwcdn.com/wp-content/blogs.dir/1/files/2017/07/big-data-theree.png
  20. Supervised. Input: data points (feature 1, feature 2). Learn from the data. Result: classification or regression.
  21. Classification
  22. Classification. Input, process, output: a vector of probabilities, e.g. 0.01, 0.05, 0.07, 0.62, 0.12, 0.05, 0.02, 0.03, 0.01, 0.02.
  23. Regression
  24. Regression. Input: house size, 289 m². Process. Output: house price, £4,000.
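A minimal regression sketch in numpy matching this slide. The training sizes and prices are made up for illustration; only the 289 m² query comes from the slide.

    import numpy as np

    # Made-up training data: house size in square meters -> price.
    sizes = np.array([50.0, 80.0, 120.0, 200.0, 300.0])
    prices = np.array([60000.0, 95000.0, 140000.0, 230000.0, 340000.0])

    # Fit price ~ w * size + b by least squares.
    X = np.column_stack([sizes, np.ones_like(sizes)])
    w, b = np.linalg.lstsq(X, prices, rcond=None)[0]

    print(w * 289 + b)  # predicted price for a 289 m² house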
  25. Popular supervised algorithms: ● Nearest Neighbor ● Naive Bayes ● Decision Trees ● Linear Regression ● Support Vector Machines (SVM) ● Neural Networks
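As a taste of the first entry on that list, a 1-nearest-neighbor classifier fits in a few lines of numpy. The toy points and labels are made up for illustration.

    import numpy as np

    # Toy labeled data: two features per point, two classes.
    X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.5, 4.8]])
    y_train = np.array([0, 0, 1, 1])

    def predict_1nn(x):
        # Return the label of the closest training point (Euclidean distance).
        distances = np.linalg.norm(X_train - x, axis=1)
        return y_train[np.argmin(distances)]

    print(predict_1nn(np.array([5.2, 5.1])))  # -> 1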
  26. Unsupervised. Input: data points. Learn from the data. Result: clustering. Image from: https://www.packtpub.com/sites/default/files/Article-Images/B03905_01_01.png
  27-28. Unsupervised. Input: data points. Learn from the data. Image from: https://www.datascience.com/blog/k-means-clustering
  29. Popular unsupervised algorithms: ● k-means clustering ● Association Rules
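A compact k-means sketch in numpy. The two random blobs and k = 2 are illustrative; in practice one would use a library implementation such as scikit-learn's KMeans.

    import numpy as np

    rng = np.random.default_rng(0)
    # Two illustrative blobs of 2-D points.
    X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(4, 0.5, (50, 2))])

    k = 2
    centers = X[rng.choice(len(X), k, replace=False)]  # random initial centers

    for _ in range(10):
        # Assignment step: each point goes to its nearest center.
        labels = np.argmin(np.linalg.norm(X[:, None] - centers, axis=2), axis=1)
        # Update step: move each center to the mean of its assigned points.
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])

    print(centers)  # close to the two blob means, (0, 0) and (4, 4)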
  30. That’s nice, but how do I ‘learn’ from data?
  31-32. Artificial Neural Network
  33. ● Dendrite: receives signals from other neurons. Image from: https://www.xenonstack.com/blog/overview-of-artificial-neural-networks-and-its-applications
  34. ● Soma (cell body): sums all the incoming signals to generate the input. Image from: https://www.xenonstack.com/blog/overview-of-artificial-neural-networks-and-its-applications
  35. ● Axon: when the sum reaches a threshold value, the neuron fires and the signal travels down the axon to the other neurons. Image from: https://www.xenonstack.com/blog/overview-of-artificial-neural-networks-and-its-applications
  36. ● Synapses: the points of interconnection between one neuron and other neurons. The amount of signal transmitted depends upon the strength (synaptic weights) of the connections. Image from: https://www.xenonstack.com/blog/overview-of-artificial-neural-networks-and-its-applications
  37-38. Image from: https://www.xenonstack.com/blog/overview-of-artificial-neural-networks-and-its-applications
  39. Perceptron. Image from: https://d4datascience.files.wordpress.com/2016/09/600px-artificialneuronmodel_english.png
  40. Perceptron. x1 … xn are the features.
  41. Perceptron. w1j … wnj are the weights; each weight denotes the importance of xi to the result.
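A minimal sketch of the computation in this figure: the weighted sum w1*x1 + … + wn*xn followed by an activation, here a simple threshold/step function. The feature values, weights, and threshold are illustrative.

    import numpy as np

    def perceptron(x, w, threshold=0.0):
        # Weighted sum of the inputs: w1*x1 + ... + wn*xn.
        activation = np.dot(w, x)
        # The unit 'fires' (outputs 1) if the sum reaches the threshold.
        return 1 if activation >= threshold else 0

    x = np.array([0.5, -1.0, 2.0])  # features x1..x3 (illustrative)
    w = np.array([0.8, 0.1, 0.4])   # weights w1..w3 (illustrative)
    print(perceptron(x, w))         # -> 1, since 0.8*0.5 + 0.1*(-1.0) + 0.4*2.0 = 1.1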
  42. Artificial Neural Network. Supervised; classification. Image from: http://cs231n.github.io/assets/nn1
  43-47. Artificial Neural Network. Each unit in the network is a perceptron. Image from: http://cs231n.github.io/assets/nn1
  48. Artificial Neural Network. The output is a vector of values: 1.245, 0.789. Image from: http://cs231n.github.io/assets/nn1
  49. Artificial Neural Network. Vector of values 1.245, 0.789: how do we get probabilities?
  50-51. Artificial Neural Network. Softmax. Vector of values 1.245, 0.789. Image from: http://cs231n.github.io/assets/nn1
  52. Artificial Neural Network. Softmax turns the vector of values 1.245, 0.789 into a vector of probabilities: 0.75, 0.25.
  53. Artificial Neural Network. Predicted label: 0, the label with the highest probability (0.75 for label 0 vs. 0.25 for label 1).
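A minimal softmax-plus-argmax sketch of this values-to-label pipeline. Note that the standard formula exp(z_i) / sum_j exp(z_j) gives about 0.61/0.39 for the inputs 1.245, 0.789, so the slide's 0.75/0.25 should be read as illustrative.

    import numpy as np

    def softmax(z):
        # Subtract the max for numerical stability; the result is unchanged.
        e = np.exp(z - np.max(z))
        return e / e.sum()

    values = np.array([1.245, 0.789])  # the network's output vector
    probs = softmax(values)            # probabilities that sum to 1
    print(probs, np.argmax(probs))     # predicted label = index of highest probability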
  54. Training, or learning from data. 1. Feed data to the model. 2. Calculate the probabilities. 3. Evaluate how far the predicted label is from the original label (the error). 4. Update the weights. 5. Repeat 1-4 until the error is smaller than some value. Image from: http://cs231n.github.io/assets/nn1
  55-57. Training. Weights 12, 0.7, 0.4; input features x1, x2, x3, x4. Image from: http://cs231n.github.io/assets/nn1
  58-59. Training. A unit computes the weighted sum of its inputs x1, y1, z1: 12*x1 + 0.7*y1 + 0.4*z1.
  60-62. Training. The forward pass produces the values 1.67, 1.27, 0.35.
  63-64. Training. Applying softmax yields the output probabilities 0.8, 0.2.
  65. Training (error propagation). Loss_i = -t_i * log(p_i), with true labels t = [1, 0] and predicted probabilities p = [0.8, 0.2]. Image from: http://cs231n.github.io/assets/nn1
  66. Training (error propagation). Loss = -[1, 0] * [-0.096, -0.69], since log(0.8) = -0.096 and log(0.2) = -0.69. Why log? Because log(1) = 0: when the expected and predicted labels are equal we have 0 error.
  67-68. Training (error propagation). Loss = [0.096, 0]; the total error is 0.096.
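A worked version of this loss computation. The slides evidently use base-10 logarithms (-log10(0.8) ≈ 0.096); cross-entropy is more commonly written with the natural log, which would give ≈ 0.223 instead.

    import numpy as np

    t = np.array([1.0, 0.0])  # true label, one-hot encoded
    p = np.array([0.8, 0.2])  # predicted probabilities

    loss_terms = -t * np.log10(p)  # elementwise: [0.096..., 0.0]
    print(loss_terms.sum())        # total error, ~0.096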
  69. Training. Weights 12, 0.7, 0.4; values 1.67, 1.27, 0.35; error 0.096, 0. Image from: http://cs231n.github.io/assets/nn1
  70. Training (back propagation). Image from: http://cs231n.github.io/assets/nn1

     initialize network weights (often small random values)
     do
         forEach training example named ex
             prediction = neural-net-output(network, ex)  // forward pass
             actual = teacher-output(ex)
             compute error at the output units
             compute Δwh for all weights from hidden layer to output layer  // backward pass
             compute Δwi for all weights from input layer to hidden layer   // backward pass
             update network weights  // input layer not modified by error estimate
     until all examples classified correctly or another stopping criterion satisfied
     return the network
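A runnable counterpart of the pseudocode above: a one-hidden-layer network trained with backpropagation on XOR, in plain numpy. The XOR task, layer sizes, squared-error loss, and learning rate are illustrative choices, not from the slides.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy task: XOR.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    t = np.array([[0], [1], [1], [0]], dtype=float)

    # Initialize network weights (small random values), plus biases.
    W1, b1 = rng.normal(0, 0.5, (2, 4)), np.zeros(4)  # input -> hidden
    W2, b2 = rng.normal(0, 0.5, (4, 1)), np.zeros(1)  # hidden -> output

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    lr = 1.0
    for epoch in range(5000):
        # Forward pass.
        h = sigmoid(X @ W1 + b1)
        y = sigmoid(h @ W2 + b2)
        # Error at the output units (squared error through the sigmoid).
        d_out = (y - t) * y * (1 - y)
        # Backward pass: propagate the error to the hidden layer.
        d_hid = (d_out @ W2.T) * h * (1 - h)
        # Update network weights.
        W2 -= lr * (h.T @ d_out); b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_hid); b1 -= lr * d_hid.sum(axis=0)

    print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)).ravel())  # should give [0, 1, 1, 0]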
  71. Evaluation: confusion matrix. Image from: Wikipedia

                          Actual cat               Actual non-cat
     Predicted cat        5  True Positives (TP)   2  False Positives (FP)
     Predicted non-cat    3  False Negatives (FN)  17 True Negatives (TN)
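The usual evaluation metrics follow directly from these four counts; a quick check with the slide's numbers (the formulas are the standard ones, as in the Fawcett reference on the next slide).

    tp, fp, fn, tn = 5, 2, 3, 17

    accuracy = (tp + tn) / (tp + fp + fn + tn)  # 22/27 ~ 0.815
    precision = tp / (tp + fp)                  # 5/7  ~ 0.714
    recall = tp / (tp + fn)                     # 5/8  = 0.625
    f1 = 2 * precision * recall / (precision + recall)

    print(accuracy, precision, recall, f1)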
  72. Evaluation: confusion matrix. T. Fawcett, Pattern Recognition Letters 27 (2006) 861–874.
  73. Deep Learning (D.L.)
  74. MNIST
  75. MNIST, model definition. Input: 28x28 images, i.e. 784 features; output: 10 categories; a single 10-unit output layer. Accuracy: 93.10%.
  76. MNIST, model definition. 784 features, 10 categories, one extra 10-unit layer. Accuracy: 93.51%.
  77. MNIST, model definition. 784 features, 10 categories, hidden layers of sizes [10, 10, 10]. Accuracy: 93.75%.
  78. Sometimes
  79. GoogLeNet
  80. Why go deeper? Image from: http://fortune.com/ai-artificial-intelligence-deep-machine-learning/
  81-82. Image from: https://fortunedotcom.files.wordpress.com/2016/09/lrn-10-01-16-neural-networks-e1474990995824.png
  83. http://fortune.com/ai-artificial-intelligence-deep-machine-learning/
  84. Applications
  85. Generate haikus from news
  86. Generate poetry from images. Sample output: “A man is taking a picture of himself in the mirror of the world. So I said the word had come to this story once a year ago, and I was a fool for the sake of the same word, and the fact that I was a boy who had been dead for a moment. He is a poor picture of the past.”
  87. Automatic colorization of black-and-white images
  88. Generate Christmas carols from images
  89. Automatically adding sounds to silent movies
  90. Automatic machine translation
  91. Artistic filters
  92. Artistic filters (2)
  93. Artistic reinterpretation of images (GoogLeNet, aka Inception)
  94. Object classification and detection in photographs
  95. Automatic text generation
  96. Automatic handwriting generation
  97. Automatic game playing
  98. Automatic image caption generation
  99. Tools
  100. Example with Theano

     import theano
     import numpy

     # Symbolic inputs: a feature vector and a target scalar.
     x = theano.tensor.fvector('x')
     target = theano.tensor.fscalar('target')

     # Shared weight vector, initialized to [0.2, 0.7].
     W = theano.shared(numpy.asarray([0.2, 0.7]), 'W')

     # Model output and squared-error cost.
     y = (x * W).sum()
     cost = theano.tensor.sqr(target - y)

     # Gradient-descent update on W (theano.tensor.grad returns a list).
     gradients = theano.tensor.grad(cost, [W])
     W_updated = W - (0.1 * gradients[0])
     updates = [(W, W_updated)]

     f = theano.function([x, target], y, updates=updates)

     # Ten training steps on the single example x = [1.0, 1.0], target = 20.
     for i in range(10):
         output = f([1.0, 1.0], 20.0)
  101. Example with Lasagne

     network = lasagne.layers.InputLayer(shape=(None, 1, 28, 28), input_var=input_var)
     # Hidden layers:
     nonlin = lasagne.nonlinearities.rectify
     for _ in range(depth):
         network = lasagne.layers.DenseLayer(network, width, nonlinearity=nonlin)
     # Output layer:
     softmax = lasagne.nonlinearities.softmax
     network = lasagne.layers.DenseLayer(network, 10, nonlinearity=softmax)

  102. Convolutional Neural Network
  103. Example with Lasagne

     network = lasagne.layers.InputLayer(shape=(None, 1, 28, 28), input_var=input_var)
     # Convolutional layer with 32 kernels of size 5x5:
     network = lasagne.layers.Conv2DLayer(network, num_filters=32, filter_size=(5, 5),
                                          nonlinearity=lasagne.nonlinearities.rectify)
     # Max-pooling layer of factor 2 in both dimensions:
     network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2))
     # Another convolution with 32 5x5 kernels, and another 2x2 pooling:
     network = lasagne.layers.Conv2DLayer(network, num_filters=32, filter_size=(5, 5),
                                          nonlinearity=lasagne.nonlinearities.rectify)
     network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2))
     # A fully-connected layer of 256 units:
     network = lasagne.layers.DenseLayer(network, num_units=256,
                                         nonlinearity=lasagne.nonlinearities.rectify)
     # And, finally, the 10-unit output layer:
     network = lasagne.layers.DenseLayer(network, num_units=10,
                                         nonlinearity=lasagne.nonlinearities.softmax)
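To actually train either Lasagne network, the usual pattern from the library's MNIST example compiles a Theano training function from a loss and an update rule. A hedged sketch, assuming input_var and target_var are the Theano variables the networks above were built on:

     prediction = lasagne.layers.get_output(network)
     loss = lasagne.objectives.categorical_crossentropy(prediction, target_var).mean()

     params = lasagne.layers.get_all_params(network, trainable=True)
     updates = lasagne.updates.nesterov_momentum(loss, params,
                                                 learning_rate=0.01, momentum=0.9)

     # Each call performs one gradient step on a mini-batch.
     train_fn = theano.function([input_var, target_var], loss, updates=updates)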
  104. Courses
  105. https://es.coursera.org/learn/machine-learning Andrew Ng, Stanford University
  106-107. Is Machine Learning the answer to everything?
  108. “Turning over rocks and finding nothing is progress.”
  109. Modern Data Scientist. Math & Statistics: Machine Learning (supervised learning, unsupervised learning, optimization), statistical modeling, experiment design, Bayesian inference. Image from: http://www.marketingdistillery.com/wp-content/uploads/2014/08/mds.png
  110. Modern Data Scientist. Programming & Databases: computer science fundamentals, a scripting language, databases, relational algebra, MapReduce. Image from: http://www.marketingdistillery.com/wp-content/uploads/2014/08/mds.png
  111. Thanks! Any questions? You can find me at: eejimenez@gdl.cinvestav.mx, edwinjimenezlepe, @Lepe_92, lepe92
  112. “What is research, but a blind date with knowledge.” William Henry
