An introduction to Machine Learning

We delivered a quick introduction to Machine Learning to EPITECH students in Nice: complexity, algorithm families, troubleshooting, and some tech insights.

Published in: Technology

  1. B2B Travel technology: Machine Learning Introduction, October 20th 2016
  2. Introduction: Johnny RAHAJARISON (@brainstorm_me, johnny.rahajarison@mylittleadventure.com), MyLittleAdventure (@mylitadventure)
  3. Agenda: What’s Machine Learning? / Usage examples / Complexity / Algorithm families / Let’s go! / Troubleshoot / Tech insights / Next steps / Conclusion
  4. Machine learning: Introduction
  5. What’s Machine Learning? Software that does something without being explicitly programmed to do it, just by learning from examples. The same software can be used for various tasks. It learns from experience with respect to some task and performance measure, and its performance on that task improves with experience.
  6. Usage examples (1/2): some typical usage examples
  7. Use cases (2/2): MyLittleAdventure usage: language detection, clustering, anomaly detection, recommendation, choice of parameters
  8. Complexity. Complex algorithm before: [slide image: long excerpts of TensorFlow source code, including a Conv2DTransposeTest test case, the GradientDescentOptimizer class and linear-algebra gradient tests]. And machine learning now: train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
  9. Machine learning: Algorithm families
  10. Supervised algorithms: classification, regression
  11. Unsupervised algorithms: clustering, anomaly detection
  12. Machine learning: Let’s go!
  13. Recipe: collect training data (files, database, cache, data flow); select a model and its (hyper)parameters; train the algorithm; use or store your trained algorithm; make predictions; measure accuracy and precision.
  14. Collect training data: get good-quality data and gather some samples. Don’t collect data for months before trying anything; go fast and try things. What about the data? A fruit identification example:
      Weight (g) | Width (cm) | Height (cm) | Label
      192        | 8.4        | 7.3         | Granny Smith apple
      86         | 6.2        | 4.7         | Mandarin
      178        | 7.1        | 7.8         | Braeburn apple
      162        | 7.4        | 7.2         | Cripps pink apple
      118        | 6.1        | 8.1         | Unidentified lemons
      144        | 6.8        | 7.4         | Turkey orange
      362        | 9.6        | 9.2         | Spanish jumbo orange
      …          | …          | …           | …
  15. Prepare your data: convert your features and labels to numbers, and put them on the same scale (normalization?):
      Weight (g) | Width (cm) | Height (cm) | Label
      192        | 8.4        | 7.3         | 1
      86         | 6.2        | 4.7         | 2
      178        | 7.1        | 7.8         | 3
      162        | 7.4        | 7.2         | 5
      118        | 6.1        | 8.1         | 10
      144        | 6.8        | 7.4         | 8
      362        | 9.6        | 9.2         | 9
      …          | …          | …           | …
      We also need data to test on: a training set for the learning phase (60% to 80% of the data) and a test set for the analysis phase (20% to 40%).
  16. Train the algorithm: we need to choose an estimator. Choose a classifier (here, a decision tree) and fit it to the training data:
      Weight (g) | Width (cm) | Height (cm) | Label
      192        | 8.4        | 7.3         | 1
      86         | 6.2        | 4.7         | 2
      178        | 7.1        | 7.8         | 3
      162        | 7.4        | 7.2         | 5
      118        | 6.1        | 8.1         | 10
      144        | 6.8        | 7.4         | 8
      362        | 9.6        | 9.2         | 9
      …          | …          | …           | …
  17. Make predictions: what do our predictions look like?
      Test set:
      Weight (g) | Width (cm) | Height (cm) | Label
      192        | 8.4        | 7.3         | 1
      86         | 6.2        | 4.7         | 2
      178        | 7.1        | 7.8         | 3
      Predictions (the third one is wrong):
      Weight (g) | Width (cm) | Height (cm) | Label
      192        | 8.4        | 7.3         | 1
      86         | 6.2        | 4.7         | 2
      178        | 7.1        | 7.8         | 1
  18. Measure (1/2): evaluate on a dataset that your model has never seen during training. Accuracy = correct predictions / total predictions (the error rate is 1 − accuracy); it gives a simple score of our performance level.
  19. Measure (2/2): try to visualize and analyze your data, and know what you want. Confusion matrix:
                      | Actual true    | Actual false
      Predicted true  | True positive  | False positive
      Predicted false | False negative | True negative
      With skewed classes (e.g. very few positives), accuracy alone is misleading, so also measure:
      Precision = true positives / # predicted positives
      Recall = true positives / # actual positives
      F1 score (trade-off) = 2 × (precision × recall) / (precision + recall)
  20. Machine learning: Troubleshoot
  21. Troubleshoot (1/4): underfitting vs. overfitting
  22. Troubleshoot (2/4): what are the different options? Underfitting: add or create more features, use a more sophisticated model, decrease regularization (collecting more samples alone won’t help much). Overfitting: use fewer features, use a simpler model, use more samples, increase regularization.
  23. Troubleshoot (3/4): diagnosing underfitting vs. overfitting using the learning curves…
  24. Troubleshoot (4/4): model choice
  25. Machine learning: Tech insights
  26. Platforms: easy peasy. Very high-level PaaS solutions: you don’t even have to code to build something (*wink wink* business developers). Built-in models, data munging, model management through a UI.
  27. Languages: what language for what purpose? For understanding and prototyping an implementation: Matlab, Octave. Most valuable languages, comfortable for prototyping yet powerful for industrialisation: Go, Python. For bigger companies and projects, and fine-tuned software: Java, C++.
  28. Libraries: built-in models, data munging, fine-tuning, full integration into your product. You will have great power using a library (e.g. Golearn).
  29. Machine learning: Next steps…
  30. Next steps: split your data in 3 (training / cross-validation / test set); know the top algorithms; look into advanced techniques and optimizers (online learning, stacking); deep and reinforcement learning; partial and semi-supervised learning; transfer learning. How do we store and analyse big data? How do we scale? Try it! Find your best tools and have some fun.
  31. Conclusion: Machine learning is not just a buzzword. Difficulties are not always what we think! Machine learning is more about experimenting and testing than just algorithms. There is no single perfect solution, and there are plenty of easy-to-use solutions for beginners. Try it and let’s get in touch!
  32. Machine learning: One more thing!
  33. TensorFlow & MNIST
  34. TensorFlow Learn
  35. Thank you! Machine Learning Introduction, October 20th 2016. Questions?
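
Slide 8 contrasts long hand-written TensorFlow code with the one-liner tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy). The sketch below shows the idea behind that optimizer in plain Python. It repeats the update w ← w − learning_rate × gradient, using the slide's learning rate of 0.5. The loss L(w) = 0.5 × (w − 3)² is a hypothetical stand-in for the slide's cross_entropy, chosen so that its gradient is simply w − 3. This is not TensorFlow's implementation.

```python
def gradient_descent(grad, w0, learning_rate=0.5, steps=20):
    """Apply the update w <- w - learning_rate * grad(w) for a fixed number of steps."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w


def loss_grad(w):
    """Gradient of the hypothetical loss L(w) = 0.5 * (w - 3)**2."""
    return w - 3.0


w_final = gradient_descent(loss_grad, w0=0.0)
print(w_final)  # close to 3.0, the minimum of the hypothetical loss
```

With this loss, each step halves the distance to the minimum. A learning rate above 2 would make the iterates diverge instead.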
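
Slide 10 lists classification among the supervised algorithms: the model learns from labelled examples. The minimal sketch below uses a 1-nearest-neighbour rule, a swapped-in algorithm picked for brevity rather than one named in the talk. It works on three rows of the fruit table from slide 14. The features are left unscaled, so weight dominates the distance; slide 15's normalization step addresses exactly that.

```python
import math

# Three labelled rows from the fruit table: (weight g, width cm, height cm) -> label
TRAINING_DATA = [
    ((192, 8.4, 7.3), "Granny Smith apple"),
    ((86, 6.2, 4.7), "Mandarin"),
    ((362, 9.6, 9.2), "Spanish jumbo orange"),
]


def nearest_neighbour(sample, training_data):
    """Predict the label of the training example closest to `sample` (Euclidean distance).

    math.dist requires Python 3.8+.
    """
    _, label = min(training_data, key=lambda row: math.dist(sample, row[0]))
    return label


print(nearest_neighbour((90, 6.0, 5.0), TRAINING_DATA))  # → Mandarin
```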
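
Slide 11 lists anomaly detection among the unsupervised algorithms: there are no labels, only a search for points that don't fit. The minimal sketch below uses a simple z-score rule with a hypothetical threshold of 2 standard deviations. It is applied to the fruit weights from slide 14. Flagging the 362 g Spanish jumbo orange only illustrates the mechanics; it isn't an error in the data.

```python
import statistics


def find_anomalies(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mean) / std > threshold]


weights = [192, 86, 178, 162, 118, 144, 362]  # fruit weights (g) from the table
print(find_anomalies(weights))  # → [362]
```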
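
Slide 15 asks us to turn labels into numbers and put features on the same scale. The minimal sketch below assigns label ids in order of first appearance; the slide's own mapping (e.g. 10 for the lemons) isn't shown. It then applies min-max normalization to [0, 1], which is one common choice among several.

```python
def encode_labels(labels):
    """Map each distinct label to an integer id (starting at 1) in order of first appearance."""
    mapping = {}
    for label in labels:
        if label not in mapping:
            mapping[label] = len(mapping) + 1
    return [mapping[label] for label in labels], mapping


def min_max_scale(column):
    """Rescale a list of numbers linearly to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


labels = ["Granny Smith apple", "Mandarin", "Braeburn apple", "Cripps pink apple"]
weights = [192, 86, 178, 162]  # matching weights (g) from the table

encoded, mapping = encode_labels(labels)
print(encoded)  # → [1, 2, 3, 4]
print(min_max_scale(weights))
```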
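
Slide 16 fits a decision tree. A full tree learner is too long to show here, so this sketch fits a decision stump instead: a depth-1 tree with a single split. It tries every feature and threshold and keeps the split with the fewest training errors. The binary apple/orange labels are a hypothetical grouping of five rows from slide 14's table.

```python
from collections import Counter


def majority(labels):
    """Most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels):
    """Fit a depth-1 decision tree by trying every (feature, threshold) split.

    Returns (feature_index, threshold, left_label, right_label), where samples with
    row[feature_index] <= threshold go left; returns None if no split is possible.
    """
    best, best_errors = None, len(labels) + 1
    for f in range(len(rows[0])):
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue  # skip splits that send every sample the same way
            left_label, right_label = majority(left), majority(right)
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if errors < best_errors:  # keep the first split with the fewest errors
                best, best_errors = (f, t, left_label, right_label), errors
    return best


def predict(stump, row):
    """Send `row` to the left or right leaf and return that leaf's label."""
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label


# Five rows from the fruit table: (weight g, width cm, height cm); labels grouped by fruit type.
ROWS = [(192, 8.4, 7.3), (178, 7.1, 7.8), (162, 7.4, 7.2), (144, 6.8, 7.4), (362, 9.6, 9.2)]
LABELS = ["apple", "apple", "apple", "orange", "orange"]

stump = fit_stump(ROWS, LABELS)
print(stump)  # → (0, 144, 'orange', 'apple')
```

No single split separates these classes perfectly. The best stump splits on weight at 144 g and still mislabels the 362 g orange, which is why real decision trees keep splitting.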
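
Slide 18 measures performance as correct predictions over total predictions. The minimal sketch below applies this to the three test rows and predictions on slide 17 (true labels 1, 2, 3; predicted 1, 2, 1), giving an accuracy of 2/3.

```python
def accuracy(actual, predicted):
    """Fraction of predictions equal to the true label (error rate = 1 - accuracy)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty label lists of the same length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


actual = [1, 2, 3]     # true labels of the three test rows
predicted = [1, 2, 1]  # the model's predictions for the same rows
acc = accuracy(actual, predicted)
print(f"accuracy = {acc:.2f}, error rate = {1 - acc:.2f}")  # → accuracy = 0.67, error rate = 0.33
```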
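
Slide 19's formulas can be sketched directly from the confusion-matrix counts. The ten labels below are hypothetical, with only two actual positives (skewed classes). A model that always predicts "negative" reaches 80% accuracy, yet its precision and recall are both 0; these metrics are meant to expose exactly that. Returning 0.0 when a denominator is zero is a convention assumed here.

```python
def precision_recall_f1(actual, predicted, positive=True):
    """Precision, recall and F1 score for the class `positive` (0.0 when a denominator is zero)."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # true positives
    fp = sum(a != positive and p == positive for a, p in pairs)  # false positives
    fn = sum(a == positive and p != positive for a, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


actual = [True, True] + [False] * 8        # hypothetical: 2 positives out of 10
model = [True, False, True] + [False] * 7  # 1 TP, 1 FN, 1 FP
always_negative = [False] * 10

print(precision_recall_f1(actual, model))            # → (0.5, 0.5, 0.5)
print(precision_recall_f1(actual, always_negative))  # → (0.0, 0.0, 0.0)
```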
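
Slide 22 suggests increasing regularization against overfitting and decreasing it against underfitting. The minimal sketch below shows what that knob does, assuming L2 (ridge) regularization on a hypothetical one-weight model y ≈ w·x with no intercept. Minimizing Σ(y − w·x)² + λ·w² gives w = Σxy / (Σx² + λ). A larger λ therefore pulls the weight toward 0, that is, toward a simpler model.

```python
def ridge_weight(xs, ys, lam):
    """Weight w minimising sum((y - w*x)**2) + lam * w**2 for a no-intercept linear model."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # hypothetical data lying exactly on y = 2x
for lam in (0.0, 1.0, 10.0):
    print(lam, ridge_weight(xs, ys, lam))  # the weight shrinks as lam grows
```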
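
Slide 23 diagnoses under- and overfitting with learning curves: training and validation error plotted against training-set size. The minimal sketch below shows how those numbers are produced, assuming a straight-line least-squares model. The data is hypothetical: y = 2x + 1 with a fixed ±0.5 offset standing in for noise. As a rough guide, if both errors end up high and close together, the model underfits. A large, persistent gap between them suggests overfitting.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b). Needs at least two distinct xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def mse(a, b, xs, ys):
    """Mean squared error of the line y = a + b*x on the given points."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def learning_curve(train_x, train_y, val_x, val_y, min_size=2):
    """Fit on the first m training points for growing m; return (m, train MSE, validation MSE) tuples."""
    curve = []
    for m in range(min_size, len(train_x) + 1):
        a, b = fit_line(train_x[:m], train_y[:m])
        curve.append((m, mse(a, b, train_x[:m], train_y[:m]), mse(a, b, val_x, val_y)))
    return curve


def make_points(xs):
    """Hypothetical data: y = 2x + 1, shifted by +0.5 / -0.5 on alternating points."""
    return xs, [2 * x + 1 + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(xs)]


train_x, train_y = make_points([float(x) for x in range(10)])
val_x, val_y = make_points([x + 0.25 for x in range(10)])

curve = learning_curve(train_x, train_y, val_x, val_y)
for m, train_err, val_err in curve:
    print(f"m={m:2d}  train MSE={train_err:.3f}  validation MSE={val_err:.3f}")
```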
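
Slide 30 recommends splitting the data in three: training, cross-validation and test sets. The minimal sketch below assumes a 60/20/20 split, a common convention rather than one given in the talk. A fixed shuffle seed keeps the split reproducible.

```python
import random


def split_three_ways(samples, train=0.6, cv=0.2, seed=0):
    """Shuffle `samples` and split them into training, cross-validation and test lists.

    `train` and `cv` are fractions of the data; the test set gets whatever remains.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is reproducible
    n_train = int(len(shuffled) * train)
    n_cv = int(len(shuffled) * cv)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_cv],
        shuffled[n_train + n_cv:],
    )


train_set, cv_set, test_set = split_three_ways(range(10))
print(len(train_set), len(cv_set), len(test_set))  # → 6 2 2
```

The cross-validation set is used to compare models and tune (hyper)parameters. The test set is then touched only once, for the final measure.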
