
Machine learning in PHP


Machine learning is teaching the computer how to learn by itself. It is far easier to do than it sounds, especially when you have a small data set and a good level of expertise in your field. Classifying objects, predicting who will buy, where to park your car or which test will fail may be achieved with nature-inspired algorithms like neural networks, genetic algorithms or ant colony algorithms. PHP is in a good position to make use of such techniques, and to take advantage of related technologies like R. By the end of the session, you'll know where you want to try it.


  1. 1. MACHINE LEARNING IN PHP: "The roots of education are bitter, but the fruit is sweet." Verona, Italy, 2016
  2. 2. AGENDA: How to teach tricks to your PHP; Application: searching for code in comments; Complex learning
  3. 3. SPEAKER: Damien Seguy; CTO at Exakat; Static analysis of PHP code
  4. 4. MACHINE LEARNING: Teaching the machine; Supervised learning: learning, then applying; The application builds its own model (training phase); It applies its model to real cases (applying phase)
  5. 5. APPLICATIONS: Play go, chess, tic-tac-toe and beat everyone else; Fraud detection and risk analysis; Automated translation or automated transcription; OCR and face recognition; Medical diagnostics; Walk, welcome guests at hotels, play football; Finding good PHP code
  6. 6. PHP APPLICATIONS: Recommendation systems; Predicting user behavior; Spam; Conversion of users to customers; ETA; Detecting code in comments
  7. 7. REAL USE CASE: Identify code in comments; Classic problem; Good problem for machine learning; Complex, no simple solution; A lot of data and expertise are available
  8. 8. SUPERVISED TRAINING: History data → Training → Model; Real data → Model → Results
  9. 9. THE FANN EXTENSION: ext/fann (https://pecl.php.net/package/fann); Fast Artificial Neural Network (http://leenissen.dk/fann/wp/); Neural networks in PHP; Works on PHP 7, thanks to the hard work of Jakub Zelenka (https://github.com/bukka/php-fann)
  10. 10. NEURAL NETWORKS: Imitation of nature; Input layer; Output layer; Intermediate layers
  11. 11. NEURAL NETWORK: Imitation of nature; Input layer; Output layer; Intermediate layers
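The layer structure named on the slides above (input layer, intermediate layers, output layer) can be sketched in plain PHP without any extension. This is only an illustration: the function names `layerForward()` and `networkForward()`, the 5-3-1 shape and every weight value are assumptions made for the example, and `tanh` stands in for a symmetric sigmoid activation. It is not how ext/fann is implemented.

```php
<?php
// Sketch of a forward pass through a fully connected 5-3-1 network.
// All weights are illustrative assumptions, not values learned by FANN.

/**
 * Computes one layer: each neuron sums its weighted inputs plus a bias,
 * then applies tanh, a symmetric sigmoid with outputs in (-1, 1).
 */
function layerForward(array $inputs, array $weights, array $biases): array
{
    $outputs = [];
    foreach ($weights as $n => $neuronWeights) {
        $sum = $biases[$n];
        foreach ($neuronWeights as $i => $w) {
            $sum += $w * $inputs[$i];
        }
        $outputs[] = tanh($sum);
    }
    return $outputs;
}

/** Feeds the input vector through every layer in order. */
function networkForward(array $inputs, array $layers): array
{
    foreach ($layers as $layer) {
        $inputs = layerForward($inputs, $layer['weights'], $layer['biases']);
    }
    return $inputs;
}

$layers = [
    // Intermediate (hidden) layer: 3 neurons, 5 weights each.
    ['weights' => array_fill(0, 3, array_fill(0, 5, 0.1)), 'biases' => [0.0, 0.0, 0.0]],
    // Output layer: 1 neuron, 3 weights.
    ['weights' => [[0.5, 0.5, 0.5]], 'biases' => [0.0]],
];

$output = networkForward([1.0, 0.0, 1.0, 0.0, 1.0], $layers);
```

Training consists of adjusting the weights and biases so that the single output moves toward the expected answer for each training case.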
  12. 12. INITIALIZATION
      <?php
      // 3 layers in total: input, hidden and output
      $num_layers         = 3;
      $num_input          = 5;
      $num_neurons_hidden = 3;
      $num_output         = 1;
      $ann = fann_create_standard($num_layers, $num_input,
                                  $num_neurons_hidden, $num_output);
      // Activation function
      fann_set_activation_function_hidden($ann, FANN_SIGMOID_SYMMETRIC);
      fann_set_activation_function_output($ann, FANN_SIGMOID_SYMMETRIC);
  13. 13. PREPARING DATA: Raw data → Extract → Filter → Human review → FANN ready
  14. 14. EXPERT AT WORK
      // Test if the if is in a compressed format
      // none need yet
      // icon
      // There is a parser specified in `Parser::$KEYWORD_PARSERS`
      // $result should exist, regardless of $_message
      // $a && $b and multidimensional
      // numGlyphs + 1
      // TODO : fix this; var_dump($var);
      // if(ob_get_clean()){
      //$annots .= ' /StructParent ';
      // $cfg['Servers'][$i]['controlpass'] = 'pmapass';
  15. 15. INPUT VECTOR: 'length': size of the comment; 'countDollar': number of $; 'countEqual': number of =; 'countObjectOperator': number of -> operators ($o->p); 'countSemicolon': number of semicolons (;)
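The five characteristics listed on the input vector slide can be extracted with standard string functions. The slides call a `makeVector()` helper later on without showing its body, so the version below is a hypothetical reconstruction: the order of the values follows the slide, and plain substring counting is an assumption (it also counts the `=` inside `==` or `=>`).

```php
<?php
/**
 * Hypothetical makeVector(): turns a comment into the five
 * characteristics listed on the slide, in the same order.
 */
function makeVector(string $comment): array
{
    return [
        strlen($comment),             // 'length'
        substr_count($comment, '$'),  // 'countDollar'
        substr_count($comment, '='),  // 'countEqual'
        substr_count($comment, '->'), // 'countObjectOperator'
        substr_count($comment, ';'),  // 'countSemicolon'
    ];
}

$vector = makeVector('//$gvars = $this->getGraphicVars();');
// $vector is [35, 2, 1, 1, 1]
```

Cheap counts like these are what make it possible to score thousands of comments quickly; the price is that the network only ever sees these five numbers, never the text itself.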
  16. 16. INPUT DATA
      Training file (labels on the first line: number of input, number of incoming data, number of outgoing data):
      46 5 1
      825 0 0 0 1
      0
      37 2 0 0 0
      0
      55 2 2 0 1
      1
      61 2 1 3 1
      1
      ...
      Source comments (truncated on the slide):
       * This file is part of Exakat.
       *
       * Exakat is free software: you can redist
       * it under the terms of the GNU Affero Ge
       * the Free Software Foundation, either ve
       * (at your option) any later version.
       *
       * Exakat is distributed in the hope that
       * but WITHOUT ANY WARRANTY; without even
       * MERCHANTABILITY or FITNESS FOR A PARTIC
       * GNU Affero General Public License for m
       *
       * You should have received a copy of the
       * along with Exakat.  If not, see <http:/
       *
       * The latest code can be found at <http:/
       *
      */
      // $x[3] or $x[] and multidimensional
      //if ($round == 3) { die('Round '.$round);
      //$this->errors[] = $this->language->get('
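A FANN training file starts with a header line (number of training pairs, number of inputs, number of outputs), followed by alternating lines of input values and expected output values. As a minimal sketch, the file can be generated from reviewed samples as below; the function name `buildTrainingData()` and the sample structure are assumptions made for this example, and the four samples are the ones visible on the slide.

```php
<?php
/**
 * Builds the contents of a FANN training file.
 * $samples is a list of [inputVector, outputVector] pairs.
 */
function buildTrainingData(array $samples): string
{
    if ($samples === []) {
        throw new InvalidArgumentException('At least one sample is required.');
    }
    [$firstInput, $firstOutput] = $samples[0];
    // Header: number of pairs, inputs per pair, outputs per pair.
    $lines = [sprintf('%d %d %d', count($samples), count($firstInput), count($firstOutput))];
    foreach ($samples as [$input, $output]) {
        $lines[] = implode(' ', $input);
        $lines[] = implode(' ', $output);
    }
    return implode("\n", $lines) . "\n";
}

// Input vector first, then the expert's verdict (1 = code, 0 = not code).
$data = buildTrainingData([
    [[825, 0, 0, 0, 1], [0]],
    [[37, 2, 0, 0, 0], [0]],
    [[55, 2, 2, 0, 1], [1]],
    [[61, 2, 1, 3, 1], [1]],
]);
// To train on it: file_put_contents('incoming.data', $data);
```

Generating the file from the reviewed samples keeps the header counts in sync with the data, which is easy to get wrong when editing the file by hand.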
  17. 17. TRAINING
      $max_epochs             = 500000;
      $epochs_between_reports = 1000; // assumed value, not shown on the slide
      $desired_error          = 0.001;
      // the actual training
      if (fann_train_on_file($ann, 'incoming.data', $max_epochs,
                             $epochs_between_reports, $desired_error)) {
          fann_save($ann, 'model.out');
      }
      fann_destroy($ann);
      ?>
  18. 18. TRAINING: 47 cases; 5 characteristics; 3 hidden neurons + 5 inputs + 1 output; Duration: 5.711 s
  19. 19. APPLICATION: History data → Training → Model; Real data → Model → Results
  20. 20. APPLICATION
      <?php
      $ann = fann_create_from_file('model.out');
      $comment = '//$gvars = $this->getGraphicVars();';
      // makeVector() is not shown on the slides; it builds the 5-value input vector
      $input = makeVector($comment);
      $results = fann_run($ann, $input);
      if ($results[0] > 0.8) {
          print "\"$comment\" -> $results[0] \n";
      }
      ?>
  21. 21. RESULTS > 0.8: Answer between 0 and 1; Values range from -14 to 0.999; The closer to 1, the safer. The closer to 0, the safer. Is this a percentage? Is this a carrot count? It's a mix of counts…
  22. 22. [Chart: axes graduated from -16 to 0 and from 60 to 100; no caption on the slide]
  23. 23. REAL CASES: Tested on 14093 comments; Duration: 367.01 ms; Found 1960 issues (14%)
  24. 24. Scores and comments
      0.99999893  // $cfg['Servers'][$i]['controlhost'] = '';
      0.99999928  //$_SESSION['Import_message'] = $message->getDisplay();
      0.99999928  /*
      if (defined('SESSIONUPLOAD')) {
          // write sessionupload back into the loaded PMA session
          $sessionupload = unserialize(SESSIONUPLOAD);
          foreach ($sessionupload as $key => $value) {
              $_SESSION[$key] = $value;
          }
          // remove session upload data that are not set anymore
          foreach ($_SESSION as $key => $value) {
              if (mb_substr($key, 0, mb_strlen(UPLOAD_PREFIX))
                  == UPLOAD_PREFIX
                  && ! isset($sessionupload[$key])
              ) {
                  unset($_SESSION[$key]);
              }
          }
  25. 25. Scores and comments
      0.98780382  //LEAD_OFFSET = (0xD800 - (0x10000 >> 10)) = 55232
      0.99361396  // We have server(s) => apply default configuration
      0.98383027  // Duration = as configured
      0.99999928  // original -> translation mapping
      0.97590065  // = (   59 x 84   ) mm  = (  2.32 x 3.31  ) in
  26. 26. Diagram labels: True positive; False positive; True negative; False negative; Found by FANN; Target
  27. 27. Diagram labels: True positive; False positive; True negative; False negative; Found by FANN; Target
      Comments:
      // $cfg['Servers'][$i]['table_coords'] = 'pma__tabl
      //(isset($attribs['height'])?$attribs['height']: 1)
      // if ($key != null) did not work for index "0"
      // the PASSWORD() function
      Scores: 0.99999923, 0.73295981, 0.99999851, 0.2104115
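The four outcomes on the diagram (true and false positives, true and false negatives) can be counted by comparing what the network found with what the expert expected. The sketch below uses hypothetical labels, not data from the talk, and the function names `confusionMatrix()`, `precision()` and `recall()` are assumptions made for this example.

```php
<?php
/**
 * Counts the four outcomes by comparing predicted flags with expert labels.
 * Both arrays hold booleans (true = "this comment is code"), in the same order.
 */
function confusionMatrix(array $predicted, array $target): array
{
    $m = ['tp' => 0, 'fp' => 0, 'tn' => 0, 'fn' => 0];
    foreach ($predicted as $i => $found) {
        if ($found) {
            $m[$target[$i] ? 'tp' : 'fp']++;
        } else {
            $m[$target[$i] ? 'fn' : 'tn']++;
        }
    }
    return $m;
}

/** Share of the reported comments that really are code. */
function precision(array $m): float
{
    return $m['tp'] / max(1, $m['tp'] + $m['fp']);
}

/** Share of the real code comments that were reported. */
function recall(array $m): float
{
    return $m['tp'] / max(1, $m['tp'] + $m['fn']);
}

// Hypothetical verdicts for six comments.
$predicted = [true, true, true, false, false, true];
$target    = [true, false, true, false, true, true];
$matrix    = confusionMatrix($predicted, $target);
```

A high false positive rate shows up as low precision, which is the figure to watch when the reported comments have to be reviewed by hand.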
  28. 28. RESULTS: 1960 issues; 50+% false positives; With an easy clean, 822 issues reported; 14k comments analyzed in 367 ms; Total coding time: 27 min
      // = (   59 x 84   ) mm  = (  2.32 x 3.31  ) in
      /* vim: set expandtab sw=4 ts=4 sts=4: */
  29. 29. LEARN BETTER, NOT HARDER: Better training data; Improve characteristics; Configure the neural network; Change algorithm; Automate learning; Update constantly. Diagram labels: Real data; History data; Training; Model; Results; Feedback
  30. 30. BETTER TRAINING DATA: More data, more data, more data; Varied situations, real-case situations; Include specific cases; Experience is capital; https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf
  31. 31. IMPROVE CHARACTERISTICS: Add new characteristics; Remove the ones that are less interesting; Find the right set of characteristics
  32. 32. NETWORK CONFIGURATION: Input vector; Intermediate neurons; Activation function; Output vector. [Chart: time of training (ms), from 0 to 20000, horizontal axis from 1 to 10, series for 1, 2, 3 and 4 layers]
  33. 33. CHANGE ALGORITHM: First add more data before changing algorithm; Try the cascade2 algorithm from FANN; 0.6 => 0 found; 0.5 => 2 found; Not found by the first algorithm
  34. 34. FINDING THE BEST: Test with 2-4 layers; 10 neurons; Measure results. [Chart: vertical axis from 0 to 9000, horizontal axis from 1 to 13, series for 1, 2, 3 and 4 layers]
  35. 35. DEEP LEARNING: Chaining the neural networks; Auto-encoders; Unsupervised learning; Genetic algorithms, ant colony algorithms
  36. 36. OTHER TOOLS: PHP ext/fann; R language (https://github.com/kachkaev/php-r); Scikit-learn (https://github.com/scikit-learn/scikit-learn); Mahout (https://mahout.apache.org/)
  37. 37. @exakat https://joind.in/talk/42120 GRAZIE
  38. 38. OTHER CONFIGURATIONS: Activation function: FANN_SIGMOID_SYMMETRIC; FANN_LINEAR; FANN_THRESHOLD; FANN_SIN_SYMMETRIC
  39. 39. Linear; Threshold; Tangent; Gaussian; Quadratic; Sigmoid
  40. 40. WHICH APPLICATIONS? Non-deterministic problems; First eliminate everything that can be found systematically; Access to expertise and to characteristic vectors; A final layer after the results; Classification, prioritization, fast approximation
  41. 41. REINFORCEMENT LEARNING: Software; Real world; Reward; Action; Reaction
  42. 42. BAYESIAN FILTERS
  43. 43. GENETIC ALGORITHMS: Population → Selection → Reproduction → Variations → Population
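The population, selection, reproduction and variation cycle named on the last slide can be sketched with a toy problem. Everything below is an assumption made for illustration: the OneMax task (evolve bit strings toward all ones), the function names `fitness()` and `evolve()`, the population size and the mutation rate. It is not taken from the talk.

```php
<?php
// Toy genetic algorithm (OneMax): evolve bit strings toward all ones.

/** Fitness of an individual: the number of genes set to 1. */
function fitness(array $genes): int
{
    return array_sum($genes);
}

function evolve(array $population, int $generations, float $mutationRate): array
{
    $byFitness = function (array $a, array $b): int {
        return fitness($b) <=> fitness($a);
    };
    $size = count($population);
    for ($g = 0; $g < $generations; $g++) {
        // Selection: sort by fitness and keep the best half as parents.
        usort($population, $byFitness);
        $parents = array_slice($population, 0, max(2, intdiv($size, 2)));
        // Elitism: the best individual survives unchanged.
        $next = [$parents[0]];
        while (count($next) < $size) {
            // Reproduction: one-point crossover between two random parents.
            $a = $parents[mt_rand(0, count($parents) - 1)];
            $b = $parents[mt_rand(0, count($parents) - 1)];
            $cut = mt_rand(1, count($a) - 1);
            $child = array_merge(array_slice($a, 0, $cut), array_slice($b, $cut));
            // Variation: flip each gene with a small probability.
            foreach ($child as $i => $gene) {
                if (mt_rand() / mt_getrandmax() < $mutationRate) {
                    $child[$i] = 1 - $gene;
                }
            }
            $next[] = $child;
        }
        $population = $next;
    }
    usort($population, $byFitness);
    return $population; // best individual first
}

mt_srand(42); // fixed seed so that runs are repeatable
$population = [];
for ($i = 0; $i < 20; $i++) {
    $population[] = array_map(function () {
        return mt_rand(0, 1);
    }, range(1, 16));
}
$initialBest = max(array_map('fitness', $population));
$evolved     = evolve($population, 30, 0.02);
$finalBest   = fitness($evolved[0]);
```

Because the best individual is always carried over, the best fitness can never decrease from one generation to the next.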
