
DutchMLSchool. Automating Decision Making



Enhancing and Automating Decision Making with Machine Learning - Main Conference: Introduction to Machine Learning.
DutchMLSchool: 1st edition of the Machine Learning Summer School in The Netherlands.

Published in: Data & Analytics


  1. 1st edition | July 8-11, 2019
  2. BigML, Inc #DutchMLSchool. Feature Engineering: Creating Features that Make Machine Learning Work. Poul Petersen, CIO, BigML, Inc
  3. Gaming the ML Performance: A Tale of Two Strategies…
       • Use ML to improve performance automatically
           • OptiML
           • Unsupervised Feature Engineering (PCA, Topic Models, Clustering, Anomaly Detection, etc.)
           • Automated feature selection
       • Use domain knowledge to improve performance manually
           • Bespoke features (requires expertise)
           • Fusions of models
           • Manual feature selection
  4. What is Feature Engineering?
     Feature Engineering: applying domain knowledge of the data to create new features that allow ML algorithms to work better, or to work at all.
       • This is really, really important - more than algorithm selection!
           • In fact, so important that BigML often does it automatically
       • ML algorithms have no deeper understanding of data
           • Numerical: have a natural order, can be scaled, etc.
           • Categorical: have discrete values, etc.
           • The "magic" is the ability to find patterns quickly and efficiently
       • ML algorithms only know what you tell/show them with data
           • Medical: kg and m, but BMI = kg/m² is better
           • Lending: Debt and Income, but DTI (debt-to-income ratio) is better
       • Intuition can be risky: remember to prove it with an evaluation!
  5. Built-in Transformations: Date-Time Fields
     Example: the DATE-TIME value 2013-09-25 10:02 is split into component fields:
       … | year | month | day | hour | minute | …
       … | 2013 | Sep   | 25  | 10   | 2      | …
     Field types on the slide: NUM for the numeric components, CAT for the month.
       • Date-Time fields have a lot of information "packed" into them
       • Splitting out the time components allows ML algorithms to discover time-based patterns.
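The date-time split above can be sketched in plain Python. This is only an illustration of the idea, not BigML's implementation; the function name `split_datetime` and the input format string are assumptions.

```python
from datetime import datetime

# Month abbreviations are spelled out to avoid locale-dependent formatting.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

def split_datetime(value: str) -> dict:
    """Split a 'YYYY-MM-DD HH:MM' string into separate feature columns."""
    dt = datetime.strptime(value, "%Y-%m-%d %H:%M")
    return {
        "year": dt.year,
        "month": MONTHS[dt.month - 1],  # categorical in the slide's example
        "day": dt.day,
        "hour": dt.hour,
        "minute": dt.minute,
    }

parts = split_datetime("2013-09-25 10:02")
```

Each key of `parts` would become its own column, which gives a model the chance to pick up hour-of-day or month-of-year patterns.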
  6. Built-in Transformations: Categorical Fields for Clustering/LR
     Example: the CAT field alchemy_category is encoded as one NUM field per category:
       alchemy_category | business | health | recreation
       business         | 1        | 0      | 0
       recreation       | 0        | 0      | 1
       health           | 0        | 1      | 0
       • Clustering and Logistic Regression require numeric fields for inputs
       • Categorical values are transformed to numeric vectors automatically*
       • *Note: In BigML, clustering uses k-prototypes and the encoding used for LR can be configured.
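A minimal Python sketch of the one-column-per-category encoding shown on this slide. The helper `one_hot` is hypothetical, and the alphabetical column order is an assumption chosen to match the slide's example.

```python
def one_hot(values):
    """Return (sorted category names, one 0/1 row per input value)."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

cats, rows = one_hot(["business", "recreation", "health"])
```

Here `cats` names the new numeric columns and `rows` holds the encoded values in input order.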
  7. Built-in Transformations: Text Fields
     Example TEXT value: "Be not afraid of greatness: some are born great, some achieve greatness, and some have greatness thrust upon 'em."
       … | great | afraid | born | achieve | …
       … | 4     | 1      | 1    | 1       | …
     Each extracted token becomes a NUM field.
       • Unstructured text contains a lot of potentially interesting patterns
       • Bag-of-words analysis happens automatically and extracts the "interesting" tokens in the text
       • Another option is Topic Modeling to extract thematic meaning
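The token counts on this slide can be reproduced with a rough Python sketch. The stemming rule here (dropping a trailing "ness") is a deliberately crude assumption made only so that "greatness" and "great" count together; real text analysis, including BigML's, will use more careful tokenization, stemming, and stop-word handling.

```python
import re
from collections import Counter

def crude_stem(token: str) -> str:
    """Toy stemmer (assumption): strip a trailing 'ness' from longer words."""
    if token.endswith("ness") and len(token) > 6:
        return token[:-4]
    return token

def bag_of_words(text: str) -> Counter:
    """Lower-case the text, keep alphabetic tokens, and count stemmed tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(crude_stem(t) for t in tokens)

quote = ("Be not afraid of greatness: some are born great, some achieve "
         "greatness, and some have greatness thrust upon 'em.")
counts = bag_of_words(quote)
```

Looking up `counts["great"]`, `counts["afraid"]`, and so on gives the per-token numeric features.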
  8. Help ML to Work Better: When text is not actually unstructured
     Example TEXT value:
       { "url": "cbsnews", "title": "Breaking News Headlines Business Entertainment World News ", "body": " news covering all the latest breaking national and world news headlines, including politics, sports, entertainment, business and more." }
     Extracted TEXT fields:
       title          | body
       Breaking News… | news covering…
       • In this case, the text field has structure (key/value pairs)
       • Extracting the structure as new features may allow the ML algorithm to work better
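As a sketch of pulling key/value structure out of a text field, the following Python uses the standard `json` module. The function name `extract_fields`, the shortened sample record, and the choice to return `None` for missing or unparsable input are all assumptions for illustration.

```python
import json

def extract_fields(text: str, keys=("title", "body")) -> dict:
    """Parse a JSON-encoded text field and return the requested keys."""
    try:
        record = json.loads(text)
    except json.JSONDecodeError:
        record = {}
    if not isinstance(record, dict):
        record = {}
    return {key: record.get(key) for key in keys}

# Shortened, hypothetical record in the same shape as the slide's example.
raw = '{"url": "cbsnews", "title": "Breaking News Headlines", "body": "news covering all the latest headlines"}'
fields = extract_fields(raw)
broken = extract_fields("not json at all")
```

Each returned key can then be treated as its own text field rather than one opaque blob.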
  9. FE Demo #1
  10. Help ML to Work at All: When the pattern does not exist
      Highway Number | Direction   | Is Long
      2              | East-West   | FALSE
      4              | East-West   | FALSE
      5              | North-South | TRUE
      8              | East-West   | FALSE
      10             | East-West   | TRUE
      …              | …           | …
      Goal: Predict principal direction from highway number
      (= (mod (field "Highway Number") 2) 0)
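The parity feature above translates directly into Python. This is a sketch of the same idea outside Flatline; the function name is hypothetical.

```python
def is_even_highway(number: int) -> bool:
    """Mirror of (= (mod (field "Highway Number") 2) 0)."""
    return number % 2 == 0

highways = [2, 4, 5, 8, 10]
flags = [is_even_highway(n) for n in highways]
```

In the slide's sample rows, the even-numbered highways are the East-West ones, so this single boolean exposes a pattern that the raw number hides from the model.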
  11. FE Demo #2
  12. Feature Engineering: Discretization
      Total Spend (NUM) | Spend Category (CAT)
      7.342,99          | Top 33%
      304,12            | Bottom 33%
      4,56              | Bottom 33%
      345,87            | Middle 33%
      8.546,32          | Top 33%
      With the NUM field: "Predict will spend $3,521 with error $1,232"
      With the CAT field: "Predict customer will be Top 33% in spending"
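One way to sketch this discretization in Python is to rank the values and cut the ranks into thirds. The percentile-rank rule used here (rank divided by n - 1) is an assumption picked because it reproduces the five sample rows; the slide does not say how the cut points are computed.

```python
def spend_category(values):
    """Label each value as Bottom/Middle/Top third by its percentile rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [None] * n
    for rank, i in enumerate(order):
        pct = rank / (n - 1) if n > 1 else 0.0
        if pct < 1 / 3:
            labels[i] = "Bottom 33%"
        elif pct < 2 / 3:
            labels[i] = "Middle 33%"
        else:
            labels[i] = "Top 33%"
    return labels

# The slide's amounts, written with '.' as the decimal separator.
spend = [7342.99, 304.12, 4.56, 345.87, 8546.32]
categories = spend_category(spend)
```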
  13. FE Demo #3
  14. Built-ins for FE
        • Discretize: Converts a numeric value to categorical
        • Replace missing values: fixed/max/mean/median/etc.
        • Normalize: Adjust a numeric value to a specific range of values while preserving the distribution
        • Math: Exponentiation, Logarithms, Squares, Roots, etc.
        • Types: Force a field value to categorical, integer, or real
        • Random: Create random values for introducing noise
        • Statistics: Mean, Population
        • Refresh Fields:
            • Types: recomputes field types. Ex: #classes > 1000
            • Preferred: recomputes preferred status
  15. Flatline Add Fields: Computing with Existing Features
      (/ (field "Debt") (field "Income"))
      Debt (NUM) | Income (NUM) | Debt to Income Ratio (NUM)
      10.134     | 100.000      | 0,10
      85.234     | 134.000      | 0,64
      8.112      | 21.500       | 0,38
      0          | 45.900       | 0
      17.534     | 52.000       | 0,34
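The same ratio can be checked in Python against the slide's rows. This sketch assumes the slide writes numbers with "." as a thousands separator and "," as the decimal mark; returning `None` for a zero income is also an assumption, since the slide does not cover that case.

```python
def debt_to_income(debt: float, income: float):
    """Debt divided by income; None when income is zero (assumed handling)."""
    return debt / income if income else None

rows = [(10134, 100000), (85234, 134000), (8112, 21500), (0, 45900), (17534, 52000)]
ratios = [round(debt_to_income(debt, income), 2) for debt, income in rows]
```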
  16. FE Demo #4
  17. What is Flatline?
      Flatline: a domain specific language for feature engineering and programmatic filtering
        • DSL:
            • Invented by BigML - Programmatic / Optimized for speed
            • Transforms datasets into new datasets
            • Adding new fields / Filtering
            • Transformations are written in lisp-style syntax
        • Feature Engineering
            • Computing new fields: (/ (field "Debt") (field "Income"))
        • Programmatic Filtering:
            • Filtering datasets according to functions that evaluate to true/false using the row of data as an input.
  18. Flatline
        • Lisp style syntax: Operators come first
            • Correct: (+ 1 2) => NOT Correct: (1 + 2)
        • Dataset Fields are first-class citizens
            • (field "diabetes pedigree")
        • Limited programming language structures
            • let, cond, if, map, list operators, */+-, etc.
        • Built-in transformations
            • statistics, strings, timestamps, windows
  19. Flatline s-expressions: Adding Simple Labels to Data
      Define "default" as missing three payments in a row:
      (= 0 (+ (abs (f "Month - 3")) (abs (f "Month - 2")) (abs (f "Month - 1"))))
      Un-labelled data:
        Name       | Month - 3 | Month - 2 | Month - 1
        Joe Schmo  | 123,23    | 0         | 0
        Jane Plain | 0         | 0         | 0
        Mary Happy | 0         | 55,22     | 243,33
        Tom Thumb  | 12,34     | 8,34      | 14,56
      Labelled data:
        Name       | Month - 3 | Month - 2 | Month - 1 | Default
        Joe Schmo  | 123,23    | 0         | 0         | FALSE
        Jane Plain | 0         | 0         | 0         | TRUE
        Mary Happy | 0         | 55,22     | 243,33    | FALSE
        Tom Thumb  | 12,34     | 8,34      | 14,56     | FALSE
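A Python sketch of the labelling rule, applied to the four sample rows. The function name `is_default` is hypothetical; the logic follows the s-expression, which sums the absolute payments and compares the total with zero.

```python
def is_default(month_3: float, month_2: float, month_1: float) -> bool:
    """True when all three monthly payments are zero."""
    return abs(month_3) + abs(month_2) + abs(month_1) == 0

payments = {
    "Joe Schmo": (123.23, 0, 0),
    "Jane Plain": (0, 0, 0),
    "Mary Happy": (0, 55.22, 243.33),
    "Tom Thumb": (12.34, 8.34, 14.56),
}
labels = {name: is_default(*months) for name, months in payments.items()}
```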
  20. FE Demo #5
  21. Flatline s-expressions: Shock: Deviations from a Trend
      Shock = (Current - (4-day avg)) / std dev
        date | volume | price
        1    | 34353  | 314
        2    | 44455  | 315
        3    | 22333  | 315
        4    | 52322  | 321
        5    | 28000  | 320
        6    | 31254  | 319
        7    | 56544  | 323
        8    | 44331  | 324
        9    | 81111  | 287
        10   | 65422  | 294
        11   | 59999  | 300
        12   | 45556  | 302
        13   | 19899  | 301
      Window of previous prices for the first rows:
        day-4 | day-3 | day-2 | day-1 | 4davg
              |       |       |       | -
        314   |       |       |       | -
        314   | 315   |       |       | -
        314   | 315   | 315   |       | -
        314   | 315   | 315   | 321   | 316,25
        315   | 315   | 321   | 320   | 317,75
        315   | 321   | 320   | 319   | 318,75
  22. Flatline s-expressions: Shock: Deviations from a Trend
      Shock = (Current - (4-day avg)) / std dev
        • Current: (field "price")
        • 4-day avg: (avg-window "price" -4 -1)
        • std dev: (standard-deviation "price")
      (/ (- (f "price") (avg-window "price" -4 -1)) (standard-deviation "price"))
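The shock feature can be approximated in Python using the price column from the previous slide. Two choices here are assumptions, not documented Flatline behavior: rows without four prior prices get `None`, and the population standard deviation (`statistics.pstdev`) stands in for `standard-deviation`.

```python
from statistics import pstdev

prices = [314, 315, 315, 321, 320, 319, 323, 324, 287, 294, 300, 302, 301]

def window_avg(values, i, start=-4, end=-1):
    """Average of values[i+start .. i+end]; None if the window is incomplete."""
    lo, hi = i + start, i + end
    if lo < 0:
        return None
    window = values[lo:hi + 1]
    return sum(window) / len(window)

def shock(values):
    """(current - trailing 4-row average) / standard deviation of the column."""
    sd = pstdev(values)
    result = []
    for i, price in enumerate(values):
        avg = window_avg(values, i)
        result.append(None if avg is None else (price - avg) / sd)
    return result

shocks = shock(prices)
```

The largest negative shock lands on day 9, where the price falls to 287 after four days around 320.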
  23. FE Demo #6
  24. Advanced s-expressions: Highway isEven?
      (= (mod (field "Highway Number") 2) 0)
  25. Advanced s-expressions: Moon Phase %
      (/ (mod (- (/ (epoch (field "date-field")) 1000) 621300) 2551443) 2551442)
      https://gist.github.com/petersen-poul/0cf5022ed1768837fe13af72b2488329
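A Python sketch of the same arithmetic. The constants are copied from the slide; 2551443 seconds is roughly one synodic month (about 29.53 days), and the 621300-second offset is taken as given. That `epoch` returns milliseconds is inferred from the division by 1000 in the expression.

```python
from datetime import datetime, timezone

CYCLE_SECONDS = 2551443    # roughly one synodic month, from the slide
REFERENCE_OFFSET = 621300  # offset in seconds, from the slide

def moon_phase_fraction(moment: datetime) -> float:
    """Fraction of the lunar cycle elapsed at `moment` (timezone-aware)."""
    seconds = moment.timestamp()
    return ((seconds - REFERENCE_OFFSET) % CYCLE_SECONDS) / (CYCLE_SECONDS - 1)

start = datetime.fromtimestamp(REFERENCE_OFFSET, tz=timezone.utc)
phase_at_start = moon_phase_fraction(start)
phase_2019 = moon_phase_fraction(datetime(2019, 7, 8, tzinfo=timezone.utc))
```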
  26. Home Price Feature
      Map labels: Worth More, Worth Less
  27. Home Price Feature
      LATITUDE  | LONGITUDE   | REFERENCE LATITUDE | REFERENCE LONGITUDE | Distance (m)
      44,583    | -123,296775 | 44,5638            | -123,2794           | 700
      44,604414 | -123,296129 | 44,5638            | -123,2794           | 30,4
      44,600108 | -123,29707  | 44,5638            | -123,2794           | 19,38
      44,603077 | -123,295004 | 44,5638            | -123,2794           | 37,8
      44,589587 | -123,301154 | 44,5638            | -123,2794           | 23,39
  28. Haversine Formula
      https://en.wikipedia.org/wiki/Haversine_formula
  29. Advanced s-expressions: Distance Lat/Long <=> Ref (Haversine)
      (let (R 6371000
            latA (to-radians {lat-ref})
            latB (to-radians (field "LATITUDE"))
            latD (- latB latA)
            longD (to-radians (- (field "LONGITUDE") {long-ref}))
            a (+ (square (sin (/ latD 2)))
                 (* (cos latA) (cos latB) (square (sin (/ longD 2)))))
            c (* 2 (asin (min (list 1 (sqrt a))))))
        (* R c))
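For readers who want to check the formula outside Flatline, here is a line-by-line Python equivalent. The function and parameter names are hypothetical; the Earth radius of 6371000 m and the clamp of the square root to 1 follow the expression above.

```python
import math

def haversine_m(lat, lon, lat_ref, lon_ref, radius=6371000):
    """Great-circle distance in meters between (lat, lon) and a reference point."""
    lat_a = math.radians(lat_ref)
    lat_b = math.radians(lat)
    lat_d = lat_b - lat_a
    lon_d = math.radians(lon - lon_ref)
    a = (math.sin(lat_d / 2) ** 2
         + math.cos(lat_a) * math.cos(lat_b) * math.sin(lon_d / 2) ** 2)
    # Clamp guards against rounding pushing sqrt(a) slightly above 1.
    c = 2 * math.asin(min(1.0, math.sqrt(a)))
    return radius * c

same_point = haversine_m(44.5638, -123.2794, 44.5638, -123.2794)
one_degree = haversine_m(1.0, 0.0, 0.0, 0.0)
```

One degree of latitude comes out at about 111.2 km, which is a quick sanity check for the implementation.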
  30. WhizzML + Flatline
      Diagram: INPUT DATASET, LAT Ref, and LONG Ref go into a WHIZZML SCRIPT that applies the HAVERSINE FLATLINE expression and produces an OUTPUT DATASET.
      https://bigml.com/gallery/scripts
  31. Advanced s-expressions: JSON Parser???
        • Remember, Flatline is not a full programming language
            • No loops
            • No accumulated values
            • Code executes on one row at a time and has a limited view into other rows
      https://gist.github.com/petersen-poul/504c62ceaace76227cc6d8e0c5f1704b
  32. Feature Engineering: Fix Missing Values in a "Meaningful" Way
      Diagram steps: Filter Zeros, Model insulin, Predict insulin, Select insulin
      Diagram datasets: Original Dataset, Clean Dataset, Amended Dataset, Fixed Dataset
      (if (= (field "insulin") 0) (field "predicted insulin") (field "insulin"))
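The final selection step can be sketched in Python as a per-row rule. It assumes, as the expression does, that a zero in "insulin" marks a missing measurement and that a "predicted insulin" column has already been added by a model; the function name is hypothetical.

```python
def fix_insulin(row: dict) -> dict:
    """Replace a zero insulin reading with the model's predicted value."""
    fixed = dict(row)  # copy so the input row is left untouched
    if row["insulin"] == 0:
        fixed["insulin"] = row["predicted insulin"]
    return fixed

measured = fix_insulin({"insulin": 94, "predicted insulin": 101.5})
missing = fix_insulin({"insulin": 0, "predicted insulin": 120.3})
```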
  33. FE Demo #7
  34. Feature Selection
  35. Feature Selection
      Care must be taken when creating features!
        • Model Summary
            • Field Importance
        • Algorithmic
            • Best-First Feature Selection
            • Boruta
        • Leakage
            • Tight Correlations (AD, Plot, Correlations)
            • Test Data
            • Perfect future knowledge
  36. Feature Selection: Leakage
        • sales pipeline where step n-1 has no other outcome than step n
        • stock close predicts stock open
        • churn retention: the worst rep is actually the best (correlation != causation)
        • cancer prediction where one input is a doctor-ordered test for the condition
        • account ID predicts fraud (because only new accounts are fraudsters)
  37. Summary
        • Feature Engineering: what it is / why it is important
        • Automatic transformations: date-time, text, etc.
        • Built-in functions: filtering and feature engineering
            • Discretization / Normalization / etc.
        • Flatline: programmatic feature engineering / filtering
            • Structure
            • Examples: Adding fields / filtering
        • When building features it is important to watch for leakage
  38. OptiML and Fusions: Automating Machine Learning. Poul Petersen, CIO, BigML, Inc
  39. [Chart] Axes: Decreasing Interpretability / Better Representation / Longer Training; Increasing Data Size / Complexity. Stages: Early Stage (Rapid Prototyping), Mid Stage (Proven Application), Late Stage (Critical Performance). Models shown: Single Tree Model, Logistic Regression, Boosted Trees, Decision Forest, Random Decision Forest, Deepnets. One region is labelled "TOO HARD".
  40. BigML Deepnets: Remember this?
        • The success of a Deepnet is dependent on getting the right network structure for the dataset
        • But, there are too many parameters:
            • Nodes, layers, activation function, learning rate, etc…
            • And setting them takes significant expert knowledge
        • Solution:
            • Metalearning (a good initial guess)
            • Network search (try a bunch)
  41. OptiML
      Key Insight: We can solve any parameter selection problem in a similar way.
        • Each resource has several parameters that impact quality
            • Number of trees, missing splits, nodes, weight
        • Rather than trial and error, we can use ML to find ideal parameters
        • Why not make the model type (Decision Tree, Boosted Tree, etc.) a parameter as well?
        • Similar to Deepnet network search, but finds the optimum machine learning algorithm and parameters for your data automatically
  42. The Challenge…
        • We will start with a dataset from StumbleUpon
        • Train/Test split with seed "bigml"
        • Build and Evaluate:
            • 1-click Model, LR, Ensemble, Deepnet
            • Top model from OptiML output
        • Compare the results using the phi coefficient
        • Explore other ideas for improving performance further
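For reference, the phi coefficient used to compare models here can be computed from a binary confusion matrix. The Python sketch below uses the standard formula; the function name, argument order, and the choice to return 0.0 when a row or column of the matrix is empty are assumptions.

```python
import math

def phi_coefficient(tp: int, fp: int, fn: int, tn: int) -> float:
    """Phi (Matthews correlation) for a 2x2 confusion matrix."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention when the coefficient is undefined
    return (tp * tn - fp * fn) / denominator

perfect = phi_coefficient(tp=50, fp=0, fn=0, tn=50)
chance = phi_coefficient(tp=25, fp=25, fn=25, tn=25)
```

A value of 1 means perfect agreement with the holdout labels and 0 means no better than chance, which is the scale the scores on the results slides are reported on.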
  43. OptiML Demo
  44. Results…
      All scores are phi, evaluated against a holdout
        • 1-Click Decision Tree: 0.36
        • 1-Click LR: 0.47
        • 1-Click Ensemble: 0.58
        • Best OptiML Model (LR): 0.66
        • 1-Click Deepnet: 0.67
        • What else can we try?
  45. Fusions Inside
      Key Insight: ML algorithms each have unique strengths and weaknesses
        • Fuse any set of models into a new "fusion"
            • Must have the same objective type
            • Inputs and feature space can differ
        • Weights can be added
            • Give more importance to individual models
        • Fusions can be fused as well
        • Especially useful for fusing OptiML models
  46. Performance thru Diversity
      Diagram: one Dataset feeds an Optimized Deepnet, an Optimized Ensemble, and an Optimized Logistic Regression. Better?
  47. Fusion Demo #1
  48. Results…
      All scores are phi, evaluated against a holdout
        • 1-Click Decision Tree: 0.36
        • 1-Click LR: 0.47
        • 1-Click Ensemble: 0.58
        • Best OptiML Model (LR): 0.66
        • 1-Click Deepnet: 0.67
        • Fusion of top Model Types: 0.68
  49. Fusions: Under the Hood
      Classification:
        Model    | Prediction | Probability | Weight
        Ensemble | TRUE       | 56%         | 1
        Deepnet  | FALSE      | 67%         | 1
        Model    | TRUE       | 78%         | 2
        Fusion   | TRUE       | 61%         |
      P(TRUE) = [56 + (100 - 67) + 2*78] / 4
      Regression:
        Model    | Prediction | Error | Weight
        Ensemble | 156,78     | 12,56 | 1
        Deepnet  | 139,55     | 9,88  | 1
        Model    | 172,10     | 23,76 | 2
        Fusion   | 160,13     | 17,49 |
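The numbers on this slide are consistent with a weighted average, which the following Python sketch reproduces. The function names are hypothetical, and treating the fused regression error as a weighted average of the component errors is an assumption that happens to match the slide's figures, not a statement of how BigML computes it.

```python
def fuse_probability(models, target="TRUE"):
    """Weighted average of each model's probability for `target` (in percent).

    `models` holds (predicted_label, probability_percent, weight) tuples; a
    model that predicted the other class contributes 100 - probability.
    """
    total_weight = sum(weight for _, _, weight in models)
    weighted = sum(weight * (prob if label == target else 100 - prob)
                   for label, prob, weight in models)
    return weighted / total_weight

def fuse_regression(models):
    """Weighted averages of (prediction, error) over (prediction, error, weight)."""
    total_weight = sum(weight for _, _, weight in models)
    prediction = sum(weight * pred for pred, _, weight in models) / total_weight
    error = sum(weight * err for _, err, weight in models) / total_weight
    return prediction, error

p_true = fuse_probability([("TRUE", 56, 1), ("FALSE", 67, 1), ("TRUE", 78, 2)])
prediction, error = fuse_regression([(156.78, 12.56, 1), (139.55, 9.88, 1), (172.10, 23.76, 2)])
```

Here `p_true` is 61.25, which the slide shows rounded to 61%.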
  50. Fusions: Like any BigML Model
        • Fully accessible thru API and WhizzML
        • Bindings have support for local predictions
  51. Decision Boundary Smoothness
      Single Tree:
        • Outcome changes abruptly near decision boundary
        • And not at all parallel to the boundary
        • This can be "surprising"
      Single Tree + Deepnet:
        • Keep the interpretability of the tree
        • But with a more nuanced decision boundary
  52. Feature Stability
      Feature Importance: Different subsets of features may have similar modeling performance
      Fusing models gives better resilience against missing values as well as ensuring that all relevant features are utilized.
  53. Weighting over Time
      Data significance over time:
        • Some data may change significance at different times
        • Short-term user behavior versus long-term
        • Weights can be set to account for significance of time
      Example weights: 1 Day (w=8), 1 Week (w=4), 1 Month (w=2)
  54. Improved Class Separation
      Consider a 3-class objective
        • Really only care about "yes" versus "not yes"
        • A single model may struggle to separate the two negative classes
      Diagram: classes Yes, No, Maybe; models for yes/no/maybe, yes/no, and yes/maybe
  55. Feature Space Optimization
      Model Skills: Some ML algorithms "generally" do better on some feature types:
        • RDF for sparse text vectors
        • LR/Deepnets for numeric features
        • Trees for categorical features
      Diagram: feature subsets Full, Numeric, Text
  56. Fusions Demo #2
  57. Results…
      All scores are phi, evaluated against a holdout
        • 1-Click Decision Tree: 0.36
        • 1-Click LR: 0.47
        • 1-Click Ensemble: 0.58
        • Best OptiML Model (LR): 0.66
        • 1-Click Deepnet: 0.67
        • Fusion of top Model Types: 0.68
        • Custom Feature Fusion: 0.70
  58. PCA: Principal Component Analysis. Poul Petersen, CIO, BigML
  59. Issues with High Dimensionality
        • Implicitly increases model complexity, prone to overfitting
        • Requires more observations in order to generalize well
        • Contains correlated or useless variables
        • Data is difficult to visualize
        • Takes a longer time to train models or make predictions
      Principal Component Analysis addresses all of these issues
  60. Other Approaches
        • MODEL: Pruning, Node threshold
        • ENSEMBLE: Bagging, Randomization
        • LOGISTIC REGRESSION: L1 and L2 penalties
        • DEEPNET: Dropout
  61. Dimensionality Reduction
      Feature Selection
        • Preserves the original variables and selects a subset
        • Often uses recursive methods or statistical thresholds
        • Examples: RFE, Chi-Squared Test, Boruta
      Feature Extraction
        • Transforms original variables into variables better suited for modeling
        • Examples: word vectors, clustering
        • PCA falls into this category
      Slide label: Manual Approach
  62. When to use PCA
        1. You want to reduce the number of variables in your model, but it is not clear which should be eliminated
        2. You want to generate variables that are not correlated
        3. You are okay with sacrificing some amount of interpretability for potential downstream performance gains
  63. How Does PCA Work?
      Each PC is a linear combination of the original variables F1 … FN, with its own set of weights:
        PC1 = w1,1·F1 + w1,2·F2 + w1,3·F3 + … + w1,N·FN
        PC2 = w2,1·F1 + w2,2·F2 + w2,3·F3 + … + w2,N·FN
        …
        PCN = wN,1·F1 + wN,2·F2 + wN,3·F3 + … + wN,N·FN
  64. PCA Output
      These principal components are not correlated
  65. PCA Workflow
      Diagram elements: SOURCE, DATASET, TRAIN, TEST
  66. PCA Workflow
      Diagram elements: SOURCE, DATASET, TRAIN, TEST, PCA
  67. PCA Workflow
      Diagram elements: SOURCE, DATASET, TRAIN, TEST, PCA, BATCH PROJECTION (x2)
  68. PCA Workflow
      Diagram elements: SOURCE, DATASET, TRAIN, TEST, PCA, BATCH PROJECTION (x2), NEW TRAIN FEATURES, NEW TEST FEATURES
  69. PCA Demo
  70. BigML PCA
        • Standard PCA only applies to numerical data
        • BigML uses three different data transformation methods in order to handle different data types
            • Numeric data: Principal Component Analysis (PCA)
            • Categorical data: Multiple Correspondence Analysis (MCA)
            • Mixed data: Factorial Analysis of Mixed Data (FAMD)
        • BigML will automatically handle numeric, text, items, and categorical data without needing user input
  71. Closing slide: co-organizers, sponsor, and business partners (logos only)
