DutchMLSchool. Your first BigML Project


Your first BigML Project - Workshop: Working with the Masters.
DutchMLSchool: 1st edition of the Machine Learning Summer School in The Netherlands.

Published in: Data & Analytics


  1. 1st edition | July 8-11, 2019
  2. BigML, Inc #DutchMLSchool: My First BigML Model. Mercè Martín, VP of Applications, BigML
  3. Do I really need a model?
      • Lots of decisions
      • Lots of potentially related variables
      • Uncertain correlations
      ML CAN HELP
  4. Maybe I too could use a model…
      New data arrives → the model labels it → we decide the action
  5. The challenge
  6. Credit delinquency: I WANT TO MINIMIZE RISK BY PREDICTING DEFAULTS
  7. First step
  8. Defining the question
  9. Defining the real question
      When do I consider a customer to be in default? When the customer misses payments? What if the customer pays late? What is the maximum delinquency that you allow?
  10. Defining the contest goal
      Predicting who will be 90 days past due or worse, to act only on them
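The contest goal turns a fuzzy idea of "default" into a concrete labeling rule. Below is a minimal Python sketch of that rule, assuming a hypothetical input: a list with the number of days each payment was late during the two-year window. The real dataset already ships the label precomputed as SeriousDlqin2yrs, so this only illustrates the definition.

```python
# Hypothetical helper illustrating the "90 days past due or worse" rule.
DEFAULT_THRESHOLD_DAYS = 90


def is_serious_delinquency(days_past_due_history):
    """Return 1 if any payment was 90 or more days late, else 0.

    days_past_due_history: days late for each payment in the two-year window
    (an assumed input format, not a field of the contest dataset).
    """
    return int(any(d >= DEFAULT_THRESHOLD_DAYS for d in days_past_due_history))


print(is_serious_delinquency([0, 15, 45]))   # 0: late, but never 90+ days
print(is_serious_delinquency([0, 30, 120]))  # 1: one payment was 120 days late
```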
  11. And now…
  12. The First Decision
  13. The Data Dictionary (10 predictors)
      • SeriousDlqin2yrs (Y/N): the person experienced 90 days past due delinquency or worse (the field to predict)
      • RevolvingUtilizationOfUnsecuredLines (percentage): total balance on credit cards and personal lines of credit, excluding real estate and installment debt such as car loans, divided by the sum of credit limits
      • age (integer): age of the borrower in years
      • NumberOfTime30-59DaysPastDueNotWorse (integer): number of times the borrower has been 30-59 days past due, but no worse, in the last 2 years
      • DebtRatio (percentage): monthly debt payments, alimony and living costs divided by monthly gross income
      • MonthlyIncome (real): monthly income
      • NumberOfOpenCreditLinesAndLoans (integer): number of open loans (installment loans such as a car loan or mortgage) and lines of credit (e.g. credit cards)
      • NumberOfTimes90DaysLate (integer): number of times the borrower has been 90 days or more past due
      • NumberRealEstateLoansOrLines (integer): number of mortgage and real estate loans, including home equity lines of credit
      • NumberOfTime60-89DaysPastDueNotWorse (integer): number of times the borrower has been 60-89 days past due, but no worse, in the last 2 years
      • NumberOfDependents (integer): number of dependents in the family, excluding the borrower (spouse, children, etc.)
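Two of the predictors are ratios defined directly in the dictionary. A minimal sketch of how they are computed, assuming hypothetical raw inputs (per-account balances and limits, monthly debt payments and income) that the contest dataset does not include; the dictionary calls these fields percentages, while the sketch returns plain fractions.

```python
def revolving_utilization(card_balances, credit_limits):
    """Total revolving balance divided by the sum of credit limits."""
    total_limit = sum(credit_limits)
    if total_limit == 0:
        return None  # undefined when there is no credit limit at all
    return sum(card_balances) / total_limit


def debt_ratio(monthly_debt_payments, monthly_gross_income):
    """Monthly debt payments, alimony and living costs over monthly gross income."""
    if not monthly_gross_income:
        return None  # income missing or zero: the ratio is undefined
    return monthly_debt_payments / monthly_gross_income


print(revolving_utilization([1200.0, 300.0], [5000.0, 1000.0]))  # 0.25
print(debt_ratio(900.0, 3000.0))  # 0.3
```

Returning `None` for undefined ratios mirrors how such rows would surface as missing values in the dataset rather than as misleading zeros.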
  14. The Data
  15. The Source
      How to interpret your data?
      • Field types
      • Locale (decimals)
      • Missing tokens
      • Text / Items parsing
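The parsing questions on this slide (locale decimals, missing tokens) can be illustrated with Python's standard `csv` module. This is a sketch, not BigML's parser: the missing-token set, the `;` delimiter and the comma-decimal locale are assumptions chosen for the example.

```python
import csv
import io

MISSING_TOKENS = {"", "NA", "N/A", "?"}  # assumed tokens; adapt to your file


def parse_number(text, decimal_sep="."):
    """Parse a numeric cell, honoring the locale's decimal separator."""
    text = text.strip()
    if text in MISSING_TOKENS:
        return None
    if decimal_sep != ".":
        # Assume "." is the thousands separator in comma-decimal locales.
        text = text.replace(".", "").replace(decimal_sep, ".")
    return float(text)


raw = "age;MonthlyIncome\n45;9120,5\n38;NA\n"
rows = list(csv.DictReader(io.StringIO(raw), delimiter=";"))
parsed = [{k: parse_number(v, decimal_sep=",") for k, v in row.items()} for row in rows]
print(parsed)  # [{'age': 45.0, 'MonthlyIncome': 9120.5}, {'age': 38.0, 'MonthlyIncome': None}]
```

A fuller pipeline would also infer field types (integer vs. real vs. categorical) instead of converting every cell to `float`.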
  16. The Dataset
      How is data distributed?
      • Histograms
      • Statistics
      • Number of missing values
      • Number of errors
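The per-field summary a dataset view shows can be sketched in a few lines of plain Python. This is an illustrative reimplementation, not BigML's code: the four equal-width bins and the toy `ages` values are assumptions, and the function expects at least one non-missing value.

```python
import statistics


def summarize(values, bins=4):
    """Count, missing values, mean/stdev and an equal-width histogram."""
    present = [v for v in values if v is not None]
    summary = {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "stdev": statistics.stdev(present) if len(present) > 1 else 0.0,
    }
    lo, hi = min(present), max(present)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * bins
    for v in present:
        idx = min(int((v - lo) / width), bins - 1)  # the maximum goes in the last bin
        counts[idx] += 1
    summary["histogram"] = counts
    return summary


ages = [22, 35, None, 41, 58, 63, None, 30]
print(summarize(ages))
```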
  17. And now… The Model
  18. The Model
      What insights will the model extract?
      • Patterns
      • Importance
      • and…
  19. The Prediction
      What label corresponds to this loan?
      • Predictions (labels)
      • Confidence
      • Explanations
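Confidence for a tree prediction reflects both how pure the leaf is and how many training instances reached it. One standard measure for this is the lower bound of the Wilson score interval. To our understanding BigML reports a Wilson-based confidence for tree classifications, but treat the sketch below (with z = 1.96 as an assumed default) as illustrative rather than BigML's exact formula.

```python
import math


def wilson_lower_bound(hits, total, z=1.96):
    """Lower bound of the Wilson score interval for the proportion hits/total."""
    if total == 0:
        return 0.0
    p = hits / total
    denom = 1 + z * z / total
    center = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (center - margin) / denom


# Two hypothetical leaves, both 90% pure for the predicted class
print(round(wilson_lower_bound(90, 100), 3))  # about 0.826
print(round(wilson_lower_bound(9, 10), 3))    # about 0.596
```

Both leaves predict the same class with 90% purity, but the one backed by 100 training instances earns a much higher confidence than the one backed by 10.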
  20. Are predictions correct?
  22. And now… The Evaluation
  23. The Evaluation
      Do predictions match the real values? Hey! Great accuracy! Right?
  24. I wish to make a complaint!
  25. The Evaluation
      Do predictions match the real values? Positive class: 1
      • TP (true positive): predicted 1 / actual 1
      • FN (false negative): predicted 0 / actual 1
      • FP (false positive): predicted 1 / actual 0
      • TN (true negative): predicted 0 / actual 0
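The "great accuracy" trap is easy to reproduce. A minimal sketch with hypothetical toy labels (7% positives, not the workshop's actual data): a model that always predicts 0 scores high accuracy but catches none of the defaulters.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, FN, FP, TN for a binary problem with the given positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    return tp, fn, fp, tn


# Hypothetical unbalanced toy data: 1 = serious delinquency (positive class)
actual = [0] * 93 + [1] * 7
predicted = [0] * 100  # a "model" that always predicts 0

tp, fn, fp, tn = confusion_counts(actual, predicted)
accuracy = (tp + tn) / len(actual)
recall = tp / (tp + fn) if tp + fn else 0.0
print(tp, fn, fp, tn)    # 0 7 0 93
print(accuracy, recall)  # 0.93 0.0
```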
  26. The Costs
      • Always remember the goal: predicting who will be 90 days past due or worse, to act only on them
      • And the costs of failing!
      TO MINIMIZE COST WE SHOULD MAXIMIZE THE RECALL (Unbalanced)
  27. And now… Model Tuning
  28. Compensating unbalance
      The percentage of examples of the class we are interested in is very low. Increasing their frequency could help the model learn better.
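One common way to compensate unbalance without duplicating rows is to weight each instance inversely to its class frequency, so both classes carry the same total weight. A sketch of that scheme, assuming the same hypothetical 93/7 split; BigML offers balanced models, but this is not necessarily the exact weighting it applies.

```python
from collections import Counter


def balancing_weights(labels):
    """Per-class instance weight so every class contributes the same total weight."""
    counts = Counter(labels)
    n_classes = len(counts)
    total = len(labels)
    return {cls: total / (n_classes * cnt) for cls, cnt in counts.items()}


labels = [0] * 93 + [1] * 7  # hypothetical toy labels
weights = balancing_weights(labels)
print(weights)  # each class now sums to total / n_classes = 50
```

Weighting has the same effect in expectation as oversampling the rare class, but keeps the dataset size unchanged.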
  29. Choosing according to costs
      Unbalanced vs. Balanced: THE BALANCED MODEL WORKS BETTER
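Choosing between models by cost rather than accuracy amounts to pricing each error type. A sketch with invented numbers: the 20:1 cost ratio and both models' error counts are hypothetical, picked only to show how a model with more false positives can still be cheaper when missed defaulters are expensive.

```python
def misclassification_cost(fn, fp, cost_fn, cost_fp):
    """Total cost of errors: missed defaulters plus wrongly flagged customers."""
    return fn * cost_fn + fp * cost_fp


# Hypothetical costs: a missed defaulter costs 20x a needless follow-up
COST_FN, COST_FP = 20.0, 1.0
# Toy error counts for two models scored on the same hypothetical test set
models = {"unbalanced": {"fn": 70, "fp": 10}, "balanced": {"fn": 20, "fp": 150}}

for name, errs in models.items():
    print(name, misclassification_cost(errs["fn"], errs["fp"], COST_FN, COST_FP))
# unbalanced 1410.0
# balanced 550.0
```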
  30. And now… Automating
  31. The OptiML
  32. Automating tuning
      Smart search for the best-performing configuration
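The loop behind automated tuning is: propose a configuration, train and score it, keep the best. OptiML uses a smarter, model-guided search; the sketch below swaps in plain random search, the simplest baseline, and a hypothetical `toy_score` function standing in for training and evaluating a real model. The search-space names are illustrative, not OptiML parameters.

```python
import random

# Hypothetical search space; names are illustrative, not OptiML parameters
SEARCH_SPACE = {
    "max_depth": [4, 8, 16, 32],
    "balanced": [False, True],
    "pruned": [False, True],
}


def random_search(score_fn, space, n_trials=20, seed=0):
    """Sample configurations at random and keep the best-scoring one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = {name: rng.choice(options) for name, options in space.items()}
        score = score_fn(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_score(config):
    """Stand-in for an evaluation metric of a model trained with this config."""
    score = 0.5
    score += 0.2 if config["balanced"] else 0.0
    score += 0.1 if config["pruned"] else 0.0
    score -= 0.01 * abs(config["max_depth"] - 8)
    return score


print(random_search(toy_score, SEARCH_SPACE))
```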
  33. And the winner is… A simple decision tree!
      • 19 nodes
      • balanced
      • pruned
  34. Operating the model
      Pick the probability threshold that decides when to accept your prediction
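Picking the operating threshold can be framed as: flag as few customers as possible while still catching a target share of defaulters. A sketch, assuming hypothetical predicted probabilities and labels and a 75% recall target. Because recall can only drop as the threshold rises, scanning candidates from high to low returns the largest threshold that meets the target.

```python
def recall_at(threshold, probs, actual):
    """Share of actual positives whose probability reaches the threshold."""
    positives = sum(1 for a in actual if a == 1)
    tp = sum(1 for p, a in zip(probs, actual) if p >= threshold and a == 1)
    return tp / positives if positives else 0.0


def highest_threshold_for_recall(probs, actual, target_recall):
    """Largest candidate threshold whose recall meets the target, or None."""
    for t in sorted(set(probs), reverse=True):
        if recall_at(t, probs, actual) >= target_recall:
            return t
    return None


# Hypothetical model outputs (probability of default) and true labels
probs = [0.95, 0.80, 0.62, 0.40, 0.35, 0.20, 0.10, 0.05]
actual = [1, 1, 0, 1, 0, 0, 1, 0]
print(highest_threshold_for_recall(probs, actual, 0.75))  # 0.4
```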
  36. Going to production
      What does production mean for you? Well, I need to:
      • Predict a bunch of data periodically: batch predictions
      • Check from a call center as a customer calls: single predictions
      • Use the predicted value immediately on my website: single local predictions
      • Integrate groups of predictions in my app: batch local predictions
  37. Whitebox models & bindings
      Predictions should be integrable into any widget and software:
      • IT systems
      • Mobile phones
      • Tablets
      • ATMs
      • Amazon Echo
      • Google Sheets
      • Websites
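A whitebox model is just logic you can run anywhere, which is what makes single and batch local predictions possible. A minimal sketch of a decision tree hand-written as a Python function; the split fields come from the data dictionary, but the thresholds and leaf probabilities are invented for illustration and do not come from the workshop's 19-node tree.

```python
def predict_default(record):
    """Hypothetical whitebox tree: plain code, no service call needed.

    Returns (label, probability of default); label 1 means 90+ days past due.
    Missing or None fields are treated as 0.
    """
    if (record.get("NumberOfTimes90DaysLate") or 0) > 0:
        return 1, 0.62
    if (record.get("RevolvingUtilizationOfUnsecuredLines") or 0.0) > 0.8:
        return 1, 0.55
    return 0, 0.04


def predict_batch(records):
    """Batch local predictions: apply the same function to every record."""
    return [predict_default(r) for r in records]


customer = {"NumberOfTimes90DaysLate": 0, "RevolvingUtilizationOfUnsecuredLines": 0.9}
print(predict_default(customer))               # (1, 0.55)
print(predict_batch([customer, {"age": 52}]))  # [(1, 0.55), (0, 0.04)]
```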
  38. Going to production
  39. Local production environment