BigML Education - Models 2


Learn more about solving supervised learning problems using BigML. This tutorial uses a loan dataset to explain the sunburst view and how to deal with unbalanced datasets.

  1. BigML Education Models 2, June 2017
  2. In This Video
     • A new supervised learning problem: Predicting peer-to-peer loan defaults
     • Exploring models using the sunburst view
     • Objective field balancing and instance weighting
     • Leveraging missing data
  3. Unbalanced Datasets
     • The problem: Unbalanced data
       • One of the classes in the dataset is far more rare than the other(s)
       • This rare class is particularly important to classify accurately
     • Examples:
       • Medical diagnosis: Disease is typically rare compared with health
       • Fraud detection: Fraud is typically rare compared to legitimate activity
       • Predictive maintenance: Systems typically work far more often than they fail
  4. Missing Data
     • Data could be missing from your dataset for many different reasons
       • Human error, as in web forms
       • Deliberate choice, as in medical tests
       • Random corruption, as with database errors
     • Data missing for reasons unrelated to the objective should be ignored…
     • …but often data is missing for reasons that are closely related to the objective.
  5. Review
     • Supervised learning provides an effective way to detect defaults on consumer loans
     • Use objective balancing to create “balanced” models from unbalanced datasets
     • If the missing data in your dataset has meaning, use “missing splits” to capture that meaning
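
The objective balancing and instance weighting mentioned in the slides can be illustrated with a minimal sketch in plain Python. It is not BigML's implementation: the function name `balancing_weights`, the inverse-frequency weighting scheme, and the toy loan labels are all assumptions made for illustration. The idea shown is that each instance of a rare class gets a larger weight, so every class contributes the same total weight during training.

```python
from collections import Counter


def balancing_weights(labels):
    """Return one weight per class so that all classes carry equal total weight.

    Hypothetical helper for illustration; inverse-frequency weighting is one
    common way to balance an objective field, assumed here.
    """
    counts = Counter(labels)
    largest = max(counts.values())
    # A class that is k times rarer than the largest class gets weight k.
    return {cls: largest / n for cls, n in counts.items()}


# Made-up loan outcomes: 90 repaid loans and 10 defaults.
labels = ["repaid"] * 90 + ["default"] * 10

weights = balancing_weights(labels)
print(weights)  # {'repaid': 1.0, 'default': 9.0}

# Per-instance weights that a weighted learner could consume.
instance_weights = [weights[y] for y in labels]
```

With these weights the 10 defaults count as much in total as the 90 repaid loans, which keeps a model from simply predicting the majority class.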
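
The point about missing data that carries meaning can be sketched the same way. The snippet below is only an illustration under stated assumptions: the field name `annual_income`, the toy rows, and the helper functions are hypothetical, and the predicate is a simplified picture of what a split that also considers missing values might look like, not BigML's actual "missing splits" logic.

```python
def default_rate(rows):
    """Fraction of rows that defaulted."""
    return sum(r["default"] for r in rows) / len(rows)


def goes_left(value, threshold, include_missing):
    """Split predicate 'value <= threshold' that can also catch missing values."""
    if value is None:
        return include_missing
    return value <= threshold


# Made-up applications: None means the applicant left the income field blank.
rows = [
    {"annual_income": 52000, "default": 0},
    {"annual_income": 61000, "default": 0},
    {"annual_income": 38000, "default": 1},
    {"annual_income": 75000, "default": 0},
    {"annual_income": 47000, "default": 0},
    {"annual_income": None, "default": 1},
    {"annual_income": None, "default": 1},
    {"annual_income": None, "default": 0},
]

missing = [r for r in rows if r["annual_income"] is None]
present = [r for r in rows if r["annual_income"] is not None]

# If missingness is related to the objective, the two rates differ noticeably.
print(default_rate(missing), default_rate(present))

# A split that sends missing values to the same branch as low incomes.
left = [r for r in rows if goes_left(r["annual_income"], 40000, include_missing=True)]
```

In this toy data the rows with a blank income default more often than the rest, so a split that treats "missing" as a value in its own right captures information that would be lost by dropping or imputing those rows.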