
BigML Education - Models 2



Learn more about solving supervised learning problems using BigML. This tutorial uses a loan dataset to explain the sunburst view and how to deal with unbalanced datasets.



  1. BigML Education: Models 2 (June 2017)
  2. BigML Education Program, Models 2: In This Video
     • A new supervised learning problem: predicting peer-to-peer loan defaults
     • Exploring models using the sunburst view
     • Objective field balancing and instance weighting
     • Leveraging missing data
  3. Unbalanced Datasets
     • The problem: unbalanced data
       • One of the classes in the dataset is far rarer than the other(s)
       • This rare class is particularly important to classify accurately
     • Examples:
       • Medical diagnosis: disease is typically rare compared with health
       • Fraud detection: fraud is typically rare compared with legitimate activity
       • Predictive maintenance: systems typically work far more often than they fail
  4. Missing Data
     • Data can be missing from a dataset for many different reasons:
       • Human error, as in web forms
       • Deliberate choice, as in medical tests
       • Random corruption, as with database errors
     • Data missing for reasons unrelated to the objective can be ignored…
     • …but data is often missing for reasons closely related to the objective.
  5. Review
     • Supervised learning provides an effective way to detect defaults on consumer loans
     • Use objective balancing to create “balanced” models from unbalanced datasets
     • If the missing data in your dataset has meaning, use “missing splits” to capture that meaning
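To make objective balancing on an unbalanced dataset more concrete, here is a minimal Python sketch. It assumes a common inverse-frequency scheme in which each instance of class c gets weight N / (K · n_c), where N is the number of instances, K the number of classes and n_c the size of class c, so that every class carries the same total weight. This is not BigML's implementation, and the function names and loan labels are hypothetical. The sketch also shows how such weights can change the majority prediction in a tree leaf.

```python
from collections import Counter


def weight_per_class(labels):
    """Inverse-frequency weights: N / (K * n_c) for each class c.

    With these weights every class sums to the same total weight (N / K).
    This scheme is an assumption for illustration, not BigML's exact formula.
    """
    counts = Counter(labels)
    n_total = len(labels)
    n_classes = len(counts)
    return {c: n_total / (n_classes * k) for c, k in counts.items()}


def leaf_prediction(leaf_labels, class_weight=None):
    """Predict the class with the largest (optionally weighted) count in a leaf."""
    totals = Counter()
    for y in leaf_labels:
        totals[y] += 1.0 if class_weight is None else class_weight[y]
    return max(totals, key=totals.get)


if __name__ == "__main__":
    # Hypothetical loan outcomes: defaults are the rare class (2 of 10).
    labels = ["repaid"] * 8 + ["default"] * 2
    weights = weight_per_class(labels)
    print(weights)  # {'repaid': 0.625, 'default': 2.5}

    # A hypothetical leaf holding 3 repaid and 2 defaulted loans.
    leaf = ["repaid", "repaid", "repaid", "default", "default"]
    print(leaf_prediction(leaf))           # repaid  (3 vs 2)
    print(leaf_prediction(leaf, weights))  # default (1.875 vs 5.0)
```

Without weights the leaf predicts the common class. With balancing, each default counts four times as much as each repaid loan, so the rare class wins. This is the trade-off balancing makes: more rare-class predictions in exchange for some extra false alarms.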
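The advice to use missing splits when missing data has meaning can be illustrated the same way. The Python sketch below scores a single candidate split, "value is missing" versus "value is present", by how much it reduces Gini impurity. Gini impurity is used here only for illustration and is not necessarily how BigML evaluates splits. The field values, labels and function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((k / n) ** 2 for k in counts.values())


def missing_split_gain(values, labels):
    """Impurity decrease from splitting rows on 'value is None' vs. present."""
    missing = [y for v, y in zip(values, labels) if v is None]
    present = [y for v, y in zip(values, labels) if v is not None]
    n = len(labels)
    children = (len(missing) / n) * gini(missing) + (len(present) / n) * gini(present)
    return gini(labels) - children


if __name__ == "__main__":
    # Hypothetical field (e.g. years of employment); None marks a missing value.
    # In this toy data every applicant who left the field blank defaulted.
    values = [None, None, None, 5, 3, 10, 2, 7]
    labels = ["default", "default", "default",
              "repaid", "repaid", "repaid", "repaid", "default"]
    print(round(missing_split_gain(values, labels), 3))  # 0.3
```

In this toy data the parent impurity is 0.5, so a gain of 0.3 means that whether the value is missing is itself strongly informative about default. A model that simply ignored the missing rows would discard that signal.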