David Gerster
VP Data Science
gerster@bigml.com
Demo: Predictive Modeling
• Train a predictive model using 699 biopsies
• The “label” of benign or malignant is known for each one
• Since we have labels, this is supervised learning
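
To make "supervised" concrete, here is a minimal sketch in plain Python. It does not use the 699 biopsies or BigML's API: the data generator, the single numeric feature, the 35% class mix, and the one-threshold "stump" model are all assumptions chosen for illustration.

```python
import random

def make_labeled_data(n=200, seed=0):
    """Hypothetical stand-in for labeled biopsies: one feature, two classes."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        if rng.random() < 0.35:  # arbitrary class mix, for illustration only
            rows.append((rng.gauss(7.0, 1.0), "malignant"))
        else:
            rows.append((rng.gauss(3.0, 1.0), "benign"))
    return rows

def train_stump(rows):
    """Pick the threshold on the feature that makes the fewest training errors."""
    best_threshold, best_errors = None, len(rows) + 1
    for threshold, _ in rows:
        errors = sum((x > threshold) != (label == "malignant") for x, label in rows)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def predict(threshold, x):
    return "malignant" if x > threshold else "benign"

rows = make_labeled_data()
train, test = rows[:150], rows[150:]   # hold out rows the model never saw
threshold = train_stump(train)         # learning uses the known labels
accuracy = sum(predict(threshold, x) == label for x, label in test) / len(test)
print(f"threshold={threshold:.2f} accuracy={accuracy:.2f}")
```

The key point is that `train_stump` can only work because every training row carries a label to compare against.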
What if we don’t have labels?
• Can we get insight into our data if we don’t know the labels?
• Enter anomaly detection
• Since we don’t have labels, this is unsupervised learning
10 lines are needed
to isolate this data point
(not anomalous)
Only 4 lines are needed
to isolate this data point
(highly anomalous)
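
The idea of counting lines can be sketched in code, assuming each "line" is a random axis-aligned cut as in an isolation-forest-style detector. This is a simplified illustration, not BigML's implementation: the function names, the synthetic 2-D data, and the choice to follow only one point's path (instead of building full trees on subsamples) are assumptions.

```python
import random

def isolation_depth(point, data, rng, max_depth=50):
    """Count the random cuts ("lines") needed until `point` is alone.

    `point` must be a member of `data`.
    """
    current, depth = data, 0
    while len(current) > 1 and depth < max_depth:
        dim = rng.randrange(len(point))          # pick a random feature
        lo = min(p[dim] for p in current)
        hi = max(p[dim] for p in current)
        cut = rng.uniform(lo, hi)                # pick a random cut position
        # Keep only the points on the same side of the cut as `point`.
        # (If lo == hi the cut removes nothing; max_depth bounds the loop.)
        if point[dim] < cut:
            current = [p for p in current if p[dim] < cut]
        else:
            current = [p for p in current if p[dim] >= cut]
        depth += 1
    return depth

def average_depth(point, data, n_trees=200, seed=0):
    """Average the cut count over many random trials to reduce noise."""
    rng = random.Random(seed)
    return sum(isolation_depth(point, data, rng) for _ in range(n_trees)) / n_trees

rng = random.Random(42)
cluster = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(255)]
outlier = (8.0, 8.0)                             # far from the cluster
data = cluster + [outlier]
typical = min(cluster, key=lambda p: p[0] ** 2 + p[1] ** 2)  # nearest the center

outlier_depth = average_depth(outlier, data)
typical_depth = average_depth(typical, data)
print(f"outlier: {outlier_depth:.1f} cuts, typical point: {typical_depth:.1f} cuts")
```

A point that sits far from the rest is usually separated after a few cuts, while a point in the middle of a dense cluster needs many more; a short average path is what marks a point as anomalous.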
Demo: Anomaly Detection
• Remove the labels of benign or malignant
• Train an anomaly detector on this unlabeled data
• Create a new dataset with the anomaly scores as “labels”
• Use these “labels” to train a predictive model!
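
The four steps above can be sketched end to end in plain Python. Everything here is an assumption for illustration: the synthetic one-feature data, the "majority"/"minority" labels, the score defined as 1 / average isolation depth, the choice to flag the 10 highest-scoring rows, and the one-threshold stump used as the predictive model. It is not the BigML workflow itself.

```python
import random

def average_isolation_depth(x, values, rng, n_trees=50):
    """Average number of random cuts needed to separate x from the other values."""
    total = 0
    for _ in range(n_trees):
        current, depth = values, 0
        while len(current) > 1 and depth < 50:
            cut = rng.uniform(min(current), max(current))
            # Keep only the values on the same side of the cut as x.
            current = [v for v in current if (v < cut) == (x < cut)]
            depth += 1
        total += depth
    return total / n_trees

def train_stump(rows):
    """Choose the cut on x that best reproduces the labels (True = anomalous)."""
    best_cut, best_errors = None, len(rows) + 1
    for cut, _ in rows:
        errors = sum((x > cut) != flag for x, flag in rows)
        if errors < best_errors:
            best_cut, best_errors = cut, errors
    return best_cut

rng = random.Random(0)
# Hypothetical labeled data: a large majority class and a small, distant minority.
labeled = [(rng.gauss(3.0, 1.0), "majority") for _ in range(195)]
labeled += [(rng.gauss(12.0, 0.5), "minority") for _ in range(5)]

# 1. Remove the labels.
unlabeled = [x for x, _ in labeled]

# 2. Train/apply the anomaly detector (higher score = more anomalous).
scores = [1.0 / average_isolation_depth(x, unlabeled, rng) for x in unlabeled]

# 3. Create a new dataset with the (thresholded) anomaly scores as "labels".
cutoff = sorted(scores, reverse=True)[9]         # flag the 10 highest scores
pseudo_labeled = [(x, s >= cutoff) for x, s in zip(unlabeled, scores)]

# 4. Train an ordinary supervised model on those "labels".
cut = train_stump(pseudo_labeled)
print(f"rows with x > {cut:.2f} are predicted anomalous")
```

The resulting model is a readable rule that describes what the detector considered unusual, even though the original labels were never used in steps 2 to 4.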
Who Needs Labels?
Minority Report
• Anomaly detection works great on large unlabeled datasets,
especially if you expect to find an (adversarial) minority class
• Millions of credit card transactions, billions of network events …
• Doesn’t require you to know what you’re looking for!
Thanks!
David Gerster
VP Data Science, BigML
gerster@bigml.com

Demo: Predictive Modeling with BigML - by David Gerster - PAPIs Connect