BigML Education - Datasets

Datasets are the fundamental building block for your BigML workflows. Learn how to filter, sample, add new fields, or split a dataset into training and test datasets.
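
As a rough illustration of the training/test split mentioned above, the following plain-Python sketch partitions rows with a seeded random sample so that the split is repeatable. It does not call the BigML API; the `split_dataset` helper, the 80/20 rate, and the seed string are assumptions made for this example only.

```python
import random


def split_dataset(rows, sample_rate=0.8, seed="bigml-education"):
    """Split rows into (training, test) lists using a seeded random sample.

    Hypothetical helper for illustration only; not a BigML API call.
    """
    rng = random.Random(seed)  # same seed -> same split on every run
    training_size = int(len(rows) * sample_rate)
    training_idx = set(rng.sample(range(len(rows)), training_size))
    # Rows picked by the sample form the training set; the rest form the test set.
    training = [row for i, row in enumerate(rows) if i in training_idx]
    test = [row for i, row in enumerate(rows) if i not in training_idx]
    return training, test


rows = [{"id": i} for i in range(10)]
training, test = split_dataset(rows)
```

Because the two lists are complementary and the seed is fixed, rerunning the split reproduces the same training and test rows, and the input rows are left unchanged.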

Published in: Data & Analytics

  1. BigML Education - Datasets, June 2017
  2. In This Video (BigML Education Program, Datasets)
     • Introduction
     • Typical workflow: 1-click creation
     • Purpose of datasets in BigML
     • Exploration
     • Pre-flight check
     • Basic Features
     • Other ways to create datasets
     • Train/Test split
     • More Exploration
     • Advanced Features
     • Filtering
     • Feature engineering with Flatline
  3. Sources Introduction
  4. What is a Dataset?
     • Datasets are the fundamental building blocks
     • Models, Clusters, etc. derive from datasets
     • Sources can only become datasets
     • Data exploration / Pre-flight check
     • Missing/Errors
     • Summary statistics
     • Non-preferred fields
     • Default objective for 1-click actions
  5. Datasets Basic Features
  6. Dataset Features
     • Immutable - “dataset/5943226f01440401bf0003bd”
     • Creating Datasets
     • From a source
     • From a dataset: sampling, training/test
     • From a batch output
     • Dynamic scatterplot
  7. Datasets Advanced Features
  8. Advanced Configuration
     • Dataset Filtering
     • Feature Engineering
  9. Loan Status (diagram)
     • Loan Status values: Charged Off, Current, Default, Fully Paid, In Grace, Late (16-30), Late (31-120)
     • Filter: Open (Current, In Grace, Late (16-30), Late (31-120)); Closed (Charged Off, Default, Fully Paid)
     • Engineer: Quality (Good, Bad)
  10. Summary
     • Dataset Purpose
     • Fundamental building block
     • Pre-flight check: counts, histograms, scatterplot
     • Creating dataset
     • From source: 1-click and sampling
     • Training / Test split
     • From batch output
     • From dataset: sampling, filtering, new features
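
The filter-then-engineer flow of the Loan Status slide can be sketched in plain Python. This is not Flatline or the BigML API. The status groups come from the slide, but the `loan_status` field name, the choice to keep only closed loans, and the mapping of "Fully Paid" to "Good" and the other closed statuses to "Bad" are assumptions made for this illustration, since the slide names the labels without stating the mapping.

```python
# Status groups as listed on the Loan Status slide.
OPEN_STATUSES = {"Current", "In Grace", "Late (16-30)", "Late (31-120)"}
CLOSED_STATUSES = {"Charged Off", "Default", "Fully Paid"}


def filter_closed(rows):
    """Return a new list holding only loans with a closed status.

    ASSUMPTION: closed loans are the ones kept after filtering.
    """
    return [row for row in rows if row["loan_status"] in CLOSED_STATUSES]


def add_quality(rows):
    """Return new rows with an engineered 'quality' field.

    ASSUMPTION: 'Fully Paid' -> 'Good'; any other status -> 'Bad'.
    The slide does not state this mapping.
    """
    return [
        {**row, "quality": "Good" if row["loan_status"] == "Fully Paid" else "Bad"}
        for row in rows
    ]


# Hypothetical sample rows; 'loan_status' is an assumed field name.
loans = [
    {"id": 1, "loan_status": "Current"},
    {"id": 2, "loan_status": "Fully Paid"},
    {"id": 3, "loan_status": "Charged Off"},
    {"id": 4, "loan_status": "Late (16-30)"},
    {"id": 5, "loan_status": "Default"},
]

closed = filter_closed(loans)   # keeps ids 2, 3, 5
labeled = add_quality(closed)   # adds the 'quality' field to each kept row
```

Both helpers return new lists rather than modifying their input, which loosely mirrors the slides' point that datasets are immutable and that filtering or adding fields yields a new dataset.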
