Datasets are the fundamental building block for your BigML workflows. Learn how to filter, sample, add new fields, or split a dataset into training and test datasets.
BigML Education Program: Datasets
In This Video
• Introduction
• Typical workflow: 1-click creation
• Purpose of datasets in BigML
• Exploration
• Pre-flight check
• Basic Features
• Other ways to create datasets
• Train/Test split
• More Exploration
• Advanced Features
• Filtering
• Feature engineering with Flatline
What is a Dataset?
• Datasets are the fundamental building blocks
• Models, Clusters, etc. derive from datasets
• Sources can only become datasets
• Data exploration / Pre-flight check
• Missing/Errors
• Summary statistics
• Non-preferred fields
• Default objective for 1-click actions
Dataset Features
• Immutable, with a fixed resource ID such as “dataset/5943226f01440401bf0003bd”
• Creating Datasets
• From a source
• From a dataset: sampling, training/test
• From a batch output
• Dynamic scatterplot
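The training/test idea above can be sketched in plain Python. This is only an illustration under stated assumptions, not BigML's API: the function name `split_dataset`, the default `sample_rate` of 0.8, and the seed string are all hypothetical. It shows a seeded, repeatable split that returns two new collections and leaves the original rows untouched, in the spirit of immutable datasets.

```python
import random


def split_dataset(rows, sample_rate=0.8, seed="bigml-demo"):
    """Return (train, test) as new lists; the input rows are never modified.

    Hypothetical helper: a fixed seed makes the split repeatable, and the
    test set is simply every row that was not sampled into training.
    """
    rng = random.Random(seed)           # seeded generator, independent of global state
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    cut = int(len(rows) * sample_rate)  # number of rows that go to training
    picked = set(indices[:cut])
    train = [row for i, row in enumerate(rows) if i in picked]
    test = [row for i, row in enumerate(rows) if i not in picked]
    return train, test


# Ten placeholder rows, made up purely for the demonstration.
rows = [{"id": i} for i in range(10)]
train, test = split_dataset(rows)
```

Because the two subsets are complements under the same seed, rerunning the split with that seed reproduces the same training and test rows.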
Loan Status
• Values: Charged Off, Current, Default, Fully Paid, In Grace, Late (16-30), Late (31-120)
• Filter
• Open: Current, In Grace, Late (16-30), Late (31-120)
• Closed: Charged Off, Default, Fully Paid
• Engineer
• Quality: Good, Bad
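The filter and engineer steps on the Loan Status slide might look like the following in plain Python. This is a minimal sketch, not BigML's filtering UI or Flatline: the field names `loan_status` and `quality`, the helper names, and the sample rows are hypothetical. The slide does not say which statuses count as Good or Bad, so the mapping here (Fully Paid is Good, Charged Off and Default are Bad) is an assumption.

```python
# Statuses grouped as "Closed" on the slide.
CLOSED_STATUSES = {"Charged Off", "Default", "Fully Paid"}


def filter_closed(rows):
    """Filter step: keep only loans whose status is in the Closed group."""
    return [row for row in rows if row["loan_status"] in CLOSED_STATUSES]


def add_quality(rows):
    """Engineer step: add a 'quality' field to copies of the rows.

    ASSUMPTION: Fully Paid maps to Good and every other closed status to Bad.
    """
    return [
        {**row, "quality": "Good" if row["loan_status"] == "Fully Paid" else "Bad"}
        for row in rows
    ]


# Made-up sample rows for illustration only.
loans = [
    {"id": 1, "loan_status": "Current"},
    {"id": 2, "loan_status": "Fully Paid"},
    {"id": 3, "loan_status": "Charged Off"},
    {"id": 4, "loan_status": "Late (16-30)"},
    {"id": 5, "loan_status": "Default"},
]
closed_loans = filter_closed(loans)
labeled_loans = add_quality(closed_loans)
```

Each step returns a new list instead of editing `loans` in place, mirroring how a filtered or extended dataset is a new dataset derived from the original.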
Summary
• Dataset Purpose
• Fundamental building block
• Pre-flight check: counts, histograms, scatterplot
• Creating dataset
• From source: 1-click and sampling
• Training / Test split
• From batch output
• From dataset: sampling, filtering, new features