BigML, Inc 1
Automation
Poul Petersen @pejpgrep
CIO, BigML, Inc @bigmlcom
API, WhizzML and Predictive Applications
ML Crash Course - API/WhizzML/Predictive Apps
BigML Architecture
Tools
REST API
Distributed Machine Learning Backend
Source Server, Dataset Server, Model Server, Prediction Server, Sample Server, WhizzML Server, Evaluation Server
Web-based Frontend
Visualizations
Smart Infrastructure
(auto-deployable, auto-scalable)
The Need for a ML API
• Workflow Automation - reduce drudgery
• Abstraction - reuse code
• Composability - powerful combinations of APIs
• Integration - Dashboard or UI component
• Automate deployment
• Repeatable results
Predictive Applications
[Flowchart. Stages shown: Define ML Problem; Collect & Format Data; ETL; Explore; Model & Evaluate; Goal Met? (no: feature engineer, tune algorithm, or Not Possible; yes: Automate); Collect & Format Data; Model; Predict / Score / Label; Consume & Monitor; Drift & Anomaly.]
BigML API Endpoint
https://bigml.io/{path}/{resource}/{id}?{auth}
• {path} is one of: andromeda, dev, dev/andromeda
• {resource} is one of: source, dataset, model, ensemble, prediction, batchprediction, evaluation, …
• Path elements:
• /andromeda specifies the API version (optional)
• /dev specifies development mode
• if not specified, then latest API in production mode
• {id} is required for PUT and DELETE
• {auth} contains url parameters username and api_key
• api_key can be an alternative key
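The path rules above can be sketched as a small URL builder. This is only an illustration of how the pieces fit together: the function name `build_url`, its parameters, the sample credentials, and the `&` separator in the query string are assumptions, not part of the BigML documentation.

```python
from urllib.parse import urlencode

BASE_URL = "https://bigml.io"


def build_url(resource, username, api_key, resource_id=None, version=None, dev=False):
    """Assemble an endpoint URL from the path elements listed on the slide.

    Hypothetical helper: the name and the argument layout are illustrative only.
    """
    parts = [BASE_URL]
    if dev:
        parts.append("dev")          # /dev selects development mode
    if version:
        parts.append(version)        # e.g. "andromeda"; optional API version
    parts.append(resource)           # source, dataset, model, ...
    if resource_id:
        parts.append(resource_id)    # needed for PUT and DELETE
    # {auth}: username and api_key as URL parameters (separator is an assumption)
    auth = urlencode({"username": username, "api_key": api_key})
    return "/".join(parts) + "?" + auth


print(build_url("source", "alice", "abc123"))
print(build_url("model", "alice", "abc123", resource_id="42",
                version="andromeda", dev=True))
```

Leaving out both `version` and `dev` gives the default case from the slide: the latest API in production mode.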
BigML API Endpoint
https://bigml.io/...{JSON} {JSON}
Operation | HTTP Method | Semantics
CREATE | POST | Creates a new resource. Returns a JSON document including a unique identifier.
RETRIEVE | GET | Retrieves either a specific resource or a list of resources.
UPDATE | PUT | Updates a resource. Only certain fields can be updated.
DELETE | DELETE | Deletes a resource.
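The operation-to-method mapping can be sketched with the Python standard library. The helper below only builds a request object and never sends it; the name `build_request`, the example URL, and the `Content-Type` header are assumptions made for illustration.

```python
import json
from urllib.request import Request

# The four operations from the table and their HTTP methods
METHODS = {
    "create": "POST",
    "retrieve": "GET",
    "update": "PUT",
    "delete": "DELETE",
}


def build_request(operation, url, body=None):
    """Return an unsent Request for one of the four operations.

    Hypothetical helper: JSON bodies are encoded as UTF-8 when given.
    """
    method = METHODS[operation]
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(url, data=data, headers=headers, method=method)


# Example URL and credentials are placeholders, not real values
req = build_request(
    "update",
    "https://bigml.io/source/42?username=alice&api_key=abc123",
    {"name": "renamed source"},
)
print(req.get_method(), req.full_url)
```

Sending the request (for example with `urllib.request.urlopen`) is left out on purpose, since it needs real credentials.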
BigML Bindings
https://github.com/bigmlcom/io
Python Binding Overview
Operation | HTTP Method | Binding Method
CREATE | POST | api.create_<resource>(from, {opts})
RETRIEVE | GET | api.get_<resource>(id, {opts}) or api.list_<resource>({opts})
UPDATE | PUT | api.update_<resource>(id, {opts})
DELETE | DELETE | api.delete_<resource>(id)
• Where <resource> is one of: source, dataset, model, ensemble, evaluation, etc
• id is a resource identifier or resource dict
• from is a resource identifier, dict, or string depending on context
Diabetes Anomalies
[Workflow diagram. Nodes: DIABETES SOURCE; DIABETES DATASET; TRAIN SET; TEST SET; ANOMALY DETECTOR; FILTER; CLEAN DATASET; ALL MODEL (shown twice); ALL EVALUATION; CLEAN EVALUATION; COMPARE EVALUATIONS.]
WhizzML
• Complete programming language
• Machine Learning operations are first-class citizens
• Server-side execution abstracts infrastructure
• API First! - Everything is composable
• Shareable
A Domain-Specific Language (DSL) for
automating Machine Learning workflows.
WhizzML vs API
WhizzML | API / Bindings
Executes server-side | Requires local execution
Zero latency | Every API call has latency
Parallelization built-in | Manual parallelization
Sharing built-in | Manual sharing
Code agnostic workflows | Code specific workflows
Workflows can be UI integrated | Workflows external to UI
WhizzML vs Flatline
WhizzML | Flatline
Concerned with resources | Concerned with datasets
Turing complete | More specific to features
Optimized for parallelization | Optimized for speed
Simple Workflow
SOURCE DATASET MODEL
Redfin Workflow
[Diagram labels: Sold Homes; Model Predicts Sale Price; Compare List to Prediction.]
Redfin Workflow
[Workflow diagram. Nodes: DATASET; FILTER (SOLD HOMES); NEW FEATURES; MODEL; FILTER (FORSALE HOMES); NEW FEATURES; BATCH PREDICTION; DEALS DATASET.]
WhizzML Resources
[Diagram. Nodes: LIBRARY; SCRIPT; EXECUTION; CITY 1 SOLD HOMES; CITY 1 FORSALE HOMES; CITY 1 DEALS DATASET.]
WhizzML Resources
[Diagram. Nodes: LIBRARY; SCRIPT; EXECUTION; CITY 2 SOLD HOMES; CITY 2 FORSALE HOMES; CITY 2 DEALS DATASET.]
Scriptify
• "Reifies" a resource into a WhizzML script.
• Rapid prototyping meets automation.
WhizzML FE
Worth More
Worth Less
WhizzML FE
LATITUDE | LONGITUDE | REFERENCE LATITUDE | REFERENCE LONGITUDE
44.583 | -123.296775 | 44.5638 | -123.2794
44.604414 | -123.296129 | 44.5638 | -123.2794
44.600108 | -123.29707 | 44.5638 | -123.2794
44.603077 | -123.295004 | 44.5638 | -123.2794
44.589587 | -123.301154 | 44.5638 | -123.2794
Distance (m): 700, 30.4, 19.38, 37.8, 23.39
Flatline!
WhizzML FE
https://en.wikipedia.org/wiki/Haversine_formula
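For readers who want to see the formula spelled out, here is a minimal Python sketch of the haversine great-circle distance. The slides compute this feature with Flatline and WhizzML; the Python version, the function name `haversine_m`, and the mean Earth radius of 6,371,000 m are assumptions made for this illustration.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres (assumed constant)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Distance from one home on the slide to the reference point on the slide
print(haversine_m(44.583, -123.296775, 44.5638, -123.2794))
```

Applying the function row by row adds one new numeric feature (distance to the reference point) per home.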
WhizzML FE
LIBRARY
SCRIPT
Haversine
WhizzML FE
Fix Missing Values in a “Meaningful” Way
[Workflow diagram. Nodes: Original Dataset; Filter Zeros; Clean Dataset; Model insulin; Predict insulin; Amended Dataset; Select insulin; Fixed Dataset.]
WhizzML Workflow Types
Optimization
• Model or Ensemble
• Best-First Features
• SMACdown
Algorithms
• Stacked Generalization
• Gradient boosting
• Cross Validation
Transformations
• Flatline Wrappers
• Remove Anomalies
Domain Specific
• Application Workflow
• Repetitive Tasks
Best-First Features
{F1} {F2} {F3} {F4} … {Fn}
CHOOSE BEST: S = {Fa}
S+{F1} S+{F2} S+{F3} S+{F4} … S+{Fn-1}
CHOOSE BEST: S = {Fa, Fb}
S+{F1} S+{F2} S+{F3} S+{F4} … S+{Fn-1}
CHOOSE BEST: S = {Fa, Fb, Fc}
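The selection loop on this slide can be sketched as greedy forward selection. This is a sketch, not the actual WhizzML script: the function name `best_first_features`, the `score` callback, the `max_features` cap, and the stop-when-no-improvement rule are all assumptions. In a real workflow `score` would build and evaluate a model on the candidate feature set; here a toy additive score stands in for it.

```python
def best_first_features(features, score, max_features=3):
    """Greedily grow a feature set S, adding the best-scoring feature each round.

    `score` takes a list of feature names and returns a number (higher is better).
    Stops at `max_features` or when no candidate improves the score (assumed rule).
    """
    selected = []
    remaining = list(features)
    best_score = float("-inf")
    while remaining and len(selected) < max_features:
        # Score S + {F} for every feature F not yet selected
        candidate, candidate_score = max(
            ((f, score(selected + [f])) for f in remaining),
            key=lambda pair: pair[1],
        )
        if candidate_score <= best_score:
            break  # no candidate improves on the current set
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected


# Toy stand-in for "train a model and evaluate it" (weights are made up)
TOY_WEIGHTS = {"F1": 0.1, "F2": 0.5, "F3": 0.3, "F4": 0.0}


def toy_score(feature_set):
    return sum(TOY_WEIGHTS[f] for f in feature_set)


print(best_first_features(["F1", "F2", "F3", "F4"], toy_score))
```

Each round costs one model build and evaluation per remaining feature, which is why running the rounds server-side and in parallel matters for this kind of workflow.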
Model Selection
[Workflow diagram. Nodes: SOURCE; DATASET; TRAINING; TEST; MODEL; ENSEMBLE; LOGISTIC REGRESSION; EVALUATION (three, one per candidate); CHOOSE.]
Model Tuning
[Workflow diagram. Nodes: SOURCE; DATASET; TRAINING; TEST; ENSEMBLE N=10; ENSEMBLE N=20; ENSEMBLE N=1000; EVALUATION (three, one per ensemble); CHOOSE.]
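The tuning pattern on this slide (build one candidate per parameter value, evaluate each on the test set, choose the best) can be sketched as below. The names `tune`, `build`, and `evaluate` are hypothetical, and the toy functions only stand in for training an ensemble and evaluating it; they do not reflect real results.

```python
def tune(values, build, evaluate):
    """Build one candidate per parameter value, score each, and return the best.

    `build` maps a parameter value to a candidate; `evaluate` maps a candidate
    to a score where higher is better. Both are supplied by the caller.
    """
    results = [(evaluate(build(n)), n) for n in values]
    best_score, best_n = max(results)
    return best_n, best_score


# Toy stand-ins: the "ensemble" is just its size, and the score is made up
def toy_build(n):
    return n


def toy_evaluate(candidate):
    return 1.0 - 1.0 / candidate - candidate / 10000.0


best_n, best_score = tune([10, 20, 1000], toy_build, toy_evaluate)
print(best_n, best_score)
```

The same function covers the model-selection case if `values` is a list of algorithm names and `build` dispatches on the name.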
SMACdown
• How many models?
• How many nodes?
• Missing splits or not?
• Number of random candidates?
• Balance the objective?
SMACdown can tell you!
Path to Automatic ML
[Timeline from 2011 to Spring 2016: automation increases over time]
A. Programmable Infrastructure: Sauron
• Automatic deployment and auto-scaling
B. REST API: Wintermute
• Distributed Machine Learning Framework
C. Data Generation and Filtering: Flatline
• DSL for transformation and new field generation
D. Workflow Automation: WhizzML
• DSL for programmable workflows
E. Automatic Model Selection: SMACdown
• Automatic parameter optimization
Higher Level Algorithms
• Stacked Generalization
• Boosting
• Adaboost
• Logitboost
• Martingale Boosting
• Gradient Boosting
Stacked Generalization
[Workflow diagram. Nodes: SOURCE; DATASET; MODEL; ENSEMBLE; LOGISTIC REGRESSION; BATCH PREDICTION (three); EXTENDED DATASET (three); LOGISTIC REGRESSION.]
Why WhizzML
• Automation is critical to fulfilling the promise of ML
• WhizzML can create workflows that:
• Automate repetitive tasks.
• Automate model tuning and feature selection.
• Combine ML models into more powerful algorithms.
• Create shareable and re-usable executions.
