BSSML17 - API and WhizzML

BigML, Inc
API
Automating Machine Learning
Poul Petersen
CIO, BigML, Inc

BigML Platform
• Web-based Frontend and Visualizations
• Tools - https://bigml.com/tools
• REST API - https://bigml.com/api
• Distributed Machine Learning Backend: SOURCE, DATASET, MODEL, PREDICTION, EVALUATION, SAMPLE, and WHIZZML servers
• Smart Infrastructure (auto-deployable, auto-scalable): servers, events, Gearman queue, message queue, desired topology, actual topology, auto topology, AWS costs, runqueue scaler, busy scaler

The Need for an ML API
• Workflow Automation - reduce drudgery
• Abstraction - reuse code
• Composability - powerful combinations of APIs
• Integration - Dashboard or UI component
• Automate deployment
• Repeatable results

Predictive Applications
• Collect & Format Data → Define ML Problem → ETL → Explore → Model & Evaluate
• Goal Met? No: feature engineer, tune algorithm, or stop if Not Possible. Yes: Automate
• Automate: Collect & Format Data → Model → Predict (Score, Label) → Consume & Monitor (Drift & Anomaly)

BigML API Endpoint
https://bigml.io/[{mode}/]{resource}[/{id}]?{auth}
• {mode}: andromeda, dev, or dev/andromeda (optional)
• {resource}: source, dataset, model, ensemble, prediction, batchprediction, evaluation, …
• Path elements:
• /andromeda specifies the API version (optional)
• /dev specifies development mode
• if neither is specified, the latest API version in production mode is used
• {id} is required for PUT and DELETE, and for GET on a single resource
• {auth} contains the URL parameters username and api_key
• api_key can also be an alternative key
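A minimal sketch of how the path elements above combine into a request URL. The helper name `endpoint` and the sample id are hypothetical; the function only assembles the string and sends nothing.

```python
from urllib.parse import urlencode

BASE = "https://bigml.io"

def endpoint(resource, resource_id=None, username="", api_key="",
             version=None, dev=False):
    """Hypothetical helper: build https://bigml.io/[dev/][version/]<resource>[/<id>]?<auth>."""
    parts = [BASE]
    if dev:
        parts.append("dev")          # development mode
    if version:
        parts.append(version)        # optional API version, e.g. "andromeda"
    parts.append(resource)
    if resource_id:
        parts.append(resource_id)    # single-resource GET, PUT, DELETE
    auth = urlencode({"username": username, "api_key": api_key})
    return "/".join(parts) + "?" + auth

print(endpoint("source", username="alice", api_key="KEY"))
print(endpoint("model", "abc123", "alice", "KEY", version="andromeda", dev=True))
```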

BigML API Endpoint
https://bigml.io/... (requests send JSON, responses return JSON)
Operation | HTTP Method | Semantics
CREATE | POST | Creates a new resource. Returns a JSON document including a unique identifier.
RETRIEVE | GET | Retrieves either a specific resource or a list of resources.
UPDATE | PUT | Updates a resource. Only certain fields are updatable.
DELETE | DELETE | Deletes a resource.
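To make the operation-to-method mapping concrete, the sketch below builds (but does not send) a standard-library `urllib.request.Request` for each operation, encoding any body as JSON. The `build_request` helper, the example payloads, and the resource id are assumptions for illustration; the API documentation lists the fields each resource actually accepts.

```python
import json
from urllib.request import Request

# Operation -> HTTP method, as in the table above.
METHODS = {"create": "POST", "retrieve": "GET", "update": "PUT", "delete": "DELETE"}

def build_request(operation, url, payload=None):
    """Hypothetical helper: build an unsent request with an optional JSON body."""
    data, headers = None, {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(url, data=data, headers=headers, method=METHODS[operation])

url = "https://bigml.io/dataset?username=alice&api_key=KEY"
create = build_request("create", url, {"source": "source/0123456789abcdef01234567"})
delete = build_request("delete", "https://bigml.io/dataset/0123?username=alice&api_key=KEY")
print(create.get_method(), create.data)
print(delete.get_method(), delete.data)
```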

Simple Workflow
SOURCE → DATASET → MODEL → PREDICTION

BigML Bindings
https://github.com/bigmlcom/io

Python Binding Overview
Operation | HTTP Method | Binding Method
CREATE | POST | api.create_<resource>(from, {opts})
RETRIEVE | GET | api.get_<resource>(id, {opts}); api.list_<resource>({opts})
UPDATE | PUT | api.update_<resource>(id, {opts})
DELETE | DELETE | api.delete_<resource>(id)
• Where <resource> is one of: source, dataset, model, ensemble, evaluation, etc.
• id is a resource identifier or resource dict
• from is a resource identifier, dict, or string depending on context
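The naming convention in the table composes naturally: each `create_<resource>` call accepts the previous resource (as a dict or an id) and returns a new one. The class below is a hypothetical in-memory stand-in, not the real `bigml` package, so that the SOURCE → DATASET → MODEL → PREDICTION chain can run without credentials or network access; its argument handling is simplified.

```python
import itertools

class FakeBigML:
    """Hypothetical in-memory stand-in mimicking create_<resource>/get_<resource> names."""

    def __init__(self):
        self._store = {}
        self._ids = itertools.count(1)

    def _create(self, kind, origin, opts=None):
        # 'origin' may be an id string or a resource dict, as in the table above.
        parent = origin["resource"] if isinstance(origin, dict) else origin
        rid = f"{kind}/{next(self._ids):024x}"
        self._store[rid] = {"resource": rid, "origin": parent, **(opts or {})}
        return self._store[rid]

    def _get(self, resource):
        rid = resource["resource"] if isinstance(resource, dict) else resource
        return self._store[rid]

    def __getattr__(self, name):
        # Route create_<resource>(from, opts) and get_<resource>(id) to the helpers.
        if name.startswith("create_"):
            kind = name[len("create_"):]
            return lambda origin, opts=None: self._create(kind, origin, opts)
        if name.startswith("get_"):
            return self._get
        raise AttributeError(name)

api = FakeBigML()
source = api.create_source("diabetes.csv")                                # from = string
dataset = api.create_dataset(source)                                      # from = dict
model = api.create_model(dataset["resource"], {"name": "diabetes model"})  # from = id
prediction = api.create_prediction(model, {"input_data": {"plasma glucose": 180}})
print(prediction["origin"], api.get_model(model)["name"])
```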

Diabetes Anomalies
• DIABETES SOURCE → DIABETES DATASET → TRAIN SET / TEST SET
• TRAIN SET → ALL MODEL → ALL EVALUATION
• TRAIN SET → ANOMALY DETECTOR → FILTER → CLEAN DATASET → MODEL → CLEAN EVALUATION
• COMPARE EVALUATIONS
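A minimal sketch of the bookkeeping in this diagram, assuming anomaly scores have already been computed for the training rows (higher meaning more anomalous) and using a hypothetical cut-off of 0.6. The helper names are illustrative; in BigML the split, detector, filter, models, and evaluations are all server-side resources.

```python
import random

def split(rows, rate=0.8, seed=42):
    """TRAIN SET / TEST SET: deterministic shuffle-and-cut split."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * rate)
    return shuffled[:cut], shuffled[cut:]

def filter_anomalies(rows, scores, threshold=0.6):
    """CLEAN DATASET: drop training rows whose anomaly score reaches the threshold."""
    return [row for row, score in zip(rows, scores) if score < threshold]

def compare_evaluations(all_eval, clean_eval, metric="accuracy"):
    """COMPARE EVALUATIONS: report which model did better on the shared test set."""
    return "clean" if clean_eval[metric] > all_eval[metric] else "all"

train, test = split(list(range(10)))
print(len(train), len(test))
print(filter_anomalies(["r1", "r2", "r3"], [0.2, 0.7, 0.5]))
```

Only the training set is filtered; both models are evaluated on the same untouched test set, which keeps the comparison fair.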

API / Bindings Links
• Full API Documentation
• https://bigml.com/api
• API Endpoint
• https://bigml.io
• Bindings Index
• https://github.com/bigmlcom/io
• Python Bindings
• https://bigml.readthedocs.io/en/latest/

WhizzML
Server-Side Machine Learning Automation
Poul Petersen
CIO, BigML, Inc

WhizzML
• Complete programming language
• Machine Learning operations are first-class citizens
• Server-side execution abstracts infrastructure
• API First! - Everything is composable
• Shareable
A Domain-Specific Language (DSL) for
automating Machine Learning workflows.

WhizzML vs API
WhizzML | API / Bindings
Executes server-side | Requires local execution
Zero latency | Every API call has latency
Parallelization built-in | Manual parallelization
Sharing built-in | Manual sharing
Code-agnostic workflows | Code-specific workflows
Workflows can be UI-integrated | Workflows external to UI

Simple Workflow
SOURCE → DATASET → MODEL → PREDICTION

WhizzML Resources
• LIBRARY: a reusable building block
• SCRIPT: executable code that defines a workflow
• Imports libraries
• Defines inputs that parametrize the script
• Defines outputs computed by the script
• EXECUTION: given a specific set of inputs, computes the outputs

WhizzML define
• Constants - (define some-variable some-value)
• (define pi 3.14159)
• (define url "https://example.com")
• Expressions - (define some-variable (expression))
• (define source (create-source {opts}))
• Functions - (define (func-name arg1 … argn) (expression))
• (define (one-click-dataset source) (create-dataset source {opts}))

WhizzML let
• Define multiple variables for an expression.
• (let (pi 3.14159
       url "https://example.com")
    (expression))
• Combine with define to create workflows
• (define (one-click-model source)
    (let (dataset (create-dataset source {opts}))
      (create-model dataset {opts})))

WhizzML resources
• Handling asynchronous creation
• Implicit: (create-dataset (create-source {opts}))
• Explicit: (create-dataset (create-and-wait-source {opts}))
• Generic: (wait resource)
• Fetching the full resource object
• (define src-id (create-and-wait-source {opts}))
  (fetch src-id)
• Extracting values from resource maps
• (define ds-id (create-and-wait-dataset {opts}))
  (define ds-name (get (fetch ds-id) "name"))
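The generic `(wait resource)` amounts to polling until the resource reaches a terminal state. This Python sketch assumes a caller-supplied `fetch` function that returns the resource JSON with a `status` map, and assumes the status codes 5 (finished) and -1 (faulty); both assumptions should be checked against the API documentation.

```python
import time

# Assumed BigML status codes.
FINISHED, FAULTY = 5, -1

def wait(fetch, resource_id, poll_seconds=1.0, timeout=600.0):
    """Poll fetch(resource_id) until the resource is finished or faulty."""
    deadline = time.monotonic() + timeout
    while True:
        resource = fetch(resource_id)
        status = resource["status"]
        if status["code"] == FINISHED:
            return resource
        if status["code"] == FAULTY:
            raise RuntimeError(f"{resource_id} failed: {status.get('message', '')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{resource_id} not finished after {timeout} s")
        time.sleep(poll_seconds)

# Fake fetch that reports queued (1), in progress (3), then finished (5).
codes = iter([1, 3, 5])
ds = wait(lambda rid: {"resource": rid, "name": "diabetes",
                       "status": {"code": next(codes)}}, "dataset/x", poll_seconds=0)
print(ds["name"])   # mirrors (get (fetch ds-id) "name")
```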

WhizzML vs Flatline
WhizzML | Flatline
Concerned with resources | Concerned with datasets
Turing complete | More specific to features
Optimized for parallelization | Optimized for speed

WhizzML flatline
• Define Flatline transformations inside WhizzML
• (flatline "string")
• (flatline "(/ (field \"Kg\") (field \"M2\"))")
• WhizzML variables can be referenced with {{}}
• (let (inc 1) (flatline "(+ (field \"age\") {{inc}})"))
• Apply to a dataset with new_fields
• (create-dataset dataset {
    "new_fields" [{
      "name" (str colA " + " colB)
      "field" (flatline "(+ (field {{colA}}) (field {{colB}}))")}]})
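A sketch of what the `{{}}` substitution does, under the assumption that each WhizzML value is written into the Flatline string as a literal (strings double-quoted, numbers unchanged). `render_flatline` and `sum_field` are hypothetical helpers; the second also assembles the `new_fields` request body from the example above.

```python
import json
import re

def render_flatline(template, **variables):
    """Replace each {{name}} with the variable written as a literal (assumed rule)."""
    def literal(match):
        return json.dumps(variables[match.group(1)])   # "age" -> "\"age\"", 1 -> "1"
    return re.sub(r"\{\{\s*([A-Za-z_][\w-]*)\s*\}\}", literal, template)

def sum_field(col_a, col_b):
    """Body for creating a dataset with a new field equal to col_a + col_b."""
    expr = render_flatline("(+ (field {{colA}}) (field {{colB}}))", colA=col_a, colB=col_b)
    return {"new_fields": [{"name": f"{col_a} + {col_b}", "field": expr}]}

print(render_flatline('(+ (field "age") {{inc}})', inc=1))
print(sum_field("Kg", "M2"))
```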

Redfin Workflow
• Sold Homes → Model Predicts Sale Price
• Compare List to Prediction

Redfin Workflow
• SOLD HOMES → FILTER → NEW FEATURES → DATASET → MODEL
• FOR-SALE HOMES → FILTER → NEW FEATURES → DATASET → BATCH PREDICTION (with MODEL) → DEALS DATASET
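The last step of the workflow, comparing list price to predicted sale price, can be sketched as follows. The field names (`list_price`, `sqft`), the 10% margin, and the toy `predict` function standing in for the batch prediction are all assumptions.

```python
def find_deals(homes, predict, margin=0.10):
    """Keep for-sale homes whose predicted sale price beats the list price by `margin`."""
    deals = []
    for home in homes:
        predicted = predict(home)
        if predicted >= home["list_price"] * (1 + margin):
            deals.append({**home, "predicted_price": predicted})
    return deals

# Hypothetical for-sale homes and a toy price model (200 per square foot).
homes = [{"id": "A", "list_price": 300000, "sqft": 1800},
         {"id": "B", "list_price": 500000, "sqft": 2000}]
deals = find_deals(homes, lambda h: 200 * h["sqft"])
print([d["id"] for d in deals])
```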

WhizzML Resources
• LIBRARY + SCRIPT → EXECUTION
• Inputs: CITY 1 SOLD HOMES, CITY 1 FOR-SALE HOMES
• Output: CITY 1 DEALS DATASET

WhizzML Resources
• LIBRARY + SCRIPT → EXECUTION
• Inputs: CITY 2 SOLD HOMES, CITY 2 FOR-SALE HOMES
• Output: CITY 2 DEALS DATASET

WhizzML FE
Worth More
Worth Less

WhizzML FE
Latitude | Longitude | Reference Latitude | Reference Longitude
44.583 | -123.296775 | 44.5638 | -123.2794
44.604414 | -123.296129 | 44.5638 | -123.2794
44.600108 | -123.29707 | 44.5638 | -123.2794
44.603077 | -123.295004 | 44.5638 | -123.2794
44.589587 | -123.301154 | 44.5638 | -123.2794
New field: Distance (m) to the reference point, computed with Flatline!

WhizzML FE
https://en.wikipedia.org/wiki/Haversine_formula
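A Python sketch of the haversine formula referenced above, applied to the coordinates in the table. The 6,371 km mean Earth radius is an assumed constant; the slides perform the same computation in Flatline through a WhizzML library.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    h = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(h))

reference = (44.5638, -123.2794)
homes = [(44.583, -123.296775), (44.604414, -123.296129), (44.600108, -123.29707),
         (44.603077, -123.295004), (44.589587, -123.301154)]
for lat, lon in homes:
    print(f"{lat:.6f} {lon:.6f} -> {haversine_m(lat, lon, *reference):,.0f} m")
```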

WhizzML FE
LIBRARY
SCRIPT
Haversine

Scriptify
• "Reifies" a resource into a WhizzML script.
• Rapid prototyping meets automation.

WhizzML FE
Fix Missing Values in a “Meaningful” Way
• Original Dataset → Filter Zeros → Clean Dataset → Model insulin
• Original Dataset → Select insulin → Predict insulin → Amended Dataset → Fixed Dataset
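A sketch of the same idea in plain Python: treat zeros as missing, fit a model on the rows that have a value, and fill the zeros with its predictions. A one-variable least-squares line stands in for the BigML model here, and the field names `glucose` and `insulin` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (stand-in for the insulin model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def fix_zeros(rows, target="insulin", feature="glucose"):
    clean = [r for r in rows if r[target] != 0]                  # Filter Zeros
    a, b = fit_line([r[feature] for r in clean],
                    [r[target] for r in clean])                  # Model insulin
    fixed = []
    for r in rows:
        if r[target] == 0:                                       # Select rows to amend
            r = {**r, target: a + b * r[feature]}                # Predict insulin
        fixed.append(r)
    return fixed                                                 # Fixed Dataset

rows = [{"glucose": 100, "insulin": 50},
        {"glucose": 200, "insulin": 150},
        {"glucose": 150, "insulin": 0}]
print(fix_zeros(rows))
```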

WhizzML Workflow Types
• Optimization: Model or Ensemble, Best-First Features, SMACdown
• Algorithms: Stacked Generalization, Gradient boosting, Cross Validation
• Transformation: Flatline Wrappers, Remove Anomalies
• Domain: Application Workflow, Repetitive Tasks

Best-First Features
• Round 1: try each feature {F1}, {F2}, {F3}, {F4}, …, {Fn}; CHOOSE BEST → S = {Fa}
• Round 2: try S+{F1}, S+{F2}, …, S+{Fn-1}; CHOOSE BEST → S = {Fa, Fb}
• Round 3: try S+{F1}, S+{F2}, …, S+{Fn-1}; CHOOSE BEST → S = {Fa, Fb, Fc}
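The search in the diagram is a greedy forward selection. In the sketch below, `score` is a hypothetical callback that would, in practice, train and evaluate a model on the given feature subset; the search stops when no remaining feature improves the score or when an optional size limit is reached.

```python
def best_first(features, score, max_features=None):
    """Greedy forward selection: each round adds the feature that most improves score(S)."""
    selected, best = [], float("-inf")
    remaining = list(features)
    while remaining and (max_features is None or len(selected) < max_features):
        # Try S + {F} for every remaining feature F.
        trials = [(score(selected + [f]), f) for f in remaining]
        cand_score, candidate = max(trials, key=lambda t: t[0])
        if cand_score <= best:
            break  # no single feature improves on S
        selected.append(candidate)
        remaining.remove(candidate)
        best = cand_score
    return selected, best

# Synthetic objective: feature weights minus a per-feature cost.
weights = {"a": 3.0, "b": 2.0, "c": 0.1}
toy_score = lambda subset: sum(weights[f] for f in subset) - 0.5 * len(subset)
print(best_first(weights, toy_score))
```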

Model Selection
• SOURCE → DATASET → TRAINING / TEST
• TRAINING → MODEL, ENSEMBLE, LOGISTIC REGRESSION
• Each → EVALUATION (on TEST) → CHOOSE

Model Tuning
• SOURCE → DATASET → TRAINING / TEST
• TRAINING → ENSEMBLE N=10, ENSEMBLE N=20, ENSEMBLE N=1000
• Each → EVALUATION (on TEST) → CHOOSE
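Model selection and model tuning share the same shape: train several candidates, evaluate each on the test set, and choose the best. In this sketch `train` and `evaluate` are hypothetical callbacks, and the usage below plugs in a synthetic objective rather than real evaluation results.

```python
def choose_best(candidates, train, evaluate):
    """Train every candidate configuration and keep the best-scoring one."""
    results = []
    for name, params in candidates.items():
        model = train(params)                      # e.g. an ensemble with N models
        results.append((evaluate(model), name))    # score on the TEST set
    best_score, best_name = max(results)
    return best_name, best_score, results

candidates = {"ENSEMBLE N=10": {"n": 10},
              "ENSEMBLE N=20": {"n": 20},
              "ENSEMBLE N=1000": {"n": 1000}}
train = lambda params: params["n"]
evaluate = lambda n: -abs(n - 50)                  # synthetic objective peaking at 50
name, score, _ = choose_best(candidates, train, evaluate)
print(name, score)
```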

SMACdown
• How many models?
• How many nodes?
• Missing splits or not?
• Number of random candidates?
• Balance the objective?
SMACdown can tell you!

Higher Level Algorithms
• Stacked Generalization
• Boosting
• Adaboost
• Logitboost
• Martingale Boosting
• Gradient Boosting
• Genetic Algorithms

Stacked Generalization
• SOURCE → DATASET
• DATASET → BATCH PREDICTION (MODEL) → EXTENDED DATASET
• → BATCH PREDICTION (ENSEMBLE) → EXTENDED DATASET
• → BATCH PREDICTION (LOGISTIC REGRESSION) → EXTENDED DATASET
• EXTENDED DATASET → LOGISTIC REGRESSION
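A compact sketch of stacking in plain Python: base models append their predictions as new columns (the extended dataset), and a logistic regression fitted by gradient descent acts as the final model. The toy threshold rules standing in for the base models, and the learning-rate and epoch settings, are assumptions.

```python
from math import exp

def extend(rows, base_models):
    """EXTENDED DATASET: original fields plus one prediction per base model."""
    return [list(x) + [model(x) for model in base_models] for x in rows]

def predict_proba(w, x):
    """Positive-class probability; the last weight is the bias (zip skips it)."""
    z = w[-1] + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Logistic regression trained by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

# Toy data and two hypothetical base "models" (threshold rules).
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
base_models = [lambda x: 1.0 if x[0] >= 2 else 0.0,
               lambda x: 1.0 if x[0] >= 1.5 else 0.0]
extended = extend(X, base_models)
w = fit_logistic(extended, y)
print([round(predict_proba(w, row), 3) for row in extended])
```

Fitting the final model on predictions for rows the base models were trained on tends to overweight overfit base models; using held-out or cross-validated predictions is the usual safeguard.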

Why WhizzML
• Automation is critical to fulfilling the promise of ML
• WhizzML can create workflows that:
• Automate repetitive tasks.
• Automate model tuning and feature selection.
• Combine ML models into more powerful algorithms.
• Create shareable and reusable executions.
