WhizzML is a domain-specific language for automating Machine Learning workflows, implementing high-level Machine Learning algorithms, and easily sharing them with others. WhizzML offers out-of-the-box scalability, abstracts away the complexity of the underlying infrastructure, and helps analysts, developers, and scientists reduce the burden of repetitive and time-consuming analytics tasks.
2. Outline
1 What is WhizzML?
2 WhizzML Server-side Resources
3 WhizzML Language Basics
4 Standard Library Overview
5 Tutorial Walkthrough: Model or Ensemble?
4. WhizzML in a Nutshell
• Domain-specific language for ML workflow automation
  - High-level problem and solution specification
• Framework for scalable, remote execution of ML workflows
  - Sophisticated server-side optimization
  - Out-of-the-box scalability
  - Client-server brittleness removed
• Infrastructure for creating and sharing ML scripts and libraries (see the sketch below)
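As a taste of that high-level style, here is a minimal sketch of a complete workflow that builds a dataset and a model from an existing source (the source id is a hypothetical placeholder; create-and-wait-dataset and create-model are both introduced later in this deck):
;; Minimal workflow sketch: source -> dataset -> model
;; ("source/..." below is a hypothetical placeholder id)
(define ds-id (create-and-wait-dataset {"source" "source/570861ecb85eee0472000000"}))
(define model-id (create-model {"dataset" ds-id}))
(log-info "Created model " model-id)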
6. WhizzML REST Resources
Library: A reusable building block, i.e. a collection of WhizzML definitions that can be imported by other libraries or scripts.
Script: Executable code that describes an actual workflow.
  • Imports: List of libraries with code used by the script.
  • Inputs: List of input values that parameterize the workflow.
  • Outputs: List of values computed by the script and returned to the user.
Execution: Given a script and a complete set of inputs, the workflow can be executed and its outputs generated (see the sketch below).
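A hedged sketch of how these pieces fit together, borrowing the declaration style from the final slide of this deck (the input and output names here are hypothetical, and the Inputs/Outputs lists are metadata attached to the script resource rather than part of the WhizzML source itself):
;; Script: build a dataset from a user-supplied source
;; - Inputs:  [{"name": "src-id", "type": "source-id"}]
;; - Outputs: [{"name": "ds-id", "type": "dataset-id"}]
(define ds-id (create-and-wait-dataset {"source" src-id}))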
8. Basic Syntax
Atomic constants
"a string value"
23, -10, -1.23E11, 1.42342
true, false
Fully parenthesized prefix notation
(list-sources) ;; Function call without arguments
(log-info "Hello World!")
(* 2 (+ 2 3)) ;; Evaluates to 2 * (2 + 3)
(atan (tan 3)) ;; Nested function calls
12. Functions
Defining a function
(define (function-name arg1 arg2 ...)
body)
Examples
(define (add-numbers x y)
(+ x y))
(define (create-model-and-ensemble dataset-id)
(create-model {"dataset" dataset-id})
(create-ensemble {"dataset" dataset-id
"number_of_models" 10}))
13. Local variables
Let bindings
(let (name-1 val-1
name-2 val-2
...)
body)
Example:
(define no-of-models 10)
(let (msg "I am creating "
id "dataset/570861ecb85eee0472000016")
;; here msg, id and no-of-models are bound
(log-info msg no-of-models)
(create-ensemble {"dataset" id
"number_of_models" no-of-models}))
;;; here msg and id are *not* bound
14. Conditionals
if
(if (> x 0) ;; condition
"x is positive" ;; consequent
"x is not positive") ;; alternative
when
(when (positive? n)
(log-info "Creating a few models...")
(create-lots-of-models n))
15. Conditionals
cond
;; Nested conditionals
(if (> x 3)
"big"
(if (< x 1)
"small"
"standard"))
;; are better with cond:
(cond (> x 3) "big"
(< x 1) "small"
"standard")
16. Error handling
Signaling errors
(raise {"message" "Division by zero" "code" -10})
Catching errors
(try (/ 42 x)
(catch e
(log-warn "I've got an error with message: "
(get e "message")
" and code "
(get e "code"))))
17. Demo: a simple script
Create a dataset and return its row count
(define (make-dataset id name)
(let (ds-id (create-and-wait-dataset {"source" id
"name" name}))
(fetch ds-id)))
(define dataset (make-dataset source-id source-name))
(define dataset-id (get dataset "resource"))
(define rows (get dataset "rows"))
https://gist.github.com/whizzmler/917a05cf6c173381116e3cc02da70e42
19. Standard functions
• Numeric and relational operators (+, *, <, =, ...)
• Mathematical functions (cos, sinh, floor ...)
• Strings and regular expressions (str, matches?, replace, ...)
• Flatline generation
• Collections: list traversal, sorting, map manipulation
• BigML resources manipulation
  - Creation: create-source, create-and-wait-dataset, etc.
  - Retrieval: fetch, list-anomalies, etc.
  - Update: update
  - Deletion: delete
• Machine Learning Algorithms (SMACDown, Boosting, etc.)
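A short, hedged example combining a few of these functions (everything used here is either listed above or demonstrated elsewhere in this deck; the dataset id is the example id from the Let bindings slide):
;; Numeric and string helpers from the standard library
(define radius 2.5)
(define area (* radius radius 3.14159))
(log-info (str "floor(area) = " (floor area)))
;; Resource helpers: fetch a resource map and read a field with get
(define ds-name (get (fetch "dataset/570861ecb85eee0472000016") "name"))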
21. Model or Ensemble?
• Split a dataset into training and test parts
• Create a model and an ensemble with the training dataset
• Evaluate both with the test dataset
• Choose the one with the better evaluation (f-measure)
https://github.com/whizzml/examples/tree/master/model-or-ensemble
22. Model or Ensemble?
;; Functions for creating the two dataset parts
;; and the model and ensemble from the training set.
(define (sample-dataset ds-id rate oob)
(create-and-wait-dataset {"sample_rate" rate
"origin_dataset" ds-id
"out_of_bag" oob
"seed" "whizzml-example"}))
(define (split-dataset ds-id rate)
(list (sample-dataset ds-id rate false)
(sample-dataset ds-id rate true)))
(define (make-model ds-id)
(create-and-wait-model {"dataset" ds-id}))
(define (make-ensemble ds-id size)
(create-and-wait-ensemble {"dataset" ds-id
"number_of_models" size}))
23. Model or Ensemble?
;; Functions for evaluating model and ensemble
;; using the test set, and to extract f-measure from
;; the evaluation results
(define (evaluate-model model-id ds-id)
(create-and-wait-evaluation {"model" model-id
"dataset" ds-id}))
(define (evaluate-ensemble model-id ds-id)
(create-and-wait-evaluation {"ensemble" model-id
"dataset" ds-id}))
(define (f-measure ev-id)
(get-in (fetch ev-id) ["result" "model" "average_f_measure"]))
24. Model or Ensemble?
;; Function encapsulating the full workflow
(define (model-or-ensemble src-id)
(let (ds-id (create-and-wait-dataset {"source" src-id})
;; ^ full dataset
ids (split-dataset ds-id 0.8) ;; split it 80/20
train-id (nth ids 0) ;; the 80% for training
test-id (nth ids 1) ;; and 20% for evaluations
m-id (make-model train-id) ;; create a model
e-id (make-ensemble train-id 15) ;; and an ensemble
m-f (f-measure (evaluate-model m-id test-id)) ;; evaluate
e-f (f-measure (evaluate-ensemble e-id test-id)))
(log-info "model f " m-f " / ensemble f " e-f)
(if (> m-f e-f) m-id e-id)))
;; Compute the result of the script execution
;; - Inputs: [{"name": "input-source-id", "type": "source-id"}]
;; - Outputs: [{"name": "result", "type": "resource-id"}]
(define result (model-or-ensemble input-source-id))