
Basic WhizzML Workflows


WhizzML is a domain-specific language for automating Machine Learning workflows, implementing high-level Machine Learning algorithms, and easily sharing them with others. WhizzML offers out-of-the-box scalability, abstracts away the complexity of the underlying infrastructure, and helps analysts, developers, and scientists reduce the burden of repetitive and time-consuming analytics tasks.

Published in: Data & Analytics

Basic WhizzML Workflows

  1. Basic WhizzML Workflows
     The BigML Team, May 2016
  2. Outline
     1 What is WhizzML?
     2 WhizzML Server-side Resources
     3 WhizzML Language Basics
     4 Standard Library Overview
     5 Tutorial Walkthrough: Model or Ensemble?
  4. WhizzML in a Nutshell
     • Domain-specific language for ML workflow automation
         High-level problem and solution specification
     • Framework for scalable, remote execution of ML workflows
         Sophisticated server-side optimization
         Out-of-the-box scalability
         Client-server brittleness removed
     • Infrastructure for creating and sharing ML scripts and libraries
  6. WhizzML REST Resources
     Library
         Reusable building block: a collection of WhizzML definitions that can be imported by other libraries or scripts.
     Script
         Executable code that describes an actual workflow.
         • Imports: list of libraries with code used by the script.
         • Inputs: list of input values that parameterize the workflow.
         • Outputs: list of values computed by the script and returned to the user.
     Execution
         Given a script and a complete set of inputs, the workflow can be executed and its outputs generated.
  8. Basic Syntax
     Atomic constants
         "a string value"
         23, -10, -1.23E11, 1.42342
         true, false
     Fully parenthesized prefix notation
         (list-sources)             ;; Function call without arguments
         (log-info "Hello World!")
         (* 2 (+ 2 3))              ;; Evaluates to 2 * (2 + 3)
         (atan (tan 3))             ;; Nested function calls
  9. Variables
     Names
         dataset_id  date-of-birth  sources*  positive?  x, y
     Definition
         (define name "Arthur Samuel")
         (define birth-year 1901)
         (define age (- 2016 birth-year))
  10. Composite Values: Lists
      Literals
          [1.2 2.3 3.4]
          ["red" "blue" "orange" "yellow"]
          [[1 2] "this" 3]
          []                            ;; the empty list
      Constructors and accessors
          (list 1 (+ 1 1) (* 3 2))      ;; => [1 2 6]
          (append [1 2 3] 4)            ;; => [1 2 3 4]
          (head [1 2 3])                ;; => 1
          (tail [1 2 3])                ;; => [2 3]
          (nth ["a" "b" [1 2]] 1)       ;; => "b"
  11. Composite Values: Maps
      Literals
          {"name" "John" "married" true "date-of-birth" 1901}
          {"source" "source/122323445445565665"
           "input_fields" ["000000" "000001" "000003"]
           "sample" {"rate" 0.3}}
      Constructors and accessors
          (assoc {"a" 3} "b" 4 "c" 5)                 ;; => {"a" 3 "b" 4 "c" 5}
          (dissoc {"a" 3 "b" "c"} "b")                ;; => {"a" 3}
          (get {"a" 1 "b" 2} "a")                     ;; => 1
          (get {"a" 1 "b" 2} "non-existent-key")      ;; => false
          (get {"a" 1 "b" 2} "non-existent-key" 42)   ;; => 42
          (get-in {"a" {"b" 2 "c" {"d" 42}}} ["a" "c" "d"])   ;; => 42
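      As a minimal sketch of how these accessors combine (not part of the original slide; base-args and args are hypothetical names, and the source id is the placeholder already used above), an argument map can be built up step by step and then queried:

      ```whizzml
      ;; base-args and args are illustrative names, not part of any API
      (define base-args {"source" "source/122323445445565665"})
      ;; assoc returns a new map with the extra key added
      (define args (assoc base-args "sample" {"rate" 0.3}))
      (get-in args ["sample" "rate"])   ;; => 0.3
      (get args "name" "unnamed")       ;; => "unnamed" (key missing, default returned)
      (get base-args "sample")          ;; => false (key missing, no default given)
      ```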
  12. Functions
      Defining a function
          (define (function-name arg1 arg2 ...)
            body)
      Examples
          (define (add-numbers x y)
            (+ x y))
          (define (create-model-and-ensemble dataset-id)
            (create-model {"dataset" dataset-id})
            (create-ensemble {"dataset" dataset-id
                              "number_of_models" 10}))
  13. Local variables
      Let bindings
          (let (name-1 val-1
                name-2 val-2
                ...)
            body)
      Example:
          (define no-of-models 10)
          (let (msg "I am creating "
                id "dataset/570861ecb85eee0472000016")
            ;; here msg, id and no-of-models are bound
            (log-info msg no-of-models)
            (create-ensemble {"dataset" id
                              "number_of_models" no-of-models}))
          ;;; here msg and id are *not* bound
  14. Conditionals
      if
          (if (> x 0)                ;; condition
              "x is positive"        ;; consequent
              "x is not positive")   ;; alternative
      when
          (when (positive? n)
            (log-info "Creating a few models...")
            (create-lots-of-models n))
  15. Conditionals
      cond
          ;; Nested conditionals
          (if (> x 3)
              "big"
              (if (< x 1)
                  "small"
                  "standard"))
          ;; are better with cond:
          (cond (> x 3) "big"
                (< x 1) "small"
                "standard")
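      A small sketch (not from the deck) that wraps the cond form above in a function, so that x becomes a parameter instead of a free variable; the name size-label is hypothetical:

      ```whizzml
      ;; size-label is an illustrative name; the thresholds are those from the slide
      (define (size-label x)
        (cond (> x 3) "big"
              (< x 1) "small"
              "standard"))       ;; final expression is the fallback value

      (size-label 5)   ;; => "big"
      (size-label 0)   ;; => "small"
      (size-label 2)   ;; => "standard"
      ```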
  16. Error handling
      Signaling errors
          (raise {"message" "Division by zero" "code" -10})
      Catching errors
          (try (/ 42 x)
               (catch e
                 (log-warn "I've got an error with message: "
                           (get e "message")
                           " and code "
                           (get e "code"))))
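      The two halves can be combined in one place. The following is a sketch under assumptions, not code from the deck: checked-divide and safe-divide are hypothetical helpers, and the error map reuses the message and code shown above. One function signals the error, the other catches it and falls back to a caller-supplied value:

      ```whizzml
      ;; Hypothetical helper: raises explicitly instead of relying on (/ x 0)
      (define (checked-divide x y)
        (if (= y 0)
            (raise {"message" "Division by zero" "code" -10})
            (/ x y)))

      ;; Hypothetical helper: returns fallback when checked-divide raises
      (define (safe-divide x y fallback)
        (try (checked-divide x y)
             (catch e fallback)))

      (safe-divide 42 0 -1)   ;; the raise is caught, so the fallback -1 is returned
      ```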
  17. Demo: a simple script
      Create a dataset and return its number of rows
          (define (make-dataset id name)
            (let (ds-id (create-and-wait-dataset {"source" id
                                                  "name" name}))
              (fetch ds-id)))
          (define dataset (make-dataset source-id source-name))
          (define dataset-id (get dataset "resource"))
          (define rows (get dataset "rows"))
      https://gist.github.com/whizzmler/917a05cf6c173381116e3cc02da70e42
  19. Standard functions
      • Numeric and relational operators (+, *, <, =, ...)
      • Mathematical functions (cos, sinh, floor, ...)
      • Strings and regular expressions (str, matches?, replace, ...)
      • Flatline generation
      • Collections: list traversal, sorting, map manipulation
      • BigML resources manipulation
          Creation: create-source, create-and-wait-dataset, etc.
          Retrieval: fetch, list-anomalies, etc.
          Update: update
          Deletion: delete
      • Machine Learning Algorithms (SMACDown, Boosting, etc.)
  21. Model or Ensemble?
      • Split a dataset into training and test parts
      • Create a model and an ensemble with the training dataset
      • Evaluate both with the test dataset
      • Choose the one with the better evaluation (f-measure)
      https://github.com/whizzml/examples/tree/master/model-or-ensemble
  22. Model or Ensemble?
          ;; Functions for creating the two dataset parts
          ;; and the model and ensemble from the training set.
          (define (sample-dataset ds-id rate oob)
            (create-and-wait-dataset {"sample_rate" rate
                                      "origin_dataset" ds-id
                                      "out_of_bag" oob
                                      "seed" "whizzml-example"}))

          (define (split-dataset ds-id rate)
            (list (sample-dataset ds-id rate false)
                  (sample-dataset ds-id rate true)))

          (define (make-model ds-id)
            (create-and-wait-model {"dataset" ds-id}))

          (define (make-ensemble ds-id size)
            (create-and-wait-ensemble {"dataset" ds-id
                                       "number_of_models" size}))
  23. Model or Ensemble?
          ;; Functions for evaluating model and ensemble
          ;; using the test set, and to extract f-measure from
          ;; the evaluation results
          (define (evaluate-model model-id ds-id)
            (create-and-wait-evaluation {"model" model-id
                                         "dataset" ds-id}))

          (define (evaluate-ensemble model-id ds-id)
            (create-and-wait-evaluation {"ensemble" model-id
                                         "dataset" ds-id}))

          (define (f-measure ev-id)
            (get-in (fetch ev-id) ["result" "model" "average_f_measure"]))
  24. Model or Ensemble?
          ;; Function encapsulating the full workflow
          (define (model-or-ensemble src-id)
            (let (ds-id (create-and-wait-dataset {"source" src-id})
                  ;; ^ full dataset
                  ids (split-dataset ds-id 0.8)       ;; split it 80/20
                  train-id (nth ids 0)                ;; the 80% for training
                  test-id (nth ids 1)                 ;; and 20% for evaluations
                  m-id (make-model train-id)          ;; create a model
                  e-id (make-ensemble train-id 15)    ;; and an ensemble
                  m-f (f-measure (evaluate-model m-id test-id))   ;; evaluate
                  e-f (f-measure (evaluate-ensemble e-id test-id)))
              (log-info "model f " m-f " / ensemble f " e-f)
              (if (> m-f e-f) m-id e-id)))

          ;; Compute the result of the script execution
          ;; - Inputs: [{"name": "input-source-id", "type": "source-id"}]
          ;; - Outputs: [{"name": "result", "type": "resource-id"}]
          (define result (model-or-ensemble input-source-id))
