
VSSML18. Introduction to WhizzML

Introduction to WhizzML: Basic Machine Learning Workflows.
VSSML18: 4th edition of the Valencian Summer School in Machine Learning.

Published in: Data & Analytics


  1. Valencian Summer School in Machine Learning, 4th edition, September 13–14, 2018
  2. Basic WhizzML, Mercè Martín
  3. Outline
       1. Server-side workflows: WhizzML
       2. Example Workflow: Model or Ensemble?
       3. Closing the cycle: WhizzML and Feature engineering
     #VSSML18 Basic WhizzML September 13–14, 2018
  4. Outline
       1. Server-side workflows: WhizzML
       2. Example Workflow: Model or Ensemble?
       3. Closing the cycle: WhizzML and Feature engineering
  5. Client-side Machine Learning Automation
     Problems of client-side solutions
       Complexity: Lots of details outside the problem domain
       Extensibility: Bigmler hides complexity at the cost of flexibility
       • We need to explicitly control the resource management flows and cope with errors
       • Alternatively, we use assistants that do it for us, but only for a limited subset of workflows
  6. Client-side Machine Learning Automation
     Problems of client-side solutions
       Scalability: Client-side workflows are hard to optimize
       Reuse: No inter-language compatibility
       • We need to deal with the number of parallel tasks and the available shared resources
       • The same workflows need to be reprogrammed in many languages
     We’ve managed to abstract the ML algorithms logic, but not the workflow logic
     Not enough abstraction
  7. Higher-level Machine Learning
  8. Server-side Machine Learning
     Solution (scalability, reuse): Back to the server
  9. Basic workflows: automatic generation
  10. Server-side Machine Learning Automation
      Solution (complexity, extensibility): Domain-specific languages abstracting plus naming and full language flexibility
  11. WhizzML in a Nutshell
      • Domain-specific language for ML workflow automation
          High-level problem and solution specification
      • Framework for scalable, remote execution of ML workflows
          Sophisticated server-side optimization
          Out-of-the-box scalability
          Client-server brittleness removed
          Infrastructure for creating and sharing ML scripts and libraries
  12. WhizzML REST Resources
      Library: A reusable building block; a collection of WhizzML definitions that can be imported by other libraries or scripts.
      Script: Executable code that describes an actual workflow.
        • Imports: List of libraries with code used by the script.
        • Inputs: List of input values that parameterize the workflow.
        • Outputs: List of values computed by the script and returned to the user.
      Execution: Given a script and a complete set of inputs, the workflow can be executed and its outputs generated.
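The link between a script's declared inputs and an execution can be pictured with a small Python sketch. `ScriptSpec` and `missing_inputs` are hypothetical names made up for illustration, not part of any BigML binding, and the sketch assumes every declared input is required (no default values).

```python
from dataclasses import dataclass, field


@dataclass
class ScriptSpec:
    """Hypothetical stand-in for the metadata of a WhizzML script."""
    imports: list = field(default_factory=list)   # ids of libraries the script uses
    inputs: list = field(default_factory=list)    # [{"name": ..., "type": ...}]
    outputs: list = field(default_factory=list)   # [{"name": ..., "type": ...}]


def missing_inputs(spec, provided):
    """Return the declared input names that `provided` does not supply."""
    return [i["name"] for i in spec.inputs if i["name"] not in provided]


# Illustrative metadata only; the names follow the slides' naming style.
spec = ScriptSpec(
    inputs=[{"name": "input-source-id", "type": "source-id"}],
    outputs=[{"name": "result", "type": "resource-id"}],
)
```

An execution would only make sense when `missing_inputs(spec, provided)` comes back empty, which is the "complete set of inputs" condition stated above.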
  13. Use the REPL
      Defining global variables
        (define text "Hello BigMLers")
      Defining local variables
        (let (local-text "Hello BigMLers")
          (log-info local-text))
      Defining procedures
        (define (print-hello name)
          (log-info "Hello " name))
        ;; use it!
        (print-hello "BigMLers")
      Every expression returns a value, and variables are immutable.
  14. How to create WhizzML Scripts/Libraries: GitHub, Script editor, Gallery, Other scripts, Scriptify
  15. Higher-level Machine Learning
  16. Basic workflow in WhizzML
        (let (dataset (create-dataset source)
              cluster (create-cluster dataset))
          (create-batchcentroid dataset
                                cluster
                                {"output_dataset" true
                                 "all_fields" true}))
  17. Abstraction at a higher level
  18. Scripts in WhizzML: Usable by any binding
        from bigml.api import BigML
        api = BigML()
        # choose workflow
        script = 'script/567b4b5be3f2a123a690ff56'
        # define parameters
        inputs = {'source': 'source/5643d345f43a234ff2310a3e'}
        # execute
        api.ok(api.create_execution(script, inputs))
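A hedged note on the `inputs` argument: to my understanding the BigML REST API describes execution inputs as a list of `[name, value]` pairs under an `"inputs"` key, so a plain dictionary like the one on the slide may need reshaping before it is sent. The helper below is a hypothetical sketch of that reshaping (`execution_args` is not part of the bindings), and the exact payload should be checked against the bindings' documentation.

```python
def execution_args(params):
    """Turn {"name": value} into {"inputs": [[name, value], ...]}.

    Assumption: the API expects inputs as a list of [name, value] pairs.
    """
    return {"inputs": [[name, value] for name, value in params.items()]}


# Same source id as on the slide, reshaped into the assumed payload form.
args = execution_args({"source": "source/5643d345f43a234ff2310a3e"})
```

Under that assumption, `args` would be passed as the second argument of `api.create_execution`.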
  19. Scripts in WhizzML: Trivial parallelization
  20. Scripts in WhizzML: Trivial parallelization
  21. What else do we need? The standard functions
      • Numeric and relational operators (+, *, <, =, ...)
      • Mathematical functions (cos, sinh, floor, ...)
      • Strings and regular expressions (str, matches?, replace, ...)
      • Flatline generation
      • Collections: list traversal, sorting, map manipulation
      • BigML resources manipulation
          Creation: create-source, create-and-wait-dataset, etc.
          Retrieval: fetch, list-anomalies, etc.
          Update: update
          Deletion: delete
      • Machine Learning Algorithms (SMACDown, Boosting, etc.)
  22. Outline
       1. Server-side workflows: WhizzML
       2. Example Workflow: Model or Ensemble?
       3. Closing the cycle: WhizzML and Feature engineering
  23. Model or Ensemble?
      • Split a dataset into test and training parts
      • Create a model and an ensemble with the training dataset
      • Evaluate both with the test dataset
      • Choose the one with the better evaluation (f-measure)
      https://github.com/whizzml/examples/tree/master/model-or-ensemble
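The last step of the list above can be sketched in plain Python. This is only an illustration of the comparison, assuming the f-measure is the usual F1 score (harmonic mean of precision and recall); the precision and recall numbers are made up, and `pick_better` is a hypothetical helper, not a BigML call.

```python
def f_measure(precision, recall):
    """F1 score: harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def pick_better(model_f, ensemble_f):
    """Keep the single model only when it strictly beats the ensemble."""
    return "model" if model_f > ensemble_f else "ensemble"


# Made-up evaluation numbers, for illustration only.
model_f = f_measure(0.80, 0.60)
ensemble_f = f_measure(0.75, 0.75)
choice = pick_better(model_f, ensemble_f)
```

On a tie the sketch falls back to the ensemble, which is one possible design choice among several.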
  24. Model or Ensemble?
        ;; Function encapsulating the full workflow
        (define (model-or-ensemble src-id)
          (let (ds-id (create-dataset src-id)
                [train-id test-id] (create-random-dataset-split ds-id 0.8)
                m-id (create-model train-id)
                e-id (create-ensemble train-id {"number_of_models" 15})
                m-f (f-measure (create-evaluation m-id test-id))
                e-f (f-measure (create-evaluation e-id test-id)))
            (if (> m-f e-f) m-id e-id)))
      We only need a new function, f-measure, to evaluate the f-measure of a model
  25. Model or Ensemble?
        ;; Function to extract the f-measure from an
        ;; evaluation, given its id.
        (define (f-measure ev-id)
          (let (ev-id (wait ev-id) ;; because fetch doesn't wait
                evaluation (fetch ev-id))
            (evaluation ["result" "model" "average_f_measure"])))
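The expression `(evaluation ["result" "model" "average_f_measure"])` reads a nested value by following a list of keys. A rough Python equivalent, written as a hypothetical `get_in` helper over plain dictionaries (the sample evaluation below is made up and far smaller than a real one), could look like this:

```python
from functools import reduce

_MISSING = object()  # sentinel marking "key path not found"


def get_in(mapping, path, default=None):
    """Follow `path` (a list of keys) through nested dicts."""
    def step(current, key):
        if isinstance(current, dict) and key in current:
            return current[key]
        return _MISSING

    result = reduce(step, path, mapping)
    return default if result is _MISSING else result


# Made-up, heavily trimmed evaluation resource.
evaluation = {"result": {"model": {"average_f_measure": 0.87}}}
f = get_in(evaluation, ["result", "model", "average_f_measure"])
```

Returning a default for a broken path is a choice of this sketch; it says nothing about how WhizzML itself treats missing keys.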
  26. Model or Ensemble?
        ;; Function encapsulating the full workflow
        (define (model-or-ensemble src-id)
          (let (ds-id (create-dataset src-id)
                [train-id test-id] (create-random-dataset-split ds-id 0.8)
                m-id (create-model train-id)
                e-id (create-ensemble train-id {"number_of_models" 15})
                m-f (f-measure (create-evaluation m-id test-id))
                e-f (f-measure (create-evaluation e-id test-id)))
            (if (> m-f e-f) m-id e-id)))
        ;; Compute the result of the script execution
        ;; - Inputs: [{"name": "input-source-id", "type": "source-id"}]
        ;; - Outputs: [{"name": "result", "type": "resource-id"}]
        (define result (model-or-ensemble input-source-id))
  27. Outline
       1. Server-side workflows: WhizzML
       2. Example Workflow: Model or Ensemble?
       3. Closing the cycle: WhizzML and Feature engineering
  28. Data Transformations
      Feature engineering is an unavoidable part of any Machine Learning DSL. WhizzML covers that thanks to a transformations language: Flatline
  29. Transforming item counts to features
        basket           milk  eggs  flour  salt  chocolate  caviar
        milk,eggs        Y     Y     N      N     N          N
        milk,flour       Y     N     Y      N     N          N
        milk,flour,eggs  Y     Y     Y      N     N          N
        chocolate        N     N     N      N     Y          N
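The table above can be reproduced with a few lines of plain Python. This is only a sketch of the transformation itself, assuming baskets are comma-separated strings; `basket_to_flags` is a hypothetical helper and does not go through BigML or Flatline.

```python
def basket_to_flags(basket, items, yes="Y", no="N"):
    """Map a comma-separated basket string to one Y/N flag per item."""
    present = {item.strip() for item in basket.split(",")}
    return {item: (yes if item in present else no) for item in items}


ITEMS = ["milk", "eggs", "flour", "salt", "chocolate", "caviar"]
BASKETS = ["milk,eggs", "milk,flour", "milk,flour,eggs", "chocolate"]

# One dict per row, keyed by item name in column order.
rows = [basket_to_flags(basket, ITEMS) for basket in BASKETS]
```

Each entry of `rows` corresponds to one line of the table, with the item names as columns.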
  30. Adding a new feature with Flatline
      We need to create a new dataset by adding a new field whose binary content depends on the value of the basket field.
      The WhizzML expression will look like
        (create-dataset ds-id
                        {"new_fields" [{"name" new-field-name
                                        "field" new-field-value}]})
      where the field value should be computed using a Flatline expression
        (if (contains-items? "basket" "milk") "Y" "N")
  31. Item counts to features with Flatline
      One new field per category
        (if (contains-items? "basket" "milk") "Y" "N")
        (if (contains-items? "basket" "eggs") "Y" "N")
        (if (contains-items? "basket" "flour") "Y" "N")
        (if (contains-items? "basket" "salt") "Y" "N")
        (if (contains-items? "basket" "chocolate") "Y" "N")
        (if (contains-items? "basket" "caviar") "Y" "N")
      Parameterized code generation
        Field name
        Item values
        Y/N category names
  34. Flatline code generation with WhizzML
      The WhizzML code should generate a string per category
        "(if (contains-items? "basket" "milk") "Y" "N")"
      Let’s extract the parameters in the expression
        (let (field "basket"
              item "milk"
              yes "Y"
              no "N")
          (flatline "(if (contains-items? {{field}} {{item}})"
                    "{{yes}}"
                    "{{no}})"))
      Finally, let’s create a procedure
        (define (field-flatline field item yes no)
          (flatline "(if (contains-items? {{field}} {{item}})"
                    "{{yes}}"
                    "{{no}})"))
  35. Flatline code generation with WhizzML
        (define (field-flatline field item yes no)
          (flatline "(if (contains-items? {{field}} {{item}})"
                    "{{yes}}"
                    "{{no}})"))
        (define (item-fields field items yes no)
          (for (item items)
            {"field" (field-flatline field item yes no)}))
        (define (dataset-item-fields ds-id field)
          (let (ds (fetch ds-id)
                item-dist (ds ["fields" field "summary" "items"])
                items (map head item-dist))
            (item-fields field items "Y" "N")))
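For readers more at home in Python, the same code-generation idea can be sketched with string formatting. This is a loose analogue of the procedures above, not the WhizzML implementation: it assumes the item distribution is a list of `[item, count]` pairs (inferred from `(map head item-dist)`), uses made-up counts, and relies on `json.dumps` to add the double quotes that Flatline string literals need.

```python
import json


def field_flatline(field, item, yes="Y", no="N"):
    """Build the Flatline expression for one item as a plain string."""
    q = json.dumps  # wraps a Python string in double quotes, with escaping
    return f"(if (contains-items? {q(field)} {q(item)}) {q(yes)} {q(no)})"


def item_fields(field, item_dist, yes="Y", no="N"):
    """One {"field": <flatline>} entry per item in the distribution."""
    items = [pair[0] for pair in item_dist]  # same role as (map head item-dist)
    return [{"field": field_flatline(field, item, yes, no)} for item in items]


# Made-up item distribution in the assumed [item, count] form.
new_fields = item_fields("basket", [["milk", 3], ["flour", 2]])
```

The resulting list has the shape expected under `"new_fields"` in the dataset creation arguments, one generated expression per item.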
  36. Flatline code generation with WhizzML
        (define output-dataset
          (let (fs {"new_fields" (dataset-item-fields input-dataset field)})
            (create-dataset input-dataset fs)))
        {"inputs": [{"name": "input-dataset",
                     "type": "dataset-id",
                     "description": "The input dataset"},
                    {"name": "field",
                     "type": "string",
                     "description": "Id of the items field"}],
         "outputs": [{"name": "output-dataset",
                      "type": "dataset-id",
                      "description": "The id of the generated dataset"}]}
  37. More information
      Resources
      • Home: https://bigml.com/whizzml
      • Documentation: https://bigml.com/whizzml#documentation
      • Examples: https://github.com/whizzml/examples
  38. Questions?
