
VSSML16 L7. REST API, Bindings, and Basic Workflows

Valencian Summer School in Machine Learning 2016
Day 2 VSSML16
Lecture 7
REST API, Bindings, and Basic Workflows
jao -- Jose A. Ortega (BigML)
https://bigml.com/events/valencian-summer-school-in-machine-learning-2016

Published in: Data & Analytics

  1. Automating Machine Learning: API, bindings, BigMLer and Basic Workflows. #VSSML16, September 2016
  2. Outline
     1 Machine Learning workflows
     2 Client-side workflows: REST API and bindings
     3 Client-side workflows: Bigmler
     4 Server-side workflows: WhizzML
     5 Example Workflow Walk-throughs
  4. Machine Learning as a System Service
     The goal: Machine Learning as a system-level service
     The means:
     • APIs: ML building blocks
     • Abstraction layer over feature engineering
     • Abstraction layer over algorithms
     • Automation
  5. Machine Learning workflows
  6. Machine Learning workflows, for real
  7. Higher-level Machine Learning
  8. Outline
     1 Machine Learning workflows
     2 Client-side workflows: REST API and bindings
     3 Client-side workflows: Bigmler
     4 Server-side workflows: WhizzML
     5 Example Workflow Walk-throughs
  9. Example workflow: Batch Centroid
     Objective: Label each row in a Dataset with its associated centroid.
     We need to:
     • Create Dataset
     • Create Cluster
     • Create BatchCentroid from Cluster and Dataset
     • Save BatchCentroid as new Dataset
  10. Example workflow: building blocks
      curl -X POST "https://bigml.io/dataset?$AUTH" -H 'content-type: application/json' -d '{"source": "source/56fbbfea200d5a3403000db7"}'
      curl -X POST "https://bigml.io/cluster?$AUTH" -H 'content-type: application/json' -d '{"dataset": "dataset/43ffe231a34fff333000b65"}'
      curl -X POST "https://bigml.io/batchcentroid?$AUTH" -H 'content-type: application/json' -d '{"dataset": "dataset/43ffe231a34fff333000b65", "cluster": "cluster/33e2e231a34fff333000b65"}'
      curl -X GET "https://bigml.io/dataset/1234ff45eab8c0034334?$AUTH"
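The REST calls on this slide share one shape: a resource path under the API root, credentials in the query string, and a JSON body. A minimal Python sketch of that shape follows. It only builds the request pieces and sends nothing. `build_request` is a hypothetical helper, not part of any BigML library, and `auth` stands for whatever string the slide calls `$AUTH`.

```python
import json

API_ROOT = "https://bigml.io"  # API root as shown on the slide


def build_request(resource_type, payload, auth):
    """Return (url, headers, body) for a create call (hypothetical helper).

    `auth` is the opaque credentials query string the slide calls $AUTH.
    """
    url = f"{API_ROOT}/{resource_type}?{auth}"
    headers = {"Content-Type": "application/json"}
    return url, headers, json.dumps(payload)


# Usage: the cluster is created from a dataset id, so the body key is "dataset".
url, headers, body = build_request(
    "cluster",
    {"dataset": "dataset/43ffe231a34fff333000b65"},
    auth="<credentials>",  # placeholder, not a real credential string
)
print(url)
print(body)
```

Keeping the builder free of network calls makes it easy to check each step of the workflow before any resource is created.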
  11. Example workflow: Web UI
  12. Example workflow: Python bindings
      from bigml.api import BigML
      api = BigML()
      source = 'source/5643d345f43a234ff2310a3e'
      # create dataset and cluster, waiting for both
      dataset = api.create_dataset(source)
      api.ok(dataset)
      cluster = api.create_cluster(dataset)
      api.ok(cluster)
      # create new dataset with centroid
      new_dataset = api.create_batch_centroid(cluster, dataset, {'output_dataset': True, 'all_fields': True})
      # wait again, via polling, until the job is finished
      api.ok(new_dataset)
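The "wait, via polling" step can be illustrated with a small loop. This is a sketch of the general idea, not the bindings' actual implementation of `api.ok`. The `wait_until_finished` helper, the injected `fetch` callable, and the status codes (5 for finished, -1 for faulty) are assumptions made for the example.

```python
import time

FINISHED = 5   # assumed status code for a finished resource
FAULTY = -1    # assumed status code for a failed resource


def wait_until_finished(fetch, resource_id, delay=1.0, max_tries=60,
                        sleep=time.sleep):
    """Poll fetch(resource_id) until the resource finishes or fails."""
    for _ in range(max_tries):
        resource = fetch(resource_id)
        code = resource["status"]["code"]
        if code == FINISHED:
            return resource
        if code == FAULTY:
            message = resource["status"].get("message")
            raise RuntimeError(f"{resource_id} failed: {message}")
        sleep(delay)
    raise TimeoutError(f"{resource_id} not finished after {max_tries} polls")


# Usage with a fake fetcher that reports "finished" on the third poll.
_responses = iter([{"status": {"code": 1}},
                   {"status": {"code": 3}},
                   {"status": {"code": 5}}])
result = wait_until_finished(lambda rid: next(_responses), "dataset/123",
                             sleep=lambda seconds: None)
print(result["status"]["code"])
```

Every client that talks to the API has to repeat this kind of loop, which is part of the client-side bookkeeping that sits outside the problem domain.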
  13. Outline
      1 Machine Learning workflows
      2 Client-side workflows: REST API and bindings
      3 Client-side workflows: Bigmler
      4 Server-side workflows: WhizzML
      5 Example Workflow Walk-throughs
  14. Higher-level Machine Learning
  15. Simple workflow in a one-liner
      # 1-click cluster
      bigmler cluster --output-dir output/job --train data/iris.csv --test-datasets output/job/dataset --remote --to-dataset
      # the created dataset id:
      cat output/job/batch_centroid_dataset
  16. Simple automation: “1-click” tasks
      # "1-click" ensemble
      bigmler --train data/iris.csv --number-of-models 500 --sample-rate 0.85 --output-dir output/iris-ensemble --project "vssml tutorial"
      # "1-click" dataset with parameterized fields
      bigmler --train data/diabetes.csv --no-model --name "4-featured diabetes" --dataset-fields "plasma glucose,insulin,diabetes pedigree,diabetes" --output-dir output/diabetes --project vssml_tutorial
  17. Rich, parameterized workflows: cross-validation
      # --dataset: parameterized input
      # --k-folds: number of folds during validation
      bigmler analyze --cross-validation --dataset $(cat output/diabetes/dataset) --k-folds 3 --output-dir output/diabetes-validation
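For readers new to k-fold cross-validation, the following sketch shows what "number of folds" means: the rows are split into k groups, and each group is held out once for testing while the rest are used for training. It illustrates the general technique only. How bigmler actually partitions the dataset is not shown on the slide, and `k_fold_indices` and `cross_validate` are hypothetical names.

```python
from statistics import mean


def k_fold_indices(n_rows, k):
    """Yield (train, test) row-index lists; every row is tested exactly once."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and the number of rows")
    # Fold i holds rows i, i + k, i + 2k, ...
    folds = [list(range(start, n_rows, k)) for start in range(k)]
    for i, test in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        yield train, test


def cross_validate(n_rows, k, evaluate_fold):
    """Average the metric returned by evaluate_fold(train, test) over k folds."""
    return mean(evaluate_fold(train, test)
                for train, test in k_fold_indices(n_rows, k))


# Usage with a stand-in metric: the fraction of rows held out in each fold.
print(cross_validate(9, 3, lambda train, test: len(test) / 9))
```

In a real run, `evaluate_fold` would train a model on the training rows and return an evaluation metric computed on the held-out rows.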
  18. Rich, parameterized workflows: feature selection
      # --dataset: parameterized input
      # --k-folds: number of folds during validation
      # --staleness: stopping criterion
      # --optimize: optimization metric
      # --penalty: algorithm parameter
      bigmler analyze --features --dataset $(cat output/diabetes/dataset) --k-folds 2 --staleness 2 --optimize precision --penalty 1 --output-dir output/diabetes-features-selection
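One way to picture how a staleness stop and a per-feature penalty interact is a greedy forward search. The sketch below is a generic illustration and not bigmler's actual algorithm. `select_features`, the injected `score` function, and the reading of `penalty` as a cost subtracted per selected feature are all assumptions.

```python
def select_features(features, score, staleness=2, penalty=0.0):
    """Greedy forward search over feature subsets (illustrative only).

    score(subset) returns a metric where higher is better. Each selected
    feature costs `penalty`. The search stops after `staleness` consecutive
    rounds that fail to improve the best penalized score.
    """
    def penalized(subset):
        return score(subset) - penalty * len(subset)

    selected, best, best_score = [], [], float("-inf")
    remaining = list(features)
    stale = 0
    while remaining and stale < staleness:
        # Add the candidate that gives the highest penalized score.
        candidate = max(remaining, key=lambda f: penalized(selected + [f]))
        remaining.remove(candidate)
        selected = selected + [candidate]
        current = penalized(selected)
        if current > best_score:
            best, best_score, stale = selected, current, 0
        else:
            stale += 1
    return best, best_score


# Usage with a toy metric: only features "a" and "b" carry signal.
signal = {"a": 0.5, "b": 0.3, "c": 0.0, "d": 0.0}
chosen, value = select_features(signal, lambda s: sum(signal[f] for f in s),
                                staleness=2, penalty=0.01)
print(chosen)
```

In practice `score` would be the cross-validated value of the metric chosen for optimization, so each call is expensive and the staleness limit bounds the total work.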
  19. Outline
      1 Machine Learning workflows
      2 Client-side workflows: REST API and bindings
      3 Client-side workflows: Bigmler
      4 Server-side workflows: WhizzML
      5 Example Workflow Walk-throughs
  20. Client-side Machine Learning Automation
      Problems of client-side solutions:
      • Complexity: Lots of details outside the problem domain
      • Reuse: No inter-language compatibility
      • Scalability: Client-side workflows hard to optimize
      • Extensibility: Bigmler hides complexity at the cost of flexibility
      Not enough abstraction
  21. Higher-level Machine Learning
  22. Server-side Machine Learning
  23. WhizzML in a Nutshell
      • Domain-specific language for ML workflow automation
        High-level problem and solution specification
      • Framework for scalable, remote execution of ML workflows
        Sophisticated server-side optimization
        Out-of-the-box scalability
        Client-server brittleness removed
        Infrastructure for creating and sharing ML scripts and libraries
  24. WhizzML REST Resources
      Library: Reusable building block: a collection of WhizzML definitions that can be imported by other libraries or scripts.
      Script: Executable code that describes an actual workflow.
      • Imports: List of libraries with code used by the script.
      • Inputs: List of input values that parameterize the workflow.
      • Outputs: List of values computed by the script and returned to the user.
      Execution: Given a script and a complete set of inputs, the workflow can be executed and its outputs generated.
  25. Different ways to create WhizzML Scripts/Libraries
      • GitHub
      • Script editor
      • Gallery
      • Other scripts
      • Scriptify
  26. Basic workflow in WhizzML
      (let (dataset (create-dataset source)
            cluster (create-cluster dataset))
        (create-batchcentroid dataset cluster
                              {"output_dataset" true
                               "all_fields" true}))
  27. Basic workflow in WhizzML: Usable by any binding
      from bigml.api import BigML
      api = BigML()
      # choose workflow
      script = 'script/567b4b5be3f2a123a690ff56'
      # define parameters
      inputs = {'source': 'source/5643d345f43a234ff2310a3e'}
      # execute
      api.ok(api.create_execution(script, inputs))
  28. Basic workflow in WhizzML: Trivial parallelization
      ;; Workflow for 1 resource
      (let (dataset (create-dataset source)
            cluster (create-cluster dataset))
        (create-batchcentroid dataset cluster
                              {"output_dataset" true
                               "all_fields" true}))
  29. Basic workflow in WhizzML: Trivial parallelization
      ;; Workflow for any number of resources
      (let (datasets (map create-dataset sources)
            clusters (map create-cluster datasets)
            params {"output_dataset" true "all_fields" true})
        (map (lambda (d c) (create-batchcentroid d c params))
             datasets
             clusters))
  30. Basic workflows in WhizzML: automatic generation
  31. Standard functions
      • Numeric and relational operators (+, *, <, =, ...)
      • Mathematical functions (cos, sinh, floor, ...)
      • Strings and regular expressions (str, matches?, replace, ...)
      • Flatline generation
      • Collections: list traversal, sorting, map manipulation
      • BigML resources manipulation
        Creation: create-source, create-and-wait-dataset, etc.
        Retrieval: fetch, list-anomalies, etc.
        Update: update
        Deletion: delete
      • Machine Learning Algorithms (SMACDown, Boosting, etc.)
  32. Outline
      1 Machine Learning workflows
      2 Client-side workflows: REST API and bindings
      3 Client-side workflows: Bigmler
      4 Server-side workflows: WhizzML
      5 Example Workflow Walk-throughs
  33. Model or Ensemble?
      • Split a dataset into test and training parts
      • Create a model and an ensemble with the training dataset
      • Evaluate both with the test dataset
      • Choose the one with the better evaluation (f-measure)
      https://github.com/whizzml/examples/tree/master/model-or-ensemble
  34. Model or Ensemble?
      ;; Functions for creating the two dataset parts
      ;; Sample a dataset taking a fraction of its rows (rate) and
      ;; keeping either that fraction (out-of-bag? false) or its
      ;; complement (out-of-bag? true)
      (define (sample-dataset origin-id rate out-of-bag?)
        (create-dataset {"origin_dataset" origin-id
                         "sample_rate" rate
                         "out_of_bag" out-of-bag?
                         "seed" "example-seed-0001"}))
      ;; Create in parallel two halves of a dataset using
      ;; the sample function twice. Return a list of the two
      ;; new dataset ids.
      (define (split-dataset origin-id rate)
        (list (sample-dataset origin-id rate false)
              (sample-dataset origin-id rate true)))
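The split works because both calls share the same seed and rate: the same rows are picked each time, and the out-of-bag flag selects either those rows or their complement. The Python sketch below illustrates that property with a hash-based sampler. It is not BigML's sampling algorithm, and `in_sample` and `sample_rows` are hypothetical names.

```python
import hashlib


def in_sample(seed, row_index, rate):
    """Deterministically decide whether a row falls inside the sample."""
    digest = hashlib.sha256(f"{seed}:{row_index}".encode()).digest()
    fraction = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return fraction < rate


def sample_rows(rows, rate, out_of_bag, seed):
    """Keep the sampled rows, or their complement when out_of_bag is True."""
    return [row for i, row in enumerate(rows)
            if in_sample(seed, i, rate) != out_of_bag]


# Usage: same seed and rate, opposite out_of_bag flags.
rows = list(range(1000))
train = sample_rows(rows, 0.8, False, "example-seed-0001")
test = sample_rows(rows, 0.8, True, "example-seed-0001")
print(len(train), len(test))
```

Changing the seed in only one of the two calls would break the guarantee that the parts are disjoint and together cover the whole dataset.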
  35. Model or Ensemble?
      ;; Functions to create an ensemble and extract the f-measure from
      ;; an evaluation, given its id.
      (define (make-ensemble ds-id size)
        (create-ensemble ds-id {"number_of_models" size}))
      (define (f-measure ev-id)
        (let (ev-id (wait ev-id) ;; because fetch doesn't wait
              evaluation (fetch ev-id))
          (evaluation ["result" "model" "average_f_measure"])))
  36. Model or Ensemble?
      ;; Function encapsulating the full workflow
      (define (model-or-ensemble src-id)
        (let (ds-id (create-dataset {"source" src-id})
              [train-id test-id] (split-dataset ds-id 0.8)
              m-id (create-model train-id)
              e-id (make-ensemble train-id 15)
              m-f (f-measure (create-evaluation m-id test-id))
              e-f (f-measure (create-evaluation e-id test-id)))
          (log-info "model f " m-f " / ensemble f " e-f)
          (if (> m-f e-f) m-id e-id)))
      ;; Compute the result of the script execution
      ;; - Inputs: [{"name": "input-source-id", "type": "source-id"}]
      ;; - Outputs: [{"name": "result", "type": "resource-id"}]
      (define result (model-or-ensemble input-source-id))
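The final comparison can be mirrored in a few lines of Python. This is a sketch under stated assumptions: `f_measure` here is the standard harmonic mean of precision and recall for one class, whereas the script reads an already averaged value from the evaluation, and `pick_better` is a hypothetical helper. As in the WhizzML `if`, a tie goes to the ensemble.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def pick_better(model, ensemble):
    """Return the id with the higher f-measure; ties go to the ensemble.

    Each argument is a (resource_id, f_measure) pair.
    """
    (model_id, model_f), (ensemble_id, ensemble_f) = model, ensemble
    return model_id if model_f > ensemble_f else ensemble_id


# Usage with made-up precision and recall values and placeholder ids.
model_f = f_measure(precision=0.90, recall=0.60)
ensemble_f = f_measure(precision=0.80, recall=0.75)
print(pick_better(("model/placeholder", model_f),
                  ("ensemble/placeholder", ensemble_f)))
```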
  37. Transforming item counts to features
      basket            milk  eggs  flour  salt  chocolate  caviar
      milk,eggs         Y     Y     N      N     N          N
      milk,flour        Y     N     Y      N     N          N
      milk,flour,eggs   Y     Y     Y      N     N          N
      chocolate         N     N     N      N     Y          N
  38. Item counts to features with Flatline
      (if (contains-items? "basket" "milk") "Y" "N")
      (if (contains-items? "basket" "eggs") "Y" "N")
      (if (contains-items? "basket" "flour") "Y" "N")
      (if (contains-items? "basket" "salt") "Y" "N")
      (if (contains-items? "basket" "chocolate") "Y" "N")
      (if (contains-items? "basket" "caviar") "Y" "N")
      Parameterized code generation:
      • Field name
      • Item values
      • Y/N category names
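The transformation itself is simple to state outside Flatline. The sketch below reproduces the Y/N table in plain Python, assuming baskets are comma-separated strings as in the example. `items_to_features` is a hypothetical helper for illustration and runs locally rather than on the platform.

```python
def items_to_features(baskets, items, yes="Y", no="N", separator=","):
    """Return one {item: yes/no} row per basket string."""
    rows = []
    for basket in baskets:
        present = set(basket.split(separator))
        rows.append({item: (yes if item in present else no) for item in items})
    return rows


# Usage with the baskets and items from the example table.
items = ["milk", "eggs", "flour", "salt", "chocolate", "caviar"]
baskets = ["milk,eggs", "milk,flour", "milk,flour,eggs", "chocolate"]
for basket, row in zip(baskets, items_to_features(baskets, items)):
    print(basket, " ".join(row[item] for item in items))
```

The three parameters named on the slide (field name, item values, and the Y/N category names) correspond to the inputs a generator needs in order to emit one expression per item.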
  39. Flatline code generation with WhizzML
      "(if (contains-items? \"basket\" \"milk\") \"Y\" \"N\")"
  40. Flatline code generation with WhizzML
      "(if (contains-items? \"basket\" \"milk\") \"Y\" \"N\")"
      (let (field "basket"
            item "milk"
            yes "Y"
            no "N")
        (flatline "(if (contains-items? {{field}} {{item}})"
                  "{{yes}}"
                  "{{no}})"))
  41. Flatline code generation with WhizzML
      "(if (contains-items? \"basket\" \"milk\") \"Y\" \"N\")"
      (let (field "basket"
            item "milk"
            yes "Y"
            no "N")
        (flatline "(if (contains-items? {{field}} {{item}})"
                  "{{yes}}"
                  "{{no}})"))
      (define (field-flatline field item yes no)
        (flatline "(if (contains-items? {{field}} {{item}})"
                  "{{yes}}"
                  "{{no}})"))
  42. Flatline code generation with WhizzML
      (define (field-flatline field item yes no)
        (flatline "(if (contains-items? {{field}} {{item}})"
                  "{{yes}}"
                  "{{no}})"))
      (define (item-fields field items yes no)
        (for (item items)
          {"field" (field-flatline field item yes no)}))
      (define (dataset-item-fields ds-id field)
        (let (ds (fetch ds-id)
              item-dist (ds ["fields" field "summary" "items"])
              items (map head item-dist))
          (item-fields field items "Y" "N")))
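The same code-generation idea can be sketched in Python for comparison: fill a template once per item and collect the resulting expressions. This mirrors the structure of `field-flatline` and `item-fields` but is only an illustration. Using `json.dumps` to produce double-quoted string literals is an assumption about how the interpolated values should be rendered.

```python
import json


def field_flatline(field, item, yes, no):
    """Build one Flatline expression string for a single item."""
    quote = json.dumps  # renders a Python string as a double-quoted literal
    return (f"(if (contains-items? {quote(field)} {quote(item)}) "
            f"{quote(yes)} {quote(no)})")


def item_fields(field, items, yes="Y", no="N"):
    """Return one {"field": expression} entry per item."""
    return [{"field": field_flatline(field, item, yes, no)} for item in items]


# Usage: generate the expressions for two items of the "basket" field.
for entry in item_fields("basket", ["milk", "eggs"]):
    print(entry["field"])
```

Generating the expressions from the item list avoids hand-writing one line per item and keeps the new fields in step with whatever items the dataset actually contains.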
  43. Flatline code generation with WhizzML
      (define output-dataset
        (let (fs {"new_fields" (dataset-item-fields input-dataset field)})
          (create-dataset input-dataset fs)))
      {"inputs": [{"name": "input-dataset", "type": "dataset-id", "description": "The input dataset"},
                  {"name": "field", "type": "string", "description": "Id of the items field"}],
       "outputs": [{"name": "output-dataset", "type": "dataset-id", "description": "The id of the generated dataset"}]}
  44. More information
      Resources:
      • Home: https://bigml.com/whizzml
      • Documentation: https://bigml.com/whizzml#documentation
      • Examples: https://github.com/whizzml/examples
