
VSSML17 L7. REST API, Bindings, and Basic Workflows

Valencian Summer School in Machine Learning 2017 - Day 2
Lecture 7: REST API, Bindings, and Basic Workflows. By jao - Jose A. Ortega - (BigML).
https://bigml.com/events/valencian-summer-school-in-machine-learning-2017

Published in: Data & Analytics


  1. 1. Automating Machine Learning API, bindings, BigMLer and Basic Workflows #VSSML17 September 2017 #VSSML17 Automating Machine Learning September 2017 1 / 56
  2. 2. Outline
        1 Machine Learning workflows
        2 ML as a RESTful Cloudy Service
        3 Client-side workflows: REST API and bindings
        4 Client-side workflows: Bigmler
        5 Server-side workflows: WhizzML
        6 Example Workflow: Model or Ensemble?
        7 Case study: Using Flatline in WhizzML
  4. 4. Machine Learning as a System Service
        The goal: Machine Learning as a system-level service
        The means:
        • APIs: ML building blocks
        • Abstraction layer over feature engineering
        • Abstraction layer over algorithms
        • Automation
  5. 5. The Roadmap
  7. 7. RESTful-ish ML Services
  8. 8. RESTful-ish ML Services
  9. 9. RESTful-ish ML Services
  10. 10. RESTful-ish ML Services
        • Excellent abstraction layer
        • Transparent data model
        • Immutable resources and UUIDs: traceability
        • Simple yet effective interaction model
        • Easy access from any language (API bindings)
        Algorithmic complexity and computing-resource management problems are mostly washed away.
  11. 11. RESTful done right: Whitebox resources
        • Your data, your model
        • Model reverse engineering becomes moot
        • Maximizes reach (Web, CLI, desktop, IoT)
  12. 12. Example workflow: Batch Centroid
        Objective: Label each row in a Dataset with its associated centroid.
        We need to...
        • Create Dataset
        • Create Cluster
        • Create BatchCentroid from Cluster and Dataset
        • Save BatchCentroid as new Dataset
  13. 13. Example workflow: building blocks
        # $AUTH holds the authentication query string (username and API key)
        curl -X POST "https://bigml.io/dataset?$AUTH" \
             -H 'content-type: application/json' \
             -d '{"source": "source/56fbbfea200d5a3403000db7"}'
        curl -X POST "https://bigml.io/cluster?$AUTH" \
             -H 'content-type: application/json' \
             -d '{"dataset": "dataset/43ffe231a34fff333000b65"}'
        curl -X POST "https://bigml.io/batchcentroid?$AUTH" \
             -H 'content-type: application/json' \
             -d '{"dataset": "dataset/43ffe231a34fff333000b65",
                  "cluster": "cluster/33e2e231a34fff333000b65"}'
        curl -X GET "https://bigml.io/dataset/1234ff45eab8c0034334?$AUTH"
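
The creation calls in the building-blocks slide can also be assembled from any language with an HTTP client. The following is a minimal Python sketch using only the standard library. The helper name `build_create_request`, the placeholder `AUTH` value, and the URL shape `https://bigml.io/<resource>?<auth>` are assumptions made for illustration, and the cluster is created from a dataset id. Nothing is sent over the network.

```python
import json
import urllib.request

BASE_URL = "https://bigml.io"
# Placeholder credentials (assumption): replace with your own query string.
AUTH = "username=alice;api_key=SECRET"


def build_create_request(resource_type, payload, auth=AUTH):
    """Build (but do not send) a POST request that creates a resource."""
    url = f"{BASE_URL}/{resource_type}?{auth}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The three creation steps of the batch centroid workflow, in order.
steps = [
    ("dataset", {"source": "source/56fbbfea200d5a3403000db7"}),
    ("cluster", {"dataset": "dataset/43ffe231a34fff333000b65"}),
    ("batchcentroid", {"dataset": "dataset/43ffe231a34fff333000b65",
                       "cluster": "cluster/33e2e231a34fff333000b65"}),
]
pending = [build_create_request(kind, payload) for kind, payload in steps]
```

Passing each element of `pending` to `urllib.request.urlopen` would issue the call; in a real workflow each step would first wait for the previous resource to finish.
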
  14. 14. Example workflow: Web UI
  15. 15. Machine Learning workflows
  16. 16. Machine Learning workflows, for real
  18. 18. Higher-level Machine Learning
  19. 19. Example workflow: Python bindings
        from bigml.api import BigML
        api = BigML()
        source = 'source/5643d345f43a234ff2310a3e'
        # create dataset and cluster, waiting for both
        dataset = api.create_dataset(source)
        api.ok(dataset)
        cluster = api.create_cluster(dataset)
        api.ok(cluster)
        # create new dataset with centroid
        new_dataset = api.create_batch_centroid(cluster, dataset,
                                                {'output_dataset': True,
                                                 'all_fields': True})
        # wait again, via polling, until the job is finished
        api.ok(new_dataset)
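
The bindings slide says the client waits "via polling" until each job is finished. A minimal sketch of what such a polling loop might look like is given here. The function `wait_until_finished`, the `fetch` callable, the shape of the returned dictionary, and the status codes (5 for finished, -1 for faulty) are assumptions for illustration, not the bindings' actual implementation.

```python
import time

# Status codes assumed here: 5 means finished, -1 means faulty.
FINISHED = 5
FAULTY = -1


def wait_until_finished(fetch, resource_id, delay=0.0, max_retries=10):
    """Poll `fetch(resource_id)` until the resource is finished.

    `fetch` is any callable returning a dict with a status code under
    ["status"]["code"] (a hypothetical stand-in for an HTTP GET).
    """
    for _ in range(max_retries):
        resource = fetch(resource_id)
        code = resource["status"]["code"]
        if code == FINISHED:
            return resource
        if code == FAULTY:
            raise RuntimeError(f"{resource_id} failed")
        time.sleep(delay)
    raise TimeoutError(f"{resource_id} not finished after {max_retries} polls")


# Fake server: the resource becomes finished on the third poll.
_codes = iter([1, 3, 5])


def fake_fetch(resource_id):
    return {"resource": resource_id, "status": {"code": next(_codes)}}


dataset = wait_until_finished(fake_fetch, "dataset/43ffe231a34fff333000b65")
```

Every client-side workflow has to repeat this kind of loop, plus error handling, for each resource it creates.
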
  20. 20. Client-side automation via bindings
        Strengths of bindings-based solutions
        Versatility: Maximum flexibility and possibility of encapsulation (via proper engineering)
        Native: Easy to support any programming language
        Offline: Whitebox models allow local use of resources (e.g., real-time predictions)
  21. 21. Client-side automation via bindings
        Strengths of bindings-based solutions
        from bigml.model import Model
        model_id = 'model/5643d345f43a234ff2310a3e'
        # Download of (whitebox) resource
        local_model = Model(model_id)
        # Purely local calculations
        local_model.predict({'plasma glucose': 132})
  22. 22. Client-side automation via bindings
        Problems of bindings-based solutions
        Complexity: Lots of details outside the problem domain
        Reuse: No inter-language compatibility
        Scalability: Client-side workflows are hard to optimize
        Not enough abstraction
  24. 24. Higher-level Machine Learning
  25. 25. Simple workflow in a one-liner
        # 1-click cluster
        bigmler cluster --output-dir output/job \
                        --train data/iris.csv \
                        --test-datasets output/job/dataset \
                        --remote \
                        --to-dataset
        # the created dataset id:
        cat output/job/batch_centroid_dataset
  26. 26. Simple automation: “1-click” tasks
        # "1-click" ensemble
        bigmler --train data/iris.csv \
                --number-of-models 500 \
                --sample-rate 0.85 \
                --output-dir output/iris-ensemble \
                --project "vssml tutorial"
        # "1-click" dataset with parameterized fields
        bigmler --train data/diabetes.csv \
                --no-model \
                --name "4-featured diabetes" \
                --dataset-fields "plasma glucose,insulin,diabetes pedigree,diabetes" \
                --output-dir output/diabetes \
                --project vssml_tutorial
  27. 27. Rich, parameterized workflows: cross-validation
        # Cross-validation with parameterized input
        # --k-folds: number of folds during validation
        bigmler analyze --cross-validation \
                        --dataset $(cat output/diabetes/dataset) \
                        --k-folds 3 \
                        --output-dir output/diabetes-validation
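
To make the `--k-folds` parameter concrete, here is a minimal sketch of how rows could be split into k folds, each fold serving once as the test set. It is an illustration of k-fold cross-validation in general, not a description of how bigmler implements it; the function names and the fixed seed are assumptions.

```python
import random


def k_fold_indices(n_rows, k, seed=0):
    """Split row indices 0..n_rows-1 into k shuffled, disjoint folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_test_pairs(n_rows, k, seed=0):
    """Yield (train, test) index lists, one pair per fold."""
    folds = k_fold_indices(n_rows, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


pairs = list(train_test_pairs(n_rows=9, k=3))
```

With `k=3`, three models are trained and evaluated, and their evaluation metrics are then averaged.
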
  28. 28. Rich, parameterized workflows: feature selection
        # Feature selection with parameterized input
        # --k-folds:   number of folds during validation
        # --staleness: stop criterion
        # --optimize:  optimization metric
        # --penalty:   algorithm parameter
        bigmler analyze --features \
                        --dataset $(cat output/diabetes/dataset) \
                        --k-folds 2 \
                        --staleness 2 \
                        --optimize precision \
                        --penalty 1 \
                        --output-dir output/diabetes-features-selection
  30. 30. Client-side Machine Learning Automation
        Problems of client-side solutions
        Complex: Too fine-grained, leaky abstractions
        Cumbersome: Error handling, network issues
        Hard to reuse: Tied to a single programming language
        Hard to scale: Parallelization is again a problem
        Hard to generalize: CLI tools like bigmler hide complexity at the cost of flexibility
        The algorithmic complexity and computing-resource management problems that had been "mostly washed away" are back!
  32. 32. Client-side Machine Learning Automation
        Problems of client-side solutions
        Complexity: Lots of details outside the problem domain
        Reuse: No inter-language compatibility
        Scalability: Client-side workflows are hard to optimize
        Extensibility: Bigmler hides complexity at the cost of flexibility
        Not enough abstraction
  33. 33. Higher-level Machine Learning
  34. 34. Server-side Machine Learning. Solution (scalability, reuse): Back to the server
  35. 35. Basic workflows in WhizzML: automatic generation
  36. 36. Server-side Machine Learning Automation. Solution (complexity, reuse): Domain-specific languages
  37. 37. WhizzML in a Nutshell
        • Domain-specific language for ML workflow automation
            High-level problem and solution specification
        • Framework for scalable, remote execution of ML workflows
            Sophisticated server-side optimization
            Out-of-the-box scalability
            Client-server brittleness removed
            Infrastructure for creating and sharing ML scripts and libraries
  38. 38. WhizzML REST Resources
        Library: Reusable building block: a collection of WhizzML definitions that can be imported by other libraries or scripts.
        Script: Executable code that describes an actual workflow.
          • Imports: List of libraries with code used by the script.
          • Inputs: List of input values that parameterize the workflow.
          • Outputs: List of values computed by the script and returned to the user.
        Execution: Given a script and a complete set of inputs, the workflow can be executed and its outputs generated.
  39. 39. Different ways to create WhizzML Scripts/Libraries
        • Github
        • Script editor
        • Gallery
        • Other scripts
        • Scriptify
  40. 40. Basic workflow in WhizzML
        (let (dataset (create-dataset source)
              cluster (create-cluster dataset))
          (create-batchcentroid dataset
                                cluster
                                {"output_dataset" true
                                 "all_fields" true}))
  41. 41. Basic workflow in WhizzML: Usable by any binding
        from bigml.api import BigML
        api = BigML()
        # choose workflow
        script = 'script/567b4b5be3f2a123a690ff56'
        # define parameters
        inputs = {'source': 'source/5643d345f43a234ff2310a3e'}
        # execute
        api.ok(api.create_execution(script, inputs))
  42. 42. Basic workflow in WhizzML: Trivial parallelization
        ;; Workflow for 1 resource
        (let (dataset (create-dataset source)
              cluster (create-cluster dataset))
          (create-batchcentroid dataset
                                cluster
                                {"output_dataset" true
                                 "all_fields" true}))
  43. 43. Basic workflow in WhizzML: Trivial parallelization
        ;; Workflow for any number of resources
        (let (datasets (map create-dataset sources)
              clusters (map create-cluster datasets)
              params {"output_dataset" true "all_fields" true})
          (map (lambda (d c) (create-batchcentroid d c params))
               datasets
               clusters))
  44. 44. Standard functions
        • Numeric and relational operators (+, *, <, =, ...)
        • Mathematical functions (cos, sinh, floor, ...)
        • Strings and regular expressions (str, matches?, replace, ...)
        • Flatline generation
        • Collections: list traversal, sorting, map manipulation
        • BigML resources manipulation
            Creation: create-source, create-and-wait-dataset, etc.
            Retrieval: fetch, list-anomalies, etc.
            Update: update
            Deletion: delete
        • Machine Learning Algorithms (SMACDown, Boosting, etc.)
  46. 46. Model or Ensemble?
        • Split a dataset into test and training parts
        • Create a model and an ensemble with the training dataset
        • Evaluate both with the test dataset
        • Choose the one with the better evaluation (f-measure)
        https://github.com/whizzml/examples/tree/master/model-or-ensemble
  47. 47. Model or Ensemble?
        ;; Functions for creating the two dataset parts
        ;; Sample a dataset taking a fraction of its rows (rate) and
        ;; keeping either that fraction (out-of-bag? false) or its
        ;; complement (out-of-bag? true)
        (define (sample-dataset origin-id rate out-of-bag?)
          (create-dataset {"origin_dataset" origin-id
                           "sample_rate" rate
                           "out_of_bag" out-of-bag?
                           "seed" "example-seed-0001"}))
        ;; Create in parallel two halves of a dataset using
        ;; the sample function twice. Return a list of the two
        ;; new dataset ids.
        (define (split-dataset origin-id rate)
          (list (sample-dataset origin-id rate false)
                (sample-dataset origin-id rate true)))
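
The split relies on a fixed seed so that the sample and its out-of-bag complement never overlap. The sketch below illustrates that idea with a hash-based decision per row. The hashing scheme and the function names are assumptions chosen for illustration; the slides do not describe how the service samples rows internally.

```python
import hashlib


def in_sample(row_index, rate, seed):
    """Deterministically decide whether a row falls inside the sample."""
    digest = hashlib.sha256(f"{seed}:{row_index}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    fraction = int.from_bytes(digest[:8], "big") / 2**64
    return fraction < rate


def sample_rows(n_rows, rate, out_of_bag, seed="example-seed-0001"):
    """Return the sampled row indices, or their complement if out_of_bag."""
    return [i for i in range(n_rows) if in_sample(i, rate, seed) != out_of_bag]


train = sample_rows(1000, 0.8, out_of_bag=False)
test = sample_rows(1000, 0.8, out_of_bag=True)
```

Because both calls share the seed and the rate, every row lands in exactly one of the two parts, which is what makes the evaluation on the test part meaningful.
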
  48. 48. Model or Ensemble?
        ;; Functions to create an ensemble and extract the f-measure from
        ;; an evaluation, given its id.
        (define (make-ensemble ds-id size)
          (create-ensemble ds-id {"number_of_models" size}))
        (define (f-measure ev-id)
          (let (ev-id (wait ev-id) ;; because fetch doesn't wait
                evaluation (fetch ev-id))
            (evaluation ["result" "model" "average_f_measure"])))
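
For readers unfamiliar with the metric being compared, here is a minimal sketch of an f-measure computation and of the final selection step. It assumes that the average f-measure is the unweighted mean of per-class F1 scores; the confusion counts are hypothetical and only serve the example.

```python
def f_measure(true_positives, false_positives, false_negatives):
    """F1 score: harmonic mean of precision and recall."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


def average_f_measure(per_class_counts):
    """Unweighted mean of the per-class F1 scores."""
    scores = [f_measure(*counts) for counts in per_class_counts]
    return sum(scores) / len(scores)


# Hypothetical confusion counts (tp, fp, fn) for a two-class problem.
model_f = average_f_measure([(40, 10, 10), (30, 10, 10)])
ensemble_f = average_f_measure([(45, 5, 5), (35, 5, 5)])
best = "model" if model_f > ensemble_f else "ensemble"
```

With these made-up counts the ensemble scores higher, so it would be the resource returned by the workflow.
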
  49. 49. Model or Ensemble?
        ;; Function encapsulating the full workflow
        (define (model-or-ensemble src-id)
          (let (ds-id (create-dataset {"source" src-id})
                [train-id test-id] (split-dataset ds-id 0.8)
                m-id (create-model train-id)
                e-id (make-ensemble train-id 15)
                m-f (f-measure (create-evaluation m-id test-id))
                e-f (f-measure (create-evaluation e-id test-id)))
            (log-info "model f " m-f " / ensemble f " e-f)
            (if (> m-f e-f) m-id e-id)))
        ;; Compute the result of the script execution
        ;; - Inputs: [{"name": "input-source-id", "type": "source-id"}]
        ;; - Outputs: [{"name": "result", "type": "resource-id"}]
        (define result (model-or-ensemble input-source-id))
  51. 51. Transforming item counts to features
        basket            milk  eggs  flour  salt  chocolate  caviar
        milk,eggs         Y     Y     N      N     N          N
        milk,flour        Y     N     Y      N     N          N
        milk,flour,eggs   Y     Y     Y      N     N          N
        chocolate         N     N     N      N     Y          N
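
The transformation in the table can be sketched locally in a few lines of Python. This is only an illustration of the target result using the table's own rows; the function name `basket_to_features` and the hard-coded item list are assumptions, and the slides perform the transformation server-side with Flatline instead.

```python
ITEMS = ["milk", "eggs", "flour", "salt", "chocolate", "caviar"]


def basket_to_features(basket, items=ITEMS, yes="Y", no="N"):
    """Map a comma-separated basket to one Y/N flag per known item."""
    present = {item.strip() for item in basket.split(",")}
    return {item: (yes if item in present else no) for item in items}


baskets = ["milk,eggs", "milk,flour", "milk,flour,eggs", "chocolate"]
rows = [basket_to_features(b) for b in baskets]
```

Each element of `rows` corresponds to one line of the table, with one new column per item.
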
  52. 52. Item counts to features with Flatline
        (if (contains-items? "basket" "milk") "Y" "N")
        (if (contains-items? "basket" "eggs") "Y" "N")
        (if (contains-items? "basket" "flour") "Y" "N")
        (if (contains-items? "basket" "salt") "Y" "N")
        (if (contains-items? "basket" "chocolate") "Y" "N")
        (if (contains-items? "basket" "caviar") "Y" "N")
        Parameterized code generation:
        • Field name
        • Item values
        • Y/N category names
  55. 55. Flatline code generation with WhizzML
        "(if (contains-items? "basket" "milk") "Y" "N")"
        (let (field "basket"
              item "milk"
              yes "Y"
              no "N")
          (flatline "(if (contains-items? {{field}} {{item}})"
                    "{{yes}}"
                    "{{no}})"))
        (define (field-flatline field item yes no)
          (flatline "(if (contains-items? {{field}} {{item}})"
                    "{{yes}}"
                    "{{no}})"))
  56. 56. Flatline code generation with WhizzML
        (define (field-flatline field item yes no)
          (flatline "(if (contains-items? {{field}} {{item}})"
                    "{{yes}}"
                    "{{no}})"))
        (define (item-fields field items yes no)
          (for (item items)
            {"field" (field-flatline field item yes no)}))
        (define (dataset-item-fields ds-id field)
          (let (ds (fetch ds-id)
                item-dist (ds ["fields" field "summary" "items"])
                items (map head item-dist))
            (item-fields field items "Y" "N")))
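
The same parameterized code generation can be mimicked in Python to see which strings are being produced. This is a sketch of the idea only: the function names mirror the WhizzML ones, while the `"name"` key in each specification is an assumption added for illustration and does not appear in the slide.

```python
import json


def field_flatline(field, item, yes="Y", no="N"):
    """Render one Flatline expression as a string."""
    # json.dumps adds the double quotes (and escaping) around each literal.
    quoted = [json.dumps(value) for value in (field, item, yes, no)]
    return "(if (contains-items? {} {}) {} {})".format(*quoted)


def item_fields(field, items, yes="Y", no="N"):
    """One new-field specification per item, named after the item."""
    return [{"name": item, "field": field_flatline(field, item, yes, no)}
            for item in items]


specs = item_fields("basket", ["milk", "eggs"])
```

Here `specs[0]["field"]` is the same expression shown for milk in the Flatline slides, generated from the field name, the item value, and the two category names.
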
  57. 57. Flatline code generation with WhizzML
        (define output-dataset
          (let (fs {"new_fields" (dataset-item-fields input-dataset field)})
            (create-dataset input-dataset fs)))
        {"inputs": [{"name": "input-dataset",
                     "type": "dataset-id",
                     "description": "The input dataset"},
                    {"name": "field",
                     "type": "string",
                     "description": "Id of the items field"}],
         "outputs": [{"name": "output-dataset",
                      "type": "dataset-id",
                      "description": "The id of the generated dataset"}]}
  58. 58. More information
        Resources
        • Home: https://bigml.com/whizzml
        • Documentation: https://bigml.com/whizzml#documentation
        • Examples: https://github.com/whizzml/examples
  59. 59. Questions?
