Automating Machine Learning
API, bindings, BigMLer and Basic Workflows
#BSSML16
December 2016

BSSML16 L8. REST API, Bindings, and Basic Workflows


Brazilian Summer School in Machine Learning 2016
Day 2 - Lecture 3: REST API, Bindings, and Basic Workflows
Lecturer: Dr. José Antonio Ortega - jao (BigML)

Published in: Data & Analytics


  1. Automating Machine Learning: API, bindings, BigMLer and Basic Workflows
     #BSSML16, December 2016
  2. Outline
     1 Introduction: ML as a System Service
     2 ML as a RESTful Cloudy Service
     3 Client-side workflows: REST API and bindings
     4 Client-side workflows: BigMLer
  3. Outline (as on slide 2)
  4. Machine Learning as a System Service
     The goal: Machine Learning as a system-level service
     The means:
     • APIs: ML building blocks
     • Abstraction layer over feature engineering
     • Abstraction layer over algorithms
     • Automation
  5. The Roadmap
  6. Outline (as on slide 2)
  7. RESTful-ish ML Services
  8. RESTful-ish ML Services
  9. RESTful-ish ML Services
  10. RESTful-ish ML Services
      • Excellent abstraction layer
      • Transparent data model
      • Immutable resources and UUIDs: traceability
      • Simple yet effective interaction model
      • Easy access from any language (API bindings)
      Algorithmic complexity and computing resources management problems mostly washed away
  11. RESTful done right: Whitebox resources
      • Your data, your model
      • Model reverse engineering becomes moot
      • Maximizes reach (Web, CLI, desktop, IoT)
  12. Outline (as on slide 2)
  13. Higher-level Machine Learning
  14. Example workflow: Batch Centroid
      Objective: Label each row in a Dataset with its associated centroid.
      We need to...
      • Create Dataset
      • Create Cluster
      • Create BatchCentroid from Cluster and Dataset
      • Save BatchCentroid as new Dataset
  15. Example workflow: building blocks
      curl -X POST "https://bigml.io/dataset?$AUTH" \
           -H "content-type: application/json" \
           -d '{"source": "source/56fbbfea200d5a3403000db7"}'
      curl -X POST "https://bigml.io/cluster?$AUTH" \
           -H "content-type: application/json" \
           -d '{"dataset": "dataset/43ffe231a34fff333000b65"}'
      curl -X POST "https://bigml.io/batchcentroid?$AUTH" \
           -H "content-type: application/json" \
           -d '{"dataset": "dataset/43ffe231a34fff333000b65",
                "cluster": "cluster/33e2e231a34fff333000b65"}'
      curl -X GET "https://bigml.io/dataset/1234ff45eab8c0034334?$AUTH"
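The REST calls on slide 15 can also be assembled from Python's standard library. The following is only a sketch: it builds the request without sending it, and it assumes that the resource type goes in the URL path and that the credentials travel as `username` and `api_key` query parameters. The helper name `build_create_request` and the credentials are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BIGML_URL = "https://bigml.io"  # base URL as shown on the slide


def build_create_request(resource_type, payload, username, api_key):
    """Build (but do not send) a POST request that creates a resource."""
    # Assumption: credentials are passed as query-string parameters.
    auth = urlencode({"username": username, "api_key": api_key})
    url = f"{BIGML_URL}/{resource_type}?{auth}"
    body = json.dumps(payload).encode("utf-8")
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/json"})


# Hypothetical credentials; the dataset id is the one used on the slide.
req = build_create_request(
    "cluster",
    {"dataset": "dataset/43ffe231a34fff333000b65"},
    username="alice",
    api_key="0123456789abcdef",
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would perform the call; each building block of the workflow differs only in the resource type and the JSON payload.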
  16. Example workflow: Web UI
  17. Automation via bindings
      from bigml.api import BigML
      api = BigML()
      # `source` and `trainset` are not defined on the slide;
      # they are assumed to be set beforehand
      project = api.create_project({'name': 'ToyBoost'})
      orig_source = api.create_source(source, {"name": "ToyBoost",
                                               "project": project['resource']})
      api.ok(orig_source)
      orig_dataset = api.create_dataset(orig_source, {"name": "Boost"})
      api.ok(orig_dataset)
      trainset = api.get_dataset(trainset)
      for loop in range(0, 10):
          api.ok(trainset)
          model = api.create_model(trainset, {
              "name": "ToyBoost - Model%d" % loop,
              "objective_fields": ["letter"],
              "excluded_fields": ["weight"],
              "weight_field": "100011"})
          api.ok(model)
          batchp = api.create_batch_prediction(model, trainset, {
              "name": "ToyBoost - Result%d" % loop,
              "all_fields": True,
              "header": True})
          api.ok(batchp)
          batchp = api.get_batch_prediction(batchp)
          batchp_dataset = api.get_dataset(batchp['object'])
  18. Example workflow: Python bindings
      from bigml.api import BigML
      api = BigML()
      source = 'source/5643d345f43a234ff2310a3e'
      # create dataset and cluster, waiting for both
      dataset = api.create_dataset(source)
      api.ok(dataset)
      cluster = api.create_cluster(dataset)
      api.ok(cluster)
      # create a batch centroid with output to dataset
      centroid = api.create_batch_centroid(cluster, dataset,
                                           {'output_dataset': True,
                                            'all_fields': True})
      api.ok(centroid)
      # wait again, via polling, until the dataset is finished
      batch_dataset_id = centroid['object']['output_dataset_resource']
      batch_dataset = api.get_dataset(batch_dataset_id)
      api.ok(batch_dataset)
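The "wait again, via polling" comment on slide 18 hides a loop that every client-side workflow has to run. A minimal sketch of such a loop follows; it is not the bindings' actual implementation of `api.ok`. The status codes (5 for finished, -1 for faulty), the `status.code` layout of the resource, and the helper name `wait_until_finished` are assumptions made for illustration.

```python
import time

FINISHED = 5   # assumed status code for a finished resource
FAULTY = -1    # assumed status code for a failed resource


def wait_until_finished(fetch, resource_id, delay=1.0, max_delay=30.0,
                        max_attempts=20, sleep=time.sleep):
    """Poll fetch(resource_id) until the resource is finished or faulty."""
    for _ in range(max_attempts):
        resource = fetch(resource_id)
        code = resource["status"]["code"]
        if code == FINISHED:
            return resource
        if code == FAULTY:
            raise RuntimeError(f"{resource_id} failed: {resource['status']}")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # exponential backoff, capped
    raise TimeoutError(f"{resource_id} not finished after {max_attempts} polls")


# Usage with a fake fetcher that reports "finished" on the third poll.
_codes = iter([1, 3, 5])


def fake_fetch(resource_id):
    return {"resource": resource_id, "status": {"code": next(_codes)}}


done = wait_until_finished(fake_fetch, "dataset/abc", sleep=lambda s: None)
print(done["status"]["code"])
```

Injecting `fetch` and `sleep` keeps the loop testable without network access, which is exactly the kind of engineering detail that sits outside the problem domain.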
  19. Client-side automation via bindings
      Strengths of bindings-based solutions
      • Versatility: Maximum flexibility and possibility of encapsulation (via proper engineering)
      • Native: Easy to support any programming language
      • Offline: Whitebox models allow local use of resources (e.g., real-time predictions)
  20. Client-side automation via bindings
      Strengths of bindings-based solutions
      from bigml.model import Model
      model_id = 'model/5643d345f43a234ff2310a3e'
      # Download of (whitebox) resource
      local_model = Model(model_id)
      # Purely local calculations
      local_model.predict({'plasma glucose': 132})
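To illustrate why a whitebox model allows purely local predictions, here is a toy decision tree evaluated without any network call. The tree structure, thresholds, the `bmi` field, and the `predict` helper are all hypothetical; the JSON that BigML actually returns for a model is different and richer.

```python
# Hypothetical, simplified tree: internal nodes test one numeric field,
# leaves carry the predicted class.
TREE = {
    "field": "plasma glucose", "threshold": 123.5,
    "left": {"output": "false"},
    "right": {
        "field": "bmi", "threshold": 29.9,
        "left": {"output": "false"},
        "right": {"output": "true"},
    },
}


def predict(node, row, default=0.0):
    """Walk the tree from the root to a leaf using the values in `row`."""
    while "output" not in node:
        # Missing fields fall back to `default` (a simplification).
        value = row.get(node["field"], default)
        node = node["left"] if value <= node["threshold"] else node["right"]
    return node["output"]


print(predict(TREE, {"plasma glucose": 132, "bmi": 33.1}))
```

Once the model has been downloaded, each prediction is a handful of comparisons in memory, which is what makes real-time and offline use practical.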
  21. Client-side automation via bindings
      Problems of bindings-based solutions
      • Complexity: Lots of details outside the problem domain
      • Reuse: No inter-language compatibility
      • Scalability: Client-side workflows are hard to optimize
      Not enough abstraction
  22. Outline (as on slide 2)
  23. Higher-level Machine Learning
  24. Simple workflow in a one-liner
      # 1-click cluster
      bigmler cluster --output-dir output/job \
                      --train data/iris.csv \
                      --test-datasets output/job/dataset \
                      --remote --to-dataset
      # the created dataset id:
      cat output/job/batch_centroid_dataset
  25. Simple automation: "1-click" tasks
      # "1-click" ensemble
      bigmler --train data/iris.csv \
              --number-of-models 500 \
              --sample-rate 0.85 \
              --output-dir output/iris-ensemble \
              --project "vssml tutorial"
      # "1-click" dataset with parameterized fields
      bigmler --train data/diabetes.csv \
              --no-model \
              --name "4-featured diabetes" \
              --dataset-fields "plasma glucose,insulin,diabetes pedigree,diabetes" \
              --output-dir output/diabetes \
              --project vssml_tutorial
  26. Rich, parameterized workflows: cross-validation
      # parameterized input
      # --k-folds: number of folds during validation
      bigmler analyze --cross-validation \
                      --dataset $(cat output/diabetes/dataset) \
                      --k-folds 3 \
                      --output-dir output/diabetes-validation
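The idea behind `--k-folds` can be sketched in a few lines of plain Python. This shows only the generic k-fold partitioning scheme; how BigMLer actually samples the dataset is not shown on the slide, and the helper name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_rows, k):
    """Yield (train, test) index lists for k roughly equal folds."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and n_rows")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print(len(train), len(test))
```

Each row is held out exactly once: a model is trained on every `train` split and evaluated on the matching `test` split, and the k evaluations are then averaged.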
  27. Rich, parameterized workflows: feature selection
      # parameterized input
      # --k-folds:   number of folds during validation
      # --staleness: stopping criterion
      # --optimize:  optimization metric
      # --penalty:   algorithm parameter
      bigmler analyze --features \
                      --dataset $(cat output/diabetes/dataset) \
                      --k-folds 2 \
                      --staleness 2 \
                      --optimize precision \
                      --penalty 1 \
                      --output-dir output/diabetes-features-selection
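One plausible reading of the `--staleness` and `--penalty` parameters is a greedy forward search that stops after a number of non-improving steps and charges a cost per selected feature. The sketch below illustrates that reading only; it is not BigMLer's algorithm, and the scoring function, the gain values, and the helper name `select_features` are invented for the example.

```python
def select_features(features, score, staleness=2, penalty=0.0):
    """Greedy forward selection that stops after `staleness` stale steps."""
    selected, best_set, best_score = [], [], float("-inf")
    remaining = list(features)
    stale = 0
    while remaining and stale < staleness:
        # Add the feature that gives the highest raw score right now.
        candidate = max(remaining, key=lambda f: score(selected + [f]))
        selected = selected + [candidate]
        remaining.remove(candidate)
        # Penalize larger feature sets so small gains do not pay off.
        current = score(selected) - penalty * len(selected)
        if current > best_score:
            best_set, best_score, stale = list(selected), current, 0
        else:
            stale += 1
    return best_set, best_score


# Hypothetical metric: a baseline plus a fixed gain per feature.
GAINS = {"plasma glucose": 0.30, "insulin": 0.10, "bmi": 0.05, "age": 0.0}


def toy_score(feature_set):
    return 0.5 + sum(GAINS[f] for f in feature_set)


best_set, best_score = select_features(list(GAINS), toy_score,
                                       staleness=2, penalty=0.06)
print(best_set)
```

In a real run the score would come from a cross-validated evaluation of a model built on each candidate feature set, which is why a single feature-selection job fans out into many remote resources.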
  28. Client-side Machine Learning Automation
      Problems of client-side solutions
      • Complex: Too fine-grained, leaky abstractions
      • Cumbersome: Error handling, network issues
      • Hard to reuse: Tied to a single programming language
      • Hard to scale: Parallelization again a problem
      • Hard to generalize: CLI tools like bigmler hide complexity at the cost of flexibility
  29. Client-side Machine Learning Automation
      Problems of client-side solutions (as on slide 28)
      The algorithmic complexity and computing resources management problems that were "mostly washed away" are back!
  30. Questions?
