
MLSEV. BigML Workshop II

Practical Workshop: Automating Machine Learning Workflows, by BigML
MLSEV 2019: 1st edition of the Machine Learning School in Seville, Spain.

Published in: Data & Analytics

  1. 1. Machine Learning School in Seville, 1st edition, March 7–8, 2019
  2. 2. Workshop jao - Mercè Martín
  3. 3. Client-side Machine Learning Automation. Problems of client-side solutions: Complex (too fine-grained, leaky abstractions); Cumbersome (error handling, network issues); Hard to reuse (tied to a single programming language); Hard to scale (parallelization is again a problem); Hard to generalize (declarative client tools hide complexity at the cost of flexibility); Hard to combine (black-box tools cannot be easily integrated as parts of bigger client-side workflows); Hard to audit (client-side development environments are complex and very hard to sandbox). Not enough automation.
  4. 4. Client-side Machine Learning Automation. Problems of client-side solutions: Complex (too fine-grained, leaky abstractions); Cumbersome (error handling, network issues); Hard to reuse (tied to a single programming language); Hard to scale (parallelization is again a problem); Hard to generalize (declarative client tools hide complexity at the cost of flexibility); Hard to combine (black-box tools cannot be easily integrated as parts of bigger client-side workflows); Hard to audit (client-side development environments are complex and very hard to sandbox). Not enough abstraction.
  5. 5. Client-side Machine Learning Automation. Problems of client-side solutions: Complex (too fine-grained, leaky abstractions); Cumbersome (error handling, network issues); Hard to reuse (tied to a single programming language); Hard to scale (parallelization is again a problem); Hard to generalize (declarative client tools hide complexity at the cost of flexibility); Hard to combine (black-box tools cannot be easily integrated as parts of bigger client-side workflows); Hard to audit (client-side development environments are complex and very hard to sandbox). The algorithmic complexity and computing-resource management problems that had mostly been washed away are back!
  6. 6. Machine Learning Automation
  7. 7. Machine Learning Automation. Solution (scalability, reuse): Back to the server
  8. 8. Machine Learning Automation. Solution (complexity, reuse): Domain-specific languages
  9. 9. Machine Learning Automation. Solution (complexity, reuse): Domain-specific languages (venturebeat.com)
  10. 10. Machine Learning Automation. Solution (complexity, reuse): Domain-specific languages
  11. 11. In a Nutshell
      1. Workflows reified as server-side, RESTful resources
      2. Domain-specific language for ML workflow automation
  12. 12. Workflows as RESTful Resources
      Library: a reusable building block, i.e. a collection of WhizzML definitions that can be imported by other libraries or scripts.
      Script: executable code that describes an actual workflow.
        • Imports: list of libraries with code used by the script.
        • Inputs: list of input values that parameterize the workflow.
        • Outputs: list of values computed by the script and returned to the user.
      Execution: given a script and a complete set of inputs, the workflow can be executed and its outputs generated.
  13. 13. Ways to create WhizzML Scripts and Libraries: GitHub, Script editor, Gallery, Other scripts, Scriptify
  14. 14. Syntactic Abstraction in WhizzML: Simple workflow
      ;; ML artifacts are first-class citizens,
      ;; we only need to talk about our domain
      (let ([train-id test-id] (create-dataset-split id 0.8)
            model-id (create-model train-id))
        (create-evaluation test-id
                           model-id
                           {"name" "Evaluation 80/20"
                            "missing_strategy" 0}))
  15. 15. Language Interoperability in WhizzML
      from bigml.api import BigML
      api = BigML()
      # choose workflow
      script = 'script/567b4b5be3f2a123a690ff56'
      # define parameters
      inputs = {'source': 'source/5643d345f43a234ff2310a3e'}
      # execute
      api.ok(api.create_execution(script, inputs))
  16. 16. Metaprogramming in reflective DSLs: Scriptify. Resources that create resources that create resources that create resources that create resources that create resources that create ...
  17. 17. Server-side Workflows: the bazaar
  18. 18. Domain Specificity and Scalability: Trivial parallelization
      ;; Workflow for 1 resource
      (let ([train-id test-id] (create-dataset-split id 0.8)
            model-id (create-model train-id))
        (create-evaluation test-id model-id))
  19. 19. Domain Specificity and Scalability: Trivial parallelization
      ;; Workflow for arbitrary number of resources
      (let (splits (for (id input-datasets)
                     (create-dataset-split id 0.8)))
        (for (s splits)
          (create-evaluation (s 1) (create-model (s 0)))))
  20. 20. Domain Specificity and Scalability: Trivial parallelization
      from bigml.api import BigML
      api = BigML()
      # choose workflow
      script = 'script/567b4b5be3f2a123a690ff56'
      # define parameters
      inputs = {'input-dataset': 'dataset/5643d345f43a234ff2310a30'}
      # execute
      api.ok(api.create_execution(script, inputs))
  21. 21. Domain Specificity and Scalability: Trivial parallelization
      from bigml.api import BigML
      api = BigML()
      # choose workflow
      script = 'script/567b4b5be3f2a123a690de1228'
      # define parameters
      inputs = {'input-datasets': ['dataset/5643d345f43a234ff2310a30',
                                   'dataset/5643d345f43a234ff2310a31',
                                   'dataset/5643d345f43a234ff2310a32',
                                   ...]}
      # execute
      api.ok(api.create_execution(script, inputs))
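
The contrast drawn in slides 18 to 21 can be sketched in plain Python. This is a minimal illustration, not BigML code: `split_dataset`, `train_model` and `evaluate` are hypothetical stand-ins for remote calls that only build identifier strings, and the thread pool shows the bookkeeping a client has to own when the fan-out is not expressed server-side as a `for` over `input-datasets`.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote resource creation; they only build
# identifier strings so the sketch runs without any network access.
def split_dataset(dataset_id, rate=0.8):
    return (f"{dataset_id}-train-{rate}", f"{dataset_id}-test-{rate}")

def train_model(train_id):
    return f"model-for-{train_id}"

def evaluate(test_id, model_id):
    return {"test": test_id, "model": model_id}

def workflow(dataset_id):
    """Workflow for one dataset: split, train, evaluate."""
    train_id, test_id = split_dataset(dataset_id)
    return evaluate(test_id, train_model(train_id))

def workflow_many(dataset_ids, max_workers=4):
    """Client-side fan-out: the caller owns the pool and its sizing."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as dataset_ids
        return list(pool.map(workflow, dataset_ids))

evaluations = workflow_many(["dataset/a", "dataset/b", "dataset/c"])
```

In the server-side version on slide 19 the pool, its size and the ordering of results are not the script author's concern; here they are explicit parameters the client must choose and maintain.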

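The relationship between the three resources on slide 12 (Library, Script, Execution) can be sketched as a toy model in Python. Everything here is an assumption made for illustration: the class names mirror the slide, but the fields, the `body` callable and the `execute` function are hypothetical and do not reflect how BigML stores or runs these resources.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Library:
    """Reusable building block: named definitions other code can import."""
    definitions: dict = field(default_factory=dict)

@dataclass
class Script:
    """Executable workflow: imported libraries, declared inputs, a body."""
    imports: list    # Library objects whose definitions the body may use
    inputs: list     # names of the parameters the workflow needs
    body: Callable   # hypothetical signature: (env, inputs) -> outputs dict

def execute(script, inputs):
    """Run a script only when a complete set of inputs is supplied."""
    missing = [name for name in script.inputs if name not in inputs]
    if missing:
        raise ValueError(f"missing inputs: {missing}")
    env = {}
    for library in script.imports:
        env.update(library.definitions)
    return script.body(env, inputs)

# A library with one definition, and a script that imports and uses it.
stats = Library({"mean": lambda xs: sum(xs) / len(xs)})
script = Script(
    imports=[stats],
    inputs=["values"],
    body=lambda env, inp: {"average": env["mean"](inp["values"])},
)
outputs = execute(script, {"values": [1, 2, 3]})
```

The point of the split is that the script is stored once and parameterized per run: each call to `execute` with a different input set corresponds to a separate execution with its own outputs.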