DBG / June 5, 2018 / © 2018 IBM Corporation
Model Parallelism in Spark ML Cross-Validation
Nick Pentreath, Principal Engineer
Bryan Cutler, Software Engineer
About Nick
@MLnick on Twitter & GitHub
Principal Engineer, IBM
CODAIT - Center for Open-Source Data & AI Technologies
Machine Learning & AI
Apache Spark committer & PMC
Author of Machine Learning with Spark
Various conferences & meetups
About Bryan
Software Engineer, IBM CODAIT
Apache Spark committer
Apache Arrow committer
Python, Machine Learning OSS
@BryanCutler on GitHub
Center for Open Source Data and AI Technologies
CODAIT
codait.org
CODAIT aims to make AI solutions dramatically easier to create, deploy, and manage in the enterprise
Relaunch of the Spark Technology Center (STC) to reflect expanded mission
Improving Enterprise AI Lifecycle in Open Source
Agenda
Model Tuning in Spark
Scaling Model Tuning
Performance Results
Best Practices
Future Directions in Optimizing Pipelines
Model Tuning in Spark
Model selection: workflow within a workflow
Workflow: Ingest → Data Processing → Feature Engineering → Model Selection → Final Model
Model selection loop: Candidate models → Train → Evaluate → Adjust
Pipeline cross-validation
Spark ML Pipeline: Tokenizer → CountVectorizer → LogisticRegression
Parameters:
• CountVectorizer # features: 10, 100
• LogisticRegression regParam: 0.001, 0.1
Candidate pipelines, one per parameter combination:
• Tokenizer → CountVectorizer (# features: 10) → LogisticRegression (regParam: 0.001)
• Tokenizer → CountVectorizer (# features: 10) → LogisticRegression (regParam: 0.1)
• Tokenizer → CountVectorizer (# features: 100) → LogisticRegression (regParam: 0.001)
• Tokenizer → CountVectorizer (# features: 100) → LogisticRegression (regParam: 0.1)
Cross-validation is expensive!
• 3 hyperparameters with 5 values each: 5 x 5 x 5 = 125 pipelines
• ... across 4 machine learning models = 500 pipelines
• If training and evaluation do not fully utilize available cluster resources, that waste is compounded for each model
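The counts above can be reproduced with a short sketch. The hyperparameter names, their values, and the model names below are hypothetical placeholders; only the shape of the grid (5 x 5 x 5, across 4 models) comes from the slide.

```python
from itertools import product

# Hypothetical grid: three hyperparameters with five candidate values each.
grid = {
    "param_a": [1, 2, 3, 4, 5],
    "param_b": [0.001, 0.01, 0.1, 1.0, 10.0],
    "param_c": [10, 50, 100, 500, 1000],
}

def expand_grid(grid):
    """Return every combination of hyperparameter values as a list of dicts."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*grid.values())]

combos = expand_grid(grid)
pipelines_per_model = len(combos)  # 5 * 5 * 5

# Hypothetical model names; the slide only says "4 machine learning models".
models = ["model_1", "model_2", "model_3", "model_4"]
total_pipelines = pipelines_per_model * len(models)

print(pipelines_per_model, total_pipelines)
```

With k-fold cross-validation each candidate is additionally trained once per fold, so the number of model fits is larger still.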
Based on XKCD comic: https://xkcd.com/303/
& https://github.com/mislavcimpersak/xkcd-excuse-generator
Scaling Model Tuning
Parallel model evaluation
• Tokenizer → CountVectorizer (# features: 10) → LogisticRegression (regParam: 0.001)
• Tokenizer → CountVectorizer (# features: 10) → LogisticRegression (regParam: 0.1)
• Tokenizer → CountVectorizer (# features: 100) → LogisticRegression (regParam: 0.001)
• Tokenizer → CountVectorizer (# features: 100) → LogisticRegression (regParam: 0.1)
• Added in SPARK-19357 and SPARK-21911 (PySpark)
• The parallelism parameter governs the maximum number of models trained at once
Implementation considerations
• The parallelism parameter sets the size of a thread pool under the hood
• A dedicated ExecutionContext is created to avoid deadlocks that can occur when using the default thread pool
• Futures are used instead of parallel collections because they are more flexible
• Model-specific parallel fitting implementations are not supported (see SPARK-22126)
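The thread-pool-plus-futures pattern described above can be sketched in plain Python. This is an analogue for illustration, not Spark's Scala implementation: `fit_and_evaluate`, its made-up metric, and the `evaluate_grid` helper are all hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def fit_and_evaluate(params):
    """Hypothetical stand-in for fitting one model and computing its metric."""
    time.sleep(0.05)  # simulate a driver thread blocking while a cluster job runs
    return 1.0 - params["regParam"]  # made-up metric, for illustration only

def evaluate_grid(param_maps, parallelism):
    """Evaluate every parameter map, running at most `parallelism` at once."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def tracked(params):
        # Count concurrently running tasks so the bound can be observed.
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        try:
            return fit_and_evaluate(params)
        finally:
            with lock:
                running -= 1

    # A dedicated pool sized by the parallelism setting, not a shared default pool.
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        futures = [pool.submit(tracked, p) for p in param_maps]
        metrics = [f.result() for f in futures]  # collected in grid order
    return metrics, peak

param_maps = [{"regParam": r} for r in (0.001, 0.01, 0.1, 0.5)]
metrics, peak = evaluate_grid(param_maps, parallelism=2)
print(metrics, peak)
```

Submitting one future per parameter map and collecting results in submission order keeps each metric aligned with its position in the grid, regardless of which fit finishes first.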
Performance tests
• Compared parallel CV to serial CV with a varying number of samples
• Simple LogisticRegression tuned over regParam and fitIntercept; parameter grid size 12
• Measured elapsed time for cross-validation
• Data size: 100,000 to 5,000,000 samples
• Number of features: 10
• Number of partitions: 10
• Number of CV folds: 5
• Parallelism: 3
• Standalone cluster with 30 cores
Results
• Approximately 2.4x speedup
• Stays roughly constant as the number of samples increases
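To put the reported speedup in context, the sketch below relates it to the test setup (grid size 12, 5 folds, parallelism 3). The timings passed to `speedup` are hypothetical values chosen only to illustrate the ratio; they are not measurements from the talk.

```python
def speedup(serial_seconds, parallel_seconds):
    """Speedup is the serial elapsed time divided by the parallel elapsed time."""
    return serial_seconds / parallel_seconds

def parallel_efficiency(speedup_factor, parallelism):
    """Fraction of the ideal (linear) speedup that was actually achieved."""
    return speedup_factor / parallelism

# Values taken from the test setup on the previous slide.
grid_size, num_folds, parallelism = 12, 5, 3

# Model fits during cross-validation, not counting any final refit.
total_fits = grid_size * num_folds

# Hypothetical timings that would correspond to a 2.4x speedup.
example_speedup = speedup(240.0, 100.0)

reported_speedup = 2.4
efficiency = parallel_efficiency(reported_speedup, parallelism)
print(total_fits, example_speedup, round(efficiency, 2))
```

Under these assumptions, a 2.4x speedup at parallelism 3 corresponds to roughly 80% of the ideal linear speedup.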
Best practices
• A simple integer parameter is the only thing you can set (for now)
• Too low: cluster resources are under-utilized
• Too high: could lead to memory issues or an overloaded cluster
• Rough rule: # cores / # partitions
• But the right value depends on data and model sizes
• For a mid-sized cluster, probably <= 10
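The rough rule can be written as a small helper. `suggest_parallelism` is a hypothetical function, and treating 10 as a default upper bound is an assumption based on the "probably <= 10" guidance; the result is only a starting point to tune from.

```python
def suggest_parallelism(total_cores, num_partitions, cap=10):
    """Rough starting point: cores / partitions, at least 1 and at most `cap`."""
    if total_cores < 1 or num_partitions < 1:
        raise ValueError("total_cores and num_partitions must be positive")
    return max(1, min(cap, total_cores // num_partitions))

# The first performance test used 30 cores and 10 partitions.
print(suggest_parallelism(30, 10))
```

For the 30-core, 10-partition setup this gives 3, the parallelism used in the first performance test. Data and model sizes can still push the best value up or down.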
Optimizing Tuning for Pipeline Models
Challenges
• Multi-stage, complex pipelines
• Parameter grid with hyperparameters from different stages
• Easy to have a huge number of candidate parameter combinations
• Model parallelism helps, but can we do better?
Duplicating work
• Each Pipeline is treated independently
• Depending on the parameter grid and pipeline stages, the same model may be fitted multiple times
• The same transformations may also be performed multiple times
Optimize with a DAG
• A node is an estimator/transformer with a set of hyperparameters
• A path in the graph is a single pipeline model
Graph:
Tokenizer
  CountVectorizer nfeat=10
    LR reg=0.1
    LR reg=0.01
  CountVectorizer nfeat=100
    LR reg=0.1
    LR reg=0.01
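A minimal sketch of why the graph saves work: if a node is identified by its full path prefix, pipelines that share a prefix share the node. The stage names and values mirror the example graph; the helper functions are hypothetical and are not the PipelineTuning implementation.

```python
from itertools import product

# Stages and candidate settings from the example graph.
# None means the stage has no tuned hyperparameter.
stages = [
    ("Tokenizer", [None]),
    ("CountVectorizer", [10, 100]),  # nfeat
    ("LR", [0.1, 0.01]),             # reg
]

def pipeline_paths(stages):
    """Each path is one candidate pipeline: a tuple of (stage, setting) nodes."""
    per_stage = [[(name, value) for value in values] for name, values in stages]
    return list(product(*per_stage))

def dag_nodes(paths):
    """Identify a node by its full prefix so shared prefixes collapse into one node."""
    return {path[:depth] for path in paths for depth in range(1, len(path) + 1)}

paths = pipeline_paths(stages)
independent_stage_runs = sum(len(path) for path in paths)  # every pipeline runs every stage
shared_stage_runs = len(dag_nodes(paths))                  # each distinct node runs once

print(len(paths), independent_stage_runs, shared_stage_runs)
```

For this example the 4 pipelines would run 12 stages if treated independently, while the graph contains only 7 distinct nodes (1 Tokenizer, 2 CountVectorizer, 4 LR).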
Parallelize in breadth-first order
• Example with the parallelism parameter set to 2
• Tokenizer is only a transformer, so proceed directly to fitting the two CountVectorizer nodes
Fit estimators
• Cache the result and proceed to fit the first 2 LogisticRegression models
• Unpersist the cached DataFrame when its child tasks are done
• Cache the next result and fit the final 2 LR models
• All 4 LR models fitted; unpersist the remaining cached DataFrame
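The fitting steps walked through above can be sketched as a breadth-first schedule that records when a result is cached, which nodes are fitted together, and when the cache is released. This is a sequential simulation under simplifying assumptions: the node labels and the `schedule` helper are hypothetical, each batch would run concurrently in a real implementation, and caching the Tokenizer output is a choice made for this sketch rather than something shown on the slides.

```python
from collections import deque

# The example graph: node -> children.
children = {
    "Tokenizer": ["CV nfeat=10", "CV nfeat=100"],
    "CV nfeat=10": ["LR reg=0.1 (nfeat=10)", "LR reg=0.01 (nfeat=10)"],
    "CV nfeat=100": ["LR reg=0.1 (nfeat=100)", "LR reg=0.01 (nfeat=100)"],
}

def schedule(root, children, parallelism):
    """Return an event log for a breadth-first walk of the pipeline graph."""
    events = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        kids = children.get(node, [])
        if not kids:
            continue  # leaf node: nothing downstream depends on its output
        # The node's output feeds every child, so keep it cached while they run.
        events.append(("cache", node))
        for start in range(0, len(kids), parallelism):
            batch = tuple(kids[start:start + parallelism])
            events.append(("fit", batch))  # at most `parallelism` fits per batch
        # All children are done, so the cached output can be released.
        events.append(("unpersist", node))
        queue.extend(kids)
    return events

events = schedule("Tokenizer", children, parallelism=2)
for event in events:
    print(event)
```

With parallelism 2 the log shows both CountVectorizer nodes fitted together first, then each pair of LR models fitted while its parent's output is cached.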
Evaluate models
• Evaluate models using a similar method
• CountVectorizerModel is now a transformer
• Cache the transform result
Graph of fitted models:
Tokenizer
  CVModel nfeat=10
    LRModel reg=0.1
    LRModel reg=0.01
  CVModel nfeat=100
    LRModel reg=0.1
    LRModel reg=0.01
• First 2 models evaluated; unpersist the cached DataFrame and cache the next transform result
Metrics: 0.62 0.62
• All models evaluated for this fold
Metrics: 0.62 0.62 0.72 0.66
Select best model
• Average the metrics from all folds and select the best PipelineModel
Avg Metrics: 0.64 0.64 0.71 0.65
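The averaging and selection step can be sketched as follows. The first row of `fold_metrics` is the fold shown on the slides; the second row is made up for illustration, chosen so that the averages match the slide. The `select_best` helper and the assumption of exactly two folds are hypothetical.

```python
def select_best(fold_metrics, larger_is_better=True):
    """fold_metrics[f][m] is the metric of candidate model m on fold f."""
    n_folds = len(fold_metrics)
    n_models = len(fold_metrics[0])
    averages = [
        sum(fold[m] for fold in fold_metrics) / n_folds for m in range(n_models)
    ]
    pick = max if larger_is_better else min
    best_index = pick(range(n_models), key=lambda m: averages[m])
    return averages, best_index

fold_metrics = [
    [0.62, 0.62, 0.72, 0.66],  # fold shown on the slides
    [0.66, 0.66, 0.70, 0.64],  # made-up second fold, for illustration only
]

averages, best_index = select_best(fold_metrics)
print([round(a, 2) for a in averages], best_index)
```

The `larger_is_better` flag is there because some metrics, such as error measures, should be minimized rather than maximized.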
Performance tests
• Compared to standard Spark CV with parallelism enabled
• Pipeline: MinMaxScaler → PCA → LinearRegression
• Measure elapsed time for cross-validation, varying the size of the parameter grid from 36 to 80 models to evaluate
• Data size: 1,000,000
• Number of features: 50
• Number of partitions: 16
• Number of CV folds: 4
• Parallelism: 3
• Standalone cluster with 30 cores
Results
• Up to 3.25x speedup
• Increases with more models …
• … and more complex pipelines
• Check out:
• https://github.com/BryanCutler/PipelineTuning
• Experimental!
• Watch SPARK-19071
[Figure: Elapsed time for DAG CV vs Simple Parallel CV; x-axis: # models (36, 48, 60, 80); series: Parallel, DAG Parallel]
Thank you!
codait.org
twitter.com/MLnick
github.com/MLnick
github.com/BryanCutler
developer.ibm.com/code
FfDL
Sign up for IBM Cloud and try Watson Studio!
https://datascience.ibm.com/
MAX
IBM Sessions at Spark+AI Summit 2018 (Tuesday, June 5)
Date | Time | Location & Duration | Session title | Speaker
Tue, June 5 | 11 AM | 2010-2012, 30 mins | Productionizing Spark ML Pipelines with the Portable Format for Analytics | Nick Pentreath (IBM)
Tue, June 5 | 2 PM | 2018, 30 mins | Making PySpark Amazing - From Faster UDFs to Dependency Management and Graphing! | Holden Karau (Google), Bryan Cutler (IBM)
Tue, June 5 | 2 PM | Nook by 2001, 30 mins | Making Data and AI Accessible for All | Armand Ruiz Gabernet (IBM)
Tue, June 5 | 2:40 PM | 2002-2004, 30 mins | Cognitive Database: An Apache Spark-Based AI-Enabled Relational Database System | Rajesh Bordawekar (IBM T.J. Watson Research Center)
Tue, June 5 | 3:20 PM | 3016-3022, 30 mins | Dynamic Priorities for Apache Spark Application’s Resource Allocations | Michael Feiman (IBM Spectrum Computing), Shinnosuke Okada (IBM Canada Ltd.)
Tue, June 5 | 3:20 PM | 2001-2005, 30 mins | Model Parallelism in Spark ML Cross-Validation | Nick Pentreath (IBM), Bryan Cutler (IBM)
Tue, June 5 | 3:20 PM | 2007, 30 mins | Serverless Machine Learning on Modern Hardware Using Apache Spark | Patrick Stuedi (IBM)
Tue, June 5 | 5:40 PM | 2002-2004, 30 mins | Create a Loyal Customer Base by Knowing Their Personality Using AI-Based Personality Recommendation Engine | Sourav Mazumder (IBM Analytics), Aradhna Tiwari (University of South Florida)
Tue, June 5 | 5:40 PM | 2007, 30 mins | Transparent GPU Exploitation on Apache Spark | Dr. Kazuaki Ishizaki (IBM), Madhusudanan Kandasamy (IBM)
Tue, June 5 | 5:40 PM | 2009-2011, 30 mins | Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for Deep Neural Networks | Yonggang Hu (IBM), Chao Xue (IBM)
IBM Sessions at Spark+AI Summit 2018 (Wednesday, June 6)
Date | Time | Location & Duration | Session title | Speaker
Wed, June 6 | 12:50 PM | | Birds of a Feather: Apache Arrow in Spark and More | Bryan Cutler (IBM), Li Jin (Two Sigma Investments, LP)
Wed, June 6 | 2 PM | 2002-2004, 30 mins | Deep Learning for Recommender Systems | Nick Pentreath (IBM)
Wed, June 6 | 3:20 PM | 2018, 30 mins | Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer | Frederick Reiss (IBM), Vijay Bommireddipalli (IBM Center for Open-Source Data & AI Technologies)
Meet us at the IBM booth in the Expo area.

More Related Content

What's hot

ClickHouse Unleashed 2020: Our Favorite New Features for Your Analytical Appl...
ClickHouse Unleashed 2020: Our Favorite New Features for Your Analytical Appl...ClickHouse Unleashed 2020: Our Favorite New Features for Your Analytical Appl...
ClickHouse Unleashed 2020: Our Favorite New Features for Your Analytical Appl...Altinity Ltd
 
ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovAltinity Ltd
 
Operating PostgreSQL at Scale with Kubernetes
Operating PostgreSQL at Scale with KubernetesOperating PostgreSQL at Scale with Kubernetes
Operating PostgreSQL at Scale with KubernetesJonathan Katz
 
20191115-PGconf.Japan
20191115-PGconf.Japan20191115-PGconf.Japan
20191115-PGconf.JapanKohei KaiGai
 
Containerized Stream Engine to Build Modern Delta Lake
Containerized Stream Engine to Build Modern Delta LakeContainerized Stream Engine to Build Modern Delta Lake
Containerized Stream Engine to Build Modern Delta LakeDatabricks
 
Altinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouseAltinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouseAltinity Ltd
 
Data Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UI
Data Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UIData Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UI
Data Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UIAltinity Ltd
 
ClickHouse Mark Cache, by Mik Kocikowski, Cloudflare
ClickHouse Mark Cache, by Mik Kocikowski, CloudflareClickHouse Mark Cache, by Mik Kocikowski, Cloudflare
ClickHouse Mark Cache, by Mik Kocikowski, CloudflareAltinity Ltd
 
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...Altinity Ltd
 
C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadinC* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadinDataStax Academy
 
Cassandra an overview
Cassandra an overviewCassandra an overview
Cassandra an overviewPritamKathar
 
Webinar: Secrets of ClickHouse Query Performance, by Robert Hodges
Webinar: Secrets of ClickHouse Query Performance, by Robert HodgesWebinar: Secrets of ClickHouse Query Performance, by Robert Hodges
Webinar: Secrets of ClickHouse Query Performance, by Robert HodgesAltinity Ltd
 
Operating and Supporting Delta Lake in Production
Operating and Supporting Delta Lake in ProductionOperating and Supporting Delta Lake in Production
Operating and Supporting Delta Lake in ProductionDatabricks
 
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...Altinity Ltd
 
My first 90 days with ClickHouse.pdf
My first 90 days with ClickHouse.pdfMy first 90 days with ClickHouse.pdf
My first 90 days with ClickHouse.pdfAlkin Tezuysal
 
Common Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta LakehouseCommon Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta LakehouseDatabricks
 
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...Altinity Ltd
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevAltinity Ltd
 

What's hot (20)

ClickHouse Unleashed 2020: Our Favorite New Features for Your Analytical Appl...
ClickHouse Unleashed 2020: Our Favorite New Features for Your Analytical Appl...ClickHouse Unleashed 2020: Our Favorite New Features for Your Analytical Appl...
ClickHouse Unleashed 2020: Our Favorite New Features for Your Analytical Appl...
 
ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei Milovidov
 
Operating PostgreSQL at Scale with Kubernetes
Operating PostgreSQL at Scale with KubernetesOperating PostgreSQL at Scale with Kubernetes
Operating PostgreSQL at Scale with Kubernetes
 
20191115-PGconf.Japan
20191115-PGconf.Japan20191115-PGconf.Japan
20191115-PGconf.Japan
 
Containerized Stream Engine to Build Modern Delta Lake
Containerized Stream Engine to Build Modern Delta LakeContainerized Stream Engine to Build Modern Delta Lake
Containerized Stream Engine to Build Modern Delta Lake
 
Altinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouseAltinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouse
 
Data Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UI
Data Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UIData Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UI
Data Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UI
 
ClickHouse Mark Cache, by Mik Kocikowski, Cloudflare
ClickHouse Mark Cache, by Mik Kocikowski, CloudflareClickHouse Mark Cache, by Mik Kocikowski, Cloudflare
ClickHouse Mark Cache, by Mik Kocikowski, Cloudflare
 
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
 
C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadinC* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
 
Cassandra an overview
Cassandra an overviewCassandra an overview
Cassandra an overview
 
Webinar: Secrets of ClickHouse Query Performance, by Robert Hodges
Webinar: Secrets of ClickHouse Query Performance, by Robert HodgesWebinar: Secrets of ClickHouse Query Performance, by Robert Hodges
Webinar: Secrets of ClickHouse Query Performance, by Robert Hodges
 
Operating and Supporting Delta Lake in Production
Operating and Supporting Delta Lake in ProductionOperating and Supporting Delta Lake in Production
Operating and Supporting Delta Lake in Production
 
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA...
 
My first 90 days with ClickHouse.pdf
My first 90 days with ClickHouse.pdfMy first 90 days with ClickHouse.pdf
My first 90 days with ClickHouse.pdf
 
Common Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta LakehouseCommon Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta Lakehouse
 
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
Cloud Native ClickHouse at Scale--Using the Altinity Kubernetes Operator-2022...
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Indexes in postgres
Indexes in postgresIndexes in postgres
Indexes in postgres
 
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
 

Similar to Model Parallelism in Spark ML Cross-Validation with Nick Pentreath and Bryan Cutler

Productionizing Spark ML Pipelines with the Portable Format for Analytics
Productionizing Spark ML Pipelines with the Portable Format for AnalyticsProductionizing Spark ML Pipelines with the Portable Format for Analytics
Productionizing Spark ML Pipelines with the Portable Format for AnalyticsNick Pentreath
 
Productionizing Spark ML Pipelines with the Portable Format for Analytics wit...
Productionizing Spark ML Pipelines with the Portable Format for Analytics wit...Productionizing Spark ML Pipelines with the Portable Format for Analytics wit...
Productionizing Spark ML Pipelines with the Portable Format for Analytics wit...Databricks
 
Optimizing your SparkML pipelines using the latest features in Spark 2.3
Optimizing your SparkML pipelines using the latest features in Spark 2.3Optimizing your SparkML pipelines using the latest features in Spark 2.3
Optimizing your SparkML pipelines using the latest features in Spark 2.3DataWorks Summit
 
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...Alok Singh
 
Index conf sparkml-feb20-n-pentreath
Index conf sparkml-feb20-n-pentreathIndex conf sparkml-feb20-n-pentreath
Index conf sparkml-feb20-n-pentreathChester Chen
 
Productionizing Spark ML pipelines with the portable format for analytics
Productionizing Spark ML pipelines with the portable format for analyticsProductionizing Spark ML pipelines with the portable format for analytics
Productionizing Spark ML pipelines with the portable format for analyticsDataWorks Summit
 
SigOpt for Hedge Funds
SigOpt for Hedge FundsSigOpt for Hedge Funds
SigOpt for Hedge FundsSigOpt
 
SigOpt for Machine Learning and AI
SigOpt for Machine Learning and AISigOpt for Machine Learning and AI
SigOpt for Machine Learning and AISigOpt
 
Search and Recommendations: 3 Sides of the Same Coin
Search and Recommendations: 3 Sides of the Same CoinSearch and Recommendations: 3 Sides of the Same Coin
Search and Recommendations: 3 Sides of the Same CoinNick Pentreath
 
Building machine learning inference pipelines at scale (March 2019)
Building machine learning inference pipelines at scale (March 2019)Building machine learning inference pipelines at scale (March 2019)
Building machine learning inference pipelines at scale (March 2019)Julien SIMON
 
Serverless AI with Scikit-Learn (GPSWS405) - AWS re:Invent 2018
Serverless AI with Scikit-Learn (GPSWS405) - AWS re:Invent 2018Serverless AI with Scikit-Learn (GPSWS405) - AWS re:Invent 2018
Serverless AI with Scikit-Learn (GPSWS405) - AWS re:Invent 2018Amazon Web Services
 
Past Experiences and Future Challenges using Automatic Performance Modelling ...
Past Experiences and Future Challenges using Automatic Performance Modelling ...Past Experiences and Future Challenges using Automatic Performance Modelling ...
Past Experiences and Future Challenges using Automatic Performance Modelling ...Paul Brebner
 
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...Amazon Web Services
 
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...AWS Summits
 
Make your PySpark Data Fly with Arrow!
Make your PySpark Data Fly with Arrow!Make your PySpark Data Fly with Arrow!
Make your PySpark Data Fly with Arrow!Databricks
 
Big Data Helsinki v 3 | "Distributed Machine and Deep Learning at Scale with ...
Big Data Helsinki v 3 | "Distributed Machine and Deep Learning at Scale with ...Big Data Helsinki v 3 | "Distributed Machine and Deep Learning at Scale with ...
Big Data Helsinki v 3 | "Distributed Machine and Deep Learning at Scale with ...Dataconomy Media
 
Airline reservations and routing: a graph use case
Airline reservations and routing: a graph use caseAirline reservations and routing: a graph use case
Airline reservations and routing: a graph use caseDataWorks Summit
 
LLMOps for Your Data: Best Practices to Ensure Safety, Quality, and Cost
LLMOps for Your Data: Best Practices to Ensure Safety, Quality, and CostLLMOps for Your Data: Best Practices to Ensure Safety, Quality, and Cost
LLMOps for Your Data: Best Practices to Ensure Safety, Quality, and CostAggregage
 
Post compiler software optimization for reducing energy
Post compiler software optimization for reducing energyPost compiler software optimization for reducing energy
Post compiler software optimization for reducing energyAbhishek Abhyankar
 
Developer insight into why applications run amazingly Fast in CF 2018
Developer insight into why applications run amazingly Fast in CF 2018Developer insight into why applications run amazingly Fast in CF 2018
Developer insight into why applications run amazingly Fast in CF 2018Pavan Kumar
 

Similar to Model Parallelism in Spark ML Cross-Validation with Nick Pentreath and Bryan Cutler (20)

Productionizing Spark ML Pipelines with the Portable Format for Analytics
Productionizing Spark ML Pipelines with the Portable Format for AnalyticsProductionizing Spark ML Pipelines with the Portable Format for Analytics
Productionizing Spark ML Pipelines with the Portable Format for Analytics
 
Productionizing Spark ML Pipelines with the Portable Format for Analytics wit...
Productionizing Spark ML Pipelines with the Portable Format for Analytics wit...Productionizing Spark ML Pipelines with the Portable Format for Analytics wit...
Productionizing Spark ML Pipelines with the Portable Format for Analytics wit...
 
Optimizing your SparkML pipelines using the latest features in Spark 2.3
Optimizing your SparkML pipelines using the latest features in Spark 2.3Optimizing your SparkML pipelines using the latest features in Spark 2.3
Optimizing your SparkML pipelines using the latest features in Spark 2.3
 
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
 
Index conf sparkml-feb20-n-pentreath
Model Parallelism in Spark ML Cross-Validation with Nick Pentreath and Bryan Cutler

  • 1. DBG / June 5, 2018 / © 2018 IBM Corporation. Model Parallelism in Spark ML Cross-validation. Nick Pentreath, Principal Engineer; Bryan Cutler, Software Engineer
  • 2. About Nick
      • @MLnick on Twitter & Github
      • Principal Engineer, IBM CODAIT (Center for Open-Source Data & AI Technologies)
      • Machine Learning & AI
      • Apache Spark committer & PMC
      • Author of Machine Learning with Spark
      • Various conferences & meetups
  • 3. About Bryan
      • Software Engineer, IBM CODAIT
      • Apache Spark committer
      • Apache Arrow committer
      • Python, Machine Learning OSS
      • @BryanCutler on Github
  • 4. CODAIT: Center for Open Source Data and AI Technologies (codait.org)
      • Improving Enterprise AI Lifecycle in Open Source
      • CODAIT aims to make AI solutions dramatically easier to create, deploy, and manage in the enterprise
      • Relaunch of the Spark Technology Center (STC) to reflect expanded mission
  • 5. Agenda
      • Model Tuning in Spark
      • Scaling Model Tuning
      • Performance Results
      • Best Practices
      • Future Directions in Optimizing Pipelines
  • 6. Model Tuning in Spark
  • 7. Model selection: workflow within a workflow (Model Tuning in Spark)
      • Ingest → Data Processing → Feature Engineering → Model Selection → Final Model
      • Model Selection loop: Candidate models → Train → Evaluate → Adjust
  • 8. Pipeline cross-validation (Model Tuning in Spark)
      • Spark ML Pipeline: Tokenizer → CountVectorizer → LogisticRegression
      • Parameters: # features: 10, 100; regParam: 0.001, 0.1
  • 9. Pipeline cross-validation (Model Tuning in Spark): the parameter grid expands into four candidate pipelines
      • Tokenizer → CountVectorizer (# features: 10) → LogisticRegression (regParam: 0.001)
      • Tokenizer → CountVectorizer (# features: 10) → LogisticRegression (regParam: 0.1)
      • Tokenizer → CountVectorizer (# features: 100) → LogisticRegression (regParam: 0.001)
      • Tokenizer → CountVectorizer (# features: 100) → LogisticRegression (regParam: 0.1)
  • 10. Pipeline cross-validation (Model Tuning in Spark)
      • Tokenizer → CountVectorizer → LogisticRegression
      • # features: 10, 100; regParam: 0.001, 0.1
  • 11-13. Pipeline cross-validation (Model Tuning in Spark) [figure-only slides; no text in transcript]
  • 14. Cross-validation is expensive! (Model Tuning in Spark)
      • 5 x 5 x 5 hyperparameters = 125 pipelines
      • ... across 4 machine learning models = 500
      • If training and evaluation do not fully utilize the available cluster resources, that waste is compounded for each model
      • Image based on XKCD comic: https://xkcd.com/303/ & https://github.com/mislavcimpersak/xkcd-excuse-generator
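The arithmetic on slide 14 can be checked with a short sketch. Only the counts (three hyperparameters with five candidate values each, and four model types) come from the slide; the parameter names, values, and model labels below are hypothetical placeholders.

```python
from itertools import product

# Hypothetical grid: 3 hyperparameters, 5 candidate values each.
grid = {
    "param_a": [1, 2, 3, 4, 5],
    "param_b": [0.001, 0.01, 0.1, 1.0, 10.0],
    "param_c": [10, 20, 30, 40, 50],
}
# Hypothetical labels for the 4 model types mentioned on the slide.
model_types = ["model_1", "model_2", "model_3", "model_4"]

# The full grid is the Cartesian product of all candidate values.
combos_per_model = list(product(*grid.values()))
pipelines_per_model = len(combos_per_model)             # 5 * 5 * 5
total_pipelines = pipelines_per_model * len(model_types)

print(pipelines_per_model, total_pipelines)
```

With k-fold cross-validation, each of these candidates is fitted once per fold, so the number of fits grows by a further factor of k.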
  • 15. Scaling Model Tuning
  • 16-19. Parallel model evaluation (Scaling Model Tuning) [animation builds over the same diagram of four candidate pipelines]
      • Tokenizer → CountVectorizer (# features: 10) → LogisticRegression (regParam: 0.001)
      • Tokenizer → CountVectorizer (# features: 10) → LogisticRegression (regParam: 0.1)
      • Tokenizer → CountVectorizer (# features: 100) → LogisticRegression (regParam: 0.001)
      • Tokenizer → CountVectorizer (# features: 100) → LogisticRegression (regParam: 0.1)
  • 20. Parallel model evaluation (Scaling Model Tuning)
      • Added in SPARK-19357 and SPARK-21911 (PySpark)
      • Parallelism parameter governs the maximum # models to be trained at once
  • 21. Parallel model evaluation (Scaling Model Tuning)
      • Tokenizer → CountVectorizer → LogisticRegression
      • # features: 10, 100; regParam: 0.001, 0.1
  • 22-24. Parallel model evaluation (Scaling Model Tuning) [figure-only slides; no text in transcript]
  • 25. Implementation considerations (Scaling Model Tuning)
      • Parallelism parameter sets the size of the threadpool under the hood
      • Dedicated ExecutionContext created to avoid deadlocks with using the default threadpool
      • Used Futures instead of parallel collections: more flexible
      • Model-specific parallel fitting implementations not supported
      • SPARK-22126
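The idea of a dedicated, fixed-size thread pool driven by futures can be illustrated with a minimal Python analogy. This is not Spark's code (the slide describes a Scala ExecutionContext and Futures); `fit_and_evaluate`, the grid values, and the `parallelism` value are hypothetical stand-ins, and the sleep only simulates training time.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

parallelism = 3  # assumed value; caps how many fits run at once

# Hypothetical parameter grid (6 candidates).
param_grid = [
    {"regParam": r, "fitIntercept": f}
    for r in (0.001, 0.01, 0.1)
    for f in (True, False)
]

# Bookkeeping used only to observe how many fits overlap.
lock = threading.Lock()
state = {"running": 0, "max_running": 0}

def fit_and_evaluate(params):
    """Hypothetical stand-in for fitting one model and scoring it."""
    with lock:
        state["running"] += 1
        state["max_running"] = max(state["max_running"], state["running"])
    time.sleep(0.01)  # pretend to train
    with lock:
        state["running"] -= 1
    return params, 0.0  # placeholder metric

# A dedicated pool, sized by the parallelism parameter, runs the fits.
with ThreadPoolExecutor(max_workers=parallelism) as pool:
    futures = [pool.submit(fit_and_evaluate, p) for p in param_grid]
    results = [f.result() for f in futures]  # collected in submission order

print(len(results), state["max_running"])
```

Submitting every candidate up front and letting the pool size bound the concurrency keeps the calling code simple: at most `parallelism` fits are in flight at any moment, and the rest wait in the pool's queue.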
  • 26. Performance tests (Scaling Model Tuning)
      • Compared parallel CV to serial CV with varying number of samples
      • Simple LogisticRegression with regParam and fitIntercept; parameter grid size 12
      • Measure elapsed time for cross-validation
      • Data size: 100,000 -> 5,000,000
      • Number features: 10
      • Number partitions: 10
      • Number CV folds: 5
      • Parallelism: 3
      • Standalone cluster with 30 cores
  • 27. Results (Scaling Model Tuning)
      • ~2.4x speedup
      • Stays roughly constant as # samples increases
  • 28. Best practices (Scaling Model Tuning)
      • Simple integer parameter is the only thing you can set (for now)
      • Too low => under-utilize resources
      • Too high => could lead to memory issues or overloading cluster
      • Rough rule: # cores / # partitions
      • But depends on data and model sizes
      • Mid-sized cluster probably <= 10
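The rough rule on slide 28 can be written down as a small helper. This is only a sketch of a starting point: the function name, the use of integer division, the floor of 1, and treating "probably <= 10" as a hard cap are all assumptions made here, and the slide itself notes that the right value depends on data and model sizes.

```python
def suggest_parallelism(num_cores, num_partitions, cap=10):
    """Rule-of-thumb starting value: cores / partitions, at least 1, at most cap.

    `cap` defaults to 10 as an assumed reading of the slide's
    "mid-sized cluster probably <= 10".
    """
    if num_cores < 1 or num_partitions < 1:
        raise ValueError("num_cores and num_partitions must be positive")
    return max(1, min(cap, num_cores // num_partitions))

# 30 cores and 10 partitions, the cluster and data layout from slide 26.
print(suggest_parallelism(30, 10))
```

For 30 cores and 10 partitions the helper returns 3, which is the parallelism value listed for the performance test on slide 26.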
  • 29. Optimizing Tuning for Pipeline Models
  • 30. Challenges (Optimizing Tuning for Pipeline Models)
      • Multi-stage, complex pipelines
      • Parameter grid with hyperparameters from different stages
      • Easy to have huge number of candidate parameter combinations
      • Model parallelism helps, but can we do better?
  • 31. Duplicating work (Optimizing Tuning for Pipeline Models)
      • Each Pipeline treated independently
      • Depending on parameter grid and pipeline stages:
          • Fit the same model multiple times
          • Perform same transformations multiple times
  • 32. Optimize with a DAG (Optimizing Tuning for Pipeline Models)
      • A node is an estimator/transformer with a set of hyperparameters
      • A path in the graph is a single pipeline model
      • Graph: Tokenizer → CountVectorizer nfeat=10 → {LR reg=0.1, LR reg=0.01}; Tokenizer → CountVectorizer nfeat=100 → {LR reg=0.1, LR reg=0.01}
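To make the saving from the DAG concrete, the sketch below counts stage executions for the example graph in two ways: treating each pipeline independently, and sharing common prefixes. It is an illustration written for this transcript, not code from the PipelineTuning repository; identifying a node by "stage plus everything upstream of it" is an assumed way of modelling the slide's definition.

```python
from itertools import product

# Stages in pipeline order, each with its candidate hyperparameter settings.
# Tokenizer has nothing to tune, so it has a single empty setting.
stages = [
    ("Tokenizer", [()]),
    ("CountVectorizer", [("nfeat", 10), ("nfeat", 100)]),
    ("LR", [("reg", 0.1), ("reg", 0.01)]),
]

# Every path through the graph is one candidate pipeline.
pipelines = [
    tuple((name, setting) for (name, _), setting in zip(stages, combo))
    for combo in product(*(settings for _, settings in stages))
]

# A DAG node is a stage together with its upstream settings, i.e. a prefix
# of some pipeline. Pipelines that share a prefix share those nodes.
nodes = {p[: i + 1] for p in pipelines for i in range(len(p))}

naive_stage_runs = sum(len(p) for p in pipelines)  # every pipeline runs every stage
dag_stage_runs = len(nodes)                        # each shared node runs once

print(len(pipelines), naive_stage_runs, dag_stage_runs)
```

For this small grid the four pipelines need 12 stage executions when run independently but only 7 as a DAG (1 Tokenizer, 2 CountVectorizer, 4 LR nodes); the gap widens as more hyperparameters are attached to later stages.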
  • 33. Parallelize in breadth-first order (Optimizing Tuning for Pipeline Models)
      • Example with parallelism parameter set to 2
      • Tokenizer is only a transform, proceed to fit CountVectorizer nodes
      • [Diagram: same graph as slide 32]
  • 34. Fit estimators (Optimizing Tuning for Pipeline Models)
      • Cache the result and proceed to fit the first 2 LogisticRegression models
      • [Diagram: same graph, annotated "Cache result"]
  • 35. Fit estimators (Optimizing Tuning for Pipeline Models)
      • Unpersist when child tasks done
      • Fit final 2 LR models
      • [Diagram: same graph, annotated "Unpersist cached dataframe" and "Cache result"]
  • 36. Fit estimators (Optimizing Tuning for Pipeline Models)
      • All 4 LR models fitted
      • [Diagram: same graph, annotated "Unpersist cached dataframe"]
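The fitting order walked through on slides 33 to 36 can be reproduced with a simplified breadth-first traversal. The node labels and the fixed-size batching below are assumptions for illustration: a real scheduler would start a node only once its parent has finished, whereas this sketch just slices the breadth-first order into groups of `parallelism`, which happens to respect the dependencies for this graph. Caching and unpersisting are not modelled.

```python
from collections import deque

# Child lists for the example graph (labels are illustrative).
children = {
    "Tokenizer": ["CV nfeat=10", "CV nfeat=100"],
    "CV nfeat=10": ["LR nfeat=10 reg=0.1", "LR nfeat=10 reg=0.01"],
    "CV nfeat=100": ["LR nfeat=100 reg=0.1", "LR nfeat=100 reg=0.01"],
}
transform_only = {"Tokenizer"}  # nothing to fit for these nodes

def fit_batches(root, parallelism):
    """Return the estimator nodes in breadth-first order, grouped into batches."""
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node not in transform_only:
            order.append(node)
        queue.extend(children.get(node, []))
    return [order[i:i + parallelism] for i in range(0, len(order), parallelism)]

batches = fit_batches("Tokenizer", parallelism=2)
for batch in batches:
    print(batch)
```

With parallelism set to 2 this yields three batches: both CountVectorizer nodes, then the two LR models under nfeat=10, then the two under nfeat=100, matching the sequence on the slides.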
  • 37. Evaluate models (Optimizing Tuning for Pipeline Models)
      • Evaluate models using similar method
      • CountVectorizerModel is now a transformer
      • Cache transform result
      • Graph: Tokenizer → CVModel nfeat=10 → {LRModel reg=0.1, LRModel reg=0.01}; Tokenizer → CVModel nfeat=100 → {LRModel reg=0.1, LRModel reg=0.01}
  • 38. Evaluate models (Optimizing Tuning for Pipeline Models)
      • [Same bullets and graph as slide 37, annotated "Unpersist cached dataframe" and "Cache result"]
      • Metrics: 0.62, 0.62
  • 39. Evaluate models (Optimizing Tuning for Pipeline Models)
      • All models evaluated for this fold
      • Metrics: 0.62, 0.62, 0.72, 0.66
  • 40. Select best model (Optimizing Tuning for Pipeline Models)
      • Average the metrics from all folds and select the best PipelineModel
      • Avg Metrics: 0.64, 0.64, 0.71, 0.65
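The final selection step amounts to averaging each candidate's metric over the folds and taking the best average. In the sketch below, the first fold's values are the ones shown on slide 39; the second fold is invented purely so that the averages round to the numbers on slide 40, since the talk gives neither the other per-fold values nor the fold count for this example. The candidate labels assume the metrics are listed in the same left-to-right order as the diagram, and `larger_is_better` is an assumed flag for metrics where a higher value wins.

```python
candidates = [
    "nfeat=10, reg=0.1",
    "nfeat=10, reg=0.01",
    "nfeat=100, reg=0.1",
    "nfeat=100, reg=0.01",
]

# One row per fold, one column per candidate.
fold_metrics = [
    [0.62, 0.62, 0.72, 0.66],  # fold shown on slide 39
    [0.66, 0.66, 0.70, 0.64],  # hypothetical second fold
]

def select_best(metrics_by_fold, larger_is_better=True):
    """Average each candidate's metric across folds and return (index, averages)."""
    num_folds = len(metrics_by_fold)
    averages = [sum(column) / num_folds for column in zip(*metrics_by_fold)]
    pick = max if larger_is_better else min
    best_index = pick(range(len(averages)), key=averages.__getitem__)
    return best_index, averages

best_index, avg_metrics = select_best(fold_metrics)
print(candidates[best_index], [round(a, 2) for a in avg_metrics])
```

Under these assumptions the third candidate (nfeat=100, reg=0.1) has the highest average and would be the one refit as the final PipelineModel.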
  • 41. Performance tests (Optimizing Tuning for Pipeline Models)
      • Compared to Standard Spark CV with parallelism enabled
      • Pipeline: MinMaxScaler → PCA → LinearRegression
      • Measure elapsed time for cross-validation, varying size of parameter grid from 36 to 80 models to evaluate
      • Data size: 1,000,000
      • Number features: 50
      • Number partitions: 16
      • Number CV folds: 4
      • Parallelism: 3
      • Standalone cluster with 30 cores
  • 42. Results (Optimizing Tuning for Pipeline Models)
      • Up to 3.25x speedup
      • Increases with more models …
      • … and more complex pipelines
      • Check out: https://github.com/BryanCutler/PipelineTuning
      • Experimental!
      • Watch SPARK-19071
      • [Chart: Elapsed time for DAG CV vs Simple Parallel CV; x-axis: # models (36, 48, 60, 80); y-axis ticks 0 to 1100 (units not given); series: Parallel, DAG Parallel]
  • 43. Thank you!
      • codait.org
      • twitter.com/MLnick
      • github.com/MLnick
      • github.com/BryanCutler
      • developer.ibm.com/code
      • FfDL, MAX
      • Sign up for IBM Cloud and try Watson Studio! https://datascience.ibm.com/
  • 44. IBM Sessions at Spark+AI Summit 2018 (Tuesday, June 5)
      Date, Time, Location & Duration | Session title and Speaker
      • Tue, June 5 | 11 AM | 2010-2012, 30 mins | Productionizing Spark ML Pipelines with the Portable Format for Analytics. Nick Pentreath (IBM)
      • Tue, June 5 | 2 PM | 2018, 30 mins | Making PySpark Amazing—From Faster UDFs to Dependency Management and Graphing! Holden Karau (Google), Bryan Cutler (IBM)
      • Tue, June 5 | 2 PM | Nook by 2001, 30 mins | Making Data and AI Accessible for All. Armand Ruiz Gabernet (IBM)
      • Tue, June 5 | 2:40 PM | 2002-2004, 30 mins | Cognitive Database: An Apache Spark-Based AI-Enabled Relational Database System. Rajesh Bordawekar (IBM T.J. Watson Research Center)
      • Tue, June 5 | 3:20 PM | 3016-3022, 30 mins | Dynamic Priorities for Apache Spark Application’s Resource Allocations. Michael Feiman (IBM Spectrum Computing), Shinnosuke Okada (IBM Canada Ltd.)
      • Tue, June 5 | 3:20 PM | 2001-2005, 30 mins | Model Parallelism in Spark ML Cross-Validation. Nick Pentreath (IBM), Bryan Cutler (IBM)
      • Tue, June 5 | 3:20 PM | 2007, 30 mins | Serverless Machine Learning on Modern Hardware Using Apache Spark. Patrick Stuedi (IBM)
      • Tue, June 5 | 5:40 PM | 2002-2004, 30 mins | Create a Loyal Customer Base by Knowing Their Personality Using AI-Based Personality Recommendation Engine. Sourav Mazumder (IBM Analytics), Aradhna Tiwari (University of South Florida)
      • Tue, June 5 | 5:40 PM | 2007, 30 mins | Transparent GPU Exploitation on Apache Spark. Dr. Kazuaki Ishizaki (IBM), Madhusudanan Kandasamy (IBM)
      • Tue, June 5 | 5:40 PM | 2009-2011, 30 mins | Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for Deep Neural Networks. Yonggang Hu (IBM), Chao Xue (IBM)
  • 45. IBM Sessions at Spark+AI Summit 2018 (Wednesday, June 6)
      Date, Time, Location & Duration | Session title and Speaker
      • Wed, June 6 | 12:50 PM | Birds of a Feather: Apache Arrow in Spark and More. Bryan Cutler (IBM), Li Jin (Two Sigma Investments, LP)
      • Wed, June 6 | 2 PM | 2002-2004, 30 mins | Deep Learning for Recommender Systems. Nick Pentreath (IBM)
      • Wed, June 6 | 3:20 PM | 2018, 30 mins | Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer. Frederick Reiss (IBM), Vijay Bommireddipalli (IBM Center for Open-Source Data & AI Technologies)
      Meet us at IBM booth in the Expo area.