Algorithmic Acceleration of Parallel ALS
for Collaborative Filtering:
“Speeding up Distributed Big Data
Recommendation in Spark”
Hans De Sterck¹,², Manda Winlaw², Mike Hynes², Anthony Caterini²
¹ Monash University, School of Mathematical Sciences
² University of Waterloo, Canada, Applied Mathematics
ICPADS 2015, Melbourne, December 2015
hans.desterck@monash.edu
a talk on algorithms for parallel big data analytics ...
1. distributed computing frameworks for Big Data analytics – Spark (vs HPC, MPI, Hadoop, ...)
2. recommendation – the Netflix prize problem
3. our contribution: an algorithm to speed up ALS for recommendation
4. our contribution: efficient parallel speedup of ALS recommendation in Spark
1. distributed computing frameworks for Big Data
analytics – Spark (vs HPC, MPI, Hadoop, ...)
■ my research background:
– scalable scientific computing algorithms (HPC)
– e.g., parallel algebraic multigrid (AMG) for solving linear systems Ax = b
– e.g., on Blue Gene (100,000s of cores), MPI
distributed computing frameworks for Big Data
analytics – Spark (vs HPC, MPI, Hadoop, ...)
■ more recently: there is a new game of large-scale distributed computing in town!
– Google PageRank (1998) (already 17 years...)
•  commodity hardware (fault-tolerant ...)
•  compute where the data is (data-locality)
•  scalability is essential! (just like in HPC)
•  beginning of “Big Data”, “Cloud”, “Data Analytics”, ...
– new Big Data analytics applications are now appearing everywhere!
(figure: web crawl graph)
distributed computing frameworks for Big Data
analytics – Spark (vs HPC, MPI, Hadoop, ...)
■ “Data Analytics” has grown its own “eco-system”,
“culture”, “software stack” (very different from HPC!)
•  MapReduce
•  Hadoop
•  Spark, ...
•  data locality
•  “implicit” communication (restricted (vs MPI), “shuffle”)
•  not fast (vs HPC), but scalable
•  fault-tolerant (replicate data, restart tasks)
(from “Spark: In-Memory Cluster Computing for Iterative and Interactive Applications”)
distributed computing frameworks for Big Data
analytics – Spark (vs HPC, MPI, Hadoop, ...)
■ MapReduce/Hadoop:
– major disadvantage for iterative algorithms: writes everything to disk between iterations! extremely slow (and: not programmer-friendly)
è only very simple algorithms are feasible in MapReduce
■ the Spark “revolution”:
– store state between iterations in memory
– more general operations than Hadoop/MapReduce
distributed computing frameworks for Big Data
analytics – Spark (vs HPC, MPI, Hadoop, ...)
■ the Spark “revolution”:
– store state between iterations in memory
– more general operations than Hadoop/MapReduce
è much faster than Hadoop! (but still much slower than MPI)
•  data locality
•  scalable
•  fault-tolerant
•  “implicit” communication (restricted (vs MPI), “shuffle”)
sea change (vs Hadoop): more advanced iterative algorithms for
Data Analytics/Machine Learning are feasible in Spark
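the kind of iterative, state-in-memory computation this enables can be sketched in plain Python (not Spark code; the 4-page link graph below is made up for illustration) with a toy PageRank loop — the dataset stays cached across iterations, exactly the access pattern that per-iteration disk I/O in classic MapReduce makes painful:

```python
# Toy sketch (plain Python, not Spark): an iterative algorithm whose state
# stays in memory between iterations. The 4-page link graph is hypothetical.
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}   # page -> outgoing links
ranks = {p: 1.0 for p in links}                   # "cached" state

for _ in range(20):                               # iterate in memory
    contribs = {p: 0.0 for p in links}            # "map": spread rank mass
    for page, outs in links.items():
        for dest in outs:
            contribs[dest] += ranks[page] / len(outs)
    # "reduce": damped update, as in classic PageRank
    ranks = {p: 0.15 + 0.85 * c for p, c in contribs.items()}
```

in Spark the `links` collection would be a cached RDD and each pass a map followed by a reduce-by-key, with no disk round trip between iterations.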
  
2. recommendation – the Netflix prize problem
■ sparse ratings matrix R
■ k latent features: user factors U, movie factors M
■ similar to SVD, but only match known ratings
■ minimize f = ||R – UᵀM||² (over known ratings only), and UᵀM gives predicted
ratings (collaborative filtering)
(figure: sparse ratings matrix R, n users × m movies, with known entries such as 1, 2, 5 at positions (i, j), approximated by Uᵀ (n × k) times M (k × m))
recommendation – the Netflix prize problem
minimize f = ||R – UᵀM||² : alternating least squares (ALS)
■ minimize ||R – U(0)ᵀM(0)||² : freeze U(0), compute M(0) (LS)
■ minimize ||R – U(1)ᵀM(0)||² : freeze M(0), compute U(1) (LS)
■ ... : local least squares problems (parallelizable)
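a minimal sketch of the alternation above (plain Python, not the Spark implementation; the tiny ratings dictionary is made up, and k = 1 so each local least-squares problem has a scalar closed form):

```python
# Rank-1 ALS sketch: alternately freeze one factor and solve the local
# least-squares problems for the other, matching known ratings only.
R = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0, (1, 2): 1.0,
     (2, 1): 2.0, (2, 2): 1.0}          # hypothetical known ratings
n_users, n_movies = 3, 3
u = [1.0] * n_users                      # user factors
m = [1.0] * n_movies                     # movie factors

def f(u, m):
    """Objective: squared error over the known ratings only."""
    return sum((r - u[i] * m[j]) ** 2 for (i, j), r in R.items())

errs = [f(u, m)]
for _ in range(10):
    # freeze m, update each user factor: u_i = sum_j r_ij m_j / sum_j m_j^2
    for i in range(n_users):
        num = sum(r * m[j] for (ii, j), r in R.items() if ii == i)
        den = sum(m[j] ** 2 for (ii, j), r in R.items() if ii == i)
        u[i] = num / den
    # freeze u, update each movie factor symmetrically
    for j in range(n_movies):
        num = sum(r * u[i] for (i, jj), r in R.items() if jj == j)
        den = sum(u[i] ** 2 for (i, jj), r in R.items() if jj == j)
        m[j] = num / den
    errs.append(f(u, m))
```

each block update is an exact minimization, so the objective decreases monotonically — which is also why plain ALS can stall (slow block Gauss-Seidel convergence), as the next slide shows.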
  
recommendation – the Netflix prize problem
minimize f = ||R – UᵀM||² : alternating least squares (ALS)
■ ALS can converge very slowly (block nonlinear Gauss-Seidel)
(g = grad f = 0)
3. our contribution: an algorithm to speed up ALS
for recommendation
min f(U,M) = ||R – UᵀM||², or g(U,M) = grad f(U,M) = 0
■ nonlinear conjugate gradient (NCG) optimization
algorithm for min f(x):
our contribution: an algorithm to speed up ALS
for recommendation
min f(x) = ||R – UᵀM||², or g(x) = grad f(x) = 0
■ our idea: use ALS as a nonlinear preconditioner for NCG
define a preconditioned gradient direction:
(De Sterck and Winlaw, 2015)
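the preconditioned direction can be sketched as follows (notation assumed here for illustration, following the general nonlinearly preconditioned NCG framework; see De Sterck and Winlaw, 2015, for the exact formulas used). Writing P(x_k) for the iterate produced by one ALS sweep starting from x_k:

```latex
% sketch; P(x_k) = one ALS sweep applied to x_k (notation assumed)
\bar{g}_k = x_k - P(x_k)
    % ALS-preconditioned "gradient" direction
\beta_k = \frac{\bar{g}_k^{\top}(\bar{g}_k - \bar{g}_{k-1})}
               {\bar{g}_{k-1}^{\top}\bar{g}_{k-1}}
    % one Polak--Ribiere-type choice; other variants exist
p_k = -\bar{g}_k + \beta_k\, p_{k-1}, \qquad
x_{k+1} = x_k + \alpha_k\, p_k
```

with p_0 = −ḡ_0 and the step length α_k from a line search; setting P(x) = x + (steepest-descent step) recovers ordinary NCG.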
  
our contribution: an algorithm to speed up ALS
for recommendation
min f(x) = ||R – UᵀM||², or g(x) = grad f(x) = 0
■ our idea: use ALS as a nonlinear preconditioner for NCG
(NCG accelerates ALS)
our contribution: an algorithm to speed up ALS
for recommendation
min f(x) = ||R – UᵀM||², or g(x) = grad f(x) = 0
■ our idea: use ALS as a nonlinear preconditioner for NCG
ALS-NCG is much faster than the widely used ALS!
4. our contribution: efficient parallel speedup of
ALS recommendation in Spark
■ Spark “Resilient Distributed Datasets” (RDDs)
– partitioned collection of (key, value) pairs
– can be cached in memory
– built using data flow operators on other RDDs (map, join, group-by-key, reduce-by-key, ...)
– fault-tolerance: rebuild from lineage
– “implicit” communication (shuffling) (≠ MPI)
(figure: an RDD as rows of key → (value1, value2, ...), distributed over partitions 0–3)
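a plain-Python model of these ideas (not the Spark API; data and partition count are made up): the dataset is a list of partitions of (key, value) pairs, and a reduce-by-key first combines locally within each partition, then routes pairs by key hash — the exchange standing in for Spark's "implicit" shuffle:

```python
# Plain-Python model of an RDD-like dataset: a list of partitions, each
# holding (key, value) pairs. reduce_by_key combines map-side, then
# exchanges by key hash -- the communication Spark performs as a shuffle.
def reduce_by_key(partitions, fn, n_out=2):
    # local (map-side) combine within each partition
    combined = []
    for part in partitions:
        local = {}
        for k, v in part:
            local[k] = fn(local[k], v) if k in local else v
        combined.append(local)
    # "shuffle": route each key to an output partition by hash
    out = [dict() for _ in range(n_out)]
    for local in combined:
        for k, v in local.items():
            dest = out[hash(k) % n_out]
            dest[k] = fn(dest[k], v) if k in dest else v
    return [sorted(d.items()) for d in out]

# made-up word counts spread over two partitions
parts = [[("spark", 1), ("hadoop", 1), ("spark", 1)],
         [("spark", 1), ("mpi", 1)]]
counts = reduce_by_key(parts, lambda a, b: a + b)
merged = dict(kv for part in counts for kv in part)
```

note that the programmer never writes the exchange explicitly — in Spark it is implied by the operator, which is exactly why the performance-critical communication is easy to overlook.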
  
our contribution: efficient parallel speedup of ALS
recommendation in Spark
■ efficient Spark programming: similar challenges to efficient
GPU programming with CUDA!
– of course, they have different design objectives (GPU: close to metal, as fast as possible; Spark: scalable, fault-tolerant, data locality...)
– but ... similarities in how one gets good performance:
•  Spark, CUDA: it is easy to write code that produces the correct result (but may be very far from achievable speed)
•  Spark, CUDA: it is very hard to write efficient code!
– implementation choices that are crucial for performance are most often not explicit in the language
– programmer needs very extensive “under the hood” knowledge to write efficient code
– this is a research topic (also for Spark), moving target
our contribution: efficient parallel speedup of ALS
recommendation in Spark
■ existing implementation of ALS in Spark (Chris Johnson,
Spotify): minimize f = ||R – UᵀM||²
– store both R and Rᵀ
– local LS problems: to update user factor i, need all movie factors j that i has rated (shuffle!) (efficient)
(figure: R and Rᵀ each partitioned into blocks 0–3; the movie factors for movies j1, j2 rated by user i are shuffled to the partition holding user factor i)
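the data movement behind the local LS problems can be sketched in plain Python (not the Spark/Spotify code; ratings and factor values are made up, k = 1): to refresh user factor i we gather the current factors of exactly the movies i has rated — the data Spark must shuffle to i's partition:

```python
# Sketch of the update pattern: ratings grouped by user, movie factors
# keyed by movie id; each user update "joins" in only the factors it needs.
ratings_by_user = {0: [(0, 5.0), (1, 3.0)],   # user -> [(movie_id, rating)]
                   1: [(0, 4.0), (2, 1.0)]}   # hypothetical values
movie_factor = {0: 1.2, 1: 0.8, 2: 0.5}       # current movie factors (k = 1)

user_factor = {}
for i, rated in ratings_by_user.items():
    # "join": pull in only the movie factors user i needs (the shuffle)
    needed = [(movie_factor[j], r) for j, r in rated]
    # k = 1 local least squares: u_i = sum r*m / sum m^2
    user_factor[i] = (sum(r * m for m, r in needed)
                      / sum(m * m for m, _ in needed))
```

keeping Rᵀ around lets the symmetric movie-factor update gather user factors the same way, without re-shuffling R itself.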
  
our contribution: efficient parallel speedup of ALS
recommendation in Spark
■ our work: efficient parallel implementation of ALS-NCG in Spark
minimize f(x) = ||R – UᵀM||²
– store our vectors x and g consistent with ALS RDDs, and employ similar efficient shuffling scheme for gradient
– BLAS vector operations
– line search: f(x) is a polynomial of degree 4: compute coefficients once in parallel
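the degree-4 line-search trick can be sketched in plain Python (rank-1 for brevity; all numbers made up). Along a search direction (du, dm), f(α) = Σ (r_ij − (u_i + α·du_i)(m_j + α·dm_j))² is a quartic in α, so one pass over the ratings yields its five coefficients and the line search never touches the data again:

```python
# One pass over the known ratings computes the quartic's coefficients;
# afterwards f(alpha) is evaluated without any data access.
R = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0, (1, 2): 1.0}   # hypothetical
u, m = [1.0, 1.0], [1.5, 1.0, 0.5]
du, dm = [0.3, -0.2], [0.1, 0.2, -0.1]                     # search direction

coef = [0.0] * 5                     # coefficients of alpha^0 .. alpha^4
for (i, j), r in R.items():
    a = r - u[i] * m[j]              # residual at alpha = 0
    b = -(u[i] * dm[j] + du[i] * m[j])
    c = -du[i] * dm[j]               # residual(alpha) = a + b*alpha + c*alpha^2
    # square the per-rating residual and accumulate its quartic terms
    for k, t in enumerate([a * a, 2 * a * b, b * b + 2 * a * c,
                           2 * b * c, c * c]):
        coef[k] += t

def f_poly(alpha):                   # f via the precomputed coefficients
    return sum(cf * alpha ** k for k, cf in enumerate(coef))

def f_direct(alpha):                 # f by touching every rating again
    return sum((r - (u[i] + alpha * du[i]) * (m[j] + alpha * dm[j])) ** 2
               for (i, j), r in R.items())
```

in the parallel setting the per-rating quartic terms are summed with one reduce, after which minimizing f(α) is a purely local, data-free 1-D problem.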
  
our contribution: efficient parallel speedup of ALS
recommendation in Spark
■ performance: linear granularity scaling for ALS-NCG as for ALS
(no new parallel bottlenecks for the more advanced algorithm)
our contribution: efficient parallel speedup of ALS
recommendation in Spark
■ performance: ALS-NCG much faster than ALS (20M MovieLens
data, 8 nodes/128 cores)
our contribution: efficient parallel speedup of ALS
recommendation in Spark
■ performance: ALS-NCG speeds up ALS on 16 nodes/256 cores
in Spark for 800M ratings by a factor of about 5
(great speedup, in parallel, in Spark, for large problem on 256 cores)
some general conclusions ...
■ Spark enables advanced algorithms for Big Data analytics
(linear algebra, optimization, machine learning, ...) (lots of
work: investigate algorithms, implementations, scalability, ...
in Spark)
■ Spark offers a suitable environment for compute-intensive
work!
■ slower than MPI/HPC, but data locality, fault-tolerance,
situated within Big Data “eco-system” (HDFS data, familiar
software stack, ...)
■ will HPC and Big Data hardware/software converge? (also
for “exascale” ...), and if so, which aspects of the Spark
(and others ...) or MPI/HPC approaches will prevail?

Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 

Speeding up Distributed Big Data Recommendation in Spark

  • 1. AUSTRALIA CHINA INDIA ITALY MALAYSIA SOUTH AFRICA monash.edu
    Algorithmic Acceleration of Parallel ALS for Collaborative Filtering:
    "Speeding up Distributed Big Data Recommendation in Spark"
    Hans De Sterck (1,2), Manda Winlaw (2), Mike Hynes (2), Anthony Caterini (2)
    (1) Monash University, School of Mathematical Sciences
    (2) University of Waterloo, Canada, Applied Mathematics
    ICPADS 2015, Melbourne, December 2015
  • 2. hans.desterck@monash.edu  ICPADS 2015
    a talk on algorithms for parallel big data analytics ...
    1. distributed computing frameworks for Big Data analytics – Spark (vs HPC, MPI, Hadoop, ...)
    2. recommendation – the Netflix prize problem
    3. our contribution: an algorithm to speed up ALS for recommendation
    4. our contribution: efficient parallel speedup of ALS recommendation in Spark
  • 3. 1. distributed computing frameworks for Big Data analytics – Spark (vs HPC, MPI, Hadoop, ...)
    ■ my research background:
    – scalable scientific computing algorithms (HPC)
    – e.g., parallel algebraic multigrid (AMG) for solving linear systems Ax=b
    – e.g., on Blue Gene (100,000s of cores), MPI
  • 4. distributed computing frameworks for Big Data analytics – Spark (vs HPC, MPI, Hadoop, ...)
    ■ more recently: there is a new game of large-scale distributed computing in town!
    – Google PageRank (1998) (already 17 years...)
    • commodity hardware (fault-tolerant ...)
    • compute where the data is (data locality)
    • scalability is essential! (just like in HPC)
    • beginning of "Big Data", "Cloud", "Data Analytics", ...
    – new Big Data analytics applications are now appearing everywhere!
    (figure: web crawl)
  • 5. distributed computing frameworks for Big Data analytics – Spark (vs HPC, MPI, Hadoop, ...)
    ■ "Data Analytics" has grown its own "eco-system", "culture", "software stack" (very different from HPC!)
    • MapReduce
    • Hadoop
    • Spark, ...
    • data locality
    • "implicit" communication (restricted (vs MPI), "shuffle")
    • not fast (vs HPC), but scalable
    • fault-tolerant (replicate data, restart tasks)
    (from "Spark: In-Memory Cluster Computing for Iterative and Interactive Applications")
  • 6. distributed computing frameworks for Big Data analytics – Spark (vs HPC, MPI, Hadoop, ...)
    ■ MapReduce/Hadoop:
    – major disadvantage for iterative algorithms: writes everything to disk between iterations! extremely slow (and: not programmer-friendly)
    ⇒ only very simple algorithms are feasible in MapReduce
    ■ the Spark "revolution":
    – store state between iterations in memory
    – more general operations than Hadoop/MapReduce
  • 7. distributed computing frameworks for Big Data analytics – Spark (vs HPC, MPI, Hadoop, ...)
  • 8. distributed computing frameworks for Big Data analytics – Spark (vs HPC, MPI, Hadoop, ...)
    ■ the Spark "revolution":
    – store state between iterations in memory
    – more general operations than Hadoop/MapReduce
    ⇒ much faster than Hadoop! (but still much slower than MPI)
    • data locality
    • scalable
    • fault-tolerant
    • "implicit" communication (restricted (vs MPI), "shuffle")
    sea change (vs Hadoop): more advanced iterative algorithms for Data Analytics/Machine Learning are feasible in Spark
  • 9. 2. recommendation – the Netflix prize problem
    ■ sparse ratings matrix R (n users × m movies)
    ■ k latent features: user factors U, movie factors M
    ■ similar to SVD, but only match known ratings
    ■ minimize f = ||R – UᵀM||²′ (′: sum over known ratings only), and UᵀM gives predicted ratings (collaborative filtering)
    (figure: R ≈ UᵀM, with user index i, movie index j, rank k)
  • 10. recommendation – the Netflix prize problem
    minimize f = ||R – UᵀM||²′ : alternating least squares (ALS)
    ■ minimize ||R – U(0)ᵀM(0)||²′ : freeze U(0), compute M(0) (LS)
    ■ minimize ||R – U(1)ᵀM(0)||²′ : freeze M(0), compute U(1) (LS)
    ■ ... : local least squares problems (parallelizable)
    (figure: R ≈ UᵀM)
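To make the alternation concrete, here is a minimal rank-1 ALS sweep in pure Python. This is an illustration only: the sparse ratings dictionary, the variable names, and the absence of regularization are simplifying assumptions, not the paper's (or Spark MLlib's) implementation.

```python
# Minimal rank-1 ALS sketch. Ratings are stored sparsely as
# {(user, movie): rating}; the 2'-norm loss sums squared errors
# over known ratings only.

def loss(R, u, m):
    """f = sum over known (i,j) of (R_ij - u_i * m_j)^2."""
    return sum((r - u[i] * m[j]) ** 2 for (i, j), r in R.items())

def als_step(R, u, m):
    """One ALS sweep: freeze u and solve each m_j by 1-D least squares,
    then freeze m and solve each u_i the same way."""
    for j in range(len(m)):
        num = sum(r * u[i] for (i, jj), r in R.items() if jj == j)
        den = sum(u[i] ** 2 for (i, jj), r in R.items() if jj == j)
        if den > 0:
            m[j] = num / den
    for i in range(len(u)):
        num = sum(r * m[j] for (ii, j), r in R.items() if ii == i)
        den = sum(m[j] ** 2 for (ii, j), r in R.items() if ii == i)
        if den > 0:
            u[i] = num / den
    return u, m

# 3 users x 3 movies, 6 known ratings (sparse)
R = {(0, 0): 1.0, (0, 2): 5.0, (1, 0): 2.0, (1, 1): 3.0, (2, 1): 4.0, (2, 2): 5.0}
u, m = [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]
losses = [loss(R, u, m)]
for _ in range(10):
    u, m = als_step(R, u, m)
    losses.append(loss(R, u, m))
assert losses[-1] < losses[0]  # each half-sweep is a minimization, so f is non-increasing
```

Each update here is the closed-form 1-D least-squares solution; for rank k it becomes a small k×k linear solve per user/movie, which is exactly the "local least squares problems (parallelizable)" on the slide.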
  • 11. recommendation – the Netflix prize problem
    minimize f = ||R – UᵀM||²′ : alternating least squares (ALS)
    ■ ALS can converge very slowly (block nonlinear Gauss-Seidel) (g = grad f = 0)
  • 12. 3. our contribution: an algorithm to speed up ALS for recommendation
    min f(U,M) = ||R – UᵀM||²′ , or g(U,M) = grad f(U,M) = 0
    ■ nonlinear conjugate gradient (NCG) optimization algorithm for min f(x):
  • 13. our contribution: an algorithm to speed up ALS for recommendation
    min f(x) = ||R – UᵀM||²′ , or g(x) = grad f(x) = 0
    ■ our idea: use ALS as a nonlinear preconditioner for NCG
    define a preconditioned gradient direction: (De Sterck and Winlaw, 2015)
  • 14. our contribution: an algorithm to speed up ALS for recommendation
    min f(x) = ||R – UᵀM||²′ , or g(x) = grad f(x) = 0
    ■ our idea: use ALS as a nonlinear preconditioner for NCG (NCG accelerates ALS)
  • 15. our contribution: an algorithm to speed up ALS for recommendation
    min f(x) = ||R – UᵀM||²′ , or g(x) = grad f(x) = 0
    ■ our idea: use ALS as a nonlinear preconditioner for NCG
  • 16. our contribution: an algorithm to speed up ALS for recommendation
    min f(x) = ||R – UᵀM||²′ , or g(x) = grad f(x) = 0
    ■ our idea: use ALS as a nonlinear preconditioner for NCG
    ALS-NCG is much faster than the widely used ALS!
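The nonlinear-preconditioning idea can be sketched on a toy 2-D objective instead of the matrix-factorization loss: run one sweep of alternating minimization (standing in for ALS) as the preconditioner, use the step it proposes as the "preconditioned gradient" in an NCG update with a Polak-Ribière-style β, and line-search along the combined direction. Everything here (the toy function, the β formula and its non-negativity guard, the exact quadratic line search) is an illustrative assumption, not the paper's exact algorithm.

```python
# Sketch of nonlinearly preconditioned NCG (the idea behind ALS-NCG,
# De Sterck & Winlaw 2015) on a toy 2-D quadratic.

def f(x, y):
    return x * x + x * y + y * y - 3 * x  # toy smooth objective

def grad(x, y):
    return (2 * x + y - 3, x + 2 * y)

def precond(x, y):
    """One alternating-minimization sweep (the nonlinear preconditioner):
    minimize over x with y frozen, then over y with x frozen."""
    x = (3 - y) / 2.0   # df/dx = 0 with y fixed
    y = -x / 2.0        # df/dy = 0 with x fixed
    return x, y

def line_search(x, y, px, py):
    """Exact 1-D minimization; f(x + a*px, y + a*py) is quadratic in a,
    with second directional derivative p^T H p for H = [[2,1],[1,2]]."""
    g0 = grad(x, y)
    num = -(g0[0] * px + g0[1] * py)
    den = 2 * px * px + 2 * px * py + 2 * py * py
    return num / den if den > 0 else 0.0

x, y = 5.0, 5.0
px, py = 0.0, 0.0
gx_prev = gy_prev = None
for it in range(30):
    qx, qy = precond(x, y)
    gx, gy = qx - x, qy - y           # preconditioned "gradient" direction
    if gx_prev is None:
        beta = 0.0
    else:                             # Polak-Ribiere-style beta, clipped at 0
        denom = gx_prev ** 2 + gy_prev ** 2
        beta = 0.0 if denom < 1e-30 else \
            max(0.0, (gx * (gx - gx_prev) + gy * (gy - gy_prev)) / denom)
    px, py = gx + beta * px, gy + beta * py
    a = line_search(x, y, px, py)
    x, y = x + a * px, y + a * py
    gx_prev, gy_prev = gx, gy

# unique minimizer: 2x + y = 3, x + 2y = 0  ->  (x, y) = (2, -1)
assert abs(x - 2.0) < 1e-5 and abs(y + 1.0) < 1e-5
```

Plain alternating minimization on this function is a linear contraction (block Gauss-Seidel, mirroring slow ALS); the NCG wrapper reuses its step as a search direction and accelerates it, which is the mechanism the slides describe for ALS-NCG.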
  • 17. 4. our contribution: efficient parallel speedup of ALS recommendation in Spark
    ■ Spark "Resilient Distributed Datasets" (RDDs)
    – partitioned collection of (key, value) pairs
    – can be cached in memory
    – built using data flow operators on other RDDs (map, join, group-by-key, reduce-by-key, ...)
    – fault-tolerance: rebuild from lineage
    – "implicit" communication (shuffling) (≠ MPI)
    (figure: key → (value1, value2, ...), partitions 0–3)
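The data-flow operators listed above can be mimicked in a few lines of plain Python (no cluster, no partitions; illustration only). In real PySpark the same pipeline would be written against an RDD, e.g. `rdd.map(...).reduceByKey(...)`, with the grouping step triggering a shuffle.

```python
# Plain-Python mimic of Spark's (key, value) data-flow operators.
# (user, (movie, rating)) pairs, as an RDD would hold them:
ratings = [(0, ("m0", 1.0)), (0, ("m2", 5.0)), (1, ("m0", 2.0)), (1, ("m1", 3.0))]

def map_op(pairs, fn):
    """Spark's map: apply fn to every record (no communication)."""
    return [fn(kv) for kv in pairs]

def reduce_by_key(pairs, fn):
    """Spark's reduceByKey: fold values sharing a key with fn.
    On a cluster this is where the shuffle happens -- records with the
    same key must end up on the same partition."""
    out = {}
    for k, v in pairs:
        out[k] = fn(out[k], v) if k in out else v
    return sorted(out.items())

# count ratings per user: map each record to (user, 1), then add within keys
counts = reduce_by_key(map_op(ratings, lambda kv: (kv[0], 1)), lambda a, b: a + b)
assert counts == [(0, 2), (1, 2)]
```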
  • 18. our contribution: efficient parallel speedup of ALS recommendation in Spark
    ■ efficient Spark programming: similar challenges as efficient GPU programming with CUDA!
    – of course, they have different design objectives (GPU: close to the metal, as fast as possible; Spark: scalable, fault-tolerant, data locality, ...)
    – but ... similarities in how one gets good performance:
    • Spark, CUDA: it is easy to write code that produces the correct result (but may be very far from achievable speed)
    • Spark, CUDA: it is very hard to write efficient code!
    – implementation choices that are crucial for performance are most often not explicit in the language
    – programmer needs very extensive "under the hood" knowledge to write efficient code
    – this is a research topic (also for Spark), a moving target
  • 19. our contribution: efficient parallel speedup of ALS recommendation in Spark
    ■ existing implementation of ALS in Spark (Chris Johnson, Spotify)
    minimize f = ||R – UᵀM||²′
    – store both R and Rᵀ
    – local LS problems: to update user factor i, need all movie factors j that i has rated (shuffle!) (efficient)
    (figure: partitioned R, Rᵀ, M, U across partitions 0–3)
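A plain-Python mimic of the gather that this shuffle performs (the layout and names here are illustrative, not the Spark implementation): for each user, collect the factors of exactly the movies that user has rated; those pairs are the inputs to that user's local least-squares problem.

```python
# R keyed by user (as in the user-major copy of the ratings),
# and movie factors M keyed by movie (rank k = 1 for brevity).
R_by_user = {0: [(0, 1.0), (2, 5.0)], 1: [(0, 2.0), (1, 3.0)]}  # user -> [(movie, rating)]
M = {0: [0.9], 1: [1.1], 2: [1.5]}                              # movie -> factor vector

def factors_needed(R_by_user, M):
    """For each user i, gather (rating, movie factor) pairs -- the inputs of
    user i's local LS problem. On a cluster, this gather is the shuffle:
    each m_j is sent to every partition holding a rating for movie j."""
    return {i: [(r, M[j]) for j, r in rated] for i, rated in R_by_user.items()}

gathered = factors_needed(R_by_user, M)
assert gathered[0] == [(1.0, [0.9]), (5.0, [1.5])]
```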
  • 20. our contribution: efficient parallel speedup of ALS recommendation in Spark
    ■ our work: efficient parallel implementation of ALS-NCG in Spark
    minimize f(x) = ||R – UᵀM||²′
    – store our vectors x and g consistent with ALS RDDs, and employ a similar efficient shuffling scheme for the gradient
    – BLAS vector operations
    – line search: f along the search direction is a polynomial of degree 4 in the step size: compute coefficients once in parallel
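The line-search trick exploits that along a direction (dU, dM), the loss f(U + a·dU, M + a·dM) is a degree-4 polynomial in the step size a, so its 5 coefficients can be computed in one (parallelizable) pass over the ratings, after which the 1-D minimization needs no further passes over the data. Rank-1 pure-Python sketch; the variable names and the grid-scan minimizer are illustrative choices, not the paper's.

```python
def quartic_coeffs(R, u, m, du, dm):
    """Coefficients c[0..4] of f(a) = sum over known (i,j) of e_ij(a)^2,
    where e_ij(a) = r - (u_i + a*du_i)(m_j + a*dm_j)
                  = e0 + e1*a + e2*a^2 per rating."""
    c = [0.0] * 5
    for (i, j), r in R.items():
        e0 = r - u[i] * m[j]
        e1 = -(u[i] * dm[j] + du[i] * m[j])
        e2 = -du[i] * dm[j]
        # square the per-rating quadratic and accumulate:
        # (e0 + e1 a + e2 a^2)^2
        c[0] += e0 * e0
        c[1] += 2 * e0 * e1
        c[2] += e1 * e1 + 2 * e0 * e2
        c[3] += 2 * e1 * e2
        c[4] += e2 * e2
    return c

def eval_poly(c, a):
    return sum(ck * a ** k for k, ck in enumerate(c))

R = {(0, 0): 1.0, (0, 1): 5.0, (1, 0): 2.0, (1, 1): 3.0}
u, m = [1.0, 1.0], [1.0, 1.0]
du, dm = [0.5, -0.2], [0.1, 0.3]
c = quartic_coeffs(R, u, m, du, dm)

# cheap 1-D minimization of the quartic (grid scan; no passes over the data)
best_a = min((k / 1000.0 - 2.0 for k in range(4001)), key=lambda a: eval_poly(c, a))

# the polynomial agrees with direct evaluation of the loss at any step size
def direct(a):
    return sum((r - (u[i] + a * du[i]) * (m[j] + a * dm[j])) ** 2
               for (i, j), r in R.items())
assert abs(eval_poly(c, 0.7) - direct(0.7)) < 1e-9
assert eval_poly(c, best_a) <= eval_poly(c, 0.0)
```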
  • 21. our contribution: efficient parallel speedup of ALS recommendation in Spark
    ■ performance: linear granularity scaling for ALS-NCG as for ALS (no new parallel bottlenecks for the more advanced algorithm)
  • 22. our contribution: efficient parallel speedup of ALS recommendation in Spark
    ■ performance: ALS-NCG much faster than ALS (20M MovieLens data, 8 nodes/128 cores)
  • 23. our contribution: efficient parallel speedup of ALS recommendation in Spark
    ■ performance: ALS-NCG speeds up ALS on 16 nodes/256 cores in Spark for 800M ratings by a factor of about 5 (great speedup, in parallel, in Spark, for a large problem on 256 cores)
  • 24. some general conclusions ...
    ■ Spark enables advanced algorithms for Big Data analytics (linear algebra, optimization, machine learning, ...) (lots of work: investigate algorithms, implementations, scalability, ... in Spark)
    ■ Spark offers a suitable environment for compute-intensive work!
    ■ slower than MPI/HPC, but data locality, fault-tolerance, situated within the Big Data "eco-system" (HDFS data, familiar software stack, ...)
    ■ will HPC and Big Data hardware/software converge? (also for "exascale" ...), and if so, which aspects of the Spark (and others ...) or MPI/HPC approaches will prevail?