Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Spark Summit EU talk by Rolf Jagerman

483 views

Published on

Glint: An Asynchronous Parameter Server for Spark

Published in: Data & Analytics
  • Be the first to comment

Spark Summit EU talk by Rolf Jagerman

  1. 1. 
 An Asynchronous Parameter Server for Spark Rolf Jagerman University of Amsterdam
  2. 2. Motivation Data Machine Learning Algorithm Model
  3. 3. Motivation Data Machine Learning Algorithm Model
  4. 4. Motivation Data ModelMachine Learning Algorithm
  5. 5. Motivation Deep Learning
  6. 6. Motivation Factorization Machines https://github.com/MLnick/glint-fm
  7. 7. Tourism Video games Javascript Biology hotel allianc write tag local hord docum protein car euro var hypothet holidai warcraft return gene area wow prop cytoplasm golf gold function prk wed warhamm subset locu Motivation Topic Modeling (LDA) Tourism Video games Javascript Biology hotel allianc write tag local hord docum protein car euro var hypothet holidai warcraft return gene area wow prop cytoplasm golf gold function prk wed warhamm subset locu Tourism Video games Javascript Biology hotel allianc write tag local hord docum protein car euro var hypothet holidai warcraft return gene area wow prop cytoplasm golf gold function prk wed warhamm subset locu
  8. 8. Problem Model size exceeds memory of a single machine! Data ModelMachine Learning Algorithm
  9. 9. 9 Spark Worker Spark Driver Spark Worker Spark Worker Spark Worker
  10. 10. Spark Worker Spark Driver Spark Worker Spark Worker Spark Worker 10
  11. 11. Spark Worker Spark Driver Spark Worker Spark Worker Spark Worker 11
  12. 12. Spark Worker Spark Driver Spark Worker Spark Worker Spark Worker 12
  13. 13. Spark Worker Spark Driver Spark Worker Spark Worker Spark Worker 13
  14. 14. Spark Worker Spark Driver Spark Worker Spark Worker Spark Worker 14
  15. 15. Spark Worker Spark Driver Spark Worker Spark Worker Spark Worker 15
  16. 16. Parameter Server • A machine learning framework • Distributes a model over multiple machines • Offers two operations: • Pull: query parts of the model • Push: update parts of the model 16
  17. 17. Parameter Server • Machine learning update equation: wi ← wi + Δ
 • (Stochastic) gradient descent • Collapsed Gibbs sampling for topic modeling 17
  18. 18. Parameter Server • Machine learning update equation: wi ← wi + Δ
 • (Stochastic) gradient descent • Collapsed Gibbs sampling for topic modeling • Aggregate push updates via addition (+) 18
  19. 19. 19 Spark Worker Spark Driver Spark Worker Spark Worker Spark Worker Parameter Server Parameter Server Parameter Server
  20. 20. 20 Spark Worker Spark Driver Spark Worker Spark Worker Spark Worker Parameter Server Parameter Server Parameter Server
  21. 21. 21 Spark Worker Spark Driver Spark Worker Spark Worker Spark Worker Parameter Server Parameter Server Parameter Server
  22. 22. 22 Spark Worker Spark Driver Spark Worker Spark Worker Spark Worker Parameter Server Parameter Server Parameter Server
  23. 23. 23 Spark Worker Spark Driver Spark Worker Spark Worker Spark Worker Parameter Server Parameter Server Parameter Server
  24. 24. Spark Worker Spark Driver Spark Worker Spark Worker Spark Worker Parameter Server Parameter Server Parameter Server 24
  25. 25. Spark Worker Spark Driver Spark Worker Spark Worker Spark Worker Parameter Server Parameter Server Parameter Server 25
  26. 26. Demo 26
  27. 27. Experiments Task • LDA topic modeling • 1,000 topics • 27TB data set (ClueWeb12) 27
  28. 28. Experiments Setup • 30 Spark workers (16 CPU cores each) • 3.7TB RAM total • Interconnected over 10Gb/s ethernet 28
  29. 29. Experiments Approach • Glint for model storage (billions of parameters) • LDA approximation algorithm for runtime improvements • A small loss in model quality is acceptable 29
  30. 30. Experiments Comparing three methods: 1. Glint LDA 2. MLLib EM LDA 3. MLLib Online LDA 30
  31. 31. Experiments 31 Glint MLLib EM MLLib Online 6,108 -1.3% +0.4% 5,731 -5.4% -4.6% 5,427 -10.7% -8.2% 6,021 -5.1% +4.0% 5,813 +1.2% -1.3% 5,520 +4.8% -5.5% 5,861 +6.8% -0.5% Data (GB) # Topics 50 20 100 20 150 20 200 20 200 40 200 60 200 80 Perplexity (topic model quality)
  32. 32. Experiments 32 Glint MLLib EM MLLib Online 6.3 9.7 16.3 7.1 14.2 17.8 8.9 14.1 19.6 10.8 22.3 21.5 11.9 23.7 57.5 13.4 32.4 131.0 14.7 34.4 233.2 Data (GB) # Topics 50 20 100 20 150 20 200 20 200 40 200 60 200 80 Runtime (minutes)
  33. 33. Experiments 33 Glint MLLib EM MLLib Online 3.3 4.6 5.5 6.2 12.1 18.0 23.9 Data (GB) # Topics 50 20 100 20 150 20 200 20 200 40 200 60 200 80 Shuffle Write (GB) No Shuffle Write No Shuffle Write
  34. 34. • MLLib could not scale beyond 200GB or
 100 topics due to task and job failures
 • Glint can compute a topic model on the full
 27TB with 1,000 topics Experiments 34
  35. 35. Experiments 0 20 40 60 Time (hours) Perplexity 104204304404504 35
  36. 36. Conclusion • Glint is a parameter server for Spark • Machine learning for very large models • Asynchrony enables highly flexible threading • Extremely easy to use • Outperforms MLLib on LDA topic modeling 36
  37. 37. Future work • Better fault tolerance (using Chord/DHT) • User defined functions for aggregation • Support for sparse models • Implementing other algorithms • Deep learning • Linear models 37
  38. 38. THANK YOU github.com/rjagerman/glint

×