
Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink

Published at: Flink Forward 2015

  1. Stale Synchronous Parallel Iterations on Flink. TRAN Nam-Luc, Engineer @ EURA NOVA Research & Development. Flink Forward 2015, Berlin, Germany, October 2015.
  2. EURA NOVA? Our innovation-driven model & disruptive culture. "EURA NOVA is a team of passionate IT experts devoted to providing knowledge & skills to people with great ideas." Key figures: 40 employees, from business engineers to data scientists; 7 freelancers; 3 founding partners. Our research since 2009: data science, distributed computing, software engineering, big data; 2 PhD theses & 18 master's theses with 4 renowned universities; 20 publications presented at conferences; 4 large R&D projects; 3 open-source products.
  3. [Diagram: Worker 1 through Worker 10, with one worker marked as a STRAGGLER.]
  4. THE BIG PICTURE: Bulk Synchronous Parallelism synchronizes threads after each iteration. There are always stragglers in a cluster; in large clusters, that leaves many workers waiting.
  5. [Diagram: Worker 1, Worker 2, Worker 3.]
  6. CONTRIBUTION: 1. Stale Synchronous Parallel iterations: tackling the straggler problem within Flink. 2. Distributed Frank-Wolfe algorithm: applied to LASSO regression as a use case.
  7. PART 1: STALE SYNCHRONOUS PARALLEL ITERATIONS ON FLINK
  8. THE STRAGGLER PROBLEM: There are stragglers in distributed processing frameworks, especially in the context of data center operating systems: hardware heterogeneity, skewed data distribution, garbage collection. They are not predictable and are costly to reschedule.
  9. BULK VS STALE SYNCHRONOUS: distribution of iterative-convergent algorithms. [Diagram: bulk synchronous execution with an explicit synchronization barrier vs. stale synchronous execution.]
  10. PARAMETER SERVER: How to keep workers up to date? [Diagram: stale synchronous execution in which workers read from and write to a parameter server instead of meeting at an explicit synchronization barrier.]
  11. INTEGRATION WITH FLINK: What does Flink need to enable SSP? 1. An SSP iteration control model. 2. A parameter server.
  12. ITERATION CONTROL MODEL IN FLINK. Worker p_i: if clock_i <= cluster-wide clock + staleness, do the iteration, increment clock_i, then send clock_i to the synchronization sink; else wait until clock_i <= cluster-wide clock + staleness. Synchronization sink: store clock_i in C, set cluster-wide clock = min(C), and broadcast the cluster-wide clock if it changed.
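The per-worker clock gate described on this slide can be sketched as a small helper. This is a minimal illustration only; the class and method names (SSPClockGate, onClusterClock, startIteration) are hypothetical and not the actual implementation from the Flink pull request.

```java
// Minimal sketch of the SSP clock gate from slide 12. Hypothetical names;
// the real logic lives inside Flink's SSP iteration tasks.
public class SSPClockGate {
    private final int staleness;          // maximum allowed clock lead over the slowest worker
    private volatile int clusterClock;    // min(C), broadcast by the synchronization sink
    private int localClock;               // clock_i, this worker's iteration counter

    public SSPClockGate(int staleness) {
        this.staleness = staleness;
    }

    // Called whenever the synchronization sink broadcasts a new cluster-wide clock.
    public synchronized void onClusterClock(int newClusterClock) {
        this.clusterClock = newClusterClock;
        notifyAll();
    }

    // Blocks while this worker is more than `staleness` clocks ahead of the slowest worker,
    // then advances the local clock (which would be reported to the synchronization sink).
    public synchronized int startIteration() throws InterruptedException {
        while (localClock > clusterClock + staleness) {
            wait();
        }
        return ++localClock;
    }
}
```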
  13. ITERATION CONTROL MODEL IN FLINK (bulk synchronous). [Diagram: each worker pipeline is IterationHead → iteration intermediate → IterationTail with a backchannel; every IterationHead reports "worker done" to the IterationSynchronizationTask, which replies "all workers done".]
  14. ITERATION CONTROL MODEL IN FLINK (stale synchronous). [Diagram: the same worker pipelines, but each IterationHead sends its clock p_i to a ClockSynchronizationTask, which broadcasts the cluster-wide clock back.]
  15. ITERATION CONTROL MODEL IN FLINK: bulk synchronous classes and their stale synchronous counterparts: SuperstepBarrier → ClockHolder, IterationHeadPACTTask → SSPIterationHeadPACTTask, SyncEventHandler → ClockSyncEventHandler, IterationSynchronizationTask → ClockSynchronizationTask.
  16. CONVERGENCE CHECK. Bulk synchronous parallel: convergence is determined at the synchronization barrier. Stale synchronous parallel: convergence is reached when no worker can improve the solution any further.
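As a rough illustration of the stale synchronous convergence rule, a synchronization task could declare convergence once every worker has reported that its latest update no longer improves the objective. The names below (ConvergenceTracker, reportImprovement, hasConverged) are assumptions for this sketch, not Flink classes.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch: under SSP, convergence is reached when no worker can still improve the solution.
public class ConvergenceTracker {
    private final int numWorkers;
    private final ConcurrentMap<Integer, Boolean> improvedInLastClock = new ConcurrentHashMap<>();

    public ConvergenceTracker(int numWorkers) {
        this.numWorkers = numWorkers;
    }

    // Each worker reports whether its latest iteration improved the objective.
    public void reportImprovement(int workerId, boolean improved) {
        improvedInLastClock.put(workerId, improved);
    }

    // Converged once every worker has reported and none of them improved.
    public boolean hasConverged() {
        return improvedInLastClock.size() == numWorkers
            && improvedInLastClock.values().stream().noneMatch(Boolean::booleanValue);
    }
}
```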
  17. STALE SYNCHRONOUS API: dataSet.Iterate(nIterations) becomes dataSet.IterateWithSSP(nIterations, staleness).
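A minimal usage sketch of the two iteration APIs named on the slide, assuming the Flink 0.9-era Java DataSet API (where the bulk synchronous method is the lowercase iterate); the SSP call is shown as a comment because its exact signature is taken from the slide and is an assumption here.

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.operators.IterativeDataSet;

public class SspApiExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<Double> params = env.fromElements(0.0);

        // Bulk synchronous iterations: all workers meet at a barrier every superstep.
        IterativeDataSet<Double> bspLoop = params.iterate(100);
        DataSet<Double> updated = bspLoop.map(new MapFunction<Double, Double>() {
            @Override
            public Double map(Double x) {
                return x + 1.0; // placeholder step body
            }
        });
        DataSet<Double> bspResult = bspLoop.closeWith(updated);

        // Stale synchronous iterations: a worker may run up to `staleness` clocks ahead
        // of the slowest worker. Method name and signature as shown on the slide (assumed):
        // IterativeDataSet<Double> sspLoop = params.iterateWithSSP(100, 3);

        bspResult.print();
    }
}
```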
  18. PARAMETER SERVER. Architecture: workers read and write a shared model held in a data grid. Simple API: RichMapFunctionWithParameterServer extends RichMapFunction { update(id, clock, parameter); get(id) }.
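A sketch of how a user function might build on the parameter-server API named on the slide. The class name and the update()/get() methods come from the slide; the generic parameters, argument types, and the double[] model representation are assumptions for illustration.

```java
import org.apache.flink.api.common.functions.RichMapFunction;

// Sketch of the parameter-server API from slide 18; signatures and types are assumed.
abstract class RichMapFunctionWithParameterServer<IN, OUT> extends RichMapFunction<IN, OUT> {
    // In the real implementation these would talk to the distributed data grid.
    protected void update(String id, int clock, double[] parameter) { /* push to shared model */ }
    protected double[] get(String id) { return new double[0];       /* pull from shared model */ }
}

// A worker step: pull the shared coefficients, compute a local update, push it back.
class WorkerStep extends RichMapFunctionWithParameterServer<double[], double[]> {
    private int clock = 0;

    @Override
    public double[] map(double[] localData) {
        double[] alpha = get("alpha");             // current shared coefficients
        double[] newAlpha = localUpdate(alpha, localData);
        update("alpha", ++clock, newAlpha);         // publish the update with this worker's clock
        return newAlpha;
    }

    private double[] localUpdate(double[] alpha, double[] localData) {
        return alpha;                               // placeholder for the actual computation
    }
}
```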
  19. PART 2: DISTRIBUTED FRANK-WOLFE ALGORITHM
  20. DISTRIBUTED FRANK-WOLFE ALGORITHM. Solving the following optimization problem: express the target as a linear combination of atoms with sparse coefficients. Distributed version (Bellet et al. 2015).
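The equations from this slide are not preserved in the transcript. As a hedged reconstruction, the L1-constrained least-squares problem behind the LASSO use case in Bellet et al. (2015) can be written as follows; the notation is assumed and may differ from the slide.

```latex
% Reconstruction (assumed notation): find sparse coefficients \alpha such that
% the linear combination A\alpha of atoms (columns of A) approximates the target y.
\min_{\alpha \in \mathbb{R}^n} \; \tfrac{1}{2}\,\lVert y - A\alpha \rVert_2^2
\qquad \text{subject to} \qquad \lVert \alpha \rVert_1 \le \beta
```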
  21. DISTRIBUTED FRANK-WOLFE ALGORITHM, distributed version (Bellet et al. 2015). [Diagram: atoms Atom1 ... Atomn partitioned across workers W1, W2, W3; the target is a linear combination of atoms with sparse coefficients.]
  22. DISTRIBUTED FRANK-WOLFE ALGORITHM, distributed version (Bellet et al. 2015). [Diagram: the atoms Atom1 ... Atomn are partitioned across workers W1, W2, W3.]
  23. DISTRIBUTED FRANK-WOLFE ALGORITHM, distributed version (Bellet et al. 2015), step 1: local selection of atoms on each worker.
  24. DISTRIBUTED FRANK-WOLFE ALGORITHM, distributed version (Bellet et al. 2015), step 2: global consensus across workers.
  25. DISTRIBUTED FRANK-WOLFE ALGORITHM, distributed version (Bellet et al. 2015), step 3: update of the α coefficients. A sketch of the full round follows this slide.
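Putting slides 23 through 25 together, one bulk synchronous round of the distributed Frank-Wolfe algorithm can be sketched as below. The method names, the simplified consensus, and the classic 2/(k+2) step size are illustrative assumptions, not the code from Bellet et al. (2015).

```java
import java.util.List;

// Sketch of one bulk synchronous round of distributed Frank-Wolfe (slides 23-25).
class SyncFrankWolfeRound {

    // Step 1: each worker scans only its local atoms and proposes the one whose
    // gradient coordinate has the largest magnitude.
    static int selectLocalAtom(double[] gradient, List<Integer> localAtomIndices) {
        int best = localAtomIndices.get(0);
        for (int j : localAtomIndices) {
            if (Math.abs(gradient[j]) > Math.abs(gradient[best])) {
                best = j;
            }
        }
        return best;
    }

    // Step 2: global consensus. All workers exchange their proposals and agree on the
    // globally best atom before anyone moves on (this is the BSP barrier).
    static int globalConsensus(int[] proposals, double[] gradient) {
        int best = proposals[0];
        for (int j : proposals) {
            if (Math.abs(gradient[j]) > Math.abs(gradient[best])) {
                best = j;
            }
        }
        return best;
    }

    // Step 3: update the alpha coefficients with the classic Frank-Wolfe step size.
    static void updateCoefficients(double[] alpha, int chosenAtom,
                                   double gradientAtAtom, double beta, int iteration) {
        double gamma = 2.0 / (iteration + 2.0);   // step size gamma_k = 2 / (k + 2)
        for (int j = 0; j < alpha.length; j++) {
            alpha[j] *= (1.0 - gamma);            // shrink toward the new vertex
        }
        // The L1-ball vertex for the chosen atom is -beta * sign(gradient at that atom).
        alpha[chosenAtom] += gamma * (-beta * Math.signum(gradientAtAtom));
    }
}
```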
  26. DISTRIBUTED FRANK-WOLFE ALGORITHM, stale synchronous version, step 1: get the α coefficients from the parameter server.
  27. DISTRIBUTED FRANK-WOLFE ALGORITHM, stale synchronous version, step 2: local selection of atoms.
  28. DISTRIBUTED FRANK-WOLFE ALGORITHM, stale synchronous version, step 3: compute the α coefficients from the locally selected atoms.
  29. DISTRIBUTED FRANK-WOLFE ALGORITHM, stale synchronous version, step 4: push the updated α coefficients to the parameter server.
  30. DISTRIBUTED FRANK-WOLFE ALGORITHM, stale synchronous version: repeat while within the staleness bound. A sketch of this worker loop follows this slide.
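Putting slides 26 through 30 together, each worker's stale synchronous loop might look like the sketch below. The parameter-server calls mirror the get/update API from slide 18; everything else (ParameterServerClient, the placeholder bodies) is an illustrative assumption.

```java
// Sketch of the stale synchronous Frank-Wolfe worker loop (slides 26-30).
// ParameterServerClient is a hypothetical stand-in for the get/update API of slide 18.
interface ParameterServerClient {
    double[] get(String id);
    void update(String id, int clock, double[] parameter);
}

class SspFrankWolfeWorker {
    private final ParameterServerClient ps;
    private final int maxIterations;

    SspFrankWolfeWorker(ParameterServerClient ps, int maxIterations) {
        this.ps = ps;
        this.maxIterations = maxIterations;
    }

    void run() {
        for (int clock = 1; clock <= maxIterations; clock++) {
            // 1. Get the (possibly slightly stale) coefficients from the parameter server.
            double[] alpha = ps.get("alpha");

            // 2. Local selection of atoms: pick the best atom in this worker's partition.
            int atom = selectLocalAtom(alpha);

            // 3. Compute new coefficients from the locally selected atom.
            double[] newAlpha = frankWolfeStep(alpha, atom, clock);

            // 4. Push the update back; the SSP runtime only lets this worker run
            //    up to `staleness` clocks ahead of the slowest worker.
            ps.update("alpha", clock, newAlpha);
        }
    }

    private int selectLocalAtom(double[] alpha) { return 0; }                         // placeholder
    private double[] frankWolfeStep(double[] alpha, int atom, int k) { return alpha; } // placeholder
}
```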
  31. DISTRIBUTED FRANK-WOLFE ALGORITHM. See our full paper for implementation details, properties, the application to LASSO regression, and the convergence proof: N.-L. Tran, T. Peel, S. Skhiri, "Distributed Frank-Wolfe under Pipelined Stale Synchronous Parallelism", Proceedings of IEEE BigData 2015, Santa Clara, November 2015.
  32. EXPERIMENTS. Application to LASSO regression. Setup: 5 nodes, 2 GHz, 3 GB RAM; random sparse 1,000 × 10,000 matrices with sparsity ratio 0.001. Generated load: at any time, one random node is put under 100% load for 12 seconds.
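For reference, a synthetic problem of the shape described here could be generated as sketched below. This only illustrates the experimental data shape (1,000 × 10,000, roughly one non-zero entry per thousand); it is not the benchmark code used in the talk.

```java
import java.util.Random;

// Sketch: generate a random sparse 1,000 x 10,000 design matrix with sparsity ratio 0.001.
class SparseDataGenerator {
    public static void main(String[] args) {
        int rows = 1_000, cols = 10_000;
        double sparsity = 0.001;
        Random rnd = new Random(42);

        double[][] a = new double[rows][cols];
        long nonZeros = 0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                if (rnd.nextDouble() < sparsity) {   // keep roughly 0.1% of the entries
                    a[i][j] = rnd.nextGaussian();
                    nonZeros++;
                }
            }
        }
        System.out.printf("non-zero entries: %d (%.4f%%)%n",
                nonZeros, 100.0 * nonZeros / ((long) rows * cols));
    }
}
```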
  33. RESULTS. [Plot: convergence of the objective function.]
  34. RECAP: Stragglers in a cluster are an issue. Mitigate them with Stale Synchronous Parallel iterations.
  35. WANNA TRY IT OUT? Pull request #967: Stale Synchronous Parallel iterations + API. Pull request #1101: Frank-Wolfe algorithm + LASSO regression.
  36. THANK YOU! Do you have any questions? namluc.tran@euranova.eu
  37. AGENDA (backup). 1. STALE SYNCHRONOUS PARALLEL ITERATIONS ∙ The straggler problem ∙ BSP vs SSP ∙ Integration with Flink ∙ Iteration control model ∙ API 2. DISTRIBUTED FRANK-WOLFE ALGORITHM ∙ Problem statement ∙ Application: LASSO regression ∙ Experiments
  38. RESULTS. [Plot: sparsity of the coefficients.]
  39. PARAMETER SERVER: The parameter server keeps track of the intermediate results. → Key-object store → Distributed, with local caching. A sketch of such a store follows this slide.
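As a rough illustration of the "distributed key-object store with local caching" design, a worker-side client could cache the last value it fetched and refresh it lazily. Everything here (CachedStoreClient, RemoteStore) is a hypothetical sketch, not the data-grid implementation used in the talk.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a key-object store client with local caching (slide 39).
class CachedStoreClient {

    // Remote side of the key-object store (e.g. a distributed data grid).
    interface RemoteStore {
        Object fetch(String key);
        void put(String key, Object value);
    }

    private final RemoteStore remote;
    private final Map<String, Object> localCache = new ConcurrentHashMap<>();

    CachedStoreClient(RemoteStore remote) {
        this.remote = remote;
    }

    // Reads hit the local cache first; only a miss goes to the distributed store.
    Object get(String key) {
        return localCache.computeIfAbsent(key, remote::fetch);
    }

    // Writes go to the distributed store and refresh the local copy.
    void update(String key, Object value) {
        remote.put(key, value);
        localCache.put(key, value);
    }

    // Under SSP, stale cached entries can be invalidated when the cluster clock advances.
    void invalidate(String key) {
        localCache.remove(key);
    }
}
```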
