
Deploying Accelerators At Datacenter Scale Using Spark


  1. Deploying Accelerators At Datacenter Scale Using Spark
     Di Wu and Muhuan Huang
     University of California, Los Angeles / Falcon Computing Solutions, Inc.
     UCLA collaborators: Cody Hao Yu, Zhenman Fang, Tyson Condie and Jason Cong
  2. 3x speedup in 3 hours
  3. Accelerators in the Datacenter
     • CPU core scaling is coming to an end
       – Datacenters demand new technology to sustain scaling
     • GPUs are popular; FPGAs are gaining popularity
       – Intel predicts that 30% of datacenter nodes will have FPGAs by 2020
  4. About Us
     • UCLA Center for Domain-Specific Computing
       – Expeditions in Computing program from NSF in 2009
       – Public-private partnership between NSF and Intel in 2014
       – http://cdsc.ucla.edu
     • Falcon Computing Solutions, Inc.
       – Founded in 2014
       – Enables customized computing for big-data applications
       – http://www.falcon-computing.com
  5. What is an FPGA?
     • Field Programmable Gate Array (FPGA)
       – Reconfigurable hardware
       – Can be used to accelerate specific computations
     • FPGA benefits
       – Low-power, energy efficient
       – Customized high performance
     [Photos: a PCI-E FPGA (IBM CAPI) and an FPGA in a CPU socket (Intel HARP)]
  6. Problems of deploying accelerators efficiently in datacenters …
  7. Problem 1: Complicated Programming Model
     • Too much hardware-specific knowledge required
     • Lack of platform portability
  8. Problem 2: JVM-to-ACC Data Transfer Overheads
     • Data serialization/deserialization
     • Additional data transfer
  9. Problem 3: Accelerator Management Is Non-Trivial
     Three roles, three questions:
     • Accelerator designer: "Does my accelerator work in your cluster …?"
     • System administrator: "Which cluster node has the accelerator …?"
     • Big-data application developer: "How can I use your accelerator …?"
  10. More Challenges for FPGAs
      Problem 4: Reconfiguration time is long
      • Reconfiguration takes about 0.5 - 2 seconds
        – Transfer the FPGA binary
        – Reset the bits
        – …
      • Naïve runtime FPGA sharing may slow performance down by 4x
  11. What We Did
      • Provide a better programming model
        – APIs for accelerator developers: easier to integrate into big-data workloads, e.g. Spark and Hadoop
        – APIs for big-data application developers: require no knowledge about accelerators
      • Provide an accelerator management runtime
        – Currently supports FPGAs and GPUs
      ⇒ Blaze: a system providing Accelerator-as-a-Service
  12. YARN Today
      [Diagram: the client submits a job to the ResourceManager (RM); the ApplicationMaster (AM) sends resource requests; NodeManagers (NM) report node status and host the containers; application status flows back to the client.]
  13. Blaze: Accelerator Runtime System
      • Global Accelerator Manager (GAM): accelerator-centric scheduling
      • Node Accelerator Manager (NAM): local accelerator service management and JVM-to-ACC communication optimization
      [Diagram: the YARN picture extended with a GAM beside the RM and a NAM beside each NM; the NAMs report accelerator status and front each node's FPGA and GPU devices.]
  14. Details on Programming Interfaces and Runtime Implementation
  15. Interface for Spark
      Plain Spark (logistic regression):

        val points = sc.textFile().cache
        for (i <- 1 to ITERATIONS) {
          val gradient = points.map(p =>
            (1 / (1 + exp(-p.y * (w dot p.x))) - 1) * p.y * p.x
          ).reduce(_ + _)
          w -= gradient
        }

      With Blaze, the cached RDD is wrapped and the map closure is replaced by
      an Accelerator implementation:

        val points = blaze.wrap(sc.textFile().cache)
        for (i <- 1 to ITERATIONS) {
          val gradient = points.map(
            new LogisticGrad(w)
          ).reduce(_ + _)
          w -= gradient
        }

        class LogisticGrad(..) extends Accelerator[T, U] {
          val id: String = "Logistic"
        }

      blaze.wrap() turns the RDD into an AccRDD, whose compute() serializes the
      input data, communicates with the NAM, and deserializes the results.
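The wrapping above can be illustrated with a small, Spark-free sketch. All names here (Accelerator, AccWrapper, offload, Blaze.wrap) are hypothetical stand-ins chosen for illustration, not Blaze's actual classes, and a plain Seq stands in for the RDD:

        // Hypothetical sketch of the wrapping idea; not Blaze's real API.
        trait Accelerator[T, U] extends Serializable {
          def id: String        // matches the accelerator ID registered with the NAM
          def call(in: T): U    // CPU fallback when no accelerator is available
        }

        // Stand-in for AccRDD: try the accelerator service, else fall back to the JVM.
        class AccWrapper[T](data: Seq[T]) {
          def map[U](acc: Accelerator[T, U]): Seq[U] =
            offload[U](acc.id, data).getOrElse(data.map(acc.call))

          // Placeholder for the NAM round trip (serialize, send, deserialize).
          private def offload[U](id: String, in: Seq[T]): Option[Seq[U]] = None
        }

        object Blaze {
          def wrap[T](data: Seq[T]): AccWrapper[T] = new AccWrapper(data)
        }

Keeping a CPU path behind the same map call is what lets application code stay accelerator-agnostic: the only accelerator-specific piece is the string id used for dispatch.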
  16. Interface for Accelerators

        class LogisticACC : public Task {
         public:
          // extend the basic Task interface
          LogisticACC() : Task() {;}

          // overwrite the compute function
          virtual void compute() {
            // get input/output using provided APIs
            // perform computation
          }
        };
  17. Interface for Deployment
      • Accelerator services are managed through node labels
      • [YARN-796] allows for labels on nodes and resource requests
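As a rough illustration of how an accelerator ID can double as a YARN label, the sketch below builds a container request whose node-label expression is the accelerator ID. It assumes the stock Hadoop 2.6+ client API (the six-argument AMRMClient.ContainerRequest constructor that came with node-label support); the resource sizes are arbitrary and LabeledRequestSketch is not part of Blaze:

        import org.apache.hadoop.yarn.api.records.{Priority, Resource}
        import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest

        object LabeledRequestSketch {
          // Ask for one container on any node carrying the accelerator's label.
          def forAccelerator(accId: String): ContainerRequest =
            new ContainerRequest(
              Resource.newInstance(2048, 1),  // 2 GB, 1 vcore (arbitrary)
              null, null,                     // no explicit node/rack preferences
              Priority.newInstance(1),
              true,                           // relaxLocality
              accId)                          // node-label expression, e.g. "Logistic"
        }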
  18. Putting It All Together
      • Register
        – Interface to add an accelerator service to the corresponding nodes
      • Request
        – Use the acc_id as the label
        – The GAM allocates the corresponding nodes
      [Diagram: the user application gets ACC labels and container info from the Global ACC Manager, invokes the ACC through the Node ACC Manager, and exchanges input/output data with the FPGA and GPU accelerators.]
  19. Accelerator-as-a-Service
      • Logical accelerators
        – An accelerator function
        – The service exposed to applications
      • Physical accelerators
        – An implementation on a specific device (FPGA/GPU)
      [Diagram: inside the NAM, an application scheduler feeds per-function logical queues of tasks, and a task scheduler drains them into per-device physical queues for the FPGA and GPU.]
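The two-level queueing on this slide can be pictured with a toy data structure. The names (NodeAccManagerSketch, AccTask) and the least-loaded-device policy are assumptions made for illustration, not the NAM's actual scheduler:

        import scala.collection.mutable

        final case class AccTask(appId: String, payload: Array[Byte])

        class NodeAccManagerSketch {
          // Logical queues: one per accelerator function (e.g. "Logistic", "Kmeans").
          private val logical = mutable.Map.empty[String, mutable.Queue[AccTask]]
          // Physical queues: one per device (e.g. "fpga0", "gpu0").
          private val physical = mutable.Map.empty[String, mutable.Queue[AccTask]]
          // Which devices implement which logical function.
          private val placement = mutable.Map.empty[String, Seq[String]]

          def register(accId: String, devices: Seq[String]): Unit = {
            logical.getOrElseUpdate(accId, mutable.Queue.empty)
            devices.foreach(d => physical.getOrElseUpdate(d, mutable.Queue.empty))
            placement(accId) = devices
          }

          // Application side: tasks are enqueued per logical accelerator.
          def submit(accId: String, task: AccTask): Unit = logical(accId).enqueue(task)

          // Task-scheduler side: move the next task to the least-loaded device
          // that implements this accelerator function.
          def dispatch(accId: String): Unit =
            if (logical(accId).nonEmpty) {
              val device = placement(accId).minBy(d => physical(d).size)
              physical(device).enqueue(logical(accId).dequeue())
            }
        }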
  20. JVM-to-ACC Transfer Optimization
      • Double buffering / accelerator sharing
      • Data caching
        – On GPU/FPGA device memory
      • Broadcast
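Double buffering overlaps the JVM-to-device transfer of the next block with computation on the current one. In the sketch below, sendToDevice and runOnDevice are placeholders for the DMA transfer and the accelerator kernel, so only the overlap pattern itself is meaningful:

        import scala.concurrent.{Await, Future}
        import scala.concurrent.duration.Duration
        import scala.concurrent.ExecutionContext.Implicits.global

        object DoubleBufferSketch {
          // Stand-ins for the real transfer and kernel invocation.
          def sendToDevice(block: Array[Float]): Array[Float] = block
          def runOnDevice(block: Array[Float]): Float = block.sum

          def process(blocks: Vector[Array[Float]]): Vector[Float] = {
            require(blocks.nonEmpty)
            var inFlight = Future(sendToDevice(blocks.head))   // prefetch block 0
            blocks.indices.map { i =>
              val onDevice = Await.result(inFlight, Duration.Inf)
              // Start the next transfer before computing on the current block.
              if (i + 1 < blocks.length) inFlight = Future(sendToDevice(blocks(i + 1)))
              runOnDevice(onDevice)
            }.toVector
          }
        }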
  21. Global FPGA Allocation Optimization
      • Avoid reprogramming
      • GAM policy
        – Group the containers that need the same accelerator onto the same set of nodes
      [Diagram: with two nodes, placing App1 and App3 (which need the same accelerator) on one node and App2 and App4 on the other is a better allocation than interleaving them, which would force each FPGA to be reprogrammed between accelerators.]
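A toy version of the grouping policy: sort the pending container requests by the accelerator they need before handing out node slots, so requests for the same accelerator land on the same nodes and each FPGA keeps one configuration loaded. The names and the fixed per-node slot count are assumptions for illustration, not the GAM's real algorithm:

        object GamPlacementSketch {
          // requests: (appId, accId); each node offers `slots` containers.
          // Requests beyond the total capacity are simply left unplaced here.
          def place(requests: Seq[(String, String)],
                    nodes: Seq[String],
                    slots: Int): Map[String, Seq[String]] = {
            val slotList = nodes.flatMap(n => Seq.fill(slots)(n))
            // Sorting by accelerator ID makes same-accelerator requests neighbors,
            // so consecutive slots (i.e. the same nodes) serve the same accelerator.
            requests.sortBy(_._2).zip(slotList)
              .groupBy { case (_, node) => node }
              .map { case (node, pairs) => node -> pairs.map { case ((appId, _), _) => appId } }
          }
        }

With two nodes of two slots each and requests (App1, ACC1), (App2, ACC2), (App3, ACC1), (App4, ACC2), this places App1/App3 on one node and App2/App4 on the other, matching the better allocation in the diagram above.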
  22. Programming Effort Reduction
      Lines of code for accelerator management:
      • Logistic Regression (LR): 325 → 0
      • Kmeans (KM): 364 → 0
      • Compression (ZIP): 360 → 0
      • Genome Sequence Alignment (GSA): 896 → 0
  23. Heterogeneous Clusters
      • CDSC and Falcon clusters
        – Low-power GPUs
        – PCI-E FPGAs
      • Workloads
        – Genome sequencing
        – Machine learning
  24. System Performance and Energy Efficiency
      Improvement over the CPU baseline:
      • LR:  3.0x job latency, 2.6x energy
      • KM:  2.0x job latency, 1.8x energy
      • GSA: 1.7x job latency, 1.5x energy
      • ZIP: 3.2x job latency, 2.8x energy
      Overall: 1.7x ~ 3.2x speedup and 1.5x ~ 2.8x energy reduction.
  25. DEMO
  26. Takeaways
      • Accelerator deployment can be made easy
      • FPGAs require special considerations
      • The key to efficiency is reducing JVM-to-ACC overheads
        – Looking for new ideas
      • Blaze is an open-source project
        – Looking for collaboration
  27. About Falcon Computing
      [Diagram: accelerated applications (Spark/MapReduce/C/Java) run on top of the Falcon Accelerator Libraries; the Falcon Merlin Compiler provides offline customization and the Falcon Kestrel Runtime, with Blaze, provides runtime virtualization; customized platform support, management tools, and the accelerators sit underneath.]
  28. We thank our sponsors:
      • NSF/Intel Innovation Transition Grant awarded to the Center for Domain-Specific Computing
      • Intel for funding and machine donations
      • Xilinx for FPGA board donations
  29. THANK YOU.
      Di Wu, allwu@cs.ucla.edu
      Muhuan Huang, mhhuang@cs.ucla.edu
      Blaze: http://www.github.com/UCLA-VAST/blaze
      Center for Domain-Specific Computing: http://cdsc.ucla.edu
      Falcon Computing Solutions, Inc.: http://www.falcon-computing.com
