Using Alluxio as a Fault-tolerant Pluggable Optimization Component of's Computation Frameworks


Bing Bai and Tao Huang from
Alluxio NYC meetup 20180914

Published in: Technology

  1. 1. Using Alluxio as a fault-tolerant pluggable optimization component of's computation frameworks 2018-09-13 Bing Bai, Tao Huang,
  2. 2. Contents • 01 JD and BDP: introduction; BDP’s architecture and business • 02 JDPresto on Alluxio: the JD use case of Alluxio • 03 Ongoing Exploration: Alluxio on YARN, shuffle service, and storage-computing separation
  3. 3. JD & BDP (Big Data Platform)
  4. 4. JD Introduction • China’s largest retailer, online or offline • First Chinese internet company to make the Fortune Global 500 list • Strict “zero-tolerance” policy toward counterfeit goods. Customers trust JD because the brand is a guarantee of authenticity. [Chart: Rapid Growth in GMV in Last Six Years*, 2012 to 2017. Data labels: 13.4 billion, 23.5 billion, 46.8 billion, 93.3 billion, 144.5 billion, 199.1 billion. Caption: Sustained, Rapid Growth]
  5. 5. JD BDP Platform • Cluster scale: 30k+ nodes, offline cluster 18k+, 6000+ users • Computing ability: 40PB+ of offline data daily, 1 million+ jobs daily • Storage capacity: 450PB+, daily increase 500TB+ • Business capability: 40+ businesses, 450+ data models
  6. 6. JD BDP
  7. 7. JDPresto on Alluxio
  8. 8. JDPresto on Alluxio: advantages. When we use Alluxio for JDPresto, we make some changes and bring some good features. • Pluggable: Alluxio can be brought online or upgraded at any time; the business only notices queries running a little slower • Fault-tolerant: when Alluxio cannot be accessed, JDPresto accesses HDFS directly • Locality: reduces remote reads • Alluxio led to a 10x performance improvement • 100+ nodes • More than 1 year
  9. 9. JDPresto on Alluxio • Locality • Isolation • Load once, use every time [Diagram: Before / After]
  10. 10. JDPresto on Alluxio [Diagram: Presto, Alluxio, HDFS. Flow labels: Read Alluxio; Read HDFS data, cache to Alluxio; on an Alluxio access exception, access HDFS directly]
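The read path on slide 10 can be sketched roughly as follows. This is a minimal illustration, not JD's implementation: `InMemoryCache`, `CacheUnavailable`, and `read_with_fallback` are hypothetical names, and plain Python dictionaries stand in for Alluxio and HDFS.

```python
class CacheUnavailable(Exception):
    """Raised by the stand-in cache when it cannot be reached."""


class InMemoryCache:
    """Hypothetical stand-in for an Alluxio-like cache layer."""

    def __init__(self):
        self.blocks = {}
        self.available = True

    def _check(self):
        if not self.available:
            raise CacheUnavailable("cache layer is not reachable")

    def get(self, path):
        self._check()
        return self.blocks.get(path)  # None signals a cache miss

    def put(self, path, data):
        self._check()
        self.blocks[path] = data


def read_with_fallback(path, cache, under_store):
    """Return (data, source) for one read, following the slide's flow labels."""
    try:
        data = cache.get(path)
    except CacheUnavailable:
        # Cache exception: bypass the cache and read the under store directly.
        return under_store[path], "hdfs-direct"
    if data is not None:
        return data, "cache"
    # Cache miss: read from the under store, then try to cache the result.
    data = under_store[path]
    try:
        cache.put(path, data)
    except CacheUnavailable:
        pass  # Caching is best effort; the read itself already succeeded.
    return data, "hdfs-then-cached"


hdfs = {"/warehouse/orders/part-0": b"order rows"}  # dict standing in for HDFS
cache = InMemoryCache()
first = read_with_fallback("/warehouse/orders/part-0", cache, hdfs)
second = read_with_fallback("/warehouse/orders/part-0", cache, hdfs)
cache.available = False  # simulate the cache layer going offline
third = read_with_fallback("/warehouse/orders/part-0", cache, hdfs)
print(first[1], second[1], third[1])
```

In this sketch a cache outage only changes where the bytes come from, which matches the "a little slow" behaviour described for the pluggable design.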
  11. 11. JDPresto on Alluxio
  12. 12. JDPresto on Alluxio
  13. 13. JDPresto on Alluxio: Speed Contrast
  14. 14. JDPresto on Alluxio
  15. 15. Ongoing Exploration
  16. 16. Alluxio on YARN [Diagram components: ResourceManager, NodeManager, Alluxio AppMaster, Alluxio Master, Alluxio Worker, Client (Spark, Presto)] • Unified resource management • Better elasticity • Better configuration control and management
  17. 17. Shuffle Service on Alluxio • Disk I/O is a performance bottleneck • Not enough space on the local disk • No recalculation when an executor fails • A uniform data TTL ensures that temporary files are deleted
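The TTL point on slide 17 can be illustrated with a small sketch. It assumes one TTL applied uniformly to every temporary shuffle file; `TtlStore` and its methods are hypothetical and do not call any Alluxio API. Timestamps are passed in explicitly so the example is deterministic.

```python
class TtlStore:
    """Hypothetical store that deletes files once they are older than one TTL."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self.files = {}  # path -> (created_at_seconds, data)

    def write(self, path, data, now):
        self.files[path] = (now, data)

    def purge_expired(self, now):
        """Delete every file whose age has reached the TTL; return their paths."""
        expired = sorted(
            path
            for path, (created_at, _) in self.files.items()
            if now - created_at >= self.ttl_seconds
        )
        for path in expired:
            del self.files[path]
        return expired


store = TtlStore(ttl_seconds=3600)
store.write("/shuffle/app-1/map-0", b"...", now=0)
store.write("/shuffle/app-2/map-0", b"...", now=3000)
removed = store.purge_expired(now=3600)
print(removed, sorted(store.files))
```

Because the rule depends only on file age, leftover shuffle files from failed or finished jobs are cleaned up without any per-job bookkeeping.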
  18. 18. Shuffle Service on Alluxio [Diagram: Shuffle write phase: Map tasks write to Alluxio nodes. Shuffle read phase: Reduce tasks read from the Alluxio cluster]
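The two phases on slide 18 can be sketched as below. This is an assumption-laden toy, not the Spark or Alluxio code path: a dictionary stands in for the Alluxio cluster, `shuffle_write` and `shuffle_read` are hypothetical helpers, and integer keys are partitioned with a simple modulo.

```python
def shuffle_write(store, shuffle_id, map_id, records, num_reducers):
    """Write phase: split one map task's records by reducer and store each part."""
    partitions = {reduce_id: [] for reduce_id in range(num_reducers)}
    for key, value in records:
        partitions[key % num_reducers].append((key, value))
    for reduce_id, part in partitions.items():
        store[(shuffle_id, map_id, reduce_id)] = part


def shuffle_read(store, shuffle_id, reduce_id, num_maps):
    """Read phase: a reduce task collects its partition from every map output."""
    rows = []
    for map_id in range(num_maps):
        rows.extend(store[(shuffle_id, map_id, reduce_id)])
    return rows


shared_store = {}  # dict standing in for the shared Alluxio cluster
shuffle_write(shared_store, 0, 0, [(1, "a"), (2, "b")], num_reducers=2)
shuffle_write(shared_store, 0, 1, [(3, "c"), (4, "d")], num_reducers=2)
reducer0 = shuffle_read(shared_store, 0, 0, num_maps=2)
reducer1 = shuffle_read(shared_store, 0, 1, num_maps=2)
print(reducer0, reducer1)
```

Since map output lives in the shared store rather than on an executor's local disk, a reducer can still fetch it after the writing executor is gone, which is the motivation given on slide 17.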
  19. 19. Shuffle Service on Alluxio • spark-defaults.conf: spark.shuffle.service.enabled=true; = DistributedCache • Implement a DistributeCache implementation for shuffle • Re-implement org.apache.spark.shuffle.sort.SortShuffleWriter • Re-implement org.apache.spark.shuffle.sort.HashShuffleReader
  20. 20. Shuffle Service on Alluxio [Two charts: CPU usage (percent) over time. Caption: The comparison between Alluxio FUSE and Alluxio API]
  21. 21. Shuffle Service on Alluxio [Panels: Using Alluxio FUSE; Using Alluxio API]
  22. 22. Separate computing and storage [Diagram components: ResourceManager, NodeManager, DataNode, NameNode, Cluster1, Cluster2, Alluxio]
  23. 23. JD Contribution to Alluxio • PMC: 1 • Contributors: 6 • PRs: 50 • Merged PRs: 47 • Merged commits: 218 • Additions/Deletions: +4150/-2251
  24. 24. JD Contribution to Alluxio • New WebUI: ui-grid based sort/pagination/filter; add an input field • Watermark evict strategy: start evicting at the high watermark, stop evicting at the low watermark • Consistency: check at startup; check every time • JVM Pause Monitor: monitor JVM pauses; periodically log messages and metrics • Shell Command: cp/ls/load/rm/format • Bug fix: deadlock; Thrift timeout added; … • Change Log Level: shell; RESTful API • AlluxioTools: SyncQuery • … • Test
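The watermark evict strategy listed on slide 24 (start evicting at the high watermark, stop at the low watermark) can be sketched as follows. The slide does not say which blocks are evicted first, so this sketch assumes oldest-inserted-first; `WatermarkCache` and its parameters are hypothetical and are not Alluxio's evictor API.

```python
from collections import OrderedDict


class WatermarkCache:
    """Hypothetical cache that evicts in batches between two watermarks."""

    def __init__(self, high_bytes, low_bytes):
        if not 0 <= low_bytes < high_bytes:
            raise ValueError("need 0 <= low_bytes < high_bytes")
        self.high_bytes = high_bytes
        self.low_bytes = low_bytes
        self.blocks = OrderedDict()  # block id -> size, oldest first
        self.used = 0

    def put(self, block_id, size):
        """Store a block; return the ids evicted as a result (possibly none)."""
        if block_id in self.blocks:
            self.used -= self.blocks.pop(block_id)
        self.blocks[block_id] = size
        self.used += size
        evicted = []
        # Eviction starts only once usage rises above the high watermark...
        if self.used > self.high_bytes:
            # ...and then continues until usage is back at the low watermark.
            while self.blocks and self.used > self.low_bytes:
                old_id, old_size = self.blocks.popitem(last=False)
                self.used -= old_size
                evicted.append(old_id)
        return evicted


cache = WatermarkCache(high_bytes=90, low_bytes=70)
history = [cache.put(name, 10) for name in "abcdefghij"]
print(history[-1], cache.used)
```

The gap between the two thresholds means eviction runs in occasional batches rather than on every write once the cache is nearly full.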
  25. 25. Thank You!