Using Alluxio as a fault-tolerant pluggable optimization
component of JD.com's computation frameworks
2018-09-13
Bing Bai, JD.com
Tao Huang, JD.com
Contents
01 JD and BDP: introduction to JD.com and BDP's architecture and business
02 JDPresto on Alluxio: the JD use case of Alluxio
03 Ongoing Exploration: Alluxio on YARN, shuffle service, and storage-computing separation
JD & BDP (Big Data Platform)
JD Introduction
• China’s largest retailer, online or offline
• First Chinese internet company to make the Fortune Global 500 list
• Strict “zero-tolerance” policy toward counterfeit goods. Customers trust JD because the brand is a guarantee of authenticity.
[Chart: Rapid Growth in GMV in Last Six Years* (2012 to 2017). Data labels, in ascending order: 13.4, 23.5, 46.8, 93.3, 144.5, and 199.1 billion. Sustained, rapid growth.]
JD BDP Platform
• Cluster scale: 30k+ nodes, offline cluster 18k+, 6000+ users
• Computing ability: offline data 40PB+ daily, 1 million+ jobs daily
• Storage capacity: 450PB+, daily increase 500TB+
• Business capability: 40+ businesses, 450+ data models
JD BDP
JDPresto on Alluxio
Advantages of JDPresto on Alluxio
When we adopted Alluxio for JDPresto, we made some changes that brought several useful features.
• Pluggable: Alluxio can be brought online or upgraded at any time; the business only notices that queries are slightly slower.
• Fault-tolerant: when Alluxio cannot be accessed, JDPresto accesses HDFS directly.
• Locality: reduces remote reads.
• Alluxio led to a 10x performance improvement
• 100+ nodes
• In use for more than 1 year
JDPresto on Alluxio
• Locality
• Isolation
• Load once, use every time
[Figure: Before / After comparison]
JDPresto on Alluxio
[Diagram: data flow between Presto, Alluxio, and HDFS]
• Normal path: read HDFS, cache the data to Alluxio, then read from Alluxio
• Alluxio access exception: access HDFS directly
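The read path on this slide can be sketched as a read-through cache with a direct fallback. This is only a minimal illustration, not JDPresto's actual code: `DictStore`, `StoreUnavailable`, and `read_with_fallback` are hypothetical names, and in-memory dictionaries stand in for Alluxio and HDFS.

```python
class StoreUnavailable(Exception):
    """Raised when a storage layer cannot be reached (hypothetical)."""


class DictStore:
    """In-memory stand-in for Alluxio or HDFS, for illustration only."""

    def __init__(self, data=None, available=True):
        self.data = dict(data or {})
        self.available = available

    def read(self, path):
        if not self.available:
            raise StoreUnavailable(path)
        return self.data.get(path)  # None means "not present"

    def write(self, path, value):
        if not self.available:
            raise StoreUnavailable(path)
        self.data[path] = value


def read_with_fallback(path, cache, backing):
    """Return (value, source) following the flow on the slide."""
    try:
        cached = cache.read(path)
    except StoreUnavailable:
        # Cache layer is down: bypass it and read the backing store directly.
        return backing.read(path), "hdfs-direct"
    if cached is not None:
        return cached, "alluxio"
    # Cache miss: read the backing store, then populate the cache.
    value = backing.read(path)
    try:
        cache.write(path, value)
    except StoreUnavailable:
        pass  # Caching is best effort; the read itself still succeeds.
    return value, "hdfs-then-cached"
```

With an empty cache, the first read of a path is served from the backing store and cached, the second is served from the cache, and marking the cache unavailable sends every read straight to the backing store. The caller never sees the cache failure, which is the sense in which the cache layer is pluggable.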
JDPresto on Alluxio
Speed Comparison
Ongoing Exploration
Alluxio on YARN
[Diagram: a YARN ResourceManager and NodeManagers hosting the Alluxio AppMaster, Alluxio Masters, and Alluxio Workers; the clients are Spark and Presto]
• Unified resource management
• Better elasticity
• Better configuration control and management
Shuffle Service on Alluxio
• Disk I/O performance bottleneck
• Not enough space on the local disk
• No recomputation when an executor fails
• A uniform data TTL ensures that temporary files are deleted
Shuffle Service on Alluxio
[Diagram]
• Shuffle Write phase: Map tasks write to Alluxio nodes
• Shuffle Read phase: Reduce tasks read from the Alluxio cluster
Shuffle Service on Alluxio
spark-defaults.conf
spark.shuffle.service.enabled=true
spark.shuffle.store.type=DistributedCache
spark.shuffle.store.distributed.cache.url=alluxio://bdp.jd.com
• Implement a DistributedCache implementation for shuffle
• Re-implement org.apache.spark.shuffle.sort.SortShuffleWriter
• Re-implement org.apache.spark.shuffle.sort.HashShuffleReader
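The idea behind a shuffle store on a shared cache can be sketched as follows. This is not the JD implementation or Spark's API: `DistributedCacheStore`, `shuffle_write`, `shuffle_read`, the path layout, and the CRC-based partitioner are all assumptions made for illustration, with a dictionary standing in for Alluxio. Because map output lives outside the executor, a reducer can still read it after the writing executor is gone, and a single TTL expires leftover shuffle data.

```python
import time
import zlib


class DistributedCacheStore:
    """Dict-backed stand-in for a shared store such as Alluxio (hypothetical)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.blocks = {}  # path -> (expiry_time, records)

    def put(self, path, records):
        self.blocks[path] = (self.clock() + self.ttl, list(records))

    def get(self, path):
        entry = self.blocks.get(path)
        if entry is None or entry[0] <= self.clock():
            self.blocks.pop(path, None)  # drop expired data
            return []
        return entry[1]


def partition_of(key, num_reducers):
    # Deterministic partitioner; CRC32 is an illustrative choice.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle_write(store, shuffle_id, map_id, records, num_reducers):
    """Map side: bucket records by reducer and write one block per bucket."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in records:
        buckets[partition_of(key, num_reducers)].append((key, value))
    for reduce_id, bucket in enumerate(buckets):
        store.put(f"/shuffle/{shuffle_id}/{map_id}/{reduce_id}", bucket)


def shuffle_read(store, shuffle_id, num_maps, reduce_id):
    """Reduce side: collect this reducer's block from every map task."""
    out = []
    for map_id in range(num_maps):
        out.extend(store.get(f"/shuffle/{shuffle_id}/{map_id}/{reduce_id}"))
    return out
```

Each map task calls `shuffle_write` once, and each reduce task calls `shuffle_read` with its own `reduce_id`; neither side touches local disk in this sketch.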
Shuffle Service on Alluxio
Comparison between Alluxio FUSE and the Alluxio API
[Figure: two charts of CPU usage (percent) over time]
Shuffle Service on Alluxio
[Figure: panels "Using Alluxio FUSE" and "Using Alluxio API"]
Separate computing and storage
[Diagram: Cluster1 and Cluster2, made up of ResourceManagers, NodeManagers, a NameNode, and DataNodes, with Alluxio between them]
JD Contribution to Alluxio
PMC 1
Contributor 6
PR 50
Merged PR 47
Merged Commit 218
Additions/Deletions +4150/-2251
JD Contribution to Alluxio
• New WebUI: ui-grid based; sort/pagination/filter; added an input field
• Watermark evict strategy: start evicting at the high watermark, stop evicting at the low watermark
• Consistency: check at startup; check every time
• JVM Pause Monitor: monitor JVM pauses periodically; log messages and metrics
• Shell Command: cp/ls/load/rm/format
• Bug fix: deadlock; added a timeout to Thrift; …
• Change Log Level: Shell; RESTful API
• Test: SyncQuery; AlluxioTools; …
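The watermark evict strategy listed above (start evicting at the high watermark, stop at the low watermark) can be sketched like this. It is a minimal illustration rather than Alluxio's implementation: `WatermarkCache`, the default thresholds, and the oldest-first victim order are assumptions.

```python
from collections import OrderedDict


class WatermarkCache:
    """Tracks block sizes and evicts between two watermarks (hypothetical)."""

    def __init__(self, capacity_bytes, high=0.9, low=0.7):
        if not 0 < low < high <= 1:
            raise ValueError("need 0 < low < high <= 1")
        self.high_bytes = capacity_bytes * high
        self.low_bytes = capacity_bytes * low
        self.used = 0
        self.blocks = OrderedDict()  # block_id -> size, oldest first

    def put(self, block_id, size):
        """Store a block and return the ids evicted as a result."""
        if block_id in self.blocks:
            self.used -= self.blocks.pop(block_id)
        self.blocks[block_id] = size
        self.used += size
        evicted = []
        # Eviction starts only once usage exceeds the high watermark...
        if self.used > self.high_bytes:
            # ...and continues until usage is at or below the low watermark.
            while self.used > self.low_bytes and self.blocks:
                victim, victim_size = self.blocks.popitem(last=False)
                self.used -= victim_size
                evicted.append(victim)
        return evicted
```

The gap between the two thresholds is the point of the design: one eviction pass frees a batch of space, so the cache does not evict on every write once it is nearly full.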
Thank You!
baibing3@jd.com
huangtao6@jd.com
