Using Alluxio as a Fault-tolerant Pluggable Optimization Component of JD.com's Computation Frameworks

2018-09-13
Bing Bai, JD.com
Tao Huang, JD.com

Contents
01 JD and BDP: introducing JD.com and BDP's architecture and business
02 JDPresto on Alluxio: the JD use case of Alluxio
03 Ongoing Exploration: Alluxio on YARN, shuffle service, and storage-computing separation

JD & BDP (Big Data Platform)

JD Introduction
• China’s largest retailer, online or offline
• First Chinese internet company to make the Fortune Global 500 list
• Strict “zero-tolerance” policy toward counterfeit goods. Customers trust JD because the brand is a guarantee of authenticity.
[Chart: Rapid Growth in GMV in Last Six Years* (2012 to 2017). Sustained, rapid growth; values shown: 13.4, 23.5, 46.8, 93.3, 144.5, and 199.1 billion.]

JD BDP Platform
Cluster scale: 30k+ nodes (offline cluster: 18k+ nodes); 6,000+ users
Computing ability: 40PB+ of offline data daily; 1 million+ jobs daily
Storage capacity: 450PB+, increasing by 500TB+ daily
Business capability: 40+ businesses; 450+ data models

JDPresto on Alluxio
JDPresto on Alluxio: advantages
When using Alluxio for JDPresto, we made some changes that brought useful features:
• Pluggable: Alluxio can be brought online or upgraded at any time; the business only notices that queries run a little slower.
• Fault-tolerant: when Alluxio cannot be accessed, JDPresto accesses HDFS directly.
• Locality: fewer remote reads.
Results and scale:
• Alluxio led to a 10x performance improvement
• 100+ nodes
• More than 1 year in use

JDPresto on Alluxio
[Figure: Locality and Isolation, before and after. Data is loaded once and used every time.]
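The locality point can be pictured as a worker-selection rule: read a cached block from the Alluxio worker on the same host when one holds it, and only otherwise read remotely. The Java sketch below is illustrative only, not JDPresto's code; `LocalityPolicy`, `chooseWorker`, and the modeling of block locations as a list of host names are all hypothetical.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical locality-aware choice of the Alluxio worker to read a cached block from. */
final class LocalityPolicy {
    private LocalityPolicy() {}

    /**
     * Prefers the worker on the reader's own host (a local read); otherwise returns the
     * first remote host that holds the block. Empty means the block is not cached anywhere.
     */
    static Optional<String> chooseWorker(String localHost, List<String> hostsWithBlock) {
        if (hostsWithBlock.contains(localHost)) {
            return Optional.of(localHost);
        }
        return hostsWithBlock.stream().findFirst();
    }
}
```

An empty result is where a caller would go to the under storage instead of any Alluxio worker.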

JDPresto on Alluxio
Read path (Presto, Alluxio, HDFS):
• Normal path: read HDFS, cache the data in Alluxio, then read from Alluxio.
• On an Alluxio access exception: access HDFS directly.
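The read path above amounts to a read-through cache with a direct fallback. The sketch below assumes a hypothetical `BlockStore` interface standing in for both the Alluxio and HDFS clients (it is not either project's API) and uses in-memory stores so the behavior can be exercised without a cluster.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical storage abstraction standing in for both the Alluxio and the HDFS client. */
interface BlockStore {
    Optional<byte[]> get(String path) throws IOException;

    void put(String path, byte[] data) throws IOException;
}

/** In-memory test double; setting available = false simulates an unreachable store. */
class InMemoryStore implements BlockStore {
    private final Map<String, byte[]> data = new HashMap<>();
    boolean available = true;
    int reads = 0; // counts successful get() calls, to show which store served a read

    @Override
    public Optional<byte[]> get(String path) throws IOException {
        if (!available) {
            throw new IOException("store unavailable");
        }
        reads++;
        return Optional.ofNullable(data.get(path));
    }

    @Override
    public void put(String path, byte[] bytes) throws IOException {
        if (!available) {
            throw new IOException("store unavailable");
        }
        data.put(path, bytes);
    }
}

/** Read-through cache (Alluxio role) with direct fallback to the backing store (HDFS role). */
class FallbackReader {
    private final BlockStore cache;
    private final BlockStore backing;

    FallbackReader(BlockStore cache, BlockStore backing) {
        this.cache = cache;
        this.backing = backing;
    }

    byte[] read(String path) throws IOException {
        Optional<byte[]> hit;
        try {
            hit = cache.get(path);
        } catch (IOException cacheDown) {
            return readBacking(path); // Alluxio access exception: read HDFS directly
        }
        if (hit.isPresent()) {
            return hit.get(); // cached earlier: load once, use every time
        }
        byte[] bytes = readBacking(path); // cache miss: read HDFS ...
        try {
            cache.put(path, bytes); // ... and cache the data for later reads
        } catch (IOException ignored) {
            // Caching is best-effort; the query still gets its data.
        }
        return bytes;
    }

    private byte[] readBacking(String path) throws IOException {
        return backing.get(path).orElseThrow(() -> new IOException("not found: " + path));
    }
}
```

Treating the cache write as best-effort is a design choice of this sketch: a failed `put` costs a later cache miss rather than a failed query, in line with the pluggable, fault-tolerant behavior described earlier.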

JDPresto on Alluxio
[Chart: Speed comparison]

Alluxio on YARN
[Figure: Alluxio on YARN. Components: the YARN ResourceManager and NodeManagers; an Alluxio AppMaster, Alluxio Masters, and Alluxio Workers; and clients such as Spark and Presto.]
• Unified resource management
• Better elasticity
• Better configuration control and management

Shuffle Service on Alluxio
• Disk I/O performance bottleneck
• Not enough space on the local disk
• Executor failures do not force recomputation
• A uniform data TTL ensures that temporary files are deleted
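A uniform TTL gives every temporary shuffle file the same lifetime, so files are removed even if the application that wrote them never cleans up. The sketch below assumes the cleaner tracks creation times itself; `UniformTtlCleaner` and its methods are hypothetical, and a real deployment could instead rely on Alluxio's own per-path TTL support (for example its `setTtl` shell command).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical cleaner that applies one uniform TTL to all temporary shuffle files.
 * Creation times are passed in explicitly so the logic is easy to test.
 */
class UniformTtlCleaner {
    private final long ttlMillis;
    private final Map<String, Long> createdAt = new HashMap<>();

    UniformTtlCleaner(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Starts tracking a temporary file written at createdAtMillis. */
    void register(String path, long createdAtMillis) {
        createdAt.put(path, createdAtMillis);
    }

    /** Stops tracking and returns every file whose age has reached the TTL at nowMillis. */
    List<String> expire(long nowMillis) {
        List<String> expired = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = createdAt.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() >= ttlMillis) {
                expired.add(entry.getKey()); // a real cleaner would delete this path from Alluxio
                it.remove();
            }
        }
        return expired;
    }

    int trackedCount() {
        return createdAt.size();
    }
}
```

In a service, `expire(System.currentTimeMillis())` would be called periodically, for example from a scheduled executor.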

[Figure: In the shuffle write phase, Map tasks write their output to Alluxio nodes; in the shuffle read phase, Reduce tasks read it from the Alluxio cluster.]

spark-defaults.conf:
spark.shuffle.service.enabled=true
spark.shuffle.store.type=DistributedCache
spark.shuffle.store.distributed.cache.url=alluxio://bdp.jd.com
Implement a DistributedCache backend for shuffle
Re-implement org.apache.spark.shuffle.sort.SortShuffleWriter
Re-implement org.apache.spark.shuffle.sort.HashShuffleReader
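The configuration above routes shuffle data to a distributed cache at `alluxio://bdp.jd.com`. The sketch below only illustrates the resulting data layout and does not reproduce JD's re-implemented `SortShuffleWriter` or `HashShuffleReader`: the `DistributedCacheShuffleStore` class, its methods, and the `shuffle_/reduce_/map_` path scheme are hypothetical, and a `TreeMap` stands in for the Alluxio namespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/**
 * Hypothetical sketch of shuffle blocks kept in a distributed cache instead of executor-local disk.
 * The TreeMap stands in for the Alluxio namespace; the path layout is an assumption.
 */
class DistributedCacheShuffleStore {
    private final String baseUrl;
    private final TreeMap<String, byte[]> files = new TreeMap<>();

    DistributedCacheShuffleStore(Properties conf) {
        String type = conf.getProperty("spark.shuffle.store.type", "").trim();
        if (!type.equals("DistributedCache")) {
            throw new IllegalArgumentException("unsupported shuffle store type: " + type);
        }
        String url = conf.getProperty("spark.shuffle.store.distributed.cache.url");
        if (url == null) {
            throw new IllegalArgumentException("missing spark.shuffle.store.distributed.cache.url");
        }
        this.baseUrl = url.trim();
    }

    /** Path of the block that map task mapId produced for reduce partition reduceId. */
    String blockPath(int shuffleId, int mapId, int reduceId) {
        return reducePrefix(shuffleId, reduceId) + "map_" + mapId;
    }

    private String reducePrefix(int shuffleId, int reduceId) {
        return baseUrl + "/shuffle_" + shuffleId + "/reduce_" + reduceId + "/";
    }

    /** Shuffle write phase: one map task stores one block per reduce partition. */
    void writeMapOutput(int shuffleId, int mapId, byte[][] partitions) {
        for (int reduceId = 0; reduceId < partitions.length; reduceId++) {
            files.put(blockPath(shuffleId, mapId, reduceId), partitions[reduceId]);
        }
    }

    /** Shuffle read phase: a reducer fetches every map block for its partition, in path order. */
    List<byte[]> readReduceInput(int shuffleId, int reduceId) {
        String prefix = reducePrefix(shuffleId, reduceId);
        List<byte[]> blocks = new ArrayList<>();
        // Keys sharing a prefix are contiguous in sorted order, so scan from the prefix onward.
        for (Map.Entry<String, byte[]> entry : files.tailMap(prefix, true).entrySet()) {
            if (!entry.getKey().startsWith(prefix)) {
                break;
            }
            blocks.add(entry.getValue());
        }
        return blocks;
    }
}
```

Because blocks live in the cache rather than on an executor's local disk, a reducer can still fetch a block after the executor that wrote it has gone away, which is the property behind the executor-failure point on the previous slide.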

Shuffle Service on Alluxio
[Chart: The comparison between Alluxio FUSE and Alluxio API. Two panels, "Using Alluxio FUSE" and "Using Alluxio API", each plotting CPU usage (percent) over time.]

Separate computing and storage
[Figure: Two clusters, Cluster1 and Cluster2, with YARN ResourceManagers and NodeManagers, an HDFS NameNode and DataNodes, and Alluxio.]

JD Contribution to Alluxio
PMC members: 1
Contributors: 6
PRs: 50
Merged PRs: 47
Merged commits: 218
Additions/Deletions: +4150/-2251

JD Contribution to Alluxio
New WebUI: ui-grid based, with sort/pagination/filter and an added input field
Watermark evict strategy: start evicting at the high watermark; stop evicting at the low watermark
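The two-watermark eviction strategy can be sketched as follows: eviction starts once usage reaches the high watermark and stops once usage falls to the low watermark. `WatermarkCache` and its methods are hypothetical, and least-recently-used victim selection is an assumption, since the slide does not say which blocks are evicted first.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical block cache with a two-watermark eviction policy. Victims are chosen
 * least-recently-used first (an assumption; the slide does not name the victim order).
 */
class WatermarkCache {
    private final long highBytes;
    private final long lowBytes;
    private long used = 0;
    // accessOrder = true: iteration runs from least- to most-recently accessed entry
    private final LinkedHashMap<String, Long> blocks = new LinkedHashMap<>(16, 0.75f, true);

    WatermarkCache(long capacityBytes, double highRatio, double lowRatio) {
        if (!(0 < lowRatio && lowRatio < highRatio && highRatio <= 1)) {
            throw new IllegalArgumentException("need 0 < low < high <= 1");
        }
        this.highBytes = Math.round(capacityBytes * highRatio);
        this.lowBytes = Math.round(capacityBytes * lowRatio);
    }

    void put(String blockId, long sizeBytes) {
        Long previous = blocks.put(blockId, sizeBytes);
        if (previous != null) {
            used -= previous;
        }
        used += sizeBytes;
        if (used >= highBytes) {
            evictToLowWatermark(); // high watermark reached: start evicting
        }
    }

    /** Records a read so that the block moves to the most-recently-used end. */
    void touch(String blockId) {
        blocks.get(blockId);
    }

    private void evictToLowWatermark() {
        Iterator<Map.Entry<String, Long>> it = blocks.entrySet().iterator();
        while (used > lowBytes && it.hasNext()) { // stop once at or below the low watermark
            Map.Entry<String, Long> victim = it.next();
            used -= victim.getValue();
            it.remove();
        }
    }

    long used() {
        return used;
    }

    boolean contains(String blockId) {
        return blocks.containsKey(blockId);
    }
}
```

The gap between the two watermarks keeps eviction from running on every write: with hypothetical ratios of 0.9 and 0.7, one eviction pass leaves at least 20% of capacity free before the next pass starts.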
Consistency: check at startup; check every time
JVM Pause Monitor: monitor JVM pauses periodically; log messages and metrics
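One common way to build a JVM pause monitor is the sleep-and-measure loop that Hadoop's `JvmPauseMonitor` uses: sleep for a short interval and treat any time beyond it as a pause. The class below is a hypothetical minimal sketch, not the contributed Alluxio code; its name, thresholds, and `java.util.logging` output are assumptions.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Logger;

/**
 * Hypothetical sleep-and-measure pause detector. If a short sleep takes much longer than
 * requested, the extra time is attributed to a pause (for example a stop-the-world GC)
 * and is logged and counted in simple metrics.
 */
class PauseMonitor implements Runnable {
    private static final Logger LOG = Logger.getLogger(PauseMonitor.class.getName());

    private final long sleepMillis;
    private final long warnThresholdMillis;
    private volatile boolean running = true;

    // Simple metrics; a real service would export these to its metrics system.
    final AtomicLong pauseCount = new AtomicLong();
    final AtomicLong totalPauseMillis = new AtomicLong();

    PauseMonitor(long sleepMillis, long warnThresholdMillis) {
        this.sleepMillis = sleepMillis;
        this.warnThresholdMillis = warnThresholdMillis;
    }

    /** Time beyond the requested sleep, in milliseconds; never negative. */
    static long extraSleepMillis(long startNanos, long endNanos, long sleepMillis) {
        long elapsedMillis = (endNanos - startNanos) / 1_000_000L;
        return Math.max(0L, elapsedMillis - sleepMillis);
    }

    /** Logs and counts one measurement if it reaches the threshold; returns whether it did. */
    boolean record(long extraMillis) {
        if (extraMillis < warnThresholdMillis) {
            return false;
        }
        pauseCount.incrementAndGet();
        totalPauseMillis.addAndGet(extraMillis);
        LOG.warning("Detected a pause of approximately " + extraMillis + " ms");
        return true;
    }

    @Override
    public void run() {
        while (running) {
            long start = System.nanoTime();
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            record(extraSleepMillis(start, System.nanoTime(), sleepMillis));
        }
    }

    void stop() {
        running = false;
    }
}
```

A service could run it on a daemon thread, for example `Thread t = new Thread(new PauseMonitor(500, 1000)); t.setDaemon(true); t.start();`, where 500 ms and 1000 ms are illustrative values.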
Shell Command: cp/ls/load/rm/format
Bug fix: deadlock; add a Thrift timeout; …
Test: Shell; RESTful API; Change Log Level; SyncQuery; AlluxioTools; …

Thank You!
baibing3@jd.com
huangtao6@jd.com
