How Spark Fits Baidu’s Scale
Dr. James Peng
Principal Architect, Baidu U.S.A.
Baidu is Leading Chinese Search Engine
No. 1
Website in China
By average daily visitors
and page views
No. 1
Chinese search engine
By number of search
question conducted
No. 1
Most valuable brand
Among private sector
companies in China
73
%
Search market share
in China
$73
bn.
Market capitalization
Premium global
Internet company
$2 bn.
Q1 2015 total revenues
By number of search
question conducted
47,000
Employees
Top 3 China’s
best employer
Baidu’s Big Data Infrastructure
Baidu’s History in Big Data Infrastructure
2003 2008 2009 2010 2011 2012 2013 2014
Distributed
Search
System
Hadoop in
production
Distributed
webpage storage:
storing over 100
billion web pages
Large-scale
machine learning
platform
> 10000 nodes
Distributed
computing engine
in production
Real-time
streaming system
with delay in
milliseconds range
Large-scale
deployment of in-house
developed network
switcher and ARM
servers
Large scale
DNN: supporting
over 100 billion
features
Largest MR Cluster
0
2000
4000
6000
8000
10000
12000
14000
2008 2009 2010 2011 2012 2013
MR Cluster Size
Number of Nodes
0
200000
400000
600000
800000
1000000
1200000
2008 2009 2010 2011 2012 2013
MR Cluster Daily Load
Number of Jobs
Baidu’s Distributed Shuffle Engine
http://sortbenchmark.org/BaiduSort2014.pdf
Champion of 2014 Sort Benchmark
Indy
2014, 8.38 TB/min
BaiduSort
100 TB in 716 seconds 
982 nodes x 
(2 2.10Ghz Intel Xeon E5-2450, 192 GB memory, 8x3TB
7200 RPM SATA) 
Dasheng Jiang 
Baidu Inc. and Peking University
Baidu Proprietary Platforms
DCE/DAG
•  DAG model
Distributed
Computing
Engines
•  Outperforms
open-source
MR by at
least 30%
DSTREAM
•  Distributed real-
time
computation
engine
•  99.99% of
tasks with
latency less
than 50 ms
TM
•  mini-batch
execution
engine
•  Transaction-
based,
guarantee
exactly once
delivery
PADDLE
•  PArallel
Distributed
Deep Learning
Engine
•  With CPU /
GPU support
and suitable
for vision,
speech, and
NLP
workloads
ELF
•  Essential
Learning
Framework
•  Dataflow-
based
programming
model
•  Parameter
server able to
store > 1
trillion
parameters
THERE’S STRONG, AND THEN THERE’S BAIDU STRONG
A Spark in Baidu
•  Scale: > 1000 nodes(20000 Cores, 100 TB Memory)
•  Daily Jobs:2000~3000
•  Supported Business Units: Ads, Search, Map, Commerce, etc
Initial
Evaluation
Spark
0.8
Internal
Beta
Usage
Spark
0.9
Beta
Cluster
with 200
nodes
Spark
1.0
Incubation
Integrate
into Baidu
Open
Cloud
Spark
1.1
Spark SQL
in
Production
Spark
1.2
Continue
Scaling
Spark
1.3
Production
and Scaling
Spark’s History in Baidu
Spark in Baidu Internal Infrastructure
SparkDSTREAM TM DCE/DAG ELF Compute
Layer
Baidu Data Centers + Resource Management + Resource Scheduling
Resource
Layer
Dataflow Compiler
Java C++ Python Scala SQL
Programming
Layer
Spark in Baidu Open Cloud
Baidu Cloud Cluster
HDFS
Map Reduce
Baidu Object Storage
HBase
Pig Hive SQL Streaming GraphX MLLib
BMR Console BMR SDK Audit
Scheduler
Management
http://bce.baidu.com/
Enabling Interactive Queries with
Spark and Tachyon
Need for Interaction (Speed)
Problem
•  John is a PM and he needs to keep track of the top queries submitted
to Baidu everyday
•  Based on the top queries of the day, he will perform additional analysis
•  But John is very frustrated that each query takes tens of minutes
Requirements for Baidu Interactive Query Engine
•  Manages Petabytes of data
•  Able to finish 95% of queries within 30 seconds
Query UI
Data Warehouse
Baidu File System
Storage Layer
Tachyon	
  Hot Data Layer
Baidu Interactive Query Architecture
Compute Layer Meta Store
HiveQL
Spark SQL
UDFs SerDes
Apache Spark
Baidu Interactive Query Performance
> 50X Acceleration of Big
Data Analytics Workloads
0
200
400
600
800
1000
1200
MR (sec) Spark (sec) Spark + Tachyon
(sec)
Setup:
1.  Use MR to query 6 TB of data
2.  Use Spark to query 6 TB of data
3.  Use Spark + Tachyon to query 6 TB
of data
Summary
Spark and Tachyon provides
significant performance advantage
•  Suitable for latency
sensitive workloads,
such as interactive ad-
hoc query
Takes time for Spark to reach MR
scale
•  We still use MR for
high throughput
batch workloads
•  Deployment Scale:
Spark 1000 nodes vs.
MR 13000 nodes
•  Memory Management
is one of the main
blockers of scalability
Look forward to Spark
Improvements
•  Project Tungsten:
better memory
management: main
blocker of scalability
•  Continuous
improvement of
Catalyst
•  Support for multiple
meta stores
Looking Into the Future
Spark + TachyonSystems Software
FPGA + GPU
Acceleration
Hardware Faster Network
Spark
SQL
Applications DNN Spark R
Spark
MlLib
Speed
Functionality

How Spark Fits into Baidu's Scale-(James Peng, Baidu)

  • 1.
    How Spark FitsBaidu’s Scale Dr. James Peng Principal Architect, Baidu U.S.A.
  • 2.
    Baidu is LeadingChinese Search Engine No. 1 Website in China By average daily visitors and page views No. 1 Chinese search engine By number of search question conducted No. 1 Most valuable brand Among private sector companies in China 73 % Search market share in China $73 bn. Market capitalization Premium global Internet company $2 bn. Q1 2015 total revenues By number of search question conducted 47,000 Employees Top 3 China’s best employer
  • 3.
    Baidu’s Big DataInfrastructure
  • 4.
    Baidu’s History inBig Data Infrastructure 2003 2008 2009 2010 2011 2012 2013 2014 Distributed Search System Hadoop in production Distributed webpage storage: storing over 100 billion web pages Large-scale machine learning platform > 10000 nodes Distributed computing engine in production Real-time streaming system with delay in milliseconds range Large-scale deployment of in-house developed network switcher and ARM servers Large scale DNN: supporting over 100 billion features
  • 5.
    Largest MR Cluster 0 2000 4000 6000 8000 10000 12000 14000 20082009 2010 2011 2012 2013 MR Cluster Size Number of Nodes 0 200000 400000 600000 800000 1000000 1200000 2008 2009 2010 2011 2012 2013 MR Cluster Daily Load Number of Jobs
  • 6.
    Baidu’s Distributed ShuffleEngine http://sortbenchmark.org/BaiduSort2014.pdf Champion of 2014 Sort Benchmark Indy 2014, 8.38 TB/min BaiduSort 100 TB in 716 seconds  982 nodes x  (2 2.10Ghz Intel Xeon E5-2450, 192 GB memory, 8x3TB 7200 RPM SATA)  Dasheng Jiang  Baidu Inc. and Peking University
  • 7.
    Baidu Proprietary Platforms DCE/DAG • DAG model Distributed Computing Engines •  Outperforms open-source MR by at least 30% DSTREAM •  Distributed real- time computation engine •  99.99% of tasks with latency less than 50 ms TM •  mini-batch execution engine •  Transaction- based, guarantee exactly once delivery PADDLE •  PArallel Distributed Deep Learning Engine •  With CPU / GPU support and suitable for vision, speech, and NLP workloads ELF •  Essential Learning Framework •  Dataflow- based programming model •  Parameter server able to store > 1 trillion parameters THERE’S STRONG, AND THEN THERE’S BAIDU STRONG
  • 8.
  • 9.
    •  Scale: >1000 nodes(20000 Cores, 100 TB Memory) •  Daily Jobs:2000~3000 •  Supported Business Units: Ads, Search, Map, Commerce, etc Initial Evaluation Spark 0.8 Internal Beta Usage Spark 0.9 Beta Cluster with 200 nodes Spark 1.0 Incubation Integrate into Baidu Open Cloud Spark 1.1 Spark SQL in Production Spark 1.2 Continue Scaling Spark 1.3 Production and Scaling Spark’s History in Baidu
  • 10.
    Spark in BaiduInternal Infrastructure SparkDSTREAM TM DCE/DAG ELF Compute Layer Baidu Data Centers + Resource Management + Resource Scheduling Resource Layer Dataflow Compiler Java C++ Python Scala SQL Programming Layer
  • 11.
    Spark in BaiduOpen Cloud Baidu Cloud Cluster HDFS Map Reduce Baidu Object Storage HBase Pig Hive SQL Streaming GraphX MLLib BMR Console BMR SDK Audit Scheduler Management http://bce.baidu.com/
  • 12.
    Enabling Interactive Querieswith Spark and Tachyon
  • 13.
    Need for Interaction(Speed) Problem •  John is a PM and he needs to keep track of the top queries submitted to Baidu everyday •  Based on the top queries of the day, he will perform additional analysis •  But John is very frustrated that each query takes tens of minutes Requirements for Baidu Interactive Query Engine •  Manages Petabytes of data •  Able to finish 95% of queries within 30 seconds
  • 14.
    Query UI Data Warehouse BaiduFile System Storage Layer Tachyon  Hot Data Layer Baidu Interactive Query Architecture Compute Layer Meta Store HiveQL Spark SQL UDFs SerDes Apache Spark
  • 15.
    Baidu Interactive QueryPerformance > 50X Acceleration of Big Data Analytics Workloads 0 200 400 600 800 1000 1200 MR (sec) Spark (sec) Spark + Tachyon (sec) Setup: 1.  Use MR to query 6 TB of data 2.  Use Spark to query 6 TB of data 3.  Use Spark + Tachyon to query 6 TB of data
  • 16.
    Summary Spark and Tachyonprovides significant performance advantage •  Suitable for latency sensitive workloads, such as interactive ad- hoc query Takes time for Spark to reach MR scale •  We still use MR for high throughput batch workloads •  Deployment Scale: Spark 1000 nodes vs. MR 13000 nodes •  Memory Management is one of the main blockers of scalability Look forward to Spark Improvements •  Project Tungsten: better memory management: main blocker of scalability •  Continuous improvement of Catalyst •  Support for multiple meta stores
  • 17.
    Looking Into theFuture Spark + TachyonSystems Software FPGA + GPU Acceleration Hardware Faster Network Spark SQL Applications DNN Spark R Spark MlLib Speed Functionality