Hadoop Hardware @Twitter: Size does matter.

@joep and @eecraft
Hadoop Summit 2013

Transcript

  • 1. Hadoop Hardware @Twitter: Size does matter. @joep and @eecraft Hadoop Summit 2013 v2.3
  • 2. About us
    • Joep Rottinghuis
      • Engineering Manager, Hadoop/HBase team @ Twitter
      • Follow me @joep
    • Jay Shenoy
      • Software Engineer @ Twitter
      • Hardware Engineer @ Twitter
      • Engineering Manager, HW @ Twitter
      • Follow me @eecraft
    • HW & Hadoop teams @ Twitter, and many others
  • 3. Agenda
    • Scale of Hadoop Clusters
    • Single versus multiple clusters
    • Twitter Hadoop Architecture
    • Hardware investigations
    • Results
  • 4. Scale
    • Scaling limits:
      • Number of nodes
      • JobTracker: tens of thousands of jobs per day; tens of thousands of concurrent slots
      • Namenode: 250-300 M objects in a single namespace
      • Namenode at ~100 GB heap -> full GC pauses (see the heap-sizing sketch after the transcript)
      • Shipping job jars to 1,000s of nodes
      • JobHistory server at a few 100,000s of job history/conf files
  • 5. When / why to split clusters?
    • In principle, preference for a single cluster
      • Common logs, shared free space, reduced admin burden, more rack diversity
    • Varying SLAs
    • Workload diversity
      • Storage intensive
      • Processing (CPU / disk IO) intensive
      • Network intensive
    • Data access
      • Hot, warm, cold
  • 6. Cluster Architecture
  • 7. Hardware investigations
  • 8. Service criteria for hardware
    • Hadoop does not need live HDD swap
    • Twitter DC: no SLA on data nodes
    • Rack SLA: only 1 rack down at any time in a cluster
  • 9. Baseline Hadoop Server (~ early 2012)
    [Block diagram: dual E56xx CPUs, DIMMs, PCH, GbE NICs, HBA, SAS expander]
    Characteristics:
      • Dual 6-core E5645 CPUs
      • 72 GB memory
      • 12 x 2 TB HDD
      • 2 x 1 GbE
      • Standard 2U server
      • 20 servers / rack
    Works for the general cluster, but...
      • Need more density for storage
      • Potential IO bottlenecks
  • 10. Hadoop Server: Possible evolution
    [Block diagram: dual E5-26xx or E5-24xx CPUs, DIMMs, HBA, SAS expander, GbE NIC, 10GbE?]
    Characteristics:
      • + CPU performance?
      • 20 servers / rack
      • 16 x 2 TB? 16 x 3 TB? 24 x 3 TB?
      • Candidate for DW
    Can deploy into the general DW cluster, but...
      • Too much CPU for storage-intensive apps
      • Server failure domain too large if we scale up disks (see the failure-domain sketch after the transcript)
  • 11. Rethinking hardware evolution
    • Debunking myths:
      • "Bigger is always better"
      • "One size fits all"
    • Back to Hadoop hardware roots:
      • Scale horizontally, not vertically
    • Twitter Hadoop Server - "THS"
  • 12. THS for backups
    [Block diagram: single E3-12xx CPU, DIMMs, PCH, SAS HBA, GbE NIC]
    Characteristics:
      • E3-1230 V2 CPU
      • 16 GB memory
      • 12 x 3 TB HDD
      • SSD boot
      • 2 x 1 GbE
    Storage focus:
      • Cost efficient (single socket, 3 TB drives)
      • Few fast cores, + IO performance
      • Less memory needed
  • 13. THS variant for Hadoop-Proc and HBase
    [Block diagram: single E3-12xx CPU, DIMMs, PCH, SAS HBA, 10GbE NIC]
    Characteristics:
      • E3-1230 V2 CPU
      • 32 GB memory
      • 12 x 1 TB HDD
      • SSD boot
      • 1 x 10 GbE
    Processing / throughput focus:
      • Cost efficient (single socket, 1 TB drives)
      • Few fast cores, + IO performance
      • More disk and network IO per socket
  • 14. THS for cold cluster
    [Block diagram: single E3-12xx CPU, DIMMs, PCH, SAS HBA, GbE NIC]
    Characteristics:
      • E3-1230 V2 CPU
      • 32 GB memory
      • 12 x 3 TB HDD
      • 2 x 1 GbE
    Combination of the previous 2 use cases:
      • Disk efficiency, some compute
      • Space & power efficient
      • Storage dense with some processing capability
  • 15. Rack-level view (totals per rack; see the rack-math sketch after the transcript)

                            Baseline        THS Backups      THS Proc        THS Cold
    ToR switch              1G              1G               10G             1G
    Power                   ~ 8 kW          ~ 8 kW           ~ 8 kW          ~ 8 kW
    CPU sockets; DRAM       40; 1440 GB     40; 640 GB       40; 1280 GB     40; 1280 GB
    Spindles; TB raw        240; 480 TB     480; 1,440 TB    480; 480 TB     480; 1,440 TB
    Uplink; internal BW     20; 40 Gbps     20; 80 Gbps      40; 400 Gbps    20; 80 Gbps
  • 16. Processing performance comparison

    Benchmark                                            Baseline Server    THS (-Cold)
    TestDFSIO (write, replication = 1)                   360 MB/s / node    780 MB/s / node
    TeraGen (30 TB, replication = 3)                     1:36 hrs           1:35 hrs
    TeraSort (30 TB, replication = 3)                    6:11 hrs           4:22 hrs
    2 parallel TeraSorts (30 TB each, replication = 3)   10:36 hrs          6:21 hrs
    Application #1                                       4:37 min           3:09 min
    Application set #2                                   13:03 hrs          10:57 hrs

    Performance benchmark setup:
      • Each cluster: 102 nodes of the respective type
      • Efficient server (THS) = 3 racks; Baseline = 5+ racks
      • "Dated" stack: CentOS 5.5, Sun 1.6 JRE, Hadoop 2.0.3
      • (A driver sketch for the TeraGen/TeraSort runs follows the transcript)
  • 17. Results
  • 18. LZO performance comparison
    [Chart: LZO performance comparison; the data is not recoverable from the transcript. An illustrative LZO configuration sketch follows the transcript.]
  • 19. Recap
    • At a certain scale it makes sense to split into multiple clusters
      • For us: RT, PROC, DW, COLD, BACKUPS, TST, EXP
    • For large enough clusters, depending on the use case, it may be worth choosing different HW configurations
  • 20. Conclusion: @Twitter, our "Twitter Hadoop Server" not only saves many $$$, it is also faster!
  • 21. #ThankYou @joep and @eecraft. Come talk to us at booth 26.
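
Slide 4 cites 250-300 M namespace objects and a Namenode heap of roughly 100 GB. Below is a minimal heap-sizing sketch of that back-of-the-envelope arithmetic, assuming the commonly quoted rough figure of a couple of hundred bytes of heap per namespace object plus headroom; the constants are illustrative assumptions, not Twitter's measured values.

```java
// Heap-sizing sketch for slide 4. The per-object cost and headroom factor are
// illustrative assumptions, not measured values.
public class NamenodeHeapEstimate {
    public static void main(String[] args) {
        long objects = 300_000_000L;    // files + blocks, slide 4's upper bound
        long bytesPerObject = 200L;     // assumed rough heap cost per namespace object
        double headroom = 1.5;          // assumed slack for transient state, RPC buffers, GC

        double baseGb = objects * bytesPerObject / 1e9;
        System.out.printf("base ~%.0f GB, with headroom ~%.0f GB of heap%n",
                baseGb, baseGb * headroom);
        // ~60 GB base, ~90 GB with headroom: roughly the ~100 GB heap on slide 4,
        // where full GC pauses start to hurt.
    }
}
```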
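
Slide 10 rejects scaling a server up to 24 x 3 TB drives partly because the per-server failure domain grows too large. The failure-domain sketch below shows the underlying arithmetic, assuming a fixed amount of aggregate cluster bandwidth is available for re-replication after a node loss; the 100 Gbps figure is an assumption for illustration, not a number from the talk.

```java
// Failure-domain sketch for slide 10: more disk behind one motherboard means more
// data to re-replicate when that one server dies. The bandwidth figure is assumed.
public class FailureDomainSketch {
    public static void main(String[] args) {
        double[] perServerTb = {24.0, 72.0};  // 12 x 2 TB baseline vs. a hypothetical 24 x 3 TB build
        double rereplGbps = 100.0;            // assumed aggregate re-replication bandwidth

        for (double tb : perServerTb) {
            double hours = tb * 1e12 * 8 / (rereplGbps * 1e9) / 3600;
            System.out.printf("%.0f TB server lost -> ~%.1f h to restore replication%n", tb, hours);
        }
        // Tripling per-server storage triples both the data at risk and the recovery
        // window from a single server failure.
    }
}
```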
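
The per-rack totals on slide 15 follow from the per-server specs on slides 9 and 12-14. The rack-math sketch below re-derives the THS Backups column; the 40-servers-per-rack density for THS is inferred from the spindle count (480 / 12), since the slides only state 20 servers per rack for the baseline.

```java
// Rack-math sketch for slide 15: re-derives the "THS Backups" column from per-server specs.
// The 40 servers/rack figure is inferred, not stated on the slides.
public class RackMathSketch {
    public static void main(String[] args) {
        int servers = 40;        // inferred THS density per rack (480 spindles / 12 HDDs per server)
        int sockets = 1;         // single-socket E3-1230 V2
        int memGb = 16;          // per server, backups variant
        int hdds = 12;
        int hddTb = 3;
        int gbePorts = 2;        // 2 x 1 GbE per server

        System.out.printf("CPU sockets: %d%n", servers * sockets);          // 40
        System.out.printf("DRAM: %d GB%n", servers * memGb);                // 640 GB
        System.out.printf("Spindles: %d%n", servers * hdds);                // 480
        System.out.printf("Raw storage: %d TB%n", servers * hdds * hddTb);  // 1,440 TB
        System.out.printf("Internal BW: %d Gbps%n", servers * gbePorts);    // 80 Gbps
    }
}
```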
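
The TeraGen/TeraSort rows on slide 16 are the stock Hadoop example jobs, normally launched from the hadoop-mapreduce-examples jar. A minimal driver sketch for that kind of run is below; 30 TB of TeraSort input corresponds to 300 billion 100-byte rows, and the HDFS paths are hypothetical.

```java
// Sketch of the TeraGen + TeraSort runs from slide 16 using the stock Hadoop example jobs.
// 30 TB of input = 300,000,000,000 rows of 100 bytes each. Paths are hypothetical.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.examples.terasort.TeraGen;
import org.apache.hadoop.examples.terasort.TeraSort;
import org.apache.hadoop.util.ToolRunner;

public class TeraBenchmarkDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // picks up dfs.replication etc. from the cluster config

        // Generate 30 TB of input (slide 16 runs this with replication = 3).
        int gen = ToolRunner.run(conf, new TeraGen(),
                new String[] {"300000000000", "/benchmarks/terasort-input"});

        // Sort it; the wall-clock time of this job is what the slide's TeraSort row reports.
        int sort = ToolRunner.run(conf, new TeraSort(),
                new String[] {"/benchmarks/terasort-input", "/benchmarks/terasort-output"});

        System.exit(gen != 0 ? gen : sort);
    }
}
```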
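
Slide 18's LZO chart is not recoverable from the transcript. As context for what was being compared, below is an illustrative sketch of how LZO compression of intermediate (map output) data is typically enabled on a Hadoop 2.x job, assuming the separately packaged hadoop-lzo codec (com.hadoop.compression.lzo.LzoCodec) is installed on the cluster; this is not the configuration actually used in the benchmark.

```java
// Illustrative only: enabling LZO for intermediate (map output) compression on a
// Hadoop 2.x job, assuming the hadoop-lzo native codec is installed on the cluster.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class LzoJobSettings {
    public static Job newLzoCompressedJob(Configuration conf, String name) throws Exception {
        // Compress the map -> reduce shuffle with LZO (fast codec, modest ratio).
        conf.setBoolean("mapreduce.map.output.compress", true);
        conf.set("mapreduce.map.output.compress.codec",
                 "com.hadoop.compression.lzo.LzoCodec");
        return Job.getInstance(conf, name);
    }
}
```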
