Hadoop Hardware @Twitter: Size does matter!


At Twitter we started out with a large monolithic cluster that served most of the use-cases. As usage expanded and the cluster grew accordingly, we realized we needed to split the cluster by access pattern. This allows us to tune the access policy, SLA, and configuration for each cluster. We will explain our various use-cases, their performance requirements, and operational considerations, and how those are served by the corresponding clusters. We will discuss what our baseline Hadoop node looks like. Various, sometimes competing, factors such as storage size, disk IO, CPU throughput, fewer fast cores versus many slower cores, bonded 1 GbE network interfaces versus a single 10 GbE card, 1 TB, 2 TB, or 3 TB disk drives, and power draw all have to be weighed in a trade-off where cost and performance are the major factors. We will show how we arrived at quite different hardware platforms at Twitter, not only saving money but also increasing performance.

  • Video: http://youtu.be/rqzbLd110NY

Transcript

  • 1. Hadoop Hardware @Twitter: Size does matter. @joep and @eecraft Hadoop Summit 2013 v2.3
  • 2. About us
    • Joep Rottinghuis: Software Engineer @ Twitter; Engineering Manager, Hadoop/HBase team @ Twitter; follow me @joep
    • Jay Shenoy: Hardware Engineer @ Twitter; Engineering Manager, HW @ Twitter; follow me @eecraft
    • HW & Hadoop teams @ Twitter, many others
  • 3. Agenda
    • Scale of Hadoop Clusters
    • Single versus multiple clusters
    • Twitter Hadoop Architecture
    • Hardware investigations
    • Results
  • 4. Scale: scaling limits
    • JobTracker: tens of thousands of jobs per day; tens of thousands of concurrent slots
    • Namenode: 250-300 M objects in a single namespace
    • Namenode at ~100 GB heap -> full GC pauses
    • Shipping job jars to 1,000s of nodes
    • JobHistory server holding a few hundred thousand job history/conf files
    (chart: cluster growth in # nodes)
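As a sanity check on the heap figure, here is a rough back-of-envelope sketch; the ~200 bytes per namespace object and the 1.5x headroom factor are assumptions, not numbers from the talk:

```python
# Rough Namenode heap estimate for the object counts on this slide.
# Assumes a commonly cited rule of thumb of ~200 bytes of JVM heap per
# namespace object (file/directory/block) plus headroom; actual usage
# depends on Hadoop version, block counts, and file sizes.
objects = 300_000_000        # 250-300 M objects in a single namespace
bytes_per_object = 200       # assumed rule of thumb, not from the talk
headroom = 1.5               # assumed slack for GC and transient state

heap_gb = objects * bytes_per_object * headroom / 1e9
print(f"Estimated Namenode heap: ~{heap_gb:.0f} GB")
# ~90 GB, in the ballpark of the ~100 GB heap (and the resulting
# full-GC pauses) mentioned on the slide.
```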
  • 5. When / why to split clusters?
    • In principle, a preference for a single cluster: common logs, shared free space, reduced admin burden, more rack diversity
    • Varying SLAs
    • Workload diversity: storage intensive, processing (CPU / disk IO) intensive, network intensive
    • Data access: hot, warm, cold
  • 6. Cluster Architecture (diagram)
  • 7. Hardware investigations
  • 8. Service criteria for hardware
    • Hadoop does not need live HDD swap
    • Twitter DC: no SLA on data nodes
    • Rack SLA: only 1 rack down at any time in a cluster
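Tolerating a full rack outage relies on HDFS rack awareness so that block replicas land in different racks. As an illustration only (not Twitter's actual setup), Hadoop can be pointed at a topology script through the net.topology.script.file.name property; a minimal sketch of such a script, with a made-up host-to-rack map:

```python
#!/usr/bin/env python
"""Minimal HDFS rack-awareness topology script (illustrative sketch).

Hadoop invokes the script configured via net.topology.script.file.name
with one or more datanode hostnames/IPs as arguments and expects one
rack path per argument on stdout. The mapping below is made up.
"""
import sys

HOST_TO_RACK = {                 # hypothetical inventory-derived mapping
    "10.0.1.11": "/dc1/rack1",
    "10.0.1.12": "/dc1/rack1",
    "10.0.2.11": "/dc1/rack2",
}
DEFAULT_RACK = "/default-rack"   # conventional fallback rack

for host in sys.argv[1:]:
    print(HOST_TO_RACK.get(host, DEFAULT_RACK))
```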
  • 9. Baseline Hadoop Server (~early 2012)
    (block diagram: dual E56xx CPUs with DIMMs, PCH, GbE NIC, HBA + expander)
    • Characteristics: standard 2U server; 20 servers / rack; E5645 CPU, dual 6-core; 72 GB memory; 12 x 2 TB HDD; 2 x 1 GbE
    • Works for the general cluster, but...
    • Need more density for storage
    • Potential IO bottlenecks
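The "potential IO bottlenecks" point can be made concrete by comparing aggregate spindle bandwidth against the bonded 1 GbE links; a back-of-envelope sketch, assuming roughly 100 MB/s of sequential throughput per disk (an assumed figure, not from the slides):

```python
# Back-of-envelope: aggregate disk vs. network bandwidth for the baseline node.
# Assumes ~100 MB/s sequential throughput per spindle (an assumption,
# not a figure from the slides).
spindles = 12
mb_per_s_per_disk = 100                    # assumed
disk_bw = spindles * mb_per_s_per_disk     # ~1,200 MB/s per node

nics, nic_gbps = 2, 1                      # 2 x 1 GbE
net_bw = nics * nic_gbps * 1000 / 8        # ~250 MB/s per node

print(f"Aggregate disk bandwidth:    ~{disk_bw} MB/s")
print(f"Aggregate network bandwidth: ~{net_bw:.0f} MB/s")
# The bonded 1 GbE links can move only a fraction of what 12 spindles can
# stream, which is part of why the Proc variant later moves to 1 x 10 GbE.
```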
  • 10. Hadoop Server: possible evolution
    (block diagram: dual E5-26xx or E5-24xx CPUs with DIMMs, GbE NIC, 10 GbE?, HBA + expander, 16 x 2T? 16 x 3T? 24 x 3T?)
    • Characteristics: + CPU performance?; 20 servers / rack; candidate for DW
    • Can deploy into the general DW cluster, but...
    • Too much CPU for storage intensive apps
    • Server failure domain too large if we scale up disks
  • 11. Rethinking hardware evolution
    • Debunking myths: "bigger is always better", "one size fits all"
    • Back to Hadoop hardware roots: scale horizontally, not vertically
    • Twitter Hadoop Server - “THS”
  • 12. THS for backups
    (block diagram: single E3-12xx CPU with DIMMs, PCH, GbE NIC, SAS HBA)
    • Storage focus: cost efficient (single socket, 3T drives); less memory needed
    • Characteristics: + IO performance; few fast cores; E3-1230 V2 CPU; 16 GB memory; 12 x 3 TB HDD; SSD boot; 2 x 1 GbE
  • 13. THS variant for Hadoop-Proc and HBase
    (block diagram: single E3-12xx CPU with DIMMs, PCH, 10 GbE NIC, SAS HBA)
    • Processing / throughput focus: cost efficient (single socket, 1T drives); more disk and network IO per socket
    • Characteristics: + IO performance; few fast cores; E3-1230 V2 CPU; 32 GB memory; 12 x 1 TB HDD; SSD boot; 1 x 10 GbE
  • 14. THS for cold cluster
    (block diagram: single E3-12xx CPU with DIMMs, PCH, GbE NIC, SAS HBA)
    • Combination of the previous 2 use cases: space & power efficient; storage dense and some processing capabilities
    • Characteristics: disk efficiency; some compute; E3-1230 V2 CPU; 32 GB memory; 12 x 3 TB HDD; 2 x 1 GbE
  • 15. Rack-level view (per rack; Twitter Hadoop Server = THS)

                            Baseline        THS Backups      THS Proc         THS Cold
    Power                   ~8 kW           ~8 kW            ~8 kW            ~8 kW
    CPU sockets; DRAM       40; 1,440 GB    40; 640 GB       40; 1,280 GB     40; 1,280 GB
    Spindles; TB raw        240; 480 TB     480; 1,440 TB    480; 480 TB      480; 1,440 TB
    Uplink; internal BW     20; 40 Gbps     20; 80 Gbps      40; 400 Gbps     20; 80 Gbps
    TOR switch              1G TOR          1G TOR           10G TOR          1G TOR
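The rack totals follow from the per-server specs on the preceding slides plus servers per rack: 20 dual-socket baseline servers, and 40 single-socket THS servers (inferred from the 40 CPU sockets per THS rack). A small sketch reproducing the table's arithmetic:

```python
# Reproduce the rack-level numbers on this slide from per-server specs.
# Baseline: 20 dual-socket servers/rack; THS variants: 40 single-socket
# servers/rack (inferred from the 40 CPU sockets per rack).
configs = {
    #             servers  sockets  GB RAM  disks  TB/disk  NIC Gbps
    "Baseline":   (20,     2,       72,     12,    2,       2 * 1),
    "Backups":    (40,     1,       16,     12,    3,       2 * 1),
    "Proc":       (40,     1,       32,     12,    1,       1 * 10),
    "Cold":       (40,     1,       32,     12,    3,       2 * 1),
}

for name, (n, sockets, ram, disks, tb, gbps) in configs.items():
    print(f"{name:9s} sockets={n * sockets:3d}  DRAM={n * ram:5d} GB  "
          f"spindles={n * disks:3d}  raw={n * disks * tb:5d} TB  "
          f"internal BW={n * gbps:3d} Gbps")
```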
  • 16. Processing performance comparison

    Benchmark                                            Baseline server     THS (-Cold)
    TestDFSIO (write, replication = 1)                   360 MB/s / node     780 MB/s / node
    TeraGen (30 TB, replication = 3)                     1:36 hrs            1:35 hrs
    TeraSort (30 TB, replication = 3)                    6:11 hrs            4:22 hrs
    2 parallel TeraSorts (30 TB each, replication = 3)   10:36 hrs           6:21 hrs
    Application #1                                       4:37 min            3:09 min
    Application set #2                                   13:3 hrs            10:57 hrs

    Performance benchmark setup:
    • Each cluster: 102 nodes of the respective type
    • Efficient server = 3 racks, baseline = 5+ racks
    • "Dated" stack: CentOS 5.5, Sun 1.6 JRE, Hadoop 2.0.3
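Since both test clusters had 102 nodes but the THS cluster fit in 3 racks versus 5+ for the baseline, the per-rack TeraSort throughput gap is even wider than the wall-clock gap. A quick sketch of that arithmetic, treating "5+ racks" as exactly 5 (an assumption):

```python
# Per-rack TeraSort throughput implied by the table above.
# Rack counts come from the setup notes; "baseline = 5+ racks" is treated
# as exactly 5 racks for this estimate.
def gb_per_hour_per_rack(tb_sorted, hours, minutes, racks):
    elapsed_hours = hours + minutes / 60.0
    return tb_sorted * 1024 / elapsed_hours / racks

baseline = gb_per_hour_per_rack(30, 6, 11, racks=5)   # 6:11 hrs on 5 racks
ths      = gb_per_hour_per_rack(30, 4, 22, racks=3)   # 4:22 hrs on 3 racks

print(f"Baseline: ~{baseline:,.0f} GB sorted per hour per rack")
print(f"THS:      ~{ths:,.0f} GB sorted per hour per rack")
print(f"THS delivers ~{ths / baseline:.1f}x the per-rack sort throughput")
```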
  • 17. Results
  • 18. LZO performance comparison (chart)
  • 19. Recap
    • At a certain scale it makes sense to split into multiple clusters
    • For us: RT, PROC, DW, COLD, BACKUPS, TST, EXP
    • For large enough clusters, depending on the use-case, it may be worth choosing different HW configurations
  • 20. Conclusion: @Twitter our “Twitter Hadoop Server” not only saves many $$$, it is also faster!
  • 21. #ThankYou @joep and @eecraft Come talk to us at booth 26