Hadoop Hardware @Twitter: Size does matter.

@joep and @eecraft
Hadoop Summit 2013

Transcript

  • 1. Hadoop Hardware @Twitter: Size does matter. @joep and @eecraft Hadoop Summit 2013 v2.3
  • 2. About us
    • Joep Rottinghuis
      • Engineering Manager, Hadoop/HBase team @ Twitter
      • Follow me @joep
    • Jay Shenoy
      • Software Engineer @ Twitter
      • Hardware Engineer @ Twitter
      • Engineering Manager, HW @ Twitter
      • Follow me @eecraft
    • HW & Hadoop teams @ Twitter, and many others
  • 3. Agenda
    • Scale of Hadoop Clusters
    • Single versus multiple clusters
    • Twitter Hadoop Architecture
    • Hardware investigations
    • Results
  • 4. Scale
    • Scaling limits:
      • Number of nodes
      • JobTracker: tens of thousands of jobs per day; tens of thousands of concurrent slots
      • Namenode: 250-300 M objects in a single namespace
      • Namenode at ~100 GB heap -> full GC pauses (see the heap-sizing sketch after the transcript)
      • Shipping job jars to 1,000s of nodes
      • JobHistory server at a few 100,000s of job history/conf files
  • 5. When / why to split clusters?
    • In principle, preference for a single cluster
      • Common logs, shared free space, reduced admin burden, more rack diversity
    • Varying SLAs
    • Workload diversity
      • Storage intensive
      • Processing (CPU / disk IO) intensive
      • Network intensive
    • Data access
      • Hot, warm, cold
  • 6. Cluster Architecture
  • 7. Hardware investigations
  • 8. Service criteria for hardware
    • Hadoop does not need live HDD swap
    • Twitter DC: no SLA on data nodes
    • Rack SLA: only 1 rack down at any time in a cluster
  • 9. Baseline Hadoop Server (~ early 2012)
    [Block diagram: dual E56xx CPUs, DIMMs, PCH, GbE NICs, HBA, SAS expander]
    Characteristics:
      • Dual 6-core E5645 CPUs
      • 72 GB memory
      • 12 x 2 TB HDD
      • 2 x 1 GbE
      • Standard 2U server
      • 20 servers / rack
    Works for the general cluster, but...
      • Need more density for storage
      • Potential IO bottlenecks
  • 10. Hadoop Server: Possible evolution
    [Block diagram: dual E5-26xx or E5-24xx CPUs, DIMMs, HBA, SAS expander, GbE NIC, 10GbE?]
    Characteristics:
      • + CPU performance?
      • 20 servers / rack
      • 16 x 2 TB? 16 x 3 TB? 24 x 3 TB?
      • Candidate for DW
    Can deploy into the general DW cluster, but...
      • Too much CPU for storage-intensive apps
      • Server failure domain too large if we scale up disks (see the failure-domain sketch after the transcript)
  • 11. Rethinking hardware evolution
    • Debunking myths:
      • "Bigger is always better"
      • "One size fits all"
    • Back to Hadoop hardware roots:
      • Scale horizontally, not vertically
    • Twitter Hadoop Server - "THS"
  • 12. THS for backups
    [Block diagram: single E3-12xx CPU, DIMMs, PCH, SAS HBA, GbE NIC]
    Characteristics:
      • E3-1230 V2 CPU
      • 16 GB memory
      • 12 x 3 TB HDD
      • SSD boot
      • 2 x 1 GbE
    Storage focus:
      • Cost efficient (single socket, 3 TB drives)
      • Few fast cores, + IO performance
      • Less memory needed
  • 13. THS variant for Hadoop-Proc and HBase
    [Block diagram: single E3-12xx CPU, DIMMs, PCH, SAS HBA, 10GbE NIC]
    Characteristics:
      • E3-1230 V2 CPU
      • 32 GB memory
      • 12 x 1 TB HDD
      • SSD boot
      • 1 x 10 GbE
    Processing / throughput focus:
      • Cost efficient (single socket, 1 TB drives)
      • Few fast cores, + IO performance
      • More disk and network IO per socket
  • 14. THS for cold cluster
    [Block diagram: single E3-12xx CPU, DIMMs, PCH, SAS HBA, GbE NIC]
    Characteristics:
      • E3-1230 V2 CPU
      • 32 GB memory
      • 12 x 3 TB HDD
      • 2 x 1 GbE
    Combination of the previous 2 use cases:
      • Disk efficiency, some compute
      • Space & power efficient
      • Storage dense with some processing capability
  • 15. Rack-level view (totals per rack; see the rack-math sketch after the transcript)

                            Baseline        THS Backups      THS Proc        THS Cold
    ToR switch              1G              1G               10G             1G
    Power                   ~ 8 kW          ~ 8 kW           ~ 8 kW          ~ 8 kW
    CPU sockets; DRAM       40; 1440 GB     40; 640 GB       40; 1280 GB     40; 1280 GB
    Spindles; TB raw        240; 480 TB     480; 1,440 TB    480; 480 TB     480; 1,440 TB
    Uplink; internal BW     20; 40 Gbps     20; 80 Gbps      40; 400 Gbps    20; 80 Gbps
  • 16. Processing performance comparison

    Benchmark                                            Baseline Server    THS (-Cold)
    TestDFSIO (write, replication = 1)                   360 MB/s / node    780 MB/s / node
    TeraGen (30 TB, replication = 3)                     1:36 hrs           1:35 hrs
    TeraSort (30 TB, replication = 3)                    6:11 hrs           4:22 hrs
    2 parallel TeraSorts (30 TB each, replication = 3)   10:36 hrs          6:21 hrs
    Application #1                                       4:37 min           3:09 min
    Application set #2                                   13:03 hrs          10:57 hrs

    Performance benchmark setup:
      • Each cluster: 102 nodes of the respective type
      • Efficient server (THS) = 3 racks; Baseline = 5+ racks
      • "Dated" stack: CentOS 5.5, Sun 1.6 JRE, Hadoop 2.0.3
      • (A driver sketch for the TeraGen/TeraSort runs follows the transcript)
  • 17. Results
  • 18. LZO performance comparison
    [Chart: LZO performance comparison; the data is not recoverable from the transcript. An illustrative LZO configuration sketch follows the transcript.]
  • 19. Recap
    • At a certain scale it makes sense to split into multiple clusters
      • For us: RT, PROC, DW, COLD, BACKUPS, TST, EXP
    • For large enough clusters, depending on the use case, it may be worth choosing different HW configurations
  • 20. Conclusion: @Twitter, our "Twitter Hadoop Server" not only saves many $$$, it is also faster!
  • 21. #ThankYou @joep and @eecraft. Come talk to us at booth 26.
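
Slide 4 cites 250-300 M namespace objects and a Namenode heap of roughly 100 GB. Below is a minimal heap-sizing sketch of that back-of-the-envelope arithmetic, assuming the commonly quoted rough figure of a couple of hundred bytes of heap per namespace object plus headroom; the constants are illustrative assumptions, not Twitter's measured values.

```java
// Heap-sizing sketch for slide 4. The per-object cost and headroom factor are
// illustrative assumptions, not measured values.
public class NamenodeHeapEstimate {
    public static void main(String[] args) {
        long objects = 300_000_000L;    // files + blocks, slide 4's upper bound
        long bytesPerObject = 200L;     // assumed rough heap cost per namespace object
        double headroom = 1.5;          // assumed slack for transient state, RPC buffers, GC

        double baseGb = objects * bytesPerObject / 1e9;
        System.out.printf("base ~%.0f GB, with headroom ~%.0f GB of heap%n",
                baseGb, baseGb * headroom);
        // ~60 GB base, ~90 GB with headroom: roughly the ~100 GB heap on slide 4,
        // where full GC pauses start to hurt.
    }
}
```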
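
Slide 10 rejects scaling a server up to 24 x 3 TB drives partly because the per-server failure domain grows too large. The failure-domain sketch below shows the underlying arithmetic, assuming a fixed amount of aggregate cluster bandwidth is available for re-replication after a node loss; the 100 Gbps figure is an assumption for illustration, not a number from the talk.

```java
// Failure-domain sketch for slide 10: more disk behind one motherboard means more
// data to re-replicate when that one server dies. The bandwidth figure is assumed.
public class FailureDomainSketch {
    public static void main(String[] args) {
        double[] perServerTb = {24.0, 72.0};  // 12 x 2 TB baseline vs. a hypothetical 24 x 3 TB build
        double rereplGbps = 100.0;            // assumed aggregate re-replication bandwidth

        for (double tb : perServerTb) {
            double hours = tb * 1e12 * 8 / (rereplGbps * 1e9) / 3600;
            System.out.printf("%.0f TB server lost -> ~%.1f h to restore replication%n", tb, hours);
        }
        // Tripling per-server storage triples both the data at risk and the recovery
        // window from a single server failure.
    }
}
```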
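
The per-rack totals on slide 15 follow from the per-server specs on slides 9 and 12-14. The rack-math sketch below re-derives the THS Backups column; the 40-servers-per-rack density for THS is inferred from the spindle count (480 / 12), since the slides only state 20 servers per rack for the baseline.

```java
// Rack-math sketch for slide 15: re-derives the "THS Backups" column from per-server specs.
// The 40 servers/rack figure is inferred, not stated on the slides.
public class RackMathSketch {
    public static void main(String[] args) {
        int servers = 40;        // inferred THS density per rack (480 spindles / 12 HDDs per server)
        int sockets = 1;         // single-socket E3-1230 V2
        int memGb = 16;          // per server, backups variant
        int hdds = 12;
        int hddTb = 3;
        int gbePorts = 2;        // 2 x 1 GbE per server

        System.out.printf("CPU sockets: %d%n", servers * sockets);          // 40
        System.out.printf("DRAM: %d GB%n", servers * memGb);                // 640 GB
        System.out.printf("Spindles: %d%n", servers * hdds);                // 480
        System.out.printf("Raw storage: %d TB%n", servers * hdds * hddTb);  // 1,440 TB
        System.out.printf("Internal BW: %d Gbps%n", servers * gbePorts);    // 80 Gbps
    }
}
```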
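
The TeraGen/TeraSort rows on slide 16 are the stock Hadoop example jobs, normally launched from the hadoop-mapreduce-examples jar. A minimal driver sketch for that kind of run is below; 30 TB of TeraSort input corresponds to 300 billion 100-byte rows, and the HDFS paths are hypothetical.

```java
// Sketch of the TeraGen + TeraSort runs from slide 16 using the stock Hadoop example jobs.
// 30 TB of input = 300,000,000,000 rows of 100 bytes each. Paths are hypothetical.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.examples.terasort.TeraGen;
import org.apache.hadoop.examples.terasort.TeraSort;
import org.apache.hadoop.util.ToolRunner;

public class TeraBenchmarkDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // picks up dfs.replication etc. from the cluster config

        // Generate 30 TB of input (slide 16 runs this with replication = 3).
        int gen = ToolRunner.run(conf, new TeraGen(),
                new String[] {"300000000000", "/benchmarks/terasort-input"});

        // Sort it; the wall-clock time of this job is what the slide's TeraSort row reports.
        int sort = ToolRunner.run(conf, new TeraSort(),
                new String[] {"/benchmarks/terasort-input", "/benchmarks/terasort-output"});

        System.exit(gen != 0 ? gen : sort);
    }
}
```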
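
Slide 18's LZO chart is not recoverable from the transcript. As context for what was being compared, below is an illustrative sketch of how LZO compression of intermediate (map output) data is typically enabled on a Hadoop 2.x job, assuming the separately packaged hadoop-lzo codec (com.hadoop.compression.lzo.LzoCodec) is installed on the cluster; this is not the configuration actually used in the benchmark.

```java
// Illustrative only: enabling LZO for intermediate (map output) compression on a
// Hadoop 2.x job, assuming the hadoop-lzo native codec is installed on the cluster.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class LzoJobSettings {
    public static Job newLzoCompressedJob(Configuration conf, String name) throws Exception {
        // Compress the map -> reduce shuffle with LZO (fast codec, modest ratio).
        conf.setBoolean("mapreduce.map.output.compress", true);
        conf.set("mapreduce.map.output.compress.codec",
                 "com.hadoop.compression.lzo.LzoCodec");
        return Job.getInstance(conf, name);
    }
}
```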
