Hadoop Hardware @Twitter: Size does matter.

@joep and @eecraft
Hadoop Summit 2013

Transcript

  • 1. Hadoop Hardware @Twitter: Size does matter.
    @joep and @eecraft
    Hadoop Summit 2013 (v2.3)
  • 2. About us
    Joep Rottinghuis:
    • Engineering Manager, Hadoop/HBase team @ Twitter
    • Follow me @joep
    Jay Shenoy:
    • Software Engineer @ Twitter
    • Hardware Engineer @ Twitter
    • Engineering Manager, HW @ Twitter
    • Follow me @eecraft
    HW & Hadoop teams @ Twitter, and many others
  • 3. Agenda
    • Scale of Hadoop clusters
    • Single versus multiple clusters
    • Twitter Hadoop architecture
    • Hardware investigations
    • Results
  • 4. Scale
    Scaling limits:
    • Number of nodes
    • JobTracker: tens of thousands of jobs per day; tens of thousands of concurrent slots
    • Namenode: 250-300 M objects in a single namespace
    • Namenode at ~100 GB heap -> full GC pauses
    • Shipping job jars to 1,000s of nodes
    • JobHistory server at a few hundred thousand job history/conf files
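  To put the namenode numbers in perspective, a back-of-the-envelope sketch: the ~350 bytes of heap per namespace object is my own assumed figure for illustration; only the object count and the ~100 GB heap come from the slide.

      # Rough namenode heap estimate. The per-object heap cost is an assumption,
      # not from the deck; the object counts and ~100 GB heap are from the slide.
      def namenode_heap_gb(num_objects, bytes_per_object=350):
          """Estimate namenode heap (GB) for a given number of namespace objects."""
          return num_objects * bytes_per_object / 1e9

      print(namenode_heap_gb(300e6))   # ~105 GB, in line with the ~100 GB heap cited
      print(namenode_heap_gb(100e6))   # each extra 100 M objects adds roughly 35 GB

  Under that assumption, every additional 100 M files/blocks costs tens of GB of heap, which is why full GC pauses become one of the limits the slide calls out.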
  • 5. When / why to split clusters?
    • In principle, a preference for a single cluster
      • Common logs, shared free space, reduced admin burden, more rack diversity
    • Varying SLAs
    • Workload diversity:
      • Storage intensive
      • Processing (CPU / disk IO) intensive
      • Network intensive
    • Data access: hot, warm, cold
  • 6. Cluster Architecture
  • 7. Hardware investigations
  • 8. Service criteria for hardware
    • Hadoop does not need live HDD swap
    • Twitter DC: no SLA on data nodes
    • Rack SLA: only 1 rack down at any time in a cluster
  • 9. Baseline Hadoop Server (~early 2012)
    (Block diagram: dual E56xx CPUs, DIMMs, PCH, GbE NIC, HBA, SAS expander)
    Characteristics:
    • Standard 2U server
    • 20 servers / rack
    • E5645 CPU, dual 6-core
    • 72 GB memory
    • 12 x 2 TB HDD
    • 2 x 1 GbE
    Works for the general cluster, but...
    • Need more density for storage
    • Potential IO bottlenecks
  • 10. Hadoop Server: Possible evolution
    (Block diagram: dual E5-26xx or E5-24xx CPUs, DIMMs, GbE NIC or 10 GbE?, HBA, expander)
    Characteristics:
    • + CPU performance
    • ? 20 servers / rack
    • 16 x 2T? 16 x 3T? 24 x 3T?
    • Candidate for DW
    Can deploy into the general DW cluster, but...
    • Too much CPU for storage-intensive apps
    • Server failure domain too large if we scale up disks
  • 11. Rethinking hardware evolution
    • Debunking myths:
      • Bigger is always better
      • One size fits all
    • Back to Hadoop Hardware Roots:
      • Scale horizontally, not vertically
    Twitter Hadoop Server - "THS"
  • 12. THS for backups
    (Block diagram: single E3-12xx CPU, DIMMs, PCH, SAS HBA, GbE NIC)
    Storage focus:
    • + IO performance
    • Few fast cores
    • Cost efficient (single socket, 3 TB drives)
    • Less memory needed
    Characteristics:
    • E3-1230 V2 CPU
    • 16 GB memory
    • 12 x 3 TB HDD
    • SSD boot
    • 2 x 1 GbE
  • 13. THS variant for Hadoop-Proc and HBase
    (Block diagram: single E3-12xx CPU, DIMMs, PCH, SAS HBA, 10 GbE NIC)
    Processing / throughput focus:
    • + IO performance
    • Few fast cores
    • Cost efficient (single socket, 1 TB drives)
    • More disk and network IO per socket
    Characteristics:
    • E3-1230 V2 CPU
    • 32 GB memory
    • 12 x 1 TB HDD
    • SSD boot
    • 1 x 10 GbE
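  The "more disk and network IO per socket" point can be sanity-checked with rough numbers. The ~100 MB/s sustained per-drive throughput below is my own assumption for 2012-era 7,200 RPM drives; only the drive count and the 10 GbE NIC come from the slide.

      # Per-node IO balance for the Proc/HBase variant. Per-drive sequential
      # throughput is an assumed figure; drive count and NIC speed are from the slide.
      drives_per_node = 12
      seq_mb_per_s_per_drive = 100                      # assumed sustained throughput
      disk_bw_gbps = drives_per_node * seq_mb_per_s_per_drive * 8 / 1000

      nic_gbps = 10                                     # 1 x 10 GbE
      print(f"aggregate disk ~{disk_bw_gbps:.1f} Gbps vs NIC {nic_gbps} Gbps")
      # ~9.6 Gbps of disk bandwidth against a 10 Gbps NIC: the single-socket node
      # is roughly balanced between disk and network rather than being CPU-heavy.

  Under those assumptions, one E3 socket sits behind five times the network bandwidth that the baseline's two sockets shared over 2 x 1 GbE.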
  • 14. THS for cold cluster
    (Block diagram: single E3-12xx CPU, DIMMs, PCH, SAS HBA, GbE NIC)
    Combination of the previous 2 use cases:
    • Disk efficiency
    • Some compute
    • Space & power efficient
    • Storage dense with some processing capabilities
    Characteristics:
    • E3-1230 V2 CPU
    • 32 GB memory
    • 12 x 3 TB HDD
    • 2 x 1 GbE
  • 15. Rack-level view
    Baseline rack (1G TOR): ~8 kW; 40 CPU sockets; 1,440 GB DRAM; 240 spindles; 480 TB raw; 20 Gbps uplink; 40 Gbps internal BW
    Twitter Hadoop Server racks:
    • Backups (1G TOR): ~8 kW; 40 CPU sockets; 640 GB DRAM; 480 spindles; 1,440 TB raw; 20 Gbps uplink; 80 Gbps internal BW
    • Proc (10G TOR): ~8 kW; 40 CPU sockets; 1,280 GB DRAM; 480 spindles; 480 TB raw; 40 Gbps uplink; 400 Gbps internal BW
    • Cold (1G TOR): ~8 kW; 40 CPU sockets; 1,280 GB DRAM; 480 spindles; 1,440 TB raw; 20 Gbps uplink; 80 Gbps internal BW
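  The density gains are easier to see as ratios. The sketch below recomputes per-rack figures from the table; the servers-per-rack counts (20 baseline, 40 THS) are taken from the earlier slides and the single-socket design, and the derived ratios are my own arithmetic rather than numbers from the deck.

      # Per-rack comparison derived from the slide's table. Raw values are from
      # the slide; server counts are inferred (20 dual-socket vs 40 single-socket).
      racks = {
          "Baseline": dict(servers=20, raw_tb=480,  power_kw=8),
          "Backups":  dict(servers=40, raw_tb=1440, power_kw=8),
          "Proc":     dict(servers=40, raw_tb=480,  power_kw=8),
          "Cold":     dict(servers=40, raw_tb=1440, power_kw=8),
      }
      for name, r in racks.items():
          print(f"{name:8s} {r['raw_tb'] / r['power_kw']:4.0f} TB/kW  "
                f"{r['raw_tb'] / r['servers']:3.0f} TB raw/server")
      # Baseline: 60 TB/kW; Backups/Cold: 180 TB/kW in the same ~8 kW envelope.

  At the same ~8 kW per rack, the Backups and Cold racks hold three times the raw storage of the baseline rack, while the Proc rack trades storage for a 10x jump in internal bandwidth.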
  • 16. Processing performance comparison
    Benchmark: Baseline Server | THS (-Cold)
    • TestDFSIO (write, replication = 1): 360 MB/s / node | 780 MB/s / node
    • TeraGen (30 TB, replication = 3): 1:36 hrs | 1:35 hrs
    • TeraSort (30 TB, replication = 3): 6:11 hrs | 4:22 hrs
    • 2 parallel TeraSorts (30 TB each, replication = 3): 10:36 hrs | 6:21 hrs
    • Application #1: 4:37 min | 3:09 min
    • Application set #2: 13:3 hrs | 10:57 hrs
    Performance benchmark setup:
    • Each cluster: 102 nodes of the respective type
    • Efficient server = 3 racks, Baseline = 5+ racks
    • "Dated" stack: CentOS 5.5, Sun 1.6 JRE, Hadoop 2.0.3
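  Expressed as speedups rather than wall-clock times (the ratios below are my own arithmetic from the times in the table):

      # Speedup of the THS cluster over the Baseline cluster, computed from the
      # wall-clock times on the slide (the ratios themselves are not in the deck).
      def to_minutes(t):
          """Convert an 'H:MM' time from the table into minutes."""
          h, m = t.split(":")
          return int(h) * 60 + int(m)

      runs = {
          "TeraSort (30 TB)":             ("6:11", "4:22"),
          "2 parallel TeraSorts (30 TB)": ("10:36", "6:21"),
      }
      for name, (baseline, ths) in runs.items():
          b, t = to_minutes(baseline), to_minutes(ths)
          print(f"{name}: {b / t:.2f}x faster ({100 * (1 - t / b):.0f}% less wall-clock time)")
      # TeraSort ~1.42x, parallel TeraSorts ~1.67x; TestDFSIO per-node write
      # throughput improves ~2.2x (780 vs 360 MB/s).

  Those gains come from a THS cluster occupying 3 racks versus 5+ racks for the baseline cluster of the same node count.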
  • 17. Results
  • 18. LZO performance comparison
  • 19. Recap
    • At a certain scale it makes sense to split into multiple clusters
      • For us: RT, PROC, DW, COLD, BACKUPS, TST, EXP
    • For large enough clusters, depending on the use case, it may be worth choosing different HW configurations
  • 20. Conclusion
    @Twitter, our "Twitter Hadoop Server" not only saves many $$$, it is also faster!
  • 21. #ThankYou
    @joep and @eecraft
    Come talk to us at booth 26