HBaseCon 2013: Scalable Network Designs for Apache HBase

Presented by: Benoit Sigoure, Arista Networks


Presentation Transcript

    • Scalable Network Designs for Apache HBase. Benoît "tsuna" Sigoure, Member of the Yak Shaving Staff, tsuna@aristanetworks.com, @tsunanet
    • Scalable Network Design • Some Measurements • How the Network Can Help
    • A Brief History of Network Software (1970-2010): custom monolithic embedded OS → modified BSD, QNX or Linux kernel base → "the switch is just a Linux box": EOS = Linux + custom ASIC.
    • Inside a Modern Switch (Arista 7050Q-64): 2 PSUs, 4 fans, x86 CPU, DDR3 memory, USB flash, SATA SSD, network chip(s).
    • It's All About The Software: Command API, Zero Touch Replacement, VmTracer, CloudVision, Email, Wireshark, Fast Failover, DANZ, Fabric Monitoring, OpenTSDB, #!/bin/bash, $ sudo yum install <pkg>
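Because EOS presents a standard Linux userland, the slide's "#!/bin/bash" and "sudo yum install <pkg>" are meant literally: ordinary tooling runs directly on the switch. A minimal sketch of that idea (the package, interface name, and port are illustrative assumptions, not from the talk):

    #!/bin/bash
    # Run directly on the switch's Linux shell: install a stock RPM package.
    sudo yum install -y tcpdump

    # Standard Linux tools then work as on any server, e.g. peeking at traffic
    # toward an HBase RegionServer (et1 and port 60020 are assumed values).
    sudo tcpdump -ni et1 'tcp port 60020' -c 10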
    • Typical Design: Keep It Simple. Start small: single rack, single switch (Arista 7050T-64: 48 RJ45 ports, 4 QSFP+ ports, console port, mgmt port, USB port). Scale: 64 nodes max, 10Gbps network, line rate.
    • Start small: 2 racks, 2 switches, no core, 40Gbps uplinks. Scale: 96 nodes max with 4 uplinks. Oversubscription ratio 1:3.
    • 3 racks, 3 switches, no core. Scale: 144 nodes max with 4 uplinks. Oversubscription ratio 1:6.
    • 3 racks, 4 switches, 4x40Gbps uplinks. Scale: 144 nodes max with 4 uplinks per rack. Oversubscription ratio 1:3.
    • 4 racks, 5 switches, 4x40Gbps uplinks. Scale: 192 nodes max. Oversubscription ratio 1:3.
    • 5 to 7 racks, 7 to 9 switches, 2x40Gbps uplinks. Scale: 240 to 336 nodes. Oversubscription ratio 1:3.
    • 56 racks, 8 core switches, 8x10Gbps uplinks. Scale: 3136 nodes. Oversubscription ratio 1:3.
    • 1152 racks, 16 core modular switches, 16x10Gbps uplinks. Scale: 55296 nodes. Oversubscription ratio still 1:3.
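In all of these designs the oversubscription ratio is just server-facing bandwidth divided by uplink bandwidth on a top-of-rack switch. A back-of-the-envelope check for the 1:3 two-rack case above (48 servers per rack at 10Gbps, 4x40Gbps uplinks), as a small bash sketch:

    #!/bin/bash
    # Sanity-check the 1:3 oversubscription ratio for one top-of-rack switch.
    servers=48; server_gbps=10     # 48 x 10Gbps toward the servers
    uplinks=4;  uplink_gbps=40     # 4 x 40Gbps toward the other rack / core

    down=$((servers * server_gbps))            # 480 Gbps
    up=$((uplinks * uplink_gbps))              # 160 Gbps
    echo "oversubscription 1:$((down / up))"   # prints: oversubscription 1:3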
    • Layer 2 vs Layer 3. Layer 2 ("data link layer"): the switch acts as an Ethernet "bridge"; with plain spanning tree, redundant uplinks are blocked by STP, and keeping them all active requires vendor magic. Layer 3 ("network layer"): the switch acts as an IP "router"; all uplinks are used via routes such as 0.0.0.0/0 via core1 and via core2.
    • Layer 2. Pros: plug'n'play. Cons: everybody needs to be aware of everybody (watch your MAC & ARP table sizes); no visibility of individual hops; vendor magic or spanning tree; if you break STP and cause a loop, game over; broadcast packets hit everybody.
    • Layer 3. Pros: horizontally scalable; deterministic IP assignment scheme; 100% based on open standards & protocols; can debug with ping / traceroute; degrades better under fault or misconfig. Cons: need to think up your IP assignment scheme ahead of time.
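A deterministic scheme usually encodes location in the address, so a server's IP can be derived from (and mapped back to) its rack and slot. The sketch below mirrors the 10.4.5.x / 10.4.6.x addressing used later in the talk, but the exact pod/rack/host split is an assumption for illustration:

    #!/bin/bash
    # Hypothetical deterministic addressing: 10.<pod>.<rack>.<host>,
    # with .254 reserved for the top-of-rack switch's gateway address.
    pod=4; rack=5; host=2
    echo "server address: 10.${pod}.${rack}.${host}"   # 10.4.5.2
    echo "rack gateway:   10.${pod}.${rack}.254"       # 10.4.5.254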
    • 1G vs 10G. 1Gbps is today what 100Mbps was yesterday. New-generation servers come with 10G LAN on board. At 1Gbps, the network is one of the first bottlenecks. A single HBase client can very easily push over 4Gbps to HBase. [Chart: TeraGen duration in seconds, 1Gbps vs 10Gbps: 3.5x faster on 10Gbps.]
    • Jumbo Frames (photo: "Elephant Framed!" by Dhinakaran Gajavarathan).
    • Jumbo Frames. [Chart: millions of packets per TeraGen, MTU=1500 vs MTU=9212: 3x reduction with jumbo frames.]
    • Jumbo Frames. [Charts: with MTU=9212 vs MTU=1500, context switches down 11% and interrupts down 8% (both in millions), soft IRQ CPU time down 15% (thousand jiffies).]
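On the host side, enabling jumbo frames is a one-line MTU change per NIC; the switch ports along the path must be configured for the larger MTU as well. A minimal sketch, assuming the interface is called eth0 (the measurements above used MTU=9212 on the network side; 9000 is a common host-side value):

    #!/bin/bash
    # Raise the MTU on the server NIC to enable jumbo frames.
    sudo ip link set dev eth0 mtu 9000

    # Verify the new MTU took effect.
    ip link show dev eth0 | grep -o 'mtu [0-9]*'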
    • Deep Buffers (photo: «Hardware» buffers by Roger Ferrer Ibáñez).
    • Deep Buffers. [Chart: packet buffer per 10G port, in MB: industry average 1.5, Arista 7500 48, Arista 7500E 128.]
    • Deep Buffers. [Chart: packets dropped per TeraGen, 16 hosts x 10G sending to 16 hosts x 10G over oversubscribed uplinks: a 1MB buffer drops packets at both 1:4 (4x10G uplinks) and 1:5.33 (3x10G uplinks); a 48MB buffer at 1:5.33 drops zero.]
    • Machine Crashes
    • How Can a Modern Network Help? Detect machine failures; redirect traffic to the switch; have the switch's kernel send TCP resets; this immediately kills all existing and new flows. (EOS = Linux. Diagram: servers 10.4.5.1-10.4.5.3 and 10.4.6.1-10.4.6.3 behind switches at 10.4.5.254 and 10.4.6.254; the switch takes over 10.4.5.2.)
    • Fast Server Failover. The switch learns & tracks the IP ↔ port mapping. Port down → take over the server's IP and MAC addresses; this kicks in as soon as the hardware notifies software of the port going down, within a few milliseconds. Port back up → resume normal operation. Ideal to prevent RegionServers stalling on the WAL. (EOS = Linux.)
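Since EOS is Linux, the takeover can be pictured as ordinary Linux networking commands run on the switch when the server-facing port drops. This is only a rough sketch of the idea, not the actual feature: the interface name, trigger mechanism, and MAC takeover are simplified or assumed.

    #!/bin/bash
    # Hypothetical sketch: the port facing 10.4.5.2 just went down.
    DEAD_IP=10.4.5.2
    SVI=vlan100    # assumed switch-side L3 interface for that subnet

    # Claim the dead server's IP so its traffic now terminates on the switch.
    ip addr add "${DEAD_IP}/24" dev "$SVI"

    # Gratuitous ARP so neighbors repoint the IP immediately (the real feature
    # also takes over the MAC address; this is a simplification).
    arping -U -c 3 -I "$SVI" "$DEAD_IP"

    # TCP segments arriving for ${DEAD_IP} now hit closed ports in the switch's
    # kernel, which replies with RST, instantly killing existing and new flows.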
    • Conclusion • Start thinking of the network as yet another service that runs on Linux boxes • Build layer 3 networks based on standards • Use jumbo frames • Deep buffers prevent packet loss on oversubscribed uplinks • The network sits in the middle of everything; it can help. Expect to see more in that area.
    • Thank You
    • We're hiring in SF, Santa Clara, Vancouver, and Bangalore. Benoît "tsuna" Sigoure, Member of the Yak Shaving Staff, tsuna@aristanetworks.com, @tsunanet