Cisco Hadoop Summit 2013

Cisco's presentations at Hadoop Summit 2013.

Speaker notes:
  • Generally 1G is used, largely due to cost/performance trade-offs, though 10GE can provide benefits depending on the workload: reduced spikes with 10G and smoother job completion times. Multiple 1G or 10G links can be bonded together to increase not only bandwidth but also resiliency.
  • Talk about the intensity of failure with a smaller job vs. a bigger job. Map tasks are executed in parallel, so the unit time for each map task per node remains the same, and the nodes complete their work at roughly the same time. During a failure, however, a set of map tasks remains pending (since the other nodes in the cluster are still completing their own tasks) until all nodes finish their assigned tasks. Once all nodes finish their map tasks, the leftover map tasks are reassigned by the JobTracker; the unit time to finish those tasks remains the same (linear) as for the other maps, they just happen not to run in parallel, so a failure can double job completion time. This is the worst-case scenario with TeraSort; other workloads may have variable completion times.

    1. The Data Center and Hadoop. Jacob Rapp, Cisco (jarapp@cisco.com)
    2. Hadoop Considerations
       • Traffic types, job patterns, network considerations, compute/network integration
       • Co-existence with the current data center infrastructure
       • Open, programmable, and application-aware networks
       Multi-tenancy
       • Removing the "silo clusters"
    3.
    4. Job patterns: Analyze, Extract Transform Load (ETL), Explode
       Ingress vs. egress data set ratio: 1:0.3 (Analyze), 1:1 (ETL), 1:2 (Explode)
       The time the reducers start is dependent on mapred.reduce.slowstart.completed.maps. It doesn't change the amount of data sent to the reducers, but it may change the timing of sending that data.
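The reducer start point mentioned on this slide is controlled by a single job parameter. A minimal sketch of the corresponding mapred-site.xml fragment, assuming MRv1 (the era of this deck) and an illustrative choice of 80% map completion before reducers start; in MRv2/YARN the equivalent key is mapreduce.job.reduce.slowstart.completedmaps:

```xml
<!-- mapred-site.xml (MRv1): delay reducer startup until 80% of map tasks complete -->
<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <!-- fraction of maps that must finish before reducers are scheduled;
       0.05 is the default, 0.80 is an illustrative value -->
  <value>0.80</value>
</property>
```

A higher value keeps reduce slots free for longer but can delay the shuffle; as the slide notes, it shifts the timing of the shuffle traffic, not its volume.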
    5. Traffic types:
       • Small flows/messaging (admin related, heart-beats, keep-alive, delay-sensitive application messaging)
       • Small to medium incast (Hadoop shuffle)
       • Large flows (HDFS ingest)
       • Large incast (Hadoop replication)
    6. Many-to-many traffic pattern: Map 1, Map 2, Map 3 … Map N feed Reducer 1, Reducer 2, Reducer 3 … Reducer N via HDFS, Shuffle, and Output Replication; control services: NameNode, JobTracker, ZooKeeper
    7. Job patterns have varying impact on network utilization:
       • Analyze: simulated with Shakespeare wordcount
       • Extract Transform Load (ETL): simulated with Yahoo TeraSort
       • Extract Transform Load (ETL) with output replication: simulated with Yahoo TeraSort with output replication
    8.
    9. Integration considerations. Network attributes:
       • Architecture
       • Availability
       • Capacity, scale & oversubscription
       • Flexibility
       • Management & visibility
    10. Single 1GE: 100% utilized; dual 1GE: 75% utilized; 10GE: 40% utilized.
        Generally 1G is used, largely due to cost/performance trade-offs, though 10GE can provide benefits depending on the workload.
    11. • No single point of failure from the network viewpoint; no impact on job completion time
        • NIC bonding configured in Linux, with the LACP mode of bonding
        • Effective load-sharing of traffic flows across the two NICs
        • Recommended: change the hashing to src-dst-ip-port (on both the network and the NIC bonding in Linux) for optimal load-sharing
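The LACP bonding described above can be sketched as a Linux bonding-driver configuration. This is a minimal illustration assuming a RHEL-style modprobe options file; mode=4 is IEEE 802.3ad (LACP), and xmit_hash_policy=layer3+4 hashes flows on IP addresses plus L4 ports, which matches the src-dst-ip-port hashing recommended on the switch side:

```
# /etc/modprobe.d/bonding.conf (illustrative)
# mode=4                     -> IEEE 802.3ad dynamic link aggregation (LACP)
# miimon=100                 -> check link state every 100 ms
# xmit_hash_policy=layer3+4  -> hash on src/dst IP and src/dst port
options bonding mode=4 miimon=100 xmit_hash_policy=layer3+4
```

The same hash policy should be configured on the upstream switch's port-channel so both directions of a flow load-share consistently.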
    12. 1GE vs. 10GE buffer usage (chart: job completion and buffer cell usage over time; series: 1G Buffer Used, 10G Buffer Used, 1G Map %, 1G Reduce %, 10G Map %, 10G Reduce %)
        Moving from 1GE to 10GE actually lowers the buffer requirement at the switching layer. By moving to 10GE, the data node has a wider pipe to receive data, lessening the need for buffers on the network, as the total aggregate transfer rate and amount of data do not increase substantially. This is due, in part, to limits of I/O and compute capabilities.
    13. Goals
        • Extensive validation of Hadoop workloads
        • Reference architecture: make it easy for the enterprise, demystify the network for Hadoop deployments, integrate with the enterprise through efficient choices of network topology/devices
        Findings
        • 10G and/or dual-attached servers provide consistent job completion times and better buffer utilization
        • 10G reduces bursts at the access layer
        • A dual-attached server is the recommended design, 1G or 10G; 10G for future-proofing
        • Rack failure has the biggest impact on job completion time
        • Does not require a non-blocking network
        • Latency does not matter much in Hadoop workloads
        More details from Hadoop Summit 2012 at:
        http://www.slideshare.net/Hadoop_Summit/ref-arch-validated-and-tested-approach-to-define-a-network-design
        http://youtu.be/YJODsK0T67A
    14.
    15. © 2011 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
        n3548-001# show interface brief
        --------------------------------------------------------------------------------
        Ethernet      VLAN  Type  Mode    Status  Reason                 Speed    Port Ch #
        Interface
        --------------------------------------------------------------------------------
        Eth1/1        1     eth   access  up      none                   10G(D)   --
        Eth1/2        1     eth   access  up      none                   10G(D)   --
        Eth1/3        1     eth   access  up      none                   10G(D)   --
        Eth1/4        1     eth   access  up      none                   10G(D)   --
        Eth1/5        1     eth   access  up      none                   10G(D)   --
        .
        .
        Eth1/33       1     eth   access  up      none                   10G(D)   --
        Eth1/34       1     eth   access  up      none                   10G(D)   --
        Eth1/35       1     eth   access  down    SFP not inserted       10G(D)   --
        Eth1/36       1     eth   access  down    SFP not inserted       10G(D)   --
        Eth1/37       1     eth   access  down    Administratively down  10G(D)   --
        .
    16. n3548-001# show mac address-table dynamic
        Legend:
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since first seen, + - primary entry using vPC Peer-Link
        VLAN  MAC Address      Type     age    Secure  NTFY  Ports
        ---------+-----------------+--------+---------+------+----+----------------
        * 1    e8b7.484d.a208   dynamic  60570  F       F     Eth1/31
        * 1    e8b7.484d.a20a   dynamic  60560  F       F     Eth1/31
        * 1    e8b7.484d.a73e   dynamic  60560  F       F     Eth1/34
        * 1    e8b7.484d.a740   dynamic  60560  F       F     Eth1/34
        * 1    e8b7.484d.ad15   dynamic  60560  F       F     Eth1/28
        * 1    e8b7.484d.ad17   dynamic  60560  F       F     Eth1/28
        * 1    e8b7.484d.b3e9   dynamic  60570  F       F     Eth1/25
        * 1    e8b7.484d.b3eb   dynamic  60560  F       F     Eth1/25
        .
        .
        MAC addresses of the connected devices … and the port they are on …
    17. n3548-001# portServerMap
        =======================================
        Port     Server FQDN
        ---------------------------------------
        Eth1/1   c200-m2-10g2-001.cluster10g.com
        Eth1/2   c200-m2-10g2-002.cluster10g.com
        Eth1/3   c200-m2-10g2-003.cluster10g.com
        Eth1/4   c200-m2-10g2-004.cluster10g.com
        Eth1/5   c200-m2-10g2-005.cluster10g.com
        Eth1/6   c200-m2-10g2-006.cluster10g.com
        Eth1/7   c200-m2-10g2-031.cluster10g.com
        Eth1/8   c200-m2-10g2-008.cluster10g.com
        Eth1/9   c200-m2-10g2-009.cluster10g.com
        Eth1/11  c200-m2-10g2-011.cluster10g.com
        .
        .
        .
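portServerMap is one of the custom on-switch scripts the deck later points to at github.com/datacenter. As a rough sketch of the underlying idea, joining the switch's MAC address table with a host-side MAC-to-FQDN inventory, the following Python is illustrative only; the function names and the regular expression are assumptions, not the actual script:

```python
import re

# One `show mac address-table dynamic` row, e.g.:
# * 1 e8b7.484d.a208 dynamic 60570 F F Eth1/31
MAC_ROW = re.compile(
    r"\*\s+(?P<vlan>\d+)\s+"
    r"(?P<mac>[0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4})"
    r"\s+dynamic\s+\d+\s+\S+\s+\S+\s+(?P<port>\S+)"
)

def mac_to_port(cli_output):
    """Map each dynamically learned MAC address to the switch port it was seen on."""
    return {m.group("mac"): m.group("port") for m in MAC_ROW.finditer(cli_output)}

def port_server_map(cli_output, mac_to_fqdn):
    """Join the switch MAC table with a host inventory (MAC -> FQDN) to get port -> server."""
    return {port: mac_to_fqdn[mac]
            for mac, port in mac_to_port(cli_output).items()
            if mac in mac_to_fqdn}
```

The host inventory side could come from each server's `ip link` output or a DHCP/DNS database; the switch side is just the `show mac address-table dynamic` output shown on the previous slide.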
    18. n3548-001# trackerList
        ===========================================
        Port     Server            Server Port
        -------------------------------------------
        Eth1/2   c200-m2-10g2-002  50544
        Eth1/3   c200-m2-10g2-003  41909
        Eth1/4   c200-m2-10g2-004  36480
        Eth1/5   c200-m2-10g2-005  38179
        Eth1/6   c200-m2-10g2-006  51375
        Eth1/7   c200-m2-10g2-031  41915
        Eth1/8   c200-m2-10g2-008  50983
        Eth1/9   c200-m2-10g2-009  37056
        Eth1/11  c200-m2-10g2-011  35882
        Eth1/12  c200-m2-10g2-012  44551
        .
        .
        .
    19. n3548-001# bufferServerMap
        ===================================================================
        Port     Server            1sec   5sec    60sec   5min    1hr
        -------------------------------------------------------------------
        Eth1/1   c200-m2-10g2-001  0KB    0KB     0KB     0KB     0KB
        Eth1/2   c200-m2-10g2-002  384KB  384KB   1536KB  2304KB  2304KB
        Eth1/3   c200-m2-10g2-003  384KB  384KB   1152KB  1536KB  1536KB
        Eth1/4   c200-m2-10g2-004  384KB  384KB   2304KB  2304KB  2304KB
        Eth1/5   c200-m2-10g2-005  384KB  384KB   768KB   1536KB  1536KB
        Eth1/6   c200-m2-10g2-006  384KB  2304KB  2304KB  2304KB  2304KB
        Eth1/7   c200-m2-10g2-031  384KB  384KB   3456KB  3840KB  3840KB
        Eth1/8   c200-m2-10g2-008  768KB  768KB   2688KB  2688KB  2688KB
        Eth1/9   c200-m2-10g2-009  384KB  384KB   2304KB  2304KB  2304KB
        Eth1/11  c200-m2-10g2-011  384KB  384KB   1920KB  1920KB  1920KB
        .
        .
        .
        Eth1/1 (c200-m2-10g2-001) has 0 buffer usage because it's the name node
    20. n3548-001# jobsBuffer
        Hadoop Job Info ...
        ===================================================================
        1 jobs currently running
        JobId                  RunTime(secs)  User    Priority
        job_201306131423_0009  120            hadoop  NORMAL
        ===================================================================
        Buffer Info - Per Port
        Port     Server            1sec   5sec    60sec   5min    1hr
        -------------------------------------------------------------------
        Eth1/1   c200-m2-10g2-001  0KB    0KB     0KB     0KB     0KB
        Eth1/2   c200-m2-10g2-002  384KB  384KB   768KB   768KB   768KB
        Eth1/3   c200-m2-10g2-003  384KB  384KB   1152KB  1152KB  1152KB
        Eth1/4   c200-m2-10g2-004  384KB  1536KB  1536KB  1536KB  1536KB
        Eth1/5   c200-m2-10g2-005  384KB  768KB   1152KB  1152KB  1152KB
        .
        .
        What jobs were running during peak buffer usage … and for how long were they running
    21. n3548-001(config)# jobsBuffer
        Hadoop Job Info ...
        ===================================================================
        0 jobs currently running
        JobId  RunTime(secs)  User  Priority
        ===================================================================
        Buffer Info - Per Port
        Port     Server            1sec  5sec  60sec  5min    1hr
        -------------------------------------------------------------------
        Eth1/1   c200-m2-10g2-001  0KB   0KB   0KB    0KB     0KB
        Eth1/2   c200-m2-10g2-002  0KB   0KB   0KB    1920KB  1920KB
        Eth1/3   c200-m2-10g2-003  0KB   0KB   0KB    2304KB  2304KB
        Eth1/4   c200-m2-10g2-004  0KB   0KB   0KB    2688KB  2688KB
        Eth1/5   c200-m2-10g2-005  0KB   0KB   0KB    2304KB  2304KB
        Eth1/6   c200-m2-10g2-006  0KB   0KB   0KB    2304KB  2304KB
        Eth1/7   c200-m2-10g2-031  0KB   0KB   0KB    1920KB  2688KB
        .
        Historic look at the buffer usage …
    22.
    23.
    24.
    25. Buffer usage across the Map, Shuffle, Reduce, and Replication phases (chart: buffer usage over time, 0 to 780 seconds)
    26. Telemetry collection: switches push data (Python socket) to an Analyze host; PTP grandmaster (optional) for time synchronization. Scripts: github.com/datacenter
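The push-data flow on this slide could look roughly like the following Python sketch. The record fields, newline-delimited JSON framing, and port number are illustrative assumptions, not the actual github.com/datacenter scripts:

```python
import json
import socket
import time

def make_sample(port, server, buffer_kb):
    # One buffer-usage sample; the field names here are illustrative
    return {"ts": time.time(), "port": port,
            "server": server, "buffer_kb": buffer_kb}

def push_samples(samples, host, tcp_port):
    # Push newline-delimited JSON over a plain TCP socket to the analyzer host
    payload = "".join(json.dumps(s) + "\n" for s in samples).encode()
    with socket.create_connection((host, tcp_port)) as conn:
        conn.sendall(payload)
```

A collector on the Analyze host would accept connections, split on newlines, and feed each JSON record into whatever time-series store backs the charts; the optional PTP grandmaster keeps the timestamps comparable across switches.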
    27.
    28. Various multitenant environments:
        • Hadoop + HBase: need to understand traffic patterns
        • Job based: scheduling dependent
        • Department based: permissions and scheduling dependent
    29. Hadoop (Map 1, Map 2, Map 3 … Map N feeding Reducer 1, Reducer 2, Reducer 3 … Reducer N via HDFS, Shuffle, and Output Replication) running alongside HBase Region Servers handling client reads and updates and performing major compaction
    30. HBase during major compaction: read/update latency comparison of non-QoS vs. QoS policy shows a ~45% improvement for reads (chart: average UPDATE and READ latency in microseconds over time, with and without QoS). Switch buffer usage with a network QoS policy to prioritize HBase update/read operations.
    31. HBase + Hadoop mixed workload: read/update latency comparison of non-QoS vs. QoS policy shows a ~60% improvement for reads (charts: average UPDATE and READ latency in microseconds over time, with and without QoS; switch buffer used over the timeline for Hadoop TeraSort, HBase, and HBase + Hadoop MapReduce). Switch buffer usage with a network QoS policy to prioritize HBase update/read operations.
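The deck does not spell out the QoS policy used in these tests. As a hedged sketch of the idea on an NX-OS switch, classify HBase RegionServer traffic and steer it into a prioritized traffic class; the names, the qos-group number, and the match on TCP port 60020 (the era-appropriate default RegionServer port) are all illustrative assumptions:

```
! Illustrative NX-OS fragment: prioritize HBase RegionServer traffic
ip access-list HBASE-TRAFFIC
  permit tcp any any eq 60020          ! assumed HBase RegionServer port
class-map type qos match-all HBASE
  match access-group name HBASE-TRAFFIC
policy-map type qos PRIORITIZE-HBASE
  class HBASE
    set qos-group 4                    ! steer into a higher-priority class
interface Ethernet1/1-48
  service-policy type qos input PRIORITIZE-HBASE
```

The effect measured on these slides is that latency-sensitive HBase reads and updates are dequeued ahead of bulk MapReduce shuffle traffic during contention.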
    32. Cisco Unified Data Center
        Unified Fabric: highly scalable, secure network fabric (www.cisco.com/go/nexus)
        Unified Computing: modular stateless computing elements (www.cisco.com/go/ucs)
        Unified Management: automated management; manages enterprise workloads (http://www.cisco.com/go/workloadautomation)
        THANK YOU FOR LISTENING
        Cisco.com Big Data: www.cisco.com/go/bigdata
        Data center script examples from the presentation: github.com/datacenter
