
How to Increase Performance of Your Hadoop Cluster



A presentation made by Altoros and Joyent together at the NoSQL Now! 2013 conference.

Published in: Technology

How to Increase Performance of Your Hadoop Cluster

  1. NoSQL Now! Aug 21, 2013. Ben Wen, Joyent; Renat Khasanshyn, Altoros.
  2. About Joyent: the high-performance public cloud infrastructure provider. Cloud IaaS virtual machines: Linux, Windows, BSD, SmartOS (fka Solaris) with Zones. Core founding sponsors of Node.js. Four global datacenters. Key markets: big data, mobile, e-commerce, finsvc, SaaS. Open source contributions: Node.js, KVM, DTrace, ZFS, SmartOS.
  3. Running on bare metal is only practical for some organizations. Performance varies significantly across job types. In fact, for many jobs, less = more. Utilization of most clusters in production is low. Optimizing Hadoop/MapReduce performance is hard.
  4. Get upset when truth comes out! Biased (to the shiny side of the coin). Often add controversy and confusion.
  5. For Hadoop, what is the impact of container-based virtualization vs. hardware emulation (KVM)? What are the Hadoop optimization strategies? Is there a “rule of thumb” when it comes to determining the optimization approach? What are the optimal Hadoop cluster settings for the 1TB TeraSort benchmark on 100- and 400-node clusters running Linux and SmartOS on the Joyent Public Cloud?
  6. Physical (disks, CPU, network); OS/hypervisor (especially for virtualized environments); Hadoop/MapReduce (tons of settings); algorithmic (data structures, join strategies, big-O…); implementation (code efficiency, architecture decisions that fit all other factors).
  7. SmartOS: an open source Unix operating system based on illumos, the active fork of OpenSolaris technology, built for the cloud. It uses containerized OS virtualization called Zones (think a mature LXC with secure RBAC and auditing). Ubuntu: an operating system based on the Debian Linux distribution and distributed as free and open source software. Apache Hadoop: an open-source software framework that supports data-intensive distributed applications, licensed under the Apache v2 license. Derived from Google's MapReduce and Google File System (GFS) papers, Hadoop enables applications to work with thousands of computation-independent computers and petabytes of data.
  8. Chef: written by Opscode and released as open source under the Apache License 2.0, Chef is a DevOps tool used for configuring cloud services or streamlining the task of configuring a company's internal servers. Chef automatically sets up and tweaks the operating systems and programs that run in massive data centers. Unravel: developed by the creators of the Starfish project at Duke University, Unravel brings run-time profiling of Hadoop jobs followed by cost-based database query optimization. Unravel connects to streams of Hadoop and system instrumentation data and applies statistical machine learning to optimize the cost of Hadoop jobs and increase cluster utilization.
  9. Comparing the I/O path on bare-metal Unix vs. Zones vs. KVM (diagram labels: Bare-metal, OS Virtualization, Kernel Virtualization). Zones: Zones partition at the OS level; the code path is essentially the same as on bare metal; performance is higher. KVM: KVM is encapsulated by the hypervisor; the code path is much more circuitous in a KVM process; performance is impacted.
  10. No overhead for Zones. Stack traces show how a network packet is transmitted on bare metal vs. a Joyent Zone (aka SmartMachine) vs. a Fedora VM on KVM.
      Frames shown identically in all three columns (Bare Metal, Joyent Zone, Fedora VM on KVM):
        mac`mac_tx+0xda
        dld`str_mdata_fastpath_put+0x53
        ip`ip_xmit+0x82d
        ip`ire_send_wire_v4+0x3e9
        ip`conn_ip_output+0x190
        ip`tcp_send_data+0x59
        ip`tcp_output+0x58c
        ip`squeue_enter+0x426
        ip`tcp_sendmsg+0x14f
        sockfs`so_sendmsg+0x26b
        sockfs`socket_sendmsg+0x48
        sockfs`socket_vop_write+0x6c
        genunix`fop_write+0x8b
        genunix`write+0x250
        genunix`write32+0x1e
        unix`_sys_sysenter_post_swapgs+0x14
      Additional frames shown only in the Fedora VM on KVM column:
         1 kernel`start_xmit
         2 kernel`dtrace_int3_handler+0xd2
         3 kernel`kmem_cache_free+0x2f
         4 kernel`dtrace_int3+0x3a
         5 kernel`eth_header
         6 kernel`__kfree_skb+0x47
         7 kernel`start_xmit+0x1
         8 kernel`dev_hard_start_xmit+0x322
         9 kernel`sch_direct_xmit+0xef
        10 kernel`dev_queue_xmit+0x184
        11 kernel`eth_header+0x3a
        12 kernel`neigh_resolve_output+0x11e
        13 kernel`nf_hook_slow+0x75
        14 kernel`ip_finish_output
        15 kernel`ip_finish_output+0x17e
        16 kernel`ip_output+0x98
        17 kernel`__ip_local_out+0xa4
        18 kernel`ip_local_out+0x29
        19 kernel`ip_queue_xmit+0x14f
        20 kernel`tcp_transmit_skb+0x3e4
        21 kernel`__kmalloc_node_track_caller+0x185
        22 kernel`sk_stream_alloc_skb+0x41
        23 kernel`tcp_write_xmit+0xf7
        24 kernel`__alloc_skb+0x8c
        25 kernel`__tcp_push_pending_frames+0x26
        26 kernel`tcp_sendmsg+0x895
        27 kernel`inet_sendmsg+0x64
        28 kernel`sock_aio_write+0x13a
        29 kernel`do_sync_write+0xd2
        30 kernel`security_file_permission+0x2c
        31 kernel`rw_verify_area+0x61
        32 kernel`vfs_write+0x16d
        33 kernel`sys_write+0x4a
        34 kernel`sys_rt_sigprocmask+0x84
        35 kernel`system_call_fastpath+0x16
        36 igb`igb_tx_ring_send+0x33
        37 mac`mac_hwring_tx+0x1d
        38 mac`mac_tx_send+0x5dc
        39 mac`mac_tx_single_ring_mode+0x6e
      A Joyent Zone skips stepping through the 39 functions required when Fedora is running on KVM/qemu. Note that the code path in a Joyent Zone is exactly the same as on “Bare Metal”.
  11. Three identical Apache Hadoop 1.0.4 clusters were provisioned on Joyent infrastructure using the Joyent REST API and Opscode Chef. Each cluster was tuned for optimal performance following best practices for the TeraSort benchmark.
  12. A custom script launches virtual machines using the Joyent API and stores information about them in a JSON file.
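The launch-and-record step can be sketched as follows. This is only an illustration under stated assumptions, not the script used in the benchmark: the `launch` callable, the `fake_launch` stand-in, the machine naming scheme, and the record fields are all hypothetical, and no real cloud API is called.

```python
import json
from typing import Callable, Dict, List


def build_inventory(roles: Dict[str, int],
                    launch: Callable[[str, str], Dict[str, str]]) -> List[Dict[str, str]]:
    """Launch the requested number of machines per role and collect one record each."""
    inventory = []
    for role, count in roles.items():
        for index in range(1, count + 1):
            name = f"hadoop-{role}-{index}"   # hypothetical naming scheme
            details = launch(name, role)      # hypothetical provisioning call
            inventory.append({"name": name, "role": role, **details})
    return inventory


def fake_launch(name: str, role: str) -> Dict[str, str]:
    """Stand-in for a real provisioning call; returns made-up machine details."""
    return {"id": f"id-{name}", "ip": "10.0.0.1"}


def save_inventory(inventory: List[Dict[str, str]], path: str) -> None:
    """Write the machine records to a JSON file for later configuration steps."""
    with open(path, "w") as handle:
        json.dump(inventory, handle, indent=2)


if __name__ == "__main__":
    machines = build_inventory({"master": 1, "slave": 3}, fake_launch)
    print(json.dumps(machines, indent=2))
```

Keeping the inventory as a flat list of records with a `role` field makes it easy for a later configuration step to pick the right settings per machine.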
  13. Each machine in the cluster is configured according to its role using Chef cookbooks.
  14. As part of the TeraSort benchmark, a dataset is generated using the TeraGen utility included in Apache Hadoop.
  15. On one of the nodes, a Hadoop TeraSort job is submitted using the previously generated dataset.
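The generate-then-sort steps can be sketched as command lines. TeraGen takes the number of 100-byte rows to write, so a 1TB dataset corresponds to 10 billion rows. The jar file name and the HDFS paths below are assumptions for illustration; the exact invocation used in the benchmark is not given in the slides.

```python
ROW_BYTES = 100  # TeraGen writes fixed-size 100-byte rows


def teragen_rows(dataset_bytes: int) -> int:
    """Number of TeraGen rows needed to produce a dataset of the given size."""
    return dataset_bytes // ROW_BYTES


def teragen_command(jar: str, dataset_bytes: int, output_dir: str) -> list:
    """Build the argument list that generates the input dataset."""
    return ["hadoop", "jar", jar, "teragen",
            str(teragen_rows(dataset_bytes)), output_dir]


def terasort_command(jar: str, input_dir: str, output_dir: str) -> list:
    """Build the argument list that sorts the previously generated dataset."""
    return ["hadoop", "jar", jar, "terasort", input_dir, output_dir]


if __name__ == "__main__":
    jar = "hadoop-examples-1.0.4.jar"   # assumed location of the examples jar
    one_terabyte = 10 ** 12
    # Paths are hypothetical HDFS directories.
    print(" ".join(teragen_command(jar, one_terabyte, "/benchmarks/tera-in")))
    print(" ".join(terasort_command(jar, "/benchmarks/tera-in", "/benchmarks/tera-out")))
```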
  16. See: Hadoop job_201210261134_0010 on hadoop-smartos-r-1.html. The key difference between the two clusters was revealed when monitoring I/O and CPU utilization. The Ubuntu cluster was spending too much time in the OS kernel while performing I/O operations, as demonstrated in Figure 1.
  17. The SmartOS cluster was using CPU much more efficiently and was able to utilize a larger number of Hadoop mappers and reducers, which are key configuration parameters for Hadoop.
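In Hadoop 1.x, the number of concurrent mappers and reducers per node is set through the `mapred.tasktracker.map.tasks.maximum` and `mapred.tasktracker.reduce.tasks.maximum` properties in `mapred-site.xml`. The sketch below renders such properties as XML. The slot values are placeholders chosen for illustration, not the values used in the benchmark.

```python
import xml.etree.ElementTree as ET


def render_site_xml(properties: dict) -> str:
    """Render name/value pairs in the Hadoop *-site.xml <configuration> layout."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder slot counts; suitable values depend on cores, memory, and workload.
    slots = {
        "mapred.tasktracker.map.tasks.maximum": 8,
        "mapred.tasktracker.reduce.tasks.maximum": 4,
    }
    print(render_site_xml(slots))
```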
  21. The key difference between the clusters was revealed when monitoring I/O and CPU utilization. The Ubuntu cluster was spending too much time in the OS kernel while performing I/O. (For copies of config files and job reports – email
  22. 1) Basic cluster configuration is key (a one-time effort for typical workloads): data disk scaling, compression, JVM reuse policy, HDFS block size, map-side spills, copy/shuffle phase tuning, reduce-side spills. 2) Tune the number of map and reduce tasks appropriately. 3) Consider GPU for some workloads.
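A rough sketch of the arithmetic behind point 2, plus one possible mapping of the tuning areas in point 1 to Hadoop 1.x property names. The mapping is an interpretation of the slide headings, not something the slides state. The reduce-count factors 0.95 and 1.75 come from the general guidance in the Hadoop MapReduce tutorial, and the node and slot counts in the example are assumptions.

```python
# One reading of the slide's tuning areas as Hadoop 1.x properties (an assumption).
TUNING_AREAS = {
    "data disk scaling": ["dfs.data.dir", "mapred.local.dir"],
    "compression": ["mapred.compress.map.output", "mapred.map.output.compression.codec"],
    "JVM reuse policy": ["mapred.job.reuse.jvm.num.tasks"],
    "HDFS block size": ["dfs.block.size"],
    "map-side spills": ["io.sort.mb", "io.sort.spill.percent", "io.sort.factor"],
    "copy/shuffle phase": ["mapred.reduce.parallel.copies",
                           "mapred.job.shuffle.input.buffer.percent"],
    "reduce-side spills": ["mapred.job.shuffle.merge.percent",
                           "mapred.job.reduce.input.buffer.percent"],
}


def map_task_count(input_bytes: int, block_bytes: int) -> int:
    """Approximate number of map tasks: one per HDFS block of input (ceiling division)."""
    return (input_bytes + block_bytes - 1) // block_bytes


def reduce_task_count(nodes: int, reduce_slots_per_node: int, factor: float = 0.95) -> int:
    """Reduce tasks as a multiple of total reduce slots (0.95 for one wave, 1.75 for two)."""
    return round(nodes * reduce_slots_per_node * factor)


if __name__ == "__main__":
    one_terabyte = 10 ** 12
    block = 128 * 1024 ** 2                       # assumed 128 MiB HDFS block size
    print(map_task_count(one_terabyte, block))    # map tasks for a 1TB input
    print(reduce_task_count(100, 4))              # 100 nodes, 4 reduce slots each (assumed)
```

Larger HDFS blocks mean fewer, longer map tasks, which is one reason block size and task counts are usually tuned together.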
  23. Forthcoming in October. Includes cloud performance. Co-author of the DTrace book. More here on his techniques:
  24. Thank you! Ben Wen: Renat Khasanshyn: @renatco (650) 395-7002