
Improving Hadoop Cluster Performance via Linux Configuration

Original deck: http://www.slideshare.net/technmsg/improving-hadoop-performancevialinux


  1. Improving Hadoop Cluster Performance via Linux Configuration
     2014 Hadoop Summit – San Jose, California
     Alex Moundalexis, alexm at clouderagovt.com, @technmsg
  2. Tips from a Former SA
  3. [Photo: CC BY 2.0 / Richard Bumgardner] Been there, done that.
  4. Tips from a Former SA, Field Guy
  5. [Photo: CC BY 2.0 / Alex Moundalexis] Home sweet home.
  6. Tips from a Former SA, Field Guy: Easy steps to take…
  7. Tips from a Former SA, Field Guy: Easy steps to take… that most people don’t.
  8. What This Talk Isn’t About
     • Deploying: Puppet, Chef, Ansible, homegrown scripts, intern labor
     • Sizing & tuning: depends heavily on data and workload
     • Coding: unless you count STDOUT redirection
     • Algorithms: I suck at math, but we’ll try some multiplication later
  9. “The answer to most Hadoop questions is it depends.”
  10. So What ARE We Talking About?
     • Seven simple things: quick, safe, and viable for most environments and use cases
     • Identify the issue, then offer a solution
     • Note: commands run as root or via sudo
  11. Bad news, best not to… 1. Swapping
  12. Swapping
     • A form of memory management, also known as paging
     • When the OS runs low on memory, it writes blocks to disk, uses the
       now-free memory for other things, and reads blocks back in from disk
       when needed
  13. Swapping
     • Problem: disks are slow, especially to seek
     • Hadoop is about maximizing IO: spend less time acquiring data, operate
       on data in place, and do large streaming reads/writes from disk
     • Memory usage is limited within the JVM; we should be able to manage
       our own memory
  14. Disable Swap in Kernel
     • Well, as much as possible.
     • Immediate: # echo 0 > /proc/sys/vm/swappiness
     • Persist after reboot: # echo "vm.swappiness = 0" >> /etc/sysctl.conf
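The persistence step above appends blindly, so running it twice leaves duplicate entries. A minimal rerunnable sketch; it targets a sample file under /tmp so it is safe to try anywhere, and on a real host you would point CONF at /etc/sysctl.conf and run as root:

```shell
# Append vm.swappiness only if it is not already set, so reruns are no-ops.
# CONF is a stand-in for /etc/sysctl.conf in this sketch.
CONF=/tmp/sysctl.conf.sample
: > "$CONF"
grep -q '^vm.swappiness' "$CONF" || echo "vm.swappiness = 0" >> "$CONF"
grep -q '^vm.swappiness' "$CONF" || echo "vm.swappiness = 0" >> "$CONF"  # second run adds nothing
cat "$CONF"
```

After both runs the file still contains a single `vm.swappiness = 0` line.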
  15. Swapping Peculiarities
     • Behavior varies based on the Linux kernel
     • CentOS 6.4+ / Ubuntu 10.10+ (for you kernel gurus, that’s Linux
       2.6.32-303+): we don’t swap, ever
     • Prior kernels: we don’t swap, except to avoid an OOM condition
     • Details: http://tiny.cloudera.com/noswap
  16. Disable this too. 2. File Access Time
  17. File Access Time
     • Linux tracks access time, writing to disk even if all you did was read
     • Problem: more disk seeks
     • HDFS is write-once, read-many; the NameNode tracks access information
       for HDFS
  18. Don’t Track Access Time
     • Mount volumes with the noatime option
     • In /etc/fstab: /dev/sdc /data01 ext3 defaults,noatime 0 0
     • Note: noatime implies nodiratime as well
     • What about relatime? Faster than atime, but slower than noatime
     • No reboot required: # mount -o remount /data01
  19. Reclaim it, impress your bosses! 3. Root Reserved Space
  20. Root Reserved Space
     • EXT3/4 reserve 5% of each disk for root-owned files
     • On an OS disk, sure: system logs, kernel panics, etc.
  21. [Photo: CC BY 2.0 / Alex Moundalexis] Disks used to be much smaller, right?
  22. Do The Math
     • Conservative: 5% of a 1 TB disk = 46 GB; 5 data disks per server =
       230 GB; 5 servers per rack = 1.15 TB
     • Quasi-aggressive: 5% of a 4 TB disk = 186 GB; 12 data disks per
       server = 2.23 TB; 18 servers per rack = 40.1 TB
     • That’s a LOT of unused storage!
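The conservative column above is the 5% reserve on a disk sold as 10^12 bytes, expressed in GiB. A quick sketch that reproduces those numbers:

```shell
# 5% reserve on a 1 TB (10^12 byte) disk in GiB, scaled to the slide's
# conservative layout: 5 data disks per server, 5 servers per rack.
awk 'BEGIN {
  pct = 0.05; disk_bytes = 1e12; disks = 5; servers = 5
  per_disk = pct * disk_bytes / (1024^3)    # GiB reserved per disk
  printf "per disk:   %.1f GiB\n", per_disk
  printf "per server: %.1f GiB\n", per_disk * disks
  printf "per rack:   %.1f GiB\n", per_disk * disks * servers
}'
```

Swap in the quasi-aggressive figures (4e12-byte disks, 12 per server, 18 servers per rack) to reproduce the second column.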
  23. Root Reserved Space
     • On a Hadoop data disk, there are no root-owned files
     • When creating a partition: # mkfs.ext3 -m 0 /dev/sdc
     • On existing partitions: # tune2fs -m 0 /dev/sdc
     • 0 is safe; 1 is for the ultra-paranoid
  24. Turn it on, already! 4. Name Service Cache Daemon
  25. Name Service Cache Daemon
     • Daemon that caches name service requests: passwords, groups, hosts
     • Helps weather network hiccups; helps even more with high-latency LDAP,
       NIS, or NIS+
     • Small footprint, zero configuration required
  26. Name Service Cache Daemon
     • Hadoop nodes are largely network-based applications: on the network
       constantly, issuing lots of DNS lookups (especially HBase and distcp),
       and they can thrash DNS servers
     • Reducing latency of service requests? Smart.
     • Reducing impact on shared infrastructure? Smart.
  27. Name Service Cache Daemon
     • Turn it on, let it work, leave it alone:
       # chkconfig --level 345 nscd on
       # service nscd start
     • Check on it later: # nscd -g
     • Unless using Red Hat SSSD; modify the nscd config first!
     • Don’t use nscd to cache passwd, group, or netgroup
     • Red Hat, Using NSCD with SSSD: http://goo.gl/68HTMQ
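When SSSD owns identity lookups, the warning above translates into /etc/nscd.conf entries along these lines. A sketch only: merge these lines into your distribution's existing nscd.conf rather than replacing the file, since the rest of its defaults should stay intact.

```
# /etc/nscd.conf (fragment): cache hosts only; leave identity data to SSSD
enable-cache    passwd      no
enable-cache    group       no
enable-cache    netgroup    no
enable-cache    hosts       yes
```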
  28. Not a problem, until they are. 5. File Handle Limits
  29. File Handle Limits
     • The kernel refers to files via a handle, also called a descriptor
     • Linux is a multi-user system; file handles protect the system from
       poor coding, malicious users, and pictures of cats on the Internet
  30. [Image: Microsoft Office EULA. Really.]
      java.io.FileNotFoundException: (Too many open files)
  31. File Handle Limits
     • Linux defaults are usually not enough
     • Increase maximum open files (default 1024):
       # echo "hdfs - nofile 32768" >> /etc/security/limits.conf
       # echo "mapred - nofile 32768" >> /etc/security/limits.conf
       # echo "hbase - nofile 32768" >> /etc/security/limits.conf
     • Bonus: increase maximum processes too:
       # echo "hdfs - nproc 32768" >> /etc/security/limits.conf
       # echo "mapred - nproc 32768" >> /etc/security/limits.conf
       # echo "hbase - nproc 32768" >> /etc/security/limits.conf
     • Note: Cloudera Manager will do this for you.
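The six echo commands above collapse into a short loop; a sketch that writes the entries to a scratch file first, so they can be reviewed before being appended to /etc/security/limits.conf as root:

```shell
# Generate nofile/nproc limits.conf entries for the Hadoop service accounts.
OUT=/tmp/hadoop-limits.conf
: > "$OUT"
for user in hdfs mapred hbase; do
  printf '%s - nofile 32768\n%s - nproc 32768\n' "$user" "$user" >> "$OUT"
done
cat "$OUT"   # review, then append to /etc/security/limits.conf as root
```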
  32. Don’t be tempted to share, even on monster disks. 6. Dedicated Disk for OS and Logs
  33. The Situation in Easy Steps
     1. Your new server has a dozen 1 TB disks
     2. Eleven disks are used to store data
     3. One disk is used for the OS: 20 GB for the OS, 980 GB sits unused
     4. Someone asks “can we store data there too?”
     5. Seems reasonable, lots of space… “OK, why not.”
     Sound familiar?
  34. [Image: Microsoft Office EULA. Really.]
      I don’t understand it, there’s no consistency to these run times!
  35. No Love for Shared Disk
     • Our quest for data gets interrupted a lot: OS operations, OS logs,
       Hadoop logging (quite chatty), Hadoop execution, userspace execution
     • Disk seeks are slow, remember?
  36. Dedicated Disk for OS and Logs
     • At install time: disk 0 for OS & logs, disks 1-n for Hadoop data
     • After install it’s a more complicated effort that requires manual HDFS
       block rebalancing:
       1. Take down HDFS (if you can do it in under 10 minutes, just the
          DataNode)
       2. Move or distribute blocks from disk0/dir to disk[1-n]/dir
       3. Remove dir from the HDFS config (dfs.data.dir)
       4. Start HDFS
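Step 3 above edits the DataNode's data-directory list. A sketch of what the resulting hdfs-site.xml property might look like after the OS disk's directory is removed; the /dataNN paths are illustrative, not taken from the deck:

```xml
<property>
  <name>dfs.data.dir</name>
  <!-- directory on the OS disk removed; only dedicated data disks remain -->
  <value>/data01/dfs/dn,/data02/dfs/dn,/data03/dfs/dn</value>
</property>
```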
  37. Sane, both forward and reverse. 7. Name Resolution
  38. Name Resolution Options
     1. Hosts file, if you must
     2. DNS, much preferred
  39. Name Resolution with Hosts File
     • Set canonical names properly
     • Right:
       10.1.1.1 r01m01.cluster.org r01m01 master1
       10.1.1.2 r01w01.cluster.org r01w01 worker1
     • Wrong:
       10.1.1.1 r01m01 r01m01.cluster.org master1
       10.1.1.2 r01w01 r01w01.cluster.org worker1
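The right/wrong distinction above is easy to check mechanically. A sketch that flags hosts-file entries whose canonical (first) name after the IP is not fully qualified, run here against an inline sample rather than the live /etc/hosts:

```shell
# Sample hosts file with one wrong entry (line 1) and one right entry (line 2).
cat > /tmp/hosts.sample <<'EOF'
10.1.1.1 r01m01 r01m01.cluster.org master1
10.1.1.2 r01w01.cluster.org r01w01 worker1
EOF
# Flag lines whose first name contains no dot, i.e. is not an FQDN.
awk '!/^#/ && NF >= 2 && $2 !~ /\./ { print "line " NR ": canonical name " $2 " is not an FQDN" }' /tmp/hosts.sample
```

Point the awk command at /etc/hosts to audit a real machine.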
  40. Name Resolution with Hosts File
     • Set the loopback address properly; ensure 127.0.0.1 resolves to
       localhost, NOT the hostname
     • Right: 127.0.0.1 localhost
     • Wrong: 127.0.0.1 r01m01
  41. Name Resolution with DNS
     • Forward and reverse lookups
     • The hostname should MATCH the FQDN in DNS
  42. This Is What You Ought to See
  43. Name Resolution Errata
     • Mismatches? Expect odd results: problems starting DataNodes, non-FQDN
       links in the Web UI
     • Security features are extra sensitive to FQDNs
     • These errors are so common that a link to the FAQ is included in the
       logs! http://wiki.apache.org/hadoop/UnknownHost
     • Get name resolution working BEFORE enabling nscd!
  44. Time to take out your camera phones… Summary
  45. Summary
     1. disable vm.swappiness
     2. data disks: mount with the noatime option
     3. data disks: disable root reserved space
     4. enable nscd
     5. increase file handle limits
     6. use a dedicated OS/logging disk
     7. sane name resolution
     http://tiny.cloudera.com/7steps
  46. Recommended Reading
     • Hadoop Operations: http://amzn.to/1hDaN9B
  47. Preferably related to the talk… Questions?
  48. Thank You! Alex Moundalexis, alexm at clouderagovt.com, @technmsg
      We’re hiring, kids! Well, not kids.
  49. Because we had enough time… 8. Bonus Round
  50. Other Things to Check
     • Disk IO
     • hdparm: # hdparm -Tt /dev/sdc
       Looking for at least 70 MB/s from 7200 RPM disks; slower could
       indicate a failing drive, disk controller, array, etc.
     • dd: http://romanrm.ru/en/dd-benchmark
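The dd benchmark linked above boils down to a timed sequential write. A sketch that writes a small file with fdatasync so the result is not just page cache; it targets /tmp for safety, and for a real measurement you would point TESTFILE at the data disk and use a size well beyond RAM:

```shell
# Rough sequential-write test; the last line of dd's output reports throughput.
TESTFILE=/tmp/ddtest.bin
dd if=/dev/zero of="$TESTFILE" bs=1M count=16 conv=fdatasync 2>&1 | tail -n 1
wc -c < "$TESTFILE"    # 16 MiB written
rm -f "$TESTFILE"
```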
  51. Other Things to Check
     • Disable Red Hat Transparent Huge Pages (RH6+ only); can reduce
       elevated CPU usage
     • In rc.local:
       echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
       echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
     • Reference: Linux 6 Transparent Huge Pages and Hadoop Workloads,
       http://goo.gl/WSF2qC
  52. Other Things to Check
     • Enable jumbo frames, but only if your network infrastructure
       supports it!
     • Can easily (and arguably) boost throughput by 10-20%
  53. Other Things to Check
     • Enable jumbo frames, but only if your network infrastructure
       supports it!
     • Can easily (and arguably) boost throughput by 10-20%
     • Monitor everything; how else will you know what’s happening?
     • Nagios, Ganglia, CM, Ambari
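Enabling jumbo frames usually comes down to two commands; a sketch only, where the interface name (eth0) and target host are assumptions, and both commands require root plus end-to-end switch support:

```
# Illustrative commands, run as root; eth0 and the target host are assumptions.
ip link set dev eth0 mtu 9000
# Verify end to end before relying on it: 8972 bytes of ICMP payload plus
# 28 bytes of IP/ICMP headers = 9000, and -M do forbids fragmentation.
ping -M do -s 8972 -c 3 r01w01.cluster.org
```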
  54. Thank You! Alex Moundalexis, alexm at clouderagovt.com, @technmsg
      We’re hiring, kids! Well, not kids.
