Improving Hadoop Cluster Performance via Linux Configuration
  • Authoritative/original deck: http://www.slideshare.net/technmsg/improving-hadoop-performancevialinux
  • If you’re in the audience and you’re reading this, the gods of multi-display haven’t been offered their tithe lately. Panic.
  • Then: Years racking, stacking, cabling, loading, scripting, tickets, etc.
  • Now: I deal with clusters. In customer spaces. Advising. Consulting. Guiding.
  • Much of it isn’t hard, and it’s well documented.
  • I see a lot of clusters. As simple as this stuff is, it’s rarely done. We get excited. We don’t sweat the small stuff. We forget.
  • Who said that? ME, to my customers, ALL THE TIME. There are many configuration options for Hadoop, BUT I choose these seven items because they’ll help no matter your environment.
  • If you remember the early days of Windows multitasking, you could hear your hard drive chunking along as it tried to switch windows. Page files, swap partitions.
  • You could also use sysctl for this if you’re more comfortable.
  • 0 will swap only to avoid an out-of-memory condition, but then we have other issues. On RHEL/CentOS kernels > 2.6.32-303, 0 actually means zero. http://www.mysqlperformanceblog.com/2014/04/28/oom-relation-vm-swappiness0-new-kernel/
  • More time we aren’t grabbing the data we want. Redundant.
  • relatime maintains atime data, but not for every time the file is accessed
  • 5% of a disk wasn’t a lot back in the day. Explain root-owned files.
  • 30 years ago, hard drive space was far more limited…
  • 5% of a disk wasn’t a lot back in the day. Run # tune2fs -l /dev/sdc and look for "Reserved block count"; divide it by the total "Block count" to get the reserved fraction.
  • Hadoop has a tendency to BREAK things.
  • Inconsistent processing, disk seeks for OS writes
  • Inconsistent processing, disk seeks for OS writes. By default, the NameNode waits about 10.5 minutes (derived from dfs.namenode.heartbeat.recheck-interval and the heartbeat interval) before it considers a DataNode dead; after that it starts re-replicating its blocks.
  • There IS a third option, which isn’t on the slide FOR A REASON. IP, all the time.
  • The way a DataNode determines its own hostname is particularly convoluted, and takes up three full pages of text in Hadoop Operations (ch 4). A DataNode reports its hostname to the NameNode, which then uses that hostname in messages to other nodes. If it’s wrong or not a FQDN, things break.
  • If you haven’t understood any of what I just said, but have someone at the home office who would, the next slide is for you. Get your camera out.
  • Pause for pictures, water, laughter.
  • Jumbo frames are a point of contention among engineers; some swear there’s an impact, others say no.

Improving Hadoop Cluster Performance via Linux Configuration – Presentation Transcript

  • Improving Hadoop Cluster Performance via Linux Configuration 2014 Hadoop Summit – San Jose, California Alex Moundalexis alexm at clouderagovt.com @technmsg
  • 2 Tips from a Former SA
  • CC BY 2.0 / Richard Bumgardner Been there, done that.
  • 4 Tips from a Former SA Field Guy
  • CC BY 2.0 / Alex Moundalexis Home sweet home.
  • 6 Tips from a Former SA Field Guy Easy steps to take…
  • 7 Tips from a Former SA Field Guy Easy steps to take… that most people don’t.
  • What This Talk Isn’t About • Deploying • Puppet, Chef, Ansible, homegrown scripts, intern labor • Sizing & Tuning • Depends heavily on data and workload • Coding • Unless you count STDOUT redirection • Algorithms • I suck at math, but we’ll try some multiplication later 8
  • 9 “The answer to most Hadoop questions is it depends.”
  • So What ARE We Talking About? • Seven simple things • Quick • Safe • Viable for most environments and use cases • Identify issue, then offer solution • Note: Commands run as root or sudo 10
  • 11 Bad news, best not to… 1. Swapping
  • Swapping • A form of memory management • When OS runs low on memory… • write blocks to disk • use now-free memory for other things • read blocks back into memory from disk when needed • Also known as paging 12
  • Swapping • Problem: Disks are slow, especially to seek • Hadoop is about maximizing IO • spend less time acquiring data • operate on data in place • large streaming reads/writes from disk • Memory usage is limited within JVM • we should be able to manage our memory 13
  • Disable Swap in Kernel • Well, as much as possible. • Immediate: # echo 0 > /proc/sys/vm/swappiness • Persist after reboot: # echo "vm.swappiness = 0" >> /etc/sysctl.conf 14
  • Swapping Peculiarities • Behavior varies based on Linux kernel • CentOS 6.4+ / Ubuntu 10.10+ • For you kernel gurus, that’s Linux 2.6.32-303+ • Prior • We don’t swap, except to avoid OOM condition. • After • We don’t swap, ever. • Details: http://tiny.cloudera.com/noswap 15
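A minimal sketch of applying the setting, assuming a RHEL/CentOS-style system; the deck recommends 0, but on kernels newer than 2.6.32-303 (where 0 disables swap entirely) some operators substitute 1 as a last-resort escape valve:

    # Check the running kernel before picking a value
    uname -r

    # Apply immediately (0 per the slide; some use 1 on newer kernels)
    echo 0 > /proc/sys/vm/swappiness

    # Persist across reboots and reload sysctl settings
    echo "vm.swappiness = 0" >> /etc/sysctl.conf
    sysctl -p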
  • 16 Disable this too. 2. File Access Time
  • File Access Time • Linux tracks access time • writes to disk even if all you did was read • Problem • more disk seeks • HDFS is write-once, read-many • NameNode tracks access information for HDFS 17
  • Don’t Track Access Time • Mount volumes with noatime option • In /etc/fstab: /dev/sdc /data01 ext3 defaults,noatime 0 • Note: noatime implies nodiratime as well • What about relatime? • Faster than atime but slower than noatime • No reboot required • # mount -o remount /data01 18
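A minimal sketch for rolling noatime out to several data volumes at once, assuming /data01 through /data05 are hypothetical mount points already listed in /etc/fstab with the noatime option:

    # Remount each data volume so the new fstab options take effect without a reboot
    for mnt in /data01 /data02 /data03 /data04 /data05; do
        mount -o remount "$mnt"
    done

    # Verify the active mounts now show noatime
    mount | grep noatime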
  • 19 Reclaim it, impress your bosses! 3. Root Reserved Space
  • Root Reserved Space • EXT3/4 reserve 5% of disk for root-owned files • On an OS disk, sure • System logs, kernel panics, etc 20
  • CC BY 2.0 / Alex Moundalexis Disks used to be much smaller, right?
  • Do The Math • Conservative • 5% of 1 TB disk = 46 GB • 5 data disks per server = 230 GB • 5 servers per rack = 1.15 TB • Quasi-Aggressive • 5% of 4 TB disk = 186 GB • 12 data disks per server = 2.23 TB • 18 servers per rack = 40.1 TB • That’s a LOT of unused storage! 22
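The same arithmetic is easy to rerun for your own disk counts; a quick sketch using the conservative numbers from the slide:

    # Reproduce the "conservative" column with shell arithmetic
    echo $((46 * 5))        # 230 GB reserved per server (5 data disks at ~46 GB each)
    echo $((46 * 5 * 5))    # 1150 GB, i.e. about 1.15 TB reserved per rack (5 servers)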
  • Root Reserved Space • On a Hadoop data disk, no root-owned files • When creating a partition # mkfs.ext3 -m 0 /dev/sdc • On existing partitions # tune2fs -m 0 /dev/sdc • 0 is safe, 1 is for the ultra-paranoid 23
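A minimal sketch for clearing the reserve across several existing data partitions; /dev/sdc through /dev/sdh are hypothetical data disks, so match them to your own layout and leave the OS disk alone:

    # Clear root-reserved blocks on data partitions only (never the OS disk)
    for dev in /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh; do
        tune2fs -m 0 "$dev"
    done

    # Verify: should now report a reserved block count of 0
    tune2fs -l /dev/sdc | grep -i "reserved block count"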
  • 24 Turn it on, already! 4. Name Service Cache Daemon
  • Name Service Cache Daemon • Daemon that caches name service requests • Passwords • Groups • Hosts • Helps weather network hiccups • Helps more with high latency LDAP, NIS, NIS+ • Small footprint • Zero configuration required 25
  • Name Service Cache Daemon • Hadoop nodes • largely a network-based application • on the network constantly • issue lots of DNS lookups, especially HBase & distcp • can thrash DNS servers • Reducing latency of service requests? Smart. • Reducing impact on shared infrastructure? Smart. 26
  • Name Service Cache Daemon • Turn it on, let it work, leave it alone: # chkconfig --level 345 nscd on # service nscd start • Check on it later: # nscd -g • Unless using Red Hat SSSD; modify nscd config first! • Don’t use nscd to cache passwd, group, or netgroup • Red Hat, Using NSCD with SSSD. http://goo.gl/68HTMQ 27
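Where SSSD is in play, a minimal sketch of the nscd.conf adjustment the Red Hat note describes, leaving only host caching to nscd (exact directives can vary by distribution and nscd version):

    # /etc/nscd.conf (excerpt): let SSSD handle users and groups, cache hosts only
    enable-cache    passwd      no
    enable-cache    group       no
    enable-cache    netgroup    no
    enable-cache    hosts       yes

Then restart the daemon (service nscd restart) so the change takes effect.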
  • 28 Not a problem, until they are. 5. File Handle Limits
  • File Handle Limits • Kernel refers to files via a handle • Also called descriptors • Linux is a multi-user system • File handles protect the system from • Poor coding • Malicious users • Pictures of cats on the Internet 29
  • 30 Microsoft Office EULA. Really. java.io.FileNotFoundException: (Too many open files)
  • File Handle Limits • Linux defaults usually not enough • Increase maximum open files (default 1024) # echo hdfs - nofile 32768 >> /etc/security/limits.conf # echo mapred - nofile 32768 >> /etc/security/limits.conf # echo hbase - nofile 32768 >> /etc/security/limits.conf • Bonus: Increase maximum processes too # echo hdfs - nproc 32768 >> /etc/security/limits.conf # echo mapred - nproc 32768 >> /etc/security/limits.conf # echo hbase - nproc 32768 >> /etc/security/limits.conf • Note: Cloudera Manager will do this for you. 31
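A quick sketch to confirm the new limits actually apply to the service accounts; they only affect new login sessions, and already-running daemons need a restart to pick them up (the hdfs account and the /bin/bash shell are assumptions, since packaged accounts often ship with nologin shells):

    # Open-file and process limits as seen by the hdfs account
    su - hdfs -s /bin/bash -c 'ulimit -n; ulimit -u'
    # Expect 32768 for both if limits.conf was applied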
  • 32 Don’t be tempted to share, even on monster disks. 6. Dedicated Disk for OS and Logs
  • The Situation in Easy Steps 1. Your new server has a dozen 1 TB disks 2. Eleven disks are used to store data 3. One disk is used for the OS • 20 GB for the OS • 980 GB sits unused 4. Someone asks “can we store data there too?” 5. Seems reasonable, lots of space… “OK, why not.” Sound familiar? 33
  • 34 Microsoft Office EULA. Really. I don’t understand it, there’s no consistency to these run times!
  • No Love for Shared Disk • Our quest for data gets interrupted a lot: • OS operations • OS logs • Hadoop logging, quite chatty • Hadoop execution • userspace execution • Disk seeks are slow, remember? 35
  • Dedicated Disk for OS and Logs • At install time • Disk 0, OS & logs • Disk 1-n, Hadoop data • After install, more complicated effort, requires manual HDFS block rebalancing: 1. Take down HDFS • If you can do it in under 10 minutes, just the DataNode 2. Move or distribute blocks from disk0/dir to disk[1-n]/dir 3. Remove dir from HDFS config (dfs.data.dir) 4. Start HDFS 36
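A heavily hedged outline of the after-install procedure above, assuming the shared disk held a data directory at /data00/dfs/dn (hypothetical path); the on-disk layout under dfs.data.dir varies between Hadoop versions, so treat this as an outline of the slide's steps, not a recipe:

    # Outline only; paths are hypothetical and service names depend on your packaging
    service hadoop-hdfs-datanode stop
    # ...move block directories from the shared disk (e.g. /data00/dfs/dn) into the
    #    remaining data directories, preserving their internal layout...
    # ...remove /data00/dfs/dn from dfs.data.dir in hdfs-site.xml...
    service hadoop-hdfs-datanode start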
  • 37 Sane, both forward and reverse. 7. Name Resolution
  • Name Resolution Options 1. Hosts file, if you must 2. DNS, much preferred 38
  • Name Resolution with Hosts File • Set canonical names properly • Right 10.1.1.1 r01m01.cluster.org r01m01 master1 10.1.1.2 r01w01.cluster.org r01w01 worker1 • Wrong 10.1.1.1 r01m01 r01m01.cluster.org master1 10.1.1.2 r01w01 r01w01.cluster.org worker1 39
  • Name Resolution with Hosts File • Set loopback address properly • Ensure 127.0.0.1 resolves to localhost, NOT hostname • Right 127.0.0.1 localhost • Wrong 127.0.0.1 r01m01 40
  • Name Resolution with DNS • Forward • Reverse • Hostname should MATCH the FQDN in DNS 41
  • This Is What You Ought to See 42
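A minimal sketch of forward and reverse checks along these lines, assuming the r01w01.cluster.org / 10.1.1.2 example names from the hosts-file slides:

    # The local hostname should come back fully qualified
    hostname -f                   # expect: r01w01.cluster.org

    # Forward lookup: name to address
    host r01w01.cluster.org       # expect: 10.1.1.2

    # Reverse lookup: address back to the same FQDN
    host 10.1.1.2                 # expect: r01w01.cluster.org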
  • Name Resolution Errata • Mismatches? Expect odd results. • Problems starting DataNodes • Non-FQDN in Web UI links • Security features are extra sensitive to FQDN • Errors so common that link to FAQ is included in logs! • http://wiki.apache.org/hadoop/UnknownHost • Get name resolution working BEFORE enabling nscd! 43
  • 44 Time to take out your camera phones… Summary
  • Summary 1. disable vm.swappiness 2. data disks: mount with noatime option 3. data disks: disable root reserve space 4. enable nscd 5. increase file handle limits 6. use dedicated OS/logging disk 7. sane name resolution http://tiny.cloudera.com/7steps 45
  • Recommended Reading • Hadoop Operations http://amzn.to/1hDaN9B 46
  • 47 Preferably related to the talk… Questions?
  • 48 Thank You! Alex Moundalexis alexm at clouderagovt.com @technmsg We’re hiring, kids! Well, not kids.
  • 49 Because we had enough time… 8. Bonus Round
  • Other Things to Check • Disk IO • hdparm • # hdparm -Tt /dev/sdc • Looking for at least 70 MB/s from 7200 RPM disks • Slower could indicate a failing drive, disk controller, array, etc. • dd • http://romanrm.ru/en/dd-benchmark 50
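A minimal dd write test in the spirit of the linked benchmark, assuming /data01 is a mounted data disk with a few spare GB (remove the test file afterwards):

    # Sequential write of ~1 GB, flushed to disk before dd reports a rate
    dd if=/dev/zero of=/data01/dd-test bs=1M count=1024 conv=fdatasync
    rm /data01/dd-test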
  • Other Things to Check • Disable Red Hat Transparent Huge Pages (RH6+ Only) • Can reduce elevated CPU usage • In rc.local: echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled • Reference: Linux 6 Transparent Huge Pages and Hadoop Workloads, http://goo.gl/WSF2qC 51
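A quick check that the change took effect; the path shown is the RHEL 6 naming, while upstream kernels use /sys/kernel/mm/transparent_hugepage instead:

    cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
    # the active setting is shown in brackets, e.g.: always madvise [never]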
  • Other Things to Check • Enable Jumbo Frames • Only if your network infrastructure supports it! • Can easily (and arguably) boost throughput by 10-20% 52
  • Other Things to Check • Enable Jumbo Frames • Only if your network infrastructure supports it! • Can easily (and arguably) boost throughput by 10-20% • Monitor Everything • How else will you know what’s happening? • Nagios, Ganglia, CM, Ambari 53
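If the switches support it end to end, a minimal jumbo-frame sketch for a RHEL-style interface configuration; eth0 and the peer address are placeholders:

    # Temporary, for testing
    ip link set dev eth0 mtu 9000

    # Verify a 9000-byte path without fragmentation (8972 = 9000 - 20 IP - 8 ICMP)
    ping -M do -s 8972 <peer-datanode-ip>

    # Persist on RHEL/CentOS by adding MTU=9000 to
    # /etc/sysconfig/network-scripts/ifcfg-eth0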
  • 54 Thank You! Alex Moundalexis alexm at clouderagovt.com @technmsg We’re hiring, kids! Well, not kids.