
Hadoop I/O Analysis

Some initial analysis of the Hadoop Stack using vProbes


  1. Architect's View of Hadoop I/O
     I/O analysis using vProbes
     Richard McDougall, V1.0, April 2012
  2. Architect's Questions
     • Does Hadoop really need compute + data locality?
     • How much ephemeral data, and at what I/O rates, do we need to design for?
     • What I/O patterns do we need to support HDFS?
     • What is the I/O pattern of map-reduce tasks?
     • Are there opportunities for caching – map input, output, or ephemeral data?
  3. Controlled Small Study
     • Focus on developing tooling
     • Using vProbes + Perl + R
     • Hadoop 0.20.204
     • Terasort @ 1GB
     • One Namenode, Tasktracker, Datanode
  4. Terasort
     [Diagram: data flow. Input file → input splits → map tasks (x16), each
     sorting a chunk of key-values → shuffle of map output to the reducers →
     combine and sort → reduce → output file.]
  5. Log of the sort 'Job'

        $ log.pl job_201201261301_0005_1327649126255_rmc_TeraSort

        Item           Time     Jobname            Taskname  Phase    Start-Time End-Time  Elapsed
        Job            0.000    201201261301_0005
        Job                     201201261301_0005
        Job            0.475    201201261301_0005            PREP
        Task           1.932    201201261301_0005  m_000017  SETUP
        MapAttempt     3.066    201201261301_0005  m_000017  SETUP
        MapAttempt     10.409   201201261301_0005  m_000017  SETUP    SUCCESS  1.932    10.409    8.477    "setup"
        Task           10.966   201201261301_0005  m_000017  SETUP    SUCCESS  1.932    10.966    9.034
        Job                     201201261301_0005            RUNNING
        Task           10.970   201201261301_0005  m_000000  MAP
        Task           10.972   201201261301_0005  m_000001  MAP
        MapAttempt     10.981   201201261301_0005  m_000000  MAP
        MapAttempt     65.819   201201261301_0005  m_000000  MAP      SUCCESS  10.970   65.819    54.849   ""
        Task           68.063   201201261301_0005  m_000000  MAP      SUCCESS  10.970   68.063    57.093
        MapAttempt     10.998   201201261301_0005  m_000001  MAP
        MapAttempt     65.363   201201261301_0005  m_000001  MAP      SUCCESS  10.972   65.363    54.391   ""
        Task           68.065   201201261301_0005  m_000001  MAP      SUCCESS  10.972   68.065    57.093
        Task           68.066   201201261301_0005  m_000002  MAP
        Task           68.067   201201261301_0005  m_000003  MAP
        Task           68.068   201201261301_0005  r_000000  REDUCE
        MapAttempt     68.075   201201261301_0005  m_000002  MAP
        MapAttempt     139.789  201201261301_0005  m_000002  MAP      SUCCESS  68.066   139.789   71.723   ""
        Task           140.193  201201261301_0005  m_000002  MAP      SUCCESS  68.066   140.193   72.127
        MapAttempt     68.076   201201261301_0005  m_000003  MAP
        MapAttempt     139.927  201201261301_0005  m_000003  MAP      SUCCESS  68.067   139.927   71.860   ""
        Task           140.198  201201261301_0005  m_000003  MAP      SUCCESS  68.067   140.198   72.131
        …
        ReduceAttempt  68.112   201201261301_0005  r_000000  REDUCE
        ReduceAttempt  795.299  201201261301_0005  r_000000  REDUCE   SUCCESS  68.068   795.299   727.231  "reduce > reduce"
        Task           798.223  201201261301_0005  r_000000  REDUCE   SUCCESS  68.068   798.223   730.155
        Task           798.226  201201261301_0005  m_000016  CLEANUP
        MapAttempt     798.241  201201261301_0005  m_000016  CLEANUP
        MapAttempt     806.113  201201261301_0005  m_000016  CLEANUP  SUCCESS  798.226  806.113   7.887    "cleanup"
        Task           807.252  201201261301_0005  m_000016  CLEANUP  SUCCESS  798.226  807.252   9.026
        Job            807.253  201201261301_0005            SUCCESS  0.000    807.253  807.253
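     log.pl itself is not included in the deck. As a rough illustration of the
     idea, here is a minimal sketch of a job-history parser in the same
     spirit; it assumes the Hadoop 0.20 job-history format (an event type
     followed by space-separated KEY="VALUE" pairs, with millisecond
     timestamps), which is an assumption of mine rather than anything shown
     on the slide:

        #!/usr/bin/perl -w
        # Minimal sketch of a job-history parser in the spirit of log.pl.
        # Assumed input: Hadoop 0.20 job-history events, one per line, e.g.
        #   MapAttempt TASK_TYPE="MAP" TASKID="task_..." START_TIME="1327649137000" ...
        use strict;

        my $t0;   # job submit time; all times printed relative to it, in seconds

        while (my $line = <>) {
            my ($event) = $line =~ /^(\w+)/ or next;
            my %f = $line =~ /(\w+)="([^"]*)"/g;          # KEY="VALUE" pairs

            $t0 = $f{SUBMIT_TIME} if !defined $t0 && defined $f{SUBMIT_TIME};
            next unless defined $t0;

            my $start  = $f{START_TIME}  ? ($f{START_TIME}  - $t0) / 1000 : undef;
            my $finish = $f{FINISH_TIME} ? ($f{FINISH_TIME} - $t0) / 1000 : undef;

            printf "%-14s %-36s %-8s %-8s %8s %8s %8s\n",
                $event,
                $f{TASKID}      // "",
                $f{TASK_TYPE}   // "",
                $f{TASK_STATUS} // "",
                defined $start  ? sprintf("%.3f", $start)  : "",
                defined $finish ? sprintf("%.3f", $finish) : "",
                (defined $start && defined $finish)
                    ? sprintf("%.3f", $finish - $start) : "";
        }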
  6. Terasort: Map and Reduce Phases
     [Chart: task timeline over elapsed time in seconds – setup map, the
     mappers, the reducer, cleanup map.]
  7. Terasort: Map and Reduce Phases
     [Same timeline, annotated with zoom-in regions on map-task I/O and
     reduce-task I/O.]
  8. VMware vProbes
     • Dynamic instrumentation
     • Probe multiple VMs
     • Probe the virtualization layer
     • VMware Fusion and Workstation
  9. vProbes

        GUEST:ENTER:system_call {
            string path;
            comm = curprocname();
            tid = curtid();
            pid = curpid();
            ppid = curppid();
            syscall_num = sysnum;

            if (syscall_num == NR_open) {
                path = guestloadstr(sys_arg0);
                syscall_name = "open";
                sprintf(syscall_args, "\"%s\", %x, %x", path, sys_arg1, sys_arg2);
            }
            …
        }

        GUEST:OFFSET:ret_from_sys_call:0 {
            printf("%s/%d/%d/%d %s(%s) = %d <0>\n", comm, pid, tid, ppid,
                   syscall_name, syscall_args, getgpr(REG_RAX));
        }

     Sample output:

        java/14774/15467/1 open("/host/hadoop/hdfs/data/current/subdir0/blk_1719908349220085071_1649.meta", 0, 1b6) = 144 <0>
        java/14774/15467/1 stat("/host/hadoop/hdfs/data/current/subdir0/blk_1719908349220085071_1649.meta", 7f0b80a4e590) = 0 <0>
        java/14774/15467/1 read(144, 7f0b80a4c470, 4096) = 167 <0>
  10. Pathname Resolution
      filetracevp.pl:

        if ($syscall =~ m/open/) {
            $path1 = $line;
            $path1 =~ s/[A-z\/0-9]+[ ]+[a-z]+\("([^"]+)".*\n/$1/;
            $fd1 = $line;
            if ($fd1 =~ s/.* ([0-9]+) <.*>\n/$1/) {
                $fds{$pid,$fd1} = $path1;
            }
        }

        if ($syscall =~ m/write/) {
            $params = $line;
            if ($params =~ s/^[A-z\/0-9]+[ ]+[a-z]+\(([0-9]+),.* ([0-9]+)\) = ([0-9]+) <(.*)>\n/$1,$2,$3,$4/) {
                ($fd1, $size, $bytes, $lat) = split(/,/, $params);
                $path1 = $fds{$pid,$fd1};
            }
        }
        …

      Sample output (one CSV record per syscall, with the fd resolved to a path):

        java,14774,15467,,open,0,0,0,0,144,/host/hadoop/hdfs/data/current/subdir0/blk_1719908349220085071_1649.meta,0,
        java,14774,15467,,stat,0,0,0,0,0,/host/hadoop/hdfs/data/current/subdir0/blk_1719908349220085071_1649.meta,0,
        java,14774,15467,,read,4096,167,0,0,144,/host/hadoop/hdfs/data/current/subdir0/blk_1719908349220085071_1649.meta,0,
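      A natural next step (not shown in the deck) is to roll that CSV up into
      bytes moved per pathname – the shape of the per-file tables on the
      slides that follow. A minimal sketch, assuming the column positions
      visible in the sample output above (field 4 = syscall name, field 6 =
      bytes transferred, field 10 = path):

        #!/usr/bin/perl -w
        # Aggregate the filetracevp.pl CSV into bytes moved per pathname.
        # Column positions are assumed from the sample records above.
        use strict;

        my %bytes;
        while (<>) {
            chomp;
            my @f = split /,/;
            next unless @f > 10 && ($f[4] eq 'read' || $f[4] eq 'write');
            $bytes{$f[10]} += $f[6];
        }

        # Print paths sorted by descending byte count.
        printf "%-90s %12d\n", $_, $bytes{$_}
            for sort { $bytes{$b} <=> $bytes{$a} } keys %bytes;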
  11. Controlled Small-Scale Study

        $ hadoop jar hadoop-examples-0.20.204.0.jar teragen 10000000 teradata
        <begin trace>
        $ hadoop jar hadoop-examples-0.20.204.0.jar terasort teradata teraout

      Bytes written per component (MB):

        Hadoop Distro                             236
        Hadoop Logs                               132
        Hadoop clienttmp unjar                      1
        Mappers files jobcache - spills          1753
        Mappers files jobcache - output          1777
        Reducer Intermediate                      764
        Reducers Shuffle and Intermediate        1744
        Jobcache class files and shell scripts      1
        Hadoop Datanode                          1690
        JVM - /usr/lib/jvm…                        98
        Total MB                                 7987

      Job counters:

        Job Counters
          Launched reduce tasks=1
          SLOTS_MILLIS_MAPS=1146887
          Launched map tasks=16
          Data-local map tasks=16
          SLOTS_MILLIS_REDUCES=766823
        File Input Format Counters
          Bytes Read=1000057358
        File Output Format Counters
          Bytes Written=1000000000
        FileSystemCounters
          FILE_BYTES_READ=2382257412
          HDFS_BYTES_READ=1000059070
          FILE_BYTES_WRITTEN=3402627838
          HDFS_BYTES_WRITTEN=1000000000
        Map-Reduce Framework
          Map output materialized bytes=1020000096
          Map input records=10000000
          Reduce shuffle bytes=1020000096
          Spilled Records=33355441
          Map output bytes=1000000000
          Map input bytes=1000000000
          Combine input records=0
          SPLIT_RAW_BYTES=1712
          Reduce input records=10000000
          Reduce input groups=10000000
          Combine output records=0
          Reduce output records=10000000
          Map output records=10000000
  12. Hadoop I/O Model
      (With some data from early observations)
      [Diagram: job → map tasks read input splits from DFS (~12% of disk
      bandwidth); map sort spills (spill*.out), per-map output files
      (map_*.out, file.out) and the shuffle/combine intermediate data
      (intermediate.out) account for ~75% of disk bandwidth; reduce writes
      output data back to HDFS (~12% of bandwidth); DFS & logs make up the
      remainder.]
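      The 75% / 12% / 12% split is consistent with the FileSystemCounters on
      slide 11; a quick back-check (my arithmetic, not from the deck):

        #!/usr/bin/perl -w
        # Back-check of the bandwidth split using the slide-11 counters.
        use strict;

        my $file   = 2382257412 + 3402627838;  # FILE_BYTES_READ + FILE_BYTES_WRITTEN (temp)
        my $hdfs_r = 1000059070;               # HDFS_BYTES_READ
        my $hdfs_w = 1000000000;               # HDFS_BYTES_WRITTEN
        my $total  = $file + $hdfs_r + $hdfs_w;

        printf "temp %.0f%%, hdfs read %.0f%%, hdfs write %.0f%%\n",
            100 * $file / $total, 100 * $hdfs_r / $total, 100 * $hdfs_w / $total;
        # => temp 74%, hdfs read 13%, hdfs write 13%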
  13. One Mapper Task: Temp Data

        path                                                                                                                                bytes
        /host/hadoop/clienttmp/mapred/local/taskTracker/rmc/jobcache/job_201201251035_0001/attempt_201201251035_0001_m_000000_0/output/file.out    67586124
        /host/hadoop/clienttmp/mapred/local/taskTracker/rmc/jobcache/job_201201251035_0001/attempt_201201251035_0001_m_000000_0/output/spill1.out  52762519
        /host/hadoop/clienttmp/mapred/local/taskTracker/rmc/jobcache/job_201201251035_0001/attempt_201201251035_0001_m_000000_0/output/spill0.out  52508540
        /host/hadoop/clienttmp/mapred/local/taskTracker/rmc/jobcache/job_201201251035_0001/attempt_201201251035_0001_m_000000_0/output/spill2.out  29698564
        /usr/lib/jvm/java-6-openjdk/jre/lib/rt.jar                                                                                          5057763
        /home/rmc/untars/hadoop-0.20.204.0/hadoop-core-0.20.204.0.jar                                                                        895582
        /home/rmc/untars/hadoop-0.20.204.0/lib/log4j-1.2.15.jar                                                                               82522
        /home/rmc/untars/hadoop-0.20.204.0/lib/commons-lang-2.4.jar                                                                           70477
        /home/rmc/untars/hadoop-0.20.204.0/lib/commons-configuration-1.6.jar                                                                  61007
        /usr/lib/x86_64-linux-gnu/gconv/gconv-modules                                                                                         51772
        /host/hadoop/clienttmp/mapred/local/taskTracker/rmc/jobcache/job_201201251035_0001/job.xml                                            44420
        /home/rmc/untars/hadoop-0.20.204.0/lib/commons-collections-3.2.1.jar                                                                  29974
        /host/hadoop/clienttmp/mapred/local/taskTracker/rmc/jobcache/job_201201251035_0001/attempt_201201251035_0001_m_000000_0/job.xml       21695
        /usr/lib/jvm/java-6-openjdk/jre/lib/amd64/libnio.so                                                                                   15946
        /home/rmc/untars/hadoop-0.20.204.0/conf/core-site.xml                                                                                 11024
        /usr/lib/jvm/java-6-openjdk/jre/lib/security/java.security                                                                            10081
        /proc/self/maps                                                                                                                        7523
  14. One Mapper Task: Temp I/O Counts
      [Histograms: number of I/Os per power-of-two I/O size bucket (1 byte to
      128KB), measured at the syscall level; separate read and write panels.]
  15. One Mapper Task: Tmp Bytes Transferred
      [Histograms: bytes transferred per I/O size bucket. One panel measured
      at the syscall level (buckets to 128KB); one panel as logical I/O, i.e.
      sequential grouping of syscalls (buckets to 128MB).]
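      "Logical I/O (sequential grouping of syscalls)" means coalescing
      back-to-back syscalls on the same file descriptor at sequentially
      advancing offsets into one larger transfer before bucketing. A minimal
      sketch of both histograms; the input format (one "fd,offset,size"
      record per line) is an assumption for illustration, not the trace
      format used in the deck:

        #!/usr/bin/perl -w
        # Power-of-two size-bucket histograms, counted once per syscall and
        # once per logical I/O (coalesced sequential run on the same fd).
        use strict;
        use POSIX qw(ceil);

        my (%syscall_hist, %logical_hist);
        my (%next_off, %run_len);   # per-fd: next expected offset, run length

        sub bucket { my ($n) = @_; 2 ** ceil(log($n) / log(2)) }

        sub flush_run {
            my ($fd) = @_;
            $logical_hist{ bucket($run_len{$fd}) } += $run_len{$fd} if $run_len{$fd};
            $run_len{$fd} = 0;
        }

        while (<>) {
            chomp;
            my ($fd, $off, $size) = split /,/;
            next unless $size;
            $syscall_hist{ bucket($size) } += $size;

            # Extend the current sequential run, or close it and start anew.
            flush_run($fd) if defined $next_off{$fd} && $next_off{$fd} != $off;
            $run_len{$fd}  += $size;
            $next_off{$fd}  = $off + $size;
        }
        flush_run($_) for keys %run_len;

        my %buckets = (%syscall_hist, %logical_hist);
        print "bucket,syscall_bytes,logical_bytes\n";
        printf "%d,%d,%d\n", $_, $syscall_hist{$_} // 0, $logical_hist{$_} // 0
            for sort { $a <=> $b } keys %buckets;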
  16. Reducer Task: Temp Data
  17. Reducer Task: Temp I/O Counts
      [Histograms: number of I/Os per I/O size bucket (1 byte to 128KB),
      measured at the syscall level; separate read and write panels.]
  18. Reducer Task: Tmp Bytes Transferred
      [Histograms: bytes transferred per I/O size bucket; syscall-level panel
      (buckets to 128KB) and logical I/O panel (sequential grouping of
      syscalls, buckets to 8MB).]
  19. Datanode – Bytes Transferred
      [Histograms: bytes transferred per I/O size bucket (1 byte to 128KB);
      separate read and write panels.]
  20. Datanode – Actual vs. Logical I/O Size
      [Histograms: bytes transferred per I/O size bucket; actual
      (syscall-level, buckets to 128KB) vs. logical I/O (sequential grouping
      of syscalls, buckets to 128MB).]
  21. Datanode – IOPS
      [Histograms: number of I/Os per I/O size bucket (1 byte to 128KB),
      measured at the syscall level; separate read and write panels.]
  22. Back-of-the-Envelope Modeling
      • How much bandwidth does terasort need?
        – ~10 seconds of CPU/core time per task
        – 128MB of HDFS data per task
        – ~3x that, 384MB, of temporary data per task

        I/O Component   Per-task   Per-task Bandwidth   Per-host (24 cores)
        HDFS I/O        128MB      ~13 MBytes/s         312 MBytes/s
        Temp            384MB      ~38 MBytes/s         912 MBytes/s
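      The table is just data volume per task divided by task runtime, scaled
      by cores per host. A sketch of the arithmetic (the slide rounds the
      per-task figures to whole MB/s first, giving its 312 and 912 MB/s):

        #!/usr/bin/perl -w
        # Rederive the bandwidth table from the slide's assumptions:
        # 10 s of CPU/core time per task, 24 cores per host.
        use strict;

        my $secs_per_task = 10;
        my $cores         = 24;

        for ([ 'HDFS I/O', 128 ], [ 'Temp', 384 ]) {
            my ($name, $mb) = @$_;
            my $per_task = $mb / $secs_per_task;   # MB/s per task
            printf "%-8s %4d MB/task  ~%2d MB/s/task  %4d MB/s/host\n",
                $name, $mb, $per_task, $per_task * $cores;
        }
        # => HDFS I/O  128 MB/task  ~12 MB/s/task   307 MB/s/host
        #    Temp      384 MB/task  ~38 MB/s/task   921 MB/s/host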
  23. Do We Need Locality?
      • Main issue is cross-sectional bandwidth
        – Secondary issue is per-host link speed
        – Just look at storage I/O now; consider shuffle next

        I/O Component   Per-host (24 cores)   Network Bandwidth   Rack Bandwidth
                                              w/ 0% locality      w/ 40 hosts
        HDFS I/O        312 MBytes/s          2.5 Gbit/s          100 Gbit/s
        Temp            912 MBytes/s          7.3 Gbit/s          300 Gbit/s

      • Possible conclusion
        – Must have locality with a 1Gbit host link
        – Feasible to use remote data with 10Gbit links, keeping temp local only
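      The network columns follow from the per-host figures: with 0% locality,
      every byte a host reads or writes crosses its link. A sketch of the
      conversion (the slide rounds the 292 Gbit rack figure for temp up to
      ~300):

        #!/usr/bin/perl -w
        # Convert the per-host MB/s above into per-host link and rack
        # bandwidth, assuming 0% locality and 40 hosts per rack (per slide).
        use strict;

        my $hosts = 40;
        for ([ 'HDFS I/O', 312 ], [ 'Temp', 912 ]) {
            my ($name, $mbs) = @$_;
            my $gbit = $mbs * 8 / 1000;            # per-host link, Gbit/s
            printf "%-8s %4d MB/s/host  %.1f Gbit/s/host  %3.0f Gbit/s/rack\n",
                $name, $mbs, $gbit, $gbit * $hosts;
        }
        # => HDFS I/O  312 MB/s/host  2.5 Gbit/s/host  100 Gbit/s/rack
        #    Temp      912 MB/s/host  7.3 Gbit/s/host  292 Gbit/s/rack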
