Profile hadoop apps

Profile Your Hadoop Jobs

Published in: Technology
  • Thank you very much for this presentation; it is very helpful and clear. I have one problem with downloading the log files: I found the userlogs directory, but it is empty. Also, will all tasks be in one file?


  1. Profiling Hadoop Applications (Basant Verma)
  2. Agenda
       • Profiling: general background
       • Available options
       • Profile using free and open source tools
       • Profile using YourKit
       • Other troubleshooting tools
  3. What does Profiling Provide?
       • Profiling runtime / CPU usage:
           – which lines of code the program spends the most time in
           – which call/invocation paths were used to reach these lines
               • naturally represented as tree structures
       • Profiling memory usage:
           – what kinds of objects are sitting on the heap
           – where they were allocated
           – what is pointing to them now
           – memory leaks
  4. Profiler Types and Components
       • Components needed for profiling
           – Profiling agent
               • Collects profiled data (samples, traces, exceptions etc.)
           – Analysis tool
               • Provides an interface for analyzing profiled data and helps the user identify potential problems
       • Types of profilers
           – insertion
           – sampling
           – instrumenting
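To make the "sampling" profiler type concrete, here is a minimal sketch using only the JDK. It is an illustration of the idea, not how hprof or any tool on these slides is implemented: the class `StackSampler` and its methods are hypothetical names. The agent part periodically snapshots a thread's stack; the analysis part counts which method was on top of the stack most often.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a sampling profiler; not the hprof implementation. */
final class StackSampler {

    /** "Agent" side: takes `count` stack snapshots of `target`, `intervalMs` apart. */
    static List<StackTraceElement[]> sample(Thread target, int count, long intervalMs)
            throws InterruptedException {
        List<StackTraceElement[]> samples = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            samples.add(target.getStackTrace());
            Thread.sleep(intervalMs);
        }
        return samples;
    }

    /** "Analysis" side: counts how often each method was the top (executing) frame. */
    static Map<String, Integer> countTopFrames(List<StackTraceElement[]> samples) {
        Map<String, Integer> hits = new TreeMap<>();
        for (StackTraceElement[] stack : samples) {
            if (stack.length == 0) {
                continue; // thread was not running when this sample was taken
            }
            String key = stack[0].getClassName() + "." + stack[0].getMethodName();
            hits.merge(key, 1, Integer::sum);
        }
        return hits;
    }

    public static void main(String[] args) throws Exception {
        // A busy worker thread to observe.
        Thread worker = new Thread(() -> {
            double x = 0;
            while (!Thread.currentThread().isInterrupted()) {
                x += Math.sqrt(x + 1);
            }
        });
        worker.setDaemon(true);
        worker.start();

        Map<String, Integer> hits = countTopFrames(sample(worker, 50, 10));
        worker.interrupt();
        hits.forEach((method, n) -> System.out.println(n + "\t" + method));
    }
}
```

Sampling only observes the program at intervals, which keeps overhead low but means short-lived methods can be missed; instrumenting profilers trade higher overhead for exact counts.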
  5. Available Options
       • Sun JDK tools
           – hprof: profiler (uses JVMTI)
           – jmap: dumps the heap (memory map)
           – jhat: analyzes a heap dump
           – jstack: provides a thread dump
           – jvisualvm: GUI-based profile data analyzer
       • Open source
           – VisualVM (same as jvisualvm but downloaded as an independent app)
               • Uses HPROF internally for profiling. Provides a GUI for analysis of heap dumps and profiler outputs
           – NetBeans Profiler
               • Similar to VisualVM but integrated into the IDE
           – Eclipse MAT (Memory Analyzer Tool)
               • Can load .hprof files
       • Commercial
           – YourKit
           – JProfiler
  7. Official hprof Documentation

       usage: java -Xrunhprof:[help]|[<option>=<value>, ...]

       Option Name and Value  Description                     Default
       ---------------------  -----------                     -------
       heap=dump|sites|all    heap profiling                  all
       cpu=samples|times|old  CPU usage                       off
       monitor=y|n            monitor contention              n
       format=a|b             text(txt) or binary output      a
       file=<file>            write data to file              java.hprof[.txt]
       depth=<size>           stack trace depth               4
       interval=<ms>          sample interval in ms           10
       cutoff=<value>         output cutoff point             0.0001
       lineno=y|n             line number in traces?          y
       thread=y|n             thread in traces?               n
       doe=y|n                dump on exit?                   y
       msa=y|n                Solaris micro state accounting  n
       force=y|n              force output to <file>          y
       verbose=y|n            print messages about dumps      y
  8. Sample hprof usage
       • To measure CPU usage, try the following:
           java -Xrunhprof:cpu=samples,depth=6,heap=dump
       • Settings:
           – Takes samples of CPU execution
           – Records call traces that include the last 6 levels on the stack
           – Dumps the heap (bigger file size, but helps in finding problems)
       • Creates the file java.hprof.txt in the current directory
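The option string above is a comma-separated list of `name=value` pairs with no spaces; a stray space would split it into two separate arguments in the shell. A small sketch of assembling such a string follows. `HprofOptions` is a hypothetical helper, not part of the JDK or Hadoop, and it emits the `-agentlib:hprof=` spelling used on the later Hadoop slides rather than `-Xrunhprof:`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that assembles an hprof JVM argument from option pairs. */
final class HprofOptions {

    /** Joins the options as name=value pairs separated by commas, with no spaces. */
    static String toJvmArg(Map<String, String> options) {
        StringJoiner joined = new StringJoiner(",", "-agentlib:hprof=", "");
        options.forEach((name, value) -> joined.add(name + "=" + value));
        return joined.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the options in insertion order.
        Map<String, String> options = new LinkedHashMap<>();
        options.put("cpu", "samples");
        options.put("depth", "6");
        options.put("heap", "dump");
        System.out.println(toJvmArg(options));
    }
}
```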
  9. HPROF with Hadoop
       • Hadoop uses hprof as the default profiler
       • Profiling-related parameters

       Purpose                             JobConf API               Command line parameter
       ----------------------------------  ------------------------  ---------------------------
       Enable profiling                    setProfileEnabled(true)   mapred.task.profile=true
       Additional parameters for profiler  setProfileParams(…)       mapred.task.profile.params
       Range of sampled tasks to profile   setProfileTaskRange       mapred.task.profile.maps
                                                                     mapred.task.profile.reduces
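The task range parameters take values such as `0-2`, which are read here as "profile task indexes 0, 1 and 2". The sketch below shows that interpretation with a hypothetical `TaskRange` class; it is not Hadoop's own parser, and the comma-separated form it also accepts is an assumption made for illustration.

```java
/** Hypothetical illustration of how a task range such as "0-2" selects tasks. */
final class TaskRange {

    /** Returns true if taskIndex falls inside a spec such as "0-2" or "0-1,5". */
    static boolean includes(String spec, int taskIndex) {
        for (String part : spec.split(",")) {
            String[] bounds = part.trim().split("-", 2);
            int low = Integer.parseInt(bounds[0].trim());
            // A single number such as "5" is treated as the range 5-5.
            int high = bounds.length == 2 ? Integer.parseInt(bounds[1].trim()) : low;
            if (taskIndex >= low && taskIndex <= high) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (int task = 0; task < 5; task++) {
            System.out.println("map task " + task + " profiled: " + includes("0-2", task));
        }
    }
}
```

Profiling only a few tasks keeps the overhead and the volume of profiler output small while still giving a representative picture of the job.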
  10. Example
       • Using the Java API

           jobConf.setProfileEnabled(true);
           jobConf.setProfileParams("-agentlib:hprof=cpu=samples,heap=sites"
               + ",depth=4,thread=y,file=%s");
           jobConf.setProfileTaskRange(true, "0-2");   // map tasks
           jobConf.setProfileTaskRange(false, "0-1");  // reduce tasks

       • Using command line parameters (the profiler parameters must not contain spaces)

           hadoop jar $HADOOP_HOME/hadoop-examples.jar wordcount \
             -Dmapred.task.profile=true \
             -Dmapred.task.profile.params=-agentlib:hprof=cpu=samples,heap=all,depth=4,thread=y,file=%s \
             -Dmapred.task.profile.maps=0-2 \
             -Dmapred.task.profile.reduces=0-1 \
             input output
  11. Collecting Profiler Output
       • Hadoop JobClient automatically downloads profile logs from all the profiled tasks
           – If the output format is not specified, hprof creates profile output in text format (format=a)
       • Profiler outputs are also available via the History WebUI
       • You can also download profile output using curl
           – curl -o attempt_201305161037_0004_m_000000_0.hprof " 201305161037_0004_m_000000_0&filter=profile"
  12. Task User Log
  13. Analyze Profiler Output
       • You can use VisualVM, the NetBeans profiler or YourKit to analyze the profiling data.
           – These tools support only the binary format of hprof output (i.e. option format=b)
       • Example
           – Run the profiler with a Hadoop job:

               hadoop jar $HADOOP_HOME/hadoop-examples.jar wordcount \
                 -Dmapred.task.profile=true \
                 -Dmapred.task.profile.params=-agentlib:hprof=cpu=samples,heap=all,depth=4,thread=y,format=b,file=%s \
                 input output

           – Load the profiler output using the VisualVM menu option
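Since the GUI tools only accept binary output, it can help to check a downloaded file before trying to load it. The sketch below relies on an assumption about the file layout that is not stated on the slides: both hprof formats start with the text `JAVA PROFILE`, and in the binary format the version string is terminated by a zero byte. `HprofFormat` is a hypothetical helper name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical check for whether an hprof file is binary (format=b) or text (format=a). */
final class HprofFormat {

    /** Assumes binary files contain a zero byte right after the "JAVA PROFILE x.y.z" header. */
    static boolean looksBinary(Path file) throws IOException {
        byte[] head = new byte[32];
        int n = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int r;
            // Read up to 32 bytes; a single read() call may return fewer.
            while (n < head.length && (r = in.read(head, n, head.length - n)) > 0) {
                n += r;
            }
        }
        String text = new String(head, 0, n, StandardCharsets.ISO_8859_1);
        if (!text.startsWith("JAVA PROFILE ")) {
            throw new IOException("not an hprof file: " + file);
        }
        return text.indexOf('\0') >= 0;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java HprofFormat <profile-file>");
            return;
        }
        Path file = Paths.get(args[0]);
        System.out.println(file + ": " + (looksBinary(file) ? "binary (format=b)" : "text (format=a)"));
    }
}
```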
  14. Analyze Profile Output in VisualVM
  15. Object Query Language
       • VisualVM and jhat support a special query language (OQL) for querying the Java heap.
           – Example: select all Strings with a length of 1K or more

               select s from java.lang.String s where s.count >= 1024

       • More information about OQL is available at
  16. Analyze Profile Output in Eclipse MAT
  17. Profiling Pig Jobs
       • Use the Hadoop command line parameters:

           pig -Dmapred.task.profile=true \
             -Dmapred.task.profile.params=-agentlib:hprof=cpu=samples,heap=sites,thread=y,verbose=n \
             -Dmapred.task.profile.maps=0-2 \
             -Dmapred.task.profile.reduces=0-0 \
             mypigscript.pig

       • More information about Pig job profiling is available on the Pig Wiki
  18. Profiling Hive Queries
       • Set the appropriate Hadoop parameters before submitting the queries:

           hive> set mapred.task.profile=true;
           hive> set mapred.task.profile.params=-agentlib:hprof=heap=dump,format=b,file=%s;
           hive> set mapred.task.profile.maps=0-2;
           hive> set mapred.task.profile.reduces=0-0;
           hive> <hive query>
  20. YourKit Profiler - Summary
       • Commercial Java profiling tool
           – Free trial and open source licenses are available
       • Used by many open source projects, including Hadoop, Pig and Hive
       • Features
           – On-demand profiling
           – CPU, memory and concurrency profiling methods
           – IDE integration (Eclipse, NetBeans, IntelliJ)
           – Above all, relatively low performance overhead
  21. Using YourKit Profiler
       • You will need to install the YourKit profiler (just the profiler library) on each TaskTracker
       • Tell Hadoop to use a different profiler:

           hadoop jar $HADOOP_HOME/hadoop-examples.jar wordcount \
             -Dmapred.task.profile=true \
             -Dmapred.task.profile.params=-agentpath:<yourkit_path>/libyjpagent.jnilib=dir=/tmp/yourkit_snapshot,sampling,disablej2ee \
             -Dmapred.task.profile.maps=0-2 \
             -Dmapred.task.profile.reduces=0-1 \
             input output

       • Theoretically, you can also use DistributedCache to make the binaries available on the TaskTracker machines
           – Though, I did not have success with this
  22. Small Glitch
       • Hadoop JobClient.waitForCompletion(…) will throw an error because the profile logs are not available in the default directory.
       • However, the job will continue to run successfully.
       • To avoid this, you can instead use option to specify the profiling parameters
  23. YourKit to Analyze Jobs
       • Can analyze profile output from both the YourKit profiler and hprof/jmap.
  24. OTHER TOOLS
  25. Using Other Tools
       • JDK tool 'jmap'
           – Can be used to capture a heap dump of a running Java process for later analysis in VisualVM or YourKit
               • $ jmap -dump:live,format=b,file=xyz.hprof <jvm-pid>
               • Don't run jmap with the -histo:live option on the JobTracker (JT) or NameNode (NN): the live option forces a full garbage collection, which pauses the process
           – A Java process can also be instructed to write an hprof heap dump when an OutOfMemoryError occurs
               • -XX:+HeapDumpOnOutOfMemoryError
       • JDK tool 'jhat'
           – Can read a heap dump in hprof format and provides a lightweight web interface for analyzing it
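A heap dump can also be requested from inside the JVM, which is convenient when attaching jmap to a task is awkward. The sketch below uses the HotSpot-specific `HotSpotDiagnosticMXBean` from `com.sun.management`, so it assumes a HotSpot-based JVM; `HeapDumper` is a hypothetical class name. The resulting file is in the same hprof binary format that jmap writes, so the tools named above should be able to open it.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper that writes a heap dump of the current JVM (HotSpot only). */
final class HeapDumper {

    /** liveOnly mirrors jmap's "live" flag: only reachable objects are dumped. */
    static Path dump(Path file, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // dumpHeap refuses to overwrite an existing file.
        bean.dumpHeap(file.toString(), liveOnly);
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("heapdump");
        Path out = dump(dir.resolve("sketch.hprof"), true);
        System.out.println(out + " (" + Files.size(out) + " bytes)");
    }
}
```

As with `jmap -dump:live`, a live-only dump triggers a full garbage collection first, so the same caution about long-running master processes applies.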
  26. Other Tools (Cont…)
       • Hadoop Vaidya (simple diagnostic tool)
           – Identifies common performance problems in Hadoop jobs (unbalanced partitioning, granularity of tasks, combiners etc.)
           – Works only at the level of Hadoop jobs (does not understand the specifics of Hive/Pig)
  27. Other Recommendation
       • If possible, try running Hadoop (MR/Pig/Hive) in local mode using LocalJobRunner
           – LocalJobRunner runs the entire MapReduce job in a single JVM
           – It simplifies profiling and log collection
           – Can also be used for attaching a debugger from an IDE
  28. Resources
       • Troubleshooting Java applications –
       • Profile Hadoop Job (Chapter 5, "Hadoop: The Definitive Guide") – 1974/tuning-a-job/id3545664
       • Profiling Pig Job –
       • 'hprof' official documentation –
       • YourKit Profiler –