HiTune sharing, Xiao Zhu, 1/29/2013
Brief view on HiTune


  1. HiTune sharing. Xiao Zhu, 1/29/2013
  2. HiTune is...
     - a Hadoop performance analyzer
     - developed by Intel
     - based on Chukwa
     - https://github.com/intel-hadoop/HiTune
     - Contact: jason.dai@intel.com, jie.huang@intel.com
     - Has 3 parts:
       1) Tracker
       2) Aggregation Engine
       3) Analysis Engine
  3. Example of HiTune Output
  4. Example of HiTune Output
  5. Example of HiTune Output
  6. Chukwa is...
     - an open source data collection system for monitoring large distributed systems
     - based on HDFS and the Map/Reduce framework
     - http://incubator.apache.org/chukwa/
     - Has many parts, including:
       1) Agent
       2) Collector
       3) DemuxManager
       4) Other processes for logging and archiving
  7. HiTune is based on Chukwa
     • Tracker is partly based on the Chukwa Agent.
     • Aggregation Engine is based on the Chukwa Collector.
     • Analysis Engine is partly based on the Chukwa Demux Manager.
     We tend to call those parts by the right-side (Chukwa) names, and when we refer to HiTune, we mean HiTune and Chukwa together.
     Some parts are simply built on Chukwa components, while others are implemented by modifying Chukwa or adding new components. You will find Chukwa patches and a patched Chukwa binary in the HiTune release.
     So when you deploy HiTune, I do not suggest deploying Chukwa manually first (though you can), because the HiTune release already includes it.
  8. HiTune is based on Chukwa
     • Tracker is partly based on the Chukwa Agent.
     • Aggregation Engine is based on the Chukwa Collector.
     • Analysis Engine is partly based on the Chukwa Demux Manager.
     The tracker includes the HiTune Java agent part and the Chukwa agent part.
     The analysis engine includes the HiTune script part and the Chukwa Demux part.
     See the following data flow for explanations of those parts.
  9. HiTune/Chukwa System Basic Structure
     HiTune/Chukwa itself needs to be set up on a standalone Hadoop cluster. We call it the 'Chukwa Cluster', and the target cluster is called the 'Hadoop Cluster'.
     [Diagram: Hadoop Cluster (Workload, HiTune Agents, Map/Reduce, HDFS); Chukwa Cluster (Collectors, Demux, Map/Reduce, HDFS); User's Computer (Excel)]
  10. HiTune/Chukwa Process and Data Flow
      Step 1: The HiTune agents (Java agent part) are invoked by the JVM on every node in the Hadoop cluster when the workload starts. This part collects system status and Hadoop logs and saves them to local storage.
  11. HiTune/Chukwa Process and Data Flow
      Step 2: The Agent process (Chukwa agent part) checks the Java agent output periodically and sends new data to (one of) the Collector(s).
  12. HiTune/Chukwa Process and Data Flow
      Step 3: The Collector(s) write data to HDFS on the Chukwa Cluster. When a Collector has received 64 MB of data, or a given time interval has passed, it packs the received data into a data package (.done file).
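The size-or-time rule in step 3 can be sketched as a small predicate. This is only an illustration of the rule as the slide states it, not Chukwa's actual collector code: the names `SinkFile` and `should_close` are hypothetical, and the 300-second interval is an assumed placeholder because the slide only says "a given time interval".

```python
from dataclasses import dataclass

ROLLOVER_BYTES = 64 * 1024 * 1024  # 64 MB threshold, as stated on the slide
ROLLOVER_SECONDS = 300.0           # assumed value; the slide gives no number


@dataclass
class SinkFile:
    """Hypothetical record of the file a collector is currently writing."""
    opened_at: float        # time the file was opened, in seconds
    bytes_written: int = 0  # bytes received so far


def should_close(sink: SinkFile, now: float,
                 max_bytes: int = ROLLOVER_BYTES,
                 max_age: float = ROLLOVER_SECONDS) -> bool:
    """Return True when the file should be packed into a .done package.

    Either condition is enough: the size threshold was reached,
    or the file has been open for at least max_age seconds.
    """
    too_big = sink.bytes_written >= max_bytes
    too_old = (now - sink.opened_at) >= max_age
    return too_big or too_old
```

Because either condition triggers a rollover, a quiet cluster still produces .done packages at a bounded delay, while a busy one produces them at a bounded size.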
  13. HiTune/Chukwa Process and Data Flow
      Step 4: The Demux Manager checks for data packages in the Collector output directory on HDFS every 20 seconds. If it finds .done files, it starts a Map/Reduce job to analyze them (this may take a long time to finish).
  14. HiTune/Chukwa Process and Data Flow
      Step 4 (cont.): After Demux finishes, the user must run a HiTune script. This script runs Map/Reduce to produce the final output (.csv files). It may also take a long time to finish, but it is faster than the Demux run.
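The polling behaviour in step 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the Demux Manager's real implementation: `list_dir` and `process` are hypothetical callables standing in for an HDFS directory listing and the Map/Reduce launch, and only the 20-second interval and the `.done` suffix come from the slide.

```python
import time
from typing import Callable, Iterable, List

POLL_SECONDS = 20  # polling interval stated on the slide


def find_done_files(names: Iterable[str]) -> List[str]:
    """Return the names that are finished data packages (.done), sorted."""
    return sorted(name for name in names if name.endswith(".done"))


def poll_once(list_dir: Callable[[], Iterable[str]],
              process: Callable[[List[str]], None]) -> int:
    """Run one polling round; return how many .done files were handed over."""
    done = find_done_files(list_dir())
    if done:
        process(done)  # stand-in for starting the Map/Reduce analysis
    return len(done)


def poll(list_dir: Callable[[], Iterable[str]],
         process: Callable[[List[str]], None],
         rounds: int,
         sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll a fixed number of rounds, sleeping POLL_SECONDS after each one."""
    total = 0
    for _ in range(rounds):
        total += poll_once(list_dir, process)
        sleep(POLL_SECONDS)
    return total
```

Passing `sleep` as a parameter keeps the loop testable without waiting 20 seconds per round.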
  15. HiTune/Chukwa Process and Data Flow
      Step 5: The user fetches the final output from hdfs://.JOBS/ manually, then loads the output (.csv files) into the HiTune Excel template to view the result. Charts, summaries, etc. are computed by Excel.
  16. HiTune/Chukwa Process and Data Flow
      • Yes, if you want, you can deploy Chukwa on the Hadoop cluster itself.
      • Doing so makes management and maintenance harder, but it is theoretically feasible.
  17. Why such structure?
      • Using Hadoop for MapReduce processing of logs is somewhat troublesome.
      • Logs are generated incrementally across many machines, but Hadoop MapReduce works best on a small number of large files.
      • HDFS doesn't currently support appends, making it difficult to keep the distributed copy fresh.
  18. Why such structure?
      • Chukwa is devoted to bridging that gap between logs and MapReduce.
      • Chukwa is a scalable distributed monitoring and analysis system, particularly for logs from Hadoop and other large systems.
      • Through the processing done by agents and collectors, large, appended, distributed logs are transformed into large data chunks, which are suitable for Map/Reduce.
  19. Why such structure?
      • The overhead is mainly caused by the agents, since only the agents run on the Hadoop Cluster.
      • According to the HiTune paper, the overhead is less than 2%.
      • See these papers:
        - Dai, Jinquan, et al. "HiTune: Dataflow-based performance analysis for big data cloud." Proc. of the 2011 USENIX ATC (2011): 87-100. (Available on the HiTune GitHub: https://github.com/intel-hadoop/HiTune)
        - Boulon, Jerome, et al. "Chukwa, a large-scale monitoring system." Proceedings of CCA. Vol. 8. 2008.
  20. Current HiTune version: 0.9
      • Supports Hadoop 0.2 best.
      • Based on Chukwa 0.4.
      • Can support Hadoop 0.2+, but some options need to be changed and some metrics will be missing. (Current IDH uses Hadoop 1.0+.)
      • Aggregation and analysis usually take a long time to complete, so it is better to deploy it on a fast cluster.
  21. Questions?
  22. Backup
  23. HiTune troubleshooting
      • Troubleshooting HiTune is usually painful.
      • You need to check these logs:
        - Hadoop cluster logs (task tracker logs, job tracker logs, namenode logs, datanode logs)
        - Chukwa logs (agent logs, collector logs, demux logs); these are the most important
        - HiTune logs (script outputs)
      • If there is no error or warning in the logs, check the outputs on disk and on HDFS.
      • HiTuneStatusCheck.sh is not reliable. Check the logs yourself.
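Checking many logs by hand is easier with a small filter that pulls out error and warning lines. The sketch below is only an illustration, not part of HiTune: the function name is hypothetical, and it assumes the log lines contain a level word such as ERROR, WARN, WARNING or FATAL, which may not match every log format in a given deployment.

```python
import re
from typing import Iterable, List, Tuple

# Assumed level keywords; adjust to the log format actually in use.
LEVEL_PATTERN = re.compile(r"\b(?:ERROR|WARN(?:ING)?|FATAL)\b")


def suspicious_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line mentioning an error or warning.

    Line numbers start at 1 so they match what a text editor shows.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if LEVEL_PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits
```

To use it on a file, open the log and pass the file object, for example `suspicious_lines(open(path))`, where `path` is whichever agent, collector or demux log you are inspecting. An empty result is the cue to move on to checking the outputs on disk and HDFS.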
  24. HiTune/Chukwa Process and Data Flow
      Step 6: Later, Chukwa groups and archives the data already used on the Chukwa Cluster HDFS to save space, but we will not discuss that here.
