Method for Monitoring and Profiling of Hadoop using AspectJ

Yusuke Shimizu, Kouhei Sakurai, Satoshi Yamane
Graduate School of Natural Science & Technology, Kanazawa University

PRDC2012@TOKIMESSE
Introduction
Large-scale distributed systems are being used in more and more settings.

A large-scale distributed system is ...

  “Flexible and available architecture for large
  scale computation and data processing on a
  network of commodity hardware”
  [-- P. Julio, 2009]

- e.g. Apache Hadoop
For Dependable Distributed System ..
We have to consider and deal with ...
- Non-deterministic network
- Fault tolerance
- Incomprehensible users

Relying only on analysis or verification done in advance (statically) is difficult.

We also need runtime monitoring and analysis
How to monitor and debug
The general methods of debugging or monitoring Hadoop are ...
• logging text messages
• checking metrics via Web interfaces, Ganglia, etc.
There are difficulties and requirements
• logging text messages
 → difficult to follow with a huge number of nodes
• checking metrics via Web interfaces, Ganglia, etc.
 → aimed at operators; not enough for developers
Introduction

 Proposal

   1. The Method Level Monitor

   2. The Adaptive Profiling

- Provide effective information for development
- Help developers to understand system behaviors
and specifications
Outline of Talk

Introduction
- Distributed system’s difficulty
Proposal
- Monitor
- Profile Method
Experimental Results & Conclusion
2. PROPOSALS

        The Runtime Monitor


                  &


     The Adaptive Profiling Method
Outline of Proposed System
 Hadoop        Monitor          Profile
 • MapReduce   Record Trace     Count up frequency
 • HDFS        using AspectJ    of instruction
 • RPC
Monitor

•   observes the system behavior at runtime
•   passively logs executed instructions = makes a “Trace”
    ‣ using AspectJ
       -   “AspectJ is an implementation of ‘Aspect
           Oriented Programming’ for Java”
    ‣ no modification of the applications is needed
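The idea of logging method calls without editing the application can be sketched as follows. This is only a Python analogy, not the authors' AspectJ monitor: it wraps the methods of a class at runtime, which loosely mirrors weaving advice around method executions. The names `TRACE`, `traced`, `attach_monitor` and the `NameNode` stand-in class are all hypothetical.

```python
import functools
import inspect
import time

TRACE = []  # hypothetical in-memory trace; a real monitor would write to a file


def traced(func, trace_name, qualified_name):
    """Return a wrapper that logs the call, then runs the original method."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        TRACE.append((time.time(), trace_name, qualified_name))
        return func(*args, **kwargs)
    return wrapper


def attach_monitor(cls, trace_name):
    """Wrap every public method defined on cls, leaving its source untouched."""
    for name, attr in list(vars(cls).items()):
        if inspect.isfunction(attr) and not name.startswith("_"):
            setattr(cls, name, traced(attr, trace_name, f"{cls.__name__}.{name}"))


class NameNode:
    """Hypothetical stand-in for a monitored daemon, not Hadoop's class."""
    def get_file_info(self, path):
        return {"path": path}

    def send_heartbeat(self):
        return "ok"


attach_monitor(NameNode, "namenodetrace")
node = NameNode()
node.get_file_info("/user/data")
node.send_heartbeat()
print([record[2] for record in TRACE])
```

The class body itself is never edited; the monitor is attached from outside, which is the property the slide attributes to the AspectJ approach.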
Architecture of Hadoop & Monitor
[Figure: a Master node (NameNode, JobTracker, RPC) and Slave nodes (each with DataNode and Data Blocks, TaskTracker with Map and Reduce, RPC); a Monitor is attached to the Master and to each Slave]

   Master’s Trace
  ‣ NameNode Trace
  ‣ JobTracker Trace
  ‣ RPC Trace

   Slaves’ Trace
  ‣ DataNode Trace
  ‣ TaskTracker Trace
  ‣ RPC Trace
Method of Profiling

•   based on the frequency of instructions
•   counts up the instructions recorded in the “Trace”
•   counts up at each granularity
    ➡   each node
        ➡   each process
            ➡   each method
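Counting at the three granularities can be sketched as below. This is a minimal illustration under assumptions: the record layout `(node, process, method)` and the sample records are hypothetical, and the slides do not show the actual trace format.

```python
from collections import Counter

# Hypothetical trace records: (node, process trace name, method name).
trace = [
    ("192.168.1.10", "namenodetrace", "NameNode.getFileInfo"),
    ("192.168.1.10", "namenodetrace", "NameNode.getFileInfo"),
    ("192.168.1.10", "rpctrace", "Client.call"),
    ("192.168.1.11", "datanodetrace", "DataNode.run"),
]


def profile(records):
    """Count trace records per node, per process, and per method."""
    per_node = Counter()
    per_process = Counter()
    per_method = Counter()
    for node, process, method in records:
        per_node[node] += 1
        per_process[(node, process)] += 1
        per_method[(node, process, method)] += 1
    return per_node, per_process, per_method


per_node, per_process, per_method = profile(trace)
print(per_node)
print(per_process)
print(per_method)
```

Each finer level is keyed by the coarser ones, so the per-method counts of one process always sum to that process's count.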
3. EXPERIMENT

   Benchmark the impact of the Monitor
                      &
               perform Profiling
                      &
        Visualize the profiling results
Benchmark - the impact of the Monitor

Throughput [MB/sec] = Data size / Elapsed time

 Data size  Monitor  Elapsed time             Throughput  Trace size  Throughput ratio
 [GB]                                         [MB/sec]    [MB]        (on / off)
 1          on       2m 25s (145 sec)         6.9         2.4         84.1%
 1          off      2m 2s (122 sec)          8.2         0
 10         on       8m 45s (525 sec)         19.0        3.6         88.3%
 10         off      7m 45s (465 sec)         21.5        0
 100        on       1h 21m 54s (4,914 sec)   20.4        31.6        96.2%
 100        off      1h 18m 37s (4,717 sec)   21.2        0

     Benchmark program: “terasort”, a sample sorting program using MapReduce
     Trace size increases by 6.43 KB/sec
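The throughput formula can be checked against the reported figures with a short sketch. It assumes 1 GB = 1000 MB and 1 MB = 1000 KB, which is what reproduces the values on the slide; the function name and the `runs` dictionary are introduced here for illustration. Ratios computed from unrounded throughputs may differ slightly from the percentages on the slide, which appear to be derived from the rounded values.

```python
def throughput_mb_per_sec(data_gb, elapsed_sec):
    """Throughput = data size / elapsed time, assuming 1 GB = 1000 MB."""
    return data_gb * 1000 / elapsed_sec


# Elapsed seconds from the benchmark: data size [GB] -> (monitor on, monitor off).
runs = {1: (145, 122), 10: (525, 465), 100: (4914, 4717)}

for size_gb, (sec_on, sec_off) in runs.items():
    t_on = throughput_mb_per_sec(size_gb, sec_on)
    t_off = throughput_mb_per_sec(size_gb, sec_off)
    print(f"{size_gb:>3} GB: on {t_on:.1f} MB/sec, off {t_off:.1f} MB/sec, "
          f"ratio {t_on / t_off:.1%}")

# Trace growth for the 100 GB run: 31.6 MB of trace over 4,914 seconds.
trace_kb_per_sec = 31.6 * 1000 / 4914
print(f"trace growth: {trace_kb_per_sec:.2f} KB/sec")
```

The trace growth of the 100 GB run works out to the 6.43 KB/sec quoted on the slide, which suggests that figure was taken from the largest run.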
Excerpt of Profiling Output
    statistics for the last 10 seconds on the master
   Tue Nov 13 12:30:08 JST 2012
from 1352777408766 until 10000 after
HOSTNAME ::> DAEMON & PROCESS = { METHODS }
--------------------------
sirius:177 ::>>
  [namenodetrace : 23, jobtrackertrace : 41, datanodetrace : 0,
tasktrackertrace : 0, rpctrace : 113]
 ={
! hdfs.server.namenode.CorruptReplicasMap.numCorruptReplicas=5
! hdfs.server.namenode.FSNamesystem.getBlockLocations=3
! hdfs.server.namenode.FSNamesystem.getDatanode=1
! hdfs.server.namenode.NameNode.getBlockLocations=4
! hdfs.server.namenode.NameNode.getFileInfo=2
! hdfs.server.namenode.NameNode.sendHeartbeat=2
! hdfs.server.namenode.NameNode.verifyVersion=3
! hdfs.server.namenode.UnderReplicatedBlocks.BlockIterator.hasNext=2
! hdfs.server.namenode.UnderReplicatedBlocks.BlockIterator.next=1
! ipc.Client.Connection.PingInputStream.read=4
! ipc.Client.Connection.sendParam=2
! ipc.Client.call=1
! ipc.ConnectionHeader.readFields=4
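Method lines like those in the excerpt can be read back into counts with a small parser. This is a sketch that assumes each method line has the form `! <qualified.method>=<count>` as shown above; the function names and the grouping by first package component are illustrative choices, not part of the original tool.

```python
import re
from collections import Counter

# Matches lines such as "! ipc.Client.call=1"; the leading "!" is optional.
METHOD_LINE = re.compile(r"^!?\s*([\w.]+)=(\d+)$")


def parse_method_counts(text):
    """Return {qualified method name: count} for every matching line."""
    counts = {}
    for line in text.splitlines():
        match = METHOD_LINE.match(line.strip())
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def totals_by_package(counts):
    """Sum the counts by the first component of the method name."""
    totals = Counter()
    for name, count in counts.items():
        totals[name.split(".")[0]] += count
    return totals


sample = """\
! hdfs.server.namenode.NameNode.getFileInfo=2
! hdfs.server.namenode.NameNode.sendHeartbeat=2
! ipc.Client.call=1
"""

counts = parse_method_counts(sample)
print(counts)
print(totals_by_package(counts))
```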
Node Level Profiling
Node Level Profiling is
-- profiling by aggregating the frequencies of instructions within each node per unit time.
[Figure: y-axis: number of occurrences (0 to 800); x-axis: time (s), tick label 6420; one series per node: 192.168.1.10, 192.168.1.11, 192.168.1.12, 192.168.1.13, 192.168.1.14, 192.168.1.15]
Process Level Profiling about MASTER
Process Level Profiling is
-- profiling by aggregating the frequencies of instructions of each process within each node per unit time.
[Figure: Master. y-axis: number of occurrences (0 to 400); x-axis: time (s), tick label 6420; series: rpc, jobtracker, namenode]
Process Level Profiling about Slaves
[Figure: one panel per slave node: 192.168.1.11, 192.168.1.12, 192.168.1.13, 192.168.1.14, 192.168.1.15. y-axis: number of occurrences (up to 200); x-axis: time (s), tick label 6420, divided into Map phase and Reduce phase; series: rpctrace, tasktrackertrace, datanodetrace]
Annotations on the figure:
- There are free resources; speculative executions should be done.
- Imbalance of RPC
Conclusion
    summary
•   Proposal
    -   the lightweight method-level monitor using AspectJ
    -   the profiling method based on frequency of instruction
•   Provide effective information for development
•   Help developers to understand system behaviors and
    specifications
    future work
•   Create an algorithm that determines the degree of deviation
    from the profiling results to indicate the possibility of failure.
Thank you for your kind attention

Prdc2012

  • 1. Method for Monitoring and Profiling of Hadoop using AspectJ Yusuke Shimizu, Kouhei Sakurai, Satoshi Yamane Graduate School of Natural Science & Technology, Kanazawa University PRDC2012@TOKIMESSE
  • 2. Introduction The use scene of Large-scale Distributed Systems is increasing Large-scale Distributed System is ... “Flexible and available architecture for large scale computation and data processing on a network of commodity hardware” [-- P. Julio, 2009] - e.g. Apache Hadoop
  • 3. For Dependable Distributed System .. We have to consider about and deal with ... Only using advance - Non-deterministic and static analysis network or verification - Fault tolerance is difficult - Incomprehensible users We also need runtime monitoring and analysis
  • 4. How to monitor and debug General method of debugging or monitoring the Hadoop is ... • logging text messages • checking metrics via Web Interfaces, Ganglia, etc..
  • 5. There are difficulties and requirements General method of debugging or monitoring the Hadoop is ... • logging text message → Difficulties by a huge number of nodes • checking metrics via Web Interfaces, Ganglia, etc.. → For operators, not enough to developers
  • 6. Introduction Proposal 1. The Method Level Monitor 2. The Adaptive Profiling - Provide effective information for development - Help developers to understand system behaviors and specifications
  • 7. Outline of Talk Introduction - Distributed system’s difficulty Proposal - Monitor - Profile Method Experimental Results & Conclusion
  • 8. 2. PROPOSALS The Runtime Monitor & The Adaptive Profiling Method
  • 9. Outline of Proposed System: Hadoop: MapReduce, HDFS, RPC. Monitor: record trace using AspectJ. Profile: count up frequency of instruction.
  • 10. Monitor: • observes the system behavior at runtime • passively logs executed instructions, producing a “Trace” ‣ uses AspectJ, an implementation of aspect-oriented programming for Java ‣ no modification of applications is needed
  • 11-13. Architecture of Hadoop & Monitor (diagram, built up over three slides): Master: NameNode, JobTracker, RPC, with a Monitor. Slaves: DataNode (data blocks), TaskTracker (Map and Reduce tasks), RPC, with a Monitor on each slave. Master's trace: ‣ NameNode trace ‣ JobTracker trace ‣ RPC trace. Slaves' trace: ‣ DataNode trace ‣ TaskTracker trace ‣ RPC trace.
  • 14. Method of Profiling: • based on the frequency of instructions • counts the instructions recorded in the “Trace” • counts at each grain: ➡ each node ➡ each process ➡ each method
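The counting at three grains described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the slides do not specify the trace record format, so the `(node, process, method)` tuple layout, the sample records, and the function name `count_by_grain` are all assumptions. The process and method names are borrowed from the sample output on slide 18.

```python
from collections import Counter

# Hypothetical trace records as (node, process, method) tuples.
# The tuple layout is an assumption; the slides do not define it.
trace = [
    ("sirius", "namenodetrace", "hdfs.server.namenode.NameNode.getFileInfo"),
    ("sirius", "namenodetrace", "hdfs.server.namenode.NameNode.sendHeartbeat"),
    ("sirius", "rpctrace", "ipc.Client.call"),
    ("192.168.1.11", "rpctrace", "ipc.Client.call"),
]


def count_by_grain(records):
    """Count trace records per node, per process, and per method."""
    per_node = Counter()
    per_process = Counter()
    per_method = Counter()
    for node, process, method in records:
        per_node[node] += 1
        per_process[(node, process)] += 1
        per_method[(node, process, method)] += 1
    return per_node, per_process, per_method


per_node, per_process, per_method = count_by_grain(trace)
```

Keying the finer grains by the full `(node, process, ...)` path keeps the same method on different nodes separate, so coarser counts can always be recovered by summing finer ones.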
  • 15. Outline of Talk Introduction - Distributed system’s difficulty Proposal - Monitor - Profile Method Experimental Results & Conclusion
  • 16. 3. EXPERIMENT: Benchmark the impact of the Monitor, run the profiling, and visualize the profiling results
  • 17. Benchmark - the impact of Monitor
    Throughput [MB/sec] = Data size / Elapsed time

    Data size [GB] | Monitor | Elapsed time            | Throughput [MB/sec] | Trace size [MB] | Throughput relative to no monitor
    1              | on      | 2m 25s (145 sec)        | 6.9                 | 2.4             | 84.1%
    1              | off     | 2m 2s (122 sec)         | 8.2                 | 0               |
    10             | on      | 8m 45s (525 sec)        | 19.0                | 3.6             | 88.3%
    10             | off     | 7m 45s (465 sec)        | 21.5                | 0               |
    100            | on      | 1h 21m 54s (4,914 sec)  | 20.4                | 31.6            | 96.2%
    100            | off     | 1h 18m 37s (4,717 sec)  | 21.2                | 0               |

    Workload: “terasort”, a sample sorting program using MapReduce. Trace size increases by 6.43 KB/sec.
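The throughput formula on this slide can be checked against the reported numbers. The sketch below assumes 1 GB = 1000 MB and 1 MB = 1000 KB, which reproduces the slide's figures. It also assumes the 6.43 KB/sec trace growth was taken from the 100 GB run, since that is the run whose numbers yield that value. The function name is illustrative.

```python
def throughput_mb_per_sec(data_gb, elapsed_sec):
    """Throughput [MB/sec] = data size / elapsed time, taking 1 GB as 1000 MB."""
    return data_gb * 1000 / elapsed_sec


# 1 GB terasort run from the benchmark table.
with_monitor = throughput_mb_per_sec(1, 145)     # rounds to 6.9
without_monitor = throughput_mb_per_sec(1, 122)  # rounds to 8.2

# Throughput with the monitor relative to the run without it.
relative = with_monitor / without_monitor        # about 0.841, i.e. 84.1%

# Trace growth rate for the 100 GB run: 31.6 MB written over 4,914 sec.
trace_growth_kb_per_sec = 31.6 * 1000 / 4914     # rounds to 6.43
```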
  • 18. A Part of Profiling: the statistics of the last 10 seconds, about master
    Tue Nov 13 12:30:08 JST 2012 from 1352777408766 until 10000 after
    HOSTNAME ::> DAEMON & PROCESS = { METHODS }
    --------------------------
    sirius:177 ::>> [namenodetrace : 23, jobtrackertrace : 41, datanodetrace : 0, tasktrackertrace : 0, rpctrace : 113] ={
    ! hdfs.server.namenode.CorruptReplicasMap.numCorruptReplicas=5
    ! hdfs.server.namenode.FSNamesystem.getBlockLocations=3
    ! hdfs.server.namenode.FSNamesystem.getDatanode=1
    ! hdfs.server.namenode.NameNode.getBlockLocations=4
    ! hdfs.server.namenode.NameNode.getFileInfo=2
    ! hdfs.server.namenode.NameNode.sendHeartbeat=2
    ! hdfs.server.namenode.NameNode.verifyVersion=3
    ! hdfs.server.namenode.UnderReplicatedBlocks.BlockIterator.hasNext=2
    ! hdfs.server.namenode.UnderReplicatedBlocks.BlockIterator.next=1
    ! ipc.Client.Connection.PingInputStream.read=4
    ! ipc.Client.Connection.sendParam=2
    ! ipc.Client.call=1
    ! ipc.ConnectionHeader.readFields=4
  • 19. Node Level Profiling: Node-level profiling aggregates the frequencies of instructions within each node per unit time. [Chart: number of occurrences (0 to 800) over time (s), one series per node, 192.168.1.10 through 192.168.1.15]
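Aggregating per node and per unit time can be sketched as a bucketing step. This is an illustration under stated assumptions, not the authors' code: the event layout `(timestamp_ms, node)`, the sample events, and the function name `node_level_profile` are hypothetical. The 10-second window follows the "last 10 seconds" statistics shown on slide 18.

```python
from collections import defaultdict


def node_level_profile(events, window_ms):
    """Group (timestamp_ms, node) events into per-node counts per time window.

    Returns {node: {window_index: count}}, where window_index is the
    timestamp divided by the window length, rounded down.
    """
    profile = defaultdict(lambda: defaultdict(int))
    for timestamp_ms, node in events:
        profile[node][timestamp_ms // window_ms] += 1
    return profile


# Hypothetical events, with timestamps relative to the start of the run.
events = [
    (0, "192.168.1.10"),
    (4000, "192.168.1.10"),
    (12000, "192.168.1.10"),
    (3000, "192.168.1.11"),
]

profile = node_level_profile(events, window_ms=10000)
```

Each node's `{window_index: count}` mapping corresponds to one series in the chart, with the window index along the time axis.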
  • 20. Process Level Profiling about Master: Process-level profiling aggregates the frequencies of instructions of each process within each node per unit time. [Chart for the master: number of occurrences (0 to 400) over time (s), with series rpc, jobtracker, namenode]
  • 21. Process Level Profiling about Slaves: [Charts, one panel per slave, 192.168.1.11 through 192.168.1.15: number of occurrences over time (s), with series rpctrace, tasktrackertrace, datanodetrace; the Map phase and Reduce phase are marked] Annotations: There are free resources, so speculative executions should be done. Imbalance of RPC.
  • 22. Conclusion: Summary: • Proposal - the lightweight method-level monitor using AspectJ - the profiling method based on the frequency of instructions • Provides effective information for development • Helps developers to understand system behaviors and specifications. Future work: • Create an algorithm that uses the profiling results to determine the degree of deviation, which would indicate the possibility of failure.
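The slides leave "degree of deviation" undefined. One possible reading, offered purely as an assumption and not as the authors' algorithm, is how far each node's count lies from the mean across nodes, in units of standard deviation. The function name and the counts below are hypothetical.

```python
from statistics import mean, pstdev


def deviation_scores(counts):
    """Return each node's distance from the mean count, in standard deviations.

    This is one assumed definition of "degree of deviation"; the slides
    do not specify one.
    """
    values = list(counts.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All nodes behave identically, so nothing deviates.
        return {node: 0.0 for node in counts}
    return {node: (count - mu) / sigma for node, count in counts.items()}


# Hypothetical per-node RPC counts for one time window.
counts = {
    "192.168.1.11": 100,
    "192.168.1.12": 100,
    "192.168.1.13": 100,
    "192.168.1.14": 100,
    "192.168.1.15": 200,
}

scores = deviation_scores(counts)
```

With these made-up counts the mean is 120 and the standard deviation is 40, so the last node scores 2.0 and the others score -0.5. A threshold on such a score would be one way to flag a node for inspection.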
  • 23. Thank you for your kind attention