SherLog: Error Diagnosis by Connecting Clues from
                  Run-time Logs




                Dacong (Tony) Yan
                  April 07, 2010
Introduction



            Scenario - production run failure
                   failure reproduction: reproduce the failed execution to figure
                   out what went wrong in the program
            Challenges
                   customers’ privacy concerns
                   difficulty in setting up the exact same execution environment
                   lack of a low-overhead logging mechanism for failure reproduction
                   on multi-processors (why?)
            Common Practice in Industry
                   customers send logs to vendors in case of failure
                   vendors analyze logs to find clues to the problem
            Research Question
                   how can the root cause of a failure be located by analyzing logs,
                   even without reproducing the failed execution?


CSE 888, Dacong (Tony) Yan        SherLog: Error Diagnosis by Connecting Clues from Run-time Logs   2/13
Approach




            Idea
                   Ideal Goal: find out exactly what happened in the failed execution,
                   i.e. the exact failure-inducing execution paths
                   Realistic Goal: identify the Must-Have, May-Have, and
                   Must-Not-Have paths, and the states of variables on the possible
                   paths
            Usage Scenario
                   the developer runs the tool to get an interesting path
                   queries or examines the values of interesting variables along the
                   path
                   repeats the previous step until the root cause is found
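
The Must-Have / May-Have / Must-Not-Have distinction can be illustrated with a small sketch. It is not the tool's actual analysis: it assumes the candidate paths are already enumerated by hand, and it classifies program points rather than whole paths for brevity. All names here (`classify_points`, `PATHS`, `EMITS`, the point labels, and the log message) are hypothetical.

```python
def classify_points(paths, emits, observed):
    """Classify program points against an observed log.

    paths:    candidate execution paths, each a list of program-point names
    emits:    maps a program point to the log message it prints
    observed: the log messages seen in the failed run, in order
    """
    # A path is consistent if the messages it would print equal the log.
    consistent = [
        path for path in paths
        if [emits[p] for p in path if p in emits] == observed
    ]
    if not consistent:
        raise ValueError("no candidate path reproduces the observed log")

    all_points = set().union(*map(set, paths))
    on_every = set.intersection(*map(set, consistent))
    on_some = set().union(*map(set, consistent))
    return {
        "must_have": on_every,                # on every consistent path
        "may_have": on_some - on_every,       # on some consistent paths only
        "must_not_have": all_points - on_some # on no consistent path
    }


# Hypothetical example: three hand-written candidate paths.
PATHS = [
    ["entry", "open_ok", "write", "exit"],
    ["entry", "open_fail", "log_error", "exit"],
    ["entry", "open_fail", "retry", "log_error", "exit"],
]
EMITS = {"log_error": "cannot open file"}

result = classify_points(PATHS, EMITS, ["cannot open file"])
```

In this toy input the log rules out the first path, so its private points end up in `must_not_have`, while `retry` stays in `may_have` because the log alone cannot confirm or exclude it.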




Design


     Three main components:
            Log Parsing: locates the source code lines printing the messages
            Path Inference: infers the Must-Paths, May-Paths, and
            Pruned-Paths
            Value Inference: infers the variable values on the paths
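
The Log Parsing step can be sketched as follows. This is a minimal illustration, assuming the logging statements and their printf-style format strings have already been extracted from the source code; the table `LOG_POINTS`, the file name, the line numbers, and the messages are all hypothetical, and only `%s` and `%d` are handled.

```python
import re

# Hypothetical logging statements: (file, line, printf-style format string).
LOG_POINTS = [
    ("app.c", 120, "worker %d: cannot open %s"),
    ("app.c", 87, "worker %d: opened %s"),
]

_SPEC = re.compile(r"%[sd]")


def format_to_regex(fmt):
    """Turn a printf-style format string into an anchored regex."""
    literals = _SPEC.split(fmt)   # text between the conversion specifiers
    specs = _SPEC.findall(fmt)    # the specifiers themselves, in order
    pattern = re.escape(literals[0])
    for spec, literal in zip(specs, literals[1:]):
        group = r"(-?\d+)" if spec == "%d" else r"(.+?)"
        pattern += group + re.escape(literal)
    return re.compile("^" + pattern + "$")


def locate(message, log_points=LOG_POINTS):
    """Return (file, line, captured values) for every format that matches."""
    hits = []
    for file, line, fmt in log_points:
        match = format_to_regex(fmt).match(message)
        if match:
            hits.append((file, line, match.groups()))
    return hits


hits = locate("worker 7: cannot open data.txt")
```

A message may match more than one format string, so `locate` returns every candidate rather than picking one; the captured values are the kind of clue that a later value-inference step could use.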




Evaluation


            Methodology
                   manually reproduce and diagnose the failure
                   collect path summaries at runtime
                   compare the result of SherLog with the reproduction
            Terminology
                   useful: SherLog infers a subset of the summarized information
                   complete: SherLog infers all the information necessary for debugging




Experimental Results
Overall Results




Case Studies



     Three case studies to demonstrate the effectiveness of SherLog:
            Case 1: ln of coreutils 4.5.1
            Case 2: Squid web proxy cache server
            Case 3: CVS Configuration Error




Squid Case Study




Performance




Discussion



            What can we do with the results of SherLog? Can we automate these
            subsequent steps as well?
            How helpful are the results of SherLog for debugging? Or, more
            generally, how do we evaluate automated debugging tools?
            How useful is SherLog when its results are not complete?




