SlideShare a Scribd company logo
1 of 47
Download to read offline
@_jon_bell_ 5/23/13
Chronicler: Lightweight Recording to
Reproduce Field Failures
Jonathan Bell, Nikhil Sarda and Gail Kaiser
Columbia University, NewYork, NY USA
@_jon_bell_ 5/23/13
What happens when
software crashes?
@_jon_bell_ 5/23/13
@_jon_bell_ 5/23/13
@_jon_bell_ 5/23/13
@_jon_bell_ 5/23/13
Stack Traces Aren’t
Enough
Method 1
StackTrace
Method 1
@_jon_bell_ 5/23/13
Stack Traces Aren’t
Enough
Method 1 Method 2
StackTraceWrites
Method 1
Method 2
@_jon_bell_ 5/23/13
Stack Traces Aren’t
Enough
Method 1 Method 2
StackTraceWrites
Method 1
@_jon_bell_ 5/23/13
Stack Traces Aren’t
Enough
Method 1 Method 2 Method 3
StackTraceWrites
Method 1
Method 3
@_jon_bell_ 5/23/13
Stack Traces Aren’t
Enough
Method 1 Method 2 Method 3
StackTraceWrites Reads
*Crashes*
Method 1
Method 3
@_jon_bell_ 5/23/13
Potential Solutions
• Frequent Snapshots [Artzi et al]
• Guided symbolic execution [Cheung et al],
[Crameri et al], [Jin and Orso], [Zamfir and Candea]
@_jon_bell_ 5/23/13
Recording External
Inputs
[Geels et al], [Clause and Orso], [Mickens et al], this work
@_jon_bell_ 5/23/13
Other
applications on
the Network
Files on the
user's disk
Inputs from the user
(e.g. keyboard)
Our
application
Recording External Inputs
@_jon_bell_ 5/23/13
Records all
external interaction
Recording External Inputs
Other
applications on
the Network
Files on the
user's disk
Inputs from the user
(e.g. keyboard)
Our
application
Recording System
@_jon_bell_ 5/23/13
Record and Replay:
Workflow
R+R System
Application
Instrumented for log
Instrumented for
replay
R+R system generates
test case
Bug successfully
reproduced in the lab
Used in the
field
Crashes
Crashes
@_jon_bell_ 5/23/13
Recording Inputs
• Previous work looks at logging system calls
• e.g. read, write, open
• Can lead to very high overhead inVM-
based languages, which perform many such
operations just to setup theVM!
@_jon_bell_ 5/23/13
Our insight:
API Level Logging
@_jon_bell_ 5/23/13
Reading a File: Log
System Calls
public class ReadFile {
public static void main(String[] args) throws
FileNotFoundException {
String text = new Scanner(new
File("testFile")).useDelimiter("A").next();
}
}
684 System Calls
$ java ReadFile
@_jon_bell_ 5/23/13
Reading a File:API Logging
public class ReadFile {
public static void main(String[] args) throws
FileNotFoundException {
String text = new Scanner(new
File("testFile")).useDelimiter("A").next();
}
}
1 “External” API call
$ java ReadFile
@_jon_bell_ 5/23/13
Language VM (.NET CLR, JVM, etc)
Outside world (sources of
external input)
Normal APIApplication
Chronicler
Language API
External-input API
API Level Logging:The
Challenge
@_jon_bell_ 5/23/13
Language VM (.NET CLR, JVM, etc)
Outside world (sources of
external input)
Normal APIApplication
Chronicler
Language API
External-input API
API Level Logging:The
Challenge
Must determine reliably
@_jon_bell_ 5/23/13
Identifying External Inputs
Outside world (sources of
external input)
Language VM (.NET CLR, JVM, etc)
"Native" interface
@_jon_bell_ 5/23/13
External Inputs in Java
Outside world (sources of
external input)
Native Methods in API
Language VM (.NET CLR, JVM, etc)
Language API
@_jon_bell_ 5/23/13
External Inputs in Java
Outside world (sources of
external input)
Native Methods in API
Language VM (.NET CLR, JVM, etc)
Language API
API Calling Native Methods
@_jon_bell_ 5/23/13
Problem:A lot is done
outside of theVM
@_jon_bell_ 5/23/13
Log explosion
public static native void arraycopy ...
Not all native methods return external input
1984 direct callers
@_jon_bell_ 5/23/13
Log explosion:
System.arraycopy
125,714 indirect callers
public String concat(String str) {
...
getChars(0, count, buf, 0);
...
}
void getChars(char dst[], int dstBegin) {
System.arraycopy(...);
}
@_jon_bell_ 5/23/13
Managing Log Explosion
Outside world
(external input)
Native methods in API receiving external input
Language VM (.NET CLR, JVM, etc)
Language API
API calling external input methods
Outside world
(not input-
providing)
@_jon_bell_ 5/23/13
Managing Log Explosion
• Stop list: approx 70 native methods that do
NOT gather external input
• Also includes StrictMath, some image
processing
• Not necessary to be complete in the
stop list
@_jon_bell_ 5/23/13
Managing Log Explosion
• Stop list: approx 70 native methods that do
NOT gather external input
• Also includes StrictMath, some image
processing
• Not necessary to be complete in the
stop list
• Result: decrease log sites by ~25%
@_jon_bell_ 5/23/13
Implementation
• Targets Java
• Bytecode transformation (no source!)
• Logs all external inputs from the API
• Plenty of clever tricks: details in the paper
(fast logging, callbacks, finalizers, and more!)
@_jon_bell_ 5/23/13
Evaluation
@_jon_bell_ 5/23/13
Functionality Evaluation
Find bug in bug
tracker
Instrument with
Chronicler
Manually
Reproduce Bug
Chronicler
Reproduces it
Again
@_jon_bell_ 5/23/13
Functionality Evaluation
• Attempted to reproduce 11 real bugs in:
• Jetty
• Apache Commons-Math
• Apache Commons-Lang
• Groovy
• Succeeded in all cases
@_jon_bell_ 5/23/13
0 5000 10000 15000 20000 25000 30000 35000 40000
avrora
batik
eclipse
fop
h2
jython
luindex
lusearch
pmd
sunflow
tomcat
tradebeans
tradesoap
xalan
Average benchmark time (ms)
Baseline
Chronicler
Performance: DaCapo
<5% overhead (6)
DaCapo Absolute Performance
@_jon_bell_ 5/23/13
0 5000 10000 15000 20000 25000 30000 35000 40000
avrora
batik
eclipse
fop
h2
jython
luindex
lusearch
pmd
sunflow
tomcat
tradebeans
tradesoap
xalan
Average benchmark time (ms)
Baseline
Chronicler
Performance: DaCapo
<10% overhead (4)
DaCapo Absolute Performance
@_jon_bell_ 5/23/13
0 5000 10000 15000 20000 25000 30000 35000 40000
avrora
batik
eclipse
fop
h2
jython
luindex
lusearch
pmd
sunflow
tomcat
tradebeans
tradesoap
xalan
Average benchmark time (ms)
Baseline
Chronicler
Performance: DaCapo
<30% overhead (2)
DaCapo Absolute Performance
@_jon_bell_ 5/23/13
0 5000 10000 15000 20000 25000 30000 35000 40000
avrora
batik
eclipse
fop
h2
jython
luindex
lusearch
pmd
sunflow
tomcat
tradebeans
tradesoap
xalan
Average benchmark time (ms)
Baseline
Chronicler
Performance: DaCapo
<40% overhead (2; nearly worst case)
DaCapo Absolute Performance
@_jon_bell_ 5/23/13
0 5000 10000 15000 20000 25000 30000 35000 40000
avrora
batik
eclipse
fop
h2
jython
luindex
lusearch
pmd
sunflow
tomcat
tradebeans
tradesoap
xalan
Average benchmark time (ms)
Baseline
Chronicler
Performance: DaCapo
DaCapo Absolute Performance
@_jon_bell_ 5/23/13
0 5000 10000 15000 20000 25000 30000 35000 40000
avrora
batik
eclipse
fop
h2
jython
luindex
lusearch
pmd
sunflow
tomcat
tradebeans
tradesoap
xalan
Average benchmark time (ms)
Baseline
Chronicler
Performance: DaCapo
DaCapo Absolute Performance
@_jon_bell_ 5/23/13
Performance: Best Case
0
500
1000
1500
2000
2500
3000
3500
Composite
FFT SOR Monte Carlo
MatMult
LU
Performance(Megaflops)
Baseline Chronicler
0.23%
1.15%
0.29%
0.19%
0.27%
0%
SciMark Absolute Performance
@_jon_bell_ 5/23/13
Performance:Worst Case
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
2 4 8 16 32 64 128 512 1024 2048 3072
Overhead
Input File Size (MB)
I/O Microbenchmark Relative Performance
Logging overhead masked
by JVM setup/teardown
Worst case performance
@_jon_bell_ 5/23/13
Limitations and Future
Work
@_jon_bell_ 5/23/13
Limitations and Future
Work
• AssumesVM correctness
• Thread Interleavings
• Employ developer-side race detectors [e.g. Naik]
• Integrate with existing thread access loggers [e.g. Huang]
• Privacy
• Symbolic execution on the client to generate new inputs
[e.g. Castro, Clause]
@_jon_bell_ 5/23/13
Chronicler: Lightweight Recording to
Reproduce Field Failures
Jonathan Bell, Nikhil Sarda and Gail Kaiser
Columbia University, NewYork, NY USA
https://github.com/Programming-Systems-Lab/chroniclerj
jbell@cs.columbia.edu

More Related Content

Similar to Chronicler: Lightweight Recording to Reproduce Field Failures (Presented at ICSE 2013)

SherLog: Error Diagnosis by Connecting Clues from Run-time Logs
SherLog: Error Diagnosis by Connecting Clues from Run-time LogsSherLog: Error Diagnosis by Connecting Clues from Run-time Logs
SherLog: Error Diagnosis by Connecting Clues from Run-time LogsDacong (Tony) Yan
 
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)Sung Kim
 
Byteman and The Jokre, Sanne Grinovero (JBoss by RedHat)
Byteman and The Jokre, Sanne Grinovero (JBoss by RedHat)Byteman and The Jokre, Sanne Grinovero (JBoss by RedHat)
Byteman and The Jokre, Sanne Grinovero (JBoss by RedHat)OpenBlend society
 
Building resilient services in go
Building resilient services in goBuilding resilient services in go
Building resilient services in goJaehue Jang
 
Effective Fault-Localization Techniques for Concurrent Software
Effective Fault-Localization Techniques for Concurrent SoftwareEffective Fault-Localization Techniques for Concurrent Software
Effective Fault-Localization Techniques for Concurrent SoftwareSangmin Park
 
Have you been stalking your servers?
Have you been stalking your servers?Have you been stalking your servers?
Have you been stalking your servers?morpht
 
Low Latency Logging with RabbitMQ (Brno PHP, CZ - 20th Sep 2014)
Low Latency Logging with RabbitMQ (Brno PHP, CZ - 20th Sep 2014)Low Latency Logging with RabbitMQ (Brno PHP, CZ - 20th Sep 2014)
Low Latency Logging with RabbitMQ (Brno PHP, CZ - 20th Sep 2014)James Titcumb
 
Java forum Gothenburg Chronon - 2012-02-07 - Martin Sjöblom.key
Java forum Gothenburg Chronon - 2012-02-07 - Martin Sjöblom.keyJava forum Gothenburg Chronon - 2012-02-07 - Martin Sjöblom.key
Java forum Gothenburg Chronon - 2012-02-07 - Martin Sjöblom.keyMartin Sjöblom
 
44CON London 2015 - Going AUTH the Rails on a Crazy Train
44CON London 2015 - Going AUTH the Rails on a Crazy Train44CON London 2015 - Going AUTH the Rails on a Crazy Train
44CON London 2015 - Going AUTH the Rails on a Crazy Train44CON
 
Searching for Multi-Fault Programs in Defects4J
Searching for Multi-Fault Programs in Defects4JSearching for Multi-Fault Programs in Defects4J
Searching for Multi-Fault Programs in Defects4JGabin An
 
Code lifecycle in the jvm - TopConf Linz
Code lifecycle in the jvm - TopConf LinzCode lifecycle in the jvm - TopConf Linz
Code lifecycle in the jvm - TopConf LinzIvan Krylov
 
EuroMPI 2019: Multilevel Checkpointing for MPI Applications
EuroMPI 2019: Multilevel Checkpointing for MPI ApplicationsEuroMPI 2019: Multilevel Checkpointing for MPI Applications
EuroMPI 2019: Multilevel Checkpointing for MPI ApplicationsLEGATO project
 
Java 7 - New Features - by Mihail Stoynov and Svetlin Nakov
Java 7 - New Features - by Mihail Stoynov and Svetlin NakovJava 7 - New Features - by Mihail Stoynov and Svetlin Nakov
Java 7 - New Features - by Mihail Stoynov and Svetlin NakovSvetlin Nakov
 
Securing AEM webapps by hacking them
Securing AEM webapps by hacking themSecuring AEM webapps by hacking them
Securing AEM webapps by hacking themMikhail Egorov
 
ARCS Presentation 2008
ARCS Presentation 2008ARCS Presentation 2008
ARCS Presentation 2008Spondon Saha
 
High Performance Managed Languages
High Performance Managed LanguagesHigh Performance Managed Languages
High Performance Managed LanguagesJ On The Beach
 
Code quality par Simone Civetta
Code quality par Simone CivettaCode quality par Simone Civetta
Code quality par Simone CivettaCocoaHeads France
 

Similar to Chronicler: Lightweight Recording to Reproduce Field Failures (Presented at ICSE 2013) (20)

SherLog: Error Diagnosis by Connecting Clues from Run-time Logs
SherLog: Error Diagnosis by Connecting Clues from Run-time LogsSherLog: Error Diagnosis by Connecting Clues from Run-time Logs
SherLog: Error Diagnosis by Connecting Clues from Run-time Logs
 
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)
 
Byteman and The Jokre, Sanne Grinovero (JBoss by RedHat)
Byteman and The Jokre, Sanne Grinovero (JBoss by RedHat)Byteman and The Jokre, Sanne Grinovero (JBoss by RedHat)
Byteman and The Jokre, Sanne Grinovero (JBoss by RedHat)
 
Building resilient services in go
Building resilient services in goBuilding resilient services in go
Building resilient services in go
 
Effective Fault-Localization Techniques for Concurrent Software
Effective Fault-Localization Techniques for Concurrent SoftwareEffective Fault-Localization Techniques for Concurrent Software
Effective Fault-Localization Techniques for Concurrent Software
 
Have you been stalking your servers?
Have you been stalking your servers?Have you been stalking your servers?
Have you been stalking your servers?
 
Low Latency Logging with RabbitMQ (Brno PHP, CZ - 20th Sep 2014)
Low Latency Logging with RabbitMQ (Brno PHP, CZ - 20th Sep 2014)Low Latency Logging with RabbitMQ (Brno PHP, CZ - 20th Sep 2014)
Low Latency Logging with RabbitMQ (Brno PHP, CZ - 20th Sep 2014)
 
Java forum Gothenburg Chronon - 2012-02-07 - Martin Sjöblom.key
Java forum Gothenburg Chronon - 2012-02-07 - Martin Sjöblom.keyJava forum Gothenburg Chronon - 2012-02-07 - Martin Sjöblom.key
Java forum Gothenburg Chronon - 2012-02-07 - Martin Sjöblom.key
 
Reddirt2011
Reddirt2011Reddirt2011
Reddirt2011
 
44CON London 2015 - Going AUTH the Rails on a Crazy Train
44CON London 2015 - Going AUTH the Rails on a Crazy Train44CON London 2015 - Going AUTH the Rails on a Crazy Train
44CON London 2015 - Going AUTH the Rails on a Crazy Train
 
Searching for Multi-Fault Programs in Defects4J
Searching for Multi-Fault Programs in Defects4JSearching for Multi-Fault Programs in Defects4J
Searching for Multi-Fault Programs in Defects4J
 
Code lifecycle in the jvm - TopConf Linz
Code lifecycle in the jvm - TopConf LinzCode lifecycle in the jvm - TopConf Linz
Code lifecycle in the jvm - TopConf Linz
 
EuroMPI 2019: Multilevel Checkpointing for MPI Applications
EuroMPI 2019: Multilevel Checkpointing for MPI ApplicationsEuroMPI 2019: Multilevel Checkpointing for MPI Applications
EuroMPI 2019: Multilevel Checkpointing for MPI Applications
 
Debugging TV Frame 0x13
Debugging TV Frame 0x13Debugging TV Frame 0x13
Debugging TV Frame 0x13
 
Java 7 - New Features - by Mihail Stoynov and Svetlin Nakov
Java 7 - New Features - by Mihail Stoynov and Svetlin NakovJava 7 - New Features - by Mihail Stoynov and Svetlin Nakov
Java 7 - New Features - by Mihail Stoynov and Svetlin Nakov
 
Securing AEM webapps by hacking them
Securing AEM webapps by hacking themSecuring AEM webapps by hacking them
Securing AEM webapps by hacking them
 
ARCS Presentation 2008
ARCS Presentation 2008ARCS Presentation 2008
ARCS Presentation 2008
 
High Performance Managed Languages
High Performance Managed LanguagesHigh Performance Managed Languages
High Performance Managed Languages
 
Logging system of Android
Logging system of AndroidLogging system of Android
Logging system of Android
 
Code quality par Simone Civetta
Code quality par Simone CivettaCode quality par Simone Civetta
Code quality par Simone Civetta
 

More from jon_bell

Replay without Recording of Production Bugs for Service Oriented Applications
Replay without Recording of Production Bugs for Service Oriented ApplicationsReplay without Recording of Production Bugs for Service Oriented Applications
Replay without Recording of Production Bugs for Service Oriented Applicationsjon_bell
 
A Large-Scale Study of Test Coverage Evolution
A Large-Scale Study of Test Coverage EvolutionA Large-Scale Study of Test Coverage Evolution
A Large-Scale Study of Test Coverage Evolutionjon_bell
 
CROCHET - Checkpoint Rollback in JVM (ECOOP 2018)
CROCHET - Checkpoint Rollback in JVM (ECOOP 2018)CROCHET - Checkpoint Rollback in JVM (ECOOP 2018)
CROCHET - Checkpoint Rollback in JVM (ECOOP 2018)jon_bell
 
Efficient Dependency Detection for Safe Java Test Acceleration
Efficient Dependency Detection for Safe Java Test AccelerationEfficient Dependency Detection for Safe Java Test Acceleration
Efficient Dependency Detection for Safe Java Test Accelerationjon_bell
 
Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs
Phosphor: Illuminating Dynamic Data Flow in Commodity JVMsPhosphor: Illuminating Dynamic Data Flow in Commodity JVMs
Phosphor: Illuminating Dynamic Data Flow in Commodity JVMsjon_bell
 
ICSE 2014: Unit Test Virtualization with VMVM
ICSE 2014: Unit Test Virtualization with VMVMICSE 2014: Unit Test Virtualization with VMVM
ICSE 2014: Unit Test Virtualization with VMVMjon_bell
 
Unit Test Virtualization: Optimizing Testing Time
Unit Test Virtualization: Optimizing Testing TimeUnit Test Virtualization: Optimizing Testing Time
Unit Test Virtualization: Optimizing Testing Timejon_bell
 
A Large-Scale, Longitudinal Study of User Profiles in World of Warcraft
A Large-Scale, Longitudinal Study of User Profiles in World of WarcraftA Large-Scale, Longitudinal Study of User Profiles in World of Warcraft
A Large-Scale, Longitudinal Study of User Profiles in World of Warcraftjon_bell
 

More from jon_bell (8)

Replay without Recording of Production Bugs for Service Oriented Applications
Replay without Recording of Production Bugs for Service Oriented ApplicationsReplay without Recording of Production Bugs for Service Oriented Applications
Replay without Recording of Production Bugs for Service Oriented Applications
 
A Large-Scale Study of Test Coverage Evolution
A Large-Scale Study of Test Coverage EvolutionA Large-Scale Study of Test Coverage Evolution
A Large-Scale Study of Test Coverage Evolution
 
CROCHET - Checkpoint Rollback in JVM (ECOOP 2018)
CROCHET - Checkpoint Rollback in JVM (ECOOP 2018)CROCHET - Checkpoint Rollback in JVM (ECOOP 2018)
CROCHET - Checkpoint Rollback in JVM (ECOOP 2018)
 
Efficient Dependency Detection for Safe Java Test Acceleration
Efficient Dependency Detection for Safe Java Test AccelerationEfficient Dependency Detection for Safe Java Test Acceleration
Efficient Dependency Detection for Safe Java Test Acceleration
 
Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs
Phosphor: Illuminating Dynamic Data Flow in Commodity JVMsPhosphor: Illuminating Dynamic Data Flow in Commodity JVMs
Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs
 
ICSE 2014: Unit Test Virtualization with VMVM
ICSE 2014: Unit Test Virtualization with VMVMICSE 2014: Unit Test Virtualization with VMVM
ICSE 2014: Unit Test Virtualization with VMVM
 
Unit Test Virtualization: Optimizing Testing Time
Unit Test Virtualization: Optimizing Testing TimeUnit Test Virtualization: Optimizing Testing Time
Unit Test Virtualization: Optimizing Testing Time
 
A Large-Scale, Longitudinal Study of User Profiles in World of Warcraft
A Large-Scale, Longitudinal Study of User Profiles in World of WarcraftA Large-Scale, Longitudinal Study of User Profiles in World of Warcraft
A Large-Scale, Longitudinal Study of User Profiles in World of Warcraft
 

Chronicler: Lightweight Recording to Reproduce Field Failures (Presented at ICSE 2013)