An Exploratory Study of the Evolution of Communicated Information about the Execution of Large Software Systems
Weiyi Shang
Zhen Ming Jiang
Bram Adams
Ahmed E. Hassan
Michael W. Godfrey
University of Waterloo
Queen’s University
Mohamed Nasser
Parminder Flora
Research In Motion (RIM)
What run-time actions cause the failure?
Automated profiling & instrumentation
Detail
No domain knowledge
Large scale
Communicated information (CI)
Execution logs
System alerts
Code comments (/* … */)
Static / Dynamic
Field experience / Developer experience
CI forms the basis of an ecosystem of Log Processing Apps
Workload recovery
Anomaly detection
Capacity planning
System monitoring
Performance analysis
Failure diagnosis
How to keep Log Processing Apps in sync with CI?
Release 1, Release 2, Release 3
Our Study Dimensions
Quantity: How does CI evolve over time?
Type: What types of modifications happen to CI?
Content: What information is conveyed by the short-lived CI?
Case Study Setup
Data Collection
Log Abstraction
System Deployment
time=1, Trying to launch, TaskID=01A
time=$t, Trying to launch, TaskID=$id
Enterprise Application (EA)
Log Events
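The log abstraction step above turns a concrete log line into a log event by replacing dynamic values with placeholders. A minimal Python sketch of that idea follows. The two regular-expression rules and the names `RULES`, `abstract_line`, and `abstract_log` are assumptions made for illustration; the slides do not describe how the abstraction was implemented in the study.

```python
import re

# Hypothetical abstraction rules (an assumption, not the study's tooling).
# Each rule replaces a dynamic "key=value" field with a placeholder.
# The placeholder names ($t, $id) follow the example on the slide.
RULES = [
    (re.compile(r"\btime=\d+"), "time=$t"),
    (re.compile(r"\bTaskID=\w+"), "TaskID=$id"),
]


def abstract_line(line: str) -> str:
    """Replace the dynamic parts of one log line with placeholders."""
    for pattern, placeholder in RULES:
        line = pattern.sub(placeholder, line)
    return line


def abstract_log(lines):
    """Return the set of distinct log events found in the given lines."""
    return {abstract_line(line.strip()) for line in lines if line.strip()}


if __name__ == "__main__":
    print(abstract_line("time=1, Trying to launch, TaskID=01A"))
    # prints: time=$t, Trying to launch, TaskID=$id
```

Two log lines that differ only in their time or task ID collapse to the same log event, which is what makes the number of events comparable across releases.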
CI keeps on growing over time
[Figure: number of execution events (y-axis, 0 to 180) per release (x-axis: 0.14.0, 0.15.0, 0.16.0, 0.17.0, 0.18.0, 0.19.0, 0.20.0, 0.20.1, 0.20.2, 0.21.0)]
…even when system size decreases
Release   # K SLOC   # Execution log events
0.19.0    293        113
0.20.0    250        121
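The two rows above can be turned into relative changes with a few lines of arithmetic. The sketch below uses only the numbers from the slide. The helper name `relative_change` and the "events per K SLOC" density measure are illustrative assumptions, not metrics reported in the slides.

```python
# Figures taken from the slide: system size (K SLOC) and number of
# execution log events for two consecutive releases.
RELEASES = {
    "0.19.0": {"ksloc": 293, "events": 113},
    "0.20.0": {"ksloc": 250, "events": 121},
}


def relative_change(old: float, new: float) -> float:
    """Relative change from old to new, e.g. -0.1 for a 10% decrease."""
    return (new - old) / old


old, new = RELEASES["0.19.0"], RELEASES["0.20.0"]
sloc_change = relative_change(old["ksloc"], new["ksloc"])
event_change = relative_change(old["events"], new["events"])

# Events per K SLOC: an assumed density measure, used here only to show
# that logging became denser while the code base shrank.
density_old = old["events"] / old["ksloc"]
density_new = new["events"] / new["ksloc"]

if __name__ == "__main__":
    print(f"size change:   {sloc_change:+.1%}")   # about -14.7%
    print(f"events change: {event_change:+.1%}")  # about +7.1%
    print(f"events per K SLOC: {density_old:.3f} -> {density_new:.3f}")
```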
CI is impacted by re-engineering
[Figure: percentage of unchanged CI (y-axis, 0% to 100%) per release (x-axis: 0.15.0 to 0.21.0); annotation: "Large amounts of implementation changes"]
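One way to compute a figure such as "unchanged CI" is to compare the sets of abstracted log events of two consecutive releases. The Python sketch below assumes that "unchanged" means an event appears verbatim in both releases and that the ratio is taken over the earlier release; the slides do not state the exact definition. The function name `compare_releases` and the two event sets are hypothetical.

```python
def compare_releases(previous: set, current: set) -> dict:
    """Compare the abstracted log events of two consecutive releases."""
    if not previous:
        raise ValueError("previous release has no log events")
    unchanged = previous & current
    return {
        "unchanged": unchanged,
        "removed": previous - current,   # events an app may still expect
        "added": current - previous,     # events an app does not know yet
        "unchanged_ratio": len(unchanged) / len(previous),
    }


if __name__ == "__main__":
    # Hypothetical event sets, built from example lines on the slides.
    release_1 = {
        "time=$t, Trying to launch, TaskID=$id",
        "Adding task to tasktracker",
    }
    release_2 = {
        "time=$t, Trying to launch, TaskID=$id",
        "Adding Map Task to tasktracker",
        "Adding Reduce Task to tasktracker",
    }
    report = compare_releases(release_1, release_2)
    print(report["unchanged_ratio"])  # prints: 0.5
```

The "removed" set is the part a Log Processing App is most likely to trip over, because patterns written against the earlier release no longer match anything.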
Quantity: How does CI evolve over time?
Growing & changing
Document & track
Six types of modification exist
Rephrasing
Redundant information
Adding information
Deleting information
Diverging
Merging
Rephrasing, for example:
Before: Hadoop mapred Reduce task fetch n bytes
After: Hadoop MapReduce task Reduce fetch n bytes
Adding information, for example:
Before: ShuffleRamManager memory limit n MaxSingleShuffleLimit m
After: ShuffleRamManager memory limit n MaxSingleShuffleLimit m mergeThreshold Q
Diverging, for example:
Before: Adding task to tasktracker
After: Adding Map Task to tasktracker
After: Adding Reduce Task to tasktracker
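The example log events on these slides can be told apart with a rough token-based heuristic. The Python sketch below is an illustration only, not the classification procedure used in the study. It assumes a reviewer has already linked each old event to its new counterpart(s), and it cannot detect "redundant information", which needs knowledge of what the surrounding events already convey. The function name `classify` is hypothetical.

```python
def classify(old_events: list, new_events: list) -> str:
    """Guess the modification type for linked old and new log events."""
    if len(old_events) == 1 and len(new_events) > 1:
        return "diverging"          # one event split into several
    if len(old_events) > 1 and len(new_events) == 1:
        return "merging"            # several events folded into one
    if len(old_events) != 1 or len(new_events) != 1:
        raise ValueError("expected a 1:1, 1:n, or n:1 mapping")

    old_text, new_text = old_events[0], new_events[0]
    if old_text == new_text:
        return "unchanged"

    old_tokens, new_tokens = set(old_text.split()), set(new_text.split())
    added, removed = new_tokens - old_tokens, old_tokens - new_tokens
    if added and not removed:
        return "adding information"
    if removed and not added:
        return "deleting information"
    # Words were swapped or reordered: treat as rephrasing.
    return "rephrasing"


if __name__ == "__main__":
    print(classify(
        ["Hadoop mapred Reduce task fetch n bytes"],
        ["Hadoop MapReduce task Reduce fetch n bytes"],
    ))  # prints: rephrasing
    print(classify(
        ["Adding task to tasktracker"],
        ["Adding Map Task to tasktracker",
         "Adding Reduce Task to tasktracker"],
    ))  # prints: diverging
```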
Each type of modification is avoidable, recoverable, or unavoidable
Most modifications can be avoided
[Figure: percentage of modifications (y-axis, 0% to 100%) by type (redundant info, rephrasing, adding info, deleting info, diverging, merging), grouped as avoidable, recoverable, or unavoidable; bar values: 9.86%, 61.97%, 14.08%, 7.04%, 7.04%, 2.82%]
Type: What types of modifications happen to CI?
6 types
Are mostly avoidable
Short-lived CI contains implementation details
Hadoop saves output to a machine.
Hadoop assigns a reduce task to a machine.
Map task updates its progress.
Hadoop reads from a local file.
Hadoop Attempt saves its output and reports to the task tracker.
Implementation details: node name, local path, using IPC, output file name
Quantity: How does CI evolve over time?
Growing & changing
Document & track
Type: What types of modifications happen to CI?
6 types
Are mostly avoidable
Content: What information is conveyed by the short-lived CI?
Implementation-level details
Fragile
Maintenance effort
Ian wcre2011


Editor's Notes

  1. neon
  2. Apps make use of these logs
  3. Font bigger
  4. Fixing bugs then gone