TMPA-2017: Unity Application Testing Automation with Appium and Image Recogni... (Iosif Itkin)
TMPA-2017: Tools and Methods of Program Analysis
3-4 March, 2017, Hotel Holiday Inn Moscow Vinogradovo, Moscow
Unity Application Testing Automation with Appium and Image Recognition
Evgeny Pyshkin, Maxim Mozgovoy, The University of Aizu
For video follow the link: https://youtu.be/kfPGUShSUy8
Would you like to know more?
Visit our website:
www.tmpaconf.org
www.exactprosystems.com/events/tmpa
Follow us:
https://www.linkedin.com/company/exactpro-systems-llc?trk=biz-companies-cym
https://twitter.com/exactpro
TMPA-2017: Distributed Analysis of the BMC Kind: Making It Fit the Tornado Su... (Iosif Itkin)
TMPA-2017: Tools and Methods of Program Analysis
3-4 March, 2017, Hotel Holiday Inn Moscow Vinogradovo, Moscow
Distributed Analysis of the BMC Kind: Making It Fit the Tornado Supercomputer
Azat Abdullin, Daniil Stepanov, St. Petersburg Polytechnic University
Marat Akhin, JetBrains Research
For video follow the link: https://youtu.be/CPlPpwFtN7k
Would you like to know more?
Visit our website:
www.tmpaconf.org
www.exactprosystems.com/events/tmpa
Follow us:
https://www.linkedin.com/company/exactpro-systems-llc?trk=biz-companies-cym
https://twitter.com/exactpro
TMPA-2017: Live testing distributed system fault tolerance with fault injecti... (Iosif Itkin)
TMPA-2017: Tools and Methods of Program Analysis
3-4 March, 2017, Hotel Holiday Inn Moscow Vinogradovo, Moscow
Live testing distributed system fault tolerance with fault injection techniques
Alexey Vasyukov (Inventa), Vadim Zherder (MOEX)
For video follow the link: https://youtu.be/mGLRH2gqZwc
Would you like to know more?
Visit our website:
www.tmpaconf.org
www.exactprosystems.com/events/tmpa
Follow us:
https://www.linkedin.com/company/exactpro-systems-llc?trk=biz-companies-cym
https://twitter.com/exactpro
Building resilient scheduling in distributed systems with Spring (Marek Jeszka)
It is common to have jobs running periodically, especially in asynchronous and distributed systems. If the service is scaled horizontally (i.e. there are multiple instances of the same service), you often only want a single node to handle the task.
In this session I will demonstrate how to manually set up Spring with custom logic in the scheduling configuration so that recurring tasks run on only a single node. This requires keeping track of the leader and persisting the selection.
The key takeaways of this session are how to implement distributed locking and how simple it is to run a Spring application on top of it. In this talk you will learn how to mitigate the challenges that arise when you use the traditional declarative approach to scheduling and how to switch to a more flexible programmatic approach.
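The leader/lock idea from the abstract can be illustrated language-agnostically. Below is a minimal Python sketch of a lease-based lock (the talk itself targets Spring/Java; `LeaseLockStore`, the lock name, and the lease length are all hypothetical, and a real deployment would persist the lock in a shared database rather than in memory):

```python
import time

class LeaseLockStore:
    """Hypothetical lease-based lock store. In production this would be
    backed by a shared database table, Redis, or ZooKeeper."""

    def __init__(self):
        self._locks = {}  # lock name -> (holder, expires_at)

    def try_acquire(self, name, holder, lease_seconds, now=None):
        """Return True if `holder` obtained (or already holds) the lock."""
        now = time.monotonic() if now is None else now
        current = self._locks.get(name)
        if current is None or current[1] <= now or current[0] == holder:
            self._locks[name] = (holder, now + lease_seconds)
            return True
        return False

def run_scheduled_task(store, node_id, task):
    # Every node wakes up on schedule, but only the lock holder runs the task.
    if store.try_acquire("nightly-report", node_id, lease_seconds=60):
        return task()
    return None
```

If three nodes wake up for the same tick, only the first to acquire the lease executes the task; the lease expiry lets another node take over if the leader dies.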
Operating system 23: process synchronization (Vaibhav Khanna)
Processes can execute concurrently
May be interrupted at any time, partially completing execution
Concurrent access to shared data may result in data inconsistency
Maintaining data consistency requires mechanisms to ensure the orderly execution of cooperating processes
Illustration of the problem: suppose that we wanted to provide a solution to the consumer-producer problem that fills all the buffers. We can do so by having an integer counter that keeps track of the number of full buffers. Initially, counter is set to 0. It is incremented by the producer after it produces a new buffer and is decremented by the consumer after it consumes a buffer.
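The race on the shared counter is the classic motivation for semaphores. Below is a minimal Python sketch of the bounded-buffer solution (the original slides do not include code; the buffer size and names here are illustrative):

```python
import threading
from collections import deque

BUFFER_SIZE = 4
buffer = deque()
empty = threading.Semaphore(BUFFER_SIZE)  # counts free slots
full = threading.Semaphore(0)             # counts filled slots
mutex = threading.Lock()                  # guards the buffer itself

def producer(items):
    for item in items:
        empty.acquire()          # wait for a free slot
        with mutex:
            buffer.append(item)
        full.release()           # signal one more full slot

def consumer(count, out):
    for _ in range(count):
        full.acquire()           # wait for a filled slot
        with mutex:
            out.append(buffer.popleft())
        empty.release()          # signal one more free slot
```

The two counting semaphores replace the racy shared `counter`: the producer blocks when the buffer is full, the consumer blocks when it is empty, and the mutex serializes access to the buffer structure itself.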
Abstractions and Directives for Adapting Wavefront Algorithms to Future Archi... (inside-BigData.com)
In this deck from PASC18, Robert Searles from the University of Delaware presents: Abstractions and Directives for Adapting Wavefront Algorithms to Future Architectures.
"Architectures are rapidly evolving, and exascale machines are expected to offer billion-way concurrency. We need to rethink algorithms, languages and programming models among other components in order to migrate large scale applications and explore parallelism on these machines. Although directive-based programming models allow programmers to worry less about programming and more about science, expressing complex parallel patterns in these models can be a daunting task especially when the goal is to match the performance that the hardware platforms can offer. One such pattern is wavefront. This paper extensively studies a wavefront-based miniapplication for Denovo, a production code for nuclear reactor modeling.
We parallelize the Koch-Baker-Alcouffe (KBA) parallel-wavefront sweep algorithm in the main kernel of Minisweep (the miniapplication) using CUDA, OpenMP and OpenACC. Our OpenACC implementation running on NVIDIA's next-generation Volta GPU boasts an 85.06x speedup over serial code, which is larger than CUDA's 83.72x speedup over the same serial implementation. Our experimental platform includes SummitDev, an ORNL representative architecture of the upcoming Summit supercomputer. Our parallelization effort across platforms also motivated us to define an abstract parallelism model that is architecture independent, with a goal of creating software abstractions that can be used by applications employing the wavefront sweep motif."
Watch the video: https://wp.me/p3RLHQ-iPU
Read the Full Paper: https://doi.org/10.1145/3218176.3218228
and
https://pasc18.pasc-conference.org/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Speeding up information extraction programs: a holistic optimizer and a learn... (INRIA-OAK)
A wealth of information produced by individuals and organizations is expressed in natural language text. Text lacks the explicit structure that is necessary to support rich querying and analysis. Information extraction systems are sophisticated software tools to discover structured information in natural language text. Unfortunately, information extraction is a challenging and time-consuming task.
In this talk, I will first present our proposal to optimize information extraction programs. It consists of a holistic approach that focuses on: (i) optimizing all key aspects of the information extraction process collectively and in a coordinated manner, rather than focusing on individual subtasks in isolation; (ii) accurately predicting the execution time, recall, and precision for each information extraction execution plan; and (iii) using these predictions to choose the best execution plan to execute a given information extraction program.
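As a rough illustration of point (iii), plan selection from predictions can be sketched as follows (a hypothetical helper; the key names and thresholds are invented for the example, not taken from the talk):

```python
def choose_plan(plans, min_recall, min_precision):
    """Pick the fastest predicted plan meeting the quality thresholds.

    `plans` is a list of dicts with hypothetical keys:
    'name', 'predicted_time', 'predicted_recall', 'predicted_precision'.
    Returns None when no plan satisfies the constraints.
    """
    feasible = [p for p in plans
                if p["predicted_recall"] >= min_recall
                and p["predicted_precision"] >= min_precision]
    if not feasible:
        return None
    # Among feasible plans, minimize predicted execution time.
    return min(feasible, key=lambda p: p["predicted_time"])
```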
Then, I will briefly present a principled, learning-based approach for ranking documents according to their potential usefulness for an extraction task. Our online learning-to-rank methods exploit the information collected during extraction, as we process new documents and the fine-grained characteristics of the useful documents are revealed. Then, these methods decide when the ranking model should be updated, hence significantly improving the document ranking quality over time.
This is joint work with Gonçalo Simões, INESC-ID and IST/University of Lisbon, and Pablo Barrio and Luis Gravano from Columbia University, NY.
We report on the advances in this sixth edition of the JUnit tool competitions. This year the contest introduces new benchmarks to assess the performance of JUnit testing tools on different types of real-world software projects. Following on the statistical analyses from the past contest work, we have extended it with the performance of the combined tool aiming to beat the human-made tests. Overall, the 6th competition evaluates four automated JUnit testing tools taking as baseline human-written test cases for the selected benchmark projects. The paper details the modifications performed to the methodology and provides full results of the competition.
Characterizing a High Throughput Computing Workload: The Compact Muon Solenoi... (Rafael Ferreira da Silva)
Presentation held at ICCS 2015 Conference - Reykjavik, Iceland
High throughput computing (HTC) has aided the scientific community in the analysis of vast amounts of data and computational jobs in distributed environments. To manage these large workloads, several systems have been developed to efficiently allocate and provide access to distributed resources. Many of these systems rely on estimates of job characteristics (e.g., job runtime) to characterize the workload behavior, which in practice are hard to obtain. In this work, we perform an exploratory analysis of the CMS experiment workload using the statistical recursive partitioning method and conditional inference trees to identify patterns that characterize particular behaviors of the workload. We then propose an estimation process to predict job characteristics based on the collected data. Experimental results show that our process estimates job runtime with 75% accuracy on average, and produces nearly optimal predictions for disk and memory consumption.
More information: www.rafaelsilva.com
It Does What You Say, Not What You Mean: Lessons From A Decade of Program Repair (Claire Le Goues)
In this talk we present lessons learned, good ideas, and thoughts on the future, with an eye toward informing junior researchers about the realities and opportunities of a long-running project. We highlight some notions from the original paper that stood the test of time, some that were not as prescient, and some that became more relevant as industrial practice advanced. We place the work in context, highlighting perceptions from software engineering and evolutionary computing, then and now, of how program repair could possibly work. We discuss the importance of measurable benchmarks and reproducible research in bringing scientists together and advancing the area. We give our thoughts on the role of quality requirements and properties in program repair. From testing to metrics to scalability to human factors to technology transfer, software repair touches many aspects of software engineering, and we hope a behind-the-scenes exploration of some of our struggles and successes may benefit researchers pursuing new projects.
Presentation from HiPC'18 (https://arxiv.org/abs/1805.01177).
Abstract:
Given the cost of HPC clusters, making the best use of them is crucial to improve infrastructure ROI. Likewise, reducing failed HPC jobs and the related waste in terms of user wait times is crucial to improve HPC user productivity (aka human ROI). While most efforts (e.g., debugging HPC programs) explore technical aspects to improve the ROI of HPC clusters, we hypothesize that non-technical (human) aspects are worth exploring to make non-trivial ROI gains; specifically, understanding non-technical aspects and how they contribute to the failure of HPC jobs.
In this regard, we conducted a case study in the context of Beocat cluster at Kansas State University. The purpose of the study was to learn the reasons why users terminate jobs and to quantify wasted computations in such jobs in terms of system utilization and user wait time. The data from the case study helped identify interesting and actionable reasons why users terminate HPC jobs. It also helped confirm that user terminated jobs may be associated with non-trivial amount of wasted computation, which if reduced can help improve the ROI of HPC clusters.
Instrumenting application code is like flossing your teeth: developers know they ought to be doing it more often. Code instrumentation is an important practice for establishing baseline performance metrics and identifying bottlenecks. Getting the right metrics is core to understanding how much concurrency your application can handle, determining what latency is normal for the application, and indicating when performance is deviating from those norms.
While most developers acknowledge the value of instrumentation, few actually implement it. If bytecode injection sounds as scary as a root canal, take heart: effective instrumentation doesn't have to be complicated. I've written an open-source instrumentation framework to encourage developers to get the metrics they need to pilot their application safely. We'll examine some strategies for code instrumentation, run some load tests, and make sense of the numbers.
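As a hedged illustration of the kind of metrics the talk describes (not the speaker's actual framework), a minimal in-process instrumentation decorator might look like this in Python:

```python
import time
from collections import defaultdict

# Hypothetical in-process metrics registry; a real framework would export
# these to a monitoring backend instead of keeping them in a dict.
metrics = defaultdict(lambda: {"calls": 0, "total_seconds": 0.0})

def instrumented(fn):
    """Record call count and cumulative latency for `fn`."""
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            m = metrics[fn.__name__]
            m["calls"] += 1
            m["total_seconds"] += time.perf_counter() - start
    return wrapper

@instrumented
def handle_request(x):
    # Stand-in for real application work.
    return x * 2
```

From `calls` and `total_seconds` you can derive throughput and average latency, the baseline numbers needed to spot deviations under load.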
Automating Speed: A Proven Approach to Preventing Performance Regressions in ... (HostedbyConfluent)
"Regular performance testing is one of the pillars of Kafka Streams’ reliability and efficiency. Beyond ensuring dependable releases, regular performance testing supports engineers in new feature development with the ability to easily test the performance impact of their features, compare different approaches, etc.
In this session, Alex and John share their experience from developing, using, and maintaining a performance testing framework for Kafka Streams that has prevented multiple performance regressions over the last 5 years. They cover guiding principles and architecture, how to ensure statistical significance and stability of results, and how to automate regression detection for actionable notifications.
This talk sheds light on how Apache Kafka is able to foster a vibrant open-source community while maintaining a high performance bar across many years and releases. It also empowers performance-minded engineers to avoid common pitfalls and bring high-quality performance testing to their own systems."
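One simple form of the automated regression detection mentioned above can be sketched as follows (a crude stand-in for the statistical testing the speakers describe; the threshold is illustrative):

```python
import statistics

def is_regression(baseline_ms, candidate_ms, threshold_stdevs=3.0):
    """Flag a regression if the candidate's mean latency exceeds the
    baseline mean by more than `threshold_stdevs` baseline standard
    deviations. A crude stand-in for a proper significance test."""
    mean = statistics.mean(baseline_ms)
    stdev = statistics.stdev(baseline_ms)
    return statistics.mean(candidate_ms) > mean + threshold_stdevs * stdev
```

A real setup would also guard against noisy environments (repeated runs, warm-up discarding, variance checks) before turning such a flag into an actionable notification.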
Have you ever wondered how to speed up your code in Python? This presentation will show you how to start. I will begin with a guide on how to locate performance bottlenecks and then give you some tips on how to speed up your code. I would also like to discuss how to avoid premature optimization, as it may be ‘the root of all evil’ (at least according to D. Knuth).
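As a starting point for locating bottlenecks, Python's built-in cProfile can be driven programmatically (the function names here are illustrative, not from the talk):

```python
import cProfile
import io
import pstats

def slow_concat(n):
    # Quadratic string building -- a typical bottleneck.
    s = ""
    for i in range(n):
        s += str(i)
    return s

def profile(fn, *args):
    """Run `fn` under cProfile and return the stats report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    fn(*args)
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
    return out.getvalue()
```

Inspecting the report shows which functions dominate cumulative time, which is where optimization effort should go first.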
Provenance for Data Munging Environments (Paul Groth)
Data munging is a crucial task across domains ranging from drug discovery and policy studies to data science. Indeed, it has been reported that data munging accounts for 60% of the time spent in data analysis. Because data munging involves a wide variety of tasks using data from multiple sources, it often becomes difficult to understand how a cleaned dataset was actually produced (i.e., its provenance). In this talk, I discuss our recent work on tracking data provenance within desktop systems, which addresses the problems of efficient and fine-grained capture. I also describe our work on scalable provenance tracking within a triple store/graph database that supports messy web data. Finally, I briefly touch on whether we will move from ad hoc data munging approaches to more declarative knowledge representation languages such as Probabilistic Soft Logic.
Presented at Information Sciences Institute - August 13, 2015
Presentation given on Monday 10 September at the ROOT Users' Workshop 2018 in Sarajevo. Progress update on the Automated Parallel Computation of Collaborative Statistical Models project, a collaboration between the Netherlands eScience Center and Nikhef.
We present an update on our recent efforts to further parallelize RooFit. We have performed extensive benchmarks and identified at least three bottlenecks that will benefit from parallelization. To tackle these and possible future bottlenecks, we designed a parallelization layer that allows us to parallelize existing classes with minimal effort, but with high performance and retaining as much of the existing class's interface as possible. The high-level parallelization model is a task-stealing approach. The implementation is currently based on the bi-directional memory mapped pipe (BidirMMapPipe), but could in the future be replaced by other modes of communication between processes.
VL/HCC 2014 - A Longitudinal Study of Programmers' Backtracking
1. A Longitudinal Study of Programmers' Backtracking
YoungSeok Yoon (youngseok@cs.cmu.edu), Institute for Software Research, Carnegie Mellon University
Brad Myers (bam@cs.cmu.edu), Human-Computer Interaction Institute, Carnegie Mellon University
3. What is Backtracking?
• Reverting code fragments to an earlier state
• Examples
– Reverting a parameter to a previously used value
– Removing debugging statements after fixing a bug
– Restoring some deleted code
– …
VL/HCC 2014 3
4. Previous Studies of Backtracking
• Two qualitative studies of backtracking [Yoon+, CHASE’12]
1. Preliminary lab study (12 programmers)
2. Online survey (48 respondents)
5. Previous Studies of Backtracking
• Observation
– Programmers face challenges when backtracking:
• locating the right code to be backtracked
• restoring some deleted code correctly
• reverting inter-related code fragments together
– Programmers backtrack relatively often (75% answered at least “sometimes”)
6. Limitations of the Previous Studies
• Lab study tasks required participants to backtrack
• Survey results may not correctly reflect reality (e.g., programmers might backtrack unconsciously)
• The analyses were mostly qualitative
8. Longitudinal Study of Backtracking
• Two main goals
– Obtain backtracking statistics in order to quantify the need for backtracking tools
– Identify backtracking situations that are not very well supported by existing programming tools
9. Data Collection – Fluorite Logger
http://www.cs.cmu.edu/~fluorite/
• Eclipse logger for fine-grained code editing data [Yoon+, PLATEAU’11]
• Information collected:
– Initial snapshot of each source file
– All edit operations (insert, delete, or replace)
– Timestamps, executed editor commands, etc.
• Distributed to programmers since April 2012
10. Study Participants

Group                                            Participants   Coding Time (hours) [min / avg / max / sum]
The first author (myself)                        1              294 / 294 / 294 / 294
Graduate students @ CMU                          13             3 / 40 / 216 / 520
Research programmers / system scientists @ CMU   5              6 / 118 / 446 / 588
Graduate students @ UPitt                        2              6 / 29 / 51 / 57
Total                                            21 people      1,460 hours
11. Analysis Process
• The data was too big for manual inspection: 1,345,241 coding events in the logs
• Key idea of the automated analysis:
– Keep the evolution history of individual AST nodes of interest throughout the lifetime of the nodes
– Detect backtracking instances within each node
12. Analysis Process Illustrated

[Example Source Code Being Processed]
package example;
public class Example {
    public void printRectangleInfo() {
        Rectangle rect = getEnclosingRect();       // S1
        int value = rect.getHeight();              // S2
        System.out.println("Value:" + value);      // S3
    }
    public Rectangle getEnclosingRect() {
        // return some rectangle here
        // actual code omitted
        // ...
    }
}

[Memory of the Analyzer]
Change history of S1:
[v1] Rectangle rect = getEnclosingRect();
Change history of S2:
[v1] int value = rect.getHeight();
[v2] int value = rect.getWidth();
[v3] int value = rect.getSize();
[v4] int value = rect.getHeight();   (Backtracking detected!)
Change history of S3:
[v1] System.out.println(value);
[v2] System.out.println("Value:" + value);
13. Backtracking Instance

time →  v1: A (getHeight();)  v2: B (getWidth();)  v3: C (getSize();)  v4: A (getHeight();)  v5: B (getWidth();)  v6: A (getHeight();)

Three backtracking instances:
• v1..v4
• v2..v5
• v4..v6
NOTE: v1..v6 is NOT a backtracking instance
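The definition illustrated on this slide can be expressed as a short function (an illustrative reconstruction, not the authors' analyzer code): a backtracking instance is a pair of versions where the later one restores the earlier one, and no intermediate version already equals it.

```python
def backtracking_instances(versions):
    """Find pairs (i, j), 1-indexed, where version j restores version i
    and no intermediate version already equals version i."""
    instances = []
    n = len(versions)
    for i in range(n):
        for j in range(i + 1, n):
            if versions[j] == versions[i] and versions[i] not in versions[i + 1:j]:
                instances.append((i + 1, j + 1))
    return instances
```

On the slide's A B C A B A sequence this yields exactly v1..v4, v2..v5, and v4..v6, and excludes v1..v6 because v4 already restores v1.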
14. Research Questions
1. How frequently do programmers backtrack in reality?
2. How large are the backtrackings?
3. How exactly do programmers perform backtracking? Are they backtracking manually?
4. Is there evidence of exploratory programming?
5. Are there backtrackings performed across multiple editing sessions?
6. Are there selective backtrackings, which cannot be performed by the undo command?
7. Do programmers backtrack to the same code repeatedly?
15. 1. Frequency of Backtracking
“How frequently do programmers backtrack in reality?”
• A total of 15,095 backtracking instances detected
• 10.3 instances/hour on average
[Bar chart: backtracking instances per hour per participant (P0-P20); min 3.8/h, max 28.4/h, average 10.3/h]
The rate varied across participants (min = 3.8/h, max = 28.4/h), but all of them backtracked frequently.
16. 2. Size of Backtracking
“How large are the backtrackings?”
• How did we define the size of a backtracking?
– Measured the edit distance (Levenshtein distance) between the original version and the other versions
– Took the maximum value as the size of the backtracking instance
[Diagram: versions v1 (A) through v6 (A) over time; forward changes lead away from the original version to the farthest version (max edit distance), backward changes return to it]
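The size definition above can be sketched in Python (an illustrative reconstruction, not the authors' code): compute the Levenshtein distance from the original version to every version in the instance and take the maximum.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def backtracking_size(versions):
    """Size = maximum edit distance from the original (first) version."""
    original = versions[0]
    return max(levenshtein(original, v) for v in versions)
```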
17. 2. Size of Backtracking
“How large are the backtrackings?”
[Histogram, shown on slides 17-23: number of backtracking instances by backtracking size (no. of characters)]
Size (chars):   1      2-9    10-49   50-99   100-499   500-999   ≥1000
Instances:      1304   3752   5269    2026    2259      265       220
18. 2. Size of Backtracking (slides 18-22 annotate the same histogram with examples at increasing sizes)
• Method / variable names
• String literals
• Number literals
19. 2. Size of Backtracking
• Simple parameter changes
• Reverting renaming changes on methods or variables
20. 2. Size of Backtracking
• Single statement changes
• Surrounding existing code (e.g., try-catch) then reverting
21. 2. Size of Backtracking
• Adding, removing, or modifying multiple statements and then reverting them altogether
22. 2. Size of Backtracking
• Significant algorithmic changes
• Adding / removing / modifying multiple methods and then reverting
23. 2. Size of Backtracking
Programmers backtrack at varying granularities, from simple name changes to significant algorithmic changes.
24. 3. Backtracking Tactics
“How exactly do programmers perform backtracking?”
How were the backtrackings performed?
[Pie chart] Manually: 38%; Using Existing Tools: 49%; Others: 13%
VL/HCC 2014 24
25. 3. Backtracking Tactics
“How exactly do programmers perform backtracking?”
Using Existing Tools (49%) breaks down as:
• Undo (37%)
• Paste (6%)
• Redo (3%)
• Content Assist (2%)
• Toggle Comment (1%)
VL/HCC 2014 25
26. 3. Backtracking Tactics
“How exactly do programmers perform backtracking?”
Others (13%) breaks down as:
• Unidentified (9%)
• Multiple (4%)
VL/HCC 2014 26
27. 3. Backtracking Tactics
“How exactly do programmers perform backtracking?”
Manually (38%) breaks down as:
• Manual Deletion (25%)
• Manual Typing (13%)
VL/HCC 2014 27
38% of the backtracking instances were NOT supported by existing tools, indicating programmers need better backtracking tools.
28. 4. Cross-Run Backtracking
“Is there evidence of exploratory programming?”
• Make some changes → run the application → revert the code back to the way it was before
• 20.4% of all instances were cross-run instances on average.
VL/HCC 2014 28
[Bar chart: Cross-Run Backtracking Percentage for each participant P0–P20 (axis 0–50%). Average: 20.4%]
This provides support that programmers do this kind of exploratory programming.
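The detection behind this number can be sketched as follows: flag an edit whose text exactly restores an earlier snapshot of the same code region, with at least one program run in between. The event tuples and the region model below are our illustrative assumptions, not the FLUORITE log format:

```python
def find_cross_run_backtracking(events):
    """events: sequence of ("run",) or ("edit", region_id, new_text).
    Returns the region_ids that were backtracked across a run."""
    history = {}      # region_id -> [(text, runs_seen_when_recorded)]
    runs = 0
    cross_run = set()
    for ev in events:
        if ev[0] == "run":
            runs += 1
            continue
        _, region, text = ev
        past = history.setdefault(region, [])
        # Backtracking: the new text equals an earlier snapshot (excluding
        # the most recent one); it is cross-run if a run happened since.
        if any(text == old and runs > seen for old, seen in past[:-1]):
            cross_run.add(region)
        past.append((text, runs))
    return cross_run

events = [("edit", "f", "a - b"),   # original text of region f
          ("edit", "f", "a + b"),   # exploratory change
          ("run",),                 # try it out
          ("edit", "f", "a - b")]   # revert: cross-run backtracking
print(find_cross_run_backtracking(events))   # {'f'}
```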
29. 5. Cross-Session Backtracking
“Are there backtrackings performed across multiple editing sessions?”
Cumulative percentage of all backtracking instances, by editing session distance:
Same session: 96.7%   ≤1: 98.2%   ≤2: 98.8%   ≤3: 99.0%   ≤4: 99.2%   ≤5: 99.3%
VL/HCC 2014 29
A backtracking tool would work for 97% of the cases with only the history within the same editing session.
30. 6. Selective Backtracking
“Are there backtrackings that could not have been done by regular undo?”
• Selective backtracking: there are edits in the middle of a backtracking that change other parts of the same file and are not backtracked together.
VL/HCC 2014 30
[Bar chart: Selective Backtracking Percentage for each participant P0–P20 (axis 0–20%). Average: 9.5%]
31. 6. Selective Backtracking
“Are there backtrackings that could not have been done by regular undo?”
VL/HCC 2014 31
On average, 9.5% of all backtracking instances were selective, supporting that programmers need better selective backtracking tools.
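Why regular undo cannot handle these cases can be shown with a toy model: a document as named regions and an edit log, where we revert one earlier edit while keeping a later one. This is only an illustration of the conflict-free case, not Azurite's actual algorithm (which must also handle overlapping and dependent edits); all names here are hypothetical:

```python
def selective_undo(doc, log, target):
    """doc: {region: current_text}; log: list of (region, old_text, new_text).
    Undo only the edit at index `target`, provided no later edit touched
    the same region (the simple, conflict-free case)."""
    region, old, _new = log[target]
    for later_region, _, _ in log[target + 1:]:
        if later_region == region:
            raise ValueError("conflict: a later edit touched the same region")
    undone = dict(doc)
    undone[region] = old
    return undone

doc = {"foo": "return a + b;", "bar": "x = 1;"}   # state after both edits
log = [("foo", "return a - b;", "return a + b;"),  # edit 0
       ("bar", "x = 0;", "x = 1;")]                # edit 1
# Linear undo would have to revert edit 1 first; selective undo skips it:
print(selective_undo(doc, log, 0))
# {'foo': 'return a - b;', 'bar': 'x = 1;'}  -- edit 1 is kept
```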
32. 7. Repeat Count
“Do programmers backtrack to the same code repeatedly?”
Percentage of backtracked nodes, by repeat count:
1: 85.0%   2: 11.1%   3: 2.7%   4: 0.7%   ≥5: 0.6%
VL/HCC 2014 32
Most of the time (85%), programmers backtrack once and never return to the same state after diverging from it.
34. Limitations of the Analysis
• Only exact and successful backtracking instances were detected
• Only for Java / Eclipse
• Could not determine the semantic relationships among the backtracking instances
VL/HCC 2014 34
35. Main Takeaways
• Programmers backtrack quite frequently (10.3/hr)
• 38% of the backtrackings are done purely manually
• 9.5% of the backtrackings are selective, meaning that they are not supported by conventional undo
• Programmers would benefit from better backtracking tools!
VL/HCC 2014 35
36. Azurite – Selective Undo Tool
http://www.cs.cmu.edu/~azurite/
• A selective undo plug-in for the Eclipse IDE
  – can handle the 9.5% of selective backtrackings
• Presented at VL/HCC
  – Initial user interfaces of the tool: Yoon, Myers, & Koo, “Visualization of Fine-Grained Code Change History”, Full Paper at VL/HCC ’13
  – Tool demonstration (yesterday): Yoon & Myers, “A Demonstration of Azurite: Backtracking Tool for Programmers”, Showpiece at VL/HCC ’14
VL/HCC 2014 36
[Image source: cobalt, flickr.com, CC-BY-SA-2.0]
37. Thank You!
• FLUORITE: a logging plug-in for Eclipse
  (Full of Low-level User Operations Recorded In The Editor)
  available at: http://www.cs.cmu.edu/~fluorite/
• AZURITE: a selective undo plug-in for Eclipse
  (Adding Zest to Undoing and Restoring Improves Textual Exploration)
  available at: http://www.cs.cmu.edu/~azurite/
• Thanks for funding from:
VL/HCC 2014 37