Isolating Failure Causes through Test Case Generation

Manual debugging is driven by experiments: test runs that narrow down failure causes by systematically confirming or excluding individual factors. The BUGEX approach leverages test case generation to systematically isolate such causes from a single failing test run: causes such as properties of execution states or branches taken that correlate with the failure. Identifying these causes allows deriving conclusions such as: "The failure occurs whenever daylight saving time starts at midnight local time." In our evaluation, a prototype of BUGEX precisely pinpointed important failure-explaining facts for six out of seven real-life bugs.

  • Hello and welcome to my talk on how test case generation isolates failure causes. My name is Jeremias Rößler, and this work was conducted together with Gordon Fraser, Andreas Zeller, and Alex Orso. This is about automated debugging, so let's just start off with an example.
  • Joda Time is a date and time library and a drop-in replacement for the default date and time API that ships with Java. It contained a bug that we named the Brazilian Date Bug, since it only manifests on a particular date in the Brazilian time zone. What you see here is the test that reproduces the failure: you create the specific date and time zone and then transform the date into a time interval that represents the 24 hours of that date in the Brazilian time zone. And if you execute this code: boom, your program crashes. Now, as a developer, what do you do?
  • Well, you know there is this field called statistical debugging that is concerned with providing the developer with a defect location. The first ones in the field were Jones, Harrold, and Stasko with their Tarantula tool.
  • And Liblit and colleagues improved this by focusing on predicate values.
  • But since then, much has been done, and so our implementation is based on the work of Rui Abreu and colleagues.
  • And this is how statistical debugging works in a nutshell. You take a number of passing and failing executions and correlate them with, for instance, statements in the code to find the faulty ones. So what you do is count how many times a statement was executed in a run that resulted in a success and how many times in a run that resulted in a failure. The intuition is that the more often a statement was executed during failing executions, and the less often during passing ones, the more probable it is that the statement contains the defect that causes the failure. This has been shown to work well on a benchmark with thousands of test cases.
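The counting scheme just described can be sketched as a small spectrum-based scoring routine. A minimal sketch, assuming the Ochiai metric in the style of Abreu et al.; the statement names and coverage data below are hypothetical.

```python
import math

def ochiai(failed_cov, passed_cov, total_failed):
    """Suspiciousness per statement: covered often by failing runs and
    rarely by passing runs -> high score."""
    scores = {}
    for stmt in set(failed_cov) | set(passed_cov):
        ef = failed_cov.get(stmt, 0)   # failing runs covering stmt
        ep = passed_cov.get(stmt, 0)   # passing runs covering stmt
        denom = math.sqrt(total_failed * (ef + ep))
        scores[stmt] = ef / denom if denom else 0.0
    return scores

# Hypothetical coverage from 1 failing and 3 passing runs
failed = {"s1": 1, "s2": 1}
passed = {"s1": 3, "s3": 2}
scores = ochiai(failed, passed, total_failed=1)
# "s2" is covered only by the failing run, so it scores highest
```

The ranked list a developer sees is just these scores sorted in descending order.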
  • So you crank out statistical debugging and apply it to the problem. And this is what you get: a long list of statements with their corresponding probabilities of containing the defect. In practice, this means that a developer would manually have to sift through that list to try to spot the defect.
  • So let's actually do the developer's work and inspect some of the results. This is some static initialization code. To me this doesn't look very suspicious, but then there is also no explanation of what would make that code suspicious. So, hmm, I don't know. Maybe let's look at the second result.
  • Now this looks like a place where the fields are assembled. Probably also some kind of initialization. Again, this doesn't look very suspicious. But again, I don't really know. Maybe let's have a look at the next result.
  • This creates an instance of an UnsupportedOperationField. This sounds as if it could have something to do with the failure. But when I look at the code, again this is the initialization of a cache. Hmm, without an explanation, this also doesn't look very suspicious, but who knows?
  • And in this fashion you would continue ...
  • ... to go through the code ...
  • ... until you lose interest.
  • And this is what the probability distribution looks like on a pie chart. As you can see, there is no clearly probable location for the defect, and the results quickly vanish into the long tail.
  • This is an instance of a problem that has already been described by Parnin and Orso.
  • So in summary, I think that statistical debugging actually is a very good idea. But in its current flavor it comes with two issues: it produces a long list of results with weak correlations, and it provides only the location, without context or explanation of why that location would be correlated with the failure. And this is the contribution of our work: we remedy both issues. We strengthen the correlations, reducing the number of correlated locations in the process. And we provide not only a location but an explanation of the defect to the developer. Now let's focus on the first point for a moment: strengthening the correlations.
  • How do we do this? Well, the underlying problem simply is that the problematic code is not executed often enough to create a strong correlation. So what we do is simply add more executions. And this indeed makes the correlation stronger.
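The effect of adding executions can be illustrated with a toy suspiciousness ratio (a deliberate simplification, not the exact metric our tool uses): each additional generated execution of the branch that also fails sharpens an initially weak correlation.

```python
def suspiciousness(ef, ep):
    """Fraction of executions covering the branch that failed."""
    return ef / (ef + ep) if (ef + ep) else 0.0

# One failing and one passing execution cover the branch: weak signal
ef, ep = 1, 1
before = suspiciousness(ef, ep)   # 0.5, maximally ambiguous

# Eight generated executions of the same branch, all of which fail
for _ in range(8):
    ef += 1
after = suspiciousness(ef, ep)    # 0.9, much stronger correlation
assert after > before
```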
  • But we can do even better than this: we implement a feedback loop that greatly increases effectiveness AND efficiency. So how does this feedback loop work?
  • Considering the test case I showed you earlier...
  • ... that gives you this probability distribution. Now let's see what we can do about it. Consider some of the branches in the program. Taking one of these branches, some of the executions that took it failed and some passed. But there is no clear picture, which is why we don't get a consistent result.
  • So now let's focus on one of these branches. We try to come up with an additional execution of this branch, and we see that it also fails, so this ...
  • ... increases the correlation with the failure. And getting yet another execution that fails ...
  • ... further increases that correlation. Now let's have a look at another branch.
  • If we generate an additional execution for this one, it passes.
  • This lessens the correlation with the failure and further increases the correlation of the other branch. And getting yet another passing execution ...
  • ... further decreases its probability and increases it for the other branch.
  • So we go on in this fashion until ...
  • ... eventually we end up with a probability distribution like this. As you can see, we now get a single branch that is very highly correlated with the defect. And by this I mean VERY highly correlated. This actually is the result that our tool returns to the developer.
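The narrowing process above can be sketched as a loop that always targets the branch whose verdict is currently most ambiguous and asks the generator for one more execution of it. This is a minimal sketch: `run_branch` stands in for the test case generator plus oracle, and the branch names and the ambiguity measure are illustrative, not our tool's internals.

```python
def ambiguity(ef, ep):
    """Highest (1.0) when failures and passes are balanced,
    zero when all executions of the branch agree."""
    total = ef + ep
    if total == 0:
        return 1.0
    p = ef / total
    return 4 * p * (1 - p)   # peaks at p = 0.5

def feedback_loop(branches, run_branch, rounds=10):
    """Repeatedly generate an execution of the most ambiguous branch
    and record whether that execution failed."""
    counts = {b: [1, 1] for b in branches}   # seed: 1 fail, 1 pass each
    for _ in range(rounds):
        target = max(branches, key=lambda b: ambiguity(*counts[b]))
        if run_branch(target):               # True = generated run failed
            counts[target][0] += 1
        else:
            counts[target][1] += 1
    return counts

# Hypothetical program: taking "buggy" always fails, "benign" always passes
counts = feedback_loop(["buggy", "benign"], lambda b: b == "buggy")
```

After a few rounds, "buggy" accumulates only failing executions and "benign" only passing ones, so the correlations separate cleanly instead of staying muddled.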
  • So now let's have a look at the result that we get: the single branch that is returned. Going to that part of the code, we find that it compares two offsets. And the comment above the code explains what is going on: because the offsets differ, we must be near a DST boundary. And this already explains the failure: the specific date that we create is a daylight saving cutover day, and the cutover time is midnight. So indeed we are near a DST boundary. The underlying problem is that, since we are near a DST boundary, the time interval that starts at midnight starts at a nonexistent time, because the day does not start at midnight, which in turn triggers an internal consistency check.
  • So I said we would remedy both issues. But the explanation that we got was just coincidental, because we were lucky to find some comments in the code, right?
  • So how else can we provide an explanation?
  • Well, as it turns out, statistical debugging is a very general approach that works not only with statements or branches, but with ANY runtime features.
  • So if you shift the focus from trying to locate the defect to trying to explain it, we could for instance try to correlate state predicates. Or if we have a concurrent program and the bug is concurrency driven, then we could correlate the thread schedule. Or if the failure is data driven, we could use definition-usage pairs. In fact, you could use ANY runtime feature that you think will help you understand what is going on. So now for an example, let's have a look at state predicates, because this is what we happened to also implement.
  • So first: what is a state predicate? Well, a state predicate encodes the features of an object. So for every object that we find in the state, we create a binary predicate pairing every attribute and inspector of this object with every other attribute, inspector, and constant. For instance, what you might get is something like: the shape's area is greater than or equal to zero. Or the width of the square equals the height of the square.
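A minimal sketch of such predicate extraction, restricted to numeric attributes; the `Square` class, the operator set, and the constant pool are illustrative choices, not our implementation's.

```python
import operator

OPS = {"==": operator.eq, "<=": operator.le, ">=": operator.ge}

def state_predicates(obj, constants=(0,)):
    """Enumerate binary predicates over an object's numeric attributes,
    pairing each attribute with every other attribute and each constant."""
    attrs = {k: v for k, v in vars(obj).items()
             if isinstance(v, (int, float))}
    preds = {}
    names = list(attrs)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            for sym, op in OPS.items():
                preds[f"{a} {sym} {b}"] = op(attrs[a], attrs[b])
        for c in constants:
            for sym, op in OPS.items():
                preds[f"{a} {sym} {c}"] = op(attrs[a], c)
    return preds

class Square:
    def __init__(self, w, h):
        self.width, self.height = w, h

preds = state_predicates(Square(4, 4))
# e.g. "width == height" holds, "width <= 0" does not
```

Each predicate's truth value per execution is then correlated with pass/fail exactly as statement coverage was.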
  • And to show you a practical result, let's consider a bug that appeared in the Apache Commons Codec library. These two lines of code trigger a failure: you create a small byte array and simply ask whether this array is valid Base64. And executing them, the program crashes.
  • Now what does BUGEX return as a result for predicates? Well, first it says that the input array always has length 3. This actually is an artifact of a limitation in the underlying test case generation technique; in the meantime this should be fixed, but anyway. And the second result we get is that the program crashes whenever octet, which is the name of a method parameter, is smaller than or equal to zero.
  • And if we look at the code, we find that the reason why the program fails is that the input is used unchecked to look up its Base64 equivalent from an array. And of course, when that input is negative, the program is bound to fail.
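The mechanism can be sketched in Python. The real code is Java, where `byte` is signed, so byte values above 127 arrive as negative numbers; `DECODE_TABLE` and the function names here are illustrative stand-ins, not the library's actual identifiers, and the raised `IndexError` simulates Java's out-of-bounds exception (Python itself would silently wrap a negative index).

```python
DECODE_TABLE = [-1] * 128   # hypothetical stand-in for the Base64 lookup table

def to_signed_byte(b):
    """Reinterpret an unsigned byte value (0..255) as Java's signed byte."""
    return b - 256 if b > 127 else b

def lookup_unchecked(octet):
    # Buggy pattern: a negative signed byte reaches the array lookup
    if octet < 0:
        raise IndexError(f"array index out of bounds: {octet}")  # as Java would
    return DECODE_TABLE[octet]

def lookup_checked(octet):
    """A guard in the spirit of the predicate BUGEX reported
    (the failure correlates with octet <= 0): reject negatives first."""
    return 0 <= octet < len(DECODE_TABLE) and DECODE_TABLE[octet] != -1
```

So the reported predicate points a developer directly at the missing range check, rather than merely at a suspicious line.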
  • Actually, we tried our tool on 7 defects. There are the Brazilian Date bug and the Base64 Lookup bug that I already showed. Then there's the Sparse Iterator bug of the Apache Commons Math library, which is due to an inconsistency in the way that the state of "no more elements" is represented internally; all 8 branches returned relate to this inconsistency, and the two predicates that are returned pinpoint the problem. Then there is another bug from Apache Commons Codec, where a buffer is handled incorrectly; again, the single branch returned pinpoints the problem. Then we have another bug from Joda Time, where one of the 7 branches returned is the place where the bug was actually fixed, and the 9 predicates contain the very reason for the failure as given in the bug report. Then we have a small artificial example where the problem was pinpointed by the branch returned. And then there's the last example, also from the Joda Time library, where our approach fails. The reason is the oracle problem: in the test, a valid French date is created that is wrongly classified as invalid by the API. Our test generation technique happily generated lots and lots of inputs that failed with the same exception, but of course few of those were valid French dates, so we actually did not reproduce the problem when we made the program crash, which is why it didn't work.
    So if you look at the time needed, you might first think that this is far too much time and impractical in reality. But first I want to mention that in ALL cases, we were faster than plain statistical debugging.
  • Actually we tried our tool on 7 defects. There are the Brazilian Date bug and the Base64 Lookup bug that I already showed. Then there’s the Sparse Iterator bug of the Apache Commons Math library and it is due to an inconsistency in the way that the state of “no more elements” is represented internally. And all 8 branches returned relate to this inconsistency, and the two predicates that are returned pinpoint that problem. Then there is another bug from Apache Commons Codec, where a buffer is handled incorrectly. And again, the single branch returned pinpoints this problem. Then we have another bug from Joda Time, where one of the 7 branches that was returned is the place where the bug was actually fixed and the 9 predicates contain the very reason of the failure as given in the bug report. Than we have a small artificial example where the problem was pinpointed by the branch returned. And then there’s the last example, also from the Joda Time library, where our approach fails. The reason why it fails is due to the oracle problem: in the test, a valid french date is created that is wrongly classified as invalid by the API. And our test generation technique happily generated lots and lots of input that failed with the same exception. But of course few of those were valid french dates, so we actually did not reproduce the problem when we made the program crash, which is why it didn’t work.\nSo if you look at the time needed, you first think that this is way to much time and its unpractical for reality, right. So first I want to mention that in ALL cases, we were faster than plain statistical debugging.\n
  • Actually we tried our tool on 7 defects. There are the Brazilian Date bug and the Base64 Lookup bug that I already showed. Then there’s the Sparse Iterator bug of the Apache Commons Math library and it is due to an inconsistency in the way that the state of “no more elements” is represented internally. And all 8 branches returned relate to this inconsistency, and the two predicates that are returned pinpoint that problem. Then there is another bug from Apache Commons Codec, where a buffer is handled incorrectly. And again, the single branch returned pinpoints this problem. Then we have another bug from Joda Time, where one of the 7 branches that was returned is the place where the bug was actually fixed and the 9 predicates contain the very reason of the failure as given in the bug report. Than we have a small artificial example where the problem was pinpointed by the branch returned. And then there’s the last example, also from the Joda Time library, where our approach fails. The reason why it fails is due to the oracle problem: in the test, a valid french date is created that is wrongly classified as invalid by the API. And our test generation technique happily generated lots and lots of input that failed with the same exception. But of course few of those were valid french dates, so we actually did not reproduce the problem when we made the program crash, which is why it didn’t work.\nSo if you look at the time needed, you first think that this is way to much time and its unpractical for reality, right. So first I want to mention that in ALL cases, we were faster than plain statistical debugging.\n
  • However, it turns out that it is just our implementation that keeps running, to make really sure we did not concentrate on the wrong runtime features, and therefore keeps trying to generate further executions. For six out of seven cases, the final results for branches were already available after 20 seconds. So with a live preview of the results, the developer can practically start investigating the interesting branches right away. I also want to stress that this technique comes without any additional effort whatsoever: all you need is a failing test, and the approach can start. You could, for instance, set it up on a build server to run right after a test of the continuous build crashes. As long as computation time is cheaper than human time, this approach makes sense to use.\nA second interesting aspect is that the results are easy to review, because only a small number of facts are highly correlated. Even in cases where not all predicates were meaningful, reviewing only a handful of predicates is fast and easy, and far different from receiving a list of all predicates ordered by probability. For four defects, it even reports a single branch.\nAnd I want to stress that six out of seven times, these results pinpointed the problem and gave you the cause of the defect with no additional effort on the side of the developer.\n
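The core isolation step can be sketched as follows. This is my simplified reconstruction, not the actual BUGEX implementation: given branch-coverage sets from generated runs, keep only the branches taken in every failing run and in no passing run. (All names here are illustrative.)

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BranchIsolation {
    /** Branches covered by every failing run but by no passing run. */
    static Set<String> isolate(List<Set<String>> failing, List<Set<String>> passing) {
        Set<String> suspects = new HashSet<>(failing.get(0));
        for (Set<String> run : failing) suspects.retainAll(run);  // in every failing run
        for (Set<String> run : passing) suspects.removeAll(run);  // in no passing run
        return suspects;
    }

    public static void main(String[] args) {
        List<Set<String>> failing = List.of(Set.of("b1", "b2", "b3"), Set.of("b1", "b3"));
        List<Set<String>> passing = List.of(Set.of("b1"), Set.of("b1", "b4"));
        System.out.println(isolate(failing, passing)); // [b3]
    }
}
```

With enough generated passing runs that are similar to the failing one, this set shrinks to the handful of branches reported to the developer.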
  • So these are the current results. What next? For future work, we plan to implement the approach for more runtime features, to help the developer understand other kinds of defects.\nAlso, the main limitations of the approach stem from the underlying test case generation technique, so we want both to improve that technique and perhaps to incorporate other such techniques.\nAlso, as you saw, we applied the approach to only seven defects. These results are encouraging, but honestly this is only a start. To get much deeper insight, we need a much larger evaluation with many more examples, and we are currently working on that.\nLast but not least, since the goal is to help a developer understand the bug, we will ultimately perform some kind of user study, like the ones you saw in yesterday’s “empirical studies” session.\n
  • To sum up my talk: I introduced statistical debugging and showed you what it looks like when applied to a real-life example, which exposes its limitations. I also showed that statistical debugging is not inherently flawed, but rather makes some assumptions that are not met in reality. I then showed how our approach performs in comparison, and gave an intuition of how it works.\nThank you for listening; now I will gladly try to answer your questions.\n
  • Depending on the example, we generate some tens of thousands of test cases. This might sound like a lot. However, as this graph shows, in most cases the final result is already available after 12 minutes. For some cases we also observed something surprising: BUGEX was faster than executing the code instrumented for regular statistical analysis, or even than running the bare complete test suite.\n

Isolating Failure Causes through Test Case Generation: Presentation Transcript

  • Isolating Failure Causes through Test Case Generation. Jeremias Rößler • Gordon Fraser • Andreas Zeller • Alessandro Orso. Saarland University / Georgia Institute of Technology
  • Joda Time Brazilian Date
    public void testBrazil() {
        LocalDate date = new LocalDate(2009, 10, 18);
        DateTimeZone dtz = DateTimeZone.forID("America/Sao_Paulo");
        Interval interval = date.toInterval(dtz);
    }
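To see why this test crashes, it helps to look at the underlying daylight-saving-time gap. The following sketch uses the standard java.time API rather than Joda Time itself, and assumes the JDK’s bundled historical time zone data for Brazil; it shows that midnight on 2009-10-18 simply did not exist in São Paulo, because DST started at midnight and clocks jumped straight to 01:00.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.zone.ZoneOffsetTransition;

public class BrazilGap {
    public static void main(String[] args) {
        ZoneId saoPaulo = ZoneId.of("America/Sao_Paulo");
        // The start of the day the test builds an interval for:
        LocalDateTime midnight = LocalDateTime.of(2009, 10, 18, 0, 0);

        // The historical tz rules report a gap transition at exactly this local time.
        ZoneOffsetTransition t = saoPaulo.getRules().getTransition(midnight);
        System.out.println(t.isGap());           // true: this local time never occurred
        System.out.println(t.getOffsetBefore()); // -03:00 (standard time)
        System.out.println(t.getOffsetAfter());  // -02:00 (DST)

        // java.time resolves the nonexistent local time by shifting it forward
        // past the gap; the buggy Joda Time version threw an exception instead.
        ZonedDateTime resolved = midnight.atZone(saoPaulo);
        System.out.println(resolved);            // 2009-10-18T01:00-02:00[America/Sao_Paulo]
    }
}
```

This is exactly the “failure occurs whenever daylight savings time starts at midnight local time” fact that the approach aims to surface automatically.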
  • Statistical Debugging
  • Statistical Debugging Statement ~
  • Statement ~ Failure
  • Statement ~ Failure
    5.5% – org.joda.time.chrono.BasicGJChronology#<clinit>:58
    5.5% – org.joda.time.chrono.ISOChronology#assemble:169
    5.5% – org.joda.time.field.UnsupportedDurationField#getInstance:48
    5.5% – org.joda.time.DateTimeZone#<clinit>:123
    5.5% – org.joda.time.DateTimeZone#<clinit>:130
    5.5% – org.joda.time.field.ZeroIsMaxDateTimeField#<init>:46
    5.5% – org.joda.time.tz.CachedDateTimeZone#<clinit>:45
    5.5% – org.joda.time.chrono.BasicGJChronology#getMonthOfYear:94
    5.5% – org.joda.time.chrono.GregorianChronology#getInstance:117
    1.4% – org.joda.time.DateTimeZone#getDefaultProvider:425
    1.4% – org.joda.time.DateTimeZone#getDefaultProvider:437
    1.4% – org.joda.time.DateTimeZone#getDefaultProvider:446
    1.4% – org.joda.time.DateTimeZone#getDefaultNameProvider:509
    1.4% – org.joda.time.DateTimeZone#getDefaultNameProvider:521
    1.4% – org.joda.time.DateTimeZone#setNameProvider0:491
    1.4% – org.joda.time.DateTimeZone#setProvider0:392
  • Statement ~ Failure — 5.5% BasicGJChronology#<clinit>:58

    static {
        ...
        long minSum = 0;
        for (int i = 0; i < 11; i++) {
            long millis = MIN_DAYS_PER_MONTH_ARRAY[i] * MILLIS_PER_DAY;
            minSum += millis;
            ...
        }
    }
  • Statement ~ Failure — 5.5% ISOChronology#assemble:169

    protected void assemble(Fields fields) {
        if (getBase().getZone() == DateTimeZone.UTC) {
            // Use zero based century and year of century.
            fields.centuryOfEra = new DividedDateTimeField(
                ISOYearOfEraDateTimeField.INSTANCE,
                DateTimeFieldType.centuryOfEra(), 100);
            ...
        }
    }
  • Statement ~ Failure — 5.5% UnsupportedDurationField#getInstance:48

    public static UnsupportedDurationField getInstance(DurationFieldType type) {
        UnsupportedDurationField field;
        if (cCache == null) {
            cCache = new HashMap(7);
            field = null;
        } else {
            field = (UnsupportedDurationField) cCache.get(type);
        }
        ...
    }
  • Statement ~ Failure [bar chart: suspiciousness is spread thinly across the code — most statements score 0–1%, while even the top-ranked statements (BasicGJChronology#<clinit>:58, ISOChronology#assemble:169, UnsupportedDurationField#getInstance:48, DateTimeZone#<clinit>:123/130, ZeroIsMaxDateTimeField#<init>:46, CachedDateTimeZone#<clinit>:45, BasicGJChronology#getMonthOfYear:94, GregorianChronology#getInstance:117) reach only 6%]
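The scores on this slide come from correlating statement coverage with test outcome. As a hedged illustration of how such a ranking can be computed — a Tarantula-style score, not necessarily the exact metric BugEx uses, with class and method names made up for this sketch:

```java
import java.util.*;

// Rank statements by how strongly their execution correlates with failure:
// a statement covered mostly by failing runs gets a score near 1.0,
// one covered mostly by passing runs a score near 0.0.
public class Suspiciousness {
    // covered.get(i) = statements executed by run i; failed.get(i) = outcome of run i
    static Map<String, Double> rank(List<Set<String>> covered, List<Boolean> failed) {
        int totalFail = 0, totalPass = 0;
        for (boolean f : failed) { if (f) totalFail++; else totalPass++; }

        Map<String, int[]> counts = new HashMap<>(); // {coveredInFail, coveredInPass}
        for (int i = 0; i < covered.size(); i++) {
            for (String stmt : covered.get(i)) {
                int[] c = counts.computeIfAbsent(stmt, k -> new int[2]);
                c[failed.get(i) ? 0 : 1]++;
            }
        }

        Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, int[]> e : counts.entrySet()) {
            double failRatio = totalFail == 0 ? 0 : (double) e.getValue()[0] / totalFail;
            double passRatio = totalPass == 0 ? 0 : (double) e.getValue()[1] / totalPass;
            score.put(e.getKey(), failRatio / (failRatio + passRatio));
        }
        return score;
    }
}
```

With many statements covered by both passing and failing runs, most scores cluster in a narrow band — which is exactly the "many weak correlations" problem the talk identifies next.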
  • Statistical Debugging
  • Two Issues and two solutions
    1. Many weak correlations – strengthen correlations
    2. Only location – provide explanations
    BugEx
  • Test Case Generation — directed test case generation makes the Statement ~ Failure correlation stronger!
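The directed loop can be sketched as follows — an illustrative reduction only, with `Generator` and the fact names as hypothetical stand-ins, not BugEx's real API: generate runs that differ from the failing run in a single runtime fact, and discard facts whose absence does not change the outcome.

```java
import java.util.*;

// Keep only the facts (branches, state predicates, ...) that actually
// explain the failure: if the program still fails when a fact is absent,
// that fact cannot be a failure cause.
public class DirectedGeneration {
    interface Generator { boolean runWith(Set<String> facts); } // true = run fails

    static Set<String> isolate(Set<String> initialFacts, Generator gen, int rounds) {
        Set<String> candidates = new HashSet<>(initialFacts);
        for (int r = 0; r < rounds; r++) {
            for (Iterator<String> it = candidates.iterator(); it.hasNext(); ) {
                String fact = it.next();
                Set<String> without = new HashSet<>(candidates);
                without.remove(fact);
                // Generate a run without this fact; if it still fails,
                // the fact does not correlate with the failure.
                if (gen.runWith(without)) {
                    it.remove();
                }
            }
        }
        return candidates;
    }
}
```

Each generated run either confirms or excludes one candidate fact, so the correlation of the surviving facts grows with every round.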
  • Joda Time Brazilian Date

    public void testBrazil() {
        LocalDate date = new LocalDate(2009, 10, 18);
        DateTimeZone dtz = DateTimeZone.forID("America/Sao_Paulo");
        Interval interval = date.toInterval(dtz);
    }
  • Statistical Debugging [control-flow graph with branch outcomes (T/F): suspiciousness is diluted over many branches — most score 0–1%, the top branches only 6%]
  • Directed Generation [series of control-flow graphs: each newly generated test run sharpens the correlation — the top branch score rises from 6% to 11%, 16%, 17%, 19%, and finally 37%, while all other branches fall toward 0–1%]
  • Directed Generation [final graph: a single branch remains — 99.9% DateTimeZone#getOffsetFromLocal:864]
  • Statement ~ Failure
    99.9% – org.joda.time.DateTimeZone#getOffsetFromLocal:864
    0.01% – org.joda.time.chrono.BasicGJChronology#<clinit>:58
    0.01% – org.joda.time.chrono.ISOChronology#assemble:169
    0.01% – org.joda.time.field.UnsupportedDurationField#getInstance:48
    0.01% – org.joda.time.DateTimeZone#<clinit>:123
    0.01% – org.joda.time.DateTimeZone#<clinit>:130
    0.01% – org.joda.time.field.ZeroIsMaxDateTimeField#<init>:46
    0.01% – org.joda.time.tz.CachedDateTimeZone#<clinit>:45
    0.01% – org.joda.time.chrono.BasicGJChronology#getMonthOfYear:94
    5.5% – org.joda.time.chrono.GregorianChronology#getInstance:117
    0.001% – org.joda.time.DateTimeZone#getDefaultProvider:425
    0.001% – org.joda.time.DateTimeZone#getDefaultProvider:437
    0.001% – org.joda.time.DateTimeZone#getDefaultProvider:446
    0.001% – org.joda.time.DateTimeZone#getDefaultNameProvider:509
    0.001% – org.joda.time.DateTimeZone#getDefaultNameProvider:521
    0.001% – org.joda.time.DateTimeZone#setNameProvider0:491
    0.001% – org.joda.time.DateTimeZone#setProvider0:392

    // if the offsets differ, we must be
    // near a DST boundary – failure condition!
    if (offsetLocal != offsetAdjusted) {
        ...
    }
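The isolated branch matches a real calendar fact: in 2009, Brazil's daylight savings time started at midnight on October 18, so that local midnight simply does not exist in America/Sao_Paulo. A small sketch using the JDK's java.time API (not Joda-Time, where the bug lived) makes the gap visible:

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.zone.ZoneRules;

// Checks whether local midnight of a date exists in a time zone.
// An empty list of valid offsets means midnight falls into a DST gap —
// exactly the "near a DST boundary" situation the isolated branch guards.
public class DstGap {
    static boolean midnightExists(LocalDate date, ZoneId zone) {
        ZoneRules rules = zone.getRules();
        return !rules.getValidOffsets(date.atStartOfDay()).isEmpty();
    }
}
```

For 2009-10-18 in America/Sao_Paulo this returns false, while for the day before it returns true — the fact BugEx's explanation points at.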
  • Two Issues and two solutions
    1. Many weak correlations – strengthen correlations
    2. Only location – provide explanations
    BugEx
  • Directed Generation — Arbitrary runtime features:
    • Branches
    • State predicates
    • Thread schedules
    • Def-use pairs
    • and more…
  • State Predicates
    • Encode features of objects:
      attribute | inspector  <|>|≤|≥|=|≠  attribute | inspector | constant
    • e.g. “shape.area() ≥ 0” or “square.width = square.height”
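Such predicates can be evaluated generically by pairing two value extractors with a comparison operator. The sketch below is illustrative only — it is not BugEx's actual predicate machinery, and the example names (`width`/`height` as array slots) are stand-ins:

```java
import java.util.function.BiPredicate;
import java.util.function.Function;

// A state predicate relates two attribute/inspector values of an object
// via a comparison operator, as in "square.width = square.height".
public class StatePredicate<T> {
    private final Function<T, Double> left;
    private final Function<T, Double> right;
    private final BiPredicate<Double, Double> op;

    StatePredicate(Function<T, Double> left, BiPredicate<Double, Double> op,
                   Function<T, Double> right) {
        this.left = left;
        this.op = op;
        this.right = right;
    }

    // Evaluate the predicate on a concrete object state.
    boolean holds(T obj) {
        return op.test(left.apply(obj), right.apply(obj));
    }
}
```

A predicate like this is just another runtime fact: it either holds in a run or it does not, so it can be fed into the same directed-generation loop as branches.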
  • Base64 Lookup

    public void test() {
        byte[] byteArray = { -125, -10, 64 };
        Base64.isArrayByteBase64(byteArray);
    }

    arrayOctect.length() == 3
    octect <= 0
    base64Alphabet[octect]
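The reported facts pinpoint the root cause: Java bytes are signed, so an input byte like -125 becomes a negative index into the base64Alphabet lookup table. A minimal sketch of the problem and the guard the facts suggest — the table contents and method name here are stand-ins, not the real Commons Codec code:

```java
// Demonstrates why a negative byte crashes an array lookup, mirroring the
// facts "octect <= 0" and "base64Alphabet[octect]" from the slide.
public class Base64Lookup {
    static final boolean[] base64Alphabet = new boolean[255]; // dummy table

    static boolean isBase64(byte octect) {
        if (octect < 0) {
            return false; // without this guard: ArrayIndexOutOfBoundsException
        }
        return base64Alphabet[octect];
    }
}
```

Indexing the table directly with the byte -125, as the buggy code effectively did, throws ArrayIndexOutOfBoundsException.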
  • Quantitative Results

                               Branches          Predicates
                               time     facts    time      facts
    Brazilian Date bug         2,380 s      1    13,55 s      25
    Base64 Lookup bug             38 s      1      737 s       2
    Sparse Iterator bug          216 s      8   10,267 s       9
    Base64 Decoder bug           214 s      1    1,339 s      23
    Western Hemisphere bug     8,422 s      7   30,937 s       9
    Vending Machine bug           19 s      1       56 s       1
    Parse French Date bug      1,577 s     15       n/a      n/a
  • Qualitative Results
    • Correct results in six out of seven cases
    • No additional effort
    • Small number of branches reported
    • Directly leads to failure cause in six out of seven cases
  • Future Work
    • More runtime features
    • Improve test case generation
    • Large-scale quantitative evaluation
    • User study (like seen yesterday)
  • http://www.st.cs.uni-saarland.de/bugex/
  • Comparing Branches

                               BugEx             Statistical Debugging
                               time     facts    time       facts
    Brazilian Date bug         2,380 s      1    25,223 s      24
    Base64 Lookup bug             38 s      1       512 s      17
    Sparse Iterator bug          216 s      8     1,901 s      14
    Base64 Decoder bug           214 s      1       496 s      51
    Western Hemisphere bug     8,422 s      7    25,341 s      28
    Vending Machine bug           19 s      1        n/a      n/a
    Parse French Date bug      1,577 s     15    25,542 s      26
  • Performance [graph: number of branches reported over runtime in minutes]
  • Joda Time [pie charts — Brazilian Date Test: 5.2% / 94.8%; 3,497 Tests: 23.7% / 76.3%]
  • Commons Codec [pie charts — Base64 Lookup Test: 1.7% / 98.3%; 185 Tests: 14.2% / 85.8%]