Exploratory Testing: A Multiple Case StudyDocument Transcript
Exploratory Testing: A Multiple Case Study
Juha Itkonen and Kristian Rautiainen
Helsinki University of Technology, Software Business and Engineering Institute
P.O. BOX 9210, FIN-02015 Finland
Abstract suitable for certain situations which are not known due
to lack of research. Many benefits of ET have been
Exploratory testing (ET) – simultaneous learning, proposed, such as effectiveness, efficiency, the ability
test design, and test execution – is an applied practice to utilize testers’ creativity, and enabling rapid
in industry but lacks research. We present the current feedback [3, 4, 6, 7, 8, 11], but there is lack of
knowledge of ET based on existing literature and scientific evidence of these benefits. Furthermore, very
interviews with seven practitioners in three companies. few explicit shortcomings of ET have been mentioned
Our interview data shows that the main reasons for in the existing literature. This does not mean that there
using ET in the companies were the difficulties in would not be any. Considering the potential benefits of
designing test cases for complicated functionality and ET and its seemingly wide use, we think that more
the need for testing from the end user’s viewpoint. The research on the subject is merited.
perceived benefits of ET include the versatility of The purpose of this paper is to introduce
testing and the ability to quickly form an overall exploratory testing and study its use in three
picture of system quality. We found some support for companies in order to increase the understanding of its
the claimed high defect detection efficiency of ET. The applicability, benefits, and shortcomings. In this paper
biggest shortcoming of ET was managing test our intent is not to contrast ET and other testing
coverage. Further quantitative research on the approaches, but to describe the use of ET in industry.
efficiency and effectiveness of ET is needed. To help In addition, we bring forth several important research
focus ET efforts and help control test coverage, we issues that should be studied to better understand the
must study planning, controlling and tracking ET. applicability and restrictions of ET as well as to
develop new ways to better manage ET in order to
lessen its shortcomings.
1. Introduction The rest of the paper is structured in the following
way. In the next section we present the research
Software developers and testers have always been methods used. Then we describe related work about
doing exploratory testing. It is easy to understand this exploratory testing, followed by the results of the case
considering the short definition of exploratory testing: study. The paper is rounded up with discussion and
“Exploratory testing is simultaneous learning, test implications for further work.
design, and test execution.” . This means that ET is
testing without detailed pre-specified test cases, i.e., 2. Research methods
unscripted testing. ET is not a single testing technique
or strategy; it is rather an approach to testing where In this section we present the research methods used
test design is performed as part of test execution in this study. Our research questions were: 1) What is
instead of having a test design phase before execution. the current knowledge of exploratory testing in
The ET approach can be used for different types of literature? 2) How and why do companies use ET and
testing using various techniques or methods. Many with what results? For the first research question we
practitioners seem to promote ET as a valid approach performed a literature study. For the second research
to software testing and as a valuable part of an question we selected a descriptive case study approach
effective set of software quality assurance practices. .
ET is not a replacement for existing test-case based
approaches, but a complementary testing approach
0-7803-9508-5/05/$20.00 (c)2005 IEEE
Table 1. Summary of the case companies
Mercury Neptune Vulcan
Number of employees in 15 30 40
software product development
Type of product Professional Professional engineering Professional engineering design
application design application application
Type of users Professional control Professional engineers Professional engineers
Training requirements of the Requires some Requires comprehensive Requires comprehensive training
product training training
Number of customers < 10 > 100 > 1000
Number of users < 100 > 1000 > 1000
Applied ET in its current form Two months Six months Four years
Test approaches used Only ET for the ET and automated scripted ET for functional and smoke
studied product testing testing, test cases for system
testing, and automated model-based
Number of interviewees 1 4 2
Formal testing training of the None None None
We performed interviews and collected subjective studying and describing how and why the companies
evaluations of the benefits and shortcomings of ET and used ET.
quantitative defect and effort data in three companies.
In the following subsections we shortly describe the 2.2. Data collection and analysis
case companies and the data collection and analysis.
We performed seven semi-structured thematic
2.1. Case companies interviews lasting 40-70 minutes with persons that
were performing exploratory testing in the case
The three case companies – Mercury, Neptune, and companies. At Vulcan, where ET had been
Vulcan – were selected based on accessibility through purposefully used and improved for four years, we
our research project in which they participate. interviewed 2 persons. At Neptune, where a more
Therefore we knew that exploratory testing was systematic approach to ET had been introduced half a
applied in these companies. The real names of the year ago, we interviewed 3 persons from the company
companies have been replaced with pseudonyms. and 1 outsourced tester that was a professional user of
Table 1 summarizes the characteristics of the the system. The only interviewee at Mercury had used
companies. One of the companies is small with around ET only a couple of times before the interview.
10 employees working with software development. Each interviewee was interviewed separately using
The other two are bigger with around 30 and 40 the same set of themes and questions. The questions
employees in software development. were open ended and neutral, and the goal was to
Neptune and Vulcan develop software application record the honest opinions and experiences without
products for a large number of customers. Mercury has any leading of the interviewee. Two researchers were
a more restricted customer segment. The products of present at each interview. One researcher asked the
all companies are systems for professional users. Using questions and the other one made notes. The
two of the systems requires comprehensive training. interviews were also recorded. The notes made during
The products of Neptune and Vulcan have been on the the interviews were used as a basis for the analysis.
market for several years, while the product of Mercury The notes were complemented and clarified by
is new. listening to the recordings. We used MindManager1 to
Mercury used only ET for their product. This was
due to the immature development stage of the product.
Neptune and Vulcan also used other testing MindManager is a commercial software provided by the Mindjet
approaches, although ET had a significant role in their Corporation. It is used for visualizing and managing information in
the form of mind maps. For more information, see
testing efforts. In this paper we have focused on http://www.mindjet.com/
group the data and to create clusters that arose from the Jonathan Bach stresses focusing on finding defects
data. . ET is a testing approach that is optimized to
We also collected quantitative data of ET effort as finding defects and puts less emphasis on test
well as defect counts and types from the companies’ documentation. Finding defects is the primary purpose
defect and other tracking systems. We analyzed this of ET and documenting the results of the testing is
data by deriving descriptive metrics to get a picture of found more important than planning and scripting the
the results of using ET. We compared these to the test execution paths beforehand.
subjective evaluations of the interviewees for However, a certain degree of planning is needed for
triangulation of the results. ET. Copeland  suggests performing “chartered
exploratory testing” where the charter may define what
3. Related work to test, what documents are available, what tactics to
use, what defect types to look for, and what risks are
In this section we describe the existing knowledge involved. The charter defines a mission for testing and
regarding exploratory testing. First, we describe how works as a guideline.
ET is defined in different sources. Then we summarize The exploratory testing approach is recognized in
the claimed benefits and identified shortcomings of ET the Software Engineering Body of Knowledge
in literature. In the last subsection we describe the (SWEBOK): “Exploratory testing is defined as
Session-Based Test Management approach for simultaneous learning, test design, and test execution;
managing ET. Academic literature seems to lack that is, the tests are not defined in advance in an
research on exploratory testing. Therefore we must established test plan, but are dynamically designed,
rely largely on text books, practitioner reports and executed, and modified. The effectiveness of
electronic material available on the Internet. exploratory testing relies on the software engineer’s
knowledge, which can be derived from various
3.1. Definition of exploratory testing sources…” . SWEBOK does not describe ET in
more detail and does not provide any advice on how to
Exploratory testing is a loosely defined concept that apply ET or for which circumstances ET would be
was first introduced by Kaner et al. in the first edition suitable.
of their book “Testing Computer Software” in 1988. Craig and Jaskiel emphasize the fact that in
They describe ET as a means to keep testing software exploratory testing a tester can immediately expand
after executing the scripted tests and avoid investing testing into productive areas during testing . They
effort to carefully designing and documenting tests also state that in scripted testing a tester creates many
when the software is in an unstable state and could be tests that do not seem as useful during test execution as
redesigned soon. They also describe ET as a way of they seemed during test design, because they do not
learning the system while designing the systematic test reveal defects.
cases. However, they do not provide a precise From the above sources we can derive five
definition of what is included in ET, how it should be properties that describe when testing is exploratory
performed, and what it is not.  testing:
James Bach defines ET as “…simultaneous 1) Tests are not defined in advance as detailed test
learning, test design, and test execution” . Tinkham scripts or test cases. Instead, exploratory testing is
and Kaner give a slightly different definition: “Any exploration with a general mission without
testing to the extent that the tester actively controls the specific step-by-step instructions on how to
design of the tests as those tests are performed and accomplish the mission.
uses information gained while testing to design new 2) Exploratory testing is guided by the results of
and better tests” . According to Kaner, Bach, and previously performed tests and the gained
Pettichord exploring means “…purposeful wandering: knowledge from them. An exploratory tester uses
navigating through a space with a general mission, but any available information of the target of testing,
without a pre-scripted route. Exploration involves for example a requirements document, a user’s
continuous learning and experimenting.” . They manual, or even a marketing brochure.
also propose that a tester should continuously learn 3) The focus in exploratory testing is on finding
about the product, its market, its risks and its previous defects by exploration, instead of systematically
defects in order to continuously build new tests that are producing a comprehensive set of test cases for
more powerful than the older because they are based later use.
on the tester’s continuously increasing knowledge. 4) Exploratory testing is simultaneous learning of the
system under test, test design, and test execution.
5) The effectiveness of the testing relies on the A second benefit of exploratory testing is the
tester’s knowledge, skills, and experience. simultaneous learning. When testers are not following
pre-specified scripts, they are actively learning about
In most of the sources ET is seen as a useful and the system under test and gaining knowledge about the
effective approach to testing, but still, as a behavior and the failures in the system. This is claimed
complementary approach to structured and systematic to help testers come up with better and more powerful
test-case based techniques. tests as testing proceeds.
A third benefit is the ability to minimize preparation
3.2 Applicability of exploratory testing documentation before executing testing. This is an
advantage in a situation where the requirements and
James Bach describes contexts into which ET the design of the system change rapidly or at the early
would fit well. First, ET fits situations where rapid stage of product development when some parts of the
feedback or learning of the product is needed. Second, system are implemented, but the probability for major
ET fits situations where there is not enough time for changes is still high.
systematic testing approaches. Third, ET is a good A fourth benefit is the ability to perform
way to investigate the status of particular risks. exploratory testing without comprehensive
Fourth, ET can be used to provide more diversity to requirements or specification documentation, because
scripted tests. Fifth, regression testing based on defect exploratory testers can easily utilize all the experience
reports can be done by exploring. Sixth, ET fits well and knowledge of the product gained from various
into testing from an end-user viewpoint based on, e.g., other sources.
a user’s manual.  A fifth benefit is the rapid flow of feedback from
Copeland states that ET is valuable in situations testing to both developers and testers. This feedback
where choosing the next test case to run cannot be loop is especially fast, because exploratory testers can
determined in advance, but must be based on previous react quickly to changes to the product and provide test
tests and results. ET can be used to explore the size, results back to developers.
scope, and variations of a found defect to provide
better feedback to developers. ET is also useful when 3.4. Identified shortcomings of exploratory
test scripts become “tired”, i.e., they are not detecting testing
many defects anymore. 
Våga and Amland propose that ET should be We had a hard time finding explicit references to
planned as part of the testing approach in most of the shortcomings of ET in the existing literature. One
software development projects . They describe a identified shortcoming of exploratory testing is the
case where ET was successfully used to test a web- difficulty of tracking the progress of individual testers
based system in just two days, because that was all and the testing work as a whole. It is considered hard
available time for testing. to find out how the work proceeds, e.g., the feature
coverage of testing, because there is no planned low-
3.3. Claimed benefits of exploratory testing level structure that could be used for tracking the
progress. [3, 11]
Exploratory testing has been advocated as a useful Lee Copeland  points out that ET has no ability
testing approach based on several benefits that it can to prevent defects. Designing the test cases in scripted
provide for testing. In this section we summarize the testing can begin during the requirements gathering
benefits that are proposed in various sources [2, 7, 8]. and design phases and thus reveal defects early.
The most commonly claimed benefit of ET is its
ability to increase the effectiveness of testing in terms 3.5 Session-based test management
of number and importance of found defects. James
Bach also suggests that in some situations ET can be Exploratory testing should not be unplanned,
orders of magnitude more efficient than scripted unstructured, or careless testing without any strategy or
testing . He supports his claim with some anecdotes goals . Exploratory testing can be as disciplined as
from his personal experience. Exploratory testers can any other intellectual activity . It can be structured,
focus on the suspicious areas of the system based on managed and planned as long as the planning is not
the information of the actual behaviour of the system, extended to describe detailed tests on the level of test
instead of only relying on the specifications and design cases and execution steps.
documents when planning the tests.
Jonathan Bach has published an approach to ET that it was impossible to write test cases for all
called Session-Based Test Management (SBTM) . possible combinations. Therefore an exploratory
The session-based approach brings a clear structure to approach was regarded a natural choice. At Mercury,
loosely defined exploratory testing. It is an approach to the interviewee stated that it could have taken up to a
planning, managing, and controlling exploratory week to write a list of test cases and still some
testing in short (almost) fixed-length sessions. Each of important test cases might have been overlooked or
these sessions is planned in advance on a charter sheet. forgotten. Instead, the interviewee preferred using the
The charter is a high level plan for a test session. It same time to testing and giving feedback to
does not pre-specify the detailed test cases executed in development.
each session. A clearly defined report is produced and Another reason that all companies agreed upon was
metrics are collected during the session. The results are using ET as a way of testing the software from a user’s
debriefed afterwards between the tester and the test viewpoint. When performing ET, they tested larger
manager. combinations of functionality together from the
Experiences of using a similar approach have been viewpoint of performing the tasks of a real
presented by Lyndsay and van Eeden . The professional user of the system. Exploring real use
approach of Lyndsay and van Eeden also includes scenarios helped find peculiarities and usability
methods for controlling the scope of the testing, problems in the software. At Neptune, the user manual
assessing and tracking the coverage of the tests, and was used to structure exploratory testing from the
assessing risks and prioritizing the tests. The session- user’s viewpoint. This also enabled validating the
based approaches seem to provide tools for managing correctness and usefulness of the manual. Testing from
ET and alleviation to the planning as well as progress the user’s viewpoint was also reported challenging,
and coverage tracking problems of ET. since it requires very strong domain knowledge from
4. Results Mercury and Neptune mentioned ET as a natural
way of doing testing, since it emphasizes utilizing the
In this section we present the results from our case testers’ experience and creativity to find defects during
study. First we present the reported reasons for using test execution. At Vulcan, this was mentioned when
ET in the case companies. Then we present the talking about using ET to regression testing. When
different ways of performing ET in the companies. The new features have been developed or defects have
following subsections describe the perceived benefits been corrected, ET is useful for testing that nothing
and shortcomings of ET. In the last subsection we else has been broken. The interviewees at Vulcan
present the collected measurement data. commented that even if test cases are used as a basis
for testing, ET allows the tester to look at the tested
4.1. Reasons for using exploratory testing feature(s) as a whole. This makes it easier to spot
problems that might go unnoticed if the tester was only
The companies reported several reasons for using following a script. Also, the exploratory attitude helps
ET. In Table 2 we have summarized all reasons the tester to follow hunches and thus find defects, for
mentioned in more than one company. All companies example, in unexpected combinations of features. In
agreed that writing test cases for everything is difficult this way ET was by the interviewees regarded both as
and laborious. At Neptune and Vulcan, when using the an independent way of testing and as a complementary
system, a task could be performed in so many ways testing approach to using predefined test cases.
Table 2. Reported reasons for using ET in the case companies
Reasons for using ET Mercury Neptune Vulcan
The software can be used in so many ways or there are so many combinations between X X X
different features that writing detailed test cases for everything is difficult, laborious, and
It suits well to testing from a user’s viewpoint. X X X
It emphasizes utilizing the testers’ experience and creativity to find defects. X X X
It helps provide quick feedback on new features from testers to developers. X X X
It adapts well to situations, where the requirements and the tested features change often, X X
and the specifications are vague or incomplete.
It is a way of learning about the system, the results of which can be utilized in other X X
tasks, such as customer support and training.
All companies had found ET practical for giving system. However, none of the interviewees claimed
fast feedback to developers regarding newly developed they do this systematically.
features. When the developer(s) had completed a
feature, a tester would quickly explore the new feature 4.2.1. Session-based exploratory testing. At Neptune
and give feedback to the developer(s). The feedback and Mercury, a session-based exploratory testing
could range from reporting defects to pointing out approach was used. In this approach, testing was
usability problems or misconceptions regarding the organized in short (0,5-3 hours) test sessions during
customer requirements. This information could then be which the tester accomplished one planned testing task
used to steer the development, if necessary. (for example, “test the insert note functionality using
At Mercury and Neptune, the requirements and the different views of the system”) and tried to
developed features could change often during concentrate and focus on that task without any
development. Therefore it was considered a waste of interruptions or other disturbance. Mercury’s sessions
time to write detailed test cases during the early phases were shorter (59 minutes on average) than Neptune’s
of development. Also, the specifications were sessions (113 minutes on average). Most of the
deliberately left incomplete or even vague when interviewed persons found it beneficial to isolate the
development started. In these conditions, ET was testing time into focused sessions without other tasks
considered a logical approach to testing. or interruptions. The sessions were planned using short
At Neptune and Vulcan, ET was viewed as a way to descriptions that were called charters as in .
learn about the system under test. This was considered The charters described briefly the testing task, goals
valuable, because the information and experience of the test session, and the target of testing, i.e., the
could then be used to support the tester’s other tasks, product and the feature to be tested. At Mercury, the
since there were no full-time testers in any of the tester’s testing actions on a high level of abstraction
companies. For example, the information was used in were logged at the end of the test session charter
planning additions to the set of existing automated during the test session. At Mercury, the person who
tests. Also, some of the testers worked in customer performed the testing wrote the charter and at Neptune,
support and training. The information they got from the charters were written by the developer whose
doing ET helped them better understand the questions responsibility the tested functionality was. There was
received from customers and prepare the answers. The no systematic higher level planning or control of how
information and experience was also useful for the test sessions were allocated, and no tracking of
preparing training material. which parts of the system were covered by the test
sessions and how thoroughly.
4.2. Ways of utilizing exploratory testing The main results that were reported from the
sessions were found defects in the form of entries in
Based on our interviews we identified six different the defect tracking system. At Mercury, the
ways of utilizing exploratory testing in the three case interviewee recorded the found defects directly into the
companies: session-based ET, functional testing, defect tracking system during the session, but at
exploratory smoke testing, exploratory regression Neptune the interviewees wrote down the found
testing, subcontracted ET, and freestyle ET. These defects briefly by hand or into a text editor during the
were applied in different phases of the development session and reported the defects into the defect
life cycle and are explained in more detail in the tracking system only after the session. The
subsections below. Even if the companies had many interviewees felt that this way of working was better
different approaches to exploratory testing at the because it helped them keep their concentration on the
process level, the actual testing work of each work at hand.
individual tester did not seem to differ a lot. All
interviewees described their testing as an intuitive and 4.2.2. Functional testing of individual features.
ad-hoc process of trying to find defects or verify Vulcan used ET for testing individual features right
changes. The interviewees had different goals for the after the feature was implemented. This was performed
testing, but none of the interviewees could describe by individuals of the requirements management team
any intentional test strategies or techniques that they and focused on testing whether the implementation
used when exploring the software. When asked about corresponded to the requirements and the designer’s
trying out different combinations and finding actual ideas of the specified functionality or not. This
equivalence classes or boundary values, four of the enabled fast feedback to the developers in the early
seven interviewees told that they try to find out phase of the development life cycle.
combinations and boundaries to intentionally break the
4.2.3. Exploratory smoke testing. Vulcan used an beta release. For example, at customer services this
exploratory smoke-testing approach after the was a part of the everyday work.
implementation phase, when a new revision of the An interviewee at Vulcan also mentioned that they
software was released to testing, which could occur quite often use exploratory testing as part of systematic
once a week. Each of these releases was smoke tested system testing to explore functionality beyond the
by the service team. This exploratory testing took from documented test cases with the intent of finding more
half an hour to a day and was guided by a “heading- defects and defects that are not straightforward to find.
level” list of the areas to be tested. The tester went
through the list with the intent of identifying defects in 4.3. Perceived benefits of exploratory testing
or changes to the existing functionality and
formulating a quick overall picture of the general Some of the perceived benefits of ET were
quality of the release. In addition, the tester checked mentioned in Section 4.1 as part of the reasons for
every fix and enhancement that was implemented in using ET. In this section we present additional benefits
the release to ensure that the reported fixes actually the interviewees mentioned.
had been properly performed and worked as the The testing performed using ET is more versatile
service team member would expect from the end-user and goes deeper into the tested feature(s) according to
point of view. all interviewees. Five out of the seven interviewees
mentioned that they tend to test things that they might
4.2.4. Exploratory regression testing. Both Neptune not have included in a test plan or among test cases.
and Vulcan used exploratory testing to verify fixes and Examples of such tests include testing the
changes after implementing a single fix. This testing dependencies of new and existing features based on
was driven by announcements from developers that expertise and knowledge of the system. This would
some defect was fixed or enhancement implemented. normally not be included as test cases because they are
The tester held a short testing session to verify the fix, written based on the requirements for the new features.
typically without any planning or formal tracking or Another example of versatility is retesting a corrected
control. The result of this session was informally defect. Interviewees at both Neptune and Vulcan
communicated to the developer or, if it was a defect mentioned that they do not just retest in the same way
fix, the defect was just checked as closed into the as before, but explore for possible new defects at the
defect tracking system. same time. At Neptune, this was realized in some cases
This approach differs from the typical regression so that the person that found and reported the defect
testing described in textbooks. In these companies was not the one who tested the correction. In this way
regression testing was not performed exhaustively over the other tester, who explored the software from a
the whole system. Rather it concentrated on the different viewpoint, could use his experience to try to
changes and fixes made and, based on the tester’s find new and related defects.
experience, exploring possible new and related defects The effectiveness and efficiency of ET was
caused by the fixes. The main reasons mentioned for perceived high by the interviewees, although with
this kind of “limited” regression testing were lack of some reservations. At Vulcan, both interviewees
time or resources for complete regression testing of the concluded that ET helped them find important defects
system. in a short amount of time, but if a less experienced
person with less domain knowledge would do the
4.2.5. Subcontracted exploratory testing. As a testing, the results might not be so good. They also
special form of testing, Neptune used real users of the claimed that more defects were found in system testing
system as subcontracted testers. They hired two using ET than using test-case-based testing and they
experienced professional users of their system to test implied this could be because the test cases are
their upcoming release. This testing was organized by designed to verify that the system works and that the
features and the task of the testers was to explore each testers use ET with a more destructive attitude. At
feature of the software to accomplish real working Neptune, one of the interviewees thought that ET is
scenarios. efficient in the short term considering used hours and
the number of found defects. However, in the long run
4.2.6. Freestyle exploratory testing. At Vulcan, many it is difficult to determine the efficiency of ET, because
parts of the organization performed unmanaged it is hard to estimate the coverage of the tests and many
exploratory testing as part of their other duties. The things may go untested and thus unnoticed causing
common way of doing it was testing the latest alpha or problems in the future. Another interviewee at Neptune
thought ET was effective, but he also mentioned that
using ET to test features of a complex system is very Being able to utilize the testers experience and
time consuming. creativity was mentioned as a reason and benefit for
Getting an overall picture of the quality of the using ET in all the companies. However, this was also
system quickly is one aspect of efficiency that was mentioned as a shortcoming. In both Neptune and
mentioned by interviewees at Neptune and Vulcan. At Vulcan the interviewees agreed that relying on the
Neptune, this was considered important because the expertise of the testers made ET more prone to human
information and overall picture gained from ET was errors than systematic testing. At Neptune, one
used as a basis for prioritizing the work towards the interviewee commented that it was impossible to find
end of the project. At Vulcan, when the testers get a testers with enough experience to act as professional
new version of the system they can quickly determine users. Another comment from both Neptune and
the quality level based on their prior experience. They Vulcan was that all testers have different backgrounds
can also quickly verify that the visual look and feel of and experience and thus perform ET from different
the system is consistent with the historical look and viewpoints. Again, this was seen both as a strength and
feel, which is considered important since the system weakness, especially regarding the versatility of testing
has been in use for many years and existing customers this implied.
would probably not welcome inconsistent changes. The repeatability of defects was seen as a
shortcoming of ET at Neptune. This was attributed to
4.4. Perceived shortcomings of exploratory the complex system that permitted many ways of
testing performing tasks, and each task could require up to a
hundred or more steps. When a defect was spotted, it
Coverage in one form or another was the biggest could take hours to repeat it, so that it could be
shortcoming of ET mentioned by all the interviewees. properly reported in the defect database. However, the
At Mercury, the biggest challenge concerning macro recording capability of the system could be used
coverage was planning and selecting what to test with to alleviate the problems. Repeatability problems had
ET. The interviewee admitted that everything could also occurred at Vulcan, but they had mainly happened
not be tested with ET, because it is time consuming due to memory leaks and could probably not have been
and there are not enough testers. The factor of limited repeated even if a detailed script had been followed. At
time and testers was also mentioned at Vulcan, where Mercury, repeatability was not considered a problem,
the interviewees admitted that not everything could be because of the extensive session logs written during
tested. It was a question of prioritizing testing to the ET sessions.
potential weak spots in the system under test and trying
to allocate time of domain experts for testing. This, 4.5. Defect and effort data
combined with scarce documentation of the testing
itself, created another challenge, namely following up We collected quantitative defect and effort data
what had been tested and what had not. At Neptune, from the defect and other tracking systems of the case
coverage was considered a challenge from two companies. The collected data is summarized in Table
viewpoints: 1) when testing new features, some things 3. It helps in describing how the companies used ET.
most certainly were left untested, especially The available data has limitations which makes it hard
concerning resulting effects in the old features, and 2) to make conclusions based on it. At Vulcan, a session-
a tester can never guess all the ways in which the based testing approach was not used which means that
customer might use the system. there is no session-specific data available.
Table 3. Summary of defect and effort data
Neptune Mercury Vulcan
Total number of found defects 169 34 103 / release 31 / release
Total effort (hours) 36 4 160 / release NA
Number of sessions 17 4 NA NA
Average session length (minutes) 113 59 NA NA
Average defects / session 9.9 8.5 NA NA
Average defects / hour 4.8 8.7 0.6 NA
Serious defects NA 15 % NA NA
The session data of Neptune and Mercury is companies. There were only three case companies,
collected from the test sessions that they have which makes generalization of the results difficult.
performed so far. Vulcan’s data is collected from four Neptune and Vulcan are alike in many respects, while
successive product releases and presented as averages. Mercury is smaller and has a little bit different
Two different ways of doing exploratory testing at customer segment, but it still means that the
Vulcan could be taken into account in data collection: representativeness of the case companies is limited.
functional testing (see Section 4.2.2) and smoke testing Also, all case companies are in the product business,
(see Section 4.2.3). There was no effort data of smoke- so we cannot say if or how ET would work in, e.g.,
testing available. The number of serious defects could bespoke software projects. Another limitation is that
not be determined for Neptune and Vulcan. we focused on the use of ET in the companies and not
Within each case we cannot say whether the the whole testing approach and the role ET plays in it.
numbers in Table 3 show good or bad performance In view of the limitations of the study, our findings
since we have no comparative data of other testing should be considered more suggestive than conclusive.
approaches in the companies. Based on a cross-case A more comprehensive case study should be made,
comparison, however, defect detection efficiency with a bigger and more versatile sample of companies.
(Average defects / hour in Table 3) is much higher However, our findings add to the body of knowledge
where a session-based approach has been used. This concerning exploratory testing and provide some food
may partly be because the tester can stay focused by for thought.
avoiding interruptions during sessions. The The data did not fully support the claimed benefit of
interviewees at Vulcan admitted that they encountered increased productivity. Many interviewees felt that ET
many interruptions during testing, but they did not see is an effective way of finding defects, especially
it as a problem. The defect rate per hour metric of defects that were hard to find. Still, they also
Vulcan’s functional testing might also be biased. We considered it time consuming to explore complicated
cannot ensure that all detected defects have been functionality carefully. The defect and effort data in
logged into the defect tracking system or that effort has Table 3 seems to support claims about defect detection
been correctly and exactly allocated to ET. efficiency, at least concerning session-based ET. In
The difference between the defect rates per hour at studies we can use as comparison, Anderson et al. 
Neptune and Mercury may be explained by the found an average defect detection efficiency of less
maturity of the products. Mercury’s product is new and than 3 defects per hour for usage-based testing, where
it is logical that it might contain more defects to be the tester tests a piece of software based on test cases
detected. derived from use cases. Wood et al. found an average
defect detection efficiency of 2.47 defects per hour for
5. Discussion and further work functional testing . The defect detection efficiency
for Neptune and Mercury was 4.8 and 8.7 defects per
The purpose of this paper was to study what the hour respectively (see Table 3). Also, 15% of the
current knowledge of exploratory testing is in found defects at Mercury were serious, which gives
literature, how and why companies use ET, and with some support for the effectiveness of session-based
what results. We presented the current knowledge of ET. However, these issues require studies where a
ET, its claimed benefits, applicability, and reliable comparison of efficiency and effectiveness can
shortcomings based on the existing literature, which be made between ET and e.g. test-case based testing.
we found very scarce. We conducted a case study in Both the literature and the interviewees agreed that
three software product companies. We looked at the one of the main reasons for using ET is being able to
reasons for using ET, how ET is applied in the utilize the testers’ creativity and experience during test
companies, and what the perceived benefits and execution, instead of spending much time on designing
shortcomings of ET are. The resulting description of test cases prior to test execution. The interviewees
the use of ET in the companies is the main contribution stressed the difficulty of creating good test cases
of this paper. The results of our study support many of because the systems were so complicated and provided
the claimed benefits of existing literature and reveal so many options for the user. However, the
some new findings. Next we discuss the limitations of interviewees stated clearly their concern for the strong
our study, our findings, and propose ideas for further reliance on the expertise of the individual tester in ET.
research (ideas marked in italics). It is hard to find testers with enough domain expertise
We recognize that there are many limitations to our to act as professional users and different testers may
study, especially concerning the small number of case focus on different things when testing. This makes
evaluating the quality of testing challenging. These
shortcomings have not been explicitly addressed in the References
existing literature. An interesting question for further
research is what importance domain knowledge and  C. Andersson, T. Thelin, P. Runeson and N. Dzamashvili,
testing skills play in finding defects. None of the "An Experimental Evaluation of Inspection and Testing for
interviewees had received any formal testing training. Detection of Desing Faults", in Proceedings of International
Would the interviewees have found more defects if they Symposium on Empirical Software Engineering, 2003, pp.
had received testing training? Would a professional 174-184.
tester be able to find relevant defects without domain  J. Bach, "Exploratory Testing", in The Testing
knowledge? These questions remain open for further Practitioner, Second ed., E. van Veenendaal Ed., Den Bosch:
research. UTN Publishers, 2004, pp. 253-265.
Our study did not reveal any intentional test  J. Bach, "Session-Based Test Management", STQE, vol.
techniques or strategies used for exploring. This could 2, no. 6, 2000.
be due to lack of testing training. We feel that further
research on ET techniques and strategies is needed,  L. Copeland, A Practitioner's Guide to Software Test
Design, Boston: Artech House Publishers, 2004.
because they might increase the effectiveness of ET.
One new finding of our study that has not been  R.D. Craig and S.P. Jaskiel, Systematic Software Testing,
mentioned in existing literature was the use of ET for Boston: Artech House Publishers, 2002.
learning the system for other purposes than better  IEEE, "Guide to the Software Engineering Body of
testing and finding defects more effectively. In two of Knowledge", IEEE., Tech. Rep. IEEE - 2004 Version, 2004.
the three case companies one of the reasons for using
ET was learning the features and behavior of the  C. Kaner, J. Bach and B. Pettichord, Lessons Learned in
Software Testing, New York: John Wiley & Sons, Inc., 2002.
system, e.g. to help prepare training material and
answers in customer service.  C. Kaner, J. Falk and H.Q. Nguyen, Testing Computer
Based on our study the biggest shortcoming of ET Software, New York: John Wiley & Sons, Inc., 1999.
is coverage. Only selected parts of the system can be  J. Lyndsay and N. van Eeden, "Adventures in Session-
tested because of time pressure, but in the case Based Testing", 2003, Accessed 2005 04/25,
companies there were no established ways of planning http://www.workroom-
and prioritizing what to use ET for. This, combined productions.com/papers/AiSBTv1.2.pdf
with insufficient mechanisms for following up test  A. Tinkham and C. Kaner, "Exploring Exploratory
progress, resulted in unknown coverage, which was a Testing", 2003, Accessed 2005 04/25,
concern for the interviewees. It seems that we need http://kaner.com/pdfs/ExploringExploratoryTesting.pdf
more research to find reliable techniques for planning,
 J. Våga and S. Amland, "Managing High-Speed Web
controlling and tracking ET to help focus ET efforts
Testing", in Software Quality and Software Testing in
and help control coverage. The session-based Internet Times, D. Meyerhoff, B. Laibarra, van der Pouw
approach suggested in [3, 9] seems promising in this Kraan,Rob and A. Wallet Eds., Berlin: Springer-Verlag,
respect. 2002, pp. 23-30.
It seems that exploratory testing is an accepted
approach to testing in industry and in the future we  M. Wood, M. Roper, A. Brooks and J. Miller,
"Comparing and Combining Software Defect Detection
will continue our research efforts on ET. As ET seems Techniques: A Replicated Empirical Study", ACM SIGSOFT
to work well as a complementary testing method, more Software Engineering Notes, vol. 22, no. 6, 1997, pp. 262-
detailed case studies are needed focusing on the role 277.
of ET in a comprehensive testing process. We plan to
conduct more case studies to gain better understanding  R.K. Yin, Case Study Research: Design and Methods,
London: SAGE Publications, 1994.
of the role, and benefits of exploratory testing in
software development. We plan to focus our efforts on
session-based approaches and hope to find ways of
coping with the shortcomings and challenges of ET
while utilizing the full potential of the benefits it can
provide. Another area of interest is exploration
techniques. We plan to arrange student and industrial
experiments to get results of the effect of using
different techniques in exploratory testing.