PHANTA: Diversified Test Code Quality Measurement for Modern Software Development
Susumu Tokumoto, Kuniharu Takayama
FUJITSU LABORATORIES LTD., Japan
ASE 2019 Industry Showcase
Copyright 2019 FUJITSU LABORATORIES LTD.
2. Background
How should we assure the quality of test code?
The quality of production code is assured in various ways, such as testing and review,
but there are few tools or standard practices for assuring the quality of test code.
Production Code:
int foo(int x){
    int ret = 1;
    while(x > 0){
        ret *= x;
        x--;
    }
    return ret;
}
Test Code:
TEST(foo, one){
    int x = foo(1);
    assert(x == 1);
}
TEST(foo, two){
    int x = foo(2);
    assert(x == 2);
}
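These two tests exercise every line of foo, yet they are weak: each expected value happens to equal the input, so the same suite also passes against code that simply returns x. A minimal sketch of that blind spot (illustrative only; fooBroken and suitePasses are hypothetical names, not from the slides):

```java
import java.util.function.IntUnaryOperator;

public class WeakTestDemo {
    // The slide's production code: factorial of x.
    static int foo(int x) {
        int ret = 1;
        while (x > 0) { ret *= x; x--; }
        return ret;
    }

    // Hypothetical buggy implementation: returns the input unchanged.
    static int fooBroken(int x) {
        return x;
    }

    // The slide's two tests: foo(1) == 1 and foo(2) == 2.
    static boolean suitePasses(IntUnaryOperator f) {
        return f.applyAsInt(1) == 1 && f.applyAsInt(2) == 2;
    }

    public static void main(String[] args) {
        // Both print true: the suite cannot tell factorial from the identity function.
        System.out.println("correct passes: " + suitePasses(WeakTestDemo::foo));
        System.out.println("broken passes: " + suitePasses(WeakTestDemo::fooBroken));
    }
}
```

Full coverage with passing tests therefore says little by itself; this is exactly the gap the rest of the deck addresses.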
[Figure: example of a manual Test Specification, a Japanese "Test Case Table" for an order-history search program (program ID L57-005g20) in an order-management subsystem (code L57). Each row lists a test case name (L57-0001 through L57-0008), the test case (functional tests of the business specification and control specifications, a boundary-value test such as "the number of retrieved payment records exceeds the maximum for name aggregation", and a threshold test marked "no applicable processing"), input/output data files (e.g., a600100101.in1, a600100101.ot1), the expected result, the number of test items, the verification date (2003/4/21-22), and the verifier.]
☑Manual Testing — driven by a Test Specification
☑Automated Testing — driven by Test Code
Coverages:
Function   Line    Branch
foo        100%    100%
bar         54%     20%
baz         75%     51%
Total       82%     73%
[Diagram: an Engineer reviews production code (☑Review) and measures its coverage, but whether and how to review the test code itself is an open question (?Review).]
For production code there are various ways to assure the quality; for test code, how do we assure the quality? We need technology to check the quality of test code.
Automated quality measurement for test code!
• Improves the maintainability of test code, which in turn improves the quality of production code
• Reduces cost by automating the manual review of test code
• Adds value to Fujitsu's system integration business
3. What is Good Test Code?
Does 100% Code Coverage Mean Perfect Test Code?
Is Automation of Testing the Goal of Creating Test Code?
4. Position of Test Code in the Testing Quadrants
Feel confident about the code
Refactor frequently with confidence
Release with confidence
Support programming
Manage technical debt
Test code serves as a safety net rather than producing business value directly.
Lisa Crispin and Janet Gregory. 2009. Agile Testing: A Practical Guide for Testers and Agile Teams (1st ed.). Addison-Wesley Professional.
5. Multiple Forces of Influence Related to Tests Directly or Indirectly Affect Productivity
Lasse Koskela, "Effective Unit Testing: A Guide for Java Developers", p. 8, Figure 1.3
6. Don’t Do Everything from the Beginning
You should NOT stick to
• doing everything from
the beginning
• test-driven development
• test-first development
• “unit” test
• code coverage
• test speed
You should stick to
• reproducible and
repeatable tests
• isolated tests
Takuto Wada, “Working Effectively with Legacy Code” https://speakerdeck.com/twada/working-effectively-with-legacy-code
7. Good Test Code
Good test code plays the role of a safety net and positively affects productivity, with diverse and prioritized quality metrics; this is achieved by monitoring those quality metrics.
8. Maturity Levels of Test Code
No Test Code → Isolated and Repeatable Tests → Requirements-covered Tests → Code-covered Tests → High-Speed Tests → Maintainable Tests
From an unsteady defense, to a steady defense and a turn to attack, to spending enough time on attack.
We need a tool to measure and improve these qualities.
10. Quality of Test Code PHANTA Measures
Bug Detectability
• How many bugs the test code can detect
Maintainability
• Flexibility for fixing
• Readability
Speed
• Test duration
Measured by four technologies:
• Technology 1: Mutation Analysis
• Technology 2: Active Assertion Analysis
• Technology 3: Test Code Clone Detection
• Technology 4: Scoring Test Duration
11. Technology 1: Mutation Analysis
Inject various bugs (mutants) and automatically check whether the tests detect the injected bugs.
Program Under Test:
1: int abs(int x){
2:   if(x <= 0){
3:     return -x;
4:   }
5:   return x;
6: }
Mutant notation: (location of mutation, mutation operator)
mutant1 (line 2, <= → >):
int abs(int x){
  if(x > 0){
    return -x;
  }
  return x;
}
mutant2 (line 2, 0 → 1):
int abs(int x){
  if(x <= 1){
    return -x;
  }
  return x;
}
mutant3 (line 3, - → --):
int abs(int x){
  if(x <= 0){
    return --x;
  }
  return x;
}
Results:
          Test1: abs(2)==2   Test2: abs(-2)==2   Verdict
mutant1   fail               fail                killed
mutant2   pass               pass                unkilled
mutant3   pass               fail                killed
Mutation score: 2/3 = 0.667 — the ratio of killed mutants, i.e., the bug detectability.
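The analysis above can be sketched in a few lines. This is an illustrative toy, not PHANTA's implementation: each mutant of abs is hand-written rather than generated, and a mutant counts as killed when at least one of the slide's two tests fails against it:

```java
import java.util.List;
import java.util.function.IntUnaryOperator;

public class MutationDemo {
    // Hand-written mutants of abs (real tools generate these automatically).
    static int mutant1(int x) { if (x > 0)  return -x;  return x; } // line 2: <= -> >
    static int mutant2(int x) { if (x <= 1) return -x;  return x; } // line 2: 0 -> 1
    static int mutant3(int x) { if (x <= 0) return --x; return x; } // line 3: - -> --

    // The slide's two tests: abs(2) == 2 and abs(-2) == 2.
    // A mutant is killed if at least one test fails against it.
    static boolean killed(IntUnaryOperator m) {
        return m.applyAsInt(2) != 2 || m.applyAsInt(-2) != 2;
    }

    public static void main(String[] args) {
        List<IntUnaryOperator> mutants =
            List.of(MutationDemo::mutant1, MutationDemo::mutant2, MutationDemo::mutant3);
        long killedCount = mutants.stream().filter(MutationDemo::killed).count();
        // mutant1 and mutant3 are killed, mutant2 survives -> score 2/3
        System.out.printf("mutation score: %d/%d%n", killedCount, mutants.size());
    }
}
```

Running it reports a mutation score of 2/3, matching the slide's table.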
12. Technology 2: Active Assertion Analysis
Assertions that don't contribute to bug detection can make the test code less readable.
→ Assertions that don't kill any mutants are regarded as inactive, and this information is used to improve the test code's readability.
An assertion that kills no mutant does not contribute to bug detection.
Method under test:
1: public Bar method(){
2:   b = 1;
3:   c = 2;
4: }
mutant1 (line 2, 1 → 0):
1: public Bar method(){
2:   b = 0;
3:   c = 2;
4: }
mutant2 (line 3, 2 → 0):
1: public Bar method(){
2:   b = 1;
3:   c = 0;
4: }
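The per-assertion analysis can be sketched as follows (illustrative only, not PHANTA's code; Bar is reduced to a record and the assertion labels are hypothetical): each assertion is evaluated separately against each mutant, and an assertion that fails on no mutant is flagged as inactive:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

public class ActiveAssertionDemo {
    // Bar reduced to a record holding the two fields set by method().
    record Bar(int b, int c) {}

    static Bar original() { return new Bar(1, 2); }
    static Bar mutant1()  { return new Bar(0, 2); } // line 2: 1 -> 0
    static Bar mutant2()  { return new Bar(1, 0); } // line 3: 2 -> 0

    public static void main(String[] args) {
        // Each assertion modeled as a predicate over the method's result.
        Map<String, Predicate<Bar>> assertions = new LinkedHashMap<>();
        assertions.put("assertEquals(1, bar.b)", bar -> bar.b() == 1);
        assertions.put("assertEquals(2, bar.c)", bar -> bar.c() == 2);
        assertions.put("assertNotNull(bar)",     bar -> bar != null); // kills nothing

        for (var e : assertions.entrySet()) {
            // An assertion is active if it fails (i.e., kills) on some mutant.
            boolean kills = !e.getValue().test(mutant1()) || !e.getValue().test(mutant2());
            System.out.println(e.getKey() + " -> " + (kills ? "active" : "inactive"));
        }
    }
}
```

The first two assertions each kill one mutant and are active; the assertNotNull-style assertion kills neither and would be flagged as a readability cost.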
13. Technology 3: Test Code Clone Detection
Code clone: matching or similar chunks of source code.
Example of a test code clone:
@Test
public void testFoo1() {
    Foo foo = new Foo();
    assertEquals(12, foo.methodA(0));
}
@Test
public void testFoo2() {
    Foo foo = new Foo();
    assertEquals(34, foo.methodA(1));
}
The two tests match except for the method names and literals → they are regarded as a code clone.
Code clones are considered harmful for maintenance, so detecting them can be used to improve maintainability.
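A toy version of the "match except method names and literals" idea (illustrative assumptions: real clone detectors work on token streams or ASTs, not regexes, and the testFoo-name rule below is purely for this example):

```java
public class CloneDemo {
    // Replace literals and the varying method names with placeholders,
    // keeping keywords and punctuation, then compare the normalized text.
    static String normalize(String code) {
        return code
            .replaceAll("\"[^\"]*\"", "LIT")        // string literals -> LIT
            .replaceAll("\\b\\d+\\b", "LIT")        // number literals -> LIT
            .replaceAll("\\btestFoo\\d+\\b", "ID")  // method names -> ID (toy rule)
            .replaceAll("\\s+", " ").trim();        // ignore whitespace differences
    }

    public static void main(String[] args) {
        String t1 = "public void testFoo1() { Foo foo = new Foo(); assertEquals(12, foo.methodA(0)); }";
        String t2 = "public void testFoo2() { Foo foo = new Foo(); assertEquals(34, foo.methodA(1)); }";
        // The two tests normalize to identical text -> reported as a clone pair.
        System.out.println("clone: " + normalize(t1).equals(normalize(t2)));
    }
}
```

Both tests normalize to the same string, so the pair is reported as a clone, as on the slide.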
14. Technology 4: Scoring Test Duration
We aggregated the test duration per test case from 225 open source projects, which lets us calculate the relative speed of the target tests as a normalized score.
[Figure: the distribution of test duration per test case in OSS is transformed to a normal distribution to build a score model; e.g., a duration of 0.489 sec maps to a score of 50 (the average). Axes: score vs. number of projects.]
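The scoring step might look like the following sketch. The slides only state that the OSS duration distribution is transformed to a normal distribution and that 0.489 sec maps to the average score of 50; the log transform and the 10-points-per-standard-deviation scale below are assumptions for illustration, not the published model:

```java
public class DurationScoreDemo {
    // Assumed parameters of the score model: mean of log-durations anchored so
    // that 0.489 sec is average; sigma = 1.0 is a placeholder, not a real fit.
    static final double MU = Math.log(0.489);
    static final double SIGMA = 1.0;

    // Deviation-value style score: 50 is average, higher = faster test,
    // 10 points per standard deviation of the log-duration.
    static double score(double seconds) {
        double z = (Math.log(seconds) - MU) / SIGMA;
        return 50.0 - 10.0 * z;
    }

    public static void main(String[] args) {
        System.out.printf("0.489 sec -> %.0f%n", score(0.489)); // the average case
        System.out.printf("5.000 sec -> %.0f%n", score(5.0));   // slower -> below 50
    }
}
```

By construction, 0.489 sec scores exactly 50 and slower tests score below 50, mirroring the figure's example.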
15. Case Study: Factory Automation Application
16. Subject Application
Feature:
• RESTful API server in Factory Automation
Size of Code:
• Source Code: 37 KLOC
• Test Code: 21 KLOC
• Over 1,000 test cases
Application requires:
• Java 8
• DBMS
• Apache Maven
Architecture:
[Diagram: a Client calls the Application Server, where a Servlet dispatches through Authentication and a Validator to the Services, which use a Data Access layer backed by a DBMS.]
17. Install and Run PHANTA
Add the internal Maven repository to ~/.m2/settings.xml:
<settings>
  <mirrors>
    <mirror>
      <id>freiburg-nexus</id>
      <mirrorOf>*</mirrorOf>
      <url>http://freiburg.dyn.soft.flab.fujitsu.co.jp:8081/repository/maven-public/</url>
    </mirror>
  </mirrors>
</settings>
Add the plugin to build/plugins in your pom.xml:
<plugin>
  <groupId>com.fujitsu.labs.phanta</groupId>
  <artifactId>phanta-maven</artifactId>
  <version>0.1.0-SNAPSHOT</version>
</plugin>
Run the plugin from the command line:
$ mvn com.fujitsu.labs:phanta-maven:analyze
18. Analysis Summary
Analysis duration: 3 h 8 min (sequential, but could be parallelized)
[Report legend: Yellow = Caution, Red = Danger]
19. Mutation Analysis Report
[Report shows per-module results for the Servlet, Authentication, Services, and Data Access layers.]
Moderate line coverage and mutation coverage, with an acceptable gap between the two → caused by well-written assertions.
20. Active Assertion Analysis Report
Moderate number of active assertions, with low variance.
21. Proposals for Improving the Test Code
Repeatability
• Some test cases look flaky because their expected values are written as order-sensitive even though order-sensitivity is not needed.
Coverage
• Test cases should be added for frequently updated code with low coverage, especially zero coverage.
Speed, Maintainability
• Use an in-memory DB or stubs/mocks to relieve the DB access overhead.
• Turn test code clones into parameterized tests.
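The last proposal applies directly to the clone pair from Technology 3. The sketch below stays dependency-free with a plain data table and loop; with JUnit 5 the same shape would be a @ParameterizedTest with a @CsvSource of (input, expected) pairs. Foo and its return values are stand-ins, not the case study's real code:

```java
public class ParameterizedDemo {
    // Hypothetical class under test, chosen so the slide's two cases pass.
    static class Foo {
        int methodA(int x) { return x == 0 ? 12 : 34; }
    }

    public static void main(String[] args) {
        // One table replaces the two cloned test methods: {input, expected}.
        int[][] cases = { {0, 12}, {1, 34} };
        for (int[] c : cases) {
            int actual = new Foo().methodA(c[0]);
            if (actual != c[1])
                throw new AssertionError("methodA(" + c[0] + ") = " + actual);
        }
        System.out.println("all " + cases.length + " cases passed");
    }
}
```

Adding a new input/expected pair now means adding one table row instead of one more cloned method, which is the maintainability win the proposal is after.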
22. Feedback from the Test Code Developers
Overall, the result of the analysis is reasonable and acceptable.
Average values are not very meaningful
• The code outside Services is out of the tests' scope because it was derived from another past project and had already been tested well.
• This is why the code outside Services has much lower coverage than Services.
Equivalent mutants confused the developers
• They wondered why the mutation coverages in Services are not 100%.
• There were equivalent mutants (not killable) in the Services code, but this was not evident from the report.
Understanding the report is difficult
• They found it difficult to understand what the report means and what action items the various metrics imply.
Maintainability is more important than we assumed
• They considered the maintainability of test code more important than some other qualities, such as coverage.
23. Lessons Learned
Make the report understandable for developers
• Remove equivalent mutants
• Lead developers to the next action
Make the tool flexible and customizable
• Let developers select the targets to be measured
Automated test code refactoring could be helpful
• The developers want more maintainable test code