Static analysis: Around Java in 60 minutes
Maxim Stefanov
PVS-Studio, C++/Java developer, Tula
About the speaker
• Maxim Stefanov (stefanov@viva64.com)
• C++/Java developer at PVS-Studio
• Activities:
• Developing the C++ analyzer core
• Developing the Java analyzer
We’re going to talk about...
• Theory
• Code quality (bugs, vulnerabilities)
• Methods of protecting code against defects
• Code review
• Static analysis and everything related to it
• Tools
• Existing static analysis tools
• SonarQube
• PVS-Studio for Java: what is it?
• Several examples of defects detected in real code
• More about static analysis
• Conclusions
Why we need to care about code quality
• Don’t let technical debt accrue while a project is still young
• Don’t lose users once a project already has a history
Cost of fixing a defect
From the book "Code Complete" by S. McConnell
Methods for ensuring high code quality
Code Review
Pros
• Detects defects at the earliest development stage
• Strengthens teamwork
• Improves the team's understanding of the code
• Learning effect
• Fresh outside perspective (no matter how good a programmer you are, you will definitely miss something)
• Detects high-level errors
Cons
• Tiring
• Time-consuming
• Expensive
Static code analysis
Pros
• Detects defects before code review
• The analyzer doesn’t get tired and is ready to work anytime
• Finds errors even when you don't know that such error patterns exist
• Detects errors that are difficult to notice during code review
Cons
• Cannot detect high-level errors
• False positives
Technologies used in static analysis
• Pattern-based analysis
• Type inference
• Data-flow analysis
• Symbolic execution
• Method annotations
Pattern-based analysis
@Override
public boolean equals(Object obj) {
    ....
    return index.equals(other.index)
        && type.equals(other.type)
        && version == other.version
        && found == other.found
        && tookInMillis == tookInMillis   // compared with itself
        && Objects.equals(terms, other.terms);
}
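The defect in the comparison above can be reproduced in a few lines. This is a minimal sketch, assuming a hypothetical SearchStats class with only two fields (it is not the class from the slide); it contrasts the self-comparison pattern with the comparison that was presumably intended.

```java
// Hypothetical class for illustration only; not the class shown on the slide.
final class SearchStats {
    private final int version;
    private final long tookInMillis;

    SearchStats(int version, long tookInMillis) {
        this.version = version;
        this.tookInMillis = tookInMillis;
    }

    // Buggy variant: tookInMillis is compared with itself, so that term is always true.
    boolean equalsBuggy(SearchStats other) {
        return version == other.version
            && tookInMillis == tookInMillis;
    }

    // Intended variant: the field is compared with the other object's field.
    boolean equalsFixed(SearchStats other) {
        return version == other.version
            && tookInMillis == other.tookInMillis;
    }

    public static void main(String[] args) {
        SearchStats a = new SearchStats(1, 10L);
        SearchStats b = new SearchStats(1, 99L);
        System.out.println(a.equalsBuggy(b)); // true: the difference goes unnoticed
        System.out.println(a.equalsFixed(b)); // false
    }
}
```

A pattern-based rule only needs the syntax tree to spot this: both operands of == are the same expression.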
Type inference
interface Human { ... }
class Parent implements Human { ... }
class Child extends Parent { ... }
...
class Animal { ... }
...
boolean someMethod(List<Child> list, Animal animal)
{
    if (list.remove(animal))   // an Animal cannot be an element of List<Child>
        return false;
    ...
}
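The compiler accepts the call above because List.remove(Object) takes any object; it is the inferred types that reveal the problem. A minimal runnable sketch, assuming empty placeholder classes with the same names as on the slide and a hypothetical wrapper class TypeInferenceDemo:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper class; the nested types mirror the names used on the slide.
final class TypeInferenceDemo {
    interface Human { }
    static class Parent implements Human { }
    static class Child extends Parent { }
    static class Animal { }

    // List.remove(Object) accepts any object, so this compiles,
    // but Animal is unrelated to Child and is never found in the list.
    static boolean removeAnimal(List<Child> list, Animal animal) {
        return list.remove(animal);
    }

    public static void main(String[] args) {
        List<Child> children = new ArrayList<>();
        children.add(new Child());
        System.out.println(removeAnimal(children, new Animal())); // false
        System.out.println(children.size());                      // 1
    }
}
```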
Method annotations
Class("java.lang.Math")
  - Function("max", Type::Int32, Type::Int32)
      .Pure()
      .Set(FunctionClassification::NoDiscard)
      .Requires(NotEquals(Arg1, Arg2))
      .Returns(Arg1, Arg2, [](const Int &v1, const Int &v2)
        {
          return v1.Max(v2);
        }
      )
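Method annotations
int test(int a, int b) {
    Math.max(a, b);                // 1: the result is not used (NoDiscard)
    if (a > 5 && b < 2) {
        // a = [6..INT_MAX]
        // b = [INT_MIN..1]
        if (Math.max(a, b) > 0)    // 2: always true, the result is in [6..INT_MAX]
        {...}
    }
    return Math.max(a, a);         // 3: identical arguments (Requires NotEquals)
}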
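Case 2 in the code above follows from the value ranges of the arguments. The sketch below shows one way such a "Returns" rule could be evaluated on ranges; the ValueRange class and its methods are hypothetical and are not the analyzer's real data structures.

```java
// Hypothetical range type for illustration; not the analyzer's actual implementation.
final class ValueRange {
    final int lo;
    final int hi;

    ValueRange(int lo, int hi) {
        this.lo = lo;
        this.hi = hi;
    }

    // Range of Math.max(x, y) when x is in this range and y is in the other one.
    ValueRange max(ValueRange other) {
        return new ValueRange(Math.max(lo, other.lo), Math.max(hi, other.hi));
    }

    // True when every value in the range is greater than the bound.
    boolean alwaysGreaterThan(int bound) {
        return lo > bound;
    }

    public static void main(String[] args) {
        ValueRange a = new ValueRange(6, Integer.MAX_VALUE);  // a > 5
        ValueRange b = new ValueRange(Integer.MIN_VALUE, 1);  // b < 2
        ValueRange result = a.max(b);
        System.out.println(result.lo + ".." + result.hi);
        System.out.println(result.alwaysGreaterThan(0));      // true
    }
}
```

Since the lower bound of the result is 6, the condition Math.max(a, b) > 0 can never be false inside that branch.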
Data-flow analysis
void func(int x) // x: [-2147483648..2147483647] //1
{
    if (x > 3)
    {
        // x: [4..2147483647] //2
        if (x < 10)
        {
            // x: [4..9] //3
        }
    }
    else
    {
        // x: [-2147483648..3] //4
    }
}
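The four ranges in the comments above can be derived mechanically by narrowing an interval at each condition. A minimal sketch, assuming a hypothetical Interval class (a real analyzer tracks far richer information than a single interval); long bounds are used so that bound + 1 cannot overflow.

```java
// Hypothetical interval type for illustration; not the analyzer's actual implementation.
final class Interval {
    final long lo;
    final long hi;

    Interval(long lo, long hi) {
        this.lo = lo;
        this.hi = hi;
    }

    // Values that remain possible when the condition "x > bound" holds.
    Interval whenGreaterThan(long bound) {
        return new Interval(Math.max(lo, bound + 1), hi);
    }

    // Values that remain possible when the condition "x < bound" holds.
    Interval whenLessThan(long bound) {
        return new Interval(lo, Math.min(hi, bound - 1));
    }

    // Values that remain possible when "x > bound" is false, that is, x <= bound.
    Interval whenNotGreaterThan(long bound) {
        return new Interval(lo, Math.min(hi, bound));
    }

    // An empty interval means the branch is unreachable.
    boolean isEmpty() {
        return lo > hi;
    }

    @Override
    public String toString() {
        return "[" + lo + ".." + hi + "]";
    }

    public static void main(String[] args) {
        Interval x = new Interval(Integer.MIN_VALUE, Integer.MAX_VALUE); // point 1
        Interval thenBranch = x.whenGreaterThan(3);                      // point 2
        Interval inner = thenBranch.whenLessThan(10);                    // point 3
        Interval elseBranch = x.whenNotGreaterThan(3);                   // point 4
        System.out.println(thenBranch); // [4..2147483647]
        System.out.println(inner);      // [4..9]
        System.out.println(elseBranch); // [-2147483648..3]
    }
}
```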
Symbolic execution
int someMethod(int A, int B)
{
    if (A == B)
        return 10 / (A - B);   // A == B here, so A - B is 0: division by zero
    return 1;
}
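The concrete values of A and B are unknown, yet the relation A == B is enough to conclude that the divisor is zero. The sketch below does not perform symbolic execution itself; it only confirms that conclusion at runtime, using a hypothetical SymbolicDemo wrapper around the same method.

```java
// Hypothetical wrapper class; someMethod has the same shape as the method on the slide.
final class SymbolicDemo {
    // Under the constraint a == b, the expression a - b is always 0,
    // so the division throws for every pair of equal arguments.
    static int someMethod(int a, int b) {
        if (a == b) {
            return 10 / (a - b);
        }
        return 1;
    }

    // Returns true when calling someMethod(value, value) throws ArithmeticException.
    static boolean throwsOnEqualArgs(int value) {
        try {
            someMethod(value, value);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(someMethod(1, 2));      // 1
        System.out.println(throwsOnEqualArgs(7));  // true
    }
}
```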
Existing tools
SonarQube: who, what and why
• Open-source platform for continuous analysis and assessment of code quality
• Contains a number of analyzers for various languages
• Allows integration of third-party analyzers
• Clearly shows the quality of your project
SonarQube: data representation
The story of creating PVS-Studio for Java
• Java is a popular language
• The language is used in a wide range of areas
• We could reuse mechanisms from the C++ analyzer (data-flow analysis, method annotations)
Analyzer internals
Spoon for getting a syntax tree and a semantic model
Spoon transforms the code into a metamodel:
class TestClass
{
    void test(int a, int b)
    {
        int x = (a + b) * 4;
        System.out.println(x);
    }
}
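To give a feel for what a tree representation of the expression (a + b) * 4 looks like, here is a toy sketch. It does not use Spoon's API: the ToyTree and Node classes and the node labels are hypothetical stand-ins for a real metamodel.

```java
import java.util.List;

// Hypothetical toy tree; this is NOT Spoon's metamodel or API.
final class ToyTree {
    // A node has a label and child nodes; real metamodels also carry types, positions and more.
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = List.of(children);
        }
    }

    // Tree for the expression (a + b) * 4.
    static Node sampleExpression() {
        return new Node("BinaryOperator *",
            new Node("BinaryOperator +",
                new Node("VariableRead a"),
                new Node("VariableRead b")),
            new Node("Literal 4"));
    }

    // Renders the tree with two spaces of indentation per level.
    static String render(Node node, int depth) {
        StringBuilder sb = new StringBuilder();
        sb.append("  ".repeat(depth)).append(node.label).append('\n');
        for (Node child : node.children) {
            sb.append(render(child, depth + 1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(sampleExpression(), 0));
    }
}
```

The parentheses from the source disappear in the tree: the grouping is expressed by the + node being a child of the * node.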
Data-flow analysis and method annotations reuse mechanisms from the C++ analyzer via SWIG.
A diagnostic rule is a visitor with overridden methods.
As the tree is traversed, these methods inspect the nodes that are of interest to the rule.
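The idea of a rule as a visitor can be sketched on a toy expression tree. Everything here (VisitorDemo, Expr, Name, Binary, Visitor, SelfComparisonRule) is hypothetical and only illustrates the structure; the real analyzer works on Spoon's tree and its rules are more involved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of "a diagnostic rule is a visitor"; not the analyzer's real classes.
final class VisitorDemo {
    interface Expr {
        void accept(Visitor v);
    }

    static final class Name implements Expr {
        final String id;
        Name(String id) { this.id = id; }
        public void accept(Visitor v) { v.visitName(this); }
    }

    static final class Binary implements Expr {
        final String op;
        final Expr left;
        final Expr right;
        Binary(String op, Expr left, Expr right) {
            this.op = op;
            this.left = left;
            this.right = right;
        }
        public void accept(Visitor v) { v.visitBinary(this); }
    }

    // Base visitor: walks the whole tree and does nothing else.
    static class Visitor {
        void visitName(Name n) { }
        void visitBinary(Binary b) {
            b.left.accept(this);
            b.right.accept(this);
        }
    }

    // Diagnostic rule: overrides only the node kind it cares about.
    static final class SelfComparisonRule extends Visitor {
        final List<String> warnings = new ArrayList<>();

        @Override
        void visitBinary(Binary b) {
            if (b.op.equals("==") && b.left instanceof Name && b.right instanceof Name
                    && ((Name) b.left).id.equals(((Name) b.right).id)) {
                warnings.add("'" + ((Name) b.left).id + "' is compared with itself");
            }
            super.visitBinary(b); // keep traversing nested expressions
        }
    }

    // Runs the rule over a tree and returns the collected warnings.
    static List<String> check(Expr root) {
        SelfComparisonRule rule = new SelfComparisonRule();
        root.accept(rule);
        return rule.warnings;
    }

    public static void main(String[] args) {
        Expr expr = new Binary("&&",
            new Binary("==", new Name("found"), new Name("otherFound")),
            new Binary("==", new Name("tookInMillis"), new Name("tookInMillis")));
        System.out.println(check(expr));
    }
}
```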
Several examples of errors found using PVS-Studio
Integer division
private static boolean checkSentenceCapitalization(@NotNull String value) {
    List<String> words = StringUtil.split(value, " ");
    ....
    int capitalized = 1;
    ....
    return capitalized / words.size() < 0.2; // allow reasonable amount of
                                             // capitalized words
}
V6011 [CWE-682] The '0.2' literal of the 'double' type is compared to a value of the 'int' type. TitleCapitalizationInspection.java 169
IntelliJ IDEA
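The effect of the integer division is easy to check in isolation. A minimal sketch, assuming a hypothetical CapitalizationRatio class with non-negative counts; the "fixed" variant shows one possible correction (casting to double), not necessarily the one applied in IntelliJ IDEA.

```java
// Hypothetical helper class for illustration; not code from IntelliJ IDEA.
final class CapitalizationRatio {
    // Integer division: the quotient is truncated to an int,
    // so for non-negative counts it is below 0.2 only when it equals 0.
    static boolean fewCapitalizedBuggy(int capitalized, int total) {
        return capitalized / total < 0.2;
    }

    // Casting one operand to double yields the real ratio.
    static boolean fewCapitalizedFixed(int capitalized, int total) {
        return (double) capitalized / total < 0.2;
    }

    public static void main(String[] args) {
        // 3 of 4 words are capitalized: the real ratio is 0.75.
        System.out.println(fewCapitalizedBuggy(3, 4)); // true, because 3 / 4 == 0
        System.out.println(fewCapitalizedFixed(3, 4)); // false
    }
}
```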
Always false
PVS-Studio: V6007 [CWE-570] Expression '"0".equals(text)' is always false. ConvertIntegerToDecimalPredicate.java 46
IntelliJ IDEA
public boolean satisfiedBy(@NotNull PsiElement element) {
    ....
    @NonNls final String text = expression.getText().replaceAll("_", "");
    if (text == null || text.length() < 2) {
        return false;
    }
    if ("0".equals(text) || "0L".equals(text) || "0l".equals(text)) {
        return false;
    }
    return text.charAt(0) == '0';
}
Unexpected number of iterations
public static String getXMLType(@WillNotClose InputStream in) throws IOException
{
    ....
    String s;
    int count = 0;
    while (count < 4) {   // count is never changed inside the loop
        s = r.readLine();
        if (s == null) {
            break;
        }
        Matcher m = tag.matcher(s);
        if (m.find()) {
            return m.group(1);
        }
    }
    ....
}
SpotBugs
V6007 [CWE-571] Expression 'count < 4' is always true. Util.java 394
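Presumably the loop was meant to look at only the first few lines of the input. The sketch below shows such a bounded loop with the counter actually incremented; the FirstTagFinder class, the regular expression and the String-based signature are assumptions made for this example and are not taken from SpotBugs.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not the SpotBugs implementation.
final class FirstTagFinder {
    // Assumed pattern: captures the name of an opening XML tag.
    private static final Pattern TAG = Pattern.compile("<([A-Za-z][A-Za-z0-9]*)");

    // Looks for the first tag within the first maxLines lines; returns null if none is found.
    static String firstTag(String text, int maxLines) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            int count = 0;
            while (count < maxLines) {
                String line = reader.readLine();
                if (line == null) {
                    break;
                }
                Matcher m = TAG.matcher(line);
                if (m.find()) {
                    return m.group(1);
                }
                count++; // the increment that the original loop lacks
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The tag appears on the sixth line, beyond the four-line limit.
        System.out.println(firstTag("\n\n\n\n\n<late/>", 4)); // null
        System.out.println(firstTag("\n<root>", 4));          // root
    }
}
```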
We can’t do without Copy-Paste
public class RuleDto {
    ....
    private final RuleDefinitionDto definition;
    private final RuleMetadataDto metadata;
    ....
    private void setUpdatedAtFromDefinition(@Nullable Long updatedAt) {
        if (updatedAt != null && updatedAt > definition.getUpdatedAt()) {
            setUpdatedAt(updatedAt);
        }
    }
    private void setUpdatedAtFromMetadata(@Nullable Long updatedAt) {
        // 'definition' is used again; 'metadata' was probably intended
        if (updatedAt != null && updatedAt > definition.getUpdatedAt()) {
            setUpdatedAt(updatedAt);
        }
    }
    ....
}
SonarQube
V6032 It is odd that the body of method 'setUpdatedAtFromDefinition' is fully equivalent to the body of another method 'setUpdatedAtFromMetadata'. Check lines: 396, 405. RuleDto.java 396
Duplicates
V6033 [CWE-462] An item with the same key 'JavaPunctuator.PLUSEQU' has already been added. Check lines: 104, 100. KindMaps.java 104
SonarJava
private final Map<JavaPunctuator, Tree.Kind> assignmentOperators =
    Maps.newEnumMap(JavaPunctuator.class);
public KindMaps() {
    ....
    assignmentOperators.put(JavaPunctuator.PLUSEQU, Tree.Kind.PLUS_ASSIGNMENT);
    ....
    assignmentOperators.put(JavaPunctuator.PLUSEQU, Tree.Kind.PLUS_ASSIGNMENT);
    ....
}
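A repeated put with the same key does not add a second entry; it replaces the first one. The sketch below shows this with a standard EnumMap and a hypothetical two-value enum. Whether a different key was intended in SonarJava is not stated on the slide, so the MINUSEQU check here is only an illustration of what such a mistake could hide.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical demo class and enum; not SonarJava's actual types.
final class DuplicateKeyDemo {
    enum Punctuator { PLUSEQU, MINUSEQU }

    // Builds a map the way the slide does, repeating one key.
    static Map<Punctuator, String> build() {
        Map<Punctuator, String> ops = new EnumMap<>(Punctuator.class);
        ops.put(Punctuator.PLUSEQU, "PLUS_ASSIGNMENT");
        ops.put(Punctuator.PLUSEQU, "PLUS_ASSIGNMENT"); // overwrites the entry above
        return ops;
    }

    public static void main(String[] args) {
        Map<Punctuator, String> ops = build();
        System.out.println(ops.size());                           // 1
        System.out.println(ops.containsKey(Punctuator.MINUSEQU)); // false
    }
}
```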
How to integrate static analysis into the software development process
• Each developer has a static analysis tool on their machine
• The entire code base is analyzed during nightly builds. When suspicious code is found, everyone responsible gets an email.
How to start using static analysis tools on large projects without losing heart
1. Check the project
2. Declare all warnings issued so far as not yet of interest and place them in a special suppression file
3. Commit the suppression file to the version control system
4. Run the analyzer and get warnings only for newly written or modified code
5. PROFIT!
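The core of steps 2 to 4 is a set difference between the current warnings and the suppressed ones. A minimal sketch, assuming warnings are identified by plain strings; the BaselineFilter class and the message texts are hypothetical, and real tools use their own suppression file format and more robust matching.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of baseline suppression; not PVS-Studio's actual mechanism or file format.
final class BaselineFilter {
    // Keeps only the warnings that are absent from the suppression baseline.
    static List<String> newWarnings(Set<String> baseline, List<String> current) {
        List<String> result = new ArrayList<>();
        for (String warning : current) {
            if (!baseline.contains(warning)) {
                result.add(warning);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Steps 1-3: warnings from the first full check, stored as the baseline.
        Set<String> baseline = new HashSet<>(List.of(
            "V6007 Util.java: expression is always true",
            "V6011 Title.java: double literal compared to int"));
        // Step 4: a later run reports the old warnings plus one from new code.
        List<String> current = List.of(
            "V6007 Util.java: expression is always true",
            "V6011 Title.java: double literal compared to int",
            "V6033 KindMaps.java: duplicate key");
        System.out.println(newWarnings(baseline, current));
    }
}
```

Only the third warning is reported, so the team can react to new code right away and return to the baseline later.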
Conclusions
• Static analysis is an additional methodology, not a "silver bullet"
• Static analysis has to be used regularly
• You can start using the analysis immediately and postpone fixing old errors
• Competition is a key to progress
Maxim Stefanov
stefanov@viva64.com
7 953 968 49 43