A graph-based approach to analyzing JavaScript source code, using Neo4j as the graph database backend and Shape Security Shift as the parser.
Hungarian version (presented at a Neo4j meetup): http://www.slideshare.net/steindani/forrskdtrak-grfalap-statikus-analzise
Graph-Based Source Code Analysis of JavaScript Repositories
1. Graph-Based Source Code Analysis of JavaScript Repositories
Budapest University of Technology and Economics
Department of Measurement and Information Systems
Fault Tolerant Systems Research Group
Dániel Stein
Gábor Szárnyas
3. Continuous Integration (CI)
– Developers working together
– Prevent integration problems
– Examples
  – Jenkins
  – Hudson
  – Travis CI
[Figure: CI workflow with the stages Development, Version Control System, Compilation, Unit and Integration Tests]
10. Static Analysis
– No need for compilation or execution of the application
– Formatting, structural and semantic rule checking
– Can extend and improve the continuous integration workflow
– In this research we used code analysis based on pattern matching
– Java
  – FindBugs
  – PMD
  – CheckStyle
– JavaScript
  – ESLint
  – Facebook Infer, Flow
  – Tern
  – TAJS
[Figure: CI workflow with the stages Development, Version Control System, Compilation, Unit and Integration Tests, Static Analysis]
13. Problems to Solve
– Thorough code analysis is time-consuming and resource-intensive
– For large projects it can be too slow
– Temporary solution: batching
– Present results as soon and as fast as possible.
[Figure: unit tests and static analysis runs placed on a day/night timeline]
14. Problems to Solve
– Memory limits appear when...
  – Global rules are checked
  – The structure is stored in memory
  – Code repositories are large
– Not being incremental
  – Batched execution simply does not cut it
  – A small change induces a complete recheck
15. Our Approach
– Incremental methodology
  – Instead of batched execution
  – Update the prepared results with the effects of the change
  – Only store the required parts in memory
[Figure: the analyzer applied to the change (Δ) between two versions]
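The incremental idea can be illustrated with a minimal sketch. It assumes file-level granularity and a per-file analysis function; the class `IncrementalAnalyzer`, its methods and the toy rule are hypothetical names for illustration, not the framework's API.

```python
import hashlib


class IncrementalAnalyzer:
    """Caches per-file results and re-analyzes only files whose content changed."""

    def __init__(self, analyze_file):
        self.analyze_file = analyze_file  # callable: source text -> list of findings
        self.hashes = {}                  # path -> content hash of the analyzed version
        self.results = {}                 # path -> cached findings
        self.analyzed = []                # log of files that were actually re-analyzed

    def update(self, files):
        """files: dict mapping path -> current source text."""
        for path in set(self.hashes) - set(files):  # forget deleted files
            del self.hashes[path], self.results[path]
        for path, text in files.items():
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            if self.hashes.get(path) != digest:     # new or modified file
                self.hashes[path] = digest
                self.results[path] = self.analyze_file(text)
                self.analyzed.append(path)
        return self.results


# Toy rule (an assumption): flag a literal division by zero.
analyzer = IncrementalAnalyzer(lambda text: ["division by zero"] if "/ 0" in text else [])
analyzer.update({"a.js": "var foo = 1 / 0;", "b.js": "var bar = 2;"})
analyzer.update({"a.js": "var foo = 1 / 0;", "b.js": "var bar = 3;"})
```

Only `b.js` is processed in the second call because the content of `a.js` did not change. Global rules that span several files would need dependency tracking on top of this, which the sketch leaves out.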
27. Code Processing Steps
[Figure: processing pipeline: source code → tokenizer → tokens → parser → AST → scope analyzer → ASG]
– Token: the shortest character sequence that still has meaning.

Token        Token type
VAR          Keyword
IDENTIFIER   Ident
ASSIGN       Punctuator
NUMBER       NumericLiteral
DIV          Punctuator
NUMBER       NumericLiteral
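A minimal tokenizer sketch can reproduce the token sequence listed on this slide. It assumes the example statement is `var foo = 1 / 0;` (inferred from the tokens, not stated on the slide); the regular expressions and the function `tokenize` are illustrative and are not how Shift tokenizes.

```python
import re

# (token name, token type, pattern); names and types follow the slide's table.
TOKEN_SPEC = [
    ("VAR", "Keyword", r"\bvar\b"),
    ("NUMBER", "NumericLiteral", r"\d+(?:\.\d+)?"),
    ("IDENTIFIER", "Ident", r"[A-Za-z_$][A-Za-z0-9_$]*"),
    ("ASSIGN", "Punctuator", r"="),
    ("DIV", "Punctuator", r"/"),
    ("SKIP", None, r"[\s;]+"),  # simplification: whitespace and ';' are dropped
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, _, rx in TOKEN_SPEC))
TYPES = {name: kind for name, kind, _ in TOKEN_SPEC}


def tokenize(source):
    """Return a list of (token name, token type, text) triples."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, TYPES[match.lastgroup], match.group()))
        pos = match.end()
    return tokens


tokens = tokenize("var foo = 1 / 0;")
```

The keyword pattern is listed before the identifier pattern so that `var` is not classified as an identifier.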
29. Code Processing Steps
Abstract Syntax Tree (AST)
– Tree representation of the grammar structure of the sequence of tokens.
[Figure: processing pipeline (source code → tokenizer → tokens → parser → AST → scope analyzer → ASG) and the AST of the example]
Module
  items: VariableDeclarationStatement
    declaration: VariableDeclaration
      declarators: VariableDeclarator
        binding: BindingIdentifier (name = `foo`)
        init: BinaryExpression (operator = `Div`)
          left: LiteralNumericExpression (value = 1.0)
          right: LiteralNumericExpression (value = 0.0)
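The tree on this slide can be written down as nested dictionaries and searched for a structural pattern. This is a sketch under assumptions: the helper names `node`, `walk` and `divisions_by_zero` are hypothetical, and the division-by-zero rule is only an example of pattern matching, not a rule confirmed by the slides.

```python
def node(type_, **fields):
    """Build an AST node as a plain dict; child keys mirror the edge labels."""
    return {"type": type_, **fields}


ast = node("Module", items=[
    node("VariableDeclarationStatement", declaration=node(
        "VariableDeclaration", declarators=[
            node("VariableDeclarator",
                 binding=node("BindingIdentifier", name="foo"),
                 init=node("BinaryExpression", operator="Div",
                           left=node("LiteralNumericExpression", value=1.0),
                           right=node("LiteralNumericExpression", value=0.0)))
        ]))
])


def walk(n):
    """Yield every node of the tree in depth-first order."""
    yield n
    for child in n.values():
        if isinstance(child, dict):
            yield from walk(child)
        elif isinstance(child, list):
            for item in child:
                if isinstance(item, dict):
                    yield from walk(item)


def divisions_by_zero(root):
    """Match BinaryExpression nodes that divide by the numeric literal 0."""
    return [n for n in walk(root)
            if n["type"] == "BinaryExpression"
            and n["operator"] == "Div"
            and n["right"]["type"] == "LiteralNumericExpression"
            and n["right"]["value"] == 0.0]
```

In the approach described in these slides such patterns are evaluated as graph queries in the database rather than by an in-memory traversal like this one.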
34. Code Processing Steps
Abstract Semantic Graph (ASG)
– Graph, not necessarily a tree.
– Semantic information besides the syntactic structure.
– Contains cross-edges.
[Figure: the example AST extended with the scope analysis nodes GlobalScope, Scope, Variable (name = `foo`), Reference (accessibility = `Write`) and Declaration (kind = `Var`), connected by edges labeled children, variables, references, declarations, node and astNode]
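The difference between a tree and a semantic graph can be shown with a small sketch. The node and edge labels are taken from this slide, but which edge connects which pair of nodes is an assumption of the sketch, and the `Graph` class is a hypothetical helper rather than the framework's data model.

```python
class Graph:
    def __init__(self):
        self.nodes = {}  # node id -> (label, properties)
        self.edges = []  # (source id, edge label, target id)

    def add_node(self, label, **props):
        node_id = len(self.nodes)
        self.nodes[node_id] = (label, props)
        return node_id

    def add_edge(self, src, label, dst):
        self.edges.append((src, label, dst))

    def in_degree(self, node_id):
        return sum(1 for _, _, dst in self.edges if dst == node_id)


g = Graph()

# Syntactic part, abbreviated: the nodes between Module and
# VariableDeclarator are omitted, so "items" stands in for a longer path.
module = g.add_node("Module")
declarator = g.add_node("VariableDeclarator")
ident = g.add_node("BindingIdentifier", name="foo")
g.add_edge(module, "items", declarator)
g.add_edge(declarator, "binding", ident)

# Semantic part (assumed wiring): scope information pointing back into the syntax.
global_scope = g.add_node("GlobalScope")
variable = g.add_node("Variable", name="foo")
reference = g.add_node("Reference", accessibility="Write")
declaration = g.add_node("Declaration", kind="Var")
g.add_edge(global_scope, "astNode", module)
g.add_edge(global_scope, "variables", variable)
g.add_edge(variable, "references", reference)
g.add_edge(variable, "declarations", declaration)
g.add_edge(reference, "node", ident)    # cross-edge
g.add_edge(declaration, "node", ident)  # cross-edge
```

The identifier node ends up with three incoming edges, one syntactic and two semantic, so the structure is no longer a tree.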
39. Overview of the Approach
[Figure: CI workflow with the stages Development, Version Control System, Compilation, Unit and Integration Tests, Static Analysis]
50. Overview of the Approach
[Figure: architecture of the approach, with the technology used for each component]
– Integrated Development Environment and Version Control System: Git, Visual Studio Code
– Code processing (source code → tokenizer → tokens → parser → AST → scope analyzer → ASG): Shape Security Shift
– Transformation: Java, Cypher
– Graph database: Neo4j
– Result processing
56. Use Cases: static analysis
– Searching for local bad smells (linter warnings)
  – without a case
  – Value set more than once
  – Unused variable
– Global rules
  – Unreachable code parts
– Framework
  – Freely extendable
  – User-defined rules
  – Easier to use than visitor pattern solutions
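Two of the local smells listed on this slide can be sketched as checks over variable references. The flattened `(variable name, accessibility)` records and the function `lint` are hypothetical; a real rule would also consider scopes and control flow, which this sketch ignores.

```python
from collections import Counter

# Hypothetical reference records: (variable name, accessibility).
references = [
    ("foo", "Write"),
    ("foo", "Write"),
    ("bar", "Write"),
    ("bar", "Read"),
    ("baz", "Write"),
]


def lint(refs):
    """Report variables that are written more than once or never read."""
    writes = Counter(name for name, access in refs if access == "Write")
    reads = Counter(name for name, access in refs if access == "Read")
    warnings = []
    for name in sorted(writes):
        if writes[name] > 1:
            warnings.append((name, "value set more than once"))
        if reads[name] == 0:
            warnings.append((name, "unused variable"))
    return warnings


warnings = lint(references)
```

Expressed as a declarative pattern over the graph, a rule like this stays a few lines long, which is the advantage over visitor-based implementations that the slide points to.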
67. Use Cases: transformation
Control Flow Graph (CFG)
– Graph representation of every possible statement sequence during code execution.
[Figure: example CFG built from statement nodes, an if node with a condition, a return node and an error node]
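A control flow graph can be sketched as an adjacency mapping, on which unreachable code shows up as nodes that no path from the entry reaches. The graph below and the node names are made up for illustration and do not reproduce the figure on the slide.

```python
from collections import deque

# Hypothetical CFG: node -> list of successor nodes.
cfg = {
    "entry": ["s1"],
    "s1": ["if"],
    "if": ["then", "else"],
    "then": ["return"],
    "else": ["error"],
    "return": [],
    "error": [],
    "dead": ["return"],  # no edge leads to this node
}


def reachable(graph, start):
    """Breadth-first search returning every node reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for succ in graph[current]:
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen


unreachable = sorted(set(cfg) - reachable(cfg, "entry"))
```

Here `unreachable` contains only the `dead` node, which corresponds to the "unreachable code parts" global rule.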
77. Use Cases: test generation
– Inspecting control flows
  – Is the given statement reachable given the constraints on the edges?
  – Which one is the shortest route?
– Producing test input for dynamic testing
[Figure: example CFG built from statement nodes, an if node with a condition, a return node and an error node]
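The shortest route question can be sketched as a breadth-first search that also collects the constraints along the path. The graph, the constraint strings and the function `shortest_route` are hypothetical, and the sketch does not check whether the collected constraints are satisfiable; that would need a constraint solver.

```python
from collections import deque

# Hypothetical CFG: node -> list of (successor, edge constraint or None).
cfg = {
    "entry": [("if", None)],
    "if": [("then", "x > 0"), ("else", "x <= 0")],
    "then": [("return", None)],
    "else": [("check", None)],
    "check": [("error", "y == 0"), ("return", "y != 0")],
    "return": [],
    "error": [],
}


def shortest_route(graph, start, target):
    """Return (nodes, constraints) of a shortest path, or None if unreachable."""
    queue = deque([(start, [start], [])])
    seen = {start}
    while queue:
        current, path, constraints = queue.popleft()
        if current == target:
            return path, constraints
        for succ, constraint in graph[current]:
            if succ not in seen:
                seen.add(succ)
                extra = [constraint] if constraint else []
                queue.append((succ, path + [succ], constraints + extra))
    return None


route, constraints = shortest_route(cfg, "entry", "error")
```

Any input satisfying the collected constraints, for example `x = 0` and `y = 0` in this made-up graph, would drive execution to the `error` node, which is the kind of test input the slide refers to.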
79. Use Cases: type inference
– Supporting dynamically typed languages
  – Python
  – JavaScript / ECMAScript
http://marijnhaverbeke.nl/blog/tern.html
80. Use Cases: impact analysis
– Adapting to the continuous integration workflow
  – Handling multiple branches
  – Following the modifications in a branch
  – File-level incremental granularity
– Giving differential reports to the developers
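A differential report can be sketched as a set comparison between the findings before and after a change. The `(file, rule)` tuples and the function `differential_report` are hypothetical and only illustrate the idea.

```python
def differential_report(before, after):
    """Compare two collections of findings and classify each finding."""
    before, after = set(before), set(after)
    return {
        "fixed": sorted(before - after),
        "introduced": sorted(after - before),
        "unchanged": sorted(before & after),
    }


# Findings are (file, rule) pairs in this sketch.
old = [("a.js", "unused variable"), ("b.js", "division by zero")]
new = [("a.js", "unused variable"), ("c.js", "unreachable code")]
report = differential_report(old, new)
```

A developer would then see only the `introduced` and `fixed` entries for a commit instead of the full list of warnings.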
81. Why Neo4j?
+++
– Quick prototyping
– Supporting transactions
– Great tooling
--
– Not scaling well
– Only disk-based
82. Remarks: MERGE
– MATCH or CREATE
– Great for the lazy
– Can be expensive
– Possible solutions:
  – Less MERGE
    – Separating queries
    – Create first if not present
    – Use MATCH instead of MERGE
  – Prevention
    – Prepare the structure when inserting the data
84. Remarks: if-then-else
– Not a language element in Cypher
– Can be solved with a trick
  – Verrrrrry sloww
– Solution:
  – Two smaller, disjoint cases
93. Remarks: reachability
– Transitive closure without length constraints is slow.
– Transitive closure over a repeating node/edge pattern is only possible using tricks.
[Figure: node A connected to node B by a path of arbitrary length (*)]
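The effect of a length constraint can be sketched outside the database with a level-by-level traversal. The function `reachable_within` and the chain graph are hypothetical; the sketch only illustrates why a bound limits the work, in the same spirit as bounding a variable-length pattern in a query.

```python
def reachable_within(graph, start, max_hops=None):
    """Other nodes reachable from start in at most max_hops edges (None = unbounded)."""
    frontier = {start}
    seen = {start}
    hops = 0
    while frontier and (max_hops is None or hops < max_hops):
        # Expand one level and keep only nodes not visited before.
        frontier = {succ for n in frontier for succ in graph.get(n, ())} - seen
        seen |= frontier
        hops += 1
    return seen - {start}


# Hypothetical chain from A to B with three intermediate nodes.
chain = {"A": ["n1"], "n1": ["n2"], "n2": ["n3"], "n3": ["B"]}
bounded = reachable_within(chain, "A", max_hops=2)
unbounded = reachable_within(chain, "A")
```

With the bound the traversal stops after two levels and never reaches `B`; without it the whole reachable subgraph is explored.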
96. Conclusions
– Source code analyzer framework
  – Searching for global error patterns
  – Close to real time feedback
  – Type inference possible
  – Test input generation possible
  – Approach for both dynamically and statically typed languages
– Using Neo4j for
  – Storing
  – Pattern matching
  – Transforming
  – Version control
  – Storing metadata
97. Project Details
– The framework prototype is open-source.
  https://github.com/ftsrg/codemodel-rifle
– Our work was supported by:
  – ÚNKP*
  – Microsoft Azure for Research
  – MTA-BME Lendület Program
* Supported by the ÚNKP-16-2-I. New National Excellence Program of the Ministry of Human Capacities.
98. Project Details
– Supervisors
  – Ádám Lippai
  – Dávid Honfi
  – Gábor Szárnyas
– Helped my research
  – Tamás Soma Lucz
  – Industrial case study