Plume
A Code Property Graph Extraction and
Analysis Library
S.D. Baker Effendi, A.B. van der Merwe, & W. Visser
Stellenbosch University
Using Code Property Graphs and Pushdown
Systems for Static Analysis
| GRAPHAIWORLD.COM | #GRAPHAIWORLD |
Overview
❏ Introduction to Plume
❏ Background
  ❏ Code Property Graph
  ❏ Data-Flow Analysis
  ❏ Pushdown Systems
❏ How Plume works
❏ The future of Plume
Introduction
❏ Plume is an open-source static analysis library
❏ A code property graph is extracted from JVM bytecode
❏ This code property graph is stored in a graph database backend
❏ Data-flow analysis is run on the graph database using graph queries
❏ Plume is written in Kotlin, which is interoperable with Java
Background
The Code Property Graph
F. Yamaguchi et al. introduced the code property graph (CPG), which merges the
❏ abstract syntax tree (AST),
❏ control flow graph (CFG), and
❏ program dependence graph (PDG)
into a joint data structure.
Illustration of a code property graph from the original paper “Modeling and Discovering Vulnerabilities with Code Property Graphs”
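As a rough illustration of the "joint data structure" idea, the sketch below keeps a single set of nodes and tags every edge with the representation that contributes it, so each original graph can be recovered as a projection. The names CpgSketch, Edge, and EdgeLabel are hypothetical and chosen for this example only; they are not the schema of Yamaguchi et al. or of Plume.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: one node set, with each edge tagged by the
// representation (AST, CFG or PDG) that contributes it.
class CpgSketch {
    enum EdgeLabel { AST, CFG, PDG }

    static final class Edge {
        final String from;
        final String to;
        final EdgeLabel label;

        Edge(String from, String to, EdgeLabel label) {
            this.from = from;
            this.to = to;
            this.label = label;
        }
    }

    private final List<Edge> edges = new ArrayList<Edge>();

    void addEdge(String from, String to, EdgeLabel label) {
        edges.add(new Edge(from, to, label));
    }

    // Project the joint graph back onto a single representation.
    List<Edge> projection(EdgeLabel label) {
        List<Edge> result = new ArrayList<Edge>();
        for (Edge e : edges) {
            if (e.label == label) {
                result.add(e);
            }
        }
        return result;
    }

    // Hand-built graph for the two statements: int a = 3; int c = a + 2;
    static CpgSketch example() {
        CpgSketch g = new CpgSketch();
        g.addEdge("main", "int a = 3", EdgeLabel.AST);          // statements are children of the method
        g.addEdge("main", "int c = a + 2", EdgeLabel.AST);
        g.addEdge("int a = 3", "int c = a + 2", EdgeLabel.CFG); // control flows from the first to the second
        g.addEdge("int a = 3", "int c = a + 2", EdgeLabel.PDG); // the second statement uses a
        return g;
    }

    public static void main(String[] args) {
        CpgSketch g = example();
        for (EdgeLabel label : EdgeLabel.values()) {
            System.out.println(label + " edges: " + g.projection(label).size());
        }
    }
}
```

Because all three edge kinds share the same nodes, a query can mix them, for example following PDG edges only between statements that are also connected in the CFG.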
The Code Property Graph
❏ The CPG is independent of the programming language
❏ Software vulnerabilities can be identified by matching graph patterns of known vulnerabilities against the CPG
❏ ShiftLeft has commercialized the CPG for DevSecOps
Illustration of a CPG projection from ShiftLeft.io
Yamaguchi, Fabian, et al. "Modeling and discovering vulnerabilities with code property graphs." 2014 IEEE Symposium on Security and Privacy. IEEE, 2014.
Data-Flow Analysis
❏ Data-flow analysis is a technique for gathering information about the possible set of values calculated at various points in a program
❏ The control flow graph is used to determine where a particular value might propagate
Sagiv, Mooly, Thomas Reps, and Susan Horwitz. "Precise interprocedural dataflow analysis with applications to constant propagation." Theoretical Computer Science 167.1-2 (1996): 131-170.
The supergraph is annotated with the dataflow functions for the “possibly-uninitialized variables” problem.
Data-Flow Analysis
❏ A procedure is a small section of a program that performs a specific task
❏ Intraprocedural analysis examines a single procedure in isolation
❏ Interprocedural analysis also follows the calling relationships among multiple procedures
❏ Example analyses are:
  ❏ reaching definitions
  ❏ liveness analysis
  ❏ constant propagation
Reps, Thomas, Susan Horwitz, and Mooly Sagiv. "Precise interprocedural dataflow analysis via graph reachability." Proceedings of the 22nd ACM SIGPLAN-SIGACT symposium on Principles of programming languages. 1995.
The exploded super-graph that corresponds to the instance of the possibly-uninitialized variables problem shown in the last figure.
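To make one of the listed analyses concrete, here is a minimal intraprocedural reaching-definitions sketch using a plain worklist iteration over a control flow graph. It is not the IFDS formulation from the cited papers; the class name, the array encoding of the CFG, and the small branching program are assumptions made for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class ReachingDefinitions {
    // defVar[i]: variable defined by statement i, or null if it defines nothing.
    // succ[i]: CFG successors of statement i.
    // Returns, per statement, the definitions (statement indices) reaching its entry.
    static List<Set<Integer>> solve(String[] defVar, int[][] succ) {
        int n = defVar.length;
        List<Set<Integer>> in = new ArrayList<Set<Integer>>();
        Deque<Integer> worklist = new ArrayDeque<Integer>();
        for (int i = 0; i < n; i++) {
            in.add(new TreeSet<Integer>());
            worklist.add(i);
        }
        while (!worklist.isEmpty()) {
            int s = worklist.poll();
            // Transfer function: OUT[s] = gen(s) + (IN[s] - kill(s)).
            Set<Integer> out = new TreeSet<Integer>();
            for (int d : in.get(s)) {
                boolean killed = defVar[s] != null && defVar[s].equals(defVar[d]);
                if (!killed) {
                    out.add(d);
                }
            }
            if (defVar[s] != null) {
                out.add(s);
            }
            // Propagate along CFG edges; revisit a successor only if its IN set grew.
            for (int t : succ[s]) {
                if (in.get(t).addAll(out)) {
                    worklist.add(t);
                }
            }
        }
        return in;
    }

    // A small branching program (statement index = definition id):
    // 0: a = 1   1: b = 2   2: if (a > b)   3: a -= b   4: b -= b   5: b += a   6: exit
    static List<Set<Integer>> example() {
        String[] defVar = {"a", "b", null, "a", "b", "b", null};
        int[][] succ = {{1}, {2}, {3, 5}, {4}, {6}, {6}, {}};
        return solve(defVar, succ);
    }

    public static void main(String[] args) {
        System.out.println("Definitions reaching exit: " + example().get(6));
    }
}
```

At the exit, the definitions from statements 0, 3, 4 and 5 reach: the definition of `b` at statement 1 is overwritten on both branches, while the definition of `a` at statement 0 survives along the else branch.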
Data-Flow Analysis
❏ Reps, Horwitz and Sagiv introduced the IFDS and IDE frameworks, which solve a general class of these problems in polynomial time
❏ E. Bodden created a generic IFDS/IDE solver on top of Soot
❏ This solver made it possible to implement a wider range of analyses, such as typestate and information-flow analysis
Bodden, Eric. "Inter-procedural data-flow analysis with IFDS/IDE and Soot." Proceedings of the ACM SIGPLAN International Workshop on State of the Art in Java Program analysis. 2012.
Exploded super-graph for an IFDS information-flow analysis.
Soot
❏ Soot is a Java optimization framework originally developed by the Sable Research Group of McGill University
❏ Soot provides a range of analyses such as:
  ❏ call-graph construction
  ❏ points-to analysis
  ❏ data-flow analysis with IFDS/IDE
❏ Soot transforms programs into an intermediate representation (IR), which is then analyzed
Soot - A framework for analyzing and transforming Java and Android applications https://soot-oss.github.io/soot
IDE in Typestate Analysis
❏ Typestates define valid sequences of operations that can be performed upon an instance of a given type
❏ Aliasing refers to the situation where the same memory location can be accessed using different names
❏ Späth et al. presented IDEal, an alias-aware extension of the IDE framework that improves the efficiency and precision of typestate analysis
File a = new File();
File b = a;
b.open();
a.close();
Späth, J., Ali, K., & Bodden, E. (2017). IDEal: efficient and precise alias-aware dataflow analysis. Proc. ACM Program. Lang., 1(OOPSLA), 99-1.
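The four-statement snippet above shows why alias information matters for typestate. The following sketch abstractly executes those statements twice, once treating `b` as an alias of `a` and once ignoring the alias. It is a toy model, not IDEal: the File protocol (open only from INIT, close only from OPENED) and all class and method names are assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;

class TypestateSketch {
    enum State { INIT, OPENED, CLOSED, ERROR }

    // Assumed protocol for the slide's File: open() only from INIT, close() only from OPENED.
    static State step(State s, String op) {
        if (op.equals("open") && s == State.INIT) return State.OPENED;
        if (op.equals("close") && s == State.OPENED) return State.CLOSED;
        return State.ERROR;
    }

    // Abstractly executes the four statements and returns the final typestate of `a`.
    static State finalStateOfA(boolean aliasAware) {
        Map<String, Integer> pointsTo = new HashMap<String, Integer>(); // variable -> abstract object
        Map<Integer, State> state = new HashMap<Integer, State>();      // abstract object -> typestate

        pointsTo.put("a", 0);                 // File a = new File();
        state.put(0, State.INIT);

        if (aliasAware) {
            pointsTo.put("b", 0);             // File b = a;  (b aliases a)
        } else {
            pointsTo.put("b", 1);             // alias-unaware: b is treated as a separate object
            state.put(1, State.INIT);
        }

        call(pointsTo, state, "b", "open");   // b.open();
        call(pointsTo, state, "a", "close");  // a.close();
        return state.get(pointsTo.get("a"));
    }

    private static void call(Map<String, Integer> pointsTo, Map<Integer, State> state,
                             String variable, String op) {
        int object = pointsTo.get(variable);
        state.put(object, step(state.get(object), op));
    }

    public static void main(String[] args) {
        System.out.println("alias-aware:   " + finalStateOfA(true));
        System.out.println("alias-unaware: " + finalStateOfA(false));
    }
}
```

With alias information the file ends up CLOSED; without it, `a.close()` appears to close a file that was never opened and the model reports ERROR, which is a false positive.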
Data-Flow Analysis Limitations
Rice’s theorem: any non-trivial semantic property of a program is undecidable.
A semantic property concerns a program’s behaviour, e.g. does the program terminate for all inputs?
To ensure that an analysis terminates, we need to bound the data-flow domain, which ultimately leads to imprecision. One technique is k-limiting: restricting field-sequence access paths to length k.
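A minimal sketch of what limiting an access path to length k can look like is given below. The string representation and the `*` marker for "any further fields" are assumptions made for this example, not the notation of any particular tool.

```java
import java.util.Arrays;
import java.util.List;

class KLimitingSketch {
    // Keeps at most k fields of an access path; "*" stands for "any further fields".
    static String limit(String base, List<String> fields, int k) {
        StringBuilder path = new StringBuilder(base);
        for (int i = 0; i < fields.size() && i < k; i++) {
            path.append('.').append(fields.get(i));
        }
        if (fields.size() > k) {
            path.append(".*");
        }
        return path.toString();
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("next", "next", "next", "value");
        System.out.println(limit("list", fields, 2)); // list.next.next.*
        System.out.println(limit("list", fields, 4)); // list.next.next.next.value
    }
}
```

The truncated path `list.next.next.*` stands for every path that starts with those two fields, so the domain stays finite but distinct longer paths can no longer be told apart.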
If we have an algorithm that decides a non-trivial property, we can
construct a Turing machine that decides the halting problem.
By Booyabazooka - Own work, Public Domain,
https://commons.wikimedia.org/w/index.php?curid=5407483
Pushdown Systems
❏ A pushdown automaton (PDA) is a finite-state automaton with extra memory called a stack
❏ In a pushdown system (PDS), each state is called a control location
❏ This class of automata recognizes the context-free languages (CFLs)
❏ A CFL is generated by a context-free grammar (CFG)
A diagram of a pushdown automaton.
By Jochgem - Own work, CC BY-SA 3.0,
https://commons.wikimedia.org/w/index.php?curid=4983792
Pushdown Systems
❏ Context- and field-sensitivity can be expressed as CFL-reachability problems
❏ Späth et al. introduced the notion of synchronized pushdown systems (SPDS) to efficiently solve any single CFL-reachability problem
❏ An SPDS is a combination of two flow-sensitive pushdown systems: a call-PDS and a field-PDS
Späth, Johannes, Karim Ali, and Eric Bodden. "Context-, flow-, and field-sensitive data-flow analysis using synchronized pushdown systems." Proceedings of the ACM on
Programming Languages 3.POPL (2019): 1-29.
A points-to analysis can be formulated by the reachability problem under the
following Dyck Language:
Yuan, Hao, and Patrick Eugster. "An efficient algorithm for solving the
dyck-cfl reachability problem on trees." European Symposium on
Programming. Springer, Berlin, Heidelberg, 2009.
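The grammar from the slide is not reproduced here, so the following only sketches what membership in a Dyck language means in general: every closing symbol must match the most recent unmatched opening symbol of the same kind, which a stack checks directly. The label encoding ("(1" opens kind 1, ")1" closes it) and the class name are assumptions made for this example.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class DyckSketch {
    // A label is "(k" for an opening parenthesis of kind k, or ")k" for its closing counterpart.
    static boolean isBalanced(List<String> path) {
        Deque<String> stack = new ArrayDeque<String>();
        for (String label : path) {
            String kind = label.substring(1);
            if (label.charAt(0) == '(') {
                stack.push(kind);                       // remember the open parenthesis
            } else if (stack.isEmpty() || !stack.pop().equals(kind)) {
                return false;                           // closes something that is not on top
            }
        }
        return stack.isEmpty();                         // nothing may stay open
    }

    public static void main(String[] args) {
        System.out.println(isBalanced(Arrays.asList("(1", "(2", ")2", ")1"))); // true
        System.out.println(isBalanced(Arrays.asList("(1", ")2")));             // false
    }
}
```

In a data-flow setting, the kinds would typically be call sites (an opening label at a call, the matching closing label at its return) or fields (a store opens, a load of the same field closes).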
Pushdown System of Calls
Späth, Johannes, Karim Ali, and Eric Bodden. "Context-, flow-, and field-sensitive data-flow analysis using synchronized pushdown systems." Proceedings of the ACM on
Programming Languages 3.POPL (2019): 1-29.
Data-flow example for a simple recursive program.
Automaton computed with the post* algorithm.
The structure of a call-PDS:
❏ Control locations are program variables
❏ The stack alphabet is the set of program statements
❏ The rule set models the data-flow effect of a variable at a statement
This automaton provides context-sensitivity.
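The structure listed above can be sketched with the three standard shapes of pushdown rules: a normal rule replaces the top stack symbol, a push rule replaces it with two symbols (modelling a call that records its return site), and a pop rule removes it (modelling a return). The program, the statement names s1, s2, entry and exit, and the class names are hypothetical; this applies individual rules by hand and is not the post* algorithm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class CallPdsSketch {
    // A configuration pairs a control location (a variable) with a stack of statements.
    static final class Config {
        final String variable;
        final List<String> stack; // index 0 is the top of the stack

        Config(String variable, List<String> stack) {
            this.variable = variable;
            this.stack = stack;
        }

        @Override
        public String toString() {
            return "<" + variable + ", " + stack + ">";
        }
    }

    // Rule <from, top> -> <to, replacement>: the replacement has one symbol (normal),
    // two symbols (push, models a call) or no symbol (pop, models a return).
    static final class Rule {
        final String from;
        final String top;
        final String to;
        final List<String> replacement;

        Rule(String from, String top, String to, List<String> replacement) {
            this.from = from;
            this.top = top;
            this.to = to;
            this.replacement = replacement;
        }
    }

    static Config apply(Rule r, Config c) {
        if (c.stack.isEmpty() || !c.variable.equals(r.from) || !c.stack.get(0).equals(r.top)) {
            throw new IllegalArgumentException("rule not applicable to " + c);
        }
        List<String> stack = new ArrayList<String>(r.replacement);
        stack.addAll(c.stack.subList(1, c.stack.size()));
        return new Config(r.to, stack);
    }

    // Hypothetical program: y = foo(x) at statement s1, returning to s2;
    // foo(p) { r = p; return r; } with statements "entry" and "exit".
    static Config example() {
        Config c = new Config("x", Arrays.asList("s1"));
        c = apply(new Rule("x", "s1", "p", Arrays.asList("entry", "s2")), c);      // call: push return site s2
        c = apply(new Rule("p", "entry", "r", Arrays.asList("exit")), c);           // normal: r = p inside foo
        c = apply(new Rule("r", "exit", "y", Collections.<String>emptyList()), c);  // return: pop back to s2
        return c;
    }

    public static void main(String[] args) {
        System.out.println(example()); // <y, [s2]>
    }
}
```

Because the return site stays on the stack while the callee is analysed, the value flows back only to the call site it came from, which is what context-sensitivity means here.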
Pushdown System of Fields
The structure of a field-PDS:
❏ Each control location is a pair of a variable and a statement
❏ The stack alphabet is the set of all fields of a program
❏ The rule set models the data-flow within the access paths
This automaton provides field-sensitivity.
Späth, Johannes, Karim Ali, and Eric Bodden. "Context-, flow-, and field-sensitive data-flow analysis using synchronized pushdown systems." Proceedings of the ACM on
Programming Languages 3.POPL (2019): 1-29.
Data-flow example for a simple if-else statement with field accesses.
Automaton computed with the post* algorithm.
Synchronized Pushdown Systems
                     Pushdown System of Calls   Pushdown System of Fields   SPDS
Flow-sensitive                  ✔                          ✔                 ✔
Context-sensitive               ✔                          ✘                 ✔
Field-sensitive                 ✘                          ✔                 ✔
Späth, Johannes, Karim Ali, and Eric Bodden. "Context-, flow-, and field-sensitive data-flow analysis using synchronized pushdown systems." Proceedings of the ACM on
Programming Languages 3.POPL (2019): 1-29.
Both pushdown systems can answer reachability queries and handle recursive structures.
Each PDS has a precision advantage over the other, so combining them yields the precision benefits of both.
SPDS Advantages
Späth, Johannes, Karim Ali, and Eric Bodden. "Context-, flow-, and field-sensitive data-flow analysis using synchronized pushdown systems." Proceedings of the ACM on
Programming Languages 3.POPL (2019): 1-29.
❏ The PDA of fields is a concise and finite representation of (potentially infinitely many) access paths
❏ There is no need to resort to k-limiting, which preserves precision
❏ In pointer analysis, SPDS avoids exponential growth of the abstract domain by using a PDS-based encoding
❏ Typestate information can be encoded as weights on either of the PDAs
A PDA of fields and its finite representation of an infinite set of
access paths.
SPDS Limitations
Späth, Johannes, Karim Ali, and Eric Bodden. "Context-, flow-, and field-sensitive data-flow analysis using synchronized pushdown systems." Proceedings of the ACM on
Programming Languages 3.POPL (2019): 1-29.
SPDS over-approximates in corner cases where a context-insensitive data-flow path occurs at the same time as a field-sensitive path, or vice versa.
These cases typically occur only in synthetic examples and, based on the empirical evaluation by Späth et al., do not arise in practice.
Thus, an improperly matched call site does not induce a properly matched field access.
Back to Plume
Features of Plume
Code Property Graph + Synchronized Pushdown Systems + Graph Database =
❏ Language-independent analysis on the CPG
❏ Flow-, context-, and field-sensitive, alias-aware data-flow analysis
❏ The ability to perform static analysis incrementally and store results in the graph database
❏ Partial updates to the CPG when source code is updated
❏ Scaling to large programs by leveraging a graph database backend
How does Plume work?
Plume is a Kotlin library divided into three parts:
❏ Driver: connects to the database of choice
❏ Extractor: creates a CPG from bytecode
❏ Analyzer: performs data-flow analysis on the CPG
The three parts reflect the separation of concerns between the different stages and requirements of the CPG-driven analysis pipeline.
[Diagram: .java, .py and .js sources; stages labelled Connect to Graph Database, Extract Code Property Graph, and Analyze Code Property Graph. Graph icons from graph theory tree by Ecem Afacan from the Noun Project.]
How does Plume communicate?
Plume’s driver aims to be graph database agnostic, so that all supported graph databases can eventually be benchmarked against each other when applied to data-flow analysis.
The driver provides a generic interface through which the extractor and analyzer interact with the database.
More graph databases are to be supported in the future.
<<interface>> IDriver
+ exists(PlumeVertex): boolean
+ addVertex(PlumeVertex)
+ addEdge(PlumeVertex, PlumeVertex, EdgeType)
...
Supported backends: TinkerGraph, JanusGraph, TigerGraph, Amazon Neptune
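The interface on this slide can be pictured roughly as follows. Only the three method names and their parameter types come from the slide; the `void` return types, the stand-in `PlumeVertex` and `EdgeType` definitions, and the in-memory backend are assumptions made so the sketch compiles on its own. Plume itself is written in Kotlin, so the real declarations will differ.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DriverSketch {
    // Stand-ins for Plume's types; their real definitions are not shown on the slide.
    enum EdgeType { AST, CFG }

    static final class PlumeVertex {
        final String label;

        PlumeVertex(String label) {
            this.label = label;
        }
    }

    // The three operations listed on the slide; void return types are assumed.
    interface IDriver {
        boolean exists(PlumeVertex v);
        void addVertex(PlumeVertex v);
        void addEdge(PlumeVertex from, PlumeVertex to, EdgeType type);
    }

    // Hypothetical in-memory backend, standing in for a real graph database.
    static final class InMemoryDriver implements IDriver {
        private final Set<PlumeVertex> vertices = new HashSet<PlumeVertex>();
        private final List<String> edges = new ArrayList<String>();

        public boolean exists(PlumeVertex v) {
            return vertices.contains(v);
        }

        public void addVertex(PlumeVertex v) {
            vertices.add(v);
        }

        public void addEdge(PlumeVertex from, PlumeVertex to, EdgeType type) {
            // Assumption: missing endpoints are created on demand.
            vertices.add(from);
            vertices.add(to);
            edges.add(from.label + " -" + type + "-> " + to.label);
        }

        List<String> edges() {
            return edges;
        }
    }

    public static void main(String[] args) {
        InMemoryDriver driver = new InMemoryDriver();
        PlumeVertex a = new PlumeVertex("int a = 3");
        PlumeVertex b = new PlumeVertex("int b = 2");
        driver.addVertex(a);
        driver.addEdge(a, b, EdgeType.CFG);
        System.out.println(driver.exists(b) + " " + driver.edges());
    }
}
```

Keeping the extractor and analyzer dependent only on such an interface is what would allow one backend to be swapped for another without touching the rest of the pipeline.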
Plume’s Extraction Process
❏ Soot is used to convert JVM bytecode to an IR called Jimple
❏ Jimple is based on three-address code and only uses 15 different operations
❏ Jimple is then converted into Soot’s UnitGraph and CallGraph objects
❏ The extractor converts these two objects into a code property graph
❏ Plume supports compiling Python 2.7 and JavaScript 1.7 into JVM bytecode using Jython and Mozilla Rhino respectively
[Pipeline diagram: convert source code (.java, .py, .js) to class files (.class); extract Jimple (.jimple) and graphs using Soot; store CPG in database. Graph icons from graph theory tree by Ecem Afacan from the Noun Project.]
Example
package intraprocedural.basic;
public class Basic1 {
    public static void main(String[] args) {
        int a = 3;
        int b = 2;
        int c = a + b;
    }
}
Example
package intraprocedural.conditional;
public class Conditional1 {
    public static void main(String[] args) {
        int a = 1;
        int b = 2;
        if (a > b) {
            a -= b;
            b -= b;
        } else {
            b += a;
        }
    }
}
Plume Roadmap
What Plume can do
❏ Generate an intraprocedural code property graph
❏ Connect to TinkerGraph, JanusGraph, TigerGraph, and Amazon Neptune
❏ Compile Java, Python 2.7 and JavaScript 1.7 code
Plans for Plume
❏ Add interprocedural edges
❏ Add Neo4j support
❏ Implement interprocedural data-flow analysis algorithms
❏ Investigate soundness of analysis for dynamic vs static languages
❏ Investigate the use of GCNNs for vulnerability detection
Try it out!
Examples Repository
https://github.com/plume-oss/plume-examples
Documentation
https://plume-oss.github.io/plume-docs/
Plume GitHub
https://github.com/plume-oss
