SlideShare a Scribd company logo
1 of 29
Download to read offline
Elegant and Scalable Code Querying with
Code Property Graphs
Fabian Yamaguchi
October 4, 2019, Connected Data London
Fabian Yamaguchi
Chief Scientist, ShiftLeft Inc.
Github: fabsx00
Twitter: @fabsx00
Email: fabs@shiftleft.io
PhD, University of Goettingen
Loves code analysis, vulnerability
discovery, machine learning, and
graphs!
What is Ocular? What is Joern?
- Goal: provide query language to describe patterns in code
- to identify bugs and vulnerabilities
- to help in deeply understanding large programs
- Think of it as an extensible Code Analysis Machine
- Programmable in JVM-based languages (e.g., Java/Scala/Kotlin)
- You can write scripts, language extensions and libraries on top of it
- Joern is Ocular’s free-spirited open-source brother
This talk is about the technology behind these engines
Low level graph representations of programs
- Each graph provides a different perspective on the code
- Can we merge them?
Abstract Syntax Trees
Control
flow graphs
Program dependence graphs
Dominator
tree
Combining graphs with “Property Graphs”
- “A property graph is a directed edge-labeled, attributed multigraph”
- Attributes allow data to be stored in nodes/edges
- Edge labels allow different types of relations to be present in one graph
Code Property Graph (2014) - rudimentary concept
presented in
Specification - Key Design Ideas
- Specification that works over programming languages
- Delay as much of the graph creation to second-stage
- Provide generic representation for core programming language concepts
- Methods/Functions
- Types
- Namespaces
- Instructions
- Call sites
- Encode control flow structures only via a control flow graph
- Model only local program properties and leave global program
representations for later analysis stages
OSS Specification for first-stage code property graph
https://github.com/ShiftLeftSecurity/codepropertygraph
A “container” for code over arbitrary instruction sets
- Define only a common format for
representing code
- Allow arbitrary instruction set (given
by semantics) as a parameter
- Represent all code using only
- call sites and method stubs
- call edges, and control flow edges
- data-flow semantics via data flow edges
Second stage: “linking”
jvm2cpg
csharp2cpg
llvm2cpg
...
go2cpg
Local analysis
(language
dependent)
Analyzer
(non)-interactive Report
cpg.bin.zip
Shared operations and “linking” steps
cpg2scpg
scpg.bin.zip
First level
(unlinked) CPG
Second level
(enhanced and linked) CPG
Base layer of the code property graph
- Production quality version of 2014 code property graph
- Language-independent intermediate representation of control-flow and
data-flow semantics
- Interprocedural, flow-sensitive, context-sensitive, field-sensitive data-flow
tracker available that operates on this representation
- Heuristics and street smarts to terminate in < 10 minutes
This is already pretty powerful...
- Entry names of archives
must be checked for control
characters (e.g., “../”) to
avoid path traversal
- First presented in phrack in ‘91
- Reported again with shiny logo
as “Zip-Slip” by startup in 2018
- https://www.youtube.com/watc
h?v=Ry_yb5Oipq0 for details
Typical vulnerability type associated with Zip files
- Names of archive entries are used unchecked as path
names when opening a file
- Concrete example: return values of ZipEntry.getName are
used as path names in a java.io.File constructor
File f = new File(entry.getName())
Sufficient to model control flow, data flow, and method
invocations correctly.
How does this surface in Java code?
Query
ocular> val src = cpg.method.fullName(“java.util.zip.*getName.*”).methodReturn
val snk = cpg.method.fullName(“.*File.<init>”).parameter.index(1)
snk.reachableBy(src).flows.passesNot(“.*escape|sanitize|contains.*”)
- “XML External Entity Processing” vulnerability
- The attacker controls an XML file processed by the application and
it does NOT disable resolving of external entities
- If this is given, the attacker can provide an XML file that reads and sends
sensitive files to the attacker as part of the parsing process
- OWASP cheat sheet says:
Typical XML-related vulnerability: XXE
Query
ocular> val snk = cpg.method.fullName(“.*XMLStreamReader.*next.*”)
.parameter.index(0)
val src =
cpg.method.fullName(“.*XMLInputFactory.*create.*”).methodReturn
snk.reachableBy(src).flows.l.andNot{
// No calls to setProperty
cpg.method.name(“setProperty”).fullName(“.*XML.*”).callIn.l
}
… but not powerful enough.
Higher level abstractions - Example: Microservice
Program
Java Platform “File”
“Database”
Java Platform
GET /foo/bar?...
{“foo” : “bar”, ...}
“Route”
“Handler”“request”
“Json response”
“query”
<form action=”/foo/bar”>
<input type=”hidden”
name=”csrfprotect”>
...
<input name=”foo”
htmlEscaped=”true”>
</form>
“Input field”
“HTML
attribute”
“Form”
“Information flow”
Humans tend to describe vulnerabilities on much
higher levels of abstraction!
Higher-level abstractions
Handlers/Routes (Vert.x framework)
@Override
public void start(Future<Void> done) {
// Create a router object.
Router router = Router.router(vertx);
router.get("/health").handler(rc -> rc.response().end("OK"));
// This is how one can do RBAC, e.g.: only admin is allowed
router.get("/greeting").handler(ctx ->
ctx.user().isAuthorized("booster-admin", authz -> {
if (authz.succeeded() && authz.result()) {
ctx.next();
} else {
log.error("AuthZ failed!");
ctx.fail(403);
}
}));
…
}
- Lambda expression flows into
first argument of `handler`
- Route (as string) flows into first
argument of `get`.
- Return value of `get` flows into
instance parameter of
`handler` (or the other way
around)
Extraction of handlers/routes
val HANDLER_METHOD_NAME = "io.vertx.ext.web.Route.handler:io.vertx.ext.web.Route(io.vertx.core.Handler)"
val firstParamOfHandler = cpg.method.fullNameExact(HANDLER_METHOD_NAME).parameter.index(1)
val flowsOfMethodRefsIntoHandler = firstParamOfHandler.reachableBy(cpg.methodRef).flows.l
val handlerNameRoutePairs = flowsOfMethodRefsIntoHandler.map { flow =>
val handlerName = flow.source.node.asInstanceOf[nodes.MethodRef].methodInstFullName
val route = routeForFlowIntoHandler(flow)(handlerName, route)
(handlerName, route)
}
def routeForFlowIntoHandler(flow : NewFlow) = {
val getOrPostSource = cpg.method.fullName("io.vertx.*Router.*(get|post):.*").methodReturn
Flow.sink.callsite
.map(x => x.asInstanceOf[nodes.Call]).get
.start.argument(0).reachableBy(getOrPostSource)
.flows.source.l
.flatMap(x => x.callsite.get.asInstanceOf[nodes.Call].start.argument(1).code.l)
.headOption.getOrElse("")
}
Creating DOM Trees from JSP files in a CPG Pass
class JspPass(cpg: Cpg) extends CpgPass(cpg) {
override def run(): Iterator[DiffGraph] = {
Queries.configFiles(cpg, ".jsp").toList.sorted.map {
case (filename, html) =>
val diffGraph = new DiffGraph
val rootCpgNode =
createAndAddTreesRecursively(
Jsoup.parse(html).root(), None, diffGraph)
cpg.configfile.nameExact(filename).headOption.foreach(
diffGraph.addEdgeFromOriginal(_,
rootCpgNode, EdgeTypes.CONTAINS))
diffGraph
}.toIterator
}
def createAndAddTreesRecursively(node: Node,
maybeParent: Option[NewDomNode],
diffGraph: DiffGraph): NewDomNode = {
implicit val dstGraph: DiffGraph = diffGraph
val attributes = node.attributes().asScala.map { attr =>
NewDomAttribute(attr.getKey, attr.getValue) }.toList
// Add node + attribute nodes
val newDomNode = NewDomNode(node.nodeName,
attributes)
diffGraph.addNode(newDomNode)
newDomNode.start.store()
// Add AST edge from parent to node
maybeParent.foreach(
diffGraph.addEdge(_, newDomNode, EdgeTypes.AST)
)
node.childNodes.asScala.foreach(
createAndAddTreesRecursively(_,
Some(newDomNode),
diffGraph))
newDomNode
}
}
Query for XSS detection using DOM + Bytecode
ocular> val src = cpg.form.method(“post”).input
.filterNot(_.attribute.name(“htmlEscaped”))
.variable
val snk = … // some output routine in HTML
snk.reachableBy(src).flows.l
Transition from
HTML DOM into
Java Code!
- Literature deals a lot with FPs due to model limitations, e.g.,
overtainting of collections, conservative call graphs, ...
- In practice, most FPs result from context information, e.g.,
information about the business logic, that you cannot
deduce from the code alone:
- “This is an internal service that only our admin uses”
- “Without first convincing the authentication server, this code would
never be executed”
- “Due to $aliens, this integer is always 5 and thus cannot be negative”
- Ability to model the $aliens part is crucial to reduce false
positives
- We do this mostly via passes that tag the graph
Outside information, business logic, and FP reduction
Code Property Graphs Today
Base layer - low level local program
representations: syntax, control flow,
methods, types.
Vulnerabilities
Multiple domain-specific layers [...]
Call graph, type hierarchy, data
flows, configurations, dependencies
Scaling static analysis
- Summaries
- Scaling static analysis requires “summaries” of program behavior (in order to skip duplicate
calculation of facts, e.g., for library methods)
- Calculating summaries for data flow is common practice
- Upper layers of the CPG generalize the concept of a summary
- Parallelism
- Processors aren’t getting much faster, but you’re getting more and more cores.
- Literature has very little to say about multiple cores, let alone multiple cloud instances
- CPG passes are a design with parallelism in mind
Designed for distributed computing
- Passes can be run in a sequence like the passes of a compiler
- The design also allows to run independent passes in parallel though!
End of presentation
@ShiftLeftInc @fabsx00
https://shiftleft.io
https://joern.io

More Related Content

What's hot

AEM GEMs Session Oak Lucene Indexes
AEM GEMs Session Oak Lucene IndexesAEM GEMs Session Oak Lucene Indexes
AEM GEMs Session Oak Lucene IndexesAdobeMarketingCloud
 
Dapr: distributed application runtime
Dapr: distributed application runtimeDapr: distributed application runtime
Dapr: distributed application runtimeMoaid Hathot
 
Curso practico-de-javascript
Curso practico-de-javascriptCurso practico-de-javascript
Curso practico-de-javascriptManuel Zarate
 
Java agents for fun and (not so much) profit
Java agents for fun and (not so much) profitJava agents for fun and (not so much) profit
Java agents for fun and (not so much) profitRobert Munteanu
 
Cloud Native Java GraalVM 이상과 현실
Cloud Native Java GraalVM 이상과 현실Cloud Native Java GraalVM 이상과 현실
Cloud Native Java GraalVM 이상과 현실Taewan Kim
 
CRX2Oak - all the secrets of repository migration
CRX2Oak - all the secrets of repository migrationCRX2Oak - all the secrets of repository migration
CRX2Oak - all the secrets of repository migrationTomasz Rękawek
 
Centralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stackCentralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stackRich Lee
 
Ruxcon 2014 - Stefan Esser - iOS8 Containers, Sandboxes and Entitlements
Ruxcon 2014 - Stefan Esser - iOS8 Containers, Sandboxes and EntitlementsRuxcon 2014 - Stefan Esser - iOS8 Containers, Sandboxes and Entitlements
Ruxcon 2014 - Stefan Esser - iOS8 Containers, Sandboxes and EntitlementsStefan Esser
 
Network Security and Visibility through NetFlow
Network Security and Visibility through NetFlowNetwork Security and Visibility through NetFlow
Network Security and Visibility through NetFlowLancope, Inc.
 
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Lucidworks
 
(PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014
(PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014(PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014
(PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014Amazon Web Services
 
Jena – A Semantic Web Framework for Java
Jena – A Semantic Web Framework for JavaJena – A Semantic Web Framework for Java
Jena – A Semantic Web Framework for JavaAleksander Pohl
 

What's hot (20)

Tools kali
Tools kaliTools kali
Tools kali
 
AEM GEMs Session Oak Lucene Indexes
AEM GEMs Session Oak Lucene IndexesAEM GEMs Session Oak Lucene Indexes
AEM GEMs Session Oak Lucene Indexes
 
Dapr: distributed application runtime
Dapr: distributed application runtimeDapr: distributed application runtime
Dapr: distributed application runtime
 
Curso practico-de-javascript
Curso practico-de-javascriptCurso practico-de-javascript
Curso practico-de-javascript
 
Java agents for fun and (not so much) profit
Java agents for fun and (not so much) profitJava agents for fun and (not so much) profit
Java agents for fun and (not so much) profit
 
Logging with log4j v1.2
Logging with log4j v1.2Logging with log4j v1.2
Logging with log4j v1.2
 
NMap
NMapNMap
NMap
 
Node js for beginners
Node js for beginnersNode js for beginners
Node js for beginners
 
Cloud Native Java GraalVM 이상과 현실
Cloud Native Java GraalVM 이상과 현실Cloud Native Java GraalVM 이상과 현실
Cloud Native Java GraalVM 이상과 현실
 
Nmap tutorial
Nmap tutorialNmap tutorial
Nmap tutorial
 
CRX2Oak - all the secrets of repository migration
CRX2Oak - all the secrets of repository migrationCRX2Oak - all the secrets of repository migration
CRX2Oak - all the secrets of repository migration
 
Centralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stackCentralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stack
 
Ruxcon 2014 - Stefan Esser - iOS8 Containers, Sandboxes and Entitlements
Ruxcon 2014 - Stefan Esser - iOS8 Containers, Sandboxes and EntitlementsRuxcon 2014 - Stefan Esser - iOS8 Containers, Sandboxes and Entitlements
Ruxcon 2014 - Stefan Esser - iOS8 Containers, Sandboxes and Entitlements
 
Network Security and Visibility through NetFlow
Network Security and Visibility through NetFlowNetwork Security and Visibility through NetFlow
Network Security and Visibility through NetFlow
 
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
 
(PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014
(PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014(PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014
(PFC306) Performance Tuning Amazon EC2 Instances | AWS re:Invent 2014
 
Nmap and metasploitable
Nmap and metasploitableNmap and metasploitable
Nmap and metasploitable
 
Jena – A Semantic Web Framework for Java
Jena – A Semantic Web Framework for JavaJena – A Semantic Web Framework for Java
Jena – A Semantic Web Framework for Java
 
Local storage
Local storageLocal storage
Local storage
 
Json Tutorial
Json TutorialJson Tutorial
Json Tutorial
 

Similar to Elegant and Scalable Code Querying with Code Property Graphs

Alberto Maria Angelo Paro - Isomorphic programming in Scala and WebDevelopmen...
Alberto Maria Angelo Paro - Isomorphic programming in Scala and WebDevelopmen...Alberto Maria Angelo Paro - Isomorphic programming in Scala and WebDevelopmen...
Alberto Maria Angelo Paro - Isomorphic programming in Scala and WebDevelopmen...Codemotion
 
Porting VisualWorks code to Pharo
Porting VisualWorks code to PharoPorting VisualWorks code to Pharo
Porting VisualWorks code to PharoESUG
 
Apache Big Data Europe 2016
Apache Big Data Europe 2016Apache Big Data Europe 2016
Apache Big Data Europe 2016Tim Ellison
 
Lessons Learnt from Running Thousands of On-demand Spark Applications
Lessons Learnt from Running Thousands of On-demand Spark ApplicationsLessons Learnt from Running Thousands of On-demand Spark Applications
Lessons Learnt from Running Thousands of On-demand Spark ApplicationsItai Yaffe
 
JSUG - Filthy Flex by Christoph Pickl
JSUG - Filthy Flex by Christoph PicklJSUG - Filthy Flex by Christoph Pickl
JSUG - Filthy Flex by Christoph PicklChristoph Pickl
 
ClojureScript - Making Front-End development Fun again - John Stevenson - Cod...
ClojureScript - Making Front-End development Fun again - John Stevenson - Cod...ClojureScript - Making Front-End development Fun again - John Stevenson - Cod...
ClojureScript - Making Front-End development Fun again - John Stevenson - Cod...Codemotion
 
A look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutionsA look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutionsDatabricks
 
Machine intelligence in HR technology: resume analysis at scale - Adrian Mihai
Machine intelligence in HR technology: resume analysis at scale - Adrian MihaiMachine intelligence in HR technology: resume analysis at scale - Adrian Mihai
Machine intelligence in HR technology: resume analysis at scale - Adrian MihaiSebastian Ruder
 
WebNet Conference 2012 - Designing complex applications using html5 and knock...
WebNet Conference 2012 - Designing complex applications using html5 and knock...WebNet Conference 2012 - Designing complex applications using html5 and knock...
WebNet Conference 2012 - Designing complex applications using html5 and knock...Fabio Franzini
 
Construction Techniques For Domain Specific Languages
Construction Techniques For Domain Specific LanguagesConstruction Techniques For Domain Specific Languages
Construction Techniques For Domain Specific LanguagesThoughtWorks
 
JCConf 2022 - New Features in Java 18 & 19
JCConf 2022 - New Features in Java 18 & 19JCConf 2022 - New Features in Java 18 & 19
JCConf 2022 - New Features in Java 18 & 19Joseph Kuo
 
PHP. Trends, implementations, frameworks and solutions
PHP. Trends, implementations, frameworks and solutionsPHP. Trends, implementations, frameworks and solutions
PHP. Trends, implementations, frameworks and solutionsOleg Zinchenko
 
Thug: a new low-interaction honeyclient
Thug: a new low-interaction honeyclientThug: a new low-interaction honeyclient
Thug: a new low-interaction honeyclientAngelo Dell'Aera
 
Server Side Javascript
Server Side JavascriptServer Side Javascript
Server Side Javascriptrajivmordani
 
A Java Implementer's Guide to Better Apache Spark Performance
A Java Implementer's Guide to Better Apache Spark PerformanceA Java Implementer's Guide to Better Apache Spark Performance
A Java Implementer's Guide to Better Apache Spark PerformanceTim Ellison
 
The magic behind your Lyft ride prices: A case study on machine learning and ...
The magic behind your Lyft ride prices: A case study on machine learning and ...The magic behind your Lyft ride prices: A case study on machine learning and ...
The magic behind your Lyft ride prices: A case study on machine learning and ...Karthik Murugesan
 

Similar to Elegant and Scalable Code Querying with Code Property Graphs (20)

Alberto Maria Angelo Paro - Isomorphic programming in Scala and WebDevelopmen...
Alberto Maria Angelo Paro - Isomorphic programming in Scala and WebDevelopmen...Alberto Maria Angelo Paro - Isomorphic programming in Scala and WebDevelopmen...
Alberto Maria Angelo Paro - Isomorphic programming in Scala and WebDevelopmen...
 
Porting VisualWorks code to Pharo
Porting VisualWorks code to PharoPorting VisualWorks code to Pharo
Porting VisualWorks code to Pharo
 
Apache Big Data Europe 2016
Apache Big Data Europe 2016Apache Big Data Europe 2016
Apache Big Data Europe 2016
 
Lessons Learnt from Running Thousands of On-demand Spark Applications
Lessons Learnt from Running Thousands of On-demand Spark ApplicationsLessons Learnt from Running Thousands of On-demand Spark Applications
Lessons Learnt from Running Thousands of On-demand Spark Applications
 
Linked Process
Linked ProcessLinked Process
Linked Process
 
JSUG - Filthy Flex by Christoph Pickl
JSUG - Filthy Flex by Christoph PicklJSUG - Filthy Flex by Christoph Pickl
JSUG - Filthy Flex by Christoph Pickl
 
ClojureScript - Making Front-End development Fun again - John Stevenson - Cod...
ClojureScript - Making Front-End development Fun again - John Stevenson - Cod...ClojureScript - Making Front-End development Fun again - John Stevenson - Cod...
ClojureScript - Making Front-End development Fun again - John Stevenson - Cod...
 
A look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutionsA look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutions
 
Machine intelligence in HR technology: resume analysis at scale - Adrian Mihai
Machine intelligence in HR technology: resume analysis at scale - Adrian MihaiMachine intelligence in HR technology: resume analysis at scale - Adrian Mihai
Machine intelligence in HR technology: resume analysis at scale - Adrian Mihai
 
WebNet Conference 2012 - Designing complex applications using html5 and knock...
WebNet Conference 2012 - Designing complex applications using html5 and knock...WebNet Conference 2012 - Designing complex applications using html5 and knock...
WebNet Conference 2012 - Designing complex applications using html5 and knock...
 
Construction Techniques For Domain Specific Languages
Construction Techniques For Domain Specific LanguagesConstruction Techniques For Domain Specific Languages
Construction Techniques For Domain Specific Languages
 
JCConf 2022 - New Features in Java 18 & 19
JCConf 2022 - New Features in Java 18 & 19JCConf 2022 - New Features in Java 18 & 19
JCConf 2022 - New Features in Java 18 & 19
 
PHP. Trends, implementations, frameworks and solutions
PHP. Trends, implementations, frameworks and solutionsPHP. Trends, implementations, frameworks and solutions
PHP. Trends, implementations, frameworks and solutions
 
Kamailio Updates - VUC 588
Kamailio Updates - VUC 588Kamailio Updates - VUC 588
Kamailio Updates - VUC 588
 
Test2
Test2Test2
Test2
 
Test2
Test2Test2
Test2
 
Thug: a new low-interaction honeyclient
Thug: a new low-interaction honeyclientThug: a new low-interaction honeyclient
Thug: a new low-interaction honeyclient
 
Server Side Javascript
Server Side JavascriptServer Side Javascript
Server Side Javascript
 
A Java Implementer's Guide to Better Apache Spark Performance
A Java Implementer's Guide to Better Apache Spark PerformanceA Java Implementer's Guide to Better Apache Spark Performance
A Java Implementer's Guide to Better Apache Spark Performance
 
The magic behind your Lyft ride prices: A case study on machine learning and ...
The magic behind your Lyft ride prices: A case study on machine learning and ...The magic behind your Lyft ride prices: A case study on machine learning and ...
The magic behind your Lyft ride prices: A case study on machine learning and ...
 

More from Connected Data World

Systems that learn and reason | Frank Van Harmelen
Systems that learn and reason | Frank Van HarmelenSystems that learn and reason | Frank Van Harmelen
Systems that learn and reason | Frank Van HarmelenConnected Data World
 
Graph Abstractions Matter by Ora Lassila
Graph Abstractions Matter by Ora LassilaGraph Abstractions Matter by Ora Lassila
Graph Abstractions Matter by Ora LassilaConnected Data World
 
Κnowledge Architecture: Combining Strategy, Data Science and Information Arch...
Κnowledge Architecture: Combining Strategy, Data Science and Information Arch...Κnowledge Architecture: Combining Strategy, Data Science and Information Arch...
Κnowledge Architecture: Combining Strategy, Data Science and Information Arch...Connected Data World
 
How to get started with Graph Machine Learning
How to get started with Graph Machine LearningHow to get started with Graph Machine Learning
How to get started with Graph Machine LearningConnected Data World
 
The years of the graph: The future of the future is here
The years of the graph: The future of the future is hereThe years of the graph: The future of the future is here
The years of the graph: The future of the future is hereConnected Data World
 
From Taxonomies and Schemas to Knowledge Graphs: Parts 1 & 2
From Taxonomies and Schemas to Knowledge Graphs: Parts 1 & 2From Taxonomies and Schemas to Knowledge Graphs: Parts 1 & 2
From Taxonomies and Schemas to Knowledge Graphs: Parts 1 & 2Connected Data World
 
From Taxonomies and Schemas to Knowledge Graphs: Part 3
From Taxonomies and Schemas to Knowledge Graphs: Part 3From Taxonomies and Schemas to Knowledge Graphs: Part 3
From Taxonomies and Schemas to Knowledge Graphs: Part 3Connected Data World
 
In Search of the Universal Data Model
In Search of the Universal Data ModelIn Search of the Universal Data Model
In Search of the Universal Data ModelConnected Data World
 
Graph in Apache Cassandra. The World’s Most Scalable Graph Database
Graph in Apache Cassandra. The World’s Most Scalable Graph DatabaseGraph in Apache Cassandra. The World’s Most Scalable Graph Database
Graph in Apache Cassandra. The World’s Most Scalable Graph DatabaseConnected Data World
 
Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...
Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...
Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...Connected Data World
 
Powering Question-Driven Problem Solving to Improve the Chances of Finding Ne...
Powering Question-Driven Problem Solving to Improve the Chances of Finding Ne...Powering Question-Driven Problem Solving to Improve the Chances of Finding Ne...
Powering Question-Driven Problem Solving to Improve the Chances of Finding Ne...Connected Data World
 
Semantic similarity for faster Knowledge Graph delivery at scale
Semantic similarity for faster Knowledge Graph delivery at scaleSemantic similarity for faster Knowledge Graph delivery at scale
Semantic similarity for faster Knowledge Graph delivery at scaleConnected Data World
 
Knowledge Graphs and AI to Hyper-Personalise the Fashion Retail Experience at...
Knowledge Graphs and AI to Hyper-Personalise the Fashion Retail Experience at...Knowledge Graphs and AI to Hyper-Personalise the Fashion Retail Experience at...
Knowledge Graphs and AI to Hyper-Personalise the Fashion Retail Experience at...Connected Data World
 
Schema, Google & The Future of the Web
Schema, Google & The Future of the WebSchema, Google & The Future of the Web
Schema, Google & The Future of the WebConnected Data World
 
RAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsRAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsConnected Data World
 
From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...
From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...
From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...Connected Data World
 
Graph for Good: Empowering your NGO
Graph for Good: Empowering your NGOGraph for Good: Empowering your NGO
Graph for Good: Empowering your NGOConnected Data World
 
What are we Talking About, When we Talk About Ontology?
What are we Talking About, When we Talk About Ontology?What are we Talking About, When we Talk About Ontology?
What are we Talking About, When we Talk About Ontology?Connected Data World
 

More from Connected Data World (20)

Systems that learn and reason | Frank Van Harmelen
Systems that learn and reason | Frank Van HarmelenSystems that learn and reason | Frank Van Harmelen
Systems that learn and reason | Frank Van Harmelen
 
Graph Abstractions Matter by Ora Lassila
Graph Abstractions Matter by Ora LassilaGraph Abstractions Matter by Ora Lassila
Graph Abstractions Matter by Ora Lassila
 
Κnowledge Architecture: Combining Strategy, Data Science and Information Arch...
Κnowledge Architecture: Combining Strategy, Data Science and Information Arch...Κnowledge Architecture: Combining Strategy, Data Science and Information Arch...
Κnowledge Architecture: Combining Strategy, Data Science and Information Arch...
 
How to get started with Graph Machine Learning
How to get started with Graph Machine LearningHow to get started with Graph Machine Learning
How to get started with Graph Machine Learning
 
Graphs in sustainable finance
Graphs in sustainable financeGraphs in sustainable finance
Graphs in sustainable finance
 
The years of the graph: The future of the future is here
The years of the graph: The future of the future is hereThe years of the graph: The future of the future is here
The years of the graph: The future of the future is here
 
From Taxonomies and Schemas to Knowledge Graphs: Parts 1 & 2
From Taxonomies and Schemas to Knowledge Graphs: Parts 1 & 2From Taxonomies and Schemas to Knowledge Graphs: Parts 1 & 2
From Taxonomies and Schemas to Knowledge Graphs: Parts 1 & 2
 
From Taxonomies and Schemas to Knowledge Graphs: Part 3
From Taxonomies and Schemas to Knowledge Graphs: Part 3From Taxonomies and Schemas to Knowledge Graphs: Part 3
From Taxonomies and Schemas to Knowledge Graphs: Part 3
 
In Search of the Universal Data Model
In Search of the Universal Data ModelIn Search of the Universal Data Model
In Search of the Universal Data Model
 
Graph in Apache Cassandra. The World’s Most Scalable Graph Database
Graph in Apache Cassandra. The World’s Most Scalable Graph DatabaseGraph in Apache Cassandra. The World’s Most Scalable Graph Database
Graph in Apache Cassandra. The World’s Most Scalable Graph Database
 
Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...
Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...
Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...
 
Graph Realities
Graph RealitiesGraph Realities
Graph Realities
 
Powering Question-Driven Problem Solving to Improve the Chances of Finding Ne...
Powering Question-Driven Problem Solving to Improve the Chances of Finding Ne...Powering Question-Driven Problem Solving to Improve the Chances of Finding Ne...
Powering Question-Driven Problem Solving to Improve the Chances of Finding Ne...
 
Semantic similarity for faster Knowledge Graph delivery at scale
Semantic similarity for faster Knowledge Graph delivery at scaleSemantic similarity for faster Knowledge Graph delivery at scale
Semantic similarity for faster Knowledge Graph delivery at scale
 
Knowledge Graphs and AI to Hyper-Personalise the Fashion Retail Experience at...
Knowledge Graphs and AI to Hyper-Personalise the Fashion Retail Experience at...Knowledge Graphs and AI to Hyper-Personalise the Fashion Retail Experience at...
Knowledge Graphs and AI to Hyper-Personalise the Fashion Retail Experience at...
 
Schema, Google & The Future of the Web
Schema, Google & The Future of the WebSchema, Google & The Future of the Web
Schema, Google & The Future of the Web
 
RAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsRAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needs
 
From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...
From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...
From Knowledge Graphs to AI-powered SEO: Using taxonomies, schemas and knowle...
 
Graph for Good: Empowering your NGO
Graph for Good: Empowering your NGOGraph for Good: Empowering your NGO
Graph for Good: Empowering your NGO
 
What are we Talking About, When we Talk About Ontology?
What are we Talking About, When we Talk About Ontology?What are we Talking About, When we Talk About Ontology?
What are we Talking About, When we Talk About Ontology?
 

Recently uploaded

VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationBoston Institute of Analytics
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 

Recently uploaded (20)

VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 

Elegant and Scalable Code Querying with Code Property Graphs

  • 1. Elegant and Scalable Code Querying with Code Property Graphs Fabian Yamaguchi October 4, 2019, Connected Data London
  • 2. Fabian Yamaguchi Chief Scientist, ShiftLeft Inc. Github: fabsx00 Twitter: @fabsx00 Email: fabs@shiftleft.io PhD, University of Goettingen Loves code analysis, vulnerability discovery, machine learning, and graphs!
  • 3. What is Ocular? What is Joern? - Goal: provide query language to describe patterns in code - to identify bugs and vulnerabilities - to help in deeply understanding large programs - Think of it as an extensible Code Analysis Machine - Programmable in JVM-based languages (e.g., Java/Scala/Kotlin) - You can write scripts, language extensions and libraries on top of it - Joern is Ocular’s free-spirited open-source brother This talk is about the technology behind these engines
  • 4. Low level graph representations of programs - Each graph provides a different perspective on the code - Can we merge them? Abstract Syntax Trees Control flow graphs Program dependence graphs Dominator tree
  • 5. Combining graphs with “Property Graphs” - “A property graph is a directed edge-labeled, attributed multigraph” - Attributes allow data to be stored in nodes/edges - Edge labels allow different types of relations to be present in one graph
  • 6. Code Property Graph (2014) - rudimentary concept presented in
  • 7. Specification - Key Design Ideas - Specification that works over programming languages - Delay as much of the graph creation to second-stage - Provide generic representation for core programming language concepts - Methods/Functions - Types - Namespaces - Instructions - Call sites - Encode control flow structures only via a control flow graph - Model only local program properties and leave global program representations for later analysis stages
  • 8. OSS Specification for first-stage code property graph https://github.com/ShiftLeftSecurity/codepropertygraph
  • 9. A “container” for code over arbitrary instruction sets - Define only a common format for representing code - Allow arbitrary instruction set (given by semantics) as a parameter - Represent all code using only - call sites and method stubs - call edges, and control flow edges - data-flow semantics via data flow edges
  • 10. Second stage: “linking” jvm2cpg csharp2cpg llvm2cpg ... go2cpg Local analysis (language dependent) Analyzer (non)-interactive Report cpg.bin.zip Shared operations and “linking” steps cpg2scpg scpg.bin.zip First level (unlinked) CPG Second level (enhanced and linked) CPG
  • 11. Base layer of the code property graph - Production quality version of 2014 code property graph - Language-independent intermediate representation of control-flow and data-flow semantics - Interprocedural, flow-sensitive, context-sensitive, field-sensitive data-flow tracker available that operates on this representation - Heuristics and street smarts to terminate in < 10 minutes
  • 12. This is already pretty powerful...
  • 13. - Entry names of archives must be checked for control characters (e.g., “../”) to avoid path traversal - First presented in phrack in ‘91 - Reported again with shiny logo as “Zip-Slip” by startup in 2018 - https://www.youtube.com/watc h?v=Ry_yb5Oipq0 for details Typical vulnerability type associated with Zip files
  • 14. - Names of archive entries are used unchecked as path names when opening a file - Concrete example: return values of ZipEntry.getName are used as path names in a java.io.File constructor File f = new File(entry.getName()) Sufficient to model control flow, data flow, and method invocations correctly. How does this surface in Java code?
  • 15. Query ocular> val src = cpg.method.fullName(“java.util.zip.*getName.*”).methodReturn val snk = cpg.method.fullName(“.*File.<init>”).parameter.index(1) snk.reachableBy(src).flows.passesNot(“.*escape|sanitize|contains.*”)
  • 16. - “XML External Entity Processing” vulnerability - The attacker controls an XML file processed by the application and it does NOT disable resolving of external entities - If this is given, the attacker can provide an XML file that reads and sends sensitive files to the attacker as part of the parsing process - OWASP cheat sheet says: Typical XML-related vulnerability: XXE
  • 17. Query ocular> val snk = cpg.method.fullName(“.*XMLStreamReader.*next.*”) .parameter.index(0) val src = cpg.method.fullName(“.*XMLInputFactory.*create.*”).methodReturn snk.reachableBy(src).flows.l.andNot{ // No calls to setProperty cpg.method.name(“setProperty”).fullName(“.*XML.*”).callIn.l }
  • 18. … but not powerful enough.
  • 19. Higher level abstractions - Example: Microservice Program Java Platform “File” “Database” Java Platform GET /foo/bar?... {“foo” : “bar”, ...} “Route” “Handler”“request” “Json response” “query” <form action=”/foo/bar”> <input type=”hidden” name=”csrfprotect”> ... <input name=”foo” htmlEscaped=”true”> </form> “Input field” “HTML attribute” “Form” “Information flow” Humans tend to describe vulnerabilities on much higher levels of abstraction!
  • 21. Handlers/Routes (Vert.x framework) @Override public void start(Future<Void> done) { // Create a router object. Router router = Router.router(vertx); router.get("/health").handler(rc -> rc.response().end("OK")); // This is how one can do RBAC, e.g.: only admin is allowed router.get("/greeting").handler(ctx -> ctx.user().isAuthorized("booster-admin", authz -> { if (authz.succeeded() && authz.result()) { ctx.next(); } else { log.error("AuthZ failed!"); ctx.fail(403); } })); … } - Lambda expression flows into first argument of `handler` - Route (as string) flows into first argument of `get`. - Return value of `get` flows into instance parameter of `handler` (or the other way around)
  • 22. Extraction of handlers/routes val HANDLER_METHOD_NAME = "io.vertx.ext.web.Route.handler:io.vertx.ext.web.Route(io.vertx.core.Handler)" val firstParamOfHandler = cpg.method.fullNameExact(HANDLER_METHOD_NAME).parameter.index(1) val flowsOfMethodRefsIntoHandler = firstParamOfHandler.reachableBy(cpg.methodRef).flows.l val handlerNameRoutePairs = flowsOfMethodRefsIntoHandler.map { flow => val handlerName = flow.source.node.asInstanceOf[nodes.MethodRef].methodInstFullName val route = routeForFlowIntoHandler(flow)(handlerName, route) (handlerName, route) } def routeForFlowIntoHandler(flow : NewFlow) = { val getOrPostSource = cpg.method.fullName("io.vertx.*Router.*(get|post):.*").methodReturn Flow.sink.callsite .map(x => x.asInstanceOf[nodes.Call]).get .start.argument(0).reachableBy(getOrPostSource) .flows.source.l .flatMap(x => x.callsite.get.asInstanceOf[nodes.Call].start.argument(1).code.l) .headOption.getOrElse("") }
  • 23. Creating DOM Trees from JSP files in a CPG Pass class JspPass(cpg: Cpg) extends CpgPass(cpg) { override def run(): Iterator[DiffGraph] = { Queries.configFiles(cpg, ".jsp").toList.sorted.map { case (filename, html) => val diffGraph = new DiffGraph val rootCpgNode = createAndAddTreesRecursively( Jsoup.parse(html).root(), None, diffGraph) cpg.configfile.nameExact(filename).headOption.foreach( diffGraph.addEdgeFromOriginal(_, rootCpgNode, EdgeTypes.CONTAINS)) diffGraph }.toIterator } def createAndAddTreesRecursively(node: Node, maybeParent: Option[NewDomNode], diffGraph: DiffGraph): NewDomNode = { implicit val dstGraph: DiffGraph = diffGraph val attributes = node.attributes().asScala.map { attr => NewDomAttribute(attr.getKey, attr.getValue) }.toList // Add node + attribute nodes val newDomNode = NewDomNode(node.nodeName, attributes) diffGraph.addNode(newDomNode) newDomNode.start.store() // Add AST edge from parent to node maybeParent.foreach( diffGraph.addEdge(_, newDomNode, EdgeTypes.AST) ) node.childNodes.asScala.foreach( createAndAddTreesRecursively(_, Some(newDomNode), diffGraph)) newDomNode } }
  • 24. Query for XSS detection using DOM + Bytecode ocular> val src = cpg.form.method(“post”).input .filterNot(_.attribute.name(“htmlEscaped”)) .variable val snk = … // some output routine in HTML snk.reachableBy(src).flows.l Transition from HTML DOM into Java Code!
  • 25. - Literature deals a lot with FPs due to model limitations, e.g., overtainting of collections, conservative call graphs, ... - In practice, most FPs result from context information, e.g., information about the business logic, that you cannot deduce from the code alone: - “This is an internal service that only our admin uses” - “Without first convincing the authentication server, this code would never be executed” - “Due to $aliens, this integer is always 5 and thus cannot be negative” - Ability to model the $aliens part is crucial to reduce false positives - We do this mostly via passes that tag the graph Outside information, business logic, and FP reduction
  • 26. Code Property Graphs Today Base layer - low level local program representations: syntax, control flow, methods, types. Vulnerabilities Multiple domain-specific layers [...] Call graph, type hierarchy, data flows, configurations, dependencies
  • 27. Scaling static analysis - Summaries - Scaling static analysis requires “summaries” of program behavior (in order to skip duplicate calculation of facts, e.g., for library methods) - Calculating summaries for data flow is common practice - Upper layers of the CPG generalize the concept of a summary - Parallelism - Processors aren’t getting much faster, but you’re getting more and more cores. - Literature has very little to say about multiple cores, let alone multiple cloud instances - CPG passes are a design with parallelism in mind
  • 28. Designed for distributed computing - Passes can be run in a sequence like the passes of a compiler - The design also allows to run independent passes in parallel though!
  • 29. End of presentation @ShiftLeftInc @fabsx00 https://shiftleft.io https://joern.io