SlideShare a Scribd company logo
#BHUSA @BlackHatEvents
CodeQL: Also a Powerful Binary Analysis Engine
Haiquan Zhang
rhettxie
@tencent security yunding lab
#BHUSA @BlackHatEvents
Outline
Ø The story begins with static analysis
Ø CodeQL Introduction(Architecture and Core Engine)
Ø Exploring the implementation details of CodeQL through the lifecycle of QL statements
Ø How to extend CodeQL to support binary analysis
Ø A newly developed dedicated debugger
Ø Show time
#BHUSA @BlackHatEvents
The Story Begins With Static Program Analysis
Ø Static program analysis is the art of reasoning about the behavior of computer
programs without actually running them
Ø A static program analyzer is a program that reasons about the behavior of other
programs
Ø It’s a hard work last for several decades. Resolved many issues, but still many
problems remaining
Ø Precision and efficiency are the two big problems
#BHUSA @BlackHatEvents
Many amazing technologies have been developed
Ø Type Inference and Abstract Interpretation
Ø Data Flow & Control Flow & Dependency Analysis
Ø Pointer Analysis
Ø Path Sensitive and Context Sensitive Analysis
Ø Constraint Solving and Symbolic Execution
#BHUSA @BlackHatEvents
Also Some Amazing Tools
Name Pay or not Open source Compiler depend Tech stack
Coverity commercial source close need compile classic analysis tech
Fortify commercial source close need compile classic analysis tech
Symgrep free use source open code scan pattern matching
SVF free use source open need compile
depends on LLVM
classic analysis tech
academic use
Clang Static
Analyzer
free use source open need compile
clang only
AST travel
symbolic execution
#BHUSA @BlackHatEvents
What is CodeQL
Ø Founded in 2006, a research project from Oxford University, acquired by Github in 2019
Ø Partly open source , but the core engine is source closed
Ø Basically scan code, make database and run query logic, find patterns
Ø Make code analysis to code property query
#BHUSA @BlackHatEvents
Architecture of CodeQL
Ø The extractor can be regarded as language frontend,so it’s
language depends
Ø Extractor scan code, make extra analysis and store code property
to database
Ø Database store code information, can be shared and reused
Ø CodeQL introduced a query language, the ql is related with query
logic, but not related with code that being analyzed,so it’s language
agnostic
Ø CodeQL has developed a mature and comprehensive library that
can perform various data flow analysis, such as the classic taint
analysis
Ø The core engine can be regarded as a database evaluate engine
#BHUSA @BlackHatEvents
Engine of CodeQL(extractor)
./codeql database create -l cpp -j 8 -s /data/codeql/qemu-6.1.0 /data/codeql/qemu.db
Ø For compiled languages, the extractor and
compiler work in parallel
Ø Monitor compiler works, if failed, then the
process will abort
Ø Intercept compiler command parameters, and
add more
Ø Extractor has the necessary information to
compile a code file
Ø Extractor scan code, analysis and get code
property
#BHUSA @BlackHatEvents
Engine of CodeQL(extractor)
How does extractor interrupt a compiler?
protected void executeSubcommand()
1. Launch preload_tracer process while codeql starts to work
2. Disassembling shows that preload_tracer will inject something into LD_PRELOAD
3. GDB shows that codeql/tools/linux64/lib64trace.so injected to LD_PRELOAD
#BHUSA @BlackHatEvents
Engine of CodeQL(extractor)
How does extractor compile code
Ø lib64trace.so injected to every process
Ø detect whether the host process is a compiler process
Ø If so, add parameters from compiler-tracing.spec and
launch an extractor process
#BHUSA @BlackHatEvents
Engine of CodeQL(database)
dbscheme is a guidance document for the content extraction, extractor extracts
information according to that file
Trap file located at
/data/blackhat/codedb/trap/cpp/tarballs/data/blackhat/co
de/test_c/source/data/blackhat/code/test_c
In fact , codeql generate a trap file firstly
then convert the trap to the final database
#BHUSA @BlackHatEvents
Engine of CodeQL(database)
Ø Extractor scan code, extracts information described in
dbscheme
Ø All the code information extracted out will be placed at
trap files
Ø Trap is a text file,it’s not convenient for further process
It will be converted to structed db file
Let’s map these files together, and see an example
#BHUSA @BlackHatEvents
Engine of CodeQL(database)
Take functions as an example
Ø functions is the collection of all functions
Ø It’s unique lable is @function, can be referenced by this name
Ø It has two field, name & kind, name is string and kind is an int
Ø kind 1 means normal function, 2 means constructor
locations_stmt(
/** The location of a
statement. */
unique int id:
@location_stmt,
int container: @container
ref,
int startLine: int ref,
int startColumn: int ref,
int endLine: int ref,
int endColumn: int ref
);
Stmts is complicated
Is has recursion type
Let’s explore the dbscheme file
#BHUSA @BlackHatEvents
Engine of CodeQL(database)
It’s a statement
Kind 6 means a stmt_return
Location is at startLine 16, startColumn 3
Endline is 16 and endColumn is 11
Let’s explore the trap file
simple example complicated example
In trap file
xopen is a function, named “xopen”, and it kind is
1(normal)
#BHUSA @BlackHatEvents
Engine of CodeQL(database)
trap to database
xopen function in trap file
0x59cb=22987 0x14d8=5336
the rel file is db file, tuples data are stored simply and directly in database file
xopen string encoded to 5336 a function declaration converted to an int tuple and write to db file
Ø The int values in tuple can be regarded as index to the actual
resources
Ø It’s a complex and tedious process, and we will not go into
further details here
#BHUSA @BlackHatEvents
The Query Language
Ø Declarative Logic Programming
Ø With class and OOP support
Ø Target language independent
Ø Built-in with many libraries, such as the most
common data flow analysis, taint analysis
#BHUSA @BlackHatEvents
QL to DIL
Ø It’s a lowing process
Ø High level user friendly language to machine friendly language
Ø From where select, high level logic to child query operation
Ø Table query and bool calculus
The first lowing phase
#BHUSA @BlackHatEvents
DIL to RA
Ø Continue lowing
Ø Data logic to relation algebra operations
Ø RA operations are classic JOIN UNION and BOOL calculus
Ø R1 R2, they are not registers
The second lowing phase
#BHUSA @BlackHatEvents
Evaluate Engine
Ø RA expr is a query tree
Ø A query tree is a set of logic related together, and can not be separated
Ø The engine will consume a tree at a time
Ø Evaluate from bottom to up
Ø Support recursion evaluation
RA code and query tree
#BHUSA @BlackHatEvents
Summary
Ø CodeQL has several language frontend, as extractor
Ø Extractor extracts code information, do analysis, and store information to
database
Ø CodeQL implemented a datalog query language to query code property
Ø CodeQL stores code property as database, and regard code analysis as
database query
#BHUSA @BlackHatEvents
The Advance of CodeQL
Ø The architecture design is very elegant, highly modular, and separates the front-end
and back-end
Ø CodeQL is a faithful implementer of database technology, convert code analysis to
data query
Ø Well designed query language, focus on code pattern or semantic matching. One
query language to handle many other code language
Ø Thanks to a unified query language, the same logic for different analysis tasks can
be reused on a large scale
#BHUSA @BlackHatEvents
Extending CodeQL to binary
Ø new dbscheme for binary
Ø new extractor for binary
Ø new QL library for binary
Architecture
#BHUSA @BlackHatEvents
Design of binary dbshceme
Ø Store the file, architecture, string, and function information of the binary.
Ø Store the information of instructions, registers, and basic blocks inside the function.
Ø Store the use and def information of registers inside the function.
Ø Store the information of global variables and pointers.
Ø Store memory layout information.(address ,section etc)
#BHUSA @BlackHatEvents
New Dbscheme For Binary
Control flow information
Instruction and register information
Basic information
#BHUSA @BlackHatEvents
New Extractor For Binary
The extractor needs to extract
information according to the
tables defined by the
dbscheme. The main
challenge here is how to
efficiently save the
relationships between the
tables.
Extracting imported function information from binary
Extracting all strings from binary
Extracting usage information
of instructions and registers
from binary.
#BHUSA @BlackHatEvents
Configure a new workflow
Ø The above has completed the creation of the two most important key components for creating a database.Now,
they need to be integrated into the CodeQL data creation workflow. Below is how we supplemented the configuration
files needed for the workflow through dynamic debugging.
Register a new language option Configure the compiler Generate asm.dbscheme.stats file
#BHUSA @BlackHatEvents
Autobuild.sh For Binary
Different from the data workflow for creating
source code, we separate the extractor
process, and autobuild.sh is just for
packaging the trap files and copying them
to a specific directory in the database.
database create
#BHUSA @BlackHatEvents
QL library For Binary
Ø The QL library can help us use the QL language, and with this flexible language, we can accomplish even more
powerful tasks.
l Export functions
l Functions
l Registers, instructions
l Import functions
l Strings
Support basic table
queries
Supports SSA IR, dataflow,
and taint analysis
Supporting more
architectures (arm, risc-v)
Done In Development Future
#BHUSA @BlackHatEvents
Simple example
#BHUSA @BlackHatEvents
Complex example
#BHUSA @BlackHatEvents
QL language Debugger At RA-level
Ø The current version of the debugger supports breakpoints, execution, single-stepping, viewing relations,
and tracing.
Architecture
#BHUSA @BlackHatEvents
com.semmle.inmemory.eval.Evaluate.evaluate
Ø It is the starting point for RA's interpreted execution, where we can use
CountingPrinter.print(expr) to print out the decompiled text of RA
Ø Decompiled text
#BHUSA @BlackHatEvents
RA To Pipeline
13 types of operation pipelines
RA operation Pipeline operation
ApplyTupleOperation TupleOperationPipeline
StreamDedup StreamDedupPipeline
SelectionByTest SelectionByTestPipeline
Literal AbstractRelationPipeline
EmptySet EmptySetPipeline
Union UnionPipeline
InvokeHigherOrderRelation InvokeHigherOrderRelationPipeline
Join JoinPipeline
… …
#BHUSA @BlackHatEvents
Pipeline Run
Ø Each type of pipeline will eventually call its own runinternal method to perform operations
such as reading and writing tables. Tables in the database are saved and used in memory in
the form of page relations.
Running pipeline
Get data corresponding to the page relation
#BHUSA @BlackHatEvents
com.semmle.inmemory.caching.RelationManager.addRelation
Ø CodeQL will save the page relations mentioned earlier here. Each table also has a different
storage type, and we have implemented reading for each type in the debugger.
The storage format of a relation
Page relation data
Entry point
Page information
#BHUSA @BlackHatEvents
codeql-debug Agent
Ø We discovered a hidden performance tuning parameter 'semmle.profiler.verbosity'
during dynamic debugging, which can save the engine's execution state to a file. This
way, we don't have to implement a trace function through jdb, we just need to add this
parameter to the startup parameters.
Added launch parameters
Undisclosed system tuning parameters
#BHUSA @BlackHatEvents
Demo
#BHUSA @BlackHatEvents
#BHUSA @BlackHatEvents
OPEN SOURCE
https://github.com/YunDingLab/codeql-binary
#BHUSA @BlackHatEvents
Q&A
huntazhang@tencent.com
jitxie@tencent.com
#BHUSA @BlackHatEvents
Thanks

More Related Content

What's hot

remote-method-guesser - BHUSA2021 Arsenal
remote-method-guesser - BHUSA2021 Arsenal remote-method-guesser - BHUSA2021 Arsenal
remote-method-guesser - BHUSA2021 Arsenal
Tobias Neitzel
 
Whitebox testing of Spring Boot applications
Whitebox testing of Spring Boot applicationsWhitebox testing of Spring Boot applications
Whitebox testing of Spring Boot applications
Yura Nosenko
 
Date and Time Module in Python | Edureka
Date and Time Module in Python | EdurekaDate and Time Module in Python | Edureka
Date and Time Module in Python | Edureka
Edureka!
 
Kyo - Functional Scala 2023.pdf
Kyo - Functional Scala 2023.pdfKyo - Functional Scala 2023.pdf
Kyo - Functional Scala 2023.pdf
Flavio W. Brasil
 
Threads in Java
Threads in JavaThreads in Java
Threads in Java
Gaurav Aggarwal
 
Unit Testing with Python
Unit Testing with PythonUnit Testing with Python
Unit Testing with Python
MicroPyramid .
 
Effective testing with pytest
Effective testing with pytestEffective testing with pytest
Effective testing with pytest
Hector Canto
 
Advantages of Python Learning | Why Python
Advantages of Python Learning | Why PythonAdvantages of Python Learning | Why Python
Advantages of Python Learning | Why Python
EvoletTechnologiesCo
 
Java Deserialization Vulnerabilities - The Forgotten Bug Class (DeepSec Edition)
Java Deserialization Vulnerabilities - The Forgotten Bug Class (DeepSec Edition)Java Deserialization Vulnerabilities - The Forgotten Bug Class (DeepSec Edition)
Java Deserialization Vulnerabilities - The Forgotten Bug Class (DeepSec Edition)
CODE WHITE GmbH
 
Java Serialization Deep Dive
Java Serialization Deep DiveJava Serialization Deep Dive
Java Serialization Deep Dive
Martijn Dashorst
 
Python Basics
Python BasicsPython Basics
Python Basics
tusharpanda88
 
A python web service
A python web serviceA python web service
A python web service
Temian Vlad
 
Python GUI Programming
Python GUI ProgrammingPython GUI Programming
Python GUI Programming
RTS Tech
 
Java 8-streams-collectors-patterns
Java 8-streams-collectors-patternsJava 8-streams-collectors-patterns
Java 8-streams-collectors-patterns
José Paumard
 
Python: Modules and Packages
Python: Modules and PackagesPython: Modules and Packages
Python: Modules and Packages
Damian T. Gordon
 
JDK versions and OpenJDK
JDK versions and OpenJDKJDK versions and OpenJDK
JDK versions and OpenJDK
Wolfgang Weigend
 
Python Sequence | Python Lists | Python Sets & Dictionary | Python Strings | ...
Python Sequence | Python Lists | Python Sets & Dictionary | Python Strings | ...Python Sequence | Python Lists | Python Sets & Dictionary | Python Strings | ...
Python Sequence | Python Lists | Python Sets & Dictionary | Python Strings | ...
Edureka!
 
Introduction to Spring Boot
Introduction to Spring BootIntroduction to Spring Boot
Introduction to Spring Boot
Purbarun Chakrabarti
 
Introduction to kotlin coroutines
Introduction to kotlin coroutinesIntroduction to kotlin coroutines
Introduction to kotlin coroutines
NAVER Engineering
 
Core java concepts
Core java  conceptsCore java  concepts
Core java concepts
Ram132
 

What's hot (20)

remote-method-guesser - BHUSA2021 Arsenal
remote-method-guesser - BHUSA2021 Arsenal remote-method-guesser - BHUSA2021 Arsenal
remote-method-guesser - BHUSA2021 Arsenal
 
Whitebox testing of Spring Boot applications
Whitebox testing of Spring Boot applicationsWhitebox testing of Spring Boot applications
Whitebox testing of Spring Boot applications
 
Date and Time Module in Python | Edureka
Date and Time Module in Python | EdurekaDate and Time Module in Python | Edureka
Date and Time Module in Python | Edureka
 
Kyo - Functional Scala 2023.pdf
Kyo - Functional Scala 2023.pdfKyo - Functional Scala 2023.pdf
Kyo - Functional Scala 2023.pdf
 
Threads in Java
Threads in JavaThreads in Java
Threads in Java
 
Unit Testing with Python
Unit Testing with PythonUnit Testing with Python
Unit Testing with Python
 
Effective testing with pytest
Effective testing with pytestEffective testing with pytest
Effective testing with pytest
 
Advantages of Python Learning | Why Python
Advantages of Python Learning | Why PythonAdvantages of Python Learning | Why Python
Advantages of Python Learning | Why Python
 
Java Deserialization Vulnerabilities - The Forgotten Bug Class (DeepSec Edition)
Java Deserialization Vulnerabilities - The Forgotten Bug Class (DeepSec Edition)Java Deserialization Vulnerabilities - The Forgotten Bug Class (DeepSec Edition)
Java Deserialization Vulnerabilities - The Forgotten Bug Class (DeepSec Edition)
 
Java Serialization Deep Dive
Java Serialization Deep DiveJava Serialization Deep Dive
Java Serialization Deep Dive
 
Python Basics
Python BasicsPython Basics
Python Basics
 
A python web service
A python web serviceA python web service
A python web service
 
Python GUI Programming
Python GUI ProgrammingPython GUI Programming
Python GUI Programming
 
Java 8-streams-collectors-patterns
Java 8-streams-collectors-patternsJava 8-streams-collectors-patterns
Java 8-streams-collectors-patterns
 
Python: Modules and Packages
Python: Modules and PackagesPython: Modules and Packages
Python: Modules and Packages
 
JDK versions and OpenJDK
JDK versions and OpenJDKJDK versions and OpenJDK
JDK versions and OpenJDK
 
Python Sequence | Python Lists | Python Sets & Dictionary | Python Strings | ...
Python Sequence | Python Lists | Python Sets & Dictionary | Python Strings | ...Python Sequence | Python Lists | Python Sets & Dictionary | Python Strings | ...
Python Sequence | Python Lists | Python Sets & Dictionary | Python Strings | ...
 
Introduction to Spring Boot
Introduction to Spring BootIntroduction to Spring Boot
Introduction to Spring Boot
 
Introduction to kotlin coroutines
Introduction to kotlin coroutinesIntroduction to kotlin coroutines
Introduction to kotlin coroutines
 
Core java concepts
Core java  conceptsCore java  concepts
Core java concepts
 

Similar to CodeQL a Powerful Binary Analysis Engine

Language-agnostic data analysis workflows and reproducible research
Language-agnostic data analysis workflows and reproducible researchLanguage-agnostic data analysis workflows and reproducible research
Language-agnostic data analysis workflows and reproducible research
Andrew Lowe
 
COMPILER DESIGN.pdf
COMPILER DESIGN.pdfCOMPILER DESIGN.pdf
COMPILER DESIGN.pdf
AdiseshaK
 
New c sharp3_features_(linq)_part_iv
New c sharp3_features_(linq)_part_ivNew c sharp3_features_(linq)_part_iv
New c sharp3_features_(linq)_part_iv
Nico Ludwig
 
05 Lecture - PARALLEL Programming in C ++.pdf
05 Lecture - PARALLEL Programming in C ++.pdf05 Lecture - PARALLEL Programming in C ++.pdf
05 Lecture - PARALLEL Programming in C ++.pdf
alivaisi1
 
From Zero to Streaming Healthcare in Production (Alexander Kouznetsov, Invita...
From Zero to Streaming Healthcare in Production (Alexander Kouznetsov, Invita...From Zero to Streaming Healthcare in Production (Alexander Kouznetsov, Invita...
From Zero to Streaming Healthcare in Production (Alexander Kouznetsov, Invita...
confluent
 
A case for teaching SQL to scientists
A case for teaching SQL to scientistsA case for teaching SQL to scientists
A case for teaching SQL to scientists
dhalperi
 
(1) c sharp introduction_basics_dot_net
(1) c sharp introduction_basics_dot_net(1) c sharp introduction_basics_dot_net
(1) c sharp introduction_basics_dot_net
Nico Ludwig
 
490450755-Chapter-2.ppt
490450755-Chapter-2.ppt490450755-Chapter-2.ppt
490450755-Chapter-2.ppt
ManiMala75
 
490450755-Chapter-2.ppt
490450755-Chapter-2.ppt490450755-Chapter-2.ppt
490450755-Chapter-2.ppt
ManiMala75
 
Adv programming languages
Adv programming languagesAdv programming languages
Adv programming languages
Nishant Mevawala
 
object oriented programming part inheritance.pptx
object oriented programming part inheritance.pptxobject oriented programming part inheritance.pptx
object oriented programming part inheritance.pptx
urvashipundir04
 
Introduction Of C++
Introduction Of C++Introduction Of C++
Introduction Of C++
Sangharsh agarwal
 
Presentation c++
Presentation c++Presentation c++
Presentation c++
JosephAlex21
 
Computer Programming In C.pptx
Computer Programming In C.pptxComputer Programming In C.pptx
Computer Programming In C.pptx
chouguleamruta24
 
A Modern Interface for Data Science on Postgres/Greenplum - Greenplum Summit ...
A Modern Interface for Data Science on Postgres/Greenplum - Greenplum Summit ...A Modern Interface for Data Science on Postgres/Greenplum - Greenplum Summit ...
A Modern Interface for Data Science on Postgres/Greenplum - Greenplum Summit ...
VMware Tanzu
 
Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Performance Optimization of SPH Algorithms for Multi/Many-Core ArchitecturesPerformance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Dr. Fabio Baruffa
 
Crystal internals (part 1)
Crystal internals (part 1)Crystal internals (part 1)
Crystal internals (part 1)
Ary Borenszweig
 
Crystal internals (part 1)
Crystal internals (part 1)Crystal internals (part 1)
Crystal internals (part 1)
Ary Borenszweig
 
Crystal internals (part 1)
Crystal internals (part 1)Crystal internals (part 1)
Crystal internals (part 1)
Crystal Language
 

Similar to CodeQL a Powerful Binary Analysis Engine (20)

Language-agnostic data analysis workflows and reproducible research
Language-agnostic data analysis workflows and reproducible researchLanguage-agnostic data analysis workflows and reproducible research
Language-agnostic data analysis workflows and reproducible research
 
COMPILER DESIGN.pdf
COMPILER DESIGN.pdfCOMPILER DESIGN.pdf
COMPILER DESIGN.pdf
 
New c sharp3_features_(linq)_part_iv
New c sharp3_features_(linq)_part_ivNew c sharp3_features_(linq)_part_iv
New c sharp3_features_(linq)_part_iv
 
05 Lecture - PARALLEL Programming in C ++.pdf
05 Lecture - PARALLEL Programming in C ++.pdf05 Lecture - PARALLEL Programming in C ++.pdf
05 Lecture - PARALLEL Programming in C ++.pdf
 
From Zero to Streaming Healthcare in Production (Alexander Kouznetsov, Invita...
From Zero to Streaming Healthcare in Production (Alexander Kouznetsov, Invita...From Zero to Streaming Healthcare in Production (Alexander Kouznetsov, Invita...
From Zero to Streaming Healthcare in Production (Alexander Kouznetsov, Invita...
 
A case for teaching SQL to scientists
A case for teaching SQL to scientistsA case for teaching SQL to scientists
A case for teaching SQL to scientists
 
python and perl
python and perlpython and perl
python and perl
 
(1) c sharp introduction_basics_dot_net
(1) c sharp introduction_basics_dot_net(1) c sharp introduction_basics_dot_net
(1) c sharp introduction_basics_dot_net
 
490450755-Chapter-2.ppt
490450755-Chapter-2.ppt490450755-Chapter-2.ppt
490450755-Chapter-2.ppt
 
490450755-Chapter-2.ppt
490450755-Chapter-2.ppt490450755-Chapter-2.ppt
490450755-Chapter-2.ppt
 
Adv programming languages
Adv programming languagesAdv programming languages
Adv programming languages
 
object oriented programming part inheritance.pptx
object oriented programming part inheritance.pptxobject oriented programming part inheritance.pptx
object oriented programming part inheritance.pptx
 
Introduction Of C++
Introduction Of C++Introduction Of C++
Introduction Of C++
 
Presentation c++
Presentation c++Presentation c++
Presentation c++
 
Computer Programming In C.pptx
Computer Programming In C.pptxComputer Programming In C.pptx
Computer Programming In C.pptx
 
A Modern Interface for Data Science on Postgres/Greenplum - Greenplum Summit ...
A Modern Interface for Data Science on Postgres/Greenplum - Greenplum Summit ...A Modern Interface for Data Science on Postgres/Greenplum - Greenplum Summit ...
A Modern Interface for Data Science on Postgres/Greenplum - Greenplum Summit ...
 
Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Performance Optimization of SPH Algorithms for Multi/Many-Core ArchitecturesPerformance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures
 
Crystal internals (part 1)
Crystal internals (part 1)Crystal internals (part 1)
Crystal internals (part 1)
 
Crystal internals (part 1)
Crystal internals (part 1)Crystal internals (part 1)
Crystal internals (part 1)
 
Crystal internals (part 1)
Crystal internals (part 1)Crystal internals (part 1)
Crystal internals (part 1)
 

Recently uploaded

CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
karthi keyan
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
Kamal Acharya
 
Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.
PrashantGoswami42
 
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Sreedhar Chowdam
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
obonagu
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
JoytuBarua2
 
LIGA(E)11111111111111111111111111111111111111111.ppt
LIGA(E)11111111111111111111111111111111111111111.pptLIGA(E)11111111111111111111111111111111111111111.ppt
LIGA(E)11111111111111111111111111111111111111111.ppt
ssuser9bd3ba
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
SamSarthak3
 
Halogenation process of chemical process industries
Halogenation process of chemical process industriesHalogenation process of chemical process industries
Halogenation process of chemical process industries
MuhammadTufail242431
 
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxCFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
R&R Consult
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Teleport Manpower Consultant
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
Massimo Talia
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
Pipe Restoration Solutions
 
Gen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdfGen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdf
gdsczhcet
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Dr.Costas Sachpazis
 
Automobile Management System Project Report.pdf
Automobile Management System Project Report.pdfAutomobile Management System Project Report.pdf
Automobile Management System Project Report.pdf
Kamal Acharya
 
addressing modes in computer architecture
addressing modes  in computer architectureaddressing modes  in computer architecture
addressing modes in computer architecture
ShahidSultan24
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
VENKATESHvenky89705
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
Robbie Edward Sayers
 

Recently uploaded (20)

CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
 
Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.
 
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
 
LIGA(E)11111111111111111111111111111111111111111.ppt
LIGA(E)11111111111111111111111111111111111111111.pptLIGA(E)11111111111111111111111111111111111111111.ppt
LIGA(E)11111111111111111111111111111111111111111.ppt
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
 
Halogenation process of chemical process industries
Halogenation process of chemical process industriesHalogenation process of chemical process industries
Halogenation process of chemical process industries
 
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxCFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
 
Gen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdfGen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdf
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
 
Automobile Management System Project Report.pdf
Automobile Management System Project Report.pdfAutomobile Management System Project Report.pdf
Automobile Management System Project Report.pdf
 
addressing modes in computer architecture
addressing modes  in computer architectureaddressing modes  in computer architecture
addressing modes in computer architecture
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
 

CodeQL a Powerful Binary Analysis Engine

  • 1. #BHUSA @BlackHatEvents CodeQL: Also a Powerful Binary Analysis Engine Haiquan Zhang rhettxie @tencent security yunding lab
  • 2. #BHUSA @BlackHatEvents Outline Ø The story begins with static analysis Ø CodeQL Introduction(Architecture and Core Engine) Ø Exploring the implementation details of CodeQL through the lifecycle of QL statements Ø How to extend CodeQL to support binary analysis Ø A newly developed dedicated debugger Ø Show time
  • 3. #BHUSA @BlackHatEvents The Story Begins With Static Program Analysis Ø Static program analysis is the art of reasoning about the behavior of computer programs without actually running them Ø A static program analyzer is a program that reasons about the behavior of other programs Ø It’s a hard work last for several decades. Resolved many issues, but still many problems remaining Ø Precision and efficiency are the two big problems
  • 4. #BHUSA @BlackHatEvents Many amazing technologies have been developed Ø Type Inference and Abstract Interpretation Ø Data Flow & Control Flow & Dependency Analysis Ø Pointer Analysis Ø Path Sensitive and Context Sensitive Analysis Ø Constraint Solving and Symbolic Execution
  • 5. #BHUSA @BlackHatEvents Also Some Amazing Tools Name Pay or not Open source Compiler depend Tech stack Coverity commercial source close need compile classic analysis tech Fortify commercial source close need compile classic analysis tech Symgrep free use source open code scan pattern matching SVF free use source open need compile depends on LLVM classic analysis tech academic use Clang Static Analyzer free use source open need compile clang only AST travel symbolic execution
  • 6. #BHUSA @BlackHatEvents What is CodeQL Ø Founded in 2006, a research project from Oxford University, acquired by Github in 2019 Ø Partly open source , but the core engine is source closed Ø Basically scan code, make database and run query logic, find patterns Ø Make code analysis to code property query
  • 7. #BHUSA @BlackHatEvents Architecture of CodeQL Ø The extractor can be regarded as language frontend,so it’s language depends Ø Extractor scan code, make extra analysis and store code property to database Ø Database store code information, can be shared and reused Ø CodeQL introduced a query language, the ql is related with query logic, but not related with code that being analyzed,so it’s language agnostic Ø CodeQL has developed a mature and comprehensive library that can perform various data flow analysis, such as the classic taint analysis Ø The core engine can be regarded as a database evaluate engine
  • 8. #BHUSA @BlackHatEvents Engine of CodeQL(extractor) ./codeql database create -l cpp -j 8 -s /data/codeql/qemu-6.1.0 /data/codeql/qemu.db Ø For compiled languages, the extractor and compiler work in parallel Ø Monitor compiler works, if failed, then the process will abort Ø Intercept compiler command parameters, and add more Ø Extractor has the necessary information to compile a code file Ø Extractor scan code, analysis and get code property
  • 9. #BHUSA @BlackHatEvents Engine of CodeQL(extractor) How does extractor interrupt a compiler? protected void executeSubcommand() 1. Launch preload_tracer process while codeql starts to work 2. Disassembling shows that preload_tracer will inject something into LD_PRELOAD 3. GDB shows that codeql/tools/linux64/lib64trace.so injected to LD_PRELOAD
  • 10. #BHUSA @BlackHatEvents Engine of CodeQL(extractor) How does extractor compile code Ø lib64trace.so injected to every process Ø detect whether the host process is a compiler process Ø If so, add parameters from compiler-tracing.spec and launch an extractor process
  • 11. #BHUSA @BlackHatEvents Engine of CodeQL(database) dbscheme is a guidance document for the content extraction, extractor extracts information according to that file Trap file located at /data/blackhat/codedb/trap/cpp/tarballs/data/blackhat/co de/test_c/source/data/blackhat/code/test_c In fact , codeql generate a trap file firstly then convert the trap to the final database
  • 12. #BHUSA @BlackHatEvents Engine of CodeQL(database) Ø Extractor scan code, extracts information described in dbscheme Ø All the code information extracted out will be placed at trap files Ø Trap is a text file,it’s not convenient for further process It will be converted to structed db file Let’s map these files together, and see an example
  • 13. #BHUSA @BlackHatEvents Engine of CodeQL(database) Take functions as an example Ø functions is the collection of all functions Ø It’s unique lable is @function, can be referenced by this name Ø It has two field, name & kind, name is string and kind is an int Ø kind 1 means normal function, 2 means constructor locations_stmt( /** The location of a statement. */ unique int id: @location_stmt, int container: @container ref, int startLine: int ref, int startColumn: int ref, int endLine: int ref, int endColumn: int ref ); Stmts is complicated Is has recursion type Let’s explore the dbscheme file
  • 14. #BHUSA @BlackHatEvents Engine of CodeQL(database) It’s a statement Kind 6 means a stmt_return Location is at startLine 16, startColumn 3 Endline is 16 and endColumn is 11 Let’s explore the trap file simple example complicated example In trap file xopen is a function, named “xopen”, and it kind is 1(normal)
  • 15. #BHUSA @BlackHatEvents Engine of CodeQL(database) trap to database xopen function in trap file 0x59cb=22987 0x14d8=5336 the rel file is db file, tuples data are stored simply and directly in database file xopen string encoded to 5336 a function declaration converted to an int tuple and write to db file Ø The int values in tuple can be regarded as index to the actual resources Ø It’s a complex and tedious process, and we will not go into further details here
  • 16. #BHUSA @BlackHatEvents The Query Language Ø Declarative Logic Programming Ø With class and OOP support Ø Target language independent Ø Built-in with many libraries, such as the most common data flow analysis, taint analysis
  • 17. #BHUSA @BlackHatEvents QL to DIL Ø It’s a lowing process Ø High level user friendly language to machine friendly language Ø From where select, high level logic to child query operation Ø Table query and bool calculus The first lowing phase
  • 18. #BHUSA @BlackHatEvents DIL to RA Ø Continue lowing Ø Data logic to relation algebra operations Ø RA operations are classic JOIN UNION and BOOL calculus Ø R1 R2, they are not registers The second lowing phase
  • 19. #BHUSA @BlackHatEvents Evaluate Engine Ø RA expr is a query tree Ø A query tree is a set of logic related together, and can not be separated Ø The engine will consume a tree at a time Ø Evaluate from bottom to up Ø Support recursion evaluation RA code and query tree
  • 20. #BHUSA @BlackHatEvents Summary Ø CodeQL has several language frontend, as extractor Ø Extractor extracts code information, do analysis, and store information to database Ø CodeQL implemented a datalog query language to query code property Ø CodeQL stores code property as database, and regard code analysis as database query
  • 21. #BHUSA @BlackHatEvents The Advance of CodeQL Ø The architecture design is very elegant, highly modular, and separates the front-end and back-end Ø CodeQL is a faithful implementer of database technology, convert code analysis to data query Ø Well designed query language, focus on code pattern or semantic matching. One query language to handle many other code language Ø Thanks to a unified query language, the same logic for different analysis tasks can be reused on a large scale
  • 22. #BHUSA @BlackHatEvents Extending CodeQL to binary Ø new dbscheme for binary Ø new extractor for binary Ø new QL library for binary Architecture
  • 23. #BHUSA @BlackHatEvents Design of binary dbshceme Ø Store the file, architecture, string, and function information of the binary. Ø Store the information of instructions, registers, and basic blocks inside the function. Ø Store the use and def information of registers inside the function. Ø Store the information of global variables and pointers. Ø Store memory layout information.(address ,section etc)
  • 24. #BHUSA @BlackHatEvents New Dbscheme For Binary Control flow information Instruction and register information Basic information
  • 25. #BHUSA @BlackHatEvents New Extractor For Binary The extractor needs to extract information according to the tables defined by the dbscheme. The main challenge here is how to efficiently save the relationships between the tables. Extracting imported function information from binary Extracting all strings from binary Extracting usage information of instructions and registers from binary.
  • 26. #BHUSA @BlackHatEvents Configure a new workflow Ø The above has completed the creation of the two most important key components for creating a database.Now, they need to be integrated into the CodeQL data creation workflow. Below is how we supplemented the configuration files needed for the workflow through dynamic debugging. Register a new language option Configure the compiler Generate asm.dbscheme.stats file
  • 27. #BHUSA @BlackHatEvents Autobuild.sh For Binary Different from the data workflow for creating source code, we separate the extractor process, and autobuild.sh is just for packaging the trap files and copying them to a specific directory in the database. database create
  • 28. #BHUSA @BlackHatEvents QL library For Binary Ø The QL library can help us use the QL language, and with this flexible language, we can accomplish even more powerful tasks. l Export functions l Functions l Registers, instructions l Import functions l Strings Support basic table queries Supports SSA IR, dataflow, and taint analysis Supporting more architectures (arm, risc-v) Done In Development Future
  • 31. #BHUSA @BlackHatEvents QL language Debugger At RA-level Ø The current version of the debugger supports breakpoints, execution, single-stepping, viewing relations, and tracing. Architecture
  • 32. #BHUSA @BlackHatEvents com.semmle.inmemory.eval.Evaluate.evaluate Ø It is the starting point for RA's interpreted execution, where we can use CountingPrinter.print(expr) to print out the decompiled text of RA Ø Decompiled text
  • 33. #BHUSA @BlackHatEvents RA To Pipeline 13 types of operation pipelines RA operation Pipeline operation ApplyTupleOperation TupleOperationPipeline StreamDedup StreamDedupPipeline SelectionByTest SelectionByTestPipeline Literal AbstractRelationPipeline EmptySet EmptySetPipeline Union UnionPipeline InvokeHigherOrderRelation InvokeHigherOrderRelationPipeline Join JoinPipeline … …
  • 34. #BHUSA @BlackHatEvents Pipeline Run Ø Each type of pipeline will eventually call its own runinternal method to perform operations such as reading and writing tables. Tables in the database are saved and used in memory in the form of page relations. Running pipeline Get data corresponding to the page relation
  • 35. #BHUSA @BlackHatEvents com.semmle.inmemory.caching.RelationManager.addRelation Ø CodeQL will save the page relations mentioned earlier here. Each table also has a different storage type, and we have implemented reading for each type in the debugger. The storage format of a relation Page relation data Entry point Page information
  • 36. #BHUSA @BlackHatEvents codeql-debug Agent Ø We discovered a hidden performance tuning parameter 'semmle.profiler.verbosity' during dynamic debugging, which can save the engine's execution state to a file. This way, we don't have to implement a trace function through jdb, we just need to add this parameter to the startup parameters. Added launch parameters Undisclosed system tuning parameters