The document discusses extending CodeQL, a static analysis tool, to support binary analysis. It describes adding a new database schema and extractor for binaries, as well as a query language library. A debugger was also developed at the relational algebra level to step through and inspect queries. The approach separates the front-end and back-end of CodeQL to make it modular and adaptable to new languages and analysis tasks.
2. #BHUSA @BlackHatEvents
Outline
Ø The story begins with static analysis
Ø CodeQL introduction (architecture and core engine)
Ø Exploring the implementation details of CodeQL through the lifecycle of QL statements
Ø How to extend CodeQL to support binary analysis
Ø A newly developed dedicated debugger
Ø Show time
3. #BHUSA @BlackHatEvents
The Story Begins With Static Program Analysis
Ø Static program analysis is the art of reasoning about the behavior of computer programs without actually running them
Ø A static program analyzer is a program that reasons about the behavior of other programs
Ø It is hard work that has been going on for several decades: many issues have been resolved, but many problems remain
Ø Precision and efficiency are the two big problems
4. #BHUSA @BlackHatEvents
Many amazing technologies have been developed
Ø Type Inference and Abstract Interpretation
Ø Data Flow & Control Flow & Dependency Analysis
Ø Pointer Analysis
Ø Path Sensitive and Context Sensitive Analysis
Ø Constraint Solving and Symbolic Execution
5. #BHUSA @BlackHatEvents
Also Some Amazing Tools
Name | Pay or not | Open source | Compiler dependency | Tech stack
Coverity | commercial | closed source | needs compilation | classic analysis techniques
Fortify | commercial | closed source | needs compilation | classic analysis techniques
Semgrep | free to use | open source | code scan | pattern matching
SVF | free to use | open source | needs compilation, depends on LLVM | classic analysis techniques, academic use
Clang Static Analyzer | free to use | open source | needs compilation, Clang only | AST traversal, symbolic execution
6. #BHUSA @BlackHatEvents
What is CodeQL
Ø Started in 2006 as a research project from Oxford University; acquired by GitHub in 2019
Ø Partly open source, but the core engine is closed source
Ø In essence: scan the code, build a database, run query logic against it, and find patterns
Ø Turns code analysis into queries over code properties
7. #BHUSA @BlackHatEvents
Architecture of CodeQL
Ø The extractor can be regarded as the language front end, so it is language-dependent
Ø The extractor scans the code, performs extra analysis, and stores code properties in the database
Ø The database stores code information and can be shared and reused
Ø CodeQL introduces a query language, QL. A QL query expresses query logic and is not tied to the code being analyzed, so it is language-agnostic
Ø CodeQL ships a mature and comprehensive library that can perform various data flow analyses, such as classic taint analysis
Ø The core engine can be regarded as a database evaluation engine
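To illustrate the split between stored facts and query logic, here is a minimal sketch in Python rather than QL. The relation `flow_edges`, the node names, and both helper functions are hypothetical and only stand in for what an extractor and the data flow library would provide; this is not how CodeQL implements taint analysis.

```python
# Hypothetical facts an extractor might store: each pair (a, b) means
# "data flows from a to b in one step". All names are illustrative only.
flow_edges = {
    ("argv", "buf"),
    ("buf", "cmd"),
    ("cmd", "system_arg"),
    ("const_str", "log_msg"),
}
sources = {"argv"}
sinks = {"system_arg", "log_msg"}


def reachable(start, edges):
    """Return every node reachable from `start` by following flow edges."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for src, dst in edges:
            if src == node and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def tainted_sinks(sources, sinks, edges):
    """The 'query': which (source, sink) pairs are connected by data flow?"""
    return sorted(
        (src, sink)
        for src in sources
        for sink in reachable(src, edges) & sinks
    )


results = tainted_sinks(sources, sinks, flow_edges)
print(results)
```

The query function never looks at source code, only at the stored relation, which is the sense in which the query side is language-agnostic.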
8. #BHUSA @BlackHatEvents
Engine of CodeQL(extractor)
./codeql database create -l cpp -j 8 -s /data/codeql/qemu-6.1.0 /data/codeql/qemu.db
Ø For compiled languages, the extractor and the compiler work in parallel
Ø The compiler is monitored as it works; if compilation fails, the process aborts
Ø Compiler command-line parameters are intercepted and extended
Ø The extractor therefore has the information needed to compile a code file
Ø The extractor scans the code, analyzes it, and collects code properties
9. #BHUSA @BlackHatEvents
Engine of CodeQL(extractor)
How does the extractor intercept the compiler?
protected void executeSubcommand()
1. The preload_tracer process is launched when codeql starts to work
2. Disassembly shows that preload_tracer injects something into LD_PRELOAD
3. GDB shows that codeql/tools/linux64/lib64trace.so is what gets injected into LD_PRELOAD
10. #BHUSA @BlackHatEvents
Engine of CodeQL(extractor)
How does the extractor compile code?
Ø lib64trace.so is injected into every process
Ø It detects whether the host process is a compiler process
Ø If so, it adds parameters from compiler-tracing.spec and launches an extractor process
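The decision made inside each traced process can be sketched roughly as follows. This is a simplified Python illustration, not the logic of lib64trace.so: the compiler name list, the `extractor` executable name, and the `--trap-dir` option are all assumptions made for the example, and the real matching rules come from compiler-tracing.spec.

```python
import os

# Assumed list of compiler executable names; the real tracer's matching
# rules live in compiler-tracing.spec and are not reproduced here.
COMPILER_NAMES = {"gcc", "g++", "cc", "c++", "clang", "clang++"}


def is_compiler(argv):
    """Guess whether a process is a compiler from its executable name."""
    return bool(argv) and os.path.basename(argv[0]) in COMPILER_NAMES


def extractor_command(argv, extractor="extractor", extra_args=()):
    """Build the command line for a (hypothetical) extractor process.

    Returns None for non-compiler processes, which are left alone.
    """
    if not is_compiler(argv):
        return None
    # Reuse the compiler's own arguments so the extractor sees the same
    # include paths, defines and source files, then append the extra ones.
    return [extractor, *argv[1:], *extra_args]


cmd = extractor_command(
    ["/usr/bin/gcc", "-Iinclude", "-c", "main.c"],
    extra_args=["--trap-dir", "trap"],  # hypothetical option
)
print(cmd)
print(extractor_command(["/bin/ls", "-l"]))
```

Because the hook runs in every process, the cheap name check comes first and anything that is not a compiler passes through unchanged.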
11. #BHUSA @BlackHatEvents
Engine of CodeQL(database)
The dbscheme is a guidance document for content extraction; the extractor extracts information according to that file
The trap file is located at
/data/blackhat/codedb/trap/cpp/tarballs/data/blackhat/code/test_c/source/data/blackhat/code/test_c
In fact, codeql first generates a trap file, then converts the trap to the final database
12. #BHUSA @BlackHatEvents
Engine of CodeQL(database)
Ø The extractor scans the code and extracts the information described in the dbscheme
Ø All extracted code information is placed in trap files
Ø A trap file is a text file, which is not convenient for further processing, so it is converted to a structured db file
Let’s map these files together and see an example
13. #BHUSA @BlackHatEvents
Engine of CodeQL(database)
Take functions as an example
Ø functions is the collection of all functions
Ø Its unique label is @function, and it can be referenced by this name
Ø It has two fields, name and kind; name is a string and kind is an int
Ø kind 1 means a normal function, 2 means a constructor
locations_stmt(
    /** The location of a statement. */
    unique int id: @location_stmt,
    int container: @container ref,
    int startLine: int ref,
    int startColumn: int ref,
    int endLine: int ref,
    int endColumn: int ref
);
Stmts is more complicated:
it has a recursive type
Let’s explore the dbscheme file
14. #BHUSA @BlackHatEvents
Engine of CodeQL(database)
It’s a statement
Kind 6 means a stmt_return
It starts at startLine 16, startColumn 3
It ends at endLine 16, endColumn 11
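To make the row layout concrete, the statement location above can be modeled as a typed tuple whose fields mirror the locations_stmt columns. This is only a sketch: the `id` and `container` values are placeholders, since the slide does not give them.

```python
from typing import NamedTuple


class StmtLocation(NamedTuple):
    """One row of locations_stmt, with the columns named as in the dbscheme."""
    id: int
    container: int
    start_line: int
    start_column: int
    end_line: int
    end_column: int


STMT_RETURN = 6  # kind value quoted on the slide

# id and container are placeholder values; the line/column values are from the slide.
loc = StmtLocation(id=1, container=1, start_line=16, start_column=3,
                   end_line=16, end_column=11)


def spans_single_line(location):
    """True when the statement starts and ends on the same source line."""
    return location.start_line == location.end_line


print(loc, spans_single_line(loc))
```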
Let’s explore the trap file
simple example complicated example
In the trap file,
xopen is a function named “xopen”, and its kind is
1 (normal)
15. #BHUSA @BlackHatEvents
Engine of CodeQL(database)
trap to database
xopen function in trap file
0x59cb=22987 0x14d8=5336
The rel file is the db file; tuple data are stored simply and directly in the database file
The string xopen is encoded as 5336; the function declaration is converted to an int tuple and written to the db file
Ø The int values in a tuple can be regarded as indexes into the actual
resources
Ø It’s a complex and tedious process, and we will not go into
further details here
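The idea of int tuples that index into the actual resources can be sketched with a small string pool. This is an assumption-laden illustration, not the real rel file format: the class `StringPool` and the helper `encode_function` are hypothetical, and IDs here are assigned sequentially from 0 rather than matching the values on the slide.

```python
class StringPool:
    """Assigns each distinct string a stable integer ID."""

    def __init__(self):
        self._ids = {}
        self._strings = []

    def intern(self, text):
        # Reuse the existing ID if the string was seen before.
        if text not in self._ids:
            self._ids[text] = len(self._strings)
            self._strings.append(text)
        return self._ids[text]

    def lookup(self, string_id):
        return self._strings[string_id]


def encode_function(pool, entity_id, name, kind):
    """Turn a function declaration into a tuple of ints only."""
    return (entity_id, pool.intern(name), kind)


pool = StringPool()
row = encode_function(pool, 7, "xopen", 1)  # entity ID 7 is a placeholder
print(row, pool.lookup(row[1]))
```

Storing only integers keeps every tuple fixed-width, which is what makes the table cheap to scan and join.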
16. #BHUSA @BlackHatEvents
The Query Language
Ø Declarative Logic Programming
Ø With class and OOP support
Ø Independent of the target language
Ø Ships with many built-in libraries, such as the most
common data flow and taint analyses
17. #BHUSA @BlackHatEvents
QL to DIL
Ø It’s a lowering process
Ø From a high-level, user-friendly language to a machine-friendly language
Ø From from-where-select high-level logic to child query operations
Ø Table queries and boolean calculus
The first lowering phase
18. #BHUSA @BlackHatEvents
DIL to RA
Ø Lowering continues
Ø From Datalog to relational algebra (RA) operations
Ø RA operations are the classic JOIN, UNION, and boolean calculus
Ø R1 and R2 in the RA code are intermediate relations, not registers
The second lowering phase
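The RA operations named above can be sketched over relations represented as sets of tuples. This is a generic illustration of JOIN, UNION, and selection, not CodeQL's RA syntax; the sample tables and the `calls` relation are made up for the example.

```python
def join(r1, r2, i, j):
    """Join r1 and r2 where column i of r1 equals column j of r2."""
    return {a + b for a in r1 for b in r2 if a[i] == b[j]}


def union(r1, r2):
    """Set union of two relations with the same arity."""
    return r1 | r2


def select(r, predicate):
    """Keep only the tuples that satisfy a boolean test."""
    return {t for t in r if predicate(t)}


# Hypothetical sample data: (id, name, kind) and (caller id, callee id).
functions = {(1, "xopen", 1), (2, "Foo", 2)}
calls = {(1, 2)}

constructors = select(functions, lambda t: t[2] == 2)
called = join(calls, functions, 1, 0)
print(constructors)
print(called)
```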
19. #BHUSA @BlackHatEvents
Evaluate Engine
Ø An RA expression is a query tree
Ø A query tree is a set of logically related operations that cannot be separated
Ø The engine consumes one tree at a time
Ø Evaluation proceeds bottom-up
Ø Recursive evaluation is supported
RA code and query tree
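Bottom-up evaluation of a recursive relation can be illustrated with a naive fixpoint loop. The sketch computes a transitive closure by repeatedly joining the known facts with the base edges until nothing new appears; it shows the general Datalog technique and makes no claim about how the CodeQL engine schedules or optimizes this.

```python
def transitive_closure(edges):
    """Naive bottom-up fixpoint: reach(a, d) :- reach(a, b), edge(b, d)."""
    reach = set(edges)
    while True:
        derived = {(a, d) for (a, b) in reach for (c, d) in edges if b == c}
        if derived <= reach:
            # No new tuples were derived, so the fixpoint has been reached.
            return reach
        reach |= derived


print(sorted(transitive_closure({(1, 2), (2, 3), (3, 4)})))
```

Each pass starts from the facts already derived, which is what "from bottom to up" means for a recursive predicate.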
20. #BHUSA @BlackHatEvents
Summary
Ø CodeQL has several language front-ends, implemented as extractors
Ø An extractor extracts code information, analyzes it, and stores it in a
database
Ø CodeQL implements a Datalog query language to query code properties
Ø CodeQL stores code properties as a database and treats code analysis as a
database query
21. #BHUSA @BlackHatEvents
The Advantages of CodeQL
Ø The architecture is elegant and highly modular, and separates the front-end
and back-end
Ø CodeQL applies database technology faithfully, converting code analysis into
data queries
Ø The query language is well designed and focuses on code pattern and semantic matching. One
query language handles many target languages
Ø Thanks to the unified query language, the same logic can be reused at scale
across different analysis tasks
23. #BHUSA @BlackHatEvents
Design of binary dbscheme
Ø Store the file, architecture, string, and function information of the binary.
Ø Store the information of instructions, registers, and basic blocks inside the function.
Ø Store the use and def information of registers inside the function.
Ø Store the information of global variables and pointers.
Ø Store memory layout information (address, section, etc.)
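A rough sketch of how such tables relate to one another follows. All class and field names are hypothetical and chosen only to mirror the bullets above; the actual binary dbscheme is not shown in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Function:
    id: int
    name: str
    address: int


@dataclass(frozen=True)
class Instruction:
    id: int
    function_id: int  # reference to Function.id
    address: int
    mnemonic: str


@dataclass(frozen=True)
class RegisterAccess:
    instruction_id: int  # reference to Instruction.id
    register: str
    is_def: bool  # True for a definition, False for a use


def defs_in_function(function_id, instructions, accesses):
    """Registers defined by any instruction of the given function."""
    ids = {i.id for i in instructions if i.function_id == function_id}
    return {a.register for a in accesses if a.is_def and a.instruction_id in ids}


instructions = [Instruction(1, 10, 0x1000, "mov"), Instruction(2, 10, 0x1004, "add")]
accesses = [RegisterAccess(1, "rax", True), RegisterAccess(2, "rax", False)]
print(defs_in_function(10, instructions, accesses))
```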
25. #BHUSA @BlackHatEvents
New Extractor For Binary
The extractor needs to extract information according to the tables defined by the
dbscheme. The main challenge here is how to efficiently save the relationships
between the tables.
Extracting imported function information from the binary
Extracting all strings from the binary
Extracting usage information of instructions and registers from the binary
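As an illustration of the string extraction step, the sketch below scans raw bytes for runs of printable ASCII, in the spirit of the Unix `strings` tool. It is a generic technique under stated assumptions (ASCII only, minimum length 4) and not the extractor described in the talk.

```python
import re


def extract_strings(data, min_len=4):
    """Return (offset, text) pairs for printable ASCII runs of at least min_len bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]


sample = b"\x00\x01hello\x00ab\x00world!\x00"
for offset, text in extract_strings(sample):
    print(hex(offset), text)
```

Keeping the offset alongside each string is what lets a later table relate the string back to its address in the binary.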
26. #BHUSA @BlackHatEvents
Configure a new workflow
Ø The two key components for creating a database are now complete. Next,
they need to be integrated into the CodeQL database creation workflow. Below is how we supplied the configuration
files the workflow needs, which we worked out through dynamic debugging.
Register a new language option
Configure the compiler
Generate asm.dbscheme.stats file
27. #BHUSA @BlackHatEvents
Autobuild.sh For Binary
Unlike the database creation workflow for
source code, we run the extractor as a separate
process, and autobuild.sh only
packages the trap files and copies them
to a specific directory in the database.
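The packaging step can be sketched as a plain file copy. This is a stand-in written in Python rather than the authors' autobuild.sh, and the function name and directory arguments are hypothetical; the slides do not show the script's contents.

```python
import shutil
from pathlib import Path


def package_trap_files(src_dir, db_trap_dir):
    """Copy every *.trap file from src_dir into db_trap_dir and return their names."""
    dest = Path(db_trap_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for trap in sorted(Path(src_dir).glob("*.trap")):
        shutil.copy2(trap, dest / trap.name)
        copied.append(trap.name)
    return copied
```

A caller would pass the directory where the separately run extractor wrote its trap files and the trap directory inside the database being created.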
database create
28. #BHUSA @BlackHatEvents
QL library For Binary
Ø The QL library lets us use the QL language, and with this flexible language we can accomplish even more
powerful tasks.
Done: support for basic table queries
Ø Export functions
Ø Functions
Ø Registers, instructions
Ø Import functions
Ø Strings
In Development: support for SSA IR, dataflow, and taint analysis
Future: support for more architectures (ARM, RISC-V)
31. #BHUSA @BlackHatEvents
QL language Debugger At RA-level
Ø The current version of the debugger supports breakpoints, execution, single-stepping, viewing relations,
and tracing.
Architecture
33. #BHUSA @BlackHatEvents
RA To Pipeline
13 types of operation pipelines
RA operation                 Pipeline operation
ApplyTupleOperation          TupleOperationPipeline
StreamDedup                  StreamDedupPipeline
SelectionByTest              SelectionByTestPipeline
Literal                      AbstractRelationPipeline
EmptySet                     EmptySetPipeline
Union                        UnionPipeline
InvokeHigherOrderRelation    InvokeHigherOrderRelationPipeline
Join                         JoinPipeline
…                            …
34. #BHUSA @BlackHatEvents
Pipeline Run
Ø Each type of pipeline eventually calls its own runinternal method to perform operations
such as reading and writing tables. Tables in the database are stored and used in memory in
the form of page relations.
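The dispatch pattern described here, where each pipeline type supplies its own internal run method, can be sketched as follows. The class names EmptySetPipeline and UnionPipeline come from the slides, but their bodies, the `LiteralRowsPipeline` helper, and the use of Python sets in place of page relations are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class Pipeline(ABC):
    """Common entry point; each subclass provides its own run_internal."""

    def run(self):
        return self.run_internal()

    @abstractmethod
    def run_internal(self):
        ...


class EmptySetPipeline(Pipeline):
    def run_internal(self):
        return set()


class LiteralRowsPipeline(Pipeline):
    """Hypothetical helper that yields a fixed set of tuples."""

    def __init__(self, rows):
        self.rows = set(rows)

    def run_internal(self):
        return set(self.rows)


class UnionPipeline(Pipeline):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def run_internal(self):
        # Evaluate both children first, then combine their results.
        return self.left.run() | self.right.run()


print(UnionPipeline(LiteralRowsPipeline({(1,)}), EmptySetPipeline()).run())
```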
Running pipeline
Get data corresponding to the page relation
36. #BHUSA @BlackHatEvents
codeql-debug Agent
Ø During dynamic debugging we discovered a hidden performance tuning parameter, 'semmle.profiler.verbosity',
which can save the engine's execution state to a file. This
way, we don't have to implement a trace function through jdb; we only need to add this
parameter to the startup parameters.
Added launch parameters
Undisclosed system tuning parameters