SlideShare a Scribd company logo
Messing with
binary formats
London, EnglandAnge Albertini
2013/09/13
ΠΟΛΎΓΛΩΣΣΟΣ
Welcome!
● this is the non-live version of my slides
○ more text
○ standard PDF file ;)
About me:
● Reverse engineer
● my website: http://corkami.com
○ reverse engineering &
visual documentations
to extract the live deck. 61 slides:
pdftk 44con-albertini.pdf cat 1 3 5 7 9 10 12 14 16 18 23 25 29 31 33 35 37 39 41 43 45 47 49 51 53 55-57 59 63 65 67 69 71 73 75 77 79 81 83 85 87 89 90 94 96 98 101-102 104 106-107 109-112 114-117 119 output 44con-albertini(live).pdf
119 2 4 6 8 11 13 15 17 19+4 24 26+3 32 34 36 38 40 42 44 46 48 50 52 54 58 60+3 64 66 68 70 72 74 76 78 80 82 84 86 88 91+3 95 97 99+2 103 105 108 113 118
low-level ones,
that is
I just like to play
with lego blocks
generate files byte per byte
Goals
● explore the format
● make sure that's how things
work
● full control over the structure
result:
● a complete executable
● all bytes defined by hand
our problem
● is related to virus (malwares)
● they use many file formats
● it's critical to identify them reliably
○ and to tell whether corrupted or well-formed
standard infection chain
the most common chain:
1. a web page, in HTML format
a. launching an applet
2. an evil applet, in CLASS format
a. exploiting a Java vulnerability
b. dropping an executable
3. a malicious executable, in Portable
Executable format
(a vast majority of malwares
rely on an executable)
another classic chain
● open a PDF document
○ with an exploit inside
■ dropping or downloading a PE executable
● get a malicious executable on your machine
the challenge
it might look obvious:
● tell whether it's a PDF, a PE, a JAVA, an
HTML...
● typical formats are clearly defined
○ Magic signature enforced at offset 0
reality
some formats have no header at all
● Command File (DOS 16 bits)
● Master Boot record
some formats don't need to start at offset 0
● Archives (Zips, Rars...)
● HTML
○ but text-only?
some formats accept a large controllable block
early in their header
● Portable Executable
● PICT image
How did this start?
a real-life problem:
1. a (malicious) HTML page
2. started with 'MZ' (the signature of PE)
3. just scanned as a PE!
a. wow, this PE is highly corrupted :)
b. it must be clean :p
?
MZ
polyglots in the wild
GIFAR = GIF + JAR
● an uploaded image
○ an avatar in a forum
● with a malicious JAVA appended as JAR
hosted on the server!
● bypass same domain policy
● now useable via its JAVA=EVIL payload
+ =
let's get started
PE, the executable format of windows
● it's central to windows malware
● it enforces a magic signature at offset 0
○ game over for other formats?
● starts with a compulsory header
● made of sub-headers
overview
a historical sandwich
1. a deprecated but required header
2. a modern header
old header content
● almost completely ignored
● only required:
○ 2 byte signature
○ pointer to new header
the new header can be
anywhere
ex: at the end of the file!
such as Corkami Standard Test
let's look at HTML format
it enforces NOTHING!
anything before the <html> tag!
even 28 Mb of binary!
and it's been the same
since Mozilla 1.0 in 2002thanks to Nicolas Grégoire!
now, the PDF format
signature position?
● officially at offset 0
● officially tolerated until offset 1024
● wtf?
○ it get actually worse later
PDF trick 1
put a small executable within 1024 bytes
(just concatenate)
trick 2
1. start a fake PDF + object in a PE header
2. finish fake object at the end the PE
3. end fake object
4. put PDF real structure
works with real-life example!
(PE data might contain PDF keywords)
JAR = ZIP + Class
just enforced at the very end of the file
but CRCs are just ignored
it was too easy :p
Summary
Structure
1. start
○ PE Signature
■ %PDF + fake obj start
■ HTML comment start
2. next
○ PE (next)
○ HTML
○ PDF (next)
3. bottom
○ ZIP
it’s time for a real example!
an inception demo!
wait, what?
we’re already in the demo!
the live version file is simultaneously:
● the PDF slides themselves
● a PDF viewer executable
○ ie, the file is loading itself
● the PoCs in a ZIP
● an HTML readme
○ with JavaScript mario
so, it works
but it lacks something
● not artistic enough
● not advanced enough
let's build a 'well representative' (=nasty) PoC
the PE specs
● Official MS specs = big joke
○ 'the gentle guide for beginners'
○ barely describes standard PEs
stripped down PE
many elements removed
● including no sections
imports
(imports = communication between executables and libraries)
imports are made of 3 lists
evil imports
● let's make these lists into each other
● with more extra tricks to fail parser!
ultimate import fail
● failing all tools
○ including IDA & Hiew
● now fixed :)
let's put some code
● some undocumented opcodes!
● big blank spaces in Intel official docs
let's check AMD's
● miracle!
result in WinDbg
● '???' == clueless (tool/user)
don't rely (only) on official docs
messing with PDF
there is a so-called standard
and the reality of existing parsers
looking at: Adobe, MuPDF, Chrome
● 3 different files
○ working each on a specific viewer
○ failing on the other 2
let's look inside
● MuPDF
○ no %PDF sig required
■ a PDF without a PDF sig ? WTF ?!?!
○ no trailer keyword required either
● Chrome
○ integer overflows: -4294967275 = 21
○ trailer in a comment
■ it can actually be almost ANYWHERE
■ even inside another object
● Adobe
○ looks almost sane compare to the other 2
Chrome insanity++
(thx to Jonas Magazinius)
● a single object
● no 'trailer'
● inline stream
● brackets are not even closed
● * are required - it just checks for minimum
space
%PDF*****
1 0 obj
<<
/Size 2
/W[[]1/]
/Root 1 0 R
/Pages<<
/Kids[<<
/Contents<<>>
stream
BT{99
Tf{Td(Inlined PDF)'
endstream
>>]
>>
>>
stream
*
endstream
startxref%*******
PDF.JS
● very strict
○ 'too' strict / naive ?
○ I don't want to be their QA ;)
● requires a lot of information usually ignored
○ xref
○ /Length %PDF-1.1
1 0 obj
<<
% /Type /Catalog
...
>>
endobj
2 0 obj
<<
/Type /Pages
...
>>
endobj
3 0 obj
<<
/Type /Page
/Resources <<
/Font <<
/F1 <<
/Type /Font
/Subtype /Type1
...
>>
>>
>>
>>
endobj
4 0 obj
<< /Length 47>>
stream
...
xref
0 1
0000000000 65535 f
0000000010 00000 n
...
let's play further
combine 3 documents in a single file
● it's actually 3 set of 'independant' objects
● objects are parsed
○ but not used
alternate reality demo
the live slide-deck contains 2 PDF
● bogus one under Chrome
● real one under MuPDF (Sumatra, Linux...)
● rejected under Acrobat
○ because of the PE signature (see later)
DEMO
final PoC
● combine most previously mentioned tricks
● many fails on many tools
● total control of the structure
○ the PDF 'ends' in the Java class
Adobe rejects 'weird
magics' after 10.1.5
not in their own specs :p
10.1.4 10.1.5
also in ELF/Linux flavor
● starring a signature-less PDF
○ which won't run on other viewers
and Apple too
PS: I don't have a Mac, this was built blindly
Thanks to Nicolas Seriot for testing
why should we care?
like washing powders
security tools are selected:
● speed
● {files} → {[clean/detected]}
file types not taken into consideration
type confusion
make the tool believe it's another type, which
will fool the engine
engine with checksum caching will be fooled:
1. scanned as HTML, clean
2. reused as PE but malicious
engine exhaustion
rankings in magazines are based on scanning
time
→ scanning per file must stop arbitrarily
→ waste scanning cycle by adding extra
formats
Weaknesses
● evasion
○ filters → exfiltration
○ same origin policy
○ detection
■ ex: clean PE but malicious PDF/HTML/...
■ exhaust checks
■ pretend to be corrupt
● DoS
Conclusion
Conclusion
● type confusion is bad
○ succinct docs too
○ lazy softwares as well
● go beyond the specs
○ Adobe: good
● suggestions
○ more extensions checks
○ isolate downloaded files
○ enforce magic signature at offset 0
Questions ?
thank YOU !
http://
reverseengineering.stackexchange.com
@angealbertini
✉ ange@corkami.com
Bonus
Valid image as JavaScript
Highlighted by Saumil Shah
● abusing header and parsers laxisms
● turn a field into /*
● close comment after the picture data

More Related Content

What's hot

Python for IoT, A return of experience
Python for IoT, A return of experiencePython for IoT, A return of experience
Python for IoT, A return of experience
Alexandre Abadie
 
Phpconf taiwan-2012
Phpconf taiwan-2012Phpconf taiwan-2012
Phpconf taiwan-2012Hash Lin
 
How to Dockerize, Automate the Build and Deployment Process for Flutter?
How to Dockerize, Automate the Build and Deployment Process for Flutter?How to Dockerize, Automate the Build and Deployment Process for Flutter?
How to Dockerize, Automate the Build and Deployment Process for Flutter?
9 series
 
Egress-Assess and Owning Data Exfiltration
Egress-Assess and Owning Data ExfiltrationEgress-Assess and Owning Data Exfiltration
Egress-Assess and Owning Data Exfiltration
CTruncer
 
Stashaway 1
Stashaway 1Stashaway 1
Stashaway 1priestc
 
Boosting python web apps with protocol buffers &amp; grpc
Boosting python web apps with protocol buffers &amp; grpcBoosting python web apps with protocol buffers &amp; grpc
Boosting python web apps with protocol buffers &amp; grpc
Naren Arya
 
Pentester++
Pentester++Pentester++
Pentester++
CTruncer
 
Hacking - Breaking Into It
Hacking - Breaking Into ItHacking - Breaking Into It
Hacking - Breaking Into It
CTruncer
 
Making%20R%20Packages%20Under%20Windows
Making%20R%20Packages%20Under%20WindowsMaking%20R%20Packages%20Under%20Windows
Making%20R%20Packages%20Under%20Windowstutorialsruby
 
Powering tensorflow with big data (apache spark, flink, and beam) dataworks...
Powering tensorflow with big data (apache spark, flink, and beam)   dataworks...Powering tensorflow with big data (apache spark, flink, and beam)   dataworks...
Powering tensorflow with big data (apache spark, flink, and beam) dataworks...
Holden Karau
 
Understanding how concurrency work in os
Understanding how concurrency work in osUnderstanding how concurrency work in os
Understanding how concurrency work in os
GenchiLu1
 
First month with golang - Building Telegram chat bot
First month with golang - Building Telegram chat botFirst month with golang - Building Telegram chat bot
First month with golang - Building Telegram chat bot
Dan Tran
 
Not Your Fathers C - C Application Development In 2016
Not Your Fathers C - C Application Development In 2016Not Your Fathers C - C Application Development In 2016
Not Your Fathers C - C Application Development In 2016
maiktoepfer
 
Mono - Alternative .NET CLR Implementation
Mono - Alternative .NET CLR ImplementationMono - Alternative .NET CLR Implementation
Mono - Alternative .NET CLR ImplementationYulian Slobodyan
 

What's hot (16)

Python for IoT, A return of experience
Python for IoT, A return of experiencePython for IoT, A return of experience
Python for IoT, A return of experience
 
Phpconf taiwan-2012
Phpconf taiwan-2012Phpconf taiwan-2012
Phpconf taiwan-2012
 
PostgreSQL Development Today: 9.0
PostgreSQL Development Today: 9.0PostgreSQL Development Today: 9.0
PostgreSQL Development Today: 9.0
 
How to Dockerize, Automate the Build and Deployment Process for Flutter?
How to Dockerize, Automate the Build and Deployment Process for Flutter?How to Dockerize, Automate the Build and Deployment Process for Flutter?
How to Dockerize, Automate the Build and Deployment Process for Flutter?
 
Egress-Assess and Owning Data Exfiltration
Egress-Assess and Owning Data ExfiltrationEgress-Assess and Owning Data Exfiltration
Egress-Assess and Owning Data Exfiltration
 
Stashaway 1
Stashaway 1Stashaway 1
Stashaway 1
 
Boosting python web apps with protocol buffers &amp; grpc
Boosting python web apps with protocol buffers &amp; grpcBoosting python web apps with protocol buffers &amp; grpc
Boosting python web apps with protocol buffers &amp; grpc
 
Pentester++
Pentester++Pentester++
Pentester++
 
Hacking - Breaking Into It
Hacking - Breaking Into ItHacking - Breaking Into It
Hacking - Breaking Into It
 
Making%20R%20Packages%20Under%20Windows
Making%20R%20Packages%20Under%20WindowsMaking%20R%20Packages%20Under%20Windows
Making%20R%20Packages%20Under%20Windows
 
Powering tensorflow with big data (apache spark, flink, and beam) dataworks...
Powering tensorflow with big data (apache spark, flink, and beam)   dataworks...Powering tensorflow with big data (apache spark, flink, and beam)   dataworks...
Powering tensorflow with big data (apache spark, flink, and beam) dataworks...
 
Understanding how concurrency work in os
Understanding how concurrency work in osUnderstanding how concurrency work in os
Understanding how concurrency work in os
 
First month with golang - Building Telegram chat bot
First month with golang - Building Telegram chat botFirst month with golang - Building Telegram chat bot
First month with golang - Building Telegram chat bot
 
MyReplayInZen
MyReplayInZenMyReplayInZen
MyReplayInZen
 
Not Your Fathers C - C Application Development In 2016
Not Your Fathers C - C Application Development In 2016Not Your Fathers C - C Application Development In 2016
Not Your Fathers C - C Application Development In 2016
 
Mono - Alternative .NET CLR Implementation
Mono - Alternative .NET CLR ImplementationMono - Alternative .NET CLR Implementation
Mono - Alternative .NET CLR Implementation
 

Viewers also liked

Funky file formats - 31c3
Funky file formats - 31c3Funky file formats - 31c3
Funky file formats - 31c3
Ange Albertini
 
A binary chimera - 3 headers & 1 data body in a single file
A binary chimera - 3 headers & 1 data body in a single fileA binary chimera - 3 headers & 1 data body in a single file
A binary chimera - 3 headers & 1 data body in a single file
Ange Albertini
 
I2R Labs, Bengaluru, Telecommunication Equipment GPS Modules
 I2R Labs, Bengaluru, Telecommunication Equipment GPS Modules I2R Labs, Bengaluru, Telecommunication Equipment GPS Modules
I2R Labs, Bengaluru, Telecommunication Equipment GPS Modules
IndiaMART InterMESH Limited
 
Heartache and Heartbleed - 31c3
Heartache and Heartbleed - 31c3Heartache and Heartbleed - 31c3
Heartache and Heartbleed - 31c3
Nick Sullivan
 
Personal tracking devices - A Journey Into The True Dark Net
Personal tracking devices - A Journey Into The True Dark NetPersonal tracking devices - A Journey Into The True Dark Net
Personal tracking devices - A Journey Into The True Dark Net
Silvia Puglisi
 

Viewers also liked (6)

Funky file formats - 31c3
Funky file formats - 31c3Funky file formats - 31c3
Funky file formats - 31c3
 
A binary chimera - 3 headers & 1 data body in a single file
A binary chimera - 3 headers & 1 data body in a single fileA binary chimera - 3 headers & 1 data body in a single file
A binary chimera - 3 headers & 1 data body in a single file
 
31c3
31c331c3
31c3
 
I2R Labs, Bengaluru, Telecommunication Equipment GPS Modules
 I2R Labs, Bengaluru, Telecommunication Equipment GPS Modules I2R Labs, Bengaluru, Telecommunication Equipment GPS Modules
I2R Labs, Bengaluru, Telecommunication Equipment GPS Modules
 
Heartache and Heartbleed - 31c3
Heartache and Heartbleed - 31c3Heartache and Heartbleed - 31c3
Heartache and Heartbleed - 31c3
 
Personal tracking devices - A Journey Into The True Dark Net
Personal tracking devices - A Journey Into The True Dark NetPersonal tracking devices - A Journey Into The True Dark Net
Personal tracking devices - A Journey Into The True Dark Net
 

Similar to Messing with binary formats

PDF secrets - hiding & revealing secrets in PDF documents
PDF secrets - hiding & revealing secrets in PDF documentsPDF secrets - hiding & revealing secrets in PDF documents
PDF secrets - hiding & revealing secrets in PDF documents
Ange Albertini
 
PDF - Secrets - 140519092839-phpapp01
PDF - Secrets - 140519092839-phpapp01PDF - Secrets - 140519092839-phpapp01
PDF - Secrets - 140519092839-phpapp01
dumbfuckery
 
PDF: myths vs facts
PDF: myths vs factsPDF: myths vs facts
PDF: myths vs facts
Ange Albertini
 
A bit more of PE
A bit more of PEA bit more of PE
A bit more of PE
Ange Albertini
 
Linux as a gaming platform, ideology aside
Linux as a gaming platform, ideology asideLinux as a gaming platform, ideology aside
Linux as a gaming platform, ideology aside
Leszek Godlewski
 
Caring for file formats
Caring for file formatsCaring for file formats
Caring for file formats
Ange Albertini
 
Drupalhagen 2014 kiss omg ftw
Drupalhagen 2014   kiss omg ftwDrupalhagen 2014   kiss omg ftw
Drupalhagen 2014 kiss omg ftw
Arne Jørgensen
 
Docker and Go: why did we decide to write Docker in Go?
Docker and Go: why did we decide to write Docker in Go?Docker and Go: why did we decide to write Docker in Go?
Docker and Go: why did we decide to write Docker in Go?
Jérôme Petazzoni
 
Simplifying training deep and serving learning models with big data in python...
Simplifying training deep and serving learning models with big data in python...Simplifying training deep and serving learning models with big data in python...
Simplifying training deep and serving learning models with big data in python...
Holden Karau
 
Pen Testing Development
Pen Testing DevelopmentPen Testing Development
Pen Testing Development
CTruncer
 
Powering Tensorflow with big data using Apache Beam, Flink, and Spark - OSCON...
Powering Tensorflow with big data using Apache Beam, Flink, and Spark - OSCON...Powering Tensorflow with big data using Apache Beam, Flink, and Spark - OSCON...
Powering Tensorflow with big data using Apache Beam, Flink, and Spark - OSCON...
Holden Karau
 
Dfrws eu 2014 rekall workshop
Dfrws eu 2014 rekall workshopDfrws eu 2014 rekall workshop
Dfrws eu 2014 rekall workshopTamas K Lengyel
 
Mender.io | Develop embedded applications faster | Comparing C and Golang
Mender.io | Develop embedded applications faster | Comparing C and GolangMender.io | Develop embedded applications faster | Comparing C and Golang
Mender.io | Develop embedded applications faster | Comparing C and Golang
Mender.io
 
Rusty Python
Rusty PythonRusty Python
Rusty Python
RangHo Lee
 
Drupal Day 2011 - Features: una vita felice
Drupal Day 2011 - Features: una vita feliceDrupal Day 2011 - Features: una vita felice
Drupal Day 2011 - Features: una vita felice
DrupalDay
 
Python in Industry
Python in IndustryPython in Industry
Python in Industry
Dharmit Shah
 
Introduction to google chromebooks and chromeboxes presentation tech-talk
Introduction to google chromebooks and chromeboxes presentation tech-talkIntroduction to google chromebooks and chromeboxes presentation tech-talk
Introduction to google chromebooks and chromeboxes presentation tech-talk
Roel Palmaers
 
The challenges of file formats
The challenges of file formatsThe challenges of file formats
The challenges of file formats
Ange Albertini
 

Similar to Messing with binary formats (20)

PDF secrets - hiding & revealing secrets in PDF documents
PDF secrets - hiding & revealing secrets in PDF documentsPDF secrets - hiding & revealing secrets in PDF documents
PDF secrets - hiding & revealing secrets in PDF documents
 
PDF - Secrets - 140519092839-phpapp01
PDF - Secrets - 140519092839-phpapp01PDF - Secrets - 140519092839-phpapp01
PDF - Secrets - 140519092839-phpapp01
 
PDF: myths vs facts
PDF: myths vs factsPDF: myths vs facts
PDF: myths vs facts
 
A bit more of PE
A bit more of PEA bit more of PE
A bit more of PE
 
Linux as a gaming platform, ideology aside
Linux as a gaming platform, ideology asideLinux as a gaming platform, ideology aside
Linux as a gaming platform, ideology aside
 
Cpb2010
Cpb2010Cpb2010
Cpb2010
 
Caring for file formats
Caring for file formatsCaring for file formats
Caring for file formats
 
Drupalhagen 2014 kiss omg ftw
Drupalhagen 2014   kiss omg ftwDrupalhagen 2014   kiss omg ftw
Drupalhagen 2014 kiss omg ftw
 
Docker and Go: why did we decide to write Docker in Go?
Docker and Go: why did we decide to write Docker in Go?Docker and Go: why did we decide to write Docker in Go?
Docker and Go: why did we decide to write Docker in Go?
 
Simplifying training deep and serving learning models with big data in python...
Simplifying training deep and serving learning models with big data in python...Simplifying training deep and serving learning models with big data in python...
Simplifying training deep and serving learning models with big data in python...
 
Pen Testing Development
Pen Testing DevelopmentPen Testing Development
Pen Testing Development
 
Powering Tensorflow with big data using Apache Beam, Flink, and Spark - OSCON...
Powering Tensorflow with big data using Apache Beam, Flink, and Spark - OSCON...Powering Tensorflow with big data using Apache Beam, Flink, and Spark - OSCON...
Powering Tensorflow with big data using Apache Beam, Flink, and Spark - OSCON...
 
Dfrws eu 2014 rekall workshop
Dfrws eu 2014 rekall workshopDfrws eu 2014 rekall workshop
Dfrws eu 2014 rekall workshop
 
Mender.io | Develop embedded applications faster | Comparing C and Golang
Mender.io | Develop embedded applications faster | Comparing C and GolangMender.io | Develop embedded applications faster | Comparing C and Golang
Mender.io | Develop embedded applications faster | Comparing C and Golang
 
Rusty Python
Rusty PythonRusty Python
Rusty Python
 
Drupal Day 2011 - Features: una vita felice
Drupal Day 2011 - Features: una vita feliceDrupal Day 2011 - Features: una vita felice
Drupal Day 2011 - Features: una vita felice
 
Python in Industry
Python in IndustryPython in Industry
Python in Industry
 
Introduction to google chromebooks and chromeboxes presentation tech-talk
Introduction to google chromebooks and chromeboxes presentation tech-talkIntroduction to google chromebooks and chromeboxes presentation tech-talk
Introduction to google chromebooks and chromeboxes presentation tech-talk
 
Ruxmon.2013-08.-.CodeBro!
Ruxmon.2013-08.-.CodeBro!Ruxmon.2013-08.-.CodeBro!
Ruxmon.2013-08.-.CodeBro!
 
The challenges of file formats
The challenges of file formatsThe challenges of file formats
The challenges of file formats
 

More from Ange Albertini

Technical challenges with file formats
Technical challenges with file formatsTechnical challenges with file formats
Technical challenges with file formats
Ange Albertini
 
Relations between archive formats
Relations between archive formatsRelations between archive formats
Relations between archive formats
Ange Albertini
 
Abusing archive file formats
Abusing archive file formatsAbusing archive file formats
Abusing archive file formats
Ange Albertini
 
TimeCryption
TimeCryptionTimeCryption
TimeCryption
Ange Albertini
 
You are *not* an idiot
You are *not* an idiotYou are *not* an idiot
You are *not* an idiot
Ange Albertini
 
Improving file formats
Improving file formatsImproving file formats
Improving file formats
Ange Albertini
 
KILL MD5
KILL MD5KILL MD5
KILL MD5
Ange Albertini
 
No more dumb hex!
No more dumb hex!No more dumb hex!
No more dumb hex!
Ange Albertini
 
Beyond your studies
Beyond your studiesBeyond your studies
Beyond your studies
Ange Albertini
 
An introduction to inkscape
An introduction to inkscapeAn introduction to inkscape
An introduction to inkscape
Ange Albertini
 
Exploiting hash collisions
Exploiting hash collisionsExploiting hash collisions
Exploiting hash collisions
Ange Albertini
 
Infosec & failures
Infosec & failuresInfosec & failures
Infosec & failures
Ange Albertini
 
Connecting communities
Connecting communitiesConnecting communities
Connecting communities
Ange Albertini
 
TASBot - the perfectionist
TASBot - the perfectionistTASBot - the perfectionist
TASBot - the perfectionist
Ange Albertini
 
Hacks in video games
Hacks in video gamesHacks in video games
Hacks in video games
Ange Albertini
 
Trusting files (and their formats)
Trusting files (and their formats)Trusting files (and their formats)
Trusting files (and their formats)
Ange Albertini
 
Let's write a PDF file
Let's write a PDF fileLet's write a PDF file
Let's write a PDF file
Ange Albertini
 
An overview of potential leaks via PDF
An overview of potential leaks via PDFAn overview of potential leaks via PDF
An overview of potential leaks via PDF
Ange Albertini
 
Advanced Pdf Tricks
Advanced Pdf TricksAdvanced Pdf Tricks
Advanced Pdf Tricks
Ange Albertini
 
Preserving arcade games - 31c3
Preserving arcade games -  31c3Preserving arcade games -  31c3
Preserving arcade games - 31c3
Ange Albertini
 

More from Ange Albertini (20)

Technical challenges with file formats
Technical challenges with file formatsTechnical challenges with file formats
Technical challenges with file formats
 
Relations between archive formats
Relations between archive formatsRelations between archive formats
Relations between archive formats
 
Abusing archive file formats
Abusing archive file formatsAbusing archive file formats
Abusing archive file formats
 
TimeCryption
TimeCryptionTimeCryption
TimeCryption
 
You are *not* an idiot
You are *not* an idiotYou are *not* an idiot
You are *not* an idiot
 
Improving file formats
Improving file formatsImproving file formats
Improving file formats
 
KILL MD5
KILL MD5KILL MD5
KILL MD5
 
No more dumb hex!
No more dumb hex!No more dumb hex!
No more dumb hex!
 
Beyond your studies
Beyond your studiesBeyond your studies
Beyond your studies
 
An introduction to inkscape
An introduction to inkscapeAn introduction to inkscape
An introduction to inkscape
 
Exploiting hash collisions
Exploiting hash collisionsExploiting hash collisions
Exploiting hash collisions
 
Infosec & failures
Infosec & failuresInfosec & failures
Infosec & failures
 
Connecting communities
Connecting communitiesConnecting communities
Connecting communities
 
TASBot - the perfectionist
TASBot - the perfectionistTASBot - the perfectionist
TASBot - the perfectionist
 
Hacks in video games
Hacks in video gamesHacks in video games
Hacks in video games
 
Trusting files (and their formats)
Trusting files (and their formats)Trusting files (and their formats)
Trusting files (and their formats)
 
Let's write a PDF file
Let's write a PDF fileLet's write a PDF file
Let's write a PDF file
 
An overview of potential leaks via PDF
An overview of potential leaks via PDFAn overview of potential leaks via PDF
An overview of potential leaks via PDF
 
Advanced Pdf Tricks
Advanced Pdf TricksAdvanced Pdf Tricks
Advanced Pdf Tricks
 
Preserving arcade games - 31c3
Preserving arcade games -  31c3Preserving arcade games -  31c3
Preserving arcade games - 31c3
 

Recently uploaded

20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Alex Pruden
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
Rohit Gautam
 

Recently uploaded (20)

20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
 

Messing with binary formats

  • 1. Messing with binary formats London, EnglandAnge Albertini 2013/09/13 ΠΟΛΎΓΛΩΣΣΟΣ
  • 2. Welcome! ● this is the non-live version of my slides ○ more text ○ standard PDF file ;) About me: ● Reverse engineer ● my website: http://corkami.com ○ reverse engineering & visual documentations to extract the live deck. 61 slides: pdftk 44con-albertini.pdf cat 1 3 5 7 9 10 12 14 16 18 23 25 29 31 33 35 37 39 41 43 45 47 49 51 53 55-57 59 63 65 67 69 71 73 75 77 79 81 83 85 87 89 90 94 96 98 101-102 104 106-107 109-112 114-117 119 output 44con-albertini(live).pdf 119 2 4 6 8 11 13 15 17 19+4 24 26+3 32 34 36 38 40 42 44 46 48 50 52 54 58 60+3 64 66 68 70 72 74 76 78 80 82 84 86 88 91+3 95 97 99+2 103 105 108 113 118
  • 3. low-level ones, that is I just like to play with lego blocks
  • 4. generate files byte per byte Goals ● explore the format ● make sure that's how things work ● full control over the structure
  • 5. result: ● a complete executable ● all bytes defined by hand
  • 6. our problem ● is related to virus (malwares) ● they use many file formats ● it's critical to identify them reliably ○ and to tell whether corrupted or well-formed
  • 7. standard infection chain the most common chain: 1. a web page, in HTML format a. launching an applet 2. an evil applet, in CLASS format a. exploiting a Java vulnerability b. dropping an executable 3. a malicious executable, in Portable Executable format (a vast majority of malwares rely on an executable)
  • 8. another classic chain ● open a PDF document ○ with an exploit inside ■ dropping or downloading a PE executable ● get a malicious executable on your machine
  • 9. the challenge it might look obvious: ● tell whether it's a PDF, a PE, a JAVA, an HTML... ● typical formats are clearly defined ○ Magic signature enforced at offset 0
  • 10. reality some formats have no header at all ● Command File (DOS 16 bits) ● Master Boot record some formats don't need to start at offset 0 ● Archives (Zips, Rars...) ● HTML ○ but text-only? some formats accept a large controllable block early in their header ● Portable Executable ● PICT image
  • 11. How did this start? a real-life problem: 1. a (malicious) HTML page 2. started with 'MZ' (the signature of PE) 3. just scanned as a PE! a. wow, this PE is highly corrupted :) b. it must be clean :p ? MZ
  • 12. polyglots in the wild GIFAR = GIF + JAR ● an uploaded image ○ an avatar in a forum ● with a malicious JAVA appended as JAR hosted on the server! ● bypass same domain policy ● now useable via its JAVA=EVIL payload + =
  • 13. let's get started PE, the executable format of windows ● it's central to windows malware ● it enforces a magic signature at offset 0 ○ game over for other formats?
  • 14. ● starts with a compulsory header ● made of sub-headers overview
  • 15. a historical sandwich 1. a deprecated but required header 2. a modern header
  • 16. old header content ● almost completely ignored ● only required: ○ 2 byte signature ○ pointer to new header
  • 17. the new header can be anywhere ex: at the end of the file! such as Corkami Standard Test
  • 18. let's look at HTML format
  • 19. it enforces NOTHING! anything before the <html> tag! even 28 Mb of binary!
  • 20. and it's been the same since Mozilla 1.0 in 2002thanks to Nicolas Grégoire!
  • 21. now, the PDF format
  • 22. signature position? ● officially at offset 0 ● officially tolerated until offset 1024 ● wtf? ○ it get actually worse later
  • 23. PDF trick 1 put a small executable within 1024 bytes (just concatenate)
  • 24. trick 2 1. start a fake PDF + object in a PE header 2. finish fake object at the end the PE 3. end fake object 4. put PDF real structure works with real-life example! (PE data might contain PDF keywords)
  • 25. JAR = ZIP + Class just enforced at the very end of the file
  • 26. but CRCs are just ignored it was too easy :p
  • 28. Structure 1. start ○ PE Signature ■ %PDF + fake obj start ■ HTML comment start 2. next ○ PE (next) ○ HTML ○ PDF (next) 3. bottom ○ ZIP
  • 29. it’s time for a real example! an inception demo! wait, what?
  • 30. we’re already in the demo! the live version file is simultaneously: ● the PDF slides themselves ● a PDF viewer executable ○ ie, the file is loading itself ● the PoCs in a ZIP ● an HTML readme ○ with JavaScript mario
  • 31. so, it works but it lacks something ● not artistic enough ● not advanced enough let's build a 'well representative' (=nasty) PoC
  • 32. the PE specs ● Official MS specs = big joke ○ 'the gentle guide for beginners' ○ barely describes standard PEs
  • 33. stripped down PE many elements removed ● including no sections
  • 34. imports (imports = communication between executables and libraries) imports are made of 3 lists
  • 35. evil imports ● let's make these lists into each other ● with more extra tricks to fail parser!
  • 36. ultimate import fail ● failing all tools ○ including IDA & Hiew ● now fixed :)
  • 37. let's put some code ● some undocumented opcodes! ● big blank spaces in Intel official docs
  • 39. result in WinDbg ● '???' == clueless (tool/user) don't rely (only) on official docs
  • 41. there is a so-called standard and the reality of existing parsers looking at: Adobe, MuPDF, Chrome ● 3 different files ○ working each on a specific viewer ○ failing on the other 2
  • 42.
  • 43. let's look inside ● MuPDF ○ no %PDF sig required ■ a PDF without a PDF sig ? WTF ?!?! ○ no trailer keyword required either ● Chrome ○ integer overflows: -4294967275 = 21 ○ trailer in a comment ■ it can actually be almost ANYWHERE ■ even inside another object ● Adobe ○ looks almost sane compare to the other 2
  • 44.
  • 45. Chrome insanity++ (thx to Jonas Magazinius) ● a single object ● no 'trailer' ● inline stream ● brackets are not even closed ● * are required - it just checks for minimum space
  • 46. %PDF***** 1 0 obj << /Size 2 /W[[]1/] /Root 1 0 R /Pages<< /Kids[<< /Contents<<>> stream BT{99 Tf{Td(Inlined PDF)' endstream >>] >> >> stream * endstream startxref%*******
  • 47. PDF.JS ● very strict ○ 'too' strict / naive ? ○ I don't want to be their QA ;) ● requires a lot of information usually ignored ○ xref ○ /Length %PDF-1.1 1 0 obj << % /Type /Catalog ... >> endobj 2 0 obj << /Type /Pages ... >> endobj 3 0 obj << /Type /Page /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 ... >> >> >> >> endobj 4 0 obj << /Length 47>> stream ... xref 0 1 0000000000 65535 f 0000000010 00000 n ...
  • 48. let's play further combine 3 documents in a single file ● it's actually 3 set of 'independant' objects ● objects are parsed ○ but not used
  • 49. alternate reality demo the live slide-deck contains 2 PDF ● bogus one under Chrome ● real one under MuPDF (Sumatra, Linux...) ● rejected under Acrobat ○ because of the PE signature (see later) DEMO
  • 50. final PoC ● combine most previously mentioned tricks ● many fails on many tools ● total control of the structure ○ the PDF 'ends' in the Java class
  • 51. Adobe rejects 'weird magics' after 10.1.5 not in their own specs :p 10.1.4 10.1.5
  • 52. also in ELF/Linux flavor ● starring a signature-less PDF ○ which won't run on other viewers
  • 53.
  • 54. and Apple too PS: I don't have a Mac, this was built blindly Thanks to Nicolas Seriot for testing
  • 55. why should we care?
  • 56. like washing powders security tools are selected: ● speed ● {files} → {[clean/detected]} file types not taken into consideration
  • 57. type confusion make the tool believe it's another type, which will fool the engine engine with checksum caching will be fooled: 1. scanned as HTML, clean 2. reused as PE but malicious
  • 58.
  • 59. engine exhaustion rankings in magazines are based on scanning time → scanning per file must stop arbitrarily → waste scanning cycle by adding extra formats
  • 60. Weaknesses ● evasion ○ filters → exfiltration ○ same origin policy ○ detection ■ ex: clean PE but malicious PDF/HTML/... ■ exhaust checks ■ pretend to be corrupt ● DoS
  • 62. Conclusion ● type confusion is bad ○ succinct docs too ○ lazy softwares as well ● go beyond the specs ○ Adobe: good ● suggestions ○ more extensions checks ○ isolate downloaded files ○ enforce magic signature at offset 0
  • 65. Bonus
  • 66. Valid image as JavaScript Highlighted by Saumil Shah ● abusing header and parsers laxisms ● turn a field into /* ● close comment after the picture data