Caring for
file formats
Ange Albertini
Troopers 2016
TL;DR
● The attack surface of file formats is too big.
● Specs are useless (just a nice ‘guide’): they don’t represent reality.
● We can’t deprecate formats because we can’t preserve them and we can’t define how
they really work.
● We need good open libraries to simplify the landscape, and a corpus to
express the reality of file formats, which gives us real “documentation”.
● Then we can preserve and deprecate older formats, which reduces the attack surface.
● From then on, we can focus on making the present more secure.
● We don’t need “new” formats: we need ‘alive’ specs and a corpus of files.
Otherwise specs will always diverge from reality.
Ange Albertini
reverse engineering &
visual documentation
@angealbertini
ange@corkami.com
http://www.corkami.com
Welcome to my talk!
I make polyglots (multi-type files),
schizophrenics (multi-behavior)...
I tried to explain file formats with cows…
But that didn’t really explain why people should care.
I really like to play with file formats...
[Figure: diagram with the labels 3DES, AES (keys 1 and 2), JPG, JAR (ZIP + CLASS), PDF, FLV, PNG]
I’m a part of PoC||GTFO,
for which I’m a file format
user and abuser.
PoC||GTFO: many file formats
● Articles
PDFLaTeX PDFBook Inkscape GhostScript Scribus Blender Gimp Fontforge
PDFFont Mutool
● Proof of Concept
Qpdf Xpdf Ruby Python Bash Truecrypt Wavpack Audacity Baudline Sox Tar
Zip MkIsoFS LSnes PngOpt JpegSnoop AdvPNG Nasm Qemu BPGEnc
And many custom scripts handling file formats in unconventional ways…
I'm interested in hardware preservation
and digital preservation.
My interests
● Using file formats
○ graphics, 3d, music…
● Abusing file formats
○ polyglot, schizophrenia, hash collisions…
● Preserving file formats
○ Retro-gaming, digital archeology...
A miserable little pile of secrets
Not just a sequence of binary
What is a file format?
If you [/your program] generate
a picture of any kind,
you might want to export
the result to something
that you can re-use later.
(same for any form of information)
A computer dialect
to communicate
between communities.
What is a file format?
File formats are
community connectors.
Don’t think so?
Try exporting everything as XML ;)
Most people don’t care
about <actor>
They only care about <roles>
We mostly care about the input/output.
Example:
We don’t care about GIF
We mostly care about its characteristics
and how easy it is to use.
No need to be emotional,
or to stay in our comfort zone.
We don’t really care
about file formats.
We care about their characteristics.
Not groundbreaking,
but supported “everywhere”.
Why should infosec care?
Fuzz formats. Blame “bad” devs.
Collect CVEs. Boost your ego.
10 PRINT "SOLVED ANYTHING YET?"
20 GOTO 10
Attack surface
● 1 OS = N supported formats
● For each format:
○ How many parsers?
○ For each parser:
■ Which version, compiler...
The PGM and PPM
formats are the easiest
way to convert any data
into valid grayscale or
RGB pictures.
But most people don’t
know they’re supported out
of the box by many
programs.
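A minimal sketch of that idea in Python, assuming the binary PGM variant (magic P5, 8-bit samples). The helper name bytes_to_pgm and the default width are illustrative choices, not from the talk.

```python
import math


def bytes_to_pgm(data: bytes, width: int = 256) -> bytes:
    """Wrap arbitrary bytes in a binary PGM (P5) grayscale image.

    Hypothetical helper: each input byte becomes one 8-bit gray pixel.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    # Use at least one row so that empty input still yields a valid image.
    height = max(1, math.ceil(len(data) / width))
    # Pad the last row with zero (black) pixels.
    pixels = data.ljust(width * height, b"\x00")
    # Header: magic, dimensions, maximum gray value, one whitespace byte.
    header = b"P5\n%d %d\n255\n" % (width, height)
    return header + pixels
```

Writing the returned bytes to a file named, say, data.pgm should give something that viewers with PGM support open as a picture. A PPM variant would use the magic P6 and three bytes per pixel.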
We should reduce the attack surface.
How many unsuspected supported
[sub-]formats and parsers?
https://lcamtuf.blogspot.com/2014/10/psa-dont-run-strings-on-untrusted-files.html
How many file formats are supported
by your browser?
By your OS?
How many do you really need?
Think “embedded”.
Capacity is still too cheap:
we keep stacking formats/features,
which doesn’t solve anything.
It’s a problem everywhere.
We keep losing ground.
<!--
PoC||GTFO 10
“Pokemon plays Twitch”
1. Exploit a GameBoy game via input
2. Take over the Super GameBoy
3. Take over the Super Nintendo
The file itself can perform the exploit
(on the hardware or an emulator).
The payload displays the article.
-->
PoC||GTFO 10 is a PoC-ception:
- a PDF article describing the exploit
- a file performing the exploit
(to display the article)
“young celebs”
What they were supposed to be
doesn’t really matter.
What file formats were supposed to be
doesn't matter anymore,
what they are now is all we care about.
Security cares about current reality,
not obsolete theory.
We can blame bad parsers.
What about the file formats?
If the map is unclear enough, you’ll get lost anyway.
A blurry file format will never lead to a clean parser.
2 ways to communicate
Use a ready-made translator:
an import/export library.
Write your own:
read the specs.
Landscapes
To exploit hash collisions, I abused JPEG.
To abuse JPEG “everywhere”, just abuse LibJPEG.
JPEG format’s landscape
in practice, JPEG is LibJPEG turbo v6
● de facto standard
○ later versions not used (different API)
Even if you create your own JPEG library,
you want to have full LibJPEG compatibility.
JPEG format is defined by LibJPEG.
I made extremely custom PDFs for each reader.
These "extreme" PDFs fail on any other reader.
PDF’s current landscape
PDF: 6 interpretations of the specs
● specs are even more useless
One good open library:
a unified attack surface
Fuzz it, pwn everyone?
True, but also fixed for everyone!
Is diversity really good?
We’re all supposed to use the same file format.
Diversity is good?
Attack surface is worse.
Unofficial substandards.
In any case...
Specs are merely an introduction guide.
A free set of examples w/ corner cases.
A grammar?
PDF’s future
PDF/E (engineer): 3d crap
PDF/A (archiving): already 8 flavours
Specs:
● specs are now commercial
● the main implementation is not open
● no set of free files.
And all countries preserve their culture with that format?!?!
We’re waiting for a new disaster...
many file formats are
abandoned
One spec, then nothing.
It’s like knowing about someone
only from a baby’s picture.
<!--
PoC||GTFO 11
PoC||GTFO 11 is a webserver serving itself, with its own HTML page
extracting its own attachments from its ZIP.
$ ruby pocorgtfo11.pdf
Listening for connections on port 8080.
To listen on a different port,
re-run with the desired port as a command-line argument.
A neighbor at 127.0.0.1 is requesting /
A neighbor at 127.0.0.1 is requesting /ajax/feelies.json
A neighbor at 127.0.0.1 is requesting /favicon.png
$ unzip -l pocorgtfo11.pdf
Archive: pocorgtfo11.pdf
Length Date Time Name
-------- ---- ---- ----
0 03-16-16 13:37 4am/
25955 03-11-16 15:06 4am/Stickybear Math 2 (4am crack).txt
[...]
3241 03-16-16 13:37 wafflehouse.txt
-------- -------
8177332 23 files
-->
PoC||GTFO 11 is self-aware:
a PDF that serves itself (HTTP quine),
parses its own ZIP to serve its archived feelies.
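A rough sketch of why one file can be a PDF and a ZIP at the same time, assuming two behaviours that are common but not guaranteed for every parser: many PDF readers accept the %PDF- signature anywhere in the first 1024 bytes, and ZIP readers locate the archive from the end of the file. The function names and the toy file below are made up for illustration; this is not how pocorgtfo11.pdf was built.

```python
import io
import zipfile


def sniff_pdf_zip(data: bytes) -> set:
    """Return the subset of {"pdf", "zip"} that the bytes plausibly are."""
    found = set()
    # Assumption: lenient readers look for the signature in the first
    # 1024 bytes, not only at offset 0.
    if b"%PDF-" in data[:1024]:
        found.add("pdf")
    # ZIP readers start from the end-of-central-directory record, so
    # bytes placed before the archive do not stop them.
    if zipfile.is_zipfile(io.BytesIO(data)):
        found.add("zip")
    return found


def make_toy_polyglot() -> bytes:
    """Build a toy file: a PDF signature followed by a small ZIP archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("feelies.txt", "hello")
    # Not a complete PDF document, only enough to trigger both checks.
    return b"%PDF-1.4\n" + buf.getvalue()
```

Calling sniff_pdf_zip(make_toy_polyglot()) reports both types, which is the kind of ambiguity a polyglot relies on: each parser only looks where its own format tells it to look.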
Important question
Do you still sleep
with a teddy bear?
Kids really deprecate stuff
Our computers keep handling more
and more file formats.
⇒ The attack surface just keeps growing.
Obsolete formats are
still omnipresent
Formats, sub-formats, features...
Because it’s unclear
if we can go back.
We’d be too afraid to deprecate them.
Yet we deprecate
for security.
Example for PDF:
JPEG-compressed text
is not supported anymore
(it could bypass security).
Windows PE format
becomes stricter
(deprecates packers)
For example,
EPUB 3.1 suddenly killed
backward compatibility.
http://blog.kbresearch.nl/2016/03/10/the-future-of-epub-a-first-look-at-the-epub-3-1-editors-draft/
Sometimes,
it’s not even for
security reasons
We don’t need
new file formats.
It’s the same problem again if
eventually their specs stop reflecting reality.
Even dictionaries have
regular updates,
to reflect reality.
Story time
Digipres = PDF worshippers. 150 years of availability?
● Non free specs + closed source software?
Here comes the grim reaper:
● Fix your stuff or it will be killed (like Flash)
We store our knowledge. What about files born digital?
Not infosec, but worrying.
veraPDF and its test files:
a great initiative.
PE.corkami.com: my own collection of hand-made executables and "documentation" (completely free).
Some of these made a lot of software fail...
Consequence of my PE page+corpus
● 'corkami-proof' software
● raises the bar for everyone
● becomes a hub of knowledge
○ "I can't share the sample", but from the knowledge,
my own file will be shared
⇒ even useful for the original contact
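One way a free corpus can act as test coverage, sketched in Python under stated assumptions: every parser is wrapped as a callable that raises an exception when it rejects a file, and the corpus is a mapping from sample name to bytes. The names run_corpus and disagreements are hypothetical, not part of any existing tool.

```python
from typing import Callable, Dict, List

# Assumption: a parser is any callable that raises on input it rejects.
Parser = Callable[[bytes], object]


def run_corpus(parsers: Dict[str, Parser],
               corpus: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Map each sample name to the list of parsers that failed on it."""
    failures: Dict[str, List[str]] = {}
    for sample_name, data in corpus.items():
        failed = []
        for parser_name, parse in parsers.items():
            try:
                parse(data)
            except Exception:
                # Any exception counts as "this parser rejects the sample".
                failed.append(parser_name)
        failures[sample_name] = failed
    return failures


def disagreements(failures: Dict[str, List[str]],
                  parser_count: int) -> Dict[str, List[str]]:
    """Keep only the samples that some parsers accept and others reject."""
    return {name: failed for name, failed in failures.items()
            if 0 < len(failed) < parser_count}
```

The samples returned by disagreements are the interesting ones: they mark places where implementations diverge, which is where the spec and reality no longer match.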
Conclusion
Attack surface
Too many (sub)formats
Too many parsers (= no good open lib)
Specs
Specs shouldn’t be a religious text
● Worshipped, but outdated and worthless
Specs should reflect reality (a law)
● updated, enforced, realistic, freely available
A good open lib
Deprecation
Deprecation is a natural cycle, and yet...
We are afraid to deprecate because
no file format is fully preserved:
● open, up to date specs
● free test coverage
But it won’t happen...
...until a great disaster?
It ends up on CNN, with a logo & a website :)
Ack
Phil Fabrice Travis Sergey
Micah Kurt QKumba Hanno...
Thank you!
Caring for
file formats
corkami.com
@angealbertini
Hail to the king, baby!
