SlideShare a Scribd company logo
Ange Albertini - Hack.Lu 2015
Trusting files
(and their formats)
Ange Albertini
reverse engineering &
visual documentation
@angealbertini
ange@corkami.com
http://www.corkami.comWelcome to my talk!
My resume is a PDF. What could go wrong ? ;)
;)
For some reason, many people are not motivated to open any files coming from me,
so I made this to reward them ;)
"standard file" ;)
Yes, I write files by hand...
[...and I open them in hex editors]
%PDF-1.
1 0 obj
<< /Kids [<<
/Parent 1 0 R
/Resources <<>>
/Contents 2 0 R
>>]
>>
2 0 obj
<<>>
stream
BT
/F1 110 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
trailer <<
/Root << /Pages 1 0 R >>
>>
...like this one
truncated signature
missing parent /Type
/Kids should be indirect
missing /Font
missing kid /Type
missing /Count
missing endobj
missing /Length
missing xref
/Root should be indirect, missing /Size, missing root /Type
missing startxref, %%EOF
%PDF-1.
1 0 obj
<< /Kids [<<
/Parent 1 0 R
/Resources <<>>
/Contents 2 0 R
>>]
>>
2 0 obj
<<>>
stream
BT
/F1 110 Tf
10 400 Td
(Hello World!) Tj
ET
endstream
endobj
trailer <<
/Root << /Pages 1 0 R >>
>>
It’s not
standard...
INVALID?
...but it works
exactly as planned!
(without any reported error)
ACCEPTED!
File formats are
my playground
(and I'm beyond recovery already)
Files or file formats?
It's not a real question.
Fichier sans format n'est que ruine de l'âme ;)
To share information, you need to use a common standard.
Forms & file formats
moving to a different country == making a PDF/SNES polyglot
same problems: all similar, but all different
only difference: forms are rarely required to evolve
Trusting files comes with
trusting their format.
Knowing that the specs will
be useful and reliable.
Retrospective
1
3DES
AESK
AESK
JPG
JAR
(ZIP + CLASS)
PDF
FLV
PNG
2
I gave an entertaining presentation with many funky binary creations.
Check it if you want more binary magic tricks ;)
I wrote a technical paper classifying all my file format abuses:
So far, I was (only) playing with file formats.
Why?
Why are all these abuses possible?
What could we (try to) do about it?
The big question...
Now, I'm (also) in contact with people analyzing or designing file formats.
I presented at a DigiPres con about Infosec: today, it's the other way around.
There are different kinds of expectations for files.
The end-user just wants to view external files, store his own information and re-use it.
The developer relies on the specifications to add support in his library.
The archivist wants to make sure that his data will be re-usable much later.
The digital investigator looks for clues in a suspect's system.
An attacker tries to craft dangerous files,
while a defender wants to prevent it from happening.
Common points
We're blind believers:
● believing that we'll be able to reuse our
information
● believing that in any case, we can just rely
on the specs to help us, like a religious book.
"the cult of the (useless) specs" ;)
It's not just an Infosec problem.
Bad specs make it harder for
devs, DFIR, digipres, defenders...
Theory: check official specs
Reality:
check unofficial specs & blog posts
analyse/reverse libraries
gather ITW (clean & malware) samples
Does it ring a bell ?
Bad specs are why attackers
and DFIR devs can make so
much money ;)
It's not specs reading anymore, it's reversing.
Not all abuse of file formats
turn into exploits.
But why should we only fix
what's pwning you?
"Short term fix" anyone?
We just care about code,
and "cyber attacks".
Files tricks go under the radar.
Usually… a few exceptions...
Tavis Ormandy's ZIP/DLL polyglot exploit for Kaspersky
Tavis Ormandy's "HTML in certificate" exploit for Avast
J00ru's font vulnerability (Recon 2015)
If we don't understand
how it really works,
we can't: parse it, preserve it,
tell if corrupted or malicious.
Crafting a file format
File format is not just "data structure"
Protobuf / XML doesn't solve everything.
They're just the high-level layer.
Data structure need to be logical and make
sense from a dev perspective.
So at least, use a magic number/signature,
and enforce version numbers, sizes... ;)
Failure is still possible
Office file format is a … filesystem!
You can defragment it!
And it has different kinds of FAT ;)
https://www.reddit.com/r/IAmA/comments/3ilzey/were_a_bunch_of_developers_from_ibm_ask_us/?sort=top
A file format is not just an "algorithm"
Your algorithm is great, but the file format will
be the interface between your algorithm and all
its users and other applications.
finish your specs! double-check them!
provide test cases!
A file format is a map
Every street should follow the same rules,
Otherwise you must expect many violations.
Wherever there is a 'surprise', bad things
happen.
Consistency ^ (Compatibility || Schizophrenia)
"PSD makes inconsistency an art form"
https://code.google.com/p/xee/source/browse/XeePhotoshopLoader.m?r=f16763d221dfca6253983824b470adf553a19e06#108
// At this point, I'd like to take a moment to speak to you about the Adobe PSD format.
// insult to other bad formats, such as PCX or JPEG. No, PSD is an abysmal format. Having
// PSD is not a good format. PSD is not even a bad format. Calling it such would be an
// worked on this code for several weeks now, my hate for PSD has grown to a raging fire
// that burns with the fierce passion of a million suns.
// If there are two different ways of doing something, PSD will do both, in different
// places. It will then make up three more ways no sane human would think of, and do those
// too. PSD makes inconsistency an art form. Why, for instance, did it suddenly decide
// that *these* particular chunks should be aligned to four bytes, and that this alignment
// should *not* be included in the size? Other chunks in other places are either unaligned,
// or aligned with the alignment included in the size. Here, though, it is not included.
// Either one of these three behaviours would be fine. A sane format would pick one. PSD,
// of course, uses all three, and more.
// Trying to get data out of a PSD file is like trying to find something in the attic of
// your eccentric old uncle who died in a freak freshwater shark attack on his 58th
// birthday. That last detail may not be important for the purposes of the simile, but
// at this point I am spending a lot of time imagining amusing fates for the people
// responsible for this Rube Goldberg of a file format.
// Earlier, I tried to get a hold of the latest specs for the PSD file format. To do this,
// I had to apply to them for permission to apply to them to have them consider sending
// me this sacred tome. This would have involved faxing them a copy of some document or
// other, probably signed in blood. I can only imagine that they make this process so
// difficult because they are intensely ashamed of having created this abomination. I
// was naturally not gullible enough to go through with this procedure, but if I had done
// so, I would have printed out every single page of the spec, and set them all on fire.
// Were it within my power, I would gather every single copy of those specs, and launch
// them on a spaceship directly into the sun.
//
// PSD is not my favourite file format.
Not just specs
A default open implementation?
with test cases for the code, and free-licenced
examples cases provided.
Too many 'features from the specs' are never
seen in the wild.
Life of a file format
1. define a format (if possible)
2. implement it in your software
3. end :(
if you're lucky:
your software becomes standard
along with its file format. That's all.
Becoming a de-facto standard
doesn't require anything:
it's your niche market.
No official requirements.
Just business directions.
no "long term plan"
You end up with a standard
that was never properly
designed or documented in
the first place.
Have fun preserving it or making it secure!
I wrote a simple "Hello World" PDF,
that works on every reader.
Yet, it's not 100% standard (only 99%)
That's a bad start :(
Thinking about bundling?
Hint: don't.
int bundle(trust){return trust--;}
Evolution of a format
(divergence)
Evolution
1. Tool X creates bogus file
2. StandardTool adapts silently to support them
3. Now StandardTool goes beyond the specs
Specs are now even more useless.
Ex: ColorTrac scanners, PDF readers
Implementations slowly
diverge from the specs
⇒ the specs become
theoretical and
useless in the wild.
Yet nothing exists to replace them.
Once it's a standard,
it's too late to fix it.
Before it's a standard,
no one really cares.
And too few people care anyway ;)
JPEG 1/2
JPEG (1992) is not a file format!
Open source library: LibJPEG → that's great!
LibJPEG goes beyond the specs:
- recovers standard types of App0 chunks
- including the one specific to Adobe
- unnecessary functions (headless JPEG (!))
- "let's add this in case" ⇔ design by committee ?
A JPG without a 'required' APPx segment
JPEG 2/2
JPEG is ‘de facto’ libJPEG-turbo v6b.
Explore corner-cases, and then you fail Adobe
or Safari:
⇒ their test cases are not big enough
Major problems (so many!)
specs really come last: absent, or TBD
incomplete specs: BPG, ZIP, PDF
incoherent specs: PDF
non-free specs.
Recovering broken files
AKA "hidden mode"
Take a fully working PDF.
Change one byte at the wrong place (in the XREF) ⇒ OMG it's corrupted!
But if you remove its XREF entirely, it now miraculously works,
with just a (misleading) dialog on closing, that actually means:
"we found some bugs, do you want to save as a valid but bloated file?"
Standard programs
typically embed a
(silent) recovery mode.
Nightmare for devs/defenders
These modes try their best
to recover "broken" files.
Far beyond the specs.
To improve security
and format reliability:
turn auto-recovery into
dialog box warnings?
or reject these files and log the error?
That would make vendors act.
"This file is not correct,
please contact your vendor"...
"helping" the end-user
by triggering no warning?
(even temporarily) OK
What about identifying bad
practices to make them stop
eventually?
Forcibly deprecate?
Like crypto? Sounds good, but...
Not going to happen:
Broken crypto leads to fast and mass pwnage.
Broken file formats mostly just lead to
headache - no incentive to avoid that.
Not enough "Android master key" bugs yet.
"one" standard ?
I made extremely custom PDFs for each reader.
These "extreme" PDFs fail on any other reader.
Consequence
We have 6 PDF reader 'standards' in practice:
these may be extreme examples, but
OTOH "Hello World" is not so complex
"Nothing to fix"
"Specs are subject to interpretation"
PDF Schizophrenia?
- Sumatra / Chrome-1 / Others
- Chrome-2 / Others
- Safari / Others
- Poppler / Others
It's not even funny anymore…
⇒ any unclear area may lead to schizophrenia
PDF = portable?
Most readers are okay to read 'standard' docs.
any advanced functions? Adobe Reader
(printing, forms, JavaScript, 3D).
Also, no more Linux version.
PDF, a clean standard?
Non-free specs.
Only the "standard" 1.7 doc is free.
No free examples.
Incomplete + missing specs
no shareable samples
Non-free specs?
No free sample-set?
And you wish to stay
a "standard" in 2015?
PDF for archiving?
PDF/A already has 8 sub-standards
Adobe Preflight is not very updated
⇒ Preservation is not a business model,
nor a legal requirement of any kind.
How long before "support is discontinued"?
PDF 2.0
No new security stuff, specs are now 170 CHF.
New printing features, new insecure features:
embedding files anyone?
http://www.pdfa.org/2015/10/whats-unique-about-pdf/
I'm not so sure about it - after all, we're killing Flash for security reasons.
A (tiny) ray of hope
VeraPDF.org:
open source PDF/A validator.
Preservation
portable compiler + toolchain
portable source
no OS dependency at all ?
preservation via closed-source software?
⇒ "emulation as a service" has a great future :(
ZIP archives already made for multiple floppy support.
1
2
3
Because it's awkward
and suboptimal for modern standards,
there are now 3 ways ITW to parse ZIP
(can be abused
like in the Android Master Key bug)
ZIP (1989) is still updated.
ZIP added AES, LZMA, 64 bits, Unicode.
But still this awkward obsolete structure?
Why not just reorder structures, enforcing
values, and slowly preventing abuses ?
Not re-inventing the format, just forking it.
Do we still need floppy support?
Seriously
Do we still really need Tape Archives?
Floppy-oriented, backward-parsed ZIP?
Any generated PDF that doesn't have its magic
at offset 0?
FTR:
OpenSSL still supports WinCE, BeOS….
Windows bitmap fonts are stored as 16 bits NE executables (copyright 1989).
Pure digital preservation
New documents are born digital:
the problem is shifted:
the 'master' copy already depends on:
source+compiler+toolchain+(OS+CPU).
A PDF with a JPEG-compressed script
JPEG, but not an image?
It's not against the specs,
but it was removed without any warning nor
tracking.
⇒ breaks backward compatibility
If your document was using it, now it's broken.
If this document is born digital, you lost your
source document.
Backward compatibility
Everywhere.
In case, you never know.
The customer is always right.
Perhaps except for security things ;)
Our kids will probably ask us one day
why we kept all these things for so long...
Windows compatibility
Windows is becoming progressively (but
silently) more strict for the PE format, slowly
killing several packers.
Have you heard anyone complaining?
(the official PE doc still totally sucks though)
breaking backward compatibility
It's ok if it's for valid reasons,
but keep track of changes, enforce version
numbers, and update the specs accordingly at
the same time!
Nowadays,
a file format is an evolving entity for security reasons,
not something sacred written in stone
Multiple formats is not the problem:
we have different needs.
But documentation never reflect
reality in any case.
There are many benefits to
know definitively what a valid
file is or isn't.
Cleaning up
Terse Executable is a cleaned-up version of the
Portable Executable
(but for UEFI, not to replace it).
Only example of forking that makes sense?
We just stack features...
There's no standard for
file format specifications
different style of writing, may be incomplete
unclear, non free...
Something I tried
PE.corkami.com: my own collection of hand-made executables and "documentation" (completely free).
Some of these failed a lot of software...
Consequence?
● 'corkami-proof' software
● raises the bar for everyone
● become a hub of knowledge
○ "I can't share the sample", but from the knowledge,
my own file will be shared
⇒ even useful for the original contact
Conclusion
We're -ed
We probably have to witness
the burning of
a digital "Library of Alexandria"
before we change anything.
(because money)
No matter the kind of format,
we can't trust files:
"specifications" ?
more like gentle introductions!
Or maybe something like
religious texts (with philosophical suggestions)
not accurate descriptions of reality.
Many more file abuses will come!
It doesn't get you any bug bounty,
but plenty of new classes
of abuse to discover:
compression, network, cryptography,
file systems...
Rules of thumb
● abuse your own format
○ double-check your specs -- with a twisted mind!
● open-source, unit-tested library
● consistency, technical common sense
● stop stacking features!
How you can help?
test-case binaries
● share your testing suite
● fuzzing results (seen from code coverage)
⇒ raises the bar for all industries
A format evolves
● deprecate!
● enforce version numbers
● make it public
we can set open ultimatum for crypto,
we should do the same for bad files.
Ack
Phil Paul Arindam Jacob Alex
Christophe Travis Tavis
Sergey Kurt Gabor Miki Gyn
Mat Bart Max
…
Thank you!
Corkami: 10 years!
time to evolve !
More PoCs, posters, book(s)...
+ some side projects
⇒ no more [personal] presentations for now
FAQ: "do you have any
recommended PDF reader"
Only Adobe Reader handles complex
documents and functionalities.
Other are more or less equivalent.
Not a very satisfying answer, I know ;)
PDFs:
myths vs facts
corkami.com
@angealbertini
Hail to the king, baby!

More Related Content

What's hot

PDF secrets - hiding & revealing secrets in PDF documents
PDF secrets - hiding & revealing secrets in PDF documentsPDF secrets - hiding & revealing secrets in PDF documents
PDF secrets - hiding & revealing secrets in PDF documentsAnge Albertini
 
Multilingual sites in plone
Multilingual sites in ploneMultilingual sites in plone
Multilingual sites in ploneRamon Navarro
 
Code quality. Patch quality
Code quality. Patch qualityCode quality. Patch quality
Code quality. Patch qualitymalcolmt
 
The Good, the Bad, and the Ugly: What Happened to Unicode and PHP 6
The Good, the Bad, and the Ugly: What Happened to Unicode and PHP 6The Good, the Bad, and the Ugly: What Happened to Unicode and PHP 6
The Good, the Bad, and the Ugly: What Happened to Unicode and PHP 6Andrei Zmievski
 

What's hot (8)

PDF secrets - hiding & revealing secrets in PDF documents
PDF secrets - hiding & revealing secrets in PDF documentsPDF secrets - hiding & revealing secrets in PDF documents
PDF secrets - hiding & revealing secrets in PDF documents
 
Using unicode with php
Using unicode with phpUsing unicode with php
Using unicode with php
 
NLP Project Full Circle
NLP Project Full CircleNLP Project Full Circle
NLP Project Full Circle
 
Fossetcon15
Fossetcon15Fossetcon15
Fossetcon15
 
Multilingual sites in plone
Multilingual sites in ploneMultilingual sites in plone
Multilingual sites in plone
 
Code quality. Patch quality
Code quality. Patch qualityCode quality. Patch quality
Code quality. Patch quality
 
Neo4j
Neo4jNeo4j
Neo4j
 
The Good, the Bad, and the Ugly: What Happened to Unicode and PHP 6
The Good, the Bad, and the Ugly: What Happened to Unicode and PHP 6The Good, the Bad, and the Ugly: What Happened to Unicode and PHP 6
The Good, the Bad, and the Ugly: What Happened to Unicode and PHP 6
 

Viewers also liked

Connecting communities
Connecting communitiesConnecting communities
Connecting communitiesAnge Albertini
 
Binary art - Byte-ing the PE that fails you (extended offline version)
Binary art - Byte-ing the PE that fails you (extended offline version)Binary art - Byte-ing the PE that fails you (extended offline version)
Binary art - Byte-ing the PE that fails you (extended offline version)Ange Albertini
 
TASBot - the perfectionist
TASBot - the perfectionistTASBot - the perfectionist
TASBot - the perfectionistAnge Albertini
 
Preserving arcade games - 31c3
Preserving arcade games -  31c3Preserving arcade games -  31c3
Preserving arcade games - 31c3Ange Albertini
 
Exploring the Portable Executable format
Exploring the Portable Executable formatExploring the Portable Executable format
Exploring the Portable Executable formatAnge Albertini
 

Viewers also liked (7)

Connecting communities
Connecting communitiesConnecting communities
Connecting communities
 
Binary art - Byte-ing the PE that fails you (extended offline version)
Binary art - Byte-ing the PE that fails you (extended offline version)Binary art - Byte-ing the PE that fails you (extended offline version)
Binary art - Byte-ing the PE that fails you (extended offline version)
 
nikhil resume
nikhil resumenikhil resume
nikhil resume
 
TASBot - the perfectionist
TASBot - the perfectionistTASBot - the perfectionist
TASBot - the perfectionist
 
Hacks in video games
Hacks in video gamesHacks in video games
Hacks in video games
 
Preserving arcade games - 31c3
Preserving arcade games -  31c3Preserving arcade games -  31c3
Preserving arcade games - 31c3
 
Exploring the Portable Executable format
Exploring the Portable Executable formatExploring the Portable Executable format
Exploring the Portable Executable format
 

Similar to Trusting files (and their formats)

The challenges of file formats
The challenges of file formatsThe challenges of file formats
The challenges of file formatsAnge Albertini
 
Infrastructure as code might be literally impossible part 2
Infrastructure as code might be literally impossible part 2Infrastructure as code might be literally impossible part 2
Infrastructure as code might be literally impossible part 2ice799
 
Improving file formats
Improving file formatsImproving file formats
Improving file formatsAnge Albertini
 
Being a Software Engineer at Facebook
Being a Software Engineer at FacebookBeing a Software Engineer at Facebook
Being a Software Engineer at FacebookTyrone Nicholas
 
Semantic web, python, construction industry
Semantic web, python, construction industrySemantic web, python, construction industry
Semantic web, python, construction industryReinout van Rees
 
Infrastructure as code might be literally impossible
Infrastructure as code might be literally impossibleInfrastructure as code might be literally impossible
Infrastructure as code might be literally impossibleice799
 
In our heated learning of the scope of genetic programming, before ...
In our heated learning of the scope of genetic programming, before ...In our heated learning of the scope of genetic programming, before ...
In our heated learning of the scope of genetic programming, before ...butest
 
What every C++ programmer should know about modern compilers (w/ comments, AC...
What every C++ programmer should know about modern compilers (w/ comments, AC...What every C++ programmer should know about modern compilers (w/ comments, AC...
What every C++ programmer should know about modern compilers (w/ comments, AC...Sławomir Zborowski
 
Open event (show&tell april 2016)
Open event (show&tell april 2016)Open event (show&tell april 2016)
Open event (show&tell april 2016)Jorge López-Lago
 
Infrastructure as code might be literally impossible / Joe Domato (packageclo...
Infrastructure as code might be literally impossible / Joe Domato (packageclo...Infrastructure as code might be literally impossible / Joe Domato (packageclo...
Infrastructure as code might be literally impossible / Joe Domato (packageclo...Ontico
 
Rasmus, Think Again! Agile Framework == Happy Php Developer
Rasmus, Think Again! Agile Framework == Happy Php DeveloperRasmus, Think Again! Agile Framework == Happy Php Developer
Rasmus, Think Again! Agile Framework == Happy Php DeveloperArno Schneider
 
841- Advanced Computer ForensicsUnix Forensics LabDue Date.docx
841- Advanced Computer ForensicsUnix Forensics LabDue Date.docx841- Advanced Computer ForensicsUnix Forensics LabDue Date.docx
841- Advanced Computer ForensicsUnix Forensics LabDue Date.docxevonnehoggarth79783
 
Ange Albertini and Gynvael Coldwind: Schizophrenic Files – A file that thinks...
Ange Albertini and Gynvael Coldwind: Schizophrenic Files – A file that thinks...Ange Albertini and Gynvael Coldwind: Schizophrenic Files – A file that thinks...
Ange Albertini and Gynvael Coldwind: Schizophrenic Files – A file that thinks...Area41
 
Showing How Security Has (And Hasn't) Improved, After Ten Years Of Trying
Showing How Security Has (And Hasn't) Improved, After Ten Years Of TryingShowing How Security Has (And Hasn't) Improved, After Ten Years Of Trying
Showing How Security Has (And Hasn't) Improved, After Ten Years Of TryingDan Kaminsky
 
Joxean Koret - Interactive Static Analysis Tools for Vulnerability Discovery ...
Joxean Koret - Interactive Static Analysis Tools for Vulnerability Discovery ...Joxean Koret - Interactive Static Analysis Tools for Vulnerability Discovery ...
Joxean Koret - Interactive Static Analysis Tools for Vulnerability Discovery ...RootedCON
 
Scaling Instagram
Scaling InstagramScaling Instagram
Scaling Instagramiammutex
 

Similar to Trusting files (and their formats) (20)

The challenges of file formats
The challenges of file formatsThe challenges of file formats
The challenges of file formats
 
Infrastructure as code might be literally impossible part 2
Infrastructure as code might be literally impossible part 2Infrastructure as code might be literally impossible part 2
Infrastructure as code might be literally impossible part 2
 
Improving file formats
Improving file formatsImproving file formats
Improving file formats
 
Being a Software Engineer at Facebook
Being a Software Engineer at FacebookBeing a Software Engineer at Facebook
Being a Software Engineer at Facebook
 
Semantic web, python, construction industry
Semantic web, python, construction industrySemantic web, python, construction industry
Semantic web, python, construction industry
 
Infrastructure as code might be literally impossible
Infrastructure as code might be literally impossibleInfrastructure as code might be literally impossible
Infrastructure as code might be literally impossible
 
In our heated learning of the scope of genetic programming, before ...
In our heated learning of the scope of genetic programming, before ...In our heated learning of the scope of genetic programming, before ...
In our heated learning of the scope of genetic programming, before ...
 
What every C++ programmer should know about modern compilers (w/ comments, AC...
What every C++ programmer should know about modern compilers (w/ comments, AC...What every C++ programmer should know about modern compilers (w/ comments, AC...
What every C++ programmer should know about modern compilers (w/ comments, AC...
 
Open event (show&tell april 2016)
Open event (show&tell april 2016)Open event (show&tell april 2016)
Open event (show&tell april 2016)
 
Bgnet a4 2
Bgnet a4 2Bgnet a4 2
Bgnet a4 2
 
Infrastructure as code might be literally impossible / Joe Domato (packageclo...
Infrastructure as code might be literally impossible / Joe Domato (packageclo...Infrastructure as code might be literally impossible / Joe Domato (packageclo...
Infrastructure as code might be literally impossible / Joe Domato (packageclo...
 
Rasmus, Think Again! Agile Framework == Happy Php Developer
Rasmus, Think Again! Agile Framework == Happy Php DeveloperRasmus, Think Again! Agile Framework == Happy Php Developer
Rasmus, Think Again! Agile Framework == Happy Php Developer
 
NISO Webinar: Software Preservation and Use: I Saved the Files But Can I Run ...
NISO Webinar: Software Preservation and Use: I Saved the Files But Can I Run ...NISO Webinar: Software Preservation and Use: I Saved the Files But Can I Run ...
NISO Webinar: Software Preservation and Use: I Saved the Files But Can I Run ...
 
841- Advanced Computer ForensicsUnix Forensics LabDue Date.docx
841- Advanced Computer ForensicsUnix Forensics LabDue Date.docx841- Advanced Computer ForensicsUnix Forensics LabDue Date.docx
841- Advanced Computer ForensicsUnix Forensics LabDue Date.docx
 
Bgipc a4 2
Bgipc a4 2Bgipc a4 2
Bgipc a4 2
 
Ange Albertini and Gynvael Coldwind: Schizophrenic Files – A file that thinks...
Ange Albertini and Gynvael Coldwind: Schizophrenic Files – A file that thinks...Ange Albertini and Gynvael Coldwind: Schizophrenic Files – A file that thinks...
Ange Albertini and Gynvael Coldwind: Schizophrenic Files – A file that thinks...
 
Schizophrenic files
Schizophrenic filesSchizophrenic files
Schizophrenic files
 
Showing How Security Has (And Hasn't) Improved, After Ten Years Of Trying
Showing How Security Has (And Hasn't) Improved, After Ten Years Of TryingShowing How Security Has (And Hasn't) Improved, After Ten Years Of Trying
Showing How Security Has (And Hasn't) Improved, After Ten Years Of Trying
 
Joxean Koret - Interactive Static Analysis Tools for Vulnerability Discovery ...
Joxean Koret - Interactive Static Analysis Tools for Vulnerability Discovery ...Joxean Koret - Interactive Static Analysis Tools for Vulnerability Discovery ...
Joxean Koret - Interactive Static Analysis Tools for Vulnerability Discovery ...
 
Scaling Instagram
Scaling InstagramScaling Instagram
Scaling Instagram
 

More from Ange Albertini

Technical challenges with file formats
Technical challenges with file formatsTechnical challenges with file formats
Technical challenges with file formatsAnge Albertini
 
Relations between archive formats
Relations between archive formatsRelations between archive formats
Relations between archive formatsAnge Albertini
 
Abusing archive file formats
Abusing archive file formatsAbusing archive file formats
Abusing archive file formatsAnge Albertini
 
You are *not* an idiot
You are *not* an idiotYou are *not* an idiot
You are *not* an idiotAnge Albertini
 
An introduction to inkscape
An introduction to inkscapeAn introduction to inkscape
An introduction to inkscapeAnge Albertini
 
Exploiting hash collisions
Exploiting hash collisionsExploiting hash collisions
Exploiting hash collisionsAnge Albertini
 
Preserving arcade games
Preserving arcade gamesPreserving arcade games
Preserving arcade gamesAnge Albertini
 
Hide Android applications in images
Hide Android applications in imagesHide Android applications in images
Hide Android applications in imagesAnge Albertini
 
Let's play with crypto! v2
Let's play with crypto! v2Let's play with crypto! v2
Let's play with crypto! v2Ange Albertini
 
Schizophrenic files v2
Schizophrenic files v2Schizophrenic files v2
Schizophrenic files v2Ange Albertini
 
Malicious Hashing: Eve’s Variant of SHA-1
Malicious Hashing: Eve’s Variant of SHA-1Malicious Hashing: Eve’s Variant of SHA-1
Malicious Hashing: Eve’s Variant of SHA-1Ange Albertini
 

More from Ange Albertini (17)

Technical challenges with file formats
Technical challenges with file formatsTechnical challenges with file formats
Technical challenges with file formats
 
Relations between archive formats
Relations between archive formatsRelations between archive formats
Relations between archive formats
 
Abusing archive file formats
Abusing archive file formatsAbusing archive file formats
Abusing archive file formats
 
TimeCryption
TimeCryptionTimeCryption
TimeCryption
 
You are *not* an idiot
You are *not* an idiotYou are *not* an idiot
You are *not* an idiot
 
KILL MD5
KILL MD5KILL MD5
KILL MD5
 
No more dumb hex!
No more dumb hex!No more dumb hex!
No more dumb hex!
 
Beyond your studies
Beyond your studiesBeyond your studies
Beyond your studies
 
An introduction to inkscape
An introduction to inkscapeAn introduction to inkscape
An introduction to inkscape
 
Exploiting hash collisions
Exploiting hash collisionsExploiting hash collisions
Exploiting hash collisions
 
Infosec & failures
Infosec & failuresInfosec & failures
Infosec & failures
 
Preserving arcade games
Preserving arcade gamesPreserving arcade games
Preserving arcade games
 
Let's talk about...
Let's talk about...Let's talk about...
Let's talk about...
 
Hide Android applications in images
Hide Android applications in imagesHide Android applications in images
Hide Android applications in images
 
Let's play with crypto! v2
Let's play with crypto! v2Let's play with crypto! v2
Let's play with crypto! v2
 
Schizophrenic files v2
Schizophrenic files v2Schizophrenic files v2
Schizophrenic files v2
 
Malicious Hashing: Eve’s Variant of SHA-1
Malicious Hashing: Eve’s Variant of SHA-1Malicious Hashing: Eve’s Variant of SHA-1
Malicious Hashing: Eve’s Variant of SHA-1
 

Recently uploaded

Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Product School
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIES VE
 
The architecture of Generative AI for enterprises.pdf
The architecture of Generative AI for enterprises.pdfThe architecture of Generative AI for enterprises.pdf
The architecture of Generative AI for enterprises.pdfalexjohnson7307
 
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCzechDreamin
 
In-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsIn-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsExpeed Software
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Product School
 
Powerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaPowerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaCzechDreamin
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoTAnalytics
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlPeter Udo Diehl
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2DianaGray10
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
 
What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024Stephanie Beckett
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka DoktorováCzechDreamin
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀DianaGray10
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyJohn Staveley
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...Product School
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomCzechDreamin
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupCatarinaPereira64715
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
 

Recently uploaded (20)

Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 
The architecture of Generative AI for enterprises.pdf
The architecture of Generative AI for enterprises.pdfThe architecture of Generative AI for enterprises.pdf
The architecture of Generative AI for enterprises.pdf
 
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
 
In-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsIn-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT Professionals
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Powerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaPowerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara Laskowska
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John Staveley
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 

Trusting files (and their formats)

  • 1. Ange Albertini - Hack.Lu 2015 Trusting files (and their formats)
  • 2. Ange Albertini reverse engineering & visual documentation @angealbertini ange@corkami.com http://www.corkami.comWelcome to my talk!
  • 3. My resume is a PDF. What could go wrong ? ;)
  • 4. ;)
  • 5. For some reason, many people are not motivated to open any files coming from me, so I made this to reward them ;)
  • 7. Yes, I write files by hand... [...and I open them in hex editors]
  • 8. %PDF-1. 1 0 obj << /Kids [<< /Parent 1 0 R /Resources <<>> /Contents 2 0 R >>] >> 2 0 obj <<>> stream BT /F1 110 Tf 10 400 Td (Hello World!) Tj ET endstream endobj trailer << /Root << /Pages 1 0 R >> >> ...like this one
  • 9. truncated signature missing parent /Type /Kids should be indirect missing /Font missing kid /Type missing /Count missing endobj missing /Length missing xref /Root should be indirect, missing /Size, missing root /Type missing startxref, %%EOF %PDF-1. 1 0 obj << /Kids [<< /Parent 1 0 R /Resources <<>> /Contents 2 0 R >>] >> 2 0 obj <<>> stream BT /F1 110 Tf 10 400 Td (Hello World!) Tj ET endstream endobj trailer << /Root << /Pages 1 0 R >> >> It’s not standard... INVALID?
  • 10. ...but it works exactly as planned! (without any reported error) ACCEPTED!
  • 11. File formats are my playground (and I'm beyond recovery already)
  • 12. Files or file formats? It's not a real question. Fichier sans format n'est que ruine de l'âme ;)
  • 13. To share information, you need to use a common standard.
  • 14. Forms & file formats moving to a different country == making a PDF/SNES polyglot same problems: all similar, but all different only difference: forms are rarely required to evolve
  • 15. Trusting files comes with trusting their format. Knowing that the specs will be useful and reliable.
  • 17. 1 3DES AESK AESK JPG JAR (ZIP + CLASS) PDF FLV PNG 2 I gave an entertaining presentation with many funky binary creations. Check it if you want more binary magic tricks ;)
  • 18. I wrote a technical paper classifying all my file format abuses: So far, I was (only) playing with file formats.
  • 19. Why? Why are all these abuses possible? What could we (try to) do about it? The big question...
  • 20. Now, I'm (also) in contact with people analyzing or designing file formats. I presented at a DigiPres con about Infosec: today, it's the other way around.
  • 21. There are different kinds of expectations for files.
  • 22. The end-user just wants to view external files, store his own information and re-use it.
  • 23. The developer relies on the specifications to add support in his library.
  • 24. The archivist wants to make sure that his data will be re-usable much later.
  • 25. The digital investigator looks for clues in a suspect's system.
  • 26. An attacker tries to craft dangerous files,
  • 27. while a defender wants to prevent it from happening.
  • 28. Common points We're blind believers: ● believing that we'll be able to reuse our information ● believing that in any case, we can just rely on the specs to help us, like a religious book. "the cult of the (useless) specs" ;)
  • 29. It's not just an Infosec problem. Bad specs make it harder for devs, DFIR, digipres, defenders...
  • 30. Theory: check official specs Reality: check unofficial specs & blog posts analyse/reverse libraries gather ITW (clean & malware) samples Does it ring a bell ?
  • 31. Bad specs are why attackers and DFIR devs can make so much money ;) It's not specs reading anymore, it's reversing.
  • 32. Not all abuse of file formats turn into exploits. But why should we only fix what's pwning you? "Short term fix" anyone?
  • 33. We just care about code, and "cyber attacks". Files tricks go under the radar. Usually… a few exceptions...
  • 34. Tavis Ormandy's ZIP/DLL polyglot exploit for Kaspersky
  • 35. Tavis Ormandy's "HTML in certificate" exploit for Avast
  • 37. If we don't understand how it really works, we can't: parse it, preserve it, tell if corrupted or malicious.
  • 38. Crafting a file format
  • 39. File format is not just "data structure" Protobuf / XML doesn't solve everything. They're just the high-level layer. Data structure need to be logical and make sense from a dev perspective. So at least, use a magic number/signature, and enforce version numbers, sizes... ;)
  • 40. Failure is still possible Office file format is a … filesystem! You can defragment it! And it has different kinds of FAT ;)
  • 42. A file format is not just an "algorithm" Your algorithm is great, but the file format will be the interface between your algorithm and all its users and other applications. finish your specs! double-check them! provide test cases!
  • 43. A file format is a map Every street should follow the same rules, Otherwise you must expect many violations. Wherever there is a 'surprise', bad things happen. Consistency ^ (Compatibility || Schizophrenia)
  • 44. "PSD makes inconsistency an art form" https://code.google.com/p/xee/source/browse/XeePhotoshopLoader.m?r=f16763d221dfca6253983824b470adf553a19e06#108 // At this point, I'd like to take a moment to speak to you about the Adobe PSD format. // insult to other bad formats, such as PCX or JPEG. No, PSD is an abysmal format. Having // PSD is not a good format. PSD is not even a bad format. Calling it such would be an // worked on this code for several weeks now, my hate for PSD has grown to a raging fire // that burns with the fierce passion of a million suns. // If there are two different ways of doing something, PSD will do both, in different // places. It will then make up three more ways no sane human would think of, and do those // too. PSD makes inconsistency an art form. Why, for instance, did it suddenly decide // that *these* particular chunks should be aligned to four bytes, and that this alignment // should *not* be included in the size? Other chunks in other places are either unaligned, // or aligned with the alignment included in the size. Here, though, it is not included. // Either one of these three behaviours would be fine. A sane format would pick one. PSD, // of course, uses all three, and more. // Trying to get data out of a PSD file is like trying to find something in the attic of // your eccentric old uncle who died in a freak freshwater shark attack on his 58th // birthday. That last detail may not be important for the purposes of the simile, but // at this point I am spending a lot of time imagining amusing fates for the people // responsible for this Rube Goldberg of a file format. // Earlier, I tried to get a hold of the latest specs for the PSD file format. To do this, // I had to apply to them for permission to apply to them to have them consider sending // me this sacred tome. This would have involved faxing them a copy of some document or // other, probably signed in blood. I can only imagine that they make this process so // difficult because they are intensely ashamed of having created this abomination. I // was naturally not gullible enough to go through with this procedure, but if I had done // so, I would have printed out every single page of the spec, and set them all on fire. // Were it within my power, I would gather every single copy of those specs, and launch // them on a spaceship directly into the sun. // // PSD is not my favourite file format.
  • 45. Not just specs A default open implementation? with test cases for the code, and free-licenced examples cases provided. Too many 'features from the specs' are never seen in the wild.
  • 46. Life of a file format 1. define a format (if possible) 2. implement it in your software 3. end :( if you're lucky: your software becomes standard along with its file format. That's all.
  • 47. Becoming a de-facto standard doesn't require anything: it's your niche market. No official requirements. Just business directions. no "long term plan"
  • 48. You end up with a standard that was never properly designed or documented in the first place. Have fun preserving it or making it secure!
  • 49. I wrote a simple "Hello World" PDF, that works on every reader. Yet, it's not 100% standard (only 99%) That's a bad start :(
  • 50. Thinking about bundling? Hint: don't. int bundle(trust){return trust--;}
  • 51. Evolution of a format (divergence)
  • 52. Evolution 1. Tool X creates bogus file 2. StandardTool adapts silently to support them 3. Now StandardTool goes beyond the specs Specs are now even more useless. Ex: ColorTrac scanners, PDF readers
  • 53. Implementations slowly diverge from the specs ⇒ the specs become theoretical and useless in the wild. Yet nothing exists to replace them.
  • 54. Once it's a standard, it's too late to fix it. Before it's a standard, no one really cares. And too few people care anyway ;)
  • 55. JPEG 1/2 JPEG (1992) is not a file format! Open source library: LibJPEG → that's great! LibJPEG goes beyond the specs: - recovers standard types of App0 chunks - including the one specific to Adobe - unnecessary functions (headless JPEG (!)) - "let's add this in case" ⇔ design by committee ?
  • 56. A JPG without a 'required' APPx segment
  • 57. JPEG 2/2 JPEG is ‘de facto’ libJPEG-turbo v6b. Explore corner-cases, and then you fail Adobe or Safari: ⇒ their test cases are not big enough
  • 58. Major problems (so many!) specs really come last: absent, or TBD incomplete specs: BPG, ZIP, PDF incoherent specs: PDF non-free specs.
  • 60. Take a fully working PDF.
  • 61. Change one byte at the wrong place (in the XREF) ⇒ OMG it's corrupted!
  • 62. But if you remove its XREF entirely, it now miraculously works, with just a (misleading) dialog on closing, that actually means: "we found some bugs, do you want to save as a valid but bloated file?"
  • 63. Standard programs typically embed a (silent) recovery mode. Nightmare for devs/defenders
  • 64. These modes try their best to recover "broken" files. Far beyond the specs.
  • 65. To improve security and format reliability: turn auto-recovery into dialog box warnings? or reject these files and log the error? That would make vendors act. "This file is not correct, please contact your vendor"...
  • 66. "helping" the end-user by triggering no warning? (even temporarily) OK What about identifying bad practices to make them stop eventually?
  • 67. Forcibly deprecate? Like crypto? Sounds good, but... Not going to happen: Broken crypto leads to fast and mass pwnage. Broken file formats mostly just lead to headache - no incentive to avoid that. Not enough "Android master key" bugs yet.
  • 69. I made extremely custom PDFs for each reader.
  • 70. These "extreme" PDFs fail on any other reader.
  • 71. Consequence We have 6 PDF reader 'standards' in practice: these may be extreme examples, but OTOH "Hello World" is not so complex "Nothing to fix" "Specs are subject to interpretation"
  • 72. PDF Schizophrenia? - Sumatra / Chrome-1 / Others - Chrome-2 / Others - Safari / Others - Poppler / Others It's not even funny anymore… ⇒ any unclear area may lead to schizophrenia
  • 73. PDF = portable? Most readers are okay to read 'standard' docs. any advanced functions? Adobe Reader (printing, forms, JavaScript, 3D). Also, no more Linux version.
  • 74. PDF, a clean standard? Non-free specs. Only the "standard" 1.7 doc is free. No free examples. Incomplete + missing specs no shareable samples
  • 75. Non-free specs? No free sample-set? And you wish to stay a "standard" in 2015?
  • 76. PDF for archiving? PDF/A already has 8 sub-standards Adobe Preflight is not very updated ⇒ Preservation is not a business model, nor a legal requirement of any kind. How long before "support is discontinued"?
  • 77. PDF 2.0 No new security stuff, specs are now 170 CHF. New printing features, new insecure features: embedding files anyone?
  • 78. http://www.pdfa.org/2015/10/whats-unique-about-pdf/ I'm not so sure about it - after all, we're killing Flash for security reasons.
  • 79. A (tiny) ray of hope VeraPDF.org: open source PDF/A validator.
  • 80. Preservation portable compiler + toolchain portable source no OS dependency at all ? preservation via closed-source software? ⇒ "emulation as a service" has a great future :(
  • 81. ZIP archives already made for multiple floppy support.
  • 82. 1 2 3 Because it's awkward and suboptimal for modern standards, there are now 3 ways ITW to parse ZIP (can be abused like in the Android Master Key bug)
  • 83. ZIP (1989) is still updated. ZIP added AES, LZMA, 64 bits, Unicode. But still this awkward obsolete structure? Why not just reorder structures, enforcing values, and slowly preventing abuses ? Not re-inventing the format, just forking it. Do we still need floppy support?
  • 84. Seriously Do we still really need Tape Archives? Floppy-oriented, backward-parsed ZIP? Any generated PDF that doesn't have its magic at offset 0? FTR: OpenSSL still supports WinCE, BeOS…. Windows bitmap fonts are stored as 16 bits NE executables (copyright 1989).
  • 85. Pure digital preservation New documents are born digital: the problem is shifted: the 'master' copy already depends on: source+compiler+toolchain+(OS+CPU).
  • 86. A PDF with a JPEG-compressed script
  • 87. JPEG, but not an image? It's not against the specs, but it was removed without any warning nor tracking. ⇒ breaks backward compatibility If your document was using it, now it's broken. If this document is born digital, you lost your source document.
  • 88. Backward compatibility Everywhere. In case, you never know. The customer is always right. Perhaps except for security things ;) Our kids will probably ask us one day why we kept all these things for so long...
  • 89. Windows compatibility Windows is becoming progressively (but silently) more strict for the PE format, slowly killing several packers. Have you heard anyone complaining? (the official PE doc still totally sucks though)
  • 90. breaking backward compatibility It's ok if it's for valid reasons, but keep track of changes, enforce version numbers, and update the specs accordingly at the same time! Nowadays, a file format is an evolving entity for security reasons, not something sacred written in stone
  • 91.
  • 92. Multiple formats is not the problem: we have different needs. But documentation never reflect reality in any case.
  • 93. There are many benefits to know definitively what a valid file is or isn't.
  • 94. Cleaning up Terse Executable is a cleaned-up version of the Portable Executable (but for UEFI, not to replace it). Only example of forking that makes sense? We just stack features...
  • 95. There's no standard for file format specifications different style of writing, may be incomplete unclear, non free...
  • 97. PE.corkami.com: my own collection of hand-made executables and "documentation" (completely free).
  • 98. Some of these failed a lot of software...
  • 99. Consequence? ● 'corkami-proof' software ● raises the bar for everyone ● become a hub of knowledge ○ "I can't share the sample", but from the knowledge, my own file will be shared ⇒ even useful for the original contact
  • 102. We probably have to witness the burning of a digital "Library of Alexandria" before we change anything. (because money)
  • 103. No matter the kind of format, we can't trust files: "specifications" ? more like gentle introductions! Or maybe something like religious texts (with philosophical suggestions) not accurate descriptions of reality.
  • 104. Many more file abuses will come! It doesn't get you any bug bounty, but plenty of new classes of abuse to discover: compression, network, cryptography, file systems...
  • 105. Rules of thumb ● abuse your own format ○ double-check your specs -- with a twisted mind! ● open-source, unit-tested library ● consistency, technical common sense ● stop stacking features!
  • 106. How you can help? test-case binaries ● share your testing suite ● fuzzing results (seen from code coverage) ⇒ raises the bar for all industries
  • 107. A format evolves ● deprecate! ● enforce version numbers ● make it public we can set open ultimatum for crypto, we should do the same for bad files.
  • 108. Ack Phil Paul Arindam Jacob Alex Christophe Travis Tavis Sergey Kurt Gabor Miki Gyn Mat Bart Max …
  • 110. Corkami: 10 years! time to evolve ! More PoCs, posters, book(s)... + some side projects ⇒ no more [personal] presentations for now
  • 111. FAQ: "do you have any recommended PDF reader" Only Adobe Reader handles complex documents and functionalities. Other are more or less equivalent. Not a very satisfying answer, I know ;)