1. INTERPOLIQUE
(OR, THE ONLY GOOD DEFENSE IS
THROUGH A BRUTAL OFFENSE)
Dan Kaminsky, Chief Scientist
Recursion Ventures
dan@recursion.com
2. ANNOUNCEMENT
This is my new company. Woot.
Recursion productizes significant research
It’s time to do things a little differently
This talk isn’t a sales pitch for Recursion, but it does give an idea
of its philosophy
3. A STORY
Design flaw in SSL
The server thought it was resuming, the client thought it
was connecting
Project Mogul spawned to fix it
Several months in deep secrecy
Thousands of hours spent on IETF fix
The fix broke <1% of servers
No big deal, right?
4. REALITY
“Note that to benefit from the fix for CVE-2009-3555
added in nss-3.12.6, Firefox 3.6 users will need to
set their security.ssl.require_safe_negotiation
preference to true. In Mandriva the default setting is
false due to problems with some common sites.” –
Mandriva Patch Notes
They thought knocking out a few sites was acceptable
for a remediation
They were wrong
5. THE BAD NEWS
We give bad advice
Pen testers are very good at breaking things
Our “remediation” advice tends towards myopia
We consider only our own engineering requirements
We assume tools are static, and bash the craftsman
6. THE GOOD NEWS
We are the key to there actually being good advice
We are the one community that actually knows how
things break
We hold the knowledge to end the bugs we keep seeing
8. A SIMPLE QUESTION
When I log into two SSH servers, do I need to worry
about one accessing the other?
No
When I log into two web sites, do I need to worry
about one accessing the other?
Yes
Why?
Because SSH does not have totally broken session
management
9. SIMPLE THINGS, SIMPLY BROKEN
The web was never designed to have authenticated
resources
Auth was bolted on (because Basic/Digest never got
fixed)
Normal Mechanism For Managing Credentials
Password causes Set-Cookie
Cookie sent with each query to target domain
Cookie is sent even with requests caused by third
party domains
User’s credentials are mixed with attacker’s URL
This is why most XSS/XSRF attacks are dangerous
Cross Site Scripting and Cross Site Request Forgery
wouldn’t be nearly the big deal they are if they didn’t work
cross site
10. THE PEN TESTER REACTION:
DEV, DO MORE WORK
XSRF Tokens
Manually add a token to every authenticated URL
Requires touching everything in a web app that
generates a URL
How’s that working out for us?
This seems to be a lot of work
If/when we come back six months later, it’s not usually
done, is it?
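For reference, the per-URL token work being asked of developers looks roughly like the sketch below. It assumes one common construction (an HMAC of the session identifier under a server-side key); the names SERVER_KEY, xsrf_token, protected_url and is_valid are hypothetical and do not come from the talk.

```python
import hashlib
import hmac
import secrets
from urllib.parse import urlencode

# Hypothetical server-side secret; a real deployment would persist and rotate it.
SERVER_KEY = secrets.token_bytes(32)

def xsrf_token(session_id):
    # Bind the token to the session so a third-party site cannot guess it.
    return hmac.new(SERVER_KEY, session_id.encode("utf-8"), hashlib.sha256).hexdigest()

def protected_url(path, session_id, **params):
    # Every URL generator in the application has to remember to call this.
    params["xsrf"] = xsrf_token(session_id)
    return path + "?" + urlencode(params)

def is_valid(session_id, token):
    # Constant-time comparison of the expected and the presented token.
    return hmac.compare_digest(xsrf_token(session_id), token)
```

The cost shows up in protected_url: it only helps if every place that emits an authenticated link is routed through it.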
11. A MODEST PROPOSAL
Couldn’t the tools be better?
The big debate: Should SVGs animate?
Unsaid: Shouldn’t it be possible to easily log into a
web site without other sites being able to use your
creds?
12. AN ATTEMPT
A fix that requires no change to the browser is
better
So I tried to find one
Server Side Referrer Checking
Client Side Referrer Checking
Window.Name Checking
Window.SessionStorage Checking
It says SessionStorage! Surely it’s perfect for Session
Management!
They all failed
Thank you Cstone, Kuza55, Amit Klein, David Ross,
SirDarckcat
13. WHEN FAILURE IS SUCCESS:
OUR PROBLEM WITH LATENCY
My suggested defenses were defeated early in
development
We, as a community, have a latency problem
We don’t break during development
We don’t break at release
We don’t break when early adopters are deploying
We break only when it gets really popular
By then, it’s in customer hands, and the best we can do is
give the customers really expensive advice on how to fix it
We need to close the feedback loop
14. AT MINIMUM
Whatever’s going on with other defenses, I want
mine to be thoroughly, even brutally audited as
soon as possible
Life is too short to back broken code!
Session Management will require modifications to
the browser
Something else might not…
15. ON LANGUAGES
"The bottom-line is that there just isn't a large
measurable difference in the security postures from
language to language or framework to framework --
specifically Microsoft ASP Classic, Microsoft .NET,
Java, Cold Fusion, PHP, and Perl. Sure in theory
one might be significantly more secure than the
others, but when deployed on the Web it's just not
the case."
--Jeremiah Grossman, CTO, WhiteHat
Security (a guy who has audited a lot of web
applications)
Question: Why aren’t the type safe languages
safer against web attack than the type unsafe
languages?
16. WE AREN’T ACTUALLY USING THEM
Reality of web development
HTML and JavaScript and CSS and XML and SQL and
PHP and C# and…
“On the web, every time you sneeze, you’re writing in a
new language”
How do we communicate across all these
languages?
Strings
And how type safe are strings?
Not at all
17. ALL INJECTIONS ARE TYPE BUGS
select count(*) from foo where x='x' or '1'='1';
The C#/PHP/Java/Ruby sender thinks there’s a string
there.
The SQL receiver thinks there’s a string, a logical operator,
another string, a comparator, and another string
there.
The challenge: Maintaining type safety across
language boundaries
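The type mismatch can be reproduced in a few lines. This is a minimal sketch using Python's built-in sqlite3 module as a stand-in for whatever database sits behind the web tier; the table contents and the count_matching helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table foo (x text)")
conn.executemany("insert into foo values (?)", [("a",), ("b",), ("c",)])

def count_matching(user_input):
    # The sender believes it is splicing a single string value into the query.
    query = "select count(*) from foo where x='" + user_input + "'"
    return conn.execute(query).fetchone()[0]

# The receiver parses one string literal here.
benign = count_matching("a")

# The receiver parses a string, an OR, a string, a comparator, and a string here.
hostile = count_matching("x' or '1'='1")
```

With the hostile input the condition is true for every row, so the count covers the whole table rather than the rows matching one value.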
18. ISN’T THIS A SOLVED PROBLEM?
Escaping?
Parameterized Queries?
19. NO ESCAPE
$conn->query("select * from foo where x=\"$foo\";");
Is this secure or not?
Who knows, depends on whether $foo has been
escaped between when it first came in on the wire, and
when it’s being passed into the DB
This simple line of code is expensive to debug!
If somebody removes the escape(), the code still
works
“Fails open”
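A small sketch of what “fails open” means in practice. The escape function here is a hypothetical quote-doubling escaper, not any particular library's API.

```python
def escape(value):
    # Hypothetical escaper: doubles single quotes, the usual SQL convention.
    return value.replace("'", "''")

def build_escaped(foo):
    return "select * from foo where x='" + escape(foo) + "';"

def build_unescaped(foo):
    # Same code after somebody removed the escape() call.
    return "select * from foo where x='" + foo + "';"

# Ordinary test data cannot tell the two versions apart.
same_for_benign = build_escaped("alice") == build_unescaped("alice")

# Only hostile input reveals that the protection is gone.
differs_for_hostile = build_escaped("x' or '1'='1") != build_unescaped("x' or '1'='1")
```

Nothing in a normal test run distinguishes the two builders, which is why the missing call is expensive to find.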
20. ACCIDENTAL ESCAPE
What does it mean to escape?
“Block Evil Characters”
Was very easy to determine evil characters when we just had
ASCII
Only 256 possible bytes
Unicode changes that
Millions of characters
All of which could mutate (“best fit match”) into one another
All of which have multiple possible encodings, and
representations within encodings
Escaping works by accident, without a solid contract
Keeps getting updated
escape(), encodeURI(), encodeURIComponent()
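One way a character can mutate into an “evil” one after escaping has already run is sketched below. It uses Unicode NFKC normalization as a stand-in for any later transformation in the pipeline (best-fit mapping is a different mechanism with a similar effect); the escape function is hypothetical.

```python
import unicodedata

def escape(value):
    # Hypothetical ASCII-era escaper: it only knows about U+0027.
    return value.replace("'", "''")

# U+FF07 FULLWIDTH APOSTROPHE is not on the escaper's list of evil characters.
hostile = "x\uff07 or \uff071\uff07=\uff071"

escaped = escape(hostile)  # nothing to escape, so the input passes through unchanged

# A later layer normalizes the text, and the plain apostrophes appear.
normalized = unicodedata.normalize("NFKC", escaped)
```

The escaper did exactly what it was written to do; the contract simply never covered what happens to the string afterwards.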
21. WHAT ABOUT PARAMETERIZED QUERIES?
Which would you rather write?
$r = $m->query("SELECT * from foo where
fname='$fname' and lname='$lname' and
address='$address' and city='$city'");
$p = $m->prepare("SELECT * from foo where
fname=? and lname=? and
address=? and city=?");
$p->set(1, $fname);
$p->set(2, $lname);
$p->set(3, $address);
$p->set(4, $city);
$r = $m->queryPrepared($p);
22. REALITY OF PARAMETERIZED QUERIES
No developer has ever written a parameterized
query without a gun to his head
We should know
We hold the gun
25. HOW INJECTIONS HAPPEN /
HOW DEVS LIKE TO WRITE CODE
String Interpolation:
select count(*) from foo where x='$_GET["foo"]';
String Concatenation:
"select count(*) from foo where x=\"" +
$_GET["foo"] + "\";";
Why they write code this way
Devs are thinking inline
They want to be writing inline
See: Fitts’ Law
26. IS IT POSSIBLE…
…to let devs write inline code, without exposing the
resultant strings to injections?
Yes – by making String Interpolation smarter
RETAIN: The language still sees the boundary between the
environment(“select * from…”) and the variable ($_GET…).
TRANSLATE: Given that metadata, the language can do
smarter things than just slap unprocessed strings together
(This overlaps with, and extends, Mike Samuel’s
excellent “Secure String Interpolation” work, seen at
http://tinyurl.com/2lbrdy.)
Working with Mike
32. WHAT’S GOING ON
Language interpolators are blind – they just push
strings into strings
So we write custom interpolators – the dev puts in what
he wants, the compiler sees what it needs
33. WHAT TO INTERPOLATE INTO
Parameterized Queries are an obvious target
Programmer writes:
select * from table where fname=^^fname and
country=^^country and x=^^x;
Interpolique expands:
$statement = $conn->prepare("select * from table where
fname=? and country=? and x=?");
$statement->bind_param("sss", $fname, $country, $x);
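The expansion into a parameterized query can be sketched as follows. This is not the Interpolique implementation: to_prepared is a hypothetical helper, variables are passed in explicitly as a dictionary rather than pulled from scope, and sqlite3 stands in for the real database driver.

```python
import re
import sqlite3

# A marker is ^^ followed by an identifier.
MARKER = re.compile(r"\^\^([A-Za-z_][A-Za-z0-9_]*)")

def to_prepared(template, variables):
    # Because the boundary between query text and data is retained,
    # the template splits cleanly into placeholders plus a parameter list.
    names = MARKER.findall(template)
    sql = MARKER.sub("?", template)
    params = [variables[name] for name in names]
    return sql, params

conn = sqlite3.connect(":memory:")
conn.execute("create table people (fname text, country text)")
conn.execute("insert into people values ('alice', 'us'), ('bob', 'uk')")

sql, params = to_prepared(
    "select count(*) from people where fname=^^fname and country=^^country",
    {"fname": "x' or '1'='1", "country": "us"},
)
rows = conn.execute(sql, params).fetchone()[0]
```

The developer writes one inline line; the hostile value travels as a bound parameter and matches nothing.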
35. BASE64: ESCAPING DONE RIGHT
Programmer writes:
select * from table where fname=^^fname and
country=^^country and x=^^x;
Interpolique expands:
select * from table where
fname=b64d("VEhJUyBJUyBUSEUgU1RPUlkgQUxMIEFCT1VUIEhPVyBNWSBMSUZFIEdPVCBUVVJORUQgVVBTSURFIERPV04=") and
country=b64d("d2Fzc3Nzc3Nzc3Nzc3Nzc3NzdXA=") and x=b64d("eXl5eXk=") ;
36. WHY THIS WORKS
Type safe going into b64d() function
That’s never getting interpreted as anything but a string
Type safe coming out of b64d() function
b64d() is cast to return a string
Not a subquery, not a conditional, not anything other
than a string
b64d() is a MySQL UDF that’s already written, has no
apparent time penalty, and will be released with Interpolique
Most other databases already have B64 support
In a pinch, could use MySQL hex/unhex
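A runnable sketch of the Base64 handler, under several assumptions: sqlite3 stands in for MySQL, the b64d database function is registered from Python rather than being the UDF mentioned above, single quotes are used because that is SQLite's string-literal syntax, and interpolate takes its variables as an explicit dictionary.

```python
import base64
import re
import sqlite3

MARKER = re.compile(r"\^\^([A-Za-z_][A-Za-z0-9_]*)")

def b64d(text):
    # Stand-in for the database-side decoder: it can only ever return a string.
    return base64.b64decode(text, validate=True).decode("utf-8")

def interpolate(template, variables):
    def wrap(match):
        raw = str(variables[match.group(1)]).encode("utf-8")
        # The Base64 alphabet has no quote characters, so the literal cannot be closed early.
        return "b64d('" + base64.b64encode(raw).decode("ascii") + "')"
    return MARKER.sub(wrap, template)

conn = sqlite3.connect(":memory:")
conn.create_function("b64d", 1, b64d)
conn.execute("create table foo (x text)")
conn.executemany("insert into foo values (?)", [("a",), ("b",), ("c",)])

query = interpolate("select count(*) from foo where x=^^x", {"x": "x' or '1'='1"})
count = conn.execute(query).fetchone()[0]
```

The hostile value reaches the comparison as one string and matches no rows.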
37. TWO MODES OF BASE64
Late binding
Interpolation inserts the Base64 handler
Text is plain until right before it crosses the frontend/backend
layer
SQL looks like this:
select * from foo where x=^^foo;
Early Binding
Base64 the variable as soon as it comes in off the HTTP
request
SQL looks like this:
select * from foo where x=b64d($foo);
Pen testers: If somebody fails to escape $foo,
everything still works. If somebody fails to Base64
Encode $foo, everything breaks immediately
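The fail-closed property of early binding can be illustrated with a strict decoder. This is a sketch only: b64d here is a Python stand-in for the database function, and the example input is made up.

```python
import base64
import binascii

def b64d(text):
    # Strict decoding: any character outside the Base64 alphabet is an error.
    return base64.b64decode(text, validate=True).decode("utf-8")

# Early binding done correctly: encode as soon as the value arrives.
encoded = base64.b64encode("O'Brien".encode("utf-8")).decode("ascii")
round_trip = b64d(encoded)

# The developer forgot to encode: the decoder rejects the raw value outright.
try:
    b64d("O'Brien")
    failed_loudly = False
except binascii.Error:
    failed_loudly = True
```

One caveat: a raw value that happens to be valid Base64 would decode to garbage instead of raising, which still shows up as broken behavior rather than as a working but insecure query.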
40. BASE64 IN THE OTHER DIRECTION
<span id=3520750 b64text="Zm9v">___</span><script>do_decode(3520750)</script>
Create a SPAN with a random ID and a dynamic
attribute that contains its base64’d content
Call do_decode with that ID, which can now look up the
element in O(1) time
Use this construction to retain streamability
Thank/Blame CP for this
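On the server side, emitting that construction might look like the sketch below. The helper name b64_span is hypothetical, and the ID range is an arbitrary choice; only the shape of the output follows the slide.

```python
import base64
import secrets

def b64_span(text):
    # Random numeric ID so the inline script can find its own element directly.
    element_id = secrets.randbelow(10**7)
    # Base64 output contains no <, >, quotes or ampersands, so it is inert inside the attribute.
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return ('<span id={0} b64text="{1}">___</span>'
            '<script>do_decode({0})</script>').format(element_id, payload)

markup = b64_span("<script>alert(1)</script>")
```

Because each span carries its own decode call, the page can still be streamed to the browser piece by piece.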
41. DOM INTERACTION: SIMPLE
Push to textContent
ob = document.getElementById(id); ob.textContent =
Base64.decode(ob.getAttribute("b64text"));
We never go through the browser HTML parser
42. DOM INTERACTION: COMPLEX
Push to appropriate createElements
ob = document.getElementById(id);
raw = Base64.decode(ob.getAttribute("b64text"));
safeParse(raw, ob);
HTMLParser(src, {
start: function( tag, attrs, unary ) {
…
if(tag == "i" || tag == "b" || tag == "img" || tag == "a"){
el = document.createElement(tag);
…
Basic idea is to have a simple HTML parser that extracts what it can,
creates elements according to whitelisted rules, and importantly,
never goes through the browser HTML parser
See also: “Blueprint”, a system that moves all DOM
generation to JS
http://www.cs.uic.edu/~venkat/research/papers/blueprint-oakland09.pdf
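The whitelist idea can be sketched outside the browser as well. The version below is a Python analog, not the JavaScript from the slide: it parses untrusted markup with the standard library's html.parser, builds elements only for whitelisted tags, and serializes the resulting tree. The attribute whitelist and the http/https restriction are assumptions added for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Whitelisted tags and the attributes each may keep (assumed policy).
ALLOWED = {"i": set(), "b": set(), "a": {"href"}, "img": {"src"}}
VOID = {"img"}

class SafeBuilder(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("span")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            return  # unknown tags are never created; their text is kept as plain text
        safe = {k: v for k, v in attrs
                if k in ALLOWED[tag] and v and v.startswith(("http://", "https://"))}
        element = ET.SubElement(self.stack[-1], tag, safe)
        if tag not in VOID:
            self.stack.append(element)

    def handle_endtag(self, tag):
        if len(self.stack) > 1 and self.stack[-1].tag == tag:
            self.stack.pop()

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):  # text after a child element belongs in that child's tail
            parent[-1].tail = (parent[-1].tail or "") + data
        else:
            parent.text = (parent.text or "") + data

def safe_parse(raw):
    builder = SafeBuilder()
    builder.feed(raw)
    builder.close()
    return ET.tostring(builder.root, encoding="unicode", method="html")

cleaned = safe_parse('<b onclick="evil()">hi</b><script>alert(1)</script>')
```

Elements are constructed one at a time from the whitelist, so markup that is not on the list has no path into the output tree.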
43. IMPORTANT NOTE
Security Is Quantized
There’s a set of elements that can be safely exposed
There’s a set that can’t
The game is to expose only those tags and attributes
that don’t expand to arbitrary JS
Either you have prevented wishing for more wishes, or
you have not
(We see this from the webmail attack surface)
44. HOW THIS WORKS
Primary Mechanism: Eval
Yes, there’s risk here, and yes we’re going to talk about that
risk – we need this for scoping reasons
Programmer written query: select * from table where
fname=^^fname and country=^^country and x=^^x;
To Eval: return ("select * from table where fname=b64d(\"" .
base64_encode($fname) . "\") and country=b64d(\"" .
base64_encode($country) . "\") and x=b64d(\"" .
base64_encode($x) . "\") ;");
Eval Out: select * from table where
fname=b64d("VEhJUyBJUyBUSEUgU1RPUlkgQUxMIEFCT1VUIEhPVyBNWSBMSUZFIEdPVCBUVVJORUQgVVBTSURFIERPV04=") and
country=b64d("d2Fzc3Nzc3Nzc3Nzc3Nzc3NzdXA=") and
x=b64d("eXl5eXk=") ;
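The same translate-then-eval flow can be sketched in Python. This is an analog of the PHP mechanism, not the released code: b() turns the template into the source of a string expression, and the caller evaluates that expression in its own scope. The helper names b and b64 are assumptions.

```python
import base64
import re

MARKER = re.compile(r"\^\^([A-Za-z_][A-Za-z0-9_]*)")

def b64(value):
    # Base64 output uses only A-Z, a-z, 0-9, +, / and =, so it cannot close a quote.
    return base64.b64encode(str(value).encode("utf-8")).decode("ascii")

def b(template):
    # re.split with one capture group alternates literal text and marker names.
    pieces = MARKER.split(template)
    terms = []
    for index, piece in enumerate(pieces):
        if index % 2 == 0:
            terms.append(repr(piece))  # developer-written query text, quoted as a literal
        else:
            # The marker name becomes a variable reference wrapped in the encoder.
            terms.append(repr("b64d('") + " + b64(" + piece + ") + " + repr("')"))
    return " + ".join(terms)

def handler():
    fname = "foo"
    # eval() runs the generated expression here, which is why it can see fname.
    return eval(b("select * from t where fname=^^fname;"))

result = handler()
```

Only identifiers matched by the marker pattern are ever spliced into the evaluated source; everything else is emitted as a quoted literal.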
45. CAN WE OPERATE WITHOUT EVAL?
No Eval in Java or C#
One approach: Combine variable argument functions
with string subclass tagging
public bwrap w = new bwrap();
w.s(w.c("select * from foo where x="), argument1, w.c(" and
y="), argument2);
If you forget to mark the safe code, it breaks
Another approach:
w.code("select * from foo where
x=").data(argument1).code(" and
y=").data(argument2).toString()
Similar to LINQ etc. but actually works for arbitrary grammars
If you mismark code as data, or vice versa, it breaks
Both actually implemented! (Tiny HOPE Announce)
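The first approach (variable arguments plus a tagged string subclass) can be sketched in Python as below. The names Code, data and s mirror the slide's w.c and w.s only loosely and are otherwise hypothetical.

```python
import base64

class Code(str):
    """Marks a fragment as trusted query text written by the developer."""

def data(value):
    # Untrusted values are always wrapped in the Base64 handler.
    payload = base64.b64encode(str(value).encode("utf-8")).decode("ascii")
    return "b64d('" + payload + "')"

def s(*parts):
    # Anything not explicitly tagged as Code is treated as data.
    return "".join(p if isinstance(p, Code) else data(p) for p in parts)

c = Code

good = s(c("select * from foo where x="), "foo", c(" and y="), "bar")

# Forgetting to tag the query text turns it into data, so the query stops working.
forgot = s("select * from foo where x=", "foo")
```

The default is the safe one: an untagged fragment is encoded, which breaks the query visibly instead of leaving it injectable.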
46. THE STATUS QUO
We see this doesn’t work:
String q = "select * from foo where x = \"" + escape(s) +
"\";";
By “doesn’t work”, I mean it is too similar to this:
String q = "select * from foo where x = \"" + s + "\";";
Devs mess this up, but the code works anyway
As a matter of principle, devs will do enough
work to make the code function
If it works, it should work securely
If it isn’t working securely, it shouldn’t be working at all
The trick is to not make it easier to get around the security,
than it is to do things right
47. WHY CUSTOM INTERPOLATORS ARE HARD:
THE ANCIENT SCOPE WAR
Lexical Scope: Scope Known At Compile Time
Variables are “pushed” into child scopes
Dynamic Scope: Scope Determined At Run Time
Variables are “pulled” by child scopes
Lexical scope has won, and has systematically
removed methods that allow any code to access
variables not explicitly pushed in
This makes it rather difficult to write a function that sees
^^variable and thus dereferences that variable
There are silly “superclass” or “parent” modifiers in some
languages, but they’re all special case
In Java and C#, they went so far as to leave local variables
unnamed on the stack, so you couldn’t just hop into previous
stack frames and dereference from there!
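Some languages still allow a function to reach into its caller's frame. The sketch below does this in Python with inspect.currentframe(), which relies on CPython's frame support and is offered only as an illustration of “pulling” variables; the function b and the handler are hypothetical.

```python
import base64
import inspect
import re

MARKER = re.compile(r"\^\^([A-Za-z_][A-Za-z0-9_]*)")

def b(template):
    # Hop one stack frame up and read the caller's local variables by name.
    caller_locals = inspect.currentframe().f_back.f_locals

    def wrap(match):
        value = str(caller_locals[match.group(1)])
        encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
        return "b64d('" + encoded + "')"

    return MARKER.sub(wrap, template)

def handler():
    foo = "foo"  # never passed to b() explicitly
    return b("select * from t where x=^^foo")

result = handler()
```

No eval is involved here, but the technique depends on local variable names surviving at run time, which is exactly what the slide says Java and C# took away.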
48. TO BE CLEAR
Yes, there is risk to eval, and we’ll be talking about it
Yes, there are very nice and very good reasons for lexical
scope to be the default state
The fact that the vast majority of programming
languages, type safe or not, are repeatedly found to
expose injection flaws is a direct sign that something
is wrong
Put simply, language design needs to be informed by the
bloody findings of pen testers
It is informed by performance engineers
It is informed by usability engineers
Memory safety didn’t come from security engineers, it came from
reliability engineers
I think we need a way to write functions that execute in present scope
49. YES, THIS MEANS
(LISP) (WAS) (RIGHT)
(((NOT ABOUT EVERYTHING)))
(((THEY ( HAD A POINT ( HERE )))))
Crazy theory
JavaScript has been successful because it’s been able
to mutate to absorb almost any language construct
“More dialects of JavaScript than Chinese”
50. RISKS
There are three things that can go wrong with any
defensive technology
It doesn’t work
None of this mealy mouthed, “well, it depends on what your threat
model is”
Either it does what it says, or it doesn’t!
It doesn’t work in the field
Security: It is too easy to screw up
It has side effects
Fails other first class engineering requirements (too slow,
unstable, hard to deploy, etc)
I am looking for destructive analysis on these
techniques, and will accept criticism on any of the
above fronts
Here is what I know so far
51. THE HANDLERS APPEAR RELATIVELY SOLID
No known SQL Injection bypasses for Base64 into a
b64d() function
Using a fast base64 decode – could be flaws here
Could be databases that don’t type-lock return values
No known flaws when putting arbitrary text into a
span.textContent field
Well, except it doesn’t work in IE. Will port to its wonky
DOM
Most testing is in Firefox -- Could be problems in
Chrome/Safari, Opera, etc.
No known flaws when creating arbitrary DOM elements
and populating them, rather than pushing HTML
IE6 is apparently slow at this
Need to enumerate the full set of tags which are safe to put
into HTML
52. EVAL ADDS SOME RISK
Don’t buy that a PHP server is safer if it isn’t
running eval
Month of PHP Bugs = PHP not safe against any
arbitrary PHP, eval or not
Eval in this context can make programmer errors
more severe
Correct: eval(b("select * from foo where x='^^x'"));
Incorrect: eval(b("select * from foo where x = '$x';"));
Before we had SQLi. Now we potentially have front end
code execution!
This is why it’s now ^^foo instead of $!foo
53. MANAGING RISK OF EVAL
b() can be smarter
It can be aware of strings that break out of string-
returner
It can be aware of SQL grammar, to the point that in
order to write a right hand variable, it must be ^^’d
Select * from foo where x=^^x and y=safe(1);
It can even be self-auditing – in PHP, it can use
debug_backtrace() to find the line that called it, and
validate that that line doesn’t have an unsafe language
deref
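A very rough sketch of the “right-hand values must be marked” rule. It is a lexical check with regular expressions, not a SQL grammar, and the names check_template, RHS and ALLOWED_RHS are hypothetical.

```python
import re

# Whatever token follows an equals sign is treated as a right-hand value.
RHS = re.compile(r"=\s*(\S+)")

# A right-hand value must be a ^^marker or an explicit safe(...) literal.
ALLOWED_RHS = re.compile(r"(\^\^[A-Za-z_][A-Za-z0-9_]*|safe\([^()]*\));?")

def check_template(template):
    for match in RHS.finditer(template):
        if not ALLOWED_RHS.fullmatch(match.group(1)):
            raise ValueError("unmarked right-hand value: " + match.group(1))
    return template

accepted = check_template("select * from foo where x=^^x and y=safe(1);")

try:
    # A value that was interpolated by the host language arrives as a bare literal.
    check_template("select * from foo where x = 'bob';")
    rejected = False
except ValueError:
    rejected = True
```

A check like this runs before anything is evaluated, so a template that was already interpolated by the language is refused rather than executed.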
54. WHAT ONLY SORT OF WORKS
“Requiring” Single Quotes
In some languages, ‘$foo’ doesn’t interpolate, while
“$foo” does
So, the thinking is, require eval(b(‘$foo’))
This is a policy that cannot be enforced by present
compilers or languages (both ‘$foo’ and “$foo” turn into
a string in the parse tree)
Could be enforced by a preprocessor
At large shops, significant improvements in security are won
by blocking otherwise legal expressions as a coding policy
Not convinced that smaller shops can/should absorb such a policy
55. PERFORMANCE
Eval is slower than compiled code
Translating strings could be a major pain point in some
languages
Easy to cache the translation (because we retain the
boundary, accessing the normalized query form is trivial)
Could potentially parameterize/accelerate more,
because it’s suddenly easy for the framework to
autorecognize repeated queries
Base64 is fast
Slight bandwidth increase, but nothing compared to
URLEncoding
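Caching the translation is straightforward once the template, rather than the finished string, is the cache key. A minimal sketch, assuming a hypothetical normalize helper and Python's functools.lru_cache:

```python
import functools
import re

MARKER = re.compile(r"\^\^([A-Za-z_][A-Za-z0-9_]*)")

@functools.lru_cache(maxsize=None)
def normalize(template):
    # The template is identical on every request, whatever the variable values are,
    # so the normalized query form and the marker names only need computing once.
    names = tuple(MARKER.findall(template))
    return MARKER.sub("?", template), names

first = normalize("select * from foo where x=^^x and y=^^y")
second = normalize("select * from foo where x=^^x and y=^^y")
```

With plain string concatenation every request produces a different string, so there is nothing stable to key a cache on.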
56. ANYTHING ELSE?
I don’t know.
Hope: There’s about two months till Black Hat.
Let’s find out!
This isn’t a recommendation yet
Clearly what we are doing right now is not working
Let’s find out the best things we can do with the present
languages
Let’s find out what we’d want from future languages
It’s time we got involved in the discussion of what
software looks like