Black Ops
  Dan Kaminsky
  Chief Scientist
      DKH
dan@doxpara.com
     (2012)
Another Year, Another Talk
 Good News and Bad News
   Good News: We're going to fix this thing.
    We have no choice. The global economy is based on
     Information Technology being trustworthy.
    An economy where you have to be big enough to field a
     “cyber army” in order to participate, is a broken economy
     indeed
       It's not like the big guys are doing a great job defensively
  Bad News: We're not going to fix it according to
   dogma.
    How's the status quo working out for us?
    There are many alternatives to dogma that are even worse
    How do we find those that are better?
A Riddle
 What is the fundamental difference between
 attack and defense?
Answer
 When an attack doesn’t work, you can tell.
 Offense has an inherent quality filter
   “Put up or shut up”
   Doesn't mean there aren't bugs in offensive
   disclosure
     The “Oracle Critical” is just an unencrypted transport – if it's
      a bug, then Wireshark is dropping hundreds of 0day
     Press Will Report Anything
   But it's not the same
The Reality Of Defense
 Too much dogma
 Not enough science
 “You have to defend against every bug” -> “That's
  impossible” -> “You don't have to show you've defended
  against anything”
   Critiques of defenses aren't much better – nobody is
    measuring or critiquing effectiveness
 So this is a talk about skepticism and the processes of
  finding effective defenses to the real and legitimate
  threats we cannot ignore
   You shouldn’t agree with everything I'm going to present
   My goal is to show you some new ideas, and give you a
    framework to consider them as worthwhile or not
   This is the only way we're going to get defense to work
     Lest you think there's nothing concrete here…
The Fundamental Test
 Take 2000 systems with a defense.
 Take 2000 systems without.
 Come back in six months, and manually audit all 4000 systems.
 Is there or is there not a statistically significant difference in the
  infection rate?
 Even if we don't do the above, let us at least respect a gold
  standard when we see one!
 The time may come when we spend as much money on
  security research as we do on medical research.
     Medicine took hundreds of years to become scientific, and they
      had dead bodies to motivate them
     We don't have dead bodies, or hundreds of years.
     We still need to fix these problems.
     Some vendors out there care along these lines. Reward them!
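One way to make "statistically significant difference in the infection rate" concrete is a two-proportion z-test. The sketch below is an illustration, not something from the talk: the function name and the infection counts are hypothetical, and it assumes the two groups of systems are independent and large enough for the normal approximation.

```python
import math


def two_proportion_z_test(infected_a, total_a, infected_b, total_b):
    """Return (z, two_sided_p) for the difference between two infection rates."""
    rate_a = infected_a / total_a
    rate_b = infected_b / total_b
    # Pooled rate under the null hypothesis that the defense changes nothing.
    pooled = (infected_a + infected_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        return 0.0, 1.0
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical audit: 120 of 2000 defended systems infected, 160 of 2000 undefended.
z, p = two_proportion_z_test(120, 2000, 160, 2000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value only says the two groups differ; whether the defense caused the difference still depends on how the 4000 systems were assigned.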
The Three Heads Of The Security
Hydra
   1) The inability to authenticate
   2) The inability to write secure code
   3) The inability to bust the bad guys
   What we're not talking about today
     Authentication – DNSSEC – no time, ask me in private (or wait
      a few months)
     Busting the bad guys
       Remarkable lack of consensus regarding which bad guys are most
        important
       I tend to worry about the Aurora attack, which involved espionage
        against (let's face it) the entire Fortune 500, and about those raiding
        SMB payrolls, because that calls into question the very viability of
        SMB
          Others have different priorities
 What we are talking about
     The inability to write secure code
An immediate clarification
 It's not that it's impossible to write secure code
   It's not impossible to deploy X.509 PKI
   It's not impossible to bust the bad guys
 It’s just plainly and utterly improbable
   At least in most organizations
 “Possible is not enough. Probable or bust.”
What are we looking at today?
 How do we address timing attacks?
 How do we generate random numbers?
 How do we suppress SQL Injection?
 How do we detect network manipulation?
 How do we scan the Internet?
 These are all things that are possible today.
 How do we make them more deployable, less
 expensive…more probable?
Timing Attacks
 Many systems are modeled in terms of just what data
  they send
   Not in terms of when they send it
   Sometimes data leaks – security sensitive data
 Possible to distinguish 15-100 microseconds of
  latency over Internet, and 100 nanoseconds of
  latency over LAN (1000 samples)
   Opportunities and limits of remote timing attacks (Scott A
    Crosby , Rudolf H. Riedi , Dan S. Wallach)
 Possible to exploit string comparison functions in
  widespread scripting languages, thus breaking HMAC
  compare (OpenID/OAuth)
   Exploiting timing attacks in widespread systems
    Nate Lawson and Taylor Nelson @ Black Hat 2010
The Proposed Fix
 Any time values need to be compared in a security
 critical context, compare them in constant time (so
 that there's no correlation between what's
 compared, and how long it takes)
   public static boolean isEqual(byte[] a, byte[] b) {
     if (a.length != b.length) { return false; }
     int result = 0;
     for (int i = 0; i < a.length; i++) {
       // Accumulate differences instead of returning at the first mismatch
       result |= a[i] ^ b[i];
     }
     return result == 0;
   }
 Looks good, right?
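For comparison, the same idea in Python, where the standard library already ships a constant-time comparison (`hmac.compare_digest`). This is a minimal sketch: `verify_tag` is a hypothetical helper name, and HMAC-SHA256 is an assumed choice of MAC, not something the slides specify.

```python
import hashlib
import hmac


def verify_tag(key: bytes, message: bytes, received_tag: bytes) -> bool:
    """Check an HMAC-SHA256 tag without leaking where the first mismatch is."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    # compare_digest takes time independent of how many leading bytes match.
    return hmac.compare_digest(expected, received_tag)
```

The catch is the same as for the Java version: somebody still has to remember to call it at every security critical comparison.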
The Problem
 You have to remember to do this everywhere
  there’s a security critical comparison
   You don't get to do it all the time, because the
    performance impact is too high
   You thus must actually identify all the security
    critical comparisons
 It's possible. But it's not probable.
A Solution?
 I seem to note that distinguishing against Internet
  noise yields less accuracy (15,000-100,000ns) than
  LAN noise (100ns)
   That's three to four orders of magnitude!
   And Internet noise is not actually random
 What if we actually did have a random delay?
   tc qdisc change dev eth0 root netem delay
    3ms 1ms
     “For all packets emitted from the first Ethernet interface, add a
      random amount of lag between 2,000,000ns and 4,000,000ns”
       (netem reads “delay 3ms 1ms” as 3ms ± 1ms)
     “Boltzmann Filter”
   At minimum, the LAN should be as secure as the
    Internet. Maybe Internet attackers also are impacted.
   This is a lot easier to deploy. That really matters.
     But does it work?
What Could Go Wrong?
 “All timing noise can be averaged out eventually, so a global random
  delay can't work”
    Pretty much all password comparisons are done with non-constant time
     compares, so I guess all passwords are vulnerable?
    Here's some SSH 0day
     sys_auth_passwd(Authctxt *authctxt, const char *password){…
       /* Encrypt the candidate password using the proper salt. */
       encrypted_password = xcrypt(password,
           (pw_password[0] && pw_password[1]) ? pw_password : "xx");
       return (strcmp(encrypted_password, pw_password) == 0);
    strcmp is not constant time. So, you just offline brute force for
     passwords that have certain characters and see how far you get.
 It is highly unlikely that the above “attack” actually works
    Nanosecond differentials are too small to recover
    Maybe not locally…hmmm…
What We Really Need To Know
 How much timing noise, of what nature, will
  permanently obscure how much timing signal beyond
  the point of infeasible return?
   Somewhere between “1 nanosecond” and “1 day” there
    is an amount of noise that will indefinitely obscure an n
    nanosecond differential
     There's likely to be an equation here
     “CSI Enhance” has its limits
   There is a limit to how much lag we can ask for, from the
    performance guys
     It is higher for some requests than for others
     We might require more lag than perf is willing to give (at least
      in general)
     Need to discover these numbers
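As a first guess at what that equation could look like, here is the textbook averaging model, offered as an assumption rather than as the talk's answer. If the injected jitter is independent and uniform over a window of width w, its standard deviation is w/√12; the mean of N samples has standard error σ/√N; and telling two response classes apart by a differential δ at z standard errors needs roughly N ≥ 2(zσ/δ)² samples per class. The function name and the z = 3 threshold are hypothetical.

```python
import math


def samples_needed(jitter_width_ns, differential_ns, z=3.0):
    """Samples per class to resolve a timing differential by plain averaging.

    Assumes i.i.d. uniform jitter of the given width and no other noise.
    """
    sigma = jitter_width_ns / math.sqrt(12)  # std deviation of uniform noise
    return math.ceil(2 * (z * sigma / differential_ns) ** 2)


# 2 ms of uniform jitter hiding a 100 ns differential:
# on the order of 6e8 samples per class under this model.
print(samples_needed(2_000_000, 100))
```

The quadratic term is the useful part: halving the differential, or doubling the jitter, quadruples the attacker's sample count. Real noise is not i.i.d., and smarter filters than a plain mean can do better, so this is an upper-end estimate of attacker cost, not a guarantee.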
What could actually go wrong
 The distribution of lag from the interface may be easy
  to filter
   Quantized into 1ms chunks?
   Gaussian when it should be uniform, or uniform when it
    should be Gaussian
   Could be filterable thanks to TCP timestamps (which
    have ~10ms accuracy, but also have sharp edges)
 All of the above can be fixed, the question is if they
  need to be
   The perfect (constant time comparisons) is the enemy of
    the good (interface-wide jitter)
      Jitter does not need to apply to all packets; could be a TCP
       setsockopt or whatnot
      Could also be applied at the end of a php script
Another Day, Another Time
 “RSA is broken!”
   No, not the thing with the smartcards that would
    (maybe, depending on vendor) leak their private key
   No, not the thing with the SecurID seeds that were stolen
   The thing with certificates with easily breakable RSA keys
     Something like 1 in 200 RSA keys on the Internet failed!
     Hughes and Lenstra announced first; Nadia Heninger had parallel
      research
 At the time, the break was blamed on RSA itself
   Two primes in RSA (p and q)
   If either is repeated (p and q1, p and q2), then all are easy to
    derive
     Euclid's Greatest Common Divisor
   “RSA is bad!”
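The shared-prime break is short enough to show end to end. The sketch below uses toy primes (real moduli are 1024 bits or more) and a hypothetical helper name; the GCD step is the same regardless of size.

```python
import math

# Toy primes standing in for real RSA primes; p is reused by two different keys.
p, q1, q2 = 101, 103, 107
n1 = p * q1
n2 = p * q2


def recover_factors(n_a, n_b):
    """Return (shared_prime, other_a, other_b) if two moduli share a prime."""
    shared = math.gcd(n_a, n_b)  # Euclid's algorithm, no factoring required
    if shared in (1, n_a, n_b):
        return None  # nothing shared (or identical moduli): no break here
    return shared, n_a // shared, n_b // shared


print(recover_factors(n1, n2))
```

No weakness in RSA is involved: two devices drew the same prime from an underseeded generator, and either public modulus then factors the other.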
Reality
 Bad random number generators create trapdoor
 functions in all cryptosystems
   Rather than breaking the crypto, you guess the key
   Basic concept of 2011's Phidelius (expanded a
   password into a pseudorandom stream, which was
   then used to feed a key generator for
   RSA/DSA/ECC).
     “Bad RNG isn't a bug, it's a feature!”
 They thought they'd shown RSA was bad
 They actually showed that RNGs are still broken
   Debian's bug wasn't just Debian's
   Weren't operating systems supposed to fix this?
Theory
 “Collecting and providing entropy is hard; let the
  operating system do it for you”
   /dev/random for good bits, /dev/urandom for best effort
    bits
   If /dev/random runs out of bits, block until more are
    found
 Sources for entropy
     Hardware RNG
     Keyboard
     Mouse
     Disk Rotation (as impacted by air)
 Problem: Lots of environments don't have any of that
Actual Environments
 Desktops
   Humans w/ keyboards and mice
   Often disks
 Servers
   Sometimes have disks
 VMs
 Embedded devices
The Reality of Hardware RNG
 It's just not there.
 Yes, I know Ivy Bridge is coming out with a
  Hardware RNG. In 2012.
   That's top of the line gear now.
 Yes, I know some TPMs are reported to have
  Hardware RNGs.
   For some reason, people treat TPM hardware as
    unstable radioactive gunk
   It's also rarely in embedded kit
What's Happening: An Analogy
 Proteins cause cancer
   http://ukpmc.ac.uk/abstract/MED/3007842/reload=0;jsessionid=3X3Cs6G7VbyRT1xEPcUX.4
 Carbohydrates cause cancer
   http://www.smh.com.au/lifestyle/diet-and-fitness/high-carbohydrate-diet-tied-to-cancer-20110616-1g4o9.html
 Fats cause cancer
   http://www.telegraph.co.uk/health/healthnews/5650141/High-fat-diet-can-increase-risk-of-deadly-cancer.html
 Alcohol causes cancer
   http://pubs.niaaa.nih.gov/publications/arh25-4/263-270.htm
 So you don't consume proteins, carbohydrates, fats, or
  booze.
   You starve to death.
What Actually Happens
 How do I know? I actually asked some devs.
 1) They have some code that depends on
  /dev/random
 2) On initialization of their embedded device, the code
  tries to generate a key.
 3) There's no human at the keyboard, no hand at the
  mouse, no disk to spin, and no hardware RNG.
  /dev/random blocks. The device is a brick.
   Quite literally, starving for entropy
 4) At best, they switch to /dev/urandom. At worst they
  switch to rand() and then they ship.
   /dev/urandom is underseeded, though, and is still broken
A comparison
 What perfectionists think will happen:
   It's broken! Surely they'll demand hardware RNG!
 What developers actually do:
   Security failed us again. Let's ship something that
    works.
 Perfectionism caused (at least) 1 out of 200
 RSA keys on the Net to be easily broken
   It's almost certainly worse than that
      Those are just the keys we can easily detect
   We can do better.
TrueRand: An Old Hack [0]
 Why do we like measuring keyboards and mice?
  Humans and computers are not synchronized
  Humans do not operate on nanosecond clocks like
   computers do
    Human is slow clock, CPU is fast clock
 Any system with two clocks, has a Hardware
 Random Number Generator
  Even if the error is one part per million, that's a bit
   per second per megahertz
  The error is generally much larger than a part per
   million, just from thermal noise
    (Not just thermal noise)
TrueRand: An Old Hack [1]
 What TrueRand (from Matt Blaze and D.P. Mitchell, in
  1996) does
   Run the CPU in a tight loop (count++);
   Every 16ms, fire an interrupt
     On interrupt, shuffle the count variable, and integrate it into a
      buffer
     The entropy comes in here – timer is slow clock, CPU is
      fast clock
   After 11 shuffles, return the buffer as an integer
   Hash two buffers together using sha1, return only the
    first byte
 It ain't bad. But it's disowned.
   That's too bad, because it would have prevented (at
    least) 1/200 keys from being broken.
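The "slow clock against fast clock" idea can be sketched in a few lines. This is not TrueRand: it is a loose Python illustration in which the monotonic timer plays the slow clock and a counting loop plays the fast one. The function names, the 1 ms interval, and the sample count are all hypothetical, nothing here estimates how much entropy the counts actually carry, and it should not be used to generate keys.

```python
import hashlib
import time


def jitter_sample(interval_ns=1_000_000):
    """Count loop iterations completed before the timer advances by interval_ns."""
    deadline = time.monotonic_ns() + interval_ns
    count = 0
    while time.monotonic_ns() < deadline:
        count += 1  # the fast clock; the final count wobbles from run to run
    return count


def jitter_bytes(n_samples=64):
    """Fold a series of jittery counts into a single SHA-256 digest."""
    digest = hashlib.sha256()
    for _ in range(n_samples):
        digest.update(jitter_sample().to_bytes(8, "little"))
    return digest.digest()


print(jitter_bytes().hex())
```

Because the loop polls the same clock it is being measured against, the sketch is exposed to exactly the kind of self-correlation a careful design has to worry about; it shows the mechanism only.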
Why is it disowned?
 (Literally – Matt Blaze was vaguely horrified that
  I'm revisiting this code)
 Perfectionism
   “We can't model its behavior. We don't know how
    good or bad it is, so we shouldn't do it at all.”
 This attitude has actually led to a reduction in
  available entropy in the Linux kernel
   Used to look at interrupt counts from various
    devices
   Now they aren't used, because they “might be
    polluted”
DakaRand 1.0 [0]
 An update to the old model
 Multiple generators
   Sleepers: Measure usleep with
     CLOCK_MONOTONIC
     CLOCK_REALTIME
     RDTSC (on X86 platforms)
       CPU counter – there are equivalents for ARM, MIPS
   Incrementer: See how many times we can
   increment an integer within a certain time period
   (100% CPU)
DakaRand 1.0 [1]
  RTC: Measure interrupts from the realtime clock
  using CLOCK_MONOTONIC (dedicated IRQ!)
   128hz
   8192hz
  Threads: Measure the status of an integer
  modulated by a runaway thread (100% CPU)
   Anyone who thinks computers are completely deterministic
      creations has never written threaded code ;)
     Two Threads, One Int (one adds, one subtracts, main polls)
     Two Threads, Two ints (both add, main compares)
     One Thread, One Int (one adds, main polls)
     Possible addition: Noisier functions than add
DakaRand Flow
 Short version
   Push all bits into a SHA-256 Hash
     Don't undercount entropy
   Only count them as entropy when they pass Von
   Neumann's debiasing check
       Count 1's to decide whether 0 or 1
       Throw away 00 and 11, count only 01 and 10
       Actually insert a 0 or a 1 when you count a bit
       Don't overcount entropy
   Scrypt (time/memory hard function) the resulting SHA-
   256 value
     Make it miserable to guess entropy
   Use the output of Scrypt as the input to AES-256-CTR,
   emit the resulting stream
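The first half of that flow (hash everything, credit only debiased bits) can be sketched as follows. This is a reading of the slide, not DakaRand's source: the function names are hypothetical, samples are assumed to be non-negative integers below 2^64, and the scrypt and AES-256-CTR stages are left out.

```python
import hashlib


def parity(sample: int) -> int:
    """Count the 1 bits in a sample to decide whether it stands for 0 or 1."""
    return bin(sample).count("1") & 1


def von_neumann_debias(bits):
    """Throw away 00 and 11 pairs; a 01 pair yields 0 and a 10 pair yields 1."""
    out = []
    for i in range(0, len(bits) - 1, 2):
        first, second = bits[i], bits[i + 1]
        if first != second:
            out.append(first)
    return out


def absorb(samples):
    """Hash every raw sample, but credit entropy only for debiased bits."""
    digest = hashlib.sha256()
    for sample in samples:
        digest.update(sample.to_bytes(8, "little"))  # don't undercount: hash it all
    credited = von_neumann_debias([parity(s) for s in samples])
    for bit in credited:
        digest.update(bytes([bit]))  # actually insert the counted 0 or 1
    return digest.digest(), len(credited)  # don't overcount: credit only these
```

A caller would keep absorbing samples until the credited count reaches its target (say 256 bits) before handing the digest to the later stages.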
Attacking DakaRand
 The game: Find a platform (Desktop/Server/VM/Embed) or
  an OS under which DakaRand provides poor entropy in one
  of its modes
 Userspace/Hypervisor Scheduling
    We're only called some number of times per second
    These times per second may be at predictable intervals
    If sufficiently predictable, they'll bias the output
       Will they simultaneously and identically bias both clocked entities?
 Autoclocking
   If you time something against itself, you're going to have a bad
     time
       Clocks are highly correlated to themselves
    RTC and CLOCK_MONOTONIC could be the same underlying
     timer in a VM
    VMs, more than anything else, should be exposing a random
     device (even if the random device itself uses clock differentials)
    Even so, this code still seems to work on VMs
The VM Cloning Issue
 /dev/random keeps bits around for a long time
 When you clone an image, you end up with those bits
  being static for a long time
   Meaning you keep generating the same entropy for a long
    time :)
 DakaRand attempted guarantee: Each read is atomic
   The results of the read may be used across multiple images
   But two separate calls at two separate times MUST yield two
    uncorrelated streams
 Can't do anything after the read is fully completed
 During the read (which does last a second, due to scrypt)
  is already after
 I actually don't think you can do better than this, though I
  was considering XORing the keystream with /dev/urandom
  anyway
Is The Underlying Use Of Crypto
Safe?
 Modified Von Neumann
   We absorb a tremendous amount of data into our hash structure
     that has obvious patterns
    If you have 100GB of 0's and 128 bits of actual randomness,
     output of hash has 128 bits of randomness
    We do explicitly include the 0 and 1
 Stream Function vs. Raw Output
   Lots of raw output from a function tends to leak external state
   So let's not leak external state.
 Cryptographic Stream Function
   RNGs tend to have their own family of functions that are distinctly
     not cryptographically validated
      Mersenne Twister, not AES-256 in Counter Mode
    Is it in fact the case that strong (not RC4) cryptographic functions
     encompass all properties of RNGs?
      Well, what does dieharder say?
DieHarder CipherSuite Test
 About 16,000 CPU hours of DieHarder Entropy Tests
  were run across 21 ciphers, with inputs of either 16MB
  of zero or (the same) 16MB of /dev/urandom output
   About 24,000 different tests per cipher/content class
   Thanks, Jamie Schwettman, who did all the work to
    make this sweep happen
 No obvious statistical leanings to the data
 Machine learning people are taking a look
   Thanks, Prior Knowledge, Aleks Jakulin!
   No conclusive findings yet
   Releasing this data too
Neat tool – want it?
csql: run SQL against CSV files
 $ cat pass2.csv | head -n 20000 | ./csql - "SELECT cipher,
    content, test, subtest, count(pv), avg(pv) from c group by cipher,
    content, test, subtest;" | head -n 10
   aes-128-cbc,urandom,dab_bytedistrib,0,10,0.0
   aes-128-cbc,urandom,dab_dct,256,10,0.47393035
   aes-128-cbc,urandom,diehard_2dsphere,2,10,0.627572674
   aes-128-cbc,urandom,diehard_3dsphere,3,10,0.664239991
   aes-128-cbc,urandom,diehard_birthdays,0,10,0.50850473
   aes-128-cbc,urandom,diehard_bitstream,0,10,0.017056331
   aes-128-cbc,urandom,diehard_count_1s_byt,0,10,0.441374983
   aes-128-cbc,urandom,diehard_count_1s_str,0,10,0.538731369
   aes-128-cbc,urandom,diehard_craps,0,20,0.0394997795
   aes-128-cbc,urandom,diehard_dna,0,10,0.396250338
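For readers without the tool, the same "SQL against a CSV" trick can be approximated with Python's standard library. This is not csql itself: `query_csv` is a hypothetical helper, it assumes the first CSV row is a header naming the columns, and it loads everything into an in-memory SQLite table called `c` to match the query above.

```python
import csv
import io
import sqlite3


def query_csv(csv_text, sql, table="c"):
    """Load CSV text into an in-memory SQLite table and run one query on it."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    conn = sqlite3.connect(":memory:")
    # NUMERIC affinity lets numeric-looking fields aggregate as numbers,
    # while fields such as cipher names stay text.
    columns = ", ".join('"%s" NUMERIC' % name.replace('"', '""') for name in header)
    conn.execute('CREATE TABLE "%s" (%s)' % (table, columns))
    placeholders = ", ".join("?" for _ in header)
    conn.executemany('INSERT INTO "%s" VALUES (%s)' % (table, placeholders), data)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


sample = (
    "cipher,content,test,subtest,pv\n"
    "aes-128-cbc,urandom,diehard_dna,0,0.25\n"
    "aes-128-cbc,urandom,diehard_dna,0,0.75\n"
    "aes-128-cbc,zero,diehard_dna,0,0.5\n"
)
print(query_csv(sample, "SELECT cipher, content, count(pv), avg(pv) "
                        "FROM c GROUP BY cipher, content ORDER BY content"))
```

The sample rows are made up for illustration and are not taken from the DieHarder sweep.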
Kernel Recommendations
 /dev/random MUST NOT block.
    Make an IOCTL if you must
    Return data slowly if you like
    CryptGenRandom on Windows does not appear to block
      1 out of 200 RDP keys are not likely to be corrupt
 Don't be so shy about interrupt sources
   Care less about interrupt counts than interrupt timings
   ftrace exposes microsecond timings, which might not be fine
    grained enough
   Use nanosecond arrival times, as much as possible, from devices
    on foreign busses. The slower the foreign device is, the better.
      You want to be measuring slow clocks against fast clocks
      By definition, the kernel is interrupted at finer grain than userspace.
   Obviously you don't have to include every last interrupt – it takes
    time to check the time.
 Maybe consider this Modified Von Neumann construction
From The Bottom To The Top
 Our biggest problems in security do not revolve
  around Random Number Generation
 They revolve around languages
 Language Theoretic Security: The hypothesis that
  security vulnerabilities are the consequence of the
  languages code is written in
   Coined by Len Sassaman and Meredith Patterson
   “Sapir-Whorf is true for code”
   Corollary: If language got us into this mess, language
    can get us out
   More important corollary: Languages are spoken or
    written by humans. Ignore their needs at your peril.
The Shift
 One way to look at language theoretic security is
 through the lens of computability theory
   Different classes of code have different amounts of
    “power”, and communication should be limited to
    the least amount of power necessary
   Attacks expand power from Declarative, through
    Regular Expression, to “Turing Complete”
   This is indeed a valid lens
 Another lens
Diagramming Sentences:
IT WAS ACTUALLY USEFUL
Injection Vulnerabilities:
When Trees Disagree
 Parsers, almost by definition, turn streams of bytes into
  trees
   Injection Vulnerabilities exist when a sending language and a
    receiving language (which may or may not be the same)
    disagree on the nature of the tree sent
   An extreme case of this is when bytes flow out into
    surrounding memory
   But SQL Injection, LDAP Injection, XSS, etc are all just
    situations where (generally) the sender thought it sent the
    user's data, but the receiver thought it received a peer's code
     A purely declarative language can still (easily) be injected into, and
      complexity can remain declarative and still yield damage. The attack
      is not in the increase of complexity, but in the transition of content from
      one identity/context to another through parse tree differentials.
 So what?
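A small demonstration of two trees disagreeing, using SQLite from Python's standard library. The table, its rows, and the function name are made up for illustration. The sender believes it is sending one string literal; the receiver parses an extra OR branch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "a-secret"), ("bob", "b-secret")])


def lookup_by_concatenation(name):
    """UNSAFE on purpose: the user's data is smashed into the code string."""
    sql = "SELECT secret FROM users WHERE name = '" + name + "'"
    return conn.execute(sql).fetchall()


# Sender's tree:   name = <one string literal>
# Receiver's tree: (name = 'x') OR ('1' = '1')
print(lookup_by_concatenation("alice"))
print(lookup_by_concatenation("x' OR '1'='1"))
```

The second call returns every row, and nothing in it needed more "power" than the original query; the damage comes entirely from content crossing from data into code.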
We have to stop injection
vulnerabilities
 They're killing us
 They're not l33t
 They're totally effective
 They're the vast majority of vulnerabilities ever
  written and discovered
 We haven't actually fixed them
   If we did fix them, they wouldn't still be costing
    billions of dollars
 [Yes, we're going to revisit Interpolique…it's OK,
  we're going to bash it too]
What is the importance of another
theoretical model?
 It declares the rules of the game.
   1) We want to synchronize parse trees.
   2) We want developers to actually use our method.
     A language unspoken has a term: A dead language
 It explains what is surprisingly not understood
   Why did XML become popular?
     Instead of spending months figuring out just how to say
      hello, they have their code, you have your code, and it's self
      describing strings in each direction. No fiddly “the eighth bit
      on the fourth byte changes everything”
   Why did JSON become popular?
     XML invented its own modes of being fiddly
The Hard Truth
 Developers are in charge.
   Not architects (they love ASN.1 and XML and WS-
    ZOMG)
   Not academics (they love Haskell)
   Not management (they love money)
     Money is made by
      performance, reliability, maintainability, features, rapid
      development
     Money is later lost by security, maybe
   So, not us.
 What is the #1 thing developers like?
   Code working
Thus, the biggest explanation
 Why is PHP so popular?
   If you don't think it is, see here:
 What is PHP incredibly good at?
   Copy and paste code…and it works
     We understand that CPAN makes Perl
     We don't understand that PHP sample code
      makes PHP
   Java Alternative: Look how much
   code my IDE can write for me!
     Copy and paste with a suit on
The Language Success Metric
 What are the odds, if I try this, that it will work?
   Not, when it fails, it fails fast!
 Surprisingly, nobody tracks this metric
   (Except maybe Processing, which is incredible)
   That's why all the successful languages tend to be
    the brainstorms of one guy
   Art is science before we know what we're doing :)
   PHP beats your favorite language
   If we want to fix security, here is a good place to
    work
What's Wrong With ORMs?
 Object-Relational Mapping
   Problems with SQL Injection? Don't use SQL!
    Instead, the database just looks like your favorite
    language's native objects.
   Great, right up until the moment you need to make
    a query.
Look at this. It matters.
 +[,+[-[>+>+<<-]>[<+>-]+>>++++++++[<-------->-]<-[< [-
  ]>>>+[<+<+>>-] <[>+<-]<[<++>>>+[<+<->>-]<[>+<-
  ]]>[<]<]>>[-]<<<[[- ]<[>>+>+<<<-]>> [<<+>>-
  ]>>++++++++[<-------->-]<->>++++[<++++ ++++>-]<-
  <[>>>+<<[ >+>[-]<<-]>[<+>-
  ]>[<<<<<+>>>>++++[<++++++++>-]>-]< <-<-]>[<<<<[-
  ]>>>>[<<<<->>>>-]]<<++++[<<++++++++>>-]<<-
  [>>+>+<<<-]>>[<<+ >>-]+>>+++++[<----->-]<-[<[-
  ]>>>+[<+<->>-]<[>+ <-]<[<++>>>+[<+<+ >>-]<[>+<-
  ]]>[<]<]>>[-]<<<[[-]<<[>>+>+<<<-]> >[<<+>>-]+>------------
  [<[-]>>>+[<+<->>-]<[>+<-]<[<++>>>+[<+<+>>-]< [>+<-
  ]]>[<]<]>>[-]< <<<<------------->>[[-]+++++[<<+++++>>-
  ]<<+>> ]<[>++++[<<+++++++ +>>-]<-]>]<[-
  ]++++++++[<++++++++>-]<+>]<.[ -]+>>+<]>[[-]<]<]
BrainF*ck's Rejoinder
 There are more things in this world broken by
  punctuation than just BrainF*ck.
 Compare.
     $result = from('$name')->in($names) ->where('$name =>
      strlen($name) < 5') ->select('$name');
        32 characters of punctuation, deeply interspersed
     $result = query("SELECT $name FROM $names WHERE
      length($name)<5");
        12 characters of punctuation (with large gaps)
     Which would you rather write?
 There's a reason SQL persists after all these years.
  It's really expressive and surprisingly without noise.
   Put another way: It's a language that's shockingly good
    for structured queries.
 Turns out this matters.
The Classics
 Escape?
   mysql_real_escape_string – really? 25 characters?
   Bigger problems: Fails open – code still works if it's just missing
      “Greppability” is huge – you can't grep for a missing escape!
    Escapes are a blacklist. When's the last time you saw a blacklist
     work properly?
 Parameterization
   First you declare a template for a query
   Then you link individual variables to the template, on a positional
     basis
      “This is the first argument”
      “This is the second argument”
      MAYBE, if you're lucky, your language supports argument aliases.
           “The argument marked with :name should get the value of the variable
            ‘name’”
    One line of code becomes many
    Resources need to be synced
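What the two flavors of parameterization look like in practice, shown with Python's `sqlite3` module rather than any particular PHP driver. The table and function names are made up for illustration.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory table to query against."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    db.executemany("INSERT INTO users VALUES (?, ?)",
                   [("alice", "a-secret"), ("bob", "b-secret")])
    return db


def lookup_positional(db, name):
    # "This is the first argument": values are bound by position.
    return db.execute("SELECT secret FROM users WHERE name = ?", (name,)).fetchall()


def lookup_named(db, name):
    # Argument alias: the slot marked :name gets the value keyed "name".
    return db.execute("SELECT secret FROM users WHERE name = :name",
                      {"name": name}).fetchall()
```

With either form the injection string `x' OR '1'='1` is just a name that matches no row. The cost is the one listed above: the template and the bound values live in separate places and have to be kept in sync.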
Reality
 Nobody has ever written a parameterized query without a gun to
  their head. We know, we hold the gun.
   Even secure code, when audited, tends to be “safe things written
    quickly” and “we realized this was unsafe so we parameterized it”
   That you have to threaten people with getting fired, is itself a
    data point.
 For some strange reason, databases don't seem to provide
  mechanisms to disable unparameterized queries entirely
   More interestingly, it's a crapshoot whether you get to
    parameterize at all
      Just try to parameterize “SELECT”.
   SQL, for all its elegance, builds a remarkably complex parse tree
    out of a mostly unpunctuated string
      Some nodes in the parse tree can be filled by functions, some can be
       parameterized, etc.
      It's a decent RNG to know what you can get away with :)
Interpolique [0]
 Released in 2010 at HOPE
 Concept for eliminating injection attacks while
  retaining “dangerous” (but developer preferred)
  coding styles
   Both SQLi and XSS
 Basic idea
   “SELECT * FROM foo where x=$x and y=$y”
   Humans can pretty easily see the separation between
    code and data. Data begins with $. Code does not.
   The language throws that data away and just smashes
    strings together.
   Does it have to?
Interpolique [1]
 The original approach for Interpolique
   First, use an alternate syntax to identify the desired
    variables
     “SELECT * FROM foo where x=^^x and y=^^y”
   Then, create a function that returns the code we'd have
    liked the developer to write.
     $stmt = $conn->prepare("SELECT * FROM foo where x=? and
      y=?");
      $stmt->bind_param("ss", $x, $y);
      $stmt->execute();
   Finally, evaluate the generated code
     eval(b("SELECT * FROM foo where x=^^x and y=^^y"));
     Eval is, surprisingly, the only way to retrieve the values of $x
      and $y from inside the function b().
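The template-rewriting half of that idea, sketched in Python. This is not Interpolique's code: `to_prepared` is a hypothetical name, and instead of using eval to reach into the caller's scope it takes the variables as an explicit mapping, which sidesteps the scoping problem rather than solving it.

```python
import re

# ^^name marks a variable the developer wants treated strictly as data.
_MARKER = re.compile(r"\^\^([A-Za-z_][A-Za-z0-9_]*)")


def to_prepared(template, variables):
    """Turn '... x=^^x ...' into ('... x=? ...', [value_of_x, ...])."""
    names = _MARKER.findall(template)   # markers in order of appearance
    sql = _MARKER.sub("?", template)    # the code the developer "should" have written
    return sql, [variables[name] for name in names]


sql, params = to_prepared("SELECT * FROM foo where x=^^x and y=^^y",
                          {"x": "1", "y": "a' OR '1'='1"})
print(sql, params)
```

The returned pair can be handed directly to any driver that accepts positional parameters.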
What's Wrong With Interpolique?
 What if the dev writes:
   eval(b("SELECT * FROM foo where x=$x and
    y=$y"));
   If $x and $y are attacker controlled, he's not far
    from an eval that will run code in PHP's context!
   The b() function is in a position to defend the code
    that ultimately enters eval, but now you're entirely
    dependent on b() knowing what PHP will do given
    arbitrary bytes.
     GOOD LUCK WITH THAT
   Highly greppable error case, but it's pretty scary
Building A Safe Interpolique
 Eval only exists so that variables from the calling
  scope can be dereferenced
 One approach is to implement
  create_selfscoped_function()
   Returns a function that always runs in the scope of its
    parent
   Could implement “proxies” so it can only read variables,
    and can't rewrite
 $rows=mysql_safequery("select * from foo where
  x=^^x and y=^^y");
 Requires a patch to PHP -- Daniel Zulla is working on
  this!
Code Rewriting?
 If we know what we would have liked developers to
 have written, why don't we just transform code once?
   Never really been a fan of this
     Have you ever audited autogenerated code? :)
   What do you do when the code looks like:
    $z = "SELECT * from foo where x=$x and y=$y;";
    $rows = mysql_query($z);
   Static analysis can of course find such situations (thus
    knowing $x came in from an HTTP variable) but most
    devs don't have access to such static analysis tools
     Should they?
Tainting
 What if we actually marked every character that came in from an
  HTTP query as “tainted”?
   Metadata, on a character by character basis
   Would survive passing from function to function
   Might even survive reasonable mangling by built in filters
 Then, you could write something like:
  mysql_query_safe("select * from foo where x=$x and y=$y;");
   Even though $x and $y would expand, the wrapper function
    would see that those particular characters were once tainted with
    the “mark of the web”, and could rewrite the unsafe query around
    it
   This still works with mysql_query_safe($x) when $x was
    assembled elsewhere, even concatenated;
 Could have problems with silent failure with filtering functions
 Requires a patch to PHP – Daniel Zulla also working on this
SuperEncoding as Explicit Tainting
 Based on discussions with Zane Lackey and Nick Galbreath at
  Etsy, based on an approach they’re already running in
  production
 What if all variables from the web, were encoded in a whitelisted
  format?
   Simple hex encoding -- &%41 – which, coincidentally, renders as
    an A in any HTML parser
 All non-DB access would have to go through accessors
   r($x) to read, w($x) to write
   Surprisingly easy to grep for access that isn’t wrapped
 Could do two things
   mysql_query_safe($x) could simply treat all superencoded
    characters as “data” and parameterize accordingly
   mysql itself could have its lexer modified to handle HTML
    encoding, exposing such characters to less of the SQL parser
    (“this is just a string”) – very LangSec
A Last Minute Alternative
 Perhaps we've got this backwards
 Rather than tainting data as data, we mark code as code.
   SQL tends not to be passed around from function to function, let
    alone parsed in the frontend
   $sql = c("select * from foo where x=");
    $sql .= $x;
    $sql .= c(" and y=");
    $sql .= $y;
   Then either mysql_query_safe or mysql itself (cowardly) refuses
    to execute anything with unmarked code
      Or, if this is baked into MySQL, it just doesn't see bytes as code if they're
       not deeply marked as code
 Moderately greppable – you're basically finding all SQL in your
  code and wrapping it with some sort of taint
   Either implicit as per Zulla, or explicit as per Etsy
   Most likely failure mode is an attacker controlled variable
     somehow getting inside of c("");
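A toy version of "mark code as code", sketched in Python. Everything here is hypothetical: `Code`, `c()` and `build_query` are illustrative names, and the pieces are passed as a list rather than concatenated, because plain concatenation would throw the mark away.

```python
class Code(str):
    """A string the developer has explicitly marked as SQL code."""


def c(text):
    return Code(text)


def build_query(*parts):
    """Keep marked parts as SQL text; turn every unmarked part into a parameter."""
    sql, params = [], []
    for part in parts:
        if isinstance(part, Code):
            sql.append(str(part))
        else:
            sql.append("?")       # unmarked bytes are never seen as code
            params.append(part)
    return "".join(sql), params


x, y = "1", "b' OR '1'='1"
sql, params = build_query(c("select * from foo where x="), x, c(" and y="), y)
print(sql, params)
```

In Python, `c("a") + "b"` comes back as a plain `str`, so a careless concatenation loses the mark and the whole piece is then treated as data. That fails closed, which is the direction one wants the mistakes to go. The failure mode named above still applies: an attacker controlled value that ends up inside `c(...)` is trusted.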
This is what LangSec means
 “What are people trying to say?”
 “How can we make it easier to say that?”
 “How hard will it be for people to migrate?”
 “What errors will they make when trying to use
  this?”
 “Can we limit how much code might contain a
  bug?”
 CARE ABOUT YOUR DEVS OR THEY WILL
  NOT CARE ABOUT YOU
What's Going On With The Web?
 It doesn't matter what code you write, if there are
  parties in the middle changing or blocking what
  you send
 Content alteration and blocking is becoming a
  real thing
   Verizon is claiming a First Amendment right to
    rewrite Internet connections
   Entire countries are silently blocking web pages
     Indonesia's blocking a million porn sites in the run-up to
      Ramadan
What Went Wrong With N00ter
 N00ter was a really fun (and really powerful) mechanism
  for detecting network manipulation
   Allowed a remote server and a cooperating client to “pretend”
    to have a conversation with anyone on the Internet, using any
    protocol
   To any MITM, it would look like a real, unmodified
    conversation
     So any alterations that might normally hit the real server, would hit this
       too
 Unfortunately, N00ter does a lot of very low-level
  packet crafting, meaning (realistically) it requires custom
  hardware in front of user machines
   This is not fun to deploy
     Especially if you need to get between NAT and actual network
      connection
   Not impossible. Definitely improbable.
What Else Can We Use?
 Executable code on the client
   OONI-Probe
 Web Pages with Iframes
   Herdict (“Herd Verdict”)
   Needs either user cooperation, or a Chrome
   extension, to know if content is up or down
 Is it possible to determine whether content is up
 or not, from just a web page?
   Can we crowdsource censorship data?
     Maximize data per user
     Minimize installation load per user
Imaging
 The browser's Same Origin Policy usually prevents web
  pages from doing much with one another
   You wouldn't want Yahoo able to read from your Gmail
    account
 But there is one exception
   Any domain is allowed to load any other domain's
    images
   Beyond that, it's allowed to know that the load was
    successful
     Not merely that there was a file at that location, but that it was
      actually an image
     You even get image dimensions (which you'd have to, because
      it resizes the page)
 If a domain is being censored, the image will not load
   What one image is on most domains?
Favicon.ico
 (It's the picture to the left of Google in the tab)
So this is CensorSweeper
(Also by Joseph Van Geffen and Michael Tiffany)
Written for Wall Street Journal Data Transparency Hackathon
What's going on…
 var img = new Image();
  img.onload = function (event) { /* render favicon */ };
  img.onerror = function (event) { validate(); };
  img.src = "http://somesite.com/favicon.ico";
 The above is done in parallel, reading from a list
  of sites that have confirmed presence of
  favicon.ico
 Six failures are required before a “bomb” is
  dropped on the map
Error Handling
 Six failures aren't actually enough!
 Web browsers provide remarkably little feedback to a
  developer to know what's failing, and why
   Put simply, “flow control” hasn't really been implemented
    for the web
   Everything's been designed around infinite bandwidth
 For reliability, we're going to need to shut down all other
  traffic, and then do two simultaneous lookups
   One for a known-up site, the other for the supposedly-
    down site
 That being said, CensorSweeper works pretty well
   Can we do better?
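The paired-lookup rule above can be sketched as a small piece of decision logic in Python. This is only an illustration of the bookkeeping, not CensorSweeper's code: `classify_pair()` and `Tally` are hypothetical names, and the threshold of six is the figure quoted on the earlier slide.

```python
from collections import Counter

FAILURES_REQUIRED = 6  # threshold quoted in the slides


def classify_pair(control_loaded, target_loaded):
    """Interpret one paired probe: a known-up control plus the suspect site."""
    if not control_loaded:
        # The user's connection (or the browser) is struggling; learn nothing.
        return "inconclusive"
    return "up" if target_loaded else "down"


class Tally:
    """Counts confirmed failures per site across many users."""

    def __init__(self):
        self.failures = Counter()

    def record(self, site, control_loaded, target_loaded):
        """Record one probe; return True once the site crosses the threshold."""
        if classify_pair(control_loaded, target_loaded) == "down":
            self.failures[site] += 1
        return self.failures[site] >= FAILURES_REQUIRED
```

Only failures observed while the control site loaded are counted, which is the point of running the two lookups simultaneously.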
Sockets
 Once upon a time, web browsers could act like
 proxies, giving you connections anywhere
   There were bugs in Flash and Java; we fixed them
   They can now only create connections to IP
   addresses that invite them
 But ~20% of the time there are transparent
 proxies between web servers and their users
   See “Staring into the Abyss” by me, or “Socket
   Capable Browser Plugins Result In Transparent
   Proxy Abuse” by Robert Auger
 This has been known…but not explored for
 mapping censorship!
HTTP Censorship Detection
 1) Using Flash (or HaXe), create an HTTP
 connection back to your own IP on port 80
   Host a socket policy file, so Flash allows this
 2) Request anything, from any domain
   If the request comes to you, there is no transparent
    proxy
   Otherwise, the request will be hijacked by the proxy,
    serviced, and sent back to your Flash app
   You now see what that user would see, if they
    browsed to that site! You can then submit it back to
    yourself.
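The decision at the heart of this probe can be sketched in Python. This models only the message logic, not Flash or real sockets, and everything named here is an assumption: the `X-Probe-Nonce` header, `build_probe()`, `server_reply()` and `classify_response()` are hypothetical. The idea is that your own server echoes a random nonce, so a response without the echo must have come from something in the middle.

```python
NONCE_HEADER = "X-Probe-Nonce"  # hypothetical header name, for illustration


def build_probe(target_host, nonce):
    """Request for another domain, to be sent to *our own* server's IP."""
    return ("GET / HTTP/1.1\r\nHost: %s\r\n%s: %s\r\nConnection: close\r\n\r\n"
            % (target_host, NONCE_HEADER, nonce)).encode("ascii")


def parse_headers(raw):
    """Parse the header block of a raw HTTP message into a lowercase dict."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    headers = {}
    for line in head.split("\r\n")[1:]:  # skip the request/status line
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def server_reply(raw_request):
    """What our server sends if the probe actually reaches it."""
    nonce = parse_headers(raw_request).get(NONCE_HEADER.lower(), "")
    body = b"probe reached origin"
    head = ("HTTP/1.1 200 OK\r\n%s: %s\r\nContent-Length: %d\r\n\r\n"
            % (NONCE_HEADER, nonce, len(body)))
    return head.encode("ascii") + body


def classify_response(response, nonce):
    """'direct' if our server answered, 'intercepted' if a proxy serviced it."""
    echoed = parse_headers(response).get(NONCE_HEADER.lower())
    return "direct" if echoed == nonce else "intercepted"
```

In the intercepted case the response body is what the user would have seen when browsing to that site, which is the data worth submitting back.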
HTTPS Certificate Extraction
 Just as HTTP traffic on 80/tcp is hijacked, so may HTTPS
  traffic on 443/tcp
   MITM may have an alternate certificate for you
   But (if you're careful) it can't tell the difference between the
    browser starting SSL, and Flash/HaXe starting SSL
   It has to know which domain to pretend to have a certificate for
     The proxy can parse the Server Hello, with its certificate
        (It's your server saying hello)
     The proxy can parse the Client Hello, with its Server Name Indication
        (It's your Flash app saying hello)
     You can actually host the real Facebook certificate, or even proxy the
      real Facebook SSL endpoint
        Hard to keep track of all of Facebook's IPs
        It has to forge the certificate, before you have to prove you actually
         have Facebook's private key (assuming you aren't proxying)
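As a rough illustration of the client side, here is a Python sketch that connects to a chosen IP, sends a chosen Server Name Indication, and returns whatever certificate comes back. It uses Python's standard `ssl` module in place of Flash/HaXe, and `fetch_certificate()`, `fingerprint()` and `looks_forged()` are hypothetical helper names.

```python
import hashlib
import socket
import ssl


def fetch_certificate(ip, sni_name, port=443, timeout=5.0):
    """Return the DER certificate presented at ip:port for the given SNI name."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    context.check_hostname = False       # we want to observe, not to trust
    context.verify_mode = ssl.CERT_NONE
    with socket.create_connection((ip, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=sni_name) as tls:
            return tls.getpeercert(binary_form=True)


def fingerprint(der):
    """SHA-256 fingerprint of a DER-encoded certificate, as lowercase hex."""
    return hashlib.sha256(der).hexdigest()


def looks_forged(observed_der, expected_fingerprint):
    """True if the certificate seen by the client is not the one we expected."""
    return fingerprint(observed_der) != expected_fingerprint.lower()
```

Pointing `fetch_certificate()` at your own server's IP with another site's name in `sni_name` reproduces the probe: if the certificate that arrives is not the one your server presents, something on the path forged it.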
Slight Annoyance
 No normal way, via Browser DOM, to determine the
  certificate that provided content
   This at least allows a page to query for its exposed
    certificates – kinda cool!
 Limitations
   You can test anyone's certificate, as long as the attacker
    isn't interposing themselves via DNS hijacking
     The Flash app sees what's at the named IP; if hijacking is at
      the DNS layer, then Flash won't get hijacked
   You are able to test your own certificate, but then the
    attacker has already MITM'd you and can alter your
    security validation layer
Full Proxying
 One of the goals of N00ter was seeing if everyday content was
  being altered or slowed down
 One of the headaches with these custom probes is writing these
  custom probes
   How do you look just like a real web browser trying to access
    YouTube?
   Answer: Be a real web browser trying to access YouTube
 The last time we played with Flash and Sockets, we created a
  full VPN
 But now sockets are limited to a single destination…
   It turns out that it may still be possible/useful to proxy an entire
    browser (at the server) down to the Flash app (in the client),
    which will then make open connections back to the server who
    will proxy them to the rest of the Internet
   This will allow, at minimum, a protocol correct sequence of
    messages for HTTP and HTTPS that are only incorrect by
    destination IP
      So basically, if the intercepting server doesn't care about IP correctness,
       you get to interrogate its ruleset with no installed code on the client
Last but not least:
Scanning Networks Quickly
 Actionable Intelligence: What can an attacker do
  today that he couldn't do yesterday, for what class
  of attacker, to what class of victim?
   Rather related to this: How many potential victims are
    out there?
 I've run two major scans this year (that I've talked
  about)
   Telnet
     Determining presence of Telnet Encryption support
     Answer: Very rare
   RDP
     Determining presence of open RDP access
     Answer: VERY common
My Process
 Once upon a time, simply flooding TCP SYNs
  was enough to find out what was out there
 Nowadays, many, many IP addresses will complete a
  three-way handshake, but there won't actually be
  anything there
 Solution: Split process
   1) Identify candidate IP addresses, that are listening
    on a given port
   2) Given a candidate, actually connect to the IP
More Detail
 Candidate collection
   For each IP, incrementing the first byte
    first, (1.1.1.1, 2.1.1.1, 3.1.1.1…), send a TCP SYN
    on the required port (23 for telnet, 3389 for RDP)
   In a separate window, log TCP SYN|ACKs with
    tcpdump
     tcpdump -w log 'tcp[tcpflags] = (tcp-syn|tcp-ack)'
     Scanrand was being buggy, this maximized logging
 Candidate Inspection
   Telnet Encryption – nmap team whipped up a quick
    check, so I just fed the IP list to it
   Very few found
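The "incrementing the first byte first" ordering from candidate collection can be sketched as a Python generator. The function name and the default octet range are assumptions for illustration; the slide only gives the first three addresses of the sequence.

```python
from itertools import islice, product


def first_octet_first(octets=range(1, 255)):
    """Yield IPv4 addresses with the first octet varying fastest.

    Consecutive probes then land in different /8 networks instead of
    walking through one network's address space in order.
    """
    values = tuple(octets)
    # product() varies its last element fastest, so unpack in reverse order.
    for d, c, b, a in product(values, repeat=4):
        yield "%d.%d.%d.%d" % (a, b, c, d)
```

For example, `list(islice(first_octet_first(), 3))` gives `['1.1.1.1', '2.1.1.1', '3.1.1.1']`, matching the sequence on the slide.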
RDP Sweep: Black Mamba
 Probably the most pleasant environment for reasonable scale TCP
  probing ever devised
   http://rootfoo.org/blackmamba
 from blackmamba import *
  def get(host, port=80):
      # Minimal HTTP/1.1 request; header lines end in CRLF
      msg = "GET / HTTP/1.1\r\nHost: %s\r\n\r\n" % host
      yield connect(host, port)
      yield write(msg)
      response = yield read()
      yield close()
      print(response)
  def generate(host, count=100):
      # One get() task per connection attempt
      for i in range(count):
          yield get(host)
  run(generate('example.com'))
 You end up getting ~3000 IPs a second
   May need to increase ulimit -n
   May need to alter hardcoded limits in blackmamba.py
Can We Get Faster?
 Always wanted to write a userspace TCP stack
   HD Moore kinda kicked me into working on one for critical.io,
    his mysterious new scanning project
     I am not at all beyond being motivated by other people's awesome
      and mysterious projects
     Especially when they give me CPU and Network Bandwidth
   So. Scanrand3! A new scanner that doesn't just flood SYNs,
    but actually connects to every node and extracts data
 Original plan: TCP stack with SQLite as the backend
   “SELECT * FROM sockets WHERE data_sent!=data_acked
    AND now()-data_sent_time>3” (to find sockets where a
    retransmit is needed) is just funny!
   SQLite, in memory-only mode, is really really fast
     160K inserts/sec fast
   Unfortunately, that speed disappears when you add indexes
     20K inserts/sec with two indexes
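The original plan can be illustrated with Python's built-in `sqlite3` module in memory-only mode. The table layout and helper names are assumptions; only the retransmit query is taken from the slide, with the current time passed in as a parameter because SQLite has no `now()` function.

```python
import sqlite3
import time


def open_socket_table():
    """Create an in-memory table holding one row of TCP state per peer."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sockets ("
        " ip TEXT PRIMARY KEY,"
        " data_sent INTEGER,"
        " data_acked INTEGER,"
        " data_sent_time REAL)"
    )
    return conn


def needs_retransmit(conn, now=None, timeout=3.0):
    """Peers with unacknowledged data that was sent more than `timeout` seconds ago."""
    now = time.time() if now is None else now
    rows = conn.execute(
        "SELECT ip FROM sockets"
        " WHERE data_sent != data_acked AND ? - data_sent_time > ?"
        " ORDER BY ip",
        (now, timeout),
    )
    return [row[0] for row in rows]
```

A query like this needs indexes to stay fast as the table grows, which is where the insert rate quoted above collapses.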
New Plan: Let The Servers Keep TCP State

Details! Details!
 Scanrand didn't get its speed by keeping track of who it did
  or didn‟t send traffic to
   Why should Scanrand3?
 1) Send SYN
   Maximum Segment Size==1460
   Window Size==1460 (for all packets)
 2) Upon receiving a SYN|ACK, reply with an ACK
   Include “GET / HTTP/1.0” payload
   Yes, you can put a payload in the initial ACK!
 3) Upon receiving an ACK, if there is a payload, ACK it
   Save the payload
 4) Upon receiving a FIN|ACK, RST
   Save the payload, if any
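Steps 2 through 4 can be written as a pure function of the incoming segment, which is what makes the scanner stateless. The Python sketch below only computes what to send back; it does not craft packets, and `reply_for()` and the `Reply` tuple are hypothetical names. The flag values are the standard TCP header bits.

```python
from collections import namedtuple

FIN, SYN, RST, ACK = 0x01, 0x02, 0x04, 0x10   # standard TCP flag bits
REQUEST = b"GET / HTTP/1.0\r\n\r\n"
MASK = 0xFFFFFFFF                              # sequence numbers wrap at 2**32

Reply = namedtuple("Reply", "flags seq ack payload saved")


def reply_for(flags, seq, ack, payload=b""):
    """Decide the reply to one received segment, using only its own fields."""
    if flags & RST:
        return None                            # peer refused; nothing to send
    if flags & SYN and flags & ACK:
        # Step 2: complete the handshake and put the request in the ACK.
        return Reply(ACK, ack, (seq + 1) & MASK, REQUEST, b"")
    if flags & FIN:
        # Step 4: tear down with a RST, keeping any data on the FIN.
        return Reply(RST, ack, 0, b"", payload)
    if flags & ACK and payload:
        # Step 3: acknowledge the response data and save it.
        return Reply(ACK, ack, (seq + len(payload)) & MASK, b"", payload)
    return None                                # bare ACK: nothing to do
```

Everything needed for the reply (our next sequence number, the peer's next expected byte) is recovered from the peer's own `seq` and `ack` fields, so no per-connection table is consulted.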
No Local State
 If the first SYN is dropped – OK, nobody's around to
  retransmit it
   May want to log RST|ACK to avoid future retransmits
 If the SYN|ACK is dropped to the client, server
  retransmits SYN|ACK
 If the ACK w/ initial payload is dropped to the server,
  server retransmits SYN|ACK, causing new ACK w/
  payload
 If any ACK w/ response payload is dropped to the
  client, server will retransmit ACK w/ response payload
   Same with FIN|ACK
   Window size of 1460 means we always know which
    particular packet to acknowledge – only one in flight
    (usually)
Performance
 Relatively unoptimized code on a well hosted but
  underpowered server (cheap Dual Opteron)
 50-80K servers/sec w/ full payloads
 3.25M IPs takes 60-80 seconds, retrieves about
  800MB of content
 Task is embarrassingly parallelizable across
  threads, databases, etc.
   Should be able to use multiple bpf filters to route packets
    to their appropriate thread with kernel filtering
   Writing to a SQLite DB, and then backing up to disk, is
    really fast (substantially faster than fwrite, though
    haven't tested a large mmap yet)
   You basically reassemble payloads in SQLite as a
    postprocess
Security
 Scanrand pioneered inverse SYN cookies – you protect
  against spoofed responses by validating fields in the
  response against hashes of data plus a secret only you
  know
 16 bits in source port + 32 bits in sequence number are
  possible
   May be able to get another 32 bits out of TCP
    Timestamps, which are usually supported
   Haven't implemented yet, so very easy to poison me
   Sequence space becomes less secure, the more data you
    actually send
     You do know the exact size of each payload, so you can say “I only
      accept responses with no payload seq, payload 1 seq, payload 2
      seq, etc”
     Technically the other side can ACK at any byte offset, but that doesn't
      mean they actually will
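A minimal Python sketch of the inverse SYN cookie idea, assuming HMAC-SHA256 as the hash and a cookie derived from the destination IP and port. Neither choice is stated in the slides, and `cookie()` and `is_genuine()` are hypothetical names. It fills the 16 bits of source port and 32 bits of sequence number mentioned above.

```python
import hashlib
import hmac
import socket
import struct

MASK = 0xFFFFFFFF


def cookie(secret, dst_ip, dst_port):
    """Derive (source_port, initial_seq) for a SYN from a keyed hash of the target."""
    msg = socket.inet_aton(dst_ip) + struct.pack("!H", dst_port)
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    src_port, isn = struct.unpack("!HI", digest[:6])   # 16 + 32 bits
    return src_port, isn


def is_genuine(secret, peer_ip, peer_port, our_port, ack):
    """Check a SYN|ACK against the cookie we would have sent to that peer."""
    src_port, isn = cookie(secret, peer_ip, peer_port)
    return our_port == src_port and ack == ((isn + 1) & MASK)
```

The scanner never stores what it sent: on receiving a SYN|ACK it recomputes the cookie from the packet's own addresses and compares. A spoofer who does not know the secret has to guess all 48 bits.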
Some Notes
 Kernels have actually gotten kind of fast
 Non-blocking connect() plus epoll should be able
 to get pretty fast
   Certainly easier to code for that model!
   Didn't work for me (not sure why)
 This approach ultimately becomes fastest
   Probably need a “writev” call to spew many packets
   w/o a write for each
More Notes
 Can also try more efficient stores than sqlite
   Giant allocation of RAM with fixed offsets per IP
   MemSQL
     Neat project by ex-Facebookers – compiles SQL to C++
     They think even with the indexes they can do 100K+ inserts/sec

 Can have merged approaches too
   Only start keeping state if I like the response from
   the server
 Note that stateless client + stateless server = no
 retransmits
What should the coding model be?
 Flat file / command line?
 C?
 JavaScript?
 Lua?
   Could implement support for nmap scripts
Most Important Feature
 Blacklist support
   Most networks don't mind getting swept
     They certainly are, already
   Some do
     Part of being a whitehat is you let people know who you are,
      and listen to their requests
   So you end up with a pile of IP ranges not to sweep
     It can actually take a substantial amount of CPU if you
      check the list naively
     Need to compile it into a quickly queryable structure
     I don't think firewall rules apply to spoofed traffic
Simple Architectural Note
 Don't try to interact with the Linux firewall
   Just pick another IP on the LAN and send from there
   Respond to ARP traffic for it
   (Yes, it is an advantage of the socket model that
    you don't need to requisition another IP)
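One way to compile the blacklist into a quickly queryable structure is a sorted array of merged ranges with binary search. The Python sketch below uses the standard `ipaddress` and `bisect` modules; the `Blacklist` class is a hypothetical illustration, IPv4 only, and not how any version of Scanrand does it.

```python
import bisect
import ipaddress


class Blacklist:
    """Do-not-scan list compiled into sorted, non-overlapping integer ranges."""

    def __init__(self, cidrs):
        networks = [ipaddress.ip_network(cidr) for cidr in cidrs]
        # Merge overlapping or adjacent networks, then sort by start address.
        merged = sorted(ipaddress.collapse_addresses(networks))
        self._starts = [int(net.network_address) for net in merged]
        self._ends = [int(net.broadcast_address) for net in merged]

    def __contains__(self, ip):
        """O(log n) membership test instead of scanning every range."""
        value = int(ipaddress.ip_address(ip))
        index = bisect.bisect_right(self._starts, value) - 1
        return index >= 0 and value <= self._ends[index]
```

With the list compiled once up front, the per-packet cost is a single binary search, for example `if target in blacklist: continue`.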
Whew!
 Lots of stuff!
 Hope you enjoyed!
 This may not be how you try to fix stuff…but it's
  what I try to do
   Thanks to everyone cited in the slides
   Thanks also to
    Nick, Johnny, Blackstock, Alex, Allessandra, Allessandra,
    and Andrew of The Sub for putting up with me
    in DEFCON mode ;)

 
TrustArc Webinar - How to Live in a Post Third-Party Cookie World
TrustArc Webinar - How to Live in a Post Third-Party Cookie WorldTrustArc Webinar - How to Live in a Post Third-Party Cookie World
TrustArc Webinar - How to Live in a Post Third-Party Cookie World
 
Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)
 
CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024
 

Black ops 2012

  • 1. Black Ops Dan Kaminsky Chief Scientist DKH dan@doxpara.com (2012)
  • 2. Another Year, Another Talk  Good News and Bad News  Good News: We’re going to fix this thing.  We have no choice. The global economy is based on Information Technology being trustworthy.  An economy where you have to be big enough to field a “cyber army” in order to participate, is a broken economy indeed  It’s not like the big guys are doing a great job defensively  Bad News: We’re not going to fix it according to dogma.  How’s the status quo working out for us?  There are many alternatives to dogma that are even worse  How do we find those that are better?
  • 3. A Riddle  What is the fundamental difference between attack and defense?
  • 4. Answer  When an attack doesn’t work, you can tell.  Offense has an inherent quality filter  “Put up or shut up”  Doesn’t mean there aren’t bugs in offensive disclosure  The “Oracle Critical” is just an unencrypted transport – if it’s a bug, then Wireshark is dropping hundreds of 0day  Press Will Report Anything  But it’s not the same
  • 5. The Reality Of Defense  Too much dogma  Not enough science  “You have to defend against every bug” -> “That’s impossible” -> “You don’t have to show you’ve defended against anything”  Critiques of defenses aren’t much better – nobody is measuring or critiquing effectiveness  So this is a talk about skepticism and the processes of finding effective defenses to the real and legitimate threats we cannot ignore  You shouldn’t agree with everything I’m going to present  My goal is to show you some new ideas, and give you a framework to consider them as worthwhile or not  This is the only way we’re going to get defense to work  Lest you think there’s nothing concrete here…
  • 6. The Fundamental Test  Take 2000 systems with a defense.  Take 2000 systems without.  Come back in six months, and manually audit all 4000 systems.  Is there or is there not a statistically significant difference in the infection rate?  Even if we don’t do the above, let us at least respect a gold standard when we see one!  The time may come when we spend as much money on security research as we do on medical research.  Medicine took hundreds of years to become scientific, and they had dead bodies to motivate them  We don’t have dead bodies, or hundreds of years.  We still need to fix these problems.  Some vendors out there care along these lines. Reward them!
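One way to make "statistically significant difference in the infection rate" concrete is a two-proportion z-test. The sketch below is a textbook test, not something from the talk, and the audit counts in the usage lines are invented for illustration.

```python
import math

def two_proportion_z_test(infected_a, n_a, infected_b, n_b):
    """Return (z, two-sided p-value) for the difference between two infection rates."""
    p_a = infected_a / n_a
    p_b = infected_b / n_b
    # Pooled rate under the null hypothesis that the defense makes no difference.
    pooled = (infected_a + infected_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical audit: 60 of 2000 defended systems infected, 100 of 2000 undefended.
z, p = two_proportion_z_test(60, 2000, 100, 2000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value says the gap between the two groups is unlikely to be chance; it says nothing about whether the groups were comparable to begin with, which is the hard part of the experiment.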
  • 7. The Three Heads Of The Security Hydra  1) The inability to authenticate  2) The inability to write secure code  3) The inability to bust the bad guys  What we’re not talking about today  Authentication – DNSSEC – no time, ask me in private (or wait a few months)  Busting the bad guys  Remarkable lack of consensus regarding which bad guys are most important  I tend to worry about the Aurora attack, which involved espionage against (let’s face it) the entire Fortune 500, and about those raiding SMB payrolls, because that calls into question the very viability of SMB  Others have different priorities  What we are talking about  The inability to write secure code
  • 8. An immediate clarification  It’s not that it’s impossible to write secure code  It’s not impossible to deploy X.509 PKI  It’s not impossible to bust the bad guys  It’s just plainly and utterly improbable  At least in most organizations  “Possible is not enough. Probable or bust.”
  • 9. What are we looking at today?  How do we address timing attacks?  How do we generate random numbers?  How do we suppress SQL Injection?  How do we detect network manipulation?  How do we scan the Internet?  These are all things that are possible today. How do we make them more deployable, less expensive…more probable?
  • 10. Timing Attacks  Many systems are modeled in terms of just what data they send  Not in terms of when they send it  Sometimes data leaks – security sensitive data  Possible to distinguish 15-100 microseconds of latency over Internet, and 100 nanoseconds of latency over LAN (1000 samples)  Opportunities and limits of remote timing attacks (Scott A. Crosby, Rudolf H. Riedi, Dan S. Wallach)  Possible to exploit string comparison functions in widespread scripting languages, thus breaking HMAC compare (OpenID/OAuth)  Exploiting timing attacks in widespread systems, Nate Lawson and Taylor Nelson @ Black Hat 2010
  • 11. The Proposed Fix  Any time values need to be compared in a security critical context, compare them in constant time (so that there’s no correlation between what’s compared, and how long it takes)  public static boolean isEqual(byte[] a, byte[] b) { if (a.length != b.length) { return false; } int result = 0; for (int i = 0; i < a.length; i++) { result |= a[i] ^ b[i]; } return result == 0; }  Looks good, right?
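For comparison, the same fix in Python is a standard-library call. This is a minimal sketch assuming an HMAC-SHA256 tag; `verify_mac` and its parameters are illustrative names, not from the talk.

```python
import hashlib
import hmac

def verify_mac(key: bytes, message: bytes, received_mac: bytes) -> bool:
    """Check a received HMAC-SHA256 tag without an early-exit comparison."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    # compare_digest is designed so its running time does not depend on
    # where the first differing byte is.
    return hmac.compare_digest(expected, received_mac)

key = b"hypothetical-shared-key"
tag = hmac.new(key, b"hello", hashlib.sha256).digest()
print(verify_mac(key, b"hello", tag))
```

The catch raised on the next slide still applies: someone has to remember to call it at every security critical comparison.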
  • 12. The Problem  You have to remember to do this everywhere there’s a security critical comparison  You don’t get to do it all the time, because the performance impact is too high  You thus must actually identify all the security critical comparisons  It’s possible. But it’s not probable.
  • 13. A Solution?  I seem to note that distinguishing against Internet noise yields less accuracy (15,000-100,000ns) than LAN noise (100ns)  That’s two to three orders of magnitude!  And Internet noise is not actually random  What if we actually did have a random delay?  tc qdisc change dev eth0 root netem delay 3ms 1ms  “For all packets emitted from the first Ethernet interface, add a random amount of lag between 2,000,000ns and 4,000,000ns” (netem reads this as 3ms ± 1ms)  “Boltzmann Filter”  At minimum, the LAN should be as secure as the Internet. Maybe Internet attackers also are impacted.  This is a lot easier to deploy. That really matters.  But does it work?
  • 14. What Could Go Wrong?  “All timing noise can be averaged out eventually, so a global random delay can’t work”  Pretty much all password comparisons are done with non-constant time compares, so I guess all passwords are vulnerable?  Here’s some SSH 0day sys_auth_passwd(Authctxt *authctxt, const char *password){… /* Encrypt the candidate password using the proper salt. */ encrypted_password = xcrypt(password, (pw_password[0] && pw_password[1]) ? pw_password : "xx"); return (strcmp(encrypted_password, pw_password) == 0);  strcmp is not constant time. So, you just offline brute force for passwords that have certain characters and see how far you get.  It is highly unlikely that the above “attack” actually works  Nanosecond differentials are too small to recover  Maybe not locally…hmmm…
  • 15. What We Really Need To Know  How much timing noise, of what nature, will permanently obscure how much timing signal beyond the point of infeasible return?  Somewhere between “1 nanosecond” and “1 day” there is an amount of noise that will indefinitely obscure an n nanosecond differential  There’s likely to be an equation here  “CSI Enhance” has its limits  There is a limit to how much lag we can ask for, from the performance guys  It is higher for some requests than for others  We might require more lag than perf is willing to give (at least in general)  Need to discover these numbers
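A rough first estimate of those numbers, under strong assumptions: if the added jitter is independent per request, averaging n samples shrinks the standard error to sigma/sqrt(n), so separating two response classes whose means differ by delta takes on the order of 2·(z·sigma/delta)² samples per class. This is a textbook estimate, not the equation the slide is looking for; a real attacker may filter more cleverly than averaging, and real jitter may not be independent. The function names are illustrative.

```python
import math

def uniform_jitter_sigma_ns(width_ns: float) -> float:
    """Standard deviation of a uniform random delay spread over width_ns."""
    return width_ns / math.sqrt(12)

def samples_needed(sigma_ns: float, delta_ns: float, z: float = 3.0) -> int:
    """Samples per class to separate two means delta_ns apart by z standard errors."""
    return math.ceil(2 * (z * sigma_ns / delta_ns) ** 2)

# A jitter window 2 ms wide (as in the netem example) against a 100 ns differential.
sigma = uniform_jitter_sigma_ns(2_000_000)
print(samples_needed(sigma, 100))
# The same jitter against a 15,000 ns differential.
print(samples_needed(sigma, 15_000))
```

Under these assumptions the noise raises the attacker's cost rather than setting a hard limit, which is why the slide asks for measured numbers instead of a guess.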
  • 16. What could actually go wrong  The distribution of lag from the interface may be easy to filter  Quantized into 1ms chunks?  Gaussian when it should be uniform, or uniform when it should be Gaussian  Could be filterable thanks to TCP timestamps (which have ~10ms accuracy, but also have sharp edges)  All of the above can be fixed, the question is if they need to be  The perfect (constant time comparisons) is the enemy of the good (interface-wide jitter)  Jitter does not need to apply to all packets; could be a TCP setsockopt or whatnot  Could also be applied at the end of a php script
  • 17. Another Day, Another Time  “RSA is broken!”  No, not the thing with the smartcards that would (maybe, depending on vendor) leak their private key  No, not the thing with the SecurID seeds that were stolen  The thing with certificates with easily breakable RSA keys  Something like 1 in 200 RSA keys on the Internet failed!  Hughes and Lenstra announced first, Nadia Heninger had parallel research  At the time, the break was blamed on RSA itself  Two primes in RSA (p and q)  If a prime is shared between keys (p and q1, p and q2), then all are easy to derive  Euclid’s Greatest Common Divisor  “RSA is bad!”
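The shared-prime break can be shown with toy numbers. In this sketch the primes are tiny and chosen purely for illustration; real moduli are thousands of bits, but the GCD step is the same and stays cheap.

```python
import math

def factor_with_shared_prime(n1: int, n2: int):
    """If two RSA moduli share a prime, return both factorizations; else None."""
    shared = math.gcd(n1, n2)
    if shared in (1, n1, n2):
        return None  # no shared prime (or the moduli are identical/multiples)
    return (shared, n1 // shared), (shared, n2 // shared)

# Toy moduli: both generators happened to pick p = 61.
p, q1, q2 = 61, 53, 59
print(factor_with_shared_prime(p * q1, p * q2))
# Moduli with no prime in common give nothing away this way.
print(factor_with_shared_prime(53 * 59, 61 * 67))
```

Nothing here attacks RSA itself; the weakness is that two key generators drew the same prime.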
  • 18. Reality  Bad random number generators create trapdoor functions in all cryptosystems  Rather than breaking the crypto, you guess the key  Basic concept of 2011’s Phidelius (expanded a password into a pseudorandom stream, which was then used to feed a key generator for RSA/DSA/ECC).  “Bad RNG isn’t a bug, it’s a feature!”  They thought they’d shown RSA was bad  They actually showed that RNGs are still broken  Debian’s bug wasn’t just Debian’s  Weren’t operating systems supposed to fix this?
  • 19. Theory  “Collecting and providing entropy is hard; let the operating system do it for you”  /dev/random for good bits, /dev/urandom for best effort bits  If /dev/random runs out of bits, block until more are found  Sources for entropy  Hardware RNG  Keyboard  Mouse  Disk Rotation (as impacted by air)  Problem: Lots of environments don’t have any of that
  • 20. Actual Environments  Desktops  Humans w/ keyboards and mice  Often disks  Servers  Sometimes have disks  VMs  Embedded devices
  • 21. The Reality of Hardware RNG  It’s just not there.  Yes, I know Ivy Bridge is coming out with a Hardware RNG. In 2012.  That’s top of the line gear now.  Yes, I know some TPMs are reported to have Hardware RNGs.  For some reason, people treat TPM hardware as unstable radioactive gunk  It’s also rarely in embedded kit
  • 22. What’s Happening: An Analogy  Proteins cause cancer  http://ukpmc.ac.uk/abstract/MED/3007842/reload=0;jsessionid=3X3Cs6G7VbyRT1xEPcUX.4  Carbohydrates cause cancer  http://www.smh.com.au/lifestyle/diet-and-fitness/high-carbohydrate-diet-tied-to-cancer-20110616-1g4o9.html  Fats cause cancer  http://www.telegraph.co.uk/health/healthnews/5650141/High-fat-diet-can-increase-risk-of-deadly-cancer.html  Alcohol causes cancer  http://pubs.niaaa.nih.gov/publications/arh25-4/263-270.htm  So you don’t consume proteins, carbohydrates, fats, or booze.  You starve to death.
  • 23. What Actually Happens  How do I know? I actually asked some devs.  1) They have some code that depends on /dev/random  2) On initialization of their embedded device, the code tries to generate a key.  3) There’s no human at the keyboard, no hand at the mouse, no disk to spin, and no hardware RNG. /dev/random blocks. The device is a brick.  Quite literally, starving for entropy  4) At best, they switch to /dev/urandom. At worst they switch to rand() and then they ship.  /dev/urandom is underseeded, though, and is still broken
  • 24. A comparison  What perfectionists think will happen:  It’s broken! Surely they’ll demand hardware RNG!  What developers actually do:  Security failed us again. Let’s ship something that works.  Perfectionism caused (at least) 1 out of 200 RSA keys on the Net to be easily broken  It’s almost certainly worse than that  Those are just the keys we can easily detect  We can do better.
  • 25. TrueRand: An Old Hack [0]  Why do we like measuring keyboard and mice?  Humans and computers are not synchronized  Humans do not operate on nanosecond clocks like computers do  Human is slow clock, CPU is fast clock  Any system with two clocks, has a Hardware Random Number Generator  Even if the error is one part per million, that’s a bit per second per megahertz  The error is generally much larger than a part per million, just from thermal noise  (Not just thermal noise)
  • 26. TrueRand: An Old Hack [1]  What TrueRand (from Matt Blaze and D.P. Mitchell, in 1996) does  Run the CPU in a tight loop (count++);  Every 16ms, fire an interrupt  On interrupt, shuffle the count variable, and integrate it into a buffer  The entropy comes in here – timer is slow clock, CPU is fast clock  After 11 shuffles, return the buffer as an integer  Hash two buffers together using sha1, return only the first byte  It ain’t bad. But it’s disowned.  That’s too bad, because it would have prevented (at least) 1/200 keys from being broken.
  • 27. Why is it disowned?  (Literally – Matt Blaze was vaguely horrified that I’m revisiting this code)  Perfectionism  “We can’t model its behavior. We don’t know how good or bad it is, so we shouldn’t do it at all.”  This attitude has actually led to a reduction in available entropy in the Linux kernel  Used to look at interrupt counts from various devices  Now they aren’t used, because they “might be polluted”
  • 28. DakaRand 1.0 [0]  An update to the old model  Multiple generators  Sleepers: Measure usleep with  CLOCK_MONOTONIC  CLOCK_REALTIME  RDTSC (on X86 platforms)  CPU counter – there are equivalents for ARM, MIPS  Incrementer: See how many times we can increment an integer within a certain time period (100% CPU)
  • 29. DakaRand 1.0 [1]  RTC: Measure interrupts from the realtime clock using CLOCK_MONOTONIC (dedicated IRQ!)  128hz  8192hz  Threads: Measure the status of an integer modulated by a runaway thread (100% CPU)  Anyone who thinks computers are completely deterministic creations has never written threaded code ;)  Two Threads, One Int (one adds, one subtracts, main polls)  Two Threads, Two ints (both add, main compares)  One Thread, One Int (one adds, main polls)  Possible addition: Noisier functions than add
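The "Incrementer" generator can be sketched in a few lines: count how many increments fit into a fixed window of a slower clock and keep the low bit. This is an illustration of the idea only, not DakaRand's code; the function names and the 1 ms window are assumptions, the Python interpreter adds its own overhead, and the raw bits may be heavily biased, so they must never be used directly as key material.

```python
import time

def incrementer_sample(interval_ns: int = 1_000_000) -> int:
    """Count loop iterations completed before the monotonic clock advances interval_ns."""
    deadline = time.monotonic_ns() + interval_ns
    count = 0
    while time.monotonic_ns() < deadline:
        count += 1
    return count

def raw_bits(samples: int = 16) -> list:
    """Low bit of each count: raw, possibly biased input for later debiasing and hashing."""
    return [incrementer_sample() & 1 for _ in range(samples)]

print(raw_bits(8))
```

Whether the loop and the clock are independent enough depends on the platform, which is the "autoclocking" concern the talk raises about VMs.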
  • 30. DakaRand Flow  Short version  Push all bits into a SHA-256 Hash  Don’t undercount entropy  Only count them as entropy when they pass Von Neumann’s debiasing check  Count 1’s to decide whether 0 or 1  Throw away 00 and 11, count only 01 and 10  Actually insert a 0 or a 1 when you count a bit  Don’t overcount entropy  Scrypt (time/memory hard function) the resulting SHA-256 value  Make it miserable to guess entropy  Use the output of Scrypt as the input to AES-256-CTR, emit the resulting stream
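A minimal sketch of the first stages of that flow, assuming the classic Von Neumann rule (01 gives 0, 10 gives 1, 00 and 11 are discarded) rather than the talk's modified variant, whose details are not spelled out here. The scrypt parameters and salt are placeholders, `hashlib.scrypt` requires a Python build linked against OpenSSL with scrypt support, and the final AES-256-CTR stage is left out because the standard library has no AES.

```python
import hashlib

def von_neumann(bits):
    """Classic debiasing: read bits in pairs, emit the first bit of each unequal pair."""
    out = []
    for i in range(0, len(bits) - 1, 2):
        a, b = bits[i], bits[i + 1]
        if a != b:
            out.append(a)
    return out

def condition(raw_bits, salt=b"hypothetical-salt"):
    """Hash everything, count only debiased bits as entropy, then stretch with scrypt."""
    h = hashlib.sha256()
    h.update(bytes(raw_bits))   # every observed bit goes into the hash
    counted = von_neumann(raw_bits)
    h.update(bytes(counted))    # the counted bits are explicitly inserted too
    seed = hashlib.scrypt(h.digest(), salt=salt, n=2**12, r=8, p=1, dklen=32)
    return seed, len(counted)

seed, credited = condition([0, 1, 1, 0, 1, 1, 0, 0])
print(seed.hex(), credited)
```

The 32-byte result would be the key for the stream cipher stage; a caller would keep gathering raw bits until the credited count reaches the target.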
  • 31. Attacking DakaRand  The game: Find a platform (Desktop/Server/VM/Embed) or an OS under which DakaRand provides poor entropy in one of its modes  Userspace/Hypervisor Scheduling  We’re only called some number of times per second  These times per second may be at predictable intervals  If sufficiently predictable, they’ll bias the output  Will they simultaneously and identically bias both clocked entities?  Autoclocking  If you time something against itself, you’re going to have a bad time  Clocks are highly correlated to themselves  RTC and CLOCK_MONOTONIC could be the same underlying timer in a VM  VMs, more than anything else, should be exposing a random device (even if the random device itself uses clock differentials)  Still, this code seems to still work on VMs
  • 32. The VM Cloning Issue  /dev/random keeps bits around for a long time  When you clone an image, you end up with those bits being static for a long time  Meaning you keep generating the same entropy for a long time   DakaRand attempted guarantee: Each read is atomic  The results of the read may be used across multiple images  But two separate calls at two separate times MUST yield two uncorrelated streams  Can’t do anything after the read is fully completed  During the read (which does last a second, due to scrypt) is already after  I actually don’t think you can do better than this, though I was considering XORing the keystream with /dev/urandom anyway
  • 33. Is The Underlying Use Of Crypto Safe?  Modified Von Neumann  We absorb a tremendous amount of data into our hash structure that has obvious patterns  If you have 100GB of 0’s and 128 bits of actual randomness, output of hash has 128 bits of randomness  We do explicitly include the 0 and 1  Stream Function vs. Raw Output  Lots of raw output from a function tends to leak external state  So let’s not leak external state.  Cryptographic Stream Function  RNGs tend to have their own family of functions that are distinctly not cryptographically validated  Mersenne Twister, not AES-256 in Counter Mode  Is it in fact the case that strong (not RC4) cryptographic functions encompass all properties of RNGs?  Well, what does dieharder say?
  • 34. DieHarder CipherSuite Test  About 16,000 CPU hours of DieHarder Entropy Tests were run across 21 ciphers, with inputs of either 16MB of zero or (the same) 16MB of /dev/urandom output  About 24,000 different tests per cipher/content class  Thanks, Jamie Schwettman, who did all the work to make this sweep happen  No obvious statistical leanings to the data  Machine learning people are taking a look  Thanks, Prior Knowledge, Aleks Jakulin!  No conclusive findings yet  Releasing this data too
  • 35. Neat tool – want it? csql: run SQL against CSV files  $ cat pass2.csv | head -n 20000 | ./csql - "SELECT cipher, content, test, subtest, count(pv), avg(pv) from c group by cipher, content, test, subtest;" | head -n 10  aes-128-cbc,urandom,dab_bytedistrib,0,10,0.0  aes-128-cbc,urandom,dab_dct,256,10,0.47393035  aes-128-cbc,urandom,diehard_2dsphere,2,10,0.627572674  aes-128-cbc,urandom,diehard_3dsphere,3,10,0.664239991  aes-128-cbc,urandom,diehard_birthdays,0,10,0.50850473  aes-128-cbc,urandom,diehard_bitstream,0,10,0.017056331  aes-128-cbc,urandom,diehard_count_1s_byt,0,10,0.441374983  aes-128-cbc,urandom,diehard_count_1s_str,0,10,0.538731369  aes-128-cbc,urandom,diehard_craps,0,20,0.0394997795  aes-128-cbc,urandom,diehard_dna,0,10,0.396250338
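The same kind of query can be approximated with Python's standard library by loading the CSV into an in-memory SQLite table. This is a sketch of the idea, not the csql tool itself; the table name `c` mirrors the slide, while `query_csv` and the column list are assumptions about the file layout.

```python
import csv
import io
import sqlite3

def query_csv(csv_text, columns, sql):
    """Load headerless CSV text into an in-memory table named c and run sql on it."""
    conn = sqlite3.connect(":memory:")
    # NUMERIC affinity stores numeric-looking fields as numbers and leaves other text alone.
    col_defs = ", ".join(f'"{name}" NUMERIC' for name in columns)
    conn.execute(f"CREATE TABLE c ({col_defs})")
    placeholders = ", ".join("?" for _ in columns)
    rows = (row for row in csv.reader(io.StringIO(csv_text)) if row)
    conn.executemany(f"INSERT INTO c VALUES ({placeholders})", rows)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result

# Invented sample rows in the assumed layout: cipher, content, test, subtest, pv.
data = (
    "aes-128-cbc,urandom,diehard_birthdays,0,0.25\n"
    "aes-128-cbc,urandom,diehard_birthdays,0,0.75\n"
)
cols = ["cipher", "content", "test", "subtest", "pv"]
print(query_csv(data, cols, "SELECT cipher, test, count(pv), avg(pv) FROM c GROUP BY cipher, test"))
```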
  • 36. Kernel Recommendations  /dev/random MUST NOT block.  Make an IOCTL if you must  Return data slowly if you like  CryptGenRandom on Windows does not appear to block  1 out of 200 RDP keys are not likely to be corrupt  Don’t be so shy about interrupt sources  Care less about interrupt counts than interrupt timings  ftrace exposes microsecond timings, which might not be fine grained enough  Use nanosecond arrival times, as much as possible, from devices on foreign buses. The slower the foreign device is, the better.  You want to be measuring slow clocks against fast clocks  By definition, the kernel is interrupted at finer grain than userspace.  Obviously you don’t have to include every last interrupt – it takes time to check the time.  Maybe consider this Modified Von Neumann construction
  • 37. From The Bottom To The Top  Our biggest problems in security do not revolve around Random Number Generation  They revolve around languages  Language Theoretic Security: The hypothesis that security vulnerabilities are the consequence of the languages code is written in  Coined by Len Sassaman and Meredith Patterson  “Sapir-Whorf is true for code”  Corollary: If language got us into this mess, language can get us out  More important corollary: Languages are spoken or written by humans. Ignore their needs at your peril.
  • 38. The Shift  One way to look at language theoretic security is through the lens of computability theory  Different classes of code have different amounts of “power”, and communication should be limited to the least amount of power necessary  Attacks expand power from Declarative, through Regular Expression, to “Turing Complete”  This is indeed a valid lens  Another lens
  • 39. Diagramming Sentences: IT WAS ACTUALLY USEFUL
  • 40. Injection Vulnerabilities: When Trees Disagree  Parsers, almost by definition, turn streams of bytes into trees  Injection Vulnerabilities exist when a sending language and a receiving language (which may or may not be the same) disagree on the nature of the tree sent  An extreme case of this is when bytes flow out into surrounding memory  But SQL Injection, LDAP Injection, XSS, etc are all just situations where (generally) the sender thought it sent the user’s data, but the receiver thought it received a peer’s code  A purely declarative language can still (easily) be injected into, and complexity can remain declarative and still yield damage. The attack is not in the increase of complexity, but in the transition of content from one identity/context to another through parse tree differentials.  So what?
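The disagreement is easy to reproduce. In this sketch (Python's `sqlite3`, with an invented `users` table) the sender believes it is sending one string literal; the receiver parses an extra OR branch out of it. Binding the same input as a parameter keeps it a single leaf in the tree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", "a1"), ("bob", "b2")])

user_input = "nobody' OR '1'='1"

# Sender's view: WHERE name = <one string>. Receiver's view: WHERE name = 'nobody' OR '1'='1'.
concatenated = "SELECT name FROM users WHERE name = '" + user_input + "'"
leaked = conn.execute(concatenated).fetchall()

# Bound parameter: the whole input stays a single data value.
bound = conn.execute("SELECT name FROM users WHERE name = ?", (user_input,)).fetchall()

print(len(leaked), len(bound))
```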
  • 41. We have to stop injection vulnerabilities  They’re killing us  They’re not l33t  They’re totally effective  They’re the vast majority of vulnerabilities ever written and discovered  We haven’t actually fixed them  If we did fix them, they wouldn’t still be costing billions of dollars  [Yes, we’re going to revisit Interpolique…it’s OK, we’re going to bash it too]
  • 42. What is the importance of another theoretical model?  It declares the rules of the game.  1) We want to synchronize parse trees.  2) We want developers to actually use our method.  A language unspoken has a term: A dead language  It explains what is surprisingly not understood  Why did XML become popular?  Instead of spending months figuring out just how to say hello, they have their code, you have your code, and it’s self describing strings in each direction. No fiddly “the eighth bit on the fourth byte changes everything”  Why did JSON become popular?  XML invented its own modes of being fiddly
  • 43. The Hard Truth  Developers are in charge.  Not architects (they love ASN.1 and XML and WS-ZOMG)  Not academics (they love Haskell)  Not management (they love money)  Money is made by performance, reliability, maintainability, features, rapid development  Money is later lost by security, maybe  So, not us.  What is the #1 thing developers like?  Code working
  • 44. Thus, the biggest explanation  Why is PHP so popular?  If you don’t think it is, see here:  What is PHP incredibly good at?  Copy and paste code…and it works  We understand that CPAN makes PERL  We don’t understand that PHP sample code makes PHP  Java Alternative: Look how much code my IDE can write for me!  Copy and paste with a suit on
  • 45. The Language Success Metric  What are the odds, if I try this, that it will work?  Not, when it fails, it fails fast!  Surprisingly, nobody tracks this metric  (Except maybe Processing, which is incredible)  That’s why all the successful languages tend to be the brainstorms of one guy  Art is science before we know what we’re doing   PHP beats your favorite language  If we want to fix security, here is a good place to work
  • 46. What’s Wrong With ORMs?  Object-Relational Mappers  Problems with SQL Injection? Don’t use SQL! Instead, the database just looks like your favorite language’s native objects.  Great, right up until the moment you need to make a query.
  • 47. Look at this. It matters.  +[,+[-[>+>+<<-]>[<+>-]+>>++++++++[<-------->-]<-[< [- ]>>>+[<+<+>>-] <[>+<-]<[<++>>>+[<+<->>-]<[>+<- ]]>[<]<]>>[-]<<<[[- ]<[>>+>+<<<-]>> [<<+>>- ]>>++++++++[<-------->-]<->>++++[<++++ ++++>-]<- <[>>>+<<[ >+>[-]<<-]>[<+>- ]>[<<<<<+>>>>++++[<++++++++>-]>-]< <-<-]>[<<<<[- ]>>>>[<<<<->>>>-]]<<++++[<<++++++++>>-]<<- [>>+>+<<<-]>>[<<+ >>-]+>>+++++[<----->-]<-[<[- ]>>>+[<+<->>-]<[>+ <-]<[<++>>>+[<+<+ >>-]<[>+<- ]]>[<]<]>>[-]<<<[[-]<<[>>+>+<<<-]> >[<<+>>-]+>------------ [<[-]>>>+[<+<->>-]<[>+<-]<[<++>>>+[<+<+>>-]< [>+<- ]]>[<]<]>>[-]< <<<<------------->>[[-]+++++[<<+++++>>- ]<<+>> ]<[>++++[<<+++++++ +>>-]<-]>]<[- ]++++++++[<++++++++>-]<+>]<.[ -]+>>+<]>[[-]<]<]
  • 48. BrainF*ck’s Rejoinder  There are more things in this world broken by punctuation than just BrainF*ck.  Compare.  $result = from('$name')->in($names) ->where('$name => strlen($name) < 5') ->select('$name');  32 characters of punctuation, deeply interspersed  $result = query("SELECT $name FROM $names WHERE length($name)<5");  12 characters of punctuation (with large gaps)  Which would you rather write?  There’s a reason SQL persists after all these years. It’s really expressive and surprisingly without noise.  Put another way: It’s a language that’s shockingly good for structured queries.  Turns out this matters.
  • 49. The Classics  Escape?  mysql_real_escape_string – really? 25 characters?  Bigger problems: Fails open – code still works if it’s just missing  “Greppability” is huge – you can’t grep for a missing escape!  Escapes are a blacklist. When’s the last time you saw a blacklist work properly?  Parameterization  First you declare a template for a query  Then you link individual variables to the template, on a positional basis  “This is the first argument”  “This is the second argument”  MAYBE, if you’re lucky, your language supports argument aliases.  “The argument marked with :name should get the value of the variable ‘name’”  One line of code becomes many  Resources need to be synced
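Both binding styles can be seen side by side in Python's `sqlite3`, which accepts positional `?` markers and named `:name` markers. A small sketch with an invented table `foo`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (x TEXT, y TEXT)")
conn.execute("INSERT INTO foo VALUES ('1', '2')")

x, y = "1", "2"

# Positional: "this is the first argument", "this is the second argument".
positional = conn.execute(
    "SELECT * FROM foo WHERE x = ? AND y = ?", (x, y)
).fetchall()

# Named: the marker :x gets the value stored under the key "x".
named = conn.execute(
    "SELECT * FROM foo WHERE x = :x AND y = :y", {"x": x, "y": y}
).fetchall()

print(positional, named)
```

Either way the template and the values travel separately, which is the property the escape-based approach lacks. The cost the slide complains about is visible too: every variable is named twice.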
  • 50. Reality  Nobody has ever written a parameterized query without a gun to their head. We know, we hold the gun.  Even secure code, when audited, tends to be “safe things written quickly” and “we realized this was unsafe so we parameterized it”  That you have to threaten people with getting fired, is itself a data point.  For some strange reason, databases don’t seem to provide mechanisms to disable unparameterized queries entirely  More interestingly, it’s a crapshoot whether you get to parameterize at all  Just try to parameterize “SELECT”.  SQL, for all its elegance, builds a remarkably complex parse tree out of a mostly unpunctuated string  Some nodes in the parse tree can be filled by functions, some can be parameterized, etc.  It’s a decent RNG to know what you can get away with 
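The "crapshoot" can be observed directly. The sketch below shows SQLite's behavior through Python's `sqlite3`; other databases differ in the details. A placeholder in the column position is accepted but treated as a string value, while a placeholder in the table position is rejected by the parser.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (name TEXT)")
conn.execute("INSERT INTO foo VALUES ('alice')")

# The parameter is bound as a value, so this selects the literal string, not the column.
as_value = conn.execute("SELECT ? FROM foo", ("name",)).fetchall()

# A table name cannot be a parameter at all: the statement fails to parse.
try:
    conn.execute("SELECT name FROM ?", ("foo",))
    table_param_ok = True
except sqlite3.OperationalError:
    table_param_ok = False

print(as_value, table_param_ok)
```

Identifiers therefore end up concatenated into the query string, which is exactly where a developer in a hurry reaches for string smashing again.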
  • 51. Interpolique [0]  Released in 2010 at HOPE  Concept for eliminating injection attacks while retaining “dangerous” (but developer preferred) coding styles  Both SQLi and XSS  Basic idea  “SELECT * FROM foo where x=$x and y=$y”  Humans can pretty easily see the separation between code and data. Data begins with $. Code does not.  The language throws that data away and just smashes strings together.  Does it have to?
  • 52. Interpolique [1]  The original approach for Interpolique  First, use an alternate syntax to identify the desired variables  “SELECT * FROM foo where x=^^x and y=^^y”  Then, create a function that returns the code we’d have liked the developer to write.  $stmt = $conn->prepare("SELECT * FROM foo where x=? and y=?"); $stmt->bind_param("ss", $x, $y); $stmt->execute();  Finally, evaluate the generated code  eval(b("SELECT * FROM foo where x=^^x and y=^^y"));  Eval is, surprisingly, the only way to retrieve the values of $x and $y from inside the function b().
  • 53. What’s Wrong With Interpolique?  What if the dev writes:  eval(b("SELECT * FROM foo where x=$x and y=$y"));  If $x and $y are attacker controlled, he’s not far from an eval that will run code in PHP’s context!  The b() function is in a position to defend the code that ultimately enters eval, but now you’re entirely dependent on b() knowing what PHP will do given arbitrary bytes.  GOOD LUCK WITH THAT  Highly greppable error case, but it’s pretty scary
  • 54. Building A Safe Interpolique  Eval only exists so that variables from the calling scope can be dereferenced  One approach is to implement create_selfscoped_function()  Returns a function that always runs in the scope of its parent  Could implement “proxies” so it can only read variables, and can’t rewrite  $rows=$mysql_safequery("select * from foo where x=^^x and y=^^y");  Requires a patch to PHP -- Daniel Zulla is working on this!
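The rewriting step itself needs no eval if the caller hands over its variables explicitly. The sketch below is not Interpolique: it is a Python illustration in which `safe_query` (a hypothetical name) turns each `^^name` marker into a named placeholder and looks the value up in a dictionary supplied by the caller.

```python
import re
import sqlite3

MARKER = re.compile(r"\^\^([A-Za-z_][A-Za-z0-9_]*)")

def safe_query(conn, template, scope):
    """Rewrite ^^name markers into :name placeholders and bind values from scope."""
    names = MARKER.findall(template)
    sql = MARKER.sub(lambda m: ":" + m.group(1), template)
    params = {name: scope[name] for name in names}  # KeyError if a variable is missing
    return conn.execute(sql, params).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (x TEXT, y TEXT)")
conn.execute("INSERT INTO foo VALUES ('1', '2')")

x, y = "1", "2' OR '1'='1"
print(safe_query(conn, "select * from foo where x=^^x and y=^^y", {"x": x, "y": y}))
```

Passing the scope by hand is the extra typing the talk wants to avoid, which is why the PHP proposal needs a language patch to read the caller's variables safely.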
  • 55. Code Rewriting?  If we know what we would have liked developers to have written, why don’t we just transform code once?  Never really been a fan of this  Have you ever audited autogenerated code?   What do you do when the code looks like: $z = "SELECT * from foo where x=$x and y=$y;"; $rows = mysql_query($z);  Static analysis can of course find such situations (thus knowing $x came in from an HTTP variable) but most devs don’t have access to such static analysis tools  Should they?
  • 56. Tainting  What if we actually marked every character that came in from an HTTP query as “tainted”?  Metadata, on a character by character basis  Would survive passing from function to function  Might even survive reasonable mangling by built in filters  Then, you could write something like: mysql_query_safe("select * from foo where x=$x and y=$y;");  Even though $x and $y would expand, the wrapper function would see that those particular characters were once tainted with the “mark of the web”, and could rewrite the unsafe query around it  This still works with mysql_query_safe($x) when $x was assembled elsewhere, even concatenated  Could have problems with silent failure with filtering functions  Requires a patch to PHP – Daniel Zulla also working on this
  • 57. SuperEncoding as Explicit Tainting  Based on discussions with Zane Lackey and Nick Galbreath at Etsy, based on an approach they’re already running in production  What if all variables from the web, were encoded in a whitelisted format?  Simple hex encoding -- &%41 – which, coincidentally, renders as an A in any HTML parser  All non-DB access would have to go through accessors  r($x) to read, w($x) to write  Surprisingly easy to grep for access that isn’t wrapped  Could do two things  mysql_query_safe($x) could simply treat all superencoded characters as “data” and parameterize accordingly  mysql itself could have its lexer modified to handle HTML encoding, exposing such characters to less of the SQL parser (“this is just a string”) – very LangSec
  • 58. A Last Minute Alternative  Perhaps we've got this backwards  Rather than tainting data as data, we mark code as code.  SQL tends not to be passed around from function to function, let alone parsed in the frontend  $sql = c("select * from foo where x="); $sql .= $x; $sql .= c(" and y="); $sql .= $y;  Then either mysql_query_safe or mysql itself (cowardly) refuses to execute anything with unmarked code  Or, if this is baked into MySQL, it just doesn't see bytes as code if they're not deeply marked as code  Moderately greppable: you're basically finding all SQL in your code and wrapping it with some sort of taint  Either implicit as per Zulla, or explicit as per Etsy  Most likely failure mode is an attacker-controlled variable somehow getting inside of c("");
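The "mark code as code" variant can be sketched as follows. This is an assumption-laden Python illustration: `Code`, `c`, and `query_safe` are hypothetical, and the query is passed as a list of fragments because ordinary Python string concatenation would discard the marker (the slide imagines the marker surviving concatenation inside the language runtime).

```python
class Code(str):
    """A string fragment explicitly marked as developer-written SQL."""


def c(fragment):
    """Mark a literal as code, mirroring the c("...") wrapper on the slide."""
    return Code(fragment)


def query_safe(parts, strict=False):
    """Build (sql, params) from fragments.

    Marked fragments are kept as SQL text. Unmarked fragments are either
    refused outright (strict=True) or bound as parameters, so their bytes
    are never seen as code.
    """
    sql, params = [], []
    for part in parts:
        if isinstance(part, Code):
            sql.append(part)
        elif strict:
            raise ValueError("refusing to execute unmarked fragment")
        else:
            sql.append("?")
            params.append(part)
    return "".join(sql), params
```

A call such as `query_safe([c("select * from foo where x="), x, c(" and y="), y])` keeps only the wrapped literals as SQL. The failure mode named on the slide still applies: if an attacker-controlled value ends up inside `c(...)`, it is trusted.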
  • 59. This is what LangSec means  “What are people trying to say?”  “How can we make it easier to say that?”  “How hard will it be for people to migrate?”  “What errors will they make when trying to use this?”  “Can we limit how much code might contain a bug?”  CARE ABOUT YOUR DEVS OR THEY WILL NOT CARE ABOUT YOU
  • 60. What's Going On With The Web?  It doesn't matter what code you write, if there are parties in the middle changing or blocking what you send  Content alteration and blocking is becoming a real thing  Verizon is claiming a First Amendment right to rewrite Internet connections  Entire countries are silently blocking web pages  Indonesia's blocking a million porn sites in the run-up to Ramadan
  • 61. What Went Wrong With N00ter  N00ter was a really fun (and really powerful) mechanism for detecting network manipulation  Allowed a remote server and a cooperating client to “pretend” to have a conversation with anyone on the Internet, using any protocol  To any MITM, it would look like a real, unmodified conversation  So any alterations that might normally hit the real server would hit this too  Unfortunately, N00ter does a lot of very low-level packet crafting, meaning (realistically) it requires custom hardware in front of user machines  This is not fun to deploy  Especially if you need to get between the NAT and the actual network connection  Not impossible. Definitely improbable.
  • 62. What Else Can We Use?  Executable code on the client  OONI-Probe  Web Pages with Iframes  Herdict (“Herd Verdict”)  Needs either user cooperation, or a Chrome extension, to know if content is up or down  Is it possible to determine whether content is up or not, from just a web page?  Can we crowdsource censorship data?  Maximize data per user  Minimize installation load per user
  • 63. Imaging  Browser's Same Origin Policy usually prevents web pages from doing much with one another  You wouldn't want Yahoo able to read from your Gmail account  But there is one exception  Any domain is allowed to load any other domain's images  Beyond that, it's allowed to know that the load was successful  Not merely that there was a file at that location, but that it was actually an image  You even get image dimensions (which you'd have to, because it resizes the page)  If a domain is being censored, the image will not load  What one image is on most domains?
  • 64. Favicon.ico  (It's the picture to the left of Google in the tab)
  • 65. So this is CensorSweeper (Also by Joseph Van Geffen and Michael Tiffany) Written for Wall Street Journal Data Transparency Hackathon
  • 66. What's going on…  var img = new Image(); img.onload = function(event) { /* render favicon */ }; img.onerror = function(event) { validate(); }; img.src = "http://somesite.com/favicon.ico";  The above is done in parallel, reading from a list of sites that have confirmed presence of favicon.ico  Six failures are required before a “bomb” is dropped on the map
  • 67. Error Handling  Six failures isn't actually enough!  Web browsers provide remarkably little feedback to a developer to know what's failing, and why  Put simply, “flow control” hasn't really been implemented for the web  Everything's been designed around infinite bandwidth  For reliability, going to need to shut down all other traffic, and then do two simultaneous lookups  One for a known-up site, the other for the supposedly-down site  That being said, CensorSweeper works pretty well  Can we do better?
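The paired-lookup idea (a known-up control alongside the suspect site) can be sketched as decision logic. This is a hypothetical Python illustration, not CensorSweeper's code: `probe` stands in for whatever load test is available (such as the favicon load), the lookups run one after the other rather than simultaneously, and the three result labels are invented for the example.

```python
def classify(probe, target, control, attempts=6):
    """Decide whether `target` looks blocked, using `control` as a sanity check.

    `probe(site)` must return True when the site loaded and False otherwise.
    A target failure only counts when the control loaded in the same round,
    so a congested or dead local connection is not mistaken for censorship.
    """
    failures = 0
    for _ in range(attempts):
        control_ok = probe(control)
        target_ok = probe(target)
        if not control_ok:
            continue  # our own connectivity is suspect; ignore this round
        if target_ok:
            return "up"
        failures += 1
    if failures == attempts:
        return "blocked"
    return "inconclusive"
```

The design choice here is that a single successful load is treated as proof the site is reachable, while "blocked" requires every round to fail with the control succeeding.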
  • 68. Sockets  Once upon a time, web browsers could act like proxies, giving you connections anywhere  There were bugs in Flash and Java; we fixed them  They can now only create connections to IP addresses that invite them  But ~20% of the time there are transparent proxies between web servers and their users  See “Staring into the Abyss” by me, or “Socket Capable Browser Plugins Result In Transparent Proxy Abuse” by Robert Auger  This has been known…but not explored for mapping censorship!
  • 69. HTTP Censorship Detection  1) Using Flash (or HaXe), create an HTTP connection back to your own IP on port 80  Host a socket policy file, so Flash allows this  2) Request anything, from any domain  If the request comes to you, there is no transparent proxy  Otherwise, the request will be hijacked by the proxy, serviced, and sent back to your Flash app  You now see what that user would see, if they browsed to that site! You can then submit it back to yourself.
  • 70. HTTPS Certificate Extraction  Just as HTTP traffic on 80/tcp is hijacked, so may HTTPS traffic on 443/tcp  MITM may have an alternate certificate for you  But (if you're careful) it can't tell the difference between the browser starting SSL, and Flash/HaXe starting SSL  It has to know which domain to pretend to have a certificate for  The proxy can parse the Server Hello, with its certificate  (It's your server saying hello)  The proxy can parse the Client Hello, with its Server Name Indication  (It's your Flash app saying hello)  You can actually host the real Facebook certificate, or even proxy the real Facebook SSL endpoint  Hard to keep track of all of Facebook's IPs  It has to forge the certificate, before you have to prove you actually have Facebook's private key (assuming you aren't proxying)
  • 71. Slight Annoyance  No normal way, via Browser DOM, to determine the certificate that provided content  This at least allows a page to query for its exposed certificates: kinda cool!  Limitations  You can test anyone's certificate, as long as the attacker isn't interposing themselves via DNS hijacking  The Flash app sees what's at the named IP; if hijacking is at the DNS layer, then Flash won't get hijacked  You are able to test your own certificate, but then the attacker has already MITM'd you and can alter your security validation layer
  • 72. Full Proxying  One of the goals of N00ter was seeing if everyday content was being altered or slowed down  One of the headaches with these custom probes is writing these custom probes  How do you look just like a real web browser trying to access YouTube?  Answer: Be a real web browser trying to access YouTube  The last time we played with Flash and Sockets, we created a full VPN  But now sockets are limited to a single destination…  It turns out that it may still be possible/useful to proxy an entire browser (at the server) down to the Flash app (in the client), which will then open connections back to the server, which will proxy them to the rest of the Internet  This will allow, at minimum, a protocol-correct sequence of messages for HTTP and HTTPS that are only incorrect by destination IP  So basically, if the intercepting server doesn't care about IP correctness, you get to interrogate its ruleset with no installed code on the client
  • 73. Last but not least: Scanning Networks Quickly  Actionable Intelligence: What can an attacker do today, that he couldn't do yesterday, for what class of attacker, to what class of victim?  Rather related to this: How many potential victims are out there?  I've run two major scans this year (that I've talked about)  Telnet  Determining presence of Telnet Encryption support  Answer: Very rare  RDP  Determining presence of open RDP access  Answer: VERY common
  • 74. My Process  Once upon a time, simply flooding TCP SYNs was enough to find out what was out there  Nowadays, many, many IP addresses will three-way handshake, but there won't actually be anything there  Solution: Split process  1) Identify candidate IP addresses, that are listening on a given port  2) Given a candidate, actually connect to the IP
  • 75. More Detail  Candidate collection  For each IP, incrementing the first byte first (1.1.1.1, 2.1.1.1, 3.1.1.1…), send a TCP SYN on the required port (23 for telnet, 3389 for RDP)  In a separate window, log TCP SYN|ACKs with tcpdump  tcpdump -w log 'tcp[tcpflags] = (tcp-syn|tcp-ack)'  Scanrand was being buggy; this maximized logging  Candidate Inspection  Telnet Encryption: the nmap team whipped up a quick check, so I just fed the IP list to it  Very few found
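The "increment the first byte first" ordering spreads consecutive probes across different networks instead of hammering one block at a time. A small Python sketch of that enumeration follows; the `candidates` generator and its `low`/`high` bounds are hypothetical, and a real sweep would also skip reserved ranges and any blacklist.

```python
from itertools import product


def candidates(low=1, high=255):
    """Yield dotted-quad addresses with the first octet changing fastest."""
    octets = range(low, high + 1)
    # product() varies its last element fastest, so unpack the tuple in
    # reverse order and put that fastest-moving value in the first octet.
    for d, c, b, a in product(octets, repeat=4):
        yield "%d.%d.%d.%d" % (a, b, c, d)
```

Taking the first three values from `candidates()` gives the same sequence as the slide: 1.1.1.1, 2.1.1.1, 3.1.1.1.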
  • 76. RDP Sweep: Black Mamba  Probably the most pleasant environment for reasonable-scale TCP probing ever devised  http://rootfoo.org/blackmamba
      from blackmamba import *

      def get(host, port=80):
          # Each yield hands control back to the blackmamba scheduler.
          msg = "GET / HTTP/1.1\r\nHost: %s\r\n\r\n" % host
          yield connect(host, port)
          yield write(msg)
          response = yield read()
          yield close()
          print(response)

      def generate(host, count=100):
          # Produce `count` independent connection tasks.
          for i in range(count):
              yield get(host)

      run(generate('example.com'))
    You end up getting ~3000 IPs a second  May need to increase ulimit -n  May need to alter hardcoded limits in blackmamba.py
  • 77. Can We Get Faster?  Always wanted to write a userspace TCP stack  HD Moore kinda kicked me into working on one for critical.io, his mysterious new scanning project  I am not at all beyond being motivated by other people's awesome and mysterious projects  Especially when they give me CPU and Network Bandwidth  So. Scanrand3! A new scanner that doesn't just flood SYNs, but actually connects to every node and extracts data  Original plan: TCP stack with SQLite as the backend  “SELECT * FROM sockets WHERE data_sent!=data_acked and now()-data_sent_time>3” (to find sockets where a retransmit is needed) is just funny!  SQLite, in memory-only mode, is really really fast  160K inserts/sec fast  Unfortunately, that speed disappears when you add indexes  20K inserts/sec with two indexes
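The "TCP state in SQLite" plan can be tried out with Python's built-in `sqlite3` module. This is only a sketch of the retransmit query: the `sockets` table layout is assumed from the column names in the slide's query, and the current time is passed in as a parameter because SQLite has no `now()` function.

```python
import sqlite3


def open_socket_table():
    """Create an in-memory database holding per-connection send state."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE sockets ("
        " ip TEXT, port INTEGER,"
        " data_sent INTEGER, data_acked INTEGER,"
        " data_sent_time REAL)"
    )
    return db


def needs_retransmit(db, now, timeout=3.0):
    """Return (ip, port) for sockets with unacked data older than `timeout`."""
    rows = db.execute(
        "SELECT ip, port FROM sockets"
        " WHERE data_sent != data_acked AND ? - data_sent_time > ?"
        " ORDER BY ip",
        (now, timeout),
    )
    return rows.fetchall()
```

Without an index on `data_sent_time` this query scans the whole table, which is the trade-off the slide describes: indexes make the lookup cheap but cut the insert rate.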
  • 78. New Plan: Let The Servers Keep TCP State 
  • 79. Details! Details!  Scanrand didn't get its speed by keeping track of who it did or didn't send traffic to  Why should Scanrand3?  1) Send SYN  Maximum Segment Size == 1460  Window Size == 1460 (for all packets)  2) Upon receiving a SYN|ACK, reply with an ACK  Include “GET / HTTP/1.0” payload  Yes, you can put a payload in the initial ACK!  3) Upon receiving an ACK, if there is a payload, ACK it  Save the payload  4) Upon receiving a FIN|ACK, RST  Save the payload, if any
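Steps 2 through 4 above amount to a pure function from an incoming segment to a reply, with no per-connection memory. The Python below is a hedged sketch of that idea only: `respond`, its tuple layout, and the flag names are hypothetical, and it does no packet I/O (Scanrand3's actual code is not shown in the slides).

```python
REQUEST = b"GET / HTTP/1.0\r\n\r\n"


def respond(flags, seq, ack, payload=b""):
    """Map one incoming TCP segment to a reply, using no stored state.

    `flags` is a set of flag names, `seq` and `ack` come from the incoming
    segment. Returns (reply_flags, reply_seq, reply_ack, reply_payload,
    payload_to_save), or None when no reply is needed.
    """
    flags = frozenset(flags)
    if "RST" in flags:
        return None  # nothing is listening; at most, log it
    if flags == {"SYN", "ACK"}:
        # Step 2: finish the handshake and send the request in the same ACK.
        return ("ACK", ack, (seq + 1) & 0xFFFFFFFF, REQUEST, b"")
    if "FIN" in flags:
        # Step 4: the server is done; keep any last bytes and reset.
        return ("RST", ack, 0, b"", payload)
    if "ACK" in flags and payload:
        # Step 3: acknowledge exactly the bytes received and save them.
        return ("ACK", ack, (seq + len(payload)) & 0xFFFFFFFF, b"", payload)
    return None
```

Everything needed for the reply is read from the incoming segment itself, which is why a lost packet can simply be left for the server's own retransmit timer to recover.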
  • 80. No Local State  If the first SYN is dropped: OK, nobody's around to retransmit it  May want to log RST|ACK to avoid future retransmits  If the SYN|ACK is dropped to the client, server retransmits SYN|ACK  If the ACK w/ initial payload is dropped to the server, server retransmits SYN|ACK, causing new ACK w/ payload  If any ACK w/ response payload is dropped to the client, server will retransmit ACK w/ response payload  Same with FIN|ACK  Window size of 1460 means we always know which particular packet to acknowledge: only one in flight (usually)
  • 81. Performance  Relatively unoptimized code on a well-hosted but underpowered server (cheap Dual Opteron)  50-80K servers/sec w/ full payloads  3.25M IPs takes 60-80 seconds, retrieves about 800MB of content  Task is embarrassingly parallelizable across threads, databases, etc.  Should be able to use multiple bpf filters to route packets to their appropriate thread with kernel filtering  Writing to a SQLite DB, and then backing up to disk, is really fast (substantially faster than fwrite, though haven't tested a large mmap yet)  You basically reassemble payloads in SQLite as a postprocess
  • 82. Security  Scanrand pioneered inverse SYN cookies: you protect against spoofed responses by validating fields in the response against hashes of data plus a secret only you know  16 bits in source port + 32 bits in sequence number are possible  May be able to get another 32 bits out of TCP Timestamps, which are usually supported  Haven't implemented yet, so very easy to poison me  Sequence space becomes less secure, the more data you actually send  You do know the exact size of each payload, so you can say “I only accept responses with no payload seq, payload 1 seq, payload 2 seq, etc”  Technically the other side can ACK at any byte offset, but that doesn't mean they actually will
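An inverse SYN cookie can be sketched as a keyed hash of the destination, split into the 16-bit source port and 32-bit initial sequence number the slide mentions. The Python below is an illustration under assumptions: HMAC-SHA256 and the `cookie` / `is_genuine` helpers are choices made for this example, not a description of Scanrand's exact construction.

```python
import hashlib
import hmac
import struct


def cookie(secret, dst_ip, dst_port):
    """Derive (source_port, initial_seq) for a probe from a private secret."""
    msg = ("%s:%d" % (dst_ip, dst_port)).encode()
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    # First 2 bytes become the source port, next 4 the sequence number.
    # A real scanner would likely map the port into a usable range.
    src_port, seq = struct.unpack("!HI", digest[:6])
    return src_port, seq


def is_genuine(secret, src_ip, src_port, our_port, ack):
    """Check a SYN|ACK: it must target our derived port and ACK seq + 1."""
    expected_port, seq = cookie(secret, src_ip, src_port)
    return our_port == expected_port and ack == (seq + 1) & 0xFFFFFFFF
```

No table of outstanding probes is needed: the expected values are recomputed from the responder's address, so a spoofer who does not know the secret has to guess 48 bits. Once payloads are sent, later ACK numbers advance past `seq + 1`, which is the weakening of the sequence space described above.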
  • 83. Some Notes  Kernels have actually gotten kind of fast  Non-blocking connect() plus epoll should be able to get pretty fast  Certainly easier to code for that model!  Didn't work for me (not sure why)  This approach ultimately becomes fastest  Probably need a “writev” call to spew many packets w/o a write for each
  • 84. More Notes  Can also try more efficient stores than sqlite  Giant allocation of RAM with fixed offsets per IP  MemSQL  Neat project by ex-facebookers – compiles SQL to C++  They think even with the indexes they can do +100K  Can have merged approaches too  Only start keeping state if I like the response from the server  Note that stateless client + stateless server = no retransmits 
  • 85. What should the coding model be?  Flat file / command line?  C?  JavaScript?  Lua?  Could implement support for nmap scripts
  • 86. Most Important Feature  Blacklist support  Most networks don't mind getting swept  They certainly are, already  Some do  Part of being a whitehat is you let people know who you are, and listen to their requests  So you end up with a pile of IP ranges not to sweep  It can actually take a substantial amount of CPU if you check the list naively  Need to compile it into a quickly queryable structure  I don't think firewall rules apply to spoofed traffic
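One way to make a blacklist "quickly queryable" is to compile the ranges into sorted, merged integer intervals and binary-search them. The Python sketch below uses the standard `ipaddress` and `bisect` modules; `compile_blacklist` and `is_blacklisted` are hypothetical helper names, and this is one possible structure, not necessarily the one the talk had in mind.

```python
import bisect
import ipaddress


def compile_blacklist(cidrs):
    """Turn CIDR strings into parallel sorted lists of interval starts/ends."""
    spans = sorted(
        (int(net.network_address), int(net.broadcast_address))
        for net in map(ipaddress.ip_network, cidrs)
    )
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1] + 1:
            # Overlapping or adjacent range: extend the previous interval.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [s for s, _ in merged], [e for _, e in merged]


def is_blacklisted(compiled, ip):
    """Binary-search the compiled intervals: O(log n) per address."""
    starts, ends = compiled
    value = int(ipaddress.ip_address(ip))
    i = bisect.bisect_right(starts, value) - 1
    return i >= 0 and value <= ends[i]
```

Compared with testing each address against every range in turn, the lookup cost grows with the logarithm of the list size, which matters when the check runs tens of thousands of times per second.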
  • 87. Simple Architectural Note  Don't try to interact with the Linux firewall  Just pick another IP on the LAN and send from there  Respond to ARP traffic for it  (Yes, it is an advantage of the socket model that you don't need to requisition another IP)
  • 88. Whew!  Lots of stuff!  Hope you enjoyed!  This may not be how you try to fix stuff…but it's what I try to do  Thanks to everyone cited in the slides  Thanks also to Nick, Johnny, Blackstock, Alex, Allessandra, and Andrew of The Sub for putting up with me in DEFCON mode ;)