Black Ops Of TCP/IP 2011 Dan Kaminsky, Chief Scientist, DKH
Intro I’m Dan Kaminsky I write code Not here to fix authentication Working on that Not here to make DNSSEC scale Working on that too
What I’m here for Return to form As a community, we’ve sort of stopped looking at network security Mapping networks Evading firewalls Subverting design assumptions This is probably the right thing – looking at attacks: Acquire Beachhead SQLi the web front end PDF the client backend Lilypad Use acquired credentials to break everything else Netsec is only so relevant in such an environment
So? We’re going to look into it anyway. Maybe we’ll find something interesting.
BitCoin “BitCoin turns nerd forums into libertarian forums” It’s infected everything else in nerddom, why not this talk? What is it? Attempt at making a digital currency with no central bank A system with economic properties I don’t know anything about An overlay network upon the Internet that people think has certain properties
BitCoin In A Nutshell Built on doing three things TRANSFER: “I Alice, give Bob 2.1 BTC” Alice signs the declaration to Bob’s public key GOSSIP: “Heh everyone! Did you hear that Alice gave Bob 2.1 BTC?” Alice sends that declaration into a peer to peer network that gossips the change APPEND: “Everyone, the official registry of transactions should now include Alice paying Bob, Charlie paying David, and so on.” This is gossiped too Requires solving a problem so hard, it takes the world 10 minutes for someone to do it If it takes less than 10 minutes, it’s not hard enough Crypto lets you make things hard enough Solving the problem gives you 50 BTC (today) to Transfer
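The "problem so hard" in APPEND can be sketched as a hash puzzle. This is a toy illustration only, assuming a single SHA-256 over the data plus a counter; the names `block_hash`, `mine`, `verify`, and `difficulty_bits` are made up here, and the real BitCoin block header and difficulty encoding are more involved.

```python
import hashlib

def block_hash(block_data: bytes, nonce: int) -> int:
    """Hash the registry update together with a candidate nonce."""
    digest = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")

def mine(block_data: bytes, difficulty_bits: int) -> int:
    """Try nonces until the hash falls below the target (the 'hard problem')."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while block_hash(block_data, nonce) >= target:
        nonce += 1
    return nonce

def verify(block_data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Checking a solution takes one hash, however long the search took."""
    return block_hash(block_data, nonce) < (1 << (256 - difficulty_bits))

# Hypothetical registry update; 12 bits of difficulty keeps the demo fast.
registry = b"Alice pays Bob 2.1 BTC; Charlie pays David"
nonce = mine(registry, 12)
print(nonce, verify(registry, nonce, 12))
```

Raising `difficulty_bits` by one doubles the expected search time, which is how "not hard enough" gets corrected.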
The Truth Of Bitcoin …this is not my BitCoin talk Go to dankaminsky.com for a more detailed deck BitCoin is actually really impressive Entire classes of bugs are just missing The first five times you think you understand it, you don’t BitCoin has fixed almost all flaws that aren’t forced by the design
The Main Flaws (there are a few more) Does not scale Totally not anonymous
Scalability (from BitCoin’s Own Wiki) Bandwidth “Let's assume an average rate of 2000tps, so just VISA…. Shifting 60 gigabytes of data in, say, 60 seconds means an average rate of 1 gigabyte per second, or 8 gigabits per second.” CPU “A network node capable of keeping up with VISA would need roughly 50 cores + whatever is used for mining (done by separate machines/GPUs).” Storage “A 3 terabyte hard disk costs less than $200 today and will be cheaper still in future, so you'd need one such disk for every 21 days of operation (at 1gb per block).”
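The wiki's bandwidth and storage figures can be rechecked with a few lines of arithmetic. A rough sketch, assuming one block roughly every 10 minutes and decimal units (3 TB = 3000 GB); the function names are illustrative.

```python
BLOCKS_PER_DAY = 24 * 60 // 10  # one block about every 10 minutes

def gigabits_per_second(gigabytes: float, seconds: float) -> float:
    """Average line rate needed to move `gigabytes` in `seconds`."""
    return gigabytes * 8 / seconds

def days_per_disk(disk_gb: float, block_gb: float) -> float:
    """How long one disk lasts if every block is `block_gb` gigabytes."""
    return disk_gb / (block_gb * BLOCKS_PER_DAY)

print(gigabits_per_second(60, 60))   # the wiki's 60 GB in 60 seconds
print(days_per_disk(3000, 1))        # the wiki's 3 TB disk at 1 GB per block
```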
OK, so you end up with supernodes and normal nodes What are the characteristics of supernodes? They’re banks “Welcome to the new boss, who looks suspiciously like the old boss” I’m not saying banks are bad or anything The “peer to peer” model of BitCoin eventually goes away; as soon as the thing gets big, the entire thing switches to a banking model With all the elements of banking people think BitCoin is immune to, without necessarily the properties people like However, until then…
An Interesting Question Travis Goodspeed: “Heh Dan, any chance BitCoin can be used as a samizdat service?” Samizdat (Russian: самиздат; Russian pronunciation: [səmᵻˈzdat]) was a key form of dissident activity across the Soviet bloc in which individuals reproduced censored publications by hand and passed the documents from reader to reader. An old challenge The Internet is usually about sending data, ephemerally Can we use it to store data, indefinitely? Well, if BitCoin is eventually going to require a 3TB HD every 21 days…and is going to need to keep that data forever…
Len. Our community recently lost one of its shining lights If one executes: strings --bytes=20 ~/.bitcoin/blk0001.dat Strings extracts human readable text from any blob of data Usually used to find hardcoded interesting stuff in executables, like default passwords The block database of all transactions ever pushed into BitCoin, run through a filter that extracts all human readable text from the (presently) 450MB file…
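For anyone without the `strings` utility handy, the same filter can be approximated in a few lines. A minimal sketch, assuming "human readable" means runs of printable ASCII (space through tilde); `extract_strings` is a name chosen here, and GNU strings has a slightly different idea of what is printable.

```python
import re

def extract_strings(blob: bytes, min_len: int = 20):
    """Return every run of at least `min_len` printable ASCII bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(blob)]

# Tiny stand-in for the block database: binary noise around one readable run.
sample = b"\x00\x01" + b"A" * 25 + b"\x02short\x03"
print(extract_strings(sample))
```

To run it over the real file, read `~/.bitcoin/blk0001.dat` in binary mode and pass the bytes in.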
…and just because it would have made Len laugh
How This Works In BitCoin, Alice gives money to Bob by issuing a sort of challenge “Whoever can sign a message with the private key whose public key hashes to the following bytes, may claim this money.” Well, bytes are bytes Instead of pushing the hash of a public key (20 bytes), we push 20 characters of a testimonial
Side Effect This does cost BTC About 1.0BTC in total There’s minimums to transferring money This does destroy the money The network thinks somewhere, there must be a public key with a hash of “Len was our friend.” I am OK with this. It is the cyber equivalent of pouring one out for your homies.
Can we get higher bandwidth? BitCoin lets you send money to a public key directly, rather than its hash 10x increase from 20 bytes to 200 bytes This is not a bug BitCoin allows for extra data in a signature
Signature Expansion BitCoin works with small programs The program from the receiver is: “Put this signature and public key on a stack” The program from the sender is: “Take the signature and public key off the stack and make sure they’re good.” The receiver can put extra stuff on the stack, and yes, it still works just fine This is in fact a bug that is visible purely from being pedantic about the English Language
Illicit Signature Expansion Signatures can’t cover themselves Chicken and Egg So signatures also don’t cover the presence or absence of additional data within themselves Block appending does cover additional data But there is time between when transactions are first created and emitted, and when they’re included in a block append So it turns out anyone can add additional data to an otherwise valid transaction
Limited Usefulness If you’re just some random relay, gossiping the information, you have to compete with the real version Transaction fees limit you to about 1KB of embed per 0.01 BTC (14 cents) This does not apply to you if you generate the signature with extra data, because then you can pay fees This does not apply to you if you calculate the block – you can include as much as you want, up to present 2MB limit, and force everyone else to carry Still better than 20 bytes per 0.01 BTC Yes, Travis, bitcoinfs is totally possible
What about Anonymity? Looking at blockexplorer.com Transaction Sources: These are all the same ID Transaction Dst’s: One of these IDs is (likely) all of the IDs on the left
Problem: Linking pseudonyms isn’t enough Reid/Harrigan get lucky One BitCoin source publishes the IPs it gives money to Another user posted to a forum seeking donations to a linked ID They’re linking pseudonyms within BC, but they’re not linking to IP via out of band processes The published audit trail is noisy and deniable “Naturally, much of this analysis is circumstantial. We cannot say for certain whether or not these flows imply a shared agency in both incidents. There is always the possibility of drawing false inferences.” Is there another source of data?
P2P! There are two sources of transaction information in BitCoin The “blocks” that have been set in stone The “loose transactions” waiting to be merged into blocks These (effectively) always refer to a single identity Both are “gossiped” around the network Big relay race; Alice tells Bob and Charlie, Bob tells David and Eric, Charlie tells Frank and Gary
Subverting the Relay Race An attacker can just connect to every node in the public cloud at once “But that could take 50,000 connections!” Yeah, we can do that in Python now. Kernels don’t suck anymore (well as much). When you’re connected to every node, the first node to inform you of a transaction is the source of it “Done relay it because done done it” More or less true, and absolutely over time (Bonus: You can accelerate your own transactions, by relaying them to everyone yourself) BlitCoin – accelerated probing of BitCoin
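The "first node to tell you is the source" heuristic reduces to keeping the earliest announcement per transaction. A sketch of that bookkeeping only, assuming observations have already been collected as `(timestamp, peer_ip, txid)` tuples; the tuple layout and the sample addresses are invented for illustration.

```python
def attribute_sources(observations):
    """Map each txid to the peer that announced it first."""
    first_seen = {}
    for timestamp, peer_ip, txid in observations:
        if txid not in first_seen or timestamp < first_seen[txid][0]:
            first_seen[txid] = (timestamp, peer_ip)
    return {txid: peer for txid, (_, peer) in first_seen.items()}

# Hypothetical log: tx "aa" is heard from 192.0.2.7 before anyone else.
log = [
    (10.30, "198.51.100.4", "aa"),
    (10.12, "192.0.2.7", "aa"),
    (11.00, "198.51.100.4", "bb"),
    (11.45, "192.0.2.7", "bb"),
]
print(attribute_sources(log))
```

As the slide says, a single attribution is only "more or less true"; repeated observations over time are what make it solid.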
Discovering Nodes Just scan the Internet on 8333/TCP Join the IRC channels! #bitcoin, #bitcoin00 to #bitcoin99 on LFNET BitBot Recursively ask every node about every other node it knows about “getaddr” message Can start from hardcoded seeds
Statement From Gavin Andresen, lead dev on BitCoin Bitcoin transactions are more private than credit card or PayPal transactions, but are less private than physical-world cash transactions. Unless you are very careful in the way you use Bitcoin (and you have the technical know-how to use it with other anonymizing technologies like Tor or i2p), you should assume that a persistent, motivated attacker will be able to associate your IP address with your bitcoin transactions.
What about Tor? Tor indeed obfuscates IPs derived from outbound connections It does nothing if you’re still listening on 8333/tcp and somebody sweeps the net for you Bug filed in BitCoin to shut off listener when operating through Tor
What about unreachable nodes? Most are behind NAT, and only connect via outbound links The active inbound set is only 3000-8000 nodes So, you just create 3000-8000 nodes and you’re half the gossip network Probably only need a few hundred, since each node will collect ~7 peers and you only need one
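The "you only need one" argument can be put as a quick probability estimate. A sketch under a loud assumption: each unreachable node picks its ~7 peers independently and uniformly from the reachable set, which real peer selection does not strictly do. The function name is illustrative.

```python
def chance_of_observing(attacker_nodes: int, honest_nodes: int, peers: int = 7) -> float:
    """Probability that at least one of a node's peers belongs to the attacker."""
    fraction = attacker_nodes / (attacker_nodes + honest_nodes)
    return 1 - (1 - fraction) ** peers

# Matching the reachable set one for one (half the gossip network):
print(chance_of_observing(3000, 3000))
# A few hundred attacker nodes against the same reachable set:
print(chance_of_observing(300, 3000))
```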
Just how unreachable are they? Many users are behind wireless routers Routers implement NAT – outbound is easy, inbound is hard “Poor Man’s Firewall” Don’t mock it, it was more effective than real firewalling when it came out Most home routers implement UPNP – Universal Plug And Play UPNP allows nodes inside your network to ask the router to open up ports from the Internet BitCoin now supports doing this by default …but even if it didn’t…
How UPNP is supposed to work Internal hosts send a multicast message out via SSDP (Simple Service Discovery Protocol) 1900/UDP M-SEARCH Multicast Internal UPNP nodes – media players, routers, etc – respond w/ endpoints that can be twiddled via web services requests 1900/UDP NOTIFY Unicast Responses are sometimes just flooded out, in the absence of M-SEARCH SSDP NOTIFY messages are supposed to contain a randomized URL for UPNP messages to go to
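The discovery exchange is plain text over UDP, so the messages are easy to build and read. A sketch that only constructs an M-SEARCH request and pulls the control URL out of a reply; the `upnp:rootdevice` search target and the sample reply URL are assumptions for illustration, and nothing is sent on the network here.

```python
SSDP_ADDR = ("239.255.255.250", 1900)  # SSDP multicast group and port

def build_msearch(search_target: str = "upnp:rootdevice", mx: int = 2) -> bytes:
    """Build the multicast discovery request a UPNP client would send."""
    lines = [
        "M-SEARCH * HTTP/1.1",
        f"HOST: {SSDP_ADDR[0]}:{SSDP_ADDR[1]}",
        'MAN: "ssdp:discover"',
        f"MX: {mx}",
        f"ST: {search_target}",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def parse_location(message: bytes):
    """Return the LOCATION header (the device's description URL), if present."""
    for line in message.decode("ascii", "replace").split("\r\n")[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "location":
            return value.strip()
    return None

reply = (b"HTTP/1.1 200 OK\r\n"
         b"LOCATION: http://192.168.1.1:2869/upnp/desc.xml\r\n"
         b"ST: upnp:rootdevice\r\n\r\n")
print(build_msearch().decode("ascii"))
print(parse_location(reply))
```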
Question UPNP is supposed to only work on internal interfaces “Hello Router, please let the outside world in.” It would be tragic if routers listened to UPNP on external interface as well.. “Hello Router, please let the outside world (read: me) in”
More Stats Not all listeners on 2869 are fully open Would require fixed UPNP endpoints, instead of the randomized ones Microsoft uses Many verified listeners though Hundreds of thousands to millions Entire countries have standardized NATs that are vulnerable
Your Princess Is In Another Castle Turns out there’s a speaker talking at DEFCON about just this very subject! I’m a little more careful about independent rediscoveries now Daniel Garcia found that there were open UPNP endpoints on the net last year Track 3, Friday, 17:00 Armijn Hemel also did some great work upnp-hacks.org Also noted that sometimes UPNP was exposed to outside world, back in ~2007 Still true, unfixed
What about outside the consumer space? Corporate environments Less about BitCoin and UPNP More about web services and ACLs Are there ways past corporate ACLs? Access Control Lists “Access to this IP is constrained to the following range”
Ye Olde Trick IP Spoofing Just pretend to be a source IP vaguely near the target, and you’ll probably pass ACLs “But BCP’s!” Real world, IP spoofing is not hard, as long as you’re not virtualized IP spoofing – the one thing the cloud isn’t very good for
Is IP Spoofing Still Effective? Sure! Let me just pull this DNS trick out of the archive… Generate a query for “$RANDOM.attacker-domain.com” Send query to all IPs on a network, from various IPs that network might trust x.1.1.1 -> x.1.100.8 x.1.100.1 -> x.1.100.8 Response will go back to IP you don’t control – but first, the server will try to resolve $RANDOM.attacker-domain.com – from you! (Yes, this was another way to exploit that bug.) Granted, this only works for an obscure application like DNS and UDP…certainly nothing built on TCP
Understanding The Limits Of IP Spoofing Most modern protocols run over TCP, a reliable communication protocol 1) Alice sends Bob a SYN, containing a random sequence number 2) Bob replies with a SYN|ACK, containing both Alice’s sequence number, and his own sequence number 3) Alice replies to Bob with an ACK, containing both sequence numbers Data can be sent now Sequence numbers become a sort of “password” for all future traffic If Alice spoofs her IP, she doesn’t see Bob’s sequence number, so she can’t complete step 3
Sequence Numbers Didn’t Used To Be Random Obviously if you can guess a sequence number, you can blindly inject into sessions So, make them random? Problem: Connections are identified by source port, dest port, source IP, and dest IP 18.104.22.168:50000 -> 22.214.171.124:53 Sometimes, ports are recycled from one connection to the next What if a packet arrives from an old connection? It could look like it belongs in the new one! Fix this by having random sequence numbers, unless id is the same, then we go sequential in time to maximize distance
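"Random unless the id is the same, then sequential in time" is the RFC 1948 shape: a clock plus a secret per-connection offset. A sketch of that construction, assuming SHA-256 as a stand-in hash and a caller-supplied microsecond clock; the addresses are documentation examples and the function names are made up here.

```python
import hashlib
import socket
import struct

MASK32 = 0xFFFFFFFF

def tuple_offset(secret: bytes, src_ip, src_port, dst_ip, dst_port) -> int:
    """Secret-keyed 32-bit offset that is fixed for a given connection id."""
    packed = (socket.inet_aton(src_ip) + struct.pack("!H", src_port)
              + socket.inet_aton(dst_ip) + struct.pack("!H", dst_port))
    return int.from_bytes(hashlib.sha256(secret + packed).digest()[:4], "big")

def initial_sequence_number(secret: bytes, four_tuple, clock_us: int) -> int:
    """ISN = timer (ticking every 4 microseconds) + per-tuple offset, mod 2^32."""
    timer = clock_us // 4
    return (timer + tuple_offset(secret, *four_tuple)) & MASK32

conn = ("192.0.2.1", 50000, "198.51.100.1", 53)
early = initial_sequence_number(b"boot-secret", conn, 0)
later = initial_sequence_number(b"boot-secret", conn, 4000)
print(early, later, (later - early) & MASK32)
```

Without the secret the offsets for different tuples look unrelated; with the same tuple, later connections land a predictable distance ahead.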
A Problem: Memory What if somebody just floods us with connection attempts? They don’t have to remember all of our “passwords” They don’t even need to use their own IP addresses We need to remember all of theirs This is a SYN flood, and it’s old as dirt
Solution: SYN Cookies Specified (if not invented) by Dan Bernstein in 1999 Finally on by default in Linux in 2008 The “password” turns into a challenge “If you can send this back to me, I’ll accept your data” Uses 3/4ths of the sequence number (24 bits) to store the hash of a secret and the four tuple, 5 bits for time, three bits for connection metadata 5 bits is exposed to everyone publicly, 3 bits don’t matter, so there’s 24 bits of security
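The 5/3/24 bit split described above can be sketched as follows. This is an illustration of the layout only, assuming HMAC-SHA-256 truncated to 24 bits as the hash and a coarse time counter passed in by the caller; it is not the Linux implementation, and all names are hypothetical.

```python
import hashlib
import hmac

def _mac24(secret: bytes, four_tuple, t: int) -> int:
    """24-bit keyed hash of the connection id and the time slot."""
    message = repr((four_tuple, t)).encode("ascii")
    return int.from_bytes(hmac.new(secret, message, hashlib.sha256).digest()[:3], "big")

def make_cookie(secret: bytes, four_tuple, t: int, mss_index: int) -> int:
    """Pack 5 bits of time, 3 bits of metadata, and the 24-bit hash."""
    return ((t & 0x1F) << 27) | ((mss_index & 0x7) << 24) | _mac24(secret, four_tuple, t)

def check_cookie(secret: bytes, four_tuple, cookie: int, now: int):
    """Return the metadata if the cookie is valid for this or the previous slot."""
    for t in (now, now - 1):
        if (t & 0x1F) == cookie >> 27 and (cookie & 0xFFFFFF) == _mac24(secret, four_tuple, t):
            return (cookie >> 24) & 0x7
    return None

conn = ("192.0.2.1", 50000, "198.51.100.1", 80)
cookie = make_cookie(b"server-secret", conn, 100, 3)
print(hex(cookie), check_cookie(b"server-secret", conn, cookie, 100))
```

The server keeps no per-connection state: everything it needs comes back in the client's ACK, and only the 24 hash bits are unknown to a blind attacker.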
Alas Average of 8 million packets to bypass SYN cookies May be less, due to fudge factors Of course DJB knew this “No matter what function is used, the attacker will succeed in a connection forgery after millions of random ACK packets.” But it’s a different reality than 1999 Sending 8M packets is easy now, we has the bandwidth Forged connections have arbitrary sources They get through your ACLs They can contain arbitrary Web Services payloads Definitely REST, maybe SOAP
Are you safe if you disable SYN cookies? Well, not on Linux Linux is RFC 1948 compliant for the lower 24 bits Uses MD4, but still Upper 8 bits? Counter, starting at 0, increments every five minutes Sequential Shared between inbound and outbound connections So, you send a query from your actual IP once or twice to find the offset, and blindly spoof a SYN and a payload-containing ACK After 8M tries, you win
Impact on RST attacks Tony Watson, “Slipping In The Window” Noticed that only one 32 bit “password” was required for Resets (RSTs) Noticed that the “password” only had to be in the “window” of valid data that could sequentially be sent Window describes how many bytes a sender is allowed to transmit without a receiver acknowledging Noticed that the “window” wasn’t even limited to 16 bits; was being expanded 5-8 bits more from “Window Scaling” 32-16-8 = 8 bits = 128 packets to kill a session on average New possibility: 32 - 16 – 8 – 8 = 0 bits = 1 packet will always work (assuming full sized window)
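The bit counting here follows one rule: on average a blind guesser needs half the remaining search space, and never less than one packet. A small sketch of that rule, assuming uniformly distributed unknown bits; `average_packets` is a name chosen for illustration.

```python
def average_packets(unknown_bits: int) -> int:
    """Expected number of blind guesses: half the search space, minimum one."""
    space = 2 ** max(unknown_bits, 0)
    return max(1, space // 2)

print(average_packets(24))               # 24 unknown bits of a SYN cookie
print(average_packets(32 - 16 - 8))      # RST with window and window scaling
print(average_packets(32 - 16 - 8 - 8))  # RST with predictable high bits too
```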
Beyond RST: Injection? RST handlers (usually) only check SEQ# (32 bits) ACK handlers however check both SEQ# and ACK# (64 bits) 64 – (16 bits from Alice window) – (16 bits from server window) = 32 bits 2B packets for 50% 32 bits – (5 bits from Alice window scaling) – (5 bits from Bob window scaling) = 22 bits 1M packets for 50% Uh oh 22 bits – (8 bits from Alice predictable high bits) – (8 bits from Bob predictable high bits) = 6 bits 16 packets for 50%
Difficulty: Ports Linux randomizes the source port of a new connection by default You don’t worry about this when you’re doing an ACL bypass, because you control the source port and the dest port You do have to worry about this when injecting into other sessions though 6 bits (from large windows and high bit disclosure) + 13 bits (port leakage) = 19 bits 250K packets for 50% injection even with port randomization Note that sometimes a TCP client sets its source port (DNS, BGP)
Status This is very old code in Linux Predates the check in history of Linus Torvalds They’re figuring out the right fix that won’t cause even more problems There are many potential wrong fixes that are even worse
A Digression RFC1948 is an interesting construction Sequential and ordered with the key Random and unpredictable without Can participate with either: A private component (the secret, mixed in with the 4-tuple), able to generate all possible sequence numbers A public component (a sample sequence number), transmitted over the network, successfully received and retransmitted Public/private cryptography with nothing but a password? Clearly this is impossible Only possible here because of intersection of network security and crypto
To be clear Passwords are a bad idea They’re constantly being lost and forgotten and stolen They are responsible for 50% of compromises They increasingly look like l33tspeak, and this is not helping But, supposing we ignore all that…and assuming that we’re stuck with them…
An Old Challenge  How do we use a password to log into a system without that system learning our password? “We hash it!” You’re still giving the server your plaintext password, it just isn’t storing it If salt (random but public prefix) is omitted, attacker can precalculate hash->password database, notice when two users use the same one
“Send me the password hashed against $RANDOM” Digest/NTLM are more advanced versions Requires server to store plaintext password or password equivalent “We require knowledge of password to go from keypair to shared session secret” SPEKE/SRP Requires both client and server to run fairly obscure code – good luck getting either deployed
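The "hashed against $RANDOM" idea, in its simplest form, looks like the sketch below. It assumes HMAC-SHA-256 as the hash and is not Digest or NTLM; the function names are illustrative. Note how `verify` needs the password itself, which is exactly the stored-plaintext problem the slide points out.

```python
import hashlib
import hmac
import secrets

def new_challenge() -> bytes:
    """Server side: a fresh random value for every login attempt."""
    return secrets.token_bytes(16)

def respond(password: str, challenge: bytes) -> bytes:
    """Client side: prove knowledge of the password without sending it."""
    return hmac.new(password.encode("utf-8"), challenge, hashlib.sha256).digest()

def verify(stored_password: str, challenge: bytes, response: bytes) -> bool:
    """Server side: recompute the answer, which requires the stored password."""
    return hmac.compare_digest(respond(stored_password, challenge), response)

challenge = new_challenge()
print(verify("hunter2", challenge, respond("hunter2", challenge)))
```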
So… Is it possible (NOT ADVISABLE, OBVIOUSLY THIS IS A BAD IDEA) to build a system where the client only remembers a password, but the server: Stores nothing but a normal public key Deploys nothing but a standard challenge to make sure the client has the matching private key, derived unilaterally from a password? In other words… Can we construct a keypair out of a password?
A Foreboding Question What vulnerability impacted all asymmetric cryptosystems, be they RSA, DSA, or ECC?
…ok… Debian Specifically, a change to the way Debian calculated random numbers in OpenSSL It always calculated the same numbers All asymmetric cryptosystems use entropy as follows: Collect: Grab random bits Permute: Alter those bits until they meet certain requirements. Then emit a public/private keypair Predictable entropy == Predictable keypairs, no matter the algorithm
Uh Oh What if we turned the Debian bug…into a feature? Cryptography is all about constructions We have hash functions, stream ciphers, block ciphers, all of which can be constructed from each other Note too this is often a bad idea We know how to take a password and construct an everlasting stream of pseudorandom numbers from it “Predictable Entropy” We can even do so in a way that is Hard, in both CPU time and Memory scrypt
A TRULY TERRIFYING AND UGLY AND BAD AND AWESOME IDEA What if you make the output of a password-seeded PRNG, the input to an asymmetric key generator? You’d end up with 2048 bit RSA keypairs, with a “trapdoor” in the form of a password This isn’t theoretical
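To make the idea concrete, here is a toy, explicit version: scrypt turns the password into a seed, a hash counter stretches the seed into a byte stream, and that stream drives an ordinary RSA key search. Everything here is an assumption for illustration (the fixed salt, the small scrypt cost, the 512-bit modulus, all function names), it needs Python 3.8+ with scrypt support in `hashlib`, and it should not be used for real keys.

```python
import hashlib
from math import gcd

def byte_stream(password: str, salt: bytes = b"sketch-salt"):
    """Endless deterministic bytes derived from the password via scrypt."""
    # Fixed public salt: identical passwords give identical keys.
    seed = hashlib.scrypt(password.encode("utf-8"), salt=salt, n=2**12, r=8, p=1, dklen=32)
    counter = 0
    while True:
        yield from hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1

def take_int(stream, nbytes: int) -> int:
    return int.from_bytes(bytes(next(stream) for _ in range(nbytes)), "big")

def is_probable_prime(n: int, stream, rounds: int = 20) -> bool:
    """Miller-Rabin, drawing its bases from the same deterministic stream."""
    if n < 2:
        return False
    for small in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        if n % small == 0:
            return n == small
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for _ in range(rounds):
        x = pow(2 + take_int(stream, 16) % (n - 3), d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

def next_prime(stream, bits: int) -> int:
    while True:
        candidate = take_int(stream, bits // 8) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate, stream):
            return candidate

def keypair_from_password(password: str, bits: int = 512, e: int = 65537):
    """Return ((n, e), d): the same password always yields the same keypair."""
    stream = byte_stream(password)
    while True:
        p, q = next_prime(stream, bits // 2), next_prime(stream, bits // 2)
        phi = (p - 1) * (q - 1)
        if p != q and gcd(e, phi) == 1:
            return (p * q, e), pow(e, -1, phi)

public, private = keypair_from_password("correct horse battery staple")
print(public[0].bit_length())
```

The "trapdoor" is visible in the last function: nothing but the password is needed to regenerate the private exponent.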
Enter Phidelius Harry Potter, properly understood, is a story about the epic consequences of losing one’s password. Fidelius is how passwords fail in the HP universe, so… Phidelius hooks /dev/random, /dev/urandom, OpenSSL’s Random functions, and a few other tidbits to provide predictable entropy where it isn’t expected Uses a modified version of scrypt to require ~1 second processing time, and about 256MB of RAM, per crack attempt No GPU fun for you Can be seeded with a file as well
What Phidelius Gives You Generic, multi-application support for predictably generating keypairs from passwords ssh-keygen for SSH keys openssl for certificates Phreebird for DNSSEC keys Allows message signing, message encryption, client certificate authentication, etc. with nothing but a password Solves the “log in with a password, without the system learning your password” problem thoroughly, without you having to store anything anywhere With BitCoin, you could literally give money to the bearer of a word, or a photo.
No pain server side All time/memory hard requirements are limited to the client – the server just implements completely standard crypto
Primary Issues With Phidelius The obvious ones It uses passwords Passwords tend to be low entropy The not obvious ones It’s fragile An explicit scheme to use a password to seed an RSA key, for instance, fixes parameters like “How sure do we need to be that this number is prime?” As an implicit scheme, it depends on assumptions that happen to be encoded into a particular version of a particular key generator It’s hard to salt All users of the common password “password” have the same public/private keypair!
Salting with Phidelius Basic idea is that the private key is computed not just from the password, but from the public key as well The public key is then the carrier of the salt Works for protocols like SSL, fails for protocols like PGP Also a good channel of parameters, like “scrypt doesn’t need to use 256MB of RAM” Can be implemented with no magic code on server, but client needs magic code to embed metadata in public key, and to extract said magic during computation of private
But, to get back to TCP/IP… Let’s talk about one last thing we can do with networks. We can find biased network policy, no matter how subtle If biased networks are affecting you, this gives you proof. If you are biasing your network, this is how proof will come.
The Topology Link 1 Google.com Link 2 ISP Client Home Router Microsoft.com Yahoo.com Link 3
Understanding the Target 1) “Magic box” is deployed within ISP network, in front of all links 2) Box matches packets to policies, and applies different rules to different packets Can be stateless – “Do I like this packet?” Can be stateful – “This packet is part of a flow. Do I like this flow?” 3) Policies can be anything and can do anything Limit maximum bandwidth Increase minimum latency Alter content
The Problem With Subtlety Say bing.com is 50ms slower than google.com Is this because of the ISP? Or is this because google.com has better hosting? There are many reasons why bing.com might be slower than google.com, granting plausible deniability
Requirement: Normalization Whether the tester is accessing bing.com or google.com, the network path should be identical (or at least uncorrelated) We call this normalization That way, any changes would be the result not of path, but of policy (presumably, and ultimately detectably, at the ISP)
Simple Normalization: HTTP Policy: “All flows associated with an HTTP request w/ Host: www.bing.com should be delayed by 50ms” Detector: Configure a single server to accept HTTP requests for www.bing.com, www.google.com, etc. Then set the client to use it as a proxy server If traffic from the proxy server is faster for some names than it is for others, you’ve just detected an HTTP-biased policy!
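Once every name is served from the same box over the same path, the detector is just a comparison of timings. A sketch of that comparison, assuming latency samples per Host header have already been measured through the single proxy; the 20 ms threshold and the sample numbers are invented for illustration.

```python
from statistics import median

def biased_hosts(samples, threshold_ms: float = 20.0):
    """Return hosts whose median latency exceeds the fastest host by the threshold.

    samples maps a Host header to latencies (ms) measured via the same proxy.
    """
    medians = {host: median(values) for host, values in samples.items()}
    baseline = min(medians.values())
    return {host: m - baseline for host, m in medians.items() if m - baseline >= threshold_ms}

measurements = {
    "www.google.com": [30, 31, 29],
    "www.bing.com": [80, 81, 79],
}
print(biased_hosts(measurements))
```

Because the server and path are identical for both names, a persistent gap has nowhere to hide except policy.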
The Problem 1) This is very protocol dependent HTTP can be made to do this at low work Other protocols require lots of work to implement/emulate 2) The policy can always be specific to IP addresses Sniff DNS to learn which IPs to cover Doesn’t matter how many hundreds of test servers you have, if policies are only applied to genuine bing.com or google.com servers
The Solution: N00ter N00ter: The Network Normalization Engine Start with a VPN Traffic is pushed from the Client to a Broker An IP associated with the Broker contacts Servers, who reply to the Broker The Broker sends traffic back to the Client Normally, the ISP sees nothing because traffic between Client and Broker is encrypted Now, instead of encrypting traffic from Broker to Client, send it back to the client Unencrypted Spoofed, as if there was no Broker
SPOOFING ALL THE INTERNET We want the ISP to see our return traffic We’re trying to trigger the response, that would normally be reserved for Bing/Google, for our normalized test server Policy engine can’t tell, because we’re impersonating the real entities Traffic took the same path Traffic came from the same source Why else would we see different Quality of Service?
What About Forward Flow? The policy engine in this scenario doesn’t see traffic from Client to Server That’s encrypted, VPN style What if it just didn’t trigger the filtering policy if it didn’t see both sides of the conversation?
ENTER ROTO N00TER Normal N00ter: Spoof the server to the client RotoNooter: Spoof the client to the server Sample A: Client talks directly to the real Google ISP sees SYN Sample B: Client talks to real Google by way of Broker, who spoofs the Client. Google replies directly. ISP does not see SYN Both samples have the same path! If they have different performance characteristics, it must be because of the segment of the network that no longer sees client traffic – the ISP!
Catch-22 If ISP applies policy to half-flows, N00ter can differentiate the performance of the spoofed half flow of Google, versus the spoofed half flow from Bing If ISP applies policy only to full flows, RotoNooter can differentiate the performance of the full flow to and from the real Google, versus the half flow from the real Google Either way, N00ter Wins This is the endgame. Biased policies might as well be transparent, because they’re not going to be deniable.
Retaining Full Flows Suppose you really want the ISP to see bidirectional traffic Advantage: Triggers all policies. Also, opens up listeners for NATs, that might be inconvenient to get around Disadvantage: If the ISP sees Client->Server traffic, then the Server sees Client->Server traffic It may reply, interfere, complain, etc.
Strategy 1: Bad TCP Checksum Client can tunnel valid traffic to Broker, and push packets with invalid TCP checksums to Server Advantage: Invalid TCP checksums are ignored. Server won’t interfere. NAT almost certainly won’t check sums; Policy engine might not Disadvantage: Policy engine could. NAT might fix sums. Catch-22 with checksums If policy is disabled when checksums are bad, policy can be proven by having Broker provide steady stream of good sums while ISP sees the bad ones
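What "invalid checksum" means in practice: the receiver sums the segment as 16-bit words and expects the folded result to come out as zero. A sketch of the Internet checksum and a deliberate one-bit corruption, assuming the caller supplies the bytes to be summed (for real TCP that includes a pseudo-header, omitted here); the function names are illustrative.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def is_valid(data: bytes, checksum: int) -> bool:
    """A receiver accepts the data only if everything sums to zero."""
    return internet_checksum(data + checksum.to_bytes(2, "big")) == 0

def corrupt(checksum: int) -> int:
    """Flip the low bit so the stack at the far end drops the packet."""
    return checksum ^ 0x0001

payload = bytes.fromhex("0001f203f4f5f6f7")
good = internet_checksum(payload)
print(hex(good), is_valid(payload, good), is_valid(payload, corrupt(good)))
```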
Strategy 2: Low TTL Client can send traffic to Server with TTL that causes packets to be dropped in the middle of the Internet Advantage: Legitimate traffic. Disadvantage: Policy could note low TTL. Router may drop sessions from ICMP Time Exceeded messages. Sort of a router DoS. Another Catch-22: Can probably even figure out which hop the policy engine lives at, by when precisely the flow policy shifts
Strategy 3: The Silent Splice When a TCP stack receives a message not associated with an active socket, it’s supposed to RST But many servers have firewalls that silently ignore unassociated messages For Security! We can have the Client complete a three way handshake with a server, snipe the connection with a RST from the Broker, and then splice a connection between Broker and Server, with what Client (and ISP) think is a connection between Client and Server Packets from Client to Server will be ignored by server Packets from Server to Client are actually spoofed by Broker Policy Engine sees client talking to server. Policy Engine sees server talking to client. You can’t explain that. 100% Perfect Bidirectional Flow
A Bit Of Warning If you’re passively monitoring network traffic, be aware that these techniques do mean a malicious client can make it look like they’re having a conversation with anyone Particularly if the server ignores unassociated traffic Keep complete traffic logs! Validate checksums Check TTLs
Where N00ter Is Now Emulates half flows at present Very very fast (written to the old LibPaketto code!) Supports anything that runs over IP If you want to know whether a network prefers XBOX360 traffic to Playstation 3 traffic, this’ll tell you. N00ter is extremely neutral – It Just Works Again, it’s just a VPN that exposes Server->Client traffic in the hopes it’ll get filtered
Summary Networks are neat BitCoin isn’t anonymous UPNP sometimes exposes itself to the outside world ACLs can be bypassed using some interesting sequence number tricks and large number of packets Passwords can be used to seed asymmetric crypto, though they probably shouldn’t Subtle net neutrality hacks are doomed. Transparency or bust. Research hosting thanks to N2K of 3Crowd and Doxx of LyonLabs Anyone want to do some release engineering for me?