Cats And Dogs Living Together: Langsec Is Also About Usability

  1. Cats and Dogs Living Together: Langsec Is Also About Usability Meredith L. Patterson SEC-T 2014 Stardate 68179.7
  2. Forward Observer’s Log, Science Vessel Beagle “The worse your logic, the more interesting the consequences to which it gives rise.” -- Bertrand Russell
  3. What is usability for devs? • IDEs? • Code completion? • Developers’ main tools are libraries • Nobody’s really studied what makes APIs “good” or “bad” to use
  4. “Sooner or later you’re going to have to stop throwing new functions into that menu and clean it up.” -- Jonathan Korman
  5. The Prime Directive “Whenever mankind interferes with a less developed civilisation, no matter how well intentioned that interference may be, the results are invariably disastrous.” -- Jean-Luc Picard This is why we can’t get rid of PHP.
  6. Image © “Melonpool” from the TrekBBS forum
  7. The Second Directive Computation must be composable to be reliable.
  8. cf. Alter and Oppenheimer, “Uniting the Tribes of Fluency to Form a Metacognitive Nation,” 2009
  9. Chunking we’ll never remember this, will we nope cf. George A. Miller, “The Magical Number Seven, Plus or Minus Two,” 1956
  10. Semantics-First Design • Every problem has a domain • Every problem also has a range – What are the effects of success? – What are the effects of failure? • Model how domain values map to range values • Then invent domain-meaningful syntax to describe the mappings cf. Erwig and Walkingshaw, “Semantics First! Rethinking the Language Design Process,” 2011
  11. cf. Georgiev et al, “The Most Dangerous Code in the World”, 2012
  12. When a yes-or-no question isn’t • CURLOPT_SSL_VERIFYHOST – Sounds like a boolean, right? – Nope! 2 = verify, 1 = “a CN exists”, and TRUE = 1 – “Future versions will stop returning an error for 1 and just treat 1 and 2 the same” – 11 releases later, it’s still there • But now I know it’s a valid cert, right? – Only if CURLOPT_SSL_VERIFYPEER=TRUE too
  13. That something has two sides…
  14. Fine, I’ll use plain OpenSSL • Great. Did you set SSL_VERIFY_PEER? – And did you set a verify_callback with it? • Either way, did you call SSL_get_verify_result()? • Gotta validate that host yourself, too • GnuTLS is no better – Returns negative values for some errors – But 0 for others, like self-signed certs!
  15. Takeaway you’re not helping
  16. It Gets Better • Some libraries have been around long enough to watch their interfaces evolve • C++ STL got a lot better in C++11 – They had to add move semantics to do it, but threading is awesome now – Confusing auto_ptr gone; shared_ptr and unique_ptr do what they say on the tin • But let’s talk about a security library.
  17. You call this making it easy? gpgme_ctx_t ctx; gpgme_error_t err; gpgme_data_t cipher, plain; gpgme_engine_info_t engine; [~20 lines of boilerplate] err = gpgme_op_decrypt(ctx, cipher, plain); if (err == GPG_ERR_NO_ERROR) { [at least 8 more lines of boilerplate, just to see what you decrypted] } ... Python has to be better, right?
  18. …maybe? • ISConf GPG.py: wraps the gpg binary • Very opinionated about: – How keyrings are named – Which options various operations use • Leaves out a lot of functionality – Want a detached signature? Too bad “WHO PUTS UNITTESTS IN A TRY/EXCEPT BLOCK WHICH CATCHES ALL EXCEPTIONS?!”
  19. 2013: finally something usable • All the command-line functionality! • Public interface, no need to touch the rest • Sanitizes untrusted inputs! • kwargs for all the things! • All in all, much more pythonic • THANK YOU ISIS, WE LOVE YOU
  20. “I believe that usability is a security concern; systems that do not pay attention to the human interaction factors involved risk failing to provide security by failing to attract users.” -- Len Sassaman
  21. Credits • @skry • Jonathan Korman • The education panel at SLE2014, especially: – Massimo Tisi – Eric Walkingshaw and Martin Erwig • The GIMP and G’MIC • Paramount Pictures (and everyone at TrekCore) • My sisters the elementary school teachers

Editor's Notes

  1. Humans are really, really bad at reasoning about humans – including themselves. Even really experienced designers get surprised all the time by how users respond to the interfaces they develop. UX has become data-driven, because people fool themselves constantly about what they think they want, and only actual usage data can confirm whether the reasoning that drove your decisions was valid or flawed. Not really even confirm; more like hint. “The street finds its uses for things – uses the makers never intended.” Tools that weren’t intended for contexts where security matters still end up getting used in life-or-death situations all the time; this might have been news when the Arab Spring broke out, but nobody has an excuse anymore. But even with security-sensitive use cases popping up everywhere from the Ukraine to Cupertino, we don’t have the luxury of A/B testing to empirically determine whether the tools we build provide the security properties we think they do. We have to get it right the first time.
  2. There are certain arguments…
  3. … that we keep having again and again and again….
  4. … and I’m getting really tired of them. Security vs. usability is probably the *oldest*.
  5. When people talk about usability in a development context, they’re usually talking about IDEs, code completion, and so on. Not here. In an enterprise (har!) context, you often don’t get to pick what language to use, but you do have degrees of freedom about what libraries you use. Turns out there’s been next to no design research on what makes an API “good” or “bad” to use.
  6. Tons of effort on graphical interfaces, next to none on text interfaces. But some of these insights translate. We talk about technical debt; Jonathan Korman talks about UX design debt. Sooner or later you’re going to have to stop adding methods to that API and refactor it into something people can remember how to use without having to look it up all the time.
  7. We *can* talk about what makes *tools made from language* “easy” or “hard” to use. I’m not going to be able to speak decisively about that, because de gustibus non est disputandum. There has been very little research on this as well, but we can draw insights from cognitive science and its applications in education. HOWEVER.
  8. UX WITHOUT USER RESEARCH IS NOT UX (then click) We need to do empirical research on iterating toward usability. Nadim Kobeissi started with usability, and has been iterating toward security since 2011, and *that’s actually working*. But it’s still dangerous. People in a hostile environment who have a risky tool they can use and a safe tool they can’t will use the risky tool every time. If you’re in Syria and your choice is between Facebook and not getting vital information to the people who need it, Facebook wins. I have friends here in Stockholm who used to give tech support to Syrian rebels. I say “used to” because one by one, the Syrians fell off the face of the net. We don’t know where they are or what happened to them. Let that sink in for a minute. We have libraries that we know can provide security properties that people want and need, but as we’ll see, the design of those libraries often makes it really difficult to use them in a way that does provide those properties. We can test tools built with these libraries in non-hostile environments, and in a few minutes we’ll talk about how one team actually did. This implies we can also iterate toward library usability in a non-hostile environment, and we should.
  9. There’s a tendency among security practitioners to look down on people who don’t take security into account when they write code. When we come in and tell them “you need to be coding differently,” two things happen: they get upset, and they still get it wrong. It’s hard to stay mindful of multiple concerns at the same time, even when those concerns are not inherently in conflict. And I don’t think that security and usability are inherently in conflict, but I do think that we as security practitioners need to take a step back and observe the choices that regular developers make – which concerns they work hard to satisfy and which ones they kick to the curb – and think about why they make the tradeoffs they do. [click] If people use something that’s terrible in most ways, it’s because that tool is doing something that other tools aren’t.
  10. PHP is terrible for everything except two things. First, getting up and running quickly – there’s less to configure than in any other language; drop a template in the right directory and you’re done. “Hello World” is literally a file containing the text “Hello World.” Second, not leaking, since any state that isn’t stored in a database or a memory cache is destroyed at the end of each request. Hilariously, this means PHP is more referentially transparent than other web languages, and violates REST principles less. In other words, PHP meets some of people’s concerns about how code is supposed to behave. It just violates nearly all of our concerns about how code is supposed to behave. Turns out, both of these matter a lot. When management cares about speed, “time to unblock” is your most important metric as a developer. And when there are enough other users who have gotten up and running, then gotten stuck in the same place you have, someone has answered your question on StackOverflow.
  11. If you can’t rely on it, it isn’t secure. But this goes further than that; computation must be composable in order to get the right answer in the first place. If process 1 transforms A into B, and process 2 transforms B into C, then you can compose them sequentially into a system that transforms A into C – but only if the processes operate in that order. If process 2 goes to transform B into C and there isn’t a B there yet, but process 2 just grabs whatever data it sees and assumes that’s a B, all bets are off. And if a third-party adversary put that data there, process 2 and the entire environment it’s running on are in trouble. If you have two computations that interact – one produces data that the other consumes, both write data to the same location, whatever – they are only composable if they don’t violate each other’s assumptions. Which is why in langsec we talk about boundaries of competence – those points of interaction where assumptions that must not be violated can be violated if one of the actors is malicious or even just sloppy. Keeping track of a lot of assumptions is hard. The biggest problem that API designers face is how to enable developers to manage that state. And while we don’t have any silver bullets, we do know a few things about how brains work that can inform those design decisions. So let’s talk about those.
  12. You’re probably already familiar with the concept of fluency with a language. If you’re fluent in a language, you find it easy to understand and express things in that language – it doesn’t take very much work. Processing fluency refers to how much work it takes to process information. It is subjective. There are several kinds of processing fluency; the ones we care about are: Perceptual fluency – how easy it is to recognise a piece of information, especially based on what it looks like. When two pieces of information look too similar – like method names, or option names – perceptual fluency suffers. Retrieval fluency – how easy it is to remember a piece of information. This is affected by several cognitive biases, particularly recency bias, which is your tendency to remember the most recent thing you encountered, and the availability heuristic, which is your tendency to stop thinking as soon as you remember that most recent thing. This quickly becomes a self-fulfilling prophecy: learn something the wrong way once, do it the wrong way until you force yourself to stop. Decision fluency – how easy it is to make a decision. Having too many options, or options that are difficult to tell apart, makes it much harder to make decisions at all, much less the right one.
  13. Recognition vocabulary: the words that you already know the meanings of and can recognise immediately. Also known as sight vocabulary. In natural language, homonyms – words that are spelled the same, and sound the same, but have different meanings – cause confusion. This means it’s important to think about what you name things. (example: refactoring parser_project into ParserModel and PerParseContext. Sol suggests “ParseContext,” Milly says “no, that sounds like you’re parsing a context” – and she’s right. “Parse” is more available as a verb than as a noun, and we don’t want future maintainers to jump to the conclusion that it’s a verb here! So we fight the availability heuristic with a preposition, because “per” is almost always followed by a noun.) Being too specific can also cause confusion. .NET: URLPathEncode – you’d think it would encode a URL, right? Only up until the ?, because everything after that is the query. But people tend to think of a full URL as the “path” to the resource it locates. Microsoft had to call this out specifically in their ASP.NET best practices because so many people opened themselves up to XSS by only encoding the path of the URL and not the query parameters.
  14. People tend to remember things in groups, categorising them by the extent to which they’re interrelated. But if there are too many similar elements in a group, retrieval fluency suffers: unless you have some way to organise them into smaller and more closely related clusters, then nest those clusters into a hierarchical structure, you’ll have a hard time even remembering roughly how many elements there were. Effective working limit empirically seems to be somewhere between 5 and 9. Implications for large namespaces should be obvious – and yes, this totally contradicts “flat is better than nested” from the Zen of Python. But even the Zen of Python thinks we should be using more namespaces. OTOH, deep inheritance hierarchies create a related problem: sure everything’s organised, but the path back to the root is so long that it’s hard to remember which methods came from which superclass. And it’s worth studying whether implementing too many interfaces, or using too many mixins, creates problems.
  15. “Write the man page first” – good advice, but not the same as “come up with the syntax first.” Domain and range – you probably heard those in high school algebra class when you learned about functions. Same deal here.
  16. So we still only have a rough idea of what makes a good API, but if we restrict “bad” to “fails to provide users with the security guarantees it promises,” there’s actually been some science on that. 2 years ago, a team from the University of Texas and Stanford did an exhaustive review of how applications and other libraries use SSL implementations like OpenSSL and GnuTLS, data-transport libraries like cURL and Apache HTTPClient, and language modules like Python’s httplib. Everybody was doing it wrong. Not just the little “everybodies” like people building storefronts on top of Drupal – although they were certainly vulnerable, given that of the 14 shopping-cart modules they looked at, only 2 had cert validation turned on, and they were both for Google Checkout, which doesn’t exist anymore. Google Wallet replaced it, and although it requires HTTPS to send things like credit card numbers around, there’s no way for Google Wallet to know that that conn hasn’t been MiTMed. No, by “everybody” we’re talking about things like the Amazon EC2 Java library. Android push notifications. Amazon Flexible Payments. Paypal. EVERYTHING.
  17. CURLOPT_SSL_VERIFYHOST has to be set to 2 in order to check that the Common Name in the cert matches the server’s hostname. That’s all it verifies. Version 7.28.1 introduced the “throw an error if CURLOPT_SSL_VERIFYHOST=1” behaviour in November 2012. 7.38.0 came out a week and a half ago. There have been 10 minor version bumps and a point release in almost 2 years, and the error is still there. Worse, there’s another option that affects how CURLOPT_SSL_VERIFYHOST behaves, and it works differently depending on what SSL library you build cURL against. CURLOPT_SSL_VERIFYPEER actually is a boolean, and it defaults to TRUE, but if someone switches it off for whatever reason, cURL no longer checks that the cert is authentic – only that the names match. That’s with OpenSSL. Build against NSS and set VERIFYPEER to false, and cURL won’t even check that the hostname matches the Common Name. Your users will never know. Again with the confusing names: to your average dev, a “peer” is another client like yourself, a “host” is a server. Yes, RFC 5246 (TLS) calls everyone peers. That doesn’t make it obvious.
  18. Why would you ever want to only check that a host and CN match, and not check that the cert is authentic? Sure, in dev you might use a self-signed cert to get your code working before you pay good money for a cert; again, time to unblock is the crucial metric. But what’s blocking you is the libraries themselves, because they’ve established an invalid mapping between domain and range. The domain consists of certs and CAs. Those are the inputs. The range – the set of possible outputs when you check the authenticity of a cert – has at least four possibilities. The cert can have an invalid signature, which means you’re done, fail closed. It can be valid but self-signed, which means you can’t authenticate it against the PKI. Similarly, it can have a valid signature that chains back to a root you don’t have, which actually happened when I went to pay my Belgian taxes online for the first time. Or it can have a valid signature and a valid trust chain back to the root. Instead of reporting what actually happens so that devs can decide what subset of the range constitutes failure for their particular domain, and then whether to fail open or closed, libraries force devs to fail open in order to make any progress at all. Good luck remembering to switch that back to fail-closed.
  19. Now you have even more problems. Instead of a library managing them for you badly, you have to manage them all yourself. Per Georgiev et al, SSL_connect sets an error value if chain-of-trust verification fails, but if there is no callback, SSL_connect still succeeds if the error isn’t related to incorrect parsing. If you only check the return value, you don’t actually know what happened. Lynx misunderstood GnuTLS so badly that although they checked the tls_status code, which is analogous to the error value that OpenSSL sets, both checks for GNUTLS_CERT_SIGNER_NOT_FOUND were only reached if tls_status was negative. 0 is not negative.
  20. Providing too much choice is paralyzing. Inconsistent error reporting deludes people into thinking they’re safe when they aren’t. Providing too little choice is frustrating. People will turn off all the security features they have to in order to get their work done, and who has time to go turn that back on? If you want your library to be fully functional, make it express the circumstances that have resulted from its actions in a consistent manner, and let developers make their choice from there. And for crying out loud, stop hiding error codes behind what appear to be successful return values. People can only observe the principle of full recognition before processing if they know where to find what they have to recognise. If you make them look in more than one place, you’re setting them up for failure.
  21. The changes they had to make to C++ to support these STL changes reach way down into the language. In order to support move constructors and move assignment, they had to add an entirely new kind of reference – a reference to the right-hand side of an assignment operation. And they weren’t afraid to deprecate things! Not only auto_ptr; adding move semantics deprecates entire idioms that C++ users had become resigned to, like swapping a container with a temporary copy of itself to get rid of extra capacity or making heavyweight classes inherit from a useless but small base class to reduce overhead during temporary copy construction.
  22. GnuPG Made Easy made a lot of the same usability mistakes that OpenSSL did. It has its own squirrelly I/O abstraction layer; you have to set up a “context” god-object first; it lumps disparate data types like keys and usernames together under a god-struct where some members are only valid for certain types. Error handling is a little better, in that most functions return gpgme_error_t … but then all the operations you care about, like decryption or signing, have corresponding gpgme_op_foo_result functions that return a gpgme_foo_result_t that does not contain the result. The actual result is in the context, but the only way to get it out of the context is to use the I/O abstraction layer functions that recapitulate the C stdio library, and if you do anything else to the context before retrieving the result_t, you can kiss that data goodbye. You didn’t need to know who that message was encrypted to, right? But it was 2000 when Werner Koch designed this API. And it’s C. Python has to be better, right?
  23. This example has been in pygpgme in almost exactly this form since revision 4, in January 2006. A lot of the boilerplate is hidden, but this is still the same idiom. It’s tightly coupled to the gpgme API, and not very pythonic. An older library, pyme, is just about as bad.
  24. I end up having the same problems with build systems all the time – people write them assuming that the way they do it is the way every right-thinking person ought to do it. [click] But were they necessarily all that right-thinking? Who left that comment, and where?
  25. Last year, Isis Lovecruft from the Tor Project went and rewrote GPG.py. The interface surface is actually smaller now thanks to the refactoring.
  26. With many tasks, appearing to have accomplished something is just as good as having actually accomplished it. This is manifestly not the case for APIs. As a result, you need to not just make it easy to do what you want to do, but hard to use it wrong. APIs need to actively be difficult to use in ways that are dangerous to developers, because those dangers propagate out to end-users. We’ve learned some hard lessons about what makes APIs difficult to use correctly. We’re getting better at making APIs easier to use correctly. Now we need to figure out how to make them hard to use wrong.
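
Note 11’s composability argument can be sketched in Python, used here only for brevity. Process 2 accepts only a value of type `B`, and the only intended way to get one is through a parsing function that checks untrusted input at the boundary. The names `B`, `parse_b`, `process_2`, and the rule that a B is a string of ASCII digits are all hypothetical, chosen just for this illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class B:
    """A value that has already passed the (hypothetical) boundary check for B."""
    value: int


def parse_b(raw: str) -> B:
    """Boundary of competence: turn untrusted text into a B, or refuse it.

    Hypothetical rule for this sketch: a B is a non-empty string of ASCII digits.
    """
    if re.fullmatch(r"[0-9]+", raw) is None:
        raise ValueError(f"not a B: {raw!r}")
    return B(int(raw))


def process_2(b: B) -> str:
    """Transforms a B into a C; refuses anything that is not already a B."""
    if not isinstance(b, B):
        raise TypeError("process_2 only accepts a B produced by parse_b")
    return f"C({b.value * 2})"


def pipeline(raw: str) -> str:
    """A -> B -> C, composed in the only order the types allow."""
    return process_2(parse_b(raw))
```

Python will not reject `process_2("21")` on its own, hence the explicit `isinstance` check; a statically typed language would catch the same mistake at compile time.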
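
The URLPathEncode pitfall from note 13 can be mimicked in Python. `path_only_encode` is a hypothetical function written for this sketch, not the .NET API: it only reproduces the behaviour described in the note, escaping the part before the first `?` and passing the query string through untouched. `build_url` uses the standard library’s `urllib.parse.quote` and `urlencode` to encode the path and each query parameter separately.

```python
from typing import Mapping
from urllib.parse import quote, urlencode


def path_only_encode(url: str) -> str:
    """Hypothetical path-only encoder: escapes everything before the first '?'
    and copies the query string through exactly as it arrived."""
    path, sep, query = url.partition("?")
    return quote(path, safe="/") + sep + query


def build_url(path: str, params: Mapping[str, str]) -> str:
    """Encode the path and every query parameter, each with its own function."""
    return quote(path, safe="/") + "?" + urlencode(params)
```

With a payload such as `<script>alert(1)</script>` in the `q` parameter, `path_only_encode` returns it verbatim, while `build_url` percent-encodes it. A name that states what the function leaves out would have warned readers before any of this mattered.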
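
Note 14’s point about chunking can be applied to an options namespace. In this Python sketch, a client’s settings are grouped into two small, closely related chunks instead of one flat list of flags. Every option name here is hypothetical and does not come from any particular library.

```python
from dataclasses import dataclass, field, fields


@dataclass(frozen=True)
class VerifyOptions:
    """Everything about whether to trust the server, kept in one chunk."""
    check_chain: bool = True
    check_hostname: bool = True


@dataclass(frozen=True)
class TransportOptions:
    """Everything about moving bytes, kept in another chunk."""
    timeout_seconds: float = 30.0
    follow_redirects: bool = False


@dataclass(frozen=True)
class ClientOptions:
    """The top level holds two chunks rather than a flat list of flags."""
    verify: VerifyOptions = field(default_factory=VerifyOptions)
    transport: TransportOptions = field(default_factory=TransportOptions)


def top_level_names(cls: type) -> list:
    """What a developer has to hold in mind at the top level."""
    return [f.name for f in fields(cls)]
```

Changing a transport setting never requires touching, or even reading past, the verification chunk, which is the kind of separation that keeps each group well under the 5-to-9 limit.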
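
The CURLOPT_SSL_VERIFYHOST trap in note 17 comes from an integer option that looks like a boolean. This Python sketch shows one hypothetical alternative: an enum whose members each name exactly one behaviour, plus a setter that refuses plain ints and bools. This is not how libcurl or its bindings work; `HostCheck` and `set_host_check` are invented for illustration.

```python
from enum import Enum


class HostCheck(Enum):
    """Hypothetical replacement for an integer 'verify host' option."""
    OFF = "off"
    NAME_MUST_MATCH = "name_must_match"


def set_host_check(value: HostCheck) -> HostCheck:
    """Accept only a HostCheck member.

    Python's True is also the int 1 (just as the slide notes TRUE = 1), so an
    API that accepted plain ints would quietly read True as the setting 1.
    """
    if not isinstance(value, HostCheck):
        raise TypeError(f"expected HostCheck, got {type(value).__name__}")
    return value
```

A fuller design would fold chain verification into the same policy type, so that switching one check off could not silently weaken the other the way VERIFYPEER does.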
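
Note 18 describes the range of a certificate check as at least four outcomes. A minimal Python sketch of reporting that range directly, assuming some verifier (not shown) classifies each certificate into one of those outcomes, lets the caller pick which subset counts as success. `CertOutcome`, `accept`, and the two policy constants are hypothetical names.

```python
from enum import Enum, auto
from typing import FrozenSet


class CertOutcome(Enum):
    """The range of a certificate check, as listed in note 18."""
    BAD_SIGNATURE = auto()   # signature does not verify
    SELF_SIGNED = auto()     # valid, but cannot be checked against the PKI
    UNKNOWN_ROOT = auto()    # valid chain to a root we do not have
    TRUSTED = auto()         # valid chain back to a trusted root


def accept(outcome: CertOutcome, allowed: FrozenSet[CertOutcome]) -> bool:
    """Fail closed: only outcomes the caller explicitly allows pass,
    and a bad signature never passes, whatever the caller allows."""
    if outcome is CertOutcome.BAD_SIGNATURE:
        return False
    return outcome in allowed


PRODUCTION = frozenset({CertOutcome.TRUSTED})
LOCAL_DEV = frozenset({CertOutcome.TRUSTED, CertOutcome.SELF_SIGNED})
```

Allowing a self-signed cert in development no longer means switching verification off, so there is nothing to remember to switch back on; production simply uses a different policy.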
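
The Lynx mistake in note 19 is easy to reproduce in miniature. This Python model assumes a verifier with two reporting channels, a return code for whether the call worked and a bit field for certificate problems; `verify_peer` and the flag value are hypothetical and are not taken from the GnuTLS headers. The buggy version inspects the flags only when the return code is negative, so a missing signer with a return code of 0 is accepted.

```python
from typing import Tuple

# Illustrative flag value for this sketch only.
SIGNER_NOT_FOUND = 0x40


def verify_peer(signer_known: bool) -> Tuple[int, int]:
    """Hypothetical verifier: `ret` says whether the call itself worked,
    `status` carries certificate problems as bit flags."""
    status = 0 if signer_known else SIGNER_NOT_FOUND
    return 0, status


def buggy_accept(signer_known: bool) -> bool:
    """Same shape as the bug: flags are examined only when ret < 0."""
    ret, status = verify_peer(signer_known)
    if ret < 0:
        if status & SIGNER_NOT_FOUND:   # the only place the flag is read
            return False
        return False
    return True                         # ret == 0: accepted, flags never read


def fixed_accept(signer_known: bool) -> bool:
    """Check both channels independently."""
    ret, status = verify_peer(signer_known)
    if ret < 0:
        return False    # the verification call itself failed
    if status != 0:
        return False    # the call worked; the certificate did not
    return True
```

A single return value that cannot signal success while carrying a failure would remove the second place to look, which is exactly the complaint in note 20.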