Eat Your Vegetables - Data Security for Data Scientists

Presentation for PyDataDC 2016

You've got data. Lots of it. You might not realize it, but people want to get their hands on that data. You probably don't want that, so let's go over a few things you can do to dissuade attackers from getting their grubby mitts on your hard processed datastore. We'll cover the obvious things (spoiler alert: encryption) and then move on to some advanced techniques for keeping your data secure while still keeping it usable (that is to say, analyzable).

Published in: Software

  1. 1. 1 Eat Your Vegetables Data Security for Data Scientists Welcome to Eat Your Vegetables! Hope you're having a great conference so far
  2. 2. 2 Agenda 1. Agenda 2. Intro 3. Convincing Time 4. Security Concepts 5. Tools 6. Questions The agenda - self referential, eh? Intro stuff Why this is important (if you're not convinced already) Some basic tips for security - NOT definitions Tools to make security easier Time for questions at the end Slides will be online
  3. 3. 3 Name: Will Voorhees Occupation: Software Engineer Favorite Color: Orange Twitter: @will2041 Who's this guy?! - Your SPEAKER Pic from Halloween 2010 Been doing tech for 15 years, but real software development for about 5 Currently working in security org creating enterprise security tools
  4. 4. 4 Why are we here? Remember this bit from Red vs Blue? To talk about vegetables! Vegetables are good for you And by "vegetable", I mean security You've got to EAT YOUR VEGETABLES, just like mom said Vegetables can be tasty
  5. 5. 5 Someone Wants Your Data No, seriously. Someone wants your data. This is the premise of everything "Well duh, my data is awesome!" Attackers are interested in all kinds of data for all kinds of reasons Even Pokemon Go accounts have value It's not always monetary - 2015 Ashley Madison as an example "Hacktivism"
  6. 6. 6 . 1 Why should I care? Obvious stuff - money About a year ago, insurance company Lloyd's estimated $400 bil/year lost to hacking So what other reasons do I have?
  7. 7. 6 . 2 HIPAA SOX Trade Sanctions (Government) contracts Etc. 6 . 3 Analytical Confidence Encryption/signing can provide a "no one touched this" guarantee Nice out of box benefit of adding security Nothing like re-running a model on data that's changed and freaking out
  8. 8. 6 . 4 Fun! You're kidding... Puzzles! Red Team vs Blue Team competition Caesar cipher Some of the math is interesting
  9. 9. 6 . 5 You got me! That was filler. I'm sorry They're valid reasons, they're just not the most important reason
  10. 10. 6 . 6 Trust It's what's for dinner. We're data stewards - everyone trusts us with data Doesn't matter what data you have, someone trusts you with it We are ultimately responsible for our data Magic Information Security elves won't save us
  11. 11. 6 . 7 Trust Has a Cost Governments lose national security - OPM (Office of Personnel Management), IRS e-commerce sites lose sales Remember that money thing? I lied! Journal of Cyber Security says a breach costs as much as the defense It's cheaper to get hacked So maybe it's not about money...
  12. 12. 6 . 8 Human Trust Trust of people isn't as easily quantified Target, Ashley Madison still in business - But what's the impact? This is all very murky - needs more research In absence of data, do what's right
  13. 13. 7 What can I do? Good news: there's some easy stuff Bad news: there's some really hard stuff Easy stuff is pretty easy, once you learn it Hard stuff is really hard, even after you learn it Think Heartbleed from 2014 for OpenSSL Buffer over-read bug let attacker get memory dump
  14. 14. 8 Patching can proactively save your butt Doing it often means you know how to do it quickly Quick response can be really important - think Heartbleed or Shellshock 9 . 1 The Easy Stuff I claim that the easy stuff is pretty easy! Sad fact: doing the basics makes you better than a lot of companies
  15. 15. 9 . 2 Access Control Don't leave things open to the world! Some restriction is better than no restriction Accounts can only do certain things This is what keeps your intern from deleting your data lake Let's use Nissan as an example Leaf completely open to internet for physical control Minimum bad PR, maximum loss of life
  16. 16. 9 . 3 TLS a.k.a. SSL TLS replaced SSL Everyone still calls TLS "SSL" What - authentication and encryption on connections
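As a rough illustration of the "authentication and encryption on connections" point, Python's standard `ssl` module can build a client context that checks the server certificate and hostname. This is only a minimal sketch: the helper name `make_client_context` and the TLS 1.2 floor are assumptions made for illustration, not something taken from the talk.

```python
import ssl

def make_client_context() -> ssl.SSLContext:
    """Build a client-side TLS context with certificate checks enabled."""
    # create_default_context() loads the system CA bundle and turns on
    # certificate verification and hostname checking for client use.
    context = ssl.create_default_context()
    # Assumption: refuse anything older than TLS 1.2 (which rules out SSL).
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context

context = make_client_context()
print(context.verify_mode == ssl.CERT_REQUIRED)
print(context.check_hostname)
```

A context like this can be passed to `urllib.request.urlopen(url, context=context)` or used with `context.wrap_socket(sock, server_hostname=host)`.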
  17. 17. 9 . 4 TLS Myths Let's encrypt - https://letsencrypt.org Cost It's 2016 and Let's Encrypt is a thing Performance impact is negligible Gmail to SSL -> No special tuning, less than 1% CPU and ms of latency
  18. 18. 9 . 5 Account Separation Yes, that's actually a fruit, but I started getting desperate Your backup user doesn't need write access to your master DB Whole companies have been lost because they used one account (Code Spaces) Minimize blast radius Backups on a separate account!
  19. 19. 9 . 6 Short Lived Credentials Just like passwords, you need to rotate your keys Limit blast radius STS hands out temporary credentials Short lived because...
  20. 20. 9 . 7 Devs keep putting keys in Github!
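One way to catch keys before they reach a public repository is to scan text for strings shaped like credentials. The sketch below is a heuristic only, assuming the common `AKIA` prefix of long-lived AWS access key IDs; the function name `find_access_keys` is hypothetical, and a purpose-built secret scanner will cover far more cases.

```python
import re

# Long-lived AWS access key IDs commonly start with "AKIA" followed by
# 16 uppercase letters or digits. This is a heuristic, not a full rule set.
ACCESS_KEY_PATTERN = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")

def find_access_keys(text: str) -> list:
    """Return every substring of `text` that looks like an AWS access key ID."""
    return ACCESS_KEY_PATTERN.findall(text)

# AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
sample = 'aws_access_key_id = "AKIAIOSFODNN7EXAMPLE"\nregion = "us-east-1"\n'
leaked = find_access_keys(sample)
print(leaked)
```

Running a check like this over staged files before a commit would flag the sample line while leaving ordinary configuration alone.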
  21. 21. 9 . 8 Scanners grab creds and spin up instances for Bitcoin mining, etc. Short-lived creds limit the blast radius 10 Hey, that's just general security stuff! What about big data?! Take a Breather All that stuff gates access to the data Even if you do nothing else, this is your first line of defense But yes, let's talk about data
  22. 22. 11 . 1 Signatures Provides that "no one messed with this" guarantee Teased with "Analytical Confidence" Complement to encryption
  23. 23. Signing vs hashing Signing proves identity - a hacker who changes the data can just update a plain hash to match 11 . 2 Encryption But not really... Cryptography Engineering: Design Principles and Practical Applications First thing people think of Not an "End All Be All" People think slapping on encryption solves all the security issues Really hard to get right Confidentiality vs integrity AES - some modes provide integrity, others don't Waaay more to this than I can cover - Google or book
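The signing-versus-hashing point can be sketched with the standard library. Note the swapped-in technique: this uses HMAC, a keyed tag built on a shared secret, as a stand-in for a real public-key signature. The key value and function names are hypothetical.

```python
import hashlib
import hmac

# Assumption: a secret known only to the trusted parties.
SECRET_KEY = b"hypothetical-shared-secret"

def plain_hash(data: bytes) -> str:
    """Unkeyed SHA-256 digest: anyone can recompute it."""
    return hashlib.sha256(data).hexdigest()

def keyed_tag(data: bytes, key: bytes = SECRET_KEY) -> str:
    """HMAC-SHA256 tag: recomputing it requires the key."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()

original = b"revenue,100\n"
tampered = b"revenue,999\n"

# An attacker who swaps the data can also swap in a matching plain hash,
# so a hash stored next to the data proves nothing about who wrote it.
attacker_hash = plain_hash(tampered)
hash_check_passes = attacker_hash == plain_hash(tampered)

# Without the key the attacker can only guess, so the tag check fails.
attacker_tag = keyed_tag(tampered, key=b"attacker-guess")
tag_check_passes = hmac.compare_digest(attacker_tag, keyed_tag(tampered))

print(hash_check_passes, tag_check_passes)
```

A public-key signature goes one step further than this sketch: verifiers hold only the public key, so they can check data without being able to sign it.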
  24. 24. But there are other challenges... 11 . 3 Encrypted data is a pain It's always going to be slower Some tools just freak out at the thought - Bye bye grep So we really want to work with unencrypted data But for minimal time and only in certain places Tooling can help with this - but it requires effort Callback to companies going out of business, etc.
  25. 25. 11 . 4 Key management is a pain More data = more keys New data should use a different key A leaked key doesn't reveal all your data Now you have many keys to manage Keep keys somewhere else! Reference to key used to encrypt should be kept with data S3 metadata can keep a key reference Key serials - don't forget 'em
  26. 26. 11 . 5 Bummed out yet? Yes, I know that's not a vegetable It's got the "vegetable" tag on flickr... so remember the importance of correct tagging
  27. 27. 11 . 6 Tips Decrypt, but be safe Split it up Work with metadata Use the tools... Not all doom and gloom Minimize the amount of time data is unencrypted When actually working with it, keep it somewhere safe You don't need all data for everything. Split things up If you don't need the actual data, just work with metadata and avoid encryption altogether Speaking of tools...
  28. 28. 12 . 1 Tools! You're not alone - lots of people care about security There are low level libraries and high level tools Every day, new tools that make security easier are being developed Do NOT roll your own crypto
  29. 29. 12 . 2 High Level JWT python-jose https://jwt.io/ https://github.com/mpdavis/python-jose Start at the top of the stack JWT = JSON Web Token JOSE = JavaScript Object Signing and Encryption Does signing, but no encryption Can be used for powerful web AuthN/AuthZ Encryption via TLS for connections
  30. 30. 12 . 3 JOSE
        from jose import jws
        signed = jws.sign({'a': 'b'}, 'secret', algorithm='HS256')
        >>> 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhIjoiYiJ9.jiMyrsmD8AoHWeQgmxZ5yq8z0lXS67_
      Signed JSON
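To make the "signing, but no encryption" point concrete, the first two dot-separated segments of the token on this slide can be decoded with nothing but the standard library. This is a small sketch; the helper name `b64url_decode` is hypothetical, and the signature segment is left out because it is truncated on the slide.

```python
import base64
import json

def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring any stripped '=' padding."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)

# Header and payload segments copied from the slide's output.
header_segment = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9"
payload_segment = "eyJhIjoiYiJ9"

header = json.loads(b64url_decode(header_segment))
payload = json.loads(b64url_decode(payload_segment))
print(header)
print(payload)
```

Anyone holding the token can read the payload this way; the signature only shows that it has not been altered. Checking that signature is the library's job (python-jose exposes `jws.verify` for this), not something to reimplement by hand.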
  31. 31. 12 . 4 Low Level PyCrypto PyOpenSSL https://github.com/dlitz/pycrypto https://github.com/pyca/pyopenssl Taking a step down the stack here... Both provide low-level crypto operations You have the POWER!
  32. 32. 12 . 5 Too scary! cryptography https://github.com/pyca/cryptography OK, maybe we went too far down the stack Most of us don't need low level primitives By the same folks doing PyOpenSSL Goal is to have human friendly crypto
  33. 33. 12 . 6 Example
        from cryptography.fernet import Fernet
        key = Fernet.generate_key()
        f = Fernet(key)
        token = f.encrypt(b"My giant binary blob")
        f.decrypt(token)
      Data encryption and decryption is easy!
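Fernet tokens are authenticated as well as encrypted, which ties back to the earlier confidentiality-versus-integrity point. The sketch below extends the slide's example, assuming the `cryptography` package is installed; the tampering step is contrived for illustration.

```python
from cryptography.fernet import Fernet, InvalidToken

key = Fernet.generate_key()
f = Fernet(key)
token = f.encrypt(b"My giant binary blob")

# Change one character in the middle of the token to simulate tampering.
middle = len(token) // 2
replacement = b"A" if token[middle:middle + 1] != b"A" else b"B"
tampered = token[:middle] + replacement + token[middle + 1:]

try:
    f.decrypt(tampered)
    tamper_detected = False
except InvalidToken:
    # Fernet checks an HMAC over the token before decrypting anything.
    tamper_detected = True

print(f.decrypt(token), tamper_detected)
```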
  34. 34. 12 . 7 Pickle Popular, so they deserve a note Be careful with pickles
  35. 35. 12 . 8 Oh, wait a second...
        import subprocess
        class Payload(object):
            """ Executes /bin/ls when unpickled. """
            def __reduce__(self):
                """ Run /bin/ls on the remote machine. """
                return (subprocess.Popen, (('/bin/ls',),))
      Example from Travis Cunningham Unpickling executes code on box
  36. 36. 12 . 9 Mitigations Sign your pickles Secure transfer Don't pickle Sign and verify pickles before unpickling Trusted endpoints that allow changes in between don't work
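The "sign and verify pickles before unpickling" mitigation might look roughly like this with the standard library. It is a sketch under stated assumptions: the function names and the key are hypothetical, HMAC-SHA256 stands in for whatever signing scheme is actually used, and the key must be stored somewhere an attacker cannot read it.

```python
import hashlib
import hmac
import pickle

MAC_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256

def dumps_signed(obj, key: bytes) -> bytes:
    """Pickle `obj` and prepend an HMAC-SHA256 tag over the pickled bytes."""
    payload = pickle.dumps(obj)
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload

def loads_verified(blob: bytes, key: bytes):
    """Check the tag first; only unpickle when it matches."""
    tag, payload = blob[:MAC_SIZE], blob[MAC_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch: refusing to unpickle")
    return pickle.loads(payload)

key = b"hypothetical-signing-key"
blob = dumps_signed({"model": "v1", "weights": [0.1, 0.2]}, key)
restored = loads_verified(blob, key)
print(restored)
```

The ordering is the point of the design: `pickle.loads` is never reached for bytes whose tag does not match, so a swapped-in payload is rejected before it can run anything.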
  37. 37. 13 . 1 Providers Changing topics entirely here... All that is great, but why do it when someone can provide it for you?! Terribly biased to AWS, so we're going to focus on that, but a lot of this applies to any provider
  38. 38. 13 . 2 Server Side Encryption S3 lets you do server side encryption Can have bucket policy to enforce Prevents a data leak from revealing everything Great for regulatory compliance, but trusts Amazon wholly Although they are trustworthy More than 50% of IT professionals don't fully trust providers to not leak data If you're paranoid...
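A bucket policy that enforces server-side encryption typically denies uploads that arrive without the encryption header. The sketch below builds such a policy as a Python dictionary; the bucket name is hypothetical and the statement follows the commonly documented pattern, so treat it as a starting point rather than a vetted policy.

```python
import json

BUCKET_NAME = "example-bucket"  # hypothetical bucket name

# Deny any PutObject request that does not carry the
# x-amz-server-side-encryption header.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyUnencryptedObjectUploads",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{BUCKET_NAME}/*",
            "Condition": {
                "Null": {"s3:x-amz-server-side-encryption": "true"}
            },
        }
    ],
}

policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

The resulting JSON string is what an S3 client would submit, for example through boto3's `put_bucket_policy(Bucket=..., Policy=policy_json)`.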
  39. 39. 13 . 3 Client Side Encryption s3-encryption https://github.com/boldfield/s3-encryption You encrypt things before sending to storage See AmazonS3EncryptionClient Key idea: you can add security to existing libraries! boto + cryptography = cool Transparent/easy security gets people onboard
  40. 40. 13 . 4 Speaking of keys... Key Management Service You want keys, KMS gives you keys Makes Amazon manage all the keys you're using for encryption KMS keeps the key and limits access as you see fit Here again, Java SDK has a full feature set to emulate Direct envelope (key+data) encryption method
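The envelope (key + data) idea can be sketched locally without calling any AWS API. In this illustration a Fernet key held in memory stands in for the master key that KMS would keep; the function names are hypothetical, and the `cryptography` package is assumed to be installed.

```python
from cryptography.fernet import Fernet

# Stand-in for the master key that KMS would hold; in a real setup this key
# never leaves the key service.
master = Fernet(Fernet.generate_key())

def envelope_encrypt(plaintext: bytes) -> dict:
    """Encrypt data with a fresh data key, then wrap that key with the master key."""
    data_key = Fernet.generate_key()
    return {
        "wrapped_key": master.encrypt(data_key),
        "ciphertext": Fernet(data_key).encrypt(plaintext),
    }

def envelope_decrypt(envelope: dict) -> bytes:
    """Unwrap the data key with the master key, then decrypt the data."""
    data_key = master.decrypt(envelope["wrapped_key"])
    return Fernet(data_key).decrypt(envelope["ciphertext"])

envelope = envelope_encrypt(b"My giant binary blob")
print(envelope_decrypt(envelope))
```

Each object gets its own data key, so one leaked data key exposes a single object, and the wrapped key can be stored right next to the ciphertext because it is useless without the master key.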
  41. 41. 14 Conclusion! Security is important! Trust is priceless. Do the basics - they are better than nothing Python has lots of security tools Providers can help Thank goodness he's done...
  42. 42. 15 Thanks/Promo Time You fine people District Data Labs - http://www.districtdatalabs.com My first time speaking, so thanks for being my inaugural audience Feedback welcome! DDL is a DC-based data science research group Come see what we're up to! Talks in this room at 11:30, 1:15, and 3:00 tomorrow
  43. 43. 16 Questions? Twitter: @will2041 Slides: http://bit.ly/2dBcgVx http://www.slideshare.net/WilliamVoorhees1/eat-your-vegetables-data-security-for-data-scientists LETTUCE go out on a good note with some questions