1
Eat Your Vegetables
Data Security for Data Scientists
Welcome to Eat Your Vegetables!
Hope you're having a great conference so far
2
Agenda
1. Agenda
2. Intro
3. Convincing Time
4. Security Concepts
5. Tools
6. Questions
The agenda - self-referential, eh?
Intro stuff
Why this is important (if you're not convinced already)
Some basic tips for security - NOT definitions
Tools to make security easier
Time for questions at the end
Slides will be online
3
Name:
Will Voorhees
Occupation:
Software Engineer
Favorite Color:
Orange
Twitter:
@will2041
Who's this guy?! - Your SPEAKER
Pic from Halloween 2010
Been doing tech for 15 years, but real software development for about 5
Currently working in security org creating enterprise security tools
4
Why are we here?
Remember this bit from Red vs Blue?
To talk about vegetables!
Vegetables are good for you
And by "vegetable", I mean security
You've got to EAT YOUR VEGETABLES, just like mom said
Vegetables can be tasty
5
Someone Wants Your Data
No, seriously. Someone wants your data.
This is the premise for everything
"Well duh, my data is awesome!"
Attackers are interested in all kinds of data for all kinds of reasons
Even Pokemon Go accounts have value
It's not always monetary - the 2015 Ashley Madison breach is an example
"Hacktivism"
6 . 1
Why should I care?
Obvious stuff - money
About a year ago, insurer Lloyd's estimated that businesses lose $400 billion a year to hacking
So what other reasons do I have?
6 . 2
HIPAA
SOX
Trade Sanctions
(Government) contracts
Etc.
6 . 3
Analytical Confidence
Signing (or authenticated encryption) can provide a "no one touched this" guarantee
Nice out-of-the-box benefit of adding security
Nothing like re-running a model on data that has silently changed and freaking out
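A minimal sketch of that check for analytical work, assuming your inputs are files on disk: record a SHA-256 digest of each input when the model is built, and compare before re-running. The function names are illustrative. A bare hash only catches accidental changes, since anyone who can edit the file can also recompute its hash.

```python
import hashlib

def fingerprint(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" (EOF)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def unchanged_since(path, recorded_digest):
    """True if the file still matches the digest recorded when the model was built."""
    return fingerprint(path) == recorded_digest
```

Store the digests next to the model artifacts; a mismatch on re-run means the data moved under you.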
6 . 4
Fun!
You're kidding...
Puzzles!
Red Team vs Blue Team competition
Caesar cipher
Some of the math is interesting
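For the puzzle-minded: a Caesar cipher just shifts each letter a fixed number of places. A toy sketch (ASCII letters only, everything else passes through unchanged); with only 25 possible shifts it falls to trying them all, so it's for fun, not security.

```python
import string

def caesar(text, shift):
    """Shift each ASCII letter by `shift` places, wrapping around the alphabet."""
    def rotate(alphabet):
        # e.g. shift=3: "abc...xyz" -> "def...abc"; negative shifts wrap via % 26
        return alphabet[shift % 26:] + alphabet[:shift % 26]

    table = str.maketrans(
        string.ascii_lowercase + string.ascii_uppercase,
        rotate(string.ascii_lowercase) + rotate(string.ascii_uppercase),
    )
    return text.translate(table)
```

`caesar("Hello, World", 3)` gives `"Khoor, Zruog"`, and a shift of -3 undoes it.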
6 . 5
You got me!
That was filler.
I'm sorry
They're valid reasons, they're just not the most important reason
6 . 6
Trust
It's what's for dinner.
We're data stewards - everyone trusts us with data
Doesn't matter what data you have, someone trusts you with it
We are ultimately responsible for our data
Magic Information Security elves won't save us
6 . 7
Trust Has a Cost
Governments lose national security - OPM (Office of Personnel Management), IRS
e-commerce sites lose sales
Remember that money thing? I lied! The Journal of Cybersecurity says a breach costs as much as the defense
It's cheaper to get hacked
So maybe it's not about money...
6 . 8
Human Trust
Trust of people isn't as easily quantified
Target, Ashley Madison still in business - But what's the impact?
This is all very murky - needs more research
In absence of data, do what's right
7
What can I do?
Good news: there's some easy stuff
Bad news: there's some really hard stuff
Easy stuff is pretty easy, once you learn it
Hard stuff is really hard, even after you learn it
Think Heartbleed, the 2014 OpenSSL bug
A buffer over-read let attackers dump chunks of server memory (up to 64 KB per request)
8
Patching can proactively save your butt
Doing it often means you know how to do it quickly
Quick response can be really important - think Heartbleed or Shellshock
9 . 1
The Easy Stuff
I claim that the easy stuff is pretty easy!
Sad fact: doing the basics makes you better than a lot of companies
9 . 2
Access Control
Don't leave things open to the world!
Some restriction is better than no restriction
Accounts can only do certain things
This is what keeps your intern from deleting your data lake
Let's use the Nissan Leaf as an example
Its remote-control API was open to the internet and needed only the car's VIN, so strangers could control some of the car's functions
At minimum, bad PR; at maximum, loss of life
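"Accounts can only do certain things" boils down to deny-by-default: anything not explicitly granted is refused. A sketch with hypothetical roles and actions; real systems (database grants, IAM policies) express the same rule in their own syntax.

```python
# Hypothetical roles and the only actions each one may perform.
PERMISSIONS = {
    "intern": frozenset({"read"}),
    "analyst": frozenset({"read"}),
    "pipeline": frozenset({"read", "write"}),
    "admin": frozenset({"read", "write", "delete"}),
}

def is_allowed(role, action):
    """Deny by default: unknown roles and unlisted actions are refused."""
    return action in PERMISSIONS.get(role, frozenset())
```

The intern can read the data lake but `is_allowed("intern", "delete")` is `False`.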
9 . 3
TLS
a.k.a. SSL
TLS replaced SSL
Everyone still calls TLS "SSL"
What it does: authentication and encryption for connections
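In Python, the standard `ssl` module's default client context already covers both halves: it requires a valid certificate and checks that it matches the hostname. A minimal sketch; raising the minimum protocol to TLS 1.2 is an assumption about your policy, not something from the talk.

```python
import socket
import ssl

def make_client_context():
    """Client-side TLS context that verifies the server certificate and hostname."""
    ctx = ssl.create_default_context()  # system CA store, CERT_REQUIRED, check_hostname=True
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocol versions (Python 3.7+)
    return ctx

def negotiated_version(host, port=443):
    """Connect to host:port over TLS and return the negotiated protocol, e.g. 'TLSv1.3'."""
    ctx = make_client_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        # server_hostname enables SNI and the hostname check against the certificate
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

`negotiated_version("example.com")` needs network access; a certificate or hostname mismatch raises `ssl.SSLCertVerificationError` instead of silently connecting.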
9 . 4
TLS Myths
Let's Encrypt - https://letsencrypt.org
Cost
It's 2016 and Let's Encrypt is a thing
Performance impact is negligible
When Gmail switched to SSL/TLS by default -> no special tuning, less than 1% extra CPU, and only milliseconds of added latency
9 . 5
Account Separation
Yes, that's actually a fruit, but I started getting desperate
Your backup user doesn't need write access to your master DB
Whole companies have been lost because they used one account (Code Spaces, 2014)
Minimize blast radius
Backups on a separate account!
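A sketch of a read-only backup user, assuming the master data sits in an S3 bucket (the bucket name is hypothetical): this IAM policy lets a backup job list and fetch objects but grants no write or delete actions. The backups themselves would still be written into a separate account.

```python
def backup_reader_policy(bucket):
    """Read-only IAM policy for a backup job: list and fetch, never write or delete."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],               # bucket-level action
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],                # object-level action
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }
```

Serialize it with `json.dumps(...)` when attaching it to the backup role.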
9 . 6
Short Lived Credentials
Just like passwords, you need to rotate your keys
Limit blast radius
AWS STS (Security Token Service) hands out temporary credentials
Short lived because...
9 . 7
Devs keep putting keys in GitHub!
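A cheap guard is to scan files for anything shaped like an AWS access key ID before committing. A sketch assuming the usual format (an `AKIA` prefix for long-term keys or `ASIA` for temporary ones, then 16 uppercase letters or digits); dedicated secret scanners cover many more credential types, so treat this as illustration only.

```python
import re

# AWS access key IDs: "AKIA" (long-term) or "ASIA" (temporary) + 16 uppercase letters/digits
AWS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")

def find_key_ids(text):
    """Return every substring of `text` that looks like an AWS access key ID."""
    return AWS_KEY_ID.findall(text)

def scan_files(paths):
    """Yield (path, line_number, match) for each suspected key ID in the given files."""
    for path in paths:
        with open(path, encoding="utf-8", errors="ignore") as f:
            for line_number, line in enumerate(f, start=1):
                for match in find_key_ids(line):
                    yield path, line_number, match
```

Wired into a pre-commit hook, any hit would block the commit until the key is removed (and rotated).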
9 . 8
Scanners grab creds and spin up instances for Bitcoin mining, etc.
Short-lived creds limit the blast radius
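A sketch of fetching short-lived credentials through STS `assume_role`, assuming a boto3 STS client and a role you are allowed to assume. The client is passed in so the function can be tested without AWS; 900 seconds (15 minutes) is the shortest duration STS accepts.

```python
def temporary_credentials(sts_client, role_arn, session_name, duration_seconds=900):
    """Return short-lived credentials for `role_arn` as boto3.client() keyword arguments.

    `sts_client` is expected to behave like boto3.client("sts").
    """
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
        DurationSeconds=duration_seconds,
    )
    creds = response["Credentials"]
    # These stop working on their own at creds["Expiration"], so a leaked copy goes stale.
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }
```

Usage would look like `boto3.client("s3", **temporary_credentials(boto3.client("sts"), role_arn, "notebook"))`, where `role_arn` is a hypothetical role ARN.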
10
Hey, that's just general security stuff!
What about big data?!
Take a Breather
All that stuff gates access to the data
Even if you do nothing else, this is your first line of defense
But yes, let's talk about data
11 . 1
Signatures
Provides that "no one messed with this" guarantee
Teased with "Analytical Confidence"
Complement to encryption
Signing vs hashing
Signing proves who produced the data - a plain hash doesn't, because an attacker can just recompute the hash after changing the data
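The difference is easy to see with Python's standard `hmac` module. HMAC is a keyed MAC rather than a public-key signature: it proves the data came from someone holding the shared key. Proving identity to parties who shouldn't hold that key calls for public-key signatures (e.g. Ed25519 in the `cryptography` library). A minimal sketch:

```python
import hashlib
import hmac

def plain_hash(data):
    """Anyone can recompute this, including whoever tampered with the data."""
    return hashlib.sha256(data).hexdigest()

def sign(data, key):
    """HMAC-SHA256 tag: only someone holding `key` can produce it."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()

def verify(data, key, tag):
    # compare_digest avoids leaking how much of the tag matched through timing
    return hmac.compare_digest(sign(data, key), tag)
```

An attacker who edits a file can recompute `plain_hash` and overwrite the stored digest, but can't produce a tag that passes `verify` without the key.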
11 . 2
Encryption
But not really...
Cryptography Engineering: Design Principles and Practical Applications
First thing people think of
Not an "End All Be All"
People think slapping on encryption solves all the security issues
Really hard to get right
Confidentiality vs integrity
AES - authenticated modes like GCM provide integrity; others, like CBC or CTR on their own, don't
Waaay more to this than I can cover - Google or book
But there are other challenges...
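On the AES modes point: an authenticated mode like AES-GCM gives both confidentiality and integrity, so tampering is caught at decryption time. A sketch using the `cryptography` package's `AESGCM`; prepending the random 12-byte nonce to the ciphertext is one common storage convention, assumed here.

```python
import os
from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_SIZE = 12  # 96-bit nonce, the size GCM is designed around

def encrypt(key, plaintext, associated_data=None):
    """Return nonce || ciphertext; AESGCM appends the authentication tag itself."""
    nonce = os.urandom(NONCE_SIZE)  # must never repeat for the same key
    return nonce + AESGCM(key).encrypt(nonce, plaintext, associated_data)

def decrypt(key, blob, associated_data=None):
    """Split off the nonce and decrypt; raises InvalidTag if anything was altered."""
    nonce, ciphertext = blob[:NONCE_SIZE], blob[NONCE_SIZE:]
    return AESGCM(key).decrypt(nonce, ciphertext, associated_data)

def is_authentic(key, blob, associated_data=None):
    """True if the blob decrypts cleanly, False if the integrity check fails."""
    try:
        decrypt(key, blob, associated_data)
        return True
    except InvalidTag:
        return False
```

Generate the key with `AESGCM.generate_key(bit_length=256)` and keep it away from the data.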
11 . 3
Encrypted data is a pain
It's always going to be slower
Some tools just freak out at the thought - Bye bye grep
So we really want to work with unencrypted data
But for minimal time and only in certain places
Tooling can help with this - but it requires effort
Callback to companies going out of business, etc.
11 . 4
Key management is a pain
More data = more keys
New data should use a different key
A leaked key doesn't reveal all your data
Now you have many keys to manage
Keep keys somewhere else!
A reference to the key used to encrypt should be kept with the data
S3 metadata can keep a key reference
Key serials - don't forget 'em
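A sketch of "new data, new key, keep a reference with the data", using `cryptography`'s Fernet. The in-memory `KEY_STORE` is a hypothetical stand-in for a real key service; the point is that only the key ID travels with the ciphertext.

```python
import uuid
from cryptography.fernet import Fernet

# Hypothetical key store: in practice this lives somewhere else (a KMS, a vault),
# never next to the data it protects.
KEY_STORE = {}

def encrypt_dataset(data):
    """Encrypt with a brand-new key; return (key_id, token) and store key_id with the data."""
    key_id = str(uuid.uuid4())
    KEY_STORE[key_id] = Fernet.generate_key()
    return key_id, Fernet(KEY_STORE[key_id]).encrypt(data)

def decrypt_dataset(key_id, token):
    """Look up the key via the reference kept alongside the data, then decrypt."""
    return Fernet(KEY_STORE[key_id]).decrypt(token)
```

With S3, the key ID could ride along as object metadata, e.g. `put_object(..., Metadata={"key-id": key_id})`.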
11 . 5
Bummed out yet?
Yes, I know that's not a vegetable
It's got the "vegetable" tag on Flickr... so remember the importance of correct tagging
11 . 6
Tips
Decrypt, but be safe
Split it up
Work with metadata
Use the tools...
Not all doom and gloom
Minimize the amount of time data is unencrypted
When actually working with it, keep it somewhere safe
You don't need all data for everything. Split things up
If you don't need the actual data, just work with metadata and avoid encryption altogether
Speaking of tools...
12 . 1
Tools!
You're not alone - lots of people care about security
There are low level libraries and high level tools
Every day, new tools that make security easier are being developed
Do NOT roll your own crypto
12 . 2
High Level
JWT
python-jose
https://jwt.io/
https://github.com/mpdavis/python-jose
Start at the top of the stack
JWT = JSON Web Token
JOSE = JavaScript Object Signing and Encryption
Signs tokens, but doesn't encrypt them: anyone holding a signed token can read its payload
Can be used for powerful web AuthN/AuthZ
Encryption via TLS for connections
12 . 3
JOSE
from jose import jws
signed = jws.sign({'a': 'b'}, 'secret', algorithm='HS256')
>>> 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhIjoiYiJ9.jiMyrsmD8AoHWeQgmxZ5yq8z0lXS67_
Signed JSON
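Because these tokens are signed rather than encrypted, anyone holding one can read its contents. A stdlib sketch that decodes the header and payload without verifying anything, using the complete first two segments of the token above (its signature is cut off on the slide). Never trust a payload read this way; verify the signature first.

```python
import base64
import json

def b64url_decode(segment):
    """Base64url-decode a JWS segment, restoring the '=' padding JWS strips."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

def peek(token):
    """Read a compact JWS's header and payload WITHOUT checking the signature."""
    header_b64, payload_b64, _signature = token.split(".")
    return json.loads(b64url_decode(header_b64)), json.loads(b64url_decode(payload_b64))
```

Running `peek` on the token above yields the header `{"alg": "HS256", "typ": "JWT"}` and the payload `{"a": "b"}` in plain sight.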
12 . 4
Low Level
PyCrypto
PyOpenSSL
https://github.com/dlitz/pycrypto
https://github.com/pyca/pyopenssl
Taking a step down the stack here...
Both provide low-level crypto operations
You have the POWER!
12 . 5
Too scary!
cryptography https://github.com/pyca/cryptography
OK, maybe we went too far down the stack
Most of us don't need low level primitives
By the same folks doing PyOpenSSL
Goal is to have human friendly crypto
12 . 6
Example
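from cryptography.fernet import Fernet
key = Fernet.generate_key()  # random key, URL-safe base64 encoded; store it apart from the data
f = Fernet(key)
token = f.encrypt(b"My giant binary blob")  # AES-128-CBC plus an HMAC-SHA256 tag
f.decrypt(token)  # returns b"My giant binary blob"; raises InvalidToken if the token was altered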
Data encryption and decryption is easy!
12 . 7
Pickle
Popular, so they deserve a note
Be careful with pickles
12 . 8
Oh, wait a second...
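import pickle
import subprocess

class Payload(object):
    """ Executes /bin/ls when unpickled. """
    def __reduce__(self):
        """ Run /bin/ls on the remote machine. """
        return (subprocess.Popen, (('/bin/ls',),))

malicious = pickle.dumps(Payload())  # pickle.loads(malicious) on the target runs /bin/ls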
Example from Travis Cunningham
Unpickling executes arbitrary code on the box
12 . 9
Mitigations
Sign your pickles
Secure transfer
Don't pickle
Sign and verify pickles before unpickling
Secure transfer between trusted endpoints doesn't help if the pickle can be changed in between (e.g. at rest)
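A minimal sketch of "sign and verify before unpickling" with the standard library: prepend an HMAC-SHA256 tag and refuse to call `pickle.loads` unless it checks out. It only helps if whoever could tamper with the pickle doesn't have the key; where possible, a data-only format like JSON avoids the problem entirely.

```python
import hashlib
import hmac
import pickle

TAG_SIZE = hashlib.sha256().digest_size  # 32 bytes

def dumps_signed(obj, key):
    """Pickle `obj` and prepend an HMAC-SHA256 tag computed with `key`."""
    data = pickle.dumps(obj)
    return hmac.new(key, data, hashlib.sha256).digest() + data

def loads_verified(blob, key):
    """Check the tag BEFORE unpickling; refuse anything that doesn't verify."""
    tag, data = blob[:TAG_SIZE], blob[TAG_SIZE:]
    expected = hmac.new(key, data, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("pickle signature mismatch; refusing to unpickle")
    return pickle.loads(data)
```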
13 . 1
Providers
Changing topics entirely here...
All that is great, but why do it when someone can provide it for you?!
Terribly biased to AWS, so we're going to focus on that, but a lot of this applies to any provider
13 . 2
Server Side Encryption
S3 lets you do server side encryption
Can have bucket policy to enforce
Keeps data encrypted at rest (though S3 decrypts it for any request with read access)
Great for regulatory compliance, but trusts Amazon wholly
Although they are trustworthy
More than 50% of IT professionals don't fully trust providers to not leak data
If you're paranoid...
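Back on server side encryption: a sketch of the "bucket policy to enforce" idea as a Python dict, following AWS's commonly used pattern of denying `s3:PutObject` requests that don't ask for SSE-S3 (`AES256`). The bucket name is hypothetical. S3 has since begun applying SSE-S3 to new objects by default, so check current AWS documentation before relying on this.

```python
def require_sse_policy(bucket):
    """Bucket policy denying uploads that don't request SSE-S3 (AES256) encryption."""
    objects = f"arn:aws:s3:::{bucket}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Reject uploads that ask for some other encryption setting
                "Sid": "DenyWrongEncryptionHeader",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": objects,
                "Condition": {"StringNotEquals": {"s3:x-amz-server-side-encryption": "AES256"}},
            },
            {
                # Reject uploads that omit the encryption header entirely
                "Sid": "DenyMissingEncryptionHeader",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": objects,
                "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
            },
        ],
    }
```

It could be applied with boto3's `put_bucket_policy(Bucket=..., Policy=json.dumps(...))`; uploads then pass `ServerSideEncryption="AES256"` to `put_object`.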
13 . 3
Client Side Encryption
s3-encryption https://github.com/boldfield/s3-encryption
You encrypt things before sending to storage
See AmazonS3EncryptionClient
Key idea: you can add security to existing libraries! boto + cryptography = cool
Transparent/easy security gets people onboard
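The "boto + cryptography" combination in its simplest form: encrypt locally with Fernet, then upload, so the provider only ever stores ciphertext. This is a hand-rolled sketch, not the API or format of s3-encryption or `AmazonS3EncryptionClient`; the S3 client is passed in (e.g. `boto3.client("s3")`) so it can be swapped for a fake in tests.

```python
from cryptography.fernet import Fernet

def upload_encrypted(s3_client, bucket, key, data, fernet_key):
    """Encrypt locally, then upload: the storage provider only sees ciphertext."""
    token = Fernet(fernet_key).encrypt(data)
    s3_client.put_object(Bucket=bucket, Key=key, Body=token)

def download_decrypted(s3_client, bucket, key, fernet_key):
    """Download the ciphertext and decrypt it locally."""
    token = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
    return Fernet(fernet_key).decrypt(token)
```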
13 . 4
Speaking of keys...
Key Management Service
You want keys, KMS gives you keys
Lets Amazon manage all the keys you're using for encryption
KMS keeps the key and limits access as you see fit
Here again, Java SDK has a full feature set to emulate
Direct envelope (key+data) encryption method
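A sketch of envelope encryption with KMS, assuming a boto3 KMS client and a hypothetical key alias: `generate_data_key` returns a data key both in plaintext (used locally, then discarded) and wrapped by the KMS master key (stored next to the data). Decrypting asks KMS to unwrap the data key first.

```python
import base64
from cryptography.fernet import Fernet

def envelope_encrypt(kms_client, key_id, data):
    """Encrypt `data` under a fresh data key; the master key never leaves KMS."""
    resp = kms_client.generate_data_key(KeyId=key_id, KeySpec="AES_256")
    # Plaintext is 32 raw bytes; Fernet expects them URL-safe base64 encoded.
    token = Fernet(base64.urlsafe_b64encode(resp["Plaintext"])).encrypt(data)
    # Keep only the KMS-wrapped copy of the data key alongside the data.
    return {"encrypted_key": resp["CiphertextBlob"], "ciphertext": token}

def envelope_decrypt(kms_client, envelope):
    """Ask KMS to unwrap the data key, then decrypt locally."""
    plaintext_key = kms_client.decrypt(CiphertextBlob=envelope["encrypted_key"])["Plaintext"]
    return Fernet(base64.urlsafe_b64encode(plaintext_key)).decrypt(envelope["ciphertext"])
```

Usage would be along the lines of `envelope_encrypt(boto3.client("kms"), "alias/analytics", data)`, where the alias is hypothetical.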
14
Conclusion!
Security is important! Trust is priceless.
Do the basics - they are better than nothing
Python has lots of security tools
Providers can help
Thank goodness he's done...
15
Thanks/Promo Time
You fine people
District Data Labs - http://www.districtdatalabs.com
My first time speaking, so thanks for being my inaugural audience
Feedback welcome!
DDL is a DC-based data science research group
Come see what we're up to!
Talks in this room at 11:30, 1:15, and 3:00 tomorrow
16
Questions?
Twitter: @will2041
Slides: http://bit.ly/2dBcgVx
http://www.slideshare.net/WilliamVoorhees1/eat-your-vegetables-data-security-for-data-scientists
LETTUCE go out on a good note with some questions

More Related Content

Similar to Eat Your Vegetables - Data Security for Data Scientists

Intro to web 2.0 Security
Intro to web 2.0 SecurityIntro to web 2.0 Security
Intro to web 2.0 SecurityJP Bourget
 
The ultimate privacy guide
The ultimate privacy guideThe ultimate privacy guide
The ultimate privacy guideJD Liners
 
Webinar Security: Apps of Steel transcription
Webinar Security:  Apps of Steel transcriptionWebinar Security:  Apps of Steel transcription
Webinar Security: Apps of Steel transcriptionService2Media
 
Cloud Security - Idealware
Cloud Security - IdealwareCloud Security - Idealware
Cloud Security - IdealwareIdealware
 
apsec 7 Golden Rules Data Leakage Prevention / DLP
apsec 7 Golden Rules Data Leakage Prevention / DLPapsec 7 Golden Rules Data Leakage Prevention / DLP
apsec 7 Golden Rules Data Leakage Prevention / DLPandreasschuster
 
The Cloud Beckons, But is it Safe?
The Cloud Beckons, But is it Safe?The Cloud Beckons, But is it Safe?
The Cloud Beckons, But is it Safe?NTEN
 
Security.pptx
Security.pptxSecurity.pptx
Security.pptxjohn6938
 
Black Ops of Fundamental Defense:
Black Ops of Fundamental Defense:Black Ops of Fundamental Defense:
Black Ops of Fundamental Defense:Recursion Ventures
 
Hacking databases
Hacking databasesHacking databases
Hacking databasessunil kumar
 
Hacking databases
Hacking databasesHacking databases
Hacking databasessunil kumar
 
Security for AWS : Journey to Least Privilege (update)
Security for AWS : Journey to Least Privilege (update)Security for AWS : Journey to Least Privilege (update)
Security for AWS : Journey to Least Privilege (update)dhubbard858
 
Security for AWS: Journey to Least Privilege
Security for AWS: Journey to Least PrivilegeSecurity for AWS: Journey to Least Privilege
Security for AWS: Journey to Least PrivilegeLacework
 
Introduction to cloud computing
Introduction to cloud computingIntroduction to cloud computing
Introduction to cloud computingDigital Shende
 
Why isn't infosec working? Did you turn it off and back on again?
Why isn't infosec working? Did you turn it off and back on again?Why isn't infosec working? Did you turn it off and back on again?
Why isn't infosec working? Did you turn it off and back on again?Rob Fuller
 
Survey Presentation About Application Security
Survey Presentation About Application SecuritySurvey Presentation About Application Security
Survey Presentation About Application SecurityNicholas Davis
 
10 Tips to Strengthen Your Insider Threat Program
10 Tips to Strengthen Your Insider Threat Program 10 Tips to Strengthen Your Insider Threat Program
10 Tips to Strengthen Your Insider Threat Program Dtex Systems
 
You may be compliant...
You may be compliant...You may be compliant...
You may be compliant...Greg Swedosh
 
You may be compliant, but are you really secure?
You may be compliant, but are you really secure?You may be compliant, but are you really secure?
You may be compliant, but are you really secure?Thomas Burg
 
Nick Drage & Fraser Scott - Epic battle devops vs security
Nick Drage & Fraser Scott - Epic battle devops vs securityNick Drage & Fraser Scott - Epic battle devops vs security
Nick Drage & Fraser Scott - Epic battle devops vs securityDevSecCon
 

Similar to Eat Your Vegetables - Data Security for Data Scientists (20)

Intro to web 2.0 Security
Intro to web 2.0 SecurityIntro to web 2.0 Security
Intro to web 2.0 Security
 
The ultimate privacy guide
The ultimate privacy guideThe ultimate privacy guide
The ultimate privacy guide
 
Webinar Security: Apps of Steel transcription
Webinar Security:  Apps of Steel transcriptionWebinar Security:  Apps of Steel transcription
Webinar Security: Apps of Steel transcription
 
Cloud Security - Idealware
Cloud Security - IdealwareCloud Security - Idealware
Cloud Security - Idealware
 
apsec 7 Golden Rules Data Leakage Prevention / DLP
apsec 7 Golden Rules Data Leakage Prevention / DLPapsec 7 Golden Rules Data Leakage Prevention / DLP
apsec 7 Golden Rules Data Leakage Prevention / DLP
 
The Cloud Beckons, But is it Safe?
The Cloud Beckons, But is it Safe?The Cloud Beckons, But is it Safe?
The Cloud Beckons, But is it Safe?
 
The Cloud Beckons, But is it Safe?
The Cloud Beckons, But is it Safe?The Cloud Beckons, But is it Safe?
The Cloud Beckons, But is it Safe?
 
Security.pptx
Security.pptxSecurity.pptx
Security.pptx
 
Black Ops of Fundamental Defense:
Black Ops of Fundamental Defense:Black Ops of Fundamental Defense:
Black Ops of Fundamental Defense:
 
Hacking databases
Hacking databasesHacking databases
Hacking databases
 
Hacking databases
Hacking databasesHacking databases
Hacking databases
 
Security for AWS : Journey to Least Privilege (update)
Security for AWS : Journey to Least Privilege (update)Security for AWS : Journey to Least Privilege (update)
Security for AWS : Journey to Least Privilege (update)
 
Security for AWS: Journey to Least Privilege
Security for AWS: Journey to Least PrivilegeSecurity for AWS: Journey to Least Privilege
Security for AWS: Journey to Least Privilege
 
Introduction to cloud computing
Introduction to cloud computingIntroduction to cloud computing
Introduction to cloud computing
 
Why isn't infosec working? Did you turn it off and back on again?
Why isn't infosec working? Did you turn it off and back on again?Why isn't infosec working? Did you turn it off and back on again?
Why isn't infosec working? Did you turn it off and back on again?
 
Survey Presentation About Application Security
Survey Presentation About Application SecuritySurvey Presentation About Application Security
Survey Presentation About Application Security
 
10 Tips to Strengthen Your Insider Threat Program
10 Tips to Strengthen Your Insider Threat Program 10 Tips to Strengthen Your Insider Threat Program
10 Tips to Strengthen Your Insider Threat Program
 
You may be compliant...
You may be compliant...You may be compliant...
You may be compliant...
 
You may be compliant, but are you really secure?
You may be compliant, but are you really secure?You may be compliant, but are you really secure?
You may be compliant, but are you really secure?
 
Nick Drage & Fraser Scott - Epic battle devops vs security
Nick Drage & Fraser Scott - Epic battle devops vs securityNick Drage & Fraser Scott - Epic battle devops vs security
Nick Drage & Fraser Scott - Epic battle devops vs security
 

Recently uploaded

Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceanilsa9823
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 

Recently uploaded (20)

Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 

Eat Your Vegetables - Data Security for Data Scientists

  • 1. 1 Eat Your Vegetables Data Security for Data Scientists Welcome to Eat Your Vegetables! Hope you're having a great conference so far
  • 2. 2 Agenda 1. Agenda 2. Intro 3. Convincing Time 4. Security Concepts 5. Tools 6. Questions The agenda - self referential, eh? Intro stuff Why this is important (if you're not convinced already) Some basic tips for security - NOT definitions Tools to make security easier Time for questions at the end Slides will be online
  • 3. 3 Name: Will Voorhees Occupation: Software Engineer Favorite Color: Who's this guy?! - Your SPEAKER Pic from Halloween 2010 Been doing tech for 15 years, but real software development for about 5 Currently working in security org creating enterprise security tools
  • 4. Orange Twitter: @will2041 Why are we here? Remember this bit from Red vs Blue? To talk about vegetables! Vegetables are good for you And by "vegetable", I mean security You've got to EAT YOUR VEGTABLES, just like mom said Vegetables can be tasty
  • 5. 45 Someone Wants Your Data No, seriously. Someonewants your data.This is the predicate to everything "Well duh, my data is awesome!" Attackers are interested in all kinds of data for all kinds of reasons Even Pokemon Go accounts have value It's not always monetary - 2015 Ashley Madison as an example "Hacktivism"
  • 6. 6 . 1 Why should I care? Obvious stuff - money About a year ago, insurance company Lloyd's estimated $400 bil/year lost to hacking So what other reasons do I have?
  • 7. 6 . 2 HIPPAA SOX Trade Sanctions (Government) contracts Etc. 6 . 3 Analytical Confidence Encryption/signing can provide a "no one touched this" guarantee Nice out of box benefit of adding security Nothing like re-running a model on data that's changed and freaking out
  • 8. 6 . 4 Fun! You're kidding... Puzzles! Red Team vs Blue Team competition Caesar cipher Some of the math is interesting
  • 9. 6 . 5 You got me! That was filler. I'm sorry They're valid reasons, they're just not the most important reason
  • 10. 6 . 6 Trust It's what's for dinner. We're data stewards - everyone trusts us with data Doesn't matter what data you have, someone trusts you with it We are ultimately responsible for our data Magic Information Security elves won't save us
  • 11. 6 . 7 Trust Has a Cost Governments lose national security - OPM (Office of Personnel Management), IRS e-commerce sites lose sales Remember that money thing? I lied! Journal of Cyber Security says a breach costs as much as the defense It's cheaper to get hacked So maybe it's not about money...
  • 12. 6 . 8 Human Trust Trust of people isn't as easily quantified Target, Ashley Madison still in business - But what's the impact? This is all very murky - needs more research In absence of data, do what's right
  • 13. 7 What can I do? Good news: there's some easy stuff Bad news: there's some really hard stuffEasy stuff is pretty easy, once you learn it Hard stuff is really hard, even after you learn it Think Heartbleed from 2014 for OpenSSL Buffer overflow bug let attacker get memory dump
  • 14. 8 Patching can proactively save your butt Doing it often means you know how to do it quickly Quick response can be really important - think heartbleed or shellshock 9 . 1 The Easy Stuff I claim that the easy stuff is pretty easy! Sad fact: doing the basics makes you better than a lot of companies
  • 15. 9 . 2 Access Control Don't leave things open to the world! Some restriction is better than no restriction Accounts can only do certain things This is what keeps your intern from deleting your data lake Let's use Nissan as an example Leaf completely open to internet for physical control Minimum bad PR, maximum loss of life
  • 16. 9 . 3 TLS a.k.a. SSL TLS replaced SSL Everyone still calls TLS "SSL" What - authentication and encryption on connections
  • 17. 9 . 4 TLS Myths Let's encrypt - https://letsencrypt.org Cost It's 2016 and Let's Encrypt is a thing Performance impact is negligible Gmail to SSL -> No special tuning, less than 1% CPU and ms of latency
  • 18. 9 . 5 Account Separation Yes, that's actually a fruit, but I started getting desparate Your backup user doesn't need write access to your master DB Whole companies have been lost because they used one account (Code Spaces) Minimize blast radius Backups on a separate account!
  • 19. 9 . 6 Short Lived Credentials Just like passwords, you need to rotate your keys Limit blast radius STS hands out temporary credentials Short lived because...
  • 20. 9 . 7 Devs keep putting keys in Github!
  • 21. 9 . 8 Scanners grab creds and spin up instances for Bitcoin mining, etc. Shorted lived creds limit the blast radius 10 Hey, that's just general security stuff! What about big data?!Take a Breather All that stuff gates access to the data Even if you do nothing else, this is your first line defense But yes, let's talk about data
  • 22. 11 . 1 Signatures. Provides that "no one messed with this" guarantee, teased earlier under "Analytical Confidence". A complement to encryption.
  • 23. Signing vs hashing: signing proves identity, while a hacker can just update a hash along with the data. 11 . 2 Encryption. But not really... Cryptography Engineering: Design Principles and Practical Applications. Encryption is the first thing people think of, but it's not an "end all, be all": people think slapping on encryption solves all the security issues, and it's really hard to get right. Confidentiality vs integrity: some AES modes (e.g. GCM) provide integrity, others (e.g. CBC, CTR) don't. There's waaay more to this than I can cover; Google it or read the book.
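To see why "a hacker can just update a hash," compare a plain SHA-256 checksum with a keyed HMAC. HMAC uses a shared secret rather than a public/private key pair, so it proves possession of the key rather than identity in the public-key sense, but it makes the same point. `KEY` and the CSV bytes are made up for illustration.

```python
import hashlib
import hmac

# Hypothetical key held by the data owner and never shipped alongside the data.
KEY = b"hypothetical-signing-key"


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def mac(data: bytes) -> str:
    return hmac.new(KEY, data, hashlib.sha256).hexdigest()


original = b"user_id,score\n1,34\n2,51\n"
stored_mac = mac(original)

# The attacker edits the data and, since SHA-256 needs no secret, publishes a fresh hash too.
tampered = b"user_id,score\n1,34\n2,15\n"
forged_hash = checksum(tampered)

# The consumer recomputes the hash of what it received: it matches, so the edit goes unnoticed.
hash_check_passes = hmac.compare_digest(forged_hash, checksum(tampered))
# Without KEY the attacker cannot produce a matching MAC, so this check fails.
mac_check_passes = hmac.compare_digest(stored_mac, mac(tampered))
```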
  • 24. But there are other challenges... 11 . 3 Encrypted data is a pain. It's always going to be slower. Some tools just freak out at the thought: bye bye, grep. So we really want to work with unencrypted data, but for minimal time and only in certain places. Tooling can help with this, but it requires effort. (Callback to companies going out of business, etc.)
  • 25. 11 . 4 Key management is a pain. More data = more keys. New data should use a different key, so a leaked key doesn't reveal all your data. Now you have many keys to manage. Keep keys somewhere else! A reference to the key used to encrypt should be kept with the data; S3 metadata can hold a key reference. Key serials: don't forget 'em.
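A minimal sketch of "new data, new key, keep only a reference with the data," using the `cryptography` library's Fernet. `KEY_STORE` is a hypothetical in-memory dict standing in for a separate key service, and the `{"metadata": ..., "body": ...}` layout is an assumption that mirrors an S3 object with user metadata.

```python
import uuid

from cryptography.fernet import Fernet

# Hypothetical key store. In real life this is a separate service (KMS, a secrets
# manager, an HSM), never the same place the encrypted data lives.
KEY_STORE = {}


def encrypt_dataset(plaintext: bytes) -> dict:
    """Encrypt with a brand-new key and keep only a reference to that key."""
    key_id = str(uuid.uuid4())
    KEY_STORE[key_id] = Fernet.generate_key()
    body = Fernet(KEY_STORE[key_id]).encrypt(plaintext)
    # The key ID travels with the data (like S3 object metadata); the key itself does not.
    return {"metadata": {"key-id": key_id}, "body": body}


def decrypt_dataset(obj: dict) -> bytes:
    """Look up the key using the reference stored alongside the data."""
    key = KEY_STORE[obj["metadata"]["key-id"]]
    return Fernet(key).decrypt(obj["body"])
```

With boto3, the reference could ride along via `put_object(..., Metadata={"key-id": key_id})`. Because every dataset has its own key, leaking January's key doesn't open February's data.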
  • 26. 11 . 5 Bummed out yet? Yes, I know that's not a vegetable. It's got the "vegetable" tag on Flickr... so remember the importance of correct tagging.
  • 27. 11 . 6 Tips. Decrypt, but be safe. Split it up. Work with metadata. Use the tools... It's not all doom and gloom. Minimize the amount of time data is unencrypted, and when you're actually working with it, keep it somewhere safe. You don't need all the data for everything, so split things up. If you don't need the actual data, just work with metadata and avoid encryption altogether. Speaking of tools...
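For "decrypt, but be safe," one approach is to decrypt into a private temporary directory and delete it as soon as you're done. In this sketch `decrypted_copy` is a hypothetical helper, and `decrypt` is any bytes-to-bytes function you supply (for example, a Fernet object's `decrypt` method).

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def decrypted_copy(ciphertext: bytes, decrypt):
    """Yield a path to a plaintext copy only the current user can read, then delete it."""
    # mkdtemp creates a directory accessible only by the creating user.
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "plaintext")
    try:
        # O_EXCL + 0o600: create a brand-new file readable and writable only by the owner.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        with os.fdopen(fd, "wb") as fh:
            fh.write(decrypt(ciphertext))
        yield path
    finally:
        # Runs even if the analysis inside the with-block raises.
        shutil.rmtree(workdir, ignore_errors=True)
```

Deleting a file is not secure erasure; this limits how long plaintext sits around, it doesn't guarantee the bytes are unrecoverable.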
  • 28. 12 . 1 Tools! You're not alone: lots of people care about security. There are low-level libraries and high-level tools, and new tools that make security easier are being developed every day. Do NOT roll your own crypto.
  • 29. 12 . 2 High Level: JWT and python-jose (https://jwt.io/, https://github.com/mpdavis/python-jose). Start at the top of the stack. JWT = JSON Web Token; JOSE = JavaScript Object Signing and Encryption. As used here, it does signing but no encryption, so anyone can read the token's contents. It can be used for powerful web AuthN/AuthZ, with encryption coming from TLS on the connections.
  • 30. 12 . 3 JOSE
```python
from jose import jws

# Sign a JSON payload with a shared secret using HMAC-SHA256.
signed = jws.sign({'a': 'b'}, 'secret', algorithm='HS256')
# 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhIjoiYiJ9.jiMyrsmD8AoHWeQgmxZ5yq8z0lXS67_...'
```
Signed JSON
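To make "signing, but no encryption" concrete, here is a standard-library sketch of an HS256-signed token. It is not python-jose's implementation and skips checks a real library performs (such as validating the `alg` header); `sign_hs256`, `verify_hs256`, and `read_payload_without_key` are hypothetical names. For `{'a': 'b'}`, its first two segments are the same `eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhIjoiYiJ9` prefix as the python-jose output, and anyone can read the payload without the secret.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    # JOSE uses base64url encoding with the "=" padding stripped.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def sign_hs256(payload: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    sig = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url(sig)


def verify_hs256(token: str, secret: bytes) -> bool:
    signing_input, _, sig = token.rpartition(".")
    expected = b64url(hmac.new(secret, signing_input.encode(), hashlib.sha256).digest())
    return hmac.compare_digest(expected, sig)


def read_payload_without_key(token: str) -> dict:
    # No secret needed: the payload is only encoded, not encrypted.
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```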
  • 31. 12 . 4 Low Level: PyCrypto and PyOpenSSL (https://github.com/dlitz/pycrypto, https://github.com/pyca/pyopenssl). Taking a step down the stack here... Both provide low-level crypto operations. You have the POWER!
  • 32. 12 . 5 Too scary! cryptography (https://github.com/pyca/cryptography). OK, maybe we went too far down the stack; most of us don't need low-level primitives. It's by the same folks doing PyOpenSSL, and the goal is human-friendly crypto.
  • 33. 12 . 6 Example
```python
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # keep this safe: lose the key and you lose the data
f = Fernet(key)
token = f.encrypt(b"My giant binary blob")
f.decrypt(token)  # b'My giant binary blob'
```
Data encryption and decryption are easy!
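Fernet also authenticates its tokens, which is where the "no one touched this" guarantee comes from, and each token records when it was created. This sketch assumes a recent `cryptography` release that provides `encrypt_at_time`/`decrypt_at_time` (used so the example doesn't need a real clock); `flip_one_byte`, `is_intact`, and `is_fresh` are hypothetical helpers.

```python
import base64

from cryptography.fernet import Fernet, InvalidToken

key = Fernet.generate_key()
f = Fernet(key)
token = f.encrypt(b"My giant binary blob")


def flip_one_byte(tok: bytes, index: int = 30) -> bytes:
    """Simulate tampering: change one byte of the decoded token, then re-encode it."""
    raw = bytearray(base64.urlsafe_b64decode(tok))
    raw[index] ^= 0x01
    return base64.urlsafe_b64encode(bytes(raw))


def is_intact(tok: bytes) -> bool:
    """Fernet checks an HMAC over the whole token before decrypting anything."""
    try:
        f.decrypt(tok)
        return True
    except InvalidToken:
        return False


def is_fresh(tok: bytes, max_age: int, now: int) -> bool:
    """Reject tokens older than max_age seconds, using the timestamp inside the token."""
    try:
        f.decrypt_at_time(tok, ttl=max_age, current_time=now)
        return True
    except InvalidToken:
        return False


# A token "created" at a fixed time, so freshness can be checked without waiting.
old = f.encrypt_at_time(b"last quarter's extract", current_time=1_000_000)
```

In everyday code the same freshness check is just `f.decrypt(token, ttl=3600)`.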
  • 34. 12 . 7 Pickle. Pickles are popular, so they deserve a note: be careful with pickles.
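  • 35. 12 . 8 Oh, wait a second...
```python
import pickle
import subprocess


class Payload(object):
    """Executes /bin/ls when unpickled."""

    def __reduce__(self):
        """Run /bin/ls on the remote machine."""
        return (subprocess.Popen, (('/bin/ls',),))


malicious = pickle.dumps(Payload())
# pickle.loads(malicious) calls subprocess.Popen(('/bin/ls',)) on whichever machine loads it.
```
Example from Travis Cunningham. Unpickling executes code on the box.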
  • 36. 12 . 9 Mitigations. Sign your pickles. Secure transfer. Don't pickle. Sign and verify pickles before unpickling. Secure transfer between trusted endpoints doesn't help if the pickle can be changed somewhere in between.
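A minimal sketch of "sign and verify pickles before unpickling," using the standard library's `hmac`. `SIGNING_KEY`, `dumps_signed`, and `loads_verified` are hypothetical, and the tag-then-data layout is an assumption. Verification only proves the pickle came from someone holding the key; a key holder can still ship a malicious pickle, so "don't pickle" (e.g. JSON for plain data) remains the safest option.

```python
import hashlib
import hmac
import pickle

# Hypothetical key; in practice load it from a secret store shared by producer and consumer.
SIGNING_KEY = b"hypothetical-key-from-a-secret-store"
TAG_SIZE = 32  # length of a SHA-256 HMAC digest in bytes


def dumps_signed(obj) -> bytes:
    data = pickle.dumps(obj)
    tag = hmac.new(SIGNING_KEY, data, hashlib.sha256).digest()
    return tag + data


def loads_verified(blob: bytes):
    tag, data = blob[:TAG_SIZE], blob[TAG_SIZE:]
    expected = hmac.new(SIGNING_KEY, data, hashlib.sha256).digest()
    # Check the tag BEFORE unpickling: pickle.loads on untrusted bytes can run code.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("pickle signature mismatch; refusing to unpickle")
    return pickle.loads(data)
```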
  • 37. 13 . 1 Providers. Changing topics entirely here... All that is great, but why do it yourself when someone can provide it for you?! I'm terribly biased toward AWS, so we're going to focus on that, but a lot of this applies to any provider.
  • 38. 13 . 2 Server Side Encryption. S3 lets you do server-side encryption, and a bucket policy can enforce it. It prevents a data leak from revealing everything. Great for regulatory compliance, but it trusts Amazon wholly. Although they are trustworthy, more than 50% of IT professionals don't fully trust providers not to leak data. If you're paranoid...
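Enforcing server-side encryption with a bucket policy could look like the following, built as a Python dict. It follows the pattern AWS has documented for requiring the `x-amz-server-side-encryption` header with SSE-S3 (`AES256`); the bucket name is hypothetical, and as written the policy would also reject uploads that request SSE-KMS instead.

```python
import json

BUCKET = "example-analytics-bucket"  # hypothetical bucket name
OBJECTS = f"arn:aws:s3:::{BUCKET}/*"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Deny uploads that ask for some other kind of encryption.
            "Sid": "DenyWrongEncryptionHeader",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": OBJECTS,
            "Condition": {
                "StringNotEquals": {"s3:x-amz-server-side-encryption": "AES256"}
            },
        },
        {
            # Deny uploads that don't send the encryption header at all.
            "Sid": "DenyMissingEncryptionHeader",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": OBJECTS,
            "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
        },
    ],
}

policy_json = json.dumps(policy, indent=2)
```

You could apply it with boto3's `put_bucket_policy(Bucket=BUCKET, Policy=policy_json)`.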
  • 39. 13 . 3 Client Side Encryption: s3-encryption (https://github.com/boldfield/s3-encryption). You encrypt things before sending them to storage; see AmazonS3EncryptionClient in the AWS Java SDK. Key idea: you can add security to existing libraries! boto + cryptography = cool. Transparent, easy security gets people on board.
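The "add security to existing libraries" idea can be sketched as a thin wrapper that encrypts before upload and decrypts after download. `EncryptingStore` and `DictBackend` are hypothetical: the backend is an in-memory stand-in for an S3 client, and Fernet is used here because it's the `cryptography` recipe from the earlier example, not because s3-encryption necessarily uses it.

```python
from cryptography.fernet import Fernet


class EncryptingStore:
    """Wrap any object store so data is encrypted before it leaves this process.

    `backend` only needs put(name, data) and get(name) methods.
    """

    def __init__(self, backend, key: bytes):
        self._backend = backend
        self._fernet = Fernet(key)

    def put(self, name: str, data: bytes) -> None:
        self._backend.put(name, self._fernet.encrypt(data))

    def get(self, name: str) -> bytes:
        return self._fernet.decrypt(self._backend.get(name))


class DictBackend:
    """In-memory stand-in for S3, used only for illustration."""

    def __init__(self):
        self.objects = {}

    def put(self, name: str, data: bytes) -> None:
        self.objects[name] = data

    def get(self, name: str) -> bytes:
        return self.objects[name]
```

With boto3, `put` and `get` would map to `put_object(Bucket=..., Key=name, Body=data)` and `get_object(Bucket=..., Key=name)["Body"].read()`, and the calling code wouldn't change.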
  • 40. 13 . 4 Speaking of keys... Key Management Service (KMS). You want keys, KMS gives you keys. It makes Amazon manage all the keys you're using for encryption: KMS keeps the key and limits access as you see fit. Here again, the Java SDK has a full feature set worth emulating. Direct envelope (key + data) encryption method.
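Envelope encryption in miniature: each object gets a fresh data key, the data key is encrypted ("wrapped") under a master key, and only the wrapped key is stored with the data. Here a local Fernet key stands in for the KMS master key, which is an assumption for illustration only; with real KMS the master key never leaves AWS. `envelope_encrypt` and `envelope_decrypt` are hypothetical names.

```python
from cryptography.fernet import Fernet

# Stand-in for the KMS master key (illustration only; real KMS keeps this server-side).
MASTER_KEY = Fernet(Fernet.generate_key())


def envelope_encrypt(plaintext: bytes) -> dict:
    data_key = Fernet.generate_key()                   # fresh key for this object
    ciphertext = Fernet(data_key).encrypt(plaintext)   # data encrypted with the data key
    wrapped_key = MASTER_KEY.encrypt(data_key)         # data key encrypted with the master key
    # Store the wrapped key next to the data; the plaintext data key is simply dropped.
    return {"wrapped_key": wrapped_key, "ciphertext": ciphertext}


def envelope_decrypt(envelope: dict) -> bytes:
    data_key = MASTER_KEY.decrypt(envelope["wrapped_key"])  # the "ask KMS to unwrap" step
    return Fernet(data_key).decrypt(envelope["ciphertext"])
```

With boto3, `kms.generate_data_key(KeyId=..., KeySpec="AES_256")` returns both a `Plaintext` data key and its wrapped `CiphertextBlob`, and `kms.decrypt(CiphertextBlob=...)` unwraps it. That plaintext key is 32 raw bytes meant for an AES mode such as AES-GCM, not a Fernet key.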
  • 41. 14 Conclusion! Security is important! Trust is priceless. Do the basics; they are better than nothing. Python has lots of security tools. Providers can help. Thank goodness he's done...
  • 42. 15 Thanks/Promo Time. You fine people. District Data Labs: http://www.districtdatalabs.com. My first time speaking, so thanks for being my inaugural audience. Feedback welcome! DDL is a DC-based data science research group. Come see what we're up to! Talks in this room at 11:30, 1:15, and 3:00 tomorrow.