Analyzing Pwned Passwords with Spark
Kelley Robinson
@kelleyrobinson
Developer Evangelist, Account Security
👋 ☎
+👋 ☎ 🔐
BIG DATA & SECURITY @KELLEYROBINSON
Spark: then and now
The state of passwords
Spark in action
Big Data & Security
Apache Spark Ecosystem
Spark Abstractions
Then
Now
RDD (Resilient Distributed Dataset)
DataFrames
https://databricks.com/blog/2016/07/14/a-tale-of-three-apache-spark-apis-rdds-dataframes-and-datasets.html
RDDs
• Immutable & distributed collection
• Unstructured data
• Low-level transformation and control
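As a rough illustration of the low-level control RDDs offer, here is a minimal PySpark sketch. It assumes the Pwned Passwords download is a text file with one `SHA1:count` pair per line. The `sc` argument (an existing SparkContext), the file path, and the function names are hypothetical and not taken from the talk.

```python
def parse_line(line):
    """Split one 'HASH:COUNT' line into a (hash, count) tuple."""
    sha1_hex, _, count = line.strip().partition(":")
    return sha1_hex.upper(), int(count)


def most_common_hashes(sc, path, n=10):
    """Return the n most frequent hashes using only RDD operations.

    `sc` is assumed to be an existing pyspark.SparkContext and
    `path` a hypothetical location of the downloaded hash file.
    """
    lines = sc.textFile(path)      # RDD of raw, unstructured strings
    pairs = lines.map(parse_line)  # RDD of (hash, count) tuples
    # Negate the count so takeOrdered yields the largest counts first.
    return pairs.takeOrdered(n, key=lambda pair: -pair[1])
```

Every parsing step is spelled out by hand here, which is the trade-off of working with unstructured data at the RDD level.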
https://databricks.com/blog/2016/07/14/a-tale-of-three-apache-spark-apis-rdds-dataframes-and-datasets.html
DataFrames
• Structured data
• Fast
• SQL DSLs
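To make the structured-data and SQL points concrete, here is a minimal PySpark sketch of a DataFrame version of a "most common hashes" query. It assumes a `SHA1:count` text file. The `spark` session argument, the path, the view name `pwned`, the column names, and all function names are hypothetical and not from the talk.

```python
TABLE_NAME = "pwned"  # hypothetical temp view name


def top_n_query(n=10):
    """Build the SQL text for the n most frequent hashes."""
    return (
        f"SELECT sha1, occurrences FROM {TABLE_NAME} "
        f"ORDER BY occurrences DESC LIMIT {int(n)}"
    )


def load_passwords(spark, path):
    """Read 'HASH:COUNT' lines into a DataFrame with an explicit schema."""
    # Imported lazily so top_n_query works without PySpark installed.
    from pyspark.sql.types import LongType, StringType, StructField, StructType

    schema = StructType([
        StructField("sha1", StringType(), False),
        StructField("occurrences", LongType(), False),
    ])
    # Treat ':' as the field separator of an otherwise CSV-like file.
    return spark.read.csv(path, sep=":", schema=schema)


def most_common_hashes(spark, path, n=10):
    """Register the data as a temp view and query it with SQL."""
    load_passwords(spark, path).createOrReplaceTempView(TABLE_NAME)
    return spark.sql(top_n_query(n))
```

With an active SparkSession, `most_common_hashes(spark, path).show()` would print the result; the schema lets Spark plan the query instead of relying on hand-written parsing.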
Apache Spark Ecosystem
Python is the future of Spark
https://www.slideshare.net/databricks/composable-parallel-processing-in-apache-spark-and-weld
The State of Passwords
Spark in action
http://www.commitstrip.com/en/2018/04/27/security-security-security/?
Benefits of Spark
Fast
Flexible
Good for exploration
Proven for large systems
Challenges
Opaque error messages
Operationalizing
Documentation
http://heather.miller.am/blog/launching-a-spark-cluster-part-1.html
🤔
Big Data & Security
Passwords aren't great. But they aren't going away anytime soon.
Aim for a seamless user experience.
Developers have the responsibility to secure authentication.
https://twitter.com/PWTooStrong
Further Reading on Pwned Passwords
https://twilio.com/blog/2018/06/analyzing-pwned-passwords-with-apache-spark.html
https://www.troyhunt.com/pwned-passwords-in-practice-real-world-examples-of-blocking-the-worst-passwords/
https://www.twilio.com/blog/2018/06/round-up-libraries-for-checking-pwned-passwords-in-your-7-favorite-languages.html
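For readers who want a feel for what a Pwned Passwords check involves, here is a minimal standard-library sketch of the hashing side of a range (k-anonymity) lookup. It assumes the range response is text with one `SUFFIX:COUNT` pair per line. No network call is made, and the function names and sample response in the usage note are hypothetical, not taken from the talk or the linked libraries.

```python
import hashlib


def sha1_upper(password):
    """Return the uppercase hex SHA-1 digest of a password."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


def split_for_range_query(password, prefix_len=5):
    """Split the digest into the prefix to send and the suffix kept locally."""
    digest = sha1_upper(password)
    return digest[:prefix_len], digest[prefix_len:]


def count_in_range_response(suffix, response_text):
    """Look up a suffix in 'SUFFIX:COUNT' lines; return 0 if it is absent."""
    for line in response_text.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0
```

A caller would send only the prefix to the lookup service and then pass the returned text to `count_in_range_response`; a non-zero result means the password appears in the breach data and should be rejected.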
THANK YOU!
@kelleyrobinson
Resources
• Apache Spark
• Jacek's Spark Documentation
• Zeppelin
• RDDs vs. Datasets
• Running Spark on a Cluster
• Debugging PySpark (Video) (Slides)
• 2FA Guides
• Pwned Passwords Further Reading
Analyzing Pwned Passwords with Spark - OWASP Meetup July 2018
