Analyzing Pwned Passwords with Spark and Scala

Apache Spark aims to solve the problem of working with large-scale distributed data, and with access to over 500 million leaked passwords, we have a lot of data to dig through.

Published in: Engineering

  1. Analyzing Pwned Passwords with Spark. Kelley Robinson (@kelleyrobinson), Developer Evangelist
  2. +
  3. BIG DATA & SECURITY @KELLEYROBINSON • Spark: then and now • The state of passwords • Spark in action • Big Data ∩ Security
  6. Apache Spark Ecosystem
  7. Spark Abstractions • Then: RDD (Resilient Distributed Dataset) • Now: DataFrames / Datasets
  8. RDDs • Immutable & distributed collection • Unstructured data • Low-level transformation and control (https://databricks.com/blog/2016/07/14/a-tale-of-three-apache-spark-apis-rdds-dataframes-and-datasets.html)
  9. https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/prefer_reducebykey_over_groupbykey.html
  10. Datasets • Structured data • Strongly typed • Fast (https://databricks.com/blog/2016/07/14/a-tale-of-three-apache-spark-apis-rdds-dataframes-and-datasets.html)
  11. Datasets • Structured data • Strongly typed • Fast • SQL DSLs
  12. Apache Spark Ecosystem
  13. Scala has the most robust language API
  14. https://www.slideshare.net/databricks/composable-parallel-processing-in-apache-spark-and-weld
  15. https://twitter.com/CamJo89/status/996497423621996544
  16. Spark: then and now • The state of passwords • Spark in action • Big Data ∩ Security
  17. Spark: then and now • The state of passwords • Spark in action • Big Data ∩ Security
  18. https://twitter.com/dog_rates/status/986762231290490881
  19. Benefits • Fast • Flexible • Good for exploration • Proven for large systems
  20. Challenges • Opaque error messages • Operationalizing • Documentation (http://heather.miller.am/blog/launching-a-spark-cluster-part-1.html)
  21. The missing Spark documentation 👍💯 (https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/)
  22. Spark: then and now • The state of passwords • Spark in action • Big Data ∩ Security
  26. THANK YOU! @kelleyrobinson
  27. Spark Resources • Apache Spark • Jacek's Spark Documentation • Zeppelin • RDDs vs. Datasets • Running Spark on a Cluster. Security Resources • Pwned Passwords • Reverse SHA1 hashes • LastPass and 1Password • 2FA Guides
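
The contrast between RDDs and Datasets on slides 8 through 11, and the reduceByKey advice linked on slide 9, could be sketched in Scala roughly as follows. This is not code from the talk. It assumes each input line has the form "<SHA-1 hash>:<count>", and the names PwnedEntry, PwnedSketch, parseLine, totalsWithRdd and totalsWithDataset are hypothetical, as are the dummy "AAAA" and "BBBB" values standing in for real hashes. It also assumes the Spark SQL library is on the classpath.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.functions.sum

// Hypothetical record type, assuming each input line looks like "<SHA-1 hash>:<count>".
final case class PwnedEntry(hash: String, count: Long)

object PwnedSketch {

  // Parse one "HASH:COUNT" line; malformed lines yield None instead of throwing.
  def parseLine(line: String): Option[PwnedEntry] =
    line.trim.split(":") match {
      case Array(hash, count)
          if hash.nonEmpty && count.nonEmpty && count.length <= 18 && count.forall(_.isDigit) =>
        Some(PwnedEntry(hash, count.toLong))
      case _ => None
    }

  // RDD API: reduceByKey sums the counts per hash and combines values
  // within each partition before shuffling, unlike groupByKey.
  def totalsWithRdd(spark: SparkSession, lines: Seq[String]): Map[String, Long] =
    spark.sparkContext
      .parallelize(lines)
      .flatMap(line => parseLine(line).toList)
      .map(entry => (entry.hash, entry.count))
      .reduceByKey(_ + _)
      .collect()
      .toMap

  // Dataset API: the same aggregation over a typed Dataset[PwnedEntry],
  // expressed with the column-based DSL.
  def totalsWithDataset(spark: SparkSession, lines: Seq[String]): Map[String, Long] = {
    import spark.implicits._
    val entries: Dataset[PwnedEntry] =
      lines.toDS().flatMap(line => parseLine(line).toList)
    entries
      .groupBy($"hash")
      .agg(sum($"count").as("total"))
      .as[(String, Long)]
      .collect()
      .toMap
  }

  def main(args: Array[String]): Unit = {
    // Local mode is an assumption made so the sketch runs without a cluster.
    val spark = SparkSession.builder()
      .appName("pwned-sketch")
      .master("local[*]")
      .getOrCreate()
    try {
      // Dummy data, not real hashes.
      val lines = Seq("AAAA:3", "BBBB:10", "AAAA:2", "not a valid line")
      println(totalsWithRdd(spark, lines))
      println(totalsWithDataset(spark, lines))
    } finally spark.stop()
  }
}
```

Both functions should produce the same per-hash totals. The RDD version works on plain tuples with low-level control, while the Dataset version keeps the PwnedEntry type and leaves the aggregation to Spark SQL.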