
Migrate and Modernize Hadoop-Based Security Policies for Databricks

Data teams face a variety of tasks when migrating Hadoop-based platforms to Databricks. A common pitfall occurs during the migration step, when often-overlooked access control policies block adoption. This session will focus on best practices for migrating and modernizing Hadoop-based policies that govern data access (such as those in Apache Ranger or Apache Sentry). Data architects must consider new, fine-grained access control requirements when migrating from Hadoop architectures to Databricks in order to deliver secure access to as many data sets and data consumers as possible. This session will provide guidance across open source, AWS, Azure, and partner tools, such as Immuta, on how to scale existing Hadoop-based policies to dynamically support more classes of users, implement fine-grained access control, and leverage automation to protect sensitive data while maximizing utility, without manual effort.

  1. Migrate and Modernize Hadoop-Based Security Policies for Databricks. Steve Touw, CTO, Immuta
  2. Can I just migrate my Apache Ranger/Sentry policies directly to Databricks? (Or Presto, Synapse, Snowflake, Starburst, etc.)
  3. Can I just migrate my Apache Ranger/Sentry policies directly to Databricks? Migrate: Yes! Modernize: No! How do I get to yes for both? (That's what this talk is about.)
  4. Why Modernize?
  5. 2012: Development of Cloudera Access (later renamed Sentry) starts. 2013: XA Secure is created; it is later acquired by Hortonworks and becomes the basis of Apache Ranger. A lot has changed in 8 years...
  6. Hadoop Is No Longer the Center of the Universe. In a multi-cloud, multi-compute world, managing compute-specific controls across more than one of these systems is not feasible.
  7. Data Protection Laws of the World... Growing (source: https://www.dlapiperdataprotection.com/)
  8. Why Immuta: The Data "Fuel" Crisis. Privacy rules and regulations are driving a data "fuel crisis." [Chart, 1990 to 2025: compliant data, i.e., data legally usable for analytics, set against regulatory milestones: HIPAA (1996), GLBA (1999), HITECH (2009), GDPR (2018), CCPA (2020), and 350+ proposed privacy and infosec bills.]
  9. Why Immuta: So the data "tug of war" has begun... Legal/compliance: "We need to secure our data." Data analysts and scientists: "I need to use our data." In between: the data platform owner / data engineering.
  10. More Complexity, Changing Definitions of Privacy Preservation. Language from CCPA (and similar language in GDPR), section 1798.145(a)(5): "The obligations imposed on businesses by this title shall not restrict a business' ability to collect, use, retain, sell, or disclose consumer information that is deidentified or in the aggregate consumer information." Meaning: if you deidentify/anonymize the data, CCPA doesn't apply, yay! But nothing in life is free... personal information (PI) is defined as information "that identifies, relates to, describes, is capable of being associated with, or could reasonably be linked." That is an extremely broad definition!
  11. The Privacy vs. Utility Tradeoff: how do we balance the speed of the business with secure access to sensitive data? [Chart: the risk of data use on a spectrum from full privacy (closed) to full utility (open), with a sweet spot in between; legal/compliance pulls toward privacy, data analysts and scientists toward utility. Momentum: more stringent definitions of privacy are swinging the pendulum toward the privacy end.]
  12. The World Has Changed. We Are in the "Cloud Private Data Era": more regulatory and privacy concerns, more stringent definitions of privacy preservation, and a complex data platform ecosystem.
  13. The "Cloud Private Data Era" Has Created a Role Tidal Wave, driven by more regulatory and privacy concerns, more stringent definitions of privacy preservation, and a complex data platform ecosystem.
  14. Role Explosion Example (Real Customer Use Case). Each row-level policy in Ranger is tied to an individual role, but they are all doing the "same thing." If you want to show new data, you need a new role and a new policy. This isn't just Ranger; think AWS IAM roles too! [Screenshot, redacted: users associated to roles; the exact same policy is written over and over again, and the only change is the role.]
  15. Role-Based Access Control (RBAC) Is Broken. RBAC should really be named "static-based access control." It's like writing code without being able to use variables!
  16. Conceived Before the Cloud Private Data Era: in 2012, development of Cloudera Access (later renamed Sentry) starts; in 2013, XA Secure is created, later acquired by Hortonworks.
  17. You Must Do Both... If You Don't, You Won't Realize the Benefits of the Cloud. Migrate: Yes! Modernize: Yes!
  18. Let's Cover How to Fix Each of These... Solutions: attribute-based access control (ABAC), privacy enhancing technologies (PETs), and separation of policy from platform. Challenges: more regulatory and privacy concerns, more stringent definitions of privacy preservation, and a complex data platform ecosystem.
  19. Let's Cover How to Fix Each of These... Solutions: attribute-based access control (ABAC), privacy enhancing technologies (PETs), and separation of policy from platform. Challenges: more regulatory and privacy concerns, more stringent definitions of privacy preservation, and a complex data platform ecosystem.
  20. Separate Policy from Platform. Just as the big data era required the separation of compute from storage, the private data era requires the separation of policy from platform. This allows policy to be defined externally to the platform and enforced live in the platform, without creating data copies or views, covering table access controls, column-level controls, row-level security, and cell-level controls in a consistent manner, no matter your compute.
  21. You Must Also Separate Policy from Physical. Thousands of tables and columns lead to thousands of policies; abstracting them with logical metadata (PII, PHI, Address, SSN, etc.) leaves very few, understandable policies.
  22. Let's Cover How to Fix Each of These... Solutions: attribute-based access control (ABAC), privacy enhancing technologies (PETs), and separation of policy from platform. Challenges: more regulatory and privacy concerns, more stringent definitions of privacy preservation, and a complex data platform ecosystem.
  23. Remember This? RBAC should really be named "static-based access control"; it's like writing code without being able to use variables! Wouldn't it have been nice to write this with a variable and have the policy dynamically defined at run time? For example: organization_name IN (SELECT org_name FROM redacted WHERE role IN (@role)). This is ABAC, and it really should be called "dynamic-based access control."
  24. Ranger/Hortonworks Real Customer Example. They had 8 rules per table times 12 tables, for a total of 96 rules! [Screenshot, redacted: users associated to roles; the exact same policy is written over and over again, and the only change is the role.]
  25. With ABAC/Immuta, It's a Single Policy! This is because ABAC separates the user details from the policy and treats them as a read-time variable, which future-proofs the policy against new users and roles. We can also build the rule once and have it apply to all 12 tables through the logical metadata layer discussed previously, which future-proofs it against new tables as well.
  26. Let's Cover How to Fix Each of These... Solutions: attribute-based access control (ABAC), privacy enhancing technologies (PETs), and separation of policy from platform. Challenges: more regulatory and privacy concerns, more stringent definitions of privacy preservation, and a complex data platform ecosystem.
  27. How Do We Hit the Privacy vs. Utility Sweet Spot? How do we balance the speed of the business with secure access to sensitive data? [Chart: the risk of data use on a spectrum from full privacy (closed) to full utility (open), with a sweet spot between legal/compliance and data analysts and scientists.]
  28. I know stuff about Judd and Leslie. (Photo credit: Gawker)
  29. New York Taxi & Limousine Commission: data was released containing taxi pickups, dropoffs, locations, times, amounts, and tip amounts, among other fields. This seems pretty harmless, right?
  30. Well, Judd and Leslie May Not Think It's Harmless. This photo was geotagged (with time), so by simply querying by medallion and time, we know how much Judd and Leslie tipped!
  31. Three strategies: limit features, limit records, and limit functions. Techniques: reduced specificity (regular expressions for strings, rounding for numeric data); column restriction (hide or replace values with NULL); row restrictions (restrict access to certain types of rows); differential privacy (inject noise into aggregate measures based on privacy guarantees); hashing/encryption; local DP (randomly alter a percentage of data); aggregate-only (only allow aggregate functions on data); k-anonymization (suppress values that can lead to linkage attacks).
  32. Taxi data properly anonymized while providing utility. Generalize: remove precision from time and space. Randomize: replace with false but legitimate values at a specified rate. Mask: use a salted deterministic hash. [Columns are labeled as direct identifiers, indirect identifiers, and sensitive.]
  33. Terminology (Background). An attack occurs when the potential for re-identification exists. Factors include access, external knowledge, and incentives. The attack event (A) represents the probability that an attack occurs; the success event (S) represents the probability that an attack is successful.
  34. Data Risk. Mitigation modifies data to limit the ability of an adversary to make inferences about record ownership, participation, and attribute values. Techniques: k-anonymity, local differential privacy (LDP), differential privacy (DP), and masking.
  35. Context Risk. Mitigation "shrinks" the attack surface. Controls: limiting access, limiting types of queries, purpose limitations, agreements, creating disincentives, and training.
  36. [Figure: three attack (A) and success (S) diagrams, each paired with a risk vs. utility comparison.]
  37. OK, but I put all this effort into Sentry/Ranger; this seems like a big change...
  38. Migration Utility from Ranger/Sentry → Immuta: migrate policies, but also modernize.
  39. DEMO...
  40. Feedback: Your feedback is important to us. Don't forget to rate and review the sessions.
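
Slide 21's idea of abstracting physical tables and columns behind logical metadata can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not Immuta's API: the catalog, the tag names, the `clearance:*` user attributes, and the `masked_columns` helper are all assumptions made for the example. Two tag-based rules cover every tagged column in every table, instead of one policy per table and role.

```python
from dataclasses import dataclass, field

# Hypothetical physical catalog: table -> column -> logical tags on that column.
CATALOG = {
    "sales.customers": {"name": {"PII"}, "ssn": {"PII", "SSN"}, "region": set()},
    "hr.employees": {"full_name": {"PII"}, "home_address": {"PII", "Address"}, "dept": set()},
    "claims.visits": {"patient_id": {"PHI"}, "diagnosis": {"PHI"}, "visit_date": set()},
}


@dataclass(frozen=True)
class MaskingPolicy:
    """One logical rule: mask every column tagged `tag` unless the user holds `exempt_attribute`."""
    tag: str
    exempt_attribute: str


@dataclass
class User:
    name: str
    attributes: set = field(default_factory=set)


# Two policies written against tags, not against physical tables or roles.
POLICIES = [
    MaskingPolicy(tag="PII", exempt_attribute="clearance:pii"),
    MaskingPolicy(tag="PHI", exempt_attribute="clearance:phi"),
]


def masked_columns(user, table, policies=POLICIES, catalog=CATALOG):
    """Return the set of columns in `table` that must be masked for `user`."""
    masked = set()
    for column, tags in catalog[table].items():
        for policy in policies:
            if policy.tag in tags and policy.exempt_attribute not in user.attributes:
                masked.add(column)
    return masked
```

Adding a new table only requires tagging its columns; neither policy has to change.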
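
The difference between slide 23's static and dynamic rules can be sketched in Python. The `ORG_ACCESS` lookup stands in for the redacted table in the slide's SQL, and all names and data here are hypothetical. Under RBAC each role carries its own hard-coded predicate; under ABAC a single predicate is built at query time from the roles of the user who is asking, mirroring the `@role` variable.

```python
# Hypothetical stand-in for the redacted lookup table: which orgs each role may see.
ORG_ACCESS = {
    "role_a": {"Org A"},
    "role_b": {"Org B"},
    "role_c": {"Org A", "Org C"},
}

ROWS = [
    {"organization_name": "Org A", "amount": 10},
    {"organization_name": "Org B", "amount": 20},
    {"organization_name": "Org C", "amount": 30},
]

# RBAC style: one hard-coded predicate per role; every new role needs a new policy.
STATIC_POLICIES = {
    "role_a": lambda row: row["organization_name"] in {"Org A"},
    "role_b": lambda row: row["organization_name"] in {"Org B"},
    "role_c": lambda row: row["organization_name"] in {"Org A", "Org C"},
}


def dynamic_policy(user_roles):
    """ABAC style: a single rule whose @role variable is bound at query time."""
    allowed = set().union(*(ORG_ACCESS.get(role, set()) for role in user_roles))
    return lambda row: row["organization_name"] in allowed


def query(rows, predicate):
    """Apply a row-level filter the way an enforcement engine would at read time."""
    return [row for row in rows if predicate(row)]
```

Granting a new role access only requires a new row in the lookup table, which is what lets slide 24's 96 rules collapse into slide 25's single policy.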
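
As one concrete instance of the techniques on slide 31, here is a minimal k-anonymity check in Python. The record layout and quasi-identifier names are hypothetical. For simplicity the sketch suppresses whole records whose combination of quasi-identifiers is shared by fewer than k records; real tools more often suppress or generalize individual values, which preserves more rows.

```python
from collections import Counter


def _key(record, quasi_identifiers):
    # The combination of quasi-identifier values defines a record's equivalence class.
    return tuple(record[q] for q in quasi_identifiers)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination appears at least k times."""
    counts = Counter(_key(r, quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())


def suppress_small_classes(records, quasi_identifiers, k):
    """Drop records whose equivalence class has fewer than k members (record suppression)."""
    counts = Counter(_key(r, quasi_identifiers) for r in records)
    return [r for r in records if counts[_key(r, quasi_identifiers)] >= k]


# Hypothetical, already-generalized taxi trips.
trips = [
    {"pickup_zone": "Midtown", "pickup_hour": 19, "tip": 2.0},
    {"pickup_zone": "Midtown", "pickup_hour": 19, "tip": 5.0},
    {"pickup_zone": "SoHo", "pickup_hour": 23, "tip": 0.0},
]
QIS = ["pickup_zone", "pickup_hour"]
```

The lone SoHo trip is unique on its quasi-identifiers, so it is exactly the kind of record a linkage attack (like the geotagged photo on slide 30) could single out.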
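
Slide 32's three transformations can be sketched in Python. The field names, the candidate tip values, the default 10% randomization rate, and the truncate-to-the-hour and two-decimal rounding choices are all hypothetical; the salted deterministic hash is implemented here as a keyed HMAC-SHA-256, which is one reasonable reading of "salted" rather than necessarily the method used in the talk.

```python
import hashlib
import hmac
import random
from datetime import datetime

# Hypothetical secret; in practice keep it in a secrets manager, never in code.
SECRET_SALT = b"replace-with-a-managed-secret"
# Hypothetical pool of legitimate-looking tip values used for randomization.
TIP_CANDIDATES = [0.0, 1.0, 2.0, 3.0, 5.0]


def mask_medallion(medallion, salt=SECRET_SALT):
    """Mask: keyed deterministic hash; the same medallion always yields the same token."""
    return hmac.new(salt, medallion.encode(), hashlib.sha256).hexdigest()[:16]


def generalize_time(ts):
    """Generalize time: truncate to the hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def generalize_location(lat, lon, decimals=2):
    """Generalize space: two decimal places is on the order of 1 km."""
    return round(lat, decimals), round(lon, decimals)


def randomize(value, candidates, rate, rng):
    """Randomize: with probability `rate`, swap in a false but legitimate value."""
    if rng.random() < rate:
        return rng.choice(candidates)
    return value


def anonymize_trip(trip, rng, tip_rate=0.1):
    return {
        "medallion": mask_medallion(trip["medallion"]),
        "pickup_time": generalize_time(trip["pickup_time"]),
        "pickup_location": generalize_location(*trip["pickup_location"]),
        "tip": randomize(trip["tip"], TIP_CANDIDATES, tip_rate, rng),
    }
```

Because the hash is deterministic, counts and joins per masked medallion still work; because it is keyed with a secret, an attacker cannot simply hash every valid medallion number and match the results.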
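
A common way to combine the attack and success events of slides 33 to 35, assumed here rather than stated on the slides, is to treat the overall re-identification risk as the probability that an attack occurs times the probability that it succeeds given that it occurs (a success requires an attack, so S is contained in A):

```latex
\Pr(\text{re-identification}) = \Pr(A \cap S) = \Pr(A)\,\Pr(S \mid A)
```

Under this reading, the context controls of slide 35 shrink Pr(A), while the data techniques of slide 34 shrink Pr(S | A); halving either factor halves the overall risk.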
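
Two of slide 34's techniques, differential privacy and local differential privacy, can be sketched with the textbook Laplace mechanism and randomized response. These are standard constructions chosen for illustration, not necessarily what any particular product implements; the epsilon value, the data, and the function names are hypothetical.

```python
import random


def dp_count(records, predicate, epsilon, rng):
    """Epsilon-DP count via the Laplace mechanism (a count has sensitivity 1)."""
    true_count = sum(1 for r in records if predicate(r))
    scale = 1.0 / epsilon
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_count + noise


def randomized_response(bit, p_truth, rng):
    """Local DP: report the true bit with probability p_truth, else a fair coin flip.

    This satisfies epsilon-LDP with epsilon = ln((1 + p_truth) / (1 - p_truth)).
    """
    if rng.random() < p_truth:
        return bit
    return rng.random() < 0.5


def estimate_true_rate(reports, p_truth):
    """Unbias the observed rate: E[observed] = p_truth * true_rate + (1 - p_truth) * 0.5."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth) * 0.5) / p_truth
```

The DP count protects an aggregate answer at query time, while randomized response alters each record before it is shared; both trade some accuracy for a quantifiable privacy guarantee.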
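
Slide 38's migration utility is shown only as a demo, so the following is not its implementation. It is a hypothetical Python sketch of the core idea from slides 24 and 25: find legacy row-filter policies that are identical except for a literal value, and collapse them into one parameterized policy plus a role-to-value mapping that can become user attributes. The input format is a simplified stand-in, not Ranger's or Sentry's actual export schema.

```python
import re
from collections import defaultdict

# Hypothetical, simplified export of legacy row-filter policies
# (NOT Ranger's real JSON schema): one entry per (table, role) pair.
LEGACY_POLICIES = [
    {"table": "t1", "role": "east_team", "filter": "region = 'EAST'"},
    {"table": "t1", "role": "west_team", "filter": "region = 'WEST'"},
    {"table": "t2", "role": "east_team", "filter": "region = 'EAST'"},
    {"table": "t2", "role": "west_team", "filter": "region = 'WEST'"},
]

# Single-quoted SQL string literals (no escaped quotes handled in this sketch).
LITERAL = re.compile(r"'([^']*)'")


def collapse(policies):
    """Group policies whose filters differ only in their string literals.

    Returns {template: {"tables": set_of_tables, "attribute_values": {role: set_of_literals}}}.
    Each template can become one ABAC policy; each role's literals become user attributes.
    """
    grouped = defaultdict(lambda: {"tables": set(), "attribute_values": {}})
    for policy in policies:
        template = LITERAL.sub("@attr", policy["filter"])
        entry = grouped[template]
        entry["tables"].add(policy["table"])
        entry["attribute_values"].setdefault(policy["role"], set()).update(
            LITERAL.findall(policy["filter"])
        )
    return dict(grouped)
```

Here four legacy policies collapse into the single template "region = @attr". The same grouping would only reduce slide 24's 96 rules to one if those filters really differ in nothing but their literals.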
