Taking Hadoop to Enterprise Security Standards

Published in: Technology, Business


Transcript

  • 1. ©2014 LinkedIn Corporation. All Rights Reserved. Taking Hadoop to Enterprise Security Standards
  • 2. Access Control
  • 3. How many of you need or have access control in Hadoop?
  • 4. Users First | Internal Threat | Keeping Data Secure | External Threat
  • 5. The more granular the access controls are, the more people can have access to the data.
  • 6. Hadoop – Status Quo: Multiple Query Execution Engines; Custom Code Execution; Auditing
  • 7. HDFS File Permissions: User ID, Email Address, IP address, Billing address | Security, Customer Service, Data Scientist. Adding and removing group membership can take up to a few hours. HDFS file permissions are very coarse (file level).
  • 8. Other Access Control Solutions
  • 9. Challenges: Mixed Data; Multiple Data Processing Systems; Data for Everyone
  • 10. What do we need? Extensible Authorization; Fine-Grained Control; Fast Changes to Authorization Rules
  • 11. Our Solution: Access Control via Encryption (diagram labels: Apache Kafka, HDFS, Key Server, Parquet, ETL, Encrypted Events)
  • 12. Access Control via Encryption: User A's Job; User B's Job; User C's Job; Producer Job; ETL User; Parquet File; Key Server. Key Server table (User → Columns): A → 5; B → 2, 5
  • 13. Brief Overview of Parquet: Columnar Storage. Parquet Format: Rowgroup; Column a, Column b; Page 0, Page 1, Page 2
  • 14. Encryption Support in Parquet*: Field Mode | Page Mode | Hybrid Mode (diagram labels: Page, Column). *Yet to be integrated into open source Parquet
  • 15. Field Mode: N Values (Page); encrypt one value at a time. Example: emails. Analysts need them to join with other tables but may not require access to individual emails. karthik@gmail.com → xxxxxxx; harsh@gmail.com → yyyyyyy; harsh@gmail.com → yyyyyyy; arvind@gmail.com → zzzzzzz
  • 16. Field Mode: Joins; Counts; Distribution Analysis; No/Low compression
  • 17. Page Mode: No information is leaked except the entropy of the data. Better performance than other modes. N Values (Page) → Encode → Compress → Encrypt
  • 18. Hybrid Mode: Finer-grained control of information. Increased overhead due to double encryption/decryption. N Values (Page) → Encrypt each value → Encrypt
  • 19. Plain Text | Encrypted Value | No Access (compared across Field Mode, Page Mode, Hybrid Mode)
  • 20. Key Versioning: Each key is versioned and specific to a source (file/event name). This reduces exposure in case of key leakage. Time-based access control: by default, all users can access only the last 30 days of data; users can be given access to data in a specific time period. Authentication of producers can be done separately.
  • 21. Key Server Features: Better Auditing Coverage; Retention Enforcement; Multifactor Authentication
  • 22. PIG Usage
  • 23. Thank you!
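
Slide 12 shows a key server holding a user-to-column table (A: 5; B: 2, 5). A minimal sketch of that idea follows, assuming one key per column and an in-memory grant table. The `KeyServer` class and its method names are hypothetical and are not taken from the talk.

```python
import secrets


class KeyServer:
    """Hypothetical in-memory key server: one key per column, grants per user."""

    def __init__(self):
        self._column_keys = {}  # column index -> secret key bytes
        self._grants = {}       # user name -> set of column indexes

    def register_column(self, column):
        # Generate a random 256-bit key the first time a column is seen.
        self._column_keys.setdefault(column, secrets.token_bytes(32))

    def grant(self, user, columns):
        self._grants.setdefault(user, set()).update(columns)

    def revoke(self, user, column):
        # Applies to the next key request; no file permissions change.
        self._grants.get(user, set()).discard(column)

    def get_key(self, user, column):
        if column not in self._grants.get(user, set()):
            raise PermissionError(f"{user} may not read column {column}")
        return self._column_keys[column]


server = KeyServer()
for col in (2, 5):
    server.register_column(col)
server.grant("A", [5])     # table from slide 12: A -> 5
server.grant("B", [2, 5])  # table from slide 12: B -> 2, 5
```

In this sketch a job for user C, who has no row in the table, gets a `PermissionError` for every column, which matches the slide's picture of three jobs with different visibility into the same Parquet file.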
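
Slides 15 and 16 say that field mode still allows joins, counts, and distribution analysis, because equal plaintext values map to equal protected values. The sketch below illustrates only that property. It swaps in a keyed hash (HMAC-SHA256) as a stand-in for the deterministic per-value encryption on the slide, so unlike real field-mode encryption the tokens cannot be decrypted. The key and the `field_token` helper are hypothetical.

```python
import hashlib
import hmac
from collections import Counter


def field_token(key: bytes, value: str) -> str:
    # Deterministic: the same key and value always give the same token.
    # Stand-in for per-value encryption; a keyed hash is one-way.
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


key = b"hypothetical-column-key"
emails = [
    "karthik@gmail.com",
    "harsh@gmail.com",
    "harsh@gmail.com",
    "arvind@gmail.com",
]
tokens = [field_token(key, e) for e in emails]

# Counts and distributions work on tokens without seeing any email.
counts = Counter(tokens)

# A join works too, as long as the other table was protected with the same key.
other_table = {field_token(key, "harsh@gmail.com"): "premium"}
joined = [other_table.get(t) for t in tokens]
```

The same determinism is why slide 16 lists "No/Low compression" and why field mode reveals more than page mode: repeated values stay visibly repeated.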
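
Slide 17 gives the page-mode order as encode, then compress, then encrypt. A rough sketch of that pipeline is below. The cipher is a toy keystream built from SHA-256, used only so the example runs with the standard library; it is not what the talk used and is not suitable for real data, where an authenticated cipher such as AES-GCM would be the usual choice. The function names and the newline-joined encoding are assumptions.

```python
import hashlib
import secrets
import zlib


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream: SHA-256 over key, nonce and a block counter.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    nonce = secrets.token_bytes(16)  # fresh nonce per page
    stream = _keystream(key, nonce, len(data))
    return nonce + bytes(a ^ b for a, b in zip(data, stream))


def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


def write_page(key: bytes, values: list) -> bytes:
    # Encode -> compress -> encrypt. Assumes values contain no newlines.
    encoded = "\n".join(values).encode("utf-8")
    return toy_encrypt(key, zlib.compress(encoded))


def read_page(key: bytes, blob: bytes) -> list:
    return zlib.decompress(toy_decrypt(key, blob)).decode("utf-8").split("\n")


key = secrets.token_bytes(32)
values = ["harsh@gmail.com"] * 500
page = write_page(key, values)
```

Compressing before encrypting matters because well-encrypted bytes look random and no longer compress. Encrypting the whole page at once also hides repeated values, which field mode does not.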
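
Slide 20 describes keys that are versioned per source, with a default 30-day access window and optional grants for specific periods. One way this check could look is sketched below, assuming a new key version per source per day. The daily granularity, the `key_id` format, the event name, and the grant tuples are all assumptions for illustration.

```python
from datetime import date, timedelta

DEFAULT_WINDOW_DAYS = 30  # default window stated on slide 20


def key_id(source: str, day: date) -> str:
    # Assumption: one key version per source per day.
    return f"{source}/{day.isoformat()}"


def may_fetch_key(grants, source: str, day: date, today: date) -> bool:
    """Decide whether a user may fetch the key for `source` on `day`.

    `grants` is a list of (source, start_day, end_day) tuples giving the
    user extra access outside the default window.
    """
    if today - timedelta(days=DEFAULT_WINDOW_DAYS) <= day <= today:
        return True
    return any(
        src == source and start <= day <= end
        for src, start, end in grants
    )


today = date(2014, 6, 30)
extra = [("PageViewEvent", date(2014, 1, 1), date(2014, 1, 31))]
recent_ok = may_fetch_key([], "PageViewEvent", date(2014, 6, 15), today)
old_denied = may_fetch_key([], "PageViewEvent", date(2014, 1, 10), today)
old_granted = may_fetch_key(extra, "PageViewEvent", date(2014, 1, 10), today)
```

Because each version covers only one source and one period in this sketch, a leaked key exposes only that slice of data, which is the exposure reduction the slide mentions.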