©2014 LinkedIn Corporation. All Rights Reserved.
Taking Hadoop to Enterprise Security
Standards
Access Control
How many of you need or have
access control in Hadoop?
Users First Internal Threat
Keeping Data Secure
External Threat
The more granular the access controls are,
the more people can have access to the data
Hadoop – Status Quo
Multiple Query
Execution
Engines
Custom Code
Executio...
User ID, Email Address, IP address, Billing address
Security Customer Servic...
Other Access Control Solutions
Mixed Data
Multiple Data Processing Systems
Data for Everyone
Challenges
Extensible
Authorization
Fine Grain
Control
Fast Changes to
Authorization...
Our Solution: Access Control via Encryption
Apache Kafka
HDFS
Key Server
...
User A’s Job
User B’s Job
User C’s Job
Producer
Job
ETL User
Parquet File...
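A minimal sketch of the key-server lookup this slide implies, assuming the grant table from the slide (user A may read column 5, user B may read columns 2 and 5). The `KeyServer` class, its method names, and the random per-column keys are hypothetical and are not taken from the talk.

```python
import secrets


class KeyServer:
    """Hypothetical key server: hands out a column key only to granted users."""

    def __init__(self, grants):
        # grants maps user -> set of column ids that user may decrypt
        self._grants = {user: set(cols) for user, cols in grants.items()}
        self._keys = {}

    def _column_key(self, column):
        # Lazily create one random 256-bit key per column
        if column not in self._keys:
            self._keys[column] = secrets.token_bytes(32)
        return self._keys[column]

    def get_key(self, user, column):
        if column not in self._grants.get(user, set()):
            raise PermissionError(f"{user} may not read column {column}")
        return self._column_key(column)

    def revoke(self, user, column):
        # Takes effect on the user's next key request
        self._grants.get(user, set()).discard(column)


server = KeyServer({"A": {5}, "B": {2, 5}})
key_a5 = server.get_key("A", 5)
key_b5 = server.get_key("B", 5)
```

Because access is decided at key-request time, changing a grant in this sketch does not require rewriting any stored file.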
Columnar Storage
Page 0
Page 1
Page 2
Column a Column b
Rowgroup
Parquet ...
*Yet to be integrated into open source Parquet
Field mode
Page
Column
| P...
Examples
• Emails – Analysts need it to join with other tables but may no...
Field Mode
Joins, Counts
Distribution Analysis
No/Low compression
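The joins and counts listed for Field Mode follow from encrypting each value independently, so that equal plaintexts map to equal outputs. The sketch below illustrates only that property. It uses HMAC-SHA256 from the Python standard library as a stand-in for a deterministic cipher, so unlike real encryption the tokens cannot be decrypted. The key and the function name `field_token` are assumptions for illustration.

```python
import hashlib
import hmac
from collections import Counter

KEY = b"demo-column-key"  # hypothetical key; a real one would come from the key server


def field_token(value: str, key: bytes = KEY) -> str:
    """Deterministic keyed token: the same input always yields the same output."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


emails = ["karthik@gmail.com", "harsh@gmail.com", "harsh@gmail.com", "arvind@gmail.com"]
tokens = [field_token(e) for e in emails]

# Counts and distribution analysis still work on the tokens
counts = Counter(tokens)

# Joins still work: the other table is keyed by the same token
other_table = {field_token("harsh@gmail.com"): "premium"}
joined = [other_table.get(t) for t in tokens]
```

The same determinism is what lets an observer see which rows share a value, and the high-entropy tokens are consistent with the "No/Low compression" note on the slide.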
Page Mode
• No information is leaked except entropy of the data
• Better ...
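The Page Mode slide lists three steps for a page of N values: encode, compress, encrypt. A rough sketch of that ordering follows, assuming JSON as a stand-in for Parquet's encodings and a toy HMAC-based keystream in place of a real cipher such as AES. The helper names are hypothetical, and the toy cipher is for illustration only and should not be used to protect data.

```python
import hashlib
import hmac
import json
import secrets
import zlib


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream: HMAC-SHA256 over nonce + block counter (illustration only)
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest())
        counter += 1
    return bytes(out[:length])


def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


def write_page(values, key: bytes) -> bytes:
    encoded = json.dumps(values).encode("utf-8")  # encode
    compressed = zlib.compress(encoded)           # compress
    nonce = secrets.token_bytes(16)               # fresh nonce for every page
    return nonce + _xor(compressed, _keystream(key, nonce, len(compressed)))  # encrypt


def read_page(blob: bytes, key: bytes):
    nonce, body = blob[:16], blob[16:]
    compressed = _xor(body, _keystream(key, nonce, len(body)))
    return json.loads(zlib.decompress(compressed).decode("utf-8"))


key = secrets.token_bytes(32)
page = ["harsh@gmail.com"] * 50 + ["karthik@gmail.com"]
blob = write_page(page, key)
```

Compressing before encrypting keeps the page small, since ciphertext itself does not compress. The size of the encrypted page still reflects how compressible the values were, which is one way to read the slide's remark about entropy.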
Hybrid Mode
• More fine grain control of information
• Increase in overhe...
Plain Text | Encrypted Value | No Access
Field Mode Page Mode
Hybrid Mode
Key Versioning
• Each key is versioned and specific for a source (File/Ev...
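One way to picture versioned, per-source keys with time-based access is sketched below. It assumes that a key version corresponds to the date of the data and that keys are derived from a master secret with HMAC-SHA256. Neither detail is stated in the slides, and the names `derive_key`, `request_key`, and the source name `PageViewEvent` are hypothetical.

```python
import hashlib
import hmac
from datetime import date, timedelta

MASTER_KEY = b"demo-master-secret"    # hypothetical; would live only on the key server
DEFAULT_WINDOW = timedelta(days=30)   # default access: last 30 days of data


def derive_key(source: str, version: date) -> bytes:
    """One key per (source, version); here the version is the data's date."""
    label = f"{source}|{version.isoformat()}".encode("utf-8")
    return hmac.new(MASTER_KEY, label, hashlib.sha256).digest()


def request_key(source: str, version: date, today: date, extra_ranges=()) -> bytes:
    """Return the key if the version is in the default window or an explicit grant."""
    in_default = timedelta(0) <= today - version <= DEFAULT_WINDOW
    in_grant = any(start <= version <= end for start, end in extra_ranges)
    if not (in_default or in_grant):
        raise PermissionError(f"no access to {source} for {version}")
    return derive_key(source, version)


today = date(2014, 6, 30)
recent = request_key("PageViewEvent", date(2014, 6, 15), today)
```

Since every (source, version) pair has its own key in this sketch, a leaked key exposes only that source and that day.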
Better Auditing
Coverage
Retention
Enforcement
Key Server Features
Multif...
PIG Usage
Thank you!

Taking Hadoop to Enterprise Security Standards

  1. Taking Hadoop to Enterprise Security Standards
  2. Access Control
  3. How many of you need or have access control in Hadoop?
  4. Keeping Data Secure: Users First, Internal Threat, External Threat
  5. The more granular the access controls are, the more people can have access to the data.
  6. Hadoop – Status Quo: Multiple Query Execution Engines, Custom Code Execution, Auditing
  7. HDFS File Permissions: User ID, Email Address, IP address, Billing address; Security, Customer Service, Data Scientist. Adding and removing group membership can take up to a few hours. HDFS file permissions are very coarse (at file level).
  8. Other Access Control Solutions
  9. Challenges: Mixed Data, Multiple Data Processing Systems, Data for Everyone
  10. What do we need? Extensible Authorization, Fine Grain Control, Fast Changes to Authorization Rules
  11. Our Solution: Access Control via Encryption. Apache Kafka, HDFS, Key Server, Parquet, ETL, Encrypted Events
  12. Access Control via Encryption: User A’s Job, User B’s Job, User C’s Job, Producer Job, ETL User, Parquet File, Key Server. User/Columns: A: 5; B: 2, 5
  13. Brief Overview of Parquet: Columnar Storage. Parquet Format: Rowgroup; Column a, Column b; Page 0, Page 1, Page 2
  14. Encryption Support in Parquet*: Field Mode | Page Mode | Hybrid Mode; Page, Column. *Yet to be integrated into open source Parquet
  15. Field Mode: N Values (Page), encrypt one value at a time. karthik@gmail.com, harsh@gmail.com, harsh@gmail.com, arvind@gmail.com become xxxxxxx, yyyyyyy, yyyyyyy, zzzzzzz. Examples: Emails – Analysts need them to join with other tables but may not require access to individual emails.
  16. Field Mode: Joins, Counts, Distribution Analysis, No/Low compression
  17. Page Mode: N Values (Page): Encode, Compress, Encrypt. No information is leaked except entropy of the data. Better performance than other modes.
  18. Hybrid Mode: N Values (Page): Encrypt each value, Encrypt. More fine grain control of information. Increase in overhead due to double encryption/decryption.
  19. Field Mode, Page Mode, Hybrid Mode: Plain Text | Encrypted Value | No Access
  20. Key Versioning: Each key is versioned and specific to a source (File/Event name). Reduces the exposure in case of key leakage. Time-based access control: all users by default can access only the last 30 days of data; users can be given access to data in a specific time period. Authentication of producers can be done separately.
  21. Key Server Features: Better Auditing Coverage, Retention Enforcement, Multifactor Authentication
  22. PIG Usage
  23. Thank you!