Todd Lipcon and Aaron T. Myers
     @tlipcon / @atmyers
Hadoop Security, Cloudera - Todd Lipcon and Aaron Myers - Hadoop World 2010

Hadoop Security

Todd Lipcon and Aaron Myers, Cloudera

Learn more @ http://www.cloudera.com/hadoop/


Transcript of "Hadoop Security, Cloudera - Todd Lipcon and Aaron Myers - Hadoop World 2010"

  1. Todd Lipcon and Aaron T. Myers, @tlipcon / @atmyers
  2. Who are We?
     • Todd Lipcon, Software Engineer, Cloudera
       • HDFS contributor, HBase committer
       • Primarily works on the Platform Team
     • Aaron Myers, Software Engineer, Cloudera
       • Master's thesis on security sandboxing in the Linux kernel
       • Primarily works on the Application Team
  3. Outline
     • Hadoop Security Overview
       • Hadoop Security pre CDH3b3
       • Hadoop Security with CDH3b3
     • Details of Deploying Secure Hadoop
     • Summary
  4. Hadoop Security: Overview
  5. Why do we care about security?
     • Foo, Inc has a product that has both paid ads and search
     • “Payment fraud” team needs logs of all credit card payments
     • “Search quality” team needs all search logs and click history
     • “Search quality” team probably shouldn't access “payment fraud” datasets
     • But “ads fraud” team may need to access both search logs and payment info
     • If they can share a cluster, we also get better utilization!
  6. Security pre CDH3b3: User Authentication
     • Authentication is by vigorous assertion
     • Trivial to impersonate another user:
       • Just set property “hadoop.job.ugi” when running a job or command
  7. Security pre CDH3b3: Server Authentication
     None
  8. Security pre CDH3b3: HDFS
     • Unix-like file permissions were introduced in Hadoop 0.16.1
     • Provides standard user/group/other r/w/x
     • Protects well-meaning users from accidents
     • Does nothing to prevent malicious users from causing harm (weak authentication)
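
To make the permission model on slide 8 concrete, here is a minimal Python sketch of a Unix-style user/group/other check. It is an illustration only, not HDFS code: the `Inode` type, the `can_access` function, and the sample users and groups are all hypothetical. Note that `user` and `groups` are simply whatever the caller supplies, which is the weakness of the pre-CDH3b3 model.

```python
from dataclasses import dataclass

# Permission bits for one class (owner, group, or other), as in Unix.
READ, WRITE, EXECUTE = 4, 2, 1


@dataclass(frozen=True)
class Inode:
    """Hypothetical stand-in for a file's ownership and mode (not an HDFS class)."""
    owner: str
    group: str
    mode: int  # e.g. 0o640


def can_access(inode: Inode, user: str, groups: set, wanted: int) -> bool:
    """Return True if `user` holds all of the `wanted` permission bits."""
    if user == inode.owner:
        bits = (inode.mode >> 6) & 7   # owner class
    elif inode.group in groups:
        bits = (inode.mode >> 3) & 7   # group class
    else:
        bits = inode.mode & 7          # other class
    return (bits & wanted) == wanted


payments = Inode(owner="fraud", group="payments", mode=0o640)

# A well-meaning search engineer is kept out by the mode bits ...
print(can_access(payments, "alice", {"search"}, READ))
# ... but when identity is only asserted, anyone can claim to be "fraud".
print(can_access(payments, "fraud", set(), READ))
```

The check itself is sound; what changes with Kerberos is that the server can trust the `user` value it is checking.
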
  9. Security pre CDH3b3: Job Control
     • ACLs per job queue for job submission / killing
     • No ACLs for viewing counters / logs
     • Does nothing to prevent malicious users from causing harm (weak authentication)
  10. Security pre CDH3b3: Tasks
      • Individual tasks all run as the same user
        • Whoever the TT is running as (usually 'hadoop')
      • Tasks not isolated from each other
        • Tasks which read/write from local storage can interfere with each other
        • Malicious tasks can kill each other
  11. Security pre CDH3b3: Web interfaces
      None
  12. Security with CDH3b3: User Authentication
      • Authentication is secured by Kerberos v5
        • RPC connections secured with SASL “GSSAPI” mechanism
      • Provides proven, strong authentication and single-sign-on
      • Hadoop servers can ensure that users are who they say they are
      • Group resolution is done on the server side
  13. Security with CDH3b3: Server Authentication
      • Kerberos authentication is bi-directional
      • Users can be sure that they are communicating with the Hadoop server they think they are
  14. Security with CDH3b3: HDFS
      • Same permissions model
      • But, a user can no longer trivially impersonate other users (strong authentication)
  15. Security with CDH3b3: Job Control
      • A job now has its own ACLs, including a view ACL
      • A job can now specify who can view logs, counters, configuration, and who can modify (kill) it
      • JT enforces these ACLs (strong authentication)
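
The per-job ACLs on slide 15 can be pictured with a small Python sketch. This is a hedged illustration, not the JobTracker's implementation: the `Acl` and `Job` types and the `is_allowed` function are hypothetical, and the rule that the job owner is always allowed is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Acl:
    """Hypothetical ACL: a set of user names plus a set of group names."""
    users: set = field(default_factory=set)
    groups: set = field(default_factory=set)

    def allows(self, user: str, user_groups: set) -> bool:
        return user in self.users or bool(self.groups & user_groups)


@dataclass
class Job:
    owner: str
    view_acl: Acl = field(default_factory=Acl)    # logs, counters, configuration
    modify_acl: Acl = field(default_factory=Acl)  # e.g. killing the job


def is_allowed(job: Job, user: str, user_groups: set, operation: str) -> bool:
    """Decide whether an authenticated `user` may "view" or "modify" `job`."""
    acls = {"view": job.view_acl, "modify": job.modify_acl}
    if operation not in acls:
        raise ValueError(f"unknown operation: {operation!r}")
    if user == job.owner:  # assumption: the owner can always act on the job
        return True
    return acls[operation].allows(user, user_groups)


job = Job(
    owner="alice",
    view_acl=Acl(groups={"searchquality"}),
    modify_acl=Acl(users={"bob"}),
)
print(is_allowed(job, "carol", {"searchquality"}, "view"))    # may read logs
print(is_allowed(job, "carol", {"searchquality"}, "modify"))  # may not kill it
```

Such a check only means something because `user` has already been authenticated; with asserted identities the same ACLs could be bypassed by claiming to be the owner.
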
  16. Security with CDH3b3: Tasks
      • Tasks can now run as the user who launched the job
      • Ensures isolation of tasks which run on the same TT
        • Local file permissions enforced
        • Local system permissions enforced (e.g. signals)
      • Can take advantage of per-user system limits
        • e.g. Linux ulimits
  17. Security with CDH3b3: Web Interfaces
      • Out of the box Kerberized SSL support
      • Pluggable servlet filters (more on this later)
  18. Security with CDH3b3: Threat Model
      • The Hadoop security system assumes that:
        • Users do not have root access to cluster machines
        • Users do not have root access to shared user machines (e.g. bastion box)
        • Users cannot read or inject packets on the network
  19. Thanks, Yahoo!
      Yahoo! did the vast majority of the core Hadoop security work
  20. Hadoop Security: Deployment Details
  21. Requirements: Kerberos Infrastructure
      • Kerberos domain (KDC)
        • e.g. MIT krb5 in RHEL, or MS Active Directory
      • Kerberos principals (SPNs) for every daemon
        • hdfs/hostname@REALM for DN, NN, 2NN
        • mapred/hostname@REALM for TT and JT
        • host/hostname@REALM for web UIs
      • Keytabs for service principals distributed to correct hosts
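
As a rough planning aid for slide 21, the Python sketch below lists the service principals each host would need, following the hdfs/, mapred/, and host/ naming on the slide. The cluster layout, host names, realm, and the `principals_for_host` helper are hypothetical, and the sketch assumes every daemon host also serves a web UI.

```python
# Service name used in the principal for each daemon type, per the slide.
SERVICE_FOR_DAEMON = {
    "NN": "hdfs", "2NN": "hdfs", "DN": "hdfs",
    "JT": "mapred", "TT": "mapred",
}


def principals_for_host(hostname: str, daemons: list, realm: str) -> list:
    """List the service principals one host needs, sorted by name."""
    services = {SERVICE_FOR_DAEMON[d] for d in daemons}
    services.add("host")  # assumption: every daemon host also serves a web UI
    return sorted(f"{svc}/{hostname}@{realm}" for svc in services)


# Hypothetical cluster layout and realm name.
cluster = {
    "master1.example.com": ["NN", "JT"],
    "worker1.example.com": ["DN", "TT"],
}

for host, daemons in cluster.items():
    for principal in principals_for_host(host, daemons, "EXAMPLE.COM"):
        print(principal)
```

Each printed principal would still have to be created in the KDC and exported to a keytab on the matching host; the sketch only enumerates the names.
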
  22. Configuring daemons for security
      • Most daemons have two configs:
        • Keytab location (e.g. dfs.datanode.keytab.file)
        • Kerberos principal (e.g. dfs.datanode.kerberos.principal)
      • Principal can use the special token '_HOST' to substitute the hostname of the daemon (e.g. 'hdfs/_HOST@MYREALM')
      • Several other configs to enable security in the first place
        • See example-confs/conf.secure in CDH3
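
The '_HOST' substitution on slide 22 lets one configuration file be shared by every node. The Python sketch below shows the idea only; it is not Hadoop's own code. The `resolve_principal` helper, the keytab path, and the host name are hypothetical, and lowercasing the hostname is an assumption of this sketch.

```python
import socket
from typing import Optional

HOST_TOKEN = "_HOST"


def resolve_principal(pattern: str, hostname: Optional[str] = None) -> str:
    """Replace the _HOST instance in `primary/_HOST@REALM` with a hostname."""
    if hostname is None:
        hostname = socket.getfqdn()  # assumption: the local FQDN is the right name
    name, at, realm = pattern.partition("@")
    primary, _, instance = name.partition("/")
    if instance != HOST_TOKEN:
        return pattern  # nothing to substitute, e.g. a plain user principal
    return f"{primary}/{hostname.lower()}{at}{realm}"


# The same two settings can be shipped unchanged to every datanode.
conf = {
    "dfs.datanode.keytab.file": "/etc/hadoop/conf/hdfs.keytab",  # hypothetical path
    "dfs.datanode.kerberos.principal": "hdfs/_HOST@MYREALM",
}

print(resolve_principal(conf["dfs.datanode.kerberos.principal"], "dn17.example.com"))
```

On a host named dn17.example.com this yields hdfs/dn17.example.com@MYREALM, which must match an entry in that host's keytab.
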
  23. Setting up users
      • Each user must have a Kerberos principal
      • May want some shared accounts:
        • sharedaccount/alice and sharedaccount/bob principals both act as sharedaccount on HDFS - you can use this!
        • hdfs/alice is also useful for alice to act as a superuser
      • Users running MR jobs must also have Unix accounts on each of the slaves
      • Centralized user database (e.g. LDAP) is a practical necessity
  24. Installing Secure Hadoop
      • MapReduce and HDFS should run as separate users (e.g. 'hdfs' and 'mapred')
      • New task-controller setuid executable allows tasks to run as a user
      • New JNI code in libhadoop.so to plug subtle security holes
      • Install CDH3b3 with hadoop-0.20-sbin and hadoop-0.20-native packages to get this all set up
  25. Securing higher-level services
      • Many “middle tier” applications need to act on behalf of their clients when interacting with Hadoop
        • e.g. Oozie, Hive Server, Hue/Beeswax
      • “Proxy User” feature provides “secure impersonation” (think sudo)
        • hadoop.proxyuser.oozie.hosts - IPs where “oozie” may act as an impersonator
        • hadoop.proxyuser.oozie.groups - groups whose users “oozie” may impersonate
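
The two proxy-user settings on slide 25 combine into a single decision: the request must come from a permitted host, and the impersonated user must belong to a permitted group. The Python sketch below models that rule under simplifying assumptions. It treats both values as plain comma-separated lists; the `may_impersonate` function, the IP addresses, and the group names are hypothetical.

```python
def _csv(value: str) -> set:
    """Split a comma-separated config value into a set of trimmed entries."""
    return {item.strip() for item in value.split(",") if item.strip()}


def may_impersonate(conf: dict, proxy_user: str, client_ip: str,
                    target_groups: set) -> bool:
    """Return True if `proxy_user`, connecting from `client_ip`, may act as a
    user whose groups are `target_groups`."""
    hosts = _csv(conf.get(f"hadoop.proxyuser.{proxy_user}.hosts", ""))
    groups = _csv(conf.get(f"hadoop.proxyuser.{proxy_user}.groups", ""))
    # Both conditions must hold; a missing setting permits nothing.
    return client_ip in hosts and bool(groups & target_groups)


conf = {
    "hadoop.proxyuser.oozie.hosts": "10.0.0.5",                # hypothetical IP
    "hadoop.proxyuser.oozie.groups": "engstaff,searchquality",  # hypothetical groups
}

# oozie, on its own server, submits work on behalf of a searchquality member.
print(may_impersonate(conf, "oozie", "10.0.0.5", {"users", "searchquality"}))
# The same request from any other machine is refused.
print(may_impersonate(conf, "oozie", "10.0.0.99", {"users", "searchquality"}))
```

As with sudo, the proxy service still authenticates as itself (here as “oozie”); the settings only limit whom it may then act as, and from where.
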
  26. Customizing Security
      • Current plug-in points:
        • hadoop.http.filter.initializers - may configure a custom ServletFilter to integrate with existing enterprise web SSO
        • hadoop.security.group.mapping - map a Kerberos principal (alice@FOOCORP.COM) to a set of groups (users,engstaff,searchquality,adsdata)
        • hadoop.security.auth_to_local - regex mappings of Kerberos principals to usernames
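
To illustrate what “regex mappings of Kerberos principals to usernames” means on slide 26, here is a Python sketch of ordered, first-match-wins rules. It deliberately uses Python regular expressions rather than the rule syntax that hadoop.security.auth_to_local actually accepts. The `RULES` list and the `to_local` function are hypothetical; the realm names are borrowed from the examples elsewhere in the deck.

```python
import re

# Ordered (pattern, replacement) pairs; the first matching rule wins.
RULES = [
    # Cluster service principals map to their service account.
    (re.compile(r"^(hdfs|mapred)/[^@]+@CLUSTER\.FOOCORP\.COM$"), r"\1"),
    # Plain corporate users: alice@FOOCORP.COM -> alice
    (re.compile(r"^([^/@]+)@FOOCORP\.COM$"), r"\1"),
    # Two-part corporate principals: sharedaccount/bob -> sharedaccount
    (re.compile(r"^([^/@]+)/[^@]+@FOOCORP\.COM$"), r"\1"),
]


def to_local(principal: str) -> str:
    """Map a Kerberos principal to a local username, or raise ValueError."""
    for pattern, replacement in RULES:
        if pattern.match(principal):
            return pattern.sub(replacement, principal)
    raise ValueError(f"no rule matches {principal!r}")


for p in ("alice@FOOCORP.COM",
          "sharedaccount/bob@FOOCORP.COM",
          "hdfs/nn1.cluster.foocorp.com@CLUSTER.FOOCORP.COM"):
    print(p, "->", to_local(p))
```

Rejecting principals that match no rule, rather than falling back to some default name, keeps identities from unexpected realms from silently mapping onto local accounts.
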
  27. Deployment Gotchas
      • MIT Kerberos 1.8.1 (in Ubuntu) incompatible with Java Krb5 implementation
        • Run “kinit -R” after kinit to work around
      • Enable allow_weak_crypto in /etc/krb5.conf - necessary for Kerberized SSL
      • Must deploy “unlimited security policy JAR” in JAVA_HOME/jre/lib/security
      • Lifesaver: HADOOP_OPTS="-Dsun.security.krb5.debug=true" hadoop ...
  28. Best Practices for AD Integration
      • MIT Kerberos realm inside cluster:
        • CLUSTER.FOOCORP.COM
      • Existing Active Directory domain:
        • FOOCORP.COM or maybe AD.FOOCORP.COM
      • Set up one-way cross-realm trust
        • Cluster realm must trust corporate AD realm
      • See “Step by Step Guide to Kerberos 5 Interoperability” in Windows Server docs
  29. What Hadoop Security Is
      • Strong authentication
        • Malicious impersonation now impossible
      • Better authorization
        • More control over who can view/control jobs
        • Ensure isolation between running tasks
      • An ongoing development priority
  30. What Hadoop Security Is Not
      • Encryption on the wire
      • Encryption on disk
      • Protection against DoS attacks
      • Enabled by default
  31. Security Roadmap
      • Comprehensive documentation and best practices
      • Pluggable “edge authentication” (e.g. PKI, SAML)
      • Easier deployment if you use Cloudera Enterprise
      • More authorization features across CDH components
        • e.g. Hive and HBase access control
