Hadoop Security: Overview
 

Cloudera Software Engineer, Aaron Myers, presented an overview of Apache Hadoop security at the Los Angeles Hadoop User Group.

Hadoop Security: Overview Presentation Transcript

  • 1. Private Property: No Trespassing – Hadoop Security Explained
    Aaron T. Myers, atm@cloudera.com, @atm
  • 2. Who am I?
    • Aaron T. Myers – Software Engineer, Cloudera
    • Hadoop HDFS, Common Committer
    • Master's thesis on security sandboxing in the Linux kernel
    • Primarily works on the Core Platform Team
  • 3. Outline
    • Hadoop Security Overview
      • Hadoop Security pre-CDH3
      • Hadoop Security with CDH3
    • Details of Deploying Secure Hadoop
    • Summary
  • 4. Hadoop Security: Overview
  • 5. Why do we care about security?
    • SecureCommerceWebSite, Inc. has a product with both paid ads and search
    • The "Payment Fraud" team needs logs of all credit card payments
    • The "Search Quality" team needs all search logs and click history
    • The "Ads Fraud" team needs access to both search logs and payment info
      • So we can't segregate these datasets onto different clusters
    • If the teams can share a cluster, we also get better utilization!
  • 6. Security pre-CDH3: User Authentication
    • Authentication is by vigorous assertion
    • Trivial to impersonate another user (see the sketch below):
      • Just set the property "hadoop.job.ugi" when running a job or command
    • Group resolution is done client-side
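    A minimal sketch of why this was so weak, assuming a pre-security 0.20-era cluster where the client-supplied hadoop.job.ugi value ("user,group,...") is simply believed; the jar, class, paths, and user/group names are hypothetical:

        # Claim to be the HDFS superuser when browsing or deleting files:
        hadoop fs -D hadoop.job.ugi=hdfs,supergroup -ls /user/someoneelse

        # The same override works at job submission time
        # (assuming the job driver uses ToolRunner, so -D is honored):
        hadoop jar my-job.jar MyJob -D hadoop.job.ugi=alice,engstaff /input /output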
  • 7. Security pre-CDH3: Server Authentication
    • None
  • 8. Security pre-CDH3: HDFS
    • Unix-like file permissions were introduced in Hadoop 0.16.1
    • Provides standard user/group/other r/w/x
    • Protects well-meaning users from accidents
    • Does nothing to prevent malicious users from causing harm (weak authentication)
  • 9. Security pre-CDH3: Job Control
    • ACLs per job queue for job submission / killing
    • No ACLs for viewing counters / logs
    • Does nothing to prevent malicious users from causing harm (weak authentication)
  • 10. Security pre-CDH3: Tasks
    • Individual tasks all run as the same user
      • Whoever the TT is running as (usually hadoop)
    • Tasks are not isolated from each other
      • Tasks which read/write local storage can interfere with each other
      • Malicious tasks can kill each other
    • Hadoop is designed to execute arbitrary code
  • 11. Security pre-CDH3: Web Interfaces
    • None
  • 12. Security with CDH3: User Authentication
    • Authentication is secured by Kerberos v5 (see the configuration sketch below)
      • RPC connections secured with the SASL "GSSAPI" mechanism
      • Provides proven, strong authentication and single sign-on
    • Hadoop servers can ensure that users are who they say they are
    • Group resolution is done on the server side
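    As a concrete illustration, a minimal sketch of the switch that turns this on, using CDH3-era property names; these go inside the <configuration> block of core-site.xml on every node (the default for hadoop.security.authentication is "simple"):

        <property>
          <name>hadoop.security.authentication</name>
          <value>kerberos</value>  <!-- "simple" (the default) is the pre-CDH3 behavior -->
        </property>
        <property>
          <name>hadoop.security.authorization</name>
          <value>true</value>      <!-- also enforce service-level authorization on RPC -->
        </property>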
  • 13. Security with CDH3: Server Authentication
    • Kerberos authentication is bi-directional
    • Users can be sure that they are communicating with the Hadoop server they think they are
  • 14. Security with CDH3: HDFS
    • Same general permissions model
      • Added sticky bit for directories (e.g. /tmp); see the example below
    • But a user can no longer trivially impersonate other users (strong authentication)
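    For example, a brief sketch of protecting a world-writable /tmp with the sticky bit, run as the HDFS superuser (assumed here to be hdfs):

        # With the sticky bit set, only a file's owner, the directory's owner, or the
        # superuser may delete or rename entries under /tmp:
        sudo -u hdfs hadoop fs -chmod 1777 /tmp
        hadoop fs -ls / | grep tmp    # mode now shows as drwxrwxrwt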
  • 15. Security with CDH3: Job Control
    • A job now has its own ACLs, including a view ACL
    • A job can now specify who can view its logs, counters, and configuration, and who can modify (kill) it (see the submission sketch below)
    • The JT enforces these ACLs (strong authentication)
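    A minimal sketch of setting these ACLs at submission time, assuming ACL enforcement (mapred.acls.enabled) is turned on cluster-wide and the job driver uses ToolRunner; the jar, class, paths, users, and group are hypothetical. Each ACL value is "comma-separated users<space>comma-separated groups":

        hadoop jar my-job.jar MyJob \
          -D mapreduce.job.acl-view-job="alice,bob paymentfraud" \
          -D mapreduce.job.acl-modify-job="alice" \
          /input /output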
  • 16. Security with CDH3: Tasks
    • Tasks now run as the user who launched the job (see the configuration sketch below)
      • Probably the most complex part of Hadoop's security implementation
    • Ensures isolation of tasks which run on the same TT
      • Local file permissions enforced
      • Local system permissions enforced (e.g. signals)
    • Can take advantage of per-user system limits
      • e.g. Linux ulimits
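    A rough sketch of the TaskTracker-side switch to the setuid task-controller, using CDH3-era property and class names (treat the exact names and the companion taskcontroller.cfg settings as assumptions to verify against the CDH3 Security Guide); set in mapred-site.xml:

        <property>
          <name>mapred.task.tracker.task-controller</name>
          <value>org.apache.hadoop.mapred.LinuxTaskController</value>  <!-- launch tasks via the setuid binary -->
        </property>
        <property>
          <name>mapreduce.tasktracker.group</name>
          <value>mapred</value>  <!-- group permitted to invoke the setuid task-controller -->
        </property>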
  • 17. Security with CDH3: Web Interfaces
    • Out-of-the-box Kerberized SSL support
    • Pluggable servlet filters (more on this later)
  • 18. Security with CDH3: Threat Model
    • The Hadoop security system assumes that:
      • Users do not have root access to cluster machines
      • Users do not have root access to shared user machines (e.g. a bastion box)
      • Users cannot read or inject packets on the network
  • 19. Thanks, Yahoo!
    • Yahoo! did the vast majority of the core Hadoop security work
  • 20. Hadoop Security: Deployment Details
  • 21. Requirements: Kerberos Infrastructure
    • Kerberos domain (KDC)
      • e.g. MIT Krb5 in RHEL, or MS Active Directory
    • Kerberos principals (SPNs) for every daemon (see the kadmin sketch below)
      • hdfs/hostname@REALM for DN, NN, 2NN
      • mapred/hostname@REALM for TT and JT
      • host/hostname@REALM for web UIs
    • Keytabs for service principals distributed to the correct hosts
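    A brief sketch with an MIT KDC; the realm and hostname below are hypothetical, and the exported keytabs must be copied to their hosts and made readable only by the daemon user:

        # Create per-host service principals with random keys:
        kadmin.local -q "addprinc -randkey hdfs/dn1.cluster.foocorp.com@CLUSTER.FOOCORP.COM"
        kadmin.local -q "addprinc -randkey mapred/dn1.cluster.foocorp.com@CLUSTER.FOOCORP.COM"
        kadmin.local -q "addprinc -randkey host/dn1.cluster.foocorp.com@CLUSTER.FOOCORP.COM"

        # Export keytabs for that host (then ship them out and chmod 400):
        kadmin.local -q "xst -k hdfs.keytab hdfs/dn1.cluster.foocorp.com host/dn1.cluster.foocorp.com"
        kadmin.local -q "xst -k mapred.keytab mapred/dn1.cluster.foocorp.com host/dn1.cluster.foocorp.com"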
  • 22. Configuring daemons for security
    • Most daemons have two configs:
      • Keytab location (e.g. dfs.datanode.keytab.file)
      • Kerberos principal (e.g. dfs.datanode.kerberos.principal)
    • The principal can use the special token _HOST to substitute the hostname of the daemon (e.g. hdfs/_HOST@MYREALM); see the hdfs-site.xml sketch below
    • Several other configs to enable security in the first place
      • See example-confs/conf.secure in CDH3
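    For instance, a minimal sketch of the DataNode's keytab/principal pair in hdfs-site.xml (the keytab path and realm are assumptions):

        <property>
          <name>dfs.datanode.keytab.file</name>
          <value>/etc/hadoop/conf/hdfs.keytab</value>
        </property>
        <property>
          <name>dfs.datanode.kerberos.principal</name>
          <value>hdfs/_HOST@CLUSTER.FOOCORP.COM</value>  <!-- _HOST is replaced with this DataNode's hostname -->
        </property>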
  • 23. Setting up users
    • Each user must have a Kerberos principal (see the sketch below)
    • May want some shared accounts:
      • sharedaccount/alice and sharedaccount/bob principals both act as sharedaccount on HDFS – you can use this!
      • hdfs/alice is also useful for alice to act as a superuser
    • Users running MR jobs must also have Unix accounts on each of the slaves
    • A centralized user database (e.g. LDAP) is a practical necessity
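    A brief sketch of per-user setup on an MIT KDC (names and realm are hypothetical; addprinc prompts for a password):

        kadmin.local -q "addprinc alice@CLUSTER.FOOCORP.COM"                  # ordinary user
        kadmin.local -q "addprinc sharedaccount/alice@CLUSTER.FOOCORP.COM"    # acts as "sharedaccount" on HDFS

        # alice also needs a Unix account on the slaves, and must kinit before using the cluster:
        kinit alice@CLUSTER.FOOCORP.COM
        hadoop fs -ls /user/alice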
  • 24. Installing Secure Hadoop
    • MapReduce and HDFS services should run as separate users (e.g. hdfs and mapred)
    • New task-controller setuid executable allows tasks to run as a user
    • New JNI code in libhadoop.so to plug subtle security holes
    • Install CDH3 with the hadoop-0.20-sbin and hadoop-0.20-native packages to get this all set up
  • 25. Securing higher-level services
    • Many "middle tier" applications need to act on behalf of their clients when interacting with Hadoop
      • e.g.: Oozie, Hive Server, Hue/Beeswax
    • The "Proxy User" feature provides secure impersonation (think sudo); see the configuration sketch below
      • hadoop.proxyuser.oozie.hosts – IPs where "oozie" may act as an impersonator
      • hadoop.proxyuser.oozie.groups – groups whose users "oozie" may impersonate
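    A minimal sketch of the proxy-user entries in core-site.xml on the NameNode and JobTracker; the host address and group value here are assumptions:

        <property>
          <name>hadoop.proxyuser.oozie.hosts</name>
          <value>10.0.0.5</value>  <!-- only this host may connect as "oozie" and impersonate others -->
        </property>
        <property>
          <name>hadoop.proxyuser.oozie.groups</name>
          <value>users</value>     <!-- "oozie" may only impersonate members of this group -->
        </property>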
  • 26. Customizing Security
    • Current plug-in points (see the auth_to_local sketch below):
      • hadoop.http.filter.initializers – may configure a custom ServletFilter to integrate with existing enterprise web SSO
      • hadoop.security.group.mapping – map a Kerberos principal (alice@FOOCORP.COM) to a set of groups (users, engstaff, searchquality, adsdata)
      • hadoop.security.auth_to_local – regex mappings of Kerberos principals to usernames
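    For example, a sketch of an auth_to_local setting in core-site.xml that maps any principal from a corporate realm to its short Unix name (the realm is an assumption; unmatched principals fall through to the default rule):

        <property>
          <name>hadoop.security.auth_to_local</name>
          <value>
            RULE:[1:$1@$0](.*@FOOCORP\.COM)s/@.*//
            DEFAULT
          </value>
        </property>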
  • 27. Deployment Gotchas (see the shell sketch below)
    • MIT Kerberos 1.8.1 (in Ubuntu, RHEL 5.6+) is incompatible with the Java Krb5 implementation
      • Run "kinit -R" after kinit to work around this
    • Enable allow_weak_crypto in /etc/krb5.conf – necessary for Kerberized SSL
    • Must deploy the "unlimited strength" security policy JARs in JAVA_HOME/jre/lib/security
    • Lifesaver: HADOOP_OPTS="-Dsun.security.krb5.debug=true" hadoop ...
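    The same gotchas as a short shell sketch (the realm is hypothetical):

        # MIT Kerberos 1.8.1 + Java workaround: renew the TGT right after obtaining it:
        kinit alice@CLUSTER.FOOCORP.COM && kinit -R

        # In /etc/krb5.conf, under [libdefaults] – required here for Kerberized SSL:
        #   allow_weak_crypto = true

        # Verbose Kerberos debugging when authentication mysteriously fails:
        HADOOP_OPTS="-Dsun.security.krb5.debug=true" hadoop fs -ls /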
  • 28. Best Practices for AD Integration
    • MIT Kerberos realm inside the cluster:
      • CLUSTER.FOOCORP.COM
    • Existing Active Directory domain:
      • FOOCORP.COM, or maybe AD.FOOCORP.COM
    • Set up a one-way cross-realm trust
      • The cluster realm must trust the corporate AD realm
      • See "Step by Step Guide to Kerberos 5 Interoperability" in the Windows Server docs
  • 29. Hadoop Security: Summary
  • 30. What Hadoop Security Is
    • Strong authentication
      • Malicious impersonation is now impossible
    • Better authorization
      • More control over who can view/control jobs
    • Ensures isolation between running tasks
    • An ongoing development priority
  • 31. What Hadoop Security Is Not
    • Encryption on the wire
    • Encryption on disk
    • Protection against DoS attacks
    • Enabled by default
  • 32. Security Beyond Core Hadoop
    • Comprehensive documentation and best practices
      • https://ccp.cloudera.com/display/CDHDOC/CDH3+Security+Guide
    • All components of CDH3 are capable of interacting with a secure Hadoop cluster
    • Hive 0.7 (included in CDH3) added a rich set of access controls
    • Much easier deployment if you use Cloudera Enterprise
  • 33. Security Roadmap
    • Pluggable "edge authentication" (e.g. PKI, SAML)
    • More authorization features across CDH components
      • e.g. HBase access controls
    • Data encryption support
  • 34. Questions?
    Aaron T. Myers, atm@cloudera.com, @atm