Hadoop Security Preview


Hadoop Security preview, presented at the Hadoop Bay Area User Group on March 24 at Yahoo!'s Sunnyvale campus.

  1. Preview of Hadoop Security: Owen O'Malley, Yahoo! Hadoop Development, [email_address]
  2. Problem
      • Primary goal: keep data in HDFS secure from unauthorized access!
      • Corollary: all HDFS clients must be authenticated to ensure they are the user they claim to be.
      • Since Map/Reduce runs applications as the user, it must authenticate users.
      • Since servers (HDFS, Map/Reduce) are entrusted with user credentials, they must also be authenticated.
      • Kerberos will be the underlying authentication system.
      • It must be possible to turn security on or off.
  3. Adding Security to a Large Project
  4. Security Development Team
      • Boris Shkolnik
      • Devaraj Das
      • Jakob Homan
      • Owen O'Malley
      • Kan Zhang
      • Jitendra Nath Pandey
      • With paranoid assistance from:
         • Ram Marti
  5. Security Threats in Hadoop
      • User to Service Authentication
         • No User Authentication on NameNode or JobTracker
            • Client code supplies user and group names
         • No User Authorization on DataNode (fixed in 0.21)
            • Users can read/write any block
         • No User Authorization on JobTracker
            • Users can modify or kill other users' jobs
            • Users can modify the persistent state of the JobTracker
      • Service to Service Authentication
         • No Authentication of DataNodes and TaskTrackers
            • Users can start fake DataNodes and TaskTrackers
      • No Encryption on Wire or Disk
  6. Definitions
      • Authentication: ensuring the user is who they claim to be.
         • We currently do a very poor job of this.
         • We need it on both RPC and the Web UI.
      • Authorization: ensuring the user can only do things that they are allowed to do.
         • HDFS already does this via owners, groups and permissions.
         • Map/Reduce does not do this.
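
The HDFS authorization mentioned on this slide uses a POSIX-style owner/group/other model, and it is only meaningful once the user and group names have been authenticated. A minimal sketch of that check, assuming a simplified model with no superuser, sticky bit, or ACLs; `hdfs_permits` and its parameters are hypothetical names, not part of Hadoop's API:

```python
import stat

# Map each requested action to the (owner, group, other) permission bits.
_BITS = {
    "r": (stat.S_IRUSR, stat.S_IRGRP, stat.S_IROTH),
    "w": (stat.S_IWUSR, stat.S_IWGRP, stat.S_IWOTH),
    "x": (stat.S_IXUSR, stat.S_IXGRP, stat.S_IXOTH),
}


def hdfs_permits(user, user_groups, owner, group, mode, want):
    """Return True if `user` may perform `want` ('r', 'w' or 'x') on a path.

    Hypothetical helper: only the first matching class is consulted
    (owner, then group, then other). Superusers, sticky bits and ACLs
    are deliberately ignored in this sketch.
    """
    owner_bit, group_bit, other_bit = _BITS[want]
    if user == owner:
        return bool(mode & owner_bit)
    if group in user_groups:
        return bool(mode & group_bit)
    return bool(mode & other_bit)
```

For example, with mode `0o750` a member of the file's group can read but not write, and everyone else is denied. Because the owner class is checked exclusively, an owner can be locked out by a mode such as `0o070` even if they are also in the group.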
  7. Using Kerberos and Single Sign-On
      • Kerberos allows a user to sign in once to obtain a Ticket Granting Ticket (TGT).
         • kinit: get a new Kerberos ticket
         • klist: list your Kerberos tickets
         • kdestroy: destroy your Kerberos ticket
         • TGTs last for 10 hours and are renewable for 7 days by default.
      • PAM on Linux and Solaris can automatically run kinit for you.
         • It still needs your password.
      • Once you have a TGT, Hadoop commands work as before:
         • hadoop fs -ls /
         • hadoop jar wordcount.jar in-dir out-dir
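
The default lifetimes on this slide (a TGT valid for 10 hours, renewable for up to 7 days) interact in a way that is easy to model. This is a simplified sketch: the `Ticket` class and its `issue`/`renew` methods are hypothetical illustrations, not an MIT Kerberos or Hadoop API, and in practice the limits come from the KDC and the `ticket_lifetime`/`renew_lifetime` settings in `krb5.conf`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

TICKET_LIFETIME = timedelta(hours=10)  # default lifetime quoted on the slide
RENEW_LIFETIME = timedelta(days=7)     # default renewable window quoted on the slide


@dataclass
class Ticket:
    """Hypothetical model of a renewable TGT."""
    start: datetime
    end: datetime
    renew_till: datetime

    @classmethod
    def issue(cls, now: datetime) -> "Ticket":
        # Roughly what kinit obtains: a fresh TGT starting at `now`.
        return cls(now, now + TICKET_LIFETIME, now + RENEW_LIFETIME)

    def valid_at(self, t: datetime) -> bool:
        return self.start <= t < self.end

    def renew(self, now: datetime) -> "Ticket":
        """Renew an unexpired ticket; the new end time never passes renew_till."""
        if not self.valid_at(now):
            raise ValueError("ticket expired; run kinit again")
        if now >= self.renew_till:
            raise ValueError("renewable lifetime exhausted; run kinit again")
        return Ticket(now, min(now + TICKET_LIFETIME, self.renew_till), self.renew_till)
```

The key point the model captures is that renewal must happen before the 10-hour end time, and no amount of renewing extends a ticket past the 7-day window; after that a fresh kinit (and password) is needed.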
  8. Kerberos Dataflow
  9. API Changes
      • Very minimal API changes!
      • UserGroupInformation *completely* changed.
      • MapReduce added authorization.
      • Jobs now have a Credentials object that can store secrets (available from JobConf and JobContext).
      • Jobs automatically get tokens for HDFS systems:
         • the primary HDFS, File{In,Out}putFormat, and DistCp
         • others can be listed by setting mapreduce.job.hdfs-servers
      • Set ACLs via mapreduce.job.acl-{view,modify}-job.
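
Hadoop ACL values such as those for `mapreduce.job.acl-view-job` are written as a comma-separated user list, a space, and a comma-separated group list (for example `alice,bob analysts`), with `*` meaning everyone. A minimal sketch of how such a value could be evaluated; `parse_acl` and `can_access` are hypothetical helpers, and always admitting the job owner is an assumption of this sketch rather than something the slide states.

```python
def parse_acl(acl):
    """Split an ACL value like "alice,bob analysts" into (users, groups, allow_all).

    The part before the first space lists users and the part after it lists
    groups; a lone "*" admits everyone. A leading space means "no users".
    """
    if acl.strip() == "*":
        return set(), set(), True
    users_part, _, groups_part = acl.partition(" ")
    users = {u.strip() for u in users_part.split(",") if u.strip()}
    groups = {g.strip() for g in groups_part.split(",") if g.strip()}
    return users, groups, False


def can_access(acl, user, user_groups, job_owner):
    """Check one job ACL value (hypothetical helper, not Hadoop's API)."""
    if user == job_owner:  # assumption: owners always reach their own jobs
        return True
    users, groups, allow_all = parse_acl(acl)
    return allow_all or user in users or bool(groups & set(user_groups))
```

For example, a view ACL of `alice analysts` would let alice and any member of the analysts group view the job, while a value of `*` opens it to all authenticated users.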
  10. Other MapReduce Security Changes
      • The MapReduce system directory was 777 but is now 700.
      • Tasks run as the user instead of the TaskTracker user.
      • Task directories were globally visible and are now 700.
      • The Distributed Cache is now secure:
         • Shared (original is world-readable): shared by everyone's jobs.
         • Private (original is not world-readable): shared only by the user's jobs.
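
The shared/private split for the Distributed Cache hinges on whether the original file is world-readable, and the directory changes are the difference between modes 777 and 700. A local-filesystem sketch of both ideas; `cache_visibility` and `mode_string` are hypothetical helpers, and the sketch inspects only the file's own mode bits, whereas a real implementation may also need to consider the directories above it.

```python
import os
import stat
import tempfile


def cache_visibility(path):
    """Classify a cache file by the slide's rule (hypothetical helper).

    World-readable originals go to the shared cache, reused by everyone's
    jobs; anything else goes to the owning user's private cache.
    """
    mode = os.stat(path).st_mode
    return "shared" if mode & stat.S_IROTH else "private"


def mode_string(path):
    """Render permissions like `ls -l`, e.g. drwx------ for a 700 directory."""
    return stat.filemode(os.stat(path).st_mode)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        task_dir = os.path.join(root, "task-dir")
        os.mkdir(task_dir)
        os.chmod(task_dir, 0o700)        # like the new 700 task directories
        print(mode_string(task_dir))     # drwx------
        data = os.path.join(root, "lookup.txt")
        open(data, "w").close()
        os.chmod(data, 0o644)            # world-readable original
        print(cache_visibility(data))    # shared
        os.chmod(data, 0o600)            # readable by the owner only
        print(cache_visibility(data))    # private
```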
  11. Web UIs
      • Hadoop, and especially MapReduce, makes heavy use of the Web UIs.
      • These need to be authenticated as well.
      • We will make it pluggable, but include a login module that uses the Kerberos username and password.
      • Even better would be for someone to write a SPNEGO filter for Jetty that uses the Kerberos tickets from the browser.
      • All of the servlets will use the authenticated username and enforce permissions appropriately.
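
The plan on this slide, a pluggable authentication layer in front of every servlet that ideally reuses the browser's Kerberos tickets via SPNEGO, can be sketched as a request filter. This is a framework-free illustration, not Jetty's or Hadoop's API: `negotiate_filter`, the `validate` hook and the `servlet` callable are hypothetical, and real token validation would go through GSS-API. The header exchange itself (a `WWW-Authenticate: Negotiate` challenge answered with `Authorization: Negotiate <base64 token>`) follows RFC 4559.

```python
import base64
from typing import Callable, Dict, Optional, Tuple

# (status code, response headers, body)
Response = Tuple[int, Dict[str, str], str]


def negotiate_filter(headers: Dict[str, str],
                     validate: Callable[[bytes], Optional[str]],
                     servlet: Callable[[str], Response]) -> Response:
    """Authenticate a request before it reaches a servlet (hypothetical sketch).

    `validate` is the pluggable hook: it maps a SPNEGO token to a user name,
    or returns None if the token is not acceptable. Header names are assumed
    to use canonical capitalization.
    """
    scheme, _, encoded = headers.get("Authorization", "").partition(" ")
    if scheme != "Negotiate" or not encoded:
        # No credentials yet: ask the browser to start the SPNEGO handshake.
        return 401, {"WWW-Authenticate": "Negotiate"}, ""
    try:
        token = base64.b64decode(encoded, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return 400, {}, "malformed token"
    user = validate(token)
    if user is None:
        return 401, {"WWW-Authenticate": "Negotiate"}, "authentication failed"
    # The servlet only ever sees an authenticated user name.
    return servlet(user)
```

Keeping `validate` as a parameter is what makes the scheme pluggable: the same filter could sit in front of a username/password login module or a Kerberos ticket check without the servlets changing.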
  12. Proxy-Users
      • Some services must access HDFS and MapReduce as other users.
      • HDFS and MapReduce allow users to create configuration entries that define:
         • who the proxy service can impersonate
         • which hosts it can impersonate from
      • For example:
         • hadoop.proxyuser.superguy.groups=goodguys
         • hadoop.proxyuser.superguy.hosts=secretbase
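
Given the two entries above, a request from `superguy` to act on behalf of another user should succeed only if it comes from `secretbase` and the impersonated user is in `goodguys`. A minimal sketch of that decision; `proxy_allowed` is a hypothetical helper, it compares host names literally (a real server may resolve them to addresses), and treating `*` as a wildcard is an assumption of this sketch.

```python
from typing import Iterable, Mapping


def proxy_allowed(conf: Mapping[str, str], superuser: str, remote_host: str,
                  target_user_groups: Iterable[str]) -> bool:
    """Decide whether `superuser` may impersonate a user (hypothetical helper).

    Reads the two settings from the slide:
      hadoop.proxyuser.<superuser>.groups  groups whose members may be impersonated
      hadoop.proxyuser.<superuser>.hosts   hosts the superuser may connect from
    Values are comma-separated; "*" is treated here as a wildcard.
    """
    def matches(key: str, candidates: Iterable[str]) -> bool:
        raw = conf.get(f"hadoop.proxyuser.{superuser}.{key}")
        if raw is None:
            return False  # no entry: this superuser may not impersonate anyone
        entries = {e.strip() for e in raw.split(",") if e.strip()}
        return "*" in entries or bool(entries & set(candidates))

    # Both conditions must hold: an allowed group AND an allowed host.
    return matches("groups", target_user_groups) and matches("hosts", [remote_host])
```

Requiring both checks means a stolen superuser credential is only useful from the listed hosts, and even there only against members of the listed groups.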
  13. Remaining Security Issues
      • We are not encrypting on the wire.
         • It will be possible within the framework, but not in 0.22.
      • We are not encrypting on disk.
         • Neither for HDFS nor for MapReduce.
      • Encryption is expensive in terms of CPU and IO speed.
      • Our current threat model is that the attacker has access to a user account, but not root or physical access.
         • They can't sniff the packets on the network.