Hadoop Security Preview


Hadoop Security preview presentation at the Hadoop Bay Area User Group, March 24 at Yahoo! Sunnyvale campus


  1. Preview of Hadoop Security: Owen O’Malley, Yahoo Hadoop Development, [email_address]
  2. Problem
     • Primary Goal: Keep Data in HDFS Secure from unauthorized access!
     • Corollary: All HDFS clients must be authenticated to ensure they are the user they claim to be.
     • Since Map/Reduce runs applications as the user, it must authenticate users.
     • Since servers (HDFS, Map/Reduce) are entrusted with user credentials, they must also be authenticated.
     • Kerberos will be the underlying authentication system.
     • Must be able to configure security on or off.
  3. Adding Security to a Large Project
  4. Security Development Team
     • Boris Shkolnik
     • Devaraj Das
     • Jakob Homan
     • Owen O’Malley
     • Kan Zhang
     • Jitendra Nath Pandey
     • With Paranoid assistance from:
         • Ram Marti
  5. Security Threats in Hadoop
     • User to Service Authentication
         • No User Authentication on NameNode or JobTracker
             • Client code supplies user and group names
         • No User Authorization on DataNode (fixed in 0.21)
             • Users can read/write any block
         • No User Authorization on JobTracker
             • Users can modify or kill other users’ jobs
             • Users can modify the persistent state of JobTracker
     • Service to Service Authentication
         • No Authentication of DataNodes and TaskTrackers
             • Users can start fake DataNodes and TaskTrackers
     • No Encryption on Wire or Disk
  6. Definitions
     • Authentication: ensuring the user is who they claim to be.
         • We do a very poor job of this currently
         • We need it on both RPC and Web UI.
     • Authorization: ensuring the user can only do things that they are allowed to do.
         • HDFS does this already via owners, groups and permissions
         • Map/Reduce does not do this
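The difference between the two definitions can be made concrete with a small sketch. It models an owner/group/other permission check of the kind HDFS already performs. The `FileStatus` class and `is_allowed` function are hypothetical names for illustration, not Hadoop APIs. Note that the check trusts the `user` argument completely: it is authorization only, and is only as good as the authentication that produced the user name.

```python
from dataclasses import dataclass

# Permission bits within one rwx triple.
READ, WRITE, EXECUTE = 4, 2, 1


@dataclass(frozen=True)
class FileStatus:
    """Hypothetical stand-in for a file's ownership metadata."""
    owner: str
    group: str
    mode: int  # octal permission bits, e.g. 0o640


def is_allowed(user, user_groups, status, wanted):
    """Authorization only: assumes `user` was already authenticated."""
    if user == status.owner:
        bits = (status.mode >> 6) & 7   # owner triple
    elif status.group in user_groups:
        bits = (status.mode >> 3) & 7   # group triple
    else:
        bits = status.mode & 7          # other triple
    return (bits & wanted) == wanted


report = FileStatus(owner="alice", group="analysts", mode=0o640)
print(is_allowed("alice", {"analysts"}, report, READ | WRITE))
print(is_allowed("bob", {"analysts"}, report, WRITE))
```

If the client is free to supply any user and group names, as the threat list describes, a caller can simply claim to be `alice` and the check passes. That is why authentication has to come first.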
  7. Using Kerberos and Single Sign-On
     • Kerberos allows the user to sign in once to obtain a Ticket Granting Ticket (TGT)
             • kinit: get a new Kerberos ticket
             • klist: list your Kerberos tickets
             • kdestroy: destroy your Kerberos ticket
             • TGTs last for 10 hours, renewable for 7 days by default
         • PAM on Linux and Solaris can automatically do kinit for you
             • Still needs your password
         • Once you have a TGT, Hadoop commands work like before
             • hadoop fs -ls /
             • hadoop jar wordcount.jar in-dir out-dir
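The default lifetimes quoted above amount to simple date arithmetic, sketched here. The function names and the sample timestamp are illustrative assumptions. The 10-hour and 7-day values are the defaults from the slide, and a given Kerberos realm may be configured differently.

```python
from datetime import datetime, timedelta

# Defaults quoted on the slide; a realm's actual policy may differ.
TICKET_LIFETIME = timedelta(hours=10)
RENEWABLE_LIFETIME = timedelta(days=7)


def ticket_windows(issued_at):
    """Return (expires_at, renew_until) for a TGT obtained at issued_at."""
    return issued_at + TICKET_LIFETIME, issued_at + RENEWABLE_LIFETIME


def can_renew(expires_at, renew_until, now):
    """A ticket can be renewed only while still valid and inside the renewable window."""
    return now < expires_at and now < renew_until


issued = datetime(2024, 1, 1, 9, 0)  # arbitrary example timestamp
expires_at, renew_until = ticket_windows(issued)
print(expires_at, renew_until)
```

Once the renewable window has passed, or the ticket has expired without being renewed, the user has to run kinit again and supply a password.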
  8. Kerberos Dataflow
  9. API Changes
     • Very Minimal API Changes!
     • UserGroupInformation *completely* changed.
     • MapReduce added Authorization
     • Jobs now have a Credentials object that can store secrets. (available from JobConf and JobContext)
     • Automatically get tokens for HDFS systems
         • Primary HDFS, File{In,Out}putFormat, and DistCp
         • Can set mapreduce.job.hdfs-servers
     • Set ACLs via mapreduce.job.acl-{view,modify}-job
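As a rough illustration of what the two job ACL settings control, the sketch below evaluates a view or modify request against them. It rests on assumptions not stated on the slide: that an ACL value is a comma-separated list of users, then a space, then a comma-separated list of groups, with `*` meaning everyone, and that the job owner is always allowed. `parse_acl` and `can_access` are hypothetical helpers, not Hadoop functions, and the configuration is modeled as a plain dictionary.

```python
VIEW_KEY = "mapreduce.job.acl-view-job"      # property name from the slide
MODIFY_KEY = "mapreduce.job.acl-modify-job"  # property name from the slide


def parse_acl(value):
    """Parse 'user1,user2 group1,group2' (assumed format); None means everyone."""
    if value.strip() == "*":
        return None
    users_part, _, groups_part = value.partition(" ")
    users = {u for u in users_part.split(",") if u}
    groups = {g for g in groups_part.split(",") if g}
    return users, groups


def can_access(conf, operation, user, user_groups, job_owner):
    """Decide whether `user` may 'view' or 'modify' a job owned by `job_owner`."""
    if user == job_owner:
        return True  # assumption: the owner always has access
    key = VIEW_KEY if operation == "view" else MODIFY_KEY
    acl = parse_acl(conf.get(key, ""))
    if acl is None:
        return True
    users, groups = acl
    return user in users or bool(groups & set(user_groups))


conf = {VIEW_KEY: "alice,bob analysts", MODIFY_KEY: "alice"}
print(can_access(conf, "view", "carol", {"analysts"}, "owen"))
print(can_access(conf, "modify", "bob", {"analysts"}, "owen"))
```

With an unset or empty ACL, this sketch lets only the job owner through, which matches the intent of closing the "users can modify or kill other users’ jobs" hole.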
  10. Other MapReduce Security Changes
     • MapReduce System directory was 777 but now 700.
     • Tasks run as user instead of TaskTracker user.
     • Task directories were globally visible and now 700.
     • Distributed Cache is now secure
         • Shared (original is world readable) is shared by everyone’s jobs.
         • Private (original is not world readable) is shared by user’s jobs.
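The permission changes above come down to a few mode bits. The sketch below renders the 777 and 700 directory modes and classifies a distributed-cache file as shared or private from its world-readable bit. `cache_visibility` is a hypothetical helper that looks only at the file's own mode; the real framework may apply further checks.

```python
import stat


def cache_visibility(mode):
    """Classify a cache file by whether 'other' has read permission."""
    return "shared" if mode & stat.S_IROTH else "private"


# Directory modes before and after the change, rendered ls-style.
before = stat.filemode(stat.S_IFDIR | 0o777)  # 'drwxrwxrwx'
after = stat.filemode(stat.S_IFDIR | 0o700)   # 'drwx------'
print(before, after)

print(cache_visibility(0o644))  # world readable: shared by everyone's jobs
print(cache_visibility(0o600))  # owner only: shared only by that user's jobs
```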
  11. Web UIs
     • Hadoop and especially MapReduce make heavy use of the Web UIs.
     • These need to be authenticated also…
     • We will make it pluggable, but include a login module that uses the Kerberos username and password.
     • Even better would be for someone to write a SPNEGO filter for Jetty that uses the Kerberos tickets from the browser.
     • All of the servlets will use the authenticated username and enforce permissions appropriately.
  12. Proxy-Users
     • Some services must access HDFS and MapReduce as other users
     • HDFS and MapReduce allow users to create configuration entries to define:
         • Who the proxy service can impersonate
         • Which hosts they can impersonate from
     • hadoop.proxyuser.superguy.groups=goodguys
     • hadoop.proxyuser.superguy.hosts=secretbase
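The two proxy-user entries can be read as a pair of conditions that must both hold. The sketch below evaluates them with the slide's example values. `may_impersonate` is a hypothetical helper, the configuration is modeled as a plain dictionary, and treating each value as a comma-separated list is an assumption.

```python
def _csv(value):
    """Split a comma-separated config value into a set, ignoring blanks."""
    return {item.strip() for item in value.split(",") if item.strip()}


def may_impersonate(conf, proxy_user, proxy_host, target_user_groups):
    """Allow impersonation only from a listed host and only of users in a listed group."""
    prefix = f"hadoop.proxyuser.{proxy_user}."
    allowed_groups = _csv(conf.get(prefix + "groups", ""))
    allowed_hosts = _csv(conf.get(prefix + "hosts", ""))
    host_ok = proxy_host in allowed_hosts
    group_ok = bool(allowed_groups & set(target_user_groups))
    return host_ok and group_ok


conf = {
    "hadoop.proxyuser.superguy.groups": "goodguys",
    "hadoop.proxyuser.superguy.hosts": "secretbase",
}
print(may_impersonate(conf, "superguy", "secretbase", {"goodguys"}))
print(may_impersonate(conf, "superguy", "elsewhere", {"goodguys"}))
```

A service with no matching entries gets an empty host and group list in this sketch, so it cannot impersonate anyone.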
  13. Remaining Security Issues
     • We are not encrypting on the wire.
         • It will be possible within the framework, but not in 0.22.
     • We are not encrypting on disk.
         • For either HDFS or MapReduce.
     • Encryption is expensive in terms of CPU and IO speed.
     • Our current threat model is that the attacker has access to a user account, but not root or physical access.
         • They can’t sniff the packets on the network.