Your SlideShare is downloading. ×
0
Hadoop Security in Detail__HadoopSummit2010
Hadoop Security in Detail__HadoopSummit2010
Hadoop Security in Detail__HadoopSummit2010
Hadoop Security in Detail__HadoopSummit2010
Hadoop Security in Detail__HadoopSummit2010
Hadoop Security in Detail__HadoopSummit2010
Hadoop Security in Detail__HadoopSummit2010
Hadoop Security in Detail__HadoopSummit2010
Hadoop Security in Detail__HadoopSummit2010
Hadoop Security in Detail__HadoopSummit2010
Hadoop Security in Detail__HadoopSummit2010
Hadoop Security in Detail__HadoopSummit2010
Hadoop Security in Detail__HadoopSummit2010
Hadoop Security in Detail__HadoopSummit2010
Hadoop Security in Detail__HadoopSummit2010
Hadoop Security in Detail__HadoopSummit2010
Hadoop Security in Detail__HadoopSummit2010
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

Hadoop Security in Detail__HadoopSummit2010

4,450

Published on

Hadoop Summit 2010 - Developers Track …

Hadoop Summit 2010 - Developers Track
Hadoop Security in Detail
Owen O'Malley, Yahoo!

Published in: Technology
0 Comments
12 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
4,450
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
0
Comments
0
Likes
12
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. Hadoop Security Hadoop Summit 2010 Owen O’Malley [email_address] Yahoo’s Hadoop Team
  • 2. Problem <ul><li>Yahoo! has more yahoos than clusters. </li></ul><ul><ul><li>Hundreds of yahoos using Hadoop each month </li></ul></ul><ul><ul><li>38,000 computers in ~20 Hadoop clusters. </li></ul></ul><ul><ul><li>Sharing requires isolation or trust. </li></ul></ul><ul><li>Different users need different data. </li></ul><ul><ul><li>Not all yahoos should have access to sensitive data </li></ul></ul><ul><ul><ul><li>financial data and PII </li></ul></ul></ul><ul><li>In Hadoop 0.20, easy to impersonate. </li></ul><ul><ul><ul><li>Segregate different data on separate clusters </li></ul></ul></ul>
  • 3. Solution <ul><li>Prevent unauthorized HDFS access </li></ul><ul><ul><li>All HDFS clients must be authenticated. </li></ul></ul><ul><ul><li>Including tasks running as part of MapReduce jobs </li></ul></ul><ul><ul><li>And jobs submitted through Oozie. </li></ul></ul><ul><li>Users must also authenticate servers </li></ul><ul><ul><li>Otherwise fraudulent servers could steal credentials </li></ul></ul><ul><li>Integrate Hadoop with Kerberos </li></ul><ul><ul><li>Provides well tested open source distributed authentication system. </li></ul></ul>
  • 4. Requirements <ul><li>Security must be optional. </li></ul><ul><ul><li>Not all clusters are shared between users. </li></ul></ul><ul><li>Hadoop must not prompt for passwords </li></ul><ul><ul><li>Makes it easy to make trojan horse versions. </li></ul></ul><ul><ul><li>Must have single sign on. </li></ul></ul><ul><li>Must support backwards compatibility </li></ul><ul><ul><li>HFTP must be secure, but allow reading from insecure clusters </li></ul></ul>
  • 5. Kerberos and Single Sign-on <ul><li>Kerberos allows user to sign in once </li></ul><ul><ul><li>Obtains Ticket Granting Ticket (TGT) </li></ul></ul><ul><ul><ul><li>kinit – get a new Kerberos ticket </li></ul></ul></ul><ul><ul><ul><li>klist – list your Kerberos tickets </li></ul></ul></ul><ul><ul><ul><li>kdestroy – destroy your Kerberos ticket </li></ul></ul></ul><ul><ul><ul><li>TGT’s last for 10 hours, renewable for 7 days by default </li></ul></ul></ul><ul><ul><li>Once you have a TGT, Hadoop commands just work </li></ul></ul><ul><ul><ul><li>hadoop fs –ls / </li></ul></ul></ul><ul><ul><ul><li>hadoop jar wordcount.jar in-dir out-dir </li></ul></ul></ul>
  • 6. Kerberos Dataflow
  • 7. Definitions <ul><li>Authentication – Determining the user </li></ul><ul><ul><li>Hadoop 0.20 completely trusted the user </li></ul></ul><ul><ul><ul><li>User states their username and groups over wire </li></ul></ul></ul><ul><ul><li>We need it on both RPC and Web UI. </li></ul></ul><ul><li>Authorization – What can that user do? </li></ul><ul><ul><li>HDFS had owners, groups and permissions since 0.16. </li></ul></ul><ul><ul><li>Map/Reduce had nothing in 0.20. </li></ul></ul>
  • 8. Authentication <ul><li>Changes low-level transport </li></ul><ul><li>RPC authentication using SASL </li></ul><ul><ul><li>Kerberos </li></ul></ul><ul><ul><li>Token </li></ul></ul><ul><ul><li>Simple </li></ul></ul><ul><li>Browser HTTP secured via plugin </li></ul><ul><li>Tool HTTP (eg. Fsck) via SSL/Kerberos </li></ul>
  • 9. Primary Communication Paths
  • 10. Authorization <ul><li>HDFS </li></ul><ul><ul><li>Command line unchanged </li></ul></ul><ul><ul><li>Web UI enforces authentication </li></ul></ul><ul><li>MapReduce added Access Control Lists </li></ul><ul><ul><li>Lists of users and groups that have access. </li></ul></ul><ul><ul><li>mapreduce.job.acl-view-job – view job </li></ul></ul><ul><ul><li>mapreduce.job.acl-modify-job – kill or modify job </li></ul></ul>
  • 11. API Changes <ul><li>Very Minimal API Changes </li></ul><ul><ul><li>UserGroupInformation *completely* changed. </li></ul></ul><ul><li>MapReduce added secret credentials </li></ul><ul><ul><li>Available from JobConf and JobContext </li></ul></ul><ul><ul><li>Never displayed via Web UI </li></ul></ul><ul><li>Automatically get tokens for HDFS </li></ul><ul><ul><li>Primary HDFS, File{In,Out}putFormat, and DistCp </li></ul></ul><ul><ul><li>Can set mapreduce.job.hdfs-servers </li></ul></ul>
  • 12. MapReduce Security Changes <ul><li>MapReduce System directory now 700. </li></ul><ul><li>Tasks run as user instead of TaskTracker. </li></ul><ul><ul><li>Setuid program that runs tasks. </li></ul></ul><ul><li>Task directories are now 700. </li></ul><ul><li>Distributed Cache is now secure </li></ul><ul><ul><li>Shared (original is world readable) is shared by everyone’s jobs. </li></ul></ul><ul><ul><li>Private (original is not world readable) is shared by user’s jobs. </li></ul></ul>
  • 13. Web UIs <ul><li>Hadoop relies on the Web UIs. </li></ul><ul><ul><li>These need to be authenticated also… </li></ul></ul><ul><li>Web UI authentication is pluggable. </li></ul><ul><ul><li>Yahoo uses an internal package </li></ul></ul><ul><ul><li>We have written a very simple static auth plug-in </li></ul></ul><ul><ul><ul><li>Dr. Who returns again (the third doctor?) </li></ul></ul></ul><ul><li>We really need a SPNEGO plug-in… </li></ul><ul><li>All servlets enforce permissions. </li></ul>
  • 14. Proxy-Users <ul><li>Some services access HDFS and MapReduce as other users. </li></ul><ul><li>Configure services with the proxy user: </li></ul><ul><ul><li>Who the proxy service can impersonate </li></ul></ul><ul><ul><ul><li>hadoop.proxyuser.superguy.groups=goodguys </li></ul></ul></ul><ul><ul><li>Which hosts they can impersonate from </li></ul></ul><ul><ul><ul><li>hadoop.proxyuser.superguy.hosts=secretbase </li></ul></ul></ul><ul><li>New admin commands to refresh </li></ul><ul><ul><li>Don’t need to bounce cluster </li></ul></ul>
  • 15. Out of Scope <ul><li>Encryption </li></ul><ul><ul><li>RPC transport – easy </li></ul></ul><ul><ul><li>Block transport protocol – difficult </li></ul></ul><ul><ul><li>On disk – difficult </li></ul></ul><ul><li>File Access Control Lists </li></ul><ul><ul><li>Still use Unix-style owner, group, other permissions </li></ul></ul><ul><li>Non-Kerberos Authentication </li></ul><ul><ul><li>Much easier now that framework is available </li></ul></ul>
  • 16. Schedule <ul><li>The security team worked hard to get security added to Hadoop on schedule. </li></ul><ul><li>Security Development team: </li></ul><ul><ul><li>Devaraj Das, Ravi Gummadi, Jakob Homan, Owen O’Malley, Jitendra Pandey, Boris Shkolnik, Vinod Vavilapalli, Kan Zhang </li></ul></ul><ul><li>Currently on science (beta) clusters </li></ul><ul><li>Deploy to production clusters in August </li></ul>
  • 17. Questions? <ul><li>Questions should be sent to: </li></ul><ul><ul><li>common/hdfs/mapreduce-user@hadoop.apache.org </li></ul></ul><ul><li>Security holes should be sent to: </li></ul><ul><ul><li>[email_address] </li></ul></ul><ul><li>Available from </li></ul><ul><ul><li>http://developer.yahoo.com/hadoop/distribution/ </li></ul></ul><ul><ul><li>Also a VM with Hadoop cluster with security </li></ul></ul><ul><li>Thanks! </li></ul>

×