Hadoop Security Preview


Hadoop Security Preview from Yahoo!, presented at the Hadoop Bay Area User Group, March 24th



  • 1. Preview of Hadoop Security. Owen O’Malley, Yahoo Hadoop Development, [email_address]
  • 2. Problem
    • Primary Goal: Keep Data in HDFS Secure from unauthorized access!
    • Corollary: All HDFS clients must be authenticated to ensure they are the user they claim to be.
    • Since Map/Reduce runs applications as the user, it must authenticate users.
    • Since servers (HDFS, Map/Reduce) are entrusted with user credentials, they must also be authenticated.
    • Kerberos will be the underlying authentication system.
    • Must be able to configure security on or off.
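    The on/off switch mentioned above can be sketched as a core-site.xml fragment (an illustrative sketch; the exact property names are an assumption based on the secure Hadoop work, not confirmed by the slides):

    ```xml
    <!-- core-site.xml sketch: property names assumed, not taken from the slides -->
    <property>
      <name>hadoop.security.authentication</name>
      <!-- "simple" = legacy trust-the-client mode; "kerberos" = full authentication -->
      <value>kerberos</value>
    </property>
    <property>
      <name>hadoop.security.authorization</name>
      <!-- enable service-level authorization checks on RPC -->
      <value>true</value>
    </property>
    ```

    Flipping the first value back to simple would restore the pre-security behavior, which is what makes the feature configurable per cluster.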
  • 3. Adding Security to a Large Project
  • 4. Security Development Team
    • Boris Shkolnik
    • Devaraj Das
    • Jakob Homan
    • Owen O’Malley
    • Kan Zhang
    • Jitendra Nath Pandey
    • With Paranoid assistance from:
        • Ram Marti
  • 5. Security Threats in Hadoop
    • User to Service Authentication
      • No User Authentication on NameNode or JobTracker
        • Client code supplies user and group names
      • No User Authorization on DataNode – Fixed in 0.21
        • Users can read/write any block
      • No User Authorization on JobTracker
        • Users can modify or kill other users’ jobs
        • Users can modify the persistent state of JobTracker
    • Service to Service Authentication
      • No Authentication of DataNodes and TaskTrackers
        • Users can start fake DataNodes and TaskTrackers
    • No Encryption on Wire or Disk
  • 6. Definitions
    • Authentication – Ensuring the user is who they claim to be.
      • We currently do a very poor job of this
      • We need it on both RPC and Web UI.
    • Authorization – Ensuring the user can only do things that they are allowed to do.
      • HDFS does this already via owners, groups and permissions
      • Map/Reduce does not do this
  • 7. Using Kerberos and Single Signon
    • Kerberos allows user to sign in once to obtain Ticket Granting Tickets (TGT)
        • kinit – get a new Kerberos ticket
        • klist – list your Kerberos tickets
        • kdestroy – destroy your Kerberos ticket
        • TGTs last for 10 hours and are renewable for up to 7 days by default
      • PAM on Linux and Solaris can automatically do kinit for you
        • Still needs your password
      • Once you have a TGT Hadoop commands work like before
        • hadoop fs -ls /
        • hadoop jar wordcount.jar in-dir out-dir
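    Putting the commands above together, a typical session might look like this (illustrative only; the principal name and realm are invented):

    ```
    $ kinit alice@EXAMPLE.COM           # obtain a TGT (prompts for password)
    $ klist                             # list tickets and check their lifetimes
    $ hadoop fs -ls /                   # Hadoop picks up the TGT automatically
    $ hadoop jar wordcount.jar in-dir out-dir
    $ kdestroy                          # discard tickets when finished
    ```

    The point of the design is the middle two lines: once the TGT exists, existing Hadoop commands run unchanged.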
  • 8. Kerberos Dataflow
  • 9. API Changes
    • Very Minimal API Changes!
    • UserGroupInformation *completely* changed.
    • MapReduce added authorization.
    • Jobs now have a Credentials object that can store secrets (available from JobConf and JobContext).
    • Tokens are obtained automatically for the HDFS systems a job uses:
      • the primary HDFS, File{In,Out}putFormat, and DistCp
      • additional HDFS servers can be listed via mapreduce.job.hdfs-servers
    • Set ACLs via mapreduce.job.acl-{view,modify}-job.
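    The job-level properties named above can be sketched as configuration entries (the hostnames, user names, and group name here are invented examples; the ACL value format of "users groups" is an assumption):

    ```
    # extra NameNodes to fetch tokens for, beyond the job's default filesystem
    mapreduce.job.hdfs-servers=hdfs://nn2.example.com,hdfs://nn3.example.com

    # who may view job details (counters, logs) and who may modify/kill the job
    mapreduce.job.acl-view-job=alice,bob
    mapreduce.job.acl-modify-job=alice
    ```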
  • 10. Other MapReduce Security Changes
    • The MapReduce system directory was 777; it is now 700.
    • Tasks run as the submitting user instead of as the TaskTracker user.
    • Task directories were globally visible; they are now 700.
    • The Distributed Cache is now secure:
      • Shared (original is world-readable): shared by everyone’s jobs.
      • Private (original is not world-readable): shared only among the user’s own jobs.
  • 11. Web UIs
    • Hadoop and especially MapReduce make heavy use of the Web UIs.
    • These need to be authenticated also…
    • We will make it pluggable, but include a login module that uses the Kerberos username and password.
    • Even better is if someone makes a SPNEGO filter for Jetty that uses the Kerberos tickets from the browser.
    • All of the servlets will use the authenticated username and enforce permissions appropriately.
  • 12. Proxy-Users
    • Some services must access HDFS and MapReduce on behalf of other users.
    • HDFS and MapReduce allow configuration entries that define:
      • which users the proxy service can impersonate (by group)
      • which hosts it may impersonate from
    • hadoop.proxyuser.superguy.groups=goodguys
    • hadoop.proxyuser.superguy.hosts=secretbase
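    In core-site.xml form, the two example entries above would look like the following sketch (superguy, goodguys, and secretbase are the placeholder names from the slide):

    ```xml
    <property>
      <name>hadoop.proxyuser.superguy.groups</name>
      <!-- superguy may impersonate any user in these groups -->
      <value>goodguys</value>
    </property>
    <property>
      <name>hadoop.proxyuser.superguy.hosts</name>
      <!-- ...and only when connecting from these hosts -->
      <value>secretbase</value>
    </property>
    ```

    Both conditions must hold: a proxy request from the wrong host, or for a user outside the listed groups, is rejected.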
  • 13. Remaining Security Issues
    • We are not encrypting on the wire.
        • It will be possible within the framework, but not in 0.22.
    • We are not encrypting on disk.
        • For either HDFS or MapReduce.
    • Encryption is expensive in terms of CPU and IO speed.
    • Our current threat model is that the attacker has access to a user account, but not root or physical access.
        • They can’t sniff the packets on the network.