• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
Hadoop Security in Detail__HadoopSummit2010

Hadoop Security in Detail__HadoopSummit2010



Hadoop Summit 2010 - Developers Track

Hadoop Summit 2010 - Developers Track
Hadoop Security in Detail
Owen O'Malley, Yahoo!



Total Views
Views on SlideShare
Embed Views



2 Embeds 18

http://paper.li 9
http://paper.li 9



Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
Post Comment
Edit your comment

    Hadoop Security in Detail__HadoopSummit2010 Hadoop Security in Detail__HadoopSummit2010 Presentation Transcript

    • Hadoop Security Hadoop Summit 2010 Owen O’Malley [email_address] Yahoo’s Hadoop Team
    • Problem
      • Yahoo! has more yahoos than clusters.
        • Hundreds of yahoos using Hadoop each month
        • 38,000 computers in ~20 Hadoop clusters.
        • Sharing requires isolation or trust.
      • Different users need different data.
        • Not all yahoos should have access to sensitive data
          • financial data and PII
      • In Hadoop 0.20, easy to impersonate.
          • Segregate different data on separate clusters
    • Solution
      • Prevent unauthorized HDFS access
        • All HDFS clients must be authenticated.
        • Including tasks running as part of MapReduce jobs
        • And jobs submitted through Oozie.
      • Users must also authenticate servers
        • Otherwise fraudulent servers could steal credentials
      • Integrate Hadoop with Kerberos
        • Provides well tested open source distributed authentication system.
    • Requirements
      • Security must be optional.
        • Not all clusters are shared between users.
      • Hadoop must not prompt for passwords
        • Makes it easy to make trojan horse versions.
        • Must have single sign on.
      • Must support backwards compatibility
        • HFTP must be secure, but allow reading from insecure clusters
    • Kerberos and Single Sign-on
      • Kerberos allows user to sign in once
        • Obtains Ticket Granting Ticket (TGT)
          • kinit – get a new Kerberos ticket
          • klist – list your Kerberos tickets
          • kdestroy – destroy your Kerberos ticket
          • TGT’s last for 10 hours, renewable for 7 days by default
        • Once you have a TGT, Hadoop commands just work
          • hadoop fs –ls /
          • hadoop jar wordcount.jar in-dir out-dir
    • Kerberos Dataflow
    • Definitions
      • Authentication – Determining the user
        • Hadoop 0.20 completely trusted the user
          • User states their username and groups over wire
        • We need it on both RPC and Web UI.
      • Authorization – What can that user do?
        • HDFS had owners, groups and permissions since 0.16.
        • Map/Reduce had nothing in 0.20.
    • Authentication
      • Changes low-level transport
      • RPC authentication using SASL
        • Kerberos
        • Token
        • Simple
      • Browser HTTP secured via plugin
      • Tool HTTP (eg. Fsck) via SSL/Kerberos
    • Primary Communication Paths
    • Authorization
      • HDFS
        • Command line unchanged
        • Web UI enforces authentication
      • MapReduce added Access Control Lists
        • Lists of users and groups that have access.
        • mapreduce.job.acl-view-job – view job
        • mapreduce.job.acl-modify-job – kill or modify job
    • API Changes
      • Very Minimal API Changes
        • UserGroupInformation *completely* changed.
      • MapReduce added secret credentials
        • Available from JobConf and JobContext
        • Never displayed via Web UI
      • Automatically get tokens for HDFS
        • Primary HDFS, File{In,Out}putFormat, and DistCp
        • Can set mapreduce.job.hdfs-servers
    • MapReduce Security Changes
      • MapReduce System directory now 700.
      • Tasks run as user instead of TaskTracker.
        • Setuid program that runs tasks.
      • Task directories are now 700.
      • Distributed Cache is now secure
        • Shared (original is world readable) is shared by everyone’s jobs.
        • Private (original is not world readable) is shared by user’s jobs.
    • Web UIs
      • Hadoop relies on the Web UIs.
        • These need to be authenticated also…
      • Web UI authentication is pluggable.
        • Yahoo uses an internal package
        • We have written a very simple static auth plug-in
          • Dr. Who returns again (the third doctor?)
      • We really need a SPNEGO plug-in…
      • All servlets enforce permissions.
    • Proxy-Users
      • Some services access HDFS and MapReduce as other users.
      • Configure services with the proxy user:
        • Who the proxy service can impersonate
          • hadoop.proxyuser.superguy.groups=goodguys
        • Which hosts they can impersonate from
          • hadoop.proxyuser.superguy.hosts=secretbase
      • New admin commands to refresh
        • Don’t need to bounce cluster
    • Out of Scope
      • Encryption
        • RPC transport – easy
        • Block transport protocol – difficult
        • On disk – difficult
      • File Access Control Lists
        • Still use Unix-style owner, group, other permissions
      • Non-Kerberos Authentication
        • Much easier now that framework is available
    • Schedule
      • The security team worked hard to get security added to Hadoop on schedule.
      • Security Development team:
        • Devaraj Das, Ravi Gummadi, Jakob Homan, Owen O’Malley, Jitendra Pandey, Boris Shkolnik, Vinod Vavilapalli, Kan Zhang
      • Currently on science (beta) clusters
      • Deploy to production clusters in August
    • Questions?
      • Questions should be sent to:
        • common/hdfs/mapreduce-user@hadoop.apache.org
      • Security holes should be sent to:
        • [email_address]
      • Available from
        • http://developer.yahoo.com/hadoop/distribution/
        • Also a VM with Hadoop cluster with security
      • Thanks!