Securing Hadoop @eBay

eBay has large, multi-purpose Hadoop clusters holding many petabytes of data. eBay Inc.'s many subsidiaries and applications give rise to fascinating, complex scenarios for authentication, audit, access control, data protection, data safety, and privacy. To meet these requirements, security is enabled on Hadoop clusters at eBay. We have been experimenting with and implementing in our production systems several techniques to 1) set up and operate large clusters (thousands of nodes) efficiently and effectively for a thousand users, and 2) enable security transparently, without impacting user jobs or over-burdening users with inconvenient restrictions. Based on our experiments, we have assembled some effective "best" practices for deploying, operating, and using secure Hadoop clusters. In this presentation, we explain methods and rules of thumb that efficiently and effectively:
1) Set up reliable, scalable infrastructure to enable strong security for large clusters
2) Set up a very large secure Hadoop cluster in a scalable, zero-touch, automated way
3) Update software and configuration on the cluster efficiently while minimizing "drift" among machines
4) Keep Hadoop services up and running with minimal human intervention
5) Manage user access to Hadoop clusters
6) Control access to data and other resources
7) Process highly confidential data using map-reduce programs


  1. Secure Hadoop @ eBay
     Benoy Antony & Jos Backus
  2. Overview
     • Cluster facts
     • Enabling security
     • Process supervision
  3. Cluster Facts
     • Shared clusters & dedicated clusters
     • 10s of PB and 10s of thousands of slots per cluster
     • Runs HDP 1.2
     • Used primarily for analysis of user behavior and inventory
     • Mix of production jobs and ad-hoc jobs
     • Mix of MR, Hive, Pig, Cascading, Streaming, etc.
  4. Why is Security Needed at eBay?
     • To control access to sensitive data
       – ACLs are ineffective without strong authentication
     • To execute tasks as the job submitter
     • To build new features, such as encryption
  5. Hadoop Security Overview
     • Authentication using Kerberos
     • Authorization via ACLs
     • Group and user information from LDAP
     • Pluggable authentication for the web UI
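In stock Apache Hadoop, the Kerberos and ACL features above are switched on in core-site.xml. A minimal sketch of the standard settings (this is the upstream configuration, not eBay's actual file):

```xml
<!-- core-site.xml: enable Kerberos authentication and service-level ACLs -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>  <!-- the default is "simple" (no authentication) -->
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>      <!-- enforce the service ACLs in hadoop-policy.xml -->
</property>
```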
  6. Security Infrastructure @ eBay
     • Cluster machines, including the gateway, are inside the firewall
     • Uses Active Directory for Kerberos and LDAP
     • Separate domains for users and Hadoop servers
     [Diagram: CORP AD and Hadoop AD domains; a gateway in front of the JT, NN, and HBM masters and the DN/TT/RS worker nodes]
  7. Advantages of Separate User and Server Domains
     • Separates user and server authentication
     • Prevents additional Kerberos and LDAP traffic to corp servers
     • The Hadoop team can manage Hadoop server accounts
     [Diagram: user accounts live in the CORP AD, server accounts in the Hadoop AD; Hadoop cluster nodes talk to both]
  8. Syncing Hadoop User Information
     • All nodes require user and group information
       – Permission checks
       – Running tasks
     • Hadoop AD should contain user and group information
     • Periodic synchronization of user information from CORP AD to Hadoop AD
       – LDAP Synchronization Connector (LSC)
       – The user's password is not synced
     [Diagram: LSC copies Hadoop users and groups from CORP AD to Hadoop AD; batch accounts live in the Hadoop AD]
  9. No Cross-Domain Trust!
     • Modified the Hadoop authentication layer
       – Hadoop masters have two principals and corresponding keytabs:
         • hdfs/namenode@hadoop.ebay.com
         • hdfs/namenode@corp.ebay.com
       – Loads the server principal and key based on the client
       – Requires changes in the Hadoop, HBase, and ZooKeeper servers
     [Diagram: the NN holds both hdfs/nn@hadoop and hdfs/nn@corp; clients obtain a service ticket for hdfs/nn from either the Hadoop AD or the CORP AD]
  10. User Authentication: Obtaining Tickets
     • Ad-hoc jobs/queries are run using personal accounts
       – A PAM module fetches tickets at login
       – kinit when tickets expire
     • Production jobs are run using batch accounts
       – Use keytabs to obtain tickets
       – Automatic ticket renewal using k5start
       – Enabled transparent security rollout
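As a hypothetical sketch of the batch-account flow (the keytab path, principal name, and schedule are invented for illustration; the deck does not show eBay's actual setup), a crontab entry can launch k5start at boot so tickets are renewed automatically from the keytab:

```
# Illustrative crontab entry: k5start backgrounds itself (-b), wakes every
# 10 minutes (-K 10), and re-kinits from the keytab (-f) before tickets expire.
@reboot /usr/bin/k5start -b -K 10 -f /etc/security/keytabs/batchuser.keytab batchuser@HADOOP.EBAY.COM
```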
  11. Encrypting Sensitive Data
     • Use case:
       – Encrypted data is copied to the cluster
       – Key identifiers are passed during job submission
       – The job client fetches keys from the key store using the user's credentials
       – Key values are protected using the cluster's public key
     • Work in progress
     [Diagram: the job client reads secrets from the key store and submits the job to the Hadoop cluster]
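The key-protection step above can be sketched as envelope encryption with OpenSSL: a random data key encrypts the payload, and the cluster's public key protects the data key in transit. All file names, key sizes, and cipher choices here are illustrative assumptions, not part of eBay's implementation:

```shell
set -e
work=$(mktemp -d)
cd "$work"

# 1. Cluster key pair (in the real system the private key stays on the cluster).
openssl genrsa -out cluster.key 2048 2>/dev/null
openssl rsa -in cluster.key -pubout -out cluster.pub 2>/dev/null

# 2. A per-file data key encrypts the sensitive data symmetrically.
openssl rand -hex 32 > datakey.hex
echo "sensitive records" > data.txt
openssl enc -aes-256-cbc -pbkdf2 -pass file:datakey.hex -in data.txt -out data.enc

# 3. Protect the data key with the cluster's public key before submission.
openssl pkeyutl -encrypt -pubin -inkey cluster.pub -in datakey.hex -out datakey.enc

# 4. On the cluster side: recover the data key, then the data.
openssl pkeyutl -decrypt -inkey cluster.key -in datakey.enc -out datakey.dec
openssl enc -d -aes-256-cbc -pbkdf2 -pass file:datakey.dec -in data.enc -out data.dec
cmp data.txt data.dec
```

The point of the envelope: only the small data key needs asymmetric protection, so arbitrarily large files stay cheap to encrypt.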
  12. Direct Access to the Cluster
     • Current cluster access is through the gateway machine
     • Direct access to the cluster from desktops:
       – That communication should be encrypted
       – Communication inside the firewall need not be encrypted
     • Advantages:
       – Increases user productivity
       – Reduces utilization of the gateway
     [Diagram: ssh to the gateway, then auth-only traffic to the cluster inside the firewall; direct desktop access uses auth + privacy]
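In stock Hadoop, encryption of RPC traffic is controlled by the hadoop.rpc.protection quality-of-protection setting in core-site.xml; the deck does not say which mechanism eBay chose, so this is shown only as the standard knob. The levels are authentication (the default), integrity, and privacy, where privacy adds wire encryption:

```xml
<property>
  <name>hadoop.rpc.protection</name>
  <value>privacy</value>  <!-- authentication | integrity | privacy -->
</property>
```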
  13. Summary
     • Infrastructure using Active Directory and separate domains
     • Authentication across domains without domain trust
     • Rollout with minimal disruption
     • Additional security features
  14. Process Supervision
     • Why?
     • What?
     • Process tree
     • Configuring a service
     • Sample run scripts
     • Service state commands
     • The env directory
  15. Why?
     • Daemons die from time to time
       – We don't know about it
       – It would be nice if we could do something about it in a smart way
     • There are different ways to control daemons
       – Not portable; changes with the platform
       – Some init scripts are not well written
       – Some ways require sudo
       – The caller's environment can affect how the daemon runs
       – Some ways don't handle automatic restarts
     ⇒ Enter process supervision!
  16. What?
     • daemontools-encore: a uniform mechanism to control daemons
       – Simple command set: svc, svstat, svup, svok
       – Supports process state change callbacks (notify script)
         • Alert when a daemon crashes
         • Smart restarts (don't restart if thrashing)
       – Can be used for one-shot jobs (svc -o)
       – Portable; runs on many UNIX versions
       – Robust and reliable code (small is beautiful)
       – Includes configurable log management
         • multilog manages stdout/stderr output
         • Never fills up your disks
         • Multiple log queues possible (e.g. everything, errors only)
  17. Process Tree

     PPID  PID    STAT UID  TIME     COMMAND
     1     2170   Ss   0    0:00     /bin/sh /usr/bin/svscanboot
     2170  2183   S    0    1:35      \_ svscan /service
     2183  2185   S    0    0:00      |  \_ supervise gmon
     2185  6494   Ss   101  0:02      |  |  \_ /usr/sbin/gmond --foreground
     2183  2186   S    0    0:00      |  \_ supervise log
     2186  2198   Ss   101  0:00      |  |  \_ multilog t ./main
     2183  2187   S    0    0:00      |  \_ supervise puppet
     2187  11917  Ssl  0    0:30      |  |  \_ /apache/ruby-1.9.3/bin/ruby /apache/ruby-1.9.3/bin/puppet agent --no-daemonize --debug
     2183  2188   S    0    0:00      |  \_ supervise log
     2188  2199   Ss   52   0:54      |  |  \_ multilog t ./main
     2183  2189   S    0    0:00      |  \_ supervise hbase-regionserver
     2189  3221   Ssl  680  7823:56   |  |  \_ /usr/java/latest/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m -ea -XX:+HeapDumpOnOutOfMemoryError
     2183  2190   S    0    0:00      |  \_ supervise log
     2190  2196   Ss   680  0:00      |  |  \_ multilog s10485760 n500 ./main
     2183  2191   S    0    0:00      |  \_ supervise datanode
     2191  31690  Ss   0    0:00      |  |  \_ jsvc.exec -Dproc_datanode -outfile /apache/hadoop-1.1.2.22/libexec/../logs/jsvc.out
     31690 31795  Sl   680  2457:22   |  |     \_ jsvc.exec -Dproc_datanode -outfile /apache/hadoop-1.1.2.22/libexec/../logs/jsvc.out
     2183  2192   S    0    0:00      |  \_ supervise log
     2192  2204   Ss   680  0:00      |  |  \_ multilog s10485760 n500 ./main
     2183  2193   S    0    3:39      |  \_ supervise tasktracker
     2193  28229  Ssl  680  1587:34   |  |  \_ /usr/java/latest/bin/java -Dproc_tasktracker -Xmx600m -server -Dlog4j.configuration=log4j.properties
     28229 8218   Ssl  1098929040  1103:23  |  |     \_ /usr/java/jdk1.6.0_31/jre/bin/java -Djava.library.path=/apache/hadoop-1.1.2.22/libexec/../
     28229 30645  Ssl  1098929444  1:27     |  |     \_ /usr/java/jdk1.6.0_31/jre/bin/java -Djava.library.path=/apache/hadoop-1.1.2.22/libexec/../
     28229 7444   Ssl  1098929009  5:53     |  |     \_ /usr/java/jdk1.6.0_31/jre/bin/java -Djava.library.path=/apache/hadoop-1.1.2.22/libexec/../
     28229 7446   Ssl  1098929009  6:12     |  |     \_ /usr/java/jdk1.6.0_31/jre/bin/java -Djava.library.path=/apache/hadoop-1.1.2.22/libexec/../
     28229 7455   Ssl  1098929009  6:07     |  |     \_ /usr/java/jdk1.6.0_31/jre/bin/java -Djava.library.path=/apache/hadoop-1.1.2.22/libexec/../
     28229 7848   Ssl  1098929009  6:32     |  |     \_ /usr/java/jdk1.6.0_31/jre/bin/java -Djava.library.path=/apache/hadoop-1.1.2.22/libexec/../
     2183  2194   S    0    0:00      |  \_ supervise log
     2194  2205   Ss   680  2:06      |     \_ multilog s10485760 n500 ./main
     2170  2184   S    0    0:00      \_ readproctitle service errors: ............
  18. Configuring a Service
     • A service consists of a directory: /var/lib/service/foo
     • It holds some files and directories:
       – start (optional)
       – run
       – notify (optional)
       – stop (optional)
       – log/run
       – log/main
       – env
     • To enable a service, put a symlink to it in /service, and svscan will start it:
       ln -s /var/lib/service/foo /service/foo
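The steps above can be sketched end to end. This uses a sandbox directory in place of /var/lib/service and /service, and "my-daemon" is a hypothetical program name, so the layout can be created anywhere without root:

```shell
set -e
base=$(mktemp -d)
svc_dir="$base/var/lib/service/foo"
mkdir -p "$svc_dir/log" "$svc_dir/env"

# run: supervise execs this; "exec" keeps the daemon as supervise's direct child.
cat > "$svc_dir/run" <<'EOF'
#!/bin/sh
exec 2>&1
exec envdir env my-daemon --foreground
EOF

# log/run: multilog rotates ./main (10 MB files, keep 500).
cat > "$svc_dir/log/run" <<'EOF'
#!/bin/sh
exec multilog s10485760 n500 ./main
EOF

chmod +x "$svc_dir/run" "$svc_dir/log/run"

# Enable: symlink into the scan directory; svscan picks it up within seconds.
mkdir -p "$base/service"
ln -s "$svc_dir" "$base/service/foo"
```

Because the service is just a directory, deployment tools like Puppet can manage it with plain file resources, and disabling a service is just removing the symlink.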
  19. Sample run Scripts

     /service/tasktracker/run:

       #!/bin/sh
       exec 2>&1
       # Give the hadoop user access
       setfacl -R -m u:hadoop:rwx supervise
       exec envdir env setuidgid hadoop /apache/hadoop/bin/hadoop tasktracker

     /service/tasktracker/log/run:

       #!/bin/sh
       # Give the hadoop user access
       setfacl -R -m u:hadoop:rwx supervise
       test -d main || install -o hadoop -d main
       exec setuidgid hadoop multilog s10485760 n500 ./main
  20. The env Directory

     # pwd
     /service/datanode/env
     # head *
     ==> HADOOP_DATANODE_OPTS <==
     -Dhadoop.log.file.RFA.MaxBackupIndex=500 -Dhadoop.log.file.RFA.MaxFileSize=100MB

     ==> HADOOP_HOME <==
     /apache/hadoop

     ==> HADOOP_LOG_DIR <==
     /apache/hadoop/logs

     ==> HADOOP_LOGFILE <==
     hadoop-hadoop-datanode.log

     ==> HADOOP_ROOT_LOGGER <==
     INFO,RFA

     ==> HADOOP_SECURE_DN_USER <==
     hadoop

     # You can use echo and rm to edit values!
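The echo/rm editing the slide mentions works because envdir stores one variable per file: the filename is the variable name and the file's first line is its value. A minimal sketch, using a temp directory in place of /service/datanode/env:

```shell
env_dir=$(mktemp -d)

# Set or change a value by writing the file.
echo "/apache/hadoop/logs" > "$env_dir/HADOOP_LOG_DIR"
echo "INFO,RFA" > "$env_dir/HADOOP_ROOT_LOGGER"

# Unset a variable by removing its file.
rm "$env_dir/HADOOP_ROOT_LOGGER"

# head * shows what remains, in the same format as the slide.
head "$env_dir"/*
```

No restart of supervise is needed to edit values; the new environment takes effect the next time the run script (and hence envdir) is executed.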
  21. Service State Commands

     # svstat /service/*
     /service/datanode: up (pid 31690) 2774877 seconds, running
     /service/gmon: up (pid 24474) 41500 seconds, running
     /service/hbase-regionserver: up (pid 3221) 6475035 seconds, running
     /service/puppet: up (pid 11917) 2246936 seconds, running
     /service/tasktracker: up (pid 28229) 2757029 seconds, running
     # svc -t /service/datanode
     # sleep 10
     # svstat /service/datanode
     /service/datanode: up (pid 8203) 10 seconds, running
     # svc -d /service/datanode
     # svstat /service/datanode
     /service/datanode: down 6 seconds, normally up, stopped
     # svc -u /service/datanode
     # sleep 10
     # svstat /service/datanode
     /service/datanode: up (pid 9582) 10 seconds, running
  22. Questions?
