SlideShare a Scribd company logo
1 of 55
Download to read offline
Hadoop Security Overview
   - From security infrastructure
deployment to high-level services

                       Jason Shih
                              Etu
                      2 Oct, 2012
                 Hadoop in Taiwan
Outline
•  Kerberos & LDAP
   –  Configuration & Installation
   –  Authentication & Authorization
   –  Interoperability
•  Hadoop Security & Services
   –    Authentication & Authorization in Hadoop
   –    Token Delegation & communication path
   –    HDFS: NN & DN
   –    MapReduce: JT+TT
   –    HBase: ZK+MASTER+RS
•  Etu Appliance
   –  New features & key benefits
   –  Software stacks, versions & HW spec.
•  Troubleshooting


  Hadoop in Taiwan
                                                   2
Who am I?
•  Etu
   –  Hadoop System Architect
•  Grid Computing Centre, ASGC
   –  Tech Lead on Grid Operation
   –  Scope: DC, OP, DM & GT
   –  Experiment Support (LHC Analysis Software, ES, EC (W&C) etc.)
•  Before Grid Computing – HPC @ ASCC
   –     System administration (IBM, SGI, Sun, *nix)
   –     Architecture Design & Parallel filesystem
   –     Performance Tuning & Optimization
   –     Application Support etc.




    Hadoop in Taiwan
                                                                      3
Does security matter?
•  Ticket Breakdown:
   –  Comprise ~3.1% issues are security related
       •  Hadoop common, HDFS, MR, YARN, HBase, Hive, & Pig etc.
   –  Majority are common+HDFS+MR related: ~73%
                                     30                            300
                                                All Tickets




                                                                         Security Related Tickets
                                     25                            250
                                                Security Related

                                     20                            200
                    Percentage (%)




                                     15                            150

                                     10                            100

                                      5                            50

                                      0                            0




 Hadoop in Taiwan
                                                                                    4
LDAP
           (lightweight) directory access protocol
           Small bit of data, mostly read access
       NIS
           Pros: setup, administration, widely support & scale fairly well
           Cons: weakly encrypted password, difficult to FW, lack of system auth
       NIS+
           Complicated, limited client support.




Kerberos & LDAP
  Configuration & Installation
  Authentication & Authorization
  Interoperability




                          Hadoop in Taiwan
                                                                                   5
LDAP Authentication

•  OpenLDAP: Lightweight Directory Access Protocol
    –  X.500 base (model for directory service in OSI concept)
    –  X.400 Std. by ITU late 70’s & early 80’s (email service)
•  Why directory?
    –  Specialized database design for frequent queries but infrequent
       updates
    –  lack of rollback functionality & transaction support
    –  Easily replicated aiming for high availability & scalability (but depend
       on size of info being published or replicated).




Terminology: http://www.zytrax.com/books/ldap/apd/
                                                                             6
LDAP Overview

•  Building blocks:
   –  Schemas, objectClasses, Attributes, matchingRules,
     Operational objects etc.
•  Models:
   –  Information
      •  information or data presented may/may-not the way data is actually
         stored
   –  Naming:
       •  def: 'dc=example,dc=com’ stumble across in LDAP
   –  Functional
       •  Read, Search, Write & Modify
   –  Security
       •  Fine grained manner, who can do what to what data


                                                                              7
Kerberos Introduction

•  What is Kerberos
   –  Named after Cerberus, the three-headed dog of Greek mythology,
      because?
   –  Composite by three components:
       •  KDC (Kerberos Distribution Center)
       •  Clients (Users/Hosts/Services)
       •  Server (Service providers requested to establish session)
   –  Scope of deployment: realm
   –  KDC provide:
       •  AS (Authentication Server)
       •  TGS (Ticket Granting Service)



   Hadoop in Taiwan
                                                                       8
Kerberos Introduction (cont’)

•  Kerberos Client
   –  PAM enable (pam_krb5)
       •  Other application, recompilation effort required: e.g. OpenSSH
   –  Application w/ native Kerberos support but few limited to ver. IV
•  Other Extension
   –  Windows Authentication (AD)
   –  NFS Authentication & Encryption
   –  AFS (Global Filesystem)
•  Symmetric key operations
   –  Order of magnitude Faster than public key operations e.g. SSL
•  Performs authentication not authorization
•  When user authenticates, they are given a “ticket”
   –  Default Lifetime: 8Hr


 Hadoop in Taiwan
                                                                           9
Kerberos: Definition & Terminology
•  KDC (Kerberos Distribution Center)
•  TGT (Ticket Granting Ticket)
   –  Special ticket permit client to obtain additional Kerberos ticket within same realm
•  Keytab
   –  key table file containing one or more keys, same as for hosts & users
•  Principal
   –  Primary
        •  First part of a Kerberos principal
        •  User: username, Service: the name of the service
   –  Instance
        •  Provide information that qualifies the primary
        •  User: desc. the intended use of corresponding credentials
        •  Host: FQDN
   –  Realm
        •  Logical network served by a single Kerberos DB and a set of KDC



     Hadoop in Taiwan
                                                                                      10
Kerberos Overview
                                             KDC (Key Distribution Center)

                                                     AS
                                           (Authentication Server)
                                                                     Database
                     AS_REQ                         TGS
                                          (Ticket Granting Server)
                                 AS_REP

                             AS_REQ(host/app-server)
                             AS_REQ(host/filesevers)

                                               AS_REP


                    Client                                           Application Server
                                            AP_REQ

                                                 AP_REP
                                                                FileServers


                                AP_REQ + File Request                                     Domain




                                        AP_REP + File



 Hadoop in Taiwan
                                                                                                   11
Kerberos Principals & Realms

•  Principal
   –  Generic: Name/instance@realm
   –  Examples
       •  etu@testdomain.com
       •  etu/admin
       •  host/master.testdomain.com
       •  ldap/ldap.testdomain.com
   –  Realm
       •  Typically domain name in all CAPS:
            e.g.: TESTDOMAIN.COM




   Hadoop in Taiwan
                                               12
Kerberos Command line
•  Administration
   –  kadmin: used to make changes to the accounts in the
      Kerberos database
         •  kadmin.local
   –    klist: used to view the tickets in the credential cache
   –    kinit: used to log onto the realm with the client’s key
   –    kdestroy: erases the credential cache
   –    kpasswd: used to change user passwords
   –    kprop: used to synch the master KDC with replicas, if any
•  Utility
   –  kdb5_util: create, destroy, stash, dump, load, ark, add_mkey,
      use_mkey, list_mkeys, update_princ_encryption &
      purge_mkeys


                                                                    13
Kerberos Administration (kadmin.local)
•  Available requests:
  add_principal, addprinc, ank
  delete_principal, delprinc
  modify_principal, modprinc
  change_password, cpw
  get_principal, getprinc
  list_principals, listprincs, get_principals, getprincs
  add_policy, addpol
  modify_policy, modpol
  delete_policy, delpol
  get_policy, getpol
  list_policies, listpols, get_policies, getpols
  get_privs, getprivs
  ktadd, xst
  ktremove, ktrem
  lock
  unlock
  purgekeys
                                                           14
Kerberos Principals (I)

•  Default principals (default realm: TESTDOMAIN.COM)

  K/M@TESTDOMAIN.COM
  hdfs@TESTDOMAIN.COM
  kadmin/admin@TESTDOMAIN.COM
  kadmin/changepw@TESTDOMAIN.COM
  Kadmin/master.testdomain.com@TESTDOMAIN.COM
  krbtgt/TESTDOMAIN.COM@TESTDOMAIN.COM
  ldapadm@TESTDOMAIN.COM
  ldap/master.testdomain.com@TESTDOMAIN.COM




                           Hadoop in Taiwan
                                                    15
Kerberos Principals (II)
•  Principals details (KV no., expiration & attributes)
  Principal: hdfs@TESTDOMAIN.COM
  Expiration date: [never]
  Last password change: Thu Nov 15 19:44:31 CST 2012
  Password expiration date: [none]
  Maximum ticket life: 1 day 00:00:00
  Maximum renewable life: 90 days 00:00:00
  Last modified: Thu Nov 15 19:44:31 CST 2012 (kadmin/admin@TESTDOMAIN.COM)
  Last successful authentication: [never]
  Last failed authentication: [never]
  Failed password attempts: 0
  Number of keys: 5
  Key: vno 2, aes128-cts-hmac-sha1-96, no salt
  Key: vno 2, aes256-cts-hmac-sha1-96, no salt
  Key: vno 2, arcfour-hmac, no salt
  Key: vno 2, des3-cbc-sha1, no salt
  Key: vno 2, des-cbc-crc, no salt
  MKey: vno 1
  Attributes:
  Policy: [none]                                                              16
Kerberos Server Configuration (I)
  •  libdefaults:
       default_realm = TESTDOMAIN.COM
           ticket_lifetime = 48h
           renew_lifetime = 8760h
           forwardable = true
           proxiable = true
           default_tkt_enctypes = aes128-cts-hmac-sha1-96 …
           default_tgs_enctypes = aes128-cts-hmac-sha1-96 …
           permitted_enctypes = aes128-cts-hmac-sha1-96 …
           dns_lookup_realm = false
           dns_lookup_kdc = false
           allow_weak_crypto = 1

Allow_weak_crypto – temporary workaround
•  By default, clients & servers will not using keys for ciphers.
•  Clients wont be able to authenticate to services w/ keys following support enctypes
•  Zero downtime w/ service updating new/strong-cophers keys to keytab
•  TGT can then update services’ keys to a sets including keys w/ stronger ciphers
   (kadmin cpw -keepold command)

                                                                                    17
Kerberos Server Configuration (II)

•  Realm & domain realm:
  [realms]
       TESTDOMAIN.COM = {
           default_domain = testdomain.com
           kdc = etu-master.testdomain.com
           admin_server = etu-master.testdomain.com
           database_module = openldap_ldapconf
       }

  [domain_realm]
      .testdomain.com = TESTDOMAIN.COM
      testdomain.com = TESTDOMAIN.COM




                               Hadoop in Taiwan
                                                      18
Kerberos Server Configuration (III)




                                      19
Kerberos KDC Config




                      20
Kerberos Encryption Types
•  Supported Ciphers
   –    des-cbc-crc
   –    des-cbc-md4
   –    des-cbc-md5
   –    des-cbc-raw
   –    des3-cbc-raw
   –    des-hmac-sha1
   –    arcfour-hmac-exp



•  Cryptographic Primitives
   –  Cryptographic Agility (v5)
   –  Etypes: Define set of primitives for cryptographic operations
        •  e.g.: aes256-cts-hmac-sha1-96, aes128-cts-hmac-sha1-96,
          rc4-hmac, des-cbc-md5, rc4-hmac-exp
                                                                      21
Hadoop Security & Services
  HDFS: NN & DN
  MapReduce: JT+TT
  HBase: ZK+MASTER+RS




                  Hadoop in Taiwan
                                     22
Pre-CDH3
•  User Auth:
   –  User impersonation: set property “hadoop.job.ugi” in run job
•  Server Auth: N/A
•  HDFS (weak-authentication)
   –  Unix-like file permission (std: user/group/other r/w/x)
•  Job control:
   –  Lack of ACLs for counters/logging
   –  ACLs per job queue submission/killing
•  Web interface: N/A
•  Tasks:
   –  Not-isolated from the others
   –  All run as same users
   –  Interference with other tasks accessing identical local storage

                             Hadoop in Taiwan
                                                                        23
Security ship w/ CDH3:

•  Secure Authentication base on Kerberos
   –  RPC secured with SASL GSSAPI mechanism
   –  Strong authentication & SSO
•  Mutual authentication between servers/users/services
   –  Bi-directional for server auth.
•  HDFS:
   –  Same general permission model w/ sticky bit
•  ACLs for job control & view
•  Tasks isolation (launch by user) on same TT
•  Kerberized SSL support for web interface (pluggable
   serverlet)

                             Hadoop in Taiwan
                                                          24
Authentication & Authorization
•  Consideration
   –  Performance: symmetric keys (Kerberos) vs. public key (SSL)
   –  Management: central managed (KDC) vs. CRL propagation
•  Authentication – user identification
   –  Changes low-level transport
   –  RPC authentication using SASL
       •  Kerberos (GSSAPI)
       •  Token (GIGEST-MD5)
       •  Simple
   –  HTTP secured via plugin
•  Authorization – access control, resources & role
   –  HDFS
       •  Command line & semantics unchanged
       •  Web UI enforces authentication
   –  MapReduce added Access Control Lists
       •  Lists of users and groups that have access
       •  mapreduce.job.acl-view-job – view job
       •  mapreduce.job.acl-modify-job – kill or modify job
                                                                    25
Delegation Tokens
•  To prevent a flood of authentication requests at the start
   of a job, NameNode can create delegation tokens.
•  Allows user to authenticate once and pass credentials to
   all tasks of a job.
•  JobTracker automatically renews tokens while job is
   running.
•  Cancels tokens when job finishes.




                                                                26
Primary Communication Path




Ref: Hadoop Security Preview, Owen   27
Hadoop Security Enable

•  In “core-site.xml”
   –  Reset “simple” to disable security
   –  Property:
   hadoop.security.authentication = kerberos
   hadoop.security.authorization = true




                           Hadoop in Taiwan
                                               28
HDFS Security Configuration

•  In “hdfs-site.xml”, set property:
   dfs.block.access.token.enable = true
   dfs.namenode.keytab.file = ${HDFS_KEYTAB_PATH}

   dfs.namenode.kerberos.principal = ${HDFS_KRB5_PRINCIPAL}
   e.g.: etu/_HOST@${HADOOP_REALM}

   dfs.namenode.kerberos.internal.spnego.principal =
   HTTP/_HOST@${HADOOP_REALM}




                             Hadoop in Taiwan
                                                              29
Secondary NN Configuration

•  In “hdfs-site.xml”, set the following property:
   –  Similar properties as for NameNode
   –  Perfectly fine if you initiate with same Kerberos principal
   dfs.secondary.namenode.keytab.file
   dfs.secondary.namenode.kerberos.principal
   dfs.secondary.namenode.kerberos.internal.spnego.principal




                             Hadoop in Taiwan
                                                                    30
DataNode Security Configuration
•  In “hdfs-site.xml”
•  Replicate site xml to all DN
•  Privilege service port:
   –  Either recompile “jsvc” or adopt BigTop for secure DN service daemon
•  “sudo” privilege required
•  Appropriate variables for secured datanode
   HADOOP_SECURE_DN_USER
   HADOOP_SECURE_DN_PID_DIR (optional)
   HADOOP_SECURE_DN_LOG_DIR
   JSVC_HOME
•  Define the following properties:
   dfs.datanode.data.dir.perm
   dfs.datanode.address e.g.: 0.0.0.0:1004
   dfs.datanode.http.address e.g.: 0.0.0.0:1006
   dfs.datanode.keytab.file
   dfs.datanode.kerberos.principal e.g.: hdfs/_HOST@${HADOOP_REALM}
                                 Hadoop in Taiwan
                                                                             31
Secure HDFS Service Common Error
•  Error:
   ERROR security.UserGroupInformation: PriviledgedActionException
   as:etu (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed
   [Caused by GSSException: No valid credentials provided (Mechanism level:
   Failed to find any Kerberos tgt)]

   WARN ipc.Client: Exception encountered while connecting to the server :
   javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException:
   No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
   12/10/02 11:03:24 ERROR security.UserGroupInformation: PriviledgedActionException
   as:etu (auth:KERBEROS) cause:java.io.IOException: javax.security.sasl.SaslException:
   GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level:
   Failed to find any Kerberos tgt)]


•  C.f.: Successful Kerberos Authentication:
   Oct 02 11:06:16 master krb5kdc[30142](info): TGS_REQ (6 etypes {17 17 23 16 3 1})
   10.1.247.18: ISSUE: authtime 1349147029, etypes {rep=17 tkt=17 ses=17},
   etu@ETU.SYSTEX.TW for etu/master.etu.systex.tw@ETU.SYSTEX.TW



                                        Hadoop in Taiwan
                                                                                                  32
Secure MapReduce Configuration

•  In “mapred-site.xml”, for JT & TT
   –  Defined the following properties:
   mapreduce.jobtracker.kerberos.principal
   e.g.: mapred/_HOST@{HADOOP_REALM}

   mapreduce.jobtracker.keytab.file
   mapreduce.tasktracker.kerberos.principal
   mapreduce.tasktracker.keytab.file




                           Hadoop in Taiwan
                                              33
Secure MapReduce: TaskController

•  In “mapred-site.xml”
•  In taskcontroller.cfg:
   –  Default “banned.users” property is mapred, hdfs, and bin
   –  Default “min.user.id property” is 1000 (Err code: 255 if lower)
•  Take care also ownership & setuid for taskcontroller binary
   –  chown root:mapred task-controller
   –  chmod 4754 task-controller
•  Define also the following variables:
   mapred.task.tracker.task-controller
   e.g.: org.apache.hadoop.mapred.LinuxTaskController

   mapreduce.tasktracker.group
   e.g.: mapred

                              Hadoop in Taiwan
                                                                        34
Secure MapReduce: Best Practice

•  Always start with simple task before launch real workload:
   e.g.: PiEst
•  Make sure underlying HDFS enable security & functional
•  From KDC log:

  master krb5kdc[30142](info): TGS_REQ (6 etypes {17 17 23 16 3 1})
  192.168.70.18: ISSUE: authtime 1349147401, etypes {rep=17 tkt=17
  ses=17},
  etu@ETU.SYSTEX.TW for etu/master.etu.systex.tw@ETU.SYSTEX.TW




                           Hadoop in Taiwan
                                                                  35
Zookeeper Security Configuration (I)
•  zoo.cfg:
  authProvider.
  1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
  jaasLoginRenew=3600000


•  java.env
  export JVMFLAGS="-Djava.security.auth.login.config=/etc/
  zookeeper/conf/jaas.conf”




                          Hadoop in Taiwan
                                                                  36
Zookeeper Security Configuration (II)
•  JAAS configuration:
   Server:
   com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    keyTab="/etc/zookeeper/conf/zookeeper.keytab"
    storeKey=true
    useTicketCache=false
    principal="zookeeper/fully.qualified.domain.name@<YOUR-REALM>"


   Client:
   com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=false
    principal="zkcli"
    useTicketCache=true
    debug=true

                            Hadoop in Taiwan
                                                                     37
HBase Security Configuration
•  Authentication
  –  Identification mechanism for HBase servers & clients for HDFS, ZK
     & MR.
•  Authorization
  –  Ontop of coprocessor framework (AccessController): ACLs &
     allowable resources base on requesting users’ identity
•  Configuration:
  –    Secure HBase servers: master & regionserver
  –     REST API secure mode
  –    JAAS configuration for secure ZK quorum servers
  –    ACLs Configuration (table & column level)
         •  grant, revoke, alter and permission display




                                                                     38
HBase Severs w/ Secure HDFS Cluster
•  Required by all HBase severs, both Master & RS (hbase-site.xml)
•  Define following properties:

   hbase.security.authentication
   e.g.: kerberos

   hbase.rpc.engine
   e.g.: org.apache.hadoop.hbase.ipc.SecureRpcEngine

   hbase.regionserver.kerberos.principal
   e.g.: hbase/_HOST@${HADOOP_REALM}

   hbase.regionserver.keytab.file
   hbase.master.kerberos.principal
   hbase.master.keytab.file

                                                                 39
HBase: Secure ZK Quorum Connection
hbase-env.sh:
   export HBASE_OPTS="$HBASE_OPTS -Djava.security.auth.login.config=/
   etc/hbase/conf/zk-jaas.conf"
   export HBASE_MANAGES_ZK=false

   kerberos.removeHostFromPrincipal=true
   kerberos.removeRealmFromPrincipal=true


ZK JAAS configuration:
   com.sun.security.auth.module.Krb5LoginModule required
   useKeyTab=true
   useTicketCache=false
   keyTab="/etc/hbase/conf/keytab.krb5”
   principal="hbase/fully.qualified.domain.name@<YOUR-REALM>";


HBase site xml, define also the following properties:
   hbase.zookeeper.quorum = $ZK_NODES
   hbase.cluster.distributed = true
                                                                        40
HBase Authorization Configuration

•  Required by all HBase severs, both Master & RS (hbase-site.xml)

  hbase.security.authorization (true)

  hbase.coprocessor.master.classes
  e.g.: org.apache.hadoop.hbase.security.access.AccessController

  hbase.coprocessor.region.classes
  e.g.: org.apache.hadoop.hbase.security.token.TokenProvider,
  org.apache.hadoop.hbase.security.access.AccessController




                                                                   41
HBase ACLs Rules

ACLs       Permissions
R/Read     Get, Scan, or Exists calls
W/Write    Put, Delete, LockRow, UnlockRow, IncrementColumnValue,
           CheckAndDelete, CheckAndPut, Flush, & Compact
C/Create   Create, Alter, & Drop
A/Admin    Enable, Disable, MajorCompact, Grant, Revoke, & Shutdown.




                                                                       42
HBase: ACLs for Authorization




                                43
Troubleshooting
•  Misconfiguration?
  –    Pseudo-distributed to cluster-wide configuration
  –    Full cluster functionality before kerberizing services
  –    Correct principal & keytab contains up-to-date KVNO.
  –    Disentangle security related settings to isolate root causes
        •  Ticket renewable fail? or expired.
•  System-wide
  –    Permission (files, directories and ownership), objClasses & ACLs
  –    System clock screw, KDC operation (REALM), file handle limitation? (ulimit)
  –    TT, RS, DN fail to start? Out of disk space? “dfs.datanode.du.reserved”
  –    Name resolve (reverse), routing (multi-channels) etc.
•  Extensive debugging info
  –  Increase root.logger level, e.g.: hadoop.root.logger & hadoop.security.logger
  –  Security mode: “-Djavax.net.debug=ssl -Dsun.security.krb5.debug=true”
•  Correct Hadoop “home” to look into?
•  Relevant logging system:
  –  KDC log provide: TGS & AS req., principals, authtime and etypes.

                                                                                     44
Etu Appliance
  New features & key benefits
  Software stacks & versions
  HW spec.




                       Hadoop in Taiwan
                                          45
Introducing – Etu Appliance 2.0
A purpose built high performance appliance for big data
processing
•  Automates cluster deployment
•  Optimize for the highest performance of big data processing
•  Delivers availability and security




                                                                 46
Why appliance?
     •  Misconfiguration comprise 35% of tickets
         –  Generic issues: memory allocation, disk spaces & file handling
     •  ~40% refer to system-wide and operation issues.
         –  System automation, robust deployment, dashboard and event
            management strictly required for production operation

                                              Stats. Upto 2011




Ref: Hadoop Troubleshooting, Kate, Cloudera                                  47
Key Benefits to Hadoopers

  Simplifies configuration and deployment
  •  Fully automated deployment and configuration

  High availability made easy
  •  Reliably deploy running mission critical big data application

  Enterprise-grade security
  •  Process and control sensitivity data with confidence

  Boost big data processing
  •  Fully optimized operating system to boost your data processing
     performance

  Provides a scalable and extensible foundation
  •  Adapts to your workload and grows with your business


                                                                      49
What’s New in Etu Appliance 2.0

•  New deployment feature – Auto deployment and
   configuration for master node high availability
•  New security features – LDAP integration and
   Kerberos authentication
•  New data source – Introduce Etu™ Dataflow data
   collection service with built in Syslog and FTP server
   for better integration with existing IT infrastructure
•  New user experience – new Etu™ Management
   Console with HDFS file browser and HBase table
   management



                                                            50
Software Stack Version Details

                                                                          Patch   Release
Apache Project             Package Version
                                                                          Level    Note
Apache Hadoop 2.0          hadoop-2.0.0+91                                 91
Apache Hbase               hbase-0.92.1+67                                 67
Apache Hive                hive-0.8.1+61                                   61
Apache Mahout              mahout-0.6+16                                   16
Apache MRv1                mr1-0.20.2+1216                                1216
Apache Pig                 pig-0.9.2+26                                    26
Apache Sqoop               Sqoop-1.4.1+28                                  28
Apache ZooKeeper           Zookeeper-3.4.3+15                              15
** CDH4 Release Note:
https://ccp.cloudera.com/display/CDH4DOC/CDH4+Release+Notes

** CDH4 Package Information:
https://ccp.cloudera.com/display/DOC/CDH+Version+and+Packaging
+Information#CDHVersionandPackagingInformation-CDHVersion4.0.1Packaging




                                                                                            52
Etu References:
•  Chiang, Fred. (Deputy Vice President) “Big Data 101
                                  ” SYSTEX            ( ),
   24 May 2012.
   http://www.slideshare.net/fchiangtw/big-data-101
•  Chen, James. (Principal Consultant of Etu) “Hadoop
   SQL            ” SYSTEX            ( ), 24 May 2012.
   http://www.slideshare.net/chaoyu0513/hadoop-sql
•  Wu, Jeng-Hsueh. (Principal Architect of Etu), “Facing the
   Big Data challenge: a use case for leveraging from
   Hadoop and her friends”, OSDC, 14 Apr 2012.
   http://osdc.tw/schedule
•  Nien, Johnny. (Technical Manager) “Etu DataFlow: An
   efficient data streaming & pre-processing framework
   designed for Hadoop”, COSCUP, 19 Aug 2012.
   http://coscup.org/2012/en/program


                                                               53
Hadoop Security References:
•  Cloudera
    –  CDH3 Security Guide
    –  CDH4 Beta 2 Security Guide
•  Hadoop Security
     –    Slideshare
     –    “Hadoop Security Design”, Owen O’Malley et. al., Oct 2009
     –    “Integrating Kerberos into Apache Hadoop”, Owen O’Malley
     –    “Plugging the Holes: Security and Compatibility”, Owen O’Malley
     –    “Developing and deploying Apache Hadop Security” Hortonworks, Owen
     –    “Hadoop Security, Cloudera” Hadoop World 2010, Todd Lipcon & Aaron Myers
•    Kerberos & LDAP
     –  Administration:
        http://web.mit.edu/Kerberos/krb5-1.8/krb5-1.8.3/doc/krb5-admin.html
     –  Installation:
        http://web.mit.edu/Kerberos/krb5-1.8/krb5-1.8.3/doc/krb5-install.html
     –  Openldap: http://www.openldap.org/doc/admin24/dbtools.html



                                                                                54
Your Turn:
    Kerberos+LDAP
    HDFS+MR+HBase
    Best practices: Hive & Pig




                                 55
Question?

jasonshih@etusolution.com
Contact
www.etusolution.com
info@etusolution.com

Taipei, Taiwan
318, Rueiguang Rd., Taipei 114, Taiwan
T: +886 2 7720 1888
F: +886 2 8798 6069

Beijing, China
Room B-26, Landgent Center,
No. 24, East Third Ring Middle Rd.,
Beijing, China 100022
T: +86 10 8441 7988
F: +86 10 8441 7227




                                         57

More Related Content

What's hot

Hadoop Security Features That make your risk officer happy
Hadoop Security Features That make your risk officer happyHadoop Security Features That make your risk officer happy
Hadoop Security Features That make your risk officer happyDataWorks Summit
 
Hdp security overview
Hdp security overview Hdp security overview
Hdp security overview Hortonworks
 
Apache Sentry for Hadoop security
Apache Sentry for Hadoop securityApache Sentry for Hadoop security
Apache Sentry for Hadoop securitybigdatagurus_meetup
 
Hadoop Security: Overview
Hadoop Security: OverviewHadoop Security: Overview
Hadoop Security: OverviewCloudera, Inc.
 
The Future of Hadoop Security - Hadoop Summit 2014
The Future of Hadoop Security - Hadoop Summit 2014The Future of Hadoop Security - Hadoop Summit 2014
The Future of Hadoop Security - Hadoop Summit 2014Cloudera, Inc.
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop SecurityDataWorks Summit
 
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...Abhiraj Butala
 
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...Hortonworks
 
Hadoop Security, Cloudera - Todd Lipcon and Aaron Myers - Hadoop World 2010
Hadoop Security, Cloudera - Todd Lipcon and Aaron Myers - Hadoop World 2010Hadoop Security, Cloudera - Todd Lipcon and Aaron Myers - Hadoop World 2010
Hadoop Security, Cloudera - Todd Lipcon and Aaron Myers - Hadoop World 2010Cloudera, Inc.
 
Hadoop ClusterClient Security Using Kerberos
Hadoop ClusterClient Security Using KerberosHadoop ClusterClient Security Using Kerberos
Hadoop ClusterClient Security Using KerberosSarvesh Meena
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big DataRommel Garcia
 
Hadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayHadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayDataWorks Summit
 
Deploying Enterprise-grade Security for Hadoop
Deploying Enterprise-grade Security for HadoopDeploying Enterprise-grade Security for Hadoop
Deploying Enterprise-grade Security for HadoopCloudera, Inc.
 
Hadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster AccessHadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster AccessCloudera, Inc.
 
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...Big Data Spain
 
Nl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenchesNl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenchesBolke de Bruin
 
Kerberos, Token and Hadoop
Kerberos, Token and HadoopKerberos, Token and Hadoop
Kerberos, Token and HadoopKai Zheng
 
Hadoop security
Hadoop securityHadoop security
Hadoop securityBiju Nair
 

What's hot (20)

Hadoop security
Hadoop securityHadoop security
Hadoop security
 
Hadoop Security Features That make your risk officer happy
Hadoop Security Features That make your risk officer happyHadoop Security Features That make your risk officer happy
Hadoop Security Features That make your risk officer happy
 
Hdp security overview
Hdp security overview Hdp security overview
Hdp security overview
 
Apache Sentry for Hadoop security
Apache Sentry for Hadoop securityApache Sentry for Hadoop security
Apache Sentry for Hadoop security
 
Hadoop Security: Overview
Hadoop Security: OverviewHadoop Security: Overview
Hadoop Security: Overview
 
The Future of Hadoop Security - Hadoop Summit 2014
The Future of Hadoop Security - Hadoop Summit 2014The Future of Hadoop Security - Hadoop Summit 2014
The Future of Hadoop Security - Hadoop Summit 2014
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
 
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
 
Hadoop Security, Cloudera - Todd Lipcon and Aaron Myers - Hadoop World 2010
Hadoop Security, Cloudera - Todd Lipcon and Aaron Myers - Hadoop World 2010Hadoop Security, Cloudera - Todd Lipcon and Aaron Myers - Hadoop World 2010
Hadoop Security, Cloudera - Todd Lipcon and Aaron Myers - Hadoop World 2010
 
Hadoop ClusterClient Security Using Kerberos
Hadoop ClusterClient Security Using KerberosHadoop ClusterClient Security Using Kerberos
Hadoop ClusterClient Security Using Kerberos
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big Data
 
Hadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayHadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox Gateway
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
 
Deploying Enterprise-grade Security for Hadoop
Deploying Enterprise-grade Security for HadoopDeploying Enterprise-grade Security for Hadoop
Deploying Enterprise-grade Security for Hadoop
 
Hadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster AccessHadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster Access
 
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
 
Nl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenchesNl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenches
 
Kerberos, Token and Hadoop
Kerberos, Token and HadoopKerberos, Token and Hadoop
Kerberos, Token and Hadoop
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
 

Similar to Hadoop security overview_hit2012_1117rev

HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend MicroHBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend MicroCloudera, Inc.
 
Hadoop on Azure, Blue elephants
Hadoop on Azure,  Blue elephantsHadoop on Azure,  Blue elephants
Hadoop on Azure, Blue elephantsOvidiu Dimulescu
 
Application Portability with Kubernetes (k8)
Application Portability with Kubernetes (k8)Application Portability with Kubernetes (k8)
Application Portability with Kubernetes (k8)Kublr
 
The Design, Implementation and Open Source Way of Apache Pegasus
The Design, Implementation and Open Source Way of Apache PegasusThe Design, Implementation and Open Source Way of Apache Pegasus
The Design, Implementation and Open Source Way of Apache Pegasusacelyc1112009
 
Wicked Easy Ceph Block Storage & OpenStack Deployment with Crowbar
Wicked Easy Ceph Block Storage & OpenStack Deployment with CrowbarWicked Easy Ceph Block Storage & OpenStack Deployment with Crowbar
Wicked Easy Ceph Block Storage & OpenStack Deployment with CrowbarKamesh Pemmaraju
 
Ai tour 2019 Mejores Practicas en Entornos de Produccion Big Data Open Source...
Ai tour 2019 Mejores Practicas en Entornos de Produccion Big Data Open Source...Ai tour 2019 Mejores Practicas en Entornos de Produccion Big Data Open Source...
Ai tour 2019 Mejores Practicas en Entornos de Produccion Big Data Open Source...nnakasone
 
Building Open Source Identity Management with FreeIPA
Building Open Source Identity Management with FreeIPABuilding Open Source Identity Management with FreeIPA
Building Open Source Identity Management with FreeIPALDAPCon
 
Travel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech infoTravel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech infoRakuten Group, Inc.
 
CIS13: A Breakthrough in Directory Technology: Meet the Elephant in the Room ...
CIS13: A Breakthrough in Directory Technology: Meet the Elephant in the Room ...CIS13: A Breakthrough in Directory Technology: Meet the Elephant in the Room ...
CIS13: A Breakthrough in Directory Technology: Meet the Elephant in the Room ...CloudIDSummit
 
Hadoop and Data Access Security
Hadoop and Data Access SecurityHadoop and Data Access Security
Hadoop and Data Access SecurityCloudera, Inc.
 
Stream processing on mobile networks
Stream processing on mobile networksStream processing on mobile networks
Stream processing on mobile networkspbelko82
 
Streaming Solutions for Real time problems
Streaming Solutions for Real time problemsStreaming Solutions for Real time problems
Streaming Solutions for Real time problemsAbhishek Gupta
 

Similar to Hadoop security overview_hit2012_1117rev (20)

Technical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheCon
Technical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheConTechnical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheCon
Technical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheCon
 
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend MicroHBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro
 
Hadoop on Azure, Blue elephants
Hadoop on Azure,  Blue elephantsHadoop on Azure,  Blue elephants
Hadoop on Azure, Blue elephants
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
Application Portability with Kubernetes (k8)
Application Portability with Kubernetes (k8)Application Portability with Kubernetes (k8)
Application Portability with Kubernetes (k8)
 
The Design, Implementation and Open Source Way of Apache Pegasus
The Design, Implementation and Open Source Way of Apache PegasusThe Design, Implementation and Open Source Way of Apache Pegasus
The Design, Implementation and Open Source Way of Apache Pegasus
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
Wicked Easy Ceph Block Storage & OpenStack Deployment with Crowbar
Wicked Easy Ceph Block Storage & OpenStack Deployment with CrowbarWicked Easy Ceph Block Storage & OpenStack Deployment with Crowbar
Wicked Easy Ceph Block Storage & OpenStack Deployment with Crowbar
 
Ai tour 2019 Mejores Practicas en Entornos de Produccion Big Data Open Source...
Ai tour 2019 Mejores Practicas en Entornos de Produccion Big Data Open Source...Ai tour 2019 Mejores Practicas en Entornos de Produccion Big Data Open Source...
Ai tour 2019 Mejores Practicas en Entornos de Produccion Big Data Open Source...
 
NoSQL_Night
NoSQL_NightNoSQL_Night
NoSQL_Night
 
Building Open Source Identity Management with FreeIPA
Building Open Source Identity Management with FreeIPABuilding Open Source Identity Management with FreeIPA
Building Open Source Identity Management with FreeIPA
 
Travel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech infoTravel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech info
 
Big data security
Big data securityBig data security
Big data security
 
CIS13: A Breakthrough in Directory Technology: Meet the Elephant in the Room ...
CIS13: A Breakthrough in Directory Technology: Meet the Elephant in the Room ...CIS13: A Breakthrough in Directory Technology: Meet the Elephant in the Room ...
CIS13: A Breakthrough in Directory Technology: Meet the Elephant in the Room ...
 
Kafka & Hadoop in Rakuten
Kafka & Hadoop in RakutenKafka & Hadoop in Rakuten
Kafka & Hadoop in Rakuten
 
Hadoop and Data Access Security
Hadoop and Data Access SecurityHadoop and Data Access Security
Hadoop and Data Access Security
 
Stream processing on mobile networks
Stream processing on mobile networksStream processing on mobile networks
Stream processing on mobile networks
 
Streaming Solutions for Real time problems
Streaming Solutions for Real time problemsStreaming Solutions for Real time problems
Streaming Solutions for Real time problems
 
Ajay_trivedi
Ajay_trivediAjay_trivedi
Ajay_trivedi
 

Recently uploaded

The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightSafe Software
 
Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)Muhammad Tiham Siddiqui
 
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptxEmil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptxNeo4j
 
Introduction - IPLOOK NETWORKS CO., LTD.
Introduction - IPLOOK NETWORKS CO., LTD.Introduction - IPLOOK NETWORKS CO., LTD.
Introduction - IPLOOK NETWORKS CO., LTD.IPLOOK Networks
 
How to release an Open Source Dataweave Library
How to release an Open Source Dataweave LibraryHow to release an Open Source Dataweave Library
How to release an Open Source Dataweave Libraryshyamraj55
 
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxGraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxNeo4j
 
Top 10 Squarespace Development Companies
Top 10 Squarespace Development CompaniesTop 10 Squarespace Development Companies
Top 10 Squarespace Development CompaniesTopCSSGallery
 
UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2DianaGray10
 
CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024Brian Pichman
 
March Patch Tuesday
March Patch TuesdayMarch Patch Tuesday
March Patch TuesdayIvanti
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
 
LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0DanBrown980551
 
Patch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 updatePatch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 updateadam112203
 
Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNeo4j
 
Introduction to RAG (Retrieval Augmented Generation) and its application
Introduction to RAG (Retrieval Augmented Generation) and its applicationIntroduction to RAG (Retrieval Augmented Generation) and its application
Introduction to RAG (Retrieval Augmented Generation) and its applicationKnoldus Inc.
 
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024Alkin Tezuysal
 
AI Workshops at Computers In Libraries 2024
AI Workshops at Computers In Libraries 2024AI Workshops at Computers In Libraries 2024
AI Workshops at Computers In Libraries 2024Brian Pichman
 
Automation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projectsAutomation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projectsDianaGray10
 
UiPath Studio Web workshop series - Day 4
UiPath Studio Web workshop series - Day 4UiPath Studio Web workshop series - Day 4
UiPath Studio Web workshop series - Day 4DianaGray10
 

Recently uploaded (20)

The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and Insight
 
Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)
 
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptxEmil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
 
Introduction - IPLOOK NETWORKS CO., LTD.
Introduction - IPLOOK NETWORKS CO., LTD.Introduction - IPLOOK NETWORKS CO., LTD.
Introduction - IPLOOK NETWORKS CO., LTD.
 
SheDev 2024
SheDev 2024SheDev 2024
SheDev 2024
 
How to release an Open Source Dataweave Library
How to release an Open Source Dataweave LibraryHow to release an Open Source Dataweave Library
How to release an Open Source Dataweave Library
 
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxGraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
 
Top 10 Squarespace Development Companies
Top 10 Squarespace Development CompaniesTop 10 Squarespace Development Companies
Top 10 Squarespace Development Companies
 
UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2
 
CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024
 
March Patch Tuesday
March Patch TuesdayMarch Patch Tuesday
March Patch Tuesday
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0
 
Patch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 updatePatch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 update
 
Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4j
 
Introduction to RAG (Retrieval Augmented Generation) and its application
Introduction to RAG (Retrieval Augmented Generation) and its applicationIntroduction to RAG (Retrieval Augmented Generation) and its application
Introduction to RAG (Retrieval Augmented Generation) and its application
 
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
Design and Modeling for MySQL SCALE 21X Pasadena, CA Mar 2024
 
AI Workshops at Computers In Libraries 2024
AI Workshops at Computers In Libraries 2024AI Workshops at Computers In Libraries 2024
AI Workshops at Computers In Libraries 2024
 
Automation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projectsAutomation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projects
 
UiPath Studio Web workshop series - Day 4
UiPath Studio Web workshop series - Day 4UiPath Studio Web workshop series - Day 4
UiPath Studio Web workshop series - Day 4
 

Hadoop security overview_hit2012_1117rev

  • 1. Hadoop Security Overview - From security infrastructure deployment to high-level services Jason Shih Etu 2 Oct, 2012 Hadoop in Taiwan
  • 2. Outline •  Kerberos & LDAP –  Configuration & Installation –  Authentication & Authorization –  Interoperability •  Hadoop Security & Services –  Authentication & Authorization in Hadoop –  Token Delegation & communication path –  HDFS: NN & DN –  MapReduce: JT+TT –  HBase: ZK+MASTER+RS •  Etu Appliance –  New features & key benefits –  Software stacks, versions & HW spec. •  Troubleshooting Hadoop in Taiwan 2
  • 3. Who am I? •  Etu –  Hadoop System Architect •  Grid Computing Centre, ASGC –  Tech Lead on Grid Operation –  Scope: DC, OP, DM & GT –  Experiment Support (LHC Analysis Software, ES, EC (W&C) etc.) •  Before Grid Computing – HPC @ ASCC –  System administration (IBM, SGI, Sun, *nix) –  Architecture Design & Parallel filesystem –  Performance Tuning & Optimization –  Application Support etc. Hadoop in Taiwan 3
  • 4. Does security matter? •  Ticket Breakdown: –  Comprise ~3.1% issues are security related •  Hadoop common, HDFS, MR, YARN, HBase, Hive, & Pig etc. –  Majority are common+HDFS+MR related: ~73% 30 300 All Tickets Security Related Tickets 25 250 Security Related 20 200 Percentage (%) 15 150 10 100 5 50 0 0 Hadoop in Taiwan 4
  • 5. LDAP (lightweight) directory access protocol Small bit of data, mostly read access NIS Pros: setup, administration, widely support & scale fairly well Cons: weakly encrypted password, difficult to FW, lack of system auth NIS+ Complicated, limited client support. Kerberos & LDAP Configuration & Installation Authentication & Authorization Interoperability Hadoop in Taiwan 5
  • 6. LDAP Authentication •  OpenLDAP: Lightweight Directory Access Protocol –  X.500 base (model for directory service in OSI concept) –  X.400 Std. by ITU late 70’s & early 80’s (email service) •  Why directory? –  Specialized database design for frequent queries but infrequent updates –  lack of rollback functionality & transaction support –  Easily replicated aiming for high availability & scalability (but depend on size of info being published or replicated). Terminology: http://www.zytrax.com/books/ldap/apd/ 6
  • 7. LDAP Overview •  Building blocks: –  Schemas, objectClasses, Attributes, matchingRules, Operational objects etc. •  Models: –  Information •  information or data presented may/may-not the way data is actually stored –  Naming: •  def: 'dc=example,dc=com’ stumble across in LDAP –  Functional •  Read, Search, Write & Modify –  Security •  Fine grained manner, who can do what to what data 7
  • 8. Kerberos Introduction •  What is Kerberos –  Named after Cerberus, the three-headed dog of Greek mythology, because? –  Composite by three components: •  KDC (Kerberos Distribution Center) •  Clients (Users/Hosts/Services) •  Server (Service providers requested to establish session) –  Scope of deployment: realm –  KDC provide: •  AS (Authentication Server) •  TGS (Ticket Granting Service) Hadoop in Taiwan 8
  • 9. Kerberos Introduction (cont’) •  Kerberos Client –  PAM enable (pam_krb5) •  Other application, recompilation effort required: e.g. OpenSSH –  Application w/ native Kerberos support but few limited to ver. IV •  Other Extension –  Windows Authentication (AD) –  NFS Authentication & Encryption –  AFS (Global Filesystem) •  Symmetric key operations –  Order of magnitude Faster than public key operations e.g. SSL •  Performs authentication not authorization •  When user authenticates, they are given a “ticket” –  Default Lifetime: 8Hr Hadoop in Taiwan 9
  • 10. Kerberos: Definition & Terminology •  KDC (Kerberos Distribution Center) •  TGT (Ticket Granting Ticket) –  Special ticket permit client to obtain additional Kerberos ticket within same realm •  Keytab –  key table file containing one or more keys, same as for hosts & users •  Principal –  Primary •  First part of a Kerberos principal •  User: username, Service: the name of the service –  Instance •  Provide information that qualifies the primary •  User: desc. the intended use of corresponding credentials •  Host: FQDN –  Realm •  Logical network served by a single Kerberos DB and a set of KDC Hadoop in Taiwan 10
  • 11. Kerberos Overview KDC (Key Distribution Center) AS (Authentication Server) Database AS_REQ TGS (Ticket Granting Server) AS_REP AS_REQ(host/app-server) AS_REQ(host/filesevers) AS_REP Client Application Server AP_REQ AP_REP FileServers AP_REQ + File Request Domain AP_REP + File Hadoop in Taiwan 11
  • 12. Kerberos Principals & Realms •  Principal –  Generic: Name/instance@realm –  Examples •  etu@testdomain.com •  etu/admin •  host/master.testdomain.com •  ldap/ldap.testdomain.com –  Realm •  Typically domain name in all CAPS: e.g.: TESTDOMAIN.COM Hadoop in Taiwan 12
  • 13. Kerberos Command line •  Administration –  kadmin: used to make changes to the accounts in the Kerberos database •  kadmin.local –  klist: used to view the tickets in the credential cache –  kinit: used to log onto the realm with the client’s key –  kdestroy: erases the credential cache –  kpasswd: used to change user passwords –  kprop: used to synch the master KDC with replicas, if any •  Utility –  kdb5_util: create, destroy, stash, dump, load, ark, add_mkey, use_mkey, list_mkeys, update_princ_encryption & purge_mkeys 13
  • 14. Kerberos Administration (kadmin.local) •  Available requests: add_principal, addprinc, ank delete_principal, delprinc modify_principal, modprinc change_password, cpw get_principal, getprinc list_principals, listprincs, get_principals, getprincs add_policy, addpol modify_policy, modpol delete_policy, delpol get_policy, getpol list_policies, listpols, get_policies, getpols get_privs, getprivs ktadd, xst ktremove, ktrem lock unlock purgekeys 14
  • 15. Kerberos Principals (I) •  Default principals (default realm: TESTDOMAIN.COM) K/M@TESTDOMAIN.COM hdfs@TESTDOMAIN.COM kadmin/admin@TESTDOMAIN.COM kadmin/changepw@TESTDOMAIN.COM Kadmin/master.testdomain.com@TESTDOMAIN.COM krbtgt/TESTDOMAIN.COM@TESTDOMAIN.COM ldapadm@TESTDOMAIN.COM ldap/master.testdomain.com@TESTDOMAIN.COM Hadoop in Taiwan 15
  • 16. Kerberos Principals (II) •  Principals details (KV no., expiration & attributes) Principal: hdfs@TESTDOMAIN.COM Expiration date: [never] Last password change: Thu Nov 15 19:44:31 CST 2012 Password expiration date: [none] Maximum ticket life: 1 day 00:00:00 Maximum renewable life: 90 days 00:00:00 Last modified: Thu Nov 15 19:44:31 CST 2012 (kadmin/admin@TESTDOMAIN.COM) Last successful authentication: [never] Last failed authentication: [never] Failed password attempts: 0 Number of keys: 5 Key: vno 2, aes128-cts-hmac-sha1-96, no salt Key: vno 2, aes256-cts-hmac-sha1-96, no salt Key: vno 2, arcfour-hmac, no salt Key: vno 2, des3-cbc-sha1, no salt Key: vno 2, des-cbc-crc, no salt MKey: vno 1 Attributes: Policy: [none] 16
  • 17. Kerberos Server Configuration (I) •  libdefaults: default_realm = TESTDOMAIN.COM ticket_lifetime = 48h renew_lifetime = 8760h forwardable = true proxiable = true default_tkt_enctypes = aes128-cts-hmac-sha1-96 … default_tgs_enctypes = aes128-cts-hmac-sha1-96 … permitted_enctypes = aes128-cts-hmac-sha1-96 … dns_lookup_realm = false dns_lookup_kdc = false allow_weak_crypto = 1 Allow_weak_crypto – temporary workaround •  By default, clients & servers will not using keys for ciphers. •  Clients wont be able to authenticate to services w/ keys following support enctypes •  Zero downtime w/ service updating new/strong-cophers keys to keytab •  TGT can then update services’ keys to a sets including keys w/ stronger ciphers (kadmin cpw -keepold command) 17
  • 18. Kerberos Server Configuration (II) •  Realm & domain realm: [realms] TESTDOMAIN.COM = { default_domain = testdomain.com kdc = etu-master.testdomain.com admin_server = etu-master.testdomain.com database_module = openldap_ldapconf } [domain_realm] .testdomain.com = TESTDOMAIN.COM testdomain.com = TESTDOMAIN.COM Hadoop in Taiwan 18
  • 21. Kerberos Encryption Types •  Supported Ciphers –  des-cbc-crc –  des-cbc-md4 –  des-cbc-md5 –  des-cbc-raw –  des3-cbc-raw –  des-hmac-sha1 –  arcfour-hmac-exp •  Cryptographic Primitives –  Cryptographic Agility (v5) –  Etypes: Define set of primitives for cryptographic operations •  e.g.: aes256-cts-hmac-sha1-96, aes128-cts-hmac-sha1-96, rc4-hmac, des-cbc-md5, rc4-hmac-exp 21
  • 22. Hadoop Security & Services HDFS: NN & DN MapReduce: JT+TT HBase: ZK+MASTER+RS Hadoop in Taiwan 22
  • 23. Pre-CDH3 •  User Auth: –  User impersonation: set property “hadoop.job.ugi” in run job •  Server Auth: N/A •  HDFS (weak-authentication) –  Unix-like file permission (std: user/group/other r/w/x) •  Job control: –  Lack of ACLs for counters/logging –  ACLs per job queue submission/killing •  Web interface: N/A •  Tasks: –  Not-isolated from the others –  All run as same users –  Interference with other tasks accessing identical local storage Hadoop in Taiwan 23
  • 24. Security ship w/ CDH3: •  Secure Authentication base on Kerberos –  RPC secured with SASL GSSAPI mechanism –  Strong authentication & SSO •  Mutual authentication between servers/users/services –  Bi-directional for server auth. •  HDFS: –  Same general permission model w/ sticky bit •  ACLs for job control & view •  Tasks isolation (launch by user) on same TT •  Kerberized SSL support for web interface (pluggable serverlet) Hadoop in Taiwan 24
  • 25. Authentication & Authorization •  Consideration –  Performance: symmetric keys (Kerberos) vs. public key (SSL) –  Management: central managed (KDC) vs. CRL propagation •  Authentication – user identification –  Changes low-level transport –  RPC authentication using SASL •  Kerberos (GSSAPI) •  Token (GIGEST-MD5) •  Simple –  HTTP secured via plugin •  Authorization – access control, resources & role –  HDFS •  Command line & semantics unchanged •  Web UI enforces authentication –  MapReduce added Access Control Lists •  Lists of users and groups that have access •  mapreduce.job.acl-view-job – view job •  mapreduce.job.acl-modify-job – kill or modify job 25
  • 26. Delegation Tokens •  To prevent a flood of authentication requests at the start of a job, NameNode can create delegation tokens. •  Allows user to authenticate once and pass credentials to all tasks of a job. •  JobTracker automatically renews tokens while job is running. •  Cancels tokens when job finishes. 26
  • 27. Primary Communication Path Ref: Hadoop Security Preview, Owen 27
  • 28. Hadoop Security Enable •  In “core-site.xml” –  Reset “simple” to disable security –  Property: hadoop.security.authentication = kerberos hadoop.security.authorization = true Hadoop in Taiwan 28
  • 29. HDFS Security Configuration •  In “hdfs-site.xml”, set property: dfs.block.access.token.enable = true dfs.namenode.keytab.file = ${HDFS_KEYTAB_PATH} dfs.namenode.kerberos.principal = ${HDFS_KRB5_PRINCIPAL} e.g.: etu/_HOST@${HADOOP_REALM} dfs.namenode.kerberos.internal.spnego.principal = HTTP/_HOST@${HADOOP_REALM} Hadoop in Taiwan 29
  • 30. Secondary NN Configuration •  In “hdfs-site.xml”, set the following property: –  Similar properties as for NameNode –  Perfectly fine if you initiate with same Kerberos principal dfs.secondary.namenode.keytab.file dfs.secondary.namenode.kerberos.principal dfs.secondary.namenode.kerberos.internal.spnego.principal Hadoop in Taiwan 30
  • 31. DataNode Security Configuration •  In “hdfs-site.xml” •  Replicate site xml to all DN •  Privilege service port: –  Either recompile “jsvc” or adopt BigTop for secure DN service daemon •  “sudo” privilege required •  Appropriate variables for secured datanode HADOOP_SECURE_DN_USER HADOOP_SECURE_DN_PID_DIR (optional) HADOOP_SECURE_DN_LOG_DIR JSVC_HOME •  Define the following properties: dfs.datanode.data.dir.perm dfs.datanode.address e.g.: 0.0.0.0:1004 dfs.datanode.http.address e.g.: 0.0.0.0:1006 dfs.datanode.keytab.file dfs.datanode.kerberos.principal e.g.: hdfs/_HOST@${HADOOP_REALM} Hadoop in Taiwan 31
  • 32. Secure HDFS Service Common Error •  Error: ERROR security.UserGroupInformation: PriviledgedActionException as:etu (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] 12/10/02 11:03:24 ERROR security.UserGroupInformation: PriviledgedActionException as:etu (auth:KERBEROS) cause:java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] •  C.f.: Successful Kerberos Authentication: Oct 02 11:06:16 master krb5kdc[30142](info): TGS_REQ (6 etypes {17 17 23 16 3 1}) 10.1.247.18: ISSUE: authtime 1349147029, etypes {rep=17 tkt=17 ses=17}, etu@ETU.SYSTEX.TW for etu/master.etu.systex.tw@ETU.SYSTEX.TW Hadoop in Taiwan 32
  • 33. Secure MapReduce Configuration •  In “mapred-site.xml”, for JT & TT –  Defined the following properties: mapreduce.jobtracker.kerberos.principal e.g.: mapred/_HOST@{HADOOP_REALM} mapreduce.jobtracker.keytab.file mapreduce.tasktracker.kerberos.principal mapreduce.tasktracker.keytab.file Hadoop in Taiwan 33
  • 34. Secure MapReduce: TaskController •  In “mapred-site.xml” •  In taskcontroller.cfg: –  Default “banned.users” property is mapred, hdfs, and bin –  Default “min.user.id property” is 1000 (Err code: 255 if lower) •  Take care also ownership & setuid for taskcontroller binary –  chown root:mapred task-controller –  chmod 4754 task-controller •  Define also the following variables: mapred.task.tracker.task-controller e.g.: org.apache.hadoop.mapred.LinuxTaskController mapreduce.tasktracker.group e.g.: mapred Hadoop in Taiwan 34
  • 35. Secure MapReduce: Best Practice •  Always start with simple task before launch real workload: e.g.: PiEst •  Make sure underlying HDFS enable security & functional •  From KDC log: master krb5kdc[30142](info): TGS_REQ (6 etypes {17 17 23 16 3 1}) 192.168.70.18: ISSUE: authtime 1349147401, etypes {rep=17 tkt=17 ses=17}, etu@ETU.SYSTEX.TW for etu/master.etu.systex.tw@ETU.SYSTEX.TW Hadoop in Taiwan 35
  • 36. Zookeeper Security Configuration (I) •  zoo.cfg: authProvider. 1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider jaasLoginRenew=3600000 •  java.env export JVMFLAGS="-Djava.security.auth.login.config=/etc/ zookeeper/conf/jaas.conf” Hadoop in Taiwan 36
  • 37. Zookeeper Security Configuration (II) •  JAAS configuration: Server: com.sun.security.auth.module.Krb5LoginModule required useKeyTab=true keyTab="/etc/zookeeper/conf/zookeeper.keytab" storeKey=true useTicketCache=false principal="zookeeper/fully.qualified.domain.name@<YOUR-REALM>" Client: com.sun.security.auth.module.Krb5LoginModule required useKeyTab=false principal="zkcli" useTicketCache=true debug=true Hadoop in Taiwan 37
  • 38. HBase Security Configuration •  Authentication –  Identification mechanism for HBase servers & clients for HDFS, ZK & MR. •  Authorization –  Ontop of coprocessor framework (AccessController): ACLs & allowable resources base on requesting users’ identity •  Configuration: –  Secure HBase servers: master & regionserver –  REST API secure mode –  JAAS configuration for secure ZK quorum servers –  ACLs Configuration (table & column level) •  grant, revoke, alter and permission display 38
  • 39. HBase Severs w/ Secure HDFS Cluster •  Required by all HBase severs, both Master & RS (hbase-site.xml) •  Define following properties: hbase.security.authentication e.g.: kerberos hbase.rpc.engine e.g.: org.apache.hadoop.hbase.ipc.SecureRpcEngine hbase.regionserver.kerberos.principal e.g.: hbase/_HOST@${HADOOP_REALM} hbase.regionserver.keytab.file hbase.master.kerberos.principal hbase.master.keytab.file 39
  • 40. HBase: Secure ZK Quorum Connection hbase-env.sh: export HBASE_OPTS="$HBASE_OPTS -Djava.security.auth.login.config=/ etc/hbase/conf/zk-jaas.conf" export HBASE_MANAGES_ZK=false kerberos.removeHostFromPrincipal=true kerberos.removeRealmFromPrincipal=true ZK JAAS configuration: com.sun.security.auth.module.Krb5LoginModule required useKeyTab=true useTicketCache=false keyTab="/etc/hbase/conf/keytab.krb5” principal="hbase/fully.qualified.domain.name@<YOUR-REALM>"; HBase site xml, define also the following properties: hbase.zookeeper.quorum = $ZK_NODES hbase.cluster.distributed = true 40
  • 41. HBase Authorization Configuration •  Required by all HBase severs, both Master & RS (hbase-site.xml) hbase.security.authorization (true) hbase.coprocessor.master.classes e.g.: org.apache.hadoop.hbase.security.access.AccessController hbase.coprocessor.region.classes e.g.: org.apache.hadoop.hbase.security.token.TokenProvider, org.apache.hadoop.hbase.security.access.AccessController 41
  • 42. HBase ACLs Rules ACLs Permissions R/Read Get, Scan, or Exists calls W/Write Put, Delete, LockRow, UnlockRow, IncrementColumnValue, CheckAndDelete, CheckAndPut, Flush, & Compact C/Create Create, Alter, & Drop A/Admin Enable, Disable, MajorCompact, Grant, Revoke, & Shutdown. 42
  • 43. HBase: ACLs for Authorization 43
  • 44. Troubleshooting •  Misconfiguration? –  Pseudo-distributed to cluster-wide configuration –  Full cluster functionality before kerberizing services –  Correct principal & keytab contains up-to-date KVNO. –  Disentangle security related settings to isolate root causes •  Ticket renewable fail? or expired. •  System-wide –  Permission (files, directories and ownership), objClasses & ACLs –  System clock screw, KDC operation (REALM), file handle limitation? (ulimit) –  TT, RS, DN fail to start? Out of disk space? “dfs.datanode.du.reserved” –  Name resolve (reverse), routing (multi-channels) etc. •  Extensive debugging info –  Increase root.logger level, e.g.: hadoop.root.logger & hadoop.security.logger –  Security mode: “-Djavax.net.debug=ssl -Dsun.security.krb5.debug=true” •  Correct Hadoop “home” to look into? •  Relevant logging system: –  KDC log provide: TGS & AS req., principals, authtime and etypes. 44
  • 45. Etu Appliance New features & key benefits Software stacks & versions HW spec. Hadoop in Taiwan 45
  • 46. Introducing – Etu Appliance 2.0 A purpose built high performance appliance for big data processing •  Automates cluster deployment •  Optimize for the highest performance of big data processing •  Delivers availability and security 46
  • 47. Why appliance? •  Misconfiguration comprise 35% of tickets –  Generic issues: memory allocation, disk spaces & file handling •  ~40% refer to system-wide and operation issues. –  System automation, robust deployment, dashboard and event management strictly required for production operation Stats. Upto 2011 Ref: Hadoop Troubleshooting, Kate, Cloudera 47
  • 48. Key Benefits to Hadoopers Simplifies configuration and deployment •  Fully automated deployment and configuration High availability made easy •  Reliably deploy running mission critical big data application Enterprise-grade security •  Process and control sensitivity data with confidence Boost big data processing •  Fully optimized operating system to boost your data processing performance Provides a scalable and extensible foundation •  Adapts to your workload and grows with your business 49
  • 49. What’s New in Etu Appliance 2.0 •  New deployment feature – Auto deployment and configuration for master node high availability •  New security features – LDAP integration and Kerberos authentication •  New data source – Introduce Etu™ Dataflow data collection service with built in Syslog and FTP server for better integration with existing IT infrastructure •  New user experience – new Etu™ Management Console with HDFS file browser and HBase table management 50
  • 50. Software Stack Version Details Patch Release Apache Project Package Version Level Note Apache Hadoop 2.0 hadoop-2.0.0+91 91 Apache Hbase hbase-0.92.1+67 67 Apache Hive hive-0.8.1+61 61 Apache Mahout mahout-0.6+16 16 Apache MRv1 mr1-0.20.2+1216 1216 Apache Pig pig-0.9.2+26 26 Apache Sqoop Sqoop-1.4.1+28 28 Apache ZooKeeper Zookeeper-3.4.3+15 15 ** CDH4 Release Note: https://ccp.cloudera.com/display/CDH4DOC/CDH4+Release+Notes ** CDH4 Package Information: https://ccp.cloudera.com/display/DOC/CDH+Version+and+Packaging +Information#CDHVersionandPackagingInformation-CDHVersion4.0.1Packaging 52
  • 51. Etu References: •  Chiang, Fred. (Deputy Vice President) “Big Data 101 ” SYSTEX ( ), 24 May 2012. http://www.slideshare.net/fchiangtw/big-data-101 •  Chen, James. (Principal Consultant of Etu) “Hadoop SQL ” SYSTEX ( ), 24 May 2012. http://www.slideshare.net/chaoyu0513/hadoop-sql •  Wu, Jeng-Hsueh. (Principal Architect of Etu), “Facing the Big Data challenge: a use case for leveraging from Hadoop and her friends”, OSDC, 14 Apr 2012. http://osdc.tw/schedule •  Nien, Johnny. (Technical Manager) “Etu DataFlow: An efficient data streaming & pre-processing framework designed for Hadoop”, COSCUP, 19 Aug 2012. http://coscup.org/2012/en/program 53
  • 52. Hadoop Security References: •  Cloudera –  CDH3 Security Guide –  CDH4 Beta 2 Security Guide •  Hadoop Security –  Slideshare –  “Hadoop Security Design”, Owen O’Malley et. al., Oct 2009 –  “Integrating Kerberos into Apache Hadoop”, Owen O’Malley –  “Plugging the Holes: Security and Compatibility”, Owen O’Malley –  “Developing and deploying Apache Hadop Security” Hortonworks, Owen –  “Hadoop Security, Cloudera” Hadoop World 2010, Todd Lipcon & Aaron Myers •  Kerberos & LDAP –  Administration: http://web.mit.edu/Kerberos/krb5-1.8/krb5-1.8.3/doc/krb5-admin.html –  Installation: http://web.mit.edu/Kerberos/krb5-1.8/krb5-1.8.3/doc/krb5-install.html –  Openldap: http://www.openldap.org/doc/admin24/dbtools.html 54
  • 53. Your Turn: Kerberos+LDAP HDFS+MR+HBase Best practices: Hive & Pig 55
  • 55. Contact www.etusolution.com info@etusolution.com Taipei, Taiwan 318, Rueiguang Rd., Taipei 114, Taiwan T: +886 2 7720 1888 F: +886 2 8798 6069 Beijing, China Room B-26, Landgent Center, No. 24, East Third Ring Middle Rd., Beijing, China 100022 T: +86 10 8441 7988 F: +86 10 8441 7227 57