Hadoop REST API Security with Apache Knox Gateway


Published on

Published in: Technology
1 Comment
No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Hadoop REST API Security with Apache Knox Gateway

  1. 1. © Hortonworks Inc. 2014 Securing Hadoop’s REST APIs Apache Knox Gateway Hadoop Summit 2014 Kevin Minder Larry McCayhttp://knox.apache.org/ user (at) knox.apache.org dev (at) knox.apache.org
  2. 2. © Hortonworks Inc. 2014 Agenda • Introduction • The What, Why and When of Apache Knox • Hadoop Context • Basic Knox operation and extensibility • How Knox • Enhances security • Simplifies access • Centralizes control • Integrates with the enterprise • What is next for Knox • Q & A
  3. 3. © Hortonworks Inc. 2014 Introductions Kevin Minder Middleware & WebServices Hortonworks Oracle HP Bluestone Larry McCay Middleware & Security Hortonworks Oracle Probaris HP Bluestone Tony Soprano Barone Sanitation Bada Bing Crime Boss Pauly D Jersey Shore House Member Disk Jockey Jersey really isn’t like this! Mostly… Just your “normal” Hadoop security guys.
  4. 4. © Hortonworks Inc. 2014 What is Apache Knox? • The Apache Knox Gateway is… • an extensible reverse proxy framework • for securely exposing REST APIs and HTTP based services at a perimeter • out of the box it provides: • support for several of the most common Hadoop services • integration with enterprise authentication systems • several other useful features
  5. 5. © Hortonworks Inc. 2014 What the Apache Knox Gateway isn’t • Not an alternative to Kerberos for strong Hadoop core authentication • Not a channel for high volume data ingest or export
  6. 6. © Hortonworks Inc. 2014 History and Status of the Apache Knox Gateway? • 2013-02: Accepted into Apache Incubator • 2013-04: Released 0.2.0 • 2013-10: Released 0.3.0 • 2014-02: Graduated to Apache TLP • 2014-04: Released 0.4.0, Included in HDP 2.1
  7. 7. © Hortonworks Inc. 2014 Why Knox? Simplified Access • Kerberos encapsulation • Extends API reach • Single access point • Multi-cluster support • Single SSL certificate Centralized Control • Central REST API auditing • Service-level authorization • Alternative to SSH “edge node” Enterprise Integration • LDAP integration • Active Directory integration • SSO integration • Apache Shiro extensibility • Custom extensibility Enhanced Security • Protect network details • Partial SSL for non-SSL services • WebApp vulnerability filter
  8. 8. © Hortonworks Inc. 2014 Layers Of Hadoop Security Perimeter Level Security • Network Security (i.e. Firewalls) • Apache Knox (i.e. Gateways) Authentication • Kerberos • Delegation Tokens OS Security • File Permissions • Process Isolation Authorization • MR ACLs • HDFS Permissions • HDFS ACLs • HiveATZ-NG • HBase ACLs • Accumulo Label Security • XA Security Policies Data Protection • Transport • Storage
  9. 9. © Hortonworks Inc. 2014 REST API Hadoop Services What does Perimeter Security really mean? Gateway REST API Firewall User Firewall required at perimeter (today) Knox Gateway controls all Hadoop REST API access through firewall Hadoop cluster mostly unaffected Firewall only allows connections through specific ports from Knox host
  10. 10. © Hortonworks Inc. 2014 What REST APIs does Hadoop support? Service URL Example WebHDFS http://localhost:50070/webhdfs WebHCat (aka Templeton) http://localhost:50111/templeton Oozie http://localhost:11000/oozie HBase (via Stargate) http://localhost:60080 Hive (HiveServer2) http://localhost:10001/cliservice jdbc:hive2://localhost:10001/?hive.server2.transport.mode=http;hive.server2.thrif t.http.path=cliservice
  11. 11. © Hortonworks Inc. 2014 Basic Knox Operation & Extensibility
  12. 12. © Hortonworks Inc. 2014 Authentication and Identity Propagation 1. REST API Request 2. HTTP Basic Auth Challenge kminder:secret 3. Authenticate kminder:secret knox keytab 4. Authenticates as knox via SPNego (i.e. Kerberos) 5. REST API Request doAs kminder 0. Configure knox user to be known as trusted proxy LDAP
  13. 13. © Hortonworks Inc. 2014 Scalability and Fault Tolerance Hadoop Apache HTTPD+mod_proxy_balancer f5 BIG-IP HAProxy Knox Cluster (no shared state) Really any traditional web tier load balancer
  14. 14. © Hortonworks Inc. 2014 Extensibility: Providers and Services • Both are dynamically discovered on the class path via Java’s ServiceLoader • Providers • Add new features to the gateway that can be used by Services • Typically result in one or more filters being added to one or more chains • Services • Add new endpoints to the gateway to expose a specific service • Assemble filter chains to enable specific features via providers • Includes providing configuration to providers • For example URL rewrite rules • Associates endpoints with filter chains
  15. 15. © Hortonworks Inc. 2014 Topology Files • Describe the services that should be exposed for a specific cluster • Found in <GATEWAY_HOME>/conf/topologies • Name of topology file dictates URL component • sandbox.xml -> http://localhost:8443/gateway/sandbox/webhdfs/… <topology> <gateway> <provider> <role>authentication</role> <name>custom</name> </provider> </gateway> <service> <role>WEBHDFS</role> <url>http://localhost:50070</url> </service> </topology> Location of WebHDFS in target cluster Selects an authentication provider implementation
  16. 16. © Hortonworks Inc. 2014 Enhanced Security
  17. 17. © Hortonworks Inc. 2014 Protect Network Details: WebHDFS Example • WebHDFS direct curl -i -X PUT 'http://localhost:50070/webhdfs/v1/user/guest/file1?op=CREATE&user.name=guest’ HTTP/1.1 307 TEMPORARY_REDIRECT Location: http://sandbox.hortonworks.com:50075/webhdfs/v1/user/guest/file1?op=CREATE&user.name=guest&namenoderp caddress=sandbox.hortonworks.com:8020&overwrite=false • WebHDFS via Knox curl -u guest:guest-password -i -k -X PUT 'https://localhost:8443/webhdfs/v1/user/guest/file2?op=CREATE’ HTTP/1.1 307 Temporary Redirect Location: https://localhost:8443/gateway/sandbox/webhdfs/data/v1/webhdfs/v1/user/guest/file2?_=AAAACAAAABAAAACAg UDT7-QQZlpkcm09lxrxI0Bgo9d- Egghp_qxmd4pQsmm3zvYc3M_LrDBQpMBNA48DnMS9QOhyzywCMl1WAShyX4RUETPjEcZa6x9Jwz7TMANj SRKMR6F3rKf93ME-VsI2Phe8CX72L6oiI778--8F9DQCO8LHFHzLL70iB13Hm2BLyj-x9p3tn7FOHxkbPl5d- eHxVop7Dk RPC and HTTP address of DataNode is leaked unnecessarily to REST client Encrypted query param contains dispatch information used by gateway when redirect followed
  18. 18. © Hortonworks Inc. 2014 Protect Network Details: Oozie Example • Oozie direct <configuration> <property> <name>oozie.wf.application.path</name> <value>hdfs://foo:9000/user/bansalm/myapp/</value> </property> ... </configuration> • Oozie via Knox <configuration> <property> <name>oozie.wf.application.path</name> <value>/user/bansalm/myapp/</value> </property> ... </configuration> • Example of submitting an Oozie job from Apache docs • https://oozie.apache.org/docs/4.0.1/WebServicesAPI.html • HTTP POST XML below to /oozie/v1/jobs REST client must know RPC address of NameNode
  19. 19. © Hortonworks Inc. 2014 Partial SSL for non-SSL enabled services REST API REST API WebHCat DMZ Desktop Gateway HTTPS HTTP First “hop” through public/corp networks protected with SSL Last “hop” within secure network non-SSL
  20. 20. © Hortonworks Inc. 2014 WebApp Vulnerability Filter • The Knox WebAppSec provider allows for the plugin of vulnerability prevention filters • Cross Site Request Forgery CSRF is currently provided • Uses common required header technique • Later releases will include more filters based on standard techniques <provider <role>webappsec</role> <name>WebAppSec</name> <enabled>true</enabled> <param><name>csrf.enabled</name><value>true</value></param> <param><name>csrf.customHeader</name><value>X-XSRF-Header</value></param> <param><name>csrf.methodsToIgnore</name><value>GET,OPTIONS,HEAD</value></param> </provider>
  21. 21. © Hortonworks Inc. 2014 Simplified Access
  22. 22. © Hortonworks Inc. 2014 Knox Service URLs vs. direct URLs Service Direct URL Knox URL WebHDFS http://namenode-host:50070/webhdfs https://knox-host:8443/webhdfs WebHCat http://webhcat-host:50111/templeton https://knox-host:8443/templeton Oozie http://ooziehost:11000/oozie https://knox-host:8443/oozie HBase http://hbasehost:60080 https://knox-host:8443/hbase Hive http://hivehost:10001/cliservice https://knox-host:8443/hive Masters could be on many different hosts One hosts, one port Consistent paths
  23. 23. © Hortonworks Inc. 2014 Hadoop CLIs need almost full server configs /etc/hive/conf/hive-site.xml <property> <name>hive.server2.thrift.http.port</name> <value>10001</value> </property> <property> <name>hive.server2.thrift.http.path</name> <value>cliservice</value> </property> /etc/hadoop/conf/core-site.xml <property> <name>fs.defaultFS</name> <value>hdfs://sandbox.hortonworks.com:8020</value> </property> /etc/hadoop/conf/hdfs-site.xml <property> <name>dfs.namenode.http-address</name> <value>sandbox.hortonworks.com:50070</value> </property> /etc/hadoop/conf/yarn-site.xml <property> <name>yarn.resourcemanager.address</name> <value>sandbox.hortonworks.com:8050</value> </property> /etc/hive-webhcat/conf/webhcat-site.xml <property> <name>templeton.port</name> <value>50111</value> </property> /etc/oozie/conf/oozie-site.xml <property> <name>oozie.base.url</name> <value>http://sandbox.hortonworks.com:11000/oozie</value> </property> HBase – Command line These files may all be on different nodes on the cluster too!
  24. 24. © Hortonworks Inc. 2014 Kerberos Encapsulation 1. REST API Request 2. HTTP Basic Auth Challenge kminder:secret 3. Authenticate kminder:secret knox keytab 4. Authenticates as knox via SPNego (i.e. Kerberos) 5. REST API Request doAs kminder 0. Configure knox as trusted proxy The client isn’t even aware the cluster is secured with Kerberos
  25. 25. © Hortonworks Inc. 2014 REST API REST API Hadoop REST API Reach: Intranet Access Model DMZ Desktop Gateway Users will discover novel ways to use easily accessible REST APIs
  26. 26. © Hortonworks Inc. 2014 HTML/JS REST Hadoop REST API Reach: Middleware Access Model Web Tier / DMZ Browser “Give the APIs to the Apps” GatewayApp Server REST Most enterprises cannot deal with Kerberos in the web tier and don’t have CLI access
  27. 27. © Hortonworks Inc. 2014 REST API REST API Hadoop REST API Reach: Internet Access Model DMZ “Give the APIs to the Everyone” Gateway Internet HaaS vendors are exposing Hadoop REST APIs to the internet. What does the API tell these clients to know about your cluster?
  28. 28. © Hortonworks Inc. 2014 Multi-Cluster Support Gateway http://knox:8443/gateway/green/webhdfs/v1 http://knox:8443/gateway/blue/webhdfs/v1 green Production Cluster blue Research Cluster One hosts, one port for many clusters
  29. 29. © Hortonworks Inc. 2014 Simplified Client Certificate Management hdfs cert hive cert hbase cert knox cert knox pubkey hive pubkey hbase pubkey hdfs pubkey • User only needs to trust Knox’s cert • Admin only needs to manage multiple keys on Knox hosts
  30. 30. © Hortonworks Inc. 2014 Centralized Control
  31. 31. © Hortonworks Inc. 2014 SCP/SSHLogin Hadoop CLIs Hadoop SSH Edge Node CLI Access Model DMZ Edge Node Desktop “Take the Users to the CLI”Limited auditing on edge node CLI too hard to install on desktops
  32. 32. © Hortonworks Inc. 2014 REST APILogin REST API Hadoop Improved auditing and access control DMZ Desktop Gateway All activity audited consistently Additional authorization control available
  33. 33. © Hortonworks Inc. 2014 Service Level Authorization • Control access to services by user, group or IP address • Resource level authorization should always be done at resource manager (e.g. HDFS) <provider> <role>authorization</role> <name>AclsAuthz</name> <enabled>true</enabled> <param> <name>WEBHDFS.acl</name> <value>*;admin;</value> </param> </provider>
  34. 34. © Hortonworks Inc. 2014 XA Secure Integration Thoughts 1. REST API Request 0. Distribute policy 3. REST API Request Policy Server Agent 2. Service level authorization decision Agent integrated as authorization provider Policies authored in the portal and distributed by the policy server
  35. 35. © Hortonworks Inc. 2014 KNOX-250: SSH Bastion Auditing Functionality • Community is developing an extension • Based on Apache MINA SSHD • Provides administrative Hadoop SSH access via Knox • Further centralizes auditing of cluster administration
  36. 36. © Hortonworks Inc. 2014 KNOX-250: SSH Bastion Auditing Functionality SSHLogin Hadoop CLI Hadoop DMZ Desktop Gateway All activity audited consistently
  37. 37. © Hortonworks Inc. 2014 Enterprise Integration
  38. 38. © Hortonworks Inc. 2014 Apache Shiro Authentication Provider • Apache Shiro is the primary authentication provider for Knox • Used for both LDAP and Active Directory • Apache Shiro is a popular JEE and JSE security framework • Very modular and flexible architecture • Many community extensions • Integrated into Knox as normal authentication provider
  39. 39. © Hortonworks Inc. 2014 Apache Shiro Authentication Provider <provider> <role>authentication</role> <name>ShiroProvider</name> <enabled>true</enabled> <param> <name>main.ldapRealm</name> <value>org.apache.shiro.realm.ldap.JndiLdapRealm</value> </param> <param> <name>main.ldapRealm.userDnTemplate</name> <value>uid={0},ou=people,dc=hadoop,dc=apache,dc=org</value> </param> <param> <name>main.ldapRealm.contextFactory.url</name> <value>ldap://localhost:33389</value> </param> <param> <name>main.ldapRealm.contextFactory.authenticationMechanism</name> <value>simple</value> </param> <param> <name>urls./**</name> <value>authcBasic</value> </param> </provider>
  40. 40. © Hortonworks Inc. 2014 SSO Integration • Similar in concept Hadoop’s trusted proxy model • Preconfigured for SiteMinder use case • HTTP Headers used to propagate pre-authenticated user and group info • Only acceptable for use in a tightly controlled network environment <provider> <role>federation</role> <name>HeaderPreAuth</name> <enabled>true</enabled> <param> <name>preauth.validation.method</name> <value>preauth.ip.validation</value> </param> <param> <name>preauth.ip.addresses</name> <value>127.0.*</value> </param> </provider>
  41. 41. © Hortonworks Inc. 2014 OAuth 2 • OAuth is becoming the defacto standard for communicating a user’s identity to REST APIs • It allows for explicit authorization by the user for the application to access resources • It has a number of ways to represent the user and authentication information to go over the wire • JSON Web Token (JWT) is an emerging standard for representing the various claims, attributes and scopes of an identity • Can be used as a bearer token, URL parameter or Header • OAuth is also gaining popularity as a federation token for SSO integrations
  42. 42. © Hortonworks Inc. 2014 KNOX-393: OAuth Resource Provider • Community investigating OAuth Federation Provider extension • Considering Apache Oltu • Warning: Diagram dramatically oversimplified • There are a number of other potential flows 2. REST API Request Authorization: Bearer <token> 3. validateAccessToken(<token>) 4. Authenticates as knox via SPNego (i.e. Kerberos) 5. REST API Request doAs kminder 0. Configure knox user to be known as trusted proxy 1. requestAccessToken(JWT) return Bearer token kminder
  43. 43. © Hortonworks Inc. 2014 What is next for Knox? Jira Assignee Description KNOX-393: OAuth Resource Provider for Middleware and Application Integration COMMUNITY OAuth 2 federation provider potentially based on Apache Oltu for external application SSO to Knox and Hadoop KNOX-355: Support Knox Authentication Provider based on Hadoop Auth Module (SPNEGO) KNOX Team SPNEGO authentication support for Knox clients KNOX-250: SSH Bastion Auditing Functionality COMMUNITY SSH tunneling and auditing functionality in addition to REST gateway services. KNOX-353: Support Hadoop Java Client URLs KNOX Team In order to be used Hadoop CLIs that can use REST, we need to support the expected URLs. This is in addition to the extended URLs for multiple Hadoop cluster support by Knox. KNOX-242: LDAP Authentication Enhancements KNOX Team Search attribute based authentication rather than simple LDAP bind. KNOX-74: Support YARN REST API KNOX Team Add support for the YARN REST API KNOX-66: Support Ambari REST API access via the Gateway KNOX Team Add support for the Ambari REST API TBD TBD What is important to you?
  44. 44. © Hortonworks Inc. 2014 Interested? • We’re hiring! • http://hortonworks.com/careers/open-positions/ • Especially hands on platform level development experience with • Kerberos • LDAP • OAuth • SAML • JAAS/GSS-API • Crypto
  45. 45. © Hortonworks Inc. 2014 Questions and Answers