Hadoop REST API Security with Apache Knox Gateway

© Hortonworks Inc. 2014
Securing Hadoop’s REST APIs
Apache Knox Gateway
Hadoop Summit 2014
Kevin Minder
Larry McCay
http://knox.apache.org/
user (at) knox.apache.org
dev (at) knox.apache.org

Agenda
• Introduction
• The What, Why and When of Apache Knox
• Hadoop Context
• Basic Knox operation and extensibility
• How Knox
• Enhances security
• Simplifies access
• Centralizes control
• Integrates with the enterprise
• What is next for Knox
• Q & A

Introductions
• Kevin Minder: Middleware & Web Services (Hortonworks, Oracle, HP, Bluestone)
• Larry McCay: Middleware & Security (Hortonworks, Oracle, Probaris, HP, Bluestone)
• Tony Soprano: Barone Sanitation, Bada Bing, Crime Boss
• Pauly D: Jersey Shore House Member, Disc Jockey
Jersey really isn’t like this! Mostly…
Just your “normal” Hadoop security guys.

What is Apache Knox?
• The Apache Knox Gateway is…
• an extensible reverse proxy framework
• for securely exposing REST APIs and HTTP based services at a perimeter
• out of the box it provides:
• support for several of the most common Hadoop services
• integration with enterprise authentication systems
• several other useful features

What the Apache Knox Gateway isn’t
• Not an alternative to Kerberos for strong Hadoop core authentication
• Not a channel for high volume data ingest or export

History and Status of the Apache Knox Gateway
• 2013-02: Accepted into Apache Incubator
• 2013-04: Released 0.2.0
• 2013-10: Released 0.3.0
• 2014-02: Graduated to Apache TLP
• 2014-04: Released 0.4.0, Included in HDP 2.1

Why Knox?
Simplified Access
• Kerberos encapsulation
• Extends API reach
• Single access point
• Multi-cluster support
• Single SSL certificate
Centralized Control
• Central REST API auditing
• Service-level authorization
• Alternative to SSH “edge node”
Enterprise Integration
• LDAP integration
• Active Directory integration
• SSO integration
• Apache Shiro extensibility
• Custom extensibility
Enhanced Security
• Protect network details
• Partial SSL for non-SSL services
• WebApp vulnerability filter

Layers Of Hadoop Security
Perimeter Level Security
• Network Security (e.g. Firewalls)
• Apache Knox (e.g. Gateways)
Authentication
• Kerberos
• Delegation Tokens
OS Security
• File Permissions
• Process Isolation
Authorization
• MR ACLs
• HDFS Permissions
• HDFS ACLs
• HiveATZ-NG
• HBase ACLs
• Accumulo Label Security
• XA Security Policies
Data Protection
• Transport
• Storage

What does Perimeter Security really mean?
[Diagram: User, Firewall, Gateway, Firewall, Hadoop Services, connected by REST API calls]
• Firewall required at perimeter (today)
• Knox Gateway controls all Hadoop REST API access through firewall
• Hadoop cluster mostly unaffected
• Firewall only allows connections through specific ports from Knox host

What REST APIs does Hadoop support?
Service | URL Example
WebHDFS | http://localhost:50070/webhdfs
WebHCat (aka Templeton) | http://localhost:50111/templeton
Oozie | http://localhost:11000/oozie
HBase (via Stargate) | http://localhost:60080
Hive (HiveServer2) | http://localhost:10001/cliservice
Hive (HiveServer2) via JDBC | jdbc:hive2://localhost:10001/?hive.server2.transport.mode=http;hive.server2.thrift.http.path=cliservice

Basic Knox Operation & Extensibility

Authentication and Identity Propagation
0. Configure knox user to be known as trusted proxy
1. REST API Request
2. HTTP Basic Auth Challenge (kminder:secret)
3. Authenticate kminder:secret against LDAP
4. Authenticates as knox via SPNego (i.e. Kerberos), using the knox keytab
5. REST API Request doAs kminder

Scalability and Fault Tolerance
[Diagram: load balancer in front of a Knox Cluster (no shared state), in front of Hadoop]
• Load balancer options: Apache HTTPD+mod_proxy_balancer, f5 BIG-IP, HAProxy
• Really any traditional web tier load balancer

Extensibility: Providers and Services
• Both are dynamically discovered on the class path via Java’s ServiceLoader
• Providers
• Add new features to the gateway that can be used by Services
• Typically result in one or more filters being added to one or more chains
• Services
• Add new endpoints to the gateway to expose a specific service
• Assemble filter chains to enable specific features via providers
• Includes providing configuration to providers
• For example URL rewrite rules
• Associates endpoints with filter chains
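
The relationship between providers, services and filter chains can be pictured with a small sketch. It is written in Python for brevity and does not mirror Knox's Java interfaces or its ServiceLoader discovery; the `Provider` and `Service` classes, the role names and the two filters are all hypothetical.

```python
class Provider:
    """Hypothetical provider: contributes one filter under a role name."""

    def __init__(self, role, filter_fn):
        self.role = role
        self.filter_fn = filter_fn


class Service:
    """Hypothetical service: an endpoint plus the provider roles it needs."""

    def __init__(self, endpoint, roles):
        self.endpoint = endpoint
        self.roles = roles


def build_chain(service, providers):
    """Assemble the service's filter chain from the available providers."""
    by_role = {provider.role: provider for provider in providers}
    return [by_role[role].filter_fn for role in service.roles]


def run_chain(chain, request):
    """Pass the request through each filter in order."""
    for filter_fn in chain:
        request = filter_fn(request)
    return request


def authenticate(request):
    # Stand-in for a real authentication filter.
    return {**request, "user": "guest"}


def rewrite_url(request):
    # Stand-in for a URL rewrite rule supplied by the service.
    return {**request, "url": request["url"].replace("/gateway/sandbox", "", 1)}


providers = [Provider("authentication", authenticate), Provider("rewrite", rewrite_url)]
webhdfs = Service("/gateway/sandbox/webhdfs", ["authentication", "rewrite"])
result = run_chain(build_chain(webhdfs, providers), {"url": "/gateway/sandbox/webhdfs/v1/tmp"})
print(result)
```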

Topology Files
• Describe the services that should be exposed for a specific cluster
• Found in <GATEWAY_HOME>/conf/topologies
• Name of topology file dictates URL component
• sandbox.xml -> http://localhost:8443/gateway/sandbox/webhdfs/…
<topology>
  <gateway>
    <provider>
      <role>authentication</role>
      <name>custom</name>
    </provider>
  </gateway>
  <service>
    <role>WEBHDFS</role>
    <url>http://localhost:50070</url>
  </service>
</topology>
• The provider element selects an authentication provider implementation
• The service url element gives the location of WebHDFS in the target cluster
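
To make the file-name-to-URL convention concrete, here is a minimal sketch that reads the topology shown on this slide and derives the gateway path for each service. The function name is hypothetical, and lower-casing the service role to get the URL segment is an assumption based on the single `sandbox.xml` example.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

TOPOLOGY_XML = """
<topology>
  <gateway>
    <provider>
      <role>authentication</role>
      <name>custom</name>
    </provider>
  </gateway>
  <service>
    <role>WEBHDFS</role>
    <url>http://localhost:50070</url>
  </service>
</topology>
"""


def describe_topology(filename, xml_text, gateway_base="http://localhost:8443/gateway"):
    """Return (topology name, providers by role, services by role)."""
    name = PurePosixPath(filename).stem  # sandbox.xml -> sandbox
    root = ET.fromstring(xml_text)
    providers = {p.findtext("role"): p.findtext("name") for p in root.iter("provider")}
    services = {}
    for service in root.iter("service"):
        role = service.findtext("role")
        services[role] = {
            "backend": service.findtext("url"),
            # Assumption: the URL segment is the lower-cased role name.
            "gateway": f"{gateway_base}/{name}/{role.lower()}",
        }
    return name, providers, services


name, providers, services = describe_topology("conf/topologies/sandbox.xml", TOPOLOGY_XML)
print(name, providers, services)
```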

Enhanced Security

Protect Network Details: WebHDFS Example
• WebHDFS direct
curl -i -X PUT 'http://localhost:50070/webhdfs/v1/user/guest/file1?op=CREATE&user.name=guest'
HTTP/1.1 307 TEMPORARY_REDIRECT
Location: http://sandbox.hortonworks.com:50075/webhdfs/v1/user/guest/file1?op=CREATE&user.name=guest&namenoderpcaddress=sandbox.hortonworks.com:8020&overwrite=false
• The NameNode RPC address and the DataNode HTTP address are leaked unnecessarily to the REST client
• WebHDFS via Knox
curl -u guest:guest-password -i -k -X PUT 'https://localhost:8443/webhdfs/v1/user/guest/file2?op=CREATE'
HTTP/1.1 307 Temporary Redirect
Location: https://localhost:8443/gateway/sandbox/webhdfs/data/v1/webhdfs/v1/user/guest/file2?_=AAAACAAAABAAAACAgUDT7-QQZlpkcm09lxrxI0Bgo9d-Egghp_qxmd4pQsmm3zvYc3M_LrDBQpMBNA48DnMS9QOhyzywCMl1WAShyX4RUETPjEcZa6x9Jwz7TMANjSRKMR6F3rKf93ME-VsI2Phe8CX72L6oiI778--8F9DQCO8LHFHzLL70iB13Hm2BLyj-x9p3tn7FOHxkbPl5d-eHxVop7Dk
• The encrypted query param contains dispatch information used by the gateway when the redirect is followed

Protect Network Details: Oozie Example
• Example of submitting an Oozie job from Apache docs
• https://oozie.apache.org/docs/4.0.1/WebServicesAPI.html
• HTTP POST XML below to /oozie/v1/jobs
• Oozie direct (REST client must know RPC address of NameNode)
<configuration>
  <property>
    <name>oozie.wf.application.path</name>
    <value>hdfs://foo:9000/user/bansalm/myapp/</value>
  </property>
  ...
</configuration>
• Oozie via Knox
<configuration>
  <property>
    <name>oozie.wf.application.path</name>
    <value>/user/bansalm/myapp/</value>
  </property>
  ...
</configuration>
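
One way to picture what has to happen between the two payloads: something on the gateway side must supply the NameNode address that the client left out. The sketch below is an assumption about that step, not Knox's actual rewrite rules; the function name is hypothetical and the NameNode address is the one from the direct example.

```python
import xml.etree.ElementTree as ET

CLIENT_CONFIG = """
<configuration>
  <property>
    <name>oozie.wf.application.path</name>
    <value>/user/bansalm/myapp/</value>
  </property>
</configuration>
"""


def qualify_app_path(config_xml, namenode_rpc):
    """Prefix a scheme-less oozie.wf.application.path with the NameNode address."""
    root = ET.fromstring(config_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") != "oozie.wf.application.path":
            continue
        value = prop.find("value")
        # Only touch paths that do not already carry a scheme such as hdfs://
        if value is not None and value.text and "://" not in value.text:
            value.text = namenode_rpc.rstrip("/") + value.text
    return ET.tostring(root, encoding="unicode")


# The address below is the one from the "Oozie direct" example.
rewritten = qualify_app_path(CLIENT_CONFIG, "hdfs://foo:9000")
print(rewritten)
```

A path that already names a scheme is passed through unchanged, so a client that does know the address is not broken by the rewrite.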

Partial SSL for non-SSL enabled services
[Diagram: Desktop, HTTPS REST API call to the Gateway in the DMZ, HTTP REST API call to WebHCat]
• First “hop” through public/corp networks protected with SSL
• Last “hop” within secure network non-SSL

WebApp Vulnerability Filter
• The Knox WebAppSec provider allows vulnerability prevention filters to be plugged in
• Cross Site Request Forgery (CSRF) protection is currently provided
• Uses common required header technique
• Later releases will include more filters based on standard techniques
<provider>
<role>webappsec</role>
<name>WebAppSec</name>
<enabled>true</enabled>
<param><name>csrf.enabled</name><value>true</value></param>
<param><name>csrf.customHeader</name><value>X-XSRF-Header</value></param>
<param><name>csrf.methodsToIgnore</name><value>GET,OPTIONS,HEAD</value></param>
</provider>
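
The required-header technique is simple enough to sketch. The check below is an illustration of the idea rather than the WebAppSec filter itself: browsers do not add custom headers to cross-site form posts, so a state-changing request without the configured header is rejected. The function name is hypothetical; the defaults mirror the three `csrf.*` params in the config above.

```python
def csrf_allows(method, headers,
                custom_header="X-XSRF-Header",
                methods_to_ignore=("GET", "OPTIONS", "HEAD")):
    """Return True if the request passes the required-header CSRF check."""
    if method.upper() in methods_to_ignore:
        return True  # ignored methods are not checked
    # HTTP header names are case-insensitive.
    present = {name.lower() for name in headers}
    return custom_header.lower() in present


print(csrf_allows("GET", {}))
print(csrf_allows("PUT", {}))
print(csrf_allows("PUT", {"X-XSRF-Header": "valid"}))
```

With curl, the header would be supplied with `-H`, for example `-H 'X-XSRF-Header: valid'`; in this sketch only the header's presence matters, not its value.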

Simplified Access

Knox Service URLs vs. direct URLs
Service | Direct URL | Knox URL
WebHDFS | http://namenode-host:50070/webhdfs | https://knox-host:8443/webhdfs
WebHCat | http://webhcat-host:50111/templeton | https://knox-host:8443/templeton
Oozie | http://ooziehost:11000/oozie | https://knox-host:8443/oozie
HBase | http://hbasehost:60080 | https://knox-host:8443/hbase
Hive | http://hivehost:10001/cliservice | https://knox-host:8443/hive
• Direct URLs: masters could be on many different hosts
• Knox URLs: one host, one port, consistent paths

Hadoop CLIs need almost full server configs
/etc/hive/conf/hive-site.xml
<property>
<name>hive.server2.thrift.http.port</name>
<value>10001</value>
</property>
<property>
<name>hive.server2.thrift.http.path</name>
<value>cliservice</value>
</property>
/etc/hadoop/conf/core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://sandbox.hortonworks.com:8020</value>
</property>
/etc/hadoop/conf/hdfs-site.xml
<property>
<name>dfs.namenode.http-address</name>
<value>sandbox.hortonworks.com:50070</value>
</property>
/etc/hadoop/conf/yarn-site.xml
<property>
<name>yarn.resourcemanager.address</name>
<value>sandbox.hortonworks.com:8050</value>
</property>
/etc/hive-webhcat/conf/webhcat-site.xml
<property>
<name>templeton.port</name>
<value>50111</value>
</property>
/etc/oozie/conf/oozie-site.xml
<property>
<name>oozie.base.url</name>
<value>http://sandbox.hortonworks.com:11000/oozie</value>
</property>
HBase – Command line
• These files may all be on different nodes on the cluster too!

Kerberos Encapsulation
0. Configure knox as trusted proxy
1. REST API Request
2. HTTP Basic Auth Challenge (kminder:secret)
3. Authenticate kminder:secret
4. Authenticates as knox via SPNego (i.e. Kerberos), using the knox keytab
5. REST API Request doAs kminder
• The client isn’t even aware the cluster is secured with Kerberos

REST API Reach: Intranet Access Model
[Diagram: Desktop, Gateway in the DMZ, Hadoop, connected by REST API calls]
• Users will discover novel ways to use easily accessible REST APIs

REST API Reach: Middleware Access Model
“Give the APIs to the Apps”
[Diagram: Browser, App Server and Gateway in the Web Tier / DMZ, Hadoop; links labeled HTML/JS and REST]
• Most enterprises cannot deal with Kerberos in the web tier and don’t have CLI access

REST API Reach: Internet Access Model
“Give the APIs to Everyone”
[Diagram: Internet, Gateway in the DMZ, Hadoop, connected by REST API calls]
• HaaS vendors are exposing Hadoop REST APIs to the internet. What does the API tell these clients about your cluster?

Multi-Cluster Support
[Diagram: one Gateway in front of two clusters]
• green (Production Cluster): http://knox:8443/gateway/green/webhdfs/v1
• blue (Research Cluster): http://knox:8443/gateway/blue/webhdfs/v1
• One host, one port for many clusters
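
The routing idea behind the green and blue URLs can be sketched as a lookup keyed on the topology segment of the path. The backend host names below are hypothetical, and real Knox dispatch is driven by topology files rather than a dictionary.

```python
# Hypothetical backends for the two clusters named on the slide.
TOPOLOGIES = {
    "green": {"webhdfs": "http://green-namenode:50070"},
    "blue": {"webhdfs": "http://blue-namenode:50070"},
}


def route(path):
    """Map /gateway/<topology>/<service>/<rest> to a backend URL."""
    parts = path.strip("/").split("/", 3)
    if len(parts) < 3 or parts[0] != "gateway":
        raise ValueError(f"not a gateway path: {path}")
    topology, service = parts[1], parts[2]
    rest = parts[3] if len(parts) > 3 else ""
    backend = TOPOLOGIES[topology][service]  # KeyError if unknown
    return f"{backend}/{service}/{rest}".rstrip("/")


print(route("/gateway/green/webhdfs/v1"))
print(route("/gateway/blue/webhdfs/v1"))
```

Clients only ever see the gateway host and port; which cluster serves the request is decided entirely by the path.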

Simplified Client Certificate Management
[Diagram: hdfs, hive, hbase and knox certs with their corresponding public keys]
• User only needs to trust Knox’s cert
• Admin only needs to manage multiple keys on Knox hosts

Centralized Control

SSH Edge Node CLI Access Model
“Take the Users to the CLI”
[Diagram: Desktop, SCP/SSH login to an Edge Node in the DMZ, Hadoop CLIs to Hadoop]
• Limited auditing on edge node
• CLI too hard to install on desktops

Improved auditing and access control
[Diagram: Desktop, login and REST API call to the Gateway in the DMZ, REST API call to Hadoop]
• All activity audited consistently
• Additional authorization control available

Service Level Authorization
• Control access to services by user, group or IP address
• Resource level authorization should always be done by the component that manages the resource (e.g. HDFS)
<provider>
<role>authorization</role>
<name>AclsAuthz</name>
<param>
<name>WEBHDFS.acl</name>
<value>*;admin;127.0.0.1</value>
</param>
</provider>
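
A minimal sketch of how an ACL value such as `*;admin;127.0.0.1` could be evaluated. It assumes the three semicolon-separated fields are users, groups and IP addresses in that order, that `*` matches anything, and that all three fields must match; those are assumptions for illustration, so check the Knox documentation for the exact semantics.

```python
def parse_acl(value):
    """Split 'users;groups;ips' into three sets of comma-separated entries."""
    users, groups, ips = (set(part.split(",")) for part in value.split(";"))
    return users, groups, ips


def matches(allowed, candidates):
    """True if the ACL field is a wildcard or shares an entry with candidates."""
    return "*" in allowed or bool(allowed & set(candidates))


def is_authorized(acl_value, user, groups, remote_ip):
    """Assumption: every field (user, group, IP) must match."""
    users, acl_groups, ips = parse_acl(acl_value)
    return (matches(users, [user])
            and matches(acl_groups, groups)
            and matches(ips, [remote_ip]))


ACL = "*;admin;127.0.0.1"  # value from the WEBHDFS.acl param above
print(is_authorized(ACL, "kminder", ["admin"], "127.0.0.1"))
print(is_authorized(ACL, "kminder", ["users"], "127.0.0.1"))
```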

XA Secure Integration Thoughts
0. Distribute policy (Policy Server to Agent)
1. REST API Request
2. Service level authorization decision
3. REST API Request
• Agent integrated as authorization provider
• Policies authored in the portal and distributed by the policy server

KNOX-250: SSH Bastion Auditing Functionality
• Community is developing an extension
• Based on Apache MINA SSHD
• Provides administrative Hadoop SSH access via Knox
• Further centralizes auditing of cluster administration

KNOX-250: SSH Bastion Auditing Functionality
[Diagram: Desktop, SSH login to the Gateway in the DMZ, Hadoop CLI to Hadoop]
• All activity audited consistently

Enterprise Integration

Apache Shiro Authentication Provider
• Apache Shiro is the primary authentication provider for Knox
• Used for both LDAP and Active Directory
• Apache Shiro is a popular JEE and JSE security framework
• Very modular and flexible architecture
• Many community extensions
• Integrated into Knox as normal authentication provider

Apache Shiro Authentication Provider
<provider>
<role>authentication</role>
<name>ShiroProvider</name>
<param>
<name>main.ldapRealm</name>
<value>org.apache.shiro.realm.ldap.JndiLdapRealm</value>
</param>
<param>
<name>main.ldapRealm.userDnTemplate</name>
<value>uid={0},ou=people,dc=hadoop,dc=apache,dc=org</value>
</param>
<param>
<name>main.ldapRealm.contextFactory.url</name>
<value>ldap://localhost:33389</value>
</param>
<param>
<name>main.ldapRealm.contextFactory.authenticationMechanism</name>
<value>simple</value>
</param>
<param>
<name>urls./**</name>
<value>authcBasic</value>
</param>
</provider>
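
The `userDnTemplate` param is a pattern in which `{0}` is replaced by the login name to form the DN used for the LDAP bind. A minimal sketch of that substitution follows; the function names are hypothetical, and the escaping covers only the common DN special characters, not the leading and trailing character rules.

```python
DN_SPECIAL = ',+"\\<>;'


def escape_dn_value(value):
    """Backslash-escape common DN special characters in an attribute value."""
    return "".join("\\" + ch if ch in DN_SPECIAL else ch for ch in value)


def user_dn(template, username):
    """Fill the {0} placeholder of a userDnTemplate with the escaped login name."""
    return template.replace("{0}", escape_dn_value(username))


TEMPLATE = "uid={0},ou=people,dc=hadoop,dc=apache,dc=org"  # from the config above
print(user_dn(TEMPLATE, "guest"))
```

Escaping matters because the login name comes from the client; an unescaped comma would otherwise change the structure of the DN.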

SSO Integration
• Similar in concept to Hadoop’s trusted proxy model
• Preconfigured for SiteMinder use case
• HTTP Headers used to propagate pre-authenticated user and group info
• Only acceptable for use in a tightly controlled network environment
<provider>
<role>federation</role>
<name>HeaderPreAuth</name>
<param>
<name>preauth.validation.method</name>
<value>preauth.ip.validation</value>
</param>
<param>
<name>preauth.ip.addresses</name>
<value>127.0.*</value>
</param>
</provider>
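
A minimal sketch of the header-based pre-authentication idea: the user name is taken from a request header only if the caller's IP matches an allowed pattern. The header name `SM_USER` (the SiteMinder convention) and the shell-style wildcard matching are assumptions for illustration, and the function name is hypothetical.

```python
from fnmatch import fnmatchcase


def preauth_user(remote_ip, headers,
                 allowed_ips=("127.0.*",),
                 user_header="SM_USER"):
    """Return the pre-authenticated user name, or None if the caller is not trusted."""
    # Only trust identity headers from callers on the allowed list.
    if not any(fnmatchcase(remote_ip, pattern) for pattern in allowed_ips):
        return None
    return headers.get(user_header)


print(preauth_user("127.0.0.1", {"SM_USER": "kminder"}))
print(preauth_user("10.1.2.3", {"SM_USER": "kminder"}))
```

Anyone who can reach the gateway from an allowed address can claim any identity, which is why this model is only acceptable in a tightly controlled network.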

OAuth 2
• OAuth is becoming the de facto standard for communicating a user’s identity to REST APIs
• It allows for explicit authorization by the user for the application to access resources
• It has a number of ways to represent the user and authentication information to go over the wire
• JSON Web Token (JWT) is an emerging standard for representing the various claims, attributes and scopes of an identity
• Can be used as a bearer token, URL parameter or Header
• OAuth is also gaining popularity as a federation token for SSO integrations

KNOX-393: OAuth Resource Provider
• Community investigating OAuth Federation Provider extension
• Considering Apache Oltu
• Warning: Diagram dramatically oversimplified
• There are a number of other potential flows
0. Configure knox user to be known as trusted proxy
1. requestAccessToken(JWT), return Bearer token
2. REST API Request with Authorization: Bearer <token>
3. validateAccessToken(<token>)
4. Authenticates as knox via SPNego (i.e. Kerberos)
5. REST API Request doAs kminder
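
To show what a bearer request carries, the sketch below pulls the token out of an `Authorization: Bearer` header and decodes the claims of a JWT without verifying its signature. It is not the KNOX-393 design, which was still under investigation; the function names and the sample token are hypothetical, and a real provider would have to validate the token before trusting any claim.

```python
import base64
import json


def bearer_token(authorization_header):
    """Extract the token from an 'Authorization: Bearer <token>' header value."""
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        raise ValueError("not a Bearer Authorization header")
    return token.strip()


def unverified_claims(jwt):
    """Decode the JWT payload. The signature is NOT checked in this sketch."""
    _header_b64, payload_b64, _signature = jwt.split(".")
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))


def _b64url(obj):
    raw = json.dumps(obj).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


# Hypothetical unsigned token naming kminder as the subject.
sample_jwt = ".".join([_b64url({"alg": "none"}), _b64url({"sub": "kminder"}), ""])
claims = unverified_claims(bearer_token("Bearer " + sample_jwt))
print(claims["sub"])
```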

What is next for Knox?
Jira | Assignee | Description
KNOX-393: OAuth Resource Provider for Middleware and Application Integration | COMMUNITY | OAuth 2 federation provider potentially based on Apache Oltu for external application SSO to Knox and Hadoop
KNOX-355: Support Knox Authentication Provider based on Hadoop Auth Module (SPNEGO) | KNOX Team | SPNEGO authentication support for Knox clients
KNOX-250: SSH Bastion Auditing Functionality | COMMUNITY | SSH tunneling and auditing functionality in addition to REST gateway services.
KNOX-353: Support Hadoop Java Client URLs | KNOX Team | In order to be used by Hadoop CLIs that can use REST, we need to support the expected URLs. This is in addition to the extended URLs for multiple Hadoop cluster support by Knox.
KNOX-242: LDAP Authentication Enhancements | KNOX Team | Search attribute based authentication rather than simple LDAP bind.
KNOX-74: Support YARN REST API | KNOX Team | Add support for the YARN REST API
KNOX-66: Support Ambari REST API access via the Gateway | KNOX Team | Add support for the Ambari REST API
TBD | TBD | What is important to you?

Interested?
• We’re hiring!
• http://hortonworks.com/careers/open-positions/
• Especially hands on platform level development experience with
• Kerberos
• LDAP
• OAuth
• SAML
• JAAS/GSS-API
• Crypto

Questions and Answers
