IMPLEMENTING HTTPFS & KNOX WITH ISILON
ONEFS TO ENHANCE HDFS ACCESS SECURITY
Boni Bruno, CISSP, CISM, CGEIT
Principal Solutions Architect
DELL EMC
ABSTRACT
This paper describes implementing HTTPFS and Knox together with Isilon
OneFS to enhance HDFS access security. This integrated solution has
been tested and certified with Hortonworks on HDP v2.4 and Isilon OneFS
v 8.0.0.3.
CONTENTS
Introduction
WebHDFS REST API
WebHDFS Port Assignment in Isilon OneFS
WebHDFS Examples with Isilon
WebHDFS Security Concerns
HTTPFS
Installing HTTPFS
Configuring HTTPFS
Configuring HTTPFS for Kerberos
Running and Stopping HTTPFS
Configuring HTTPFS Auto-Start
Testing HTTPFS
Knox
Installing Knox
Configuring Knox using Ambari
Configuring Knox for LDAP
Configuring Knox for Kerberos
Testing Knox and Isilon Impersonation Defense
Final Comments
Appendix
Additional Testing Results
INTRODUCTION
Hadoop provides a Java native API to support file system operations such as create, rename or delete files and
directories, open, read or write files, set permissions, etc. This is great for applications running within the Hadoop
cluster, but there may be use cases where an external application needs to make such file system operations on
files stored on HDFS as well. Hortonworks developed the WebHDFS REST API to support these requirements based
on standard REST functionalities. WebHDFS REST APIs support a complete File System / File Context interface for
HDFS.
WEBHDFS REST API
WebHDFS is based on HTTP operations such as GET, PUT, POST, and DELETE. Operations such as OPEN, GETFILESTATUS, and LISTSTATUS use HTTP GET; operations such as CREATE, MKDIRS, RENAME, and SETPERMISSION rely on HTTP PUT; APPEND operations are based on HTTP POST; and DELETE uses HTTP DELETE. Authentication can be based on a user name passed as a query parameter (as part of the HTTP query string) or, if security is enabled, through Kerberos.
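As an illustrative sketch of this mapping (the host, port, path, and user name below are placeholders, not values from this paper, and the user.name parameter assumes simple authentication):
# Read/metadata queries use HTTP GET
curl -i "http://<namenode-host>:<port>/webhdfs/v1/<path>?op=LISTSTATUS&user.name=<user>"
# Namespace changes use HTTP PUT
curl -i -X PUT "http://<namenode-host>:<port>/webhdfs/v1/<path>?op=MKDIRS&user.name=<user>"
# Appends use HTTP POST
curl -i -X POST "http://<namenode-host>:<port>/webhdfs/v1/<path>?op=APPEND&user.name=<user>"
# Deletes use HTTP DELETE
curl -i -X DELETE "http://<namenode-host>:<port>/webhdfs/v1/<path>?op=DELETE&user.name=<user>"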
Web HDFS is enabled in a Hadoop cluster by defining the following property in hdfs-site.xml:
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
If using Ambari, enable WebHDFS under the General Settings of HDFS as shown below:
When using Isilon as a centralized HDFS storage repository for a given Hadoop Cluster, all namenode and datanode
functions must be configured to run on Isilon for the entire Hadoop cluster. By design, WebHDFS needs access to
all nodes in the cluster. Before the WebHDFS interface on Isilon can be used by the Hadoop Cluster, you must
enable WebHDFS in the Protocol Settings for HDFS on the designated Access Zone in Isilon - this is easily done in
the OneFS GUI. In the example below, hdp24 is the HDFS Access Zone for the Hadoop Cluster. Note the check
mark next to ENABLE WebHDFS access.
It is not sufficient to just enable WebHDFS in Ambari. Isilon must also be configured with WebHDFS enabled so
end to end WebHDFS communication can work in the Hadoop cluster. If multiple Access Zones are defined on
Isilon, make sure to enable WebHDFS as needed on each access zone.
WEBHDFS PORT ASSIGNMENT IN ISILON ONEFS
All references to Hadoop host hdp24 in this document refer to a defined SmartConnect HDFS Access Zone on
Isilon. TCP Port 8082 is the port OneFS uses for WebHDFS. It is important that the hdfs-site.xml file in the Hadoop
Cluster reflect the correct port designation for HTTP access to Isilon. See Ambari screen shot below for reference.
WEBHDFS EXAMPLES WITH ISILON
Assuming the Hadoop cluster is up and running with Isilon and WebHDFS has been properly enabled for the
Hadoop cluster, we are ready to test WebHDFS. CURL is a great command line tool for transferring data using
various protocols, including HTTP/HTTPS. The examples below use curl to invoke the WebHDFS REST API available
in Isilon OneFS to conduct various file system operations. Again, all references to hdp24 used in the curl
commands below refer to the SmartConnect HDFS Access Zone on Isilon and not some edge node in the cluster.
GETTING FILE STATUS EXAMPLE
The screen shot above shows curl being used to connect to Isilon's WebHDFS interface on port 8082; the
GETFILESTATUS operation is used as user hduser1 to retrieve information on the projects.txt file.
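A hedged reconstruction of the command behind this example (hdp24 and port 8082 are from this paper; the user.name query parameter assumes simple authentication):
curl -i "http://hdp24:8082/webhdfs/v1/user/hduser1/projects.txt?op=GETFILESTATUS&user.name=hduser1"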
Note: The projects.txt file is a test file I created. It is not part of the Hortonworks software.
A web browser may also be used to get projects.txt file status from Isilon WebHDFS as shown below:
This is similar to executing hdfs dfs -ls /user/hduser1/projects.txt from a Hadoop client node n107 as shown
below:
This quick example shows the flexibility of using WebHDFS. It provides a simple way to execute Hadoop file system
operations by an external client that does not necessarily run on the Hadoop cluster itself. Let’s look at another
example.
CREATING A DIRECTORY EXAMPLE
Here the MKDIRS operation is used with PUT from a different client node, n105, to create the directory /tmp/hduser
as user hduser1 on Isilon. The Boolean result of true tells us the operation was successful. We can also check
the result by using hdfs to see the directory on Isilon as shown below:
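A hedged sketch of the commands this example describes (hdp24:8082 as above; the actual output will differ):
curl -i -X PUT "http://hdp24:8082/webhdfs/v1/tmp/hduser?op=MKDIRS&user.name=hduser1"
hdfs dfs -ls /tmp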
OPEN A FILE EXAMPLE
In the example above, the OPEN operation is used with curl to display the text string “Knox HTTPFS Isilon Project”
within the /tmp/hduser1/project.txt file.
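A hedged sketch of that OPEN call (-L follows any redirect the server issues; user.name assumes simple authentication):
curl -L "http://hdp24:8082/webhdfs/v1/tmp/hduser1/project.txt?op=OPEN&user.name=hduser1"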
As shown before, a web browser can be used to access the file as well. Here a browser is configured to
automatically open text files in notepad, so accessing the WebHDFS API on Isilon as shown below will open the
contents of /tmp/hduser1/project.txt in notepad directly.
To validate the contents from within the cluster, we can use hdfs as shown below:
I’m only scratching the surface with the examples above; there are various operations you can execute with
WebHDFS. You can easily use WebHDFS to append data to files, rename files or directories, create new files, etc.
See the Appendix for many more examples.
It should be apparent that WebHDFS provides a simple, standard way to execute Hadoop file system operations
with external clients that do not necessarily run within the Hadoop cluster itself.
WEBHDFS SECURITY CONCERNS
Something worth pointing out with the above examples, and with WebHDFS in general, is that clients are directly accessing the namenodes and datanodes via predefined ports. This can be seen as a security issue for many organizations wanting to enable external WebHDFS access to their Hadoop infrastructure.
Many organizations do not want their Hadoop infrastructure accessed directly from external clients. As seen thus
far, external clients can use WebHDFS to directly access the actual ports namenodes and datanodes are listening
on in the Hadoop Cluster and leverage the WebHDFS REST API to conduct various file system operations. Although
firewalls can filter access from external clients, the ports are still being directly accessed. As a result, firewalls do not
prohibit the execution of various WebHDFS operations.
The solution to this issue, in many cases, is to enable Kerberos in the Hadoop cluster and deploy Secure REST API
Gateways that enforce strong authentication and access control to WebHDFS. The remainder of this document
focuses on using HTTPFS and Knox in conjunction with Isilon OneFS to provide a secure WebHDFS deployment with
Hadoop. A diagram of the secure architecture is shown below for reference.
HTTPFS
The introduction section of this document provides an overview of WebHDFS and demonstrates how the
WebHDFS REST APIs support a complete File System / File Context interface for HDFS. WebHDFS is efficient as it
streams data from each datanode and can support external clients like curl or web browsers to extend data access
beyond the Hadoop cluster.
Since WebHDFS needs access to all nodes in the cluster by design, WebHDFS inherently establishes a wider footprint
for HDFS access in a Hadoop cluster, since clients can access HDFS over HTTP/HTTPS. To help minimize the
footprint exposed to clients, a gateway solution is needed that provides a similar File System / File Context
interface for HDFS, and this is where HTTPFS comes into play.
HTTPFS is a service that provides a REST HTTP gateway supporting all HDFS File System operations (read and
write). HTTPFS can be used to provide a gateway interface, i.e. choke point, to Isilon and limit broad HDFS access
from external clients to the Hadoop cluster. HTTPFS can also be integrated with Knox to improve service level
authorization, LDAP & AD integration, and overall perimeter security. See the Knox section of this document for
more details. The remainder of this section covers the installation and configuration of HTTPFS with Isilon.
INSTALLING HTTPFS
HTTPFS can be installed on the Ambari server or on a worker node; for production deployments, deploying on a
dedicated worker node is a best practice.
To install HTTPFS: yum install hadoop-httpfs (Note: existing HWX repos are hadoop-httpfs aware)
Note: The HTTPFS service is a tomcat application that relies on having the Hadoop libraries and configuration
available, so make sure to install HTTPFS on an edge node that is being managed by Ambari.
After you install HTTPFS, the directories below will be created on the HTTPFS server:
/usr/hdp/2.x.x.x-x/hadoop-httpfs
/etc/hadoop-httpfs/conf
/etc/hadoop-httpfs/tomcat-deployment
CONFIGURING HTTPFS
If you change directories to /usr/hdp on your HTTPFS server and list the files there, you will see a directory with
the version number of your existing HDP release. Make note of it so you can set the current version for httpfs. Set
the version for current with the following command:
hdp-select set hadoop-httpfs 2.x.x.x-x (replace the x with your HDP release)
The installation of httpfs above deploys scripts which have some hardcoded values that need to be changed.
Adjust the /usr/hdp/current/hadoop-httpfs/sbin/httpfs.sh script:
#!/bin/bash
# Autodetect JAVA_HOME if not defined
if [ -e /usr/libexec/bigtop-detect-javahome ]; then
. /usr/libexec/bigtop-detect-javahome
elif [ -e /usr/lib/bigtop-utils/bigtop-detect-javahome ]; then
. /usr/lib/bigtop-utils/bigtop-detect-javahome
fi
### Added to assist with locating the right configuration directory
export HTTPFS_CONFIG=/etc/hadoop-httpfs/conf
### Remove the original HARD CODED Version reference.
Next, you need to create the following symbolic links:
cd /usr/hdp/current/hadoop-httpfs
ln -s /etc/hadoop-httpfs/tomcat-deployment/conf conf
ln -s ../hadoop/libexec libexec
Like all the other Hadoop components, httpfs follows the use of *-env.sh files to control the startup environment.
Above, in the httpfs.sh script, we set the location of the configuration directory; this configuration directory is used
to find and load the httpfs-env.sh file.
The httpfs-env.sh file needs to be modified as shown below:
# Add exports to control and set the Catalina directories for starting and finding the httpfs application
export CATALINA_BASE=/usr/hdp/current/hadoop-httpfs
export HTTPFS_CATALINA_HOME=/etc/hadoop-httpfs/tomcat-deployment
# Set a log directory that matches your standards
export HTTPFS_LOG=/var/log/hadoop/httpfs
# Set a tmp directory for httpfs to store interim files
export HTTPFS_TEMP=/tmp/httpfs
The default port for httpfs is TCP 14000. If you need to change the port for httpfs, add the following export to the
above httpfs-env.sh file on the HTTPFS server:
export HTTPFS_HTTP_PORT=<new_port>
In the Ambari web interface, add httpfs as a proxy user in core-site.xml in the HDFS > Configs > Advanced >
Custom core site section:
Note: If the properties that are referenced below do not already exist, do the following steps:
1. Click the Add Property link in the Custom core site area to open the Add Property window.
2. Add each value in the <name> part in the Key field.
3. Add each value in the <value> part in the Value field.
4. Click Add. Then click Save.
<property>
<name>hadoop.proxyuser.httpfs.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.httpfs.hosts</name>
<value>*</value>
</property>
Make sure to restart HDFS and related components after making the above changes to core-site.xml.
At this point HTTPFS is configured to work with a non-Kerberos Hadoop cluster. If your cluster is not secured with
Kerberos, you can skip the following section CONFIGURING HTTPFS FOR KERBEROS and proceed to RUNNING AND
STOPPING HTTPFS and TESTING HTTPFS.
CONFIGURING HTTPFS FOR KERBEROS
Ambari does not automate the configuration of HTTPFS to support Kerberos. If your Hadoop cluster was secured
with Kerberos using Ambari, you will need to create some needed keytabs and modify the httpfs-site.xml before
HTTPFS will work in a secure Kerberos Hadoop cluster.
The following assumptions are made for this section on configuring HTTPFS for Kerberos:
1. HTTPFS has been installed, configured, and verified to be working prior to enabling Kerberos.
2. Kerberos was enabled using Ambari and an MIT KDC and Isilon is configured and verified for Kerberos.
Both httpfs and HTTP service principals must be created for HTTPFS if they do not already exist.
Create the httpfs and HTTP (see note below) principals:
kadmin: addprinc -randkey httpfs/fully.qualified.domain.name@EXAMPLE-REALM.COM
kadmin: addprinc -randkey HTTP/fully.qualified.domain.name@EXAMPLE-REALM.COM
Note: HTTP principal and keytab may already exist as this is typically needed for other Hadoop services in
a secure Kerberos Hadoop cluster deployment. HTTP must be in CAPITAL LETTERS.
Create the keytab files for both httpfs and HTTP (see note above) principals:
kadmin -q "ktadd -k /etc/security/keytabs/httpfs.service.keytab httpfs/fully.qualified.domain.name@EXAMPLE-REALM.COM"
kadmin -q "ktadd -k /etc/security/keytabs/spnego.service.keytab HTTP/fully.qualified.domain.name@EXAMPLE-REALM.COM"
Note: The spnego keytab above only needs to be created if it does not already exist on the node running HTTPFS.
Merge the two keytab files into a single keytab file:
ktutil: rkt /etc/security/keytabs/httpfs.service.keytab
ktutil: rkt /etc/security/keytabs/spnego.service.keytab
ktutil: wkt /etc/security/keytabs/httpfs-http.service.keytab
ktutil: quit
The above will create a file named httpfs-http.service.keytab in /etc/security/keytabs.
Note: This keytab should be copied to the HTTPFS node.
Test that the merged keytab file works:
klist -kt /etc/security/keytabs/httpfs-http.service.keytab
The above command should list both the httpfs and HTTP principals contained in the httpfs-http.service.keytab. Below is an
example output from a test cluster:
Change the ownership and permissions of the /etc/security/keytabs/httpfs-http.service.keytab file:
chown httpfs:hadoop /etc/security/keytabs/httpfs-http.service.keytab
chmod 400 /etc/security/keytabs/httpfs-http.service.keytab
Edit the HTTPFS server httpfs-site.xml configuration file in the HTTPFS configuration directory by setting the
following properties:
httpfs.authentication.type: kerberos
httpfs.hadoop.authentication.type: kerberos
httpfs.authentication.kerberos.principal: HTTP/<FQDN of HTTPFS host>@<YOUR-REALM.COM>
httpfs.authentication.kerberos.keytab: /etc/hadoop-httpfs/conf/httpfs-http.service.keytab
httpfs.hadoop.authentication.kerberos.principal: httpfs/<FQDN of HTTPFS host>@<YOUR-REALM.COM>
httpfs.hadoop.authentication.kerberos.keytab: /etc/security/keytabs/httpfs-http.service.keytab
httpfs.authentication.kerberos.name.rules: Use the value configured for 'hadoop.security.auth_to_local' in
Ambari's HDFS Configs under "Advanced Core-Site".
An example httpfs-site.xml is listed below; note the Kerberos-related properties:
<configuration>
<!-- HTTPFS proxy user setting -->
<property>
<name>httpfs.proxyuser.knox.hosts</name>
<value>*</value>
</property>
<property>
<name>httpfs.proxyuser.knox.groups</name>
<value>*</value>
</property>
<!-- HUE proxy user setting -->
<property>
<name>httpfs.proxyuser.hue.hosts</name>
<value>*</value>
</property>
<property>
<name>httpfs.proxyuser.hue.groups</name>
<value>*</value>
</property>
<property>
<name>httpfs.hadoop.config.dir</name>
<value>/etc/hadoop/conf</value>
</property>
<property>
<name>httpfs.authentication.type</name>
<value>kerberos</value>
</property>
<property>
<name>httpfs.hadoop.authentication.type</name>
<value>kerberos</value>
</property>
<property>
<name>kerberos.realm</name>
<value>SOLARCH.LAB.EMC.COM</value>
</property>
<property>
<name>httpfs.authentication.kerberos.principal</name>
<value>HTTP/n105.solarch.lab.emc.com@SOLARCH.LAB.EMC.COM</value>
</property>
<property>
<name>httpfs.authentication.kerberos.keytab</name>
<value>/etc/security/keytabs/httpfs-http.service.keytab</value>
</property>
<property>
<name>httpfs.hadoop.authentication.kerberos.principal</name>
<value>httpfs/n105.solarch.lab.emc.com@SOLARCH.LAB.EMC.COM</value>
</property>
<property>
<name>httpfs.hadoop.authentication.kerberos.keytab</name>
<value>/etc/security/keytabs/httpfs-http.service.keytab</value>
</property>
<property>
<name>httpfs.authentication.kerberos.name.rules</name>
<value>
RULE:[1:$1@$0](accumulo@SOLARCH.LAB.EMC.COM)s/.*/accumulo/
RULE:[1:$1@$0](ambari-qa@SOLARCH.LAB.EMC.COM)s/.*/ambari-qa/
RULE:[1:$1@$0](hbase@SOLARCH.LAB.EMC.COM)s/.*/hbase/
RULE:[1:$1@$0](hdfs@SOLARCH.LAB.EMC.COM)s/.*/hdfs/
RULE:[1:$1@$0](spark@SOLARCH.LAB.EMC.COM)s/.*/spark/
RULE:[1:$1@$0](tracer@SOLARCH.LAB.EMC.COM)s/.*/accumulo/
RULE:[1:$1@$0](.*@SOLARCH.LAB.EMC.COM)s/@.*//
RULE:[2:$1@$0](accumulo@SOLARCH.LAB.EMC.COM)s/.*/accumulo/
RULE:[2:$1@$0](amshbase@SOLARCH.LAB.EMC.COM)s/.*/ams/
RULE:[2:$1@$0](amszk@SOLARCH.LAB.EMC.COM)s/.*/ams/
RULE:[2:$1@$0](dn@SOLARCH.LAB.EMC.COM)s/.*/hdfs/
RULE:[2:$1@$0](falcon@SOLARCH.LAB.EMC.COM)s/.*/falcon/
RULE:[2:$1@$0](hbase@SOLARCH.LAB.EMC.COM)s/.*/hbase/
RULE:[2:$1@$0](hdfs@SOLARCH.LAB.EMC.COM)s/.*/hdfs/
RULE:[2:$1@$0](hive@SOLARCH.LAB.EMC.COM)s/.*/hive/
RULE:[2:$1@$0](knox@SOLARCH.LAB.EMC.COM)s/.*/knox/
RULE:[2:$1@$0](httpfs@SOLARCH.LAB.EMC.COM)s/.*/httpfs/
RULE:[2:$1@$0](mapred@SOLARCH.LAB.EMC.COM)s/.*/mapred/
RULE:[2:$1@$0](nn@SOLARCH.LAB.EMC.COM)s/.*/hdfs/
RULE:[2:$1@$0](oozie@SOLARCH.LAB.EMC.COM)s/.*/oozie/
RULE:[2:$1@$0](yarn@SOLARCH.LAB.EMC.COM)s/.*/yarn/
DEFAULT </value>
</property>
</configuration>
This concludes the configuration work needed for HTTPFS to work in a secure Kerberos Hadoop cluster.
Follow the instructions in the next sections to start and test HTTPFS with Isilon.
RUNNING AND STOPPING HTTPFS
Executing httpfs is simple.
To start:
cd /usr/hdp/current/hadoop-httpfs/sbin
./httpfs.sh start
To stop:
./httpfs.sh stop
CONFIGURING HTTPFS AUTO-START
As the root user, create the following hadoop-httpfs script in /etc/init.d:
#!/bin/bash
hdp-select set hadoop-httpfs 2.x.x.x-x
# See how we were called.
case "$1" in
start)
/usr/hdp/current/hadoop-httpfs/sbin/httpfs.sh start
;;
stop)
/usr/hdp/current/hadoop-httpfs/sbin/httpfs.sh stop
;;
*)
echo $"Usage: $0 {start|stop}"
;;
esac
As root user:
chmod 755 /etc/init.d/hadoop-httpfs
chkconfig --add hadoop-httpfs
# Start Service
service hadoop-httpfs start
# Stop Service
service hadoop-httpfs stop
This method will run the service as the httpfs user. Ensure that the httpfs user has permissions to write to the log
directory /var/log/hadoop/httpfs. The correct permission settings are shown below:
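One hedged way to set this up (assuming the httpfs user and hadoop group described in this section, plus the HTTPFS_TEMP directory configured earlier):
mkdir -p /var/log/hadoop/httpfs /tmp/httpfs
chown -R httpfs:hadoop /var/log/hadoop/httpfs /tmp/httpfs
chmod 755 /var/log/hadoop/httpfs /tmp/httpfs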
Note: the httpfs user also needs to be created on Isilon. The httpfs user is a system account that gets created
during installation of httpfs. As with all other Hadoop server accounts, Isilon needs to have all service accounts
defined as a LOCAL PROVIDER in the appropriate HDFS Access Zone (e.g. hdp24) as shown below.
Create the httpfs user in the LOCAL HDFS Access Zone for your cluster in Isilon OneFS. Assign the httpfs user to
the hadoop primary group. Leave the httpfs account Disabled as shown above and below. The UID on Isilon does
not need to match the UID on the httpfs server.
TESTING HTTPFS
As seen in the introduction section of this document, the curl command is an excellent tool for testing WebHDFS;
the same is true for testing HTTPFS. The default port for httpfs is TCP PORT 14000. The tests below show how
HTTPFS and Isilon OneFS can be used together in a Hadoop cluster. The requests made on port 14000 on the
HTTPFS gateway are passed to Isilon. The HTTPFS gateway is configured for Kerberos as is the Isilon HDFS Access
Zone. The Kerberos configuration is optional, but recommended for production Hadoop deployments to improve
cluster security.
The testing below is with Kerberos enabled. So make sure you have obtained and cached an appropriate Kerberos
ticket-granting ticket before running the commands. Use klist to verify you have a ticket cached as shown below:
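For example, using the MIT KDC realm from the sample configuration earlier in this paper (substitute your own principal and realm):
kinit hduser1@SOLARCH.LAB.EMC.COM
klist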
GETTING A USER’S HOME DIRECTORY EXAMPLE
The screen shot above shows curl being used to connect to the HTTPFS gateway on port 14000; the
GETHOMEDIRECTORY operation is used to retrieve the home directory information for user hduser1.
The --negotiate option tells curl to use GSS-Negotiate (SPNEGO) authentication, which is primarily meant to
support Kerberos 5 authentication but may also be used along with another authentication method. The -w
option defines what curl displays on stdout after a completed and successful operation.
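A hedged reconstruction of the request (the HTTPFS host name is a placeholder; --negotiate -u : uses the cached Kerberos ticket, and the -w format string is only illustrative):
curl --negotiate -u : -w "\nHTTP status: %{http_code}\n" "http://<httpfs-host>:14000/webhdfs/v1/?op=GETHOMEDIRECTORY"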
LIST DIRECTORY EXAMPLE
The screen shot above shows curl being used to connect to the HTTPFS gateway on port 14000; the LISTSTATUS
operation is used as user hduser1 to do a directory listing on /tmp/hduser1.
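A hedged sketch of the same request (host name is a placeholder):
curl --negotiate -u : "http://<httpfs-host>:14000/webhdfs/v1/tmp/hduser1?op=LISTSTATUS"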
CREATE DIRECTORY EXAMPLE
The screen shot above shows curl being used to connect to the HTTPFS gateway on port 14000; the MKDIRS
operation is used as user hduser1 to create the directory /tmp/hduser1/test. The Boolean result of true means
the command executed successfully.
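A hedged sketch of that MKDIRS request (host name is a placeholder):
curl --negotiate -u : -X PUT "http://<httpfs-host>:14000/webhdfs/v1/tmp/hduser1/test?op=MKDIRS"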
We can verify the creation of the directory with the hdfs command as shown below:
This concludes the HTTPFS installation, configuration, and testing section of this document. The next section
covers how to integrate Knox with HTTPFS and Isilon.
KNOX
Knox enables the integration of enterprise identity management solutions and provides numerous perimeter
security features for REST/HTTP access to Hadoop services. Knox currently supports the YARN, WebHCat, Oozie,
HBase, Hive, and WebHDFS services. The focus of this paper is on the
WebHDFS service only. Just like HTTPFS, Knox can be installed on Kerberized and non-Kerberized
Hadoop clusters.
Knox by default uses WebHDFS to perform any HDFS operation, but it can also leverage HTTPFS for the same HDFS
operations. Knox with HTTPFS provides a defense in depth strategy around REST/HTTP access to Hadoop and Isilon
OneFS.
This section covers the installation and configuration of Knox and LDAP services to work with HTTPFS in a
Kerberized cluster to provide secure REST/HTTP communications to Hadoop and Isilon OneFS.
INSTALLING KNOX
Knox is included with Hortonworks Data Platform by default. If you unselected the Knox service during installation
of HDP, just click the Actions button in Ambari and select the Knox service as shown below and click install.
CONFIGURING KNOX USING AMBARI
Knox can be managed through Ambari. Since HTTPFS runs on port 14000, a topology change to Knox for the
WebHDFS role is needed. Change the topology within the Advanced topology section under Knox in Ambari; an
example topology configuration for the WebHDFS role is shown below:
The WebHDFS role is listed as a service in the topology configuration:
<service>
<role>WEBHDFS</role>
<url>http://<HTTPFS_HOST>:14000/webhdfs</url>
</service>
The HTTPFS_HOST should be replaced with the fully qualified domain name of the HTTPFS server. Port 14000 is the
default port for HTTPFS. If you changed the HTTPFS port assignment, make sure to reflect the port change in the
Knox topology configuration as well. Everything else in the topology configuration can be left alone unless you
made other port changes to other services.
In the Ambari web interface, check that knox is configured as a proxy user in core-site.xml in the HDFS > Configs >
Advanced > Custom core site section and that the fully qualified domain name of the Knox host is set.
Note: If the properties that are referenced below do not already exist, do the following steps:
1. Click the Add Property link in the Custom core site area to open the Add Property window.
2. Add each value in the <name> part in the Key field.
3. Add each value in the <value> part in the Value field.
4. Click Add. Then click Save.
<property>
<name>hadoop.proxyuser.knox.hosts</name>
<value>n105.solarch.lab.emc.com</value>
</property>
<property>
<name>hadoop.proxyuser.knox.groups</name>
<value>users</value>
</property>
Make sure to restart HDFS and related components after making the above changes to core-site.xml.
CONFIGURING KNOX FOR LDAP
Knox can easily integrate with LDAP - just add an LDAP provider and associated parameters to the topology
configuration and you are done. An example LDAP provider (within the topology file) is shown below:
<provider>
<role>authentication</role>
<name>ShiroProvider</name>
<enabled>true</enabled>
<param>
<name>sessionTimeout</name>
<value>30</value>
</param>
<param>
<name>main.ldapRealm</name>
<value>org.apache.hadoop.gateway.shirorealm.KnoxLdapRealm</value>
</param>
<param>
<name>main.ldapRealm.userDnTemplate</name>
<value>uid={0},ou=people,dc=hadoop,dc=apache,dc=org</value>
</param>
<param>
<name>main.ldapRealm.contextFactory.url</name>
<value>ldap://localhost:33389</value>
</param>
<param>
<name>main.ldapRealm.contextFactory.authenticationMechanism</name>
<value>simple</value>
</param>
<param>
<name>urls./**</name>
<value>authcBasic</value>
</param>
</provider>
The LDAP provider directs Knox to use a directory service for authentication. In the example above, a local LDAP
Provider (port 33389) is being used for basic authentication for all URLs. Make sure you use a supported LDAP
service that is compatible with Hortonworks and Isilon, and modify the Knox topology configuration to match your
deployed LDAP configuration if LDAP will be used with Knox.
Supported LDAP servers:
OpenLDAP
Active Directory w/ RFC2307 schema extension
Apple OpenDirectory (OD)
Centrify
Oracle Directory Server
ApacheDS
Red Hat Directory Server (RHDS)
Radiantlogic VDS
Novell Directory Server (NDS)
CONFIGURING KNOX FOR KERBEROS
If the Hadoop cluster is secured with Kerberos, you need to make sure Knox is configured for Kerberos as well to
avoid authentication errors with the HTTPFS gateway and the backend Isilon cluster. The Kerberos configuration for
Knox is done under Advanced gateway-site in Ambari. An example configuration is shown below:
The Advanced gateway-site configuration allows you to specify the Knox gateway port (e.g. 8444), the location of
the krb5.conf (Kerberos configuration file), and set the gateway to use Kerberos (set to true).
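As a hedged sketch, the gateway-site settings involved typically look like the following (property names are common Knox settings and should be verified against your Ambari version):
gateway.port: 8444
java.security.krb5.conf: /etc/krb5.conf
gateway.hadoop.kerberos.secured: true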
The Advanced knox-env section in Ambari allows you to set the Knox user and group accounts, Knox keytab path, and Knox
Principal Name. An example configuration is shown below:
Note: the knox user also needs to be created on Isilon. The knox user is a system account that gets created during
installation of knox. As with all other Hadoop server accounts, Isilon needs to have all service accounts defined as
a LOCAL PROVIDER in the appropriate HDFS Access Zone (e.g. hdp24) as shown below.
Create the knox user in the LOCAL HDFS Access Zone for your cluster in Isilon OneFS. Assign the knox user to the
hadoop primary group. Leave the knox account Disabled as shown above and below. The UID on Isilon does not
need to match the UID on the knox server.
TESTING KNOX AND ISILON IMPERSONATION DEFENSE
Now that Knox and HTTPFS have been installed and configured, we can begin end-to-end testing with Isilon in a
secure Kerberos Hadoop cluster deployment using either curl or a web browser.
GETTING A USER’S HOME DIRECTORY EXAMPLE
The screen shot above shows curl being used to connect to the Knox gateway on port 8444 as LDAP user
ldapuser1; the GETHOMEDIRECTORY operation is used to retrieve the home directory information for the LDAP user. The
network connection to the Knox gateway is secured with TLS.
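A hedged reconstruction of the kind of command used in this test (the knox-host placeholder and the default topology name are assumptions; -k is used because of the self-signed certificate discussed below, and curl prompts for the LDAP password):
curl -k -u ldapuser1 "https://<knox-host>:8444/gateway/default/webhdfs/v1/?op=GETHOMEDIRECTORY"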
Let’s see what happens when we use the same REST HTTP operation over a web browser that connects to the
Knox gateway:
First, the Knox gateway prompts for user authentication. After entering the correct LDAP credentials, we can
see the result of the REST HTTP GETHOMEDIRECTORY operation in the web browser as shown below:
Note that the network connection to the Knox gateway is secured with TLS as shown below:
I used self-signed certificates for this lab deployment, so there is a certificate error shown, but the network
connection is securely encrypted with TLS and a strong AES cipher.
OPENING A FILE EXAMPLE
Unlike the GETHOMEDIRECTORY operation shown in the previous test example, the OPEN operation actually
accesses data; we want to employ more security checks when data is being accessed in cases like this.
The screen shot above shows curl being used to connect to the Knox gateway on port 8444 as LDAP user
ldapuser1; the OPEN operation then tries to open the contents of the project.txt file in /tmp/hduser1, but a Server
Error is encountered. Although Isilon is aware of ldapuser1, Isilon provides an added layer of security to check for
impersonation attacks.
In this case, the HTTPFS gateway (which runs as the httpfs user) is acting as a proxy for ldapuser1's REST HTTP
request between Knox and Isilon. When Isilon receives the OPEN request from httpfs on behalf of ldapuser1,
Isilon checks its Proxy User settings to see if httpfs is authorized to impersonate ldapuser1 or the group
ldapuser1 is in, i.e., the hadoop group.
Assuming it is within policy for httpfs to impersonate anyone in the hadoop group, we can update the Proxy User
settings on Isilon so httpfs is authorized to process requests from either the ldapuser1 user specifically or anyone
in the hadoop group. The example below depicts a proxy configuration for the hadoop group:
With the proxy user setting in place, we can successfully run the previous test example to open a file:
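A hedged sketch of the retried OPEN request (same placeholder host and assumed default topology as in the earlier Knox examples):
curl -k -u ldapuser1 "https://<knox-host>:8444/gateway/default/webhdfs/v1/tmp/hduser1/project.txt?op=OPEN"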
As shown above, with the correct Proxy User policy in place on Isilon, the OPEN operation is now
allowed. Note: If the /tmp/hduser1 directory on Isilon did not have global read permissions set, this
operation would fail as shown below:
Changing the permissions on the /tmp/hduser1 directory on Isilon caused a permission denied error for
the same previous test operation. This is a testament to the embedded Isilon OneFS security features
and a benefit of using a centralized HDFS storage solution like Isilon.
CREATE DIRECTORY EXAMPLE
The screen shot above shows curl being used to connect to the Knox gateway on port 8444; the
MKDIRS operation is used as user ldapuser1 to create the directory /tmp/ldaptest. The Boolean result
of true means the command executed successfully.
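A hedged sketch of that MKDIRS request through Knox:
curl -k -u ldapuser1 -X PUT "https://<knox-host>:8444/gateway/default/webhdfs/v1/tmp/ldaptest?op=MKDIRS"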
We can verify the creation of the directory with the hdfs command as shown below:
This concludes the Knox installation, configuration, and testing section of this document.
Please see the Appendix for additional Knox/HTTPFS/Isilon test examples.
FINAL COMMENTS
This solution has been tested and certified by both DELL EMC and Hortonworks with success. One thing that was
noticed during testing of the integrated solution is that httpfs wants the header "content-type: octet" stipulated
on data upload requests. The Content-Type header is supported by both WebHDFS and HTTPFS, but HTTPFS will throw a 400
Bad Request error if the header is missing.
For example, let's say you create a test data_file on the cluster with the CREATE operation; you will need to use the
-H flag with curl to specify the Content-Type accordingly, see the example below:
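A hedged sketch of such a CREATE upload; the full MIME type application/octet-stream is my assumption for the "content-type: octet" shorthand above, and the exact query parameters may vary in your environment:
curl -k -u ldapuser1 -X PUT -T data_file -H "Content-Type: application/octet-stream" "https://<knox-host>:8444/gateway/default/webhdfs/v1/tmp/ldaptest/data_file?op=CREATE"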
With the Content-Type specified, the data upload successfully completes with no errors. This is an HTTPFS
requirement and has nothing to do with either Knox or Isilon OneFS. We can use the hdfs command to see the
content of the created data_file as shown below:
Reading the file via curl does not require anything special as shown below:
The port for Knox was changed to 8444 instead of the default 8443. Be aware when setting up HTTPS for the
Ambari web interface, the default port is also 8443. To avoid port conflicts, I recommend you carefully assign a
unique port to your Knox gateway; port 8444 is a safe bet.
APPENDIX
ADDITIONAL TESTING RESULTS
Below are additional testing examples for reference.
RENAMING A FILE EXAMPLE
The above curl command connects to the Knox gateway on port 8444 as LDAP user ldapuser1 to execute the
RENAME operation, renaming data_file to data_file_new; the Boolean result of true means the command
executed successfully.
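A hedged sketch of that RENAME request (placeholder host and assumed default topology, as in the earlier Knox examples):
curl -k -u ldapuser1 -X PUT "https://<knox-host>:8444/gateway/default/webhdfs/v1/tmp/ldaptest/data_file?op=RENAME&destination=/tmp/ldaptest/data_file_new"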
We can verify further by listing the contents of the /tmp/ldaptest directory:
SETTING FILE REPLICATION EXAMPLE
The above curl command connects to the Knox gateway on port 8444 as LDAP user ldapuser1 to execute
the SETREPLICATION operation, setting replication to 1 for data_file_new; the Boolean result of true means
the command executed successfully.
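A hedged sketch of that SETREPLICATION request:
curl -k -u ldapuser1 -X PUT "https://<knox-host>:8444/gateway/default/webhdfs/v1/tmp/ldaptest/data_file_new?op=SETREPLICATION&replication=1"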
Note: Isilon will always respond with true for these kinds of requests, but the reality is that the Isilon OneFS file
system is much more efficient than HDFS; Isilon uses erasure coding instead of replication to maintain
high availability.
SETTING FILE PERMISSIONS EXAMPLE
The above curl command connects to the Knox gateway on port 8444 as LDAP user ldapuser1 to execute
the SETPERMISSION operation, setting permissions to 777 on data_file_new; the HTTP/1.1 200 OK result means the
command executed successfully. The hdfs command shows that the permissions for this data file were
changed on Isilon accordingly.
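A hedged sketch of that SETPERMISSION request and its verification (-i prints the HTTP response headers):
curl -i -k -u ldapuser1 -X PUT "https://<knox-host>:8444/gateway/default/webhdfs/v1/tmp/ldaptest/data_file_new?op=SETPERMISSION&permission=777"
hdfs dfs -ls /tmp/ldaptest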
APPENDING DATA TO A FILE EXAMPLE
The above curl command connects to the Knox gateway on port 8444 as LDAP user ldapuser1 to execute
the APPEND operation, appending ApendInfo to data_file_new; the HTTP/1.1 200 OK result means the
command executed successfully. The hdfs command shows the data was appended successfully on Isilon.
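A hedged sketch of that APPEND request (same Content-Type caveat as noted in the Final Comments section; exact query parameters may vary):
curl -i -k -u ldapuser1 -X POST -H "Content-Type: application/octet-stream" -d "ApendInfo" "https://<knox-host>:8444/gateway/default/webhdfs/v1/tmp/ldaptest/data_file_new?op=APPEND"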
RECURSIVE DELETE EXAMPLE
The above curl command connects to the Knox gateway on port 8444 as LDAP user ldapuser1 to execute
the DELETE operation, recursively deleting /tmp/ldaptest; the HTTP/1.1 200 OK result means the
command executed successfully. The hdfs command shows the directory and its contents were successfully
removed from Isilon.
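A hedged sketch of that recursive DELETE request and its verification:
curl -i -k -u ldapuser1 -X DELETE "https://<knox-host>:8444/gateway/default/webhdfs/v1/tmp/ldaptest?op=DELETE&recursive=true"
hdfs dfs -ls /tmp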
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 

KNOX-HTTPFS-ONEFS-WP

  • 1. IMPLEMENTING HTTPFS & KNOX WITH ISILON ONEFS TO ENHANCE HDFS ACCESS SECURITY Boni Bruno, CISSP, CISM, CGEIT Principal Solutions Architect DELL EMC ABSTRACT This paper describes implementing HTTPFS and Knox together with Isilon OneFS to enhance HDFS access security. This integrated solution has been tested and certified with Hortonworks on HDP v2.4 and Isilon OneFS v 8.0.0.3.
  • 2. CONTENTS Introduction...................................................................................................................................................................3 WebHDFS REST API....................................................................................................................................................3 WebHDFS Port Assignment in Isilon OneFS...............................................................................................................5 WebHDFS Examples with ISILON...............................................................................................................................5 WebHDFS Security Concerns .....................................................................................................................................8 HTTPFS...........................................................................................................................................................................9 Installing HTTPFS........................................................................................................................................................9 Configuring HTTPFS .................................................................................................................................................10 Configuring HTTPFS for Kerberos ............................................................................................................................13 Running and Stopping HTTPFS.................................................................................................................................19 Configuring HTTPFS Auto-Start................................................................................................................................19 Testing HTTPFS ........................................................................................................................................................22 Knox.............................................................................................................................................................................24 Installing Knox..........................................................................................................................................................24 Configuring Knox using Ambari ...............................................................................................................................24 Configuring Knox for LDAP.......................................................................................................................................26 Configuring Knox for Kerberos.................................................................................................................................28 Testing Knox and Isilon Impersonation Defense .....................................................................................................30 Final Comments.......................................................................................................................................................35 Appendix......................................................................................................................................................................37 Additional Testing Results .......................................................................................................................................38
INTRODUCTION

Hadoop provides a native Java API to support file system operations such as creating, renaming, or deleting files and directories; opening, reading, or writing files; setting permissions; and so on. This is great for applications running within the Hadoop cluster, but there are use cases where an external application needs to perform such file system operations on files stored in HDFS as well. Hortonworks developed the WebHDFS REST API to support these requirements based on standard REST functionality. The WebHDFS REST API supports a complete File System / File Context interface for HDFS.

WEBHDFS REST API

WebHDFS is based on HTTP operations such as GET, PUT, POST, and DELETE. Operations such as OPEN, GETFILESTATUS, and LISTSTATUS use HTTP GET, while operations such as CREATE, MKDIRS, RENAME, and SETPERMISSION rely on HTTP PUT. APPEND operations are based on HTTP POST, while DELETE uses HTTP DELETE. Authentication can be based on the user name passed as a query parameter (as part of the HTTP query string) or, if security is enabled, through Kerberos.

WebHDFS is enabled in a Hadoop cluster by defining the following property in hdfs-site.xml:

<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>

If using Ambari, enable WebHDFS under the General Settings of HDFS as shown below:
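All WebHDFS requests share a common URL pattern, which the examples later in this paper follow. The sketch below is illustrative only; the host, port, path, operation, and user name are placeholders rather than values from this deployment:

curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=<OPERATION>&user.name=<USER>"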
When using Isilon as a centralized HDFS storage repository for a given Hadoop cluster, all namenode and datanode functions must be configured to run on Isilon for the entire Hadoop cluster. By design, WebHDFS needs access to all nodes in the cluster.

Before the WebHDFS interface on Isilon can be used by the Hadoop cluster, you must enable WebHDFS in the Protocol Settings for HDFS on the designated Access Zone in Isilon - this is easily done in the OneFS GUI. In the example below, hdp24 is the HDFS Access Zone for the Hadoop cluster. Note the check mark next to ENABLE WebHDFS access.

It is not sufficient to just enable WebHDFS in Ambari. Isilon must also be configured with WebHDFS enabled so end-to-end WebHDFS communication can work in the Hadoop cluster. If multiple Access Zones are defined on Isilon, make sure to enable WebHDFS as needed on each access zone.
WEBHDFS PORT ASSIGNMENT IN ISILON ONEFS

All references to the Hadoop host hdp24 in this document refer to a defined SmartConnect HDFS Access Zone on Isilon. TCP port 8082 is the port OneFS uses for WebHDFS. It is important that the hdfs-site.xml file in the Hadoop cluster reflects the correct port designation for HTTP access to Isilon. See the Ambari screenshot below for reference.

WEBHDFS EXAMPLES WITH ISILON

Assuming the Hadoop cluster is up and running with Isilon and WebHDFS has been properly enabled for the Hadoop cluster, we are ready to test WebHDFS. curl is a great command line tool for transferring data using various protocols, including HTTP/HTTPS. The examples below use curl to invoke the WebHDFS REST API available in Isilon OneFS to conduct various file system operations. Again, all references to hdp24 used in the curl commands below refer to the SmartConnect HDFS Access Zone on Isilon and not some edge node in the cluster.

GETTING FILE STATUS EXAMPLE

The screen shot above shows curl being used to connect to Isilon's WebHDFS interface on port 8082; the GETFILESTATUS operation is used as user hduser1 to retrieve information on the projects.txt file.

Note: The projects.txt file is a test file I created. It is not part of the Hortonworks software.
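For reference, a command along these lines produces the result described; the host hdp24, port 8082, user hduser1, and the path to projects.txt follow the example in the text, so treat this as an illustrative sketch rather than the exact command from the screenshot:

curl -i "http://hdp24:8082/webhdfs/v1/user/hduser1/projects.txt?op=GETFILESTATUS&user.name=hduser1"

The response is a JSON FileStatus object describing the file (owner, group, permission, length, and so on).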
A web browser may also be used to get the projects.txt file status from Isilon WebHDFS as shown below:

This is similar to executing hdfs dfs -ls /user/hduser1/projects.txt from a Hadoop client node n107 as shown below:

This quick example shows the flexibility of using WebHDFS. It provides a simple way to execute Hadoop file system operations from an external client that does not necessarily run on the Hadoop cluster itself. Let's look at another example.

CREATING A DIRECTORY EXAMPLE

Here the MKDIRS operation on a different client node n105 is used with PUT to create the directory /tmp/hduser as user hduser1 on Isilon. The Boolean result of true tells us the operation was successful. We can also check the result by using hdfs to see the directory on Isilon as shown below:
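A representative form of the MKDIRS request is sketched below; the host, port, directory path, and user name follow the example above and are assumptions, not the literal command from the screenshot:

curl -i -X PUT "http://hdp24:8082/webhdfs/v1/tmp/hduser?op=MKDIRS&user.name=hduser1"

A successful call returns {"boolean":true}.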
OPEN A FILE EXAMPLE

In the example above, the OPEN operation is used with curl to display the text string "Knox HTTPFS Isilon Project" within the /tmp/hduser1/project.txt file. As shown before, a web browser can be used to access the file as well. Here a browser is configured to automatically open text files in Notepad, so accessing the WebHDFS API on Isilon as shown below will open the contents of /tmp/hduser1/project.txt in Notepad directly.

To validate the contents from within the cluster, we can use hdfs as shown below:
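The OPEN call could be issued roughly as follows; again, the host, port, path, and user name come from the example and are placeholders for this sketch. The -L flag lets curl follow any redirect WebHDFS may return before streaming the file contents:

curl -i -L "http://hdp24:8082/webhdfs/v1/tmp/hduser1/project.txt?op=OPEN&user.name=hduser1"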
I'm only scratching the surface with the examples above; there are many more operations you can execute with WebHDFS. You can easily use WebHDFS to append data to files, rename files or directories, create new files, etc. See the Appendix for many more examples. It should be apparent that WebHDFS provides a simple, standard way to execute Hadoop file system operations with external clients that do not necessarily run within the Hadoop cluster itself.

WEBHDFS SECURITY CONCERNS

Something worth pointing out with the above examples, and with WebHDFS in general, is that clients are directly accessing the namenodes and datanodes via predefined ports. This can be seen as a security issue for many organizations wanting to enable external WebHDFS access to their Hadoop infrastructure.

Many organizations do not want their Hadoop infrastructure accessed directly from external clients. As seen thus far, external clients can use WebHDFS to directly access the actual ports the namenodes and datanodes are listening on in the Hadoop cluster and leverage the WebHDFS REST API to conduct various file system operations. Although firewalls can filter access from external clients, the ports are still being accessed directly; as a result, firewalls do not prohibit the execution of the various WebHDFS operations. The solution to this issue, in many cases, is to enable Kerberos in the Hadoop cluster and deploy secure REST API gateways that enforce strong authentication and access control for WebHDFS.

The remainder of this document focuses on using HTTPFS and Knox in conjunction with Isilon OneFS to provide a secure WebHDFS deployment with Hadoop. A diagram of the secure architecture is shown below for reference.
HTTPFS

The introduction section of this document provides an overview of WebHDFS and demonstrates how the WebHDFS REST API supports a complete File System / File Context interface for HDFS. WebHDFS is efficient, as it streams data from each datanode, and it can support external clients like curl or web browsers to extend data access beyond the Hadoop cluster. Since WebHDFS needs access to all nodes in the cluster by design, WebHDFS inherently establishes a wider footprint for HDFS access in a Hadoop cluster, since clients can access HDFS over HTTP/HTTPS.

To help minimize the size of the footprint exposed to clients, a gateway solution is needed that provides a similar File System / File Context interface for HDFS, and this is where HTTPFS comes into play. HTTPFS is a service that provides a REST HTTP gateway supporting all HDFS file system operations (read and write). HTTPFS can be used to provide a gateway interface, i.e. a choke point, to Isilon and limit broad HDFS access from external clients to the Hadoop cluster. HTTPFS can also be integrated with Knox to improve service level authorization, LDAP and AD integration, and overall perimeter security. See the Knox section of this document for more details. The remainder of this section covers the installation and configuration of HTTPFS with Isilon.

INSTALLING HTTPFS

HTTPFS can be installed on the Ambari server or on a worker node; for production deployments, deploying on a dedicated worker node is a best practice. To install HTTPFS:

yum install hadoop-httpfs

(Note: existing HWX repos are hadoop-httpfs aware)

Note: The HTTPFS service is a Tomcat application that relies on having the Hadoop libraries and configuration available, so make sure to install HTTPFS on an edge node that is being managed by Ambari.

After you install HTTPFS, the directories below will be created on the HTTPFS server:

/usr/hdp/2.x.x.x-x/hadoop-httpfs
/etc/hadoop-httpfs/conf
/etc/hadoop-httpfs/tomcat-deployment
CONFIGURING HTTPFS

If you change directories to /usr/hdp on your HTTPFS server and list the files there, you will see a directory with the version number of your existing HDP release. Make note of it so you can set the current version for httpfs. Set the version for current with the following command:

hdp-select set hadoop-httpfs 2.x.x.x-x     (replace the x with your HDP release)

The installation of httpfs above deploys scripts which have some hardcoded values that need to be changed. Adjust the /usr/hdp/current/hadoop-httpfs/sbin/httpfs.sh script:

#!/bin/bash
# Autodetect JAVA_HOME if not defined
if [ -e /usr/libexec/bigtop-detect-javahome ]; then
  . /usr/libexec/bigtop-detect-javahome
elif [ -e /usr/lib/bigtop-utils/bigtop-detect-javahome ]; then
  . /usr/lib/bigtop-utils/bigtop-detect-javahome
fi

### Added to assist with locating the right configuration directory
export HTTPFS_CONFIG=/etc/hadoop-httpfs/conf

### Remove the original HARD CODED version reference.

Next, you need to create the following symbolic links:

cd /usr/hdp/current/hadoop-httpfs
ln -s /etc/hadoop-httpfs/tomcat-deployment/conf conf
ln -s ../hadoop/libexec libexec
Like all the other Hadoop components, httpfs uses *-env.sh files to control the startup environment. In the httpfs.sh script above we set the location of the configuration directory; this configuration directory is used to find and load the httpfs-env.sh file. The httpfs-env.sh file needs to be modified as shown below:

# Add exports to control and set the Catalina directories for starting and finding the httpfs application
export CATALINA_BASE=/usr/hdp/current/hadoop-httpfs
export HTTPFS_CATALINA_HOME=/etc/hadoop-httpfs/tomcat-deployment

# Set a log directory that matches your standards
export HTTPFS_LOG=/var/log/hadoop/httpfs

# Set a tmp directory for httpfs to store interim files
export HTTPFS_TEMP=/tmp/httpfs

The default port for httpfs is TCP 14000. If you need to change the port for httpfs, add the following export to the above httpfs-env.sh file on the HTTPFS server:

export HTTPFS_HTTP_PORT=<new_port>

In the Ambari web interface, add httpfs as a proxy user in core-site.xml in the HDFS > Configs > Advanced > Custom core site section.

Note: If the properties that are referenced below do not already exist, do the following steps:
1. Click the Add Property link in the Custom core site area to open the Add Property window.
2. Add each value in the <name> part in the Key field.
3. Add each value in the <value> part in the Value field.
4. Click Add. Then click Save.
<property>
  <name>hadoop.proxyuser.httpfs.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.httpfs.hosts</name>
  <value>*</value>
</property>

Make sure to restart HDFS and related components after making the above changes to core-site.xml. At this point HTTPFS is configured to work with a non-Kerberos Hadoop cluster. If your cluster is not secured with Kerberos, you can skip the following section, CONFIGURING HTTPFS FOR KERBEROS, and proceed to RUNNING AND STOPPING HTTPFS and TESTING HTTPFS.
CONFIGURING HTTPFS FOR KERBEROS

Ambari does not automate the configuration of HTTPFS to support Kerberos. If your Hadoop cluster was secured with Kerberos using Ambari, you will need to create the required keytabs and modify httpfs-site.xml before HTTPFS will work in a secure Kerberos Hadoop cluster. The following assumptions are made for this section on configuring HTTPFS for Kerberos:

1. HTTPFS has been installed, configured, and verified to be working prior to enabling Kerberos.
2. Kerberos was enabled using Ambari and an MIT KDC, and Isilon is configured and verified for Kerberos.

Both the httpfs and HTTP service principals must be created for HTTPFS if they do not already exist. Create the httpfs and HTTP (see note below) principals:

kadmin: addprinc -randkey httpfs/fully.qualified.domain.name@EXAMPLE-REALM.COM
kadmin: addprinc -randkey HTTP/fully.qualified.domain.name@EXAMPLE-REALM.COM

Note: The HTTP principal and keytab may already exist, as these are typically needed for other Hadoop services in a secure Kerberos Hadoop cluster deployment. HTTP must be in CAPITAL LETTERS.

Create the keytab files for both the httpfs and HTTP (see note above) principals:

kadmin -q "ktadd -k /etc/security/keytabs/httpfs.service.keytab httpfs/fully.qualified.domain.name@EXAMPLE-REALM.COM"
kadmin -q "ktadd -k /etc/security/keytabs/spnego.service.keytab HTTP/fully.qualified.domain.name@EXAMPLE-REALM.COM"

Note: The spnego keytab above only needs to be created if it does not already exist on the node running HTTPFS.

Merge the two keytab files into a single keytab file:
ktutil: rkt /etc/security/keytabs/httpfs.service.keytab
ktutil: rkt /etc/security/keytabs/spnego.service.keytab
ktutil: wkt /etc/security/keytabs/httpfs-http.service.keytab
ktutil: quit

The above will create a file named httpfs-http.service.keytab in /etc/security/keytabs.

Note: This keytab should be copied to the HTTPFS node.

Test that the merged keytab file works:

klist -kt /etc/security/keytabs/httpfs-http.service.keytab

The above command should list both the httpfs and HTTP principals in httpfs-http.service.keytab. Below is an example output from a test cluster:

Change the ownership and permissions of the /etc/security/keytabs/httpfs-http.service.keytab file:

chown httpfs:hadoop /etc/security/keytabs/httpfs-http.service.keytab
chmod 400 /etc/security/keytabs/httpfs-http.service.keytab

Edit the HTTPFS server httpfs-site.xml configuration file in the HTTPFS configuration directory by setting the following properties:
httpfs.authentication.type: kerberos
httpfs.hadoop.authentication.type: kerberos
httpfs.authentication.kerberos.principal: HTTP/<FQDN of HTTPFS host>@<YOUR-REALM.COM>
httpfs.authentication.kerberos.keytab: /etc/hadoop-httpfs/conf/httpfs-http.service.keytab
httpfs.hadoop.authentication.kerberos.principal: httpfs/<FQDN of HTTPFS host>@<YOUR-REALM.COM>
httpfs.hadoop.authentication.kerberos.keytab: /etc/security/keytabs/httpfs-http.service.keytab
httpfs.authentication.kerberos.name.rules: Use the value configured for 'hadoop.security.auth_to_local' in Ambari's HDFS Configs under "Advanced Core-Site".

An example httpfs-site.xml is listed below, with the relevant Kerberos information highlighted in red:

<configuration>

  <!-- HTTPFS proxy user settings -->
  <property>
    <name>httpfs.proxyuser.knox.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>httpfs.proxyuser.knox.groups</name>
    <value>*</value>
  </property>

  <!-- HUE proxy user settings -->
  <property>
    <name>httpfs.proxyuser.hue.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>httpfs.authentication.kerberos.principal</name>
    <value>HTTP/n105.solarch.lab.emc.com@SOLARCH.LAB.EMC.COM</value>
  </property>
  <property>
    <name>httpfs.authentication.kerberos.keytab</name>
    <value>/etc/security/keytabs/httpfs-http.service.keytab</value>
  </property>
  <property>
    <name>httpfs.hadoop.authentication.kerberos.principal</name>
    <value>httpfs/n105.solarch.lab.emc.com@SOLARCH.LAB.EMC.COM</value>
  </property>
  <property>
    <name>httpfs.hadoop.authentication.kerberos.keytab</name>
    <value>/etc/security/keytabs/httpfs-http.service.keytab</value>
  </property>
  <property>
    <name>httpfs.authentication.kerberos.name.rules</name>
    <value>
      RULE:[1:$1@$0](accumulo@SOLARCH.LAB.EMC.COM)s/.*/accumulo/
      RULE:[1:$1@$0](ambari-qa@SOLARCH.LAB.EMC.COM)s/.*/ambari-qa/
      RULE:[1:$1@$0](hbase@SOLARCH.LAB.EMC.COM)s/.*/hbase/
      RULE:[1:$1@$0](hdfs@SOLARCH.LAB.EMC.COM)s/.*/hdfs/
      RULE:[1:$1@$0](spark@SOLARCH.LAB.EMC.COM)s/.*/spark/
      RULE:[1:$1@$0](tracer@SOLARCH.LAB.EMC.COM)s/.*/accumulo/
      RULE:[1:$1@$0](.*@SOLARCH.LAB.EMC.COM)s/@.*//
      RULE:[2:$1@$0](accumulo@SOLARCH.LAB.EMC.COM)s/.*/accumulo/
      RULE:[2:$1@$0](amshbase@SOLARCH.LAB.EMC.COM)s/.*/ams/
      RULE:[2:$1@$0](amszk@SOLARCH.LAB.EMC.COM)s/.*/ams/
      RULE:[2:$1@$0](dn@SOLARCH.LAB.EMC.COM)s/.*/hdfs/
      RULE:[2:$1@$0](falcon@SOLARCH.LAB.EMC.COM)s/.*/falcon/
      RULE:[2:$1@$0](hbase@SOLARCH.LAB.EMC.COM)s/.*/hbase/
      RULE:[2:$1@$0](hdfs@SOLARCH.LAB.EMC.COM)s/.*/hdfs/
      RULE:[2:$1@$0](hive@SOLARCH.LAB.EMC.COM)s/.*/hive/
      RULE:[2:$1@$0](knox@SOLARCH.LAB.EMC.COM)s/.*/knox/
      RULE:[2:$1@$0](httpfs@SOLARCH.LAB.EMC.COM)s/.*/httpfs/
      RULE:[2:$1@$0](mapred@SOLARCH.LAB.EMC.COM)s/.*/mapred/
      RULE:[2:$1@$0](nn@SOLARCH.LAB.EMC.COM)s/.*/hdfs/
      RULE:[2:$1@$0](oozie@SOLARCH.LAB.EMC.COM)s/.*/oozie/
      RULE:[2:$1@$0](yarn@SOLARCH.LAB.EMC.COM)s/.*/yarn/
      DEFAULT
    </value>
  </property>

</configuration>

This concludes the configuration work needed for HTTPFS to work in a secure Kerberos Hadoop cluster. Follow the instructions in the next sections to start and test HTTPFS with Isilon.
RUNNING AND STOPPING HTTPFS

Executing httpfs is simple. To start:

cd /usr/hdp/current/hadoop-httpfs/sbin
./httpfs.sh start

To stop:

./httpfs.sh stop

CONFIGURING HTTPFS AUTO-START

As the root user, create the following hadoop-httpfs script in /etc/init.d:

#!/bin/bash
hdp-select set hadoop-httpfs 2.x.x.x-x

# See how we were called.
case "$1" in
  start)
    /usr/hdp/current/hadoop-httpfs/sbin/httpfs.sh start
    ;;
  stop)
    /usr/hdp/current/hadoop-httpfs/sbin/httpfs.sh stop
    ;;
  restart)
    /usr/hdp/current/hadoop-httpfs/sbin/httpfs.sh stop
    /usr/hdp/current/hadoop-httpfs/sbin/httpfs.sh start
    ;;
  *)
    echo $"Usage: $0 {start|stop|restart}"
esac
As the root user:

chmod 755 /etc/init.d/hadoop-httpfs
chkconfig --add hadoop-httpfs

# Start Service
service hadoop-httpfs start

# Stop Service
service hadoop-httpfs stop

This method will run the service as the httpfs user. Ensure that the httpfs user has permission to write to the log directory /var/log/hadoop/httpfs. The correct permission settings are shown below:

Note: the httpfs user also needs to be created on Isilon. The httpfs user is a system account that gets created during the installation of httpfs. As with all other Hadoop service accounts, Isilon needs to have all service accounts defined in the LOCAL provider in the appropriate HDFS Access Zone (e.g. hdp24) as shown below.
Create the httpfs user in the LOCAL provider of the HDFS Access Zone for your cluster in Isilon OneFS. Assign the httpfs user to the hadoop group as its primary group. Leave the httpfs account Disabled as shown above and below. The UID on Isilon does not need to match the UID on the httpfs server.
TESTING HTTPFS

As seen in the introduction section of this document, the curl command is an excellent tool for testing WebHDFS; the same is true for testing HTTPFS. The default port for httpfs is TCP port 14000. The tests below show how HTTPFS and Isilon OneFS can be used together in a Hadoop cluster. The requests made on port 14000 on the HTTPFS gateway are passed to Isilon.

The HTTPFS gateway is configured for Kerberos, as is the Isilon HDFS Access Zone. The Kerberos configuration is optional, but it is recommended for production Hadoop deployments to improve cluster security. The testing below is with Kerberos enabled, so make sure you have obtained and cached an appropriate Kerberos ticket-granting ticket before running the commands. Use klist to verify you have a ticket cached as shown below:

GETTING A USER'S HOME DIRECTORY EXAMPLE

The screen shot above shows curl being used to connect to the HTTPFS gateway on port 14000; the GETHOMEDIRECTORY operation is used as user hduser1 to retrieve the home directory information. The --negotiate option enables GSS-Negotiate authentication in curl. It is primarily meant to support Kerberos 5 authentication, but it may also be used along with other authentication methods. The -w option defines what curl displays on stdout after a completed and successful operation.
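With a valid Kerberos ticket cached, a request of roughly this form exercises the HTTPFS gateway; the host name is a placeholder for your HTTPFS server, and the -w format string is just one example of what you might print after the response:

curl -i --negotiate -u : -w "\nHTTP status: %{http_code}\n" "http://<HTTPFS_HOST>:14000/webhdfs/v1/?op=GETHOMEDIRECTORY"

The empty "-u :" tells curl to take the credentials from the Kerberos ticket cache rather than prompting for a password; the call returns the home directory of the authenticated user.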
LIST DIRECTORY EXAMPLE

The screen shot above shows curl being used to connect to the HTTPFS gateway on port 14000; the LISTSTATUS operation is used as user hduser1 to do a directory listing of /tmp/hduser1.

CREATE DIRECTORY EXAMPLE

The screen shot above shows curl being used to connect to the HTTPFS gateway on port 14000; the MKDIRS operation is used as user hduser1 to create the directory /tmp/hduser1/test. The Boolean result of true means the command executed successfully. We can verify the creation of the directory with the hdfs command as shown below:

This concludes the HTTPFS installation, configuration, and testing section of this document. The next section covers how to integrate Knox with HTTPFS and Isilon.
KNOX

Knox enables the integration of enterprise identity management solutions and numerous perimeter security features for REST/HTTP access to Hadoop, providing perimeter security for Hadoop services. Knox currently supports the YARN, WebHCAT, Oozie, HBase, Hive, and WebHDFS Hadoop services. The focus of this paper is on the WebHDFS Hadoop service only.

Just like HTTPFS, Knox can be installed on Kerberized and non-Kerberized Hadoop clusters. Knox by default uses WebHDFS to perform any HDFS operation, but it can also leverage HTTPFS for the same HDFS operations. Knox with HTTPFS provides a defense-in-depth strategy around REST/HTTP access to Hadoop and Isilon OneFS. This section covers the installation and configuration of Knox and LDAP services to work with HTTPFS in a Kerberized cluster to provide secure REST/HTTP communications to Hadoop and Isilon OneFS.

INSTALLING KNOX

Knox is included with the Hortonworks Data Platform by default. If you unselected the Knox service during the installation of HDP, just click the Actions button in Ambari, select the Knox service as shown below, and click Install.

CONFIGURING KNOX USING AMBARI

Knox can be managed through Ambari. Since HTTPFS runs on port 14000, a topology change to Knox for the WebHDFS role is needed. Change the topology within the Advanced topology section of the Knox configuration in Ambari; an example topology configuration for the WebHDFS role is shown below:
The WebHDFS role is listed as a service in the topology configuration:

<service>
  <role>WEBHDFS</role>
  <url>http://<HTTPFS_HOST>:14000/webhdfs</url>
</service>

HTTPFS_HOST should be replaced with the fully qualified domain name of the HTTPFS server. Port 14000 is the default port for HTTPFS; if you changed the HTTPFS port assignment, make sure to reflect the port change in the Knox topology configuration as well. Everything else in the topology configuration can be left alone unless you made other port changes to other services.

In the Ambari web interface, check that knox is configured as a proxy user in core-site.xml in the HDFS > Configs > Advanced > Custom core site section and that the fully qualified domain name of the Knox host is set.

Note: If the properties that are referenced below do not already exist, do the following steps:
1. Click the Add Property link in the Custom core site area to open the Add Property window.
2. Add each value in the <name> part in the Key field.
3. Add each value in the <value> part in the Value field.
4. Click Add. Then click Save.
<property>
  <name>hadoop.proxyuser.knox.hosts</name>
  <value>n105.solarch.lab.emc.com</value>
</property>
<property>
  <name>hadoop.proxyuser.knox.groups</name>
  <value>users</value>
</property>

Make sure to restart HDFS and related components after making the above changes to core-site.xml.

CONFIGURING KNOX FOR LDAP

Knox can easily integrate with LDAP - just add an LDAP provider and the associated parameters to the topology configuration and you are done. An example LDAP provider (within the topology file) is shown below:

<provider>
  <role>authentication</role>
  <name>ShiroProvider</name>
  <enabled>true</enabled>
  <param>
    <name>sessionTimeout</name>
    <value>30</value>
  </param>
  <param>
    <name>main.ldapRealm</name>
    <value>org.apache.hadoop.gateway.shirorealm.KnoxLdapRealm</value>
  </param>
  <param>
    <name>main.ldapRealm.userDnTemplate</name>
    <value>uid={0},ou=people,dc=hadoop,dc=apache,dc=org</value>
  </param>
  <param>
    <name>main.ldapRealm.contextFactory.url</name>
    <value>ldap://localhost:33389</value>
  </param>
  <param>
    <name>main.ldapRealm.contextFactory.authenticationMechanism</name>
    <value>simple</value>
  </param>
  <param>
    <name>urls./**</name>
    <value>authcBasic</value>
  </param>
</provider>

The LDAP provider directs Knox to use a directory service for authentication. In the example above, a local LDAP provider (port 33389) is being used for basic authentication for all URLs. Make sure you use a supported LDAP service compatible with Hortonworks and Isilon, and modify the Knox topology configuration to match your deployed LDAP configuration if LDAP will be used with Knox.

Supported LDAP servers:

OpenLDAP
Active Directory w/ RFC2307 schema extension
Apple OpenDirectory (OD)
Centrify
Oracle Directory Server
ApacheDS
Red Hat Directory Server (RHDS)
Radiantlogic VDS
Novell Directory Server (NDS)
CONFIGURING KNOX FOR KERBEROS

If the Hadoop cluster is secured with Kerberos, you need to make sure Knox is configured for Kerberos as well to avoid authentication errors with the HTTPFS gateway and the backend Isilon cluster. The Kerberos configuration for Knox is done under Advanced gateway-site in Ambari. An example configuration is shown below:

The Advanced gateway-site configuration allows you to specify the Knox gateway port (e.g. 8444), the location of krb5.conf (the Kerberos configuration file), and whether the gateway uses Kerberos (set to true).

The Advanced knox-env section in Ambari allows you to set the Knox user and group accounts, the Knox keytab path, and the Knox principal name. An example configuration is shown below:

Note: the knox user also needs to be created on Isilon. The knox user is a system account that gets created during the installation of Knox. As with all other Hadoop service accounts, Isilon needs to have all service accounts defined in the LOCAL provider in the appropriate HDFS Access Zone (e.g. hdp24) as shown below.
Create the knox user in the LOCAL provider of the HDFS Access Zone for your cluster in Isilon OneFS. Assign the knox user to the hadoop group as its primary group. Leave the knox account Disabled as shown above and below. The UID on Isilon does not need to match the UID on the Knox server.
TESTING KNOX AND ISILON IMPERSONATION DEFENSE

Now that Knox and HTTPFS have been installed and configured, we can begin end-to-end testing with Isilon in a secure Kerberos Hadoop cluster deployment using either curl or a web browser.

GETTING A USER'S HOME DIRECTORY EXAMPLE

The screen shot above shows curl being used to connect to the Knox gateway on port 8444 as LDAP user ldapuser1; the GETHOMEDIRECTORY operation is used to retrieve the home directory information for the LDAP user. The network connection to the Knox gateway is secured with TLS.

Let's see what happens when we use the same REST HTTP operation from a web browser that connects to the Knox gateway. First, the Knox gateway will prompt for user authentication; after entering the correct LDAP credentials, we can see the result of the REST HTTP GETHOMEDIRECTORY operation in the web browser as shown below:
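From the command line, the equivalent request through Knox might look roughly like the sketch below. The gateway host name and the topology name "default" are assumptions for this sketch; -k is used because the lab deployment uses self-signed certificates, and curl prompts for the LDAP user's password:

curl -i -k -u ldapuser1 "https://<KNOX_HOST>:8444/gateway/default/webhdfs/v1/?op=GETHOMEDIRECTORY"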
Note that the network connection to the Knox gateway is secured with TLS as shown below:

I used self-signed certificates for this lab deployment, so a certificate error is shown, but the network connection is securely encrypted with TLS and a strong AES cipher.

OPENING A FILE EXAMPLE

Unlike the GETHOMEDIRECTORY operation shown in the previous test example, the OPEN operation actually accesses data - we want to employ more security checks when data is being accessed in cases like this.
The screen shot above shows curl being used to connect to the Knox gateway on port 8444 as LDAP user ldapuser1; the OPEN operation then tries to open the contents of the project.txt file in /tmp/hduser1, but a Server Error is encountered. Although Isilon is aware of ldapuser1, Isilon provides an added layer of security to check for impersonation attacks.

In this case, the HTTPFS gateway (which runs as the httpfs user) is acting as a proxy for user ldapuser1's REST HTTP request between Knox and Isilon. When Isilon receives the OPEN request from httpfs on behalf of ldapuser1, Isilon checks its Proxy User settings to see if httpfs is authorized to impersonate ldapuser1 or members of the group ldapuser1 belongs to, i.e. the hadoop group. Assuming it is within policy for httpfs to impersonate anyone in the hadoop group, we can update the Proxy User settings on Isilon so that httpfs is authorized to process requests from either the ldapuser1 user specifically or anyone in the hadoop group. The example below depicts a proxy configuration for the hadoop group:

With the proxy user setting in place, we can successfully run the previous test example to open a file:
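The repeated OPEN request might look roughly like this; the gateway host and the topology name "default" are assumptions, and the path follows the earlier example:

curl -i -k -u ldapuser1 "https://<KNOX_HOST>:8444/gateway/default/webhdfs/v1/tmp/hduser1/project.txt?op=OPEN"

With the proxy user policy in place, the response streams back the contents of project.txt.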
As shown above, with the correct Isilon Proxy User policy in place on Isilon, the OPEN operation is now allowed.

Note: If the /tmp/hduser1 directory on Isilon did not have global read permissions set, this operation would fail as shown below:

Changing the permissions on the /tmp/hduser1 directory on Isilon caused a permission denied error for the same test operation. This is a testament to the embedded Isilon OneFS security features and a benefit of using a centralized HDFS storage solution like Isilon.

CREATE DIRECTORY EXAMPLE

The screen shot above shows curl being used to connect to the Knox gateway on port 8444; the MKDIRS operation is used as user ldapuser1 to create the directory /tmp/ldaptest. The Boolean result of true means the command executed successfully.
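A request of roughly this form creates the directory through Knox; the host and topology are again placeholders for this sketch:

curl -i -k -u ldapuser1 -X PUT "https://<KNOX_HOST>:8444/gateway/default/webhdfs/v1/tmp/ldaptest?op=MKDIRS"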
We can verify the creation of the directory with the hdfs command as shown below:

This concludes the Knox installation, configuration, and testing section of this document. Please see the Appendix for additional Knox/HTTPFS/Isilon test examples.
FINAL COMMENTS

This solution has been tested and certified by both DELL EMC and Hortonworks with success. One thing that was noticed during testing of the integrated solution is that HTTPFS requires the Content-Type: application/octet-stream header on data upload requests. The content type is supported by both WebHDFS and HTTPFS, but only HTTPFS throws a 400 Bad Request error when it is missing. For example, if you create a test data_file on the cluster with the CREATE operation, you will need to use the -H flag with curl to specify the Content-Type accordingly; see the example below:

With the Content-Type specified, the data upload completes successfully with no errors. This is an HTTPFS requirement and has nothing to do with either Knox or Isilon OneFS. We can use the hdfs command to see the content of the created data_file as shown below:
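For reference, an upload of this kind through Knox to HTTPFS might be issued roughly as follows. The gateway host, the "default" topology, the target path, and the data=true parameter (the HTTPFS convention for sending file data directly) are assumptions for this sketch:

curl -i -k -u ldapuser1 -X PUT -T data_file -H "Content-Type: application/octet-stream" "https://<KNOX_HOST>:8444/gateway/default/webhdfs/v1/tmp/ldaptest/data_file?op=CREATE&data=true"

Without the -H "Content-Type: application/octet-stream" header, HTTPFS rejects the upload with a 400 Bad Request error.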
Reading the file via curl does not require anything special, as shown below:

The port for Knox was changed to 8444 instead of the default 8443. Be aware that when setting up HTTPS for the Ambari web interface, the default port is also 8443. To avoid port conflicts, I recommend you carefully assign a unique port to your Knox gateway; port 8444 is a safe bet.
APPENDIX

ADDITIONAL TESTING RESULTS

Below are additional testing examples for reference.

RENAMING A FILE EXAMPLE

The above curl command connects to the Knox gateway on port 8444 as LDAP user ldapuser1 to execute the RENAME operation, renaming data_file to data_file_new. The Boolean result of true means the command executed successfully. We can verify further by listing the contents of the /tmp/ldaptest directory:

SETTING FILE REPLICATION EXAMPLE

The above curl command connects to the Knox gateway on port 8444 as LDAP user ldapuser1 to execute the SETREPLICATION operation, setting the replication factor to 1 for data_file_new. The Boolean result of true means the command executed successfully.
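These two requests might take roughly the following form; the gateway host, topology, and paths follow the earlier examples and are assumptions for this sketch:

curl -i -k -u ldapuser1 -X PUT "https://<KNOX_HOST>:8444/gateway/default/webhdfs/v1/tmp/ldaptest/data_file?op=RENAME&destination=/tmp/ldaptest/data_file_new"
curl -i -k -u ldapuser1 -X PUT "https://<KNOX_HOST>:8444/gateway/default/webhdfs/v1/tmp/ldaptest/data_file_new?op=SETREPLICATION&replication=1"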
Note: Isilon will always respond with true for these kinds of requests, but the reality is that the Isilon OneFS file system is much more efficient than HDFS; Isilon uses erasure coding instead of replication to maintain high availability.

SETTING FILE PERMISSIONS EXAMPLE

The above curl command connects to the Knox gateway on port 8444 as LDAP user ldapuser1 to execute the SETPERMISSION operation, setting the permissions of data_file_new to 777. The HTTP/1.1 200 OK result means the command executed successfully. The hdfs command shows that the permissions for this data file were changed on Isilon accordingly.
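The permission change might be requested roughly as follows, using the same placeholder host and topology as before:

curl -i -k -u ldapuser1 -X PUT "https://<KNOX_HOST>:8444/gateway/default/webhdfs/v1/tmp/ldaptest/data_file_new?op=SETPERMISSION&permission=777"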
APPENDING DATA TO A FILE EXAMPLE

The above curl command connects to the Knox gateway on port 8444 as LDAP user ldapuser1 to execute the APPEND operation, adding ApendInfo to data_file_new. The HTTP/1.1 200 OK result means the command executed successfully. The hdfs command shows the data was appended successfully on Isilon.

RECURSIVE DELETE EXAMPLE

The above curl command connects to the Knox gateway on port 8444 as LDAP user ldapuser1 to execute the DELETE operation, recursively deleting the /tmp/ldaptest directory. The HTTP/1.1 200 OK result means the command executed successfully. The hdfs command shows the directory and its contents were successfully removed from Isilon.
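For reference, the append and recursive delete requests might take roughly these forms; the host, topology, paths, and the data=true convention for HTTPFS uploads are assumptions for this sketch:

curl -i -k -u ldapuser1 -X POST -d "ApendInfo" -H "Content-Type: application/octet-stream" "https://<KNOX_HOST>:8444/gateway/default/webhdfs/v1/tmp/ldaptest/data_file_new?op=APPEND&data=true"
curl -i -k -u ldapuser1 -X DELETE "https://<KNOX_HOST>:8444/gateway/default/webhdfs/v1/tmp/ldaptest?op=DELETE&recursive=true"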