_____________________________
EMC ISILON HADOOP STARTER KIT
Deploying IBM BigInsights v 4.0 with EMC ISILON
Release 1.0
October, 2015
EMC Isilon Hadoop Starter Kit for IBM BigInsights
__________________________________________________________________
EMC ISILON HADOOP STARTER KIT FOR IBM BIGINSIGHTS 2
To learn more about how EMC products, services, and solutions can help solve your
business and IT challenges, contact your local representative or authorized reseller,
visit www.emc.com, or explore and compare products in the EMC Store.
Copyright © 2015 EMC Corporation. All Rights Reserved.
EMC believes the information in this publication is accurate as of its publication date.
The information is subject to change without notice.
The information in this publication is provided “as is.” EMC Corporation makes no
representations or warranties of any kind with respect to the information in this
publication, and specifically disclaims implied warranties of merchantability or fitness
for a particular purpose.
Use, copying, and distribution of any EMC software described in this publication
requires an applicable software license.
For the most up-to-date listing of EMC product names, see EMC Corporation
Trademarks on EMC.com.
EMC and Isilon are registered trademarks or trademarks of EMC Corporation in the
United States and/or other jurisdictions. All other trademarks used herein are the
property of their respective owners.
Contents

INTRODUCTION
  IBM & EMC Technology Highlights
  Audience
  Apache Hadoop Projects
  IBM Open Platform and the Ambari Manager
  Isilon Scale-Out NAS for HDFS
  Overview of Isilon Scale-Out NAS for Big Data
PRE-INSTALLATION CHECKLIST
  Supported Software Versions
  Hardware Requirements and Suggested Hadoop Service Layout
INSTALLATION OVERVIEW
  Prerequisites
    Isilon Scale-Out NAS or Isilon OneFS Simulator
    Linux
    Networking
    DNS
    Other
  Prepare Isilon
    Assumptions
    SmartConnect for HDFS
    OneFS Access Zones
    Sharing Data between Access Zones
    User & Group IDs
    Configuring Isilon for HDFS
    Create DNS Records for Isilon
  Prepare Linux Compute Nodes
    Linux Operating System packages needed for IBM BigInsights
    Enable NTP on all Linux Compute nodes
    Disable SELinux on each node if enabled before installing Ambari
    Check UMASK Settings
    Set ulimit Properties
    Kernel Modifications
    Create IBM BigInsights Hadoop Users and Groups
    Configure Passwordless SSH
    Additional Linux Packages to Install
    Test DNS Resolution
    Edit sudoers file on all Linux compute nodes
INSTALLING IBM OPEN PLATFORM (OP)
  Download IBM Open Platform Software
  Create IBM Open Platform Repository
  Validating IBM Open Platform Install
  Adding a Hadoop User
  Additional Service Tests
    HDFS
    YARN/MAPREDUCE
    HIVE
    HBASE
  Ambari Service Check
INSTALLING IBM VALUE PACKAGES
  Before You Begin
  Installation Procedure
  Select IBM BigInsights Service to Install
  Installing BigInsights Home
  Configure Knox
  Installing BigSheets
  Installing Big SQL
  Connecting to Big SQL
    Running JSqsh
    Connection setup
    Commands and queries
    Command and query edit
    Configuration variables
  Installing Text Analytics
  Installing Big R
    IBM BigInsights Online Tutorials
SECURITY CONFIGURATION AND ADMINISTRATION
  Setting up HTTPS for Ambari
  Configuring SSL support for HBase REST gateway with Knox
  Overview of Kerberos
  Enabling Kerberos for IBM Open Platform
  Manually generating keytabs for Kerberos authentication
  Setting up Active Directory or LDAP authentication in Ambari
  Enabling Kerberos for HDFS on Isilon
    Using MIT Kerberos 5
  Running the Ambari Kerberos Wizard
  Trouble Shooting and Support
EMC Isilon Hadoop Starter Kit for
IBM BigInsights v 4.0
This document describes how to create a Hadoop environment using IBM® Open Platform
with Apache Hadoop and EMC® Isilon® scale-out network-attached storage (NAS) for
HDFS-accessible shared storage. It also covers installation and configuration of the IBM
BigInsights Value Packages.
Introduction
IBM & EMC Technology Highlights
The IBM® Open Platform with Apache Hadoop is composed entirely of open source Apache
Hadoop components, such as Apache Ambari, YARN, Spark, Knox, Slider, Sqoop,
Flume, Hive, Oozie, HBase, ZooKeeper, and more. After installing IBM Open Platform, you
can install additional IBM value-add service modules.
These value-add service modules are installed separately, and they include IBM
BigInsights® Analyst, IBM BigInsights Data Scientist, and the IBM BigInsights Enterprise
Management module to provide enhanced capabilities to IBM Open Platform to accelerate
the conversion of all types of data into business insight and action.
The EMC® Isilon® Scale-Out Network-Attached Storage (NAS) platform provides Hadoop
clients with direct access to big data through a Hadoop Distributed File System (HDFS) interface.
Powered by the distributed EMC Isilon OneFS® operating system, an EMC Isilon cluster
delivers a powerful yet simple and highly efficient storage platform with native HDFS
integration to accelerate analytics, gain new flexibility, and avoid the costs of a separate
Hadoop infrastructure.
Audience
This document is intended for IT program managers, IT architects, developers, and IT
managers who want to deploy IBM BigInsights v4.0 with EMC Isilon OneFS v7.2.0.3 for
HDFS storage. If a physical EMC Isilon cluster is not available, you can download the free
EMC Isilon OneFS Simulator, which can be installed as a virtual machine for integration
testing and training purposes. See http://www.emc.com/getisilon for the EMC Isilon
OneFS Simulator.
Apache Hadoop Projects
Apache Hadoop is an open source, batch data processing system for enormous amounts of
data. Hadoop runs as a platform that provides cost-effective, scalable infrastructure for
building Big Data analytic applications. All Hadoop clusters contain a distributed file system
called the Hadoop Distributed File System (HDFS) and a computation layer called
MapReduce.
The Apache Hadoop project contains the following subprojects:
• Hadoop Distributed File System (HDFS) – A distributed file system that provides
high-throughput access to application data.
• Hadoop MapReduce – A software framework for writing applications to reliably
process large amounts of data in parallel across a cluster.
Hadoop is supplemented by an ecosystem of Apache projects, such as Pig, Hive, Sqoop,
Flume, Oozie, Slider, HBase, ZooKeeper, and more, that extend the value of Hadoop and
improve its usability.
Version 2 of Apache Hadoop introduces YARN, a sub-project of Hadoop that separates the
resource management and processing components. YARN was born of a need to enable a
broader array of interaction patterns for data stored in HDFS beyond MapReduce. The YARN-
based architecture of Hadoop 2.0 provides a more general processing platform that is not
constrained to MapReduce.
For full details of the Apache Hadoop project see http://hadoop.apache.org/.
IBM Open Platform and the Ambari Manager
The IBM Open Platform with Apache Hadoop enables Enterprise Hadoop by providing the
complete set of essential Hadoop capabilities required for any enterprise. Utilizing YARN at
its core, it provides capabilities for several functional areas including Data Management,
Data Access, Data Governance, Integration, Security and Operations.
IBM Open Platform delivers the core elements of Hadoop - scalable storage and distributed
computing – as well as all of the necessary enterprise capabilities such as security, high
availability and integration with a broad range of hardware and software solutions.
Apache Ambari is an open operational framework for provisioning, managing and monitoring
Apache Hadoop clusters.
As of version 4.0 of IBM Open Platform, Ambari can be used to set up and deploy Hadoop
clusters for nearly any task. Ambari can provision, manage, and monitor every aspect of a
Hadoop deployment.
More information on IBM Open Platform can be found at:
http://www-01.ibm.com/software/data/infosphere/hadoop/enterprise.html
Isilon Scale-Out NAS for HDFS
EMC Isilon is the only scale-out NAS platform natively integrated with the Hadoop
Distributed File System (HDFS). Using HDFS as an over-the-wire protocol, you can deploy a
powerful, efficient, and flexible data storage and analytics ecosystem.
In addition to native integration with HDFS, EMC Isilon storage easily scales to support
massively large Hadoop analytics projects. Isilon scale-out NAS also offers unmatched
simplicity, efficiency, flexibility, and reliability that you need to maximize the value of your
Hadoop data storage and analytics workflow investment.
Overview of Isilon Scale-Out NAS for Big Data
The EMC Isilon scale-out platform combines modular hardware with unified software to
provide the storage foundation for data analysis. Isilon scale-out NAS is a fully distributed
system that consists of nodes of modular hardware arranged in a cluster. The distributed
Isilon OneFS operating system combines the memory, I/O, CPUs, and disks of the nodes into
a cohesive storage unit to present a global namespace as a single file system.
The nodes work together as peers in a shared-nothing hardware architecture with no single
point of failure. Every node adds capacity, performance, and resiliency to the cluster and
each node acts as a Hadoop namenode and datanode.
The namenode daemon is a distributed process that runs on all the nodes in the cluster. A
compute client can connect to any node through HDFS.
As nodes are added, the file system expands dynamically and redistributes data, eliminating
the work of partitioning disks and creating volumes. The result is a highly efficient and
resilient storage architecture that brings all the advantages of an enterprise scale-out NAS
system to storing data for analysis.
With traditional direct attached storage, the ratio of CPU, RAM, and disk space requirements
depends on the workload—these factors make it difficult to size a Hadoop cluster before you
have had a chance to measure your MapReduce workload. Expanding data sets also make
upfront sizing decisions problematic. Isilon scale-out NAS lends itself well to this
situation: Isilon scale-out NAS lets you increase CPUs, RAM, and disk space by adding nodes
to dynamically match storage capacity and performance with the demands of a dynamic
Hadoop workload.
An Isilon cluster optimizes data protection. OneFS protects data more efficiently and
reliably than HDFS. The HDFS protocol, by default, replicates a block of data three times.
In contrast, OneFS stripes the data across the cluster and protects it with forward error
correction codes, which consume less space than replication while providing better protection.
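To make the overhead difference concrete, here is a small arithmetic sketch. The 16+4 stripe is an illustrative assumption, not a specific OneFS protection level (actual levels such as N+2:1 depend on cluster configuration):

```python
# Rough storage-overhead comparison: HDFS-style 3x replication versus an
# illustrative 16+4 forward-error-correction (FEC) stripe. The 16+4 layout
# is an assumption for this example, not a fixed OneFS protection level.

def replication_overhead(copies: int) -> float:
    # Raw bytes stored per byte of user data when every block is copied.
    return float(copies)

def fec_overhead(data_units: int, parity_units: int) -> float:
    # Raw bytes stored per byte of user data in an N+M protection stripe.
    return (data_units + parity_units) / data_units

print(replication_overhead(3))   # 3.0  -> 200% overhead
print(fec_overhead(16, 4))       # 1.25 -> 25% overhead
```

With 3x replication, a terabyte of user data consumes three terabytes of raw capacity; the FEC stripe in this example consumes 1.25 terabytes while tolerating multiple failures.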
Pre-installation Checklist
Supported Software Versions
The environment used for this document consists of the following software versions:
• Ambari 1.7.0_IBM
• IBM Open Platform v4.0.0.0
• Isilon OneFS 7.2.0.3 with patch-159065
• All of the IBM BigInsights v4.0 value packs, i.e. Business Analyst, Data
  Scientist, and Enterprise Management
______________________________________________________________________
Note: IBM BigInsights v 4.0 requires OneFS v 7.2.0.3 with patch-159065.
OneFS version 7.2.0.4 should also work, as should version 7.2.1.1 when available.
Do not install IBM BigInsights with OneFS versions lower than 7.2.0.3.
See the EMC Isilon Supportability and Compatibility Guide for the latest compatibility updates:
https://support.emc.com/docu44518_Isilon-Supportability-and-Compatibility-Guide.pdf?language=en_US
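The minimum-version requirement in the note above can be checked programmatically once you have the cluster's dotted version string (for example, extracted from the output of the OneFS CLI). This is a convenience sketch; the parsing assumes a simple dotted version like 7.2.0.3:

```python
# Hedged pre-flight check: does a OneFS version string meet the 7.2.0.3
# minimum this guide requires? Assumes a plain dotted version string has
# already been extracted from the cluster's CLI output.

def version_tuple(version: str) -> tuple:
    # "7.2.0.3" -> (7, 2, 0, 3); tuples compare element by element.
    return tuple(int(part) for part in version.split("."))

def meets_minimum(current: str, minimum: str = "7.2.0.3") -> bool:
    return version_tuple(current) >= version_tuple(minimum)

print(meets_minimum("7.2.0.4"))  # True
print(meets_minimum("7.1.1.0"))  # False
```

Remember that this only checks the base version; the patch-159065 requirement must still be verified separately on the cluster.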
Hardware Requirements and Suggested Hadoop Service Layout
Detailed system requirements for IBM BigInsights compute nodes can be found at:
http://www-01.ibm.com/support/docview.wss?uid=swg27027565
In a multi-node IBM BigInsights cluster, it is suggested that you have at least one
management node in your non-high availability environment, if performance is not an
issue. If performance is a concern, consider configuring at least three management nodes.
If you use the BigInsights - Big SQL service, consider configuring four management
nodes. If you use a high availability environment, consider six management nodes. Use
the following list as a guide for the nodes in your IBM/EMC cluster. A suggested layout is
shown in Table 1 for both Non-High availability and High availability deployments.
________________________________________________________________________________________
Note: With both deployment options, EMC Isilon provides namenode, secondary
namenode and datanode functions for the entire cluster. Do not designate any compute
node as a namenode, secondary namenode, or datanode in any aspect of the IBM
BigInsights configuration.
Table 1. Suggested Service Layout

Non-High availability

Management node 1
• Ambari
• PostgreSQL
• Knox
• Zookeeper
• Hive
• Spark
• Spark History Server
• BigInsights Home
• BigSheets
• Big R
• BigSQL Headnode
• Text Analytics

Management node 2
• Resource Manager
• HBase Master
• Zookeeper
• Oozie
• Ambari monitoring service

Management node 3
• Job history server
• Zookeeper
• App Timeline Server
• Kafka

Management node 4
• Big SQL Scheduler
• Hive Server (MySQL)
• MySQL metastore
• Hive/Oozie metastore
• WebHCat Server
• Data Server Manager

High availability

Management node 1
• Ambari
• PostgreSQL
• Spark
• Spark History Server
• BigSQL Headnode

Management node 2
• Resource Manager
• Zookeeper
• Oozie
• Ambari monitoring service
• BigInsights Home

Management node 3
• Resource Manager (standby)
• Job history server
• Zookeeper
• App Timeline Server
• Kafka
• Oozie (standby)

Management node 4
• Big SQL Scheduler
• HBase Master (standby)
• Hive Server
• MySQL Server
• Hive metastore
• WebHCat Server
• Data Server Manager

Management node 5
• Big SQL Headnode (standby)
• Big SQL Scheduler (standby)
• HBase Master
• Hive Server (standby)
• Hive Metastore (standby)
• Journal Node
• Zookeeper
Installation Overview
Below is an overview of the installation process described in this document.
1. Confirm prerequisites.
2. Prepare your network infrastructure, including DNS.
3. Prepare your Isilon cluster.
4. Prepare Linux compute nodes.
5. Install Ambari Server.
6. Use Ambari Manager to deploy IBM Open Platform to compute nodes.
7. Install IBM BigInsights Value Packages.
8. Perform key functional tests.
Prerequisites
Isilon Scale-Out NAS or Isilon OneFS Simulator
• For low-capacity, non-performance testing of Isilon, the EMC Isilon OneFS Simulator can
be used instead of a cluster of physical Isilon appliances. This can be downloaded for free
from http://www.emc.com/getisilon.
Refer to the EMC Isilon OneFS Simulator Install Guide for details. Be sure to follow the
section for running the virtual nodes in VMware ESX. Only a single virtual node is required
but adding additional nodes will allow you to explore other features such as data
protection, SmartPools (tiering), and SmartConnect (network load balancing).
• For physical Isilon nodes, you should have already completed the console-based
installation process for your first Isilon node and added two other nodes for a
minimum of 3 Isilon nodes.
• You should have OneFS version 7.2.0.3 + patch 159065 installed on Isilon.
• You must obtain a OneFS HDFS license code and install it on your Isilon cluster. You can
get your free OneFS HDFS license from:
http://www.emc.com/campaign/isilon-hadoop/index.htm.
• It is recommended, but not required, to have a SmartConnect Advanced license for
your Isilon cluster.
• To allow for scripts and other small files to be easily shared between all nodes in your
environment, it is highly recommended to enable NFS (Unix Sharing) on your Isilon
cluster. By default, the entire /ifs directory is already exported and this can remain
unchanged. This document assumes that a single Isilon cluster is used for this NFS
export as well as for HDFS. However, there is no requirement that the NFS export be
on the same Isilon cluster that you are using for HDFS.
Linux
• RedHat Enterprise Linux (RHEL) Server 6 (Update 5 minimum) or a comparable
  CentOS Server.
• 100 GB root partition.
• At a minimum, 96 GB of RAM for production environments; the more RAM, the
  better for Hadoop.
Networking
• For the best performance, a single 10 Gigabit Ethernet switch should connect to at
  least one 10 Gigabit port on each Linux host, and the same switch should connect
  to at least one 10 Gigabit port on each Isilon node.
• A single dedicated layer-2 network can be used to connect all hosts and Isilon nodes,
  although multiple networks can be used for increased security, monitoring, and
  robustness.
• At least an entire /24 IP address block should be allocated to your network. This will
  allow a DNS reverse lookup zone to be delegated to your Hadoop DNS server.
• If using the EMC Isilon OneFS Simulator, you will need at least two static IP addresses
  (one for the node’s ext-1 interface, another for the SmartConnect Service IP). Each
  additional Isilon node will require an additional IP address.
• At a minimum, you will need to allocate to your Isilon cluster one IP address per
  access zone per Isilon node. In general, you will need one access zone for each
  separate Hadoop cluster that will use Isilon for HDFS storage.
• For the best possible load balancing during an Isilon node failure scenario, the
  recommended number of IP addresses is given by the formula below. This is in
  addition to any IP addresses used for non-HDFS pools.
# of IP addresses = 2 * (# of Isilon Nodes) * (# of Access Zones)
For example, 20 IP addresses are recommended for 5 Isilon nodes and 2 Access Zones.
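As a quick sanity check, the sizing rule above can be expressed in a few lines of Python (the function name is ours, not from the kit's scripts):

```python
# Recommended IP count for dynamic HDFS pools, per the formula above:
# 2 * (# of Isilon nodes) * (# of access zones).

def recommended_hdfs_ips(isilon_nodes: int, access_zones: int) -> int:
    return 2 * isilon_nodes * access_zones

print(recommended_hdfs_ips(5, 2))  # 20, matching the worked example above
```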
• This document will assume that Internet access is available to all servers to download
  various components from Internet repositories.
DNS
• A DNS server is required and you must have the ability to create DNS records and
  zone delegations.
• It is recommended that your DNS server delegate a subdomain to your Isilon cluster.
  For instance, DNS requests for subnet0-pool0.isiloncluster1.example.com or
  isiloncluster1.example.com should be delegated to the Service IP defined on your
  Isilon cluster.
• To allow for a convenient way of changing the HDFS namenode used by all Hadoop
  applications and services, create a DNS record for your Isilon cluster’s HDFS
  namenode service. This should be a CNAME alias to your Isilon SmartConnect zone.
  Specify a TTL of 1 minute to allow for quick changes. For example, create a CNAME
  record for mycluster1-hdfs.example.com that targets subnet0-
pool0.isiloncluster1.example.com. If you later want to redirect all HDFS I/O to another
cluster or a different pool on the same Isilon cluster, you simply need to change the
DNS record and restart all Hadoop services.
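One way to express these records is a BIND-style zone file fragment. This is a sketch using the example names from this section; the Service IP address shown is a placeholder, and the exact syntax depends on your DNS server:

```
; example.com zone fragment (illustrative)

; Delegate the Isilon SmartConnect subdomain to the cluster's Service IP
isiloncluster1          IN NS     sip.isiloncluster1.example.com.
sip.isiloncluster1      IN A      10.111.129.100   ; placeholder Service IP

; Short-TTL CNAME so HDFS clients can be redirected quickly
mycluster1-hdfs  60     IN CNAME  subnet0-pool0.isiloncluster1.example.com.
```

Because the CNAME carries a 60-second TTL, repointing all Hadoop clients at a different pool requires only editing this one record and restarting Hadoop services.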
Other
• There are three scripts at http://www.github.com/bonibruno/BigInsights that you can
  download to help automate new IBM BigInsights installations with EMC Isilon:
  1. bi_create_users.sh – use this script to create the users and groups on all the
     Linux nodes before beginning installation.
  2. isilon_create_users.sh – use this script to create the users and groups on
     Isilon before beginning installation. You must first create your access zone
     (described later in this document, e.g. ibm) before running this script.
  3. isilon_create_directories.sh – run this after the script above.
More information on the use of these scripts is provided in the installation section of this
document.
Prepare Isilon
Assumptions
This document makes the assumptions listed below. These are not necessarily
requirements but they are usually valid and simplify the process.
• It is assumed that you are not using a directory service such as Active
  Directory for Hadoop users and groups.
• It is assumed that you are not using Kerberos authentication for Hadoop.
SmartConnect for HDFS
A best practice for HDFS on Isilon is to utilize two SmartConnect IP address pools for each
access zone. One IP address pool should be used by Hadoop clients to connect to the HDFS
namenode service on Isilon and it should use the dynamic IP allocation method to
minimize connection interruptions in the event that an Isilon node fails.
____________________________________________________________________
Note: Dynamic IP allocation requires a SmartConnect Advanced license.
____________________________________________________________________
A Hadoop client uses a specific SmartConnect IP address pool simply by using its zone
name (DNS name) in the HDFS URI:
For example, hdfs://subnet0-pool1.isiloncluster1.example.com:8020
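For illustration, this is how the zone name would appear in a Hadoop client's fs.defaultFS setting. This fragment is a hedged sketch using the example zone name above; in an Ambari-managed cluster you would set this through Ambari rather than editing core-site.xml by hand:

```xml
<!-- core-site.xml fragment (illustrative; Ambari normally manages this) -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://subnet0-pool1.isiloncluster1.example.com:8020</value>
</property>
```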
A second IP address pool should be used for HDFS datanode connections, and it should
also use the dynamic IP allocation method. To assign specific SmartConnect IP address
pools for datanode connections, use the “isi hdfs racks modify” command. If the network
is flat, there is no need to use “isi hdfs racks modify”; the default configuration will suffice.
If IP addresses are limited and you have a SmartConnect Advanced license, you may
choose to use a single dynamic pool for namenode and datanode connections. This may
result in uneven utilization of Isilon nodes.
If you do not have a SmartConnect Advanced license, you may choose to use a single
static pool for namenode and datanode connections. This may result in some failed HDFS
connections in the event of a node failure.
For more information, see EMC Isilon Best Practices for Hadoop Data Storage white paper
online at: https://www.emc.com/collateral/white-papers/h13926-wp-emc-isilon-hadoop-
best-practices-onefs72.pdf
OneFS Access Zones
Access zones on OneFS are a way to select a distinct configuration for the OneFS cluster
based on the IP address that the client connects to. For HDFS, this configuration includes
authentication methods, HDFS root path, and authentication providers (AD, LDAP, local,
etc.). By default, OneFS includes a single access zone called System.
If you will only have a single Hadoop cluster connecting to your Isilon cluster, then you can
use the System access zone with no additional configuration. However, to have more than
one Hadoop cluster connect to your Isilon cluster, it is best to have each Hadoop cluster
connect to a separate OneFS access zone. This will allow OneFS to present each Hadoop
cluster with its own HDFS namespace and an independent set of users.
For more information, see Security and Compliance for Scale-out Hadoop Data Lakes
whitepaper.
To view your current list of access zones and the IP pools associated with them:
isiloncluster1-1# isi zone zones list
Name Path
------------
System /ifs
------------
Total: 1
isiloncluster1-1# isi networks list pools -v
subnet0:pool0
In Subnet: subnet0
Allocation: Static
Ranges: 1
10.111.129.115-10.111.129.126
Pool Membership: 4
1:10gige-1 (up)
2:10gige-1 (up)
3:10gige-1 (up)
4:10gige-1 (up)
Aggregation Mode: Link Aggregation Control Protocol (LACP)
Access Zone: System (1)
SmartConnect:
Suspended Nodes : None
Auto Unsuspend ... 0
Zone : subnet0-pool0.isiloncluster1.lab.example.com
Time to Live : 0
Service Subnet : subnet0
Connection Policy: Round Robin
Failover Policy : Round Robin
Rebalance Policy : Automatic Failback
To create a new access zone and an associated IP address pool:
isiloncluster1-1# mkdir -p /ifs/isiloncluster1/zone1
isiloncluster1-1# isi zone zones create --name zone1 
--path /ifs/isiloncluster1/zone1
isiloncluster1-1# isi networks create pool --name subnet0:pool1 
--ranges 10.111.129.127-10.111.129.138 --ifaces 1-4:10gige-1 
--access-zone zone1 --zone subnet0-pool1.isiloncluster1.lab.example.com 
--sc-subnet subnet0 --dynamic
Creating pool 'subnet0:pool1': OK
Saving: OK
____________________________________________________________________
Note: If you do not have a SmartConnect Advanced license, you will need to omit the
--dynamic option.
____________________________________________________________________
Sharing Data between Access Zones
By default, the data in one access zone cannot be accessed by users in another access zone.
In certain cases, however, you may need to make the same data set available to more
than one Hadoop compute cluster. Using fully qualified HDFS paths, e.g.
hdfs://zone1-hdfs.example.com/hadoop/dir1, makes a data set available across two or more
access zones.
With fully qualified HDFS paths, the data sets do not cross access zones. Instead, the
Hadoop jobs can access the data sets from a common shared HDFS namespace. For
instance, you can selectively share data between two or more access zones based on
referential links and file/directory permissions.
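As a sketch of what this looks like from a client, the commands below reference a shared data set by a fully qualified path. The SmartConnect name zone1-hdfs.example.com and the directory are illustrative values from this guide, and the examples assume an already-configured HDFS client:

```shell
# List a shared data set through a specific access zone's SmartConnect name,
# instead of relying on the cluster's default fs.defaultFS:
hdfs dfs -ls hdfs://zone1-hdfs.example.com/hadoop/dir1

# A job can read the same fully qualified path as its input:
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
  wordcount hdfs://zone1-hdfs.example.com/hadoop/dir1 /user/hduser1/out
```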
User & Group IDs
Isilon clusters and Hadoop servers each have their own mapping of user IDs (UIDs) to user
names and group IDs (GIDs) to group names. When Isilon is used only for HDFS storage by
the Hadoop servers, the IDs do not need to match, because the HDFS protocol refers to
users and groups only by their names, never by their numeric IDs.
In contrast, the NFS protocol refers to users and groups by their numeric IDs. Although
NFS is rarely used in traditional Hadoop environments, the high-performance, enterprise-
class, and POSIX-compatible NFS functionality of Isilon makes NFS a compelling protocol
for certain workflows. If you expect to use both NFS and HDFS on your Isilon cluster (or
simply want to be open to the possibility in the future), it is highly recommended to
maintain consistent names and numeric IDs for all users and groups on Isilon and your
Hadoop servers. In a multi-tenant environment with multiple Hadoop clusters, numeric IDs
for users in different clusters should be distinct.
For instance, the user bigsql in Hadoop cluster 1 may have ID 1013 and this same ID will
be used in the Isilon access zone for Hadoop cluster 1 as well as every server in Hadoop
cluster 1. The user bigsql in Hadoop cluster 2 may have ID 710 and this ID will be used in
the Isilon access zone for Hadoop cluster 2 as well as every server in Hadoop cluster 2.
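To make the consistency requirement concrete, a minimal check might compare the numeric ID reported on each side. On a Linux node the UID comes from id -u bigsql; on Isilon it would come from isi auth users view for the matching access zone, run there separately. The helper function and the UID values below are illustrative, not values your environment must use:

```shell
# Compare a UID gathered on a compute node (e.g. via: id -u bigsql)
# with the UID configured on Isilon (e.g. via: isi auth users view).
same_uid() {
  if [ "$1" = "$2" ]; then
    echo "uid match: $1"
  else
    echo "uid mismatch: $1 vs $2" >&2
    return 1
  fi
}

# Illustrative values: bigsql is expected to be UID 1013 on every host
# in Hadoop cluster 1.
same_uid 1013 1013
```

Running the helper against each Hadoop service account on every node is a quick way to catch a drifted ID before it surfaces as an NFS permission problem.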
Configuring Isilon for HDFS
_____________________________________________________________________
Note: In the steps below, replace zone1 with System to use the default System access
zone or you may specify the name of a new access zone that you previously created.
______________________________________________________________________
1. Open a web browser to your Isilon cluster's web administration page. If you
don't know the URL, point your browser to:
https://isilon_node_ip_address:8080
The isilon_node_ip_address is any IP address on any Isilon node that is in the System
Access Zone. This usually corresponds to the ext-1 interface of any Isilon node.
2. Log in with your root account. You specified the root password when you configured
your first node using the console.
3. Check, and edit as necessary, your NTP settings. Click Cluster Management ->
General Settings -> NTP.
1. SSH into any node in your Isilon cluster as root.
2. Confirm that your Isilon cluster is at OneFS version 7.2.0.3.
isiloncluster1-1# isi version
Isilon OneFS v7.2.0.3 ...
3. For OneFS version 7.2.0.3, you must have patch-159065 installed. You can view
the list of patches you have installed with:
# isi pkg info
patch-159065: This patch adds support for the Ambari 1.7.0_IBM Server.
4. Install the patch if needed:
[user@workstation ~]$ scp patch-159065.tgz root@mycluster1-hdfs:/tmp
isiloncluster1-1# gunzip < /tmp/patch-159065.tgz | tar -xvf -
isiloncluster1-1# isi pkg install patch-159065.tar
Preparing to install the package...
Checking the package for installation...
Installing the package
Committing the installation...
Package successfully installed.
5. Verify your HDFS license.
isiloncluster1-1# isi license
Module License Status Configuration Expiration Date
------ -------------- ------------- ---------------
HDFS Evaluation Not Configured November 12, 2016
6. Create the HDFS root directory. This is usually called hadoop and must be within
the access zone directory.
isiloncluster1-1# mkdir -p /ifs/isiloncluster1/zone1/hadoop
7. Set the HDFS root directory for the access zone.
isiloncluster1-1# isi zone zones modify zone1 
--hdfs-root-directory /ifs/isiloncluster1/zone1/hadoop
8. Set the HDFS block size used for reading from Isilon.
isiloncluster1-1# isi hdfs settings modify --default-block-size 128M
9. Create an indicator file so that we can easily determine when we are looking at
your Isilon cluster via HDFS.
isiloncluster1-1# touch 
/ifs/isiloncluster1/zone1/hadoop/THIS_IS_ISILON_isiloncluster1_zone1
10.Copy the scripts (isilon_create_users.sh & isilon_create_directories.sh) you
downloaded from http://www.github.com/bonibruno/BigInsights to Isilon:
[user@workstation ~]$ scp isilon_create_*.sh 
root@isilon_node_ip_address:/ifs/isiloncluster1/scripts
11.Execute the script isilon_create_users.sh. This script will create all required
users and groups for IBM BigInsights v 4.0.
Warning: The script isilon_create_users.sh will create local user and group accounts on
your Isilon cluster for Hadoop services. If you are using a directory service such as Active
Directory and you want these users and groups to be defined in your directory service,
then DO NOT run this script.
Instead, refer to the OneFS documentation and EMC Isilon Best Practices for Hadoop Data
Storage.
Script Usage:
isilon_create_users.sh --dist <DIST> [--startgid <GID>] [--startuid <UID>] [--zone <ZONE>]
dist - This will correspond to your Hadoop distribution - bi4.0
startgid - Group IDs will begin with this value. For example: 1000
startuid - User IDs will begin with this value. This is generally the same as startgid. For
example: 1000.
zone - Access Zone name. For example: zone1
isiloncluster1-1# bash /ifs/isiloncluster1/scripts/isilon_create_users.sh 
--dist bi4.0 --startgid 1000 --startuid 1000 --zone zone1
Example output of script is shown below:
Info: Hadoop distribution: bi
Info: groups will start at GID 1000
Info: users will start at UID 1000
Info: will put users in zone: zone1
Info: HDFS root: /ifs/isiloncluster1/hadoop
Failed to add member UID:1001 to group GROUP:hadoop: User is already in local group
SUCCESS -- Hadoop users created successfully!
Done!
______________________________________________________________________
Note: The "User is already in local group" message is expected; this user corresponds to
the hadoop user, which is already in the hadoop group.
12. Execute the script isilon_create_directories.sh. This script will create all
required directories with the appropriate ownership and permissions.
Script Usage:
isilon_create_directories.sh --dist <DIST> [--fixperm] [--zone <ZONE>]
dist - This will correspond to your Hadoop distribution - bi4.0
fixperm - Updates ownership and permissions on hadoop directories.
zone - Access Zone name. For example: zone1
isiloncluster1-1# bash /ifs/isiloncluster1/scripts/isilon_create_directories.sh 
--dist bi4.0 --fixperm --zone zone1
13. Map the hdfs user to the Isilon superuser. This will allow the hdfs user to chown
(change ownership of) all files during IBM BigInsights installation.
______________________________________________________________________
Warning: The command below will restart the HDFS service on Isilon to ensure that any
cached user mapping rules are flushed. This will temporarily interrupt any HDFS
connections coming from other Hadoop clusters.
______________________________________________________________________
isiloncluster1-1# isi zone zones modify --user-mapping-rules='hdfs=>root' --zone zone1
isiloncluster1-1# isi services isi_hdfs_d disable ; isi services isi_hdfs_d enable
The service 'isi_hdfs_d' has been disabled.
The service 'isi_hdfs_d' has been enabled.
Create DNS Records for Isilon
You will now create the required DNS records that will be used to access your Isilon
cluster.
1. Create a delegation record so that DNS requests for the zone
isiloncluster1.example.com are delegated to the Service IP that will be defined on
your Isilon cluster. The Service IP can be any unused static IP address in your lab
subnet.
2. Create a CNAME alias for your Isilon SmartConnect zone. For example, create a
CNAME record for mycluster1-hdfs.example.com that targets
subnet0-pool0.isiloncluster1.example.com.
3. Test name resolution.
[user@workstation ~]$ ping mycluster1-hdfs.example.com
PING subnet0-pool0.isiloncluster1.example.com (10.11.12.13) 56(84) bytes of data.
64 bytes from 10.11.12.13: icmp_seq=1 ttl=64 time=1.15 ms
Prepare Linux Compute Nodes
Linux Operating System packages needed for IBM BigInsights:
1. Compatibility Libraries
2. Networking Tools
3. Perl Support
4. Ruby Support
5. Web Services add on
6. PHP Support
7. Web Server
8. MySQL*
9. PostgreSQL*
10. SNMP Support
11. Development Tools
12. Korn Shell
Enable NTP on all Linux Compute nodes
1. Edit the /etc/ntp.conf file and add your NTP server.
2. Start NTP: service ntpd start
3. Enable NTP at boot: chkconfig --level 2345 ntpd on
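Run together, the steps above amount to the following commands, executed as root on each node; ntp.example.com is a placeholder for your NTP server:

```shell
# Add your time source, start ntpd now, and enable it for runlevels 2-5
echo "server ntp.example.com iburst" >> /etc/ntp.conf
service ntpd start
chkconfig --level 2345 ntpd on
```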
Disable SELinux on each node if enabled before installing Ambari.
1. Edit /etc/selinux/config
2. Set SELINUX=disabled
3. Reboot
____________________________________________________________________
Note: SELinux can be disabled temporarily with the “setenforce 0” command.
____________________________________________________________________
Check UMASK Settings
The umask setting on each node should be set to 0022 in /etc/profile and /etc/bashrc.
Modify the existing umask entry if needed, e.g. "umask 0022".
Set ulimit Properties
1. Edit /etc/security/limits.d/90-nproc.conf
#set for all users
* hard nofile 65536
* soft nofile 65536
* hard nproc 65536
* soft nproc 65536
Kernel Modifications
1. Edit /etc/sysctl.conf and add the following:
vm.swappiness=5
kernel.pid_max=4194303
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
net.ipv4.ip_local_port_range = 1024 64000
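The settings above are read at boot; to apply them immediately, reload the file as root and spot-check a value:

```shell
sysctl -p              # load all settings from /etc/sysctl.conf
sysctl vm.swappiness   # verify; expect: vm.swappiness = 5
```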
Create IBM BigInsights Hadoop Users and Groups
Create required users on all Linux nodes. It is recommended to create all Hadoop users
before installing IBM BigInsights. Use the bi_create_users.sh script obtained from:
http://www.github.com/bonibruno/BigInsights
[user@workstation ~]$ scp bi_create_users.sh [node1]:/root
Run the script, e.g. # ./bi_create_users.sh
Repeat the above for all nodes.
Configure Passwordless SSH
Configure passwordless SSH for all Linux nodes.
1. Create authentication SSH keys:
ssh-keygen -f id_rsa -t rsa -N ""
2. Create .ssh directories on all nodes
ssh root@[node1]
mkdir -p .ssh
cd .ssh
Upload generated keys to all hosts:
cat id_rsa.pub | ssh root@[node1] 'cat >> .ssh/authorized_keys'
Repeat above for all nodes.
3. Set permissions on .ssh directory
ssh root@[node1] "chmod 700 .ssh; chmod 640 .ssh/authorized_keys"
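With a node list in hand, steps 2 and 3 can be wrapped in one loop, run from the host where the key pair was generated; the node names below are placeholders:

```shell
# Push the public key to every node and lock down permissions.
for node in node1.example.com node2.example.com node3.example.com; do
  ssh root@"$node" 'mkdir -p ~/.ssh'
  cat id_rsa.pub | ssh root@"$node" 'cat >> ~/.ssh/authorized_keys'
  ssh root@"$node" 'chmod 700 ~/.ssh; chmod 640 ~/.ssh/authorized_keys'
done
```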
Additional Linux Packages to Install
Install the following packages on all Linux compute nodes.
 deltarpm
 python-deltarpm
 createrepo
 pam-1.1.1-17.el6.i686.rpm
 mysql-connector-java-5.1.17-6.el6.noarch.rpm
 ksh
 nc
 libdbi
 libstdc++
 libaio
 java-1.7.0-openjdk-devel
 python-paramiko
 python-rrdtool-1.4.5-1.el6.rfx.x86_64
 snappy-1.0.5-1.el6.x86_64
 web-ui-framework
Install the above packages using the yum install command.
Test DNS Resolution
Make sure all compute nodes resolve with a fully qualified domain name.
Ping each host with the associated FQDN and make sure it is reachable by FQDN.
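One way to script this check is with getent, which consults the same resolver path (/etc/hosts and DNS) the operating system uses. The helper function and the example host list are a sketch; substitute your own FQDNs:

```shell
# Report whether a host name resolves through the system resolver.
check_fqdn() {
  if getent hosts "$1" > /dev/null; then
    echo "$1 resolves"
  else
    echo "$1 DOES NOT resolve" >&2
    return 1
  fi
}

# Substitute your compute-node FQDNs, e.g.:
#   for host in host1.example.com host2.example.com; do
for host in localhost; do
  check_fqdn "$host"
done
```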
Edit sudoers file on all Linux compute nodes.
1. Edit /etc/sudoers
## Additions needed for IBM BigInsights
hadoop ALL=(ALL) NOPASSWD: ALL
bigsql ALL=(ALL) NOPASSWD: ALL
Check IBM’s BigInsights Website for more info on preparing Linux nodes.
http://www-01.ibm.com/support/knowledgecenter/SSPT3X_4.0.0/com.ibm.swg.im.infosphere.biginsights.install.doc/doc/install_prepare.html
Installing IBM Open Platform (OP)
Download IBM Open Platform Software
Log into the IBM Passport Advantage web portal with your IBM assigned credentials and
download the following packages onto the designated Ambari server node:
• BI-AH-1.0.0.1-IOP-4.0.x86_64.bin
• IOP-4.0.0.0.x86_64.rpm
• iop-4.0.0.0.x86_64.tar.gz
• iop-utils-1.0-iop-4.0.x86_64.tar.gz
Create IBM Open Platform Repository
The IBM Open Platform with Apache Hadoop uses the repository-based Ambari installer.
You have two options for specifying the location of the repository from which Ambari
obtains the component packages.
The IBM Open Platform with Apache Hadoop installation includes OpenJDK 1.7.0. During
installation, you can either install the version provided or make sure Java™ 7 is installed
on all nodes in the cluster.
1. Log in to your Linux cluster as root, or as a user with root privileges.
2. Ensure that the nc package is installed on all nodes:
yum install -y nc
If you installed the Basic Server option on your server, the nc package might not be
installed, which might cause DataNode failures in the IBM Open Platform with Apache
Hadoop.
3. Locate the IOP-4.0.0.0.x86_64.rpm file you downloaded from the download site. Run the
following command to install the ambari.repo file into /etc/yum.repos.d:
yum install IOP-4.0.0.0.x86_64.rpm
If using a mirror repository, edit the file /etc/yum.repos.d/ambari.repo and replace
baseurl=http://ibm-open-platform.ibm.com/repos/Ambari/RHEL6/x86_64/1.7
with your mirror URL. For example,
baseurl=http://<web.server>/repos/Ambari/RHEL6/x86_64/1.7/
Disable the gpgcheck in the ambari.repo file. To disable signature validation,
change gpgcheck=1 to gpgcheck=0.
Alternatively, you can keep gpgcheck on and change the public key file location to the
mirror Ambari repository. To do this, change the following
gpgkey=http://ibm-open-platform.ibm.com/repos/Ambari/RHEL6/x86_64/1.7/BI-GPG-KEY.public
to the following:
gpgkey=http://<web.server>/repos/Ambari/RHEL6/x86_64/1.7/BI-GPG-KEY.public
4. Clean the yum cache on each node so that the right packages from the remote repository
are seen by your local yum.
>sudo yum clean all
5. Install the Ambari server on the intended management node, using the following
command:
>sudo yum install ambari-server
Accept the install defaults.
6. If you are using a mirror repository, after you install the Ambari server, update the
following file with the mirror repository URLs.
/var/lib/ambari-server/resources/stacks/BigInsights/4.0/repos/repoinfo.xml
In the file, change the information from the original content to the modified content.

Original content:

<os type="redhat6">
  <repo>
    <baseurl>http://ibm-open-platform.ibm.com/repos/IOP/RHEL6/x86_64/4.0</baseurl>
    <repoid>IOP-4.0</repoid>
    <reponame>IOP</reponame>
  </repo>
  <repo>
    <baseurl>http://ibm-open-platform.ibm.com/repos/IOP-UTILS/RHEL6/x86_64/1.0</baseurl>
    <repoid>IOP-UTILS-1.0</repoid>
    <reponame>IOP-UTILS</reponame>
  </repo>
</os>

Modified content:

<os type="redhat6">
  <repo>
    <baseurl>http://<web.server>/repos/IOP/RHEL6/x86_64/4.0</baseurl>
    <repoid>IOP-4.0</repoid>
    <reponame>IOP</reponame>
  </repo>
  <repo>
    <baseurl>http://<web.server>/repos/IOP-UTILS/RHEL6/x86_64/1.0</baseurl>
    <repoid>IOP-UTILS-1.0</repoid>
    <reponame>IOP-UTILS</reponame>
  </repo>
</os>
Edit the /etc/ambari-server/conf/ambari.properties file and change the information from
the original content to the modified content.

Original content:

jdk1.7.url=http://ibm-open-platform.ibm.com/repos/IOP-UTILS/RHEL6/x86_64/1.0/openjdk/jdk-1.7.0.tar.gz

Modified content:

jdk1.7.url=http://<web.server>/repos/IOP-UTILS/RHEL6/x86_64/1.0/openjdk/jdk-1.7.0.tar.gz
7. Set up the Ambari server, using the following command:
>sudo ambari-server setup
Accept the setup preferences.
A Java JDK is installed as part of the Ambari server setup. However, the Ambari server
setup also allows you to reuse an existing JDK. The command is:
ambari-server setup -j /full/path/to/JDK
The JDK path set by the -j parameter must be the same on each node in the cluster.
8. Start the Ambari server, using the following command:
>sudo ambari-server start
9. If the Ambari server had been installed on your node previously, the node may contain
old cluster information. Reset the Ambari server to clean up its cluster information in the
database, using the following commands:
>sudo ambari-server stop
>sudo ambari-server reset
>sudo ambari-server start
10. Access the Ambari web user interface from a web browser by using the server name
(the fully qualified domain name, or the short name) on which you installed the software,
and port 8080. For example, enter abc.com:8080.
You can use any available port other than 8080 that will allow you to connect to the
Ambari server. In some networks, port 8080 is already in use. To use another port, do
the following:
a. Edit the ambari.properties file:
vi /etc/ambari-server/conf/ambari.properties
b. Add a line in the file to select another port:
client.api.port=8081
c. Save the file and restart the Ambari server:
ambari-server restart
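Steps a through c reduce to two commands, shown here as a sketch; 8081 is just an example port:

```shell
echo "client.api.port=8081" >> /etc/ambari-server/conf/ambari.properties
ambari-server restart
```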
11. Log in to the Ambari server with the default username and password: admin/admin.
The default username and password is required only for the first login. You can
configure users and groups after the first login to the Ambari web interface.
12. On the Welcome page, click Launch Install Wizard.
13. On the Get Started page, enter a name for the cluster you want to create. The name
cannot contain blank spaces or special characters. Click Next.
14. You will deploy IBM Open Platform for Apache Hadoop with EMC Isilon. Ambari Server
allows for the immediate use of an Isilon cluster for all HDFS services (NameNode and
DataNode); no reconfiguration is necessary once the IBM Open Platform install is
completed.
1. SSH into Isilon as root and configure the Ambari Agent.
isiloncluster1-1# isi zone zones modify zone1 --hdfs-ambari-namenode mycluster1-hdfs.example.com
isiloncluster1-1# isi zone zones modify zone1 --hdfs-ambari-server manager-svr-1.example.com
15. On the Select Stack page, click the Stack version you want to install (BigInsights™ 4.0).
Click Next.
16. On the Install Options page, in Target Hosts, add the list of Linux hosts that the
Ambari server will manage and on which the IBM Open Platform with Apache Hadoop
software will be deployed, one node per line. For example, enter
host1.example.com
host2.example.com
host3.example.com
host4.example.com
In Host Registration Information, select one of the two options:
Provide the SSH Private Key to automatically register hosts
Click SSH Private Key. The private key file is /root/.ssh/id_rsa, where the root user
installed the Ambari server. Click Choose File to locate the private key file, or copy and
paste the key into the text box manually. You should have retained a copy of the SSH
private key (.ssh/id_rsa) in your local directory when you set up password-less SSH.
Click the Register and Confirm button.
____________________________________________________________________
Note: After the Linux hosts register, click the back button and select Perform manual
registration for Isilon; do not use SSH.
____________________________________________________________________
Isilon has an ambari-agent within OneFS and needs to be manually registered in Ambari.
After registering Isilon manually, click the Next button. You should see the Ambari
agents on both your Linux hosts and Isilon become registered.
17. On the Confirm Hosts page, verify that the correct hosts for your cluster have been
located and that those hosts have the correct directories, packages, and processes to
continue the installation.
If hosts were selected in error, click the check boxes next to the hosts you want to
remove. Click Remove Selected. To remove a single host, click Remove in
the Action column.
If warnings are found during the check process, you can click Click here to see the
warnings to see what caused the warnings. The Host Checks page identifies any issues
with the hosts. For example, a host may have Transparent Huge Pages or Firewall issues.
You can ignore errors related to user names and groups as we pre-created the
users in the pre-installation steps of this document.
After you resolve the issues, click Rerun Checks on the Host Checks page. When you
have confirmed the hosts, click Next.
18. On the Choose Services page, select the services you want to install.
Ambari shows a confirmation message to install the required service dependencies. For
example, when selecting Oozie only, the Ambari web interface shows messages for
accepting YARN/MR2, HDFS and Zookeeper installations. It also shows Nagios and
Ganglia for monitoring and alerting, but they are not required services.
19. On the Assign Masters page, assign the NameNode and SNameNode components to the
Isilon SmartConnect address, e.g. mycluster1-hdfs.example.com. The rest of the services
can be deployed per the recommended services layout - refer back to Table 1. Make
sure you assign NameNode and SNameNode only to the Isilon SmartConnect
address and to none of the Linux nodes, e.g. only mycluster1-hdfs.example.com. Click
Next.
On the Assign Slaves and Clients page, assign the components to the Linux hosts in your
cluster and make sure DataNode is assigned only to Isilon.
Assign Client to the client nodes. Click Next.
Tip: If you anticipate adding the Big SQL service at some later time, you must include all
clients on all the anticipated Big SQL worker nodes. Big SQL specifically needs the HDFS,
Hive, HBase, Sqoop, HCat, and Oozie clients.
20. On the Customize Services page, select configuration settings for the services selected.
Default values are filled in automatically when available and they are the recommended
values. The installation wizard prompts you for required fields (such as password entries)
by displaying a number in a circle next to an installed service.
Assign passwords to Hive, Oozie, and any other selected services that require them.
The following settings should be checked:
• YARN Node Manager log-dirs
• YARN Node Manager local-dirs
• HBase local directory
• ZooKeeper directory
• Oozie Data Dir
• Storm storm.local.dir
Click the number and enter the requested information in the field outlined in red. Make
sure that the service port that is set is not already used by another component. For
example, the Knox gateway port is set to 8443 by default. But when the Ambari server
is set up with HTTPS and the SSL port is 8443, you must change the Knox gateway port
to some other value.
____________________________________________________________________
Note: If you are working in an LDAP environment where users are set up centrally by the
LDAP administrator and therefore already exist, selecting the defaults can cause the
installation to fail. Open the Misc tab, and check the box to ignore user modification
errors.
21. When you have completed the configuration of the services, click Next.
22. On the Review page, verify that your settings are correct. Click Deploy.
23. The Install, Start, and Test page shows the progress of the installation. The progress
bar at the top of the page gives the overall status while the main section of the page
gives the status for each host. Logs for a specific task can be displayed by clicking on the
task. Click the link in the Message column to find out what tasks have been completed for
a specific host or to see the warnings that have been encountered. When the message
"Successfully installed and started the services" appears, click Next.
24. On the Summary page, review the accomplished tasks. Click Complete to go to the IBM
Open Platform with Apache Hadoop dashboard.
Validating IBM Open Platform Install
Ambari provides service checks for all the supported services. These checks run
automatically after each service installation, or they can be run manually at any time. You
can access the Ambari web interface and use the Services View to make sure all the
components pass their checks successfully.
The following steps provide another way to validate your installation.
1. As the root user on a node on which Apache Hadoop is installed, enter the following
command to become the ambari-qa user:
su - ambari-qa
2. As the ambari-qa user, run the following command:
export HADOOP_MR_DIR=/usr/iop/current/hadoop-mapreduce-client
# Generate data with 1000 rows. Each row is about 100 bytes.
yarn jar $HADOOP_MR_DIR/hadoop-mapreduce-examples.jar teragen 1000 /tmp/tgout
# Sort data
yarn jar $HADOOP_MR_DIR/hadoop-mapreduce-examples.jar terasort /tmp/tgout /tmp/tsout
# Validate data
yarn jar $HADOOP_MR_DIR/hadoop-mapreduce-examples.jar teravalidate /tmp/tsout /tmp/tvout
If the job is successful, you will see a log record similar to the following:
INFO mapreduce.Job: Job job_id completed successfully
Browse to your cluster on port 8088 to see the results of your validation tests, e.g.
http://x.x.x.x:8088/cluster. Example YARN test results are shown below.
Adding a Hadoop User
You must add a user account for each Linux user that will submit MapReduce jobs. The
procedure below can be used to add a user named hduser1 as an example.
1. Add user to Isilon.
isiloncluster1-1# isi auth groups create hduser1 --zone zone1 --provider local
isiloncluster1-1# isi auth users create hduser1 --primary-group hduser1 --zone zone1 
--provider local --home-directory /ifs/isiloncluster1/zone1/hadoop/user/hduser1
2. Add user to Hadoop nodes.
[root@mycluster1-master-0 ~]# adduser hduser1
3. Create the user’s home directory on HDFS.
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -mkdir -p /user/hduser1
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -chown hduser1:hduser1 
/user/hduser1
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -chmod 755 /user/hduser1
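If you have several users to add, the Linux-node and HDFS steps above can be wrapped in a small helper. This is a sketch that assumes an HDFS client is configured on the node where it runs; the Isilon-side user creation in step 1 must still be performed on Isilon for each user:

```shell
# Create a local account and a matching HDFS home directory for one user.
add_hadoop_user() {
  u="$1"
  adduser "$u"
  sudo -u hdfs hdfs dfs -mkdir -p "/user/$u"
  sudo -u hdfs hdfs dfs -chown "$u:$u" "/user/$u"
  sudo -u hdfs hdfs dfs -chmod 755 "/user/$u"
}

add_hadoop_user hduser1
```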
Additional Service Tests
The tests below should be performed to ensure a proper installation. Perform the tests in the
order shown. You must create the Hadoop user hduser1 before proceeding.
HDFS
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -ls /
Found 5 items
-rw-r--r-- 1 root hadoop 0 2014-08-05 05:59 /THIS_IS_ISILON
drwxr-xr-x - hbase hbase 148 2014-08-05 06:06 /hbase
drwxrwxr-x - solr solr 0 2014-08-05 06:07 /solr
drwxrwxrwt - hdfs supergroup 107 2014-08-05 06:07 /tmp
drwxr-xr-x - hdfs supergroup 184 2014-08-05 06:07 /user
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -put -f /etc/hosts /tmp
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -cat /tmp/hosts
127.0.0.1 localhost
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -rm -skipTrash /tmp/hosts
[root@mycluster1-master-0 ~]# su - hduser1
[hduser1@mycluster1-master-0 ~]$ hdfs dfs -ls /
Found 5 items
-rw-r--r-- 1 root hadoop 0 2014-08-05 05:59 /THIS_IS_ISILON
drwxr-xr-x - hbase hbase 148 2014-08-05 06:28 /hbase
drwxrwxr-x - solr solr 0 2014-08-05 06:07 /solr
drwxrwxrwt - hdfs supergroup 107 2014-08-05 06:07 /tmp
drwxr-xr-x - hdfs supergroup 209 2014-08-05 06:39 /user
[hduser1@mycluster1-master-0 ~]$ hdfs dfs -ls
...
YARN/MAPREDUCE
[hduser1@mycluster1-master-0 ~]$ hadoop jar 
/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar 
pi 10 1000
...
Estimated value of Pi is 3.14000000000000000000
[hduser1@mycluster1-master-0 ~]$ hadoop fs -mkdir in
You can put any file into the in directory. It will be used as the data source for subsequent tests.
[hduser1@mycluster1-master-0 ~]$ hadoop fs -put -f /etc/hosts in
[hduser1@mycluster1-master-0 ~]$ hadoop fs -ls in
...
[hduser1@mycluster1-master-0 ~]$ hadoop fs -rm -r out
[hduser1@mycluster1-master-0 ~]$ hadoop jar 
/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar 
wordcount in out
...
[hduser1@mycluster1-master-0 ~]$ hadoop fs -ls out
Found 4 items
-rw-r--r-- 1 hduser1 hduser1 0 2014-08-05 06:44 out/_SUCCESS
-rw-r--r-- 1 hduser1 hduser1 24 2014-08-05 06:44 out/part-r-00000
-rw-r--r-- 1 hduser1 hduser1 0 2014-08-05 06:44 out/part-r-00001
-rw-r--r-- 1 hduser1 hduser1 0 2014-08-05 06:44 out/part-r-00002
[hduser1@mycluster1-master-0 ~]$ hadoop fs -cat out/part*
localhost 1
127.0.0.1 1
Browse to the YARN Resource Manager GUI http://mycluster1-master-0.example.com:8088/
Browse to the MapReduce History Server GUI http://mycluster1-master-0.lab.example.com:19888/.
In particular, confirm that you can view the complete logs for task attempts.
HIVE
[hduser1@mycluster1-master-0 ~]$ hadoop fs -mkdir -p sample_data/tab1
[hduser1@mycluster1-master-0 ~]$ cat - > tab1.csv
1,true,123.123,2012-10-24 08:55:00
2,false,1243.5,2012-10-25 13:40:00
3,false,24453.325,2008-08-22 09:33:21.123
4,false,243423.325,2007-05-12 22:32:21.33454
5,true,243.325,1953-04-22 09:11:33
Type <Control+D>.
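Instead of typing the rows interactively and pressing Control+D, the same file can be created non-interactively with a here-document:

```shell
# Non-interactive alternative to "cat - > tab1.csv" followed by Control+D:
# a quoted here-document writes the same five sample rows.
cat > tab1.csv <<'EOF'
1,true,123.123,2012-10-24 08:55:00
2,false,1243.5,2012-10-25 13:40:00
3,false,24453.325,2008-08-22 09:33:21.123
4,false,243423.325,2007-05-12 22:32:21.33454
5,true,243.325,1953-04-22 09:11:33
EOF
wc -l tab1.csv   # prints: 5 tab1.csv
```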
[hduser1@mycluster1-master-0 ~]$ hadoop fs -put -f tab1.csv sample_data/tab1
[hduser1@mycluster1-master-0 ~]$ hive
hive>
DROP TABLE IF EXISTS tab1;
CREATE EXTERNAL TABLE tab1
(
id INT,
col_1 BOOLEAN,
col_2 DOUBLE,
col_3 TIMESTAMP
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hduser1/sample_data/tab1';
DROP TABLE IF EXISTS tab2;
CREATE TABLE tab2
(
id INT,
col_1 BOOLEAN,
col_2 DOUBLE,
month INT,
day INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
INSERT OVERWRITE TABLE tab2
SELECT id, col_1, col_2, MONTH(col_3), DAYOFMONTH(col_3)
FROM tab1 WHERE YEAR(col_3) = 2012;
...
OK
Time taken: 28.256 seconds
hive> show tables;
OK
tab1
tab2
Time taken: 0.889 seconds, Fetched: 2 row(s)
hive> select * from tab1;
OK
1 true 123.123 2012-10-24 08:55:00
2 false 1243.5 2012-10-25 13:40:00
3 false 24453.325 2008-08-22 09:33:21.123
4 false 243423.325 2007-05-12 22:32:21.33454
5 true 243.325 1953-04-22 09:11:33
Time taken: 1.083 seconds, Fetched: 5 row(s)
hive> select * from tab2;
OK
1 true 123.123 10 24
2 false 1243.5 10 25
Time taken: 0.094 seconds, Fetched: 2 row(s)
hive> select * from tab1 where id=1;
OK
1 true 123.123 2012-10-24 08:55:00
Time taken: 15.083 seconds, Fetched: 1 row(s)
hive> select * from tab2 where id=1;
OK
1 true 123.123 10 24
Time taken: 13.094 seconds, Fetched: 1 row(s)
hive> exit;
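The interactive session above can also be scripted: the Hive CLI accepts -e for a single statement and -f for a file of statements, which is convenient for repeatable smoke tests. The file path below is an arbitrary example:

```shell
# Write the smoke-test statement to a file (the path is an arbitrary example):
echo 'SELECT * FROM tab1 WHERE id = 1;' > /tmp/tab1_check.sql
cat /tmp/tab1_check.sql

# On a node with the Hive client installed, the same checks run
# non-interactively:
#   hive -f /tmp/tab1_check.sql           # statements from a file
#   hive -e 'SELECT COUNT(*) FROM tab2;'  # a single statement
```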
HBASE
[hduser1@mycluster1-master-0 ~]$ hbase shell
hbase(main):001:0> create 'test', 'cf'
0 row(s) in 3.3680 seconds
=> Hbase::Table - test
hbase(main):002:0> list 'test'
TABLE
test
1 row(s) in 0.0210 seconds
=> ["test"]
hbase(main):003:0> put 'test', 'row1', 'cf:a', 'value1'
0 row(s) in 0.1320 seconds
hbase(main):004:0> put 'test', 'row2', 'cf:b', 'value2'
0 row(s) in 0.0120 seconds
hbase(main):005:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:a, timestamp=1407542488028, value=value1
row2 column=cf:b, timestamp=1407542499562, value=value2
2 row(s) in 0.0510 seconds
hbase(main):006:0> get 'test', 'row1'
COLUMN CELL
cf:a timestamp=1407542488028, value=value1
1 row(s) in 0.0240 seconds
hbase(main):007:0> quit
Ambari Service Check
Ambari has built-in functional tests for each component. These are executed automatically
when you install your cluster with Ambari. To execute them after installation, select the service
in Ambari, click the Service Actions button, and select Run Service Check.
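Ambari also exposes service checks through its REST API, which is handy for scripting the same validation after an install. A minimal sketch, assuming the default Ambari port and the hypothetical host and cluster names used earlier in this guide; the exact request format may vary by Ambari version:

```shell
# Hypothetical values; substitute your Ambari host and cluster name.
AMBARI_HOST=mycluster1-master-0.example.com
CLUSTER=mycluster1
URL="http://${AMBARI_HOST}:8080/api/v1/clusters/${CLUSTER}/requests"

# Request body that queues an HDFS service check -- the same action as
# Service Actions > Run Service Check in the Ambari UI.
BODY='{"RequestInfo":{"context":"HDFS Service Check","command":"HDFS_SERVICE_CHECK"},"Requests/resource_filters":[{"service_name":"HDFS"}]}'
echo "POST ${URL}"

# Submit it with the default admin/admin credentials used in this guide:
#   curl -u admin:admin -H 'X-Requested-By: ambari' -X POST "$URL" -d "$BODY"
```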
Installing IBM Value Packages
Before You Begin
Please note that the BigInsights Analyst and BigInsights Data Scientist value packages have
been sanity tested on EMC Isilon, but have not been performance profiled or tested under
load with Isilon OneFS 7.2.0.3. EMC and IBM plan to validate these components under load
as part of future integration efforts. Refer to the EMC and IBM BigInsights Joint Support
Statement for further details.
You must acquire the software from Passport Advantage. The acquired software has a *.bin
extension. The name of the *.bin file depends on whether the BigInsights Analyst or the
BigInsights Data Scientist module was downloaded.
When you run the *.bin file, configuration files are copied to the appropriate locations so
that Ambari sees the value-add services as available. When you add the value-add
services through Ambari, additional software packages can be downloaded. If the
Hadoop cluster cannot directly access the internet, a local mirror repository can be
created.
Where you perform the following steps depends on whether the Hadoop cluster has
direct internet access.
 If the Hadoop cluster has direct access to the internet, perform the steps from the
Ambari server of the Hadoop cluster.
 If the Hadoop cluster does not have direct internet access, perform the steps from
a Linux host with direct internet access. Then, transfer the files, as required, to a
local repository mirror.
Installation Procedure
1. Update the permissions on the downloaded *.bin file to enable execute.
chmod +x <package_name>.bin
2. Run the *.bin file to extract and install the services in the module.
./<package_name>.bin
where <package_name> is BI-Analyst-xxxxx.bin for the Analyst module or
BI-DS-xxxxx.bin for the Data Scientist module.
3. At the prompt, agree to the license terms. Reply yes or y to continue the installation.
4. At the prompt, choose whether to do an online (option 1) or offline
(option 2) install.
a. An online install lays out the Ambari service configuration files and
updates the repository locations in the Ambari server file. Skip to step 6.
b. An offline install initiates a download of files to set up a local repository
mirror. A subdirectory called BigInsights is created, with the RPMs and
associated files located in BigInsights/packages.
5. Set up a local repository.
A local repository is required if the Hadoop cluster cannot connect directly to the internet,
or if you wish to avoid multiple downloads of the same software when installing services
across multiple nodes. In the following steps, the host that performs the repository mirror
function is called the repository server. If you do not have an additional Linux host, you
can use one of the Hadoop management nodes. The repository server must be accessible
over the network by the Hadoop cluster. The repository server requires an HTTP web
server. The following instructions describe how to set up a repository server by using a
Linux host with an Apache HTTP server.
a. On the repository server, if the Apache HTTP server is not installed,
install it:
yum install httpd
b. On the repository server, ensure that the createrepo package is
installed.
c. On the repository server, create a directory for your value-add
repository, such as <mirror web server document
root>/repos/valueadds. For Apache httpd, the default document root is
/var/www/html.
mkdir /var/www/html/repos/valueadds
d. Because you selected option 2 in step 4, the RPMs were downloaded to a
subdirectory called BigInsights/packages. Copy all of the RPMs to the
<your.mirror.web.server document root>/repos/valueadds directory:
cp BigInsights/packages/* /var/www/html/repos/valueadds/
e. Start the web server. If you use Apache httpd, start it by using either of
the following commands:
apachectl start
service httpd start
f. Test your local repository by browsing to the web directory:
http://<your.mirror.web.server>/repos/valueadds
You should see all of the files that you copied to the repository server.
g. On the repository server, run the createrepo command to initialize the
repository:
createrepo /var/www/html/repos/valueadds
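Steps 5a through 5g can be sketched as a single script, run as root on the repository server. This is a sketch under the assumption that Apache httpd uses its default document root, /var/www/html, and that the BigInsights/packages directory from the offline install is in the current directory:

```shell
# Consolidated sketch of steps 5a-5g; run as root on the repository server.
REPO_DIR=/var/www/html/repos/valueadds

if [ -d /var/www/html ]; then           # only on a host with httpd installed
  yum install -y createrepo             # tool that builds yum repo metadata
  mkdir -p "$REPO_DIR"
  cp BigInsights/packages/*.rpm "$REPO_DIR"/
  service httpd start
  createrepo "$REPO_DIR"                # writes repodata/ for yum clients
  # Sanity check from any cluster node:
  #   curl http://<your.mirror.web.server>/repos/valueadds/repodata/repomd.xml
else
  echo "httpd document root not found; install httpd first" >&2
fi
```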
h. In the BigInsights/packages directory, find the RPM to install on the
Ambari Server host of the Hadoop cluster:
BigInsights Analyst
BI-Analyst-X.X.X.X-IOP-X.X.x86_64.rpm
BigInsights Data Scientist
BI-DS-X.X.X.X-IOP-X.X.x86_64.rpm
Tip: The BigInsights Data Scientist module also entitles you to the features of the
BigInsights Analyst module. Therefore, consider doing the yum install for both of the RPM
packages.
Then, copy the file to the Ambari Server host and install the RPM by using the following
command:
sudo yum install <BI-xxx-1.0.0.1-IOP...>.rpm
i. On the Ambari Server node, open the file /var/lib/ambari-
server/resources/stacks/BigInsights/<version_number>/repos/repoinfo.xml.
If the file does not exist, create it. Ensure that the <baseurl> element for
the BIGINSIGHTS-VALUEPACK <repo> entry points to your repository
server; there might be multiple <repo> sections. Make sure that the URL
you tested in step 5.f exactly matches the value in the <baseurl> element.
For example, the repoinfo.xml might look like the following content after
you change http://ibm-open-platform.ibm.com/repos/BigInsights-Valuepacks/
to http://your.mirror.web.server/repos/valueadds:
<repo>
<baseurl>http://<your.mirror.web.server>/repos/valueadds</baseurl>
<repoid>BIGINSIGHTS-VALUEPACK</repoid>
<reponame>BIGINSIGHTS-VALUEPACK</reponame>
</repo>
Note: The new <repo> section might appear as a single line.
Tip: If you later find an error in this configuration file, make corrections and run the
following command:
yum clean all
Then, restart the Ambari server.
j. When the module is installed, restart the Ambari server.
ambari-server restart
k. Open the Ambari web interface and log in. The default address is the
following URL:
http://<server-name>:8080
The default login name is admin and the default password is admin.
l. Click Actions > Add service. In the list of services you will see the
services that you previously added as well as the BigInsights services
you can now add.
Select IBM BigInsights Service to Install
Select the service that you want to install and deploy. Even though your module might
contain multiple services, install only the specific service that you want plus the
BigInsights™ Home service. Installing one value-add service at a time is recommended.
See the service-specific installation instructions for more information.
After all of the IBM BigInsights services are installed, the Ambari software list should
show a green check mark next to each service.
Installing BigInsights Home
The BigInsights Home service is the main interface to launch BigInsights - BigSheets,
BigInsights - Text Analytics, and BigInsights - Big SQL.
The BigInsights Home service requires Knox to be installed, configured and started.
Open a browser and access the Ambari server dashboard. The following is the default URL:
http://<server-name>:8080
The default user name is admin, and the default password is admin.
In the Ambari dashboard, click Actions > Add Service.
In the Add Service Wizard > Choose Services, select the BigInsights – BigInsights Home
service. Click Next. If you do not see the option for BigInsights – BigInsights Home, follow the
instructions described in Installing the BigInsights value-add packages.
In the Assign Masters page, select a Management node (edge node) that your users can
communicate with. BigInsights Home is a web application that your users must be able to open
with a web browser.
In the Assign Slaves and Clients page, make selections to assign slaves and clients.
The nodes that you select will have JSqsh (an open source, command-line SQL interface for
Big SQL and other database engines) and an SFTP client installed. Select the nodes that
might be used to ingest data over SFTP, or where you might want to work interactively with
Big SQL scripts or other databases.
Click Next to review any options that you might want to customize.
Click Deploy.
If the BigInsights – BigInsights Home service fails to install, run the
remove_value_add_services.sh cleanup script. The following code is an example command:
cd /usr/ibmpacks/bin/<version>
./remove_value_add_services.sh -u admin -p admin -x 8080 -s WEBUIFRAMEWORK -r
For more information about cleaning the value-add service environment, see Removing
BigInsights value-add services.
After installation is complete, click Next > Complete.
Configure Knox
The Apache Knox gateway is a system that provides a single point of authentication and access
for Apache Hadoop services on the compute nodes in a cluster. Note, however, that
authentication to HDFS services is controlled entirely by Isilon OneFS.
The Knox gateway simplifies Hadoop security for users who access the cluster and execute
jobs, and for operators who control access and manage the cluster. The gateway runs as a
server, or a cluster of servers, providing centralized access to one or more Hadoop clusters.
In IBM® Open Platform with Apache Hadoop, Knox is a service that you start, stop, and
configure in the Ambari web interface.
Users access the following BigInsights™ value-add components through Knox by going to the
IBM BigInsights Home service:
https://<knox_host>:<knox_port>/<knox_gateway_path>/default/BigInsightsWeb/index.html
 BigSheets
 Text Analytics
 Big SQL
Knox supports only REST API calls for the following Hadoop services:
 WebHCat
 Oozie
 HBase
 Hive
 Yarn
Click the Knox service from the Ambari web interface to see the summary page.
Select Service Actions > Restart All to restart it and all of its components.
If you are using LDAP, you must also start LDAP if it is not already started.
Click the BigInsights Home service in the Ambari User Interface.
Select Service Actions > Restart All to restart it and all of its components.
Open the BigInsights Home page from a web browser.
The URL for BigInsights Home is:
https://<knox_host>:<knox_port>/<knox_gateway_path>/default/BigInsightsWeb/index.html
where:
knox_host
The host where Knox is installed and running
knox_port
The port where Knox is listening (by default this is 8443)
knox_gateway_path
The value entered in the gateway.path field in the Knox configuration (by default this is
'gateway')
For example, the URL might look like the following address:
https://myhost.company.com:8443/gateway/default/BigInsightsWeb/index.html
If you are using the Knox Demo LDAP, a default user ID and password are created for you.
When you access the web page, use the following preset credentials:
User Name = guest
Password = guest-password
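The URL components described above can be composed in a short shell sketch, which also shows one way to verify the gateway with the demo-LDAP credentials. The host name is hypothetical:

```shell
KNOX_HOST=mycluster1-master-0.example.com   # hypothetical; use your Knox node
KNOX_PORT=8443                              # Knox default listening port
GATEWAY_PATH=gateway                        # default gateway.path value
URL="https://${KNOX_HOST}:${KNOX_PORT}/${GATEWAY_PATH}/default/BigInsightsWeb/index.html"
echo "$URL"

# Verify with the demo-LDAP credentials; -k accepts Knox's default
# self-signed certificate. Expect HTTP 200 when Knox and Home are up:
#   curl -k -u guest:guest-password "$URL" -o /dev/null -w '%{http_code}\n'
```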
Installing BigSheets
To extend the power of the Open Platform for Apache Hadoop, install and deploy the BigInsights
BigSheets service, which is the IBM spreadsheet interface for big data.
1. Open a browser and access the Ambari server dashboard. The following is the default
URL.
http://<server-name>:8080
The default user name is admin, and the default password is admin.
2. In the Ambari Dashboard, click Actions > Add Service.
3. In the Add Service Wizard, Choose Services, select the BigInsights -
BigSheets service, and if you have not already installed the BigInsights Home service,
select that as well. Click Next.
If you do not see BigInsights – BigSheets service, you need to install the appropriate
module and restart Ambari as described in Installing the BigInsights value-add packages.
4. In the Assign Masters page, decide on which node of your cluster you want to run the
specified BigSheets master.
5. In the Assign Slaves and Clients page, all of the defaults are automatically accepted
and the next page appears. The BigSheets service does not have any slaves and
clients, so the Assign Slaves and Clients page is shown and then skipped immediately
during the install. This is the expected behavior.
6. In the Customize Services page, accept the recommended configurations for the
BigSheets service, or customize the configuration by expanding the configuration files
and modifying the values. In the Advanced bigsheets-user-config section, make sure
that you enter the following information:
a. In the bigsheets.user field, leave the default user name, which is bigsheets.
b. In the bigsheets.password field, type a valid password.
c. In the bigsheets.userid field, type a valid user ID to use for the bigsheets service
user. This user ID is created on every node of the cluster and must be unique
across all nodes.
d. Click Next.
7. In the Advanced bigsheets-ambari-config section, in the ambari.password field,
type the correct Ambari administration password.
8. You can review your selections in the Review page before accepting them. If you want
to modify any values, click the Back button. If you are satisfied with your setup,
click Deploy.
9. In the Install, Start and Test page, the BigSheets service is installed and verified. If
you have multiple nodes, you can see the progress on each node. When the installation is
complete, either view the errors or warnings by clicking the link, or click Next to see a
summary and then the new service added to the list of services.
10.Click Complete.
If the BigInsights – BigSheets service fails to install, run the
remove_value_add_services.sh cleanup script. The following code is an example of
the command:
cd /usr/ibmpacks/bin/<version>
./remove_value_add_services.sh -u admin -p admin -x 8080 -s BIGSHEETS -r
For more information about cleaning the value-add service environment, see Removing
BigInsights value-add services.
11.After you install BigInsights - BigSheets, you must restart the HDFS, MapReduce2, YARN,
Knox, Nagios and Ganglia client services.
a. For each service that requires restart, select the service.
b. Click Service Actions.
c. Click Restart All.
12.Access the BigInsights - BigSheets service from the BigInsights Home service.
o If the BigInsights Home service has not yet been added, see Installing
BigInsights Home.
o If the BigInsights Home service has been installed, it must be restarted so
the BigInsights - BigSheets icon will display.
13. Launch the BigInsights Home service by typing the following address in your browser:
https://<knox_host>:<knox_port>/<knox_gateway_path>/default/BigInsightsWeb/index.html
Where:
knox_host
The host where Knox is installed and running
knox_port
The port where Knox is listening (by default this is 8443)
knox_gateway_path
The value entered in the gateway.path field in the Knox configuration (by default this is
'gateway')
For example, the URL might look like the following address:
https://myhost.company.com:8443/gateway/default/BigInsightsWeb/index.html
Installing Big SQL
To extend the power of the Open Platform for Apache Hadoop, install and deploy the BigInsights
- Big SQL service, which is the IBM SQL interface to the Hadoop-based platform, IBM Open
Platform with Apache Hadoop.
1. Open a browser and access the Ambari server dashboard. The following is the default
URL.
http://<server-name>:8080
The default user name is admin, and the default password is admin .
2. In the Ambari web interface, click Actions > Add Service.
3. In the Add Service Wizard, Choose Services, select the BigInsights - Big
SQL service and the BigInsights Home service. Click Next.
If you do not see the option to select the BigInsights - Big SQL service, complete the
steps in Installing the BigInsights value-add packages.
4. In the Assign Masters page, decide which nodes of your cluster you want to run the
specified components, or accept the default nodes. Follow these guidelines:
o For the Big SQL monitoring and editing tool, make sure that the Data Server
Manager (DSM) is assigned to the same node that is assigned to the Big SQL Head
node.
5. Click Next.
6. In the Assign Slaves and Clients page, accept the defaults, or make specific
assignments for your nodes. Follow these guidelines:
o Select the non-head nodes for the Big SQL Worker components. You must select at
least one node as the worker node.
o Select all nodes for the CLIENT. This puts JSqsh and SFTP clients on the nodes.
7. In the Customize Services page, accept the recommended configurations for the Big
SQL service, or customize the configuration by expanding the configuration files and
modifying the values. Make sure that you have a valid bigsql_user and
bigsql_user_password, and a user_id (created by the bi_create_users.sh script), in
the appropriate fields in the Advanced bigsql-users-env section.
9. You can review your selections in the Review page before accepting them. If you want
to modify any values, click the Back button. If you are satisfied with your setup,
click Deploy.
10.In the Install, Start and Test page, the Big SQL service is installed and verified. If you
have multiple nodes, you can see the progress on each node. When the installation is
complete, either view the errors or warnings by clicking the link, or click Next to see a
summary and then the new service added to the list of services.
If the BigInsights – Big SQL service fails to install, run the
remove_value_add_services.sh cleanup script. The following code is an example of
the command:
cd /usr/ibmpacks/bin/<version>
./remove_value_add_services.sh -u admin -p admin -x 8080 -s BIGSQL -r
For more information about cleaning the value-add service environment, see Removing
BigInsights value-add services.
11. A web application interface for Big SQL monitoring and editing is available to your end-
users to work with Big SQL. You access this monitoring utility from the IBM BigInsights
Home service. If you have not added the BigInsights Home service yet, do that now.
12. Restart the Knox Service. Also start the Knox Demo LDAP service if you have not
configured your own LDAP.
13. Restart the BigInsights Home services.
14. To run SQL statements from the Big SQL monitoring and editing tool, type the following
address in your browser to open the BigInsights Home service:
https://<knox_host>:<knox_port>/<knox_gateway_path>/default/BigInsightsWeb/index.html
Where:
knox_host
The host where Knox is installed and running
knox_port
The port where Knox is listening (by default this is 8443)
knox_gateway_path
The value entered in the gateway.path field in the Knox configuration (by default this is
'gateway')
For example, the URL might look like the following address:
https://myhost.company.com:8443/gateway/default/BigInsightsWeb/index.html
If you use the Knox Demo LDAP service, the default credential is:
userid = guest
password = guest-password
Your end users can also use the JSqsh client, which is a component of
the BigInsights - Big SQL service.
15. If the BigInsights - Big SQL service shows as unavailable, there might have been a
problem with post-installation configuration. Run the following commands
as root (or sudo) where the Big SQL monitoring utility (DSM) server is installed:
a. Run the dsmKnoxSetup script:
cd /usr/ibmpacks/bigsql/<version-number>/dsm/1.1/ibm-datasrvrmgr/bin/
./dsmKnoxSetup.sh -knoxHost <knox-host>
where <knox-host> is the node where the Knox gateway service is running.
b. Make sure that you do not stop and restart the Knox gateway service within
Ambari. If you do, run the dsmKnoxSetup script again.
c. Restart the BigInsights Home service so that the Big SQL monitoring utility
(DSM) can be accessed from the BigInsights Home interface.
16. For HBase, do the following post-installation steps:
a. On all nodes where HBase is installed, check that the symlinks to hive-serde.jar
and hive-common.jar in the hbase/lib directory are valid. To verify that the
symlinks exist and are valid:
namei /usr/iop/<version-number>/hbase/lib/hive-serde.jar
namei /usr/iop/<version-number>/hbase/lib/hive-common.jar
If they are not valid, recreate them:
cd /usr/iop/<version-number>/hbase/lib
rm -rf hive-serde.jar
rm -rf hive-common.jar
ln -s /usr/iop/<version-number>/hive/lib/hive-serde.jar hive-serde.jar
ln -s /usr/iop/<version-number>/hive/lib/hive-common.jar hive-common.jar
b. After installing the Big SQL service and fixing the symlinks, restart the HBase
service from the Ambari web interface.
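The check-and-repair sequence above can be sketched for one node as follows. The IOP version string is a hypothetical placeholder; substitute your own:

```shell
# Sketch of the symlink check and repair for one HBase node.
# IOP_VER is hypothetical -- substitute your IBM Open Platform version.
IOP_VER=4.0.0.0
HBASE_LIB=/usr/iop/${IOP_VER}/hbase/lib
HIVE_LIB=/usr/iop/${IOP_VER}/hive/lib

for jar in hive-serde.jar hive-common.jar; do
  # "[ -e ]" follows symlinks, so it is false for a dangling link
  if [ -d "$HBASE_LIB" ] && ! [ -e "${HBASE_LIB}/${jar}" ]; then
    rm -f "${HBASE_LIB}/${jar}"
    ln -s "${HIVE_LIB}/${jar}" "${HBASE_LIB}/${jar}"
    echo "relinked ${jar}"
  fi
done
```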
After you add Big SQL worker nodes, make sure that you stop and then restart the Hive service.
Connecting to Big SQL
You can run Big SQL queries from Java SQL Shell (JSqsh), or from the IBM Data Server
Manager. You can also run queries from a client application, such as IBM Data Studio,
that uses JDBC or ODBC drivers. You must identify a running Big SQL server and
configure either a JDBC or ODBC driver.
For more information about JSqsh, or IBM Data Studio, see the related topics in the
IBM® BigInsights™ Knowledge Center.
Running JSqsh
JSqsh is installed in /usr/ibmpacks/common-utils/current/jsqsh/bin. Change to that directory
and type ./jsqsh to open the JSqsh shell:
cd /usr/ibmpacks/common-utils/current/jsqsh/bin
./jsqsh
You can then run any JSqsh commands from the prompt.
Connection setup
To use the JSqsh command shell, you can use the default connections or define and test a
connection to the Big SQL server.
1. The first time that you open the JSqsh command shell, a configuration wizard starts.
At the JSqsh command prompt, type drivers to determine the available
drivers.
a. On the driver selection screen, select the Big SQL instance that you want to run.
Note: Big SQL is designated as db2 in this example:
Name Target Class
- ------- ------------------- --------------------------------------------
...
2 *db2 IBM Data Server (DB2) com.ibm.db2.jcc.DB2Driver
b. Verify the port, server, and user name. Run setup and enter C to define a
password for the connection. The user name must have database administration
privileges, or must be granted those privileges by the Big SQL administrator.
c. Test the connection to the Big SQL server.
d. Save and name this connection.
2. Generally, you can access JSqsh from /usr/ibmpacks/common-
utils/current/jsqsh/bin with the following command:
./jsqsh --driver=db2 --user=<username> --password=<user_password>
3. Open the saved configuration wizard at any time by typing setup while in the command
interface, or ./jsqsh --setup when you open the command interface.
4. Specify the saved connection name in the JSqsh command shell to establish a
connection:
./jsqsh name
5. Use the connect command when you are already inside the JSqsh shell to establish a
connection at the JSqsh prompt:
connect name
Commands and queries
At the JSqsh command prompt, you can run JSqsh commands or database server commands.
JSqsh commands usually begin with a backslash (\) character.
JSqsh commands accept command-line arguments and allow for common shell activities, such
as I/O redirection and pipes.
For example, consider this set of commands:
1> select * from t1
2> where c1 > 10
3> go --style csv > /tmp/t1.csv
Because the commands do not begin with a backslash character, the first two commands are
assumed to be SQL statements, and are sent to the Big SQL server.
The \go command sends the statements to run on the server. The \go command has a built-in
alias, go, so that you can omit the backslash. Additionally, you can specify a trailing semicolon
to indicate that you want to run a statement, for example:
1> select * from t1
2> where c1 > 10;
The --style option in the go command indicates that the display shows comma-separated
values (CSV). The go form is most useful if you provide additional arguments to affect how
the query is run. Changing the display style is an example of this feature.
The redirection operator (>) specifies that the results of the command are sent to a file
called /tmp/t1.csv.
A set of frequently run commands does not require the leading backslash. Any JSqsh command
can be aliased to another name (without a leading backslash, if you choose) by using
the \alias command. For example, if you want to be able to type bye to leave the JSqsh shell,
establish that word as the alias for the \quit command:
\alias bye='\quit'
You can run a script that contains one or more SQL statements. For example, assume that you
have a file called mySQL.sql. That file contains these statements:
select tabschema, tabname from syscat.tables fetch first 5 rows only;
select tabschema, colname, colno, typename, length from syscat.columns fetch first 10 rows only;
You can start JSqsh and run the script at the same time with this command:
/usr/ibmpacks/common-utils/current/jsqsh/bin/jsqsh bigsql < /home/bigsql/mySQL.sql
The redirection operator specifies that JSqsh reads the commands from the file located in
the /home/bigsql directory, and then runs the statements within the file.
Command and query edit
The JSqsh command shell uses the JLine2 library, which allows you to edit previously entered
commands and queries. You can use the command-line editing features and the arrow keys to
edit the command or query on the current line.
The JLine2 library provides the same key bindings (vi and emacs) as the GNU Readline library.
In addition, it attempts to apply any custom key maps that you defined in a
GNU Readline configuration file (.inputrc) in your $HOME directory.
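For example, a minimal $HOME/.inputrc might switch to vi-style line editing. This is a standard GNU Readline setting; whether JLine2 honors every Readline directive may vary, so treat it as a sketch:

```
set editing-mode vi
```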
In addition to individual line editing, the JSqsh command shell remembers the 50 most recently
run statements, which you can view by using the history command:
1> history
(1) use tpch;
(2) select count(*) from lineitem
Previously run statements are prefixed with a number in parentheses. You use this number to
recall that query by using the JSqsh recall operator (!), for example:
1> !2
1> select count(*) from lineitem
2>
The ! recall operator has the following behavior:
!! Recalls the previously run statement.
!5 Recalls the fifth query from history.
!-2 Recalls the query from two prior runs.
You can also edit queries that span multiple lines by using the buf-edit command,
which pulls the current query into an external editor, for example:
1> select id, count(*)
2> from t1, t2
3> where t1.c1 = t2.c2
4> buf-edit
The query is opened in an external editor (/usr/bin/vi by default; you can specify a
different editor through the $EDITOR environment variable). When you close the
editor, the edited query is entered at the JSqsh command shell prompt.
The JSqsh command shell provides built-in aliases, vi and emacs, for the buf-edit command.
The following commands, for example, open the query in the vi editor:
1> select id, count(*)
2> from t1, t2
3> where t1.c1 = t2.c2
4> vi
Configuration variables
You can use the set command to list or define values for a number of configuration
variables, for example:
1> set
If you want to redefine the prompt in the command shell, you run the following command
with the prompt option:
1> set prompt='foo $lineno> '
foo 1>
Every JSqsh configuration variable has built-in help available:
1> help prompt
If you want to permanently set a specific variable, you can do so by editing
your $HOME/.jsqsh/sqshrc file and including the appropriate set command in it.
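For example, to make the custom prompt shown above permanent, a $HOME/.jsqsh/sqshrc file might contain just the corresponding set command (the prompt value is illustrative):

```
set prompt='foo $lineno> '
```

JSqsh runs the commands in this file at every startup, so any variable set here takes effect in each new session.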
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon

More Related Content

What's hot

Scale-Out Data Lake with EMC Isilon
Scale-Out Data Lake with EMC IsilonScale-Out Data Lake with EMC Isilon
Scale-Out Data Lake with EMC IsilonEMC
 
Hadoop Analytics + Enterprise Class Storage: One-Stop Solution From EMC for H...
Hadoop Analytics + Enterprise Class Storage: One-Stop Solution From EMC for H...Hadoop Analytics + Enterprise Class Storage: One-Stop Solution From EMC for H...
Hadoop Analytics + Enterprise Class Storage: One-Stop Solution From EMC for H...EMC
 
Big Data – General Introduction
Big Data – General IntroductionBig Data – General Introduction
Big Data – General IntroductionEMC
 
White Paper: EMC Isilon OneFS Operating System
White Paper: EMC Isilon OneFS Operating System  White Paper: EMC Isilon OneFS Operating System
White Paper: EMC Isilon OneFS Operating System EMC
 
EMC-ISILON_MphasiS_Walk_through
EMC-ISILON_MphasiS_Walk_throughEMC-ISILON_MphasiS_Walk_through
EMC-ISILON_MphasiS_Walk_throughprakashjjaya
 
Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & I...
Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & I...Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & I...
Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & I...EMC
 
Emc isilon technical deep dive workshop
Emc isilon technical deep dive workshopEmc isilon technical deep dive workshop
Emc isilon technical deep dive workshopsolarisyougood
 
Deduplication Solutions Are Not All Created Equal: Why Data Domain?
Deduplication Solutions Are Not All Created Equal: Why Data Domain?Deduplication Solutions Are Not All Created Equal: Why Data Domain?
Deduplication Solutions Are Not All Created Equal: Why Data Domain?EMC
 
Next Generation Data Protection Architecture
Next Generation Data Protection Architecture Next Generation Data Protection Architecture
Next Generation Data Protection Architecture Gina Tragos
 
Scale IO Software Defined Block Storage
Scale IO Software Defined Block Storage Scale IO Software Defined Block Storage
Scale IO Software Defined Block Storage Jürgen Ambrosi
 
2/18 Technical Overview
2/18 Technical Overview2/18 Technical Overview
2/18 Technical OverviewGina Tragos
 
Arcserve Portfolio Technical Overview
Arcserve Portfolio Technical OverviewArcserve Portfolio Technical Overview
Arcserve Portfolio Technical OverviewGina Tragos
 
Emc isilon config requirements w tips & tricks
Emc isilon config requirements w tips & tricksEmc isilon config requirements w tips & tricks
Emc isilon config requirements w tips & trickskarlosgaleano
 
Trends in Data Protection with DCIG
Trends in Data Protection with DCIGTrends in Data Protection with DCIG
Trends in Data Protection with DCIGGina Tragos
 
Business Track 3: arcserve udp licensing pricing & support made simple
Business Track 3: arcserve udp licensing pricing & support made simpleBusiness Track 3: arcserve udp licensing pricing & support made simple
Business Track 3: arcserve udp licensing pricing & support made simplearcserve data protection
 
The Value of NetApp with VMware
The Value of NetApp with VMwareThe Value of NetApp with VMware
The Value of NetApp with VMwareCapito Livingstone
 
EMC Hadoop Starter Kit - ViPR Edition
EMC Hadoop Starter Kit - ViPR EditionEMC Hadoop Starter Kit - ViPR Edition
EMC Hadoop Starter Kit - ViPR Editionwalshe1
 
Appliance Launch Webcast
Appliance Launch WebcastAppliance Launch Webcast
Appliance Launch WebcastGina Tragos
 
Emc vi pr controller customer presentation
Emc vi pr controller customer presentationEmc vi pr controller customer presentation
Emc vi pr controller customer presentationsolarisyougood
 

What's hot (20)

Scale-Out Data Lake with EMC Isilon
Scale-Out Data Lake with EMC IsilonScale-Out Data Lake with EMC Isilon
Scale-Out Data Lake with EMC Isilon
 
Emc isilon overview
Emc isilon overview Emc isilon overview
Emc isilon overview
 
Hadoop Analytics + Enterprise Class Storage: One-Stop Solution From EMC for H...
Hadoop Analytics + Enterprise Class Storage: One-Stop Solution From EMC for H...Hadoop Analytics + Enterprise Class Storage: One-Stop Solution From EMC for H...
Hadoop Analytics + Enterprise Class Storage: One-Stop Solution From EMC for H...
 
Big Data – General Introduction
Big Data – General IntroductionBig Data – General Introduction
Big Data – General Introduction
 
White Paper: EMC Isilon OneFS Operating System
White Paper: EMC Isilon OneFS Operating System  White Paper: EMC Isilon OneFS Operating System
White Paper: EMC Isilon OneFS Operating System
 
EMC-ISILON_MphasiS_Walk_through
EMC-ISILON_MphasiS_Walk_throughEMC-ISILON_MphasiS_Walk_through
EMC-ISILON_MphasiS_Walk_through
 
Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & I...
Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & I...Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & I...
Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & I...
 
Emc isilon technical deep dive workshop
Emc isilon technical deep dive workshopEmc isilon technical deep dive workshop
Emc isilon technical deep dive workshop
 
Deduplication Solutions Are Not All Created Equal: Why Data Domain?
Deduplication Solutions Are Not All Created Equal: Why Data Domain?Deduplication Solutions Are Not All Created Equal: Why Data Domain?
Deduplication Solutions Are Not All Created Equal: Why Data Domain?
 
Next Generation Data Protection Architecture
Next Generation Data Protection Architecture Next Generation Data Protection Architecture
Next Generation Data Protection Architecture
 
Scale IO Software Defined Block Storage
Scale IO Software Defined Block Storage Scale IO Software Defined Block Storage
Scale IO Software Defined Block Storage
 
2/18 Technical Overview
2/18 Technical Overview2/18 Technical Overview
2/18 Technical Overview
 
Arcserve Portfolio Technical Overview
Arcserve Portfolio Technical OverviewArcserve Portfolio Technical Overview
Arcserve Portfolio Technical Overview
 
Emc isilon config requirements w tips & tricks
Emc isilon config requirements w tips & tricksEmc isilon config requirements w tips & tricks
Emc isilon config requirements w tips & tricks
 
Trends in Data Protection with DCIG
Trends in Data Protection with DCIGTrends in Data Protection with DCIG
Trends in Data Protection with DCIG
 
Business Track 3: arcserve udp licensing pricing & support made simple
Business Track 3: arcserve udp licensing pricing & support made simpleBusiness Track 3: arcserve udp licensing pricing & support made simple
Business Track 3: arcserve udp licensing pricing & support made simple
 
The Value of NetApp with VMware
The Value of NetApp with VMwareThe Value of NetApp with VMware
The Value of NetApp with VMware
 
EMC Hadoop Starter Kit - ViPR Edition
EMC Hadoop Starter Kit - ViPR EditionEMC Hadoop Starter Kit - ViPR Edition
EMC Hadoop Starter Kit - ViPR Edition
 
Appliance Launch Webcast
Appliance Launch WebcastAppliance Launch Webcast
Appliance Launch Webcast
 
Emc vi pr controller customer presentation
Emc vi pr controller customer presentationEmc vi pr controller customer presentation
Emc vi pr controller customer presentation
 

Viewers also liked

KNOX-HTTPFS-ONEFS-WP
KNOX-HTTPFS-ONEFS-WPKNOX-HTTPFS-ONEFS-WP
KNOX-HTTPFS-ONEFS-WPBoni Bruno
 
Data Science and Big Data Analytics Book from EMC Education Services
Data Science and Big Data Analytics Book from EMC Education ServicesData Science and Big Data Analytics Book from EMC Education Services
Data Science and Big Data Analytics Book from EMC Education ServicesEMC
 
BlueData Isilon Validation Brief
BlueData Isilon Validation BriefBlueData Isilon Validation Brief
BlueData Isilon Validation BriefBoni Bruno
 
Apache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS FederationApache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS FederationAdam Kawa
 
Hadoop Administration pdf
Hadoop Administration pdfHadoop Administration pdf
Hadoop Administration pdfEdureka!
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data AnalyticsEMC
 

Viewers also liked (6)

KNOX-HTTPFS-ONEFS-WP
KNOX-HTTPFS-ONEFS-WPKNOX-HTTPFS-ONEFS-WP
KNOX-HTTPFS-ONEFS-WP
 
Data Science and Big Data Analytics Book from EMC Education Services
Data Science and Big Data Analytics Book from EMC Education ServicesData Science and Big Data Analytics Book from EMC Education Services
Data Science and Big Data Analytics Book from EMC Education Services
 
BlueData Isilon Validation Brief
BlueData Isilon Validation BriefBlueData Isilon Validation Brief
BlueData Isilon Validation Brief
 
Apache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS FederationApache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS Federation
 
Hadoop Administration pdf
Hadoop Administration pdfHadoop Administration pdf
Hadoop Administration pdf
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 

Similar to EMC Starter Kit - IBM BigInsights - EMC Isilon

Setup and configuration for ibm tivoli access manager for enterprise single s...
Setup and configuration for ibm tivoli access manager for enterprise single s...Setup and configuration for ibm tivoli access manager for enterprise single s...
Setup and configuration for ibm tivoli access manager for enterprise single s...Banking at Ho Chi Minh city
 
Setup and configuration for ibm tivoli access manager for enterprise single s...
Setup and configuration for ibm tivoli access manager for enterprise single s...Setup and configuration for ibm tivoli access manager for enterprise single s...
Setup and configuration for ibm tivoli access manager for enterprise single s...Banking at Ho Chi Minh city
 
Deploying IBM Sametime 9 on AIX 7.1
Deploying IBM Sametime 9 on AIX 7.1Deploying IBM Sametime 9 on AIX 7.1
Deploying IBM Sametime 9 on AIX 7.1jackdowning
 
Dell PowerEdge Deployment Guide
Dell PowerEdge Deployment GuideDell PowerEdge Deployment Guide
Dell PowerEdge Deployment GuideKara Krautter
 
Configuring a highly available Microsoft Lync Server 2013 environment on Dell...
Configuring a highly available Microsoft Lync Server 2013 environment on Dell...Configuring a highly available Microsoft Lync Server 2013 environment on Dell...
Configuring a highly available Microsoft Lync Server 2013 environment on Dell...Principled Technologies
 
Dell 3-2-1 Reference Configurations: Configuration, management, and upgrade g...
Dell 3-2-1 Reference Configurations: Configuration, management, and upgrade g...Dell 3-2-1 Reference Configurations: Configuration, management, and upgrade g...
Dell 3-2-1 Reference Configurations: Configuration, management, and upgrade g...Principled Technologies
 
IBM PowerLinux Open Source Infrastructure Services Implementation and T…
IBM PowerLinux Open Source Infrastructure Services Implementation and T…IBM PowerLinux Open Source Infrastructure Services Implementation and T…
IBM PowerLinux Open Source Infrastructure Services Implementation and T…IBM India Smarter Computing
 
Pc 811 troubleshooting_guide
Pc 811 troubleshooting_guidePc 811 troubleshooting_guide
Pc 811 troubleshooting_guidemakhaderms
 
Plesk 8.1 for Linux/UNIX
Plesk 8.1 for Linux/UNIXPlesk 8.1 for Linux/UNIX
Plesk 8.1 for Linux/UNIXwebhostingguy
 
Presentation data center design overview
Presentation   data center design overviewPresentation   data center design overview
Presentation data center design overviewxKinAnx
 
Pda management with ibm tivoli configuration manager sg246951
Pda management with ibm tivoli configuration manager sg246951Pda management with ibm tivoli configuration manager sg246951
Pda management with ibm tivoli configuration manager sg246951Banking at Ho Chi Minh city
 
Backup and Recovery Solution for VMware vSphere on EMC Isilon Storage
Backup and Recovery Solution for VMware vSphere on EMC Isilon Storage Backup and Recovery Solution for VMware vSphere on EMC Isilon Storage
Backup and Recovery Solution for VMware vSphere on EMC Isilon Storage EMC
 
Plesk 8.2 for Windows Domain Administrator's Guide
Plesk 8.2 for Windows Domain Administrator's GuidePlesk 8.2 for Windows Domain Administrator's Guide
Plesk 8.2 for Windows Domain Administrator's Guidewebhostingguy
 
Integrating ibm tivoli workload scheduler with tivoli products sg246648
Integrating ibm tivoli workload scheduler with tivoli products sg246648Integrating ibm tivoli workload scheduler with tivoli products sg246648
Integrating ibm tivoli workload scheduler with tivoli products sg246648Banking at Ho Chi Minh city
 
Vista deployment using tivoli provisioning manager for os deployment redp4295
Vista deployment using tivoli provisioning manager for os deployment redp4295Vista deployment using tivoli provisioning manager for os deployment redp4295
Vista deployment using tivoli provisioning manager for os deployment redp4295Banking at Ho Chi Minh city
 

Similar to EMC Starter Kit - IBM BigInsights - EMC Isilon (20)

Lenovo midokura
Lenovo midokuraLenovo midokura
Lenovo midokura
 
Setup and configuration for ibm tivoli access manager for enterprise single s...
Setup and configuration for ibm tivoli access manager for enterprise single s...Setup and configuration for ibm tivoli access manager for enterprise single s...
Setup and configuration for ibm tivoli access manager for enterprise single s...
 
Setup and configuration for ibm tivoli access manager for enterprise single s...
Setup and configuration for ibm tivoli access manager for enterprise single s...Setup and configuration for ibm tivoli access manager for enterprise single s...
Setup and configuration for ibm tivoli access manager for enterprise single s...
 
Deploying IBM Sametime 9 on AIX 7.1
Deploying IBM Sametime 9 on AIX 7.1Deploying IBM Sametime 9 on AIX 7.1
Deploying IBM Sametime 9 on AIX 7.1
 
ESM_InstallGuide_5.6.pdf
ESM_InstallGuide_5.6.pdfESM_InstallGuide_5.6.pdf
ESM_InstallGuide_5.6.pdf
 
Dell PowerEdge Deployment Guide
Dell PowerEdge Deployment GuideDell PowerEdge Deployment Guide
Dell PowerEdge Deployment Guide
 
Configuring a highly available Microsoft Lync Server 2013 environment on Dell...
Configuring a highly available Microsoft Lync Server 2013 environment on Dell...Configuring a highly available Microsoft Lync Server 2013 environment on Dell...
Configuring a highly available Microsoft Lync Server 2013 environment on Dell...
 
Rst4userguide
Rst4userguideRst4userguide
Rst4userguide
 
Dell 3-2-1 Reference Configurations: Configuration, management, and upgrade g...
Dell 3-2-1 Reference Configurations: Configuration, management, and upgrade g...Dell 3-2-1 Reference Configurations: Configuration, management, and upgrade g...
Dell 3-2-1 Reference Configurations: Configuration, management, and upgrade g...
 
IBM PowerLinux Open Source Infrastructure Services Implementation and T…
IBM PowerLinux Open Source Infrastructure Services Implementation and T…IBM PowerLinux Open Source Infrastructure Services Implementation and T…
IBM PowerLinux Open Source Infrastructure Services Implementation and T…
 
Ibm system storage solutions handbook sg245250
Ibm system storage solutions handbook sg245250Ibm system storage solutions handbook sg245250
Ibm system storage solutions handbook sg245250
 
Pc 811 troubleshooting_guide
Pc 811 troubleshooting_guidePc 811 troubleshooting_guide
Pc 811 troubleshooting_guide
 
Plesk 8.1 for Linux/UNIX
Plesk 8.1 for Linux/UNIXPlesk 8.1 for Linux/UNIX
Plesk 8.1 for Linux/UNIX
 
Presentation data center design overview
Presentation   data center design overviewPresentation   data center design overview
Presentation data center design overview
 
IBM Workload Deployer
IBM Workload DeployerIBM Workload Deployer
IBM Workload Deployer
 
Pda management with ibm tivoli configuration manager sg246951
Pda management with ibm tivoli configuration manager sg246951Pda management with ibm tivoli configuration manager sg246951
Pda management with ibm tivoli configuration manager sg246951
 
Backup and Recovery Solution for VMware vSphere on EMC Isilon Storage
Backup and Recovery Solution for VMware vSphere on EMC Isilon Storage Backup and Recovery Solution for VMware vSphere on EMC Isilon Storage
Backup and Recovery Solution for VMware vSphere on EMC Isilon Storage
 
Plesk 8.2 for Windows Domain Administrator's Guide
Plesk 8.2 for Windows Domain Administrator's GuidePlesk 8.2 for Windows Domain Administrator's Guide
Plesk 8.2 for Windows Domain Administrator's Guide
 
Integrating ibm tivoli workload scheduler with tivoli products sg246648
Integrating ibm tivoli workload scheduler with tivoli products sg246648Integrating ibm tivoli workload scheduler with tivoli products sg246648
Integrating ibm tivoli workload scheduler with tivoli products sg246648
 
Vista deployment using tivoli provisioning manager for os deployment redp4295
Vista deployment using tivoli provisioning manager for os deployment redp4295Vista deployment using tivoli provisioning manager for os deployment redp4295
Vista deployment using tivoli provisioning manager for os deployment redp4295
 

More from Boni Bruno

Using SAS GRID v 9 with Isilon F810
Using SAS GRID v 9 with Isilon F810Using SAS GRID v 9 with Isilon F810
Using SAS GRID v 9 with Isilon F810Boni Bruno
 
20+ Million Records a Second - Running Kafka on Isilon F800
20+ Million Records a Second - Running Kafka on Isilon F800 20+ Million Records a Second - Running Kafka on Isilon F800
20+ Million Records a Second - Running Kafka on Isilon F800 Boni Bruno
 
Hadoop Tiering with Dell EMC Isilon - 2018
Hadoop Tiering with Dell EMC Isilon - 2018Hadoop Tiering with Dell EMC Isilon - 2018
Hadoop Tiering with Dell EMC Isilon - 2018Boni Bruno
 
BlueTalon-Isilon-Validation
BlueTalon-Isilon-ValidationBlueTalon-Isilon-Validation
BlueTalon-Isilon-ValidationBoni Bruno
 
Netpod - The Merging of NPM & APM
Netpod - The Merging of NPM & APMNetpod - The Merging of NPM & APM
Netpod - The Merging of NPM & APMBoni Bruno
 
Decreasing Incident Response Time
Decreasing Incident Response TimeDecreasing Incident Response Time
Decreasing Incident Response TimeBoni Bruno
 

More from Boni Bruno (7)

Using SAS GRID v 9 with Isilon F810
Using SAS GRID v 9 with Isilon F810Using SAS GRID v 9 with Isilon F810
Using SAS GRID v 9 with Isilon F810
 
20+ Million Records a Second - Running Kafka on Isilon F800
20+ Million Records a Second - Running Kafka on Isilon F800 20+ Million Records a Second - Running Kafka on Isilon F800
20+ Million Records a Second - Running Kafka on Isilon F800
 
Hadoop Tiering with Dell EMC Isilon - 2018
Hadoop Tiering with Dell EMC Isilon - 2018Hadoop Tiering with Dell EMC Isilon - 2018
Hadoop Tiering with Dell EMC Isilon - 2018
 
Splunk-EMC
Splunk-EMCSplunk-EMC
Splunk-EMC
 
BlueTalon-Isilon-Validation
BlueTalon-Isilon-ValidationBlueTalon-Isilon-Validation
BlueTalon-Isilon-Validation
 
Netpod - The Merging of NPM & APM
Netpod - The Merging of NPM & APMNetpod - The Merging of NPM & APM
Netpod - The Merging of NPM & APM
 
Decreasing Incident Response Time
Decreasing Incident Response TimeDecreasing Incident Response Time
Decreasing Incident Response Time
 

EMC Starter Kit - IBM BigInsights - EMC Isilon

  • 1. #RememberRuddy _____________________________ EMC ISILON HADOOP STARTER KIT Deploying IBM BigInsights v 4.0 with EMC ISILON Release 1.0 October, 2015
  • 2. EMC Isilon Hadoop Starter Kit for IBM BigInsights __________________________________________________________________ EMC ISILON HADOOP STARTER KIT FOR IBM BIGINSIGHTS 2 To learn more about how EMC products, services, and solutions can help solve your business and IT challenges, contact your local representative or authorized reseller, visit www.emc.com, or explore and compare products in the EMC Store Copyright © 2015 EMC Corporation. All Rights Reserved. EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice. The information in this publication is provided “as is.” EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license. For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com. EMC are registered trademarks or trademarks of EMC, Inc. in the United States and/or other jurisdictions. All other trademarks used herein are the property of their respective owners.
  • 3. EMC Isilon Hadoop Starter Kit for IBM BigInsights __________________________________________________________________ EMC ISILON HADOOP STARTER KIT FOR IBM BIGINSIGHTS 3 Contents INTRODUCTION........................................................................................6 IBM & EMC Technology Highlights ........................................................................ 6 Audience........................................................................................................... 7 Apache Hadoop Projects...................................................................................... 7 IBM Open Platform and the Ambari Manager ......................................................... 8 Isilon Scale-Out NAS for HDFS............................................................................. 8 Overview of Isilon Scale-Out NAS for Big Data....................................................... 9 PRE-INSTALLATION CHECKLIST .............................................................10 Supported Software Versions............................................................................. 10 Hardware Requirements and Suggested Hadoop Service Layout............................. 10 INSTALLATION OVERVIEW .....................................................................12 Prerequisites ................................................................................................... 12 Isilon Scale-Out NAS or Isilon OneFS Simulator ........................................................... 12 Linux...................................................................................................................... 13 Networking ............................................................................................................. 13 DNS ....................................................................................................................... 
14 Other ..................................................................................................................... 15 Prepare Isilon .................................................................................................. 15 Assumptions............................................................................................................ 15 SmartConnect for HDFS ............................................................................................ 16 OneFS Access Zones................................................................................................. 17 Sharing Data between Access Zones .......................................................................... 18 User & Group ID’s .................................................................................................... 19 Configuring Isilon for HDFS ....................................................................................... 19 Create DNS Records for Isilon.................................................................................... 25 Prepare Linux Compute Nodes ........................................................................... 25 Linux Operating System packages needed for IBM BigInsights:...................................... 25 Enable NTP on all Linux Compute nodes...................................................................... 26 Disable SELinux on each node if enabled before installing Ambari. ................................. 26
  • 4. EMC Isilon Hadoop Starter Kit for IBM BigInsights __________________________________________________________________ EMC ISILON HADOOP STARTER KIT FOR IBM BIGINSIGHTS 4 Check UMASK Settings ............................................................................................. 26 Set ulimit Properties................................................................................................. 27 Kernel Modifications ................................................................................................. 27 Create IBM BigInsights Hadoop Users and Groups........................................................ 27 Configure Passwordless SSH...................................................................................... 28 Additional Linux Packages to Install............................................................................ 28 Test DNS Resolution................................................................................................. 29 Edit sudoers file on all Linux compute nodes................................................................ 29 INSTALLING IBM OPEN PLATFORM (OP) ................................................29 Download IBM Open Platform Software............................................................... 29 Create IBM Open Platform Repository ................................................................. 30 Validating IBM Open Platform Install................................................................... 38 Adding a Hadoop User ...................................................................................... 40 Additional Service Tests .................................................................................... 40 HDFS...................................................................................................................... 40 YARN/MAPREDUCE ................................................................................................... 
41 HIVE ...................................................................................................................... 42 HBASE.................................................................................................................... 43 Ambari Service Check....................................................................................... 44 INSTALLING IBM VALUE PACKAGES .......................................................45 Before You Begin ............................................................................................. 45 Installation Procedure....................................................................................... 46 Select IBM BigInsights Service to Install ............................................................. 50 Installing BigInsights Home............................................................................... 51 Configure Knox ................................................................................................ 52 Installing BigSheets.......................................................................................... 54 Installing Big SQL............................................................................................. 57 Connecting to Big SQL ...................................................................................... 62 Running JSqsh......................................................................................................... 62 Connection setup ..................................................................................................... 62 Commands and queries ............................................................................................ 63 Command and query edit.......................................................................................... 65
  • 5. EMC Isilon Hadoop Starter Kit for IBM BigInsights __________________________________________________________________ EMC ISILON HADOOP STARTER KIT FOR IBM BIGINSIGHTS 5 Configuration variables ............................................................................................. 66 Installing Text Analytics .................................................................................... 67 Installing Big R ................................................................................................ 71 IBM BigInsights Online Tutorials................................................................................. 76 SECURITY CONFIGURATION AND ADMINISTRATION..............................77 Setting up HTTPS for Ambari ............................................................................. 77 Configuring SSL support for HBase REST gateway with Knox ................................. 78 Overview of Kerberos ....................................................................................... 82 Enabling Kerberos for IBM Open Platform............................................................ 85 Manually generating keytabs for Kerberos authentication ...................................... 86 Setting up Active Directory or LDAP authentication in Ambari ................................ 91 Enabling Kerberos for HDFS on Isilon.................................................................. 97 Using MIT Kerberos 5 ............................................................................................... 97 Running the Ambari Kerberos Wizard.................................................................. 99 Trouble Shooting and Support ..........................................................................104
EMC Isilon Hadoop Starter Kit for IBM BigInsights v 4.0
This document describes how to create a Hadoop environment that uses IBM® Open Platform with Apache Hadoop and EMC® Isilon® scale-out network-attached storage (NAS) for HDFS-accessible shared storage. Installation and configuration of the IBM BigInsights Value Packages is also presented in this document.
Introduction
IBM & EMC Technology Highlights
The IBM® Open Platform with Apache Hadoop is composed entirely of open source Apache Hadoop components, such as Apache Ambari, YARN, Spark, Knox, Slider, Sqoop, Flume, Hive, Oozie, HBase, ZooKeeper, and more. After installing IBM Open Platform, you can install additional IBM value-add service modules. These value-add service modules are installed separately, and they include IBM BigInsights® Analyst, IBM BigInsights Data Scientist, and the IBM BigInsights Enterprise Management module; they extend IBM Open Platform to accelerate the conversion of all types of data into business insight and action.
The EMC® Isilon® scale-out network-attached storage (NAS) platform provides Hadoop clients with direct access to big data through a Hadoop Distributed File System (HDFS) interface. Powered by the distributed EMC Isilon OneFS® operating system, an EMC Isilon cluster delivers a powerful yet simple and highly efficient storage platform with native HDFS integration to accelerate analytics, gain new flexibility, and avoid the costs of a separate Hadoop infrastructure.
Audience
This document is intended for IT program managers, IT architects, developers, and IT management who want to deploy IBM BigInsights v4.0 with EMC Isilon OneFS v7.2.0.3 for HDFS storage. If a physical EMC Isilon cluster is not available, download the free EMC Isilon OneFS Simulator, which can be installed as a virtual machine for integration testing and training purposes. See http://www.emc.com/getisilon for the EMC Isilon OneFS Simulator.
Apache Hadoop Projects
Apache Hadoop is an open source, batch data processing system for enormous amounts of data. Hadoop runs as a platform that provides cost-effective, scalable infrastructure for building Big Data analytic applications. All Hadoop clusters contain a distributed file system called the Hadoop Distributed File System (HDFS) and a computation layer called MapReduce. The Apache Hadoop project contains the following subprojects:
• Hadoop Distributed File System (HDFS) – A distributed file system that provides high-throughput access to application data.
• Hadoop MapReduce – A software framework for writing applications to reliably process large amounts of data in parallel across a cluster.
Hadoop is supplemented by an ecosystem of Apache projects, such as Pig, Hive, Sqoop, Flume, Oozie, Slider, HBase, ZooKeeper, and more, that extend the value of Hadoop and improve its usability. Version 2 of Apache Hadoop introduces YARN, a sub-project of Hadoop that separates the resource management and processing components. YARN was born of a need to enable a broader array of interaction patterns for data stored in HDFS beyond MapReduce. The YARN-based architecture of Hadoop 2.0 provides a more general processing platform that is not constrained to MapReduce. For full details of the Apache Hadoop project, see http://hadoop.apache.org/.
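To make the MapReduce model concrete, here is a minimal, self-contained Python sketch of a word count, the canonical MapReduce example, with explicit map, shuffle, and reduce phases. It illustrates the programming model only; a real job is written against the Hadoop MapReduce API and runs distributed across the cluster.

```python
from collections import defaultdict

def map_phase(document):
    # A mapper emits (word, 1) pairs for each word in its input split.
    return [(word, 1) for word in document.split()]

def shuffle_phase(pairs):
    # The framework's shuffle/sort groups all values by key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # A reducer aggregates the grouped values; here, it sums the counts.
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["big data big insight", "data lake"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
counts = reduce_phase(shuffle_phase(pairs))
print(counts["big"], counts["data"])  # → 2 2
```

The same three-phase shape carries over to real MapReduce jobs; only the execution (distributed tasks reading HDFS blocks) differs.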
IBM Open Platform and the Ambari Manager
The IBM Open Platform with Apache Hadoop enables Enterprise Hadoop by providing the complete set of essential Hadoop capabilities required for any enterprise. With YARN at its core, it provides capabilities for several functional areas, including data management, data access, data governance, integration, security, and operations. IBM Open Platform delivers the core elements of Hadoop (scalable storage and distributed computing) as well as the necessary enterprise capabilities such as security, high availability, and integration with a broad range of hardware and software solutions.
Apache Ambari is an open operational framework for provisioning, managing, and monitoring Apache Hadoop clusters. As of version 4.0 of IBM Open Platform, Ambari can be used to set up and deploy Hadoop clusters for nearly any task, and it can provision, manage, and monitor every aspect of a Hadoop deployment. More information on IBM Open Platform can be found at: http://www-01.ibm.com/software/data/infosphere/hadoop/enterprise.html
Isilon Scale-Out NAS for HDFS
EMC Isilon is the only scale-out NAS platform natively integrated with the Hadoop Distributed File System (HDFS). Using HDFS as an over-the-wire protocol, you can deploy a powerful, efficient, and flexible data storage and analytics ecosystem. In addition to native integration with HDFS, EMC Isilon storage easily scales to support massively large Hadoop analytics projects. Isilon scale-out NAS also offers the unmatched simplicity, efficiency, flexibility, and reliability that you need to maximize the value of your Hadoop data storage and analytics workflow investment.
Overview of Isilon Scale-Out NAS for Big Data
The EMC Isilon scale-out platform combines modular hardware with unified software to provide the storage foundation for data analysis. Isilon scale-out NAS is a fully distributed system that consists of nodes of modular hardware arranged in a cluster. The distributed Isilon OneFS operating system combines the memory, I/O, CPUs, and disks of the nodes into a cohesive storage unit that presents a global namespace as a single file system. The nodes work together as peers in a shared-nothing hardware architecture with no single point of failure. Every node adds capacity, performance, and resiliency to the cluster, and each node acts as a Hadoop namenode and datanode. The namenode daemon is a distributed process that runs on all the nodes in the cluster, and a compute client can connect to any node through HDFS. As nodes are added, the file system expands dynamically and redistributes data, eliminating the work of partitioning disks and creating volumes. The result is a highly efficient and resilient storage architecture that brings all the advantages of an enterprise scale-out NAS system to storing data for analysis.
With traditional direct-attached storage, the ratio of CPU, RAM, and disk space depends on the workload, which makes it difficult to size a Hadoop cluster before you have had a chance to measure your MapReduce workload. Growing data sets also make upfront sizing decisions problematic. Isilon scale-out NAS lends itself perfectly to this situation: it lets you increase CPUs, RAM, and disk space by adding nodes to dynamically match storage capacity and performance with the demands of a dynamic Hadoop workload. An Isilon cluster optimizes data protection.
OneFS protects data more efficiently and reliably than HDFS. The HDFS protocol, by default, replicates each block of data three times. In contrast, OneFS stripes the data across the cluster and protects it with forward error correction (FEC) codes, which consume less space than replication while providing better protection.
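The space difference is easy to quantify. The sketch below compares the raw capacity needed to store 100 TB of user data under 3x replication versus an illustrative N+2 erasure-coded layout. The 10+2 stripe width is an assumption chosen for illustration; actual OneFS protection overhead depends on cluster size and the configured protection level.

```python
def replication_raw(usable_tb, copies=3):
    # Raw capacity consumed when every block is stored `copies` times,
    # as HDFS does by default.
    return usable_tb * copies

def erasure_raw(usable_tb, data_units=10, parity_units=2):
    # Raw capacity for an N+M protection layout: every N data units
    # carry M additional parity units.
    return usable_tb * (data_units + parity_units) / data_units

usable = 100  # TB of user data
print(replication_raw(usable))  # 300 TB raw for 3x replication
print(erasure_raw(usable))      # 120 TB raw for a 10+2 layout
```

Under these assumptions, replication adds 200% overhead while the erasure-coded layout adds 20%, which is the intuition behind the paragraph above.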
Pre-installation Checklist
Supported Software Versions
The environment used for this document consists of the following software versions:
• Ambari 1.7.0_IBM
• IBM Open Platform v 4.0.0.0
• Isilon OneFS 7.2.0.3 with patch-159065
• All of the IBM BigInsights v 4.0 value packages, i.e. Business Analyst, Data Scientist, and Enterprise Management
______________________________________________________________________
Note: IBM BigInsights v 4.0 requires OneFS v 7.2.0.3 with patch-159065. OneFS version 7.2.0.4 should also work, as should version 7.2.1.1 when available. Do not install IBM BigInsights with OneFS versions lower than 7.2.0.3. See the EMC Isilon Supportability and Compatibility Guide for the latest compatibility updates: https://support.emc.com/docu44518_Isilon-Supportability-and-Compatibility-Guide.pdf?language=en_US
______________________________________________________________________
Hardware Requirements and Suggested Hadoop Service Layout
Detailed system requirements for IBM BigInsights compute nodes can be found at: http://www-01.ibm.com/support/docview.wss?uid=swg27027565
In a multi-node IBM BigInsights cluster, it is suggested that you have at least one management node in a non-high-availability environment if performance is not an issue. If performance is a concern, consider configuring at least three management nodes. If you use the BigInsights Big SQL service, consider configuring four management nodes. If you use a high-availability environment, consider six management nodes. Use
the following list as a guide for the nodes in your IBM/EMC cluster. A suggested layout is shown in Table 1 for both non-high-availability and high-availability deployments.
________________________________________________________________________________________
Note: With both deployment options, EMC Isilon provides namenode, secondary namenode, and datanode functions for the entire cluster. Do not designate any compute node as a namenode, secondary namenode, or datanode in any aspect of the IBM BigInsights configuration.
________________________________________________________________________________________
Table 1. Suggested Service Layout

Non-High availability:
• Management node 1: Ambari, PostgreSQL, Knox, Zookeeper, Hive, Spark, Spark History Server, BigInsights Home, BigSheets, Big R, BigSQL Headnode, Text Analytics
• Management node 2: Resource Manager, HBase Master, Zookeeper, Oozie, Ambari monitoring service
• Management node 3: Job history server, Zookeeper, App Timeline Server, Kafka
• Management node 4: Big SQL Scheduler, Hive Server (MySQL), MySQL metastore, Hive/Oozie metastore, WebHCat Server, Data Server Manager

High availability:
• Management node 1: Ambari, PostgreSQL, Spark, Spark History Server, BigSQL Headnode
• Management node 2: Resource Manager, Zookeeper, Oozie, Ambari monitoring service, BigInsights Home
• Management node 3: Resource Manager (standby), Job history server, Zookeeper, App Timeline Server, Kafka, Oozie (standby)
• Management node 4: Big SQL Scheduler, HBase Master (standby), Hive Server, MySQL Server, Hive metastore, WebHCat Server, Data Server Manager
• Management node 5: Big SQL Headnode (standby), Big SQL Scheduler (standby), HBase Master, Hive Server (standby), Hive Metastore (standby), Journal Node, Zookeeper
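As a rough illustration of how a layout like the one above can be expressed to Ambari, below is a hypothetical, abbreviated Ambari blueprint fragment mapping a few of the services to host groups. The blueprint itself, its name, and the component selection are illustrative assumptions, not part of the starter kit; the point to notice is that no NAMENODE or DATANODE components are assigned to any compute node, because Isilon provides those roles.

```json
{
  "Blueprints": {
    "blueprint_name": "biginsights-isilon",
    "stack_name": "BigInsights",
    "stack_version": "4.0"
  },
  "host_groups": [
    {
      "name": "management_node_1",
      "cardinality": "1",
      "components": [
        { "name": "KNOX_GATEWAY" },
        { "name": "ZOOKEEPER_SERVER" },
        { "name": "HIVE_SERVER" }
      ]
    },
    {
      "name": "management_node_2",
      "cardinality": "1",
      "components": [
        { "name": "RESOURCEMANAGER" },
        { "name": "HBASE_MASTER" },
        { "name": "ZOOKEEPER_SERVER" },
        { "name": "OOZIE_SERVER" }
      ]
    }
  ]
}
```

In this document the services are assigned interactively through the Ambari web UI rather than via a blueprint; the fragment is only meant to show the service-to-host mapping in a concrete form.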
Installation Overview
Below is an overview of the installation process that this document describes.
1. Confirm prerequisites.
2. Prepare your network infrastructure, including DNS.
3. Prepare your Isilon cluster.
4. Prepare the Linux compute nodes.
5. Install the Ambari server.
6. Use Ambari to deploy IBM Open Platform to the compute nodes.
7. Install the IBM BigInsights Value Packages.
8. Perform key functional tests.
Prerequisites
Isilon Scale-Out NAS or Isilon OneFS Simulator
• For low-capacity, non-performance testing of Isilon, the EMC Isilon OneFS Simulator can be used instead of a cluster of physical Isilon appliances. It can be downloaded for free from http://www.emc.com/getisilon. Refer to the EMC Isilon OneFS Simulator Install Guide for details, and be sure to follow the section for running the virtual nodes in VMware ESX. Only a single virtual node is required, but adding nodes will allow you to explore other features such as data protection, SmartPools (tiering), and SmartConnect (network load balancing).
• For physical Isilon nodes, you should have already completed the console-based installation process for your first Isilon node and added two other nodes, for a minimum of three Isilon nodes.
• You should have OneFS version 7.2.0.3 + patch-159065 installed on Isilon.
• You must obtain an OneFS HDFS license code and install it on your Isilon cluster. You can get a free OneFS HDFS license from: http://www.emc.com/campaign/isilon-hadoop/index.htm.
• It is recommended, but not required, to have a SmartConnect Advanced license for your Isilon cluster.
• To allow scripts and other small files to be easily shared between all nodes in your environment, it is highly recommended to enable NFS (Unix Sharing) on your Isilon cluster. By default, the entire /ifs directory is already exported, and this can remain unchanged. This document assumes that a single Isilon cluster is used for this NFS export as well as for HDFS; however, there is no requirement that the NFS export be on the same Isilon cluster that you are using for HDFS.
Linux
• Red Hat Enterprise Linux (RHEL) Server 6 (Update 5 minimum) or a comparable CentOS Server.
• 100 GB root partition.
• At a minimum, 96 GB of RAM for production environments; the more RAM the better for Hadoop.
Networking
• For the best performance, a single 10 Gigabit Ethernet switch should connect to at least one 10 Gigabit port on each Linux host. Additionally, the same switch should connect to at least one 10 Gigabit port on each Isilon node.
• A single dedicated layer-2 network can be used to connect all hosts and Isilon nodes, although multiple networks can be used for increased security, monitoring, and robustness.
• At least an entire /24 IP address block should be allocated to your network. This will allow a DNS reverse-lookup zone to be delegated to your Hadoop DNS server.
• If using the EMC Isilon OneFS Simulator, you will need at least two static IP addresses (one for the node's ext-1 interface, another for the SmartConnect service IP). Each additional Isilon node will require an additional IP address.
• At a minimum, you will need to allocate to your Isilon cluster one IP address per Access Zone per Isilon node. In general, you will need one Access Zone for each separate Hadoop cluster that will use Isilon for HDFS storage.
• For the best possible load balancing during an Isilon node failure scenario, the recommended number of IP addresses is given by the formula below. This is in addition to any IP addresses used for non-HDFS pools.
# of IP addresses = 2 * (# of Isilon Nodes) * (# of Access Zones)
For example, 20 IP addresses are recommended for 5 Isilon nodes and 2 Access Zones.
• This document assumes that Internet access is available to all servers to download various components from Internet repositories.
DNS
• A DNS server is required, and you must have the ability to create DNS records and zone delegations.
• It is recommended that your DNS server delegate a subdomain to your Isilon cluster. For instance, DNS requests for subnet0-pool0.isiloncluster1.example.com or isiloncluster1.example.com should be delegated to the Service IP defined on your Isilon cluster.
• To allow for a convenient way of changing the HDFS namenode used by all Hadoop applications and services, create a DNS record for your Isilon cluster's HDFS namenode service. This should be a CNAME alias to your Isilon SmartConnect zone. Specify a TTL of 1 minute to allow for quick changes.
For example, create a CNAME record for mycluster1-hdfs.example.com that targets subnet0-pool0.isiloncluster1.example.com. If you later want to redirect all HDFS I/O to another cluster or a different pool on the same Isilon cluster, you simply need to change the DNS record and restart all Hadoop services.
Other
• See http://www.github.com/bonibruno/BigInsights; there are three scripts to download that help automate new IBM BigInsights installations with EMC Isilon:
1. bi_create_users.sh – use this script to create the users and groups on all the Linux nodes before beginning installation.
2. isilon_create_users.sh – use this script to create the users and groups on Isilon before beginning installation. You must first create your access zone (described later in this document, e.g. ibm) before running this script.
3. isilon_create_directories.sh – run this after the script above.
More information on the use of these scripts is provided in the installation section of this document.
Prepare Isilon
Assumptions
This document makes the assumptions listed below. These are not necessarily requirements, but they are usually valid and they simplify the process.
• It is assumed that you are not using a directory service such as Active Directory for Hadoop users and groups.
• It is assumed that you are not using Kerberos authentication for Hadoop.
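To give a sense of what a user-creation script such as bi_create_users.sh does on each Linux node, here is an illustrative Python sketch that generates the kind of groupadd/useradd commands involved. The user list and starting IDs are assumptions for illustration only; the authoritative list comes from the scripts themselves.

```python
# Illustrative only: the real user list and IDs are defined by the
# bi_create_users.sh / isilon_create_users.sh scripts.
START_UID = 1000
USERS = ["hdfs", "yarn", "hive", "hbase", "bigsql"]  # hypothetical subset

def user_commands(users, start_uid):
    # A shared 'hadoop' group, then one group and one user per service
    # account, each user also added to the 'hadoop' group.
    cmds = ["groupadd -g {0} hadoop".format(start_uid)]
    for offset, user in enumerate(users, start=1):
        uid = start_uid + offset
        cmds.append("groupadd -g {0} {1}".format(uid, user))
        cmds.append("useradd -u {0} -g {1} -G hadoop {1}".format(uid, user))
    return cmds

for cmd in user_commands(USERS, START_UID):
    print(cmd)
```

Running the analogous Isilon script with the same starting UID/GID values keeps numeric IDs consistent between the Linux nodes and the Isilon cluster, which matters if you ever use NFS alongside HDFS (see the User & Group IDs section later in this document).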
SmartConnect for HDFS
A best practice for HDFS on Isilon is to use two SmartConnect IP address pools for each access zone. One IP address pool should be used by Hadoop clients to connect to the HDFS namenode service on Isilon, and it should use the dynamic IP allocation method to minimize connection interruptions in the event that an Isilon node fails.
____________________________________________________________________
Note: Dynamic IP allocation requires a SmartConnect Advanced license.
____________________________________________________________________
A Hadoop client uses a specific SmartConnect IP address pool simply by using its zone name (DNS name) in the HDFS URI. For example: hdfs://subnet0-pool1.isiloncluster1.example.com:8020
A second IP address pool should be used for HDFS datanode connections, and it should also use the dynamic IP allocation method. To assign specific SmartConnect IP address pools for datanode connections, use the "isi hdfs racks modify" command. If the network is flat, there is no need to use "isi hdfs racks modify"; the default configuration will suffice.
If IP addresses are limited and you have a SmartConnect Advanced license, you may choose to use a single dynamic pool for namenode and datanode connections; this may result in uneven utilization of Isilon nodes. If you do not have a SmartConnect Advanced license, you may choose to use a single static pool for namenode and datanode connections; this may result in some failed HDFS connections in the event of a node failure.
For more information, see the EMC Isilon Best Practices for Hadoop Data Storage white paper online at: https://www.emc.com/collateral/white-papers/h13926-wp-emc-isilon-hadoop-best-practices-onefs72.pdf
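When sizing these pools, recall the formula from the networking prerequisites: 2 x (number of Isilon nodes) x (number of access zones), in addition to any IPs used for non-HDFS pools. A quick sketch:

```python
def recommended_ip_count(isilon_nodes, access_zones):
    # Recommended IP addresses for HDFS pools, per the sizing formula
    # in the networking prerequisites: 2 * nodes * access zones.
    # This excludes IP addresses used for non-HDFS pools.
    return 2 * isilon_nodes * access_zones

print(recommended_ip_count(5, 2))  # → 20, matching the worked example in the text
```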
OneFS Access Zones
Access zones on OneFS are a way to select a distinct configuration for the OneFS cluster based on the IP address that the client connects to. For HDFS, this configuration includes authentication methods, the HDFS root path, and authentication providers (AD, LDAP, local, etc.). By default, OneFS includes a single access zone called System.
If you will only have a single Hadoop cluster connecting to your Isilon cluster, then you can use the System access zone with no additional configuration. However, to have more than one Hadoop cluster connect to your Isilon cluster, it is best to have each Hadoop cluster connect to a separate OneFS access zone. This allows OneFS to present each Hadoop cluster with its own HDFS namespace and an independent set of users. For more information, see the Security and Compliance for Scale-out Hadoop Data Lakes white paper.
To view your current list of access zones and the IP pools associated with them:

isiloncluster1-1# isi zone zones list
Name         Path
------------ ----
System       /ifs
------------
Total: 1

isiloncluster1-1# isi networks list pools -v
subnet0:pool0
    In Subnet: subnet0
    Allocation: Static
    Ranges: 1
        10.111.129.115-10.111.129.126
    Pool Membership: 4
        1:10gige-1 (up)
        2:10gige-1 (up)
        3:10gige-1 (up)
        4:10gige-1 (up)
    Aggregation Mode: Link Aggregation Control Protocol (LACP)
    Access Zone: System (1)
    SmartConnect:
        Suspended Nodes : None
        Auto Unsuspend ... 0
        Zone : subnet0-pool0.isiloncluster1.lab.example.com
        Time to Live : 0
        Service Subnet : subnet0
        Connection Policy: Round Robin
        Failover Policy : Round Robin
        Rebalance Policy : Automatic Failback
To create a new access zone and an associated IP address pool:

isiloncluster1-1# mkdir -p /ifs/isiloncluster1/zone1
isiloncluster1-1# isi zone zones create --name zone1 --path /ifs/isiloncluster1/zone1
isiloncluster1-1# isi networks create pool --name subnet0:pool1 --ranges 10.111.129.127-10.111.129.138 --ifaces 1-4:10gige-1 --access-zone zone1 --zone subnet0-pool1.isiloncluster1.lab.example.com --sc-subnet subnet0 --dynamic
Creating pool 'subnet0:pool1': OK
Saving: OK
____________________________________________________________________
Note: If you do not have a SmartConnect Advanced license, you will need to omit the --dynamic option.
____________________________________________________________________
Sharing Data between Access Zones
By default, the data in one access zone cannot be accessed by users in another access zone. In certain cases, however, you may need to make the same data set available to more than one Hadoop compute cluster. Using fully qualified HDFS paths, e.g. hdfs://zone1-hdfs.example.com/hadoop/dir1, can make a data set available across two or more access zones. With fully qualified HDFS paths, the data sets do not cross access zones; instead, the Hadoop jobs access the data sets from a common shared HDFS namespace. For instance, you can selectively share data between two or more access zones based on referential links and file/directory permissions.
User & Group IDs
Isilon clusters and Hadoop servers each have their own mapping of user IDs (UIDs) to user names and group IDs (GIDs) to group names. When Isilon is used only for HDFS storage by the Hadoop servers, the IDs do not need to match, because the HDFS protocol refers to users and groups only by their names, never by their numeric IDs. In contrast, the NFS protocol refers to users and groups by their numeric IDs. Although NFS is rarely used in traditional Hadoop environments, the high-performance, enterprise-class, and POSIX-compatible NFS functionality of Isilon makes NFS a compelling protocol for certain workflows. If you expect to use both NFS and HDFS on your Isilon cluster (or simply want to be open to the possibility in the future), it is highly recommended to maintain consistent names and numeric IDs for all users and groups on Isilon and your Hadoop servers.
In a multi-tenant environment with multiple Hadoop clusters, numeric IDs for users in different clusters should be distinct. For instance, the user bigsql in Hadoop cluster 1 may have ID 1013, and this same ID will be used in the Isilon access zone for Hadoop cluster 1 as well as on every server in Hadoop cluster 1. The user bigsql in Hadoop cluster 2 may have ID 710, and this ID will be used in the Isilon access zone for Hadoop cluster 2 as well as on every server in Hadoop cluster 2.
Configuring Isilon for HDFS
_____________________________________________________________________
Note: In the steps below, replace zone1 with System to use the default System access zone, or specify the name of a new access zone that you previously created.
______________________________________________________________________
1. Open a web browser to your Isilon cluster's web administration page.
If you don’t know the URL, simply point your browser to: https://isilon_node_ip_address:8080
The isilon_node_ip_address is any IP address on any Isilon node that is in the System Access Zone. This usually corresponds to the ext-1 interface of any Isilon node.
2. Log in with your root account. You specified the root password when you configured your first node using the console.
3. Check, and edit as necessary, your NTP settings. Click Cluster Management -> General Settings -> NTP.
1. SSH into any node in your Isilon cluster as root.
2. Confirm that your Isilon cluster is at OneFS version 7.2.0.3.
isiloncluster1-1# isi version
Isilon OneFS v7.2.0.3 ...
3. For OneFS version 7.2.0.3, you must have patch-159065 installed. You can view the list of patches you have installed with:
# isi pkg info
patch-159065: This patch adds support for the Ambari 1.7.0_IBM Server.
4. Install the patch if needed:
[user@workstation ~]$ scp patch-159065.tgz root@mycluster1-hdfs:/tmp
isiloncluster1-1# gunzip < /tmp/patch-159065.tgz | tar -xvf -
isiloncluster1-1# isi pkg install patch-159065.tar
Preparing to install the package...
Checking the package for installation...
Installing the package
Committing the installation...
Package successfully installed.
5. Verify your HDFS license.
isiloncluster1-1# isi license
Module License Status  Configuration   Expiration Date
------ -------------- --------------- ---------------
HDFS   Evaluation     Not Configured  November 12, 2016
6. Create the HDFS root directory. This is usually called hadoop and must be within the access zone directory.
isiloncluster1-1# mkdir -p /ifs/isiloncluster1/zone1/hadoop
7. Set the HDFS root directory for the access zone.
isiloncluster1-1# isi zone zones modify zone1 --hdfs-root-directory /ifs/isiloncluster1/zone1/hadoop
8. Set the HDFS block size used for reading from Isilon.
isiloncluster1-1# isi hdfs settings modify --default-block-size 128M
9. Create an indicator file so that you can easily determine when you are looking at your Isilon cluster via HDFS.
isiloncluster1-1# touch /ifs/isiloncluster1/zone1/hadoop/THIS_IS_ISILON_isiloncluster1_zone1
10. Copy the scripts (isilon_create_users.sh & isilon_create_directories.sh) you downloaded from http://www.github.com/bonibruno/BigInsights to Isilon.
[user@workstation ~]$ scp isilon_create_*.sh root@isilon_node_ip_address:/ifs/isiloncluster1/scripts
11. Execute the script isilon_create_users.sh. This script will create all required users and groups for IBM BigInsights v 4.0.
Warning: The script isilon_create_users.sh will create local user and group accounts on your Isilon cluster for Hadoop services. If you are using a directory service such as Active Directory and you want these users and groups to be defined in your directory service, then DO NOT run this script. Instead, refer to the OneFS documentation and the EMC Isilon Best Practices for Hadoop Data Storage white paper.
Script Usage: isilon_create_users.sh --dist <DIST> [--startgid <GID>] [--startuid <UID>] [--zone <ZONE>]
dist - This corresponds to your Hadoop distribution: bi4.0
startgid - Group IDs will begin with this value. For example: 1000
startuid - User IDs will begin with this value. This is generally the same as startgid. For example: 1000
zone - Access zone name. For example: zone1

isiloncluster1-1# bash /ifs/isiloncluster1/scripts/isilon_create_users.sh --dist bi4.0 --startgid 1000 --startuid 1000 --zone zone1

Example output of the script is shown below:
Info: Hadoop distribution: bi
Info: groups will start at GID 1000
Info: users will start at UID 1000
Info: will put users in zone: zone1
Info: HDFS root: /ifs/isiloncluster1/hadoop
Failed to add member UID:1001 to group GROUP:hadoop: User is already in local group
SUCCESS -- Hadoop users created successfully!
Done!
______________________________________________________________________
Note: The "User is already in local group" message is expected; this user corresponds to the hadoop user, which is already in the hadoop group.
______________________________________________________________________
12. Execute the script isilon_create_directories.sh. This script will create all required directories with the appropriate ownership and permissions.
Script Usage: isilon_create_directories.sh --dist <DIST> [--fixperm] [--zone <ZONE>]
dist - This corresponds to your Hadoop distribution: bi4.0
fixperm - Updates ownership and permissions on the Hadoop directories.
zone - Access zone name. For example: zone1

isiloncluster1-1# bash /ifs/isiloncluster1/scripts/isilon_create_directories.sh --dist bi4.0 --fixperm --zone zone1

13. Map the hdfs user to the Isilon superuser. This will allow the hdfs user to chown (change ownership of) all files during IBM BigInsights installation.
______________________________________________________________________
Warning: The command below will restart the HDFS service on Isilon to ensure that any cached user-mapping rules are flushed. This will temporarily interrupt any HDFS connections coming from other Hadoop clusters.
______________________________________________________________________
isiloncluster1-1# isi zone zones modify --user-mapping-rules='hdfs=>root' --zone zone1
isiloncluster1-1# isi services isi_hdfs_d disable ; isi services isi_hdfs_d enable
The service 'isi_hdfs_d' has been disabled.
The service 'isi_hdfs_d' has been enabled.
  • 25.
Create DNS Records for Isilon
You will now create the DNS records that will be used to access your Isilon cluster.
1. Create a delegation record so that DNS requests for the zone isiloncluster1.example.com are delegated to the Service IP that will be defined on your Isilon cluster. The Service IP can be any unused static IP address in your lab subnet.
2. Create a CNAME alias for your Isilon SmartConnect zone. For example, create a CNAME record for mycluster1-hdfs.example.com that targets subnet0-pool0.isiloncluster1.example.com.
3. Test name resolution:
[user@workstation ~]$ ping mycluster1-hdfs.example.com
PING subnet0-pool0.isiloncluster1.example.com (10.11.12.13) 56(84) bytes of data.
64 bytes from 10.11.12.13: icmp_seq=1 ttl=64 time=1.15 ms

Prepare Linux Compute Nodes
Linux operating system packages needed for IBM BigInsights:
1. Compatibility Libraries
2. Networking Tools
3. Perl Support
4. Ruby Support
5. Web Services add-on
6. PHP Support
7. Web Server
  • 26.
8. MySQL*
9. PostgreSQL*
10. SNMP support
11. Development Tools
12. Korn Shell

Enable NTP on all Linux compute nodes:
1. Edit the /etc/ntp.conf file and add your NTP server.
2. Start NTP: service ntpd start
3. Enable NTP at boot: chkconfig --level 2345 ntpd on

Disable SELinux on each node, if enabled, before installing Ambari:
1. Edit /etc/selinux/config
2. Set SELINUX=disabled
3. Reboot
____________________________________________________________________
Note: SELinux can be disabled temporarily with the "setenforce 0" command.
____________________________________________________________________

Check UMASK Settings
The umask setting on each node should be set to 0022 in /etc/profile and /etc/bashrc. Modify the existing umask entry if needed, e.g. "umask 0022".
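The umask check above can be automated. A minimal sketch, operating on a demo file path (an assumption for illustration); on a real node point PROFILE at /etc/profile and /etc/bashrc:

```shell
# Sketch: ensure "umask 0022" is present in a profile-style file.
# PROFILE is a demo path; use /etc/profile and /etc/bashrc on real nodes.
set -eu
PROFILE="${PROFILE:-/tmp/profile.demo}"
printf 'export PATH=$PATH\numask 027\n' > "$PROFILE"   # demo content only
if grep -q '^umask' "$PROFILE"; then
  sed -i 's/^umask .*/umask 0022/' "$PROFILE"          # modify existing entry
else
  echo 'umask 0022' >> "$PROFILE"                      # or append a new one
fi
grep '^umask' "$PROFILE"
```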
  • 27.
Set ulimit Properties
1. Edit /etc/security/limits.d/90-nproc.conf
#set for all users
* hard nofile 65536
* soft nofile 65536
* hard nproc 65536
* soft nproc 65536

Kernel Modifications
1. Edit /etc/sysctl.conf and add the following:
vm.swappiness=5
kernel.pid_max=4194303
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
net.ipv4.ip_local_port_range = 1024 64000

Create IBM BigInsights Hadoop Users and Groups
Create the required users on all Linux nodes. It is recommended to create all Hadoop users before installing IBM BigInsights. Use the bi_create_users.sh script obtained from:
https://github.com/bonibruno/BigInsights

[user@workstation ~]$ scp bi_create_users.sh [node1]:/root
Run the script, e.g. ./bi_create_users.sh
Repeat the above for all nodes.
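The /etc/sysctl.conf additions above can be applied idempotently so that re-running node preparation does not duplicate entries. A sketch using a demo file path (an assumption); on a real node set SYSCTL_CONF=/etc/sysctl.conf and follow up with `sysctl -p`:

```shell
# Sketch: append the kernel settings only if they are not already present.
# SYSCTL_CONF is a demo path; use /etc/sysctl.conf on a real node.
set -eu
SYSCTL_CONF="${SYSCTL_CONF:-/tmp/sysctl.conf.demo}"
: > "$SYSCTL_CONF"
while read -r line; do
  # -qxF: match the whole line literally, so re-runs add nothing
  grep -qxF "$line" "$SYSCTL_CONF" || echo "$line" >> "$SYSCTL_CONF"
done <<'EOF'
vm.swappiness=5
kernel.pid_max=4194303
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
net.ipv4.ip_local_port_range = 1024 64000
EOF
cat "$SYSCTL_CONF"
```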
  • 28.
Configure Passwordless SSH
Configure passwordless SSH for all Linux nodes.
1. Create authentication SSH keys:
ssh-keygen -f id_rsa -t rsa -N ""
2. Create .ssh directories on all nodes:
ssh root@[node1] mkdir -p .ssh
cd .ssh
Upload the generated keys to all hosts:
cat id_rsa.pub | ssh root@[node1] 'cat >> .ssh/authorized_keys'
Repeat the above for all nodes.
3. Set permissions on the .ssh directory:
ssh root@[node1] "chmod 700 .ssh; chmod 640 .ssh/authorized_keys"

Additional Linux Packages to Install
Install the following packages on all Linux compute nodes.
• deltarpm
• python-deltarpm
• createrepo
• pam-1.1.1-17.el6.i686.rpm
• mysql-connector-java-5.1.17-6.el6.noarch.rpm
• ksh
• nc
• libdbi
• libstdc
• libaio
• java-1.7.0-openjdk-devel
• python-paramiko
• python-rrdtool-1.4.5-1.el6.rfx.x86_64
  • 29.
• snappy-1.0.5-1.el6.x86_64
• web-ui-framework
Install the above packages using the yum install command.

Test DNS Resolution
Make sure all compute nodes resolve with a fully qualified domain name. Ping each host by its FQDN and make sure it is reachable.

Edit the sudoers file on all Linux compute nodes.
1. Edit /etc/sudoers
## Additions needed for IBM BigInsights
hadoop ALL=(ALL) NOPASSWD: ALL
bigsql ALL=(ALL) NOPASSWD: ALL

Check IBM's BigInsights website for more information on preparing Linux nodes:
http://www-01.ibm.com/support/knowledgecenter/SSPT3X_4.0.0/com.ibm.swg.im.infosphere.biginsights.install.doc/doc/install_prepare.html

Installing IBM Open Platform (IOP)
Download IBM Open Platform Software
Log into the IBM Passport Advantage web portal with your IBM-assigned credentials and download the following packages onto the designated Ambari server node:
• BI-AH-1.0.0.1-IOP-4.0.x86_64.bin
• IOP-4.0.0.0.x86_64.rpm
• iop-4.0.0.0.x86_64.tar.gz
• iop-utils-1.0-iop-4.0.x86_64.tar.gz
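The DNS-resolution test described above can be scripted across all nodes. A minimal sketch; the NODES list and report path are assumptions to replace with your real host list:

```shell
# Sketch: verify that each compute node resolves before installing.
# NODES is an assumption (demo default: localhost); list your real FQDNs.
NODES="${NODES:-localhost}"
REPORT="${REPORT:-/tmp/fqdn-check.txt}"
: > "$REPORT"
for n in $NODES; do
  # getent consults the same resolver order the installer will use
  if getent hosts "$n" > /dev/null; then
    echo "$n resolves" >> "$REPORT"
  else
    echo "$n FAILED" >> "$REPORT"
  fi
done
cat "$REPORT"
```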
  • 30. EMC Isilon Hadoop Starter Kit for IBM BigInsights __________________________________________________________________ EMC ISILON HADOOP STARTER KIT FOR IBM BIGINSIGHTS 30 Create IBM Open Platform Repository The IBM Open Platform with Apache Hadoop uses the repository-based Ambari installer. You have two options for specifying the location of the repository from which Ambari obtains the component packages. The IBM Open Platform with Apache Hadoop installation includes OpenJDK 1.7.0. During installation, you can either install the version provided or make sure Java™ 7 is installed on all nodes in the cluster. 1. Log in to your Linux cluster as root, or as a user with root privileges. 2. Ensure that the nc package is installed on all nodes: yum install -y nc If you installed the Basic Server option on your server, the nc package might not be installed, which might result in the failure on datanodes of the IBM Open Platform with Apache Hadoop. 3. Locate the IOP-4.0.0.0.x86_64.rpm file you downloaded from the download site. Run the following command to install the ambari.repo file into /etc/yum.repos.d: yum install IOP-4.0.0.0.x86_64.rpm If using a mirror repository, edit the file /etc/yum.repos.d/ambari.repo and replace baseurl=http://ibm-open-platform.ibm.com/repos/Ambari/RHEL6/x86_64/1.7 with your mirror URL. For example, baseurl=http://<web.server>/repos/Ambari/RHEL6/x86_64/1.7/ Disable the gpgcheck in the ambari.repo file. To disable signature validation, change gpgcheck=1 to gpgcheck=0. Alternatively, you can keep gpgcheck on and change the public key file location to the mirror Ambari repository. To do this, change the following
  • 31.
gpgkey=http://ibm-open-platform.ibm.com/repos/Ambari/RHEL6/x86_64/1.7/BI-GPG-KEY.public
to the following:
gpgkey=http://<web.server>/repos/Ambari/RHEL6/x86_64/1.7/BI-GPG-KEY.public

4. Clean the yum cache on each node so that the right packages from the remote repository are seen by your local yum:
sudo yum clean all

5. Install the Ambari server on the intended management node:
sudo yum install ambari-server
Accept the install defaults.

6. If you are using a mirror repository, after you install the Ambari server, update the following file with the mirror repository URLs:
/var/lib/ambari-server/resources/stacks/BigInsights/4.0/repos/repoinfo.xml

In the file, change the Original content to the Modified content.

Original content:
<os type="redhat6">
  <repo>
    <baseurl>http://ibm-open-platform.ibm.com/repos/IOP/RHEL6/x86_64/4.0</baseurl>
    <repoid>IOP-4.0</repoid>
    <reponame>IOP</reponame>
  </repo>
  <repo>
    <baseurl>http://ibm-open-platform.ibm.com/repos/IOP-UTILS/RHEL6/x86_64/1.0</baseurl>
    <repoid>IOP-UTILS-1.0</repoid>
    <reponame>IOP-UTILS</reponame>
  </repo>
</os>

Modified content:
<os type="redhat6">
  <repo>
    <baseurl>http://<web.server>/repos/IOP/RHEL6/x86_64/4.0</baseurl>
    <repoid>IOP-4.0</repoid>
    <reponame>IOP</reponame>
  </repo>
  <repo>
    <baseurl>http://<web.server>/repos/IOP-UTILS/RHEL6/x86_64/1.0</baseurl>
    <repoid>IOP-UTILS-1.0</repoid>
    <reponame>IOP-UTILS</reponame>
  </repo>
</os>

Edit the /etc/ambari-server/conf/ambari.properties file, changing:
jdk1.7.url=http://ibm-open-platform.ibm.com/repos/IOP-UTILS/RHEL6/x86_64/1.0/openjdk/jdk-1.7.0.tar.gz
to:
jdk1.7.url=http://<web.server>/repos/IOP-UTILS/RHEL6/x86_64/1.0/openjdk/jdk-1.7.0.tar.gz

7. Set up the Ambari server:
sudo ambari-server setup
Accept the setup preferences. A Java JDK is installed as part of the Ambari server setup. However, the setup also allows you to reuse an existing JDK:
ambari-server setup -j /full/path/to/JDK
The JDK path set by the -j parameter must be the same on each node in the cluster.

8. Start the Ambari server:
sudo ambari-server start
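The ambari.repo mirror edits from step 3 can be scripted rather than applied by hand. A sketch that operates on a demo copy of the file; the mirror URL and demo path are assumptions, and on a real node the file is /etc/yum.repos.d/ambari.repo:

```shell
# Sketch: point ambari.repo at a local mirror and disable gpgcheck.
# REPO_FILE and MIRROR are assumptions for the demo.
set -eu
REPO_FILE="${REPO_FILE:-/tmp/ambari.repo.demo}"
MIRROR="${MIRROR:-http://mirror.example.com}"
cat > "$REPO_FILE" <<'EOF'
[ambari]
baseurl=http://ibm-open-platform.ibm.com/repos/Ambari/RHEL6/x86_64/1.7
gpgcheck=1
EOF
# rewrite the baseurl to the mirror and turn off signature validation
sed -i \
  -e "s|^baseurl=.*|baseurl=${MIRROR}/repos/Ambari/RHEL6/x86_64/1.7/|" \
  -e 's/^gpgcheck=1/gpgcheck=0/' "$REPO_FILE"
cat "$REPO_FILE"
```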
  • 33. EMC Isilon Hadoop Starter Kit for IBM BigInsights __________________________________________________________________ EMC ISILON HADOOP STARTER KIT FOR IBM BIGINSIGHTS 33 9. If the Ambari server had been installed on your node previously, the node may contain old cluster information. Reset the Ambari server to clean up its cluster information in the database, using the following commands: >sudo ambari-server stop >sudo ambari-server reset >sudo ambari-server start 10. Access the Ambari web user interface from a web browser by using the server name (the fully qualified domain name, or the short name) on which you installed the software, and port 8080. For example, enter abc.com:8080. You can use any available port other than 8080 that will allow you to connect to the Ambari server. In some networks, port 8080 is already in use. To use another port, do the following: a. Edit the ambari.properties file: vi /etc/ambari-server/conf/ambari.properties b. Add a line in the file to select another port: client.api.port=8081 c. Save the file and restart the Ambari server: ambari-server restart 11. Log in to the Ambari server with the default username and password: admin/admin. The default username and password is required only for the first login. You can configure users and groups after the first login to the Ambari web interface.
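Step 10's port change can also be scripted. A sketch against a demo copy of the properties file (the path is an assumption; the real file is /etc/ambari-server/conf/ambari.properties, followed by `ambari-server restart`):

```shell
# Sketch: set client.api.port, replacing an existing entry or appending one.
# PROPS is a demo path; use /etc/ambari-server/conf/ambari.properties for real.
set -eu
PROPS="${PROPS:-/tmp/ambari.properties.demo}"
NEW_PORT="${NEW_PORT:-8081}"
: > "$PROPS"   # demo: start from an empty file
if grep -q '^client.api.port=' "$PROPS"; then
  sed -i "s/^client.api.port=.*/client.api.port=${NEW_PORT}/" "$PROPS"
else
  echo "client.api.port=${NEW_PORT}" >> "$PROPS"
fi
grep '^client.api.port' "$PROPS"
```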
  • 34.
12. On the Welcome page, click Launch Install Wizard.
13. On the Get Started page, enter a name for the cluster you want to create. The name cannot contain blank spaces or special characters. Click Next.
14. You will deploy IBM Open Platform for Apache Hadoop with EMC Isilon. Ambari allows immediate use of an Isilon cluster for all HDFS services (NameNode and DataNode); no reconfiguration is necessary once the IBM Open Platform install is completed.
SSH into Isilon as root and configure the Ambari agent:
isiloncluster1-1# isi zone zones modify zone1 --hdfs-ambari-namenode mycluster1-hdfs.example.com
isiloncluster1-1# isi zone zones modify zone1 --hdfs-ambari-server manager-svr-1.example.com
  • 35.
15. On the Select Stack page, click the stack version you want to install (BigInsights™ 4.0). Click Next.
16. On the Install Options page, in Target Hosts, add the list of Linux hosts that the Ambari server will manage and on which the IBM Open Platform with Apache Hadoop software will be deployed, one node per line. For example:
host1.example.com
host2.example.com
host3.example.com
host4.example.com
In Host Registration Information, select one of the two options:
Provide the SSH Private Key to automatically register hosts
  • 36. EMC Isilon Hadoop Starter Kit for IBM BigInsights __________________________________________________________________ EMC ISILON HADOOP STARTER KIT FOR IBM BIGINSIGHTS 36 Click SSH Private Key. The private key file is /root/.ssh/id_rsa, where the root user installed the Ambari server. Click Choose File to find the private key file you installed previously. You should have retained a copy of the SSH private key (.ssh/id_rsa) in your local directory when you set up password-less SSH. Copy and paste the key into the text box manually. Click the Register and Confirm button. ____________________________________________________________________ Note: After the Linux hosts register, click the back button and Perform manual registration for Isilon and do not use SSH. ____________________________________________________________________ Isilon has an ambari-agent within OneFS and needs to be manually registered in Ambari. After registering Isilon manually, click the Next button. You should see the Ambari agents on both your Linux hosts and Isilon become registered. 17. On the Confirm Hosts page, you check that the correct hosts for your cluster have been located and that those hosts have the correct directories, packages, and processes to continue the installation. If hosts were selected in error, click the check boxes next to the hosts you want to remove. Click Remove Selected. To remove a single host, click Remove in the Action column. If warnings are found during the check process, you can click Click here to see the warnings to see what caused the warnings. The Host Checks page identifies any issues with the hosts. For example, a host may have Transparent Huge Pages or Firewall issues. You can ignore errors related to user names and groups as we pre-created the users in the pre-installation steps of this document. After you resolve the issues, click Rerun Checks on the Host Checks page. When you have confirmed the hosts, click Next. 18. 
On the Choose Services page, select the services you want to install.
  • 37. EMC Isilon Hadoop Starter Kit for IBM BigInsights __________________________________________________________________ EMC ISILON HADOOP STARTER KIT FOR IBM BIGINSIGHTS 37 Ambari shows a confirmation message to install the required service dependencies. For example, when selecting Oozie only, the Ambari web interface shows messages for accepting YARN/MR2, HDFS and Zookeeper installations. It also shows Nagios and Ganglia for monitoring and alerting, but they are not required services. 19. On the Assign Masters page, assign NameNode and SNameNode components to the Isilon SmartConnect address e.g. mycluster1-hdfs.example.com. The rest of the services can be deployed per the recommended services layout - refer back to Table 1. Make sure you assign Namenode and SNameNode only to the Isilon SmartConnect address and none of the Linux nodes, e.g. only mycluster1-hdfs.example.com. Click Next. On the Assign Slaves and Clients page, assign the components to Linux hosts in your cluster and make sure datanode is only assigned to Isilon. Assign Client to the client nodes. Click Next. Tip: If you anticipate adding the Big SQL service at some later time, you must include all clients on all the anticipated Big SQL worker nodes. Big SQL specifically needs the HDFS, Hive, HBase, Sqoop, HCat, and Oozie clients. 20. On the Customize Services page, select configuration settings for the services selected. Default values are filled in automatically when available and they are the recommended values. The installation wizard prompts you for required fields (such as password entries) by displaying a number in a circle next to an installed service. Assign passwords to Hive, Oozie, and any other selected services that require them. The following settings should be checked: • YARN Node Manager log-dirs • YARN Node Manager local-dirs • HBase local directory • ZooKeeper directory
  • 38. EMC Isilon Hadoop Starter Kit for IBM BigInsights __________________________________________________________________ EMC ISILON HADOOP STARTER KIT FOR IBM BIGINSIGHTS 38 • Oozie Data Dir • Storm storm.local.dir Click the number and enter the requested information in the field outlined in red. Make sure that the service port that is set is not already used by another component. For example, the Knox gateway port is, by default, set as 8443. But, when the Ambari server is set up with HTTPs, and the SSL port is set up using 8443, then you must change the Knox gateway port to some other value. ____________________________________________________________________ Note: If you are working in an LDAP environment where users are set up centrally by the LDAP administrator and therefore, already exist, selecting the defaults can cause the installation to fail. Open the Misc tab, and check the box to ignore user modification errors. 21. When you have completed the configuration of the services, click Next. 22. On the Review page, verify that your settings are correct. Click Deploy. 23. The Install, Start, and Test page shows the progress of the installation. The progress bar at the top of the page gives the overall status while the main section of the page gives the status for each host. Logs for a specific task can be displayed by clicking on the task. Click the link in the Message column to find out what tasks have been completed for a specific host or to see the warnings that have been encountered. When the message "Successfully installed and started the services" appears, click Next. 24. On the Summary page, review the accomplished tasks. Click Complete to go to the IBM Open Platform with Apache Hadoop dashboard. Validating IBM Open Platform Install Ambari provides service checks for all the supported services. These checks run automatically after each service installation, or they can be run manually at any time. You
  • 39. EMC Isilon Hadoop Starter Kit for IBM BigInsights __________________________________________________________________ EMC ISILON HADOOP STARTER KIT FOR IBM BIGINSIGHTS 39 can access the Ambari web interface and use the Services View to make sure all the components pass their checks successfully. The following steps provide another way to validate your installation. 1. As the root user on a node on which Apache Hadoop is installed, enter the following command to become the ambari-qa user: su - ambari-qa 2. As the ambari-qa user, run the following command: export HADOOP_MR_DIR=/usr/iop/current/hadoop-mapreduce-client # Generate data with 1000 rows. Each row is about 100 bytes. yarn jar $HADOOP_MR_DIR/hadoop-mapreduce-examples.jar teragen 1000 /tmp/tgout # Sort data yarn jar $HADOOP_MR_DIR/hadoop-mapreduce-examples.jar terasort /tmp/tgout /tmp/tsout # Validate data yarn jar $HADOOP_MR_DIR/hadoop-mapreduce-examples.jar teravalidate /tmp/tsout /tmp/tvout If the job is successful, you will see a log record similar to the following: INFO mapreduce.Job: Job job_id completed successfully Browse to your cluster on port 8088 to see the results of your validation tests, e.g. http://x.x.x.x:8088/cluster, example YARN test results shown below.
  • 40.
Adding a Hadoop User
You must add a user account for each Linux user that will submit MapReduce jobs. The procedure below adds a user named hduser1 as an example.
1. Add the user to Isilon:
isiloncluster1-1# isi auth groups create hduser1 --zone zone1 --provider local
isiloncluster1-1# isi auth users create hduser1 --primary-group hduser1 --zone zone1 --provider local --home-directory /ifs/isiloncluster1/zone1/hadoop/user/hduser1
2. Add the user to the Hadoop nodes:
[root@mycluster1-master-0 ~]# adduser hduser1
3. Create the user's home directory on HDFS:
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -mkdir -p /user/hduser1
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -chown hduser1:hduser1 /user/hduser1
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -chmod 755 /user/hduser1

Additional Service Tests
The tests below should be performed to ensure a proper installation. Perform the tests in the order shown. You must create the Hadoop user hduser1 before proceeding.

HDFS
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -ls /
Found 5 items
-rw-r--r--   1 root  hadoop       0 2014-08-05 05:59 /THIS_IS_ISILON
drwxr-xr-x   - hbase hbase      148 2014-08-05 06:06 /hbase
drwxrwxr-x   - solr  solr         0 2014-08-05 06:07 /solr
drwxrwxrwt   - hdfs  supergroup 107 2014-08-05 06:07 /tmp
drwxr-xr-x   - hdfs  supergroup 184 2014-08-05 06:07 /user
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -put -f /etc/hosts /tmp
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -cat /tmp/hosts
127.0.0.1 localhost
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -rm -skipTrash /tmp/hosts
  • 41.
[root@mycluster1-master-0 ~]# su - hduser1
[hduser1@mycluster1-master-0 ~]$ hdfs dfs -ls /
Found 5 items
-rw-r--r--   1 root  hadoop       0 2014-08-05 05:59 /THIS_IS_ISILON
drwxr-xr-x   - hbase hbase      148 2014-08-05 06:28 /hbase
drwxrwxr-x   - solr  solr         0 2014-08-05 06:07 /solr
drwxrwxrwt   - hdfs  supergroup 107 2014-08-05 06:07 /tmp
drwxr-xr-x   - hdfs  supergroup 209 2014-08-05 06:39 /user
[hduser1@mycluster1-master-0 ~]$ hdfs dfs -ls
...

YARN/MAPREDUCE
[hduser1@mycluster1-master-0 ~]$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 1000
...
Estimated value of Pi is 3.14000000000000000000
[hduser1@mycluster1-master-0 ~]$ hadoop fs -mkdir in
You can put any file into the in directory. It will be used as the data source for subsequent tests.
[hduser1@mycluster1-master-0 ~]$ hadoop fs -put -f /etc/hosts in
[hduser1@mycluster1-master-0 ~]$ hadoop fs -ls in
...
[hduser1@mycluster1-master-0 ~]$ hadoop fs -rm -r out
[hduser1@mycluster1-master-0 ~]$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount in out
...
[hduser1@mycluster1-master-0 ~]$ hadoop fs -ls out
Found 4 items
-rw-r--r--   1 hduser1 hduser1  0 2014-08-05 06:44 out/_SUCCESS
-rw-r--r--   1 hduser1 hduser1 24 2014-08-05 06:44 out/part-r-00000
-rw-r--r--   1 hduser1 hduser1  0 2014-08-05 06:44 out/part-r-00001
-rw-r--r--   1 hduser1 hduser1  0 2014-08-05 06:44 out/part-r-00002
[hduser1@mycluster1-master-0 ~]$ hadoop fs -cat out/part*
localhost 1
127.0.0.1 1

Browse to the YARN Resource Manager GUI: http://mycluster1-master-0.example.com:8088/
Browse to the MapReduce History Server GUI: http://mycluster1-master-0.example.com:19888/. In particular, confirm that you can view the complete logs for task attempts.
  • 42.
HIVE
[hduser1@mycluster1-master-0 ~]$ hadoop fs -mkdir -p sample_data/tab1
[hduser1@mycluster1-master-0 ~]$ cat - > tab1.csv
1,true,123.123,2012-10-24 08:55:00
2,false,1243.5,2012-10-25 13:40:00
3,false,24453.325,2008-08-22 09:33:21.123
4,false,243423.325,2007-05-12 22:32:21.33454
5,true,243.325,1953-04-22 09:11:33
Type <Control+D>.
[hduser1@mycluster1-master-0 ~]$ hadoop fs -put -f tab1.csv sample_data/tab1
[hduser1@mycluster1-master-0 ~]$ hive
hive> DROP TABLE IF EXISTS tab1;
CREATE EXTERNAL TABLE tab1
(
  id INT,
  col_1 BOOLEAN,
  col_2 DOUBLE,
  col_3 TIMESTAMP
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hduser1/sample_data/tab1';
DROP TABLE IF EXISTS tab2;
CREATE TABLE tab2
(
  id INT,
  col_1 BOOLEAN,
  col_2 DOUBLE,
  month INT,
  day INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
INSERT OVERWRITE TABLE tab2
SELECT id, col_1, col_2, MONTH(col_3), DAYOFMONTH(col_3)
FROM tab1
WHERE YEAR(col_3) = 2012;
...
OK
Time taken: 28.256 seconds
hive> show tables;
OK
  • 43.
tab1
tab2
Time taken: 0.889 seconds, Fetched: 2 row(s)
hive> select * from tab1;
OK
1 true 123.123 2012-10-24 08:55:00
2 false 1243.5 2012-10-25 13:40:00
3 false 24453.325 2008-08-22 09:33:21.123
4 false 243423.325 2007-05-12 22:32:21.33454
5 true 243.325 1953-04-22 09:11:33
Time taken: 1.083 seconds, Fetched: 5 row(s)
hive> select * from tab2;
OK
1 true 123.123 10 24
2 false 1243.5 10 25
Time taken: 0.094 seconds, Fetched: 2 row(s)
hive> select * from tab1 where id=1;
OK
1 true 123.123 2012-10-24 08:55:00
Time taken: 15.083 seconds, Fetched: 1 row(s)
hive> select * from tab2 where id=1;
OK
1 true 123.123 10 24
Time taken: 13.094 seconds, Fetched: 1 row(s)
hive> exit;

HBASE
[hduser1@mycluster1-master-0 ~]$ hbase shell
hbase(main):001:0> create 'test', 'cf'
0 row(s) in 3.3680 seconds
=> Hbase::Table - test
hbase(main):002:0> list 'test'
TABLE
test
1 row(s) in 0.0210 seconds
=> ["test"]
hbase(main):003:0> put 'test', 'row1', 'cf:a', 'value1'
0 row(s) in 0.1320 seconds
hbase(main):004:0> put 'test', 'row2', 'cf:b', 'value2'
  • 44.
0 row(s) in 0.0120 seconds
hbase(main):005:0> scan 'test'
ROW   COLUMN+CELL
row1  column=cf:a, timestamp=1407542488028, value=value1
row2  column=cf:b, timestamp=1407542499562, value=value2
2 row(s) in 0.0510 seconds
hbase(main):006:0> get 'test', 'row1'
COLUMN  CELL
cf:a    timestamp=1407542488028, value=value1
1 row(s) in 0.0240 seconds
hbase(main):007:0> quit

Ambari Service Check
Ambari has built-in functional tests for each component. These are executed automatically when you install your cluster with Ambari. To execute them after installation, select the service in Ambari, click the Service Actions button, and select Run Service Check.
  • 45.
Installing IBM Value Packages
Before You Begin
Please note that the BigInsights Analyst and BigInsights Data Scientist value packages have been sanity-tested on EMC Isilon, but have not been performance-profiled and tested under load with Isilon OneFS 7.2.0.3. EMC and IBM plan to validate these components under load as part of future integration efforts. Please refer to the EMC – IBM BigInsights Joint Support Statement for further details.
You must acquire the software from Passport Advantage. The acquired software has a *.bin extension. The name of the *.bin file depends on whether the BigInsights Analyst or the BigInsights Data Scientist module was downloaded. When you run the *.bin file, configuration files are copied to appropriate locations to enable Ambari to see the value-add services as available. When adding the value-add services through Ambari, additional software packages can be downloaded. If the Hadoop cluster cannot directly access the internet, a local mirror repository can be created.
Where you perform the following steps depends on whether the Hadoop cluster has direct internet access:
• If the Hadoop cluster has direct access to the internet, perform the steps from the Ambari server of the Hadoop cluster.
• If the Hadoop cluster does not have direct internet access, perform the steps from a Linux host with direct internet access. Then, transfer the files, as required, to a local repository mirror.
  • 46. EMC Isilon Hadoop Starter Kit for IBM BigInsights __________________________________________________________________ EMC ISILON HADOOP STARTER KIT FOR IBM BIGINSIGHTS 46 Installation Procedure 1. Update the permissions on the downloaded *.bin file to enable execute. chmod +x <package_name>.bin 2. Run the *.bin file to extract and install the services in the module. ./<package_name>.bin where <package_name> is BI-Analyst-xxxxx.bin for the Analyst module or BI-DS- xxxxx.bin for the Data Scientist module. 3. After the prompt, agree to the license terms. Reply yes | y to continue install. 4. After the prompt, choose if you want to do an online (option 1) or offline (option 2) install. a. Online install will lay out the Ambari service configuration files and update the repository locations in the Ambari server file. Skip to step 6. b. Offline install initiates a download of files to set up a local repository mirror. A subdirectory called BigInsights will be created with RPMs and associated files will be located in directory BigInsights/packages 5. Setup a local repository. A local repository is required if the Hadoop cluster cannot connect directly to the internet, or if you wish to avoid multiple downloads of the same software when installing services across multiple nodes. In the following steps, the host that performs the repository mirror function is called the repository server. If you do not have an additional Linux host, you can use one of the Hadoop management nodes. The repository server must be accessible over the network by the Hadoop cluster. The repository server requires an HTTP web server. The following instructions describe how to set up a repository server by using a Linux host with an Apache HTTP server. a. On the repository server, if the Apache HTTP server is not installed, install it:
  • 47.
yum install httpd
b. On the repository server, ensure that the createrepo package is installed.
c. On the repository server, create a directory for your value-add repository, such as <mirror web server document root>/repos/valueadds. For example, for Apache httpd, the default document root is /var/www/html:
mkdir /var/www/html/repos/valueadds
d. By selecting Option 2 in step 4, RPMs were downloaded to a subdirectory called BigInsights/packages. Copy all of the RPMs to the mirror web server location, <your.mirror.web.server.document root>/repos/valueadds:
cp BigInsights/packages/* /var/www/html/repos/valueadds/
e. Start the web server. If you use Apache httpd, start it by using either of the following commands:
apachectl start
or
service httpd start
f. Test your local repository by browsing to the web directory:
http://<your.mirror.web.server>/repos/valueadds
You should see all of the files that you copied to the repository server.
g. On the repository server, run the createrepo command to initialize the repository:
createrepo /var/www/html/repos/valueadds
h. In the BigInsights/packages directory, find the RPM to install on the Ambari Server host of the Hadoop cluster:
BigInsights Analyst: BI-Analyst-X.X.X.X-IOP-X.X.x86_64.rpm
  • 48.
BigInsights Data Scientist: BI-DS-X.X.X.X-IOP-X.X.x86_64.rpm
Tip: The BigInsights Data Scientist module also entitles you to the features of the BigInsights Analyst module. Therefore, consider doing the yum install for both of the RPM packages.
Then, copy the file to the Ambari Server host and install the RPMs by using the following command:
sudo yum install <BI-xxx-1.0.0.1-IOP...>.rpm
i. On the Ambari Server node, navigate to the /var/lib/ambari-server/resources/stacks/BigInsights/<version_number>/repos/repoinfo.xml file. If the file does not exist, create it. Ensure the <baseurl> element for the BIGINSIGHTS-VALUEPACK <repo> entry points to your repository server. Remember, there might be multiple <repo> sections. Make sure that the URL you tested in step 5.f matches exactly the value indicated in the <baseurl> element. For example, the repoinfo.xml might look like the following content after you change http://ibm-open-platform.ibm.com/repos/BigInsights-Valuepacks/ to http://your.mirror.web.server/repos/valueadds:
<repo>
  <baseurl>http://<your.mirror.web.server>/repos/valueadds</baseurl>
  <repoid>BIGINSIGHTS-VALUEPACK</repoid>
  <reponame>BIGINSIGHTS-VALUEPACK</reponame>
</repo>
Note: The new <repo> section might appear as a single line.
Tip: If you later find an error in this configuration file, make corrections and run the following command:
yum clean all
Then restart the Ambari server.
j. When the module is installed, restart the Ambari server:
ambari-server restart
k. Open the Ambari web interface and log in. The default address is the following URL:
http://<server-name>:8080
The default login name is admin and the default password is admin.
l. Click Actions > Add Service. The list of services shows the services that you previously added as well as the BigInsights services that you can now add.
Select IBM BigInsights Service to Install
Select the service that you want to install and deploy. Even though your module might contain multiple services, install only the specific service that you want plus the BigInsights Home service. Installing one value-add service at a time is recommended. Follow the service-specific installation instructions for more information.
After all of the IBM BigInsights services are installed, the Ambari software list should show a green check mark next to each service, as shown below:
Installing BigInsights Home
The BigInsights Home service is the main interface used to launch BigInsights - BigSheets, BigInsights - Text Analytics, and BigInsights - Big SQL. The BigInsights Home service requires Knox to be installed, configured, and started.
Open a browser and access the Ambari server dashboard. The following is the default URL:
http://<server-name>:8080
The default user name is admin, and the default password is admin.
In the Ambari dashboard, click Actions > Add Service.
In the Add Service Wizard > Choose Services page, select the BigInsights - BigInsights Home service. Click Next. If you do not see the option for BigInsights - BigInsights Home, follow the instructions described in Installing the BigInsights value-add packages.
In the Assign Masters page, select a management node (edge node) that your users can communicate with. BigInsights Home is a web application that your users must be able to open with a web browser.
In the Assign Slaves and Clients page, make selections to assign slaves and clients. The nodes that you select will receive the JSqsh client (an open source, command-line interface to SQL for Big SQL and other database engines) and an SFTP client. Select nodes that might be used to ingest data over SFTP, or where you might want to work interactively with Big SQL scripts or other databases.
Click Next to review any options that you might want to customize.
Click Deploy.
If the BigInsights - BigInsights Home service fails to install, run the remove_value_add_services.sh cleanup script. The following code is an example command:
cd /usr/ibmpacks/bin/<version>
./remove_value_add_services.sh -u admin -p admin -x 8080 -s WEBUIFRAMEWORK -r
For more information about cleaning the value-add service environment, see Removing BigInsights value-add services.
After installation is complete, click Next > Complete.
Configure Knox
The Apache Knox gateway is a system that provides a single point of authentication and access for Apache Hadoop services on the compute nodes in a cluster; however, authentication to HDFS services is controlled entirely by Isilon OneFS. The Knox gateway simplifies Hadoop security both for users who access the cluster and execute jobs and for operators who control access and manage the cluster. The gateway runs as a server, or a cluster of servers, providing centralized access to one or more Hadoop clusters. In IBM Open Platform with Apache Hadoop, Knox is a service that you start, stop, and configure in the Ambari web interface.
Users access the following BigInsights value-add components through Knox by going to the IBM BigInsights Home service:
https://<knox_host>:<knox_port>/<knox_gateway_path>/default/BigInsightsWeb/index.html
• BigSheets
• Text Analytics
• Big SQL
Knox supports only REST API calls for the following Hadoop services:
• WebHCat
• Oozie
• HBase
• Hive
• YARN
Click the Knox service in the Ambari web interface to see its summary page. Select Service Actions > Restart All to restart Knox and all of its components. If you are using LDAP, you must also start LDAP if it is not already started.
Click the BigInsights Home service in the Ambari user interface. Select Service Actions > Restart All to restart it and all of its components.
Open the BigInsights Home page from a web browser. The URL for BigInsights Home is:
https://<knox_host>:<knox_port>/<knox_gateway_path>/default/BigInsightsWeb/index.html
where:
knox_host - The host where Knox is installed and running
knox_port - The port where Knox is listening (by default, 8443)
knox_gateway_path - The value entered in the gateway.path field in the Knox configuration (by default, 'gateway')
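The three placeholder values above compose into the final URL mechanically, which is easy to script when you manage several clusters. The host name below is illustrative; the port and gateway path are the Knox defaults just described.

```shell
# Assemble the BigInsights Home URL from the three Knox values described above.
# KNOX_HOST is illustrative; KNOX_PORT and KNOX_GATEWAY_PATH are the Knox defaults.
KNOX_HOST=myhost.company.com
KNOX_PORT=8443
KNOX_GATEWAY_PATH=gateway
URL="https://${KNOX_HOST}:${KNOX_PORT}/${KNOX_GATEWAY_PATH}/default/BigInsightsWeb/index.html"
echo "$URL"
# → https://myhost.company.com:8443/gateway/default/BigInsightsWeb/index.html
```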
For example, the URL might look like the following address:
https://myhost.company.com:8443/gateway/default/BigInsightsWeb/index.html
If you are using the Knox Demo LDAP, a default user ID and password are created for you. When you access the web page, use the following preset credentials:
User Name = guest
Password = guest-password
Installing BigSheets
To extend the power of the Open Platform for Apache Hadoop, install and deploy the BigInsights - BigSheets service, which is the IBM spreadsheet interface for big data.
1. Open a browser and access the Ambari server dashboard. The following is the default URL:
http://<server-name>:8080
The default user name is admin, and the default password is admin.
2. In the Ambari Dashboard, click Actions > Add Service.
3. In the Add Service Wizard > Choose Services page, select the BigInsights - BigSheets service and, if you have not already installed it, the BigInsights Home service. Click Next. If you do not see the BigInsights - BigSheets service, you need to install the appropriate module and restart Ambari as described in Installing the BigInsights value-add packages.
4. In the Assign Masters page, decide on which node of your cluster you want to run the BigSheets master.
5. In the Assign Slaves and Clients page, all the defaults are automatically accepted and the next page automatically appears. The BigSheets service does not have any slaves and
clients. The Assign Slaves and Clients page appears briefly and is skipped automatically during the install; this is the expected behavior.
6. In the Customize Services page, accept the recommended configurations for the BigSheets service, or customize the configuration by expanding the configuration files and modifying the values. In the Advanced bigsheets-user-config section, make sure that you enter the following information:
a. In the bigsheets.user field, leave the default user name, which is bigsheets.
b. In the bigsheets.password field, type a valid password.
c. In the bigsheets.userid field, type a valid user ID to use for the bigsheets service user. This user ID is created across all of the nodes of the cluster, and must be unique across all nodes of the cluster.
d. Click Next.
7. In the Advanced bigsheets-ambari-config section, in the ambari.password field, type the correct Ambari administration password.
8. You can review your selections in the Review page before accepting them. If you want to modify any values, click the Back button. If you are satisfied with your setup, click Deploy.
9. In the Install, Start and Test page, the BigSheets service is installed and verified. If you have multiple nodes, you can see the progress on each node. When the installation is complete, either view the errors or warnings by clicking the link, or click Next to see a summary and then the new service added to the list of services.
10. Click Complete. If the BigInsights - BigSheets service fails to install, run the remove_value_add_services.sh cleanup script.
The following code is an example of the command:
cd /usr/ibmpacks/bin/<version>
./remove_value_add_services.sh -u admin -p admin -x 8080 -s BIGSHEETS -r
For more information about cleaning the value-add service environment, see Removing BigInsights value-add services.
11. After you install BigInsights - BigSheets, you must restart the HDFS, MapReduce2, YARN, Knox, Nagios, and Ganglia services.
a. For each service that requires a restart, select the service.
b. Click Service Actions.
c. Click Restart All.
12. Access the BigInsights - BigSheets service from the BigInsights Home service.
o If the BigInsights Home service has not yet been added, see Installing BigInsights Home.
o If the BigInsights Home service has been installed, it must be restarted so that the BigInsights - BigSheets icon will display.
13. Launch the BigInsights Home service by typing the following address in your browser:
https://<knox_host>:<knox_port>/<knox_gateway_path>/default/BigInsightsWeb/index.html
where:
knox_host - The host where Knox is installed and running
knox_port - The port where Knox is listening (by default, 8443)
knox_gateway_path - The value entered in the gateway.path field in the Knox configuration (by default, 'gateway')
For example, the URL might look like the following address:
https://myhost.company.com:8443/gateway/default/BigInsightsWeb/index.html
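The restarts in step 11 can also be scripted against the Ambari REST API rather than clicked through the UI. The sketch below is a dry run: it only prints the "stop" call (a PUT that sets the service state to INSTALLED) for each service; the matching "start" call sets the state to STARTED. The host, cluster name, and credentials are placeholder assumptions for your environment.

```shell
# Dry run: print the Ambari REST "stop" call for each service that step 11
# restarts. AMBARI and CLUSTER are placeholders, not real endpoints.
AMBARI="http://ambari.example.com:8080"
CLUSTER="mycluster"
OUT=$(for SVC in HDFS MAPREDUCE2 YARN KNOX NAGIOS GANGLIA; do
  echo "curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT" \
       "-d '{\"ServiceInfo\":{\"state\":\"INSTALLED\"}}'" \
       "$AMBARI/api/v1/clusters/$CLUSTER/services/$SVC"
done)
echo "$OUT"
```

Removing the echo in front of curl (and repeating the call with state STARTED) would perform the actual stop/start cycle.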
Installing Big SQL
To extend the power of the Open Platform for Apache Hadoop, install and deploy the BigInsights - Big SQL service, which is the IBM SQL interface to the Hadoop-based platform, IBM Open Platform with Apache Hadoop.
1. Open a browser and access the Ambari server dashboard. The following is the default URL:
http://<server-name>:8080
The default user name is admin, and the default password is admin.
2. In the Ambari web interface, click Actions > Add Service.
3. In the Add Service Wizard > Choose Services page, select the BigInsights - Big SQL service and the BigInsights Home service. Click Next. If you do not see the option to select the BigInsights - Big SQL service, complete the steps in Installing the BigInsights value-add packages.
4. In the Assign Masters page, decide which nodes of your cluster you want to run the specified components, or accept the default nodes. Follow these guidelines:
o For the Big SQL monitoring and editing tool, make sure that the Data Server Manager (DSM) is assigned to the same node as the Big SQL Head node.
5. Click Next.
6. In the Assign Slaves and Clients page, accept the defaults, or make specific assignments for your nodes. Follow these guidelines:
o Select the non-head nodes for the Big SQL Worker components. You must select at least one node as a worker node.
o Select all nodes for the CLIENT. This puts the JSqsh and SFTP clients on the nodes.
7. In the Customize Services page, accept the recommended configurations for the Big SQL service, or customize the configuration by expanding the configuration files and modifying the values. Make sure that you have a valid bigsql_user, bigsql_user_password (see the reference screen below), and user_id (created by the bi_create_users.sh script) in the appropriate fields in the Advanced bigsql-users-env section.
9. You can review your selections in the Review page before accepting them. If you want to modify any values, click the Back button. If you are satisfied with your setup, click Deploy.
10. In the Install, Start and Test page, the Big SQL service is installed and verified. If you have multiple nodes, you can see the progress on each node. When the installation is complete, either view the errors or warnings by clicking the link, or click Next to see a summary and then the new service added to the list of services.
If the BigInsights - Big SQL service fails to install, run the remove_value_add_services.sh cleanup script. The following code is an example of the command:
cd /usr/ibmpacks/bin/<version>
./remove_value_add_services.sh -u admin -p admin -x 8080 -s BIGSQL -r
For more information about cleaning the value-add service environment, see Removing BigInsights value-add services.
11. A web application interface for Big SQL monitoring and editing is available for your end users to work with Big SQL. You access this monitoring utility from the IBM BigInsights Home service. If you have not added the BigInsights Home service yet, do that now.
12. Restart the Knox service. Also start the Knox Demo LDAP service if you have not configured your own LDAP.
13. Restart the BigInsights Home service.
14. To run SQL statements from the Big SQL monitoring and editing tool, type the following address in your browser to open the BigInsights Home service:
https://<knox_host>:<knox_port>/<knox_gateway_path>/default/BigInsightsWeb/index.html
where:
knox_host - The host where Knox is installed and running
knox_port - The port where Knox is listening (by default, 8443)
knox_gateway_path - The value entered in the gateway.path field in the Knox configuration (by default, 'gateway')
For example, the URL might look like the following address:
https://myhost.company.com:8443/gateway/default/BigInsightsWeb/index.html
If you use the Knox Demo LDAP service, the default credentials are:
userid = guest
password = guest-password
Your end users can also use the JSqsh client, which is a component of the BigInsights - Big SQL service.
15. If the BigInsights - Big SQL service shows as unavailable, there might have been a problem with post-installation configuration. Run the following commands as root (or with sudo) on the node where the Big SQL monitoring utility (DSM) server is installed:
a. Run the dsmKnoxSetup script:
cd /usr/ibmpacks/bigsql/<version-number>/dsm/1.1/ibm-datasrvrmgr/bin/
./dsmKnoxSetup.sh -knoxHost <knox-host>
where <knox-host> is the node where the Knox gateway service is running.
b. Make sure that you do not stop and restart the Knox gateway service within Ambari. If you do, run the dsmKnoxSetup script again.
c. Restart the BigInsights Home service so that the Big SQL monitoring utility (DSM) can be accessed from the BigInsights Home interface.
16. For HBase, do the following post-installation steps:
a. On all nodes where HBase is installed, check that the symlinks to hive-serde.jar and hive-common.jar in the hbase/lib directory are valid. To verify that the symlinks are created and valid:
namei /usr/iop/<version-number>/hbase/lib/hive-serde.jar
namei /usr/iop/<version-number>/hbase/lib/hive-common.jar
If they are not valid, do the following steps:
cd /usr/iop/<version-number>/hbase/lib
rm -rf hive-serde.jar
rm -rf hive-common.jar
ln -s /usr/iop/<version-number>/hive/lib/hive-serde.jar hive-serde.jar
ln -s /usr/iop/<version-number>/hive/lib/hive-common.jar hive-common.jar
b. After installing the Big SQL service and fixing the symlinks, restart the HBase service from the Ambari web interface.
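The symlink check-and-repair in step 16 can be rehearsed in a scratch directory that mimics the /usr/iop layout before you run it on cluster nodes. The paths below are stand-ins, and readlink is used in place of namei so the result is easy to verify programmatically.

```shell
# Mimic the /usr/iop/<version>/hive and hbase lib directories in a scratch tree.
ROOT=$(mktemp -d)
mkdir -p "$ROOT/hive/lib" "$ROOT/hbase/lib"
touch "$ROOT/hive/lib/hive-serde.jar" "$ROOT/hive/lib/hive-common.jar"

# Step 16's repair: remove any stale links, then relink into hive/lib.
cd "$ROOT/hbase/lib"
for jar in hive-serde.jar hive-common.jar; do
  rm -f "$jar"
  ln -s "$ROOT/hive/lib/$jar" "$jar"
done

# A valid symlink resolves to an existing file; readlink -e fails otherwise.
readlink -e "$ROOT/hbase/lib/hive-serde.jar"
```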
After you add Big SQL worker nodes, make sure that you stop and then restart the Hive service.
Connecting to Big SQL
You can run Big SQL queries from the Java SQL Shell (JSqsh) or from IBM Data Server Manager. You can also run queries from a client application, such as IBM Data Studio, that uses JDBC or ODBC drivers. You must identify a running Big SQL server and configure either a JDBC or an ODBC driver. For more information about JSqsh or IBM Data Studio, see the related topics in the IBM BigInsights Knowledge Center.
Running JSqsh
JSqsh is installed in /usr/ibmpacks/common-utils/current/jsqsh/bin. Change to that directory and type ./jsqsh to open the JSqsh shell:
cd /usr/ibmpacks/common-utils/current/jsqsh/bin
./jsqsh
You can then run any JSqsh commands from the prompt.
Connection setup
To use the JSqsh command shell, you can use the default connections or define and test a connection to the Big SQL server.
1. The first time that you open the JSqsh command shell, a configuration wizard is started. At the JSqsh command prompt, type \drivers to determine the available drivers.
a. On the driver selection screen, select the Big SQL instance that you want to run. Note: Big SQL is designated as DB2 in this example:
Name Target Class
- ------- ------------------- --------------------------------------------
...
2 *db2 IBM Data Server (DB2) com.ibm.db2.jcc.DB2Driver
b. Verify the port, server, and user name. Run \setup and press C to define a password for the connection. The user name must have database administration privileges, or must be granted those privileges by the Big SQL administrator.
c. Test the connection to the Big SQL server.
d. Save and name this connection.
2. Generally, you can access JSqsh from /usr/ibmpacks/common-utils/current/jsqsh/bin with the following command:
./jsqsh --driver=db2 --user=<username> --password=<user_password>
3. Open the saved configuration wizard at any time by typing \setup while in the command interface, or ./jsqsh --setup when you open the command interface.
4. Specify the connection name in the JSqsh command shell to establish a connection:
./jsqsh name
5. Use the \connect command when you are already inside the JSqsh shell to establish a connection at the JSqsh prompt:
\connect name
Commands and queries
At the JSqsh command prompt, you can run JSqsh commands or database server commands. JSqsh commands usually begin with a backslash (\) character. JSqsh commands accept command-line arguments and allow for common shell activities, such as I/O redirection and pipes. For example, consider this set of commands:
1> select * from t1
2> where c1 > 10
3> \go --style csv > /tmp/t1.csv
Because the first two commands do not begin with a backslash character, they are assumed to be SQL statements and are sent to the Big SQL server. The \go command sends the statements to run on the server. The \go command has a built-in alias, go, so that you can omit the backslash. Additionally, you can specify a trailing semicolon to indicate that you want to run a statement, for example:
1> select * from t1
2> where c1 > 10;
The --style option in the \go command indicates that the display shows comma-separated values (CSV). The \go form is most useful when you provide additional arguments that affect how the query is run; changing the display style is an example of this feature. The redirection operator (>) specifies that the results of the command are sent to a file called /tmp/t1.csv.
A set of frequently run commands does not require the leading backslash. Any JSqsh command can be aliased to another name (without a leading backslash, if you choose) by using the \alias command. For example, if you want to be able to type bye to leave the JSqsh shell, establish that word as the alias for the \quit command:
\alias bye='\quit'
You can run a script that contains one or more SQL statements. For example, assume that you have a file called mySQL.sql that contains these statements:
select tabschema, tabname from syscat.tables fetch first 5 rows only;
select tabschema, colname, colno, typename, length from syscat.columns fetch first 10 rows only;
You can start JSqsh and run the script at the same time with this command:
/usr/ibmpacks/common-utils/current/jsqsh/bin/jsqsh bigsql < /home/bigsql/mySQL.sql
The redirection operator tells JSqsh to read the commands from the file in the /home/bigsql directory and then run the statements within the file.
Command and query edit
The JSqsh command shell uses the JLine2 library, which allows you to edit previously entered commands and queries. You use the command-line edit features to move with the arrow keys and to edit the command or query on the current line. The JLine2 library provides the same key bindings (vi and emacs) as the GNU Readline library. In addition, it attempts to apply any custom key maps that you created in a GNU Readline configuration file (.inputrc) in the $HOME/ directory of the local file system.
In addition to individual line editing, the JSqsh command shell remembers the 50 most recently run statements, which you can view by using the \history command:
1> \history
(1) use tpch;
(2) select count(*) from lineitem
Previously run statements are prefixed with a number in parentheses. You use this number to recall a query by using the JSqsh recall operator (!), for example:
1> !2
1> select count(*) from lineitem
2>
The ! recall operator has the following behavior:
!! Recalls the previously run statement.
!5 Recalls the fifth query from history.
!-2 Recalls the query run two queries prior.
You can also edit queries that span multiple lines by using the \buf-edit command, which pulls the current query into an external editor, for example:
1> select id, count(*)
2> from t1, t2
3> where t1.c1 = t2.c2
4> \buf-edit
The query is opened in an external editor (/usr/bin/vi by default; you can specify a different editor in the $EDITOR environment variable). When you close the editor, the edited query is entered at the JSqsh command shell prompt.
The JSqsh command shell provides built-in aliases, vi and emacs, for the \buf-edit command. The following commands, for example, open the query in the vi editor:
1> select id, count(*)
2> from t1, t2
3> where t1.c1 = t2.c2
4> vi
Configuration variables
You can use the \set command to list or define values for a number of configuration variables, for example:
1> \set
If you want to redefine the prompt in the command shell, run the \set command with the prompt option:
1> \set prompt='foo $lineno> '
foo 1>
Every JSqsh configuration variable has built-in help available:
1> \help prompt
If you want to permanently set a specific variable, you can do so by editing your $HOME/.jsqsh/sqshrc file and including the appropriate \set command in it.
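Putting the pieces together, a $HOME/.jsqsh/sqshrc that bakes in the prompt and alias examples from this section might look like the following sketch. The values are the illustrative ones used above, not required settings.

```
\set prompt='foo $lineno> '
\alias bye='\quit'
```

Every line in the file is run as a JSqsh command each time the shell starts, so any command you would type at the prompt can be made permanent here.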