_____________________________
EMC ISILON HADOOP STARTER KIT
Deploying IBM BigInsights v 4.0 with EMC ISILON
Release 1.0
October, 2015
EMC Isilon Hadoop Starter Kit for IBM BigInsights
__________________________________________________________________
EMC ISILON HADOOP STARTER KIT FOR IBM BIGINSIGHTS 2
To learn more about how EMC products, services, and solutions can help solve your
business and IT challenges, contact your local representative or authorized reseller,
visit www.emc.com, or explore and compare products in the EMC Store.
Copyright © 2015 EMC Corporation. All Rights Reserved.
EMC believes the information in this publication is accurate as of its publication date.
The information is subject to change without notice.
The information in this publication is provided “as is.” EMC Corporation makes no
representations or warranties of any kind with respect to the information in this
publication, and specifically disclaims implied warranties of merchantability or fitness
for a particular purpose.
Use, copying, and distribution of any EMC software described in this publication
requires an applicable software license.
For the most up-to-date listing of EMC product names, see EMC Corporation
Trademarks on EMC.com.
EMC and Isilon are registered trademarks or trademarks of EMC Corporation in the
United States and/or other jurisdictions. All other trademarks used herein are the
property of their respective owners.
Contents

INTRODUCTION
  IBM & EMC Technology Highlights
  Audience
  Apache Hadoop Projects
  IBM Open Platform and the Ambari Manager
  Isilon Scale-Out NAS for HDFS
  Overview of Isilon Scale-Out NAS for Big Data
PRE-INSTALLATION CHECKLIST
  Supported Software Versions
  Hardware Requirements and Suggested Hadoop Service Layout
INSTALLATION OVERVIEW
  Prerequisites
    Isilon Scale-Out NAS or Isilon OneFS Simulator
    Linux
    Networking
    DNS
    Other
  Prepare Isilon
    Assumptions
    SmartConnect for HDFS
    OneFS Access Zones
    Sharing Data between Access Zones
    User & Group IDs
    Configuring Isilon for HDFS
    Create DNS Records for Isilon
  Prepare Linux Compute Nodes
    Linux Operating System packages needed for IBM BigInsights
    Enable NTP on all Linux Compute nodes
    Disable SELinux on each node if enabled before installing Ambari
    Check UMASK Settings
    Set ulimit Properties
    Kernel Modifications
    Create IBM BigInsights Hadoop Users and Groups
    Configure Passwordless SSH
    Additional Linux Packages to Install
    Test DNS Resolution
    Edit sudoers file on all Linux compute nodes
INSTALLING IBM OPEN PLATFORM (OP)
  Download IBM Open Platform Software
  Create IBM Open Platform Repository
  Validating IBM Open Platform Install
  Adding a Hadoop User
  Additional Service Tests
    HDFS
    YARN/MAPREDUCE
    HIVE
    HBASE
  Ambari Service Check
INSTALLING IBM VALUE PACKAGES
  Before You Begin
  Installation Procedure
  Select IBM BigInsights Service to Install
  Installing BigInsights Home
  Configure Knox
  Installing BigSheets
  Installing Big SQL
  Connecting to Big SQL
    Running JSqsh
    Connection setup
    Commands and queries
    Command and query edit
    Configuration variables
  Installing Text Analytics
  Installing Big R
    IBM BigInsights Online Tutorials
SECURITY CONFIGURATION AND ADMINISTRATION
  Setting up HTTPS for Ambari
  Configuring SSL support for HBase REST gateway with Knox
  Overview of Kerberos
  Enabling Kerberos for IBM Open Platform
  Manually generating keytabs for Kerberos authentication
  Setting up Active Directory or LDAP authentication in Ambari
  Enabling Kerberos for HDFS on Isilon
    Using MIT Kerberos 5
  Running the Ambari Kerberos Wizard
  Trouble Shooting and Support
EMC Isilon Hadoop Starter Kit for
IBM BigInsights v 4.0
This document describes how to create a Hadoop environment using IBM® Open Platform
with Apache Hadoop and EMC® Isilon® scale-out network-attached storage (NAS) for
HDFS-accessible shared storage. It also covers installation and configuration of the IBM
BigInsights Value Packages.
Introduction
IBM & EMC Technology Highlights
The IBM® Open Platform with Apache Hadoop is composed entirely of open source Apache
Hadoop components, such as Apache Ambari, YARN, Spark, Knox, Slider, Sqoop,
Flume, Hive, Oozie, HBase, ZooKeeper, and more. After installing IBM Open Platform, you
can install additional IBM value-add service modules.
These value-add service modules are installed separately, and they include IBM
BigInsights® Analyst, IBM BigInsights Data Scientist, and the IBM BigInsights Enterprise
Management module to provide enhanced capabilities to IBM Open Platform to accelerate
the conversion of all types of data into business insight and action.
The EMC® Isilon® Scale-Out Network-Attached Storage (NAS) platform provides Hadoop
clients with direct access to big data through a Hadoop Distributed File System (HDFS) interface.
Powered by the distributed EMC Isilon OneFS® operating system, an EMC Isilon cluster
delivers a powerful yet simple and highly efficient storage platform with native HDFS
integration to accelerate analytics, gain new flexibility, and avoid the costs of a separate
Hadoop infrastructure.
Audience
This document is intended for IT program managers, IT architects, developers, and IT
managers who want to deploy IBM BigInsights v4.0 with EMC Isilon OneFS v7.2.0.3 for
HDFS storage. If a physical EMC Isilon cluster is not available, you can download the free
EMC Isilon OneFS Simulator, which can be installed as a virtual machine for integration
testing and training purposes. See http://www.emc.com/getisilon for the EMC Isilon
OneFS Simulator.
Apache Hadoop Projects
Apache Hadoop is an open source, batch data processing system for enormous amounts of
data. Hadoop runs as a platform that provides cost-effective, scalable infrastructure for
building Big Data analytic applications. All Hadoop clusters contain a distributed file system
called the Hadoop Distributed File System (HDFS) and a computation layer called
MapReduce.
The Apache Hadoop project contains the following subprojects:
• Hadoop Distributed File System (HDFS) – A distributed file system that provides
high-throughput access to application data.
• Hadoop MapReduce – A software framework for writing applications to reliably
process large amounts of data in parallel across a cluster.
Hadoop is supplemented by an ecosystem of Apache projects, such as Pig, Hive, Sqoop,
Flume, Oozie, Slider, HBase, ZooKeeper, and more, that extend the value of Hadoop and
improve its usability.
Version 2 of Apache Hadoop introduces YARN, a sub-project of Hadoop that separates the
resource management and processing components. YARN was born of a need to enable a
broader array of interaction patterns for data stored in HDFS beyond MapReduce. The YARN-
based architecture of Hadoop 2.0 provides a more general processing platform that is not
constrained to MapReduce.
For full details of the Apache Hadoop project see http://hadoop.apache.org/.
IBM Open Platform and the Ambari Manager
The IBM Open Platform with Apache Hadoop enables Enterprise Hadoop by providing the
complete set of essential Hadoop capabilities required for any enterprise. Utilizing YARN at
its core, it provides capabilities for several functional areas including Data Management,
Data Access, Data Governance, Integration, Security and Operations.
IBM Open Platform delivers the core elements of Hadoop - scalable storage and distributed
computing – as well as all of the necessary enterprise capabilities such as security, high
availability and integration with a broad range of hardware and software solutions.
Apache Ambari is an open operational framework for provisioning, managing and monitoring
Apache Hadoop clusters.
As of version 4.0 of IBM Open Platform, Ambari can be used to set up and deploy Hadoop
clusters for nearly any task. Ambari can provision, manage, and monitor every aspect of a
Hadoop deployment.
More information on IBM Open Platform can be found at:
http://www-01.ibm.com/software/data/infosphere/hadoop/enterprise.html
Isilon Scale-Out NAS for HDFS
EMC Isilon is the only scale-out NAS platform natively integrated with the Hadoop
Distributed File System (HDFS). Using HDFS as an over-the-wire protocol, you can deploy a
powerful, efficient, and flexible data storage and analytics ecosystem.
In addition to native integration with HDFS, EMC Isilon storage easily scales to support
massively large Hadoop analytics projects. Isilon scale-out NAS also offers unmatched
simplicity, efficiency, flexibility, and reliability that you need to maximize the value of your
Hadoop data storage and analytics workflow investment.
Overview of Isilon Scale-Out NAS for Big Data
The EMC Isilon scale-out platform combines modular hardware with unified software to
provide the storage foundation for data analysis. Isilon scale-out NAS is a fully distributed
system that consists of nodes of modular hardware arranged in a cluster. The distributed
Isilon OneFS operating system combines the memory, I/O, CPUs, and disks of the nodes into
a cohesive storage unit to present a global namespace as a single file system.
The nodes work together as peers in a shared-nothing hardware architecture with no single
point of failure. Every node adds capacity, performance, and resiliency to the cluster and
each node acts as a Hadoop namenode and datanode.
The namenode daemon is a distributed process that runs on all the nodes in the cluster. A
compute client can connect to any node through HDFS.
As nodes are added, the file system expands dynamically and redistributes data, eliminating
the work of partitioning disks and creating volumes. The result is a highly efficient and
resilient storage architecture that brings all the advantages of an enterprise scale-out NAS
system to storing data for analysis.
With traditional direct attached storage, the ratio of CPU, RAM, and disk space requirements
depends on the workload—these factors make it difficult to size a Hadoop cluster before you
have had a chance to measure your MapReduce workload. Expanding data sets also make
upfront sizing decisions problematic. Isilon scale-out NAS lends itself well to this
situation: Isilon scale-out NAS lets you increase CPUs, RAM, and disk space by adding nodes
to dynamically match storage capacity and performance with the demands of a dynamic
Hadoop workload.
An Isilon cluster optimizes data protection. OneFS protects data more efficiently and
reliably than HDFS. The HDFS protocol, by default, replicates a block of data three times.
In contrast, OneFS stripes the data across the cluster and protects it with forward error
correction codes, which consume less space than replication while providing better protection.
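To make the overhead difference concrete, here is a small arithmetic sketch. The 16+4 stripe is an illustrative assumption, not a specific OneFS protection level (actual levels such as N+2:1 depend on cluster configuration):

```python
# Rough storage-overhead comparison: HDFS-style 3x replication versus an
# illustrative 16+4 forward-error-correction (FEC) stripe. The 16+4 layout
# is an assumption for this example, not a fixed OneFS protection level.

def replication_overhead(copies: int) -> float:
    # Raw bytes stored per byte of user data when every block is copied.
    return float(copies)

def fec_overhead(data_units: int, parity_units: int) -> float:
    # Raw bytes stored per byte of user data in an N+M protection stripe.
    return (data_units + parity_units) / data_units

print(replication_overhead(3))   # 3.0  -> 200% overhead
print(fec_overhead(16, 4))       # 1.25 -> 25% overhead
```

With 3x replication, a terabyte of user data consumes three terabytes of raw capacity; the FEC stripe in this example consumes 1.25 terabytes while tolerating multiple failures.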
Pre-installation Checklist
Supported Software Versions
The environment used for this document consists of the following software versions:
• Ambari 1.7.0_IBM
• IBM Open Platform v4.0.0.0
• Isilon OneFS 7.2.0.3 with patch-159065
• All of the IBM BigInsights v4.0 value packs, i.e. Business Analyst, Data
  Scientist, and Enterprise Management
______________________________________________________________________
Note: IBM BigInsights v 4.0 requires OneFS v 7.2.0.3 with patch-159065.
OneFS version 7.2.0.4 should also work, as should version 7.2.1.1 when available.
Do not install IBM BigInsights with OneFS versions lower than 7.2.0.3.
See the EMC Isilon Supportability and Compatibility Guide for the latest compatibility updates:
https://support.emc.com/docu44518_Isilon-Supportability-and-Compatibility-Guide.pdf?language=en_US
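The minimum-version requirement in the note above can be checked programmatically once you have the cluster's dotted version string (for example, extracted from the output of the OneFS CLI). This is a convenience sketch; the parsing assumes a simple dotted version like 7.2.0.3:

```python
# Hedged pre-flight check: does a OneFS version string meet the 7.2.0.3
# minimum this guide requires? Assumes a plain dotted version string has
# already been extracted from the cluster's CLI output.

def version_tuple(version: str) -> tuple:
    # "7.2.0.3" -> (7, 2, 0, 3); tuples compare element by element.
    return tuple(int(part) for part in version.split("."))

def meets_minimum(current: str, minimum: str = "7.2.0.3") -> bool:
    return version_tuple(current) >= version_tuple(minimum)

print(meets_minimum("7.2.0.4"))  # True
print(meets_minimum("7.1.1.0"))  # False
```

Remember that this only checks the base version; the patch-159065 requirement must still be verified separately on the cluster.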
Hardware Requirements and Suggested Hadoop Service Layout
Detailed system requirements for IBM BigInsights compute nodes can be found at:
http://www-01.ibm.com/support/docview.wss?uid=swg27027565
In a multi-node IBM BigInsights cluster, it is suggested that you have at least one
management node in your non-high availability environment, if performance is not an
issue. If performance is a concern, consider configuring at least three management nodes.
If you use the BigInsights - Big SQL service, consider configuring four management
nodes. If you use a high availability environment, consider six management nodes. Use
the following list as a guide for the nodes in your IBM/EMC cluster. A suggested layout is
shown in Table 1 for both Non-High availability and High availability deployments.
________________________________________________________________________________________
Note: With both deployment options, EMC Isilon provides namenode, secondary
namenode and datanode functions for the entire cluster. Do not designate any compute
node as a namenode, secondary namenode, or datanode in any aspect of the IBM
BigInsights configuration.
Table 1. Suggested Service Layout

Non-High availability

Management node 1
• Ambari
• PostgreSQL
• Knox
• Zookeeper
• Hive
• Spark
• Spark History Server
• BigInsights Home
• BigSheets
• Big R
• BigSQL Headnode
• Text Analytics

Management node 2
• Resource Manager
• HBase Master
• Zookeeper
• Oozie
• Ambari monitoring service

Management node 3
• Job history server
• Zookeeper
• App Timeline Server
• Kafka

Management node 4
• Big SQL Scheduler
• Hive Server (MySQL)
• MySQL metastore
• Hive/Oozie metastore
• WebHCat Server
• Data Server Manager

High availability

Management node 1
• Ambari
• PostgreSQL
• Spark
• Spark History Server
• BigSQL Headnode

Management node 2
• Resource Manager
• Zookeeper
• Oozie
• Ambari monitoring service
• BigInsights Home

Management node 3
• Resource Manager (standby)
• Job history server
• Zookeeper
• App Timeline Server
• Kafka
• Oozie (standby)

Management node 4
• Big SQL Scheduler
• HBase Master (standby)
• Hive Server
• MySQL Server
• Hive metastore
• WebHCat Server
• Data Server Manager

Management node 5
• Big SQL Headnode (standby)
• Big SQL Scheduler (standby)
• HBase Master
• Hive Server (standby)
• Hive Metastore (standby)
• Journal Node
• Zookeeper
Installation Overview
Below is an overview of the installation process described in this document.
1. Confirm prerequisites.
2. Prepare your network infrastructure, including DNS.
3. Prepare your Isilon cluster.
4. Prepare Linux compute nodes.
5. Install Ambari Server.
6. Use Ambari Manager to deploy IBM Open Platform to compute nodes.
7. Install IBM BigInsights Value Packages.
8. Perform key functional tests.
Prerequisites
Isilon Scale-Out NAS or Isilon OneFS Simulator
• For low-capacity, non-performance testing of Isilon, the EMC Isilon OneFS Simulator can
be used instead of a cluster of physical Isilon appliances. This can be downloaded for free
from http://www.emc.com/getisilon.
Refer to the EMC Isilon OneFS Simulator Install Guide for details. Be sure to follow the
section for running the virtual nodes in VMware ESX. Only a single virtual node is required
but adding additional nodes will allow you to explore other features such as data
protection, SmartPools (tiering), and SmartConnect (network load balancing).
• For physical Isilon nodes, you should have already completed the console-based
installation process for your first Isilon node and added two other nodes for a
minimum of 3 Isilon nodes.
• You should have OneFS version 7.2.0.3 + patch 159065 installed on Isilon.
• You must obtain a OneFS HDFS license code and install it on your Isilon cluster. You can
get your free OneFS HDFS license from:
http://www.emc.com/campaign/isilon-hadoop/index.htm.
• It is recommended, but not required, to have a SmartConnect Advanced license for
your Isilon cluster.
• To allow for scripts and other small files to be easily shared between all nodes in your
environment, it is highly recommended to enable NFS (Unix Sharing) on your Isilon
cluster. By default, the entire /ifs directory is already exported and this can remain
unchanged. This document assumes that a single Isilon cluster is used for this NFS
export as well as for HDFS. However, there is no requirement that the NFS export be
on the same Isilon cluster that you are using for HDFS.
Linux
• RedHat Enterprise Linux (RHEL) Server 6 (Update 5 minimum) or a comparable
  CentOS Server.
• 100 GB root partition.
• At a minimum, 96 GB of RAM for production environments; the more RAM, the
  better for Hadoop.
Networking
• For the best performance, a single 10 Gigabit Ethernet switch should connect to at
  least one 10 Gigabit port on each Linux host, and the same switch should connect
  to at least one 10 Gigabit port on each Isilon node.
• A single dedicated layer-2 network can be used to connect all hosts and Isilon nodes,
  although multiple networks can be used for increased security, monitoring, and
  robustness.
• At least an entire /24 IP address block should be allocated to your network. This will
  allow a DNS reverse lookup zone to be delegated to your Hadoop DNS server.
• If using the EMC Isilon OneFS Simulator, you will need at least two static IP addresses
  (one for the node’s ext-1 interface, another for the SmartConnect Service IP). Each
  additional Isilon node will require an additional IP address.
• At a minimum, you will need to allocate to your Isilon cluster one IP address per
  access zone per Isilon node. In general, you will need one access zone for each
  separate Hadoop cluster that will use Isilon for HDFS storage.
• For the best possible load balancing during an Isilon node failure scenario, the
  recommended number of IP addresses is given by the formula below. This is in
  addition to any IP addresses used for non-HDFS pools.
# of IP addresses = 2 * (# of Isilon Nodes) * (# of Access Zones)
For example, 20 IP addresses are recommended for 5 Isilon nodes and 2 Access Zones.
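As a quick sanity check, the sizing rule above can be expressed in a few lines of Python (the function name is ours, not from the kit's scripts):

```python
# Recommended IP count for dynamic HDFS pools, per the formula above:
# 2 * (# of Isilon nodes) * (# of access zones).

def recommended_hdfs_ips(isilon_nodes: int, access_zones: int) -> int:
    return 2 * isilon_nodes * access_zones

print(recommended_hdfs_ips(5, 2))  # 20, matching the worked example above
```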
• This document will assume that Internet access is available to all servers to download
  various components from Internet repositories.
DNS
• A DNS server is required and you must have the ability to create DNS records and
  zone delegations.
• It is recommended that your DNS server delegate a subdomain to your Isilon cluster.
  For instance, DNS requests for subnet0-pool0.isiloncluster1.example.com or
  isiloncluster1.example.com should be delegated to the Service IP defined on your
  Isilon cluster.
• To allow for a convenient way of changing the HDFS namenode used by all Hadoop
  applications and services, create a DNS record for your Isilon cluster’s HDFS
  namenode service. This should be a CNAME alias to your Isilon SmartConnect zone.
  Specify a TTL of 1 minute to allow for quick changes. For example, create a CNAME
  record for mycluster1-hdfs.example.com that targets subnet0-
pool0.isiloncluster1.example.com. If you later want to redirect all HDFS I/O to another
cluster or a different pool on the same Isilon cluster, you simply need to change the
DNS record and restart all Hadoop services.
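One way to express these records is a BIND-style zone file fragment. This is a sketch using the example names from this section; the Service IP address shown is a placeholder, and the exact syntax depends on your DNS server:

```
; example.com zone fragment (illustrative)

; Delegate the Isilon SmartConnect subdomain to the cluster's Service IP
isiloncluster1          IN NS     sip.isiloncluster1.example.com.
sip.isiloncluster1      IN A      10.111.129.100   ; placeholder Service IP

; Short-TTL CNAME so HDFS clients can be redirected quickly
mycluster1-hdfs  60     IN CNAME  subnet0-pool0.isiloncluster1.example.com.
```

Because the CNAME carries a 60-second TTL, repointing all Hadoop clients at a different pool requires only editing this one record and restarting Hadoop services.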
Other
• There are three scripts at http://www.github.com/bonibruno/BigInsights that you can
  download to help automate new IBM BigInsights installations with EMC Isilon:
  1. bi_create_users.sh – use this script to create the users and groups on all the
     Linux nodes before beginning installation.
  2. isilon_create_users.sh – use this script to create the users and groups on
     Isilon before beginning installation. You must first create your access zone
     (described later in this document, e.g. ibm) before running this script.
  3. isilon_create_directories.sh – run this after the script above.
More information on the use of these scripts is provided in the installation section of this
document.
Prepare Isilon
Assumptions
This document makes the assumptions listed below. These are not necessarily
requirements but they are usually valid and simplify the process.
• It is assumed that you are not using a directory service such as Active
  Directory for Hadoop users and groups.
• It is assumed that you are not using Kerberos authentication for Hadoop.
SmartConnect for HDFS
A best practice for HDFS on Isilon is to utilize two SmartConnect IP address pools for each
access zone. One IP address pool should be used by Hadoop clients to connect to the HDFS
namenode service on Isilon and it should use the dynamic IP allocation method to
minimize connection interruptions in the event that an Isilon node fails.
____________________________________________________________________
Note: Dynamic IP allocation requires a SmartConnect Advanced license.
____________________________________________________________________
A Hadoop client uses a specific SmartConnect IP address pool simply by using its zone
name (DNS name) in the HDFS URI:
For example, hdfs://subnet0-pool1.isiloncluster1.example.com:8020
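For illustration, this is how the zone name would appear in a Hadoop client's fs.defaultFS setting. This fragment is a hedged sketch using the example zone name above; in an Ambari-managed cluster you would set this through Ambari rather than editing core-site.xml by hand:

```xml
<!-- core-site.xml fragment (illustrative; Ambari normally manages this) -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://subnet0-pool1.isiloncluster1.example.com:8020</value>
</property>
```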
A second IP address pool should be used for HDFS datanode connections, and it should
also use the dynamic IP allocation method. To assign specific SmartConnect IP address
pools for datanode connections, use the “isi hdfs racks modify” command. If the network
is flat, there is no need to use “isi hdfs racks modify”; the default configuration will suffice.
If IP addresses are limited and you have a SmartConnect Advanced license, you may
choose to use a single dynamic pool for namenode and datanode connections. This may
result in uneven utilization of Isilon nodes.
If you do not have a SmartConnect Advanced license, you may choose to use a single
static pool for namenode and datanode connections. This may result in some failed HDFS
connections in the event of a node failure.
For more information, see EMC Isilon Best Practices for Hadoop Data Storage white paper
online at: https://www.emc.com/collateral/white-papers/h13926-wp-emc-isilon-hadoop-
best-practices-onefs72.pdf
OneFS Access Zones
Access zones on OneFS are a way to select a distinct configuration for the OneFS cluster
based on the IP address that the client connects to. For HDFS, this configuration includes
authentication methods, HDFS root path, and authentication providers (AD, LDAP, local,
etc.). By default, OneFS includes a single access zone called System.
If you will only have a single Hadoop cluster connecting to your Isilon cluster, then you can
use the System access zone with no additional configuration. However, to have more than
one Hadoop cluster connect to your Isilon cluster, it is best to have each Hadoop cluster
connect to a separate OneFS access zone. This will allow OneFS to present each Hadoop
cluster with its own HDFS namespace and an independent set of users.
For more information, see Security and Compliance for Scale-out Hadoop Data Lakes
whitepaper.
To view your current list of access zones and the IP pools associated with them:
isiloncluster1-1# isi zone zones list
Name Path
------------
System /ifs
------------
Total: 1
isiloncluster1-1# isi networks list pools -v
subnet0:pool0
In Subnet: subnet0
Allocation: Static
Ranges: 1
10.111.129.115-10.111.129.126
Pool Membership: 4
1:10gige-1 (up)
2:10gige-1 (up)
3:10gige-1 (up)
4:10gige-1 (up)
Aggregation Mode: Link Aggregation Control Protocol (LACP)
Access Zone: System (1)
SmartConnect:
Suspended Nodes : None
Auto Unsuspend ... 0
Zone : subnet0-pool0.isiloncluster1.lab.example.com
Time to Live : 0
Service Subnet : subnet0
Connection Policy: Round Robin
Failover Policy : Round Robin
Rebalance Policy : Automatic Failback
To create a new access zone and an associated IP address pool:
isiloncluster1-1# mkdir -p /ifs/isiloncluster1/zone1
isiloncluster1-1# isi zone zones create --name zone1 
--path /ifs/isiloncluster1/zone1
isiloncluster1-1# isi networks create pool --name subnet0:pool1 
--ranges 10.111.129.127-10.111.129.138 --ifaces 1-4:10gige-1 
--access-zone zone1 --zone subnet0-pool1.isiloncluster1.lab.example.com 
--sc-subnet subnet0 --dynamic
Creating pool 'subnet0:pool1': OK
Saving: OK
____________________________________________________________________
Note: If you do not have a SmartConnect Advanced license, you will need to omit the
--dynamic option.
____________________________________________________________________
Sharing Data between Access Zones
By default, the data in one access zone cannot be accessed by users in another access zone.
In certain cases, however, you may need to make the same data set available to more
than one Hadoop compute cluster. Using fully qualified HDFS paths, e.g.
hdfs://zone1-hdfs.example.com/hadoop/dir1, makes a data set available across two or more
access zones.
With fully qualified HDFS paths, the data sets do not cross access zones. Instead, the
Hadoop jobs can access the data sets from a common shared HDFS namespace. For
instance, you can selectively share data between two or more access zones based on
referential links and file/directory permissions.
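As a sketch of what this looks like from a client, the commands below reference a shared data set by a fully qualified path. The SmartConnect name zone1-hdfs.example.com and the directory are illustrative values from this guide, and the examples assume an already-configured HDFS client:

```shell
# List a shared data set through a specific access zone's SmartConnect name,
# instead of relying on the cluster's default fs.defaultFS:
hdfs dfs -ls hdfs://zone1-hdfs.example.com/hadoop/dir1

# A job can read the same fully qualified path as its input:
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
  wordcount hdfs://zone1-hdfs.example.com/hadoop/dir1 /user/hduser1/out
```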
User & Group IDs
Isilon clusters and Hadoop servers each have their own mapping of user IDs (UIDs) to user
names and group IDs (GIDs) to group names. When Isilon is used only for HDFS storage by
the Hadoop servers, the IDs do not need to match, because the HDFS protocol refers to
users and groups only by their names, never by their numeric IDs.
In contrast, the NFS protocol refers to users and groups by their numeric IDs. Although
NFS is rarely used in traditional Hadoop environments, the high-performance, enterprise-
class, and POSIX-compatible NFS functionality of Isilon makes NFS a compelling protocol
for certain workflows. If you expect to use both NFS and HDFS on your Isilon cluster (or
simply want to be open to the possibility in the future), it is highly recommended to
maintain consistent names and numeric IDs for all users and groups on Isilon and your
Hadoop servers. In a multi-tenant environment with multiple Hadoop clusters, numeric IDs
for users in different clusters should be distinct.
For instance, the user bigsql in Hadoop cluster 1 may have ID 1013 and this same ID will
be used in the Isilon access zone for Hadoop cluster 1 as well as every server in Hadoop
cluster 1. The user bigsql in Hadoop cluster 2 may have ID 710 and this ID will be used in
the Isilon access zone for Hadoop cluster 2 as well as every server in Hadoop cluster 2.
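To make the consistency requirement concrete, a minimal check might compare the numeric ID reported on each side. On a Linux node the UID comes from id -u bigsql; on Isilon it would come from isi auth users view for the matching access zone, run there separately. The helper function and the UID values below are illustrative, not values your environment must use:

```shell
# Compare a UID gathered on a compute node (e.g. via: id -u bigsql)
# with the UID configured on Isilon (e.g. via: isi auth users view).
same_uid() {
  if [ "$1" = "$2" ]; then
    echo "uid match: $1"
  else
    echo "uid mismatch: $1 vs $2" >&2
    return 1
  fi
}

# Illustrative values: bigsql is expected to be UID 1013 on every host
# in Hadoop cluster 1.
same_uid 1013 1013
```

Running the helper against each Hadoop service account on every node is a quick way to catch a drifted ID before it surfaces as an NFS permission problem.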
Configuring Isilon for HDFS
_____________________________________________________________________
Note: In the steps below, replace zone1 with System to use the default System access
zone or you may specify the name of a new access zone that you previously created.
______________________________________________________________________
1. Open a web browser to your Isilon cluster's web administration page. If you
don't know the URL, point your browser to:
https://isilon_node_ip_address:8080
The isilon_node_ip_address is any IP address on any Isilon node that is in the System
Access Zone. This usually corresponds to the ext-1 interface of any Isilon node.
2. Log in with your root account. You specified the root password when you configured
your first node using the console.
3. Check, and edit as necessary, your NTP settings. Click Cluster Management ->
General Settings -> NTP.
1. SSH into any node in your Isilon cluster as root.
2. Confirm that your Isilon cluster is at OneFS version 7.2.0.3.
isiloncluster1-1# isi version
Isilon OneFS v7.2.0.3 ...
3. For OneFS version 7.2.0.3, you must have patch-159065 installed. You can view
the list of patches you have installed with:
# isi pkg info
patch-159065: This patch adds support for the Ambari 1.7.0_IBM Server.
4. Install the patch if needed:
[user@workstation ~]$ scp patch-159065.tgz root@mycluster1-hdfs:/tmp
isiloncluster1-1# gunzip < /tmp/patch-159065.tgz | tar -xvf -
isiloncluster1-1# isi pkg install patch-159065.tar
Preparing to install the package...
Checking the package for installation...
Installing the package
Committing the installation...
Package successfully installed.
5. Verify your HDFS license.
isiloncluster1-1# isi license
Module License Status Configuration Expiration Date
------ -------------- ------------- ---------------
HDFS Evaluation Not Configured November 12, 2016
6. Create the HDFS root directory. This is usually called hadoop and must be within
the access zone directory.
isiloncluster1-1# mkdir -p /ifs/isiloncluster1/zone1/hadoop
7. Set the HDFS root directory for the access zone.
isiloncluster1-1# isi zone zones modify zone1 
--hdfs-root-directory /ifs/isiloncluster1/zone1/hadoop
8. Set the HDFS block size used for reading from Isilon.
isiloncluster1-1# isi hdfs settings modify --default-block-size 128M
9. Create an indicator file so that we can easily determine when we are looking at
your Isilon cluster via HDFS.
isiloncluster1-1# touch 
/ifs/isiloncluster1/zone1/hadoop/THIS_IS_ISILON_isiloncluster1_zone1
10.Copy the scripts (isilon_create_users.sh & isilon_create_directories.sh) you
downloaded from http://www.github.com/bonibruno/BigInsights to Isilon:
[user@workstation ~]$ scp isilon_create_*.sh 
root@isilon_node_ip_address:/ifs/isiloncluster1/scripts
11.Execute the script isilon_create_users.sh. This script will create all required
users and groups for IBM BigInsights v 4.0.
Warning: The script isilon_create_users.sh will create local user and group accounts on
your Isilon cluster for Hadoop services. If you are using a directory service such as Active
Directory and you want these users and groups to be defined in your directory service,
then DO NOT run this script.
Instead, refer to the OneFS documentation and EMC Isilon Best Practices for Hadoop Data
Storage.
Script Usage:
isilon_create_users.sh --dist <DIST> [--startgid <GID>] [--startuid <UID>] [--zone <ZONE>]
dist - This will correspond to your Hadoop distribution - bi4.0
startgid - Group IDs will begin with this value. For example: 1000
startuid - User IDs will begin with this value. This is generally the same as startgid. For
example: 1000.
zone - Access Zone name. For example: zone1
isiloncluster1-1# bash /ifs/isiloncluster1/scripts/isilon_create_users.sh 
--dist bi4.0 --startgid 1000 --startuid 1000 --zone zone1
Example output of script is shown below:
Info: Hadoop distribution: bi
Info: groups will start at GID 1000
Info: users will start at UID 1000
Info: will put users in zone: zone1
Info: HDFS root: /ifs/isiloncluster1/hadoop
Failed to add member UID:1001 to group GROUP:hadoop: User is already in local group
SUCCESS -- Hadoop users created successfully!
Done!
______________________________________________________________________
Note: The "User is already in local group" message is expected; this user corresponds to
the hadoop user, which is already in the hadoop group.
12. Execute the script isilon_create_directories.sh. This script will create all
required directories with the appropriate ownership and permissions.
Script Usage:
isilon_create_directories.sh --dist <DIST> [--fixperm] [--zone <ZONE>]
dist - This will correspond to your Hadoop distribution - bi4.0
fixperm - Updates ownership and permissions on hadoop directories.
zone - Access Zone name. For example: zone1
isiloncluster1-1# bash /ifs/isiloncluster1/scripts/isilon_create_directories.sh 
--dist bi4.0 --fixperm --zone zone1
13. Map the hdfs user to the Isilon superuser. This will allow the hdfs user to chown
(change ownership of) all files during IBM BigInsights installation.
______________________________________________________________________
Warning: The command below will restart the HDFS service on Isilon to ensure that any
cached user mapping rules are flushed. This will temporarily interrupt any HDFS
connections coming from other Hadoop clusters.
______________________________________________________________________
isiloncluster1-1# isi zone zones modify --user-mapping-rules='hdfs=>root' --zone zone1
isiloncluster1-1# isi services isi_hdfs_d disable ; isi services isi_hdfs_d enable
The service 'isi_hdfs_d' has been disabled.
The service 'isi_hdfs_d' has been enabled.
Create DNS Records for Isilon
You will now create the required DNS records that will be used to access your Isilon
cluster.
1. Create a delegation record so that DNS requests for the zone
isiloncluster1.example.com are delegated to the Service IP that will be defined on
your Isilon cluster. The Service IP can be any unused static IP address in your lab
subnet.
2. Create a CNAME alias for your Isilon SmartConnect zone. For example, create a
CNAME record for mycluster1-hdfs.example.com that targets
subnet0-pool0.isiloncluster1.example.com.
3. Test name resolution.
[user@workstation ~]$ ping mycluster1-hdfs.example.com
PING subnet0-pool0.isiloncluster1.example.com (10.11.12.13) 56(84) bytes of data.
64 bytes from 10.11.12.13: icmp_seq=1 ttl=64 time=1.15 ms
Prepare Linux Compute Nodes
Linux Operating System packages needed for IBM BigInsights:
1. Compatibility Libraries
2. Networking Tools
3. Perl Support
4. Ruby Support
5. Web Services add on
6. PHP Support
7. Web Server
8. MySQL*
9. PostgreSQL*
10. SNMP Support
11. Development Tools
12. Korn Shell
Enable NTP on all Linux Compute nodes
1. Edit the /etc/ntp.conf file and add your NTP server.
2. Start NTP: service ntpd start
3. Enable NTP at boot: chkconfig --level 2345 ntpd on
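Run together, the steps above amount to the following commands, executed as root on each node; ntp.example.com is a placeholder for your NTP server:

```shell
# Add your time source, start ntpd now, and enable it for runlevels 2-5
echo "server ntp.example.com iburst" >> /etc/ntp.conf
service ntpd start
chkconfig --level 2345 ntpd on
```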
Disable SELinux on each node if enabled before installing Ambari.
1. Edit /etc/selinux/config
2. Set SELINUX=disabled
3. Reboot
____________________________________________________________________
Note: SELinux can be disabled temporarily with the “setenforce 0” command.
____________________________________________________________________
Check UMASK Settings
The umask setting on each node should be set to 0022 in /etc/profile and /etc/bashrc.
Modify the existing umask entry if needed, e.g. "umask 0022".
Set ulimit Properties
1. Edit /etc/security/limits.d/90-nproc.conf
#set for all users
* hard nofile 65536
* soft nofile 65536
* hard nproc 65536
* soft nproc 65536
Kernel Modifications
1. Edit /etc/sysctl.conf and add the following:
vm.swappiness=5
kernel.pid_max=4194303
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
net.ipv4.ip_local_port_range = 1024 64000
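The settings above are read at boot; to apply them immediately, reload the file as root and spot-check a value:

```shell
sysctl -p              # load all settings from /etc/sysctl.conf
sysctl vm.swappiness   # verify; expect: vm.swappiness = 5
```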
Create IBM BigInsights Hadoop Users and Groups
Create required users on all Linux nodes. It is recommended to create all Hadoop users
before installing IBM BigInsights. Use the bi_create_users.sh script obtained from:
http://www.github.com/bonibruno/BigInsights
[user@workstation ~]$ scp bi_create_users.sh [node1]:/root
Run the script, e.g. # ./bi_create_users.sh
Repeat the above for all nodes.
Configure Passwordless SSH
Configure passwordless SSH for all Linux nodes.
1. Create authentication SSH keys:
ssh-keygen -f id_rsa -t rsa -N ""
2. Create .ssh directories on all nodes
ssh root@[node1]
mkdir -p .ssh
cd .ssh
Upload generated keys to all hosts:
cat id_rsa.pub | ssh root@[node1] 'cat >> .ssh/authorized_keys'
Repeat above for all nodes.
3. Set permissions on .ssh directory
ssh root@[node1] "chmod 700 .ssh; chmod 640 .ssh/authorized_keys"
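With a node list in hand, steps 2 and 3 can be wrapped in one loop, run from the host where the key pair was generated; the node names below are placeholders:

```shell
# Push the public key to every node and lock down permissions.
for node in node1.example.com node2.example.com node3.example.com; do
  ssh root@"$node" 'mkdir -p ~/.ssh'
  cat id_rsa.pub | ssh root@"$node" 'cat >> ~/.ssh/authorized_keys'
  ssh root@"$node" 'chmod 700 ~/.ssh; chmod 640 ~/.ssh/authorized_keys'
done
```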
Additional Linux Packages to Install
Install the following packages on all Linux compute nodes.
 deltarpm
 python-deltarpm
 createrepo
 pam-1.1.1-17.el6.i686.rpm
 mysql-connector-java-5.1.17-6.el6.noarch.rpm
 ksh
 nc
 libdbi
 libstdc++
 libaio
 java-1.7.0-openjdk-devel
 python-paramiko
 python-rrdtool-1.4.5-1.el6.rfx.x86_64
 snappy-1.0.5-1.el6.x86_64
 web-ui-framework
Install the above packages using the yum install command.
Test DNS Resolution
Make sure all compute nodes resolve with a fully qualified domain name.
Ping each host with the associated FQDN and make sure it is reachable by FQDN.
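One way to script this check is with getent, which consults the same resolver path (/etc/hosts and DNS) the operating system uses. The helper function and the example host list are a sketch; substitute your own FQDNs:

```shell
# Report whether a host name resolves through the system resolver.
check_fqdn() {
  if getent hosts "$1" > /dev/null; then
    echo "$1 resolves"
  else
    echo "$1 DOES NOT resolve" >&2
    return 1
  fi
}

# Substitute your compute-node FQDNs, e.g.:
#   for host in host1.example.com host2.example.com; do
for host in localhost; do
  check_fqdn "$host"
done
```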
Edit sudoers file on all Linux compute nodes.
1. Edit /etc/sudoers
## Additions needed for IBM BigInsights
hadoop ALL=(ALL) NOPASSWD: ALL
bigsql ALL=(ALL) NOPASSWD: ALL
Check IBM’s BigInsights Website for more info on preparing Linux nodes.
http://www-01.ibm.com/support/knowledgecenter/SSPT3X_4.0.0/com.ibm.swg.im.infosphere.biginsights.install.doc/doc/install_prepare.html
Installing IBM Open Platform (OP)
Download IBM Open Platform Software
Log into the IBM Passport Advantage web portal with your IBM assigned credentials and
download the following packages onto the designated Ambari server node:
• BI-AH-1.0.0.1-IOP-4.0.x86_64.bin
• IOP-4.0.0.0.x86_64.rpm
• iop-4.0.0.0.x86_64.tar.gz
• iop-utils-1.0-iop-4.0.x86_64.tar.gz
Create IBM Open Platform Repository
The IBM Open Platform with Apache Hadoop uses the repository-based Ambari installer.
You have two options for specifying the location of the repository from which Ambari
obtains the component packages.
The IBM Open Platform with Apache Hadoop installation includes OpenJDK 1.7.0. During
installation, you can either install the version provided or make sure Java™ 7 is installed
on all nodes in the cluster.
1. Log in to your Linux cluster as root, or as a user with root privileges.
2. Ensure that the nc package is installed on all nodes:
yum install -y nc
If you installed the Basic Server option on your server, the nc package might not be
installed, which might cause DataNode failures in the IBM Open Platform with Apache
Hadoop.
3. Locate the IOP-4.0.0.0.x86_64.rpm file you downloaded from the download site. Run the
following command to install the ambari.repo file into /etc/yum.repos.d:
yum install IOP-4.0.0.0.x86_64.rpm
If using a mirror repository, edit the file /etc/yum.repos.d/ambari.repo and replace
baseurl=http://ibm-open-platform.ibm.com/repos/Ambari/RHEL6/x86_64/1.7
with your mirror URL. For example,
baseurl=http://<web.server>/repos/Ambari/RHEL6/x86_64/1.7/
Disable the gpgcheck in the ambari.repo file. To disable signature validation,
change gpgcheck=1 to gpgcheck=0.
Alternatively, you can keep gpgcheck on and change the public key file location to the
mirror Ambari repository. To do this, change the following
gpgkey=http://ibm-open-platform.ibm.com/repos/Ambari/RHEL6/x86_64/1.7/BI-GPG-KEY.public
to the following:
gpgkey=http://<web.server>/repos/Ambari/RHEL6/x86_64/1.7/BI-GPG-KEY.public
4. Clean the yum cache on each node so that the right packages from the remote repository
are seen by your local yum.
>sudo yum clean all
5. Install the Ambari server on the intended management node, using the following
command:
>sudo yum install ambari-server
Accept the install defaults.
6. If you are using a mirror repository, after you install the Ambari server, update the
following file with the mirror repository URLs.
/var/lib/ambari-server/resources/stacks/BigInsights/4.0/repos/repoinfo.xml
In the file, change the information from the original content to the modified content.

Original content:

<os type="redhat6">
  <repo>
    <baseurl>http://ibm-open-platform.ibm.com/repos/IOP/RHEL6/x86_64/4.0</baseurl>
    <repoid>IOP-4.0</repoid>
    <reponame>IOP</reponame>
  </repo>
  <repo>
    <baseurl>http://ibm-open-platform.ibm.com/repos/IOP-UTILS/RHEL6/x86_64/1.0</baseurl>
    <repoid>IOP-UTILS-1.0</repoid>
    <reponame>IOP-UTILS</reponame>
  </repo>
</os>

Modified content:

<os type="redhat6">
  <repo>
    <baseurl>http://<web.server>/repos/IOP/RHEL6/x86_64/4.0</baseurl>
    <repoid>IOP-4.0</repoid>
    <reponame>IOP</reponame>
  </repo>
  <repo>
    <baseurl>http://<web.server>/repos/IOP-UTILS/RHEL6/x86_64/1.0</baseurl>
    <repoid>IOP-UTILS-1.0</repoid>
    <reponame>IOP-UTILS</reponame>
  </repo>
</os>
Edit the /etc/ambari-server/conf/ambari.properties file and change the information from
the original content to the modified content.

Original content:

jdk1.7.url=http://ibm-open-platform.ibm.com/repos/IOP-UTILS/RHEL6/x86_64/1.0/openjdk/jdk-1.7.0.tar.gz

Modified content:

jdk1.7.url=http://<web.server>/repos/IOP-UTILS/RHEL6/x86_64/1.0/openjdk/jdk-1.7.0.tar.gz
7. Set up the Ambari server, using the following command:
>sudo ambari-server setup
Accept the setup preferences.
A Java JDK is installed as part of the Ambari server setup. However, the Ambari server
setup also allows you to reuse an existing JDK. The command is:
ambari-server setup -j /full/path/to/JDK
The JDK path set by the -j parameter must be the same on each node in the cluster.
8. Start the Ambari server, using the following command:
>sudo ambari-server start
9. If the Ambari server had been installed on your node previously, the node may contain
old cluster information. Reset the Ambari server to clean up its cluster information in the
database, using the following commands:
>sudo ambari-server stop
>sudo ambari-server reset
>sudo ambari-server start
10. Access the Ambari web user interface from a web browser by using the server name
(the fully qualified domain name, or the short name) on which you installed the software,
and port 8080. For example, enter abc.com:8080.
You can use any available port other than 8080 that will allow you to connect to the
Ambari server. In some networks, port 8080 is already in use. To use another port, do
the following:
a. Edit the ambari.properties file:
vi /etc/ambari-server/conf/ambari.properties
b. Add a line in the file to select another port:
client.api.port=8081
c. Save the file and restart the Ambari server:
ambari-server restart
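Steps a through c reduce to two commands, shown here as a sketch; 8081 is just an example port:

```shell
echo "client.api.port=8081" >> /etc/ambari-server/conf/ambari.properties
ambari-server restart
```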
11. Log in to the Ambari server with the default username and password: admin/admin.
The default username and password is required only for the first login. You can
configure users and groups after the first login to the Ambari web interface.
12. On the Welcome page, click Launch Install Wizard.
13. On the Get Started page, enter a name for the cluster you want to create. The name
cannot contain blank spaces or special characters. Click Next.
14. You will deploy IBM Open Platform for Apache Hadoop with EMC Isilon. Ambari Server
allows for the immediate use of an Isilon cluster for all HDFS services (NameNode and
DataNode); no reconfiguration is necessary once the IBM Open Platform install is
completed.
1. SSH into Isilon as root and configure the Ambari Agent.
isiloncluster1-1# isi zone zones modify zone1 --hdfs-ambari-namenode mycluster1-hdfs.example.com
isiloncluster1-1# isi zone zones modify zone1 --hdfs-ambari-server manager-svr-1.example.com
15. On the Select Stack page, click the Stack version you want to install (BigInsights™ 4.0).
Click Next.
16. On the Install Options page, in Target Hosts, add the list of Linux hosts that the
Ambari server will manage and on which the IBM Open Platform with Apache Hadoop
software will be deployed, one node per line. For example, enter
host1.example.com
host2.example.com
host3.example.com
host4.example.com
In Host Registration Information, select one of the two options:
Provide the SSH Private Key to automatically register hosts
Click SSH Private Key. The private key file is /root/.ssh/id_rsa, where the root user
installed the Ambari server. Click Choose File to locate the private key file, or copy and
paste the key into the text box manually. You should have retained a copy of the SSH
private key (.ssh/id_rsa) in your local directory when you set up password-less SSH.
Click the Register and Confirm button.
____________________________________________________________________
Note: After the Linux hosts register, click the back button and select Perform manual
registration for Isilon; do not use SSH.
____________________________________________________________________
Isilon has an ambari-agent within OneFS and needs to be manually registered in Ambari.
After registering Isilon manually, click the Next button. You should see the Ambari
agents on both your Linux hosts and Isilon become registered.
17. On the Confirm Hosts page, verify that the correct hosts for your cluster have been
located and that those hosts have the correct directories, packages, and processes to
continue the installation.
If hosts were selected in error, click the check boxes next to the hosts you want to
remove. Click Remove Selected. To remove a single host, click Remove in
the Action column.
If warnings are found during the check process, you can click Click here to see the
warnings to see what caused the warnings. The Host Checks page identifies any issues
with the hosts. For example, a host may have Transparent Huge Pages or Firewall issues.
You can ignore errors related to user names and groups as we pre-created the
users in the pre-installation steps of this document.
After you resolve the issues, click Rerun Checks on the Host Checks page. When you
have confirmed the hosts, click Next.
18. On the Choose Services page, select the services you want to install.
Ambari shows a confirmation message to install the required service dependencies. For
example, when selecting Oozie only, the Ambari web interface shows messages for
accepting YARN/MR2, HDFS and Zookeeper installations. It also shows Nagios and
Ganglia for monitoring and alerting, but they are not required services.
19. On the Assign Masters page, assign the NameNode and SNameNode components to the
Isilon SmartConnect address, e.g. mycluster1-hdfs.example.com. The rest of the services
can be deployed per the recommended services layout - refer back to Table 1. Make
sure you assign NameNode and SNameNode only to the Isilon SmartConnect
address and to none of the Linux nodes, e.g. only mycluster1-hdfs.example.com. Click
Next.
On the Assign Slaves and Clients page, assign the components to the Linux hosts in your
cluster and make sure DataNode is assigned only to Isilon.
Assign Client to the client nodes. Click Next.
Tip: If you anticipate adding the Big SQL service at some later time, you must include all
clients on all the anticipated Big SQL worker nodes. Big SQL specifically needs the HDFS,
Hive, HBase, Sqoop, HCat, and Oozie clients.
20. On the Customize Services page, select configuration settings for the services selected.
Default values are filled in automatically when available and they are the recommended
values. The installation wizard prompts you for required fields (such as password entries)
by displaying a number in a circle next to an installed service.
Assign passwords to Hive, Oozie, and any other selected services that require them.
The following settings should be checked:
• YARN Node Manager log-dirs
• YARN Node Manager local-dirs
• HBase local directory
• ZooKeeper directory
• Oozie Data Dir
• Storm storm.local.dir
Click the number and enter the requested information in the field outlined in red. Make
sure that the service port that is set is not already used by another component. For
example, the Knox gateway port is set to 8443 by default. But when the Ambari server
is set up with HTTPS and the SSL port is 8443, you must change the Knox gateway port
to some other value.
____________________________________________________________________
Note: If you are working in an LDAP environment where users are set up centrally by the
LDAP administrator and therefore already exist, selecting the defaults can cause the
installation to fail. Open the Misc tab, and check the box to ignore user modification
errors.
21. When you have completed the configuration of the services, click Next.
22. On the Review page, verify that your settings are correct. Click Deploy.
23. The Install, Start, and Test page shows the progress of the installation. The progress
bar at the top of the page gives the overall status while the main section of the page
gives the status for each host. Logs for a specific task can be displayed by clicking on the
task. Click the link in the Message column to find out what tasks have been completed for
a specific host or to see the warnings that have been encountered. When the message
"Successfully installed and started the services" appears, click Next.
24. On the Summary page, review the accomplished tasks. Click Complete to go to the IBM
Open Platform with Apache Hadoop dashboard.
Validating IBM Open Platform Install
Ambari provides service checks for all the supported services. These checks run
automatically after each service installation, or they can be run manually at any time. You
can access the Ambari web interface and use the Services View to make sure all the
components pass their checks successfully.
The following steps provide another way to validate your installation.
1. As the root user on a node on which Apache Hadoop is installed, enter the following
command to become the ambari-qa user:
su - ambari-qa
2. As the ambari-qa user, run the following command:
export HADOOP_MR_DIR=/usr/iop/current/hadoop-mapreduce-client
# Generate data with 1000 rows. Each row is about 100 bytes.
yarn jar $HADOOP_MR_DIR/hadoop-mapreduce-examples.jar teragen 1000 /tmp/tgout
# Sort data
yarn jar $HADOOP_MR_DIR/hadoop-mapreduce-examples.jar terasort /tmp/tgout /tmp/tsout
# Validate data
yarn jar $HADOOP_MR_DIR/hadoop-mapreduce-examples.jar teravalidate /tmp/tsout /tmp/tvout
If the job is successful, you will see a log record similar to the following:
INFO mapreduce.Job: Job job_id completed successfully
Browse to your cluster on port 8088 to see the results of your validation tests, e.g.
http://x.x.x.x:8088/cluster. Example YARN test results are shown below.
Adding a Hadoop User
You must add a user account for each Linux user that will submit MapReduce jobs. The
procedure below can be used to add a user named hduser1 as an example.
1. Add user to Isilon.
isiloncluster1-1# isi auth groups create hduser1 --zone zone1 --provider local
isiloncluster1-1# isi auth users create hduser1 --primary-group hduser1 --zone zone1 
--provider local --home-directory /ifs/isiloncluster1/zone1/hadoop/user/hduser1
2. Add user to Hadoop nodes.
[root@mycluster1-master-0 ~]# adduser hduser1
3. Create the user’s home directory on HDFS.
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -mkdir -p /user/hduser1
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -chown hduser1:hduser1 
/user/hduser1
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -chmod 755 /user/hduser1
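If you have several users to add, the Linux-node and HDFS steps above can be wrapped in a small helper. This is a sketch that assumes an HDFS client is configured on the node where it runs; the Isilon-side user creation in step 1 must still be performed on Isilon for each user:

```shell
# Create a local account and a matching HDFS home directory for one user.
add_hadoop_user() {
  u="$1"
  adduser "$u"
  sudo -u hdfs hdfs dfs -mkdir -p "/user/$u"
  sudo -u hdfs hdfs dfs -chown "$u:$u" "/user/$u"
  sudo -u hdfs hdfs dfs -chmod 755 "/user/$u"
}

add_hadoop_user hduser1
```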
Additional Service Tests
The tests below should be performed to ensure a proper installation. Perform the tests in the
order shown. You must create the Hadoop user hduser1 before proceeding.
HDFS
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -ls /
Found 5 items
-rw-r--r-- 1 root hadoop 0 2014-08-05 05:59 /THIS_IS_ISILON
drwxr-xr-x - hbase hbase 148 2014-08-05 06:06 /hbase
drwxrwxr-x - solr solr 0 2014-08-05 06:07 /solr
drwxrwxrwt - hdfs supergroup 107 2014-08-05 06:07 /tmp
drwxr-xr-x - hdfs supergroup 184 2014-08-05 06:07 /user
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -put -f /etc/hosts /tmp
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -cat /tmp/hosts
127.0.0.1 localhost
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -rm -skipTrash /tmp/hosts
[root@mycluster1-master-0 ~]# su - hduser1
[hduser1@mycluster1-master-0 ~]$ hdfs dfs -ls /
Found 5 items
-rw-r--r-- 1 root hadoop 0 2014-08-05 05:59 /THIS_IS_ISILON
drwxr-xr-x - hbase hbase 148 2014-08-05 06:28 /hbase
drwxrwxr-x - solr solr 0 2014-08-05 06:07 /solr
drwxrwxrwt - hdfs supergroup 107 2014-08-05 06:07 /tmp
drwxr-xr-x - hdfs supergroup 209 2014-08-05 06:39 /user
[hduser1@mycluster1-master-0 ~]$ hdfs dfs -ls
...
YARN/MAPREDUCE
[hduser1@mycluster1-master-0 ~]$ hadoop jar 
/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar 
pi 10 1000
...
Estimated value of Pi is 3.14000000000000000000
[hduser1@mycluster1-master-0 ~]$ hadoop fs -mkdir in
You can put any file into the in directory. It will be used as the data source for subsequent tests.
[hduser1@mycluster1-master-0 ~]$ hadoop fs -put -f /etc/hosts in
[hduser1@mycluster1-master-0 ~]$ hadoop fs -ls in
...
[hduser1@mycluster1-master-0 ~]$ hadoop fs -rm -r out
[hduser1@mycluster1-master-0 ~]$ hadoop jar 
/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar 
wordcount in out
...
[hduser1@mycluster1-master-0 ~]$ hadoop fs -ls out
Found 4 items
-rw-r--r-- 1 hduser1 hduser1 0 2014-08-05 06:44 out/_SUCCESS
-rw-r--r-- 1 hduser1 hduser1 24 2014-08-05 06:44 out/part-r-00000
-rw-r--r-- 1 hduser1 hduser1 0 2014-08-05 06:44 out/part-r-00001
-rw-r--r-- 1 hduser1 hduser1 0 2014-08-05 06:44 out/part-r-00002
[hduser1@mycluster1-master-0 ~]$ hadoop fs -cat out/part*
localhost 1
127.0.0.1 1
Browse to the YARN Resource Manager GUI http://mycluster1-master-0.example.com:8088/
Browse to the MapReduce History Server GUI http://mycluster1-master-0.lab.example.com:19888/.
In particular, confirm that you can view the complete logs for task attempts.
HIVE
[hduser1@mycluster1-master-0 ~]$ hadoop fs -mkdir -p sample_data/tab1
[hduser1@mycluster1-master-0 ~]$ cat - > tab1.csv
1,true,123.123,2012-10-24 08:55:00
2,false,1243.5,2012-10-25 13:40:00
3,false,24453.325,2008-08-22 09:33:21.123
4,false,243423.325,2007-05-12 22:32:21.33454
5,true,243.325,1953-04-22 09:11:33
Type <Control+D>.
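Instead of typing the rows interactively and pressing Control+D, the same file can be created non-interactively with a here-document:

```shell
# Non-interactive alternative to "cat - > tab1.csv" followed by Control+D:
# a quoted here-document writes the same five sample rows.
cat > tab1.csv <<'EOF'
1,true,123.123,2012-10-24 08:55:00
2,false,1243.5,2012-10-25 13:40:00
3,false,24453.325,2008-08-22 09:33:21.123
4,false,243423.325,2007-05-12 22:32:21.33454
5,true,243.325,1953-04-22 09:11:33
EOF
wc -l tab1.csv   # prints: 5 tab1.csv
```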
[hduser1@mycluster1-master-0 ~]$ hadoop fs -put -f tab1.csv sample_data/tab1
[hduser1@mycluster1-master-0 ~]$ hive
hive>
DROP TABLE IF EXISTS tab1;
CREATE EXTERNAL TABLE tab1
(
id INT,
col_1 BOOLEAN,
col_2 DOUBLE,
col_3 TIMESTAMP
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hduser1/sample_data/tab1';
DROP TABLE IF EXISTS tab2;
CREATE TABLE tab2
(
id INT,
col_1 BOOLEAN,
col_2 DOUBLE,
month INT,
day INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
INSERT OVERWRITE TABLE tab2
SELECT id, col_1, col_2, MONTH(col_3), DAYOFMONTH(col_3)
FROM tab1 WHERE YEAR(col_3) = 2012;
...
OK
Time taken: 28.256 seconds
hive> show tables;
OK
tab1
tab2
Time taken: 0.889 seconds, Fetched: 2 row(s)
hive> select * from tab1;
OK
1 true 123.123 2012-10-24 08:55:00
2 false 1243.5 2012-10-25 13:40:00
3 false 24453.325 2008-08-22 09:33:21.123
4 false 243423.325 2007-05-12 22:32:21.33454
5 true 243.325 1953-04-22 09:11:33
Time taken: 1.083 seconds, Fetched: 5 row(s)
hive> select * from tab2;
OK
1 true 123.123 10 24
2 false 1243.5 10 25
Time taken: 0.094 seconds, Fetched: 2 row(s)
hive> select * from tab1 where id=1;
OK
1 true 123.123 2012-10-24 08:55:00
Time taken: 15.083 seconds, Fetched: 1 row(s)
hive> select * from tab2 where id=1;
OK
1 true 123.123 10 24
Time taken: 13.094 seconds, Fetched: 1 row(s)
hive> exit;
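The interactive session above can also be scripted: the Hive CLI accepts -e for a single statement and -f for a file of statements, which is convenient for repeatable smoke tests. The file path below is an arbitrary example:

```shell
# Write the smoke-test statement to a file (the path is an arbitrary example):
echo 'SELECT * FROM tab1 WHERE id = 1;' > /tmp/tab1_check.sql
cat /tmp/tab1_check.sql

# On a node with the Hive client installed, the same checks run
# non-interactively:
#   hive -f /tmp/tab1_check.sql           # statements from a file
#   hive -e 'SELECT COUNT(*) FROM tab2;'  # a single statement
```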
HBASE
[hduser1@mycluster1-master-0 ~]$ hbase shell
hbase(main):001:0> create 'test', 'cf'
0 row(s) in 3.3680 seconds
=> Hbase::Table - test
hbase(main):002:0> list 'test'
TABLE
test
1 row(s) in 0.0210 seconds
=> ["test"]
hbase(main):003:0> put 'test', 'row1', 'cf:a', 'value1'
0 row(s) in 0.1320 seconds
hbase(main):004:0> put 'test', 'row2', 'cf:b', 'value2'
0 row(s) in 0.0120 seconds
hbase(main):005:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:a, timestamp=1407542488028, value=value1
row2 column=cf:b, timestamp=1407542499562, value=value2
2 row(s) in 0.0510 seconds
hbase(main):006:0> get 'test', 'row1'
COLUMN CELL
cf:a timestamp=1407542488028, value=value1
1 row(s) in 0.0240 seconds
hbase(main):007:0> quit
Ambari Service Check
Ambari has built-in functional tests for each component. These are executed automatically
when you install your cluster with Ambari. To execute them after installation, select the service
in Ambari, click the Service Actions button, and select Run Service Check.
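Ambari also exposes service checks through its REST API, which is handy for scripting the same validation after an install. A minimal sketch, assuming the default Ambari port and the hypothetical host and cluster names used earlier in this guide; the exact request format may vary by Ambari version:

```shell
# Hypothetical values; substitute your Ambari host and cluster name.
AMBARI_HOST=mycluster1-master-0.example.com
CLUSTER=mycluster1
URL="http://${AMBARI_HOST}:8080/api/v1/clusters/${CLUSTER}/requests"

# Request body that queues an HDFS service check -- the same action as
# Service Actions > Run Service Check in the Ambari UI.
BODY='{"RequestInfo":{"context":"HDFS Service Check","command":"HDFS_SERVICE_CHECK"},"Requests/resource_filters":[{"service_name":"HDFS"}]}'
echo "POST ${URL}"

# Submit it with the default admin/admin credentials used in this guide:
#   curl -u admin:admin -H 'X-Requested-By: ambari' -X POST "$URL" -d "$BODY"
```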
Installing IBM Value Packages
Before You Begin
Please note that the BigInsights Analyst and BigInsights Data Scientist value packages have
been sanity tested on EMC Isilon, but have not been performance profiled or tested under
load with Isilon OneFS 7.2.0.3. EMC and IBM plan to validate these components under load
as part of future integration efforts. Refer to the EMC and IBM BigInsights Joint Support
Statement for further details.
You must acquire the software from Passport Advantage. The acquired software has a *.bin
extension. The name of the *.bin file depends on whether the BigInsights Analyst or the
BigInsights Data Scientist module was downloaded.
When you run the *.bin file, configuration files are copied to the appropriate locations so
that Ambari sees the value-add services as available. When you add the value-add
services through Ambari, additional software packages can be downloaded. If the
Hadoop cluster cannot directly access the internet, a local mirror repository can be
created.
Where you perform the following steps depends on whether the Hadoop cluster has
direct internet access.
 If the Hadoop cluster has direct access to the internet, perform the steps from the
Ambari server of the Hadoop cluster.
 If the Hadoop cluster does not have direct internet access, perform the steps from
a Linux host with direct internet access. Then, transfer the files, as required, to a
local repository mirror.
Installation Procedure
1. Update the permissions on the downloaded *.bin file to enable execute.
chmod +x <package_name>.bin
2. Run the *.bin file to extract and install the services in the module.
./<package_name>.bin
where <package_name> is BI-Analyst-xxxxx.bin for the Analyst module or
BI-DS-xxxxx.bin for the Data Scientist module.
3. At the prompt, agree to the license terms. Reply yes or y to continue the installation.
4. At the prompt, choose whether to do an online (option 1) or offline
(option 2) install.
a. An online install lays out the Ambari service configuration files and
updates the repository locations in the Ambari server file. Skip to step 6.
b. An offline install initiates a download of files to set up a local repository
mirror. A subdirectory called BigInsights is created, with the RPMs and
associated files located in BigInsights/packages.
5. Set up a local repository.
A local repository is required if the Hadoop cluster cannot connect directly to the internet,
or if you wish to avoid multiple downloads of the same software when installing services
across multiple nodes. In the following steps, the host that performs the repository mirror
function is called the repository server. If you do not have an additional Linux host, you
can use one of the Hadoop management nodes. The repository server must be accessible
over the network by the Hadoop cluster. The repository server requires an HTTP web
server. The following instructions describe how to set up a repository server by using a
Linux host with an Apache HTTP server.
a. On the repository server, if the Apache HTTP server is not installed,
install it:
yum install httpd
b. On the repository server, ensure that the createrepo package is
installed.
c. On the repository server, create a directory for your value-add
repository, such as <mirror web server document
root>/repos/valueadds. For Apache httpd, the default document root is
/var/www/html.
mkdir /var/www/html/repos/valueadds
d. Because you selected option 2 in step 4, the RPMs were downloaded to a
subdirectory called BigInsights/packages. Copy all of the RPMs to the
<your.mirror.web.server document root>/repos/valueadds directory:
cp BigInsights/packages/* /var/www/html/repos/valueadds/
e. Start the web server. If you use Apache httpd, start it by using either of
the following commands:
apachectl start
service httpd start
f. Test your local repository by browsing to the web directory:
http://<your.mirror.web.server>/repos/valueadds
You should see all of the files that you copied to the repository server.
g. On the repository server, run the createrepo command to initialize the
repository:
createrepo /var/www/html/repos/valueadds
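Steps 5a through 5g can be sketched as a single script, run as root on the repository server. This is a sketch under the assumption that Apache httpd uses its default document root, /var/www/html, and that the BigInsights/packages directory from the offline install is in the current directory:

```shell
# Consolidated sketch of steps 5a-5g; run as root on the repository server.
REPO_DIR=/var/www/html/repos/valueadds

if [ -d /var/www/html ]; then           # only on a host with httpd installed
  yum install -y createrepo             # tool that builds yum repo metadata
  mkdir -p "$REPO_DIR"
  cp BigInsights/packages/*.rpm "$REPO_DIR"/
  service httpd start
  createrepo "$REPO_DIR"                # writes repodata/ for yum clients
  # Sanity check from any cluster node:
  #   curl http://<your.mirror.web.server>/repos/valueadds/repodata/repomd.xml
else
  echo "httpd document root not found; install httpd first" >&2
fi
```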
h. In the BigInsights/packages directory, find the RPM to install on the
Ambari Server host of the Hadoop cluster:
BigInsights Analyst
BI-Analyst-X.X.X.X-IOP-X.X.x86_64.rpm
BigInsights Data Scientist
BI-DS-X.X.X.X-IOP-X.X.x86_64.rpm
Tip: The BigInsights Data Scientist module also entitles you to the features of the
BigInsights Analyst module. Therefore, consider doing the yum install for both of the RPM
packages.
Then, copy the file to the Ambari Server host and install the RPM by using the following
command:
sudo yum install <BI-xxx-1.0.0.1-IOP...>.rpm
i. On the Ambari Server node, open the file /var/lib/ambari-
server/resources/stacks/BigInsights/<version_number>/repos/repoinfo.xml.
If the file does not exist, create it. Ensure that the <baseurl> element for
the BIGINSIGHTS-VALUEPACK <repo> entry points to your repository
server; there might be multiple <repo> sections. Make sure that the URL
you tested in step 5.f exactly matches the value in the <baseurl> element.
For example, the repoinfo.xml might look like the following content after
you change http://ibm-open-platform.ibm.com/repos/BigInsights-Valuepacks/
to http://your.mirror.web.server/repos/valueadds:
<repo>
<baseurl>http://<your.mirror.web.server>/repos/valueadds</baseurl>
<repoid>BIGINSIGHTS-VALUEPACK</repoid>
<reponame>BIGINSIGHTS-VALUEPACK</reponame>
</repo>
Note: The new <repo> section might appear as a single line.
Tip: If you later find an error in this configuration file, make corrections and run the
following command:
yum clean all
Then, restart the Ambari server.
j. When the module is installed, restart the Ambari server.
ambari-server restart
k. Open the Ambari web interface and log in. The default address is the
following URL:
http://<server-name>:8080
The default login name is admin and the default password is admin.
l. Click Actions > Add service. In the list of services you will see the
services that you previously added as well as the BigInsights services
you can now add.
Select IBM BigInsights Service to Install
Select the service that you want to install and deploy. Even though your module might
contain multiple services, install only the specific service that you want plus the
BigInsights™ Home service. Installing one value-add service at a time is recommended.
See the service-specific installation instructions for more information.
After all of the IBM BigInsights services are installed, the Ambari software list should
show a green check mark next to each service.
Installing BigInsights Home
The BigInsights Home service is the main interface to launch BigInsights - BigSheets,
BigInsights - Text Analytics, and BigInsights - Big SQL.
The BigInsights Home service requires Knox to be installed, configured and started.
Open a browser and access the Ambari server dashboard. The following is the default URL:
http://<server-name>:8080
The default user name is admin, and the default password is admin.
In the Ambari dashboard, click Actions > Add Service.
In the Add Service Wizard > Choose Services, select the BigInsights – BigInsights Home
service. Click Next. If you do not see the option for BigInsights – BigInsights Home, follow the
instructions described in Installing the BigInsights value-add packages.
In the Assign Masters page, select a Management node (edge node) that your users can
communicate with. BigInsights Home is a web application that your users must be able to open
with a web browser.
In the Assign Slaves and Clients page, make selections to assign slaves and clients.
The nodes that you select will have JSqsh (an open source, command-line SQL interface for
Big SQL and other database engines) and an SFTP client installed. Select the nodes that
might be used to ingest data over SFTP, or where you might want to work interactively with
Big SQL scripts or other databases.
Click Next to review any options that you might want to customize.
Click Deploy.
If the BigInsights – BigInsights Home service fails to install, run the
remove_value_add_services.sh cleanup script. The following code is an example command:
cd /usr/ibmpacks/bin/<version>
./remove_value_add_services.sh -u admin -p admin -x 8080 -s WEBUIFRAMEWORK -r
For more information about cleaning the value-add service environment, see Removing
BigInsights value-add services.
After installation is complete, click Next > Complete.
Configure Knox
The Apache Knox gateway is a system that provides a single point of authentication and access
for Apache Hadoop services on the compute nodes in a cluster. Note, however, that
authentication to HDFS services is controlled entirely by Isilon OneFS.
The Knox gateway simplifies Hadoop security for users who access the cluster and execute
jobs, and for operators who control access and manage the cluster. The gateway runs as a
server, or a cluster of servers, providing centralized access to one or more Hadoop clusters.
In IBM® Open Platform with Apache Hadoop, Knox is a service that you start, stop, and
configure in the Ambari web interface.
Users access the following BigInsights™ value-add components through Knox by going to the
IBM BigInsights Home service:
https://<knox_host>:<knox_port>/<knox_gateway_path>/default/BigInsightsWeb/index.html
 BigSheets
 Text Analytics
 Big SQL
Knox supports only REST API calls for the following Hadoop services:
 WebHCat
 Oozie
 HBase
 Hive
 Yarn
Click the Knox service from the Ambari web interface to see the summary page.
Select Service Actions > Restart All to restart it and all of its components.
If you are using LDAP, you must also start LDAP if it is not already started.
Click the BigInsights Home service in the Ambari User Interface.
Select Service Actions > Restart All to restart it and all of its components.
Open the BigInsights Home page from a web browser.
The URL for BigInsights Home is:
https://<knox_host>:<knox_port>/<knox_gateway_path>/default/BigInsightsWeb/index.html
where:
knox_host
The host where Knox is installed and running
knox_port
The port where Knox is listening (by default this is 8443)
knox_gateway_path
The value entered in the gateway.path field in the Knox configuration (by default this is
'gateway')
For example, the URL might look like the following address:
https://myhost.company.com:8443/gateway/default/BigInsightsWeb/index.html
If you are using the Knox Demo LDAP, a default user ID and password are created for you.
When you access the web page, use the following preset credentials:
User Name = guest
Password = guest-password
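The URL components described above can be composed in a short shell sketch, which also shows one way to verify the gateway with the demo-LDAP credentials. The host name is hypothetical:

```shell
KNOX_HOST=mycluster1-master-0.example.com   # hypothetical; use your Knox node
KNOX_PORT=8443                              # Knox default listening port
GATEWAY_PATH=gateway                        # default gateway.path value
URL="https://${KNOX_HOST}:${KNOX_PORT}/${GATEWAY_PATH}/default/BigInsightsWeb/index.html"
echo "$URL"

# Verify with the demo-LDAP credentials; -k accepts Knox's default
# self-signed certificate. Expect HTTP 200 when Knox and Home are up:
#   curl -k -u guest:guest-password "$URL" -o /dev/null -w '%{http_code}\n'
```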
Installing BigSheets
To extend the power of the Open Platform for Apache Hadoop, install and deploy the BigInsights
BigSheets service, which is the IBM spreadsheet interface for big data.
1. Open a browser and access the Ambari server dashboard. The following is the default
URL.
http://<server-name>:8080
The default user name is admin, and the default password is admin.
2. In the Ambari Dashboard, click Actions > Add Service.
3. In the Add Service Wizard, Choose Services, select the BigInsights -
BigSheets service, and if you have not already installed the BigInsights Home service,
select that as well. Click Next.
If you do not see BigInsights – BigSheets service, you need to install the appropriate
module and restart Ambari as described in Installing the BigInsights value-add packages.
4. In the Assign Masters page, decide on which node of your cluster you want to run the
specified BigSheets master.
5. In the Assign Slaves and Clients page, all of the defaults are automatically accepted
and the next page appears. The BigSheets service does not have any slaves and
clients, so the Assign Slaves and Clients page is shown and then skipped immediately
during the install. This is the expected behavior.
6. In the Customize Services page, accept the recommended configurations for the
BigSheets service, or customize the configuration by expanding the configuration files
and modifying the values. In the Advanced bigsheets-user-config section, make sure
that you enter the following information:
a. In the bigsheets.user field, leave the default user name, which is bigsheets.
b. In the bigsheets.password field, type a valid password.
c. In the bigsheets.userid field, type a valid user ID to use for the bigsheets service
user. This user ID is created on every node of the cluster and must be unique
across all nodes.
d. Click Next.
7. In the Advanced bigsheets-ambari-config section, in the ambari.password field,
type the correct Ambari administration password.
8. You can review your selections in the Review page before accepting them. If you want
to modify any values, click the Back button. If you are satisfied with your setup,
click Deploy.
9. In the Install, Start and Test page, the BigSheets service is installed and verified. If
you have multiple nodes, you can see the progress on each node. When the installation is
complete, either view the errors or warnings by clicking the link, or click Next to see a
summary and then the new service added to the list of services.
10.Click Complete.
If the BigInsights – BigSheets service fails to install, run the
remove_value_add_services.sh cleanup script. The following code is an example of
the command:
cd /usr/ibmpacks/bin/<version>
./remove_value_add_services.sh -u admin -p admin -x 8080 -s BIGSHEETS -r
For more information about cleaning the value-add service environment, see Removing
BigInsights value-add services.
11.After you install BigInsights - BigSheets, you must restart the HDFS, MapReduce2, YARN,
Knox, Nagios and Ganglia client services.
a. For each service that requires restart, select the service.
b. Click Service Actions.
c. Click Restart All.
12.Access the BigInsights - BigSheets service from the BigInsights Home service.
o If the BigInsights Home service has not yet been added, see Installing
BigInsights Home.
o If the BigInsights Home service has been installed, it must be restarted so
the BigInsights - BigSheets icon will display.
13. Launch the BigInsights Home service by typing the following address in your browser:
https://<knox_host>:<knox_port>/<knox_gateway_path>/default/BigInsightsWeb/index.html
Where:
knox_host
The host where Knox is installed and running
knox_port
The port where Knox is listening (by default this is 8443)
knox_gateway_path
The value entered in the gateway.path field in the Knox configuration (by default this is
'gateway')
For example, the URL might look like the following address:
https://myhost.company.com:8443/gateway/default/BigInsightsWeb/index.html
Installing Big SQL
To extend the power of the Open Platform for Apache Hadoop, install and deploy the BigInsights
- Big SQL service, which is the IBM SQL interface to the Hadoop-based platform, IBM Open
Platform with Apache Hadoop.
1. Open a browser and access the Ambari server dashboard. The following is the default
URL.
http://<server-name>:8080
The default user name is admin, and the default password is admin .
2. In the Ambari web interface, click Actions > Add Service.
3. In the Add Service Wizard, Choose Services, select the BigInsights - Big
SQL service and the BigInsights Home service. Click Next.
If you do not see the option to select the BigInsights - Big SQL service, complete the
steps in Installing the BigInsights value-add packages.
4. In the Assign Masters page, decide which nodes of your cluster you want to run the
specified components, or accept the default nodes. Follow these guidelines:
o For the Big SQL monitoring and editing tool, make sure that the Data Server
Manager (DSM) is assigned to the same node that is assigned to the Big SQL Head
node.
5. Click Next.
6. In the Assign Slaves and Clients page, accept the defaults, or make specific
assignments for your nodes. Follow these guidelines:
o Select the non-head nodes for the Big SQL Worker components. You must select at
least one node as the worker node.
o Select all nodes for the CLIENT. This puts JSqsh and SFTP clients on the nodes.
7. In the Customize Services page, accept the recommended configurations for the Big
SQL service, or customize the configuration by expanding the configuration files and
modifying the values. Make sure that you have a valid bigsql_user and
bigsql_user_password, and a user_id (created by the bi_create_users.sh script), in
the appropriate fields in the Advanced bigsql-users-env section.
9. You can review your selections in the Review page before accepting them. If you want
to modify any values, click the Back button. If you are satisfied with your setup,
click Deploy.
10.In the Install, Start and Test page, the Big SQL service is installed and verified. If you
have multiple nodes, you can see the progress on each node. When the installation is
complete, either view the errors or warnings by clicking the link, or click Next to see a
summary and then the new service added to the list of services.
If the BigInsights – Big SQL service fails to install, run the
remove_value_add_services.sh cleanup script. The following code is an example of
the command:
cd /usr/ibmpacks/bin/<version>
./remove_value_add_services.sh -u admin -p admin -x 8080 -s BIGSQL -r
For more information about cleaning the value-add service environment, see Removing
BigInsights value-add services.
11. A web application interface for Big SQL monitoring and editing is available to your end-
users to work with Big SQL. You access this monitoring utility from the IBM BigInsights
Home service. If you have not added the BigInsights Home service yet, do that now.
12. Restart the Knox Service. Also start the Knox Demo LDAP service if you have not
configured your own LDAP.
13. Restart the BigInsights Home services.
14. To run SQL statements from the Big SQL monitoring and editing tool, type the following
address in your browser to open the BigInsights Home service:
https://<knox_host>:<knox_port>/<knox_gateway_path>/default/BigInsightsWeb/index.html
Where:
knox_host
The host where Knox is installed and running
knox_port
The port where Knox is listening (by default this is 8443)
knox_gateway_path
The value entered in the gateway.path field in the Knox configuration (by default this is
'gateway')
For example, the URL might look like the following address:
https://myhost.company.com:8443/gateway/default/BigInsightsWeb/index.html
If you use the Knox Demo LDAP service, the default credential is:
userid = guest
password = guest-password
Your end users can also use the JSqsh client, which is a component of
the BigInsights - Big SQL service.
15. If the BigInsights - Big SQL service shows as unavailable, there might have been a
problem with post-installation configuration. Run the following commands
as root (or sudo) where the Big SQL monitoring utility (DSM) server is installed:
a. Run the dsmKnoxSetup script:
cd /usr/ibmpacks/bigsql/<version-number>/dsm/1.1/ibm-datasrvrmgr/bin/
./dsmKnoxSetup.sh -knoxHost <knox-host>
where <knox-host> is the node where the Knox gateway service is running.
b. Make sure that you do not stop and restart the Knox gateway service within
Ambari. If you do, run the dsmKnoxSetup script again.
c. Restart the BigInsights Home service so that the Big SQL monitoring utility
(DSM) can be accessed from the BigInsights Home interface.
16. For HBase, do the following post-installation steps:
a. On all nodes where HBase is installed, check that the symlinks to hive-serde.jar
and hive-common.jar in the hbase/lib directory are valid. To verify that the
symlinks exist and are valid:
namei /usr/iop/<version-number>/hbase/lib/hive-serde.jar
namei /usr/iop/<version-number>/hbase/lib/hive-common.jar
If they are not valid, recreate them:
cd /usr/iop/<version-number>/hbase/lib
rm -rf hive-serde.jar
rm -rf hive-common.jar
ln -s /usr/iop/<version-number>/hive/lib/hive-serde.jar hive-serde.jar
ln -s /usr/iop/<version-number>/hive/lib/hive-common.jar hive-common.jar
b. After installing the Big SQL service and fixing the symlinks, restart the HBase
service from the Ambari web interface.
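The check-and-repair sequence above can be sketched for one node as follows. The IOP version string is a hypothetical placeholder; substitute your own:

```shell
# Sketch of the symlink check and repair for one HBase node.
# IOP_VER is hypothetical -- substitute your IBM Open Platform version.
IOP_VER=4.0.0.0
HBASE_LIB=/usr/iop/${IOP_VER}/hbase/lib
HIVE_LIB=/usr/iop/${IOP_VER}/hive/lib

for jar in hive-serde.jar hive-common.jar; do
  # "[ -e ]" follows symlinks, so it is false for a dangling link
  if [ -d "$HBASE_LIB" ] && ! [ -e "${HBASE_LIB}/${jar}" ]; then
    rm -f "${HBASE_LIB}/${jar}"
    ln -s "${HIVE_LIB}/${jar}" "${HBASE_LIB}/${jar}"
    echo "relinked ${jar}"
  fi
done
```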
After you add Big SQL worker nodes, make sure that you stop and then restart the Hive service.
Connecting to Big SQL
You can run Big SQL queries from Java SQL Shell (JSqsh), or from the IBM Data Server
Manager. You can also run queries from a client application, such as IBM Data Studio,
that uses JDBC or ODBC drivers. You must identify a running Big SQL server and
configure either a JDBC or ODBC driver.
For more information about JSqsh, or IBM Data Studio, see the related topics in the
IBM® BigInsights™ Knowledge Center.
Running JSqsh
JSqsh is installed in /usr/ibmpacks/common-utils/current/jsqsh/bin. Change to that directory
and type ./jsqsh to open the JSqsh shell:
cd /usr/ibmpacks/common-utils/current/jsqsh/bin
./jsqsh
You can then run any JSqsh commands from the prompt.
Connection setup
To use the JSqsh command shell, you can use the default connections or define and test a
connection to the Big SQL server.
1. The first time that you open the JSqsh command shell, a configuration wizard starts.
At the JSqsh command prompt, type drivers to determine the available
drivers.
a. On the driver selection screen, select the Big SQL instance that you want to run.
Note: Big SQL is designated as db2 in this example:
Name Target Class
- ------- ------------------- --------------------------------------------
...
2 *db2 IBM Data Server (DB2) com.ibm.db2.jcc.DB2Driver
b. Verify the port, server, and user name. Run setup and enter C to define a
password for the connection. The user name must have database administration
privileges, or must be granted those privileges by the Big SQL administrator.
c. Test the connection to the Big SQL server.
d. Save and name this connection.
2. Generally, you can access JSqsh from /usr/ibmpacks/common-
utils/current/jsqsh/bin with the following command:
./jsqsh --driver=db2 --user=<username> --password=<user_password>
3. Open the saved configuration wizard at any time by typing setup while in the command
interface, or ./jsqsh --setup when you open the command interface.
4. Specify the saved connection name in the JSqsh command shell to establish a
connection:
./jsqsh name
5. Use the connect command when you are already inside the JSqsh shell to establish a
connection at the JSqsh prompt:
connect name
Commands and queries
At the JSqsh command prompt, you can run JSqsh commands or database server commands.
JSqsh commands usually begin with a backslash (\) character.
JSqsh commands accept command-line arguments and allow for common shell activities, such
as I/O redirection and pipes.
For example, consider this set of commands:
1> select * from t1
2> where c1 > 10
3> go --style csv > /tmp/t1.csv
Because the commands do not begin with a backslash character, the first two commands are
assumed to be SQL statements, and are sent to the Big SQL server.
The \go command sends the statements to run on the server. The \go command has a built-in
alias, go, so that you can omit the backslash. Additionally, you can specify a trailing semicolon
to indicate that you want to run a statement, for example:
1> select * from t1
2> where c1 > 10;
The --style option in the go command indicates that the display shows comma-separated
values (CSV). The go form is most useful if you provide additional arguments to affect how
the query is run. Changing the display style is an example of this feature.
The redirection operator (>) specifies that the results of the command are sent to a file
called /tmp/t1.csv.
A set of frequently run commands does not require the leading backslash. Any JSqsh command
can be aliased to another name (without a leading backslash, if you choose) by using
the \alias command. For example, if you want to be able to type bye to leave the JSqsh shell,
establish that word as the alias for the \quit command:
\alias bye='\quit'
You can run a script that contains one or more SQL statements. For example, assume that you
have a file called mySQL.sql. That file contains these statements:
select tabschema, tabname from syscat.tables fetch first 5 rows only;
select tabschema, colname, colno, typename, length from syscat.columns fetch first 10 rows only;
You can start JSqsh and run the script at the same time with this command:
/usr/ibmpacks/common-utils/current/jsqsh/bin/jsqsh bigsql < /home/bigsql/mySQL.sql
The redirection operator specifies that JSqsh reads the commands from the file located in
the /home/bigsql directory, and then runs the statements within the file.
Command and query edit
The JSqsh command shell uses the JLine2 library, which allows you to edit previously entered
commands and queries. You can use the command-line editing features and the arrow keys to
edit the command or query on the current line.
The JLine2 library provides the same key bindings (vi and emacs) as the GNU Readline library.
In addition, it attempts to apply any custom key maps that you defined in a
GNU Readline configuration file (.inputrc) in your $HOME directory.
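For example, a minimal $HOME/.inputrc might switch to vi-style line editing. This is a standard GNU Readline setting; whether JLine2 honors every Readline directive may vary, so treat it as a sketch:

```
set editing-mode vi
```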
In addition to individual line editing, the JSqsh command shell remembers the 50 most recently
run statements, which you can view by using the history command:
1> history
(1) use tpch;
(2) select count(*) from lineitem
Previously run statements are prefixed with a number in parentheses. You use this number to
recall that query by using the JSqsh recall operator (!), for example:
1> !2
1> select count(*) from lineitem
2>
The ! recall operator has the following behavior:
!! Recalls the previously run statement.
!5 Recalls the fifth query from history.
!-2 Recalls the query from two prior runs.
You can also edit queries that span multiple lines by using the buf-edit command,
which pulls the current query into an external editor, for example:
1> select id, count(*)
2> from t1, t2
3> where t1.c1 = t2.c2
4> buf-edit
The query is opened in an external editor (/usr/bin/vi by default; you can specify a
different editor through the $EDITOR environment variable). When you close the
editor, the edited query is entered at the JSqsh command shell prompt.
The JSqsh command shell provides built-in aliases, vi and emacs, for the buf-edit command.
The following commands, for example, open the query in the vi editor:
1> select id, count(*)
2> from t1, t2
3> where t1.c1 = t2.c2
4> vi
Configuration variables
You can use the set command to list or define values for a number of configuration
variables, for example:
1> set
If you want to redefine the prompt in the command shell, you run the following command
with the prompt option:
1> set prompt='foo $lineno> '
foo 1>
Every JSqsh configuration variable has built-in help available:
1> help prompt
If you want to permanently set a specific variable, you can do so by editing
your $HOME/.jsqsh/sqshrc file and including the appropriate set command in it.
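For example, to make the custom prompt shown above permanent, a $HOME/.jsqsh/sqshrc file might contain just the corresponding set command (the prompt value is illustrative):

```
set prompt='foo $lineno> '
```

JSqsh runs the commands in this file at every startup, so any variable set here takes effect in each new session.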
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon
EMC Starter Kit - IBM BigInsights - EMC Isilon

More Related Content

What's hot

Scale-Out Data Lake with EMC Isilon
Scale-Out Data Lake with EMC IsilonScale-Out Data Lake with EMC Isilon
Scale-Out Data Lake with EMC IsilonEMC
 
Hadoop Analytics + Enterprise Class Storage: One-Stop Solution From EMC for H...
Hadoop Analytics + Enterprise Class Storage: One-Stop Solution From EMC for H...Hadoop Analytics + Enterprise Class Storage: One-Stop Solution From EMC for H...
Hadoop Analytics + Enterprise Class Storage: One-Stop Solution From EMC for H...EMC
 
Big Data – General Introduction
Big Data – General IntroductionBig Data – General Introduction
Big Data – General IntroductionEMC
 
White Paper: EMC Isilon OneFS Operating System
White Paper: EMC Isilon OneFS Operating System  White Paper: EMC Isilon OneFS Operating System
White Paper: EMC Isilon OneFS Operating System EMC
 
EMC-ISILON_MphasiS_Walk_through
EMC-ISILON_MphasiS_Walk_throughEMC-ISILON_MphasiS_Walk_through
EMC-ISILON_MphasiS_Walk_throughprakashjjaya
 
Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & I...
Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & I...Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & I...
Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & I...EMC
 
Emc isilon technical deep dive workshop
Emc isilon technical deep dive workshopEmc isilon technical deep dive workshop
Emc isilon technical deep dive workshopsolarisyougood
 
Deduplication Solutions Are Not All Created Equal: Why Data Domain?
Deduplication Solutions Are Not All Created Equal: Why Data Domain?Deduplication Solutions Are Not All Created Equal: Why Data Domain?
Deduplication Solutions Are Not All Created Equal: Why Data Domain?EMC
 
Next Generation Data Protection Architecture
Next Generation Data Protection Architecture Next Generation Data Protection Architecture
Next Generation Data Protection Architecture Gina Tragos
 
Scale IO Software Defined Block Storage
Scale IO Software Defined Block Storage Scale IO Software Defined Block Storage
Scale IO Software Defined Block Storage Jürgen Ambrosi
 
2/18 Technical Overview
2/18 Technical Overview2/18 Technical Overview
2/18 Technical OverviewGina Tragos
 
Arcserve Portfolio Technical Overview
Arcserve Portfolio Technical OverviewArcserve Portfolio Technical Overview
Arcserve Portfolio Technical OverviewGina Tragos
 
Emc isilon config requirements w tips & tricks
Emc isilon config requirements w tips & tricksEmc isilon config requirements w tips & tricks
Emc isilon config requirements w tips & trickskarlosgaleano
 
Trends in Data Protection with DCIG
Trends in Data Protection with DCIGTrends in Data Protection with DCIG
Trends in Data Protection with DCIGGina Tragos
 
Business Track 3: arcserve udp licensing pricing & support made simple
Business Track 3: arcserve udp licensing pricing & support made simpleBusiness Track 3: arcserve udp licensing pricing & support made simple
Business Track 3: arcserve udp licensing pricing & support made simplearcserve data protection
 
The Value of NetApp with VMware
The Value of NetApp with VMwareThe Value of NetApp with VMware
The Value of NetApp with VMwareCapito Livingstone
 
EMC Hadoop Starter Kit - ViPR Edition
EMC Hadoop Starter Kit - ViPR EditionEMC Hadoop Starter Kit - ViPR Edition
EMC Hadoop Starter Kit - ViPR Editionwalshe1
 
Appliance Launch Webcast
Appliance Launch WebcastAppliance Launch Webcast
Appliance Launch WebcastGina Tragos
 
Emc vi pr controller customer presentation
Emc vi pr controller customer presentationEmc vi pr controller customer presentation
Emc vi pr controller customer presentationsolarisyougood
 

What's hot (20)

Scale-Out Data Lake with EMC Isilon
Scale-Out Data Lake with EMC IsilonScale-Out Data Lake with EMC Isilon
Scale-Out Data Lake with EMC Isilon
 
Emc isilon overview
Emc isilon overview Emc isilon overview
Emc isilon overview
 
Hadoop Analytics + Enterprise Class Storage: One-Stop Solution From EMC for H...
Hadoop Analytics + Enterprise Class Storage: One-Stop Solution From EMC for H...Hadoop Analytics + Enterprise Class Storage: One-Stop Solution From EMC for H...
Hadoop Analytics + Enterprise Class Storage: One-Stop Solution From EMC for H...
 
Big Data – General Introduction
Big Data – General IntroductionBig Data – General Introduction
Big Data – General Introduction
 
White Paper: EMC Isilon OneFS Operating System
White Paper: EMC Isilon OneFS Operating System  White Paper: EMC Isilon OneFS Operating System
White Paper: EMC Isilon OneFS Operating System
 
EMC-ISILON_MphasiS_Walk_through
EMC-ISILON_MphasiS_Walk_throughEMC-ISILON_MphasiS_Walk_through
EMC-ISILON_MphasiS_Walk_through
 
Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & I...
Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & I...Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & I...
Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & I...
 
Emc isilon technical deep dive workshop
Emc isilon technical deep dive workshopEmc isilon technical deep dive workshop
Emc isilon technical deep dive workshop
 
Deduplication Solutions Are Not All Created Equal: Why Data Domain?
Deduplication Solutions Are Not All Created Equal: Why Data Domain?Deduplication Solutions Are Not All Created Equal: Why Data Domain?
Deduplication Solutions Are Not All Created Equal: Why Data Domain?
 
Next Generation Data Protection Architecture
Next Generation Data Protection Architecture Next Generation Data Protection Architecture
Next Generation Data Protection Architecture
 
Scale IO Software Defined Block Storage
Scale IO Software Defined Block Storage Scale IO Software Defined Block Storage
Scale IO Software Defined Block Storage
 
2/18 Technical Overview
2/18 Technical Overview2/18 Technical Overview
2/18 Technical Overview
 
Arcserve Portfolio Technical Overview
Arcserve Portfolio Technical OverviewArcserve Portfolio Technical Overview
Arcserve Portfolio Technical Overview
 
Emc isilon config requirements w tips & tricks
Emc isilon config requirements w tips & tricksEmc isilon config requirements w tips & tricks
Emc isilon config requirements w tips & tricks
 
Trends in Data Protection with DCIG
Trends in Data Protection with DCIGTrends in Data Protection with DCIG
Trends in Data Protection with DCIG
 
Business Track 3: arcserve udp licensing pricing & support made simple
Business Track 3: arcserve udp licensing pricing & support made simpleBusiness Track 3: arcserve udp licensing pricing & support made simple
Business Track 3: arcserve udp licensing pricing & support made simple
 
The Value of NetApp with VMware
The Value of NetApp with VMwareThe Value of NetApp with VMware
The Value of NetApp with VMware
 
EMC Hadoop Starter Kit - ViPR Edition
EMC Hadoop Starter Kit - ViPR EditionEMC Hadoop Starter Kit - ViPR Edition
EMC Hadoop Starter Kit - ViPR Edition
 
Appliance Launch Webcast
Appliance Launch WebcastAppliance Launch Webcast
Appliance Launch Webcast
 
Emc vi pr controller customer presentation
Emc vi pr controller customer presentationEmc vi pr controller customer presentation
Emc vi pr controller customer presentation
 

Viewers also liked

KNOX-HTTPFS-ONEFS-WP
KNOX-HTTPFS-ONEFS-WPKNOX-HTTPFS-ONEFS-WP
KNOX-HTTPFS-ONEFS-WPBoni Bruno
 
Data Science and Big Data Analytics Book from EMC Education Services
Data Science and Big Data Analytics Book from EMC Education ServicesData Science and Big Data Analytics Book from EMC Education Services
Data Science and Big Data Analytics Book from EMC Education ServicesEMC
 
BlueData Isilon Validation Brief
BlueData Isilon Validation BriefBlueData Isilon Validation Brief
BlueData Isilon Validation BriefBoni Bruno
 
Apache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS FederationApache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS FederationAdam Kawa
 
Hadoop Administration pdf
Hadoop Administration pdfHadoop Administration pdf
Hadoop Administration pdfEdureka!
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data AnalyticsEMC
 

Viewers also liked (6)

KNOX-HTTPFS-ONEFS-WP
KNOX-HTTPFS-ONEFS-WPKNOX-HTTPFS-ONEFS-WP
KNOX-HTTPFS-ONEFS-WP
 
Data Science and Big Data Analytics Book from EMC Education Services
Data Science and Big Data Analytics Book from EMC Education ServicesData Science and Big Data Analytics Book from EMC Education Services
Data Science and Big Data Analytics Book from EMC Education Services
 
BlueData Isilon Validation Brief
BlueData Isilon Validation BriefBlueData Isilon Validation Brief
BlueData Isilon Validation Brief
 
Apache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS FederationApache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS Federation
 
Hadoop Administration pdf
Hadoop Administration pdfHadoop Administration pdf
Hadoop Administration pdf
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 

Similar to EMC Starter Kit - IBM BigInsights - EMC Isilon

Setup and configuration for ibm tivoli access manager for enterprise single s...
Setup and configuration for ibm tivoli access manager for enterprise single s...Setup and configuration for ibm tivoli access manager for enterprise single s...
Setup and configuration for ibm tivoli access manager for enterprise single s...Banking at Ho Chi Minh city
 
Setup and configuration for ibm tivoli access manager for enterprise single s...
Setup and configuration for ibm tivoli access manager for enterprise single s...Setup and configuration for ibm tivoli access manager for enterprise single s...
Setup and configuration for ibm tivoli access manager for enterprise single s...Banking at Ho Chi Minh city
 
Deploying IBM Sametime 9 on AIX 7.1
Deploying IBM Sametime 9 on AIX 7.1Deploying IBM Sametime 9 on AIX 7.1
Deploying IBM Sametime 9 on AIX 7.1jackdowning
 
Dell PowerEdge Deployment Guide
Dell PowerEdge Deployment GuideDell PowerEdge Deployment Guide
Dell PowerEdge Deployment GuideKara Krautter
 
Configuring a highly available Microsoft Lync Server 2013 environment on Dell...
Configuring a highly available Microsoft Lync Server 2013 environment on Dell...Configuring a highly available Microsoft Lync Server 2013 environment on Dell...
Configuring a highly available Microsoft Lync Server 2013 environment on Dell...Principled Technologies
 
Dell 3-2-1 Reference Configurations: Configuration, management, and upgrade g...
Dell 3-2-1 Reference Configurations: Configuration, management, and upgrade g...Dell 3-2-1 Reference Configurations: Configuration, management, and upgrade g...
Dell 3-2-1 Reference Configurations: Configuration, management, and upgrade g...Principled Technologies
 
IBM PowerLinux Open Source Infrastructure Services Implementation and T…
IBM PowerLinux Open Source Infrastructure Services Implementation and T…IBM PowerLinux Open Source Infrastructure Services Implementation and T…
IBM PowerLinux Open Source Infrastructure Services Implementation and T…IBM India Smarter Computing
 
Pc 811 troubleshooting_guide
Pc 811 troubleshooting_guidePc 811 troubleshooting_guide
Pc 811 troubleshooting_guidemakhaderms
 
Plesk 8.1 for Linux/UNIX
Plesk 8.1 for Linux/UNIXPlesk 8.1 for Linux/UNIX
Plesk 8.1 for Linux/UNIXwebhostingguy
 
Presentation data center design overview
Presentation   data center design overviewPresentation   data center design overview
Presentation data center design overviewxKinAnx
 
Pda management with ibm tivoli configuration manager sg246951
Pda management with ibm tivoli configuration manager sg246951Pda management with ibm tivoli configuration manager sg246951
Pda management with ibm tivoli configuration manager sg246951Banking at Ho Chi Minh city
 
Backup and Recovery Solution for VMware vSphere on EMC Isilon Storage
Backup and Recovery Solution for VMware vSphere on EMC Isilon Storage Backup and Recovery Solution for VMware vSphere on EMC Isilon Storage
Backup and Recovery Solution for VMware vSphere on EMC Isilon Storage EMC
 
Plesk 8.2 for Windows Domain Administrator's Guide
Plesk 8.2 for Windows Domain Administrator's GuidePlesk 8.2 for Windows Domain Administrator's Guide
Plesk 8.2 for Windows Domain Administrator's Guidewebhostingguy
 
Integrating ibm tivoli workload scheduler with tivoli products sg246648
Integrating ibm tivoli workload scheduler with tivoli products sg246648Integrating ibm tivoli workload scheduler with tivoli products sg246648
Integrating ibm tivoli workload scheduler with tivoli products sg246648Banking at Ho Chi Minh city
 
Vista deployment using tivoli provisioning manager for os deployment redp4295
Vista deployment using tivoli provisioning manager for os deployment redp4295Vista deployment using tivoli provisioning manager for os deployment redp4295
Vista deployment using tivoli provisioning manager for os deployment redp4295Banking at Ho Chi Minh city
 

Similar to EMC Starter Kit - IBM BigInsights - EMC Isilon (20)

Lenovo midokura
Lenovo midokuraLenovo midokura
Lenovo midokura
 
Setup and configuration for ibm tivoli access manager for enterprise single s...
Setup and configuration for ibm tivoli access manager for enterprise single s...Setup and configuration for ibm tivoli access manager for enterprise single s...
Setup and configuration for ibm tivoli access manager for enterprise single s...
 
Setup and configuration for ibm tivoli access manager for enterprise single s...
Setup and configuration for ibm tivoli access manager for enterprise single s...Setup and configuration for ibm tivoli access manager for enterprise single s...
Setup and configuration for ibm tivoli access manager for enterprise single s...
 
Deploying IBM Sametime 9 on AIX 7.1
Deploying IBM Sametime 9 on AIX 7.1Deploying IBM Sametime 9 on AIX 7.1
Deploying IBM Sametime 9 on AIX 7.1
 
ESM_InstallGuide_5.6.pdf
ESM_InstallGuide_5.6.pdfESM_InstallGuide_5.6.pdf
ESM_InstallGuide_5.6.pdf
 
Dell PowerEdge Deployment Guide
Dell PowerEdge Deployment GuideDell PowerEdge Deployment Guide
Dell PowerEdge Deployment Guide
 
Configuring a highly available Microsoft Lync Server 2013 environment on Dell...
Configuring a highly available Microsoft Lync Server 2013 environment on Dell...Configuring a highly available Microsoft Lync Server 2013 environment on Dell...
Configuring a highly available Microsoft Lync Server 2013 environment on Dell...
 
Rst4userguide
Rst4userguideRst4userguide
Rst4userguide
 
Dell 3-2-1 Reference Configurations: Configuration, management, and upgrade g...
Dell 3-2-1 Reference Configurations: Configuration, management, and upgrade g...Dell 3-2-1 Reference Configurations: Configuration, management, and upgrade g...
Dell 3-2-1 Reference Configurations: Configuration, management, and upgrade g...
 
IBM PowerLinux Open Source Infrastructure Services Implementation and T…
IBM PowerLinux Open Source Infrastructure Services Implementation and T…IBM PowerLinux Open Source Infrastructure Services Implementation and T…
IBM PowerLinux Open Source Infrastructure Services Implementation and T…
 
Ibm system storage solutions handbook sg245250
Ibm system storage solutions handbook sg245250Ibm system storage solutions handbook sg245250
Ibm system storage solutions handbook sg245250
 
Pc 811 troubleshooting_guide
Pc 811 troubleshooting_guidePc 811 troubleshooting_guide
Pc 811 troubleshooting_guide
 
Plesk 8.1 for Linux/UNIX
Plesk 8.1 for Linux/UNIXPlesk 8.1 for Linux/UNIX
Plesk 8.1 for Linux/UNIX
 
Presentation data center design overview
Presentation   data center design overviewPresentation   data center design overview
Presentation data center design overview
 
IBM Workload Deployer
IBM Workload DeployerIBM Workload Deployer
IBM Workload Deployer
 
Pda management with ibm tivoli configuration manager sg246951
Pda management with ibm tivoli configuration manager sg246951Pda management with ibm tivoli configuration manager sg246951
Pda management with ibm tivoli configuration manager sg246951
 
Backup and Recovery Solution for VMware vSphere on EMC Isilon Storage
Backup and Recovery Solution for VMware vSphere on EMC Isilon Storage Backup and Recovery Solution for VMware vSphere on EMC Isilon Storage
Backup and Recovery Solution for VMware vSphere on EMC Isilon Storage
 
Plesk 8.2 for Windows Domain Administrator's Guide
Plesk 8.2 for Windows Domain Administrator's GuidePlesk 8.2 for Windows Domain Administrator's Guide
Plesk 8.2 for Windows Domain Administrator's Guide
 
Integrating ibm tivoli workload scheduler with tivoli products sg246648
Integrating ibm tivoli workload scheduler with tivoli products sg246648Integrating ibm tivoli workload scheduler with tivoli products sg246648
Integrating ibm tivoli workload scheduler with tivoli products sg246648
 
Vista deployment using tivoli provisioning manager for os deployment redp4295
Vista deployment using tivoli provisioning manager for os deployment redp4295Vista deployment using tivoli provisioning manager for os deployment redp4295
Vista deployment using tivoli provisioning manager for os deployment redp4295
 

More from Boni Bruno

Using SAS GRID v 9 with Isilon F810
Using SAS GRID v 9 with Isilon F810Using SAS GRID v 9 with Isilon F810
Using SAS GRID v 9 with Isilon F810Boni Bruno
 
20+ Million Records a Second - Running Kafka on Isilon F800
20+ Million Records a Second - Running Kafka on Isilon F800 20+ Million Records a Second - Running Kafka on Isilon F800
20+ Million Records a Second - Running Kafka on Isilon F800 Boni Bruno
 
Hadoop Tiering with Dell EMC Isilon - 2018
Hadoop Tiering with Dell EMC Isilon - 2018Hadoop Tiering with Dell EMC Isilon - 2018
Hadoop Tiering with Dell EMC Isilon - 2018Boni Bruno
 
BlueTalon-Isilon-Validation
BlueTalon-Isilon-ValidationBlueTalon-Isilon-Validation
BlueTalon-Isilon-ValidationBoni Bruno
 
Netpod - The Merging of NPM & APM
Netpod - The Merging of NPM & APMNetpod - The Merging of NPM & APM
Netpod - The Merging of NPM & APMBoni Bruno
 
Decreasing Incident Response Time
Decreasing Incident Response TimeDecreasing Incident Response Time
Decreasing Incident Response TimeBoni Bruno
 

More from Boni Bruno (7)

Using SAS GRID v 9 with Isilon F810
Using SAS GRID v 9 with Isilon F810Using SAS GRID v 9 with Isilon F810
Using SAS GRID v 9 with Isilon F810
 
20+ Million Records a Second - Running Kafka on Isilon F800
20+ Million Records a Second - Running Kafka on Isilon F800 20+ Million Records a Second - Running Kafka on Isilon F800
20+ Million Records a Second - Running Kafka on Isilon F800
 
Hadoop Tiering with Dell EMC Isilon - 2018
Hadoop Tiering with Dell EMC Isilon - 2018Hadoop Tiering with Dell EMC Isilon - 2018
Hadoop Tiering with Dell EMC Isilon - 2018
 
Splunk-EMC
Splunk-EMCSplunk-EMC
Splunk-EMC
 
BlueTalon-Isilon-Validation
BlueTalon-Isilon-ValidationBlueTalon-Isilon-Validation
BlueTalon-Isilon-Validation
 
Netpod - The Merging of NPM & APM
Netpod - The Merging of NPM & APMNetpod - The Merging of NPM & APM
Netpod - The Merging of NPM & APM
 
Decreasing Incident Response Time
Decreasing Incident Response TimeDecreasing Incident Response Time
Decreasing Incident Response Time
 

EMC Starter Kit - IBM BigInsights - EMC Isilon

  • 1. #RememberRuddy _____________________________ EMC ISILON HADOOP STARTER KIT Deploying IBM BigInsights v 4.0 with EMC ISILON Release 1.0 October, 2015
  • 2. EMC Isilon Hadoop Starter Kit for IBM BigInsights __________________________________________________________________ EMC ISILON HADOOP STARTER KIT FOR IBM BIGINSIGHTS 2 To learn more about how EMC products, services, and solutions can help solve your business and IT challenges, contact your local representative or authorized reseller, visit www.emc.com, or explore and compare products in the EMC Store Copyright © 2015 EMC Corporation. All Rights Reserved. EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice. The information in this publication is provided “as is.” EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license. For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com. EMC are registered trademarks or trademarks of EMC, Inc. in the United States and/or other jurisdictions. All other trademarks used herein are the property of their respective owners.
  • 3. EMC Isilon Hadoop Starter Kit for IBM BigInsights __________________________________________________________________ EMC ISILON HADOOP STARTER KIT FOR IBM BIGINSIGHTS 3 Contents INTRODUCTION........................................................................................6 IBM & EMC Technology Highlights ........................................................................ 6 Audience........................................................................................................... 7 Apache Hadoop Projects...................................................................................... 7 IBM Open Platform and the Ambari Manager ......................................................... 8 Isilon Scale-Out NAS for HDFS............................................................................. 8 Overview of Isilon Scale-Out NAS for Big Data....................................................... 9 PRE-INSTALLATION CHECKLIST .............................................................10 Supported Software Versions............................................................................. 10 Hardware Requirements and Suggested Hadoop Service Layout............................. 10 INSTALLATION OVERVIEW .....................................................................12 Prerequisites ................................................................................................... 12 Isilon Scale-Out NAS or Isilon OneFS Simulator ........................................................... 12 Linux...................................................................................................................... 13 Networking ............................................................................................................. 13 DNS ....................................................................................................................... 
14 Other ..................................................................................................................... 15 Prepare Isilon .................................................................................................. 15 Assumptions............................................................................................................ 15 SmartConnect for HDFS ............................................................................................ 16 OneFS Access Zones................................................................................................. 17 Sharing Data between Access Zones .......................................................................... 18 User & Group ID’s .................................................................................................... 19 Configuring Isilon for HDFS ....................................................................................... 19 Create DNS Records for Isilon.................................................................................... 25 Prepare Linux Compute Nodes ........................................................................... 25 Linux Operating System packages needed for IBM BigInsights:...................................... 25 Enable NTP on all Linux Compute nodes...................................................................... 26 Disable SELinux on each node if enabled before installing Ambari. ................................. 26
  • 4. EMC Isilon Hadoop Starter Kit for IBM BigInsights __________________________________________________________________ EMC ISILON HADOOP STARTER KIT FOR IBM BIGINSIGHTS 4 Check UMASK Settings ............................................................................................. 26 Set ulimit Properties................................................................................................. 27 Kernel Modifications ................................................................................................. 27 Create IBM BigInsights Hadoop Users and Groups........................................................ 27 Configure Passwordless SSH...................................................................................... 28 Additional Linux Packages to Install............................................................................ 28 Test DNS Resolution................................................................................................. 29 Edit sudoers file on all Linux compute nodes................................................................ 29 INSTALLING IBM OPEN PLATFORM (OP) ................................................29 Download IBM Open Platform Software............................................................... 29 Create IBM Open Platform Repository ................................................................. 30 Validating IBM Open Platform Install................................................................... 38 Adding a Hadoop User ...................................................................................... 40 Additional Service Tests .................................................................................... 40 HDFS...................................................................................................................... 40 YARN/MAPREDUCE ................................................................................................... 
41 HIVE ...................................................................................................................... 42 HBASE.................................................................................................................... 43 Ambari Service Check....................................................................................... 44 INSTALLING IBM VALUE PACKAGES .......................................................45 Before You Begin ............................................................................................. 45 Installation Procedure....................................................................................... 46 Select IBM BigInsights Service to Install ............................................................. 50 Installing BigInsights Home............................................................................... 51 Configure Knox ................................................................................................ 52 Installing BigSheets.......................................................................................... 54 Installing Big SQL............................................................................................. 57 Connecting to Big SQL ...................................................................................... 62 Running JSqsh......................................................................................................... 62 Connection setup ..................................................................................................... 62 Commands and queries ............................................................................................ 63 Command and query edit.......................................................................................... 65
  • 5. EMC Isilon Hadoop Starter Kit for IBM BigInsights __________________________________________________________________ EMC ISILON HADOOP STARTER KIT FOR IBM BIGINSIGHTS 5 Configuration variables ............................................................................................. 66 Installing Text Analytics .................................................................................... 67 Installing Big R ................................................................................................ 71 IBM BigInsights Online Tutorials................................................................................. 76 SECURITY CONFIGURATION AND ADMINISTRATION..............................77 Setting up HTTPS for Ambari ............................................................................. 77 Configuring SSL support for HBase REST gateway with Knox ................................. 78 Overview of Kerberos ....................................................................................... 82 Enabling Kerberos for IBM Open Platform............................................................ 85 Manually generating keytabs for Kerberos authentication ...................................... 86 Setting up Active Directory or LDAP authentication in Ambari ................................ 91 Enabling Kerberos for HDFS on Isilon.................................................................. 97 Using MIT Kerberos 5 ............................................................................................... 97 Running the Ambari Kerberos Wizard.................................................................. 99 Trouble Shooting and Support ..........................................................................104
EMC Isilon Hadoop Starter Kit for IBM BigInsights v 4.0
This document describes how to create a Hadoop environment that uses IBM® Open Platform with Apache Hadoop and EMC® Isilon® scale-out network-attached storage (NAS) for HDFS-accessible shared storage. Installation and configuration of the IBM BigInsights Value Packages is also presented in this document.
Introduction
IBM & EMC Technology Highlights
The IBM® Open Platform with Apache Hadoop is composed entirely of open source Apache Hadoop components, such as Apache Ambari, YARN, Spark, Knox, Slider, Sqoop, Flume, Hive, Oozie, HBase, ZooKeeper, and more. After installing IBM Open Platform, you can install additional IBM value-add service modules. These value-add service modules are installed separately, and they include IBM BigInsights® Analyst, IBM BigInsights Data Scientist, and the IBM BigInsights Enterprise Management module; they extend IBM Open Platform to accelerate the conversion of all types of data into business insight and action.
The EMC® Isilon® scale-out network-attached storage (NAS) platform provides Hadoop clients with direct access to big data through a Hadoop Distributed File System (HDFS) interface. Powered by the distributed EMC Isilon OneFS® operating system, an EMC Isilon cluster delivers a powerful yet simple and highly efficient storage platform with native HDFS integration to accelerate analytics, gain new flexibility, and avoid the costs of a separate Hadoop infrastructure.
Audience
This document is intended for IT program managers, IT architects, developers, and IT management who want to deploy IBM BigInsights v4.0 with EMC Isilon OneFS v7.2.0.3 for HDFS storage. If a physical EMC Isilon cluster is not available, download the free EMC Isilon OneFS Simulator, which can be installed as a virtual machine for integration testing and training purposes. See http://www.emc.com/getisilon for the EMC Isilon OneFS Simulator.
Apache Hadoop Projects
Apache Hadoop is an open source, batch data processing system for enormous amounts of data. Hadoop runs as a platform that provides cost-effective, scalable infrastructure for building Big Data analytic applications. All Hadoop clusters contain a distributed file system called the Hadoop Distributed File System (HDFS) and a computation layer called MapReduce. The Apache Hadoop project contains the following subprojects:
• Hadoop Distributed File System (HDFS) – A distributed file system that provides high-throughput access to application data.
• Hadoop MapReduce – A software framework for writing applications to reliably process large amounts of data in parallel across a cluster.
Hadoop is supplemented by an ecosystem of Apache projects, such as Pig, Hive, Sqoop, Flume, Oozie, Slider, HBase, ZooKeeper, and more, that extend the value of Hadoop and improve its usability. Version 2 of Apache Hadoop introduces YARN, a sub-project of Hadoop that separates the resource management and processing components. YARN was born of a need to enable a broader array of interaction patterns for data stored in HDFS beyond MapReduce. The YARN-based architecture of Hadoop 2.0 provides a more general processing platform that is not constrained to MapReduce. For full details of the Apache Hadoop project, see http://hadoop.apache.org/.
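To make the MapReduce model concrete, here is a minimal, self-contained Python sketch of a word count, the canonical MapReduce example, with explicit map, shuffle, and reduce phases. It illustrates the programming model only; a real job is written against the Hadoop MapReduce API and runs distributed across the cluster.

```python
from collections import defaultdict

def map_phase(document):
    # A mapper emits (word, 1) pairs for each word in its input split.
    return [(word, 1) for word in document.split()]

def shuffle_phase(pairs):
    # The framework's shuffle/sort groups all values by key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # A reducer aggregates the grouped values; here, it sums the counts.
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["big data big insight", "data lake"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
counts = reduce_phase(shuffle_phase(pairs))
print(counts["big"], counts["data"])  # → 2 2
```

The same three-phase shape carries over to real MapReduce jobs; only the execution (distributed tasks reading HDFS blocks) differs.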
IBM Open Platform and the Ambari Manager
The IBM Open Platform with Apache Hadoop enables Enterprise Hadoop by providing the complete set of essential Hadoop capabilities required for any enterprise. With YARN at its core, it provides capabilities for several functional areas, including data management, data access, data governance, integration, security, and operations. IBM Open Platform delivers the core elements of Hadoop (scalable storage and distributed computing) as well as the necessary enterprise capabilities such as security, high availability, and integration with a broad range of hardware and software solutions.
Apache Ambari is an open operational framework for provisioning, managing, and monitoring Apache Hadoop clusters. As of version 4.0 of IBM Open Platform, Ambari can be used to set up and deploy Hadoop clusters for nearly any task, and it can provision, manage, and monitor every aspect of a Hadoop deployment. More information on IBM Open Platform can be found at: http://www-01.ibm.com/software/data/infosphere/hadoop/enterprise.html
Isilon Scale-Out NAS for HDFS
EMC Isilon is the only scale-out NAS platform natively integrated with the Hadoop Distributed File System (HDFS). Using HDFS as an over-the-wire protocol, you can deploy a powerful, efficient, and flexible data storage and analytics ecosystem. In addition to native integration with HDFS, EMC Isilon storage easily scales to support massively large Hadoop analytics projects. Isilon scale-out NAS also offers the unmatched simplicity, efficiency, flexibility, and reliability that you need to maximize the value of your Hadoop data storage and analytics workflow investment.
Overview of Isilon Scale-Out NAS for Big Data
The EMC Isilon scale-out platform combines modular hardware with unified software to provide the storage foundation for data analysis. Isilon scale-out NAS is a fully distributed system that consists of nodes of modular hardware arranged in a cluster. The distributed Isilon OneFS operating system combines the memory, I/O, CPUs, and disks of the nodes into a cohesive storage unit that presents a global namespace as a single file system. The nodes work together as peers in a shared-nothing hardware architecture with no single point of failure. Every node adds capacity, performance, and resiliency to the cluster, and each node acts as a Hadoop namenode and datanode. The namenode daemon is a distributed process that runs on all the nodes in the cluster, and a compute client can connect to any node through HDFS. As nodes are added, the file system expands dynamically and redistributes data, eliminating the work of partitioning disks and creating volumes. The result is a highly efficient and resilient storage architecture that brings all the advantages of an enterprise scale-out NAS system to storing data for analysis.
With traditional direct-attached storage, the ratio of CPU, RAM, and disk space depends on the workload, which makes it difficult to size a Hadoop cluster before you have had a chance to measure your MapReduce workload. Growing data sets also make upfront sizing decisions problematic. Isilon scale-out NAS lends itself perfectly to this situation: it lets you increase CPUs, RAM, and disk space by adding nodes to dynamically match storage capacity and performance with the demands of a dynamic Hadoop workload. An Isilon cluster optimizes data protection.
OneFS protects data more efficiently and reliably than HDFS. The HDFS protocol, by default, replicates each block of data three times. In contrast, OneFS stripes the data across the cluster and protects it with forward error correction (FEC) codes, which consume less space than replication while providing better protection.
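The space difference is easy to quantify. The sketch below compares the raw capacity needed to store 100 TB of user data under 3x replication versus an illustrative N+2 erasure-coded layout. The 10+2 stripe width is an assumption chosen for illustration; actual OneFS protection overhead depends on cluster size and the configured protection level.

```python
def replication_raw(usable_tb, copies=3):
    # Raw capacity consumed when every block is stored `copies` times,
    # as HDFS does by default.
    return usable_tb * copies

def erasure_raw(usable_tb, data_units=10, parity_units=2):
    # Raw capacity for an N+M protection layout: every N data units
    # carry M additional parity units.
    return usable_tb * (data_units + parity_units) / data_units

usable = 100  # TB of user data
print(replication_raw(usable))  # 300 TB raw for 3x replication
print(erasure_raw(usable))      # 120 TB raw for a 10+2 layout
```

Under these assumptions, replication adds 200% overhead while the erasure-coded layout adds 20%, which is the intuition behind the paragraph above.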
Pre-installation Checklist
Supported Software Versions
The environment used for this document consists of the following software versions:
• Ambari 1.7.0_IBM
• IBM Open Platform v 4.0.0.0
• Isilon OneFS 7.2.0.3 with patch-159065
• All of the IBM BigInsights v 4.0 value packages, i.e. Business Analyst, Data Scientist, and Enterprise Management
______________________________________________________________________
Note: IBM BigInsights v 4.0 requires OneFS v 7.2.0.3 with patch-159065. OneFS version 7.2.0.4 should also work, as should version 7.2.1.1 when available. Do not install IBM BigInsights with OneFS versions lower than 7.2.0.3. See the EMC Isilon Supportability and Compatibility Guide for the latest compatibility updates: https://support.emc.com/docu44518_Isilon-Supportability-and-Compatibility-Guide.pdf?language=en_US
______________________________________________________________________
Hardware Requirements and Suggested Hadoop Service Layout
Detailed system requirements for IBM BigInsights compute nodes can be found at: http://www-01.ibm.com/support/docview.wss?uid=swg27027565
In a multi-node IBM BigInsights cluster, it is suggested that you have at least one management node in a non-high-availability environment if performance is not an issue. If performance is a concern, consider configuring at least three management nodes. If you use the BigInsights Big SQL service, consider configuring four management nodes. If you use a high-availability environment, consider six management nodes. Use
the following list as a guide for the nodes in your IBM/EMC cluster. A suggested layout is shown in Table 1 for both non-high-availability and high-availability deployments.
________________________________________________________________________________________
Note: With both deployment options, EMC Isilon provides namenode, secondary namenode, and datanode functions for the entire cluster. Do not designate any compute node as a namenode, secondary namenode, or datanode in any aspect of the IBM BigInsights configuration.
________________________________________________________________________________________
Table 1. Suggested Service Layout

Non-High availability:
• Management node 1: Ambari, PostgreSQL, Knox, Zookeeper, Hive, Spark, Spark History Server, BigInsights Home, BigSheets, Big R, BigSQL Headnode, Text Analytics
• Management node 2: Resource Manager, HBase Master, Zookeeper, Oozie, Ambari monitoring service
• Management node 3: Job history server, Zookeeper, App Timeline Server, Kafka
• Management node 4: Big SQL Scheduler, Hive Server (MySQL), MySQL metastore, Hive/Oozie metastore, WebHCat Server, Data Server Manager

High availability:
• Management node 1: Ambari, PostgreSQL, Spark, Spark History Server, BigSQL Headnode
• Management node 2: Resource Manager, Zookeeper, Oozie, Ambari monitoring service, BigInsights Home
• Management node 3: Resource Manager (standby), Job history server, Zookeeper, App Timeline Server, Kafka, Oozie (standby)
• Management node 4: Big SQL Scheduler, HBase Master (standby), Hive Server, MySQL Server, Hive metastore, WebHCat Server, Data Server Manager
• Management node 5: Big SQL Headnode (standby), Big SQL Scheduler (standby), HBase Master, Hive Server (standby), Hive Metastore (standby), Journal Node, Zookeeper
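As a rough illustration of how a layout like the one above can be expressed to Ambari, below is a hypothetical, abbreviated Ambari blueprint fragment mapping a few of the services to host groups. The blueprint itself, its name, and the component selection are illustrative assumptions, not part of the starter kit; the point to notice is that no NAMENODE or DATANODE components are assigned to any compute node, because Isilon provides those roles.

```json
{
  "Blueprints": {
    "blueprint_name": "biginsights-isilon",
    "stack_name": "BigInsights",
    "stack_version": "4.0"
  },
  "host_groups": [
    {
      "name": "management_node_1",
      "cardinality": "1",
      "components": [
        { "name": "KNOX_GATEWAY" },
        { "name": "ZOOKEEPER_SERVER" },
        { "name": "HIVE_SERVER" }
      ]
    },
    {
      "name": "management_node_2",
      "cardinality": "1",
      "components": [
        { "name": "RESOURCEMANAGER" },
        { "name": "HBASE_MASTER" },
        { "name": "ZOOKEEPER_SERVER" },
        { "name": "OOZIE_SERVER" }
      ]
    }
  ]
}
```

In this document the services are assigned interactively through the Ambari web UI rather than via a blueprint; the fragment is only meant to show the service-to-host mapping in a concrete form.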
Installation Overview
Below is an overview of the installation process that this document describes.
1. Confirm prerequisites.
2. Prepare your network infrastructure, including DNS.
3. Prepare your Isilon cluster.
4. Prepare the Linux compute nodes.
5. Install the Ambari server.
6. Use Ambari to deploy IBM Open Platform to the compute nodes.
7. Install the IBM BigInsights Value Packages.
8. Perform key functional tests.
Prerequisites
Isilon Scale-Out NAS or Isilon OneFS Simulator
• For low-capacity, non-performance testing of Isilon, the EMC Isilon OneFS Simulator can be used instead of a cluster of physical Isilon appliances. It can be downloaded for free from http://www.emc.com/getisilon. Refer to the EMC Isilon OneFS Simulator Install Guide for details, and be sure to follow the section for running the virtual nodes in VMware ESX. Only a single virtual node is required, but adding nodes will allow you to explore other features such as data protection, SmartPools (tiering), and SmartConnect (network load balancing).
• For physical Isilon nodes, you should have already completed the console-based installation process for your first Isilon node and added two other nodes, for a minimum of three Isilon nodes.
• You should have OneFS version 7.2.0.3 + patch-159065 installed on Isilon.
• You must obtain an OneFS HDFS license code and install it on your Isilon cluster. You can get a free OneFS HDFS license from: http://www.emc.com/campaign/isilon-hadoop/index.htm.
• It is recommended, but not required, to have a SmartConnect Advanced license for your Isilon cluster.
• To allow scripts and other small files to be easily shared between all nodes in your environment, it is highly recommended to enable NFS (Unix Sharing) on your Isilon cluster. By default, the entire /ifs directory is already exported, and this can remain unchanged. This document assumes that a single Isilon cluster is used for this NFS export as well as for HDFS; however, there is no requirement that the NFS export be on the same Isilon cluster that you are using for HDFS.
Linux
• Red Hat Enterprise Linux (RHEL) Server 6 (Update 5 minimum) or a comparable CentOS Server.
• 100 GB root partition.
• At a minimum, 96 GB of RAM for production environments; the more RAM the better for Hadoop.
Networking
• For the best performance, a single 10 Gigabit Ethernet switch should connect to at least one 10 Gigabit port on each Linux host. Additionally, the same switch should connect to at least one 10 Gigabit port on each Isilon node.
• A single dedicated layer-2 network can be used to connect all hosts and Isilon nodes, although multiple networks can be used for increased security, monitoring, and robustness.
• At least an entire /24 IP address block should be allocated to your network. This will allow a DNS reverse-lookup zone to be delegated to your Hadoop DNS server.
• If using the EMC Isilon OneFS Simulator, you will need at least two static IP addresses (one for the node's ext-1 interface, another for the SmartConnect service IP). Each additional Isilon node will require an additional IP address.
• At a minimum, you will need to allocate to your Isilon cluster one IP address per Access Zone per Isilon node. In general, you will need one Access Zone for each separate Hadoop cluster that will use Isilon for HDFS storage.
• For the best possible load balancing during an Isilon node failure scenario, the recommended number of IP addresses is given by the formula below. This is in addition to any IP addresses used for non-HDFS pools.
# of IP addresses = 2 * (# of Isilon Nodes) * (# of Access Zones)
For example, 20 IP addresses are recommended for 5 Isilon nodes and 2 Access Zones.
• This document assumes that Internet access is available to all servers to download various components from Internet repositories.
DNS
• A DNS server is required, and you must have the ability to create DNS records and zone delegations.
• It is recommended that your DNS server delegate a subdomain to your Isilon cluster. For instance, DNS requests for subnet0-pool0.isiloncluster1.example.com or isiloncluster1.example.com should be delegated to the Service IP defined on your Isilon cluster.
• To allow for a convenient way of changing the HDFS namenode used by all Hadoop applications and services, create a DNS record for your Isilon cluster's HDFS namenode service. This should be a CNAME alias to your Isilon SmartConnect zone. Specify a TTL of 1 minute to allow for quick changes.
For example, create a CNAME record for mycluster1-hdfs.example.com that targets subnet0-pool0.isiloncluster1.example.com. If you later want to redirect all HDFS I/O to another cluster or a different pool on the same Isilon cluster, you simply need to change the DNS record and restart all Hadoop services.
Other
• See http://www.github.com/bonibruno/BigInsights; there are three scripts to download that help automate new IBM BigInsights installations with EMC Isilon:
1. bi_create_users.sh – use this script to create the users and groups on all the Linux nodes before beginning installation.
2. isilon_create_users.sh – use this script to create the users and groups on Isilon before beginning installation. You must first create your access zone (described later in this document, e.g. ibm) before running this script.
3. isilon_create_directories.sh – run this after the script above.
More information on the use of these scripts is provided in the installation section of this document.
Prepare Isilon
Assumptions
This document makes the assumptions listed below. These are not necessarily requirements, but they are usually valid and they simplify the process.
• It is assumed that you are not using a directory service such as Active Directory for Hadoop users and groups.
• It is assumed that you are not using Kerberos authentication for Hadoop.
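To give a sense of what a user-creation script such as bi_create_users.sh does on each Linux node, here is an illustrative Python sketch that generates the kind of groupadd/useradd commands involved. The user list and starting IDs are assumptions for illustration only; the authoritative list comes from the scripts themselves.

```python
# Illustrative only: the real user list and IDs are defined by the
# bi_create_users.sh / isilon_create_users.sh scripts.
START_UID = 1000
USERS = ["hdfs", "yarn", "hive", "hbase", "bigsql"]  # hypothetical subset

def user_commands(users, start_uid):
    # A shared 'hadoop' group, then one group and one user per service
    # account, each user also added to the 'hadoop' group.
    cmds = ["groupadd -g {0} hadoop".format(start_uid)]
    for offset, user in enumerate(users, start=1):
        uid = start_uid + offset
        cmds.append("groupadd -g {0} {1}".format(uid, user))
        cmds.append("useradd -u {0} -g {1} -G hadoop {1}".format(uid, user))
    return cmds

for cmd in user_commands(USERS, START_UID):
    print(cmd)
```

Running the analogous Isilon script with the same starting UID/GID values keeps numeric IDs consistent between the Linux nodes and the Isilon cluster, which matters if you ever use NFS alongside HDFS (see the User & Group IDs section later in this document).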
SmartConnect for HDFS
A best practice for HDFS on Isilon is to use two SmartConnect IP address pools for each access zone. One IP address pool should be used by Hadoop clients to connect to the HDFS namenode service on Isilon, and it should use the dynamic IP allocation method to minimize connection interruptions in the event that an Isilon node fails.
____________________________________________________________________
Note: Dynamic IP allocation requires a SmartConnect Advanced license.
____________________________________________________________________
A Hadoop client uses a specific SmartConnect IP address pool simply by using its zone name (DNS name) in the HDFS URI. For example: hdfs://subnet0-pool1.isiloncluster1.example.com:8020
A second IP address pool should be used for HDFS datanode connections, and it should also use the dynamic IP allocation method. To assign specific SmartConnect IP address pools for datanode connections, use the "isi hdfs racks modify" command. If the network is flat, there is no need to use "isi hdfs racks modify"; the default configuration will suffice.
If IP addresses are limited and you have a SmartConnect Advanced license, you may choose to use a single dynamic pool for namenode and datanode connections; this may result in uneven utilization of Isilon nodes. If you do not have a SmartConnect Advanced license, you may choose to use a single static pool for namenode and datanode connections; this may result in some failed HDFS connections in the event of a node failure.
For more information, see the EMC Isilon Best Practices for Hadoop Data Storage white paper online at: https://www.emc.com/collateral/white-papers/h13926-wp-emc-isilon-hadoop-best-practices-onefs72.pdf
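When sizing these pools, recall the formula from the networking prerequisites: 2 x (number of Isilon nodes) x (number of access zones), in addition to any IPs used for non-HDFS pools. A quick sketch:

```python
def recommended_ip_count(isilon_nodes, access_zones):
    # Recommended IP addresses for HDFS pools, per the sizing formula
    # in the networking prerequisites: 2 * nodes * access zones.
    # This excludes IP addresses used for non-HDFS pools.
    return 2 * isilon_nodes * access_zones

print(recommended_ip_count(5, 2))  # → 20, matching the worked example in the text
```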
OneFS Access Zones
Access zones on OneFS are a way to select a distinct configuration for the OneFS cluster based on the IP address that the client connects to. For HDFS, this configuration includes authentication methods, the HDFS root path, and authentication providers (AD, LDAP, local, etc.). By default, OneFS includes a single access zone called System.
If you will only have a single Hadoop cluster connecting to your Isilon cluster, then you can use the System access zone with no additional configuration. However, to have more than one Hadoop cluster connect to your Isilon cluster, it is best to have each Hadoop cluster connect to a separate OneFS access zone. This allows OneFS to present each Hadoop cluster with its own HDFS namespace and an independent set of users. For more information, see the Security and Compliance for Scale-out Hadoop Data Lakes white paper.
To view your current list of access zones and the IP pools associated with them:

isiloncluster1-1# isi zone zones list
Name         Path
------------ ----
System       /ifs
------------
Total: 1

isiloncluster1-1# isi networks list pools -v
subnet0:pool0
    In Subnet: subnet0
    Allocation: Static
    Ranges: 1
        10.111.129.115-10.111.129.126
    Pool Membership: 4
        1:10gige-1 (up)
        2:10gige-1 (up)
        3:10gige-1 (up)
        4:10gige-1 (up)
    Aggregation Mode: Link Aggregation Control Protocol (LACP)
    Access Zone: System (1)
    SmartConnect:
        Suspended Nodes : None
        Auto Unsuspend ... 0
        Zone : subnet0-pool0.isiloncluster1.lab.example.com
        Time to Live : 0
        Service Subnet : subnet0
        Connection Policy: Round Robin
        Failover Policy : Round Robin
        Rebalance Policy : Automatic Failback
To create a new access zone and an associated IP address pool:

isiloncluster1-1# mkdir -p /ifs/isiloncluster1/zone1
isiloncluster1-1# isi zone zones create --name zone1 --path /ifs/isiloncluster1/zone1
isiloncluster1-1# isi networks create pool --name subnet0:pool1 --ranges 10.111.129.127-10.111.129.138 --ifaces 1-4:10gige-1 --access-zone zone1 --zone subnet0-pool1.isiloncluster1.lab.example.com --sc-subnet subnet0 --dynamic
Creating pool 'subnet0:pool1': OK
Saving: OK
____________________________________________________________________
Note: If you do not have a SmartConnect Advanced license, you will need to omit the --dynamic option.
____________________________________________________________________
Sharing Data between Access Zones
By default, the data in one access zone cannot be accessed by users in another access zone. In certain cases, however, you may need to make the same data set available to more than one Hadoop compute cluster. Using fully qualified HDFS paths, e.g. hdfs://zone1-hdfs.example.com/hadoop/dir1, can make a data set available across two or more access zones. With fully qualified HDFS paths, the data sets do not cross access zones; instead, the Hadoop jobs access the data sets from a common shared HDFS namespace. For instance, you can selectively share data between two or more access zones based on referential links and file/directory permissions.
User & Group IDs
Isilon clusters and Hadoop servers each have their own mapping of user IDs (UIDs) to user names and group IDs (GIDs) to group names. When Isilon is used only for HDFS storage by the Hadoop servers, the IDs do not need to match, because the HDFS protocol refers to users and groups only by their names, never by their numeric IDs. In contrast, the NFS protocol refers to users and groups by their numeric IDs. Although NFS is rarely used in traditional Hadoop environments, the high-performance, enterprise-class, and POSIX-compatible NFS functionality of Isilon makes NFS a compelling protocol for certain workflows. If you expect to use both NFS and HDFS on your Isilon cluster (or simply want to be open to the possibility in the future), it is highly recommended to maintain consistent names and numeric IDs for all users and groups on Isilon and your Hadoop servers.
In a multi-tenant environment with multiple Hadoop clusters, numeric IDs for users in different clusters should be distinct. For instance, the user bigsql in Hadoop cluster 1 may have ID 1013, and this same ID will be used in the Isilon access zone for Hadoop cluster 1 as well as on every server in Hadoop cluster 1. The user bigsql in Hadoop cluster 2 may have ID 710, and this ID will be used in the Isilon access zone for Hadoop cluster 2 as well as on every server in Hadoop cluster 2.
Configuring Isilon for HDFS
_____________________________________________________________________
Note: In the steps below, replace zone1 with System to use the default System access zone, or specify the name of a new access zone that you previously created.
______________________________________________________________________
1. Open a web browser to your Isilon cluster's web administration page.
If you don’t know the URL, simply point your browser to: https://isilon_node_ip_address:8080
The isilon_node_ip_address is any IP address on any Isilon node that is in the System Access Zone. This usually corresponds to the ext-1 interface of any Isilon node.
2. Log in with your root account. You specified the root password when you configured your first node using the console.
3. Check, and edit as necessary, your NTP settings. Click Cluster Management -> General Settings -> NTP.
1. SSH into any node in your Isilon cluster as root.
2. Confirm that your Isilon cluster is at OneFS version 7.2.0.3.
isiloncluster1-1# isi version
Isilon OneFS v7.2.0.3 ...
3. For OneFS version 7.2.0.3, you must have patch-159065 installed. You can view the list of patches you have installed with:
# isi pkg info
patch-159065: This patch adds support for the Ambari 1.7.0_IBM Server.
4. Install the patch if needed:
[user@workstation ~]$ scp patch-159065.tgz root@mycluster1-hdfs:/tmp
isiloncluster1-1# gunzip < /tmp/patch-159065.tgz | tar -xvf -
isiloncluster1-1# isi pkg install patch-159065.tar
Preparing to install the package...
Checking the package for installation...
Installing the package
Committing the installation...
Package successfully installed.
5. Verify your HDFS license.
isiloncluster1-1# isi license
Module License Status  Configuration   Expiration Date
------ -------------- --------------- ---------------
HDFS   Evaluation     Not Configured  November 12, 2016
6. Create the HDFS root directory. This is usually called hadoop and must be within the access zone directory.
isiloncluster1-1# mkdir -p /ifs/isiloncluster1/zone1/hadoop
7. Set the HDFS root directory for the access zone.
isiloncluster1-1# isi zone zones modify zone1 --hdfs-root-directory /ifs/isiloncluster1/zone1/hadoop
8. Set the HDFS block size used for reading from Isilon.
isiloncluster1-1# isi hdfs settings modify --default-block-size 128M
9. Create an indicator file so that you can easily determine when you are looking at your Isilon cluster via HDFS.
isiloncluster1-1# touch /ifs/isiloncluster1/zone1/hadoop/THIS_IS_ISILON_isiloncluster1_zone1
10. Copy the scripts (isilon_create_users.sh & isilon_create_directories.sh) you downloaded from http://www.github.com/bonibruno/BigInsights to Isilon.
[user@workstation ~]$ scp isilon_create_*.sh root@isilon_node_ip_address:/ifs/isiloncluster1/scripts
11. Execute the script isilon_create_users.sh. This script will create all required users and groups for IBM BigInsights v 4.0.
Warning: The script isilon_create_users.sh will create local user and group accounts on your Isilon cluster for Hadoop services. If you are using a directory service such as Active Directory and you want these users and groups to be defined in your directory service, then DO NOT run this script. Instead, refer to the OneFS documentation and the EMC Isilon Best Practices for Hadoop Data Storage white paper.
Script Usage: isilon_create_users.sh --dist <DIST> [--startgid <GID>] [--startuid <UID>] [--zone <ZONE>]
dist - This corresponds to your Hadoop distribution: bi4.0
startgid - Group IDs will begin with this value. For example: 1000
startuid - User IDs will begin with this value. This is generally the same as startgid. For example: 1000
zone - Access zone name. For example: zone1

isiloncluster1-1# bash /ifs/isiloncluster1/scripts/isilon_create_users.sh --dist bi4.0 --startgid 1000 --startuid 1000 --zone zone1

Example output of the script is shown below:
Info: Hadoop distribution: bi
Info: groups will start at GID 1000
Info: users will start at UID 1000
Info: will put users in zone: zone1
Info: HDFS root: /ifs/isiloncluster1/hadoop
Failed to add member UID:1001 to group GROUP:hadoop: User is already in local group
SUCCESS -- Hadoop users created successfully!
Done!
______________________________________________________________________
Note: The "User is already in local group" message is expected; this user corresponds to the hadoop user, which is already in the hadoop group.
______________________________________________________________________
12. Execute the script isilon_create_directories.sh. This script will create all required directories with the appropriate ownership and permissions.
Script Usage: isilon_create_directories.sh --dist <DIST> [--fixperm] [--zone <ZONE>]
dist - This corresponds to your Hadoop distribution: bi4.0
fixperm - Updates ownership and permissions on the Hadoop directories.
zone - Access zone name. For example: zone1

isiloncluster1-1# bash /ifs/isiloncluster1/scripts/isilon_create_directories.sh --dist bi4.0 --fixperm --zone zone1

13. Map the hdfs user to the Isilon superuser. This will allow the hdfs user to chown (change ownership of) all files during IBM BigInsights installation.
______________________________________________________________________
Warning: The command below will restart the HDFS service on Isilon to ensure that any cached user-mapping rules are flushed. This will temporarily interrupt any HDFS connections coming from other Hadoop clusters.
______________________________________________________________________
isiloncluster1-1# isi zone zones modify --user-mapping-rules='hdfs=>root' --zone zone1
isiloncluster1-1# isi services isi_hdfs_d disable ; isi services isi_hdfs_d enable
The service 'isi_hdfs_d' has been disabled.
The service 'isi_hdfs_d' has been enabled.
  • 25.
Create DNS Records for Isilon
You will now create the DNS records that will be used to access your Isilon cluster.
1. Create a delegation record so that DNS requests for the zone isiloncluster1.example.com are delegated to the Service IP that will be defined on your Isilon cluster. The Service IP can be any unused static IP address in your lab subnet.
2. Create a CNAME alias for your Isilon SmartConnect zone. For example, create a CNAME record for mycluster1-hdfs.example.com that targets subnet0-pool0.isiloncluster1.example.com.
3. Test name resolution:
[user@workstation ~]$ ping mycluster1-hdfs.example.com
PING subnet0-pool0.isiloncluster1.example.com (10.11.12.13) 56(84) bytes of data.
64 bytes from 10.11.12.13: icmp_seq=1 ttl=64 time=1.15 ms

Prepare Linux Compute Nodes
Linux operating system packages needed for IBM BigInsights:
1. Compatibility Libraries
2. Networking Tools
3. Perl Support
4. Ruby Support
5. Web Services add-on
6. PHP Support
7. Web Server
  • 26.
8. MySQL*
9. PostgreSQL*
10. SNMP support
11. Development Tools
12. Korn Shell

Enable NTP on all Linux compute nodes:
1. Edit the /etc/ntp.conf file and add your NTP server.
2. Start NTP: service ntpd start
3. Enable NTP at boot: chkconfig --level 2345 ntpd on

Disable SELinux on each node, if enabled, before installing Ambari:
1. Edit /etc/selinux/config
2. Set SELINUX=disabled
3. Reboot
____________________________________________________________________
Note: SELinux can be disabled temporarily with the "setenforce 0" command.
____________________________________________________________________

Check UMASK Settings
The umask setting on each node should be set to 0022 in /etc/profile and /etc/bashrc. Modify the existing umask entry if needed, e.g. "umask 0022".
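The umask check above can be automated. A minimal sketch, operating on a demo file path (an assumption for illustration); on a real node point PROFILE at /etc/profile and /etc/bashrc:

```shell
# Sketch: ensure "umask 0022" is present in a profile-style file.
# PROFILE is a demo path; use /etc/profile and /etc/bashrc on real nodes.
set -eu
PROFILE="${PROFILE:-/tmp/profile.demo}"
printf 'export PATH=$PATH\numask 027\n' > "$PROFILE"   # demo content only
if grep -q '^umask' "$PROFILE"; then
  sed -i 's/^umask .*/umask 0022/' "$PROFILE"          # modify existing entry
else
  echo 'umask 0022' >> "$PROFILE"                      # or append a new one
fi
grep '^umask' "$PROFILE"
```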
  • 27.
Set ulimit Properties
1. Edit /etc/security/limits.d/90-nproc.conf
#set for all users
* hard nofile 65536
* soft nofile 65536
* hard nproc 65536
* soft nproc 65536

Kernel Modifications
1. Edit /etc/sysctl.conf and add the following:
vm.swappiness=5
kernel.pid_max=4194303
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
net.ipv4.ip_local_port_range = 1024 64000

Create IBM BigInsights Hadoop Users and Groups
Create the required users on all Linux nodes. It is recommended to create all Hadoop users before installing IBM BigInsights. Use the bi_create_users.sh script obtained from:
https://github.com/bonibruno/BigInsights

[user@workstation ~]$ scp bi_create_users.sh [node1]:/root
Run the script, e.g. ./bi_create_users.sh
Repeat the above for all nodes.
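The /etc/sysctl.conf additions above can be applied idempotently so that re-running node preparation does not duplicate entries. A sketch using a demo file path (an assumption); on a real node set SYSCTL_CONF=/etc/sysctl.conf and follow up with `sysctl -p`:

```shell
# Sketch: append the kernel settings only if they are not already present.
# SYSCTL_CONF is a demo path; use /etc/sysctl.conf on a real node.
set -eu
SYSCTL_CONF="${SYSCTL_CONF:-/tmp/sysctl.conf.demo}"
: > "$SYSCTL_CONF"
while read -r line; do
  # -qxF: match the whole line literally, so re-runs add nothing
  grep -qxF "$line" "$SYSCTL_CONF" || echo "$line" >> "$SYSCTL_CONF"
done <<'EOF'
vm.swappiness=5
kernel.pid_max=4194303
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
net.ipv4.ip_local_port_range = 1024 64000
EOF
cat "$SYSCTL_CONF"
```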
  • 28.
Configure Passwordless SSH
Configure passwordless SSH for all Linux nodes.
1. Create authentication SSH keys:
ssh-keygen -f id_rsa -t rsa -N ""
2. Create .ssh directories on all nodes:
ssh root@[node1] mkdir -p .ssh
cd .ssh
Upload the generated keys to all hosts:
cat id_rsa.pub | ssh root@[node1] 'cat >> .ssh/authorized_keys'
Repeat the above for all nodes.
3. Set permissions on the .ssh directory:
ssh root@[node1] "chmod 700 .ssh; chmod 640 .ssh/authorized_keys"

Additional Linux Packages to Install
Install the following packages on all Linux compute nodes.
• deltarpm
• python-deltarpm
• createrepo
• pam-1.1.1-17.el6.i686.rpm
• mysql-connector-java-5.1.17-6.el6.noarch.rpm
• ksh
• nc
• libdbi
• libstdc
• libaio
• java-1.7.0-openjdk-devel
• python-paramiko
• python-rrdtool-1.4.5-1.el6.rfx.x86_64
  • 29.
• snappy-1.0.5-1.el6.x86_64
• web-ui-framework
Install the above packages using the yum install command.

Test DNS Resolution
Make sure all compute nodes resolve with a fully qualified domain name. Ping each host by its FQDN and make sure it is reachable.

Edit the sudoers file on all Linux compute nodes.
1. Edit /etc/sudoers
## Additions needed for IBM BigInsights
hadoop ALL=(ALL) NOPASSWD: ALL
bigsql ALL=(ALL) NOPASSWD: ALL

Check IBM's BigInsights website for more information on preparing Linux nodes:
http://www-01.ibm.com/support/knowledgecenter/SSPT3X_4.0.0/com.ibm.swg.im.infosphere.biginsights.install.doc/doc/install_prepare.html

Installing IBM Open Platform (IOP)
Download IBM Open Platform Software
Log into the IBM Passport Advantage web portal with your IBM-assigned credentials and download the following packages onto the designated Ambari server node:
• BI-AH-1.0.0.1-IOP-4.0.x86_64.bin
• IOP-4.0.0.0.x86_64.rpm
• iop-4.0.0.0.x86_64.tar.gz
• iop-utils-1.0-iop-4.0.x86_64.tar.gz
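The DNS-resolution test described above can be scripted across all nodes. A minimal sketch; the NODES list and report path are assumptions to replace with your real host list:

```shell
# Sketch: verify that each compute node resolves before installing.
# NODES is an assumption (demo default: localhost); list your real FQDNs.
NODES="${NODES:-localhost}"
REPORT="${REPORT:-/tmp/fqdn-check.txt}"
: > "$REPORT"
for n in $NODES; do
  # getent consults the same resolver order the installer will use
  if getent hosts "$n" > /dev/null; then
    echo "$n resolves" >> "$REPORT"
  else
    echo "$n FAILED" >> "$REPORT"
  fi
done
cat "$REPORT"
```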
  • 30. EMC Isilon Hadoop Starter Kit for IBM BigInsights __________________________________________________________________ EMC ISILON HADOOP STARTER KIT FOR IBM BIGINSIGHTS 30 Create IBM Open Platform Repository The IBM Open Platform with Apache Hadoop uses the repository-based Ambari installer. You have two options for specifying the location of the repository from which Ambari obtains the component packages. The IBM Open Platform with Apache Hadoop installation includes OpenJDK 1.7.0. During installation, you can either install the version provided or make sure Java™ 7 is installed on all nodes in the cluster. 1. Log in to your Linux cluster as root, or as a user with root privileges. 2. Ensure that the nc package is installed on all nodes: yum install -y nc If you installed the Basic Server option on your server, the nc package might not be installed, which might result in the failure on datanodes of the IBM Open Platform with Apache Hadoop. 3. Locate the IOP-4.0.0.0.x86_64.rpm file you downloaded from the download site. Run the following command to install the ambari.repo file into /etc/yum.repos.d: yum install IOP-4.0.0.0.x86_64.rpm If using a mirror repository, edit the file /etc/yum.repos.d/ambari.repo and replace baseurl=http://ibm-open-platform.ibm.com/repos/Ambari/RHEL6/x86_64/1.7 with your mirror URL. For example, baseurl=http://<web.server>/repos/Ambari/RHEL6/x86_64/1.7/ Disable the gpgcheck in the ambari.repo file. To disable signature validation, change gpgcheck=1 to gpgcheck=0. Alternatively, you can keep gpgcheck on and change the public key file location to the mirror Ambari repository. To do this, change the following
  • 31.
gpgkey=http://ibm-open-platform.ibm.com/repos/Ambari/RHEL6/x86_64/1.7/BI-GPG-KEY.public
to the following:
gpgkey=http://<web.server>/repos/Ambari/RHEL6/x86_64/1.7/BI-GPG-KEY.public

4. Clean the yum cache on each node so that the right packages from the remote repository are seen by your local yum:
sudo yum clean all

5. Install the Ambari server on the intended management node:
sudo yum install ambari-server
Accept the install defaults.

6. If you are using a mirror repository, after you install the Ambari server, update the following file with the mirror repository URLs:
/var/lib/ambari-server/resources/stacks/BigInsights/4.0/repos/repoinfo.xml

In the file, change the Original content to the Modified content.

Original content:
<os type="redhat6">
  <repo>
    <baseurl>http://ibm-open-platform.ibm.com/repos/IOP/RHEL6/x86_64/4.0</baseurl>
    <repoid>IOP-4.0</repoid>
    <reponame>IOP</reponame>
  </repo>
  <repo>
    <baseurl>http://ibm-open-platform.ibm.com/repos/IOP-UTILS/RHEL6/x86_64/1.0</baseurl>
    <repoid>IOP-UTILS-1.0</repoid>
    <reponame>IOP-UTILS</reponame>
  </repo>
</os>

Modified content:
<os type="redhat6">
  <repo>
    <baseurl>http://<web.server>/repos/IOP/RHEL6/x86_64/4.0</baseurl>
    <repoid>IOP-4.0</repoid>
    <reponame>IOP</reponame>
  </repo>
  <repo>
    <baseurl>http://<web.server>/repos/IOP-UTILS/RHEL6/x86_64/1.0</baseurl>
    <repoid>IOP-UTILS-1.0</repoid>
    <reponame>IOP-UTILS</reponame>
  </repo>
</os>

Edit the /etc/ambari-server/conf/ambari.properties file, changing:
jdk1.7.url=http://ibm-open-platform.ibm.com/repos/IOP-UTILS/RHEL6/x86_64/1.0/openjdk/jdk-1.7.0.tar.gz
to:
jdk1.7.url=http://<web.server>/repos/IOP-UTILS/RHEL6/x86_64/1.0/openjdk/jdk-1.7.0.tar.gz

7. Set up the Ambari server:
sudo ambari-server setup
Accept the setup preferences. A Java JDK is installed as part of the Ambari server setup. However, the setup also allows you to reuse an existing JDK:
ambari-server setup -j /full/path/to/JDK
The JDK path set by the -j parameter must be the same on each node in the cluster.

8. Start the Ambari server:
sudo ambari-server start
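The ambari.repo mirror edits from step 3 can be scripted rather than applied by hand. A sketch that operates on a demo copy of the file; the mirror URL and demo path are assumptions, and on a real node the file is /etc/yum.repos.d/ambari.repo:

```shell
# Sketch: point ambari.repo at a local mirror and disable gpgcheck.
# REPO_FILE and MIRROR are assumptions for the demo.
set -eu
REPO_FILE="${REPO_FILE:-/tmp/ambari.repo.demo}"
MIRROR="${MIRROR:-http://mirror.example.com}"
cat > "$REPO_FILE" <<'EOF'
[ambari]
baseurl=http://ibm-open-platform.ibm.com/repos/Ambari/RHEL6/x86_64/1.7
gpgcheck=1
EOF
# rewrite the baseurl to the mirror and turn off signature validation
sed -i \
  -e "s|^baseurl=.*|baseurl=${MIRROR}/repos/Ambari/RHEL6/x86_64/1.7/|" \
  -e 's/^gpgcheck=1/gpgcheck=0/' "$REPO_FILE"
cat "$REPO_FILE"
```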
  • 33. EMC Isilon Hadoop Starter Kit for IBM BigInsights __________________________________________________________________ EMC ISILON HADOOP STARTER KIT FOR IBM BIGINSIGHTS 33 9. If the Ambari server had been installed on your node previously, the node may contain old cluster information. Reset the Ambari server to clean up its cluster information in the database, using the following commands: >sudo ambari-server stop >sudo ambari-server reset >sudo ambari-server start 10. Access the Ambari web user interface from a web browser by using the server name (the fully qualified domain name, or the short name) on which you installed the software, and port 8080. For example, enter abc.com:8080. You can use any available port other than 8080 that will allow you to connect to the Ambari server. In some networks, port 8080 is already in use. To use another port, do the following: a. Edit the ambari.properties file: vi /etc/ambari-server/conf/ambari.properties b. Add a line in the file to select another port: client.api.port=8081 c. Save the file and restart the Ambari server: ambari-server restart 11. Log in to the Ambari server with the default username and password: admin/admin. The default username and password is required only for the first login. You can configure users and groups after the first login to the Ambari web interface.
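Step 10's port change can also be scripted. A sketch against a demo copy of the properties file (the path is an assumption; the real file is /etc/ambari-server/conf/ambari.properties, followed by `ambari-server restart`):

```shell
# Sketch: set client.api.port, replacing an existing entry or appending one.
# PROPS is a demo path; use /etc/ambari-server/conf/ambari.properties for real.
set -eu
PROPS="${PROPS:-/tmp/ambari.properties.demo}"
NEW_PORT="${NEW_PORT:-8081}"
: > "$PROPS"   # demo: start from an empty file
if grep -q '^client.api.port=' "$PROPS"; then
  sed -i "s/^client.api.port=.*/client.api.port=${NEW_PORT}/" "$PROPS"
else
  echo "client.api.port=${NEW_PORT}" >> "$PROPS"
fi
grep '^client.api.port' "$PROPS"
```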
  • 34.
12. On the Welcome page, click Launch Install Wizard.
13. On the Get Started page, enter a name for the cluster you want to create. The name cannot contain blank spaces or special characters. Click Next.
14. You will deploy IBM Open Platform for Apache Hadoop with EMC Isilon. Ambari allows immediate use of an Isilon cluster for all HDFS services (NameNode and DataNode); no reconfiguration is necessary once the IBM Open Platform install is completed.
SSH into Isilon as root and configure the Ambari agent:
isiloncluster1-1# isi zone zones modify zone1 --hdfs-ambari-namenode mycluster1-hdfs.example.com
isiloncluster1-1# isi zone zones modify zone1 --hdfs-ambari-server manager-svr-1.example.com
  • 35.
15. On the Select Stack page, click the stack version you want to install (BigInsights™ 4.0). Click Next.
16. On the Install Options page, in Target Hosts, add the list of Linux hosts that the Ambari server will manage and on which the IBM Open Platform with Apache Hadoop software will be deployed, one node per line. For example:
host1.example.com
host2.example.com
host3.example.com
host4.example.com
In Host Registration Information, select one of the two options:
Provide the SSH Private Key to automatically register hosts
  • 36. EMC Isilon Hadoop Starter Kit for IBM BigInsights __________________________________________________________________ EMC ISILON HADOOP STARTER KIT FOR IBM BIGINSIGHTS 36 Click SSH Private Key. The private key file is /root/.ssh/id_rsa, where the root user installed the Ambari server. Click Choose File to find the private key file you installed previously. You should have retained a copy of the SSH private key (.ssh/id_rsa) in your local directory when you set up password-less SSH. Copy and paste the key into the text box manually. Click the Register and Confirm button. ____________________________________________________________________ Note: After the Linux hosts register, click the back button and Perform manual registration for Isilon and do not use SSH. ____________________________________________________________________ Isilon has an ambari-agent within OneFS and needs to be manually registered in Ambari. After registering Isilon manually, click the Next button. You should see the Ambari agents on both your Linux hosts and Isilon become registered. 17. On the Confirm Hosts page, you check that the correct hosts for your cluster have been located and that those hosts have the correct directories, packages, and processes to continue the installation. If hosts were selected in error, click the check boxes next to the hosts you want to remove. Click Remove Selected. To remove a single host, click Remove in the Action column. If warnings are found during the check process, you can click Click here to see the warnings to see what caused the warnings. The Host Checks page identifies any issues with the hosts. For example, a host may have Transparent Huge Pages or Firewall issues. You can ignore errors related to user names and groups as we pre-created the users in the pre-installation steps of this document. After you resolve the issues, click Rerun Checks on the Host Checks page. When you have confirmed the hosts, click Next. 18. 
On the Choose Services page, select the services you want to install.
  • 37. EMC Isilon Hadoop Starter Kit for IBM BigInsights __________________________________________________________________ EMC ISILON HADOOP STARTER KIT FOR IBM BIGINSIGHTS 37 Ambari shows a confirmation message to install the required service dependencies. For example, when selecting Oozie only, the Ambari web interface shows messages for accepting YARN/MR2, HDFS and Zookeeper installations. It also shows Nagios and Ganglia for monitoring and alerting, but they are not required services. 19. On the Assign Masters page, assign NameNode and SNameNode components to the Isilon SmartConnect address e.g. mycluster1-hdfs.example.com. The rest of the services can be deployed per the recommended services layout - refer back to Table 1. Make sure you assign Namenode and SNameNode only to the Isilon SmartConnect address and none of the Linux nodes, e.g. only mycluster1-hdfs.example.com. Click Next. On the Assign Slaves and Clients page, assign the components to Linux hosts in your cluster and make sure datanode is only assigned to Isilon. Assign Client to the client nodes. Click Next. Tip: If you anticipate adding the Big SQL service at some later time, you must include all clients on all the anticipated Big SQL worker nodes. Big SQL specifically needs the HDFS, Hive, HBase, Sqoop, HCat, and Oozie clients. 20. On the Customize Services page, select configuration settings for the services selected. Default values are filled in automatically when available and they are the recommended values. The installation wizard prompts you for required fields (such as password entries) by displaying a number in a circle next to an installed service. Assign passwords to Hive, Oozie, and any other selected services that require them. The following settings should be checked: • YARN Node Manager log-dirs • YARN Node Manager local-dirs • HBase local directory • ZooKeeper directory
  • 38. EMC Isilon Hadoop Starter Kit for IBM BigInsights __________________________________________________________________ EMC ISILON HADOOP STARTER KIT FOR IBM BIGINSIGHTS 38 • Oozie Data Dir • Storm storm.local.dir Click the number and enter the requested information in the field outlined in red. Make sure that the service port that is set is not already used by another component. For example, the Knox gateway port is, by default, set as 8443. But, when the Ambari server is set up with HTTPs, and the SSL port is set up using 8443, then you must change the Knox gateway port to some other value. ____________________________________________________________________ Note: If you are working in an LDAP environment where users are set up centrally by the LDAP administrator and therefore, already exist, selecting the defaults can cause the installation to fail. Open the Misc tab, and check the box to ignore user modification errors. 21. When you have completed the configuration of the services, click Next. 22. On the Review page, verify that your settings are correct. Click Deploy. 23. The Install, Start, and Test page shows the progress of the installation. The progress bar at the top of the page gives the overall status while the main section of the page gives the status for each host. Logs for a specific task can be displayed by clicking on the task. Click the link in the Message column to find out what tasks have been completed for a specific host or to see the warnings that have been encountered. When the message "Successfully installed and started the services" appears, click Next. 24. On the Summary page, review the accomplished tasks. Click Complete to go to the IBM Open Platform with Apache Hadoop dashboard. Validating IBM Open Platform Install Ambari provides service checks for all the supported services. These checks run automatically after each service installation, or they can be run manually at any time. You
  • 39. EMC Isilon Hadoop Starter Kit for IBM BigInsights __________________________________________________________________ EMC ISILON HADOOP STARTER KIT FOR IBM BIGINSIGHTS 39 can access the Ambari web interface and use the Services View to make sure all the components pass their checks successfully. The following steps provide another way to validate your installation. 1. As the root user on a node on which Apache Hadoop is installed, enter the following command to become the ambari-qa user: su - ambari-qa 2. As the ambari-qa user, run the following command: export HADOOP_MR_DIR=/usr/iop/current/hadoop-mapreduce-client # Generate data with 1000 rows. Each row is about 100 bytes. yarn jar $HADOOP_MR_DIR/hadoop-mapreduce-examples.jar teragen 1000 /tmp/tgout # Sort data yarn jar $HADOOP_MR_DIR/hadoop-mapreduce-examples.jar terasort /tmp/tgout /tmp/tsout # Validate data yarn jar $HADOOP_MR_DIR/hadoop-mapreduce-examples.jar teravalidate /tmp/tsout /tmp/tvout If the job is successful, you will see a log record similar to the following: INFO mapreduce.Job: Job job_id completed successfully Browse to your cluster on port 8088 to see the results of your validation tests, e.g. http://x.x.x.x:8088/cluster, example YARN test results shown below.
  • 40.
Adding a Hadoop User
You must add a user account for each Linux user that will submit MapReduce jobs. The procedure below adds a user named hduser1 as an example.
1. Add the user to Isilon:
isiloncluster1-1# isi auth groups create hduser1 --zone zone1 --provider local
isiloncluster1-1# isi auth users create hduser1 --primary-group hduser1 --zone zone1 --provider local --home-directory /ifs/isiloncluster1/zone1/hadoop/user/hduser1
2. Add the user to the Hadoop nodes:
[root@mycluster1-master-0 ~]# adduser hduser1
3. Create the user's home directory on HDFS:
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -mkdir -p /user/hduser1
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -chown hduser1:hduser1 /user/hduser1
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -chmod 755 /user/hduser1

Additional Service Tests
The tests below should be performed to ensure a proper installation. Perform the tests in the order shown. You must create the Hadoop user hduser1 before proceeding.

HDFS
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -ls /
Found 5 items
-rw-r--r--   1 root  hadoop       0 2014-08-05 05:59 /THIS_IS_ISILON
drwxr-xr-x   - hbase hbase      148 2014-08-05 06:06 /hbase
drwxrwxr-x   - solr  solr         0 2014-08-05 06:07 /solr
drwxrwxrwt   - hdfs  supergroup 107 2014-08-05 06:07 /tmp
drwxr-xr-x   - hdfs  supergroup 184 2014-08-05 06:07 /user
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -put -f /etc/hosts /tmp
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -cat /tmp/hosts
127.0.0.1 localhost
[root@mycluster1-master-0 ~]# sudo -u hdfs hdfs dfs -rm -skipTrash /tmp/hosts
  • 41.
[root@mycluster1-master-0 ~]# su - hduser1
[hduser1@mycluster1-master-0 ~]$ hdfs dfs -ls /
Found 5 items
-rw-r--r--   1 root  hadoop       0 2014-08-05 05:59 /THIS_IS_ISILON
drwxr-xr-x   - hbase hbase      148 2014-08-05 06:28 /hbase
drwxrwxr-x   - solr  solr         0 2014-08-05 06:07 /solr
drwxrwxrwt   - hdfs  supergroup 107 2014-08-05 06:07 /tmp
drwxr-xr-x   - hdfs  supergroup 209 2014-08-05 06:39 /user
[hduser1@mycluster1-master-0 ~]$ hdfs dfs -ls
...

YARN/MAPREDUCE
[hduser1@mycluster1-master-0 ~]$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 1000
...
Estimated value of Pi is 3.14000000000000000000
[hduser1@mycluster1-master-0 ~]$ hadoop fs -mkdir in
You can put any file into the in directory. It will be used as the data source for subsequent tests.
[hduser1@mycluster1-master-0 ~]$ hadoop fs -put -f /etc/hosts in
[hduser1@mycluster1-master-0 ~]$ hadoop fs -ls in
...
[hduser1@mycluster1-master-0 ~]$ hadoop fs -rm -r out
[hduser1@mycluster1-master-0 ~]$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount in out
...
[hduser1@mycluster1-master-0 ~]$ hadoop fs -ls out
Found 4 items
-rw-r--r--   1 hduser1 hduser1  0 2014-08-05 06:44 out/_SUCCESS
-rw-r--r--   1 hduser1 hduser1 24 2014-08-05 06:44 out/part-r-00000
-rw-r--r--   1 hduser1 hduser1  0 2014-08-05 06:44 out/part-r-00001
-rw-r--r--   1 hduser1 hduser1  0 2014-08-05 06:44 out/part-r-00002
[hduser1@mycluster1-master-0 ~]$ hadoop fs -cat out/part*
localhost 1
127.0.0.1 1

Browse to the YARN Resource Manager GUI: http://mycluster1-master-0.example.com:8088/
Browse to the MapReduce History Server GUI: http://mycluster1-master-0.example.com:19888/. In particular, confirm that you can view the complete logs for task attempts.
  • 42.
HIVE
[hduser1@mycluster1-master-0 ~]$ hadoop fs -mkdir -p sample_data/tab1
[hduser1@mycluster1-master-0 ~]$ cat - > tab1.csv
1,true,123.123,2012-10-24 08:55:00
2,false,1243.5,2012-10-25 13:40:00
3,false,24453.325,2008-08-22 09:33:21.123
4,false,243423.325,2007-05-12 22:32:21.33454
5,true,243.325,1953-04-22 09:11:33
Type <Control+D>.
[hduser1@mycluster1-master-0 ~]$ hadoop fs -put -f tab1.csv sample_data/tab1
[hduser1@mycluster1-master-0 ~]$ hive
hive> DROP TABLE IF EXISTS tab1;
CREATE EXTERNAL TABLE tab1
(
  id INT,
  col_1 BOOLEAN,
  col_2 DOUBLE,
  col_3 TIMESTAMP
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hduser1/sample_data/tab1';
DROP TABLE IF EXISTS tab2;
CREATE TABLE tab2
(
  id INT,
  col_1 BOOLEAN,
  col_2 DOUBLE,
  month INT,
  day INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
INSERT OVERWRITE TABLE tab2
SELECT id, col_1, col_2, MONTH(col_3), DAYOFMONTH(col_3)
FROM tab1
WHERE YEAR(col_3) = 2012;
...
OK
Time taken: 28.256 seconds
hive> show tables;
OK
  • 43.
tab1
tab2
Time taken: 0.889 seconds, Fetched: 2 row(s)
hive> select * from tab1;
OK
1 true 123.123 2012-10-24 08:55:00
2 false 1243.5 2012-10-25 13:40:00
3 false 24453.325 2008-08-22 09:33:21.123
4 false 243423.325 2007-05-12 22:32:21.33454
5 true 243.325 1953-04-22 09:11:33
Time taken: 1.083 seconds, Fetched: 5 row(s)
hive> select * from tab2;
OK
1 true 123.123 10 24
2 false 1243.5 10 25
Time taken: 0.094 seconds, Fetched: 2 row(s)
hive> select * from tab1 where id=1;
OK
1 true 123.123 2012-10-24 08:55:00
Time taken: 15.083 seconds, Fetched: 1 row(s)
hive> select * from tab2 where id=1;
OK
1 true 123.123 10 24
Time taken: 13.094 seconds, Fetched: 1 row(s)
hive> exit;

HBASE
[hduser1@mycluster1-master-0 ~]$ hbase shell
hbase(main):001:0> create 'test', 'cf'
0 row(s) in 3.3680 seconds
=> Hbase::Table - test
hbase(main):002:0> list 'test'
TABLE
test
1 row(s) in 0.0210 seconds
=> ["test"]
hbase(main):003:0> put 'test', 'row1', 'cf:a', 'value1'
0 row(s) in 0.1320 seconds
hbase(main):004:0> put 'test', 'row2', 'cf:b', 'value2'
  • 44.
0 row(s) in 0.0120 seconds
hbase(main):005:0> scan 'test'
ROW   COLUMN+CELL
row1  column=cf:a, timestamp=1407542488028, value=value1
row2  column=cf:b, timestamp=1407542499562, value=value2
2 row(s) in 0.0510 seconds
hbase(main):006:0> get 'test', 'row1'
COLUMN  CELL
cf:a    timestamp=1407542488028, value=value1
1 row(s) in 0.0240 seconds
hbase(main):007:0> quit

Ambari Service Check
Ambari has built-in functional tests for each component. These are executed automatically when you install your cluster with Ambari. To execute them after installation, select the service in Ambari, click the Service Actions button, and select Run Service Check.
  • 45.
Installing IBM Value Packages
Before You Begin
Please note that the BigInsights Analyst and BigInsights Data Scientist value packages have been sanity-tested on EMC Isilon, but have not been performance-profiled and tested under load with Isilon OneFS 7.2.0.3. EMC and IBM plan to validate these components under load as part of future integration efforts. Please refer to the EMC – IBM BigInsights Joint Support Statement for further details.
You must acquire the software from Passport Advantage. The acquired software has a *.bin extension. The name of the *.bin file depends on whether the BigInsights Analyst or the BigInsights Data Scientist module was downloaded. When you run the *.bin file, configuration files are copied to appropriate locations to enable Ambari to see the value-add services as available. When adding the value-add services through Ambari, additional software packages can be downloaded. If the Hadoop cluster cannot directly access the internet, a local mirror repository can be created.
Where you perform the following steps depends on whether the Hadoop cluster has direct internet access:
• If the Hadoop cluster has direct access to the internet, perform the steps from the Ambari server of the Hadoop cluster.
• If the Hadoop cluster does not have direct internet access, perform the steps from a Linux host with direct internet access. Then, transfer the files, as required, to a local repository mirror.
  • 46. EMC Isilon Hadoop Starter Kit for IBM BigInsights __________________________________________________________________ EMC ISILON HADOOP STARTER KIT FOR IBM BIGINSIGHTS 46 Installation Procedure 1. Update the permissions on the downloaded *.bin file to enable execute. chmod +x <package_name>.bin 2. Run the *.bin file to extract and install the services in the module. ./<package_name>.bin where <package_name> is BI-Analyst-xxxxx.bin for the Analyst module or BI-DS- xxxxx.bin for the Data Scientist module. 3. After the prompt, agree to the license terms. Reply yes | y to continue install. 4. After the prompt, choose if you want to do an online (option 1) or offline (option 2) install. a. Online install will lay out the Ambari service configuration files and update the repository locations in the Ambari server file. Skip to step 6. b. Offline install initiates a download of files to set up a local repository mirror. A subdirectory called BigInsights will be created with RPMs and associated files will be located in directory BigInsights/packages 5. Setup a local repository. A local repository is required if the Hadoop cluster cannot connect directly to the internet, or if you wish to avoid multiple downloads of the same software when installing services across multiple nodes. In the following steps, the host that performs the repository mirror function is called the repository server. If you do not have an additional Linux host, you can use one of the Hadoop management nodes. The repository server must be accessible over the network by the Hadoop cluster. The repository server requires an HTTP web server. The following instructions describe how to set up a repository server by using a Linux host with an Apache HTTP server. a. On the repository server, if the Apache HTTP server is not installed, install it:
  • 47.
yum install httpd
b. On the repository server, ensure that the createrepo package is installed.
c. On the repository server, create a directory for your value-add repository, such as <mirror web server document root>/repos/valueadds. For example, for Apache httpd, the default document root is /var/www/html:
mkdir /var/www/html/repos/valueadds
d. By selecting Option 2 in step 4, RPMs were downloaded to a subdirectory called BigInsights/packages. Copy all of the RPMs to the mirror web server location, <your.mirror.web.server.document root>/repos/valueadds:
cp BigInsights/packages/* /var/www/html/repos/valueadds/
e. Start the web server. If you use Apache httpd, start it by using either of the following commands:
apachectl start
or
service httpd start
f. Test your local repository by browsing to the web directory:
http://<your.mirror.web.server>/repos/valueadds
You should see all of the files that you copied to the repository server.
g. On the repository server, run the createrepo command to initialize the repository:
createrepo /var/www/html/repos/valueadds
h. In the BigInsights/packages directory, find the RPM to install on the Ambari Server host of the Hadoop cluster:
BigInsights Analyst: BI-Analyst-X.X.X.X-IOP-X.X.x86_64.rpm
  • 48.
BigInsights Data Scientist: BI-DS-X.X.X.X-IOP-X.X.x86_64.rpm
Tip: The BigInsights Data Scientist module also entitles you to the features of the BigInsights Analyst module. Therefore, consider doing the yum install for both of the RPM packages.
Then, copy the file to the Ambari Server host and install the RPMs by using the following command:
sudo yum install <BI-xxx-1.0.0.1-IOP...>.rpm
i. On the Ambari Server node, navigate to the /var/lib/ambari-server/resources/stacks/BigInsights/<version_number>/repos/repoinfo.xml file. If the file does not exist, create it. Ensure the <baseurl> element for the BIGINSIGHTS-VALUEPACK <repo> entry points to your repository server. Remember, there might be multiple <repo> sections. Make sure that the URL you tested in step 5.f matches exactly the value indicated in the <baseurl> element. For example, the repoinfo.xml might look like the following content after you change http://ibm-open-platform.ibm.com/repos/BigInsights-Valuepacks/ to http://your.mirror.web.server/repos/valueadds:
<repo>
  <baseurl>http://<your.mirror.web.server>/repos/valueadds</baseurl>
  <repoid>BIGINSIGHTS-VALUEPACK</repoid>
  <reponame>BIGINSIGHTS-VALUEPACK</reponame>
</repo>
Note: The new <repo> section might appear as a single line.
Tip: If you later find an error in this configuration file, make corrections and run the following command:
yum clean all
Then restart the Ambari server.
j. When the module is installed, restart the Ambari server:
ambari-server restart
k. Open the Ambari web interface and log in. The default address is the following URL:
http://<server-name>:8080
The default login name is admin and the default password is admin.
l. Click Actions > Add Service. The list of services shows the services that you previously added as well as the BigInsights services that you can now add.
Select IBM BigInsights Service to Install
Select the service that you want to install and deploy. Even though your module might contain multiple services, install only the specific service that you want plus the BigInsights Home service. Installing one value-add service at a time is recommended. Follow the service-specific installation instructions for more information.
After all of the IBM BigInsights services are installed, the Ambari software list should show a green check mark next to each service, as shown below:
Installing BigInsights Home
The BigInsights Home service is the main interface used to launch BigInsights - BigSheets, BigInsights - Text Analytics, and BigInsights - Big SQL. The BigInsights Home service requires Knox to be installed, configured, and started.
Open a browser and access the Ambari server dashboard. The following is the default URL:
http://<server-name>:8080
The default user name is admin, and the default password is admin.
In the Ambari dashboard, click Actions > Add Service.
In the Add Service Wizard > Choose Services page, select the BigInsights - BigInsights Home service. Click Next. If you do not see the option for BigInsights - BigInsights Home, follow the instructions described in Installing the BigInsights value-add packages.
In the Assign Masters page, select a management node (edge node) that your users can communicate with. BigInsights Home is a web application that your users must be able to open with a web browser.
In the Assign Slaves and Clients page, make selections to assign slaves and clients. The nodes that you select will receive the JSqsh client (an open source, command-line interface to SQL for Big SQL and other database engines) and an SFTP client. Select nodes that might be used to ingest data over SFTP, or where you might want to work interactively with Big SQL scripts or other databases.
Click Next to review any options that you might want to customize.
Click Deploy.
If the BigInsights - BigInsights Home service fails to install, run the remove_value_add_services.sh cleanup script. The following code is an example command:
cd /usr/ibmpacks/bin/<version>
./remove_value_add_services.sh -u admin -p admin -x 8080 -s WEBUIFRAMEWORK -r
For more information about cleaning the value-add service environment, see Removing BigInsights value-add services.
After installation is complete, click Next > Complete.
Configure Knox
The Apache Knox gateway is a system that provides a single point of authentication and access for Apache Hadoop services on the compute nodes in a cluster; however, authentication to HDFS services is controlled entirely by Isilon OneFS. The Knox gateway simplifies Hadoop security both for users who access the cluster and execute jobs and for operators who control access and manage the cluster. The gateway runs as a server, or a cluster of servers, providing centralized access to one or more Hadoop clusters. In IBM Open Platform with Apache Hadoop, Knox is a service that you start, stop, and configure in the Ambari web interface.
Users access the following BigInsights value-add components through Knox by going to the IBM BigInsights Home service:
https://<knox_host>:<knox_port>/<knox_gateway_path>/default/BigInsightsWeb/index.html
• BigSheets
• Text Analytics
• Big SQL
Knox supports only REST API calls for the following Hadoop services:
• WebHCat
• Oozie
• HBase
• Hive
• YARN
Click the Knox service in the Ambari web interface to see its summary page. Select Service Actions > Restart All to restart Knox and all of its components. If you are using LDAP, you must also start LDAP if it is not already started.
Click the BigInsights Home service in the Ambari user interface. Select Service Actions > Restart All to restart it and all of its components.
Open the BigInsights Home page from a web browser. The URL for BigInsights Home is:
https://<knox_host>:<knox_port>/<knox_gateway_path>/default/BigInsightsWeb/index.html
where:
knox_host - The host where Knox is installed and running
knox_port - The port where Knox is listening (by default, 8443)
knox_gateway_path - The value entered in the gateway.path field in the Knox configuration (by default, 'gateway')
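The three placeholder values above compose into the final URL mechanically, which is easy to script when you manage several clusters. The host name below is illustrative; the port and gateway path are the Knox defaults just described.

```shell
# Assemble the BigInsights Home URL from the three Knox values described above.
# KNOX_HOST is illustrative; KNOX_PORT and KNOX_GATEWAY_PATH are the Knox defaults.
KNOX_HOST=myhost.company.com
KNOX_PORT=8443
KNOX_GATEWAY_PATH=gateway
URL="https://${KNOX_HOST}:${KNOX_PORT}/${KNOX_GATEWAY_PATH}/default/BigInsightsWeb/index.html"
echo "$URL"
# → https://myhost.company.com:8443/gateway/default/BigInsightsWeb/index.html
```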
For example, the URL might look like the following address:
https://myhost.company.com:8443/gateway/default/BigInsightsWeb/index.html
If you are using the Knox Demo LDAP, a default user ID and password are created for you. When you access the web page, use the following preset credentials:
User Name = guest
Password = guest-password
Installing BigSheets
To extend the power of the Open Platform for Apache Hadoop, install and deploy the BigInsights - BigSheets service, which is the IBM spreadsheet interface for big data.
1. Open a browser and access the Ambari server dashboard. The following is the default URL:
http://<server-name>:8080
The default user name is admin, and the default password is admin.
2. In the Ambari Dashboard, click Actions > Add Service.
3. In the Add Service Wizard > Choose Services page, select the BigInsights - BigSheets service and, if you have not already installed it, the BigInsights Home service. Click Next. If you do not see the BigInsights - BigSheets service, you need to install the appropriate module and restart Ambari as described in Installing the BigInsights value-add packages.
4. In the Assign Masters page, decide on which node of your cluster you want to run the BigSheets master.
5. In the Assign Slaves and Clients page, all the defaults are automatically accepted and the next page automatically appears. The BigSheets service does not have any slaves and
clients. The Assign Slaves and Clients page appears briefly and is skipped automatically during the install; this is the expected behavior.
6. In the Customize Services page, accept the recommended configurations for the BigSheets service, or customize the configuration by expanding the configuration files and modifying the values. In the Advanced bigsheets-user-config section, make sure that you enter the following information:
a. In the bigsheets.user field, leave the default user name, which is bigsheets.
b. In the bigsheets.password field, type a valid password.
c. In the bigsheets.userid field, type a valid user ID to use for the bigsheets service user. This user ID is created across all of the nodes of the cluster, and must be unique across all nodes of the cluster.
d. Click Next.
7. In the Advanced bigsheets-ambari-config section, in the ambari.password field, type the correct Ambari administration password.
8. You can review your selections in the Review page before accepting them. If you want to modify any values, click the Back button. If you are satisfied with your setup, click Deploy.
9. In the Install, Start and Test page, the BigSheets service is installed and verified. If you have multiple nodes, you can see the progress on each node. When the installation is complete, either view the errors or warnings by clicking the link, or click Next to see a summary and then the new service added to the list of services.
10. Click Complete. If the BigInsights - BigSheets service fails to install, run the remove_value_add_services.sh cleanup script.
The following code is an example of the command:
cd /usr/ibmpacks/bin/<version>
./remove_value_add_services.sh -u admin -p admin -x 8080 -s BIGSHEETS -r
For more information about cleaning the value-add service environment, see Removing BigInsights value-add services.
11. After you install BigInsights - BigSheets, you must restart the HDFS, MapReduce2, YARN, Knox, Nagios, and Ganglia services.
a. For each service that requires a restart, select the service.
b. Click Service Actions.
c. Click Restart All.
12. Access the BigInsights - BigSheets service from the BigInsights Home service.
o If the BigInsights Home service has not yet been added, see Installing BigInsights Home.
o If the BigInsights Home service has been installed, it must be restarted so that the BigInsights - BigSheets icon will display.
13. Launch the BigInsights Home service by typing the following address in your browser:
https://<knox_host>:<knox_port>/<knox_gateway_path>/default/BigInsightsWeb/index.html
where:
knox_host - The host where Knox is installed and running
knox_port - The port where Knox is listening (by default, 8443)
knox_gateway_path - The value entered in the gateway.path field in the Knox configuration (by default, 'gateway')
For example, the URL might look like the following address:
https://myhost.company.com:8443/gateway/default/BigInsightsWeb/index.html
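The restarts in step 11 can also be scripted against the Ambari REST API rather than clicked through the UI. The sketch below is a dry run: it only prints the "stop" call (a PUT that sets the service state to INSTALLED) for each service; the matching "start" call sets the state to STARTED. The host, cluster name, and credentials are placeholder assumptions for your environment.

```shell
# Dry run: print the Ambari REST "stop" call for each service that step 11
# restarts. AMBARI and CLUSTER are placeholders, not real endpoints.
AMBARI="http://ambari.example.com:8080"
CLUSTER="mycluster"
OUT=$(for SVC in HDFS MAPREDUCE2 YARN KNOX NAGIOS GANGLIA; do
  echo "curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT" \
       "-d '{\"ServiceInfo\":{\"state\":\"INSTALLED\"}}'" \
       "$AMBARI/api/v1/clusters/$CLUSTER/services/$SVC"
done)
echo "$OUT"
```

Removing the echo in front of curl (and repeating the call with state STARTED) would perform the actual stop/start cycle.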
Installing Big SQL
To extend the power of the Open Platform for Apache Hadoop, install and deploy the BigInsights - Big SQL service, which is the IBM SQL interface to the Hadoop-based platform, IBM Open Platform with Apache Hadoop.
1. Open a browser and access the Ambari server dashboard. The following is the default URL:
http://<server-name>:8080
The default user name is admin, and the default password is admin.
2. In the Ambari web interface, click Actions > Add Service.
3. In the Add Service Wizard > Choose Services page, select the BigInsights - Big SQL service and the BigInsights Home service. Click Next. If you do not see the option to select the BigInsights - Big SQL service, complete the steps in Installing the BigInsights value-add packages.
4. In the Assign Masters page, decide which nodes of your cluster you want to run the specified components, or accept the default nodes. Follow these guidelines:
o For the Big SQL monitoring and editing tool, make sure that the Data Server Manager (DSM) is assigned to the same node as the Big SQL Head node.
5. Click Next.
6. In the Assign Slaves and Clients page, accept the defaults, or make specific assignments for your nodes. Follow these guidelines:
o Select the non-head nodes for the Big SQL Worker components. You must select at least one node as a worker node.
o Select all nodes for the CLIENT. This puts the JSqsh and SFTP clients on the nodes.
7. In the Customize Services page, accept the recommended configurations for the Big SQL service, or customize the configuration by expanding the configuration files and modifying the values. Make sure that you have a valid bigsql_user, bigsql_user_password (see the reference screen below), and user_id (created by the bi_create_users.sh script) in the appropriate fields in the Advanced bigsql-users-env section.
9. You can review your selections in the Review page before accepting them. If you want to modify any values, click the Back button. If you are satisfied with your setup, click Deploy.
10. In the Install, Start and Test page, the Big SQL service is installed and verified. If you have multiple nodes, you can see the progress on each node. When the installation is complete, either view the errors or warnings by clicking the link, or click Next to see a summary and then the new service added to the list of services.
If the BigInsights - Big SQL service fails to install, run the remove_value_add_services.sh cleanup script. The following code is an example of the command:
cd /usr/ibmpacks/bin/<version>
./remove_value_add_services.sh -u admin -p admin -x 8080 -s BIGSQL -r
For more information about cleaning the value-add service environment, see Removing BigInsights value-add services.
11. A web application interface for Big SQL monitoring and editing is available for your end users to work with Big SQL. You access this monitoring utility from the IBM BigInsights Home service. If you have not added the BigInsights Home service yet, do that now.
12. Restart the Knox service. Also start the Knox Demo LDAP service if you have not configured your own LDAP.
13. Restart the BigInsights Home service.
14. To run SQL statements from the Big SQL monitoring and editing tool, type the following address in your browser to open the BigInsights Home service:
https://<knox_host>:<knox_port>/<knox_gateway_path>/default/BigInsightsWeb/index.html
where:
knox_host - The host where Knox is installed and running
knox_port - The port where Knox is listening (by default, 8443)
knox_gateway_path - The value entered in the gateway.path field in the Knox configuration (by default, 'gateway')
For example, the URL might look like the following address:
https://myhost.company.com:8443/gateway/default/BigInsightsWeb/index.html
If you use the Knox Demo LDAP service, the default credentials are:
userid = guest
password = guest-password
Your end users can also use the JSqsh client, which is a component of the BigInsights - Big SQL service.
15. If the BigInsights - Big SQL service shows as unavailable, there might have been a problem with post-installation configuration. Run the following commands as root (or with sudo) on the node where the Big SQL monitoring utility (DSM) server is installed:
a. Run the dsmKnoxSetup script:
cd /usr/ibmpacks/bigsql/<version-number>/dsm/1.1/ibm-datasrvrmgr/bin/
./dsmKnoxSetup.sh -knoxHost <knox-host>
where <knox-host> is the node where the Knox gateway service is running.
b. Make sure that you do not stop and restart the Knox gateway service within Ambari. If you do, run the dsmKnoxSetup script again.
c. Restart the BigInsights Home service so that the Big SQL monitoring utility (DSM) can be accessed from the BigInsights Home interface.
16. For HBase, do the following post-installation steps:
a. On all nodes where HBase is installed, check that the symlinks to hive-serde.jar and hive-common.jar in the hbase/lib directory are valid. To verify that the symlinks are created and valid:
namei /usr/iop/<version-number>/hbase/lib/hive-serde.jar
namei /usr/iop/<version-number>/hbase/lib/hive-common.jar
If they are not valid, do the following steps:
cd /usr/iop/<version-number>/hbase/lib
rm -rf hive-serde.jar
rm -rf hive-common.jar
ln -s /usr/iop/<version-number>/hive/lib/hive-serde.jar hive-serde.jar
ln -s /usr/iop/<version-number>/hive/lib/hive-common.jar hive-common.jar
b. After installing the Big SQL service and fixing the symlinks, restart the HBase service from the Ambari web interface.
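The symlink check-and-repair in step 16 can be rehearsed in a scratch directory that mimics the /usr/iop layout before you run it on cluster nodes. The paths below are stand-ins, and readlink is used in place of namei so the result is easy to verify programmatically.

```shell
# Mimic the /usr/iop/<version>/hive and hbase lib directories in a scratch tree.
ROOT=$(mktemp -d)
mkdir -p "$ROOT/hive/lib" "$ROOT/hbase/lib"
touch "$ROOT/hive/lib/hive-serde.jar" "$ROOT/hive/lib/hive-common.jar"

# Step 16's repair: remove any stale links, then relink into hive/lib.
cd "$ROOT/hbase/lib"
for jar in hive-serde.jar hive-common.jar; do
  rm -f "$jar"
  ln -s "$ROOT/hive/lib/$jar" "$jar"
done

# A valid symlink resolves to an existing file; readlink -e fails otherwise.
readlink -e "$ROOT/hbase/lib/hive-serde.jar"
```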
After you add Big SQL worker nodes, make sure that you stop and then restart the Hive service.
Connecting to Big SQL
You can run Big SQL queries from the Java SQL Shell (JSqsh) or from IBM Data Server Manager. You can also run queries from a client application, such as IBM Data Studio, that uses JDBC or ODBC drivers. You must identify a running Big SQL server and configure either a JDBC or an ODBC driver. For more information about JSqsh or IBM Data Studio, see the related topics in the IBM BigInsights Knowledge Center.
Running JSqsh
JSqsh is installed in /usr/ibmpacks/common-utils/current/jsqsh/bin. Change to that directory and type ./jsqsh to open the JSqsh shell:
cd /usr/ibmpacks/common-utils/current/jsqsh/bin
./jsqsh
You can then run any JSqsh commands from the prompt.
Connection setup
To use the JSqsh command shell, you can use the default connections or define and test a connection to the Big SQL server.
1. The first time that you open the JSqsh command shell, a configuration wizard is started. At the JSqsh command prompt, type \drivers to determine the available drivers.
a. On the driver selection screen, select the Big SQL instance that you want to run. Note: Big SQL is designated as DB2 in this example:
Name Target Class
- ------- ------------------- --------------------------------------------
...
2 *db2 IBM Data Server (DB2) com.ibm.db2.jcc.DB2Driver
b. Verify the port, server, and user name. Run \setup and press C to define a password for the connection. The user name must have database administration privileges, or must be granted those privileges by the Big SQL administrator.
c. Test the connection to the Big SQL server.
d. Save and name this connection.
2. Generally, you can access JSqsh from /usr/ibmpacks/common-utils/current/jsqsh/bin with the following command:
./jsqsh --driver=db2 --user=<username> --password=<user_password>
3. Open the saved configuration wizard at any time by typing \setup while in the command interface, or ./jsqsh --setup when you open the command interface.
4. Specify the connection name in the JSqsh command shell to establish a connection:
./jsqsh name
5. Use the \connect command when you are already inside the JSqsh shell to establish a connection at the JSqsh prompt:
\connect name
Commands and queries
At the JSqsh command prompt, you can run JSqsh commands or database server commands. JSqsh commands usually begin with a backslash (\) character. JSqsh commands accept command-line arguments and allow for common shell activities, such as I/O redirection and pipes. For example, consider this set of commands:
1> select * from t1
2> where c1 > 10
3> \go --style csv > /tmp/t1.csv
Because the first two commands do not begin with a backslash character, they are assumed to be SQL statements and are sent to the Big SQL server. The \go command sends the statements to run on the server. The \go command has a built-in alias, go, so that you can omit the backslash. Additionally, you can specify a trailing semicolon to indicate that you want to run a statement, for example:
1> select * from t1
2> where c1 > 10;
The --style option in the \go command indicates that the display shows comma-separated values (CSV). The \go form is most useful when you provide additional arguments that affect how the query is run; changing the display style is an example of this feature. The redirection operator (>) specifies that the results of the command are sent to a file called /tmp/t1.csv.
A set of frequently run commands does not require the leading backslash. Any JSqsh command can be aliased to another name (without a leading backslash, if you choose) by using the \alias command. For example, if you want to be able to type bye to leave the JSqsh shell, establish that word as the alias for the \quit command:
\alias bye='\quit'
You can run a script that contains one or more SQL statements. For example, assume that you have a file called mySQL.sql that contains these statements:
select tabschema, tabname from syscat.tables fetch first 5 rows only;
select tabschema, colname, colno, typename, length from syscat.columns fetch first 10 rows only;
You can start JSqsh and run the script at the same time with this command:
/usr/ibmpacks/common-utils/current/jsqsh/bin/jsqsh bigsql < /home/bigsql/mySQL.sql
The redirection operator tells JSqsh to read the commands from the file in the /home/bigsql directory and then run the statements within the file.
Command and query edit
The JSqsh command shell uses the JLine2 library, which allows you to edit previously entered commands and queries. You use the command-line edit features to move with the arrow keys and to edit the command or query on the current line. The JLine2 library provides the same key bindings (vi and emacs) as the GNU Readline library. In addition, it attempts to apply any custom key maps that you created in a GNU Readline configuration file (.inputrc) in the $HOME/ directory of the local file system.
In addition to individual line editing, the JSqsh command shell remembers the 50 most recently run statements, which you can view by using the \history command:
1> \history
(1) use tpch;
(2) select count(*) from lineitem
Previously run statements are prefixed with a number in parentheses. You use this number to recall a query by using the JSqsh recall operator (!), for example:
1> !2
1> select count(*) from lineitem
2>
The ! recall operator has the following behavior:
!! Recalls the previously run statement.
!5 Recalls the fifth query from history.
!-2 Recalls the query run two queries prior.
You can also edit queries that span multiple lines by using the \buf-edit command, which pulls the current query into an external editor, for example:
1> select id, count(*)
2> from t1, t2
3> where t1.c1 = t2.c2
4> \buf-edit
The query is opened in an external editor (/usr/bin/vi by default; you can specify a different editor in the $EDITOR environment variable). When you close the editor, the edited query is entered at the JSqsh command shell prompt.
The JSqsh command shell provides built-in aliases, vi and emacs, for the \buf-edit command. The following commands, for example, open the query in the vi editor:
1> select id, count(*)
2> from t1, t2
3> where t1.c1 = t2.c2
4> vi
Configuration variables
You can use the \set command to list or define values for a number of configuration variables, for example:
1> \set
If you want to redefine the prompt in the command shell, run the \set command with the prompt option:
1> \set prompt='foo $lineno> '
foo 1>
Every JSqsh configuration variable has built-in help available:
1> \help prompt
If you want to permanently set a specific variable, you can do so by editing your $HOME/.jsqsh/sqshrc file and including the appropriate \set command in it.
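Putting the pieces together, a $HOME/.jsqsh/sqshrc that bakes in the prompt and alias examples from this section might look like the following sketch. The values are the illustrative ones used above, not required settings.

```
\set prompt='foo $lineno> '
\alias bye='\quit'
```

Every line in the file is run as a JSqsh command each time the shell starts, so any command you would type at the prompt can be made permanent here.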