2. Table of Contents
Setting Up the SAS Hadoop Environment
5 ways SAS gets to the data inside Hadoop
SAS-Hadoop Talk Talk?
Connecting SAS 9.4 (Windows) to the Cloudera CDH 5.8 VM
Submitting HDFS Commands from SAS
Submitting Pig Commands from SAS
Q&A
3. Setting Up the SAS Hadoop Environment
Storing structured and unstructured data inside Hadoop
- The new industry norm
SAS and Hadoop now work together
- Is your team ready to embrace the new environment?
What questions does your analytical mind ask first?
• How many ways can SAS get to the data inside Hadoop?
• How can SAS and Hadoop talk to each other?
• What configuration changes are required?
4. 5 ways SAS gets to the data inside Hadoop
- Each relies on a different SAS technology product
Base SAS
Base SAS can access HDFS files, but it can read and write only plain text files and SAS Scalable Performance Data Engine (SPD Engine) files.
SAS Scalable Performance Data Server (SPD Server)
When connected to Hadoop, SPD Server can read and write partitioned SPD Server files directly to and from HDFS.
SAS/ACCESS Interface to Hadoop
Provides the ability to interact with Hive tables. It can read and write Hive data directly using the SQL pass-through facility and the SAS LIBNAME statement.
SAS LASR Analytic Server
An in-memory analytics engine that can read and write data directly in HDFS using the SASHDAT file format, a highly optimized format designed for fast, efficient processing.
SAS In-Database Products
SAS In-Database Code Accelerator for Hadoop runs DS2 data and thread programs inside the MapReduce framework. The In-Database products offer several fast methods for data preparation, based on DS2 thread programming, on multicore symmetric multiprocessing (SMP) and massively parallel processing (MPP) machines.
5. SAS-Hadoop Talk Talk?
Connecting and configuring SAS and Hadoop requires:
- Hadoop JAR files
- Hadoop configuration properties (the cluster's site XML files)
- New SAS environment variables:
  - SAS_HADOOP_JAR_PATH (directory that holds the Hadoop JAR files)
  - SAS_HADOOP_CONFIG_PATH (directory that holds the Hadoop configuration files)
  - SAS_HADOOP_RESTFUL (optional; enables access through WebHDFS)
With these in place, SAS/ACCESS Interface to Hadoop can reach Hive tables and HDFS directories.
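SAS_HADOOP_RESTFUL, the optional WebHDFS switch, makes SAS reach HDFS over the WebHDFS REST API, so WebHDFS must answer on the cluster. A minimal sketch of a quick check from the Windows side, assuming the QuickStart hostname quickstart.cloudera and the Hadoop 2.x default NameNode HTTP port 50070 (both assumptions; the helper names are hypothetical):

```python
import json
import urllib.request

# Assumptions (hypothetical, adjust to your VM): the QuickStart VM hostname
# mapped in the Windows hosts file and the Hadoop 2.x default NameNode HTTP port.
NAMENODE_HOST = "quickstart.cloudera"
NAMENODE_HTTP_PORT = 50070


def liststatus_url(path: str, host: str = NAMENODE_HOST,
                   port: int = NAMENODE_HTTP_PORT) -> str:
    """Build a WebHDFS LISTSTATUS URL for an absolute HDFS path."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute, e.g. /user/cloudera")
    return f"http://{host}:{port}/webhdfs/v1{path}?op=LISTSTATUS"


def parse_liststatus(body: str) -> list[tuple[str, str]]:
    """Return (name, type) pairs from a LISTSTATUS JSON response body."""
    statuses = json.loads(body)["FileStatuses"]["FileStatus"]
    return [(s["pathSuffix"], s["type"]) for s in statuses]


def list_hdfs_dir(path: str, timeout: float = 10.0) -> list[tuple[str, str]]:
    """Query WebHDFS over the network; the VM must be running and reachable."""
    with urllib.request.urlopen(liststatus_url(path), timeout=timeout) as resp:
        return parse_liststatus(resp.read().decode("utf-8"))
```

With the VM up, `list_hdfs_dir("/user")` should return entries such as the cloudera home directory; a connection error here usually points to WebHDFS being disabled or the hostname not resolving.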
6. Connecting SAS 9.4 (Windows) to the Cloudera CDH 5.8 VM
A Step-by-Step Guide
Install the following:
- SAS 9.4 for Windows
- Download and install VMware Player and the CDH 5.8 QuickStart VM:
https://www.cloudera.com/downloads/quickstart_vms/5-8.html
Import the CDH 5.8 VM into VMware Player
In the VM settings:
- Allocate 16 GB of RAM and 2 CPU cores
- Ensure the NAT network adapter is enabled
- Create a shared folder that SAS can access
Validate that Hadoop is up and running
Note the IP address of the VM
In Windows, add the VM's IP address and hostname to the hosts file (C:\Windows\System32\drivers\etc\hosts)
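The hosts-file step can be scripted so it is safe to rerun. A minimal sketch, assuming a hypothetical VM IP address and the QuickStart hostname quickstart.cloudera (replace both with what your VM reports); the helper names are illustrative, and editing the real Windows hosts file needs an administrator shell:

```python
import ipaddress

# Hypothetical values: use the IP address you noted and the VM's actual hostname.
VM_IP = "192.168.132.128"
VM_HOSTNAME = "quickstart.cloudera"

# Default location of the hosts file on Windows.
WINDOWS_HOSTS = r"C:\Windows\System32\drivers\etc\hosts"


def hosts_entry(ip: str, hostname: str) -> str:
    """Format one hosts-file line, validating the IP address first."""
    ipaddress.ip_address(ip)  # raises ValueError on a malformed address
    return f"{ip}\t{hostname}"


def has_mapping(hosts_text: str, ip: str, hostname: str) -> bool:
    """True if an uncommented line maps ip to hostname (aliases allowed)."""
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, then split
        if len(fields) >= 2 and fields[0] == ip and hostname in fields[1:]:
            return True
    return False


def ensure_mapping(path: str, ip: str, hostname: str) -> bool:
    """Append the mapping if it is missing; return True if the file changed."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    if has_mapping(text, ip, hostname):
        return False
    with open(path, "a", encoding="utf-8") as f:
        if text and not text.endswith("\n"):
            f.write("\n")  # keep the new entry on its own line
        f.write(hosts_entry(ip, hostname) + "\n")
    return True
```

Calling `ensure_mapping(WINDOWS_HOSTS, VM_IP, VM_HOSTNAME)` a second time leaves the file unchanged, so the step can be repeated after the VM is re-imported.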
7. Connecting SAS 9.4 (Windows) to the Cloudera CDH 5.8 VM
A Step-by-Step Guide (continued)
Locate the latest Hadoop JAR files and configuration files
- Download the Hadoop tracer Python script, which collects the JARs and configuration files:
ftp://ftp.sas.com/techsup/download/blind/access/hadooptracer.zip
- Unzip the Hadoop tracer script into the shared folder location
- Run the script on the VM:
python ./hadooptracer.py --filterby=latest
It pulls all the Hadoop JARs and configuration files into the directories /tmp/jars and /tmp/sitesxml
Copy the jars and sitesxml directories to the shared location that SAS can access
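The copy step can be done from inside the VM with a short script. A minimal sketch, assuming the tracer wrote flat directories of *.jar and *.xml files to the paths named on the slide, and assuming a hypothetical shared-folder mount point (/mnt/hgfs/sas_hadoop); `copy_tracer_output` is an illustrative name, not part of hadooptracer.py:

```python
import shutil
from pathlib import Path

# Source directories named on the slide (written by hadooptracer.py on the VM).
TRACER_JARS = Path("/tmp/jars")
TRACER_SITEXML = Path("/tmp/sitesxml")

# Hypothetical mount point of the VM shared folder that SAS on Windows can read.
SHARED_ROOT = Path("/mnt/hgfs/sas_hadoop")


def copy_tracer_output(jars_dir: Path = TRACER_JARS,
                       xml_dir: Path = TRACER_SITEXML,
                       shared_root: Path = SHARED_ROOT) -> tuple[int, int]:
    """Copy *.jar and *.xml files into shared_root/jars and shared_root/sitesxml.

    Returns (number of JARs copied, number of XML files copied).
    """
    jars = sorted(jars_dir.glob("*.jar"))
    xmls = sorted(xml_dir.glob("*.xml"))
    if not jars or not xmls:
        raise FileNotFoundError("no JARs or site XML files found; did hadooptracer.py run?")

    dest_jars = shared_root / "jars"
    dest_xml = shared_root / "sitesxml"
    dest_jars.mkdir(parents=True, exist_ok=True)
    dest_xml.mkdir(parents=True, exist_ok=True)

    for src in jars:
        shutil.copy2(src, dest_jars / src.name)  # copy2 keeps timestamps
    for src in xmls:
        shutil.copy2(src, dest_xml / src.name)
    return len(jars), len(xmls)
```

Failing early when either directory is empty avoids pointing SAS at a half-populated folder, which otherwise surfaces later as a less obvious class-loading or connection error.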
Set the new SAS environment variables SAS_HADOOP_JAR_PATH and SAS_HADOOP_CONFIG_PATH to point to the JAR path and the configuration path
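Before setting the variables, it helps to confirm the copied folders look complete and to generate the matching statements. A minimal sketch, assuming the variables are set with SAS's OPTIONS SET= system option (verify the exact form against your SAS documentation) and assuming a typical list of site files; both helper names are hypothetical:

```python
from pathlib import Path

# Assumption: site files SAS/ACCESS typically needs; adjust to what
# hadooptracer.py actually collected for your cluster.
EXPECTED_SITE_FILES = ("core-site.xml", "hdfs-site.xml", "hive-site.xml",
                       "mapred-site.xml", "yarn-site.xml")


def missing_pieces(jar_path: Path, config_path: Path) -> list[str]:
    """List problems that would keep SAS from loading the Hadoop client."""
    problems = []
    if not any(jar_path.glob("*.jar")):
        problems.append(f"no .jar files in {jar_path}")
    for name in EXPECTED_SITE_FILES:
        if not (config_path / name).is_file():
            problems.append(f"missing {name} in {config_path}")
    return problems


def sas_options(jar_path: str, config_path: str, restful: bool = False) -> str:
    """Render OPTIONS SET= statements for the new SAS environment variables."""
    lines = [
        f'options set=SAS_HADOOP_JAR_PATH="{jar_path}";',
        f'options set=SAS_HADOOP_CONFIG_PATH="{config_path}";',
    ]
    if restful:
        # Optional: have SAS reach HDFS through WebHDFS.
        lines.append("options set=SAS_HADOOP_RESTFUL 1;")
    return "\n".join(lines)
```

The generated lines can be pasted at the top of a SAS session (or into an autoexec file) once `missing_pieces` returns an empty list for the Windows paths of the shared folder.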