How to create a multi-tenancy for an interactive data analysis
Spark Cluster + Livy + Zeppelin
Introduction
With this presentation you should be able to create an architecture for an interactive data analysis framework, using a Spark cluster with Kerberos, a Livy server and a Zeppelin notebook with Kerberos authentication.
Architecture
This architecture enables the following:
● Transparent data-science development.
● Cluster upgrades won’t affect the developments.
● Controlled access to data and resources through Kerberos/Sentry.
● High availability.
● Several coding APIs (Scala, R, Python, PySpark, etc.).
Pre-Assumptions
1. Cluster hostname: cm1.localdomain; Zeppelin hostname: cm2.localdomain
2. Cluster supergroup: bdamanager
3. Cluster Manager: Cloudera Manager 5.12.2
4. Service Yarn Installed
5. Cluster Authentication Pre-Installed: Kerberos
a. Kerberos Realm DOMAIN.COM
6. Chosen IDE: Zeppelin
7. Zeppelin Machine Authentication Not-Installed: Kerberos
Livy server configuration
Create User and Group for Livy
sudo useradd livy
sudo passwd livy
sudo usermod -G bdamanager livy
Create User Zeppelin for the IDE
sudo useradd zeppelin
sudo passwd zeppelin
Note 1: due to Livy impersonation, livy should be added to the cluster supergroup, so replace the highlighted name with your supergroup name.
Note 2: the chosen IDE is Zeppelin; if you chose another, just replace the highlighted field.
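To confirm the memberships that impersonation relies on, a quick check (assuming the HDFS client and its group mapping are available on this host) is:
id livy
hdfs groups livy
The second command should list bdamanager (your supergroup) among the groups HDFS resolves for livy.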
Livy server configuration
Download and installation
su livy
cd /home/livy
wget http://mirrors.up.pt/pub/apache/incubator/livy/0.5.0-incubating/livy-0.5.0-incubating-bin.zip
unzip livy-0.5.0-incubating-bin.zip
cd livy-0.5.0-incubating-bin/
mkdir logs
cd conf/
mv livy.conf.template livy.conf
mv livy-env.sh.template livy-env.sh
mv livy-client.conf.template livy-client.conf
Edit Livy environment variables
nano livy-env.sh
export SPARK_HOME=/opt/cloudera/parcels/CDH-5.12.2-1.cdh5.12.2.p0.4/lib/spark/
export HADOOP_HOME=/opt/cloudera/parcels/CDH-5.12.2-1.cdh5.12.2.p0.4
export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera/
export HADOOP_CONF_DIR=/etc/hadoop/conf
export LIVY_HOME=/home/livy/livy-0.5.0-incubating-bin/
export LIVY_LOG_DIR=/var/log/livy2
export LIVY_SERVER_JAVA_OPTS="-Xmx2g"
Make Livy Hive-aware
sudo ln -s /etc/hive/conf/hive-site.xml /etc/spark/conf/hive-site.xml
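Parcel paths differ between CDH installations, so a quick check that the directories referenced above actually exist (adjust the parcel version to yours) can save a failed start:
ls /opt/cloudera/parcels/CDH-5.12.2-1.cdh5.12.2.p0.4/lib/spark/bin/spark-submit
ls /etc/hadoop/conf /etc/hive/conf/hive-site.xml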
Livy server configuration
Edit livy configuration file
nano livy.conf
# What spark master Livy sessions should use.
livy.spark.master = yarn
# What spark deploy mode Livy sessions should use.
livy.spark.deploy-mode = cluster
# If livy should impersonate the requesting users when creating a new session.
livy.impersonation.enabled = true
# Whether to enable HiveContext in livy interpreter, if it is true hive-site.xml will be detected
# on user request and then livy server classpath automatically.
livy.repl.enable-hive-context = true
Livy server configuration
Edit livy configuration file
# Add Kerberos Config
livy.server.launch.kerberos.keytab = /home/livy/livy.keytab
livy.server.launch.kerberos.principal=livy/cm1.localdomain@DOMAIN.COM
livy.server.auth.type = kerberos
livy.server.auth.kerberos.keytab=/home/livy/spnego.keytab
livy.server.auth.kerberos.principal=HTTP/cm1.localdomain@DOMAIN.COM
livy.server.access-control.enabled=true
livy.server.access-control.users=zeppelin,livy
livy.superusers=zeppelin,livy
Note 1: in this example the chosen IDE is Zeppelin.
Note 2: livy.impersonation.enabled = true means that Livy will be able to impersonate any user present on the cluster (proxyUser).
Note 3: livy.server.auth.type = kerberos means that, to interact with Livy, the user must be correctly authenticated.
Note 4: it’s only necessary to change the highlighted values, e.g. for your hostname.
Livy server configuration
Create Kerberos Livy and Zeppelin principal and keytabs
sudo kadmin.local <<eoj
addprinc -pw welcome1 livy/cm1.localdomain@DOMAIN.COM
modprinc -maxrenewlife 1week livy/cm1.localdomain@DOMAIN.COM
xst -norandkey -k /home/livy/livy.keytab livy/cm1.localdomain@DOMAIN.COM
addprinc -pw welcome1 zeppelin/cm1.localdomain@DOMAIN.COM
modprinc -maxrenewlife 1week zeppelin/cm1.localdomain@DOMAIN.COM
xst -norandkey -k /home/livy/zeppelin.keytab zeppelin/cm1.localdomain@DOMAIN.COM
xst -norandkey -k /home/livy/spnego.keytab HTTP/cm1.localdomain@DOMAIN.COM
eoj
Create Log Dir and add Permissions
cd /home
sudo chown -R livy:livy livy/
sudo mkdir /var/log/livy2
sudo chown -R livy:bdamanager /var/log/livy2
Note: it’s only necessary to change the highlighted names: your hostname and, lastly, your supergroup name.
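As a sanity check of the exported keytabs, a sketch reusing the paths and principals above:
kinit -kt /home/livy/livy.keytab livy/cm1.localdomain@DOMAIN.COM
klist
klist -kt /home/livy/spnego.keytab
kinit should succeed without asking for a password, and klist -kt should list the HTTP principal.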
Cloudera configuration
HUE - Create users livy and zeppelin and add livy to the supergroup
HDFS - Add Livy proxyuser permissions
On the Cloudera Manager menu:
HDFS > Advanced Configuration Snippet for core-site.xml
you should add the following xml:
<property>
<name>hadoop.proxyuser.livy.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.livy.hosts</name>
<value>*</value>
</property>
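After saving, deploy the client configuration and restart the stale services from Cloudera Manager; you can then confirm the setting reached the nodes with:
grep -A1 proxyuser.livy /etc/hadoop/conf/core-site.xml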
Interact with Livy server
Start Livy server
sudo -u livy /home/livy/livy-0.5.0-incubating-bin/bin/livy-server
Verify that the server is running by connecting to its web UI, which uses port 8998 by default:
http://cm1.localdomain:8998/ui
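The same check can be done from the command line; with Kerberos authentication enabled, an unauthenticated request to the REST API should be rejected (typically with HTTP 401):
curl -i http://cm1.localdomain:8998/sessions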
Authenticate with a user principal
Example:
kinit livy/cm1.localdomain@DOMAIN.COM
kinit tpsimoes/cm1.localdomain@DOMAIN.COM
Livy offers a REST API to create interactive sessions and submit Spark code the same way you can with a Spark shell or a PySpark shell. The following interaction examples with the Livy server will be in Python.
Interact with Livy server
Create session
curl --negotiate -u:livy -H "Content-Type: application/json" -X POST -d '{"kind":"pyspark", "proxyUser": "livy"}' -i
http://cm1.localdomain:8998/sessions
curl --negotiate -u:livy -H "Content-Type: application/json" -X POST -d '{"kind":"spark", "proxyUser": "livy"}' -i
http://cm1.localdomain:8998/sessions
Check for sessions with details
curl --negotiate -u:livy cm1.localdomain:8998/sessions | python -m json.tool
Note 1: when using Livy with a Kerberized cluster, all commands must include --negotiate -u:user or --negotiate -u:user:password.
Note 2: to create a session for a different code language, just change the highlighted field.
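As a sketch of the multi-tenant flow under the configuration above (only zeppelin and livy may call the server, and both are superusers), a trusted caller authenticates as zeppelin and asks Livy to run the session as an end user, here the example user tpsimoes, via proxyUser:
kinit -kt /home/livy/zeppelin.keytab zeppelin/cm1.localdomain@DOMAIN.COM
curl --negotiate -u:zeppelin -H "Content-Type: application/json" -X POST -d '{"kind":"pyspark", "proxyUser": "tpsimoes"}' -i http://cm1.localdomain:8998/sessions
The resulting YARN application, and any Hive/HDFS access from it, then runs as tpsimoes and is subject to that user’s permissions.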
Interact with Livy server
Submit a job
curl -H "Content-Type: application/json" -X POST -d '{"code":"2 + 2"}' -i --negotiate -u:livy
cm1.localdomain:8998/sessions/0/statements
{"id":0,"code":"2 + 2","state":"waiting","output":null,"progress":0.0}
Check result from statement
curl --negotiate -u:livy cm1.localdomain:8998/sessions/0/statements/0
{"id":0,"code":"2 + 2","state":"available","output":{"status":"ok","execution_count":0,"data":{"text/plain":"4"}},"progress":1.0}
Interact with Livy server
Submit another job
curl -H "Content-Type: application/json" -X POST -d '{"code":"println(sc.parallelize(1 to 5).collect())"}' -i --negotiate -u:livy
http://cm1.localdomain:8998/sessions/1/statements
curl -H "Content-Type: application/json" -X POST -d '{"code":"a = 10"}' -i --negotiate -u:livy
cm1.localdomain:8998/sessions/2/statements
curl -H "Content-Type: application/json" -X POST -d '{"code":"a + 1"}' -i --negotiate -u:livy
cm1.localdomain:8998/sessions/2/statements
Delete a session
curl --negotiate -u:livy cm1.localdomain:8998/sessions/0 -X DELETE
Note: when submitting jobs or checking details, pay attention to the session number in the highlighted field, e.g. sessions/2
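Before posting statements it can help to confirm that a session has finished starting; reusing the calls above, a session is ready when its state is idle:
curl -s --negotiate -u:livy cm1.localdomain:8998/sessions/0 | python -m json.tool | grep '"state"'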
Zeppelin Architecture
Zeppelin is a multi-purpose notebook that enables:
● Data Ingestion & Discovery.
● Data Analytics.
● Data Visualization & Collaboration.
With the Livy interpreter it adds Spark integration with a multiple-language backend.
Configure Zeppelin Machine
Download and Install UnlimitedJCEPolicyJDK8 from Oracle
wget http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html
unzip jce_policy-8.zip
sudo cp local_policy.jar US_export_policy.jar /usr/java/jdk1.8.0_131/jre/lib/security/
Note: confirm the Java directory and replace it in the highlighted field.
Configure Zeppelin Machine
Assuming that the Zeppelin machine requires Kerberos authentication and it’s not yet installed, here are quick steps for the installation and respective configuration.
Install Kerberos server and open ldap client
sudo yum install -y krb5-server openldap-clients krb5-workstation
Set Kerberos Realm
sudo sed -i.orig 's/EXAMPLE.COM/DOMAIN.COM/g' /etc/krb5.conf
Set the hostname for the Kerberos server
sudo sed -i.m1 's/kerberos.example.com/cm1.localdomain/g' /etc/krb5.conf
Change the domain name to DOMAIN.COM
sudo sed -i.m2 's/example.com/DOMAIN.COM/g' /etc/krb5.conf
Note: replace your hostname and realm in the highlighted fields.
Configure Zeppelin Machine
Create the Kerberos database
sudo kdb5_util create -s
The ACL file needs to be updated so that */admin has admin privileges
sudo sed -i 's/EXAMPLE.COM/DOMAIN.COM/' /var/kerberos/krb5kdc/kadm5.acl
Update the kdc.conf file to allow renewable tickets
sudo sed -i.m3 '/supported_enctypes/a default_principal_flags = +renewable, +forwardable'
/var/kerberos/krb5kdc/kdc.conf
Fix the indenting
sudo sed -i.m4 's/^default_principal_flags/ default_principal_flags/' /var/kerberos/krb5kdc/kdc.conf
Configure Zeppelin Machine
Update kdc.conf file
sudo sed -i.orig 's/EXAMPLE.COM/DOMAIN.COM/g' /var/kerberos/krb5kdc/kdc.conf
Add a line to the file with ticket life
sudo sed -i.m1 '/dict_file/a max_life = 1d' /var/kerberos/krb5kdc/kdc.conf
Add a max renewable life
sudo sed -i.m2 '/dict_file/a max_renewable_life = 7d' /var/kerberos/krb5kdc/kdc.conf
Indent the two new lines in the file
sudo sed -i.m3 's/^max_/ max_/' /var/kerberos/krb5kdc/kdc.conf
Configure Zeppelin Machine
Start up the kdc server and the admin server
sudo service krb5kdc start;
sudo service kadmin start;
Make the kerberos services autostart
sudo chkconfig kadmin on
sudo chkconfig krb5kdc on
Configure Zeppelin Machine
Create Kerberos Livy and Zeppelin principal and keytabs
sudo kadmin.local <<eoj
addprinc -pw welcome1 zeppelin/cm1.localdomain@DOMAIN.COM
modprinc -maxrenewlife 1week zeppelin/cm1.localdomain@DOMAIN.COM
xst -norandkey -k /home/zeppelin/zeppelin.keytab zeppelin/cm1.localdomain@DOMAIN.COM
eoj
Set Hostname and make Zeppelin aware of Livy/Cluster Machine
sudo nano /etc/hosts
# Zeppelin IP HOST
10.222.33.200 cm2.localdomain
# Livy/Cluster IP HOST
10.222.33.100 cm1.localdomain
sudo hostname cm2.localdomain
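A quick check of the name resolution and hostname set above, from the Zeppelin machine:
getent hosts cm1.localdomain cm2.localdomain
hostname
Both names should resolve to the IPs in /etc/hosts and hostname should now report cm2.localdomain.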
Configure Zeppelin Machine
Set Hostname and make Zeppelin aware of Livy Machine
sudo nano /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=cm2.localdomain
NTPSERVERARGS=iburst
Disable SELinux
sudo nano /etc/selinux/config
SELINUX=disabled
sudo setenforce 0
Clean iptables rules
sudo iptables -F
sudo nano /etc/rc.local
iptables -F
Note: after all these operations a restart is recommended.
Make rc.local executable so it runs at startup
sudo chmod +x /etc/rc.d/rc.local
Save iptables rules on restart
sudo nano /etc/sysconfig/iptables-config
# Save current firewall rules on restart.
IPTABLES_SAVE_ON_RESTART="yes"
Disable firewall
sudo systemctl disable firewalld;
sudo systemctl stop firewalld;
Configure Zeppelin Machine
Create User Zeppelin
sudo useradd zeppelin
sudo passwd zeppelin
Add user Zeppelin to sudoers
sudo visudo
## Same thing without a password
# %wheel ALL=(ALL) NOPASSWD: ALL
zeppelin ALL=(ALL) NOPASSWD: ALL
Note: in the highlighted fields replace with the chosen IDE and the available Java installation.
Download and Install Zeppelin
su zeppelin
cd ~
wget http://mirrors.up.pt/pub/apache/zeppelin/zeppelin-0.7.3/zeppelin-0.7.3-bin-all.tgz
tar -zxvf zeppelin-0.7.3-bin-all.tgz
cd /home/
sudo chown -R zeppelin:zeppelin zeppelin/
Create Zeppelin environment variables
cd /home/zeppelin/zeppelin-0.7.3-bin-all/conf
cp zeppelin-env.sh.template zeppelin-env.sh
cp zeppelin-site.xml.template zeppelin-site.xml
Export Java properties in zeppelin-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_67
Configure Zeppelin Machine
Add User Zeppelin and Authentication type on the configuration file (shiro.ini)
cd /home/zeppelin/zeppelin-0.7.3-bin-all/conf/
nano shiro.ini
[users]
# List of users with their password allowed to access Zeppelin.
# To use a different strategy (LDAP / Database / ...) check the shiro doc at http://shiro.apache.org/configuration.html
admin = welcome1, admin
zeppelin = welcome1, admin
user2 = password3, role3
…
[urls]
# This section is used for url-based security.
# anon means the access is anonymous.
# authc means Form based Auth Security
# To enforce security, comment the line below and uncomment the next one
/api/version = authc
/api/interpreter/** = authc, roles[admin]
/api/configurations/** = authc, roles[admin]
/api/credential/** = authc, roles[admin]
#/** = anon
/** = authc
Interact with Zeppelin
Kinit User
cd /home/zeppelin/
kinit -kt zeppelin.keytab zeppelin/cm1.localdomain@DOMAIN.COM
Start/Stop Zeppelin
cd ~/zeppelin-0.7.3-bin-all
sudo ./bin/zeppelin-daemon.sh start
sudo ./bin/zeppelin-daemon.sh stop
Open Zeppelin UI
http://cm2.localdomain:8080/#/
Note: change to your hostname and domain in the highlighted field.
Login Zeppelin User
Create Livy Notebook
Interact with Zeppelin
Configure Livy Interpreter
zeppelin.livy.keytab: /home/zeppelin/zeppelin.keytab
zeppelin.livy.principal: zeppelin/cm1.localdomain@DOMAIN.COM
zeppelin.livy.url: http://cm1.localdomain:8998
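Before using the interpreter, a quick check that the Zeppelin host can authenticate against Livy with the keytab configured above (a sketch reusing earlier commands):
kinit -kt /home/zeppelin/zeppelin.keytab zeppelin/cm1.localdomain@DOMAIN.COM
curl --negotiate -u:zeppelin -i http://cm1.localdomain:8998/sessions
A 200 response with the session list means Kerberos and the Livy URL are correctly set for the interpreter.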
Using Livy Interpreter
spark
%livy.spark
sc.version
sparkR
%livy.sparkr
hello <- function( name ) {
sprintf( "Hello, %s", name );
}
hello("livy")
pyspark
%livy.pyspark
print "1"
Interact with Zeppelin
Using Livy Interpreter
%pyspark
from pyspark.sql import HiveContext
hiveCtx= HiveContext(sc)
hiveCtx.sql("show databases").show()
hiveCtx.sql("select current_user()").show()
Note: due to Livy impersonation we will see every database in the Hive metastore, but only a valid user can access the corresponding data.
%pyspark
from pyspark.sql import HiveContext
hiveCtx = HiveContext(sc)
hiveCtx.sql("select * from notMyDB.TAB_TPS").show()
hiveCtx.sql("Create External Table myDB.TAB_TST (Operation_Type String, Operation String)")
hiveCtx.sql("Insert Into Table myDB.TAB_TST select 'ZEPPELIN','FIRST'")
hiveCtx.sql("select * from myDB.TAB_TST").show()
Interact with Zeppelin
Using Livy Interpreter
%livy.pyspark
from pyspark.sql import HiveContext
sc._conf.setAppName("Zeppelin-HiveOnSpark")
hiveCtx = HiveContext(sc)
hiveCtx.sql("set yarn.nodemanager.resource.cpu-vcores=4")
hiveCtx.sql("set yarn.nodemanager.resource.memory-mb=16384")
hiveCtx.sql("set yarn.scheduler.maximum-allocation-vcores=4")
hiveCtx.sql("set yarn.scheduler.minimum-allocation-mb=4096")
hiveCtx.sql("set yarn.scheduler.maximum-allocation-mb=8192")
hiveCtx.sql("set spark.executor.memory=1684354560")
hiveCtx.sql("set spark.yarn.executor.memoryOverhead=1000")
hiveCtx.sql("set spark.driver.memory=10843545604")
hiveCtx.sql("set spark.yarn.driver.memoryOverhead=800")
hiveCtx.sql("set spark.executor.instances=10")
hiveCtx.sql("set spark.executor.cores=8")
hiveCtx.sql("set hive.map.aggr.hash.percentmemory=0.7")
hiveCtx.sql("set hive.limit.pushdown.memory.usage=0.5")
countryList = hiveCtx.sql("select distinct country from myDB.SALES_WORLD")
countryList.show(4)
Thanks
Big Data Engineer
Tiago Simões