How to create a multi-tenancy for interactive data analysis
1. How to create a multi-tenancy for interactive data analysis
Spark Cluster + Livy + Zeppelin
2. Introduction
With this presentation you should be able to create an architecture for a framework for interactive data analysis, using a Spark cluster with Kerberos, a Livy server, and a Zeppelin notebook with Kerberos authentication.
3. Architecture
This architecture enables the following:
● Transparent data-science development.
● Upgrades on the cluster won't affect developments.
● Controlled access to data and resources via Kerberos/Sentry.
● High availability.
● Several coding APIs (Scala, R, Python, PySpark, etc.).
5. Livy server configuration
Create User and Group for Livy
sudo useradd livy
sudo passwd livy
sudo usermod -G bdamanager livy
Create User Zeppelin for the IDE
sudo useradd zeppelin
sudo passwd zeppelin
Note 1: due to Livy impersonation, livy should be added to the cluster supergroup, so replace the highlighted name with your supergroup name.
Note 2: the chosen IDE here is Zeppelin; if you choose another, just replace the highlighted field.
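A quick sanity check that livy really landed in the supergroup, as a minimal Python sketch (bdamanager is this deck's example supergroup name; replace with yours):
import grp
import pwd

SUPERGROUP = "bdamanager"  # assumption: your cluster supergroup
group = grp.getgrnam(SUPERGROUP)
# livy counts as a member either via the group member list or via its primary gid
in_group = "livy" in group.gr_mem or pwd.getpwnam("livy").pw_gid == group.gr_gid
print("livy in %s: %s" % (SUPERGROUP, in_group))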
6. Livy server configuration
Download and installation
su livy
cd /home/livy
wget http://mirrors.up.pt/pub/apache/incubator/livy/0.5.0-incubating/livy-0.5.0-incubating-bin.zip
unzip livy-0.5.0-incubating-bin.zip
cd livy-0.5.0-incubating-bin/
mkdir logs
cd conf/
mv livy.conf.template livy.conf
mv livy-env.sh.template livy-env.sh
mv livy-client.conf.template livy-client.conf
Edit Livy environment variables
nano livy-env.sh
export SPARK_HOME=/opt/cloudera/parcels/CDH-5.12.2-1.cdh5.12.2.p0.4/lib/spark/
export HADOOP_HOME=/opt/cloudera/parcels/CDH-5.12.2-1.cdh5.12.2.p0.4
export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera/
export HADOOP_CONF_DIR=/etc/hadoop/conf
export LIVY_HOME=/home/livy/livy-0.5.0-incubating-bin/
export LIVY_LOG_DIR=/var/log/livy2
export LIVY_SERVER_JAVA_OPTS="-Xmx2g"
Make Livy Hive-aware
sudo ln -s /etc/hive/conf/hive-site.xml /etc/spark/conf/hive-site.xml
7. Livy server configuration
Edit livy configuration file
nano livy.conf
# What spark master Livy sessions should use.
livy.spark.master = yarn
# What spark deploy mode Livy sessions should use.
livy.spark.deploy-mode = cluster
# Whether Livy should impersonate the requesting user when creating a new session.
livy.impersonation.enabled = true
# Whether to enable HiveContext in the Livy interpreter; if true, hive-site.xml will be detected on the user request and added to the Livy server classpath automatically.
livy.repl.enable-hive-context = true
8. Livy server configuration
Edit livy configuration file
# Add Kerberos Config
livy.server.launch.kerberos.keytab = /home/livy/livy.keytab
livy.server.launch.kerberos.principal = livy/cm1.localdomain@DOMAIN.COM
livy.server.auth.type = kerberos
livy.server.auth.kerberos.keytab = /home/livy/spnego.keytab
livy.server.auth.kerberos.principal = HTTP/cm1.localdomain@DOMAIN.COM
livy.server.access-control.enabled = true
livy.server.access-control.users = zeppelin,livy
livy.superusers = zeppelin,livy
Note 1: in this example the chosen IDE is Zeppelin.
Note 2: livy.impersonation.enabled = true implies that Livy will be able to impersonate any user present on the cluster (proxyUser).
Note 3: livy.server.auth.type = kerberos implies that any user interacting with Livy must be correctly authenticated.
Note 4: you only need to change the highlighted values, e.g. your hostname.
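With Kerberos authentication enabled, an unauthenticated request should be rejected. A minimal Python sketch to verify (hostname and port are this deck's example values):
import requests

# Without SPNEGO/Kerberos credentials the server should answer 401 Negotiate
r = requests.get("http://cm1.localdomain:8998/sessions")
print(r.status_code)  # expect 401 once livy.server.auth.type = kerberos is active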
9. Livy server configuration
Create Kerberos Livy and Zeppelin principals and keytabs
sudo kadmin.local <<eoj
addprinc -pw welcome1 livy/cm1.localdomain@DOMAIN.COM
modprinc -maxrenewlife 1week livy/cm1.localdomain@DOMAIN.COM
xst -norandkey -k /home/livy/livy.keytab livy/cm1.localdomain@DOMAIN.COM
addprinc -pw welcome1 zeppelin/cm1.localdomain@DOMAIN.COM
modprinc -maxrenewlife 1week zeppelin/cm1.localdomain@DOMAIN.COM
xst -norandkey -k /home/livy/zeppelin.keytab zeppelin/cm1.localdomain@DOMAIN.COM
xst -norandkey -k /home/livy/spnego.keytab HTTP/cm1.localdomain@DOMAIN.COM
eoj
Create Log Dir and add Permissions
cd /home
sudo chown -R livy:livy livy/
sudo mkdir /var/log/livy2
sudo chown -R livy:bdamanager /var/log/livy2
Note: you only need to change the highlighted names: your hostname and, lastly, your supergroup name.
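To confirm the keytabs contain the expected principals, a minimal sketch driving klist from Python (assumes klist is on the PATH; paths are the deck's examples):
import subprocess

for keytab in ("/home/livy/livy.keytab", "/home/livy/zeppelin.keytab", "/home/livy/spnego.keytab"):
    print("== %s ==" % keytab)
    # -k lists the keys in a keytab, -t adds entry timestamps
    subprocess.call(["klist", "-kt", keytab])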
10. Cloudera configuration
HUE - Create Users Livy, Zeppelin and add Livy to a Supergroup
HDFS - Add Livy proxyuser permissions
On the Cloudera Manager menu:
HDFS > Advanced Configuration Snippet for core-site.xml
you should add the following XML:
<property>
  <name>hadoop.proxyuser.livy.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.livy.hosts</name>
  <value>*</value>
</property>
11. Interact with Livy server
Start Livy server
sudo -u livy /home/livy/livy-0.5.0-incubating-bin/bin/livy-server
Verify that the server is running by connecting to its web UI, which uses port 8998 by default:
http://cm1.localdomain:8998/ui
Authenticate with a user principal, for example:
kinit livy/cm1.localdomain@DOMAIN.COM
kinit tpsimoes/cm1.localdomain@DOMAIN.COM
Livy offers a REST API to create interactive sessions and submit Spark code the same way you would with a Spark shell or a PySpark shell. The following interaction examples with the Livy server will be in Python.
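The deck drives the API with curl below; the same calls can be scripted. A minimal Python sketch, assuming the requests and requests-kerberos packages and a valid Kerberos ticket (kinit done above):
import json
import requests
from requests_kerberos import HTTPKerberosAuth

LIVY = "http://cm1.localdomain:8998"
auth = HTTPKerberosAuth()

# Create a PySpark session, impersonating an end user via proxyUser
payload = {"kind": "pyspark", "proxyUser": "tpsimoes"}
r = requests.post(LIVY + "/sessions", auth=auth,
                  headers={"Content-Type": "application/json"},
                  data=json.dumps(payload))
session = r.json()
print(session["id"], session["state"])  # state starts as "starting"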
12. Interact with Livy server
Create session
curl --negotiate -u:livy -H "Content-Type: application/json" -X POST -d '{"kind":"pyspark", "proxyUser": "livy"}' -i http://cm1.localdomain:8998/sessions
curl --negotiate -u:livy -H "Content-Type: application/json" -X POST -d '{"kind":"spark", "proxyUser": "livy"}' -i http://cm1.localdomain:8998/sessions
Check for sessions with details
curl --negotiate -u:livy cm1.localdomain:8998/sessions | python -m json.tool
Note 1: when using Livy with a kerberized cluster, all commands must include --negotiate -u:user or --negotiate -u:user:password.
Note 2: to create a session for a different language, just change the highlighted field.
13. Interact with Livy server
Submit a job
curl -H "Content-Type: application/json" -X POST -d '{"code":"2 + 2"}' -i --negotiate -u:livy cm1.localdomain:8998/sessions/0/statements
{"id":0,"code":"2 + 2","state":"waiting","output":null,"progress":0.0}
Check result from statement
curl --negotiate -u:livy cm1.localdomain:8998/sessions/0/statements/0
{"id":0,"code":"2 + 2","state":"available","output":{"status":"ok","execution_count":0,"data":{"text/plain":"4"}},"progress":1.0}
14. Interact with Livy server
Submit another job
curl -H "Content-Type: application/json" -X POST -d '{"code":"println(sc.parallelize(1 to 5).collect())"}' -i --negotiate -u:livy http://cm1.localdomain:8998/sessions/1/statements
curl -H "Content-Type: application/json" -X POST -d '{"code":"a = 10"}' -i --negotiate -u:livy cm1.localdomain:8998/sessions/2/statements
curl -H "Content-Type: application/json" -X POST -d '{"code":"a + 1"}' -i --negotiate -u:livy cm1.localdomain:8998/sessions/2/statements
Delete a session
curl --negotiate -u:livy cm1.localdomain:8998/sessions/0 -X DELETE
Note: while submitting jobs or checking details, pay attention to the session number in the highlighted field, e.g. sessions/2.
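Sessions hold YARN resources until deleted, so in a multi-tenant setup it can help to sweep finished ones. A minimal sketch (state names per the Livy REST API; same assumptions as above):
import requests
from requests_kerberos import HTTPKerberosAuth

LIVY = "http://cm1.localdomain:8998"
auth = HTTPKerberosAuth()

sessions = requests.get(LIVY + "/sessions", auth=auth).json()["sessions"]
for s in sessions:
    if s["state"] in ("idle", "dead", "error"):
        requests.delete("%s/sessions/%d" % (LIVY, s["id"]), auth=auth)
        print("deleted session %d (%s)" % (s["id"], s["state"]))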
15. Zeppelin Architecture
Zeppelin is a multi-purpose notebook that enables:
● Data Ingestion & Discovery.
● Data Analytics.
● Data Visualization & Collaboration.
And with the Livy interpreter it enables Spark integration with multiple language backends.
16. Configure Zeppelin Machine
Download and Install UnlimitedJCEPolicyJDK8 from Oracle
wget http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html
unzip jce_policy-8.zip
sudo cp local_policy.jar US_export_policy.jar /usr/java/jdk1.8.0_131/jre/lib/security/
Note: confirm the Java directory and replace it in the highlighted field.
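To check that the unlimited policy is in effect, a minimal Python sketch driving the JDK's jrunscript (assumes jrunscript from the same JDK is on the PATH):
import subprocess

# With the unlimited JCE policy installed this prints 2147483647
subprocess.call(["jrunscript", "-e",
                 'print(javax.crypto.Cipher.getMaxAllowedKeyLength("AES"))'])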
17. Configure Zeppelin Machine
Assuming that the Zeppelin machine requires Kerberos authentication and Kerberos is not yet installed, here are quick steps for the installation and respective configuration.
Install Kerberos server and OpenLDAP client
sudo yum install -y krb5-server openldap-clients krb5-workstation
Set the Kerberos realm
sudo sed -i.orig 's/EXAMPLE.COM/DOMAIN.COM/g' /etc/krb5.conf
Set the hostname for the Kerberos server
sudo sed -i.m1 's/kerberos.example.com/cm1.localdomain/g' /etc/krb5.conf
Change the domain name
sudo sed -i.m2 's/example.com/DOMAIN.COM/g' /etc/krb5.conf
Note: replace your hostname and realm in the highlighted fields.
18. Configure Zeppelin Machine
Create the Kerberos database
sudo kdb5_util create -s
The ACL file needs to be updated so that */admin is enabled with admin privileges
sudo sed -i 's/EXAMPLE.COM/DOMAIN.COM/' /var/kerberos/krb5kdc/kadm5.acl
Update the kdc.conf file to allow renewable tickets
sudo sed -i.m3 '/supported_enctypes/a default_principal_flags = +renewable, +forwardable' /var/kerberos/krb5kdc/kdc.conf
Fix the indenting
sudo sed -i.m4 's/^default_principal_flags/ default_principal_flags/' /var/kerberos/krb5kdc/kdc.conf
19. Configure Zeppelin Machine
Update the kdc.conf file
sudo sed -i.orig 's/EXAMPLE.COM/DOMAIN.COM/g' /var/kerberos/krb5kdc/kdc.conf
The ACL file needs to be updated so that */admin is enabled with admin privileges
sudo sed -i 's/EXAMPLE.COM/DOMAIN.COM/' /var/kerberos/krb5kdc/kadm5.acl
Add a line to the file with the ticket life
sudo sed -i.m1 '/dict_file/a max_life = 1d' /var/kerberos/krb5kdc/kdc.conf
Add a max renewable life
sudo sed -i.m2 '/dict_file/a max_renewable_life = 7d' /var/kerberos/krb5kdc/kdc.conf
Indent the two new lines in the file
sudo sed -i.m3 's/^max_/ max_/' /var/kerberos/krb5kdc/kdc.conf
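A quick way to confirm the sed edits landed, as a minimal Python sketch (file path per the commands above):
conf = open("/var/kerberos/krb5kdc/kdc.conf").read()
for needle in ("max_life = 1d",
               "max_renewable_life = 7d",
               "default_principal_flags = +renewable, +forwardable",
               "DOMAIN.COM"):
    # Each line should now be present in kdc.conf
    print("%-50s %s" % (needle, needle in conf))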
20. Configure Zeppelin Machine
Start up the KDC server and the admin server
sudo service krb5kdc start;
sudo service kadmin start;
Make the Kerberos services autostart
sudo chkconfig kadmin on
sudo chkconfig krb5kdc on
21. Configure Zeppelin Machine
Create Kerberos Zeppelin principal and keytab
sudo kadmin.local <<eoj
addprinc -pw welcome1 zeppelin/cm1.localdomain@DOMAIN.COM
modprinc -maxrenewlife 1week zeppelin/cm1.localdomain@DOMAIN.COM
xst -norandkey -k /home/zeppelin/zeppelin.keytab zeppelin/cm1.localdomain@DOMAIN.COM
eoj
Set the hostname and make Zeppelin aware of the Livy/Cluster machine
sudo nano /etc/hosts
# Zeppelin IP HOST
10.222.33.200 cm2.localdomain
# Livy/Cluster IP HOST
10.222.33.100 cm1.localdomain
sudo hostname cm2.localdomain
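A minimal Python sketch to confirm both hosts resolve from the Zeppelin machine (IPs and hostnames are the deck's examples):
import socket

for host in ("cm1.localdomain", "cm2.localdomain"):
    # Should print the addresses configured in /etc/hosts above
    print("%s -> %s" % (host, socket.gethostbyname(host)))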
22. Configure Zeppelin Machine
Set the hostname and make Zeppelin aware of the Livy machine
sudo nano /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=cm2.localdomain
NTPSERVERARGS=iburst
Disable SELinux
sudo nano /etc/selinux/config
SELINUX=disabled
sudo setenforce 0
Clean iptables rules
sudo iptables -F
sudo nano /etc/rc.local
iptables -F
Make rc.local executable so the operation runs at startup
sudo chmod +x /etc/rc.d/rc.local
Save iptables rules on restart
sudo nano /etc/sysconfig/iptables-config
# Save current firewall rules on restart.
IPTABLES_SAVE_ON_RESTART="yes"
Disable the firewall
sudo systemctl disable firewalld;
sudo systemctl stop firewalld;
Note: after all these operations a restart is recommended.
23. Configure Zeppelin Machine
Create User Zeppelin
sudo useradd zeppelin
sudo passwd zeppelin
Add user zeppelin to sudoers
sudo nano /etc/sudoers
## Same thing without a password
# %wheel ALL=(ALL) NOPASSWD: ALL
zeppelin ALL=(ALL) NOPASSWD: ALL
Download and Install Zeppelin
su zeppelin
cd ~
wget http://mirrors.up.pt/pub/apache/zeppelin/zeppelin-0.7.3/zeppelin-0.7.3-bin-all.tgz
tar -zxvf zeppelin-0.7.3-bin-all.tgz
cd /home/
sudo chown -R zeppelin:zeppelin zeppelin/
Create Zeppelin environment variables
cd /home/zeppelin/zeppelin-0.7.3-bin-all/conf
cp zeppelin-env.sh.template zeppelin-env.sh
cp zeppelin-site.xml.template zeppelin-site.xml
Export Java properties
export JAVA_HOME=/usr/java/jdk1.7.0_67
Note: in the highlighted fields replace with your chosen IDE and your available Java installation.
24. Configure Zeppelin Machine
Add user zeppelin and the authentication type in the configuration file (shiro.ini)
cd /home/zeppelin/zeppelin-0.7.3-bin-all/conf/
nano shiro.ini
[users]
# List of users with their password allowed to access Zeppelin.
# To use a different strategy (LDAP / Database / ...) check the shiro doc at http://shiro.apache.org/configuration.html
admin = welcome1, admin
zeppelin = welcome1, admin
user2 = password3, role3
…
[urls]
# This section is used for url-based security.
# anon means the access is anonymous.
# authc means Form based Auth Security
# To enforce security, comment the line below and uncomment the next one
/api/version = authc
/api/interpreter/** = authc, roles[admin]
/api/configurations/** = authc, roles[admin]
/api/credential/** = authc, roles[admin]
#/** = anon
/** = authc
25. Interact with Zeppelin
Kinit the user
cd /home/zeppelin/
kinit -kt zeppelin.keytab zeppelin/cm1.localdomain@DOMAIN.COM
Start/Stop Zeppelin
cd ~/zeppelin-0.7.3-bin-all
sudo ./bin/zeppelin-daemon.sh start
sudo ./bin/zeppelin-daemon.sh stop
Open the Zeppelin UI
http://cm2.localdomain:8080/#/
Note: change to your hostname and domain in the highlighted field.
Log in as the zeppelin user
Create a Livy notebook
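Since shiro.ini above protects /api/version with authc, you can verify the setup over Zeppelin's REST API. A minimal Python sketch (credentials per the shiro.ini example):
import requests

s = requests.Session()
r = s.post("http://cm2.localdomain:8080/api/login",
           data={"userName": "zeppelin", "password": "welcome1"})
print(r.status_code)  # expect 200 on successful login
# The session cookie now authorizes the protected endpoint
print(s.get("http://cm2.localdomain:8080/api/version").json())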
26. Interact with Zeppelin
Configure Livy Interpreter
zeppelin.livy.keytab: /home/zeppelin/zeppelin.keytab
zeppelin.livy.principal: zeppelin/cm1.localdomain@DOMAIN.COM
zeppelin.livy.url: http://cm1.localdomain:8998
Using Livy Interpreter
spark
%livy.spark
sc.version
sparkR
%livy.sparkr
hello <- function( name ) {
sprintf( "Hello, %s", name );
}
hello("livy")
pyspark
%livy.pyspark
print "1"
27. Interact with Zeppelin
Using Livy Interpreter
%pyspark
from pyspark.sql import HiveContext
hiveCtx = HiveContext(sc)
hiveCtx.sql("show databases").show()
hiveCtx.sql("select current_user()").show()
Note: due to Livy impersonation we will see every database in the Hive metadata, but only a valid user can access the corresponding data.
%pyspark
from pyspark.sql import HiveContext
hiveCtx = HiveContext(sc)
hiveCtx.sql("select * from notMyDB.TAB_TPS").show()
hiveCtx.sql("Create External Table myDB.TAB_TST (Operation_Type String, Operation String)")
hiveCtx.sql("Insert Into Table myDB.TAB_TST select 'ZEPPELIN','FIRST'")
hiveCtx.sql("select * from myDB.TAB_TST").show()
28. Interact with Zeppelin
Using Livy Interpreter
%livy.pyspark
from pyspark.sql import HiveContext
sc._conf.setAppName("Zeppelin-HiveOnSpark")
hiveCtx = HiveContext(sc)
hiveCtx.sql("set yarn.nodemanager.resource.cpu-vcores=4")
hiveCtx.sql("set yarn.nodemanager.resource.memory-mb=16384")
hiveCtx.sql("set yarn.scheduler.maximum-allocation-vcores=4")
hiveCtx.sql("set yarn.scheduler.minimum-allocation-mb=4096")
hiveCtx.sql("set yarn.scheduler.maximum-allocation-mb=8192")
hiveCtx.sql("set spark.executor.memory=1684354560")
hiveCtx.sql("set spark.yarn.executor.memoryOverhead=1000")
hiveCtx.sql("set spark.driver.memory=10843545604")
hiveCtx.sql("set spark.yarn.driver.memoryOverhead=800")
hiveCtx.sql("set spark.executor.instances=10")
hiveCtx.sql("set spark.executor.cores=8")
hiveCtx.sql("set hive.map.aggr.hash.percentmemory=0.7")
hiveCtx.sql("set hive.limit.pushdown.memory.usage=0.5")
countryList = hiveCtx.sql("select distinct country from myDB.SALES_WORLD")
countryList.show(4)
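The settings above are issued through SQL inside an already-running session; resource sizes such as executor memory generally have to be fixed before the YARN containers start. An alternative sketch, sizing the session up front when it is created through Livy's REST API (field names per Livy's POST /sessions body; hostname and user are this deck's examples):
import json
import requests
from requests_kerberos import HTTPKerberosAuth

payload = {
    "kind": "pyspark",
    "proxyUser": "tpsimoes",
    "driverMemory": "4g",    # applied at session creation, before YARN containers start
    "executorMemory": "2g",
    "executorCores": 4,
    "numExecutors": 10,
}
r = requests.post("http://cm1.localdomain:8998/sessions",
                  headers={"Content-Type": "application/json"},
                  data=json.dumps(payload), auth=HTTPKerberosAuth())
print(r.json()["id"], r.json()["state"])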