Running Zeppelin in Production
Vinay Shukla
Product Management, Director
Twitter: @neomythos
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Whoami?
• Product Management
• Spark for 3+ years, Hadoop for 4 years, Zeppelin for 2 years
• Blog at www.vinayshukla.com
• Twitter: @neomythos
• Addicted to Yoga, Hiking, & Coffee
• Smallest contributor to Apache Zeppelin
Programmer > Product Management > Programmer > Product Management
Why Apache Zeppelin?
• Browser-based access to Big Data
• Make Spark accessible to more users
• Shield users from dealing with Kerberos
• Leverage built-in Spark, Livy, Hive, JDBC & 20 other interpreters
• Beautiful built-in visualization, easy to extend
How does Zeppelin work?
[Diagram: a notebook author and collaborators/report viewers work in Zeppelin, which connects to the cluster (Spark | Hive | HBase, or any of 30+ backends/interpreters).]
Zeppelin Architecture
Zeppelin Interpreter Modes
• Basic unit of work is the Note
– A Note has paragraphs
• 3 Modes
– Shared (all notes use the same interpreter process & interpreter group)
– Scoped (notes still share the process but use separate interpreter groups; sharing objects is possible)
– Isolated (each note runs its own interpreter process & group)
Deploying Zeppelin
Node Choices
• Master Node
• Worker Node
• Management Node
• Client/Gateway Node ✔
Apache Zeppelin Deployment
[Diagram: user John Doe, SSL, a firewall, LDAP, and a Hadoop cluster running Spark on YARN (executors) and Hive; numbered steps 1 to 3.]
Interacting with Spark
[Diagram: ways to reach Spark on YARN, each with its own Spark driver: Spark-Shell, Spark Thrift Server, Livy REST Server, and Zeppelin, either with the built-in Spark interpreter or with the Livy interpreter.]
Apache Zeppelin Security
Apache Zeppelin: Authentication + SSL
[Diagram: user John Doe, SSL, a firewall, LDAP, and Spark on YARN (executors); numbered steps 1 to 3.]
Security in Apache Zeppelin?
Zeppelin leverages Apache Shiro for authentication/authorization
Example Shiro.ini
# =======================
# Shiro INI configuration
# =======================
[main]
## LDAP/AD configuration
[users]
# The 'users' section is for simple deployments
# when you only need a small number of statically-defined
# set of User accounts.
[urls]
# The 'urls' section is used for url-based security
#
Edit with Ambari or your favorite text editor
LDAP Authentication in Zeppelin
• LDAP Bind
– uid=jsmith,ou=users,dc=mycompany,dc=com
– uid={0},ou=users,dc=mycompany,dc=com
– ldapRealm.userDnTemplate = uid={0},ou=users,dc=mycompany,dc=com
• LDAP Search
– ldapRealm.contextFactory.systemUsername=cn=ldap-reader,ou=ServiceUsers,dc=lab,dc=hortonworks,dc=net
– ldapRealm.contextFactory.systemPassword=SomePassw0rd
– ldapRealm.contextFactory.authenticationMechanism=simple
– ldapRealm.searchBase=OU=CorpUsers,DC=lab,DC=hortonworks,DC=net
– ldapRealm.userSearchBase=OU=CorpUsers,DC=lab,DC=hortonworks,DC=net
– ldapRealm.groupSearchBase=OU=CorpUsers,DC=lab,DC=hortonworks,DC=net
– ldapRealm.userObjectClass=person
– ldapRealm.groupObjectClass=group
– ldapRealm.userSearchAttributeName = sAMAccountName
– # Set search scopes for user and group. Values: subtree (default), onelevel, object
– ldapRealm.userSearchScope = subtree
– ldapRealm.groupSearchScope = subtree
– ldapRealm.userSearchFilter=(&(objectclass=person)(sAMAccountName={0}))
– ldapRealm.memberAttribute=member
http://bit.ly/2rMTgLw
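A minimal sketch for checking the bind template and the search filter before they go into shiro.ini. It assumes the OpenLDAP `ldapsearch` client is installed. The function names are hypothetical helpers, not part of Zeppelin or Shiro, and the host, DNs, and search base in the usage comment are the example values from the slides.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of Zeppelin or Shiro.

# Substitute a login name for {0} in a userDnTemplate-style string.
# Assumes the login name contains no '/', '&', or backslash
# (these are special in a sed replacement).
expand_dn_template() {
  local template="$1" user="$2"
  printf '%s\n' "$template" | sed "s/{0}/$user/g"
}

# Build the user search filter with the login name filled in.
user_search_filter() {
  local user="$1"
  printf '(&(objectclass=person)(sAMAccountName=%s))\n' "$user"
}

# Search for the user with the service account; -W prompts for the bind password.
check_ldap_user() {
  local url="$1" bind_dn="$2" base="$3" user="$4"
  ldapsearch -H "$url" -D "$bind_dn" -W -b "$base" -s sub \
    "$(user_search_filter "$user")" dn
}

# Example (values taken from the slides):
# check_ldap_user ldaps://hdpqa.example.com:636 \
#   'cn=ldap-reader,ou=ServiceUsers,dc=lab,dc=hortonworks,dc=net' \
#   'OU=CorpUsers,DC=lab,DC=hortonworks,DC=net' jsmith
```

If the search returns exactly one entry for a known login, the same service account, base, and filter are reasonable candidates for the ldapRealm settings above.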
Want to connect to LDAP over SSL?
• Change the protocol to ldaps in shiro.ini
ldapRealm.contextFactory.url = ldaps://hdpqa.example.com:636
• If LDAP uses a self-signed certificate, import the certificate into the truststore of the JVM running Zeppelin
echo -n | openssl s_client -connect ldap.example.com:636 |
  sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > /tmp/examplecert.crt
keytool -import -keystore $JAVA_HOME/jre/lib/security/cacerts \
  -storepass changeit -noprompt -alias mycert -file /tmp/examplecert.crt
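A short sketch for confirming the import worked, assuming `keytool` and `openssl` are on the PATH. The function names are hypothetical. The alias, store password, and certificate path in the usage comment are the ones used on this slide, and port 636 is taken from the ldaps URL.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration.

# Succeeds if the alias exists in the given truststore.
truststore_has_alias() {
  local keystore="$1" storepass="$2" alias_name="$3"
  keytool -list -keystore "$keystore" -storepass "$storepass" \
    -alias "$alias_name" >/dev/null 2>&1
}

# Succeeds if the LDAPS endpoint's certificate verifies against the given PEM file.
ldaps_cert_verifies() {
  local host_port="$1" ca_file="$2"
  openssl s_client -connect "$host_port" -CAfile "$ca_file" </dev/null 2>/dev/null |
    grep -q 'Verify return code: 0 (ok)'
}

# Example:
# truststore_has_alias "$JAVA_HOME/jre/lib/security/cacerts" changeit mycert && echo "alias present"
# ldaps_cert_verifies hdpqa.example.com:636 /tmp/examplecert.crt && echo "certificate verifies"
```

Zeppelin reads the truststore at JVM startup, so a restart is likely needed after the import.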
Avoid LDAP password in clear in shiro.ini
• Create an entry for the AD credential
– Zeppelin leverages the Hadoop Credential API
– hadoop credential create ldapRealm.contextFactory.systemPassword -provider jceks:///etc/zeppelin/conf/credentials.jceks
Enter password:
Enter password again:
ldapRealm.contextFactory.systemPassword has been successfully created.
org.apache.hadoop.security.alias.JavaKeyStoreProvider has been updated.
• Make credentials.jceks readable only by the Zeppelin user
• chmod 400, so that only the owner of the Zeppelin process can read the file and no other user has access to the credentials
• Edit shiro.ini
• ldapRealm.contextFactory.systemPassword -provider jceks:///etc/zeppelin/conf/credentials.jceks
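The permission and verification steps can be sketched as below. This assumes the keystore is a local file and that the Zeppelin service runs under an account named `zeppelin`; both are assumptions to adjust for your install. The function names are hypothetical. `hadoop credential list` prints only alias names, not the stored secrets.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration.

# Give the keystore file to one owner and make it read-only for that owner (mode 400).
lock_down_credential_store() {
  local store="$1" owner="$2"
  chown "$owner" "$store" && chmod 400 "$store"
}

# List the aliases held by a credential provider.
list_credentials() {
  local provider="$1"
  hadoop credential list -provider "$provider"
}

# Example (run as root or as the file owner):
# lock_down_credential_store /etc/zeppelin/conf/credentials.jceks zeppelin
# list_credentials jceks:///etc/zeppelin/conf/credentials.jceks
```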
Avoid JDBC password in shiro.ini
• Create a credential for the JDBC password in the Hadoop Credential store
hadoop credential create jdbc.password -provider jceks://file/user/zeppelin/conf/zeppelin.jceks
• Use the credential in shiro.ini
default.jceks.credentialKey jdbc.password
default.jceks.file jceks://file/user/zeppelin/conf/zeppelin.jceks
• Details at JIRA ZEPPELIN-1935
Note: a JDBC password is only needed for a non-Hive ID; Hive leverages ID propagation
Identity Propagation in Zeppelin
• Interpreter dependent
– Works for Livy (Spark), Hive (JDBC) & Shell Interpreter
Identity Propagation with Livy
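[Diagram: John Doe logs in to Zeppelin, which authenticates him against LDAP (LDAP/LDAPS). The Zeppelin Livy interpreter calls the Livy APIs using SPNego (Kerberos); Livy talks to Spark on YARN over Kerberos/RPC. The job runs as John Doe.]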
Authorization in Zeppelin
Authorization in Zeppelin
• Control access to Notes
– Grant permissions (Owner, Reader, Writer) to users/groups on Notes
– LDAP group integration
• Control access to the Zeppelin UI
– Allow only admins to configure interpreters
– Configured in shiro.ini
Authorization at Data Level
• For Spark with Zeppelin > Livy > Spark
– Identity propagation: jobs run as the end user
• For Hive with Zeppelin > JDBC interpreter
– Leverage Ranger-based row/column security for Hive & SparkSQL
• Shell Interpreter
– Runs as the end user
Control Who can modify Interpreter Settings
• Step 1
– Define protected URL patterns in shiro.ini
– Assign the URL patterns to a role
[urls]
/api/interpreter/** = authc, roles[admin_role]
/api/configurations/** = authc, roles[admin_role]
/api/credential/** = authc, roles[admin_role]
• Step 2
– Map the role to an LDAP group
ldapRealm.rolesByGroup = "hadoop-admins":admin_role
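One way to check that the rules above took effect is to log in as a non-admin user and request a protected URL. The sketch below assumes `curl` is installed and uses Zeppelin's `/api/login` endpoint with the `userName` and `password` form fields. The function name, base URL, and account in the usage comment are hypothetical. Note that the password appears on the command line, so this suits a test account only.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration.

# Log in to Zeppelin, request a path with the session cookie,
# and print the HTTP status code of that request.
zeppelin_status_for() {
  local base_url="$1" user="$2" password="$3" path="$4"
  local jar
  jar="$(mktemp)"
  # Store the session cookie returned by the login call.
  curl -s -o /dev/null -c "$jar" \
    --data-urlencode "userName=$user" --data-urlencode "password=$password" \
    "$base_url/api/login"
  # Replay the cookie against the protected path and print only the status code.
  curl -s -o /dev/null -b "$jar" -w '%{http_code}\n' "$base_url$path"
  rm -f "$jar"
}

# Example:
# zeppelin_status_for http://localhost:8080 jdoe 'secret' /api/interpreter/setting
```

A member of the mapped admin group should see 200; a user outside it should see an error status. The exact code depends on the Shiro configuration, so treat anything other than 200 as the expected result for a non-admin.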
Scalability & HA
Scalability
• Memory/Core for Zeppelin Server
– Consider 20-30 GB
• Memory/Core for Zeppelin Interpreter
– 4-8 GB
• Memory/Core for Livy
– 4-8 GB
• Memory/Core for Spark
– Depends on Spark jobs (see Spark Performance Tuning)
– https://spark.apache.org/docs/latest/tuning.html
• Horizontal Scaling
– Spin up multiple Zeppelin instances
– Need external load balancer
– Sticky sessions
HA
• Shared Storage
• Shared Configuration
• Communication between Zeppelin & Interpreters
Using R & Python with Zeppelin
• Multiple choices: Spark Interpreter, Python Interpreter, Livy Interpreter
• Deploy R/Python binaries on all worker nodes
• Leverage Livy Interpreter for SparkR & PySpark
Zeppelin + Livy2 OOB Job as admin user fails
• Scenario: Simple HDP 2.6 install with default config
• Failure: Livy 2 Interpreter job fails as admin user
• Reason: HDFS directory /user/admin does not exist
• Workaround: Manually create /user/admin with admin:hdfs ownership
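The workaround can be sketched as follows, assuming the `hdfs` CLI is on the PATH and that you can `sudo` to the HDFS superuser. The function name and the `HDFS_SUPERUSER` variable are hypothetical; the superuser account is assumed to be `hdfs`, which is typical but not guaranteed.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration.

# Create /user/<name> in HDFS and hand it to that user.
# HDFS_SUPERUSER is an assumed variable; it defaults to "hdfs".
create_hdfs_home() {
  local user="$1" group="${2:-hdfs}"
  local superuser="${HDFS_SUPERUSER:-hdfs}"
  sudo -u "$superuser" hdfs dfs -mkdir -p "/user/$user" &&
    sudo -u "$superuser" hdfs dfs -chown "$user:$group" "/user/$user"
}

# Example:
# create_hdfs_home admin
```

The same call applies to any other user whose Livy jobs fail for lack of an HDFS home directory.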
Zeppelin + Livy 500 Error with PySpark
• Scenario: Cut & paste code into Zeppelin
• Failure: Livy interpreter reports a 500 error
• Workaround: Manually type code into the Zeppelin Livy interpreter
• Fixed with HDP 2.6.1
Other Zeppelin Livy Interpreter Issues
• matplotlib doesn't work in the Livy pyspark interpreter
• Job progress is not shown in the frontend
• ZeppelinContext is not available in the Livy interpreter
Zeppelin Future Plans
• More Visualization
• More Stability
• More Security
– SSO Integration with Knox
– Zeppelin > Livy over SSL
– Ranger Integration
– Atlas Integration
• Integration with Data Science Experience
• HA & More Collaboration
Thank You & Questions
Vinay Shukla
@neomythos

 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 

Recently uploaded (20)

Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 

Running Zeppelin in Enterprise

  • 1. Running Zeppelin in Production. Vinay Shukla, Product Management, Director. Twitter: @neomythos
  • 2. Whoami?
      - Product Management
      - Spark for 3+ years, Hadoop for 4 years, Zeppelin for 2 years
      - Blog at www.vinayshukla.com
      - Twitter: @neomythos
      - Addicted to Yoga, Hiking, & Coffee
      - Smallest contributor to Apache Zeppelin
      - Programmer > Product Management > Programmer > Product Management
      © Hortonworks Inc. 2011 – 2016. All Rights Reserved
  • 3. Why Apache Zeppelin?
      - Browser based access to Big Data
      - Make Spark accessible to more users
      - Abstract users from dealing with Kerberos
      - Leverage built in Spark, Livy, Hive, JDBC & 20 other interpreters
      - Beautiful Visualization built in, easy to extend
  • 4. How does Zeppelin work? (diagram labels: Notebook Author; Collaborators/Report viewer; Zeppelin; Cluster; Spark | Hive | HBase; Any of 30+ back ends/Interpreters)
  • 5. Zeppelin Architecture
  • 6. Zeppelin Interpreter Modes
      - Basic unit of work is Note
          - Note has paragraphs
      - 3 Modes
          - Shared (All notes use the same interpreter process & interpreter group)
          - Scoped (Notes still share the process, but each has a separate interpreter group; possible to share objects)
          - Isolated (Each note runs its own interpreter process & group)
  • 7. Deploying Zeppelin: Node Choices
      - Master Node
      - Worker Node
      - Management Node
      - Client/Gateway Node ✔
  • 8. Apache Zeppelin Deployment (diagram labels: John Doe; steps 1, 2, 3; SSL; Firewall; LDAP; Spark on YARN with Ex nodes; Hive; Hadoop Cluster)
  • 9. Interacting with Spark (diagram labels: Zeppelin; Spark-Shell; Spark Thrift Server; Livy REST Server; Spark Driver; Ex nodes; Spark on YARN; two paths compared: With Livy Interpreter, Built In Spark Interpreter)
  • 10. Apache Zeppelin Security
  • 11. Apache Zeppelin: Authentication + SSL (diagram labels: John Doe; steps 1, 2, 3; SSL; Firewall; LDAP; Spark on YARN with Ex nodes)
  • 12. Security in Apache Zeppelin? Zeppelin leverages Apache Shiro for authentication/authorization
  • 13. Example shiro.ini
      # =======================
      # Shiro INI configuration
      # =======================
      [main]
      ## LDAP/AD configuration
      [users]
      # The 'users' section is for simple deployments
      # when you only need a small number of statically-defined
      # set of User accounts.
      [urls]
      # The 'urls' section is used for url-based security
      # Edit with Ambari or your favorite text editor
  • 14. LDAP Authentication in Zeppelin
      - LDAP Bind
          uid=jsmith,ou=users,dc=mycompany,dc=com
          uid={0},ou=users,dc=mycompany,dc=com
          ldapRealm.userDnTemplate = uid={0},ou=users,dc=company,dc=com
      - LDAP Search
          ldapRealm.contextFactory.systemUsername=cn=ldap-reader,ou=ServiceUsers,dc=lab,dc=hortonworks,dc=net
          ldapRealm.contextFactory.systemPassword=SomePassw0rd
          ldapRealm.contextFactory.authenticationMechanism=simple
          ldapRealm.searchBase=OU=CorpUsers,DC=lab,DC=hortonworks,DC=net
          ldapRealm.userSearchBase=OU=CorpUsers,DC=lab,DC=hortonworks,DC=net
          ldapRealm.groupSearchBase=OU=CorpUsers,DC=lab,DC=hortonworks,DC=net
          ldapRealm.userObjectClass=person
          ldapRealm.groupObjectClass=group
          ldapRealm.userSearchAttributeName = sAMAccountName
          # Set search scopes for user and group. Values: subtree (default), onelevel, object
          ldapRealm.userSearchScope = subtree
          ldapRealm.groupSearchScope = subtree
          ldapRealm.userSearchFilter=(&(objectclass=person)(sAMAccountName={0}))
          ldapRealm.memberAttribute=member
      - http://bit.ly/2rMTgLw
  • 15. Want to connect to LDAP over SSL?
      - Change protocol to ldaps in shiro.ini
          ldapRealm.contextFactory.url = ldaps://hdpqa.example.com:636
      - If LDAP is using a self-signed certificate, import the certificate into the truststore of the JVM running Zeppelin
          echo -n | openssl s_client -connect ldap.example.com:636 | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > /tmp/examplecert.crt
          keytool -import -keystore $JAVA_HOME/jre/lib/security/cacerts -storepass changeit -noprompt -alias mycert -file /tmp/examplecert.crt
  • 16. Avoid LDAP password in clear in shiro.ini
      - Create an entry for AD credential
          - Zeppelin leverages Hadoop Credential API
          - hadoop credential create ldapRealm.contextFactory.systemPassword -provider jceks:///etc/zeppelin/conf/credentials.jceks
              Enter password:
              Enter password again:
              ldapRealm.contextFactory.systemPassword has been successfully created.
              org.apache.hadoop.security.alias.JavaKeyStoreProvider has been updated.
      - Make credentials.jceks readable only by the Zeppelin user
          - chmod 400, so that only the user running the Zeppelin process can read it; no other user is allowed access to the credentials
      - Edit shiro.ini
          - ldapRealm.contextFactory.systemPassword -provider jceks://etc/zeppelin/conf/credentials.jceks
  • 17. Avoid JDBC password in shiro.ini
      - Create a credential for the JDBC password in the Hadoop Credential store
          hadoop credential create jdbc.password -provider jceks://file/user/zeppelin/conf/zeppelin.jceks
      - Use the credential in shiro.ini
          default.jceks.credentialKey jdbc.password
          default.jceks.file jceks://file/user/zeppelin/conf/zeppelin.jceks
      - Details at JIRA ZEPPELIN-1935
      - JDBC password is only needed for a non-Hive ID; Hive leverages ID propagation
  • 18. Identity Propagation in Zeppelin
      - Interpreter Dependent
          - Works for Livy (Spark), Hive (JDBC) & Shell Interpreter
  • 19. Identity Propagation with Livy (diagram labels: John Doe; LDAP; LDAP/LDAPS; Zeppelin; Ispark Group Interpreter; Livy APIs; SPNego: Kerberos; Livy; Kerberos/RPC; Spark; Yarn; Job runs as John Doe)
  • 20. Authorization in Zeppelin
      - Authorization in Zeppelin
          - Control access to Note
              - Grant Permissions (Owner, Reader, Writer) to users/groups on Notes
              - LDAP Group integration
          - Control access to Zeppelin UI
              - Allow only admins to configure interpreter
              - Configured in shiro.ini
      - Authorization at Data Level
          - For Spark with Zeppelin > Livy > Spark
              - Identity Propagation: Jobs run as End-User
          - For Hive with Zeppelin > JDBC interpreter
              - Leverage Ranger based Row/Column Security for Hive SparkSQL
          - Shell Interpreter
              - Runs as end-user
  • 21. Control Who can modify Interpreter Settings
      - Step 1
          - Define protected URL pattern in shiro.ini
          - Assign URL patterns to a role
              [urls]
              /api/interpreter/** = authc, roles[admin_role]
              /api/configurations/** = authc, roles[admin_role]
              /api/credential/** = authc, roles[admin_role]
      - Step 2
          - Map role to LDAP group
              ldapRealm.rolesByGroup = "hadoop-admins":admin_role
  • 22. Scalability & HA
      - Scalability
          - Memory/Core for Zeppelin Server: consider 20-30 GB
          - Memory/Core for Zeppelin Interpreter: 4-8 GB
          - Memory/Core for Livy: 4-8 GB
          - Memory/Core for Spark: depends on Spark jobs (see Spark Performance Tuning, https://spark.apache.org/docs/latest/tuning.html)
          - Horizontal Scaling
              - Spin up multiple Zeppelin instances
              - Need external load balancer
              - Sticky sessions
      - HA
          - Shared Storage
          - Shared Configuration
          - Communication between Z & Interpreters
  • 23. Using R & Python with Zeppelin
      - Multiple choices: Spark Interpreter, Python Interpreter, Livy Interpreter
      - Deploy R/Python binaries on all worker nodes
      - Leverage Livy Interpreter for SparkR & PySpark
  • 24. Zeppelin + Livy2 OOB Job as admin user fails
      - Scenario: Simple HDP 2.6 install with default config
      - Failure: Livy 2 Interpreter job fails as admin user
      - Reason: HDFS dir /user/admin does not exist
      - Workaround: Manually create /user/admin with admin:hdfs dir ownership
  • 25. Zeppelin + Livy 500 Error with PySpark
      - Scenario: Cut & paste code into Zeppelin
      - Failure: Livy interpreter reports 500
      - Workaround: Manually type code into Zeppelin Livy interpreter
      - Fixed with HDP 2.6.1
  • 26. Other Zeppelin Livy Interpreter Issues
      - matplotlib doesn't work in Livy pyspark interpreter
      - Job progress is not shown in frontend
      - ZeppelinContext is not available in Livy interpreter
  • 27. Zeppelin Future Plans
      - More Visualization
      - More Stability
      - More Security
          - SSO Integration with Knox
          - Zeppelin > Livy over SSL
          - Ranger Integration
          - Atlas Integration
      - Integration with Data Science Experience
      - HA & More Collaboration
  • 28. Thank You & Questions. Vinay Shukla, @neomythos
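
The three interpreter modes on slide 6 differ only in what a note shares with other notes. As a rough illustration, the sketch below models that placement in plain Python. It is a toy model, not Zeppelin code: the `placement` function and the `process-…`/`group-…` labels are hypothetical names invented for this example.

```python
from typing import Tuple

MODES = ("shared", "scoped", "isolated")


def placement(mode: str, note_id: str) -> Tuple[str, str]:
    """Return (interpreter process, interpreter group) for a note.

    Hypothetical model of the slide's description:
    shared   -> every note uses one process and one group
    scoped   -> every note uses one process, but gets its own group
    isolated -> every note gets its own process and its own group
    """
    if mode == "shared":
        return ("process-0", "group-0")
    if mode == "scoped":
        return ("process-0", f"group-{note_id}")
    if mode == "isolated":
        return (f"process-{note_id}", f"group-{note_id}")
    raise ValueError(f"unknown mode: {mode}")


for mode in MODES:
    # Show where two different notes end up under each mode.
    print(mode, placement(mode, "noteA"), placement(mode, "noteB"))
```

Reading the printed pairs side by side shows the trade-off: the further a mode moves toward isolation, the less two notes share, at the cost of one more process per note.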
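
Slide 14 relies on a `{0}` placeholder in both `ldapRealm.userDnTemplate` and `ldapRealm.userSearchFilter`. The sketch below shows, under simplifying assumptions, how a login name might be substituted into each. The helper names are hypothetical and this is not how Shiro implements it internally. Filter values are escaped following RFC 4515; DN escaping (RFC 4514) is deliberately left out to keep the example short.

```python
def escape_filter_value(value: str) -> str:
    """Escape the characters RFC 4515 treats as special in a search filter."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\x00": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def bind_dn(template: str, username: str) -> str:
    """Fill the {0} placeholder of a userDnTemplate-style string."""
    return template.replace("{0}", username)


def search_filter(template: str, username: str) -> str:
    """Fill the {0} placeholder of a search filter with an escaped value."""
    return template.replace("{0}", escape_filter_value(username))


def parens_balanced(text: str) -> bool:
    """Sanity check that catches a dropped or extra parenthesis in a filter."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


print(bind_dn("uid={0},ou=users,dc=company,dc=com", "jsmith"))
print(search_filter("(&(objectclass=person)(sAMAccountName={0}))", "jsmith"))
```

Running `parens_balanced` over a filter before pasting it into shiro.ini is a cheap way to catch a missing closing parenthesis, which is easy to overlook in a nested `(&(...)(...))` expression.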
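
Slide 21 combines two pieces: URL patterns that require `admin_role`, and a `rolesByGroup` entry that maps an LDAP group to that role. The sketch below walks through that decision in a simplified form. It is an assumption-laden model, not Shiro's matcher: the trailing `/**` patterns are approximated by prefix checks, the comma-separated form of `rolesByGroup` is assumed, and every function name is hypothetical.

```python
from typing import Dict, Iterable, Optional, Set

# URL prefixes taken from the slide's [urls] section; "/**" is approximated
# here by a simple prefix match (an assumption of this sketch).
PROTECTED_PREFIXES = {
    "/api/interpreter/": "admin_role",
    "/api/configurations/": "admin_role",
    "/api/credential/": "admin_role",
}


def parse_roles_by_group(value: str) -> Dict[str, str]:
    """Parse a string such as '"hadoop-admins":admin_role' into {group: role}."""
    mapping: Dict[str, str] = {}
    for pair in value.split(","):
        if not pair.strip():
            continue
        group, _, role = pair.partition(":")
        mapping[group.strip().strip('"')] = role.strip()
    return mapping


def roles_for(groups: Iterable[str], roles_by_group: Dict[str, str]) -> Set[str]:
    """Translate a user's LDAP groups into roles."""
    return {roles_by_group[g] for g in groups if g in roles_by_group}


def required_role(path: str) -> Optional[str]:
    """Return the role a path requires, or None if it is not protected here."""
    for prefix, role in PROTECTED_PREFIXES.items():
        if path.startswith(prefix):
            return role
    return None


def may_access(path: str, groups: Iterable[str], roles_by_group: Dict[str, str]) -> bool:
    """Decide whether an authenticated user in `groups` may call `path`."""
    role = required_role(path)
    return role is None or role in roles_for(groups, roles_by_group)


mapping = parse_roles_by_group('"hadoop-admins":admin_role')
print(may_access("/api/interpreter/setting", ["hadoop-admins"], mapping))
print(may_access("/api/interpreter/setting", ["analysts"], mapping))
```

The `analysts` group and the `/api/interpreter/setting` path are made-up inputs for the demonstration; only the three URL patterns and the `hadoop-admins` mapping come from the slide.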
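
Slide 22 asks for an external load balancer with sticky sessions when several Zeppelin instances run side by side. One common way to get stickiness is to derive the target instance from a stable session identifier. The sketch below shows that idea only; it assumes hash-based routing, whereas a real load balancer may use cookies or source-IP affinity instead. The instance URLs and the `pick_instance` name are hypothetical.

```python
import hashlib
from typing import Sequence


def pick_instance(session_id: str, instances: Sequence[str]) -> str:
    """Map a session identifier to one instance, deterministically.

    The same session_id always lands on the same instance as long as the
    instance list does not change.
    """
    if not instances:
        raise ValueError("no Zeppelin instances configured")
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(instances)
    return instances[index]


# Hypothetical instance addresses, for illustration only.
zeppelin_instances = ["http://zeppelin-1:9995", "http://zeppelin-2:9995"]
first = pick_instance("session-jdoe", zeppelin_instances)
second = pick_instance("session-jdoe", zeppelin_instances)
print(first == second)
```

A plain modulo mapping like this reshuffles most sessions when an instance is added or removed, so it works best together with the shared storage and shared configuration listed under HA.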

Editor's Notes

  1. Thank you Prasad Wagle (Twitter) & Prabhjot Singh (Hortonworks)
  2. Thank you Prasad Wagle (Twitter) & Prabhjot Singh (Hortonworks)
  3. All Images from Flickr Commons