Running Zeppelin in Enterprise

Apache Zeppelin has become a popular way to unlock the value of a data lake, thanks to its user interface and its appeal to business users. These business users ask their IT department for access to Zeppelin. Enterprise IT departments want to help, but they have several enterprise concerns: security, integration with corporate LDAP/AD, scalability in a multi-user environment, and integration with Ranger and Kerberos. This session walks through these concerns and how they can be handled with Zeppelin.

Speaker
Simon Elliston Ball, Director Product Management, Cyber Security, Hortonworks

Published in: Technology

  1. Running Zeppelin in Production
     Simon Elliston Ball, Director, Product Management
     Twitter: @sireb
     20 Sept 2017, DataWorks Sydney
  2. Why Apache Zeppelin? (© Hortonworks Inc. 2011 – 2016. All Rights Reserved)
     - Browser-based access to Big Data
     - Make Spark accessible to more users
     - Abstract users from dealing with Kerberos
     - Leverage built-in Spark, Livy, Hive, JDBC & 20 other interpreters
     - Beautiful visualization built in, easy to extend
  3. How does Zeppelin work?
     (Diagram labels: notebook author; collaborators/report viewers; 30+ backends.)
  4. Zeppelin Architecture
  5. Zeppelin Interpreter Modes
     - Shared: all notes use the same interpreter process and the same interpreter group
     - Scoped: notes share a process, with an interpreter group per note
     - Isolated: each note has its own process
  6. Deploying Zeppelin
     Node choices:
     - Master node
     - Worker node
     - Management node
     - Client/Gateway node ✔
  7. Apache Zeppelin Deployment
     (Diagram labels: John Doe; LDAP; SSL; firewall; Hadoop cluster.)
  8. Interacting with Spark
     (Diagram labels: Spark-Shell; Spark Thrift Server; Livy REST Server; built-in Spark interpreter; Spark drivers.)
  9. Apache Zeppelin Security
  10. Apache Zeppelin: Authentication + SSL
      (Diagram labels: John Doe; LDAP; SSL; firewall; Spark on YARN with executors; steps 1, 2, 3.)
  11. Security in Apache Zeppelin?
      Zeppelin leverages Apache Shiro for authentication and authorization.
  12. Example shiro.ini
        # =======================
        # Shiro INI configuration
        # =======================
        [main]
        ## LDAP/AD configuration
        [users]
        # The 'users' section is for simple deployments
        # when you only need a small number of statically-defined
        # set of User accounts.
        [urls]
        # The 'urls' section is used for url-based security
      Edit with Ambari or your favorite text editor.
  13. LDAP Authentication in Zeppelin
      - LDAP Bind
        - User DN: uid=jsmith,ou=users,dc=mycompany,dc=com
        - Template: uid={0},ou=users,dc=mycompany,dc=com
        - Setting:
            ldapRealm.userDnTemplate = uid={0},ou=users,dc=mycompany,dc=com
      - LDAP Search
            ldapRealm.contextFactory.systemUsername=cn=ldap-reader,ou=ServiceUsers,dc=lab,dc=hortonworks,dc=net
            ldapRealm.contextFactory.systemPassword=SomePassw0rd
            ldapRealm.contextFactory.authenticationMechanism=simple
            ldapRealm.searchBase=OU=CorpUsers,DC=lab,DC=hortonworks,DC=net
            ldapRealm.userSearchBase=OU=CorpUsers,DC=lab,DC=hortonworks,DC=net
            ldapRealm.groupSearchBase=OU=CorpUsers,DC=lab,DC=hortonworks,DC=net
            ldapRealm.userObjectClass=person
            ldapRealm.groupObjectClass=group
            ldapRealm.userSearchAttributeName = sAMAccountName
            # Set search scopes for user and group. Values: subtree (default), onelevel, object
            ldapRealm.userSearchScope = subtree
            ldapRealm.groupSearchScope = subtree
            ldapRealm.userSearchFilter=(&(objectclass=person)(sAMAccountName={0}))
            ldapRealm.memberAttribute=member
      http://bit.ly/2rMTgLw
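Before the search settings go into shiro.ini, it can help to run the same filter by hand. The sketch below assumes the OpenLDAP `ldapsearch` client is installed. The function name `ldap_user_lookup`, the `LDAP_URL` variable and its default host are illustrative and not part of the talk; the DNs and the filter are the ones from the slide.

```shell
# Hypothetical helper: look up one user with the bind DN, search base and
# filter described by the ldapRealm.* settings on the slide.
# Assumes the OpenLDAP ldapsearch client; -W prompts for the bind password.
ldap_user_lookup() {
  login="$1"
  ldapsearch -x \
    -H "${LDAP_URL:-ldaps://ldap.example.com:636}" \
    -D "cn=ldap-reader,ou=ServiceUsers,dc=lab,dc=hortonworks,dc=net" -W \
    -b "OU=CorpUsers,DC=lab,DC=hortonworks,DC=net" -s sub \
    "(&(objectclass=person)(sAMAccountName=${login}))"
}

# Usage: ldap_user_lookup jsmith
```

If the lookup returns exactly one entry, the same base and filter should be safe to use for `ldapRealm.userSearchBase` and `ldapRealm.userSearchFilter`.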
  14. Want to connect to LDAP over SSL?
      - Change the protocol to ldaps in shiro.ini:
            ldapRealm.contextFactory.url = ldaps://hdpqa.example.com:636
      - If LDAP uses a self-signed certificate, import the certificate into the truststore of the JVM running Zeppelin:
            echo -n | openssl s_client -connect ldap.example.com:636 | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > /tmp/examplecert.crt
            keytool -import -keystore $JAVA_HOME/jre/lib/security/cacerts -storepass changeit -noprompt -alias mycert -file /tmp/examplecert.crt
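To confirm the import worked, the alias can be listed from the same truststore. This is a minimal sketch, assuming the Java 8 `cacerts` location and the `mycert` alias used on the slide; the function name `verify_ldaps_cert` is illustrative.

```shell
# Hypothetical helper: show the imported certificate entry in the JVM truststore.
# Assumes JAVA_HOME points at the JVM that runs Zeppelin and that the
# truststore still uses the default "changeit" password, as on the slide.
verify_ldaps_cert() {
  keytool -list \
    -keystore "${JAVA_HOME}/jre/lib/security/cacerts" \
    -storepass changeit \
    -alias "${1:-mycert}"
}

# Usage: verify_ldaps_cert mycert
```

If the alias is missing, `keytool` reports an error instead of printing the entry, which usually means the import ran against a different JVM than the one Zeppelin uses.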
  15. Avoid a clear-text LDAP password in shiro.ini
      - Create an entry for the AD credential (Zeppelin leverages the Hadoop Credential API):
            hadoop credential create ldapRealm.contextFactory.systemPassword -provider jceks:///etc/zeppelin/conf/credentials.jceks
            Enter password:
            Enter password again:
            ldapRealm.contextFactory.systemPassword has been successfully created.
            org.apache.hadoop.security.alias.JavaKeyStoreProvider has been updated.
      - Make credentials.jceks readable only by the Zeppelin user: chmod 400, so that only the Zeppelin process can read it and no other user can access the credentials
      - Edit shiro.ini:
            ldapRealm.contextFactory.systemPassword -provider jceks:///etc/zeppelin/conf/credentials.jceks
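The permission step and a quick check of the stored alias can be sketched as one small helper. The function name `secure_credential_store` is illustrative, and the provider URI is passed in as an argument rather than assumed; `hadoop credential list` is the standard companion to `hadoop credential create`.

```shell
# Hypothetical helper: make the keystore readable by its owner only, then
# list the aliases it holds so the new entry can be confirmed.
# Arguments: local keystore path, provider URI used with "hadoop credential create".
secure_credential_store() {
  keystore_file="$1"
  provider_uri="$2"
  chmod 400 "$keystore_file" || return 1
  hadoop credential list -provider "$provider_uri"
}

# Usage (path and URI as written on the slide):
# secure_credential_store /etc/zeppelin/conf/credentials.jceks jceks:///etc/zeppelin/conf/credentials.jceks
```

Run it as the user that owns the Zeppelin process, so that the owner-only permission leaves the file readable by Zeppelin itself.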
  16. Avoid JDBC password in shiro.ini
      - Create a credential for the JDBC password in the Hadoop credential store:
            hadoop credential create jdbc.password -provider jceks://file/user/zeppelin/conf/zeppelin.jceks
      - Use the credential in shiro.ini:
            default.jceks.credentialKey    jdbc.password
            default.jceks.file             jceks://file/user/zeppelin/conf/zeppelin.jceks
      - Details at JIRA ZEPPELIN-1935
      - A JDBC password is only needed for non-Hive IDs; Hive leverages identity propagation
  17. Identity Propagation in Zeppelin
      - Interpreter dependent
        - Works for Livy (Spark), Hive (JDBC) & Shell interpreters
  18. Identity Propagation with Livy
      (Diagram labels: John Doe; LDAP; LDAP/LDAPS; Zeppelin; Ispark Group Interpreter; Livy APIs; SPNego: Kerberos; Livy; Kerberos/RPC; Spark; Yarn; job runs as John Doe.)
  19. Authorization in Zeppelin
      Authorization in Zeppelin:
      - Control access to notes
        - Grant permissions (Owner, Reader, Writer) to users/groups on notes
        - LDAP group integration
      - Control access to the Zeppelin UI
        - Allow only admins to configure interpreters
        - Configured in shiro.ini
      Authorization at data level:
      - For Spark with Zeppelin > Livy > Spark
        - Identity propagation: jobs run as the end user
      - For Hive with Zeppelin > JDBC interpreter
        - Leverage Ranger-based row/column security for Hive SparkSQL
      - Shell interpreter
        - Runs as the end user
  20. Control who can modify interpreter settings
      - Step 1: Define protected URL patterns in shiro.ini and assign the URL patterns to a role
            [urls]
            /api/interpreter/** = authc, roles[admin_role]
            /api/configurations/** = authc, roles[admin_role]
            /api/credential/** = authc, roles[admin_role]
      - Step 2: Map the role to an LDAP group
            ldapRealm.rolesByGroup = "hadoop-admins":admin_role
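One way to check that the URL rules took effect is to log in as a given user and look at the HTTP status of a protected endpoint. The sketch below is an assumption-laden example, not part of the talk: it assumes Zeppelin's REST login at `/api/login` (form fields `userName` and `password`) and the interpreter settings endpoint at `/api/interpreter/setting`; the function name and the `ZEPPELIN_PASSWORD` variable are illustrative.

```shell
# Hypothetical check: log in, then print the HTTP status returned by a
# URL that the [urls] section above restricts to admin_role.
# The password is read from the ZEPPELIN_PASSWORD environment variable
# so that it does not appear on the command line.
interpreter_api_status() {
  base_url="$1"
  user="$2"
  jar="$(mktemp)"
  # Log in and keep the session cookie.
  curl -s -o /dev/null -c "$jar" \
    --data-urlencode "userName=${user}" \
    --data-urlencode "password=${ZEPPELIN_PASSWORD}" \
    "${base_url}/api/login"
  # Call the protected endpoint with that session and print only the status.
  curl -s -o /dev/null -b "$jar" -w '%{http_code}\n' \
    "${base_url}/api/interpreter/setting"
  rm -f "$jar"
}

# Usage: ZEPPELIN_PASSWORD=... interpreter_api_status http://zeppelin.example.com:8080 jsmith
```

A member of the mapped LDAP group should get 200, while any other authenticated user should be rejected with an error status.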
  21. Scalability & HA
      Scalability:
      - Memory/cores for the Zeppelin server: consider 20-30 GB
      - Memory/cores for each Zeppelin interpreter: 4-8 GB
      - Memory/cores for Livy: 4-8 GB
      - Memory/cores for Spark: depends on the Spark jobs (see Spark Performance Tuning)
        - https://spark.apache.org/docs/latest/tuning.html
      - Horizontal scaling
        - Spin up multiple Zeppelin instances
        - Needs an external load balancer
        - Sticky sessions
      HA:
      - Shared storage
      - Shared configuration
      - Communication between Zeppelin & interpreters
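As a rough illustration of where the server and interpreter figures go, Zeppelin reads JVM options from `ZEPPELIN_MEM` and `ZEPPELIN_INTP_MEM` in `conf/zeppelin-env.sh`. The heap values below are simply picked from the ranges on the slide and are assumptions to adjust per cluster; on an Ambari-managed install the same settings would normally be edited through Ambari.

```shell
# Sketch for conf/zeppelin-env.sh; the values are examples taken from the
# ranges on the slide, not tuned recommendations.
export ZEPPELIN_MEM="-Xmx24g"       # Zeppelin server JVM (slide: consider 20-30 GB)
export ZEPPELIN_INTP_MEM="-Xmx6g"   # each interpreter JVM (slide: 4-8 GB)
```

Because every interpreter process gets its own JVM, the interpreter figure multiplies with the number of processes, which depends on the interpreter mode in use.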
  22. Using R & Python with Zeppelin
      - Multiple choices: Spark interpreter, Python interpreter, Livy interpreter
      - Deploy R/Python binaries on all worker nodes
      - Leverage the Livy interpreter for SparkR & PySpark
  23. Zeppelin + Livy2 out of the box: job as admin user fails
      - Scenario: simple HDP 2.6 install with default config
      - Failure: Livy 2 interpreter job fails as the admin user
      - Reason: the HDFS directory /user/admin does not exist
      - Workaround: manually create /user/admin with admin:hdfs ownership
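The workaround can be sketched as a small helper around the standard `hdfs dfs` commands. The function name `create_hdfs_home` is illustrative, and the commands need to run as a user with HDFS superuser rights (for example via `sudo -u hdfs`).

```shell
# Hypothetical helper: create the missing HDFS home directory for a user
# and give it user:hdfs ownership, as described in the workaround.
# Must be run with HDFS superuser rights.
create_hdfs_home() {
  user="$1"
  hdfs dfs -mkdir -p "/user/${user}" &&
    hdfs dfs -chown "${user}:hdfs" "/user/${user}"
}

# Usage: create_hdfs_home admin
```

The same helper applies to any other user whose first Livy job fails because the home directory is missing.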
  24. Zeppelin + Livy 500 error with PySpark
      - Scenario: cut & paste code into Zeppelin
      - Failure: Livy interpreter reports 500
      - Workaround: manually type the code into the Zeppelin Livy interpreter
      - Fixed with HDP 2.6.1
  25. Other Zeppelin Livy interpreter issues
      - matplotlib doesn't work in the Livy pyspark interpreter
      - Job progress is not shown in the frontend
      - ZeppelinContext is not available in the Livy interpreter
  26. Zeppelin Future Plans
      - More visualization
      - More stability
      - More security
        - SSO integration with Knox
        - Zeppelin > Livy over SSL
        - Ranger integration
        - Atlas integration
      - Integration with Data Science Experience
      - HA & more collaboration
  27. Thank You & Questions
      Simon Elliston Ball, @sireb
