Apache Zeppelin + Livy: Bringing Multi Tenancy to Interactive Data Analysis

  1. Apache Zeppelin + Livy: Bringing Multi Tenancy to Interactive Data Analysis. Rohit Choudhary & Jeff Zhang, June 28, 2016. © Hortonworks Inc. 2011 – 2016. All Rights Reserved.
  2. What's Apache Zeppelin? A web-based notebook that enables interactive data analytics. You can make beautiful data-driven, interactive, and collaborative documents with SQL, Scala, and more.
  3. Interactive Analysis 1.0 (spark-shell)
  4. Interactive Analysis 2.0 (Zeppelin): Spark interpreter
  5. Interactive Analysis 3.0 (Zeppelin + Livy): Livy interpreter
  6. Open Source Activity
  7. Quick Stats: Zeppelin
     • Zeppelin graduated in May 2016 and is now an Apache top-level project (TLP)
     • Incubated by the Apache Software Foundation since December 2014
     • 9 committers, 120+ contributors, and a growing list
     • 1000+ JIRAs filed
     • 900 PRs from the community
     • Zeppelin just got a new friend: R
  8. Recent Updates
     • Multi-tenancy with Livy
     • Generic JDBC interpreter: Hive, Phoenix, Redshift; Postgres, MySQL; several others
     • Notebook authentication and authorization
     • UI automation through Selenium
     • Security for other interpreters (on its way)
  9. Usage Patterns & Feedback
     • Cluster monitoring, memory analysis
     • Telecom data usage, concert attendees' travel patterns
  10. Upcoming
      • GA with HDP 2.5 and Ambari 2.4.0, ETA end of July
  11. Architecture & Usage
  12. Zeppelin Architecture
      Current interpreter support:
      • HDFS
      • PySpark, SparkR, Spark
      • Hive, Phoenix, SQL
      • Shell
      • …
  13. Zeppelin Features
      • Collate/Load Data: collate and load data from existing data sources, or load it from external CSVs (e.g., Eureka, SmartSense)
      • Visualize: a robust visualization mechanism to visualize data and enable insights
      • Collaborate: notebook-based collaboration and notebook export; tagging of notebook-generated data is soon to be added
  14. Popular Usage Scenarios
      • Customized Dashboards: customized dashboards for big data clusters
      • Security Analytics: understanding the nature of data coming from multiple sources and analyzing its effects
      • Bio-sciences: medical research companies are interested in using Zeppelin for their research
  15. Bringing Multi-tenancy to Zeppelin
  16. Multi-Tenancy: Motivation
      Objectives:
      • Support workloads of multiple customers
      • Support multiple LOBs (lines of business) on a single data system
      • Support fine-grained audits
      Requirements:
      • Inability to provision capacity for multiple user groups
      • Inability to audit user actions, as all jobs run via the 'zeppelin' proxy user
      • Inability to share state/data with other users
  17. Zeppelin-Livy Interaction: Security Across Zeppelin-Livy-Spark
      [Diagram components: LDAP; Zeppelin (Shiro); Livy interpreter group; Livy APIs (SPNEGO: Kerberos); Livy; Spark; YARN (Kerberos)]
  18. Deep Dive on Livy
  19. What Is Livy?
      Livy is an open source REST interface for interacting with Spark from anywhere.
      [Diagram: a Livy client and the Livy server connected over HTTP; the Livy server connected to a Spark interactive session and a Spark batch session, each with its own SparkContext, by links labeled "HTTP (RPC)"]
  20. Why We Need Livy with Zeppelin
      • Reduce the pressure on the client machine
      • Make job submission and monitoring easy
      • Customize job scheduling
  21. Interactive Session: Create Session
      Request (Livy client to Livy server):
      curl -X POST --data '{"kind": "spark"}' -H "Content-Type: application/json" localhost:8998/sessions
      Response:
      {"state":"starting","proxyUser":null,"id":1,"kind":"spark","log":[]}
      [Diagram: Livy client → Livy server → Spark interactive session (SparkContext), steps 1-4]
  22. Interactive Session: Execute Code
      Request:
      curl http://localhost:8998/sessions/0/statements -X POST -H 'Content-Type: application/json' -d '{"code":"sc.parallelize(0 to 100).sum()"}'
      Response:
      {"id":0,"state":"running","output":null}
      [Diagram: Livy client → Livy server → Spark interactive session (SparkContext), steps 1-4]
  23. SparkContext Sharing
      [Diagram: Clients 1, 2, and 3 connect to the Livy server; clients that address the same session (Session-1 or Session-2) share that session's SparkContext]
  24. Livy Security
      [Diagram: client → Livy server (impersonation) via SPNEGO; Livy server → Spark session via shared secret]
      • Only authorized users can launch a Spark session or submit code
      • Each user can access only their own session
      • Only the Livy server can submit jobs securely to a Spark session
  25. SPNEGO
      The Simple and Protected GSSAPI Negotiation Mechanism (SPNEGO), often pronounced "spen-go", is a GSSAPI "pseudo mechanism" that client-server software uses to negotiate the choice of security technology.
      Client (Kerberos TGT) to Livy server (SPNEGO enabled):
      • Client: HTTP GET http://site/a.html
      • Server: 401 Unauthorized, with a WWW-Authenticate: Negotiate challenge
      • Client: HTTP GET again, with an Authorization: Negotiate <token> header
  26. Impersonation
      [Diagram: Alice and Bob, each holding a Kerberos TGT, authenticate to the Livy server (super user: livy) via SPNEGO; the Livy server runs a separate Spark session for each of them and communicates with each session over a shared secret]
  27. Shared Secret
      1. The Livy server generates a secret key.
      2. The Livy server passes the secret key to the Spark session when launching it.
      3. The Livy server and the Spark session use the secret key to communicate with each other.
  28. Multi-Tenant: Zeppelin Demo
  29. Zeppelin Direction
      • Workspaces and collaboration
      • Customizable visualization: Helium; custom, data-type-based visualization (geolocation/maps)
      • Enterprise readiness: bring security to all interpreters; performance improvements
      • Collaboration
      • Data lineage
  30. Q & A
  31. Thank You
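Slide 19 describes Livy as a REST interface that offers both interactive and batch sessions, but the worked examples only cover the interactive side. Below is a minimal sketch of the batch side. It assumes a Livy server on port 8998, the default used in the slides' curl examples, and builds a `POST /batches` request whose `file` and `className` fields name a packaged Spark job. The jar path and class name are hypothetical placeholders. `build_batch_request` and `send` are illustrative helpers, not part of any Livy client library.

```python
import json
import urllib.request

# Assumption: Livy listens on its default port 8998, as in the slides' curl examples.
LIVY_URL = "http://localhost:8998"


def build_batch_request(jar_path, class_name, args=(), base_url=LIVY_URL):
    """Build (without sending) a POST /batches request that runs a packaged Spark job."""
    body = {"file": jar_path, "className": class_name}
    if args:
        body["args"] = list(args)
    return urllib.request.Request(
        f"{base_url}/batches",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a prepared request and decode Livy's JSON reply (needs a reachable server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical jar location and main class, for illustration only.
    req = build_batch_request("hdfs:///apps/example-job.jar", "com.example.ExampleJob", ["2016-06-28"])
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Building the request separately from sending it keeps the example runnable without a cluster. Calling `send(req)` against a live server would submit the job.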
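The create-session call on slide 21 returns while the session is still `starting`, so a client usually has to wait before it submits code. Below is a minimal Python sketch of that wait. It assumes the default `localhost:8998` endpoint from the slides and Livy's session states: `idle` once the session is ready, and `error`, `dead`, or `killed` on failure. `livy_call`, `create_session`, and `wait_until_idle` are hypothetical helper names. The `call` parameter exists only so the polling logic can be exercised without a server.

```python
import json
import time
import urllib.request

LIVY_URL = "http://localhost:8998"  # assumption: default Livy port, as on the slides


def livy_call(method, path, body=None, base_url=LIVY_URL):
    """Send one JSON request to the Livy REST API and decode the JSON reply."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        base_url + path, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def create_session(kind="spark", call=livy_call):
    """POST /sessions, as in the slide's curl call; return the new session's id."""
    return call("POST", "/sessions", {"kind": kind})["id"]


def wait_until_idle(session_id, call=livy_call, poll_seconds=1.0, timeout=300.0):
    """Poll GET /sessions/{id} until the session reports 'idle' (ready for code)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = call("GET", f"/sessions/{session_id}")["state"]
        if state == "idle":
            return
        if state in ("error", "dead", "killed"):
            raise RuntimeError(f"session {session_id} ended in state {state!r}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"session {session_id} was not idle after {timeout} s")
```

Against a live server, `sid = create_session()` followed by `wait_until_idle(sid)` would reproduce slide 21 and then block until the session's SparkContext is ready.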
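Slide 22 shows that `POST /sessions/{id}/statements` answers immediately with a `running` statement and `"output":null`, so the client has to fetch the result later. The sketch below implements that submit-then-poll loop. It assumes several things about Livy's REST API:
- the `GET /sessions/{id}/statements/{statementId}` endpoint;
- the terminal statement states `available`, `error`, and `cancelled`;
- an `output` object whose `status` is `ok`, with the result under `data["text/plain"]`.

The helper names are hypothetical, and the base URL assumes the default port shown in the slides.

```python
import json
import time
import urllib.request

LIVY_URL = "http://localhost:8998"  # assumption: default Livy port, as on the slides


def livy_call(method, path, body=None, base_url=LIVY_URL):
    """Send one JSON request to the Livy REST API and decode the JSON reply."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        base_url + path, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def run_statement(session_id, code, call=livy_call, poll_seconds=0.5):
    """Submit code to an interactive session and block until its result is available."""
    # The POST returns at once, e.g. {"id": 0, "state": "running", "output": null}.
    statement = call("POST", f"/sessions/{session_id}/statements", {"code": code})
    path = f"/sessions/{session_id}/statements/{statement['id']}"
    # Poll the statement until Livy reports a terminal state.
    while statement["state"] not in ("available", "error", "cancelled"):
        time.sleep(poll_seconds)
        statement = call("GET", path)
    output = statement.get("output") or {}
    if output.get("status") != "ok":
        raise RuntimeError(f"statement failed: {output.get('evalue', statement['state'])}")
    # Assumption: the plain-text rendering of the result lives under data["text/plain"].
    return output["data"]["text/plain"]
```

With a running session 0, `run_statement(0, "sc.parallelize(0 to 100).sum()")` would submit the slide's example and return Livy's plain-text rendering of the result.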
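Slide 23's point is that a SparkContext belongs to a session on the Livy server, not to a client. Any client that knows a session's id can therefore reuse its context. The toy model below illustrates only that bookkeeping. `ToyLivyServer` and `Client` are hypothetical classes, a plain dictionary stands in for each SparkContext, and nothing here talks to Livy or Spark.

```python
class ToyLivyServer:
    """Hypothetical in-memory model of Livy session bookkeeping (no Spark involved)."""

    def __init__(self):
        self._contexts = {}  # session id -> that session's state, standing in for its SparkContext
        self._next_id = 0

    def create_session(self):
        session_id = self._next_id
        self._next_id += 1
        self._contexts[session_id] = {}
        return session_id

    def define(self, session_id, name, value):
        self._contexts[session_id][name] = value

    def lookup(self, session_id, name):
        return self._contexts[session_id].get(name)


class Client:
    """A client holds only a session id; all state lives on the server side."""

    def __init__(self, server, session_id):
        self.server = server
        self.session_id = session_id

    def define(self, name, value):
        self.server.define(self.session_id, name, value)

    def lookup(self, name):
        return self.server.lookup(self.session_id, name)


if __name__ == "__main__":
    server = ToyLivyServer()
    s1, s2 = server.create_session(), server.create_session()
    client1, client2, client3 = Client(server, s1), Client(server, s1), Client(server, s2)
    client1.define("numbers", list(range(101)))
    print(client2.lookup("numbers") is not None)  # True: same session, same context
    print(client3.lookup("numbers"))              # None: a different session's context
```

In the REST API, the counterpart of `define` and `lookup` is posting statements to the same `/sessions/{id}/statements` URL. That is why two clients pointed at one session see each other's variables.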
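The exchange on slide 25 is a challenge/response carried in ordinary HTTP headers. The sketch below simulates it in Python to make the control flow visible:
- The server answers an unauthenticated request with `401` and `WWW-Authenticate: Negotiate`.
- The client retries once with `Authorization: Negotiate <base64 token>`.

Everything Kerberos-specific is replaced by a stand-in. `toy_server` accepts one fixed byte string where a real server would validate a GSSAPI token against its keytab. `get_token` stands in for deriving that token from the client's TGT. Real clients delegate this step to a GSSAPI library, for example `curl --negotiate -u : <url>` or the `requests-kerberos` package in Python.

```python
import base64

NEGOTIATE = "Negotiate"


def toy_server(request_headers):
    """Hypothetical SPNEGO-protected endpoint; returns (status code, response headers)."""
    auth = request_headers.get("Authorization", "")
    if not auth.startswith(NEGOTIATE + " "):
        # No credentials yet: challenge the client to negotiate.
        return 401, {"WWW-Authenticate": NEGOTIATE}
    token = base64.b64decode(auth[len(NEGOTIATE) + 1:])
    # A real server would validate a GSSAPI/Kerberos token with its keytab here;
    # this toy accepts one fixed byte string instead.
    if token != b"toy-service-ticket":
        return 401, {"WWW-Authenticate": NEGOTIATE}
    return 200, {}


def toy_client(send, get_token):
    """Send a plain GET; on a Negotiate challenge, retry once with a token."""
    status, headers = send({})
    if status == 401 and headers.get("WWW-Authenticate", "").startswith(NEGOTIATE):
        token = base64.b64encode(get_token()).decode("ascii")
        status, headers = send({"Authorization": f"{NEGOTIATE} {token}"})
    return status


if __name__ == "__main__":
    print(toy_client(toy_server, lambda: b"toy-service-ticket"))  # 200
    print(toy_client(toy_server, lambda: b"wrong"))               # 401
```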
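Slide 26 shows Alice and Bob authenticating to the same Livy server (super user: livy), which runs a separate Spark session as each of them. On the REST side, the user a session runs as is carried in the `proxyUser` field of `POST /sessions`. The function below is a hypothetical, simplified model of the decision rule, not Livy's code:
- By default, a session runs as the SPNEGO-authenticated caller.
- Only callers in an assumed `superusers` set may request a different `proxyUser`. An example would be a Zeppelin service principal acting for its logged-in user.

```python
def effective_user(authenticated_user, requested_proxy_user=None, superusers=frozenset()):
    """Pick the user a new Spark session should run as (hypothetical, simplified policy)."""
    if requested_proxy_user in (None, authenticated_user):
        # Default: the session runs as whoever authenticated via SPNEGO (Alice, Bob, ...).
        return authenticated_user
    if authenticated_user in superusers:
        # A trusted service (hypothetically Zeppelin) may act on behalf of its logged-in user.
        return requested_proxy_user
    raise PermissionError(f"{authenticated_user!r} may not impersonate {requested_proxy_user!r}")


def session_request_body(kind, authenticated_user, requested_proxy_user=None, superusers=frozenset()):
    """JSON body for POST /sessions; proxyUser names the user the session runs as."""
    user = effective_user(authenticated_user, requested_proxy_user, superusers)
    return {"kind": kind, "proxyUser": user}


if __name__ == "__main__":
    print(session_request_body("spark", "alice"))
    print(session_request_body("spark", "zeppelin", "bob", superusers={"zeppelin"}))
```

Because every session runs as a named end user rather than one shared service account, cluster-side audits can attribute each job to Alice or Bob. Slide 16 lists exactly that as missing when all jobs run via the 'zeppelin' proxy user.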
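The three steps on slide 27 amount to a pre-shared key between the Livy server and each Spark session it launches. The sketch below is a generic illustration of how such a key can authenticate messages, using HMAC-SHA256 from Python's standard library. It is not Livy's actual RPC handshake. How the key reaches the session in step 2 is not modeled.

```python
import hashlib
import hmac
import secrets


def generate_secret():
    """Step 1: the server creates a fresh random key for one Spark session."""
    return secrets.token_bytes(32)


def sign(secret, message):
    """Tag a message with an HMAC-SHA256 computed over the shared key."""
    return hmac.new(secret, message, hashlib.sha256).digest()


def verify(secret, message, tag):
    """Accept a message only if its tag matches (constant-time comparison)."""
    return hmac.compare_digest(sign(secret, message), tag)


if __name__ == "__main__":
    key = generate_secret()     # step 1: generated by the server
    session_key = key           # step 2: handed to the session at launch (mechanism not modeled)
    msg = b'{"code": "sc.parallelize(0 to 100).sum()"}'
    tag = sign(key, msg)        # step 3: the server tags what it sends
    print(verify(session_key, msg, tag))          # True
    print(verify(generate_secret(), msg, tag))    # False: a different key does not verify
    print(verify(session_key, msg + b" ", tag))   # False: tampered message
```

Each session gets its own key, so a process that never received that key cannot produce tags the session will accept. This mirrors slide 24's point that only the Livy server can submit jobs to a session.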
