Apache Zeppelin + Livy: Bringing Multi Tenancy to Interactive Data Analysis

Published in: Technology

  1. Apache Zeppelin + Livy: Bringing Multi Tenancy to Interactive Data Analysis. Prabhjyot Singh & Jeff Zhang, April 14, 2016. © Hortonworks Inc. 2011 – 2016. All Rights Reserved
  2. What's Apache Zeppelin? Web-based notebook that enables interactive data analytics. You can make beautiful data-driven, interactive and collaborative documents with SQL, Scala and more.
  3. Interactive Analysis 1.0 (Spark-shell)
  4. Interactive Analysis 2.0 (Zeppelin): Spark Interpreter
  5. Interactive Analysis 3.0 (Zeppelin + Livy): Livy Interpreter
  6. Open Source Activity
  7. Quick Stats: Zeppelin • Incubated by the Apache Foundation, first PR in March 2015 • GitHub history dates to July 2013, pre-incubation • 9 committers, 100+ contributors, growing list • ~800 JIRAs filed • ~900 PRs via the community • Zeppelin just got a new friend, "R" • Zeppelin graduation is on the cards
  8. Architecture & Usage
  9. Zeppelin Architecture. Current Interpreter Support: • HDFS • PySpark, SparkR, Spark • Hive • HBase • Phoenix • Shell • SQL • …
  10. Zeppelin Features. Collate/Load Data: collate or load data from existing data sources, or load from external CSVs, e.g. Eureka, Smartsense. Visualize: robust visualization mechanism to visualize data and enable insights. Collaborate: notebook-based collaboration and notebook export; tagging of notebook-generated data is to be added soon.
  11. Popular Usage Scenarios. Customized Dashboards: intended for use in customized dashboards for Big Data clusters. Security Analytics: understanding the nature of data coming from multiple sources and analyzing its effects. Bio-sciences: medical research companies are interested in using this for their research.
  12. Bringing Multi-tenancy to Zeppelin
  13. Multi-Tenancy: Motivation • Supporting workloads of multiple customers • Supporting multiple LOBs (lines of business) on a single data system • Supporting fine-grained audits • Inability to provision capacity for multiple user groups • Inability to audit user actions, as all jobs are run via the 'zeppelin' proxy user • Inability to share state/data with other users (slide column labels: Objectives, Requirements)
  14. Zeppelin Livy Interaction: Security Across Zeppelin-Livy-Spark. Diagram labels: LDAP, Zeppelin, Shiro, Spark, YARN, Livy, Ispark Group Interpreter, SPNEGO: Kerberos, Kerberos, Livy APIs.
  15. Deep dive on Livy
  16. What is Livy: Livy is an open source REST interface for interacting with Spark from anywhere. Diagram labels: Livy Client, HTTP, Livy Server, HTTP (RPC), Spark Interactive Session (SparkContext), Spark Batch Session (SparkContext).
  17. Why we need Livy with Zeppelin • Reduce the pressure on the client machine • Make job submission/monitoring easy • Customize the job schedule
  18. Interactive Session – Create Session. Request: curl -X POST --data '{"kind": "spark"}' -H "Content-Type: application/json" localhost:8998/sessions Response: {"state":"starting","proxyUser":"null","id":1,"kind":"spark","log":[]} Diagram: steps 1 to 4 between Livy Client, Livy Server, and Spark Interactive Session (SparkContext).
  19. Interactive Session – Execute Code. Request: curl http://localhost:8998/sessions/0/statements -X POST -H 'Content-Type: application/json' -d '{"code":"sc.parallelize(0 to 100).sum()"}' Response: {"id":0,"state":"running","output":null} Diagram: steps 1 to 4 between Livy Client, Livy Server, and Spark Interactive Session (SparkContext).
  20. SparkContext Sharing. Diagram labels: Livy Server; Client 1, Client 2, Client 3; Session-1, Session-2; SparkSession-1 (SparkContext), SparkSession-2 (SparkContext).
  21. Livy Security. Diagram labels: Client, SPNEGO, Livy Server (Impersonation), Shared Secret, SparkSession. • Only authorized users can launch a Spark session or submit code • Each user can access their own session • Only the Livy server can submit jobs securely to the Spark session
  22. SPNEGO. Simple and Protected GSSAPI Negotiation Mechanism (SPNEGO), often pronounced "spen-go", is a GSSAPI "pseudo mechanism" used by client-server software to negotiate the choice of security technology. Diagram: Client (Kerberos TGT) and Livy Server (SPNEGO enabled) exchange: HTTP GET http://site/a.html; Error 401 Unauthorized; HTTP GET request with Authorization: Negotiate; HTTP GET request.
  23. Impersonation. Diagram labels: Alice (Kerberos TGT), Bob (Kerberos TGT), SPNEGO, Livy Server (super user: livy), Shared Secret, Spark Session, Spark Session.
  24. Shared Secret 1. Livy Server generates a secret key 2. Livy Server passes the secret key to the Spark session when launching it 3. Both sides use the secret key to communicate with each other. Diagram labels: Spark Session, Shared Secret, Livy Server.
  25. Multi Tenant: Zeppelin Demo
  26. Q & A
  27. Thank You
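
The curl calls on slides 18 and 19 can also be sketched in Python. This is a minimal sketch, not the presenters' code: it only builds the two requests with the standard library and parses a create-session response. The helper names (`create_session_request`, `statement_request`, `parse_session`) are hypothetical; the host, port, paths, and JSON fields are the ones shown on the slides.

```python
import json
from urllib.request import Request

LIVY_URL = "http://localhost:8998"  # host and port as shown on the slides


def create_session_request(kind="spark", base_url=LIVY_URL):
    """Build (but do not send) the POST /sessions request from slide 18."""
    body = json.dumps({"kind": kind}).encode("utf-8")
    return Request(
        f"{base_url}/sessions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def statement_request(session_id, code, base_url=LIVY_URL):
    """Build the POST /sessions/<id>/statements request from slide 19."""
    body = json.dumps({"code": code}).encode("utf-8")
    return Request(
        f"{base_url}/sessions/{session_id}/statements",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_session(response_text):
    """Return (id, state) from a create-session JSON response."""
    payload = json.loads(response_text)
    return payload["id"], payload["state"]
```

Passing either request to `urllib.request.urlopen` would send it, assuming a Livy server is listening at that address.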
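
The SPNEGO exchange on slide 22 (request, 401 challenge, retry with a Negotiate token) can be illustrated with a toy simulation. This is a sketch under stated assumptions: `send`, `get_token`, and `fake_server` are hypothetical stand-ins, and a real client would obtain the token from a GSSAPI/Kerberos library rather than from a lambda.

```python
import base64


def negotiate(send, get_token, url):
    """Run the two-step SPNEGO exchange and return the final status code.

    `send(url, headers)` returns (status, response_headers) and
    `get_token()` returns raw token bytes; both are hypothetical stand-ins.
    """
    status, headers = send(url, {})
    challenge = headers.get("WWW-Authenticate", "")
    if status != 401 or not challenge.startswith("Negotiate"):
        return status  # the server did not ask for SPNEGO
    token = base64.b64encode(get_token()).decode("ascii")
    status, _ = send(url, {"Authorization": f"Negotiate {token}"})
    return status


def fake_server(url, headers):
    """Toy server: accepts any request that carries a Negotiate token."""
    if headers.get("Authorization", "").startswith("Negotiate "):
        return 200, {}
    return 401, {"WWW-Authenticate": "Negotiate"}
```

For example, `negotiate(fake_server, lambda: b"ticket", "http://site/a.html")` walks through the 401 challenge and the authenticated retry; the toy server does not validate the token.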
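
The three shared-secret steps on slide 24 can be illustrated as follows. The slides do not say how Livy uses the key on the wire, so the HMAC tagging here is an assumption chosen for illustration, and `generate_secret`, `sign`, and `verify` are hypothetical names.

```python
import hashlib
import hmac
import secrets


def generate_secret():
    """Step 1: the server creates a random secret key."""
    return secrets.token_bytes(32)


def sign(secret, message):
    """Tag a message so the peer can check it came from a key holder."""
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret, message, tag):
    """Check a received tag using a constant-time comparison."""
    return hmac.compare_digest(sign(secret, message), tag)
```

In this sketch the server would hand the result of `generate_secret()` to the session at launch (step 2), and each side would call `sign` on outgoing messages and `verify` on incoming ones (step 3).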
