
Multi User Data science with Zeppelin


Running Apache Zeppelin in Multi Tenant/User Environment

Published in: Software

  1. Multi User Data Science with Zeppelin. Vinay Shukla, Twitter: @neomythos. Feb 17th, 2016
  2. Disclaimer. © Hortonworks Inc. 2011 – 2015. All Rights Reserved. This document may contain product features and technology directions that are under development, may be under development in the future, or may ultimately not be developed. Project capabilities are based on information that is publicly available within the Apache Software Foundation project websites ("Apache"). Progress of the project capabilities can be tracked from inception to release through Apache; however, technical feasibility, market demand, user feedback and the overarching Apache Software Foundation community development process can all affect timing and final delivery. This document’s description of these features and technology directions does not represent a contractual commitment, promise or obligation from Hortonworks to deliver these features in any generally available product. Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. Since this document contains an outline of general product development plans, customers should not rely upon it when making purchasing decisions.
  3. Introducing Apache Zeppelin: web-based notebook for interactive analytics
     Features
     • Ad-hoc experimentation
       – Spark, Hive, Shell, Flink, Tajo, Ignite, Lens, etc.
     • Deeply integrated with Spark + Hadoop
       – Can be managed via Ambari Stacks
     • Supports multiple language backends
       – Pluggable “Interpreters”
     • Incubating at Apache
       – 100% open source and open community
     Use Case
     • Data exploration and discovery
     • Visualization
       – Tables, graphs and charts
     • Interactive snippet-at-a-time experience
     • Collaboration and publishing
     • “Modern Data Science Studio”
  4. Apache Zeppelin
  5. PySpark / Spark SQL
  6. Spark & Zeppelin Pace of Innovation (timeline figure)
     • HDP GA releases: HDP 2.2.4 (Spark 1.2.1 GA); HDP 2.3.0 (Spark 1.3.1 GA); HDP 2.3.2 (Spark 1.4.1 GA); HDP 2.3.4 (Spark 1.5.2* GA); HDP 2.4.0 (Spark 1.6 GA); HDP 2.5.x (Spark 1.6.1* GA)
     • Spark tech previews (TP): Spark 1.3.1 TP, 5/2015; Spark 1.4.1 TP, 8/2015; Spark 1.5.1 TP, Nov/2015; Spark 1.6 TP, Jan/2015
     • Apache Zeppelin: Zeppelin TP, Oct/2015; Zeppelin TP Refresh; Zeppelin GA
     • Other dates marked on the timeline: Dec 2015; Now; March 1st 2016; Q1, 2016
  7. What’s New in HDP 2.4.0?
     • Spark 1.6 GA
       – GA of Dynamic Resource Allocation*
     • Zeppelin TP#2
       – Notebook import/export features
       – LDAP Authentication*
     Marketing announcement coming March 1st
  8. Requirements for Zeppelin in a Multi-Tenant (M/T) Environment
     • Support multiple users
     • Security: provide a security sandbox by default
     • Authentication: LDAP; integrate with the corporate identity store
     • Authorization: access control for both data and notebooks
     • Encryption: work with both wire encryption and encrypted data
     • Audit: keep track of who did what, when, and with what results, with non-repudiation
     • Manageability
     • Sharing/collaboration of both data and notebooks
  9. Zeppelin GA – Features
     • Ambari Managed Install/Configuration
     • Runs in a Kerberos Cluster
     • LDAP Authentication
     • SSL
     • Notebook Import/Export
     Coming April, 2016
  10. Zeppelin Missing Features
      • R Interpreter
      • Better Visualizations
        – GGPlot, Shiny equivalent visualizations
      • Access Control on Notebooks
      • Library Management
  11. What is coming later? – H2, 2016
      • Zeppelin Improvements
        – Zeppelin Access Control
        – Ambari managed LDAP Configuration
        – Pluggable Visualization
        – R Interpreter
  12. Various Apache Zeppelin JIRA/Pull Requests
      – Identity Propagation:
      – LDAP Authentication:
      – Notebook Access Control: zeppelin/pull/681
      – Notebook Import/Export:
      – R Interpreter:
  13. Thank You. Twitter: @neomythos
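
Slide 3 describes Zeppelin's language backends as pluggable "Interpreters". As a rough illustration of that idea, the plain-Python sketch below routes a notebook paragraph to a backend based on a leading `%name` prefix. The registry, the function names, and the echo-only backends are all hypothetical and are not Zeppelin's implementation.

```python
from typing import Callable, Dict, Tuple

# Hypothetical interpreter type: takes paragraph text, returns output text.
Interpreter = Callable[[str], str]


def make_registry() -> Dict[str, Interpreter]:
    """Build a toy registry. These stand-ins only echo their input;
    a real backend would hand the text to Spark, a shell, and so on."""
    return {
        "spark": lambda body: f"[spark] {body}",
        "sql": lambda body: f"[sql] {body}",
        "sh": lambda body: f"[sh] {body}",
    }


def split_paragraph(text: str, default: str = "spark") -> Tuple[str, str]:
    """Split a paragraph into (interpreter name, body).

    A paragraph starting with %name selects that interpreter;
    anything else goes to the default one.
    """
    stripped = text.strip()
    if stripped.startswith("%"):
        head, _, body = stripped.partition("\n")
        name, _, rest = head[1:].partition(" ")
        # Text after the name on the first line belongs to the body.
        body = (rest + "\n" + body).strip() if rest else body.strip()
        return name, body
    return default, stripped


def run_paragraph(text: str, registry: Dict[str, Interpreter]) -> str:
    """Look up the interpreter for a paragraph and run the body through it."""
    name, body = split_paragraph(text)
    if name not in registry:
        raise KeyError(f"no interpreter registered for %{name}")
    return registry[name](body)
```

Supporting a new language in this sketch means adding one entry to the registry, which is the property the word "pluggable" points at; a paragraph that names an unregistered interpreter fails with a `KeyError`.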
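
Slide 8 lists authorization ("access control for both data and notebooks") and audit ("who did what, when") as requirements. The sketch below shows one way those two requirements could fit together for notebooks. The owner/writer/reader roles, the class names, and the in-memory log are assumptions made for illustration; they are not taken from the slides or from Zeppelin's code.

```python
import time
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Notebook:
    # Hypothetical permission model: three role sets per notebook.
    name: str
    owners: Set[str] = field(default_factory=set)
    writers: Set[str] = field(default_factory=set)
    readers: Set[str] = field(default_factory=set)


@dataclass
class AuditEvent:
    timestamp: float
    user: str
    action: str
    notebook: str
    allowed: bool


def is_allowed(nb: Notebook, user: str, action: str) -> bool:
    """Owners may do anything; writers may read and write; readers may only read."""
    if user in nb.owners:
        return True
    if action == "write":
        return user in nb.writers
    if action == "read":
        return user in nb.writers or user in nb.readers
    return False  # unknown actions are denied by default


def access(nb: Notebook, user: str, action: str, log: List[AuditEvent]) -> bool:
    """Check permission and record the attempt, whether or not it succeeded."""
    allowed = is_allowed(nb, user, action)
    log.append(AuditEvent(time.time(), user, action, nb.name, allowed))
    return allowed
```

Recording denied attempts as well as granted ones keeps the "who did what, when" trail complete. An in-memory list like this does not provide non-repudiation on its own; that would need tamper-evident, signed storage, which is outside the scope of this sketch.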
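
Slides 7 and 9 mention notebook import/export. A minimal sketch of what such a round trip might look like follows, assuming a notebook can be represented as a JSON object with `id`, `name`, and `paragraphs` keys. That layout is an assumption for this example and may differ from Zeppelin's actual note format.

```python
import json
import uuid
from typing import Any, Dict


def export_notebook(notebook: Dict[str, Any]) -> str:
    """Serialize a notebook dictionary to a JSON string."""
    return json.dumps(notebook, indent=2, sort_keys=True)


def import_notebook(payload: str) -> Dict[str, Any]:
    """Load a notebook from JSON and assign a fresh id.

    A new id keeps the imported copy from colliding with the original
    if both end up in the same notebook store.
    """
    notebook = json.loads(payload)
    if "paragraphs" not in notebook:
        raise ValueError("not a notebook export: missing 'paragraphs'")
    notebook["id"] = uuid.uuid4().hex
    return notebook
```

For example, `import_notebook(export_notebook(nb))` returns a copy with the same name and paragraphs as `nb` but a different `id`.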