Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Data Science with Spark & Zeppelin

6,018 views

Published on

Hadoop Summit, San Jose.

Introducing Data science with Spark & Zeppelin

Published in: Technology
  • Get access to 16,000 woodworking plans. =>> https://t.cn/A62Ygslz
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • 16,000 woodworking Projects. Decks, Sheds, Greenhouses, Chairs & Tables, File Cabinets and Much More! #1 Recommended. Download Your Plans NOW! ➢➢➢ https://t.cn/A62Ygslz
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Follow the link, new dating source: ♥♥♥ http://bit.ly/36cXjBY ♥♥♥
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Dating for everyone is here: ❤❤❤ http://bit.ly/36cXjBY ❤❤❤
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here

Data Science with Spark & Zeppelin

  1. 1. DataScience with Spark & Zeppelin Ofer Mendelevitch Vinay Shukla Moon Soo Lee
  2. 2. Page 2 © Hortonworks Inc. 2014 Data Science with iPython Ofer Mendelevitch
  3. 3. © Hortonworks Inc. 2015 The Data Science Workflow… Page 3 What is the question I'm answering? What data will I need? Plan Acquire the data Analyze data quality Reformat Impute etc Clean Data Analyze data Visualize Create model Evaluate results Create features Create report Deploy in Production Publish & Share Start here End here Script VisualizeScript
  4. 4. Introducing Apache Zeppelin Lee Moon Soo, Vinay Shukla
  5. 5. Apache Zeppelin • A web-based notebook for interactive analytics • Deeply integrated with Spark and Hadoop • Supports multiple language backends • Incubating
  6. 6. Use cases for Zeppelin • Data exploration & discovery • Visualization - tables, graphs, charts • Interactive snippet-at-a-time experience • Collaboration and publishing “Modern Data Science Studio”
  7. 7. DEMO I A day in the life of a data scientist with Zeppelin
  8. 8. Apache Spark Integration • Supports scala, pyspark and spark sql • SparkContext injected automatically • Supports 3rd party dependencies • Spark-on-YARN and Spark standalone modes • Full Spark interpreter configuration • Multiple Spark interpreter profiles
  9. 9. DEMO I I Apache Spark using Zeppelin
  10. 10. Support for multiple back-ends • Scala, Python, spark sql • Hive, Tajo, Ignite, Mysql, …. • Apache Flink • Markdown, shell Driven by the community - thank you! How is this so easy to do?
  11. 11. Zeppelin Interpreter Architecture Interpreter is connector between Zeppelin and Backend data processing system. ZeppelinServer InterpreterGroup Separate JVM process Interpreter Interpreter Interpreter Spark Spark PySpark SparkSQL Dep Load libraries Maven repositorySpark cluster Share single SparkDriver Thrift
  12. 12. Notebook - Interpreter Selection Spark spark pyspark sql dep Load libraries Maven repositorySpark cluster Share single SparkDriver
  13. 13. DEMO III Interpreter Deep Dive
  14. 14. Join the community • Try out Apache Zeppelin today • https://zeppelin.incubator.apache.org/ • Join us on the community discussions • Help define how we shape the roadmap and features • Lets get this party started!
  15. 15. Page15 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Cloud of your choice Storage YARN: Data Operating System Governance Security Operations Resource Management Questions? Thank you

×