Successfully reported this slideshow.
Your SlideShare is downloading. ×

Livy: A REST Web Service For Apache Spark

Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Loading in …3
×

Check these out next

1 of 20 Ad

More Related Content

Slideshows for you (20)

Similar to Livy: A REST Web Service For Apache Spark (20)

Advertisement

More from Jen Aman (20)

Recently uploaded (20)

Advertisement

Livy: A REST Web Service For Apache Spark

  1. 1. Livy: A REST Web Service for Spark Pravin Mittal, Microsoft Anand Iyer, Cloudera
  2. 2. Reduce friction to use Spark while maintaining all its power and flexibility
  3. 3. What is Livy? A Service that manages long running Spark Contexts in your cluster • Open Source Apache Licensed • REST based interface • Lets you manage multiple Spark Contexts • Fine grained job submission • Retrieve job results over REST asynchronously or synchronously • Client APIs in java, scala and soon in python
  4. 4. What is Livy? Livy Server Cluster (Managed by YARN, Mesos, etc) Driver ExecutorExecutor Clients Driver ExecutorExecutor HTTP Context 1 Context 2 Context 2 Context 1
  5. 5. Spark on Azure HDInsight Fully Managed Service • 100% open source Apache Spark and Hadoop bits • Latest releases of Spark • Fully supported by Microsoft and Hortonworks • 99.9% Azure Cloud SLA; 24/7 Managed Service • Certifications: PCI, ISO 27018, SOC, HIPAA, EU-MC Optimized for experimentation and development • Jupyter Notebooks (scala, python, automatic data visualizations) • IntelliJ plugin (job submission, remote debugging) • ODBC connector for Power BI, Tableau, Qlik, SAP, Excel, etc
  6. 6. Make Spark Simple - Integrated with Azure Ecosystem • Microsoft R Server - Multi-threaded math libraries and transparent parallelization in R Server means handling up to 1000x more data and up to 50x faster speeds than open source R. This is based on open source R, it does require any change to R scripts • Azure Data Lake Store – HDFS for the cloud, optimized for massive throughput, Ultra-high capacity, Low Latency, Secure ACL support • Azure Data Factory orchestrates Spark ETL pipeline • PowerBI connector for Spark for rich visualization. New in Power BI is a streaming connector allowing you to publish real-time events from Spark Streaming directly to Power BI. • EventsHub connector as a data source for Spark streaming • Azure SQL Datawarehouse & Hbase connector for fast & scalable storage
  7. 7. Jupyter-Spark Integration via Livy • Sparkmagic is an open source library that Microsoft is incubating under the Jupyter Incubator program • Thousands of Spark clusters in production providing feedback to further improve the experience https://github.com/jupyter-incubator/sparkmagic
  8. 8. Architectural Advantages of Jupyter integration via Livy • Run Spark code completely remotely; no Spark components need to be installed on the Jupyter server • Multi-language support; the Python, Scala and R kernels are equally feature-rich • Support for multiple endpoints; you can use a single notebook to start multiple Spark jobs in different languages and against different remote clusters • Easy integration with any Python library for data science or visualization, like Pandas or Plotly
  9. 9. HN0 HN1 YARN RM Livy Server Metadata ID, State, ... Gateway HDI Cluster WN0 Driver WN1 Executor WN2 Executor Job Status
  10. 10. HN0 HN1 YARN RM Livy Server Metadata ID, State, Unique Tag, Application ID Gateway HDI Cluster WN0 Driver WN1 Executor WN2 Executor Job Status ZK0 ZK1 ZK2 Livy Metadata ID, Unique Tag, App ID
  11. 11. DEMO
  12. 12. Resources https://github.com/aggFTW/sparksummit2016
  13. 13. Livy Architecture Highlights
  14. 14. Cluster (Managed by YARN, Mesos, etc) Manage multiple independent Spark Contexts Livy Server Executor Driver ExecutorExecutorClient A HTTP Context 1 Context 1 Driver ExecutorExecutor Context 2 Driver ExecutorExecutor Context 3 Executor Executor Client B Context 2 Client C Context 2 Client D Context 3
  15. 15. Cluster (Managed by YARN, Mesos, etc) User Impersonation Livy Server Driver ExecutorExecutor HTTP Context 1 Driver ExecutorExecutor Context 2 Executor Client authenticates as user A Context 1 Client authenticates as user B Context 2 Running as user A. Only has access to files and resources that user A has access to. Running as user B. Only has access to files and resources that user B has access to.
  16. 16. Livy Client API
  17. 17. Client API: Job Interface
  18. 18. Client API: Submitting a job
  19. 19. Client API Architecture Livy Server Cluster (Managed by YARN, Mesos, etc) Driver ExecutorExecutor Client Application Context 1 Serialized Result Data Serialized Closure
  20. 20. Community http://livy.io

×