Livy: A REST Web Service for Spark
River IQ
What is Livy?
A service that manages long-running Spark contexts in your cluster.
• A service that provides interaction with an Apache Spark cluster through a REST interface.
• Open source, Apache licensed.
• Suited to multi-tenant environments, because it manages multiple Spark contexts efficiently.
• Removes the need for a local Spark environment, so jobs can be submitted from mobile or web applications.
• Fine-grained job submission.
• Retrieves job results over REST, either asynchronously or synchronously.
• Client APIs in Java and Scala, with Python coming soon.
Features of Livy
• Interactive Scala, Python and R shells.
• Batch submissions in Scala, Java and Python.
• Handles multiple Spark jobs at the same time.
• Reliable for multi-tenant execution.
• Jobs can be submitted from anywhere over REST.
• Supports Spark 1 / Spark 2 and Scala 2.10 / 2.11 within one build.
• 100% open source, Apache licensed.
• Supports impersonation, so multiple users can share the same server.
• Existing code needs no changes, except that instead of defining its own SparkContext it uses the SparkContext that Livy predefines.
• Shares cached RDDs or DataFrames between multiple jobs or clients.
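Batch submission goes through Livy's `POST /batches` endpoint, whose JSON body names the application `file` and, for JVM jobs, a `className` and `args`. The sketch below only builds the request with Python's standard library so the payload shape is visible; the jar path is hypothetical, `http://localhost:8998` is assumed from the slides, and `SparkPi` is used only because it ships with the Spark examples.

```python
import json
import urllib.request

LIVY_URL = "http://localhost:8998"  # assumption: default Livy address used in the slides


def build_batch_request(file, class_name=None, args=None, base_url=LIVY_URL):
    """Build (but do not send) a POST /batches request for a Livy batch job."""
    body = {"file": file}
    if class_name is not None:
        body["className"] = class_name  # main class for Scala/Java jobs
    if args:
        body["args"] = list(args)  # command-line arguments passed to the job
    return urllib.request.Request(
        url=f"{base_url}/batches",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_batch_request(
        "hdfs:///jobs/spark-examples.jar",  # hypothetical jar location
        class_name="org.apache.spark.examples.SparkPi",
        args=["100"],
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode())
    # To actually submit the job: urllib.request.urlopen(req)
```

Building the request separately from sending it keeps the payload easy to inspect and test before anything touches a real cluster.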
Jupyter-Spark Integration via Livy
Sparkmagic is an open source library that Microsoft is incubating under the Jupyter Incubator program. It is used with thousands of Spark clusters in production, which provide feedback to further improve the experience.
Architectural Advantages of Jupyter integration via Livy
• Run Spark code completely remotely; no Spark components need to be installed on the Jupyter server.
• Multi-language support; the Python, Scala and R kernels are equally feature-rich.
• Support for multiple endpoints; you can use a single notebook to start multiple Spark jobs in different languages and against different remote clusters.
• Easy integration with any Python library for data science or visualization, such as Pandas or Plotly.
Manage multiple independent Spark Contexts
User Impersonation
Zeppelin Livy Interaction
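Interactive Session – Create Session
Request:
curl -X POST --data '{"kind": "spark"}' -H "Content-Type: application/json" localhost:8998/sessions
Response:
{"state":"starting","proxyUser":null,"id":1,"kind":"spark","log":[]}
[Diagram: (1) the Livy Client sends the request to the Livy Server; (2) the Livy Server launches a Spark Interactive Session with its own Spark Context; (3) the session reports its address back to the Livy Server; (4) the Livy Server returns the session status to the Livy Client.]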
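The same `POST /sessions` call can be made from Python with only the standard library. This is a minimal sketch, assuming a Livy server at `http://localhost:8998`; the `send` parameter is a hypothetical hook added so the request can be inspected or faked, and by default it sends the request with `urllib`. The `kind` values are the ones named in the speaker notes (`spark`, `pyspark`, `sparkr`).

```python
import json
import urllib.request


def create_session(kind="spark", base_url="http://localhost:8998", send=None):
    """POST /sessions and return the parsed JSON description of the new session."""
    req = urllib.request.Request(
        f"{base_url}/sessions",
        data=json.dumps({"kind": kind}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    if send is None:
        # Default transport: perform the HTTP call and return the raw body bytes.
        def send(r):
            with urllib.request.urlopen(r) as resp:
                return resp.read()
    # e.g. {"state": "starting", "proxyUser": None, "id": 1, "kind": "spark", "log": []}
    return json.loads(send(req))
```

The session starts in the `starting` state; a client would normally wait for it to become `idle` before submitting statements.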
Interactive Session – Execute Code
Request:
curl http://localhost:8998/sessions/0/statements -X POST -H 'Content-Type: application/json' -d '{"code":"sc.parallelize(0 to 100).sum()"}'
Response:
{"id":0,"state":"running","output":null}
[Diagram, steps 1 to 4: Livy Client → Livy Server → Spark Interactive Session (SparkContext), with the response returned along the same path.]
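The response above comes back while the statement is still `running`, which is what makes submission asynchronous; a client that wants the result synchronously polls `GET /sessions/{id}/statements/{statementId}` until the statement leaves the `waiting`/`running` states (normally reaching `available`, with the result in `output`). The sketch below assumes a Livy server at `http://localhost:8998`; `send` is a hypothetical injectable transport, and the poll interval and timeout are arbitrary choices.

```python
import json
import time
import urllib.request


def _default_send(req):
    """Perform the HTTP call and return the raw response body."""
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def run_statement(session_id, code, base_url="http://localhost:8998",
                  send=_default_send, poll_interval=1.0, timeout=60.0):
    """Submit `code` to an interactive session, then poll until it finishes."""
    stmts = f"{base_url}/sessions/{session_id}/statements"
    req = urllib.request.Request(
        stmts,
        data=json.dumps({"code": code}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    stmt = json.loads(send(req))  # e.g. {"id": 0, "state": "running", "output": None}
    deadline = time.monotonic() + timeout
    # Keep polling while the statement is still in a non-terminal state.
    while stmt["state"] in ("waiting", "running", "cancelling"):
        if time.monotonic() > deadline:
            raise TimeoutError(f"statement {stmt['id']} still {stmt['state']}")
        time.sleep(poll_interval)
        stmt = json.loads(send(urllib.request.Request(f"{stmts}/{stmt['id']}")))
    return stmt


# Usage against a live server (not run here):
# result = run_statement(0, "sc.parallelize(0 to 100).sum()")
# print(result["output"])
```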
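SparkContext Sharing
[Diagram: Clients 1, 2 and 3 connect to one Livy Server, each attaching to Session-1 or Session-2. Session-1 is backed by SparkSession-1 and Session-2 by SparkSession-2, each with its own SparkContext, so clients attached to the same session share that session's SparkContext.]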
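Sharing works because a session is addressed only by its ID: any client that posts statements to the same `/sessions/{id}/statements` URL runs code in the same SparkContext, and variables defined by earlier statements stay available in that session. The sketch assumes session IDs 1 and 2 and a particular client-to-session mapping (the original diagram's mapping is not recoverable); it only builds the requests.

```python
import json
import urllib.request


def statement_request(session_id, code, base_url="http://localhost:8998"):
    """Build a POST /sessions/{id}/statements request without sending it."""
    return urllib.request.Request(
        f"{base_url}/sessions/{session_id}/statements",
        data=json.dumps({"code": code}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Client 1 defines and caches an RDD inside Session-1 (assumed id 1).
client1 = statement_request(1, "val nums = sc.parallelize(0 to 100).cache()")
# Client 2 targets the same session, so it can reuse the cached `nums`.
client2 = statement_request(1, "nums.sum()")
# Client 3 works in Session-2 (assumed id 2), which has its own SparkContext,
# so `nums` is not defined there.
client3 = statement_request(2, "sc.parallelize(1 to 10).count()")
```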
Livy Security
[Diagram: Client → (SPNEGO) → Livy Server (impersonation) → (shared secret) → SparkSession]
• Only authorized users can launch a Spark session or submit code.
• Each user can access only their own session.
• Only the Livy server can submit jobs securely to a Spark session.
SPNEGO
[Diagram: Client (Kerberos TGT) ↔ Livy Server (SPNEGO enabled)]
• Simple and Protected GSSAPI Negotiation Mechanism (SPNEGO), often pronounced "spen-go".
• It is a GSSAPI "pseudo mechanism" used by client-server software to negotiate the choice of security technology.
Exchange:
1. Client: HTTP GET http://site/a.html
2. Server: 401 Unauthorized (WWW-Authenticate: Negotiate)
3. Client: the HTTP GET request is repeated with the header Authorization: Negotiate <token>
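The three-step exchange above can be sketched as plain control flow. Real clients obtain the token from a GSSAPI/Kerberos library (with curl, `curl --negotiate -u : <url>` performs this exchange); here both the transport `send` and the token source `get_token` are hypothetical callables, so the logic can be shown without a Kerberos KDC. The token is base64-encoded in the `Authorization: Negotiate` header.

```python
import base64
from urllib.parse import urlsplit


def negotiate_get(url, send, get_token):
    """Perform one HTTP GET, answering a SPNEGO challenge if the server sends one.

    send(url, headers) -> (status, response_headers, body)  # hypothetical transport
    get_token(host)    -> bytes                              # hypothetical GSSAPI hook
    """
    # Step 1: plain GET without credentials.
    status, headers, body = send(url, {})
    # Step 2: the server answers 401 with "WWW-Authenticate: Negotiate".
    if status == 401 and headers.get("WWW-Authenticate", "").startswith("Negotiate"):
        token = base64.b64encode(get_token(urlsplit(url).hostname)).decode("ascii")
        # Step 3: repeat the GET carrying the Kerberos-derived token.
        status, headers, body = send(url, {"Authorization": f"Negotiate {token}"})
    return status, body
```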
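Impersonation
[Diagram: Alice and Bob, each holding a Kerberos TGT, authenticate to the Livy Server (super user: livy) via SPNEGO; the Livy Server launches a separate Spark Session for each of them and communicates with each session over a shared secret.]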
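On the REST side, impersonation shows up as the `proxyUser` field of the session-creation body (the same field that is `null` in the Create Session response). Whether Livy honours a given `proxyUser` depends on server configuration, for example `livy.impersonation.enabled` in `livy.conf` and the Hadoop `hadoop.proxyuser.livy.*` settings, which this sketch does not model. The user names are hypothetical.

```python
import json
import urllib.request


def session_request(kind, proxy_user=None, base_url="http://localhost:8998"):
    """Build a POST /sessions request, optionally asking Livy to run it as proxy_user."""
    body = {"kind": kind}
    if proxy_user is not None:
        body["proxyUser"] = proxy_user  # user the Spark session should run as
    return urllib.request.Request(
        f"{base_url}/sessions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Alice and Bob each get their own session on the same Livy server (super user: livy).
alice_req = session_request("spark", proxy_user="alice")
bob_req = session_request("pyspark", proxy_user="bob")
```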
Shared Secret
• The Livy server generates a secret key.
• The Livy server passes the secret key to the Spark session when launching it.
• The Livy server and the Spark session use the secret key to communicate with each other.
[Diagram: Livy Server ↔ (shared secret) ↔ Spark Session]
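To illustrate why a shared secret lets the two sides trust each other, here is a generic HMAC challenge-response using Python's `secrets` and `hmac` modules. It is a stand-in technique chosen for clarity, not Livy's internal RPC protocol: the server generates the key, hands it to the session at launch, and later checks that whoever connects can prove knowledge of the key without ever sending it.

```python
import hashlib
import hmac
import secrets


def generate_secret():
    """Livy-server side: create a random 32-byte shared secret."""
    return secrets.token_bytes(32)


def respond(secret, challenge):
    """Prove knowledge of the secret by returning HMAC-SHA256 of the challenge."""
    return hmac.new(secret, challenge, hashlib.sha256).digest()


def verify(secret, challenge, response):
    """Constant-time check that the response was computed with the same secret."""
    return hmac.compare_digest(respond(secret, challenge), response)


# 1. The server generates the secret.
server_secret = generate_secret()
# 2. The server passes it to the Spark session at launch.
session_secret = server_secret
# 3. On connection, the server sends a fresh challenge and checks the answer.
challenge = secrets.token_bytes(16)
print(verify(server_secret, challenge, respond(session_secret, challenge)))
```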
Livy: A REST Web Service for Spark

More Related Content

What's hot

Improving Kafka at-least-once performance at Uber
Improving Kafka at-least-once performance at UberImproving Kafka at-least-once performance at Uber
Improving Kafka at-least-once performance at UberYing Zheng
 
So You Want to Write an Exporter
So You Want to Write an ExporterSo You Want to Write an Exporter
So You Want to Write an ExporterBrian Brazil
 
ZgPHP 97 - Microservice architecture in Laravel
ZgPHP 97 - Microservice architecture in LaravelZgPHP 97 - Microservice architecture in Laravel
ZgPHP 97 - Microservice architecture in LaravelFrano Šašvari
 
Performance Engineering Masterclass: Introduction to Modern Performance
Performance Engineering Masterclass: Introduction to Modern PerformancePerformance Engineering Masterclass: Introduction to Modern Performance
Performance Engineering Masterclass: Introduction to Modern PerformanceScyllaDB
 
End-to-end Streaming Between gRPC Services Via Kafka with John Fallows
End-to-end Streaming Between gRPC Services Via Kafka with John FallowsEnd-to-end Streaming Between gRPC Services Via Kafka with John Fallows
End-to-end Streaming Between gRPC Services Via Kafka with John FallowsHostedbyConfluent
 
Introduction to microservices
Introduction to microservicesIntroduction to microservices
Introduction to microservicesAnil Allewar
 
Building an Authorization Solution for Microservices Using Neo4j and OPA
Building an Authorization Solution for Microservices Using Neo4j and OPABuilding an Authorization Solution for Microservices Using Neo4j and OPA
Building an Authorization Solution for Microservices Using Neo4j and OPANeo4j
 
How Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per dayHow Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per dayDataWorks Summit
 
Rest API Security - A quick understanding of Rest API Security
Rest API Security - A quick understanding of Rest API SecurityRest API Security - A quick understanding of Rest API Security
Rest API Security - A quick understanding of Rest API SecurityMohammed Fazuluddin
 
Understanding REST
Understanding RESTUnderstanding REST
Understanding RESTNitin Pande
 
Api gateway
Api gatewayApi gateway
Api gatewayenyert
 
Stream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NETStream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NETconfluent
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedInGuozhang Wang
 
Building Event Driven Architectures with Kafka and Cloud Events (Dan Rosanova...
Building Event Driven Architectures with Kafka and Cloud Events (Dan Rosanova...Building Event Driven Architectures with Kafka and Cloud Events (Dan Rosanova...
Building Event Driven Architectures with Kafka and Cloud Events (Dan Rosanova...confluent
 
How to tune Kafka® for production
How to tune Kafka® for productionHow to tune Kafka® for production
How to tune Kafka® for productionconfluent
 

What's hot (20)

Improving Kafka at-least-once performance at Uber
Improving Kafka at-least-once performance at UberImproving Kafka at-least-once performance at Uber
Improving Kafka at-least-once performance at Uber
 
So You Want to Write an Exporter
So You Want to Write an ExporterSo You Want to Write an Exporter
So You Want to Write an Exporter
 
ZgPHP 97 - Microservice architecture in Laravel
ZgPHP 97 - Microservice architecture in LaravelZgPHP 97 - Microservice architecture in Laravel
ZgPHP 97 - Microservice architecture in Laravel
 
Performance Engineering Masterclass: Introduction to Modern Performance
Performance Engineering Masterclass: Introduction to Modern PerformancePerformance Engineering Masterclass: Introduction to Modern Performance
Performance Engineering Masterclass: Introduction to Modern Performance
 
End-to-end Streaming Between gRPC Services Via Kafka with John Fallows
End-to-end Streaming Between gRPC Services Via Kafka with John FallowsEnd-to-end Streaming Between gRPC Services Via Kafka with John Fallows
End-to-end Streaming Between gRPC Services Via Kafka with John Fallows
 
Introduction to microservices
Introduction to microservicesIntroduction to microservices
Introduction to microservices
 
Building an Authorization Solution for Microservices Using Neo4j and OPA
Building an Authorization Solution for Microservices Using Neo4j and OPABuilding an Authorization Solution for Microservices Using Neo4j and OPA
Building an Authorization Solution for Microservices Using Neo4j and OPA
 
How Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per dayHow Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per day
 
Kong API Gateway.pdf
Kong API Gateway.pdfKong API Gateway.pdf
Kong API Gateway.pdf
 
Serverless
ServerlessServerless
Serverless
 
Rest API Security - A quick understanding of Rest API Security
Rest API Security - A quick understanding of Rest API SecurityRest API Security - A quick understanding of Rest API Security
Rest API Security - A quick understanding of Rest API Security
 
Understanding REST
Understanding RESTUnderstanding REST
Understanding REST
 
Api gateway
Api gatewayApi gateway
Api gateway
 
Stream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NETStream Processing with Apache Kafka and .NET
Stream Processing with Apache Kafka and .NET
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedIn
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Building Event Driven Architectures with Kafka and Cloud Events (Dan Rosanova...
Building Event Driven Architectures with Kafka and Cloud Events (Dan Rosanova...Building Event Driven Architectures with Kafka and Cloud Events (Dan Rosanova...
Building Event Driven Architectures with Kafka and Cloud Events (Dan Rosanova...
 
How to tune Kafka® for production
How to tune Kafka® for productionHow to tune Kafka® for production
How to tune Kafka® for production
 
Deep-Dive: Secure API Management
Deep-Dive: Secure API ManagementDeep-Dive: Secure API Management
Deep-Dive: Secure API Management
 
API Governance in the Enterprise
API Governance in the EnterpriseAPI Governance in the Enterprise
API Governance in the Enterprise
 

Similar to Livy: A REST Web Service for Spark

Data Engineer’s Lunch #45: Apache Livy
Data Engineer’s Lunch #45: Apache LivyData Engineer’s Lunch #45: Apache Livy
Data Engineer’s Lunch #45: Apache LivyAnant Corporation
 
Quick Tour On Zeppelin
Quick Tour On ZeppelinQuick Tour On Zeppelin
Quick Tour On ZeppelinKnoldus Inc.
 
Interactive Apache Spark in Your Browser
Interactive Apache Spark in Your BrowserInteractive Apache Spark in Your Browser
Interactive Apache Spark in Your BrowserCloudera, Inc.
 
Serverless Event Streaming Applications as Functionson K8
Serverless Event Streaming Applications as Functionson K8Serverless Event Streaming Applications as Functionson K8
Serverless Event Streaming Applications as Functionson K8Timothy Spann
 
Serverless Event Streaming Applications as Functions on K8
Serverless Event Streaming Applications as Functions on K8Serverless Event Streaming Applications as Functions on K8
Serverless Event Streaming Applications as Functions on K8DoKC
 
E2E Data Pipeline - Apache Spark/Airflow/Livy
E2E Data Pipeline - Apache Spark/Airflow/LivyE2E Data Pipeline - Apache Spark/Airflow/Livy
E2E Data Pipeline - Apache Spark/Airflow/LivyRikin Tanna
 
[AI Dev World 2022] Build ML Enhanced Event Streaming
[AI Dev World 2022] Build ML Enhanced Event Streaming[AI Dev World 2022] Build ML Enhanced Event Streaming
[AI Dev World 2022] Build ML Enhanced Event StreamingTimothy Spann
 
Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...
Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...
Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...Timothy Spann
 
DBCC 2021 - FLiP Stack for Cloud Data Lakes
DBCC 2021 - FLiP Stack for Cloud Data LakesDBCC 2021 - FLiP Stack for Cloud Data Lakes
DBCC 2021 - FLiP Stack for Cloud Data LakesTimothy Spann
 
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)Kai Wähner
 
Interactive Analytics using Apache Spark
Interactive Analytics using Apache SparkInteractive Analytics using Apache Spark
Interactive Analytics using Apache SparkSachin Aggarwal
 
Introducing Apache Kafka and why it is important to Oracle, Java and IT profe...
Introducing Apache Kafka and why it is important to Oracle, Java and IT profe...Introducing Apache Kafka and why it is important to Oracle, Java and IT profe...
Introducing Apache Kafka and why it is important to Oracle, Java and IT profe...Lucas Jellema
 
Fast Streaming into Clickhouse with Apache Pulsar
Fast Streaming into Clickhouse with Apache PulsarFast Streaming into Clickhouse with Apache Pulsar
Fast Streaming into Clickhouse with Apache PulsarTimothy Spann
 
Big mountain data and dev conference apache pulsar with mqtt for edge compu...
Big mountain data and dev conference   apache pulsar with mqtt for edge compu...Big mountain data and dev conference   apache pulsar with mqtt for edge compu...
Big mountain data and dev conference apache pulsar with mqtt for edge compu...Timothy Spann
 
[Big Data Spain] Apache Spark Streaming + Kafka 0.10: an Integration Story
[Big Data Spain] Apache Spark Streaming + Kafka 0.10:  an Integration Story[Big Data Spain] Apache Spark Streaming + Kafka 0.10:  an Integration Story
[Big Data Spain] Apache Spark Streaming + Kafka 0.10: an Integration StoryJoan Viladrosa Riera
 
Cloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azureCloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azureTimothy Spann
 
DataConf.TW2018: Develop Kafka Streams Application on Your Laptop
DataConf.TW2018: Develop Kafka Streams Application on Your LaptopDataConf.TW2018: Develop Kafka Streams Application on Your Laptop
DataConf.TW2018: Develop Kafka Streams Application on Your LaptopYu-Jhe Li
 
ApacheCon2022_Deep Dive into Building Streaming Applications with Apache Pulsar
ApacheCon2022_Deep Dive into Building Streaming Applications with Apache PulsarApacheCon2022_Deep Dive into Building Streaming Applications with Apache Pulsar
ApacheCon2022_Deep Dive into Building Streaming Applications with Apache PulsarTimothy Spann
 

Similar to Livy: A REST Web Service for Spark (20)

Data Engineer’s Lunch #45: Apache Livy
Data Engineer’s Lunch #45: Apache LivyData Engineer’s Lunch #45: Apache Livy
Data Engineer’s Lunch #45: Apache Livy
 
Quick Tour On Zeppelin
Quick Tour On ZeppelinQuick Tour On Zeppelin
Quick Tour On Zeppelin
 
Interactive Apache Spark in Your Browser
Interactive Apache Spark in Your BrowserInteractive Apache Spark in Your Browser
Interactive Apache Spark in Your Browser
 
Serverless Event Streaming Applications as Functionson K8
Serverless Event Streaming Applications as Functionson K8Serverless Event Streaming Applications as Functionson K8
Serverless Event Streaming Applications as Functionson K8
 
Serverless Event Streaming Applications as Functions on K8
Serverless Event Streaming Applications as Functions on K8Serverless Event Streaming Applications as Functions on K8
Serverless Event Streaming Applications as Functions on K8
 
E2E Data Pipeline - Apache Spark/Airflow/Livy
E2E Data Pipeline - Apache Spark/Airflow/LivyE2E Data Pipeline - Apache Spark/Airflow/Livy
E2E Data Pipeline - Apache Spark/Airflow/Livy
 
[AI Dev World 2022] Build ML Enhanced Event Streaming
[AI Dev World 2022] Build ML Enhanced Event Streaming[AI Dev World 2022] Build ML Enhanced Event Streaming
[AI Dev World 2022] Build ML Enhanced Event Streaming
 
Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...
Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...
Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...
 
DBCC 2021 - FLiP Stack for Cloud Data Lakes
DBCC 2021 - FLiP Stack for Cloud Data LakesDBCC 2021 - FLiP Stack for Cloud Data Lakes
DBCC 2021 - FLiP Stack for Cloud Data Lakes
 
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
 
Interactive Analytics using Apache Spark
Interactive Analytics using Apache SparkInteractive Analytics using Apache Spark
Interactive Analytics using Apache Spark
 
Introducing Apache Kafka and why it is important to Oracle, Java and IT profe...
Introducing Apache Kafka and why it is important to Oracle, Java and IT profe...Introducing Apache Kafka and why it is important to Oracle, Java and IT profe...
Introducing Apache Kafka and why it is important to Oracle, Java and IT profe...
 
Scapy talk
Scapy talkScapy talk
Scapy talk
 
Fast Streaming into Clickhouse with Apache Pulsar
Fast Streaming into Clickhouse with Apache PulsarFast Streaming into Clickhouse with Apache Pulsar
Fast Streaming into Clickhouse with Apache Pulsar
 
Using the Splunk Java SDK
Using the Splunk Java SDKUsing the Splunk Java SDK
Using the Splunk Java SDK
 
Big mountain data and dev conference apache pulsar with mqtt for edge compu...
Big mountain data and dev conference   apache pulsar with mqtt for edge compu...Big mountain data and dev conference   apache pulsar with mqtt for edge compu...
Big mountain data and dev conference apache pulsar with mqtt for edge compu...
 
[Big Data Spain] Apache Spark Streaming + Kafka 0.10: an Integration Story
[Big Data Spain] Apache Spark Streaming + Kafka 0.10:  an Integration Story[Big Data Spain] Apache Spark Streaming + Kafka 0.10:  an Integration Story
[Big Data Spain] Apache Spark Streaming + Kafka 0.10: an Integration Story
 
Cloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azureCloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azure
 
DataConf.TW2018: Develop Kafka Streams Application on Your Laptop
DataConf.TW2018: Develop Kafka Streams Application on Your LaptopDataConf.TW2018: Develop Kafka Streams Application on Your Laptop
DataConf.TW2018: Develop Kafka Streams Application on Your Laptop
 
ApacheCon2022_Deep Dive into Building Streaming Applications with Apache Pulsar
ApacheCon2022_Deep Dive into Building Streaming Applications with Apache PulsarApacheCon2022_Deep Dive into Building Streaming Applications with Apache Pulsar
ApacheCon2022_Deep Dive into Building Streaming Applications with Apache Pulsar
 

Recently uploaded

WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 

Recently uploaded (20)

WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 

Livy: A REST Web Service for Spark

  • 1. Livy: A REST Web Service for Spark River IQ
  • 2. What is Livy? A Service that manages long running Spark Contexts in your cluster. • A Service which provides interaction with Apache Spark Cluster through Rest Interface. • Open Source Apache Licensed. • multi-tenant environment as it manages multiple Spark context efficiently. • Livy removes the need of Local Spark Environment due to which we can submit jobs from mobile or web environment. • Fine grained job submission. • Retrieve job results over REST asynchronously or synchronously. • Client APIs in java, Scala and soon in python.
  • 3. Features of Livy • Interactive Scala, Python, and R shells • Batch submissions in Scala, Java, Python • Can handle Multiple spark jobs at the same time. • Reliable for Multi-tenant executions. • Can be used for submitting jobs from anywhere with REST • Support Spark1/ Spark2, Scala 2.10/2.11 within one build. • It is 100% open source Apache Licensed API. • LIVY supports impersonation by which multiple users can share the same server. • For using Livy there is no need to change the existing code just instead of defining the spark context we have to use the predefined sparkcontext in LIVY. • Share Cached RDD’s or Dataframes between multiple jobs or clients.
  • 4. Jupyter-Spark Integration via Livy Sparkmagic is an open source library that Microsoft is incubating under the Jupyter Incubator program. Thousands of Spark clusters in production providing feedback to further improve the experience Architectural Advantages of Jupyter integration via Livy • Run Spark code completely remotely; no Spark components need to be • installed on the Jupyter server • Multi-language support; the Python, Scala and R kernels are equally feature-rich • Support for multiple endpoints; you can use a single notebook to start multiple Spark jobs in different languages and against different remote clusters • Easy integration with any Python library for data science or visualization, like Pandas or Plotly
  • 8. Interactive Session – Create Session 2 1 3 4 curl -X POST --data '{"kind": "spark"}' -H "Content-Type: application/json" localhost:8998/sessions {"state":"starting","proxyUser":”null","id":1,"kind":"spark","log":[]} Request Response Livy Client Livy Server Spark Interactive Session Spark Context
  • 9. Interactive Session – Execute Code {"id":0,"state":"running","output":null} Request Response curl http://localhost:8998/sessions/0/statements -X POST -H 'Content-Type: application/json' -d '{"code":"sc.parallelize(0 to 100).sum()"}' 2 1 3 4 Livy Client Livy Server Spark Interactive Session SparkContext
  • 10. SparkContext Sharing
    Diagram: Livy Server hosts two sessions. Client 1 and Client 2 both use Session-1 (SparkSession-1 and its SparkContext); Client 3 uses Session-2 (SparkSession-2 and its own SparkContext).
  • 11. Livy Security
    Diagram: Client → Livy Server over SPNEGO; Livy Server (with impersonation) → SparkSession over a shared secret.
    • Only authorized users can launch a Spark session or submit code
    • Each user can access only their own session
    • Only the Livy server can submit jobs to a Spark session, and it does so securely
  • 12. SPNEGO
    Diagram: Client (holding a Kerberos TGT) ↔ Livy Server (SPNEGO enabled).
    • Simple and Protected GSSAPI Negotiation Mechanism (SPNEGO), often pronounced "spen-go"
    • It is a GSSAPI "pseudo mechanism" used by client-server software to negotiate the choice of security technology.
    Exchange: (1) Client: HTTP GET http://site/a.html → (2) Server: 401 Unauthorized (WWW-Authenticate: Negotiate) → (3) Client: HTTP GET again with "Authorization: Negotiate" followed by its Kerberos service-ticket token → (4) Server: authorizes the user and returns the page.
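  • 13. Impersonation
    Diagram: Alice and Bob, each holding a Kerberos TGT, authenticate to Livy Server (running as super user livy) via SPNEGO. Livy Server launches a separate Spark Session for each of them, as that user, and communicates with each session using a shared secret.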
  • 14. Shared Secret
    • Livy Server generates a secret key
    • Livy Server passes the secret key to the Spark session when launching it
    • Both sides use the secret key to communicate with each other
    Diagram: Spark Session ↔ Livy Server, linked by the shared secret.
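The create and execute cycle on slides 8 and 9 can be driven from any HTTP client. Below is a minimal Python sketch, assuming an unsecured Livy server at http://localhost:8998 (no SPNEGO). Besides the two POST calls shown on the slides, it assumes Livy's GET /sessions/{id} and GET /sessions/{id}/statements/{statementId} endpoints and the state names "idle" and "available" from Livy's REST API. The LivyClient class, its method names and the pluggable transport are hypothetical helpers; the transport exists so the logic can be exercised without a server.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, Optional

# A transport sends (method, url, JSON body or None) and returns the decoded JSON reply.
Transport = Callable[[str, str, Optional[dict]], Dict[str, Any]]


def urllib_transport(method: str, url: str, body: Optional[dict]) -> Dict[str, Any]:
    """Send one JSON request with the standard library (no authentication)."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


class LivyClient:
    """Hypothetical helper wrapping Livy's session and statement endpoints."""

    def __init__(self, base_url: str = "http://localhost:8998",
                 transport: Transport = urllib_transport) -> None:
        self.base_url = base_url.rstrip("/")
        self.transport = transport

    def create_session(self, kind: str = "spark") -> Dict[str, Any]:
        # Same call as slide 8: POST /sessions with {"kind": ...}.
        # "pyspark" and "sparkr" are the other kinds mentioned in the notes.
        return self.transport("POST", f"{self.base_url}/sessions", {"kind": kind})

    def wait_until_idle(self, session_id: int, interval: float = 1.0,
                        timeout: float = 120.0) -> Dict[str, Any]:
        # A new session reports "starting"; poll GET /sessions/{id} until it settles.
        session = self._poll(f"{self.base_url}/sessions/{session_id}",
                             {"idle", "error", "dead", "killed"}, interval, timeout)
        if session["state"] != "idle":
            raise RuntimeError(f"session {session_id} ended as {session['state']!r}")
        return session

    def submit(self, session_id: int, code: str) -> Dict[str, Any]:
        # Same call as slide 9: POST /sessions/{id}/statements with {"code": ...}.
        url = f"{self.base_url}/sessions/{session_id}/statements"
        return self.transport("POST", url, {"code": code})

    def wait_for_output(self, session_id: int, statement_id: int,
                        interval: float = 1.0, timeout: float = 60.0) -> Dict[str, Any]:
        # The POST reply has "output": null while running, so poll the statement.
        url = f"{self.base_url}/sessions/{session_id}/statements/{statement_id}"
        return self._poll(url, {"available", "error", "cancelled"}, interval, timeout)

    def _poll(self, url: str, done: set, interval: float,
              timeout: float) -> Dict[str, Any]:
        # Repeat GET url until its "state" is in `done` or the timeout expires.
        deadline = time.monotonic() + timeout
        while True:
            obj = self.transport("GET", url, None)
            if obj["state"] in done:
                return obj
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{url} still in state {obj['state']!r}")
            time.sleep(interval)
```

Against a live server one would call create_session(), pass the returned id to wait_until_idle(), then call submit() and wait_for_output(); the finished statement carries the result in its output field. Because sessions are addressed only by id, two LivyClient objects that pass the same session_id to submit() reach the same Spark session and so share its SparkContext, which is the non-secure sharing shown on slide 10.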

Editor's Notes

  1. Now let’s talk about how Livy works for an interactive session. First, how Livy creates a session: before you submit any code, you need to create a session. Here we use the curl command to invoke the REST API. This is a POST request in which we specify the kind as spark (it can also be pyspark or sparkr), and we also need to specify the URL of the REST API. This is the response we get: it contains the state of the session, which here is starting, and the proxyUser, which is null. Now let’s see how that request is routed. First, the Livy client sends the request to the Livy server. Then the Livy server launches the session. After the Spark session is created, it sends its address back to the Livy server so that the two can establish a connection. Finally, the Livy server sends the session status back to the Livy client.
  2. Now let’s see how Livy executes code. Here’s the request we send: it contains the code we want to execute, and we also need to specify the REST API URL. Here’s the response, which contains the statement id, state, and output. Notice that the output is null, because the statement is still running when the response is returned; we can get the output later by sending another request that polls the statement status. Now let’s see how this request is routed. First, the Livy client sends the request to the Livy server. The Livy server forwards the request to its Spark session. The Spark session executes the code and sends the output back to the Livy server. Finally, the Livy server sends the output back to the Livy client.
  3. Now let’s talk about SparkContext sharing. Clients don’t own the Spark sessions; all Spark sessions are launched by the Livy server, and that is what makes SparkContext sharing possible. Here we can see that client 1 and client 2 use the same Spark session (session-1), while client 3 uses its own session (session-2). When a client interacts with the Livy server, it needs to specify the session id, so as long as clients specify the same session id, they are using the same SparkContext. Of course, this describes non-secure mode; secure mode is more complicated.
  4. Now let’s talk about security. There are mainly three security problems we need to solve. First, we need to make sure that only authorized users launch Spark sessions; we don’t want everyone to be able to launch a Spark session through the Livy server. Second, each user should be able to access only their own session. Third, only the Livy server should be able to submit jobs to a Spark session, and it must do so securely. To resolve these three problems we use several techniques: SPNEGO, impersonation, and a shared secret. I will talk about them one by one. SPNEGO is used between the Livy client and the Livy server; it makes sure that only authorized users can launch Spark sessions and submit code. Impersonation is used to make sure each user can access only their own session. Without impersonation, every Spark session is launched as the user who launched the Livy server process; with impersonation, the Spark session is launched as the client’s user. The shared secret protects the communication between the Livy server and the Spark session: only the Livy server and the Spark session know it.
  5. First let’s talk about SPNEGO. SPNEGO makes sure that only authorized users can launch Spark sessions and submit code to the Livy server. The full name of SPNEGO is Simple and Protected GSSAPI Negotiation Mechanism. It is a GSSAPI "pseudo mechanism" used by client-server software to negotiate the choice of security technology, so the underlying security technology is pluggable, but most often it is used with Kerberos. Now let’s see how it works. First, the client sends the request to the server. Then the server responds with status code 401, which means unauthorized. The client then sends the request to the server again, but this time it puts the Kerberos service ticket information in the request. Finally, the server authorizes the user with the ticket information and responds with the content of the page.
  6. The next thing is impersonation. We want to protect each user’s session: for security reasons, we don’t want user Alice to access user Bob’s session. The Livy server process is launched by the super user livy. Without impersonation, every Spark session is launched as user livy; with impersonation, the Spark session can be launched as the client’s user. This is very similar to impersonation in HiveServer2. To enable impersonation, we need to make configuration changes in core-site.xml that allow the livy user to act as a proxy user for other users (Hadoop’s hadoop.proxyuser.livy.hosts and hadoop.proxyuser.livy.groups properties).
  7. The next thing we will talk about is the shared secret. Once the Spark session is started, it can accept requests from outside, but we don’t want anyone except the Livy server to connect to the Spark session. So we use a shared secret to protect the communication between the Livy server and the Spark session; only the two of them know it. Now let’s see how that works. The Livy server generates a secret key. The Livy server passes the secret key to the Spark session when launching it. Then they use the secret key to communicate with each other.
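The four-step SPNEGO exchange in note 5 can be modeled on the client side. This is a toy sketch, not a real HTTP stack: send is any callable that performs one GET, make_token stands in for the GSSAPI step that turns a Kerberos TGT into a service-ticket token (a real client would use a GSSAPI/Kerberos binding, not shown), and demo_server with its VALID-TOKEN value is hypothetical. The "WWW-Authenticate: Negotiate" challenge and "Authorization: Negotiate <token>" header follow the standard HTTP Negotiate scheme (RFC 4559).

```python
from typing import Callable, Dict, Tuple

# Toy HTTP model: a response is (status code, headers, body).
Response = Tuple[int, Dict[str, str], str]
# send(url, request_headers) performs one GET and returns the response.
Send = Callable[[str, Dict[str, str]], Response]


def spnego_get(url: str, send: Send, make_token: Callable[[], str]) -> Response:
    """GET url, answering one Negotiate challenge (the 401 -> retry flow)."""
    # Step 1: plain request without credentials.
    status, headers, body = send(url, {})
    # Step 2: the server rejects it with 401 and offers the Negotiate scheme.
    # (Header names are matched case-sensitively here for simplicity;
    # real HTTP header names are case-insensitive.)
    challenge = headers.get("WWW-Authenticate", "")
    if status == 401 and challenge.lower().startswith("negotiate"):
        # Step 3: make_token is a stand-in for the GSSAPI call that produces
        # a base64 service-ticket token from the user's Kerberos TGT.
        token = make_token()
        status, headers, body = send(url, {"Authorization": f"Negotiate {token}"})
    # Step 4: if the token is accepted, the server answers 200 with the page.
    return status, headers, body


def demo_server(url: str, headers: Dict[str, str]) -> Response:
    """Hypothetical server that accepts exactly one token value."""
    if headers.get("Authorization") == "Negotiate VALID-TOKEN":
        return 200, {}, f"content of {url}"
    return 401, {"WWW-Authenticate": "Negotiate"}, "Unauthorized"
```

With demo_server, spnego_get("http://site/a.html", demo_server, lambda: "VALID-TOKEN") makes two requests and ends with status 200, while a wrong token leaves the client at 401.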
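Note 6 contrasts sessions launched as the livy super user with sessions launched as the client’s user. The toy Python model below shows that difference together with an owner check; it is not Livy’s code. SessionManager, Session and their fields are hypothetical. In Livy itself, impersonation is switched on through server configuration (livy.impersonation.enabled in livy.conf) together with the core-site.xml proxy-user settings.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Session:
    owner: str   # authenticated (e.g. SPNEGO) user who created the session
    run_as: str  # user the Spark session process runs as


@dataclass
class SessionManager:
    """Hypothetical model of a server process running as service_user."""
    service_user: str = "livy"
    impersonation: bool = True
    sessions: Dict[int, Session] = field(default_factory=dict)

    def launch(self, client_user: str) -> int:
        # With impersonation the session runs as the client; without it,
        # every session runs as the server's own (super) user.
        run_as = client_user if self.impersonation else self.service_user
        sid = len(self.sessions)
        self.sessions[sid] = Session(owner=client_user, run_as=run_as)
        return sid

    def can_access(self, user: str, sid: int) -> bool:
        # Only the user who created a session may submit code to it.
        session = self.sessions.get(sid)
        return session is not None and session.owner == user
```

With impersonation on, Alice’s session runs as alice and Bob’s as bob, so each Spark job carries its own user’s permissions; with it off, both run as livy.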
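The shared-secret steps in note 7 (generate a key, hand it over at launch, use it to communicate) can be illustrated with message authentication. This sketch swaps in HMAC-SHA256 from Python’s standard library as the technique; Livy’s actual server-to-session channel is not claimed to work this way, and SparkSessionStub is hypothetical. HMAC proves the sender knows the key but neither encrypts the message nor prevents replays, so a real protocol needs more than this.

```python
import hashlib
import hmac
import secrets


def generate_secret() -> bytes:
    """Server side: create a random 32-byte key before launching the session."""
    return secrets.token_bytes(32)


def sign(secret: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag showing the sender knows the secret."""
    return hmac.new(secret, message, hashlib.sha256).digest()


def verify(secret: bytes, message: bytes, tag: bytes) -> bool:
    """Check a tag using a constant-time comparison."""
    return hmac.compare_digest(sign(secret, message), tag)


class SparkSessionStub:
    """Hypothetical session that only accepts requests signed with its secret."""

    def __init__(self, secret: bytes) -> None:
        self._secret = secret  # handed over by the server at launch time

    def handle(self, message: bytes, tag: bytes) -> str:
        return "accepted" if verify(self._secret, message, tag) else "rejected"
```

A caller that does not hold the server’s key cannot produce a valid tag, so its requests are rejected even though the session accepts connections from outside.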