Livy is an open source REST interface for interacting with Apache Spark clusters. It allows submitting Spark jobs via REST from anywhere and manages Spark contexts. Key features include interactive shells for Scala, Python and R; batch job submission; handling multiple jobs simultaneously; and using existing code by interfacing with a predefined Spark context. Livy also integrates with Jupyter notebooks and supports sharing cached data between jobs. It provides security via user impersonation and communication encryption.
2. What is Livy?
• A service that manages long-running Spark contexts in your cluster.
• A service that provides interaction with an Apache Spark cluster through a REST interface.
• Open source, Apache licensed.
• Suited to multi-tenant environments, as it manages multiple Spark contexts efficiently.
• Livy removes the need for a local Spark environment, so jobs can be submitted from mobile or web environments.
• Fine-grained job submission.
• Retrieve job results over REST asynchronously or synchronously.
• Client APIs in Java and Scala, and soon in Python.
3. Features of Livy
• Interactive Scala, Python, and R shells
• Batch submissions in Scala, Java, Python
• Can handle multiple Spark jobs at the same time.
• Reliable for multi-tenant execution.
• Can be used to submit jobs from anywhere over REST.
• Supports Spark 1 / Spark 2 and Scala 2.10 / 2.11 within one build.
• It is a 100% open source, Apache-licensed API.
• Livy supports impersonation, by which multiple users can share the same server.
• Using Livy requires no changes to existing code; instead of defining a SparkContext, we use the predefined SparkContext in Livy.
• Share cached RDDs or DataFrames between multiple jobs or clients.
4. Jupyter-Spark Integration via Livy
Sparkmagic is an open source library that Microsoft is incubating under the Jupyter Incubator program. Thousands of Spark clusters in production provide feedback that further improves the experience.
Architectural Advantages of Jupyter integration via Livy
• Run Spark code completely remotely; no Spark components need to be installed on the Jupyter server
• Multi-language support; the Python, Scala and R kernels are equally feature-rich
• Support for multiple endpoints; you can use a single notebook to start multiple Spark jobs in different languages and
against different remote clusters
• Easy integration with any Python library for data science or visualization, like Pandas or Plotly
11. Livy Security
[Diagram: Client to Livy Server via SPNEGO; Livy Server (impersonation) to SparkSession via shared secret]
• Only authorized users can launch a Spark session / submit code
• Each user can access only their own session
• Only the Livy server can securely submit jobs to the Spark session
12. SPNEGO
[Diagram: Client (Kerberos TGT) and Livy Server (SPNEGO enabled)]
• Simple and Protected GSSAPI Negotiation Mechanism (SPNEGO), often pronounced "spen-go"
• It is a GSSAPI "pseudo mechanism" used by client-server software to negotiate the choice of security technology.
[Diagram: SPNEGO exchange between client and server]
• Client: HTTP GET http://site/a.html
• Server: Error 401 Unauthorized
• Client: HTTP GET request with Authorization: Negotiate
14. Shared Secret
• Livy server generates a secret key
• Livy server passes the secret key to the Spark session when launching it
• Both sides use the secret key to communicate with each other
[Diagram: Livy Server and Spark Session connected by the shared secret]
Editor's Notes
Now let’s talk about how Livy works for an interactive session.
First we will talk about how Livy creates a session. Before you submit any piece of code, you need to create a session.
Here we use the curl command to invoke the REST API. This is a POST request, and we specify the kind as spark (it can also be pyspark or sparkr); we also need to specify the URL of the REST API.
And this is the response we get. The response contains the state of the session, which here is starting, and the proxyUser, which is null.
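The session-creation request described above can be sketched in Python rather than curl. This is a minimal sketch, assuming a Livy server at the hypothetical address `http://localhost:8998` (8998 is Livy's default port); the helper names `build_create_session_request` and `create_session` are illustrative and not part of any Livy client library.

```python
import json
import urllib.request

# Hypothetical Livy endpoint; adjust to your own server.
LIVY_URL = "http://localhost:8998"


def build_create_session_request(kind="spark", base_url=LIVY_URL):
    """Build (but do not send) the POST /sessions request.

    `kind` selects the interpreter: "spark", "pyspark" or "sparkr".
    """
    body = json.dumps({"kind": kind}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/sessions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_session(kind="spark", base_url=LIVY_URL):
    """Send the request and return the parsed JSON response.

    Needs a reachable Livy server; the response is expected to carry
    fields such as the session state and proxyUser.
    """
    request = build_create_session_request(kind, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the method, URL and body without a running cluster.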
Now let’s see how that request is routed.
First the Livy client sends the request to the Livy server.
Then the Livy server launches the session.
After the Spark session is created, it sends its address back to the Livy server, so that a connection can be established between the Livy server and the Spark session.
And finally the Livy server sends the session status back to the Livy client.
Now let’s see how Livy executes code.
Here’s the request we send. It contains the code that we want to execute, and we also need to specify the REST API URL.
And here’s the response, which contains the statement id, state, and output. Notice that the output is null, because this piece of code won’t finish in a short time, but we can get the output later by sending another request that polls the statement status.
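The submit-then-poll pattern described above can be sketched as follows. This is a sketch under assumptions: the server address is hypothetical, the helper names are illustrative, and `fetch_statement` is any callable that returns the statement as a dict (for example by issuing a GET against `statement_url(...)`). The state names `available`, `error` and `cancelled` follow Livy's statement states.

```python
import json
import time
import urllib.request

LIVY_URL = "http://localhost:8998"  # hypothetical Livy endpoint


def build_statement_request(session_id, code, base_url=LIVY_URL):
    """Build (but do not send) POST /sessions/{id}/statements."""
    body = json.dumps({"code": code}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/sessions/{session_id}/statements",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def statement_url(session_id, statement_id, base_url=LIVY_URL):
    """URL to GET when polling a single statement."""
    return f"{base_url}/sessions/{session_id}/statements/{statement_id}"


def wait_for_output(fetch_statement, interval=1.0, max_polls=60):
    """Poll until the statement output is available, then return it."""
    for _ in range(max_polls):
        statement = fetch_statement()
        state = statement.get("state")
        if state == "available":
            return statement.get("output")
        if state in ("error", "cancelled"):
            raise RuntimeError(f"statement ended in state {state!r}")
        time.sleep(interval)
    raise TimeoutError("statement did not finish in time")
```

Passing the fetch function in as a parameter keeps the polling loop independent of the HTTP library used.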
Now let’s see how this request is routed.
First the Livy client sends the request to the Livy server.
The Livy server forwards the request to its Spark session.
The Spark session executes the code and sends the output back to the Livy server.
Finally the Livy server sends the output back to the Livy client.
Now let’s talk about SparkContext sharing.
Clients don’t own the Spark sessions; all Spark sessions are launched by the Livy server. That is what makes SparkContext sharing possible.
Here we can see that client-1 and client-2 use the same Spark session (session-1), while client-3 uses its own session (session-2).
When a client interacts with the Livy server, it needs to specify the session id, so as long as clients specify the same session id, they are using the same SparkContext. Of course this is for non-secure mode; it is more complicated in secure mode.
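The idea that sharing follows from the session id can be illustrated with a toy in-memory model. This is not Livy code: `ToySession` and `ToyLivyServer` are hypothetical classes, and a plain Python namespace stands in for the state held by a SparkContext.

```python
class ToySession:
    """Stands in for one server-owned Spark session."""

    def __init__(self, session_id):
        self.session_id = session_id
        self.namespace = {}  # plays the role of state held in the SparkContext

    def run(self, code):
        # Execute the submitted code inside this session's namespace.
        exec(code, self.namespace)


class ToyLivyServer:
    """Owns every session; clients only ever hold a session id."""

    def __init__(self):
        self._sessions = {}
        self._next_id = 1

    def create_session(self):
        session = ToySession(self._next_id)
        self._sessions[session.session_id] = session
        self._next_id += 1
        return session.session_id

    def run(self, session_id, code):
        self._sessions[session_id].run(code)

    def read(self, session_id, name):
        return self._sessions[session_id].namespace.get(name)


server = ToyLivyServer()
session_1 = server.create_session()  # used by client-1 and client-2
session_2 = server.create_session()  # used only by client-3

server.run(session_1, "cached = [1, 2, 3]")   # client-1 caches data
shared = server.read(session_1, "cached")     # client-2 sees the same data
isolated = server.read(session_2, "cached")   # client-3 sees nothing
```

Because the server holds the sessions, any client that names session-1 reaches the same state, while session-2 stays separate.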
Now let’s talk about security.
There are mainly 3 security problems we need to solve.
First, we need to make sure that only authorized users can launch a Spark session. We don’t want everyone to launch Spark sessions through the Livy server.
Second, each user should be able to access only their own session.
Third, only the Livy server should be able to securely submit jobs to the Spark session.
To solve these 3 problems we use several techniques: SPNEGO, impersonation and shared secret. I will talk about them one by one.
SPNEGO is used between the Livy client and the Livy server; it makes sure that only authorized users can launch a Spark session or submit code.
Impersonation is used to make sure each user can access only their own session. Without impersonation, every Spark session is launched as the user who launched the Livy server process, but with impersonation, the Spark session is launched as the user of the client.
And the shared secret is used to protect the communication between the Livy server and the Spark session; only the Livy server and the Spark session know the shared secret.
First let’s talk about SPNEGO.
SPNEGO makes sure that only authorized users can launch a Spark session or submit code to the Livy server.
The full name of SPNEGO is Simple and Protected GSSAPI Negotiation Mechanism.
It is a GSSAPI "pseudo mechanism" used by client-server software to negotiate the choice of security technology. So the underlying security technology is pluggable, but most often it is used with Kerberos.
Now let’s see how that works.
First the client sends the request to the server.
Then the server responds with status code 401, which means unauthorized.
And then the client sends the request to the server again, but this time it puts the Kerberos service ticket information into the request.
Finally the server authenticates the user with the ticket info and responds with the content of the page.
The next thing is impersonation.
We want to protect each user’s session.
We don’t want user Alice to access user Bob’s session, for security reasons. The Livy server process is launched by the superuser livy. Without impersonation, every Spark session is launched as user livy, but with impersonation, the Spark session can be launched as the user of the client.
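The effect of impersonation on who owns a session can be sketched as follows. This is a toy sketch, not Livy's implementation: `session_body` and `launched_as` are hypothetical helpers. The `proxyUser` field is the one that appears in the session response discussed earlier, and Livy's session-creation request accepts a field of the same name.

```python
import json


def session_body(kind="spark", proxy_user=None):
    """JSON body for a session-creation request, optionally naming a proxy user."""
    body = {"kind": kind}
    if proxy_user is not None:
        body["proxyUser"] = proxy_user
    return json.dumps(body)


def launched_as(server_user, client_user, impersonation_enabled):
    """Toy rule for which OS user the Spark session runs as."""
    return client_user if impersonation_enabled else server_user


# Without impersonation every session belongs to the server's user.
owner_without = launched_as("livy", "alice", impersonation_enabled=False)
# With impersonation the session belongs to the requesting user.
owner_with = launched_as("livy", "alice", impersonation_enabled=True)
```

With sessions owned by different users, the cluster's normal permission checks can keep Alice out of Bob's session.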
This is very similar to impersonation in HiveServer2. To enable impersonation, we need to make the following configuration changes in core-site.xml.
The next thing we will talk about is the shared secret.
Once the Spark session is started, it can accept requests from outside, but we don’t want anyone except the Livy server to connect to the Spark session.
So here we use the shared secret to protect the communication between the Livy server and the Spark session. Only the Livy server and the Spark session know the shared secret.
Now let’s see how that works.
The Livy server generates a secret key.
The Livy server passes the secret key to the Spark session when launching it.
Then they use the secret key to communicate with each other.
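The three steps above can be illustrated with a toy model built on an HMAC. This is a sketch under assumptions, not Livy's actual wire protocol: `ToySecretServer` and `ToySecuredSession` are hypothetical classes, and signing each message with HMAC-SHA256 is one simple way to show that only a holder of the secret is accepted.

```python
import hashlib
import hmac
import secrets


class ToySecuredSession:
    """Stands in for a Spark session that only trusts holders of the secret."""

    def __init__(self, secret):
        self._secret = secret

    def accept(self, message, tag):
        # Recompute the tag with our copy of the secret and compare safely.
        expected = hmac.new(self._secret, message, hashlib.sha256).digest()
        return hmac.compare_digest(expected, tag)


class ToySecretServer:
    """Stands in for the Livy server."""

    def __init__(self):
        # Step 1: the server generates the secret key.
        self._secret = secrets.token_bytes(32)

    def launch_session(self):
        # Step 2: the secret is handed over when the session is launched.
        return ToySecuredSession(self._secret)

    def sign(self, message):
        # Step 3: every message to the session carries a tag derived from the secret.
        return hmac.new(self._secret, message, hashlib.sha256).digest()


server = ToySecretServer()
session = server.launch_session()
message = b"run job"
accepted = session.accept(message, server.sign(message))
forged_tag = hmac.new(b"guessed-secret", message, hashlib.sha256).digest()
rejected = not session.accept(message, forged_tag)
```

A caller that does not know the secret cannot produce a matching tag, so the session ignores it.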