INTERACTIVE
SPARK IN YOUR
BROWSER
Romain Rigaux romain@cloudera.com
Erick Tryzelaar erickt@cloudera.com
GOAL
OF HUE
WEB INTERFACE FOR ANALYZING DATA
WITH APACHE HADOOP
SIMPLIFY AND INTEGRATE
FREE AND OPEN SOURCE
—> WEB “EXCEL” FOR HADOOP
VIEW FROM
30K FEET
Hadoop <-> Web Server <-> You, your colleagues and even that friend that uses IE9 ;)
WHY SPARK?
SIMPLER (PYTHON, STREAMING,
INTERACTIVE…)
OPENS UP DATA TO SCIENCE
SPARK —> MR
Apache Spark
Spark
Streaming
MLlib
(machine learning)
GraphX
(graph)
Spark SQL
WHY
IN HUE?
MARRIED WITH FULL HADOOP ECOSYSTEM
(Hive Tables, HDFS, Job Browser…)
WHY
IN HUE?
Multi user, YARN, Impersonation/Security
Not yet-another-app-to-install
...
HISTORY
V1: OOZIE
THE GOOD
• It works
THE BAD
• Submit through Oozie
• Slow
HISTORY
V2: SPARK IGNITER
THE GOOD
• It works better
THE BAD
• Compile Jar
• Batch
HISTORY
V3: NOTEBOOK
THE GOOD
• It works even better
• Scala / Python / R shells
• Jar / Py batches
• Notebook UI
• YARN
THE BAD
• Still new
GENERAL
ARCHITECTURE
Web part: Notebook with snippets
Backend part: Livy -> Spark / Spark / Spark, running on YARN
WEB
ARCHITECTURE
Browser -> AJAX -> Server
Snippets: Spark (Scala), Pig, Hive, …
Common API: create_session(), execute(), …
Specific APIs:
• Scala -> Livy over REST: /sessions, /sessions/{sessionId}/statements
• Hive -> HS2 over Thrift: OpenSession(), ExecuteStatement()
LIVY SPARK SERVER
• REST Web server in Scala
• Interactive Spark Sessions and Batch Jobs
• Type Introspection for Visualization
• Runs sessions in YARN or locally
• Backends: Scala, Python, R
• Open Source: https://github.com/cloudera/hue/tree/master/apps/spark/java
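Batch jobs use an endpoint parallel to the interactive sessions. A hedged sketch of a submission (the /batches route and payload fields are assumptions patterned on the session API shown later; the jar path and class name are illustrative):

% curl -XPOST localhost:8998/batches \
  -d '{"file": "hdfs:///user/hue/spark-examples.jar", "className": "org.apache.spark.examples.SparkPi"}'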
LIVY WEB SERVER
ARCHITECTURE
Livy Server: Scalatra, Session Manager, Session, Spark Client
YARN Master
YARN Node: Spark Interpreter + Spark Context
YARN Node: Spark Worker
YARN Node: Spark Worker
Build sequence (steps 1 to 7): the client POSTs to create a session; Livy asks the YARN master for resources; once nodes are allocated, an interpreter starts on one of them and creates the Spark Context; the session signals back that it is ready; the client then POSTs code straight to the Livy server.
SESSION CREATION
AND EXECUTION
% curl -XPOST localhost:8998/sessions \
  -d '{"kind": "spark"}'
{
"id": 0,
"kind": "spark",
"log": [...],
"state": "idle"
}
% curl -XPOST localhost:8998/sessions/0/statements -d '{"code": "1+1"}'
{
"id": 0,
"output": {
"data": { "text/plain": "res0: Int = 2" },
"execution_count": 0,
"status": "ok"
},
"state": "available"
}
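Statement execution is asynchronous, so a client polls until the state flips to available. A sketch, assuming the GET and DELETE routes mirror the POST paths above:

% curl localhost:8998/sessions/0/statements    # poll for statement states and results
% curl -XDELETE localhost:8998/sessions/0      # tear the session down when finished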
LIVY INTERPRETERS
Scala, Python, R…
INTERPRETERS
• Pipe stdin/stdout to a running shell
• Execute the code / send to Spark workers
• Perform magic operations
• One interpreter per language
• “Swappable” with other kernels (Python, Spark…)
Interpreter
> println(1 + 1)
2
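A minimal sketch of that stdin/stdout piping idea, assuming a scala binary on the PATH (real interpreters embed the REPL in-process rather than shelling out):

import java.io.{BufferedReader, InputStreamReader, PrintWriter}

object PipeToRepl extends App {
  // Spawn a child REPL and grab both ends of its pipe
  val proc = new ProcessBuilder("scala").redirectErrorStream(true).start()
  val toRepl = new PrintWriter(proc.getOutputStream, true)
  val fromRepl = new BufferedReader(new InputStreamReader(proc.getInputStream))

  toRepl.println("println(1 + 1)") // send one line of code
  toRepl.println(":quit")          // ask the REPL to exit

  // Echo everything the child printed until its stdout closes
  Iterator.continually(fromRepl.readLine()).takeWhile(_ != null).foreach(println)
}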
INTERPRETER FLOW
curl / Hue -> Livy Server -> Livy Session -> Interpreter
"1+1" travels down the chain; the interpreter evaluates it to 2, and the result comes back wrapped as:
{
  "data": {
    "application/json": "2"
  }
}
INTERPRETER FLOW CHART
Receive lines -> Split lines, then for each line:
• Magic line? Yes -> Magic! No -> Execute Line
• Execution result: Success -> check for remaining lines; Incomplete -> merge with next line; Error -> send output to server
• Lines left? Yes -> next line; No -> send output to server
LIVY INTERPRETERS
trait Interpreter {
def state: State
def execute(code: String): Future[JValue]
def close(): Unit
}
sealed trait State
case class NotStarted() extends State
case class Starting() extends State
case class Idle() extends State
case class Running() extends State
case class Busy() extends State
case class Error() extends State
case class ShuttingDown() extends State
case class Dead() extends State
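The trait is small enough that a toy implementation fits in a few lines. A hypothetical EchoInterpreter, not part of Livy, that simply hands the code back as a text/plain result (assumes json4s, which supplies the JValue type above):

import scala.concurrent.Future
import org.json4s.{JObject, JString, JValue}

class EchoInterpreter extends Interpreter {
  private var _state: State = Idle()

  def state: State = _state

  def execute(code: String): Future[JValue] = {
    // Wrap the code itself as if it were a text/plain result
    val result: JValue = JObject("data" -> JObject("text/plain" -> JString(code)))
    Future.successful(result)
  }

  def close(): Unit = { _state = Dead() }
}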
SPARK INTERPRETER
class SparkInterpreter extends Interpreter {
…
private var _state: State = NotStarted()
private val outputStream = new ByteArrayOutputStream()
private var sparkIMain: SparkIMain = _
def start() = {
...
_state = Starting()
sparkIMain = new SparkIMain(new Settings(), new JPrintWriter(outputStream, true))
sparkIMain.initializeSynchronous()
...
SPARK INTERPRETER
private var sparkContext: SparkContext = _
def start() = {
...
val sparkConf = new SparkConf(true)
sparkContext = new SparkContext(sparkConf)
sparkIMain.beQuietDuring {
sparkIMain.bind("sc", "org.apache.spark.SparkContext",
sparkContext, List("""@transient"""))
}
_state = Idle()
}
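With sc bound, any statement sent to the session can use the context directly, e.g. (illustrative):

> sc.parallelize(1 to 10).sum()
res0: Double = 55.0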
EXECUTING SPARK
private def executeLine(code: String): ExecuteResult = {
code match {
case MAGIC_REGEX(magic, rest) =>
executeMagic(magic, rest)
case _ =>
scala.Console.withOut(outputStream) {
sparkIMain.interpret(code) match {
case Results.Success => ExecuteComplete(readStdout())
case Results.Incomplete => ExecuteIncomplete(readStdout())
case Results.Error => ExecuteError(readStdout())
}
...
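The Incomplete result is what makes multi-line input work: the interpreter buffers an unterminated statement and merges it with the following line until it parses, just as spark-shell does:

scala> val x = {
     |   1 + 1
     | }
x: Int = 2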
INTERPRETER MAGIC
private val MAGIC_REGEX = """^%(\w+)\W*(.*)""".r
private def executeMagic(magic: String, rest: String): ExecuteResponse = {
magic match {
case "json" => executeJsonMagic(rest)
case "table" => executeTableMagic(rest)
case _ => ExecuteError(f"Unknown magic command $magic")
}
}
case "json" => executeJsonMagic(rest)
case "table" => executeTableMagic(rest)
case _ => ExecuteError(f"Unknown magic command $magic")
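A quick REPL check of how the regex splits a magic line into a command and its argument:

scala> val MAGIC_REGEX = """^%(\w+)\W*(.*)""".r
scala> "%json counts" match { case MAGIC_REGEX(magic, rest) => (magic, rest) }
res1: (String, String) = (json,counts)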
INTERPRETER MAGIC
private def executeJsonMagic(name: String): ExecuteResponse = {
sparkIMain.valueOfTerm(name) match {
case Some(value: RDD[_]) => ExecuteMagic(Extraction.decompose(Map(
"application/json" -> value.asInstanceOf[RDD[_]].take(10))))
case Some(value) => ExecuteMagic(Extraction.decompose(Map(
"application/json" -> value)))
case None => ExecuteError(f"Value $name does not exist")
}
}
TABLE MAGIC
val lines = sc.textFile("shakespeare.txt")
val counts = lines.
  flatMap(line => line.split(" ")).
  map(word => (word, 1)).
  reduceByKey(_ + _).
  sortBy(-_._2).
  map { case (w, c) =>
    Map("word" -> w, "count" -> c)
  }

%table counts

"application/vnd.livy.table.v1+json": {
  "headers": [
    { "name": "count", "type": "BIGINT_TYPE" },
    { "name": "name", "type": "STRING_TYPE" }
  ],
  "data": [
    [ 23407, "the" ],
    [ 19540, "I" ],
    [ 18358, "and" ],
    ...
  ]
}
JSON MAGIC
val lines = sc.textFile("shakespeare.txt")
val counts = lines.
  flatMap(line => line.split(" ")).
  map(word => (word, 1)).
  reduceByKey(_ + _).
  sortBy(-_._2).
  map { case (w, c) =>
    Map("word" -> w, "count" -> c)
  }

%json counts

{
  "id": 0,
  "output": {
    "application/json": [
      { "count": 506610, "word": "" },
      { "count": 23407, "word": "the" },
      { "count": 19540, "word": "I" },
      ...
    ]
  ...
}
COMING SOON
• Stability and Scaling
• Security
• IPython/Jupyter backends and file format
DEMO
TIME
TWITTER
@gethue
USER GROUP
hue-user@
WEBSITE
http://gethue.com
LEARN
http://learn.gethue.com
THANKS!
Editor's Notes
1. Why do we want to do this? Currently it’s difficult to visualize results from Spark. Spark has a great interactive tool called “spark-shell” that allows you to interact with large datasets on the command line. For example, here is a session where we are counting the words used by Shakespeare. Running this computation is easy, but spark-shell doesn’t provide any tools for visualizing the results.
  2. One option is to save the output to a file, then use a tool like Hue to import it into a Hive table and visualize it. We are obviously big fans of Hue, but there are still too many steps to go through to get to this point. If we want to change the script, say to filter out words like “the” and “and”, we need to go back to the shell, rerun our code snippet, save it to a file, then reimport it into the UI. It’s a slow process.
3. Multi-language; inherits Hue’s sharing and export/import.
  4. Hello, I’m Erick Tryzelaar, and I’m going to talk about the Livy Spark Server, which is our backend for Hue’s Notebook application.
  5. Livy is a REST web server that allows a tool like Hue to interactively execute scala and spark commands, just like spark-shell. It goes beyond it by adding type introspection, which allows a frontend like Hue to render results in interactive visualizations. Furthermore it allows sessions to be run inside YARN to support horizontally scaling out to hundreds of active sessions. It also supports a Python and R backend. Finally, it’s fully open source, and currently being developed in Hue.
6. (Repeated across the architecture build slides.) The Livy server is built upon Scalatra and Jetty. Creating a session is as simple as POSTing to a particular URL. Behind the scenes, Livy communicates with the YARN master to allocate nodes on which to launch the interactive sessions. This is all done asynchronously, as there is no telling when resources will be available to run the sessions. Once the nodes have been allocated, Livy starts an interpreter on one of them, which takes care of creating the Spark Context that actually runs the Spark operations. After setup, the session signals to the Livy server that it is ready for commands. At that point, the client can simply POST its code to a URL on the Livy server.
  13. The Livy server is built upon Scalatra and Jetty. Creating a session is as simple as POSTing to a particular URL. Behind the scenes, livy will communicate with the YARN master to allocate some nodes to launch the interactive sessions. This is all done asynchronously as there’s not telling when there will be resources available to run the sessions. Once the nodes have been allocated, Livy will start an intepreter on one of the nodes which takes care of creating the Spark Context, which actually runs the spark operations. After it’s setup, the session signals to the Livy Server that it’s ready for commands. At that point, the Client can simply POST their code to a url on the livy server.
  14. Let’s see it in action. On the left we create a “spark” session. You could also specify “pyspark” or “sparkR” here if you want those session types. On the right we execute simple math in the session itself.
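  For example, requesting a Python session only changes the “kind” field in the creation request (a minimal sketch; the trimmed response and its “starting” state are assumptions):

    % curl -XPOST localhost:8998/sessions -d '{"kind": "pyspark"}'
    {
      "id": 1,
      "kind": "pyspark",
      "state": "starting"
    }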
  15. We don’t have too much time to drill down into the code, but we did want to take this moment to at least dive into how the interpreters work.
  16. Livy’s interpreters are conceptually very simple devices. They take in one or more lines of code and execute them in a shell environment. These shells perform the computation and interact with the Spark environment. They’re also abstract: as I mentioned earlier, Livy currently has 3 languages built into it, Scala, Python and R, with more to come.
  17. Here is the interpreter loop that Livy manages. First we split up the lines and feed them one at a time into the interpreter. If the line is a regular, non-magic line, it gets executed and the result can be in one of three states: success, where we continue with the next line; incomplete, where the input is not a complete statement, such as an “if” statement with an open bracket; or an error, which stops the execution of the remaining lines. The other case is magic lines, which are special commands to the interpreter itself, for example asking the interpreter to convert a value into a JSON type.
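  To make the loop concrete, here is a minimal sketch in Scala. The names (Interpreter, ExecuteResponse and friends) are simplified stand-ins for illustration, not Livy’s actual classes:

    sealed trait ExecuteResponse
    case class Success(output: String) extends ExecuteResponse
    case object Incomplete extends ExecuteResponse
    case class Error(message: String) extends ExecuteResponse

    trait Interpreter {
      def execute(code: String): ExecuteResponse
      def executeMagic(command: String, argument: String): ExecuteResponse
    }

    def run(interpreter: Interpreter, code: String): ExecuteResponse = {
      var pending = ""                      // buffers an incomplete statement
      var last: ExecuteResponse = Success("")
      for (line <- code.split("\n")) {
        if (line.startsWith("%") && pending.isEmpty) {
          // A magic line such as "%json counts" is a command to the
          // interpreter itself rather than code to evaluate.
          val parts = line.stripPrefix("%").split(" ", 2)
          last = interpreter.executeMagic(parts(0),
            if (parts.length > 1) parts(1) else "")
        } else {
          pending = if (pending.isEmpty) line else pending + "\n" + line
          interpreter.execute(pending) match {
            case Incomplete => // not a complete statement yet: wait for the next line
            case result     => pending = ""; last = result
          }
        }
        last match {
          case e: Error => return e         // an error stops the whole batch
          case _        =>                  // otherwise keep going
        }
      }
      last
    }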
  18. Now for some code. As we saw earlier, the interpreter is a simple state machine that executes code and eventually produces JSON responses by way of a Future.
  20. In order to implement this interface, the Spark interpreter first needs to create the real interpreter, SparkIMain. It’s pretty simple to create: we just need to construct it with a buffer that acts as the interpreter’s standard output.
  21. Once the SparkIMain has been initialized, we need to create the Spark Context that communicates with all of the spark workers. Injecting this variable into the interpreter is quite simple with this “bind” method.
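  Condensing those two steps into one sketch, roughly following the Spark 1.x REPL API (the setup around SparkIMain is simplified, and the app configuration is illustrative):

    import java.io.{PrintWriter, StringWriter}
    import scala.tools.nsc.Settings
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.repl.SparkIMain

    // A buffer that stands in for the interpreter's standard output,
    // so we can capture what each statement prints.
    val outputBuffer = new StringWriter()

    val settings = new Settings()
    settings.usejavacp.value = true

    // The "real" interpreter: Spark's fork of the Scala REPL.
    val interpreter = new SparkIMain(settings, new PrintWriter(outputBuffer))
    interpreter.initializeSynchronous()

    // Create the Spark Context (master, executors and so on come from
    // the environment) and inject it into the interpreter as "sc",
    // just like spark-shell does.
    val sc = new SparkContext(new SparkConf().setAppName("livy-session"))
    interpreter.bind("sc", "org.apache.spark.SparkContext", sc)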
  22. Now that the session is up and running, we can execute code inside of it. I’ve skipped some of the other bookkeeping in order to show the actual heart of the execution here; ignore the magic case for the moment. Execution is also quite simple: we first temporarily replace standard out with our buffer, and then have the interpreter execute the code. There are three conditions for the response: first, the command executed successfully; second, the code is incomplete, perhaps because it has an open parenthesis; finally, an error if some exception occurred. Altogether quite simple, and it doesn’t require any changes to Spark.
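  In sketch form, continuing with the interpreter from the previous snippet and the ExecuteResponse type from the loop sketch (Results comes from the Scala REPL API; the buffer helper is illustrative):

    import java.io.ByteArrayOutputStream
    import scala.tools.nsc.interpreter.Results

    val outputStream = new ByteArrayOutputStream()

    // Drain whatever the statement printed so it can be returned to the client.
    def readAndClearOutput(): String = {
      val output = outputStream.toString("UTF-8")
      outputStream.reset()
      output
    }

    def executeCode(code: String): ExecuteResponse =
      // Temporarily point standard out at our buffer while interpreting.
      scala.Console.withOut(outputStream) {
        interpreter.interpret(code)
      } match {
        case Results.Success    => Success(readAndClearOutput())
        case Results.Incomplete => Incomplete  // e.g. an open parenthesis
        case Results.Error      => Error(readAndClearOutput())
      }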
  23. And now the magic. I mentioned earlier that Livy supports type introspection. The way it does this is through in-band magic commands, which start with a percent sign. The Spark interpreter currently supports two magic commands, “json” and “table”. The “json” magic converts any type into a JSON value, and “table” converts any type into a table-ish object that’s used for our visualizations.
  24. Here is our json magic. It takes advantage of json4s’s Extraction.decompose to try to convert values. We special-case RDDs since they can’t be directly transformed into JSON; instead we just pull out the first 10 items so we can at least show something.
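  In sketch form (simplified, but built around the same Extraction.decompose call):

    import org.json4s.{DefaultFormats, Extraction, JValue}
    import org.apache.spark.rdd.RDD

    implicit val formats = DefaultFormats

    // Convert an arbitrary interpreter value to JSON. RDDs can't be
    // decomposed directly, so take the first 10 elements as a preview.
    def jsonMagic(value: Any): JValue = value match {
      case rdd: RDD[_] => Extraction.decompose(rdd.take(10))
      case other       => Extraction.decompose(other)
    }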
  25. The table magic does something similar, but it’s a bit too large to compress into slides. We’ll see its results next.
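  The full version doesn’t fit on a slide, but the core idea can be sketched in a few lines. This is a heavy simplification that only handles sequences of Maps, and the “headers”/“rows” field names are made up for illustration:

    // Derive column headers from the first row, then emit each row's
    // values in header order so the frontend can draw a table.
    def tableMagic(rows: Seq[Map[String, Any]]): Map[String, Any] = {
      val headers = rows.headOption.map(_.keys.toSeq).getOrElse(Seq.empty)
      Map(
        "headers" -> headers,
        "rows"    -> rows.map(row => headers.map(h => row.getOrElse(h, null)))
      )
    }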
  27. Finally, here it is in action. Here we’re taking our Shakespeare code from earlier. If we run this snippet inside Livy, it returns an output MIME type of application/json, with the results inlined in the output without extra encoding.
  29. Fingers crossed, for a lot of reasons: this demo runs off master, and the VM was broken until 4 AM. Next: learn more.