BUILDING A REST JOB SERVER

FOR INTERACTIVE SPARK
AS A SERVICE
Romain Rigaux - Cloudera
Erick Tryzelaar - Cloudera
Building a REST Job Server for interactive Spark as a service by Romain Rigaux and Erick Tryzelaar

  1. BUILDING A REST JOB SERVER FOR INTERACTIVE SPARK AS A SERVICE
      Romain Rigaux - Cloudera
      Erick Tryzelaar - Cloudera
  2. WHY?
  3. WHY SPARK AS A SERVICE?
      • NOTEBOOKS
      • EASY ACCESS FROM ANYWHERE
      • SHARE SPARK CONTEXTS AND RDDs
      • BUILD APPS
      • SPARK MAGIC
      • …
  4. WHY SPARK IN HUE?
      • MARRIED WITH FULL HADOOP ECOSYSTEM
  5. HISTORY V1: OOZIE
      THE GOOD
      • It works
      • Code snippet
      THE BAD
      • Submit through Oozie
      • Shell action
      • Very slow
      • Batch
      [diagram labels: workflow.xml, snippet.py, stdout]
  6. HISTORY V2: SPARK IGNITER
      THE GOOD
      • It works better
      THE BAD
      • Compiler Jar
      • Batch only, no shell
      • No Python, R
      • Security
      • Single point of failure
      [diagram labels: Compile, Implement, Upload, json output, Batch, Scala jar, Ooyala]
  7. HISTORY V3: NOTEBOOK
      THE GOOD
      • Like spark-submit / spark shells
      • Scala / Python / R shells
      • Jar / Python batch jobs
      • Notebook UI
      • YARN
      THE BAD
      • Beta?
      [diagram labels: Livy, code snippet, batch]
  8-9. GENERAL ARCHITECTURE
      [diagram labels: Livy, API, Spark (three instances), YARN]
  10. LIVY SPARK SERVER
  11. LIVY SPARK SERVER
      • REST web server in Scala for Spark submissions
      • Interactive shell sessions or batch jobs
      • Backends: Scala, Java, Python, R
      • No dependency on Hue
      • Open Source: https://github.com/cloudera/hue/tree/master/apps/spark/java
      • Read about it: http://gethue.com/spark/
  12. ARCHITECTURE
      • Standard web service: wrapper around spark-submit / Spark shells
      • YARN mode, Spark drivers run inside the cluster (supports crashes)
      • No need to inherit any interface or compile code
      • Extended to work with additional backends
  13. LIVY WEB SERVER ARCHITECTURE
      • LOCAL "DEV" MODE
      • YARN MODE
  14-20. LOCAL MODE
      [diagram built up over numbered steps 1 to 5; components shown: Spark Client, Livy Server, Scalatra, Session Manager, Session, Spark Interpreter, Spark Context]
  21. YARN-CLUSTER MODE
      • PRODUCTION
      • SCALABLE
  22-29. YARN-CLUSTER MODE
      [diagram built up over numbered steps 1 to 7; components shown: Spark Client, Livy Server, Scalatra, Session Manager, Session, YARN Master, one YARN Node with the Spark Context, two YARN Nodes with Spark Workers, Spark Interpreter]
  30. SESSION CREATION AND EXECUTION
      % curl -XPOST localhost:8998/sessions -d '{"kind": "spark"}'
      {
        "id": 0,
        "kind": "spark",
        "log": [...],
        "state": "idle"
      }
      % curl -XPOST localhost:8998/sessions/0/statements -d '{"code": "1+1"}'
      {
        "id": 0,
        "output": {
          "data": { "text/plain": "res0: Int = 2" },
          "execution_count": 0,
          "status": "ok"
        },
        "state": "available"
      }
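The two curl calls on slide 30 can also be prepared from Python. The sketch below is only an illustration under stated assumptions: the helper names `post_request` and `send` are hypothetical, the host and port are copied from the slide, and the `Content-Type` header is an assumption about what the server expects. Nothing is sent when the block runs; `send` needs a running server.

```python
import json
import urllib.request

LIVY_URL = "http://localhost:8998"  # host and port copied from the slide's curl calls


def post_request(path, payload):
    """Build (but do not send) a JSON POST request for the given endpoint path."""
    return urllib.request.Request(
        LIVY_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


def send(request):
    """Send a prepared request and decode the JSON reply (needs a live server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# The same two calls as on the slide: create a session, then run a statement in it.
create_session = post_request("/sessions", {"kind": "spark"})
run_statement = post_request("/sessions/0/statements", {"code": "1+1"})

print(create_session.get_method(), create_session.full_url)
print(run_statement.get_method(), run_statement.full_url)
```

Against a live server, `send(create_session)` would return the session description and `send(run_statement)` the statement result, in the shapes shown on the slide.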
  31. BATCH OR INTERACTIVE
      [diagram labels: Jar, Py, /batches; Scala, Python, R, /sessions; Livy; Spark (three instances); YARN]
  32. SHELL OR BATCH?
      [diagram: YARN-cluster layout; the YARN Node with the Spark Context also shows a Spark Interpreter]
  33. SHELL
      [diagram: YARN-cluster layout; the YARN Node with the Spark Context shows pyspark]
  34. BATCH
      [diagram: YARN-cluster layout; the YARN Node with the Spark Context shows spark-submit]
  35. LIVY INTERPRETERS
      Scala, Python, R…
  36. REMEMBER?
      [diagram: YARN-cluster layout with Spark Client, Livy Server, Scalatra, Session Manager, Session, YARN Master, YARN Nodes with Spark Context and Spark Workers, Spark Interpreter]
  37. INTERPRETERS
      • Pipe stdin/stdout to a running shell
      • Execute the code / send to Spark workers
      • Perform magic operations
      • One interpreter per language
      • "Swappable" with other kernels (python, spark..)
      [diagram: the Interpreter receives "> println(1 + 1)" and returns "2"]
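The first bullet on slide 37, piping stdin/stdout to a shell, can be illustrated with a minimal sketch. This is an assumption-laden simplification, not Livy's implementation: it starts a fresh child Python interpreter for each call instead of keeping one shell running, and the function name `run_in_child_interpreter` is hypothetical.

```python
import subprocess
import sys


def run_in_child_interpreter(code):
    """Pipe code to a child Python interpreter's stdin and capture what it prints."""
    completed = subprocess.run(
        [sys.executable, "-"],  # "-" tells Python to read the program from stdin
        input=code,
        capture_output=True,
        text=True,
        timeout=30,
    )
    return completed.stdout, completed.stderr


# Python counterpart of the slide's println(1 + 1) example.
stdout, stderr = run_in_child_interpreter("print(1 + 1)\n")
print(stdout.strip())
```

A long-lived session would instead keep the child process open and exchange one chunk of code at a time, which is what lets state such as a Spark context survive between statements.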
  38-45. INTERPRETER FLOW
      [diagram built up step by step between the Livy Server, the Interpreter and Magic]
      > 1 + 1
      {"code": "1+1"}
      1+1
      2
      { "data": { "application/json": "2" } }
      2
  46. INTERPRETER FLOW CHART
      [flow chart, caption: Example of parsing]
      Boxes: Receive lines; Split into Chunks; Execute Chunk; Magic!; Send output to server; Send error to server
      Decisions (Yes / No): Chunks left?; Magic chunk?; Success
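The flow chart on slide 46 can be sketched as a small loop. The ordering of the boxes is an assumption, since the arrows did not survive in the transcript: here every chunk is run in turn, a chunk that starts with `%` is treated as a magic chunk, and the first failure is reported as an error. The names `split_into_chunks`, `run_chunks`, `execute` and `magic` are hypothetical, and the toy handlers stand in for a real interpreter.

```python
def split_into_chunks(lines):
    """Group consecutive code lines; each line starting with '%' is its own magic chunk."""
    chunks, current = [], []
    for line in lines:
        if line.strip().startswith("%"):
            if current:
                chunks.append(("code", "\n".join(current)))
                current = []
            chunks.append(("magic", line.strip()))
        elif line.strip():
            current.append(line)
    if current:
        chunks.append(("code", "\n".join(current)))
    return chunks


def run_chunks(lines, execute, magic):
    """Run chunks in order; return ('ok', outputs) or ('error', message)."""
    outputs = []
    for kind, text in split_into_chunks(lines):
        handler = magic if kind == "magic" else execute
        try:
            outputs.append(handler(text))
        except Exception as error:  # the first failure becomes the error sent back
            return ("error", str(error))
    return ("ok", outputs)


# Toy handlers: plain code runs in a shared namespace, magic looks a name up in it.
namespace = {}


def execute(code):
    exec(code, namespace)
    return None


def magic(line):
    name, term = line[1:].split(None, 1)
    return {name: namespace[term]}


result = run_chunks(["counts = [1, 2]", "total = sum(counts)", "%json total"], execute, magic)
print(result)
```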
  47. INTERPRETER MAGIC
      • table
      • json
      • plotting
      • ...
  48. NO MAGIC
      > 1 + 1
      Interpreter: sparkIMain.interpret("1+1")
      {
        "id": 0,
        "output": {
          "application/json": 2
        }
      }
  49-50. JSON MAGIC
      val lines = sc.textFile("shakespeare.txt");
      val counts = lines.
        flatMap(line => line.split(" ")).
        map(word => (word, 1)).
        reduceByKey(_ + _).
        sortBy(-_._2).
        map { case (w, c) => Map("word" -> w, "count" -> c) }
      %json counts
      > counts
      [('', 506610), ('the', 23407), ('I', 19540)... ]
      Interpreter: sparkIMain.valueOfTerm("counts").toJson()
      {
        "id": 0,
        "output": {
          "application/json": [
            { "count": 506610, "word": "" },
            { "count": 23407, "word": "the" },
            { "count": 19540, "word": "I" },
            ...
          ]
          ...
      }
  51-52. TABLE MAGIC
      val lines = sc.textFile("shakespeare.txt");
      val counts = lines.
        flatMap(line => line.split(" ")).
        map(word => (word, 1)).
        reduceByKey(_ + _).
        sortBy(-_._2).
        map { case (w, c) => Map("word" -> w, "count" -> c) }
      %table counts
      > counts
      [('', 506610), ('the', 23407), ('I', 19540)... ]
      Interpreter: sparkIMain.valueOfTerm("counts").guessHeaders().toList()
      "application/vnd.livy.table.v1+json": {
        "headers": [
          { "name": "count", "type": "BIGINT_TYPE" },
          { "name": "name", "type": "STRING_TYPE" }
        ],
        "data": [
          [ 23407, "the" ],
          [ 19540, "I" ],
          [ 18358, "and" ],
          ...
        ]
      }
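The header guessing behind the table magic can be sketched in a few lines of Python. This is a hypothetical illustration, not the Scala code behind `guessHeaders()`: the functions `guess_headers` and `to_table` are made up for this example, only the two type names that appear on the slide are used, and anything that is not an integer is assumed to fall back to `STRING_TYPE`.

```python
def guess_headers(rows):
    """Infer one header per key, typed from the values found under that key."""
    names = sorted({key for row in rows for key in row})
    headers = []
    for name in names:
        values = [row[name] for row in rows if name in row]
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_int = all(isinstance(v, int) and not isinstance(v, bool) for v in values)
        headers.append({"name": name, "type": "BIGINT_TYPE" if is_int else "STRING_TYPE"})
    return headers


def to_table(rows):
    """Wrap rows in the table MIME type shown on the slide."""
    headers = guess_headers(rows)
    data = [[row.get(header["name"]) for header in headers] for row in rows]
    return {"application/vnd.livy.table.v1+json": {"headers": headers, "data": data}}


counts = [{"word": "the", "count": 23407}, {"word": "I", "count": 19540}]
table = to_table(counts)
print(table)
```

Sorting the keys keeps the column order stable across rows, which matters because each data row is a plain list without names.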
  53-57. PLOT MAGIC
      ...
      barplot(sorted_data$count, names.arg=sorted_data$value, main="Resource hits", las=2, col=colfunc(nrow(sorted_data)), ylim=c(0,300))
      > png('/tmp/..')
      > barplot
      > dev.off()
      Interpreter: sparkIMain.interpret("png('/tmp/plot.png') barplot dev.off()")
      File('/tmp/plot.png').read().toBase64()
      { "data": { "image/png": "iVBORw0KGgoAAAANSUhEU ... } ... }
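The last step of the plot magic, reading the rendered file and returning it as base64, can be sketched as follows. The function name `image_output` is hypothetical, and a temporary file holding only the 8-byte PNG signature stands in for a real plot so that the block runs without R.

```python
import base64
import os
import tempfile


def image_output(path):
    """Read an image file and wrap its base64 text under the image/png key."""
    with open(path, "rb") as handle:
        encoded = base64.b64encode(handle.read()).decode("ascii")
    return {"data": {"image/png": encoded}}


# Stand-in for /tmp/plot.png: a file that contains just the PNG signature bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as tmp:
    tmp.write(PNG_SIGNATURE)
    plot_path = tmp.name

result = image_output(plot_path)
os.remove(plot_path)
print(result["data"]["image/png"])
```

The encoded signature begins with the same characters as the slide's `iVBORw0KGgo...` payload, since every PNG file starts with those bytes.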
  58. PLUGGABLE INTERPRETERS
      • Pluggable backends
      • Livy's Spark backends
        – Scala
        – pyspark
        – R
      • IPython / Jupyter support coming soon
  59. JUPYTER BACKEND
      • Re-using it
      • Generic framework for interpreters
      • 51 kernels

  60. SPARK AS A SERVICE
  61. REMEMBER AGAIN?
      [diagram: YARN-cluster layout with Spark Client, Livy Server, Scalatra, Session Manager, Session, YARN Master, YARN Nodes with Spark Context and Spark Workers, Spark Interpreter]
  62. MULTI USERS
      [diagram: three Spark Clients; Livy Server (Scalatra, Session Manager, Session); three YARN Nodes, each with its own Spark Context and Spark Interpreter]
  63. SHARED CONTEXTS?
      [diagram: three Spark Clients; Livy Server (Scalatra, Session Manager, Session); one YARN Node with a single Spark Context and Spark Interpreter]
  64. SHARED RDD?
      [diagram: as slide 63, with one RDD]
  65. SHARED RDDS?
      [diagram: as slide 63, with three RDDs]
  66-67. SECURE IT?
      [diagram: as slide 65]
  68. SPARK AS SERVICE
      [diagram labels: three Spark Clients, Livy Server, Spark]
  69. SHARING RDDS
  70-75. SHARING RDDS
      [diagram built up step by step: a PySpark shell holding an RDD, a Shell, and a Python Shell]
      PySpark shell:
        r = sc.parallelize([])
        srdd = ShareableRdd(r)
      RDD:
        {'ak': 'Alaska'}
        {'ca': 'California'}
      Shell:
        curl -XPOST /sessions/0/statement { 'code': srdd.get('ak') }
      Python Shell:
        states = SharedRdd('host/sessions/0', 'srdd')
        states.get('ak')
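A client in the spirit of the `SharedRdd('host/sessions/0', 'srdd')` call on these slides can be sketched as below. This is not the code from the demo repository: the class `SharedRddSketch`, its `post` parameter and the in-memory `fake_post` are all hypothetical, the `/statements` path follows slide 30, and the reply shape is assumed to match the `output` / `data` / `text/plain` structure shown there. A plain dict stands in for the remote PySpark session so that the sketch runs offline.

```python
class SharedRddSketch:
    """Hypothetical client that reads a named shareable RDD through a session URL."""

    def __init__(self, session_url, name, post):
        self.session_url = session_url
        self.name = name
        self.post = post  # callable(url, payload) returning the decoded JSON reply

    def get(self, key):
        # Build the same kind of statement as the curl call: srdd.get('ak')
        code = "{}.get({!r})".format(self.name, key)
        reply = self.post(self.session_url + "/statements", {"code": code})
        return reply["output"]["data"]["text/plain"]


# Offline stand-in for the remote session that owns the shared data.
remote_state = {"srdd": {"ak": "Alaska", "ca": "California"}}
sent = []


def fake_post(url, payload):
    """Record the request and evaluate the code against the stand-in state."""
    sent.append((url, payload))
    value = eval(payload["code"], {}, remote_state)
    return {"output": {"data": {"text/plain": value}}}


states = SharedRddSketch("host/sessions/0", "srdd", fake_post)
print(states.get("ak"))
```

Passing the transport in as `post` keeps the sketch testable; a real client would replace `fake_post` with an HTTP call to the session's endpoint.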
  76. DEMO TIME
      https://github.com/romainr/hadoop-tutorials-examples/tree/master/notebook/shared_rdd
  77. SECURITY
      • SSL support
      • Persistent sessions
      • Kerberos
  78. SPARK MAGIC
      • From Microsoft
      • Python magics for working with remote Spark clusters
      • Open Source: https://github.com/jupyter-incubator/sparkmagic
  79. FUTURE
      • Move to ext repo?
      • Security
      • iPython/Jupyter backends and file format
      • Shared named RDD / contexts?
      • Share data
      • Spark specific, language generic, both?
      • Leverage Hue 4 https://issues.cloudera.org/browse/HUE-2990
  80. LIVY'S CHEAT SHEET
      • Open Source: https://github.com/cloudera/hue/tree/master/apps/spark/java
      • Read about it: http://gethue.com/spark/
      • Scala, Java, Python, R
      • Type introspection for visualization
      • YARN-cluster or local modes
      • Code snippets / compiled
      • REST API
      • Pluggable backends
      • Magic keywords
      • Failure resilient
      • Security
  81. THANKS!
      TWITTER: @gethue
      USER GROUP: hue-user@
      WEBSITE: http://gethue.com
      LEARN: http://learn.gethue.com
