
Building a REST Job Server for Interactive Spark as a Service

By Romain Rigaux and Erick Tryzelaar (Cloudera), presented at Spark Summit EU 2015


  1. BUILDING A REST JOB SERVER FOR INTERACTIVE SPARK AS A SERVICE. Romain Rigaux (Cloudera), Erick Tryzelaar (Cloudera)
  2. WHY?
  3. WHY SPARK AS A SERVICE? Notebooks; easy access from anywhere; share Spark contexts and RDDs; build apps; Spark magic…
  4. WHY SPARK IN HUE? Married with the full Hadoop ecosystem.
  5. HISTORY, V1: OOZIE. THE GOOD: it works; code snippets; submit through Oozie (shell action). THE BAD: very slow; batch only. [diagram: workflow.xml, snippet.py, stdout]
  6. HISTORY, V2: SPARK IGNITER (based on Ooyala). THE GOOD: it works better. THE BAD: must compile a jar; batch only, no shell; no Python, R; security; single point of failure. [diagram: implement, compile Scala jar, upload, batch run, JSON output]
  7. HISTORY, V3: NOTEBOOK. THE GOOD: like spark-submit / Spark shells; Scala / Python / R shells; jar / Python batch jobs; notebook UI; YARN. THE BAD: beta? [diagram: code snippet or batch submitted to Livy]
  8. GENERAL ARCHITECTURE. [diagram: clients, then Livy, then Spark, Spark, Spark on YARN]
  9. GENERAL ARCHITECTURE. [diagram: clients call the Livy API, which drives Spark, Spark, Spark on YARN]
  10. LIVY SPARK SERVER
  11. LIVY SPARK SERVER: REST web server in Scala for Spark submissions; interactive shell sessions or batch jobs; backends: Scala, Java, Python, R; no dependency on Hue; open source: https://github.com/cloudera/hue/tree/master/apps/spark/java; read about it: http://gethue.com/spark/
  12. ARCHITECTURE: a standard web service wrapping spark-submit / the Spark shells; in YARN mode, Spark drivers run inside the cluster (resilient to crashes); no need to inherit any interface or compile code; extended to work with additional backends.
  13. LIVY WEB SERVER ARCHITECTURE: local "dev" mode vs. YARN mode
  14.-20. LOCAL MODE. [diagram sequence, steps 1-5: Livy Server (Scalatra, Session Manager, Session) talks to a Spark Client, which hosts the Spark Interpreter and the Spark Context]
  21. YARN-CLUSTER MODE: production, scalable
  22.-29. YARN-CLUSTER MODE. [diagram sequence, steps 1-7: Livy Server (Scalatra, Session Manager, Session), then Spark Client, then YARN Master; one YARN node runs the Spark Context and Spark Interpreter, other YARN nodes run Spark Workers]
  30. SESSION CREATION AND EXECUTION
      % curl -XPOST localhost:8998/sessions -d '{"kind": "spark"}'
      { "id": 0, "kind": "spark", "log": [...], "state": "idle" }
      % curl -XPOST localhost:8998/sessions/0/statements -d '{"code": "1+1"}'
      { "id": 0, "output": { "data": { "text/plain": "res0: Int = 2" }, "execution_count": 0, "status": "ok" }, "state": "available" }
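The two curl calls on this slide can be sketched as a tiny Python client. The endpoints and JSON shapes (POST /sessions with "kind", POST /sessions/{id}/statements with "code") come from the slide; the helper names, the default host/port, and the use of urllib are illustrative assumptions, not Livy's official client.

```python
import json
from urllib import request

LIVY_URL = "http://localhost:8998"  # assumed Livy address from the slide

def session_request(kind="spark"):
    """Build the (path, body) pair for POST /sessions."""
    return "/sessions", {"kind": kind}

def statement_request(session_id, code):
    """Build the (path, body) pair for POST /sessions/{id}/statements."""
    return "/sessions/%d/statements" % session_id, {"code": code}

def post(path, body):
    """Send one JSON POST to the Livy server and decode the JSON reply."""
    req = request.Request(
        LIVY_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

With a server running, `post(*statement_request(0, "1+1"))` would return a reply shaped like the `"res0: Int = 2"` response shown above.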
  31. BATCH OR INTERACTIVE. [diagram: jar / Python files go to /batches, Scala / Python / R code goes to /sessions; Livy runs Spark on YARN]
  32. SHELL OR BATCH? [diagram: Livy Server (Scalatra, Session Manager, Session), Spark Client, YARN Master; one YARN node runs the Spark Interpreter and Spark Context, other nodes run Spark Workers]
  33. SHELL. [same diagram: the YARN node runs pyspark next to the Spark Context]
  34. BATCH. [same diagram: the YARN node runs spark-submit next to the Spark Context]
  35. LIVY INTERPRETERS: Scala, Python, R…
  36. REMEMBER? [diagram: the YARN-cluster picture again, with the Spark Interpreter highlighted]
  37. INTERPRETERS: pipe stdin/stdout to a running shell; execute the code / send it to Spark workers; perform magic operations; one interpreter per language; "swappable" with other kernels (python, spark…). [example: > println(1 + 1) prints 2]
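The "pipe stdin/stdout to a running shell" idea can be sketched with a plain (non-Spark) Python subprocess standing in for the shell. The sentinel trick used to frame replies is an assumption for this sketch; Livy's real interpreters frame output differently. It relies on CPython writing its REPL prompts to stderr when stdin is a pipe, so stdout carries only real output.

```python
import subprocess
import sys

class PipedInterpreter:
    """Drive an interactive shell over pipes, one statement at a time."""

    SENTINEL = "__LIVY_SKETCH_DONE__"  # marker separating one reply from the next

    def __init__(self):
        # -i forces interactive mode (expression results are echoed),
        # -u disables buffering so output arrives immediately.
        self.proc = subprocess.Popen(
            [sys.executable, "-u", "-i"],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL, text=True,
        )

    def execute(self, code):
        """Send one single-line statement, return whatever it printed."""
        self.proc.stdin.write(code + "\n")
        # Print the sentinel as a second statement so we know where output ends.
        self.proc.stdin.write("print(%r)\n" % self.SENTINEL)
        self.proc.stdin.flush()
        lines = []
        while True:
            line = self.proc.stdout.readline()
            if line.rstrip("\n") == self.SENTINEL:
                return "\n".join(lines)
            lines.append(line.rstrip("\n"))

    def close(self):
        self.proc.stdin.close()
        self.proc.wait()
```

Multi-line blocks would need continuation handling; this sketch only covers the single-statement round trip shown on the slide.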
  38.-45. INTERPRETER FLOW. [diagram sequence: the client sends {"code": "1+1"} to the Livy Server; the server forwards > 1 + 1 to the Interpreter; a magic check runs; the interpreter evaluates 1+1 to 2; the server wraps it as {"data": {"application/json": "2"}} and returns it]
  46. INTERPRETER FLOW CHART (example of parsing): receive lines, split into chunks; for each chunk, if it is a magic chunk execute the magic, otherwise execute the chunk; on success send the output to the server, on failure send the error; repeat while chunks are left.
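The split-into-chunks step of the flow chart can be sketched as a small dispatcher. The "%" prefix for magic chunks matches the %json / %table magics on the following slides; the one-chunk-per-line rule is a simplifying assumption.

```python
def split_chunks(lines):
    """Yield ("magic", name, arg) for %-prefixed lines, ("code", text) otherwise."""
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines entirely
        if stripped.startswith("%"):
            # e.g. "%json counts" -> name "json", argument "counts"
            name, _, arg = stripped[1:].partition(" ")
            yield ("magic", name, arg)
        else:
            yield ("code", stripped)
```

A driver loop would then execute each chunk and send the output (or the error) back to the server, as the chart describes.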
  47. INTERPRETER MAGIC: table, json, plotting, …
  48. NO MAGIC
      > 1 + 1
      Interpreter runs sparkIMain.interpret("1+1")
      { "id": 0, "output": { "application/json": 2 } }
  49. JSON MAGIC
      val lines = sc.textFile("shakespeare.txt");
      val counts = lines.
        flatMap(line => line.split(" ")).
        map(word => (word, 1)).
        reduceByKey(_ + _).
        sortBy(-_._2).
        map { case (w, c) => Map("word" -> w, "count" -> c) }
      %json counts
      Interpreter runs sparkIMain.valueOfTerm("counts").toJson() on [('', 506610), ('the', 23407), ('I', 19540)... ]
  50. JSON MAGIC (server reply for the same code)
      { "id": 0, "output": { "application/json": [ { "count": 506610, "word": "" }, { "count": 23407, "word": "the" }, { "count": 19540, "word": "I" }, ... ] ... }
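What the %json magic does can be sketched in a few lines: look up the named variable in the interpreter's namespace and wrap its JSON form in the response envelope shown on the slide. The envelope keys ("id", "output", "application/json") come from the slide; the dict lookup stands in for sparkIMain.valueOfTerm, and the function name is an assumption.

```python
import json

def json_magic(namespace, var_name, statement_id=0):
    """Serialize one interpreter variable into the %json response envelope."""
    value = namespace[var_name]  # like sparkIMain.valueOfTerm("counts")
    # Round-trip through JSON to guarantee the value is JSON-serializable.
    return {
        "id": statement_id,
        "output": {"application/json": json.loads(json.dumps(value))},
    }
```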
  51. TABLE MAGIC
      val lines = sc.textFile("shakespeare.txt");
      val counts = lines.
        flatMap(line => line.split(" ")).
        map(word => (word, 1)).
        reduceByKey(_ + _).
        sortBy(-_._2).
        map { case (w, c) => Map("word" -> w, "count" -> c) }
      %table counts
      Interpreter runs sparkIMain.valueOfTerm("counts").guessHeaders().toList() on [('', 506610), ('the', 23407), ('I', 19540)... ]
  52. TABLE MAGIC (server reply for the same code)
      "application/vnd.livy.table.v1+json": { "headers": [ { "name": "count", "type": "BIGINT_TYPE" }, { "name": "name", "type": "STRING_TYPE" } ], "data": [ [ 23407, "the" ], [ 19540, "I" ], [ 18358, "and" ], ... ] }
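The header-guessing step can be sketched as: take the keys of the first row as column names, infer a coarse type per column, and emit the "application/vnd.livy.table.v1+json" shape from the slide. The BIGINT_TYPE / STRING_TYPE names come from the slide; treating everything non-integer as a string, and using a list of dicts as input, are simplifying assumptions.

```python
def table_magic(rows):
    """Infer headers from a list of dicts and build the %table payload."""
    first = rows[0]
    headers = []
    for name in sorted(first):  # guessHeaders(): derive columns from the first row
        kind = "BIGINT_TYPE" if isinstance(first[name], int) else "STRING_TYPE"
        headers.append({"name": name, "type": kind})
    # Each data row is ordered to match the header order.
    data = [[row[h["name"]] for h in headers] for row in rows]
    return {"application/vnd.livy.table.v1+json": {"headers": headers, "data": data}}
```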
  53.-57. PLOT MAGIC. [diagram sequence] The R code
      barplot(sorted_data$count, names.arg=sorted_data$value, main="Resource hits", las=2, col=colfunc(nrow(sorted_data)), ylim=c(0,300))
      is wrapped by the interpreter as sparkIMain.interpret("png('/tmp/plot.png') barplot dev.off()"),
      then File('/tmp/plot.png').read().toBase64() yields
      { "data": { "image/png": "iVBORw0KGgoAAAANSUhEU ... } ... }
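The last step of the plot magic, embedding the rendered PNG base64-encoded under "image/png", can be sketched directly. The envelope keys come from the slide; taking raw bytes instead of reading /tmp/plot.png is a simplification.

```python
import base64

def png_response(png_bytes):
    """Wrap raw PNG bytes in the image/png response envelope."""
    return {"data": {"image/png": base64.b64encode(png_bytes).decode("ascii")}}
```

Any real PNG starts with the byte 0x89 followed by "PNG", which is why the slide's payload begins with "iVBORw0…".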
  58. PLUGGABLE INTERPRETERS: pluggable backends; Livy's Spark backends: Scala, pyspark, R; IPython / Jupyter support coming soon
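The pluggable-backend idea implies a small common surface that every interpreter (Scala, pyspark, R, later Jupyter kernels) implements. This sketch shows the plug-in shape only; the method name and the trivial eval-based backend are assumptions, not Livy's real interface.

```python
from abc import ABC, abstractmethod

class Interpreter(ABC):
    """Minimal surface a pluggable backend would implement."""

    @abstractmethod
    def execute(self, code: str) -> dict:
        """Run code and return a MIME-typed output dict, e.g. {"text/plain": "2"}."""

class EchoInterpreter(Interpreter):
    """Toy backend: evaluates a Python expression, used only to show the shape."""

    def execute(self, code):
        return {"text/plain": repr(eval(code, {}, {}))}
```

Swapping in another kernel then means registering a different `Interpreter` subclass, with no change to the server.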
  59. JUPYTER BACKEND: re-using it; a generic framework for interpreters; 51 kernels

  60. SPARK AS A SERVICE
  61. REMEMBER AGAIN? [diagram: the YARN-cluster picture again]
  62. MULTI USERS. [diagram: one Livy session per user; each Spark Client drives its own YARN node with a Spark Context and Spark Interpreter]
  63. SHARED CONTEXTS? [diagram: several Spark Clients pointing at one YARN node with a single Spark Context and Spark Interpreter]
  64. SHARED RDD? [diagram: the shared context now holds one RDD]
  65. SHARED RDDS? [diagram: the shared context holds several RDDs]
  66.-67. SECURE IT? [diagram: the shared-RDDs picture again]
  68. SPARK AS A SERVICE. [diagram: several Spark Clients going through the Livy Server to Spark]
  69. SHARING RDDS
  70.-73. [diagram: a PySpark shell holding an RDD with {'ak': 'Alaska'} {'ca': 'California'}, plus a shell and a Python shell] In the PySpark shell:
      r = sc.parallelize([])
      srdd = ShareableRdd(r)
  74. From the shell:
      curl -XPOST /sessions/0/statement { 'code': srdd.get('ak') }
  75. From the Python shell:
      states = SharedRdd('host/sessions/0', 'srdd')
      states.get('ak')
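The ShareableRdd / SharedRdd pair from the demo can be sketched without Spark: a plain dict stands in for the parallelized RDD, and the client handle calls the owner directly instead of POSTing {'code': "srdd.get('ak')"} to /sessions/0/statements as the curl line shows. The two class names come from the slides; everything else here is an assumption for illustration.

```python
class ShareableRdd:
    """Server-side wrapper exposing key lookups on a shared dataset."""

    def __init__(self, data):
        self.data = data  # stand-in for sc.parallelize([...])

    def get(self, key):
        # On a real RDD this would be a filter-by-key plus collect.
        return {key: self.data[key]}

class SharedRdd:
    """Client-side handle: turns get() into a statement against the session."""

    def __init__(self, session, name):
        # Real version keeps a session URL like 'host/sessions/0';
        # here `session` is just a dict of named ShareableRdds.
        self.session = session
        self.name = name

    def get(self, key):
        return self.session[self.name].get(key)
```

The point of the design is that the RDD lives in one long-running Spark context while any number of shells reach it through the session API.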
  76. DEMO TIME: https://github.com/romainr/hadoop-tutorials-examples/tree/master/notebook/shared_rdd
  77. SECURITY: SSL support; persistent sessions; Kerberos
  78. SPARK MAGIC: from Microsoft; Python magics for working with remote Spark clusters; open source: https://github.com/jupyter-incubator/sparkmagic
  79. FUTURE: move to an external repo? security; IPython/Jupyter backends and file format; shared named RDDs / contexts? share data; Spark specific, language generic, or both? leverage Hue 4. https://issues.cloudera.org/browse/HUE-2990
  80. LIVY'S CHEAT SHEET: open source: https://github.com/cloudera/hue/tree/master/apps/spark/java; read about it: http://gethue.com/spark/; Scala, Java, Python, R; type introspection for visualization; YARN-cluster or local modes; code snippets / compiled; REST API; pluggable backends; magic keywords; failure resilient; security
  81. BEDANKT! (Dutch: thank you!) TWITTER @gethue; USER GROUP hue-user@; WEBSITE http://gethue.com; LEARN http://learn.gethue.com
