Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Spark Summit EU talk by William Benton

722 views

Published on

Containerized Spark on Kubernetes

Published in: Data & Analytics
  • Be the first to comment

Spark Summit EU talk by William Benton

  1. 1. William Benton Red Hat, Inc. @willb • willb@redhat.com CONTAINERIZED SPARK 
 ON KUBERNETES
  2. 2. BACKGROUND
  3. 3. BACKGROUND
  4. 4. BACKGROUND
  5. 5. BACKGROUND
  6. 6. BACKGROUND
  7. 7. BACKGROUND
  8. 8. BACKGROUND
  9. 9. BACKGROUND
  10. 10. WHAT OUR SPARK CLUSTER LOOKED LIKE IN 2014
  11. 11. WHAT OUR SPARK CLUSTER LOOKED LIKE IN 2014 Spark executor Spark executor Spark executor Spark executor Spark executor Spark executor
  12. 12. WHAT OUR SPARK CLUSTER LOOKED LIKE IN 2014 Networked POSIX FS Spark executor Spark executor Spark executor Spark executor Spark executor Spark executor
  13. 13. Mesos WHAT OUR SPARK CLUSTER LOOKED LIKE IN 2014 Networked POSIX FS Spark executor Spark executor Spark executor Spark executor Spark executor Spark executor
  14. 14. Mesos WHAT OUR SPARK CLUSTER LOOKED LIKE IN 2014 Networked POSIX FS Spark executor Spark executor Spark executor Spark executor Spark executor Spark executor 1 2 3 4
  15. 15. Mesos WHAT OUR SPARK CLUSTER LOOKED LIKE IN 2014 Networked POSIX FS Spark executor Spark executor Spark executor Spark executor Spark executor Spark executor 1 2 3 4 1 1 2 3 3 4
  16. 16. Mesos WHAT OUR SPARK CLUSTER LOOKED LIKE IN 2014 Networked POSIX FS Spark executor Spark executor Spark executor Spark executor Spark executor Spark executor 1 2 3 4 1 1 2 3 3 4
  17. 17. Analytics is no longer a separate workload.
  18. 18. Analytics is an essential component of modern data- driven applications.
  19. 19. OUR GOALS
  20. 20. OUR GOALS
  21. 21. OUR GOALS
  22. 22. OUR GOALS
  23. 23. OUR GOALS git
  24. 24. OUR GOALS git
  25. 25. FORECAST Motivating containerized microservices Architectures for analytics and applications Spark clusters in containers: practicalities and pitfalls Play along at home Future work
  26. 26. MOTIVATING MICROSERVICES
  27. 27. A microservice architecture employs lightweight, modular, and typically stateless components with well-defined interfaces and contracts.
  28. 28. BENEFITS OF MICROSERVICE ARCHITECTURES
  29. 29. BENEFITS OF MICROSERVICE ARCHITECTURES
  30. 30. BENEFITS OF MICROSERVICE ARCHITECTURES
  31. 31. BENEFITS OF MICROSERVICE ARCHITECTURES
  32. 32. BENEFITS OF MICROSERVICE ARCHITECTURES
  33. 33. BENEFITS OF MICROSERVICE ARCHITECTURES
  34. 34. BENEFITS OF MICROSERVICE ARCHITECTURES
  35. 35. BENEFITS OF MICROSERVICE ARCHITECTURES 2 + 2
  36. 36. BENEFITS OF MICROSERVICE ARCHITECTURES 2 + 2 5
  37. 37. BENEFITS OF MICROSERVICE ARCHITECTURES 2 + 2 5
  38. 38. BENEFITS OF MICROSERVICE ARCHITECTURES
  39. 39. BENEFITS OF MICROSERVICE ARCHITECTURES
  40. 40. BENEFITS OF MICROSERVICE ARCHITECTURES ?
  41. 41. BENEFITS OF MICROSERVICE ARCHITECTURES ?
  42. 42. BENEFITS OF MICROSERVICE ARCHITECTURES ?
  43. 43. MICROSERVICES AND SPARK executor 1 2 3 executor 4 5 6 executor 7 8 9 executor 10 11 12 master
  44. 44. MICROSERVICES AND SPARK executor 1 2 3 executor 4 5 6 executor 7 8 9 executor 10 11 12 master λ x: x * 2
  45. 45. MICROSERVICES AND SPARK executor 1 2 3 executor 4 5 6 executor 7 8 9 executor 10 11 12 master λ x: x * 22 4 6 8 10 12 14 16 18 20 22 24 λ x: x * 2 λ x: x * 2 λ x: x * 2 λ x: x * 2
  46. 46. MICROSERVICES AND SPARK executor 1 2 3 executor 4 5 6 executor 7 8 9 executor 10 11 12 master λ x: x * 22 4 6 8 10 12 14 16 18 20 22 24 λ x: x * 2 λ x: x * 2 λ x: x * 2 λ x: x * 2
  47. 47. MICROSERVICES AND SPARK executor 1 2 3 executor 4 5 6 executor 7 8 9 executor 10 11 12 master λ x: x * 22 4 6 8 10 12 14 16 18 20 22 24 λ x: x * 2 λ x: x * 2 λ x: x * 2 λ x: x * 2
  48. 48. ARCHITECTURES FOR 
 ANALYTICS AND APPLICATIONS
  49. 49. APPLICATION RESPONSIBILITIES transform transform transform events databases file, object storage
  50. 50. APPLICATION RESPONSIBILITIES transform transform transform aggregate events databases file, object storage
  51. 51. APPLICATION RESPONSIBILITIES trainmodels transform transform transform aggregate events databases file, object storage
  52. 52. APPLICATION RESPONSIBILITIES trainmodels transform transform transform aggregate events databases file, object storage
  53. 53. APPLICATION RESPONSIBILITIES archive trainmodels transform transform transform aggregate events databases file, object storage
  54. 54. APPLICATION RESPONSIBILITIES archive trainmodels transform transform transform aggregate events databases file, object storage
  55. 55. APPLICATION RESPONSIBILITIES archive trainmodels transform transform transform aggregate events databases file, object storage web and mobile reporting developer UI
  56. 56. APPLICATION RESPONSIBILITIES archive trainmodels transform transform transform aggregate events databases file, object storage management web and mobile reporting developer UI
  57. 57. LEGACY ARCHITECTURES
  58. 58. CONVENTIONAL DATA WAREHOUSE events
  59. 59. CONVENTIONAL DATA WAREHOUSE transformevents
  60. 60. CONVENTIONAL DATA WAREHOUSE transformevents UI
  61. 61. CONVENTIONAL DATA WAREHOUSE transformevents UI business logic
  62. 62. CONVENTIONAL DATA WAREHOUSE transformevents UI business logic RDBMS
  63. 63. transaction
 processing CONVENTIONAL DATA WAREHOUSE transformevents UI business logic RDBMS
  64. 64. transaction
 processing CONVENTIONAL DATA WAREHOUSE transformevents UI business logic RDBMS
  65. 65. transaction
 processing CONVENTIONAL DATA WAREHOUSE transformevents UI business logic RDBMS RDBMS
  66. 66. transaction
 processing CONVENTIONAL DATA WAREHOUSE transformevents UI business logic RDBMS RDBMS analysis
  67. 67. transaction
 processing CONVENTIONAL DATA WAREHOUSE transformevents UI business logic RDBMS RDBMS analysis reporting
  68. 68. transaction
 processing CONVENTIONAL DATA WAREHOUSE transformevents UI business logic RDBMS RDBMS analysis reporting
  69. 69. transaction
 processing CONVENTIONAL DATA WAREHOUSE transformevents UI business logic RDBMS RDBMS analysis interactive
 query reporting
  70. 70. transaction
 processing CONVENTIONAL DATA WAREHOUSE transformevents UI business logic RDBMS analytic
 processing RDBMS analysis interactive
 query reporting
  71. 71. HADOOP-STYLE “DATA LAKE” HDFS HDFS HDFS
  72. 72. HADOOP-STYLE “DATA LAKE” HDFS HDFS HDFS HDFS HDFS
  73. 73. HADOOP-STYLE “DATA LAKE” HDFS events HDFS HDFS HDFS HDFS
  74. 74. HADOOP-STYLE “DATA LAKE” HDFS events HDFS HDFS HDFS HDFS
  75. 75. HADOOP-STYLE “DATA LAKE” HDFS compute events HDFS compute HDFS compute compute compute HDFS HDFS
  76. 76. MODERN ARCHITECTURES
  77. 77. THE LAMBDA ARCHITECTURE events (imprecise)
 analysistransform
  78. 78. THE LAMBDA ARCHITECTURE events (imprecise)
 analysistransform DFS
  79. 79. speed layer THE LAMBDA ARCHITECTURE events (imprecise)
 analysistransform DFS
  80. 80. speed layer THE LAMBDA ARCHITECTURE events (precise)
 analysistransform (imprecise)
 analysistransform DFS
  81. 81. speed layer THE LAMBDA ARCHITECTURE events batch layer (precise)
 analysistransform (imprecise)
 analysistransform DFS
  82. 82. speed layer THE LAMBDA ARCHITECTURE events batch layer federate (precise)
 analysistransform (imprecise)
 analysistransform DFS
  83. 83. speed layer THE LAMBDA ARCHITECTURE events batch layer UIfederate (precise)
 analysistransform (imprecise)
 analysistransform DFS
  84. 84. serving layerspeed layer THE LAMBDA ARCHITECTURE events batch layer UIfederate (precise)
 analysistransform (imprecise)
 analysistransform DFS
  85. 85. THE KAPPA ARCHITECTURE events
  86. 86. queue for “raw data” topic THE KAPPA ARCHITECTURE events
  87. 87. queue for “raw data” topic THE KAPPA ARCHITECTURE events transform queue for “preprocessed data” topic
  88. 88. queue for “raw data” topic THE KAPPA ARCHITECTURE events transform analysis queue for “preprocessed data” topic queue for “analysis results” topic
  89. 89. queue for “raw data” topic THE KAPPA ARCHITECTURE events transform analysis queue for “preprocessed data” topic queue for “analysis results” topic reporting end-user UI
  90. 90. DATA FEDERATION IN THE COMPUTE LAYER aggregate trainmodels archive events databases file, object storage management web and mobile reporting developer UItransform transform transform
  91. 91. DATA FEDERATION IN THE COMPUTE LAYER aggregate trainmodels archive events databases file, object storage management web and mobile reporting developer UItransform transform transform
  92. 92. DATA FEDERATION IN THE COMPUTE LAYER aggregate trainmodels archive events databases file, object storage management web and mobile reporting developer UItransform transform transform
  93. 93. Cluster scheduler SIDEBAR: THE MONOLITHIC SPARK ANTIPATTERN Shared FS Spark executor Spark executor Spark executor Spark executor Spark executor Spark executor Resource manager app 1 app 2 app 4app 3
  94. 94. Cluster scheduler SIDEBAR: THE MONOLITHIC SPARK ANTIPATTERN Shared FS Spark executor Spark executor Spark executor Spark executor Spark executor Spark executor Resource manager app 1 app 2 app 4app 3
  95. 95. Resource manager ONE CLUSTER PER APPLICATION Object stores app 1 app 2 app 5app 4 app 3 app 6 Databases
  96. 96. Resource manager ONE CLUSTER PER APPLICATION Object stores app 1 app 2 app 5app 4 app 3 app 6 app 2 app 4 Databases
  97. 97. PRACTICALITIES AND POTENTIAL PITFALLS
  98. 98. SCHEDULING Kubernetes app 1 app 2 app 5app 4 app 3 app 6 app 1 app 2 app 5app 4 app 3 app 6 Object stores Databases
  99. 99. SCHEDULING Kubernetes app 1 app 2 app 5app 4 app 3 app 6 app 1
  100. 100. SCHEDULING Kubernetes app 1 app 2 app 5app 4 app 3 app 6 app 1
  101. 101. SCHEDULING Kubernetes app 1 app 2 app 5app 4 app 3 app 6 app 1
  102. 102. SCHEDULING Kubernetes app 1 app 2 app 5app 4 app 3 app 6 app 1
  103. 103. SCHEDULING Kubernetes app 1 app 2 app 5app 4 app 3 app 6 app 1
  104. 104. SECURITY
  105. 105. SECURITY
  106. 106. SECURITY $SPARK_HOME/bin/spark-class org.apache.spark.deploy.worker.Worker master:7077 pid root net
  107. 107. SECURITY $SPARK_HOME/bin/spark-class org.apache.spark.deploy.worker.Worker master:7077 pid root net /tmp/foo
  108. 108. SECURITY
  109. 109. SECURITY k8s namespace
  110. 110. SECURITY k8s namespace*
  111. 111. STORAGE Kubernetes app 1 app 2 app 5app 4 app 3 app 6 app 1 app 2 app 5app 4 app 3 app 6
  112. 112. STORAGE Kubernetes app 1 app 2 app 5app 4 app 3 app 6 app 1 app 2 app 5app 4 app 3 app 6 POSIX filesystem
  113. 113. STORAGE Kubernetes app 1 app 2 app 5app 4 app 3 app 6 app 1 app 2 app 5app 4 app 3 app 6 POSIX filesystem ✓ familiar interface ✓ interoperability with other programs
  114. 114. STORAGE Kubernetes app 1 app 2 app 5app 4 app 3 app 6 app 1 app 2 app 5app 4 app 3 app 6 POSIX filesystem ✓ familiar interface ✓ interoperability with other programs ✗ unnecessary semantic guarantees ✗ difficult to manage
  115. 115. STORAGE Kubernetes app 1 app 2 app 5app 4 app 3 app 6 app 1 app 2 app 5app 4 app 3 app 6 HDFS HDFS HDFS HDFS HDFS HDFS
  116. 116. STORAGE Kubernetes app 1 app 2 app 5app 4 app 3 app 6 app 1 app 2 app 5app 4 app 3 app 6 HDFS HDFS HDFS HDFS HDFS HDFS ✓ support for legacy Hadoop installations
  117. 117. ✗ inelastic ✗ stateful ✗ can’t collocate compute and data STORAGE Kubernetes app 1 app 2 app 5app 4 app 3 app 6 app 1 app 2 app 5app 4 app 3 app 6 HDFS HDFS HDFS HDFS HDFS HDFS ✓ support for legacy Hadoop installations
  118. 118. STORAGE Kubernetes app 1 app 2 app 5app 4 app 3 app 6 app 1 app 2 app 5app 4 app 3 app 6 object store
  119. 119. STORAGE Kubernetes app 1 app 2 app 5app 4 app 3 app 6 app 1 app 2 app 5app 4 app 3 app 6 object store ✓ interoperability ✓ fine-grained AC ✓ many implementations
  120. 120. STORAGE Kubernetes app 1 app 2 app 5app 4 app 3 app 6 app 1 app 2 app 5app 4 app 3 app 6 object store ✓ interoperability ✓ fine-grained AC ✓ many implementations ✗ consistency model ✗ performance (?)
  121. 121. STORAGE Kubernetes app 1 app 2 app 5app 4 app 3 app 6 app 1 app 2 app 5app 4 app 3 app 6 object store ✓ interoperability ✓ fine-grained AC ✓ many implementations ✗ consistency model ✗ performance
  122. 122. “…in a cloud native architecture, the benefit of HDFS is actually very small and that is why many cloud-first organizations no longer run HDFS, or only run it as a caching layer for S3.” —Reynold Xin on Quora (http://qr.ae/TAF4cN)
  123. 123. NETWORKING
  124. 124. NETWORKING
  125. 125. NETWORKING
  126. 126. NETWORKING http://app1:8080
  127. 127. NETWORKING http://app1:8080 ✗ can’t access worker web UI (but wait for Spark 2.1!)
  128. 128. NETWORKING http://app1:8080 ✗ can’t access worker web UI (but wait for Spark 2.1!)
  129. 129. NETWORKING http://app1:8080 ✗ can’t access worker web UI (but wait for Spark 2.1!) http://app1:80
  130. 130. NEXT STEPS: FUTURE WORK & PLAYING ALONG AT HOME
  131. 131. NEXT STEPS Further performance evaluation Better developer experience Improved scheduling of Spark tasks on Kubernetes
  132. 132. TRY IT OUT YOURSELF Kubernetes standalone Spark example:
 https://github.com/kubernetes/kubernetes/tree/master/examples/spark Enabling Spark on OpenShift: https://github.com/radanalyticsio Native Spark on Kubernetes proposal:
 https://github.com/kubernetes/kubernetes/issues/34377
  133. 133. @willb • willb@redhat.com
 https://chapeau.freevariable.com THANKS!

×