
Cost Efficiency Strategies for Managed Apache Spark Service


Today, with the rise of cloud-native computing, conversations about infrastructure costs have spread from R&D directors to everyone in R&D: “How much does a VM cost?”, “Can we use that managed service? How much will it cost with our workload?”, “I need a stronger machine with more GPUs; how do we make it happen within the budget?” Sound familiar?



Cost Efficiency Strategies for Managed Apache Spark Service

  1. Cost Efficiency Strategies for Managed Apache Spark Services – Adi Polak, Microsoft
  2. Find me on social media – Adi Polak ▪ Twitter @adipolak ▪ Medium https://medium.com/@adipolak ▪ Dev.to dev.to/adipolak ▪ LinkedIn https://www.linkedin.com/in/adi-polak-68548365/
  3. Agenda ▪ Motivation ▪ Tools ▪ Azure Databricks ▪ Cost Optimization Strategies ▪ Wrap-up
  4. Motivation
  5. Start from the beginning: Business idea (Product Manager) → Prioritization process (R&D) → Design & build (software dev HLD)
  6. HLD – High Level Design ▪ Requirements ▪ Features ▪ Architecture ▪ Test Plans ▪ Security ▪ Deployments ▪ Monitoring/Audit Trails ▪ Maintenance
  7. Yeah, but why should I care about costs?! ▪ Understand how the budget works (P&L) ▪ Be able to influence technical decisions ▪ Build a culture of financial accountability
  8. https://www.insightpartners.com/blog/product-leaders-are-rd-costs-part-of-your-strategic-discussions/
  9. Tools
  10. Many, many services
  11. Apache Spark & cloud computing delivery models: IaaS vs. PaaS vs. SaaS
  12. Cloud pricing calculators ▪ Azure Pricing Calculator – https://azure.microsoft.com/pricing/calculator/ ▪ AWS Pricing Calculator – https://calculator.aws/ ▪ GCP Pricing Calculator – https://cloud.google.com/products/calculator
  13. Organize resources for cost awareness ▪ Reporting and billing – Azure Cost Management ▪ Organize – resource groups and/or subscriptions for control, reporting, and attribution of costs
  14. Subscription and billing models ▪ Pay as you go ▪ Enterprise Agreements ▪ …
  15. Azure Databricks
  16. Where to run Spark workloads: Kubernetes/IaaS vs. Azure Databricks (managed Spark service). Azure Databricks side – small to mid-size team ▪ Spark expertise ▪ optimizations ▪ machines (VMs) ▪ network ▪ storage ▪ DBU. Kubernetes/IaaS side – bigger team ▪ K8s + Spark expertise ▪ optimization experts ▪ machines (VMs) ▪ network ▪ storage
  17. Resources consumed:
  18. Plan tiers
  19. Premium vs. Standard: Performance ▪ Security ▪ Monitoring
  20. Databricks Units (DBUs) ▪ DATA ENGINEERING LIGHT ▪ DATA ENGINEERING ▪ DATA ANALYTICS – three levels of service; AWS and Azure have the same levels
  21. Databricks Data Engineering Light supports scheduled JAR, Python, or spark-submit jobs – only.
  22. Databricks Light does NOT support: ▪ Delta Lake ▪ autopilot features such as autoscaling ▪ highly concurrent, all-purpose clusters ▪ notebooks, dashboards, and collaboration features ▪ connectors to various data sources and BI tools. Databricks Light is a runtime environment for jobs (or “automated workloads”).
  23. DBU price: Standard vs. Premium ($ per DBU-hour)
      DBU type           Standard   Premium
      Analytics          0.4        0.55
      Engineering        0.15       0.5
      Engineering Light  0.07       0.22
      https://bit.ly/2Tp5Zkh
  24. Workload examples ▪ Scheduled job – Data Engineer ▪ On-demand (triggered) job – Data Engineer / BI / Analytics ▪ Exploratory (interactive) – BI/ML
  25. VMs and DBUs – Prosenjit Chakraborty, https://medium.com/@cprosenjit/azure-databricks-cost-optimizations-5e1e17b39125
  26. Scenario breakdown ▪ # VMs = 400 ▪ hours run = 1 ▪ cores per VM = 4 ▪ general workload
  27. Cost – Engineering vs. Engineering Light – Standard. $$$ = (#VMs × VM $/hour + $RuntimeType × #DBUs) × (1 − performance factor). Inputs: #VMs = 400, VM $/hour = 0.279, #DBUs = #VMs × 0.75, $Engineering = 0.15, $EngineeringLight = 0.07. [Chart: as (1 − performance factor) drops from 1 to 0.5, Engineering's hourly cost falls from $156.60 to $78.30, while Engineering Light stays flat at $132.60.]
  28. Cost – Engineering vs. Engineering Light – Premium. Same formula; inputs: #VMs = 400, VM $/hour = 0.279, #DBUs = #VMs × 0.75, $Engineering = 0.5, $EngineeringLight = 0.22. [Chart: as (1 − performance factor) drops from 1 to 0.5, Engineering's hourly cost falls from $261.60 to $130.80, while Engineering Light stays flat at $177.60.]
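The comparison on the two cost slides above can be reproduced with a short calculation. This is an illustrative sketch of the slides' formula; the rates (VM $/hour, DBU prices) are the example values from the slides, not current list prices, and the function name is made up for this sketch.

```python
def cluster_cost_per_hour(n_vms, vm_rate, dbu_rate, dbus_per_vm=0.75,
                          duration_factor=1.0):
    """Hourly cluster cost: VM cost plus DBU cost, scaled by relative job
    duration (1.0 = no speedup; 0.8 = the optimized runtime finishes the
    same work in 80% of the time)."""
    return (n_vms * vm_rate + dbu_rate * n_vms * dbus_per_vm) * duration_factor

# Standard tier, 400 VMs at $0.279/hour (slide's example values):
eng = cluster_cost_per_hour(400, 0.279, dbu_rate=0.15)    # 156.6
light = cluster_cost_per_hour(400, 0.279, dbu_rate=0.07)  # 132.6

# Engineering beats Engineering Light once its runtime optimizations
# (Delta Lake, Photon, ...) shrink job duration below this factor:
breakeven = light / eng  # ~0.85, i.e. roughly a 15% speedup
```

This is the takeaway of the charts: on Standard, the pricier Engineering DBU pays for itself if the optimized runtime delivers roughly a 15% speedup; on Premium (0.5 vs. 0.22 per DBU), the required speedup is larger, around 32%.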
  29. Cost Optimization Strategies
  30. 1 – Pre-purchase plans – 1 & 3 years
  31. 2 – Select the right runtime & frameworks ▪ Delta Lake ▪ PySpark pandas UDFs ▪ Photon Engine
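As an illustration of the "right framework" point: pandas UDFs process whole Arrow batches instead of one row at a time, which usually means less compute time and therefore fewer DBU-hours for the same job. A minimal sketch; the temperature conversion is a made-up example, and registering it as a pandas UDF requires a running Spark session, so that part is shown commented out:

```python
import pandas as pd

# Vectorized logic: operates on a whole pandas Series per Arrow batch,
# avoiding the per-row serialization overhead of a plain Python UDF.
def fahrenheit_to_celsius(f: pd.Series) -> pd.Series:
    return (f - 32) * 5.0 / 9.0

# On a Databricks/Spark cluster this would be registered as a pandas UDF:
# from pyspark.sql.functions import pandas_udf
# f2c = pandas_udf(fahrenheit_to_celsius, "double")
# df = df.withColumn("temp_c", f2c("temp_f"))
```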
  32. 3 – Don't use tmp/local file system storage ▪ dbutils storage is RA-GRS (read-access geo-redundant storage) – you might not need this type of storage! https://bit.ly/2TdXsAi
  33. Cost Optimization Tips
  34. Manage spending limits ▪ per subscription ▪ per management group ▪ per resource group ▪ enable quota alerts
  35. Enable autoscale ▪ scales machines up and down automatically – https://bit.ly/3dMOROK
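For reference, autoscaling is configured per cluster by giving an `autoscale` block instead of a fixed `num_workers` in the cluster's JSON spec (Databricks Clusters API / the cluster UI's JSON view). A sketch with placeholder values; the cluster name, runtime version, and worker counts are illustrative assumptions, not recommendations:

```python
# Cluster spec with autoscaling: Databricks picks a worker count between
# min_workers and max_workers based on load, instead of a fixed size.
cluster_spec = {
    "cluster_name": "cost-aware-etl",    # hypothetical name
    "spark_version": "7.3.x-scala2.12",  # example runtime version
    "node_type_id": "Standard_DS3_v2",   # example Azure VM size
    "autoscale": {
        "min_workers": 2,  # floor the cluster shrinks to under low load
        "max_workers": 8,  # ceiling that caps cost during load spikes
    },
    # Pair autoscaling with auto-termination so idle clusters shut down:
    "autotermination_minutes": 30,
}
```

Note that `autoscale` and a fixed `num_workers` are alternatives: specifying both is invalid, so a fixed-size cluster omits the `autoscale` block entirely.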
  36. VMs ▪ think about your needs
  37. Thank you! @adipolak
