Predicting Optimal Parallelism for Data Analytics


A key benefit of serverless computing is that resources can be allocated on demand, but the quantity of resources to request, and allocate, for a job can profoundly impact its running time and cost. For a job that has not yet run, how can we provide users with an estimate of how the job’s performance changes with provisioned resources, so that users can make an informed choice upfront about cost-performance tradeoffs?

This talk will describe several related research efforts at Microsoft to address this question. We focus on optimizing the amount of computational resources that control a data analytics query’s achieved intra-parallelism. These efforts use machine learning models on query characteristics to predict the run time or Performance Characteristic Curve (PCC) as a function of the maximum parallelism that the query will be allowed to exploit.

The AutoToken project uses models to predict the peak number of tokens (resource units) that is determined by the maximum parallelism that the recurring SCOPE job can ever exploit while running in Cosmos, an Exascale Big Data analytics platform at Microsoft. AutoToken_vNext, or TASQ, predicts the PCC as a function of the number of allocated tokens (limited parallelism). The AutoExecutor project uses models to predict the PCC for Apache Spark SQL queries as a function of the number of executors. The AutoDOP project uses models to predict the run time for SQL Server analytics queries, running on a single machine, as a function of their maximum allowed Degree Of Parallelism (DOP).

We will present our approaches and prediction results for these scenarios, discuss some common challenges that we handled, and outline some open research questions in this space.


  1. Predicting Optimal Parallelism for Data Analytics. Rathijit Sen, Vishal Rohra
  2. Agenda ▪ Overview ▪ AutoDOP ▪ AutoToken ▪ TASQ (AutoToken_vNext) ▪ AutoExecutor ▪ Summary
  3. Resource Provisioning in the Cloud • Focus: automatically predict optimal parallelism for jobs • Allow flexibility in selecting the optimal point for cost-efficient performance • Enable optimal resource provisioning • Users: dynamic, fine-grained provisioning for jobs • Providers: provisioning of cluster capacities • How many resources does a job actually need?
  4. General Approach • Predict job run time or peak parallelism: Peak Parallelism = f(query characteristics) [for the lowest run time]; Run Time = f(query characteristics, parallelism) • Query characteristics: compile/optimization-time properties and estimates • Learn f using machine learning models trained on past executions
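The general approach above can be sketched in a few lines. This is an illustrative toy, not the actual Microsoft pipeline: the feature set and the synthetic "past executions" are invented for the example, and a Random Forest regressor stands in for whichever model a given study uses.

```python
# Toy sketch: learn run time = f(query characteristics, parallelism) from
# past executions. Features (cardinality estimate, operator count) and the
# synthetic workload are hypothetical, for illustration only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic "past executions": [estimated_cardinality, num_operators, parallelism]
n = 500
card = rng.uniform(1e3, 1e6, n)
ops = rng.integers(2, 30, n)
par = rng.integers(1, 65, n)
# Assume work scales with cardinality * operators, with diminishing returns
# from parallelism (a common shape for PCC-like curves).
runtime = card * ops / (par ** 0.7) + rng.normal(0, 1e3, n)

X = np.column_stack([card, ops, par])
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, runtime)

# Predict run time for one unseen query at two parallelism levels.
q = [5e5, 10]
t_low = model.predict([q + [4]])[0]
t_high = model.predict([q + [32]])[0]
print(t_low > t_high)  # more parallelism should predict a lower run time here
```

The same trained f can then be queried at many parallelism levels to trace out a predicted PCC before the job ever runs.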
  5. Case Studies. Performance Characteristic Curve (PCC): run time as a function of parallelism.
     Study | Platform | Num Nodes | Prediction
     AutoDOP | SQL Server | Single | Run Time
     AutoToken | Cosmos | Multiple | Peak Parallelism
     AutoToken_vNext / TASQ | Cosmos | Multiple | Run Time, PCC (Strictly Monotonic)
     AutoExecutor | Spark | Multiple | PCC (Monotonic)
  6. AutoDOP. Zhiwei Fan, Rathijit Sen, Paris Koutris, Aws Albarghouthi, “Automated Tuning of Query Degree of Parallelism via Machine Learning”, aiDM@SIGMOD, 2020. Zhiwei Fan, Rathijit Sen, Paris Koutris, Aws Albarghouthi, “A Comparative Exploration of ML Techniques for Tuning Query Degree of Parallelism”, arXiv, 2020
  7. Context • Platform: SQL Server, single node • Degree Of Parallelism (DOP): the maximum number of threads that can be active at any time for query execution; selected per query • Impact of DOP for running a query: query performance and cost, resource utilization of multicore servers, resource provisioning in cloud-computing platforms
  8. Dependence on query characteristics. (Charts: TPC-DS 1000 example queries, grouped into well-parallelizable queries and other queries.)
  9. Dependence on data size (scale factor) • The average and median shift towards larger DOP values as the scale factor / dataset size increases • More variation in TPC-DS than in TPC-H due to the larger variety of query templates in TPC-DS • No workload has a single per-query optimal DOP value
  10. Approach • Goal: predict the optimal DOP • ML model type: regression, not classification, for more flexibility in choosing the optimal point for cost vs. performance tradeoffs • Model: Random Forest • Features: query plan operators, number of tuples (cardinality), and other compile/optimization-time estimates, plus the DOP • Output: run time
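Because the model is a regression over (features, DOP) rather than a classifier over DOP labels, optimal-DOP selection reduces to scanning candidate DOPs and taking the argmin of the predicted run time. A minimal sketch, with a hypothetical toy cost model standing in for the trained Random Forest:

```python
# Sketch of per-query DOP selection via regression: score every candidate
# DOP with the run-time model and pick the cheapest. `predict_runtime` is
# a placeholder for the trained model's predict function.
def pick_optimal_dop(predict_runtime, features, candidate_dops):
    """Return the candidate DOP with the lowest predicted run time."""
    return min(candidate_dops, key=lambda dop: predict_runtime(features, dop))

# Toy cost model: near-linear speedup up to 16 threads, then per-thread
# coordination overhead dominates. Purely illustrative numbers.
toy = lambda work, dop: work / min(dop, 16) + 0.5 * dop

print(pick_optimal_dop(toy, 100.0, [1, 2, 4, 8, 16, 32, 64]))  # -> 16
```

The same scan also supports other objectives (e.g. cheapest DOP within 10% of the best run time), which is the flexibility the slide attributes to regression.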
  11. Example results • AutoDOP is closer to optimal (oracle selection) than static DOP selection policies • ML: each query at the predicted-optimal DOP given by the ML model • Query-Optimal: each query at its optimal DOP (oracle selection) • Workload-Optimal: all queries at the optimal DOP for the overall workload (oracle selection) • 40: each query at DOP 40 • 80: each query at DOP 80 • Speedup over DOP 64 (the default DOP). (Chart: speedups of ML, Query-Optimal, Workload-Optimal, 40, and 80 on two test sets of TPC-DS 1000 queries (subset).)
  12. Case Studies. Performance Characteristic Curve (PCC): run time as a function of parallelism.
     Study | Platform | Num Nodes | Prediction
     AutoDOP | SQL Server | Single | Run Time
     AutoToken | Cosmos | Multiple | Peak Parallelism
     AutoToken_vNext / TASQ | Cosmos | Multiple | Run Time, PCC (Strictly Monotonic)
     AutoExecutor | Spark | Multiple | PCC (Monotonic)
  13. AutoToken. Rathijit Sen, Alekh Jindal, Hiren Patel, Shi Qiao, “AutoToken: Predicting Peak Parallelism for Big Data Analytics at Microsoft”, VLDB, 2020
  14. Context • Platform: exabyte-scale big data analytics platform for SCOPE queries • Token: unit of resource allocation; allocated per job; guaranteed and spare tokens • Impact of the number of tokens for running a job: query performance and cost; resource utilization and provisioning
  15. Peak Parallelism / Peak Resource Provisioning • How many guaranteed tokens to request for a job? Depends on its peak parallelism • Too many tokens: unnecessary wait time, unused guaranteed tokens • Too few tokens: loss of performance or of performance predictability • Possible options: default value, user guesstimate, default VC percentage
  16. Approach • Automatically eliminate over-allocations for recurring jobs, ideally with no performance impact • Use ML models to learn peak tokens from past behavior • Simple models per job group (signature). (Diagram: the default allocation over-provisions resources; AutoToken trims the over-allocation down to the ideal allocation.)
  17. Results • Overall prediction accuracy: median error 0; 90th-percentile error ≤ 50% • Coverage: 10.7%–28.1% • #Jobs: approx. 8.8M total; 0.8–2.4M for training; 162–528K for testing. (Chart: cumulative percentage of jobs vs. requested tokens / actual peak.)
  18. Resource Allocation Policies. (Diagram: peak allocation vs. tight allocation of resources over time; AutoToken targets peak allocation for recurring jobs only, while TASQ targets tight allocation.)
  19. TASQ. Anish Pimpley, Shuo Li, Anubha Srivastava, Vishal Rohra, Yi Zhu, Soundarajan Srinivasan, Alekh Jindal, Hiren Patel, Shi Qiao, Rathijit Sen, “Optimal Resource Allocation for Serverless Queries”, [Under Submission]
  20. Why Tight Allocation • Cost savings with a negligible change in performance: 50% of jobs can request fewer tokens, and 20% require less than 50% of the requested tokens • Allowing a 5% performance loss: 92% of jobs can request fewer tokens, and 30% require less than 50% of the requested tokens • Reduces job wait times • Wider resource availability
  21. TASQ’s Approach. Given the compile-time features of a job, predict a tight allocation • Observations: optimal allocation means different things to different users, as a function f(cost, time); predicting the relationship between tokens and run time is more useful than predicting a single tight allocation • The relationship between tokens and run time is an exponentially decaying curve, referred to as the performance characteristic curve (PCC) • Output: parameters (a, b) of the PCC
  22. Challenge: Limited Trend Data • Historical workloads were executed with a single token count • To predict the PCC, we need data for multiple token counts
  23. Solution: Data Augmentation • Area-Preserving Allocation Simulator (AREPAS) • Based on past skylines, generate skylines for multiple token counts using the simulator • Assumptions: total computation stays constant; total token-seconds used stay constant; the area under the skyline stays constant
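The area-preserving invariant above can be illustrated with a deliberately simplified simulator. This is only a sketch of the invariant, not AREPAS itself: it assumes work is freely divisible across seconds and simply repacks the same token-seconds under a lower token cap, so the simulated run is longer but has the same area.

```python
# Minimal sketch of area-preserving skyline simulation: redistribute the
# recorded token-usage skyline under a lower token cap while keeping total
# token-seconds (the area under the skyline) constant. AREPAS is more
# elaborate; this only demonstrates the stated invariant.
def simulate_skyline(skyline, new_cap):
    """skyline: tokens used in each second of the original run.
    Returns a per-second skyline capped at new_cap with the same area."""
    out, carry = [], 0.0
    for tokens in skyline:
        carry += tokens           # accumulate work (token-seconds)
        while carry >= new_cap:   # emit full seconds at the new cap
            out.append(new_cap)
            carry -= new_cap
    if carry > 1e-9:              # leftover partial second
        out.append(carry)
    return out

orig = [10, 10, 4]               # 24 token-seconds, peak 10, 3 seconds
sim = simulate_skyline(orig, 6)  # same area under a cap of 6 tokens
print(sim, sum(sim))             # -> [6, 6, 6, 6] 24
```

Running the simulator at several caps yields the multiple (token count, run time) points per job that PCC fitting needs.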
  24. Modeling the Runtime vs. Token Relationship • Need a monotonically non-increasing curve • User expectation: more resources → faster run time • The ‘elbow’ region of the curve usually emerges before parallelism overhead sets in • To enforce monotonicity in modeling, assume a power-law curve: run time t(n) = f(n: token allocation) = b * n^(-a), where a, b > 0 • Predict the scalar parameters a and b
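One standard way to recover the two scalar parameters from sample points is linear regression in log-log space, since t(n) = b * n^(-a) implies log t = log b - a * log n. Note the hedge: in TASQ, a and b are predicted from compile-time features by a learned model; the fit below is only an illustration of the curve family, on made-up data points.

```python
# Sketch: fit the power-law PCC t(n) = b * n**(-a), a, b > 0, by ordinary
# least squares on (log n, log t). Data points are illustrative.
import math

def fit_power_law(ns, ts):
    xs = [math.log(n) for n in ns]
    ys = [math.log(t) for t in ts]
    k = len(xs)
    mx, my = sum(xs) / k, sum(ys) / k
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    a = -slope                      # decay exponent (positive)
    b = math.exp(my - slope * mx)   # run time at n = 1
    return a, b

ns = [1, 2, 4, 8, 16]                   # token counts
ts = [100.0, 61.0, 36.0, 22.0, 13.0]    # observed run times (synthetic)
a, b = fit_power_law(ns, ts)
print(round(a, 2), round(b, 1))
```

Because a, b > 0, any curve in this family is automatically monotonically decreasing in n, which is how the functional form itself enforces the user expectation on the slide.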
  25. Results • XGBoost is not designed to enforce monotonicity • NN and GNN perform better in trend prediction • NN has comparable performance with lower training time
     Model | Pattern (Non-Increase) | MAE (Curve Params) | Median AE (Run Time)
     XGBoost SS | 32% | NA | 53%
     XGBoost PL | 93% | 0.202 | 52%
     NN | 100% | 0.163 | 39%
     GNN | 100% | 0.168 | 33%
  26. User Interface • Workflow: submit the job script; the graph is generated at compile time • Two options: visualize the run time vs. token predictions, or get an optimal token count • Advantages: an informed decision, for all jobs, before job execution
  27. Integration
  28. Case Studies. Performance Characteristic Curve (PCC): run time as a function of parallelism.
     Study | Platform | Num Nodes | Prediction
     AutoDOP | SQL Server | Single | Run Time
     AutoToken | Cosmos | Multiple | Peak Parallelism
     AutoToken_vNext / TASQ | Cosmos | Multiple | Run Time, PCC (Strictly Monotonic)
     AutoExecutor | Spark | Multiple | PCC (Monotonic)
  29. AutoExecutor. Rathijit Sen, Abhishek Roy, Alekh Jindal, Rui Fang, Jeff Zheng, Xiaolei Liu, Ruiping Li, “AutoExecutor: Predictive Parallelism for Spark SQL Queries”, [Under Submission]
  30. Context • Platform: Spark, Azure Synapse • Executors: processes on worker nodes; each executor can use a certain number of cores and amount of memory • Impact of the number of executors for running a query: query performance and cost; resource utilization and provisioning
  31. Modeling Approach • Reuse and extend the TASQ PCC model: a power-law curve with a lower bound • Run time t(n) with executor count n: t(n) = max(b * n^a, m), with model parameters a, b, m • ML model: Random Forest • Features: count of operators, input cardinality, average row length, … • Output: the PCC model parameters. (Diagram: t(n) = b * n^a decays until it reaches the floor t(n) = m.)
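The shape of this extended PCC is easy to sketch: a power-law decay that flattens once it hits the floor m, so adding executors past that point no longer reduces predicted run time. The parameter values below are invented for illustration; in AutoExecutor they would come from the Random Forest.

```python
# Sketch of the AutoExecutor PCC family: power law with a lower bound,
# t(n) = max(b * n**a, m), with a negative decay exponent a. Parameter
# values here are hypothetical, chosen to make the flattening visible.
def pcc(n, a=-0.8, b=600.0, m=60.0):
    """Predicted run time at executor count n."""
    return max(b * n ** a, m)

times = [round(pcc(n)) for n in (1, 2, 8, 32, 128)]
print(times)  # run time decays with more executors, then flattens at m
```

The floor m captures the non-parallelizable portion of the query, which is why the slide's PCC is monotonic but not strictly monotonic, unlike TASQ's.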
  32. Example predictions • Sparklens: predicts after one execution of the query • AutoExecutor: predicts before any execution of the query
  33. Error distributions (different templates, SF=100) • Most prediction errors occur at small numbers of executors • S: Sparklens; AE: AutoExecutor • F1..F10: ten-fold cross-validation with 80% of queries in the training set and 20% in the test set
  34. System Architecture. (Diagram components: feature extraction, model training, extensions, workload table of anonymized plans and metrics, executor events, telemetry pipeline, AutoExecutor, workload analysis, Peregrine events, PCC.)
  35. Summary
  36. Automatic selection of optimal parallelism • Capability and approach: enable selection of the optimal operating point with respect to an optimization objective; ML models predict run time / peak parallelism from query characteristics • Challenges: modeling PCC characteristics (AutoDOP: point-wise; TASQ: point-wise, power-law function; AutoExecutor: power-law + constant function); collecting training data (TASQ: AREPAS; AutoExecutor: Sparklens) • Open questions: Could we have other models for the PCC? How would we simulate other parameter changes?
  37. Feedback. Your feedback is important to us. Don’t forget to rate and review the sessions.