Planck

  1. Planck: Performance Prediction for Large Scale Advanced Analytics. Shivaram Venkataraman, Zongheng Yang, Michael J. Franklin, Ben Recht, Ion Stoica (AMPLab)
  2. Workload Trends: Advanced Analytics
  3. Example: KeystoneML Pipeline. Raw Data → Cosine Transformation → Feature Normalization → Least Squares Solver (~100 iterations)
  4. Example: KeystoneML Pipeline (continued). The pipeline is iterative (each iteration runs many jobs), long running (expensive), and numerically intensive.
  5. Systems Trends: Cloud Computing
  6. (image-only slide)
  7. Huge number of choices
  8. User Goal: minimize running time, or finish by a deadline, or minimize cost
  9. Does this matter? TSQR. [Chart: TSQR running time (s) and memory bandwidth (GB/s) for five equal-price configurations: 1 r3.8xlarge, 2 r3.4xlarge, 4 r3.2xlarge, 8 r3.xlarge, 16 r3.large. Equal price: $2.8/hour, cores: 16, memory: 244 GB. Input matrix: 1M by 1K.] ~2x performance difference for the same price!
  10. Does this matter? Matrix Multiply. [Chart: multiply time (s) and network bandwidth (Gbps) for the same five equal-price configurations. Input matrices: both 400K by 1K.] The best choice is not the same for all applications.
  11. Does this matter? Scale. [Chart: time (s) vs. price (cents) for TSQR, 1M by 1K, on 2 to 64 r3.4xlarge machines.]
  12. Problem Setting: given a job and a description of its input, develop a model that predicts running time across numbers and types of instances
  13. The predictive model needs to be cheap in both time and money: exploring N instance types and M cluster sizes gives N×M possibilities
  14. Our Approach: Overview. Build an end-to-end model of the pipeline (Raw Data → Cosine Transformation → Feature Normalization → Least Squares Solver / SVM, ~100 iterations), training on small input samples and a few iterations.
  15. Communication Patterns: Same DAG (constant), Tree DAG (log), All-to-one (linear). A small number of patterns → a simple model.
  16. Can we fit a model? time = a + b × (input/machines) + c × log(machines) + d × machines, where input is the fraction of the input used and machines is the number of machines used. The terms capture serial execution (a), computation linear in input per machine (b), tree-DAG communication (c), and all-to-one-DAG communication (d). Collect training data for several (scale, machines) settings and fit the model to obtain a, b, c, d; a minimal fitting sketch follows below.
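As an illustration of slide 16's model, here is a minimal fitting sketch in Python. The observation values are made up, and the use of non-negative least squares (the slides do not specify the solver; it simply keeps each cost term non-negative) is an assumption:

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical training data: (input fraction, machines, measured seconds).
observations = [
    (0.01, 1, 12.0), (0.02, 1, 21.5), (0.02, 2, 13.0),
    (0.04, 2, 22.8), (0.04, 4, 14.1), (0.08, 4, 24.9),
]

# One feature column per model term: [1, input/machines, log(machines), machines].
X = np.array([[1.0, s / m, np.log(m), m] for s, m, _ in observations])
y = np.array([t for _, _, t in observations])

# Non-negative least squares keeps the coefficients physically meaningful.
coef, _ = nnls(X, y)
a, b, c, d = coef

def predict(scale, machines):
    return a + b * scale / machines + c * np.log(machines) + d * machines

print(predict(1.0, 16))  # predicted time on the full input with 16 machines
```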
  17. Does this match real applications? Table of algorithms (GLM Regression, KMeans, Naive Bayes, Pearson Correlation, PCA, QR/TSQR) against compute terms (O(n/p), O(n²/p)) and communication terms (O(1), O(p), O(log p), and other, e.g. O(centers) for KMeans). n = number of data items, p = number of processors; assumes the number of features is constant.
  18. Small-data problem! Collecting training data is expensive in time and money. [Grid of candidate experiments: input fractions 1%, 2%, 4%, 8% × 1, 2, 4, 8 machines.]
  19. Experiment Design. Given a linear model y_i = a_i^T x + w_i, i = 1, ..., m, where the w_i are independent Gaussian measurement noise with zero mean and unit variance and the measurement vectors a_i span R^n, the maximum-likelihood (minimum-variance) estimate of x is the least-squares solution x̂ = (Σ_{i=1}^m a_i a_i^T)^{-1} Σ_{i=1}^m y_i a_i. The estimation error e = x̂ − x has zero mean and covariance matrix E = E[e e^T] = (Σ_{i=1}^m a_i a_i^T)^{-1}, which characterizes the accuracy (informativeness) of the estimate: lower variance → better model.
  20. Experiment Design. Let λ_i be the fraction of times experiment i is run. Choose the λ_i by minimizing the trace of the covariance matrix: minimize tr((Σ_{i=1}^m λ_i a_i a_i^T)^{-1}) subject to λ_i ≥ 0, λ_i ≤ 1. For the predictive model above, the features of each experiment are just the scale of the data and the number of machines, and only experiments whose λ_i are non-zero are run. Accounting for cost: each experiment has a cost c_i, whether in time (training on a larger fraction of the input is more expensive) or in machines (launching a machine has a cost), so a budget constraint Σ_{i=1}^m c_i λ_i ≤ B is added; here the time taken to collect each training point is used as the cost, and machine setup costs are ignored.
  21. Experiment Design Steps: grid over (scale, machines); associate a cost with each experiment; solve with CVX. [Grid: input fractions 1%, 2%, 4%, 8% × 1, 2, 4, 8 machines.] A sketch of the optimization follows below.
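The slides solve the design with CVX (MATLAB); here is a rough Python equivalent with cvxpy. The grid matches slide 21, while the costs and budget are illustrative assumptions:

```python
import itertools
import numpy as np
import cvxpy as cp

# Candidate experiments: the slide's grid of input fractions x machine counts.
scales = [0.01, 0.02, 0.04, 0.08]
machines = [1, 2, 4, 8]
grid = list(itertools.product(scales, machines))

# Feature vector of each candidate under the model from slide 16.
A = np.array([[1.0, s / m, np.log(m), m] for s, m in grid])

# Hypothetical costs: time to collect each training point, in seconds.
costs = np.array([60.0 + s * 3600.0 for s, _ in grid])
budget = 600.0

lam = cp.Variable(len(grid), nonneg=True)
M = A.T @ cp.diag(lam) @ A  # information matrix, affine in lam

# A-optimal design: tr(M^{-1}) = sum_j e_j^T M^{-1} e_j via matrix_frac.
eye = np.eye(A.shape[1])
objective = cp.Minimize(sum(cp.matrix_frac(eye[:, j], M) for j in range(A.shape[1])))
constraints = [lam <= 1, costs @ lam <= budget]
cp.Problem(objective, constraints).solve()

# Run only the experiments the design weights away from zero.
chosen = [grid[i] for i in range(len(grid)) if lam.value[i] > 1e-3]
print(chosen)
```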
  22. Planck: Performance Prediction Engine. Given a job binary and its input data, the engine runs experiment design, executes the chosen training jobs (using only a few iterations), and fits the linear model. A sketch of this loop follows below.
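Putting the pieces together, a minimal sketch of the engine loop; run_job is a hypothetical helper standing in for launching the job binary on a sample of the input:

```python
import numpy as np
from scipy.optimize import nnls

def build_model(run_job, experiments):
    """Fit the slide-16 model on timings from the designed experiments.

    run_job(scale, machines) is a hypothetical helper that runs the job
    binary (for a few iterations) on a `scale` fraction of the input with
    `machines` workers and returns the measured time in seconds.
    """
    rows, times = [], []
    for scale, m in experiments:
        rows.append([1.0, scale / m, np.log(m), m])
        times.append(run_job(scale, m))
    coef, _ = nnls(np.array(rows), np.array(times))
    return coef  # model coefficients (a, b, c, d)
```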
  23. Accuracy, Overhead. TIMIT pipeline: prediction error ~15 to 20%. [Bar chart: actual vs. predicted running time (seconds) at 64 and 45 machines; training overhead 4.1% and 3.4%; predicted/actual 84.3% and 86.9%.]
  24. Accuracy: MLlib. [Bar chart: predicted/actual ratio (0 to 1.2) for Regression, KMeans, NaiveBayes, PCA, and Pearson at 64 and 45 machines.]
  25. Using Performance Predictions. [Chart: predicted vs. actual time per iteration (s) against number of machines, TIMIT pipeline on r3.xlarge machines.] A configuration-selection sketch follows below.
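To connect the predictions back to the user goals on slide 8, a minimal selection sketch. The coefficients, price, and deadline are all illustrative assumptions, not measured values:

```python
import numpy as np

# Illustrative coefficients (a, b, c, d) as might come from the fitting
# sketch above; real values depend on the job being profiled.
a, b, c, d = 10.0, 900.0, 2.0, 0.05
R3_XLARGE_PRICE_PER_HOUR = 0.35  # hypothetical on-demand price

def predict(scale, m):
    """Predicted running time (s) under the slide-16 model."""
    return a + b * scale / m + c * np.log(m) + d * m

def cost(scale, m):
    """Dollar cost of a predicted run on m machines."""
    return predict(scale, m) / 3600.0 * m * R3_XLARGE_PRICE_PER_HOUR

candidates = range(1, 151)

# Goal: minimize running time.
fastest = min(candidates, key=lambda m: predict(1.0, m))

# Goal: cheapest configuration that still meets a deadline (e.g. 120 s).
feasible = [m for m in candidates if predict(1.0, m) <= 120.0]
cheapest = min(feasible, key=lambda m: cost(1.0, m)) if feasible else None

print(fastest, cheapest)
```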
  26. Work in Progress: an open-source tool for Amazon EC2; extensions to more workloads (ADAM, Succinct; ML execution traces); cross-validation to detect prediction errors
  27. Conclusion: performance prediction with low overhead; an open-source tool for Amazon EC2; workloads and traces welcome. Shivaram Venkataraman, shivaram@cs.berkeley.edu
  28. (blank slide)
  29. Is Experiment Design useful? Baseline: a cost-based model that picks the cheapest configurations under the same budget. [Grid: input fractions 1%, 2%, 4%, 8% × 1, 2, 4, 8 machines.]
  30. Is Experiment Design useful? [Bar chart: prediction error (%) per workload for the cost-based baseline vs. experiment design; bar labels: -54%, 15%, -8%, 373%, -45%, 8%, 14%, -3%, 9%, -16%.]
