Is Revolution R Enterprise Faster than SAS? Benchmarking Results Revealed

  1. RRE: Faster than SAS. Results from Benchmarking. Thomas W. Dinsmore, Revolution Analytics; John Wallace, DataSong
  2. Polling question: Do you currently use:
      – A) R or Revolution R Enterprise (RRE)
      – B) SAS
      – C) Both
      – D) Neither
  3. Benchmarking RRE vs. SAS: Background, Approach, Results, Discussion
  4. Revolution R Enterprise: a commercially supported distribution of open source R, enhanced for enterprise use:
      – Scalable analytics
      – Developer tools
      – Integration tools
      – Deployment tools
  5. 2012: Allstate benchmark. Poisson regression on 150MM rows, runtime in minutes: SAS PROC GENMOD 300, RRE 6.
  6. Criticism: "Apples to oranges." RRE ran on 20 cores, SAS on 16 cores.
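A quick way to weigh the "apples to oranges" criticism is to charge each run for every core it had available. The sketch below uses only the runtimes and core counts from slides 5 and 6; the core-minute normalization is a deliberately crude assumption (it presumes runtime scales linearly with cores, which slide 7 suggests is not true for single-threaded PROCs).

```python
# Figures from the 2012 Allstate benchmark slides (Poisson regression, 150MM rows).
SAS_MINUTES = 300  # SAS PROC GENMOD runtime
RRE_MINUTES = 6    # RRE runtime
SAS_CORES = 16
RRE_CORES = 20


def raw_speedup(sas_minutes: float, rre_minutes: float) -> float:
    """Wall-clock speedup of RRE over SAS."""
    return sas_minutes / rre_minutes


def core_minute_speedup(sas_minutes: float, sas_cores: int,
                        rre_minutes: float, rre_cores: int) -> float:
    """Speedup after charging each run for all cores it had available.

    Crude normalization (an assumption): treats core-minutes as the cost,
    as if runtime scaled linearly with core count.
    """
    return (sas_minutes * sas_cores) / (rre_minutes * rre_cores)


if __name__ == "__main__":
    print(f"raw speedup:         {raw_speedup(SAS_MINUTES, RRE_MINUTES):.1f}x")
    print(f"core-minute speedup: "
          f"{core_minute_speedup(SAS_MINUTES, SAS_CORES, RRE_MINUTES, RRE_CORES):.1f}x")
```

Even after penalizing RRE for its four extra cores, the gap only narrows from 50x to 40x, so the hardware difference alone does not account for the result.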
  7. Most SAS/STAT PROCs (including PROC GENMOD) run single-threaded. Of 91 SAS/STAT PROCs:
      – 69 are single-threaded
      – 13 are multi-threaded
      – 9 are distributed (if you license SAS HP Statistics)
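The counts on this slide add up to the stated total, and a short tally makes the proportion concrete: roughly three quarters of SAS/STAT PROCs are single-threaded. The dictionary keys are just labels chosen for this sketch.

```python
# SAS/STAT procedure counts as given on the slide.
proc_counts = {
    "single-threaded": 69,
    "multi-threaded": 13,
    "distributed (requires SAS HP Statistics)": 9,
}

total = sum(proc_counts.values())
assert total == 91  # matches the slide's total of 91 PROCs

for mode, count in proc_counts.items():
    # Print each mode with its share of all SAS/STAT PROCs.
    print(f"{mode:45s} {count:3d}  ({count / total:.1%})")
```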
  9. 2013: SAS benchmark of PROC HPGENSELECT (SAS/STAT with SAS High Performance Statistics) on a massive grid: 140/144 nodes with 16 cores per node, i.e. 2,240/2,304 cores. Conclusion: SAS on 2,304 cores is competitive with RRE on 20 cores.
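The core counts on this slide follow from nodes times cores per node, and dividing by RRE's 20 cores shows the scale of the hardware gap behind "competitive". This is plain arithmetic on the slide's numbers; `total_cores` is a helper named for this sketch only.

```python
NODES = (140, 144)   # node counts quoted on the slide
CORES_PER_NODE = 16
RRE_CORES = 20       # RRE configuration cited in the slide's conclusion


def total_cores(nodes: int, cores_per_node: int = CORES_PER_NODE) -> int:
    """Total cores in a homogeneous grid."""
    return nodes * cores_per_node


for nodes in NODES:
    sas_cores = total_cores(nodes)
    print(f"{nodes} nodes x {CORES_PER_NODE} cores = {sas_cores:,} cores "
          f"({sas_cores / RRE_CORES:.1f}x the RRE core count)")
```

In other words, SAS needed more than 100 times as many cores to reach parity.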
  10. Honest benchmarking: compare RRE and SAS/STAT performance
      – Same data
      – Same environment
      – Same tasks
      Test under real-world conditions. Make the test fair and transparent.
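The "same data, same environment, same tasks" principle can be sketched as a tiny timing harness: every task is timed with the same clock and the same number of repetitions, and the median is reported to damp one-off noise. This is a generic illustration, not the harness used in the study; the stand-in tasks are hypothetical.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark(tasks: Dict[str, Callable[[], object]], repeats: int = 3) -> Dict[str, float]:
    """Run each task `repeats` times; return its median wall-clock time in seconds.

    Keeping the data and hardware identical across tasks is up to the caller.
    """
    results: Dict[str, float] = {}
    for name, task in tasks.items():
        timings: List[float] = []
        for _ in range(repeats):
            start = time.perf_counter()
            task()
            timings.append(time.perf_counter() - start)
        results[name] = statistics.median(timings)
    return results


if __name__ == "__main__":
    # Hypothetical stand-in tasks operating on the same in-memory data.
    data = list(range(1_000_000))
    tasks = {
        "sum": lambda: sum(data),
        "sorted": lambda: sorted(data, reverse=True),
    }
    for name, seconds in benchmark(tasks).items():
        print(f"{name:10s} {seconds:.4f} s")
```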
  11. Data:
      – Manufactured data
      – Reproducible in any environment
      – Designed to emulate "typical" working data
      – "Entity" tables: 1MM and 5MM rows
      – "Predict" tables: 10MM and 50MM rows
      Diagram: a Fact table linked by entity key to Predict, Entity 1, and Entity 2 tables; 571 columns and 21 columns.
  12. Benchmarking environment:
      – SAS 9.4: Base, STAT, Grid Manager
      – RRE 7.0
      – Platform LSF 9
      – CentOS
      – Commodity servers: 4 cores, 16GB memory
      – Gbit network
  13. Analytic tasks (task: SAS capability / RRE capability):
      – Descriptive statistics: PROC SURVEYMEANS / rxSummary
      – Median and deciles: PROC SURVEYMEANS / rxQuantile
      – Frequency distribution: PROC FREQ / rxCube
      – Linear regression (numeric predictors): PROC REG, HPREG / rxLinMod
      – Linear regression (mixed predictors): PROC GENMOD / rxLinMod
      – Stepwise linear (100 predictors): PROC REG / rxLinMod with rxStepControl
      – Logistic regression: PROC LOGISTIC / rxLogit
      – Generalized linear: PROC GENMOD / rxGLM
      – K-means clustering: PROC FASTCLUS / rxKMeans
      – Score: PROC SCORE / rxPredict
  14. Preparation:
      – Generated the data with a randomized procedure
      – Loaded the data into each product's native format: RRE, XDF file; SAS, SAS data set
      – Generation and load times are not included in the results; there were no meaningful differences
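"Randomized" and "reproducible in any environment" are compatible as long as the generator is seeded. The sketch below shows that idea at toy scale; the function name, column names, and Gaussian distribution are hypothetical choices for illustration and are not the benchmark's actual generator (the real entity tables had 1MM and 5MM rows).

```python
import random
from typing import Dict, List, Union

Row = Dict[str, Union[int, float]]


def make_entity_table(n_rows: int, n_numeric: int, seed: int) -> List[Row]:
    """Generate a toy 'entity' table: an integer key plus numeric columns.

    Hypothetical sketch: column names, distributions, and sizes are
    illustrative only.
    """
    rng = random.Random(seed)  # private generator: same seed -> same table
    rows: List[Row] = []
    for key in range(n_rows):
        row: Row = {"entity_key": key}
        for j in range(n_numeric):
            row[f"x{j}"] = rng.gauss(0.0, 1.0)
        rows.append(row)
    return rows


if __name__ == "__main__":
    table = make_entity_table(n_rows=5, n_numeric=3, seed=42)
    print(table[0])
```

Because each call builds its own `random.Random(seed)`, two runs with the same seed produce identical tables regardless of any other random activity in the process.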
  15. RRE: 42 times faster than SAS 9.4. Complete script of ten analytic tasks, N=5,000,000, runtime in seconds: RRE 124 (about 2 minutes); SAS 9.4 5,192 (about 1 hour, 26 minutes).
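The headline multiple is simply the ratio of the two whole-script runtimes. A short check on the slide's numbers confirms both the "42 times" figure and the minute conversions; `hms` is a helper written for this sketch.

```python
RRE_SECONDS = 124
SAS_SECONDS = 5_192


def hms(seconds: int) -> str:
    """Format a duration in seconds as hours, minutes, seconds."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes}m {secs}s"


speedup = SAS_SECONDS / RRE_SECONDS
print(f"RRE: {hms(RRE_SECONDS)}, SAS: {hms(SAS_SECONDS)}, speedup {speedup:.1f}x")
```

The ratio is about 41.9, which the slide rounds to 42.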
  16. RRE: linear scalability. Runtime in seconds by number of rows in the entity table:
      – RRE 7: 68 (1,000,000 rows); 124 (5,000,000 rows)
      – SAS 9.4: 623 (1,000,000 rows); 5,192 (5,000,000 rows)
      RRE: consistent performance with increased data volume.
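One way to quantify "scalability" from two data points is to fit runtime ~ rows^k through them: k = 1 is proportional (linear) growth, k < 1 is sub-linear, and k > 1 is super-linear. The sketch assumes the 68 s and 623 s readings belong to the 1MM-row entity table, as the chart's two series suggest.

```python
import math

ROWS_SMALL, ROWS_LARGE = 1_000_000, 5_000_000
# (runtime at 1MM rows, runtime at 5MM rows) in seconds; pairing 68 s and
# 623 s with the 1MM-row table is an assumption about the chart.
RUNTIMES = {"RRE 7": (68, 124), "SAS 9.4": (623, 5_192)}


def scaling_exponent(t_small: float, t_large: float,
                     n_small: int = ROWS_SMALL, n_large: int = ROWS_LARGE) -> float:
    """Empirical exponent k in runtime ~ rows**k, fitted through two points."""
    return math.log(t_large / t_small) / math.log(n_large / n_small)


for product, (small, large) in RUNTIMES.items():
    print(f"{product}: runtime x{large / small:.2f} for 5x the rows, "
          f"exponent k = {scaling_exponent(small, large):.2f}")
```

On these two points RRE's runtime grows by less than 2x for 5x the rows (k well below 1), while SAS's grows by more than 8x (k above 1).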
  17. RRE: up to 350X faster than SAS. RRE speed multiple by task, N=5MM:
      – Stats: 213
      – Quintiles: 185
      – Freq: 351
      – Lin Reg 1: 39
      – Lin Reg 2: 37
      – Step Lin: 19
      – Logistic: 58
      – GLM: 18
      – Kmeans 1: 101
      – Kmeans 2: 32
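The per-task multiples and the 42x whole-script figure measure different things: the latter is a ratio of total runtimes, so it is weighted toward whichever tasks take longest and need not match any average of the per-task multiples. The sketch summarizes the chart's values as read from the slide.

```python
import statistics

# Per-task RRE speed multiples (SAS runtime / RRE runtime), N=5MM, from the chart.
speed_multiples = {
    "Stats": 213, "Quintiles": 185, "Freq": 351, "Lin Reg 1": 39,
    "Lin Reg 2": 37, "Step Lin": 19, "Logistic": 58, "GLM": 18,
    "Kmeans 1": 101, "Kmeans 2": 32,
}

arithmetic_mean = statistics.mean(speed_multiples.values())
geometric_mean = statistics.geometric_mean(speed_multiples.values())
fastest = max(speed_multiples, key=speed_multiples.get)
slowest = min(speed_multiples, key=speed_multiples.get)

print(f"arithmetic mean multiple: {arithmetic_mean:.1f}x")
print(f"geometric mean multiple:  {geometric_mean:.1f}x")
print(f"largest: {fastest} ({speed_multiples[fastest]}x), "
      f"smallest: {slowest} ({speed_multiples[slowest]}x)")
```

The geometric mean is the usual choice for averaging ratios because it is not dominated by a single outlier such as the 351x frequency task.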
  18. Why is RRE faster than SAS?
      – RRE supports scalable computing out of the box: multi-threaded processing and distributed processing
      – Legacy SAS is mostly single-threaded: DATA step processing and most SAS/STAT PROCs
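Multi-threaded and distributed statistics generally rely on splitting the data into chunks, summarizing each chunk independently, and merging the partial summaries. The sketch shows that pattern for mean and variance using the pairwise merge formula of Chan et al.; it illustrates the general technique and is not a description of RRE's internals. Python threads are limited by the GIL for pure-Python arithmetic, so a `ProcessPoolExecutor` would be the swap for real CPU-bound speedups.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import List, Sequence, Tuple

# Partial summary of one chunk: (count, mean, M2), where M2 is the sum of
# squared deviations from that chunk's mean.
Partial = Tuple[int, float, float]


def summarize_chunk(chunk: Sequence[float]) -> Partial:
    """Summarize one non-empty chunk independently of all others."""
    n = len(chunk)
    mean = sum(chunk) / n
    m2 = sum((x - mean) ** 2 for x in chunk)
    return n, mean, m2


def merge(a: Partial, b: Partial) -> Partial:
    """Combine two partial summaries (pairwise update of Chan et al.)."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


def parallel_mean_variance(data: Sequence[float], n_chunks: int = 4) -> Tuple[float, float]:
    """Mean and sample variance computed chunk by chunk, then merged."""
    if len(data) < 2:
        raise ValueError("need at least two observations")
    size = -(-len(data) // n_chunks)  # ceiling division, so no chunk is empty
    chunks: List[Sequence[float]] = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(summarize_chunk, chunks))
    n, mean, m2 = reduce(merge, partials)
    return mean, m2 / (n - 1)


if __name__ == "__main__":
    values = [float(i) for i in range(1, 101)]
    print(parallel_mean_variance(values, n_chunks=4))
```

Because the merge step only needs three numbers per chunk, the same decomposition works whether the chunks live on threads, processes, or separate machines.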
  19. SAS HP PROCs: 9 new SAS PROCs bundled into SAS 9.4, designed for scalability, with multiple operating modes:
      – Single machine
      – Distributed (you must license SAS HP Statistics)
  20. HP PROCs: minimal improvement. Linear regression with 20 predictors, N=5,000,000, HPREG running in single-machine mode; runtime in seconds:
      – SAS PROC REG: 267.17
      – SAS PROC HPREG: 253.82
      – RRE rxLinMod: 6.8
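"Minimal improvement" can be made precise by taking ratios of the three runtimes. The assignment of 267.17 s to PROC REG, 253.82 s to PROC HPREG, and 6.8 s to rxLinMod is read from the chart; it is consistent with the 39x "Lin Reg 1" multiple on slide 17, since 267.17 / 6.8 is about 39.3.

```python
# Linear regression, 20 predictors, N=5,000,000 (seconds).
runtimes = {"PROC REG": 267.17, "PROC HPREG": 253.82, "rxLinMod": 6.8}

hpreg_gain = runtimes["PROC REG"] / runtimes["PROC HPREG"]    # HPREG vs. REG
rre_vs_hpreg = runtimes["PROC HPREG"] / runtimes["rxLinMod"]  # RRE vs. HPREG
rre_vs_reg = runtimes["PROC REG"] / runtimes["rxLinMod"]      # RRE vs. REG

print(f"HPREG speedup over REG:     {hpreg_gain:.3f}x")
print(f"rxLinMod speedup over HPREG: {rre_vs_hpreg:.1f}x")
print(f"rxLinMod speedup over REG:   {rre_vs_reg:.1f}x")
```

In single-machine mode, HPREG is only about 5% faster than REG, while rxLinMod remains roughly 37 times faster than HPREG.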
  21. Summary
      – RRE is faster than legacy SAS on the same tasks and the same hardware.
      – RRE speed comes from efficient engineering and from multi-threaded and distributed processing.
      – SAS performance claims require massive hardware, force you to license more software from SAS, and don't apply to legacy SAS.
  22. Polling question: Which of the following analytic software benefits is most important to you:
      – A) Completing projects faster
      – B) Building better predictive models
      – C) High performance with low infrastructure costs
  23. John Wallace, Founder & CEO
  24. DataSong at a Glance
      Background:
      – Approaching $1 trillion in revenue analyzed; $3 billion in marketing spend under our lens.
      – Experienced team of 60+ people based in San Francisco, with offices in Seattle, Los Angeles, Singapore, and India.
      – Founded in 2003 with a proven history of solving difficult analytics problems; evolved from consulting through close partnerships with our clients.
      Our offerings:
      – Customer interaction insight that powers applications for customer-level revenue attribution, targeting, and media optimization.
      – Descriptive and predictive modeling of hidden trends and relationships in big data.
      – Custom development, including applications, process automation, and decision support solutions.
  25. DataSong offerings: hosted applications for revenue attribution, customer targeting, and marketing planning. We know Big Data. We analyze and provide the "so what".
  26. DataSong architecture:
      – ETL
      – N marketing channels
      – Behavioral variables
      – Promotional data
      – Overlay data
      – Functions to read Hadoop output; XDF creation
      – Exploratory data analysis
      – GAM survival models
      – Scoring for inference
      – Scoring for prediction
      – 5 billion scores per day per customer
      – DataSong Data Format (DDF)
      – Custom variables (PMML)
  27. Where speed matters. Three key dimensions:
      – How many rows
      – How many variables
      – How many iterations of a model
      Trade-offs for speed:
      – Sampling variance
      – Testing fewer features
      – Having less understanding of the signal
      This third dimension means we must multiply any benchmark by N.
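Multiplying a benchmark by the number of model iterations N shows why a per-run gap matters in practice. The sketch reuses the whole-script runtimes from slide 15 as the cost of one iteration; the iteration counts are hypothetical.

```python
RRE_SECONDS_PER_RUN = 124    # whole-script runtime from slide 15
SAS_SECONDS_PER_RUN = 5_192  # whole-script runtime from slide 15


def total_hours(seconds_per_run: float, iterations: int) -> float:
    """Total elapsed hours for `iterations` sequential runs."""
    return seconds_per_run * iterations / 3600


for n in (1, 10, 50):  # hypothetical numbers of model iterations
    print(f"N={n:3d}: RRE {total_hours(RRE_SECONDS_PER_RUN, n):6.2f} h, "
          f"SAS {total_hours(SAS_SECONDS_PER_RUN, n):6.2f} h")
```

At 50 iterations the difference is under two hours versus about three days, which is the practical sense in which the third dimension multiplies any benchmark by N.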
  30. Thank you
