7. TPC-DS
The TPC Benchmark DS (TPC-DS) is a decision support benchmark that models several generally applicable aspects of a decision support system, including queries and data maintenance.
http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-ds_v2.1.0.pdf
9. Key Scenarios
• Examine large volumes of data
• Give answers to real-world business questions
• Execute queries of various operational requirements and complexities (e.g., ad-hoc, reporting, iterative OLAP, data mining)
• Are characterized by high CPU and I/O load
• Are periodically synchronized with source OLTP databases through database maintenance functions
12. How to set up your test bench?
[Diagram components: Spark Cluster (x4), SQL DB [Metastore], Azure Storage (WASB), BI / Consumption Area]
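Because the test bench keeps the benchmark data in Azure Storage (WASB), each cluster needs to address the same data by a WASB URI. The sketch below only illustrates the general `wasb://<container>@<account>.blob.core.windows.net/<path>` form; the container name `tpcds`, the account name `benchdata`, and the path are hypothetical placeholders, not values from this deck.

```python
def wasb_uri(container, account, path="", secure=False):
    """Build a WASB URI for a blob path in an Azure Storage account.

    All argument values used with this helper are illustrative assumptions.
    """
    # wasbs:// is the TLS variant of the same scheme.
    scheme = "wasbs" if secure else "wasb"
    # Drop a leading slash so the URI has exactly one separator after the host.
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


# Hypothetical data location that every cluster would point at.
DATA_LOCATION = wasb_uri("tpcds", "benchdata", "/parquet/1000")
```

Pointing every cluster at one such location is one way to keep the input data identical across runs.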
13. Best Practices
• Keep all cluster types similar (VMs)
• The metastore database needs to be S2 or greater
• Add the data storage account as an additional storage account on the cluster
• Do not run the benchmark on multiple clusters at the same time
• Keep monitoring YARN/Tez for failures
• Use Byobu while running queries
• Beeline is the best client for running queries
• Hive View is the worst for running benchmark queries
• Install Presto from https://github.com/hdinsight/presto-hdinsight
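As a rough illustration of running benchmark queries through Beeline, the sketch below executes every `.sql` file in a directory one at a time and records the exit code and wall-clock time. It is a minimal sketch, not the deck's actual harness: the JDBC URL (an HDInsight-style HiveServer2 HTTP endpoint), the `queries` directory, and the function names are assumptions. Only the standard Beeline options `-u` (JDBC URL) and `-f` (script file) are used.

```python
import subprocess
import time
from pathlib import Path

# Assumption: HiveServer2 reachable over HTTP on the head node; adjust as needed.
JDBC_URL = "jdbc:hive2://localhost:10001/;transportMode=http"


def beeline_command(query_file, jdbc_url=JDBC_URL):
    """Build the argument list that runs one SQL file through Beeline."""
    return ["beeline", "-u", jdbc_url, "-f", str(query_file)]


def run_queries(query_dir, runner=subprocess.run):
    """Run each .sql file in query_dir sequentially; return (name, exit code, seconds)."""
    results = []
    for query_file in sorted(Path(query_dir).glob("*.sql")):
        start = time.perf_counter()
        completed = runner(beeline_command(query_file))
        elapsed = time.perf_counter() - start
        results.append((query_file.name, completed.returncode, elapsed))
    return results
```

Calling `run_queries("queries")` inside a Byobu session on the head node would run the files in name order. A non-zero exit code marks a query worth checking in YARN/Tez.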