EXTENDING BIGBENCH 
Chaitan Baru, Milind Bhandarkar, Carlo Curino, Manuel Danisch, Michael Frank, Bhaskar Gowda, Hans-Arno Jacobsen, Huang Jie, Dileep Kumar, Raghu Nambiar, Meikel Poess, Francois Raab, Tilmann Rabl, Nishkam Ravi, Kai Sachs, Saptak Sen, Lan Yi, and Choonhan Youn
TPC TC, September 5, 2014 
THE BIGBENCH PROPOSAL
End to end benchmark
  - Application level
Based on a product retailer (TPC-DS)
Focused on Parallel DBMS and MR engines
History
  - Launched at 1st WBDB, San Jose
  - Published at SIGMOD 2013
  - Spec at WBDB proceedings 2012 (queries & data set)
  - Full kit at WBDB 2014
Collaboration with Industry & Academia
  - First: Teradata, University of Toronto, Oracle, InfoSizing
  - Now: bankmark, CLDS, Cisco, Cloudera, Hortonworks, InfoSizing, Intel, Microsoft, MSRG, Oracle, Pivotal, SAP
05.09.2014 
EXTENDING BIGBENCH 2
DATA MODEL
  - Structured: TPC-DS + market prices
  - Semi-structured: website click-stream
  - Unstructured: customers' reviews
[Figure: data model diagram. Layers: Unstructured Data, Semi-Structured Data, Structured Data. Entities: Sales, Customer, Item, Marketprice, Web Page, Web Log, Reviews. Legend: Adapted TPC-DS, BigBench Specific.]
DATA MODEL - 3 Vs
Variety
  - Different schema parts
Volume
  - Based on scale factor
  - Similar to TPC-DS scaling, but continuous
  - Weblogs & product reviews also scaled
Velocity
  - Refresh for all data
WORKLOAD
Workload Queries
  - 30 "queries"
  - Specified in English (sort of)
  - No required syntax (first implementation in Aster SQL-MR)
  - Kit implemented in Hive, Hadoop MR, Mahout, OpenNLP
Business functions (adapted from McKinsey)
  - Marketing
      - Cross-selling, Customer micro-segmentation, Sentiment analysis, Enhancing multichannel consumer experiences
  - Merchandising
      - Assortment optimization, Pricing optimization
  - Operations
      - Performance transparency, Product return analysis
  - Supply chain
      - Inventory management
  - Reporting (customers and products)
WORKLOAD - TECHNICAL ASPECTS

Generic Characteristics

Data Sources          #Queries   Percentage
Structured            18         60%
Semi-structured       7          23%
Unstructured          5          17%

Analytic techniques   #Queries   Percentage
Statistics analysis   6          20%
Data mining           17         57%
Reporting             8          27%

Hive Implementation Characteristics

Query Types           #Queries   Percentage
Pure HiveQL           14         46%
Mahout                5          17%
OpenNLP               5          17%
Custom MR             6          20%
SQL-MR QUERY 1
[Query listing: shown as an image on the slide; no text survives in this transcript.]
HIVEQL QUERY 1
-- Count how often two items are sold together in the same basket.
SELECT pid1, pid2, COUNT(*) AS cnt
FROM (
  FROM (
    FROM (
      -- Basket lines (ticket number, item) for the selected categories and stores
      SELECT s.ss_ticket_number AS oid, s.ss_item_sk AS pid
      FROM store_sales s
      INNER JOIN item i ON s.ss_item_sk = i.i_item_sk
      WHERE i.i_category_id IN (1, 2, 3)
        AND s.ss_store_sk IN (10, 20, 33, 40, 50)
    ) q01_temp_join
    -- Identity map step; CLUSTER BY sends all rows of one basket to one reducer
    MAP q01_temp_join.oid, q01_temp_join.pid
    USING 'cat'
    AS oid, pid
    CLUSTER BY oid
  ) q01_map_output
  -- Custom Java reducer that emits item pairs
  REDUCE q01_map_output.oid, q01_map_output.pid
  USING 'java -cp bigbenchqueriesmr.jar:hive-contrib.jar de.bankmark.bigbench.queries.q01.Red'
  AS (pid1 BIGINT, pid2 BIGINT)
) q01_temp_basket
GROUP BY pid1, pid2
-- Keep pairs that occur at least 50 times
HAVING COUNT(pid1) > 49
ORDER BY pid1, cnt, pid2;
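The slide does not show the source of the Java reducer, so the following is only a rough Python sketch of what the REDUCE, GROUP BY, HAVING, and ORDER BY steps appear to compute together. It assumes the reducer emits every unordered pair of distinct items in a basket exactly once, with the smaller item id first; the function names `basket_pairs` and `frequent_pairs` are hypothetical and not part of the BigBench kit.

```python
from collections import Counter
from itertools import combinations, groupby
from operator import itemgetter


def basket_pairs(rows):
    """Yield (pid1, pid2) item pairs per basket.

    `rows` is an iterable of (oid, pid) tuples already grouped by oid,
    which is what CLUSTER BY oid guarantees for each reducer.
    Assumption: each pair of distinct items is emitted once per basket,
    smaller id first. The real reducer may behave differently.
    """
    for _, group in groupby(rows, key=itemgetter(0)):
        items = sorted({pid for _, pid in group})
        yield from combinations(items, 2)


def frequent_pairs(rows, min_count=50):
    """Mirror GROUP BY pid1, pid2 / HAVING COUNT > 49 / ORDER BY pid1, cnt, pid2."""
    counts = Counter(basket_pairs(rows))
    kept = [(p1, p2, c) for (p1, p2), c in counts.items() if c >= min_count]
    return sorted(kept, key=lambda r: (r[0], r[2], r[1]))


if __name__ == "__main__":
    sample = [(1, 10), (1, 20), (1, 30), (2, 10), (2, 20)]
    print(frequent_pairs(sample, min_count=2))
```

The default `min_count=50` corresponds to the query's `HAVING COUNT(pid1) > 49`; the small sample uses a lower threshold only so that it produces a result.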
SCALING
Continuous scaling model
  - Realistic
SF 1 ~ 1 GB
Different scaling speeds
  - Taken from TPC-DS
  - Static
  - Square root
  - Logarithmic
  - Linear
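To make the four scaling speeds concrete, here is a minimal Python sketch of how a table's row count could depend on a continuous scale factor. The exact formulas used by the BigBench data generator are not given on the slide; the function name `scaled_rows`, the base row counts, and in particular the `1 + log10(SF)` form of the logarithmic case are assumptions chosen for illustration.

```python
import math


def scaled_rows(base_rows, scale_factor, mode):
    """Return an illustrative row count for a table at the given scale factor.

    base_rows    -- row count at scale factor 1 (hypothetical value)
    scale_factor -- any positive number; fractional values are allowed,
                    which is what "continuous" scaling means here
    mode         -- "static", "sqrt", "log" or "linear"
    """
    if scale_factor <= 0:
        raise ValueError("scale_factor must be positive")
    if mode == "static":
        factor = 1.0
    elif mode == "sqrt":
        factor = math.sqrt(scale_factor)
    elif mode == "log":
        # Assumed form; equals 1 at SF 1 like the other modes.
        factor = 1.0 + math.log10(scale_factor)
    elif mode == "linear":
        factor = float(scale_factor)
    else:
        raise ValueError(f"unknown scaling mode: {mode}")
    # Never scale a table below one row.
    return max(1, round(base_rows * factor))


if __name__ == "__main__":
    for mode in ("static", "sqrt", "log", "linear"):
        print(mode, scaled_rows(1000, 100, mode))
```

With a base of 1000 rows at SF 100, the static table stays at 1000 rows while the linear table grows to 100000, which shows why fact tables and dimension tables diverge in size as the scale factor increases.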
BENCHMARK PROCESS
[Figure: process flow. Data Generation -> Loading -> Power Test -> Throughput Test I -> Data Maintenance -> Throughput Test II -> Result]
Adapted to batch systems
No trickle update
Measured processes
  - Loading
  - Power Test (single user run)
  - Throughput Test I (multi user run)
  - Data Maintenance
  - Throughput Test II (multi user run)
Result
  - Additive Metric
METRIC
Throughput metric
  - BigBench queries per hour
Number of queries run
  - 30*(2*S+1)
Measured times
Metric
[Formulas for the measured times and the metric: shown as images on the slide; no text survives in this transcript.]
Source: www.wikipedia.de
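The metric formula itself did not survive in the transcript, so the sketch below only illustrates the parts that are stated: the query count 30*(2*S+1) and a queries-per-hour figure built from the measured phases. It assumes that S is the number of concurrent streams in each throughput test and that the "additive" metric divides the query count by the plain, unweighted sum of the measured phase times; both are assumptions, and the function and key names are hypothetical.

```python
def queries_run(streams, queries_per_run=30):
    """Number of queries executed: one power test plus two throughput
    tests with `streams` concurrent streams each, i.e. 30*(2*S+1)."""
    if streams < 1:
        raise ValueError("streams must be at least 1")
    return queries_per_run * (2 * streams + 1)


def queries_per_hour(streams, phase_seconds):
    """Illustrative additive throughput metric.

    phase_seconds maps each measured phase to its elapsed time in seconds.
    Assumption: the phase times are simply summed; the official formula
    may weight the phases differently.
    """
    total = sum(phase_seconds.values())
    if total <= 0:
        raise ValueError("total measured time must be positive")
    return queries_run(streams) * 3600.0 / total


if __name__ == "__main__":
    measured = {
        "loading": 1200,
        "power_test": 2400,
        "throughput_test_1": 1500,
        "data_maintenance": 600,
        "throughput_test_2": 1500,
    }
    print(queries_run(2), queries_per_hour(2, measured))
```

Data generation is left out of `measured` because the slide on the benchmark process lists it in the flow but not among the measured processes.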
COMMUNITY FEEDBACK
Gathered from authors' companies, customers, other comments
Positive
  - Choice of TPC-DS as a basis
  - Not only SQL
  - Reference implementation
Common misunderstanding
  - Hive-only benchmark
  - Due to reference implementation
Feature requests
  - Graph analytics
  - Streaming
  - Interactive querying
  - Multi-tenancy
  - Fault-tolerance
Broader use case
  - Social elements, advertisement, etc.
Limited complexity
  - Keep it simple to run and implement
EXTENDING BIGBENCH
Better multi-tenancy
  - Different types of streams (OLTP vs OLAP)
More procedural workload
  - Complex like page rank, indexing
  - Simple like word count, sort
More metrics
  - Price and energy performance
  - Performance in failure cases
ETL workload
  - Avoid complete reload of data for refresh
TOWARDS AN INDUSTRY STANDARD
Collaboration with SPEC and TPC
Aiming for fast development cycles
Enterprise vs express benchmark (TPC)
Specification sections (open)
  - Preamble
      - High level overview
  - Database design
      - Overview of the schema and data
  - Workload scaling
      - How to scale the data and workload
  - Metric and execution rules
      - Reported metrics and rules on how to run the benchmark
  - Pricing
      - Reported price information
  - Full disclosure report
      - Wording and format of the benchmark result report
  - Audit requirements
      - Minimum audit requirements for an official result, self auditing scripts and tools

Enterprise                        Express
Specification                     Kit
Specific implementation           Kit evaluation
Best optimization                 System tuning (not kit)
Complete audit                    Self audit / peer review
Price requirement                 No pricing
Full ACID testing                 ACID self-assessment (no durability)
Large variety of configuration    Focused on key components
Substantial implementation cost   Limited cost, fast implementation
BIGBENCH EXPERIMENTS
Tests on
  - Cloudera CDH 5.0, Pivotal GPHD-3.0.1.0, IBM InfoSphere BigInsights
In progress: Spark, Impala, Stinger, …
3 Clusters (+)
  - 1 node: 2 x Xeon E5-2450 0 @ 2.10GHz, 64GB RAM, 2 x 2TB HDD
  - 6 nodes: 2 x Xeon E5-2680 v2 @ 2.80GHz, 128GB RAM, 12 x 2TB HDD
  - 546 nodes: 2 x Xeon X5670 @ 2.93GHz, 48GB RAM, 12 x 2TB HDD
[Figure: run time chart. Time axis from 0:00:00 to 1:12:00; one entry each for DG, LOAD, and q1 through q30; Q16 and Q01 are labeled.]
Try it - full kit online
https://github.com/intel-hadoop/Big-Bench
Contact:
tilmann.rabl@bankmark.de
QUESTIONS?

Discussion of BigBench: A Proposed Industry Standard Performance Benchmark for Big Data
