SlideShare a Scribd company logo
1 of 10
Download to read offline
6. April 2018
ABench: Big Data Architecture Stack
Benchmark
[Vision Paper]
Todor Ivanov todor@dbis.cs.uni-frankfurt.de
Goethe University Frankfurt am Main, Germany
http://www.bigdata.uni-frankfurt.de/
Rekha Singhal rekha.singhal@tcs.com
TCS Research – Mumbai, India
http://www.tcs.com
6. April 2018
Motivation
• Growing number of new Big Data technologies and connectors in the Big Data Stacks
 Challenges for Solution Architects, Data Engineers, Data Scientist, Developers, etc.
• Missing benchmarks for each technology, connector or a combination of them
• Consequence  Increasing complexity in the Big Data Architecture Stacks
• Our approach  ABench: Big Data Architecture Stack Benchmark
2ICPE 2018, Berlin, Germany, April 9-13
6. April 2018
ABench Features
• Benchmark Framework
 Data generators or plugins for custom data generators
 Include data generator or public data sets to simulate workload that stresses the
architecture
• Reuse of existing benchmarks
 Case study using BigBench (in the next slides, Streaming and Machine Learning)
• Open source implementation and extendable design
• Easy to setup and extend
• Supporting and combining all four types of benchmarks in ABench
3ICPE 2018, Berlin, Germany, April 9-13
6. April 2018
Benchmarks Types (adapted from Andersen and Pettersen [1])
1. Generic Benchmarking: checks whether an implementation
fulfills given business requirements and specifications (Is the
defined business specification implemented accurately?).
2. Competitive Benchmarking: is a performance comparison
between the best tools on the platform layer that offer similar
functionality (e.g., throughput of MapReduce vs. Spark vs.
Flink).
3. Functional Benchmarking is a functional comparison of the
features of the tool against technologies from the same area.
(e.g., Spark Streaming vs. Spark Structured Streaming vs. Flink
Streaming).
4. Internal Benchmarking: comparing different implementations
of a functionality (e.g., Spark Scala vs. Java vs. R vs. PySpark)
4ICPE 2018, Berlin, Germany, April 9-13
6. April 2018
ABench Framework
5ICPE 2018, Berlin, Germany, April 9-13
Data Model Data StorageData Generation
Workload Generator
Benchmark Control Knobs
Performance Data Collection
Benchmark Validation Benchmark Metrics
Data Model
System to
Benchmark
6. April 2018
Stream Processing Benchmark – Use Case
• Adding stream processing to BigBench [2,3]
• Reuse of the web click logs in JSON format from BigBench V2 [3]
• Adding new streaming workloads
 possibility to execute the queries on a subset of the incoming stream of data
• Provide benchmark implementations based on Spark Streaming and Kafka
• Work In-progress: Exploratory Analysis of Spark Structured Streaming, @PABS 2018, Todor Ivanov
and Jason Taaffe
6ICPE 2018, Berlin, Germany, April 9-13
6. April 2018
Machine Learning Benchmark – Use Case
• Expanding the type of Machine Learning workloads in BigBench [2]
 five (Q5, Q20, Q25, Q26 and Q28) out of the 30 queries cover common ML algorithms
• Proposal by Sweta Singh (IBM)[4] for new workload with Collaborative Filtering using
Matrix Factorization implementation in Spark MLlib via the Alternating Least Squares (ALS)
• Other types of advanced analytics inspired by Gartner [5]:
 descriptive analytics
 diagnostic analytics
 predictive analytics
 prescriptive analytics
• Introduce new ML metrics for scalability and accuracy
7ICPE 2018, Berlin, Germany, April 9-13
6. April 2018
Next Steps
• Building express version of the benchmark framework
• Provide open source implementation of the Use Case benchmarks to stress test the existing
Big Data Architecture Stacks
• Enable the comparison of the most popular technologies (e.g., Kafka, Spark, etc.)
8ICPE 2018, Berlin, Germany, April 9-13
6. April 2018
Thank you for your attention!
This research has been supported by the Research Group of the Standard Performance
Evaluation Corporation (SPEC).
ICPE 2018, Berlin, Germany, April 9-13 9
6. April 2018
REFERENCES
[1] Bjørn Andersen and P-G Pettersen. 1995. Benchmarking handbook. Champman & Hall.
[2] Ahmad Ghazal, Todor Ivanov, Pekka Kostamaa, Alain Crolotte, Ryan Voong, Mohammed Al-
Kateb, Waleed Ghazal, and Roberto V. Zicari. 2017. BigBench V2: The New and Improved
BigBench. In ICDE 2017, San Diego, CA, USA, April 19-22.
[3] Ahmad Ghazal, Tilmann Rabl, Minqing Hu, Francois Raab, Meikel Poess, Alain Crolotte,
and Hans-Arno Jacobsen. 2013. BigBench: Towards An Industry Standard Benchmark for
Big Data Analytics. In SIGMOD 2013. 1197–1208.
[4] Sweta Singh. 2016. Benchmarking Spark Machine Learning Using BigBench. In 8th TPC
Technology Conference, TPCTC 2016, New Delhi, India, September 5-9, 2016.
[5] Gartner 2017, https://www.gartner.com/doc/3471553/-planning-guide-data-analytics
10ICPE 2018, Berlin, Germany, April 9-13

More Related Content

What's hot

Freenome's Biological Machine Learning Platform
Freenome's Biological Machine Learning PlatformFreenome's Biological Machine Learning Platform
Freenome's Biological Machine Learning PlatformBrandon White
 
Tracking research data footprints - slides
Tracking research data footprints - slidesTracking research data footprints - slides
Tracking research data footprints - slidesARDC
 
[WSO2 Summit Chicago 2018] How to Build an Agile Enterprise
[WSO2 Summit Chicago 2018] How to Build an Agile Enterprise[WSO2 Summit Chicago 2018] How to Build an Agile Enterprise
[WSO2 Summit Chicago 2018] How to Build an Agile EnterpriseWSO2
 
SAP Big Data Innovation Lab at the University of Mannheim
SAP Big Data Innovation Lab at the University of MannheimSAP Big Data Innovation Lab at the University of Mannheim
SAP Big Data Innovation Lab at the University of MannheimProf. Dr. Alexander Maedche
 
Supervised Papers Classification on Large-Scale High-Dimensional Data with Ap...
Supervised Papers Classification on Large-Scale High-Dimensional Data with Ap...Supervised Papers Classification on Large-Scale High-Dimensional Data with Ap...
Supervised Papers Classification on Large-Scale High-Dimensional Data with Ap...Leonidas Akritidis
 
Rahul_Bhatia_resume_new
Rahul_Bhatia_resume_newRahul_Bhatia_resume_new
Rahul_Bhatia_resume_newRahul Bhatia
 
حلقة تكنولوجية 11 بحث علمى بعنوان A Systematic Mapping Study for Big Data Str...
حلقة تكنولوجية 11 بحث علمى بعنوان A Systematic Mapping Study for Big Data Str...حلقة تكنولوجية 11 بحث علمى بعنوان A Systematic Mapping Study for Big Data Str...
حلقة تكنولوجية 11 بحث علمى بعنوان A Systematic Mapping Study for Big Data Str...Adel Sabour
 
Better Decisions, Faster – How Good Data Powers Valero
Better Decisions, Faster – How Good Data Powers ValeroBetter Decisions, Faster – How Good Data Powers Valero
Better Decisions, Faster – How Good Data Powers ValeroYokogawa1
 
Ground Technology Services - KeyLAB case study
Ground Technology Services - KeyLAB case studyGround Technology Services - KeyLAB case study
Ground Technology Services - KeyLAB case studyKeynetix
 
Uop bsa-376-individual-assignment-work-related-project-analysis-part-ii
Uop bsa-376-individual-assignment-work-related-project-analysis-part-iiUop bsa-376-individual-assignment-work-related-project-analysis-part-ii
Uop bsa-376-individual-assignment-work-related-project-analysis-part-iishyaminfo102
 
BarnieDAC
BarnieDACBarnieDAC
BarnieDACNplusT
 
Enabling Data-Driven Private-Public Collaborations
Enabling Data-Driven Private-Public CollaborationsEnabling Data-Driven Private-Public Collaborations
Enabling Data-Driven Private-Public CollaborationsTyrone Grandison
 
ECL-Watch: A Big Data Application Performance Tuning Tool in the HPCC Systems...
ECL-Watch: A Big Data Application Performance Tuning Tool in the HPCC Systems...ECL-Watch: A Big Data Application Performance Tuning Tool in the HPCC Systems...
ECL-Watch: A Big Data Application Performance Tuning Tool in the HPCC Systems...HPCC Systems
 
A Deep Dive Implementing xAPI in Learning Games
A Deep Dive Implementing xAPI in Learning GamesA Deep Dive Implementing xAPI in Learning Games
A Deep Dive Implementing xAPI in Learning GamesGBLxAPI
 
Big Data Project using HIVE - college scorecard
Big Data Project using HIVE - college scorecardBig Data Project using HIVE - college scorecard
Big Data Project using HIVE - college scorecardAbhishek Gupta
 

What's hot (20)

Freenome's Biological Machine Learning Platform
Freenome's Biological Machine Learning PlatformFreenome's Biological Machine Learning Platform
Freenome's Biological Machine Learning Platform
 
Tracking research data footprints - slides
Tracking research data footprints - slidesTracking research data footprints - slides
Tracking research data footprints - slides
 
[WSO2 Summit Chicago 2018] How to Build an Agile Enterprise
[WSO2 Summit Chicago 2018] How to Build an Agile Enterprise[WSO2 Summit Chicago 2018] How to Build an Agile Enterprise
[WSO2 Summit Chicago 2018] How to Build an Agile Enterprise
 
Infographics project
Infographics projectInfographics project
Infographics project
 
SAP Big Data Innovation Lab at the University of Mannheim
SAP Big Data Innovation Lab at the University of MannheimSAP Big Data Innovation Lab at the University of Mannheim
SAP Big Data Innovation Lab at the University of Mannheim
 
Supervised Papers Classification on Large-Scale High-Dimensional Data with Ap...
Supervised Papers Classification on Large-Scale High-Dimensional Data with Ap...Supervised Papers Classification on Large-Scale High-Dimensional Data with Ap...
Supervised Papers Classification on Large-Scale High-Dimensional Data with Ap...
 
IMDb Data Integration
IMDb Data IntegrationIMDb Data Integration
IMDb Data Integration
 
Rahul_Bhatia_resume_new
Rahul_Bhatia_resume_newRahul_Bhatia_resume_new
Rahul_Bhatia_resume_new
 
حلقة تكنولوجية 11 بحث علمى بعنوان A Systematic Mapping Study for Big Data Str...
حلقة تكنولوجية 11 بحث علمى بعنوان A Systematic Mapping Study for Big Data Str...حلقة تكنولوجية 11 بحث علمى بعنوان A Systematic Mapping Study for Big Data Str...
حلقة تكنولوجية 11 بحث علمى بعنوان A Systematic Mapping Study for Big Data Str...
 
eric_fitzgerald_resume
eric_fitzgerald_resumeeric_fitzgerald_resume
eric_fitzgerald_resume
 
Metric import framework
Metric import frameworkMetric import framework
Metric import framework
 
Better Decisions, Faster – How Good Data Powers Valero
Better Decisions, Faster – How Good Data Powers ValeroBetter Decisions, Faster – How Good Data Powers Valero
Better Decisions, Faster – How Good Data Powers Valero
 
Ground Technology Services - KeyLAB case study
Ground Technology Services - KeyLAB case studyGround Technology Services - KeyLAB case study
Ground Technology Services - KeyLAB case study
 
Uop bsa-376-individual-assignment-work-related-project-analysis-part-ii
Uop bsa-376-individual-assignment-work-related-project-analysis-part-iiUop bsa-376-individual-assignment-work-related-project-analysis-part-ii
Uop bsa-376-individual-assignment-work-related-project-analysis-part-ii
 
BarnieDAC
BarnieDACBarnieDAC
BarnieDAC
 
Enabling Data-Driven Private-Public Collaborations
Enabling Data-Driven Private-Public CollaborationsEnabling Data-Driven Private-Public Collaborations
Enabling Data-Driven Private-Public Collaborations
 
ECL-Watch: A Big Data Application Performance Tuning Tool in the HPCC Systems...
ECL-Watch: A Big Data Application Performance Tuning Tool in the HPCC Systems...ECL-Watch: A Big Data Application Performance Tuning Tool in the HPCC Systems...
ECL-Watch: A Big Data Application Performance Tuning Tool in the HPCC Systems...
 
A Deep Dive Implementing xAPI in Learning Games
A Deep Dive Implementing xAPI in Learning GamesA Deep Dive Implementing xAPI in Learning Games
A Deep Dive Implementing xAPI in Learning Games
 
Big Data Project using HIVE - college scorecard
Big Data Project using HIVE - college scorecardBig Data Project using HIVE - college scorecard
Big Data Project using HIVE - college scorecard
 
Covid 19 monitor
Covid 19 monitorCovid 19 monitor
Covid 19 monitor
 

Similar to ABench: Big Data Architecture Stack Benchmark

Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...
Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...
Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...DataBench
 
Exploratory Analysis of Spark Structured Streaming
Exploratory Analysis of Spark Structured StreamingExploratory Analysis of Spark Structured Streaming
Exploratory Analysis of Spark Structured Streamingt_ivanov
 
Building the DataBench Workflow and Architecture
Building the DataBench Workflow and ArchitectureBuilding the DataBench Workflow and Architecture
Building the DataBench Workflow and Architecturet_ivanov
 
Building the DataBench Workflow and Architecture, Todor Ivanov, Bench 2019 - ...
Building the DataBench Workflow and Architecture, Todor Ivanov, Bench 2019 - ...Building the DataBench Workflow and Architecture, Todor Ivanov, Bench 2019 - ...
Building the DataBench Workflow and Architecture, Todor Ivanov, Bench 2019 - ...DataBench
 
DataBench Toolbox Demo, Ivan Martinez, Tomas Pariente Lobo, BDV Meet-Up Riga,...
DataBench Toolbox Demo, Ivan Martinez, Tomas Pariente Lobo, BDV Meet-Up Riga,...DataBench Toolbox Demo, Ivan Martinez, Tomas Pariente Lobo, BDV Meet-Up Riga,...
DataBench Toolbox Demo, Ivan Martinez, Tomas Pariente Lobo, BDV Meet-Up Riga,...DataBench
 
Proof of Concept for Learning Analytics Interoperability
Proof of Concept for Learning Analytics InteroperabilityProof of Concept for Learning Analytics Interoperability
Proof of Concept for Learning Analytics InteroperabilityOpen Cyber University of Korea
 
Airline Reservations and Routing: A Graph Use Case
Airline Reservations and Routing: A Graph Use CaseAirline Reservations and Routing: A Graph Use Case
Airline Reservations and Routing: A Graph Use CaseJason Plurad
 
Big Data Benchmarking, Tomas Pariente Lobo, Open Expo Europe, 20/06/2019
Big Data Benchmarking, Tomas Pariente Lobo, Open Expo Europe, 20/06/2019Big Data Benchmarking, Tomas Pariente Lobo, Open Expo Europe, 20/06/2019
Big Data Benchmarking, Tomas Pariente Lobo, Open Expo Europe, 20/06/2019DataBench
 
Big Data projects.pdf
Big Data projects.pdfBig Data projects.pdf
Big Data projects.pdfssuserf0a206
 
BDVe Webinar Series: DataBench – Benchmarking Big Data. Arne Berre. Tue, Oct ...
BDVe Webinar Series: DataBench – Benchmarking Big Data. Arne Berre. Tue, Oct ...BDVe Webinar Series: DataBench – Benchmarking Big Data. Arne Berre. Tue, Oct ...
BDVe Webinar Series: DataBench – Benchmarking Big Data. Arne Berre. Tue, Oct ...Big Data Value Association
 
Airline reservations and routing: a graph use case
Airline reservations and routing: a graph use caseAirline reservations and routing: a graph use case
Airline reservations and routing: a graph use caseDataWorks Summit
 
Big Data Technical Benchmarking, Arne Berre, BDVe Webinar series, 09/10/2018
Big Data Technical Benchmarking, Arne Berre, BDVe Webinar series, 09/10/2018 Big Data Technical Benchmarking, Arne Berre, BDVe Webinar series, 09/10/2018
Big Data Technical Benchmarking, Arne Berre, BDVe Webinar series, 09/10/2018 DataBench
 
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...Alok Singh
 
Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...
Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...
Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...DataBench
 
Crossing the Analytics Chasm and Getting the Models You Developed Deployed
Crossing the Analytics Chasm and Getting the Models You Developed DeployedCrossing the Analytics Chasm and Getting the Models You Developed Deployed
Crossing the Analytics Chasm and Getting the Models You Developed DeployedRobert Grossman
 
Quick! Quick! Exploration!: A framework for searching a predictive model on A...
Quick! Quick! Exploration!: A framework for searching a predictive model on A...Quick! Quick! Exploration!: A framework for searching a predictive model on A...
Quick! Quick! Exploration!: A framework for searching a predictive model on A...DataWorks Summit
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learningRajesh Muppalla
 
Data & Analytics at Scale
Data & Analytics at ScaleData & Analytics at Scale
Data & Analytics at ScaleWalid Mehanna
 

Similar to ABench: Big Data Architecture Stack Benchmark (20)

Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...
Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...
Exploratory Analysis of Spark Structured Streaming, Todor Ivanov, Jason Taafe...
 
Exploratory Analysis of Spark Structured Streaming
Exploratory Analysis of Spark Structured StreamingExploratory Analysis of Spark Structured Streaming
Exploratory Analysis of Spark Structured Streaming
 
Building the DataBench Workflow and Architecture
Building the DataBench Workflow and ArchitectureBuilding the DataBench Workflow and Architecture
Building the DataBench Workflow and Architecture
 
Building the DataBench Workflow and Architecture, Todor Ivanov, Bench 2019 - ...
Building the DataBench Workflow and Architecture, Todor Ivanov, Bench 2019 - ...Building the DataBench Workflow and Architecture, Todor Ivanov, Bench 2019 - ...
Building the DataBench Workflow and Architecture, Todor Ivanov, Bench 2019 - ...
 
DataBench Toolbox Demo, Ivan Martinez, Tomas Pariente Lobo, BDV Meet-Up Riga,...
DataBench Toolbox Demo, Ivan Martinez, Tomas Pariente Lobo, BDV Meet-Up Riga,...DataBench Toolbox Demo, Ivan Martinez, Tomas Pariente Lobo, BDV Meet-Up Riga,...
DataBench Toolbox Demo, Ivan Martinez, Tomas Pariente Lobo, BDV Meet-Up Riga,...
 
Proof of Concept for Learning Analytics Interoperability
Proof of Concept for Learning Analytics InteroperabilityProof of Concept for Learning Analytics Interoperability
Proof of Concept for Learning Analytics Interoperability
 
Airline Reservations and Routing: A Graph Use Case
Airline Reservations and Routing: A Graph Use CaseAirline Reservations and Routing: A Graph Use Case
Airline Reservations and Routing: A Graph Use Case
 
Big Data Benchmarking, Tomas Pariente Lobo, Open Expo Europe, 20/06/2019
Big Data Benchmarking, Tomas Pariente Lobo, Open Expo Europe, 20/06/2019Big Data Benchmarking, Tomas Pariente Lobo, Open Expo Europe, 20/06/2019
Big Data Benchmarking, Tomas Pariente Lobo, Open Expo Europe, 20/06/2019
 
Big Data projects.pdf
Big Data projects.pdfBig Data projects.pdf
Big Data projects.pdf
 
BDVe Webinar Series: DataBench – Benchmarking Big Data. Arne Berre. Tue, Oct ...
BDVe Webinar Series: DataBench – Benchmarking Big Data. Arne Berre. Tue, Oct ...BDVe Webinar Series: DataBench – Benchmarking Big Data. Arne Berre. Tue, Oct ...
BDVe Webinar Series: DataBench – Benchmarking Big Data. Arne Berre. Tue, Oct ...
 
Airline reservations and routing: a graph use case
Airline reservations and routing: a graph use caseAirline reservations and routing: a graph use case
Airline reservations and routing: a graph use case
 
Big Data Technical Benchmarking, Arne Berre, BDVe Webinar series, 09/10/2018
Big Data Technical Benchmarking, Arne Berre, BDVe Webinar series, 09/10/2018 Big Data Technical Benchmarking, Arne Berre, BDVe Webinar series, 09/10/2018
Big Data Technical Benchmarking, Arne Berre, BDVe Webinar series, 09/10/2018
 
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
 
Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...
Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...
Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...
 
Crossing the Analytics Chasm and Getting the Models You Developed Deployed
Crossing the Analytics Chasm and Getting the Models You Developed DeployedCrossing the Analytics Chasm and Getting the Models You Developed Deployed
Crossing the Analytics Chasm and Getting the Models You Developed Deployed
 
Quick! Quick! Exploration!: A framework for searching a predictive model on A...
Quick! Quick! Exploration!: A framework for searching a predictive model on A...Quick! Quick! Exploration!: A framework for searching a predictive model on A...
Quick! Quick! Exploration!: A framework for searching a predictive model on A...
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learning
 
Data & Analytics at Scale
Data & Analytics at ScaleData & Analytics at Scale
Data & Analytics at Scale
 
Maruti gollapudi cv
Maruti gollapudi cvMaruti gollapudi cv
Maruti gollapudi cv
 
Msr2021 tutorial-di penta
Msr2021 tutorial-di pentaMsr2021 tutorial-di penta
Msr2021 tutorial-di penta
 

More from t_ivanov

CoreBigBench: Benchmarking Big Data Core Operations
CoreBigBench: Benchmarking Big Data Core OperationsCoreBigBench: Benchmarking Big Data Core Operations
CoreBigBench: Benchmarking Big Data Core Operationst_ivanov
 
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...t_ivanov
 
Adding Velocity to BigBench
Adding Velocity to BigBenchAdding Velocity to BigBench
Adding Velocity to BigBencht_ivanov
 
Lessons Learned on Benchmarking Big Data Platforms
Lessons Learned on Benchmarking  Big Data PlatformsLessons Learned on Benchmarking  Big Data Platforms
Lessons Learned on Benchmarking Big Data Platformst_ivanov
 
BDSE 2015 Evaluation of Big Data Platforms with HiBench
BDSE 2015 Evaluation of Big Data Platforms with HiBenchBDSE 2015 Evaluation of Big Data Platforms with HiBench
BDSE 2015 Evaluation of Big Data Platforms with HiBencht_ivanov
 
WBDB 2015 Performance Evaluation of Spark SQL using BigBench
WBDB 2015 Performance Evaluation of Spark SQL using BigBenchWBDB 2015 Performance Evaluation of Spark SQL using BigBench
WBDB 2015 Performance Evaluation of Spark SQL using BigBencht_ivanov
 
WBDB 2014 Benchmarking Virtualized Hadoop Clusters
WBDB 2014 Benchmarking Virtualized Hadoop ClustersWBDB 2014 Benchmarking Virtualized Hadoop Clusters
WBDB 2014 Benchmarking Virtualized Hadoop Clusterst_ivanov
 

More from t_ivanov (7)

CoreBigBench: Benchmarking Big Data Core Operations
CoreBigBench: Benchmarking Big Data Core OperationsCoreBigBench: Benchmarking Big Data Core Operations
CoreBigBench: Benchmarking Big Data Core Operations
 
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
 
Adding Velocity to BigBench
Adding Velocity to BigBenchAdding Velocity to BigBench
Adding Velocity to BigBench
 
Lessons Learned on Benchmarking Big Data Platforms
Lessons Learned on Benchmarking  Big Data PlatformsLessons Learned on Benchmarking  Big Data Platforms
Lessons Learned on Benchmarking Big Data Platforms
 
BDSE 2015 Evaluation of Big Data Platforms with HiBench
BDSE 2015 Evaluation of Big Data Platforms with HiBenchBDSE 2015 Evaluation of Big Data Platforms with HiBench
BDSE 2015 Evaluation of Big Data Platforms with HiBench
 
WBDB 2015 Performance Evaluation of Spark SQL using BigBench
WBDB 2015 Performance Evaluation of Spark SQL using BigBenchWBDB 2015 Performance Evaluation of Spark SQL using BigBench
WBDB 2015 Performance Evaluation of Spark SQL using BigBench
 
WBDB 2014 Benchmarking Virtualized Hadoop Clusters
WBDB 2014 Benchmarking Virtualized Hadoop ClustersWBDB 2014 Benchmarking Virtualized Hadoop Clusters
WBDB 2014 Benchmarking Virtualized Hadoop Clusters
 

Recently uploaded

Zer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfZer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfmaor17
 
What is Mendix and the concept of low-code development.docx
What is Mendix and the concept of low-code development.docxWhat is Mendix and the concept of low-code development.docx
What is Mendix and the concept of low-code development.docxTechnogeeks
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldRoberto Pérez Alcolea
 
logical backup of Oracle Datapump-detailed.pptx
logical backup of Oracle Datapump-detailed.pptxlogical backup of Oracle Datapump-detailed.pptx
logical backup of Oracle Datapump-detailed.pptxRemote DBA Services
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingShane Coughlan
 
full course of software engineering mid term.pdf
full course of software engineering mid term.pdffull course of software engineering mid term.pdf
full course of software engineering mid term.pdfAbdul salam
 
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdfAndrey Devyatkin
 
Key Steps in Agile Software Delivery Roadmap
Key Steps in Agile Software Delivery RoadmapKey Steps in Agile Software Delivery Roadmap
Key Steps in Agile Software Delivery RoadmapIshara Amarasekera
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flinkconfluent
 
OpenMetadata Community Meeting - 4th April, 2024
OpenMetadata Community Meeting - 4th April, 2024OpenMetadata Community Meeting - 4th April, 2024
OpenMetadata Community Meeting - 4th April, 2024OpenMetadata
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfRTS corp
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shardsChristopher Curtin
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...OnePlan Solutions
 
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdfPros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdfkalichargn70th171
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slidesvaideheekore1
 
Business Analyzopedia - Your Pocket Gita for Business Analysis
Business Analyzopedia - Your Pocket Gita for Business AnalysisBusiness Analyzopedia - Your Pocket Gita for Business Analysis
Business Analyzopedia - Your Pocket Gita for Business AnalysisDEEPRAJ PATHAK
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptxVinzoCenzo
 
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...kalichargn70th171
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...OnePlan Solutions
 

Recently uploaded (20)

Zer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfZer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdf
 
What is Mendix and the concept of low-code development.docx
What is Mendix and the concept of low-code development.docxWhat is Mendix and the concept of low-code development.docx
What is Mendix and the concept of low-code development.docx
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository world
 
logical backup of Oracle Datapump-detailed.pptx
logical backup of Oracle Datapump-detailed.pptxlogical backup of Oracle Datapump-detailed.pptx
logical backup of Oracle Datapump-detailed.pptx
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
 
full course of software engineering mid term.pdf
full course of software engineering mid term.pdffull course of software engineering mid term.pdf
full course of software engineering mid term.pdf
 
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
 
Key Steps in Agile Software Delivery Roadmap
Key Steps in Agile Software Delivery RoadmapKey Steps in Agile Software Delivery Roadmap
Key Steps in Agile Software Delivery Roadmap
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
OpenMetadata Community Meeting - 4th April, 2024
OpenMetadata Community Meeting - 4th April, 2024OpenMetadata Community Meeting - 4th April, 2024
OpenMetadata Community Meeting - 4th April, 2024
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
 
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdfPros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slides
 
Business Analyzopedia - Your Pocket Gita for Business Analysis
Business Analyzopedia - Your Pocket Gita for Business AnalysisBusiness Analyzopedia - Your Pocket Gita for Business Analysis
Business Analyzopedia - Your Pocket Gita for Business Analysis
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptx
 
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
 

ABench: Big Data Architecture Stack Benchmark

  • 1. 6. April 2018 ABench: Big Data Architecture Stack Benchmark [Vision Paper] Todor Ivanov todor@dbis.cs.uni-frankfurt.de Goethe University Frankfurt am Main, Germany http://www.bigdata.uni-frankfurt.de/ Rekha Singhal rekha.singhal@tcs.com TCS Research – Mumbai, India http://www.tcs.com
  • 2. 6. April 2018 Motivation • Growing number of new Big Data technologies and connectors in the Big Data Stacks  Challenges for Solution Architects, Data Engineers, Data Scientist, Developers, etc. • Missing benchmarks for each technology, connector or a combination of them • Consequence  Increasing complexity in the Big Data Architecture Stacks • Our approach  ABench: Big Data Architecture Stack Benchmark 2ICPE 2018, Berlin, Germany, April 9-13
  • 3. 6. April 2018 ABench Features • Benchmark Framework  Data generators or plugins for custom data generators  Include data generator or public data sets to simulate workload that stresses the architecture • Reuse of existing benchmarks  Case study using BigBench (in the next slides, Streaming and Machine Learning) • Open source implementation and extendable design • Easy to setup and extend • Supporting and combining all four types of benchmarks in ABench 3ICPE 2018, Berlin, Germany, April 9-13
  • 4. 6. April 2018 Benchmarks Types (adapted from Andersen and Pettersen [1]) 1. Generic Benchmarking: checks whether an implementation fulfills given business requirements and specifications (Is the defined business specification implemented accurately?). 2. Competitive Benchmarking: is a performance comparison between the best tools on the platform layer that offer similar functionality (e.g., throughput of MapReduce vs. Spark vs. Flink). 3. Functional Benchmarking is a functional comparison of the features of the tool against technologies from the same area. (e.g., Spark Streaming vs. Spark Structured Streaming vs. Flink Streaming). 4. Internal Benchmarking: comparing different implementations of a functionality (e.g., Spark Scala vs. Java vs. R vs. PySpark) 4ICPE 2018, Berlin, Germany, April 9-13
  • 5. 6. April 2018 ABench Framework 5ICPE 2018, Berlin, Germany, April 9-13 Data Model Data StorageData Generation Workload Generator Benchmark Control Knobs Performance Data Collection Benchmark Validation Benchmark Metrics Data Model System to Benchmark
  • 6. 6. April 2018 Stream Processing Benchmark – Use Case • Adding stream processing to BigBench [2,3] • Reuse of the web click logs in JSON format from BigBench V2 [3] • Adding new streaming workloads  possibility to execute the queries on a subset of the incoming stream of data • Provide benchmark implementations based on Spark Streaming and Kafka • Work In-progress: Exploratory Analysis of Spark Structured Streaming, @PABS 2018, Todor Ivanov and Jason Taaffe 6ICPE 2018, Berlin, Germany, April 9-13
  • 7. 6. April 2018 Machine Learning Benchmark – Use Case • Expanding the type of Machine Learning workloads in BigBench [2]  five (Q5, Q20, Q25, Q26 and Q28) out of the 30 queries cover common ML algorithms • Proposal by Sweta Singh (IBM)[4] for new workload with Collaborative Filtering using Matrix Factorization implementation in Spark MLlib via the Alternating Least Squares (ALS) • Other types of advanced analytics inspired by Gartner [5]:  descriptive analytics  diagnostic analytics  predictive analytics  prescriptive analytics • Introduce new ML metrics for scalability and accuracy 7ICPE 2018, Berlin, Germany, April 9-13
  • 8. 6. April 2018 Next Steps • Building express version of the benchmark framework • Provide open source implementation of the Use Case benchmarks to stress test the existing Big Data Architecture Stacks • Enable the comparison of the most popular technologies (e.g., Kafka, Spark, etc.) 8ICPE 2018, Berlin, Germany, April 9-13
  • 9. 6. April 2018 Thank you for your attention! This research has been supported by the Research Group of the Standard Performance Evaluation Corporation (SPEC). ICPE 2018, Berlin, Germany, April 9-13 9
  • 10. 6. April 2018 REFERENCES [1] Bjørn Andersen and P-G Pettersen. 1995. Benchmarking handbook. Champman & Hall. [2] Ahmad Ghazal, Todor Ivanov, Pekka Kostamaa, Alain Crolotte, Ryan Voong, Mohammed Al- Kateb, Waleed Ghazal, and Roberto V. Zicari. 2017. BigBench V2: The New and Improved BigBench. In ICDE 2017, San Diego, CA, USA, April 19-22. [3] Ahmad Ghazal, Tilmann Rabl, Minqing Hu, Francois Raab, Meikel Poess, Alain Crolotte, and Hans-Arno Jacobsen. 2013. BigBench: Towards An Industry Standard Benchmark for Big Data Analytics. In SIGMOD 2013. 1197–1208. [4] Sweta Singh. 2016. Benchmarking Spark Machine Learning Using BigBench. In 8th TPC Technology Conference, TPCTC 2016, New Delhi, India, September 5-9, 2016. [5] Gartner 2017, https://www.gartner.com/doc/3471553/-planning-guide-data-analytics 10ICPE 2018, Berlin, Germany, April 9-13