OLAP*: Effectively &
Efficiently Supporting
Parallel OLAP over Big Data
Alfredo Cuzzocrea*, Rim Moussa‡ & Guandong Xu†
* cuzzocrea@si.deis.unical.it, ICAR-CNR & Univ. of Calabria, Italy
‡ rim.moussa@esti.rnu.tn, LATICE, Univ. of Tunis, Tunisia
† guandong.xu@uts.edu.au, AAI, Univ. of Technology, Australia

27th Sept. 2013

3rd International Conference on Model & Data Engineering
MEDI’2013, Amantea, Calabria, Italy.
Outline
1. Context
2. Parallel Cube Processing
3. Performance Results
4. Related Work
5. Conclusion
6. Future Work

27th, Sept.
2013

MEDI’2013@Amantea

Context
Data Warehouse Systems
Multidimensional Databases
OLAP Technologies
• Visual Analytics
BI dashboards, OLAP cubes, pivot tables, charts
• High Performance
Data is aggregated
Issues & Solution

Data Partitioning Schemes
-- DWS model (snowflake): Data View of the cube
[Figure: data view of the cube, with one Fact Table and seven Dimension Tables]
Different OLAP Business Questions
-- case study: TPC-H Benchmark

BQ 1: Revenue / supplier geography location / Part Brand / Year-quarter-month
BQ 2: Turnover of customers / geography location / Year-quarter-month / Part brand

Data Partitioning Schemes
Computer Cluster: which partitioning scheme?
● DW fully replicated
● Fragment Fact Table & replicate Dimension Tables
● DHP Fact Table, fragment some Dimension Tables & replicate the rest
Comparison criteria
● Cube size
● Workload processing: minimal pre- & post-processing, maximal parallelism
● DW maintenance
● Storage overhead
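
To make the storage-overhead criterion concrete, the sketch below estimates the total footprint of the three schemes on an n-node cluster. The table names, their sizes and the choice of which dimension tables are fragmented are hypothetical placeholders, not figures from the experiments.

```python
# Hypothetical table sizes in GB; illustrative only, not measured values.
FACT_GB = 8.0
DIMENSIONS_GB = {"orders": 2.0, "customer": 0.3, "part": 0.3, "supplier": 0.02}


def full_replication(nodes: int) -> float:
    """Scheme 1: every node stores the whole warehouse."""
    return nodes * (FACT_GB + sum(DIMENSIONS_GB.values()))


def fragment_fact_only(nodes: int) -> float:
    """Scheme 2: the fact table is split, every dimension table is replicated."""
    return FACT_GB + nodes * sum(DIMENSIONS_GB.values())


def fragment_fact_and_some_dims(nodes: int, fragmented: set) -> float:
    """Scheme 3: fact table and chosen dimensions are split, the rest replicated."""
    split = sum(s for name, s in DIMENSIONS_GB.items() if name in fragmented)
    replicated = sum(s for name, s in DIMENSIONS_GB.items() if name not in fragmented)
    return FACT_GB + split + nodes * replicated
```

With these placeholder sizes, the total footprint shrinks from the first scheme to the third, while the third scheme is the one that constrains most where each row must live.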
Performance Results
--TPC-H*d Benchmark
● Multi-dimensional design of TPC-H benchmark
  – Minimal changes to TPC-H relational DB schema
  – Each SQL statement is mapped into an OLAP cube
● TPC-H Workload translated into MDX
  – 22 MDX statements for OLAP cubes' run
  – 22 MDX statements for OLAP queries' run

Performance Results
--C10 example (benchmark ctnd)

Performance Results
-- Software Technologies & Hardware

Mondrian ROLAP Server: Mondrian-3.5.0
JPivot OLAP client
Relational DBMS: MySQL 5.1
Servlet container
French Grid Platform G5K
● Sophia site
● Suno nodes: 32 GB of memory, 2 CPUs per node and 4 cores per CPU; each CPU is an Intel Xeon E5520, 2.27 GHz
OLAP*mid-tier Architecture

Performance Results
--TPC-H*d for SF=10 & single DB backend
       Query      Cube-Query workload
       workload   cube        query
Q1     2,147.33   2,777.49    0.29
Q10    7,100.24   n/a         -
Q11    2,558.21   3,020.27    1,604.1
Q9     n/a        n/a         n/a
● Of the 22 business queries, 14 behave like Q1, 4 like Q10, 2 like Q11 and 2 like Q9.
● The system under test was unable to build the big cubes related to business queries Q3, Q9, Q10, Q13, Q18 and Q20, because of either memory leaks or system constraints (max crossjoin size: 2,147,483,647).
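
The crossjoin limit quoted above equals 2^31 - 1, the largest signed 32-bit integer. As a rough sketch, the check below multiplies the member counts of the dimensions being crossjoined and compares the product with that limit. The dimension names and cardinalities are hypothetical and only illustrate how quickly the product grows; this is not how Mondrian itself performs the check.

```python
from math import prod

MAX_CROSSJOIN_SIZE = 2_147_483_647  # limit quoted on the slide; equals 2**31 - 1


def crossjoin_size(cardinalities: dict) -> int:
    """Number of tuples in the full crossjoin of the given dimensions."""
    return prod(cardinalities.values())


def fits(cardinalities: dict) -> bool:
    """True if the full crossjoin stays within the quoted limit."""
    return crossjoin_size(cardinalities) <= MAX_CROSSJOIN_SIZE


# Hypothetical member counts, for illustration only.
small = {"nation": 25, "year": 7, "brand": 25}
large = {"customer": 1_500_000, "part_brand": 25, "month": 84}
```

Three dimensions are enough to exceed the limit as soon as one of them has millions of members, which is consistent with only the big cubes failing to build.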

Performance Results (ctnd. 1)
--TPC-H*d for SF=10 & 4 DB backends

       4 DB backends                          Single DB backend
       Query      Cube-Query workload         Query      Cube-Query workload
       workload   cube        query           workload   cube        query
Q1     485.73     862.77      0.19            2,147.33   2,777.49    0.29
Q10    2,654.2    13,674.02   1,599.47        7,100.24   n/a         -
Q11    535.75     990.75      505.2           2,558.21   3,020.27    1,604.1
Q9     n/a        n/a         n/a             n/a        n/a         n/a
● LineItem is DHPed along Orders, Orders is DHPed along Customer, Customer is PHPed, and all the other tables are replicated.
● Of the 22 business queries, 20 behave like Q1, Q10 or Q11, and 2 behave like Q9.
● Improvements vary from 42.78% to 100%.
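
The fragmentation chain described above can be sketched as follows, reading PHP as primary horizontal partitioning and DHP as derived horizontal partitioning. Customer rows are assigned to a fragment from their own key, and Orders and LineItem rows follow the fragment of the row they reference, so a join along the chain never crosses backends. The modulo function, the tiny sample rows, the `linekey` column and the function names are assumptions made for illustration; the slides do not say which partitioning function was used.

```python
from collections import defaultdict

N_BACKENDS = 4


def php_customer(customers):
    """Primary horizontal partitioning: fragment chosen from the row's own key."""
    return {c["custkey"]: c["custkey"] % N_BACKENDS for c in customers}


def dhp(rows, key, fk, parent_fragment):
    """Derived horizontal partitioning: a row goes where its parent row lives."""
    return {r[key]: parent_fragment[r[fk]] for r in rows}


# Tiny hypothetical sample, not TPC-H data.
customers = [{"custkey": k} for k in range(1, 9)]
orders = [{"orderkey": 100 + k, "custkey": k} for k in range(1, 9)]
lineitems = [{"linekey": 1000 + i, "orderkey": 101 + i % 8} for i in range(16)]

customer_frag = php_customer(customers)
orders_frag = dhp(orders, "orderkey", "custkey", customer_frag)
lineitem_frag = dhp(lineitems, "linekey", "orderkey", orders_frag)

# Group line items by the backend that stores them.
fragments = defaultdict(list)
for li in lineitems:
    fragments[lineitem_frag[li["linekey"]]].append(li["linekey"])
```

Because every LineItem row sits on the same backend as its order and its customer, each backend can evaluate joins over these three tables locally, against the replicated copies of the remaining tables.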

Performance Results (ctnd. 2)
--TPC-H*d for SF=10 & 4 DB backends & derived data

       4 DB backends & derived data           4 DB backends
       Query      Cube-Query workload         Query      Cube-Query workload
       workload   cube        query           workload   cube        query
Q1     1.10       1.32        0.25            485.73     862.77      0.19
Q10    127.67     9,545.68    5.16            2,654.2    13,674.02   1,599.47
Q11    587.99     875.33      497.67          535.75     990.75      505.2
Q9     n/a        n/a         n/a             n/a        n/a         n/a
● Derived data: aggregate tables for sparse cubes or cubes whose size is fixed whatever the SF, and derived attributes for OLAP cubes whose size increases with SF.
● Response times improved for the business queries of both workloads for which aggregate tables were built.
● The impact of derived attributes is mixed. Performance results show good improvements for Q10 and Q21, and a small impact on Q11 (the saved operations are not complex).
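
As a minimal illustration of why aggregate tables help, the sketch below builds one from a handful of fact rows and then answers a grouped query from it instead of rescanning the facts. The column names and rows are hypothetical. The number of groups here does not depend on how many fact rows exist, which is the fixed-size case mentioned above.

```python
from collections import defaultdict

# Hypothetical fact rows: (return_flag, line_status, quantity, extended_price)
facts = [
    ("A", "F", 10, 100.0),
    ("A", "F", 5, 40.0),
    ("N", "O", 7, 70.0),
    ("R", "F", 3, 30.0),
    ("N", "O", 1, 15.0),
]


def build_aggregate(rows):
    """Pre-compute SUM and COUNT per (return_flag, line_status) group."""
    agg = defaultdict(lambda: {"sum_qty": 0, "sum_price": 0.0, "count": 0})
    for flag, status, qty, price in rows:
        cell = agg[(flag, status)]
        cell["sum_qty"] += qty
        cell["sum_price"] += price
        cell["count"] += 1
    return dict(agg)


def avg_price(agg, flag, status):
    """Answer a query from the aggregate table without touching the facts."""
    cell = agg[(flag, status)]
    return cell["sum_price"] / cell["count"]


agg_table = build_aggregate(facts)
```

Storing SUM and COUNT rather than the average itself keeps the aggregate table mergeable when the facts are spread over several backends.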

Performance Results
--Derived Data Calculus
Single DB Backend

l_profit (LineItem is fragmented into 4 fragments)
agg_c15


862.4

18,195.48

1,461.99

4,377.51

1,288.31
71.63

10,904.00

ps_excess_YYYY (PartSupp, Time are replicated and LineItem is fragmented into 4 fragments)

862.4

343.91

ps_isminimum (PartSupp, Supplier, Nation, Region are replicated)

agg_c1

For each DB Backend

852.84

Related Work
● PowerDB (Rohm et al., 2000)
  – TPC-R benchmark (SQL workload) for comparing
    ● fully replicated DW schema
    ● partial replication and data partitioning (only LineItem table is fragmented)
  – PowerDB implements queries' routing algorithms (ShortQueries-ASAP, Affinity-Based routing) for load balancing and inter-query and intra-query parallelism
  – SF=0.1 (300MB all database files included)
● cgmOLAP (Chen et al., 2006)
  – Panda project (Chen, 2004)
  – Parallel OLAP cube processor at a rate of 1TB/hour

Related Work (ctnd. 1)
● ParGRES (Paes et al., 2008)
  – Automatic parsing of SQL statements, inter- and intra-query parallelism enabled
  – Subset of TPC-H workload (Q1, Q3, Q4-Q8, Q12, Q14 and Q19)
  – TPC-H with SF=5 (11GB including all DB files)
  – RDBMS: PostgreSQL
  – 32-node shared-nothing cluster, Grid5000 clusters (2 CPUs, 1GB of memory)

Related Work (ctnd. 2)
● SmaQSS DBC middleware (Lima et al., 2009)
  – Combination of physical/virtual partitioning and partial replication
  – Partial replication uses chained declustering
  – Subset of TPC-H workload (Q1, Q5, Q6, Q12, Q14, Q18 and Q21 coded in SQL)
  – TPC-H with SF=5 (11GB including all DB files)
  – RDBMS: PostgreSQL
  – 32-node shared-nothing cluster, Grid5000 clusters (2 CPUs, 1GB of memory)
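
Chained declustering, mentioned above, places the primary copy of fragment i on node i and its backup copy on the next node in the chain, so every fragment stays available after a single node failure. The sketch below shows only this placement rule; it is not taken from SmaQSS, and the function names are hypothetical.

```python
def chained_declustering(n_nodes: int) -> dict:
    """Map each node to the fragments it stores: its primary and one backup."""
    # Node i backs up fragment i-1, because fragment i-1's backup goes to node (i-1)+1.
    return {node: {"primary": node, "backup": (node - 1) % n_nodes}
            for node in range(n_nodes)}


def nodes_holding(fragment: int, n_nodes: int) -> list:
    """Nodes able to serve a fragment: its primary node, then the next in the chain."""
    return [fragment % n_nodes, (fragment + 1) % n_nodes]


def survivors(fragment: int, n_nodes: int, failed: set) -> list:
    """Nodes that can still serve the fragment once the failed nodes are excluded."""
    return [n for n in nodes_holding(fragment, n_nodes) if n not in failed]
```

For example, with 4 nodes, `survivors(2, 4, {2})` leaves node 3 to serve fragment 2.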

Conclusion
● Comparison of different DW fragmentation schemes regarding
  – Cube size
  – Distributed cube processing
  – Storage overhead
  – DW maintenance
● Implementation of an OLAP mid-tier
  – Connects to a pool of any RDBMSs through JDBC
  – Uses OLAP4j, an open Java API for OLAP
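
The mid-tier idea can be sketched as a scatter-gather loop: the same aggregate query is sent to every backend, each backend computes a partial result over its own fragment, and the mid-tier merges the partial results. The sketch uses Python's built-in sqlite3 with in-memory databases as stand-ins for the RDBMSs reached through JDBC. The table, columns, data and function names are hypothetical, and the code is not the OLAP* implementation.

```python
import sqlite3
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

QUERY = "SELECT brand, SUM(revenue), COUNT(*) FROM lineitem GROUP BY brand"


def make_backend(rows):
    """Create one in-memory database holding a fragment of the fact table."""
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE lineitem (brand TEXT, revenue REAL)")
    conn.executemany("INSERT INTO lineitem VALUES (?, ?)", rows)
    return conn


def run_on_backend(conn):
    """Evaluate the query on one backend and return its partial aggregates."""
    return conn.execute(QUERY).fetchall()


def scatter_gather(backends):
    """Run the query on all backends in parallel and merge partial aggregates."""
    merged = defaultdict(lambda: [0.0, 0])
    with ThreadPoolExecutor(max_workers=len(backends)) as pool:
        for partial in pool.map(run_on_backend, backends):
            for brand, revenue, count in partial:
                merged[brand][0] += revenue
                merged[brand][1] += count
    # AVG cannot be merged directly; it is rebuilt from the merged SUM and COUNT.
    return {b: {"sum": s, "count": c, "avg": s / c} for b, (s, c) in merged.items()}


# Hypothetical fragments, one per backend.
backends = [
    make_backend([("B1", 10.0), ("B2", 5.0)]),
    make_backend([("B1", 20.0)]),
    make_backend([("B2", 15.0), ("B2", 10.0)]),
]
result = scatter_gather(backends)
```

The merge step is the post-processing cost of a fragmentation scheme: sums and counts add up directly, whereas averages or distinct counts need extra partial state from each backend.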
Conclusion (ctnd.)
● Performance assessment using TPC-H*d benchmark and considering the whole workload (22 queries)
● Implementation and experiments revealed
  – MDX language shortcomings
    ● For each sub-query, we manually set parameters (next/previous value of the member if the value is missing)
  – Mondrian ROLAP server limits
    ● No infinite combination of dimensions: the size limit is 2,147,483,647
  – Memory leaks

Future Work
● Inspect the core of Mondrian and revise its source code
● Automate DW partitioning
● Consider bigger datasets
● Consider TPC-DS benchmark (99 business queries, multiple data marts)

Thank you for Your Attention
Q&A
OLAP*: Effectively & Efficiently Supporting Parallel
OLAP over Big Data
Alfredo Cuzzocrea, Rim Moussa & Guandong Xu
MEDI'2013@Amantea
27th Sept. 2013

More Related Content

What's hot

Ismis2014 dbaas expert
Ismis2014 dbaas expertIsmis2014 dbaas expert
Ismis2014 dbaas expertRim Moussa
 
Resilient Distributed Datasets
Resilient Distributed DatasetsResilient Distributed Datasets
Resilient Distributed DatasetsGabriele Modena
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Ryan Blue
 
Data Warehouse Evolution Roadshow
Data Warehouse Evolution RoadshowData Warehouse Evolution Roadshow
Data Warehouse Evolution RoadshowMapR Technologies
 
Engineering fast indexes
Engineering fast indexesEngineering fast indexes
Engineering fast indexesDaniel Lemire
 
A primer on building real time data-driven products
A primer on building real time data-driven productsA primer on building real time data-driven products
A primer on building real time data-driven productsLars Albertsson
 
OLAP Basics and Fundamentals by Bharat Kalia
OLAP Basics and Fundamentals by Bharat Kalia OLAP Basics and Fundamentals by Bharat Kalia
OLAP Basics and Fundamentals by Bharat Kalia Bharat Kalia
 
Big data real time architectures
Big data real time architecturesBig data real time architectures
Big data real time architecturesDaniel Marcous
 
Data pipelines from zero to solid
Data pipelines from zero to solidData pipelines from zero to solid
Data pipelines from zero to solidLars Albertsson
 
Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...
Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...
Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...Accumulo Summit
 
IOT with PostgreSQL
IOT with PostgreSQLIOT with PostgreSQL
IOT with PostgreSQLEDB
 
STAC, ZARR, COG, K8S and Data Cubes: The brave new world of satellite EO anal...
STAC, ZARR, COG, K8S and Data Cubes: The brave new world of satellite EO anal...STAC, ZARR, COG, K8S and Data Cubes: The brave new world of satellite EO anal...
STAC, ZARR, COG, K8S and Data Cubes: The brave new world of satellite EO anal...GEO Analytics Canada
 
Geo Analytics Canada Overview - May 2020
Geo Analytics Canada Overview - May 2020Geo Analytics Canada Overview - May 2020
Geo Analytics Canada Overview - May 2020GEO Analytics Canada
 
Data Analysis with TensorFlow in PostgreSQL
Data Analysis with TensorFlow in PostgreSQLData Analysis with TensorFlow in PostgreSQL
Data Analysis with TensorFlow in PostgreSQLEDB
 
CourboSpark: Decision Tree for Time-series on Spark
CourboSpark: Decision Tree for Time-series on SparkCourboSpark: Decision Tree for Time-series on Spark
CourboSpark: Decision Tree for Time-series on SparkDataWorks Summit
 

What's hot (20)

Ismis2014 dbaas expert
Ismis2014 dbaas expertIsmis2014 dbaas expert
Ismis2014 dbaas expert
 
Bicod2017
Bicod2017Bicod2017
Bicod2017
 
Resilient Distributed Datasets
Resilient Distributed DatasetsResilient Distributed Datasets
Resilient Distributed Datasets
 
Druid
DruidDruid
Druid
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)
 
Data Warehouse Evolution Roadshow
Data Warehouse Evolution RoadshowData Warehouse Evolution Roadshow
Data Warehouse Evolution Roadshow
 
Engineering fast indexes
Engineering fast indexesEngineering fast indexes
Engineering fast indexes
 
A primer on building real time data-driven products
A primer on building real time data-driven productsA primer on building real time data-driven products
A primer on building real time data-driven products
 
OLAP Basics and Fundamentals by Bharat Kalia
OLAP Basics and Fundamentals by Bharat Kalia OLAP Basics and Fundamentals by Bharat Kalia
OLAP Basics and Fundamentals by Bharat Kalia
 
Omid: A transactional Framework for HBase
Omid: A transactional Framework for HBaseOmid: A transactional Framework for HBase
Omid: A transactional Framework for HBase
 
Working with Scientific Data in MATLAB
Working with Scientific Data in MATLABWorking with Scientific Data in MATLAB
Working with Scientific Data in MATLAB
 
Big data real time architectures
Big data real time architecturesBig data real time architectures
Big data real time architectures
 
Data pipelines from zero to solid
Data pipelines from zero to solidData pipelines from zero to solid
Data pipelines from zero to solid
 
Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...
Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...
Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...
 
IOT with PostgreSQL
IOT with PostgreSQLIOT with PostgreSQL
IOT with PostgreSQL
 
STAC, ZARR, COG, K8S and Data Cubes: The brave new world of satellite EO anal...
STAC, ZARR, COG, K8S and Data Cubes: The brave new world of satellite EO anal...STAC, ZARR, COG, K8S and Data Cubes: The brave new world of satellite EO anal...
STAC, ZARR, COG, K8S and Data Cubes: The brave new world of satellite EO anal...
 
SQLBits XI - ETL with Hadoop
SQLBits XI - ETL with HadoopSQLBits XI - ETL with Hadoop
SQLBits XI - ETL with Hadoop
 
Geo Analytics Canada Overview - May 2020
Geo Analytics Canada Overview - May 2020Geo Analytics Canada Overview - May 2020
Geo Analytics Canada Overview - May 2020
 
Data Analysis with TensorFlow in PostgreSQL
Data Analysis with TensorFlow in PostgreSQLData Analysis with TensorFlow in PostgreSQL
Data Analysis with TensorFlow in PostgreSQL
 
CourboSpark: Decision Tree for Time-series on Spark
CourboSpark: Decision Tree for Time-series on SparkCourboSpark: Decision Tree for Time-series on Spark
CourboSpark: Decision Tree for Time-series on Spark
 

Similar to parallel OLAP

Forecasting database performance
Forecasting database performanceForecasting database performance
Forecasting database performanceShenglin Du
 
Optimization of Continuous Queries in Federated Database and Stream Processin...
Optimization of Continuous Queries in Federated Database and Stream Processin...Optimization of Continuous Queries in Federated Database and Stream Processin...
Optimization of Continuous Queries in Federated Database and Stream Processin...Zbigniew Jerzak
 
Tutorial-on-DNN-09A-Co-design-Sparsity.pdf
Tutorial-on-DNN-09A-Co-design-Sparsity.pdfTutorial-on-DNN-09A-Co-design-Sparsity.pdf
Tutorial-on-DNN-09A-Co-design-Sparsity.pdfDuy-Hieu Bui
 
HP - Jerome Rolia - Hadoop World 2010
HP - Jerome Rolia - Hadoop World 2010HP - Jerome Rolia - Hadoop World 2010
HP - Jerome Rolia - Hadoop World 2010Cloudera, Inc.
 
Meeting Deadlines of Scientific Workflows in Public Clouds with Tasks Replica...
Meeting Deadlines of Scientific Workflows in Public Clouds with Tasks Replica...Meeting Deadlines of Scientific Workflows in Public Clouds with Tasks Replica...
Meeting Deadlines of Scientific Workflows in Public Clouds with Tasks Replica...AAKASH S
 
Task Resource Consumption Prediction for Scientific Applications and Workflows
Task Resource Consumption Prediction for Scientific Applications and WorkflowsTask Resource Consumption Prediction for Scientific Applications and Workflows
Task Resource Consumption Prediction for Scientific Applications and WorkflowsRafael Ferreira da Silva
 
Blue Waters and Resource Management - Now and in the Future
 Blue Waters and Resource Management - Now and in the Future Blue Waters and Resource Management - Now and in the Future
Blue Waters and Resource Management - Now and in the Futureinside-BigData.com
 
Opportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCIOpportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCIRyousei Takano
 
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETSFAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETScsandit
 
Scalable scheduling of updates in streaming data warehouses
Scalable scheduling of updates in streaming data warehousesScalable scheduling of updates in streaming data warehouses
Scalable scheduling of updates in streaming data warehousesIRJET Journal
 
Tutotial 2 answer
Tutotial 2 answerTutotial 2 answer
Tutotial 2 answerUdaya Kumar
 
An enhanced adaptive scoring job scheduling algorithm with replication strate...
An enhanced adaptive scoring job scheduling algorithm with replication strate...An enhanced adaptive scoring job scheduling algorithm with replication strate...
An enhanced adaptive scoring job scheduling algorithm with replication strate...eSAT Publishing House
 
The CAOS framework: democratize the acceleration of compute intensive applica...
The CAOS framework: democratize the acceleration of compute intensive applica...The CAOS framework: democratize the acceleration of compute intensive applica...
The CAOS framework: democratize the acceleration of compute intensive applica...NECST Lab @ Politecnico di Milano
 
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...EUDAT
 
Master Thesis Final Discussion - Decentralised Utility Scheduling Algorithm f...
Master Thesis Final Discussion - Decentralised Utility Scheduling Algorithm f...Master Thesis Final Discussion - Decentralised Utility Scheduling Algorithm f...
Master Thesis Final Discussion - Decentralised Utility Scheduling Algorithm f...João Vazão Vasques
 
Modeling and simulation of power consumption and execution times for real-tim...
Modeling and simulation of power consumption and execution times for real-tim...Modeling and simulation of power consumption and execution times for real-tim...
Modeling and simulation of power consumption and execution times for real-tim...tcucinotta
 
Big learning 1.2
Big learning   1.2Big learning   1.2
Big learning 1.2Mohit Garg
 
An efficient cloudlet scheduling via bin packing in cloud computing
An efficient cloudlet scheduling via bin packing in cloud computingAn efficient cloudlet scheduling via bin packing in cloud computing
An efficient cloudlet scheduling via bin packing in cloud computingIJECEIAES
 
Statistical power consumption analysis and modeling
Statistical power consumption analysis and modelingStatistical power consumption analysis and modeling
Statistical power consumption analysis and modelingnadikari123
 
Energy-Efficient Task Scheduling in Cloud Environment
Energy-Efficient Task Scheduling in Cloud EnvironmentEnergy-Efficient Task Scheduling in Cloud Environment
Energy-Efficient Task Scheduling in Cloud EnvironmentIRJET Journal
 

Similar to parallel OLAP (20)

Forecasting database performance
Forecasting database performanceForecasting database performance
Forecasting database performance
 
Optimization of Continuous Queries in Federated Database and Stream Processin...
Optimization of Continuous Queries in Federated Database and Stream Processin...Optimization of Continuous Queries in Federated Database and Stream Processin...
Optimization of Continuous Queries in Federated Database and Stream Processin...
 
Tutorial-on-DNN-09A-Co-design-Sparsity.pdf
Tutorial-on-DNN-09A-Co-design-Sparsity.pdfTutorial-on-DNN-09A-Co-design-Sparsity.pdf
Tutorial-on-DNN-09A-Co-design-Sparsity.pdf
 
HP - Jerome Rolia - Hadoop World 2010
HP - Jerome Rolia - Hadoop World 2010HP - Jerome Rolia - Hadoop World 2010
HP - Jerome Rolia - Hadoop World 2010
 
Meeting Deadlines of Scientific Workflows in Public Clouds with Tasks Replica...
Meeting Deadlines of Scientific Workflows in Public Clouds with Tasks Replica...Meeting Deadlines of Scientific Workflows in Public Clouds with Tasks Replica...
Meeting Deadlines of Scientific Workflows in Public Clouds with Tasks Replica...
 
Task Resource Consumption Prediction for Scientific Applications and Workflows
Task Resource Consumption Prediction for Scientific Applications and WorkflowsTask Resource Consumption Prediction for Scientific Applications and Workflows
Task Resource Consumption Prediction for Scientific Applications and Workflows
 
Blue Waters and Resource Management - Now and in the Future
 Blue Waters and Resource Management - Now and in the Future Blue Waters and Resource Management - Now and in the Future
Blue Waters and Resource Management - Now and in the Future
 
Opportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCIOpportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCI
 
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETSFAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
 
Scalable scheduling of updates in streaming data warehouses
Scalable scheduling of updates in streaming data warehousesScalable scheduling of updates in streaming data warehouses
Scalable scheduling of updates in streaming data warehouses
 
Tutotial 2 answer
Tutotial 2 answerTutotial 2 answer
Tutotial 2 answer
 
An enhanced adaptive scoring job scheduling algorithm with replication strate...
An enhanced adaptive scoring job scheduling algorithm with replication strate...An enhanced adaptive scoring job scheduling algorithm with replication strate...
An enhanced adaptive scoring job scheduling algorithm with replication strate...
 
The CAOS framework: democratize the acceleration of compute intensive applica...
The CAOS framework: democratize the acceleration of compute intensive applica...The CAOS framework: democratize the acceleration of compute intensive applica...
The CAOS framework: democratize the acceleration of compute intensive applica...
 
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
 
Master Thesis Final Discussion - Decentralised Utility Scheduling Algorithm f...
Master Thesis Final Discussion - Decentralised Utility Scheduling Algorithm f...Master Thesis Final Discussion - Decentralised Utility Scheduling Algorithm f...
Master Thesis Final Discussion - Decentralised Utility Scheduling Algorithm f...
 
Modeling and simulation of power consumption and execution times for real-tim...
Modeling and simulation of power consumption and execution times for real-tim...Modeling and simulation of power consumption and execution times for real-tim...
Modeling and simulation of power consumption and execution times for real-tim...
 
Big learning 1.2
Big learning   1.2Big learning   1.2
Big learning 1.2
 
An efficient cloudlet scheduling via bin packing in cloud computing
An efficient cloudlet scheduling via bin packing in cloud computingAn efficient cloudlet scheduling via bin packing in cloud computing
An efficient cloudlet scheduling via bin packing in cloud computing
 
Statistical power consumption analysis and modeling
Statistical power consumption analysis and modelingStatistical power consumption analysis and modeling
Statistical power consumption analysis and modeling
 
Energy-Efficient Task Scheduling in Cloud Environment
Energy-Efficient Task Scheduling in Cloud EnvironmentEnergy-Efficient Task Scheduling in Cloud Environment
Energy-Efficient Task Scheduling in Cloud Environment
 

More from Rim Moussa

polystore_NYC_inrae_sysinfo2021-1.pdf
polystore_NYC_inrae_sysinfo2021-1.pdfpolystore_NYC_inrae_sysinfo2021-1.pdf
polystore_NYC_inrae_sysinfo2021-1.pdfRim Moussa
 
Big Data Projects
Big Data ProjectsBig Data Projects
Big Data ProjectsRim Moussa
 
Automation of MultiDimensional DB Design (poster)
Automation of MultiDimensional DB Design (poster)Automation of MultiDimensional DB Design (poster)
Automation of MultiDimensional DB Design (poster)Rim Moussa
 
highly available distributed databases (poster)
highly available distributed databases (poster)highly available distributed databases (poster)
highly available distributed databases (poster)Rim Moussa
 

More from Rim Moussa (6)

polystore_NYC_inrae_sysinfo2021-1.pdf
polystore_NYC_inrae_sysinfo2021-1.pdfpolystore_NYC_inrae_sysinfo2021-1.pdf
polystore_NYC_inrae_sysinfo2021-1.pdf
 
Big Data Projects
Big Data ProjectsBig Data Projects
Big Data Projects
 
EMR AWS Demo
EMR AWS DemoEMR AWS Demo
EMR AWS Demo
 
BICOD-2017
BICOD-2017BICOD-2017
BICOD-2017
 
Automation of MultiDimensional DB Design (poster)
Automation of MultiDimensional DB Design (poster)Automation of MultiDimensional DB Design (poster)
Automation of MultiDimensional DB Design (poster)
 
highly available distributed databases (poster)
highly available distributed databases (poster)highly available distributed databases (poster)
highly available distributed databases (poster)
 

Recently uploaded

An Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdfAn Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdfSanaAli374401
 
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...KokoStevan
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxVishalSingh1417
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.MateoGardella
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfChris Hunter
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
Gardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterGardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterMateoGardella
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docxPoojaSen20
 

Recently uploaded (20)

Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
An Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdfAn Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdf
 
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx

parallel OLAP

  • 1. OLAP*: Effectively & Efficiently Supporting Parallel OLAP over Big Data
      Alfredo Cuzzocrea*, Rim Moussa‡ & Guandong Xu†
      * cuzzocrea@si.deis.unical.it, ICAR-CNR & Univ. of Calabria, Italy
      ‡ rim.moussa@esti.rnu.tn, LATICE, Univ. of Tunis, Tunisia
      † guandong.xu@uts.edu.au, AAI, Univ. of Technology, Australia
      27th Sept. 2013, 3rd International Conference on Model & Data Engineering, MEDI’2013, Amantea, Calabria, Italy.
  • 2. Outline
      1. Context
      2. Parallel Cube Processing
      3. Performance Results
      4. Related Work
      5. Conclusion
      6. Future Work
  • 3. Context
      Data Warehouse Systems, Multidimensional Databases, OLAP Technologies
      ● Visual Analytics: BI dashboards, OLAP cubes, pivot tables, charts
      ● High Performance: data is aggregated
  • 4. Issues & Solution
  • 5. Data Partitioning Schemes -- DWS model (snowflake): data view of the cube
      [Figure: one Fact Table and seven Dimension Tables]
  • 6. Different OLAP Business Questions -- case study: TPC-H Benchmark
      BQ 1: Revenue / supplier geography location / Part brand / Year-quarter-month
      BQ 2: Turnover of customers / geography location / Year-quarter-month / Part brand
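A business question such as BQ 1 amounts to summing a measure along several dimension hierarchies. The sketch below illustrates that roll-up on a few hand-made rows; the row layout, the sample values and the function name `rollup` are assumptions made for illustration and are not taken from the slides or from the TPC-H data set.

```python
from collections import defaultdict

# Hypothetical fact rows: (supplier_nation, part_brand, year, quarter, month, revenue).
FACTS = [
    ("ITALY", "Brand#12", 1995, 1, 2, 100.0),
    ("ITALY", "Brand#12", 1995, 1, 3, 50.0),
    ("ITALY", "Brand#34", 1995, 2, 4, 70.0),
    ("TUNISIA", "Brand#12", 1996, 1, 1, 30.0),
]


def rollup(facts, level):
    """Sum revenue per (nation, brand, time), with time cut at the given level.

    level is 'year', 'quarter' or 'month' (the Year-quarter-month hierarchy).
    """
    depth = {"year": 1, "quarter": 2, "month": 3}[level]
    totals = defaultdict(float)
    for nation, brand, year, quarter, month, revenue in facts:
        time_key = (year, quarter, month)[:depth]
        totals[(nation, brand) + time_key] += revenue
    return dict(totals)


by_year = rollup(FACTS, "year")
by_month = rollup(FACTS, "month")
```

Rolling up from month to year only shortens the time part of the grouping key, which is why coarser levels can be derived from finer ones.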
  • 7. Data Partitioning Schemes
      Computer cluster: which scheme?
      ● DW fully replicated
      ● Fragment Fact Table & replicate Dimension Tables
      ● DHP Fact Table, fragment some Dimension Tables & replicate the rest
      Comparison criteria:
      ● Cube size
      ● Workload processing: minimal pre- and post-processing, maximal parallelism (//)
      ● DW maintenance
      ● Storage overhead
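The third scheme can be sketched as follows, under the assumption that PHP stands for primary hash partitioning and DHP for derived hash partitioning (the transcript does not expand the acronyms). A parent table is hashed on its own key, and each child row is placed on the node of the parent row it references, so joins along the chain stay local. The table layouts, `N_NODES`, `node_of` and `partition` are hypothetical names chosen for this illustration.

```python
import zlib

N_NODES = 4  # assumed number of DB backends


def node_of(custkey):
    """Primary hash partitioning: a stable hash of the key picks the node."""
    return zlib.crc32(str(custkey).encode()) % N_NODES


def partition(customers, orders, lineitems):
    """customers: [custkey]; orders: [(orderkey, custkey)]; lineitems: [(orderkey, line)]."""
    nodes = [{"customer": [], "orders": [], "lineitem": []} for _ in range(N_NODES)]
    order_node = {}
    for custkey in customers:
        nodes[node_of(custkey)]["customer"].append(custkey)
    for orderkey, custkey in orders:
        n = node_of(custkey)  # derived: an order follows its customer
        order_node[orderkey] = n
        nodes[n]["orders"].append((orderkey, custkey))
    for orderkey, line in lineitems:
        n = order_node[orderkey]  # derived: a line item follows its order
        nodes[n]["lineitem"].append((orderkey, line))
    return nodes


customers = list(range(1, 9))
orders = [(100 + i, (i % 8) + 1) for i in range(20)]
lineitems = [(100 + i, j) for i in range(20) for j in range(3)]
nodes = partition(customers, orders, lineitems)
```

With this placement, a join of line items, orders and customers can run on each node independently, at the cost of a possibly uneven load when some customers have many orders.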
  • 8. Performance Results -- TPC-H*d Benchmark
      ● Multi-dimensional design of TPC-H benchmark
          – Minimal changes to TPC-H relational DB schema
          – Each SQL statement is mapped into an OLAP cube
      ● TPC-H workload translated into MDX
          – 22 MDX statements for OLAP cubes' run
          – 22 MDX statements for OLAP queries' run
  • 9. Performance Results -- C10 example (benchmark ctnd.)
  • 10. Performance Results -- Software Technologies & Hardware
      Software:
      ● ROLAP server: Mondrian-3.5.0
      ● OLAP client: JPivot
      ● Relational DBMS: MySQL 5.1
      ● Servlet container
      Hardware:
      ● French Grid Platform G5K
      ● Sophia site
      ● Suno nodes: 32 GB of memory, 2 CPUs per node, 4 cores per CPU, each CPU an Intel Xeon E5520 at 2.27 GHz
  • 12. Performance Results -- TPC-H*d for SF=10 & single DB backend
              Query workload    Cube-Query workload
                                cube          query
      Q1      2,147.33          2,777.49      0.29
      Q10     7,100.24          n/a           -
      Q11     2,558.21          3,020.27      1,604.1
      Q9      n/a               n/a           n/a
      ● Of the 22 business queries: 14 perform as Q1, 4 as Q10, 2 as Q11 and 2 as Q9
      ● The system under test was unable to build the big cubes related to business queries Q3, Q9, Q10, Q13, Q18 and Q20, either because of memory leaks or because of system constraints (max crossjoin size: 2,147,483,647).
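The crossjoin limit quoted on this slide, 2,147,483,647, equals 2^31 - 1, the largest signed 32-bit integer. A minimal sketch of a pre-check follows, assuming the crossjoin size is the product of the member counts of the dimensions being combined; the cardinalities used in the example are made up and `fits_crossjoin_limit` is a hypothetical helper, not a Mondrian function.

```python
from math import prod

MAX_CROSSJOIN = 2_147_483_647  # limit quoted on the slide (2**31 - 1)


def crossjoin_size(cardinalities):
    """Number of tuples in the full cross product of the given dimensions."""
    return prod(cardinalities)


def fits_crossjoin_limit(cardinalities):
    """True if the cross product stays within the quoted limit."""
    return crossjoin_size(cardinalities) <= MAX_CROSSJOIN


# Made-up member counts for illustration only.
small_cube = [25, 1_000, 84]       # e.g. nations x brands x months
large_cube = [150_000, 200_000]    # e.g. customers x parts

small_ok = fits_crossjoin_limit(small_cube)
large_ok = fits_crossjoin_limit(large_cube)
```

Two dimensions with a few hundred thousand members each already exceed the limit, which is consistent with the slide's remark that the largest cubes could not be built.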
  • 13. Performance Results (ctnd. 1) -- TPC-H*d for SF=10 & 4 DB backends
              4 DB backends                          Single DB backend
              Query      Cube-Query workload         Query      Cube-Query workload
              workload   cube        query           workload   cube        query
      Q1      485.73     862.77      0.19            2,147.33   2,777.49    0.29
      Q10     2,654.2    13,674.02   1,599.47        7,100.24   n/a         -
      Q11     535.75     990.75      505.2           2,558.21   3,020.27    1,604.1
      Q9      n/a        n/a         n/a             n/a        n/a         n/a
      ● LineItem is DHPed along Orders, Orders is DHPed along Customer, Customer is PHPed, and all the rest are replicated
      ● Of the 22 business queries: 20 perform as Q1, Q10 and Q11, and 2 perform as Q9
      ● Improvements vary from 42.78% to 100%
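The improvement figures can be reproduced from the table with a relative-reduction formula. The sketch below assumes that improvement means (before - after) / before, and that the second group of columns holds the single-backend times from the previous slide (the values match). Only the query-workload column is used, and the dictionary names are illustrative.

```python
# Query-workload response times copied from the slide tables.
SINGLE_BACKEND = {"Q1": 2147.33, "Q10": 7100.24, "Q11": 2558.21}
FOUR_BACKENDS = {"Q1": 485.73, "Q10": 2654.2, "Q11": 535.75}


def improvement_pct(before, after):
    """Relative reduction in response time, in percent (assumed definition)."""
    return round(100.0 * (before - after) / before, 2)


improvements = {
    query: improvement_pct(SINGLE_BACKEND[query], FOUR_BACKENDS[query])
    for query in SINGLE_BACKEND
}
```

For these three rows the reduction lies between roughly 62% and 80%, inside the 42.78% to 100% range stated on the slide, which covers all queries and both workloads.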
  • 14. Performance Results (ctnd. 2) -- TPC-H*d for SF=10 & 4 DB backends & derived data
              4 DB backends & derived data           4 DB backends
              Query      Cube-Query workload         Query      Cube-Query workload
              workload   cube        query           workload   cube        query
      Q1      1.10       1.32        0.25            485.73     862.77      0.19
      Q10     127.67     9,545.68    5.16            2,654.2    13,674.02   1,599.47
      Q11     587.99     875.33      497.67          535.75     990.75      505.2
      Q9      n/a        n/a         n/a             n/a        n/a         n/a
      ● Derived data: aggregate tables for sparse cubes or cubes having a fixed size whatever the SF, and derived attributes for OLAP cubes whose size increases with SF
      ● Response times of the business queries of both workloads for which aggregate tables were built were improved.
      ● The impact of derived attributes is mixed: performance results show good improvements for Q10 and Q21, and a small impact on Q11 (saved operations are not complex).
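The two kinds of derived data can be contrasted in a short sketch. A derived attribute stores a per-row expression once so that queries do not recompute it, while an aggregate table stores pre-summed groups. The profit expression below is the one used by TPC-H Q9; whether the deck's l_profit attribute is defined exactly this way is an assumption, and the rows and function names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical joined rows:
# (nation, year, l_extendedprice, l_discount, l_quantity, ps_supplycost)
ROWS = [
    ("ITALY", 1995, 1000.0, 0.10, 10, 20.0),
    ("ITALY", 1995, 500.0, 0.00, 5, 30.0),
    ("TUNISIA", 1996, 200.0, 0.05, 2, 10.0),
]


def with_profit(rows):
    """Derived attribute: compute the per-row profit once and keep it."""
    return [
        (nation, year, price * (1 - discount) - supplycost * quantity)
        for nation, year, price, discount, quantity, supplycost in rows
    ]


def aggregate_table(derived_rows):
    """Aggregate table: pre-sum the derived attribute per (nation, year)."""
    agg = defaultdict(float)
    for nation, year, profit in derived_rows:
        agg[(nation, year)] += profit
    return dict(agg)


derived = with_profit(ROWS)
agg = aggregate_table(derived)
```

The derived attribute still grows with the fact table, whereas the aggregate table has one row per group, which matches the slide's distinction between cubes that grow with SF and cubes of fixed size.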
  • 15. Performance Results -- Derived Data Calculus
      Columns: Single DB Backend | For each DB Backend
      Rows:
      ● l_profit (LineItem is fragmented into 4 fragments)
      ● ps_excess_YYYY (PartSupp, Time are replicated and LineItem is fragmented into 4 fragments)
      ● ps_isminimum (PartSupp, Supplier, Nation, Region are replicated)
      ● agg_c1
      ● agg_c15
      Values as extracted (cell positions lost): 862.4, 18,195.48, 1,461.99, 4,377.51, 1,288.31, 71.63, 10,904.00, 862.4, 343.91, 852.84
  • 16. Related Work
      ● PowerDB (Rohm et al., 2000)
          – TPC-R benchmark (SQL workload) for comparing
              ● fully replicated DW schema
              ● partial replication and data partitioning (only LineItem table is fragmented)
          – PowerDB implements query routing algorithms (ShortQueries-ASAP, Affinity-Based routing) for load balancing, with inter-query and intra-query parallelism
          – SF=0.1 (300MB, all database files included)
      ● cgmOLAP (Chen et al., 2006)
          – Panda project (Chen, 2004)
          – Parallel OLAP cube processor at a rate of 1TB/hour
  • 17. Related Work (ctnd. 1)
      ● ParGRES (Paes et al., 2008)
          – Automatic parsing of SQL statements, inter- and intra-query parallelism enabled
          – Subset of TPC-H workload (Q1, Q3, Q4-Q8, Q12, Q14 and Q19)
          – TPC-H with SF=5 (11GB including all DB files)
          – RDBMS: PostgreSQL
          – 32-node shared-nothing cluster, grid5000 clusters (2 CPUs, 1GB of memory)
  • 18. Related Work (ctnd. 2)
      ● SmaQSS DBC middleware (Lima et al., 2009)
          – Combination of physical/virtual partitioning and partial replication
          – Partial replication uses chained declustering
          – Subset of TPC-H workload (Q1, Q5, Q6, Q12, Q14, Q18 and Q21 coded in SQL)
          – TPC-H with SF=5 (11GB including all DB files)
          – RDBMS: PostgreSQL
          – 32-node shared-nothing cluster, grid5000 clusters (2 CPUs, 1GB of memory)
  • 19. Conclusion
      ● Comparison of different DW fragmentation schemes regarding
          – Cube size
          – Distributed cube processing
          – Storage overhead
          – DW maintenance
      ● Implementation of an OLAP mid-tier
          – Connects to a pool of any RDBMSs through JDBC
          – Uses OLAP4j, an open Java API for OLAP
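The mid-tier idea of sending work to a pool of backends and merging the partial answers can be sketched as a scatter-gather over fragments. This is an illustration only: the deck's mid-tier is built in Java on JDBC and OLAP4j, whereas the sketch uses Python with in-memory SQLite databases as stand-ins for the backends, and the table, query and function names are hypothetical.

```python
import sqlite3
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical LineItem fragments, one per backend: (brand, revenue).
FRAGMENTS = [
    [("Brand#12", 100.0), ("Brand#34", 40.0)],
    [("Brand#12", 60.0)],
    [("Brand#34", 10.0), ("Brand#56", 5.0)],
]

SQL = "SELECT brand, SUM(revenue) FROM lineitem GROUP BY brand"


def run_on_backend(fragment):
    """Stand-in for one backend: load its fragment, run the shared query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE lineitem (brand TEXT, revenue REAL)")
        conn.executemany("INSERT INTO lineitem VALUES (?, ?)", fragment)
        return conn.execute(SQL).fetchall()
    finally:
        conn.close()


def scatter_gather(fragments):
    """Run the query on every fragment in parallel and add up the partial sums."""
    merged = Counter()
    with ThreadPoolExecutor(max_workers=len(fragments)) as pool:
        for partial in pool.map(run_on_backend, fragments):
            for brand, subtotal in partial:
                merged[brand] += subtotal
    return dict(merged)


totals = scatter_gather(FRAGMENTS)
```

SUM merges by simple addition; an average would need each backend to return both a sum and a count, which is one form of the post-processing that a partitioning scheme tries to keep minimal.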
  • 20. Conclusion (ctnd.)
      ● Performance assessment using the TPC-H*d benchmark and considering the whole workload (22 queries)
      ● Implementation and experiments revealed
          – MDX language shortcomings
              ● For each sub-query, we manually set parameters (next/previous value of the member if a value is missing)
          – Mondrian ROLAP server limits
              ● No unlimited combination of dimensions: the size limit is 2,147,483,647
          – Memory leaks
  • 21. Future Work
      ● Inspect the core of Mondrian and revise its source code
      ● Automate DW partitioning
      ● Consider bigger datasets
      ● Consider TPC-DS benchmark (99 business queries, multiple data marts)
  • 22. Thank you for your attention. Q&A
      OLAP*: Effectively & Efficiently Supporting Parallel OLAP over Big Data
      Alfredo Cuzzocrea, Rim Moussa & Guandong Xu
      MEDI'2013@Amantea, 27th Sept. 2013