SlideShare a Scribd company logo
1 of 42
Download to read offline
# 1
19th International Conference on Data Management
COMAD’2013 @ Ahmedabad, India
20th
December
2013
Multidimensional Database Design via
Schema Transformation
Turning TPC-H into the TPC-H*d Multidimensional
Benchmark
Alfredo Cuzzocrea∗
and Rim Moussa‡
*cuzzocrea@si.deis.unical.it ICAR-CNR & University of Calabria, Italy
‡rim.moussa@esti.rnu.tn LaTICE, University of Tunis, Tunisia
Context
# 2
20th
of Dec.
2013
COMAD@Ahmedabad.India
Data
Warehouse
Systems
Multidimensional
Databases
OLAP Technologies
Visual Analytics: BI Dashboards,
OLAP cubes, pivots tables,
Motivations & Goals
# 3
20th
of Dec.
2013
COMAD@Ahmedabad.India
• Motivations 
– BI analysts require Visual OLAP technologies
– According to market watchers, such as Pringle & Company
and Gartner, the market for BI platforms will remain one
of the fastest growing software markets in most regions,
• Goals: propose, implement & test
– Framework for MDB design based on business
requirements,
– Appropriate benchmark for OLAP servers benchmarking.
Motivations & Goals
# 4
20th
of Dec.
2013
COMAD@Ahmedabad.India
• Motivations 
– BI analysts require Visual OLAP technologies
– According to market watchers, such as Pringle & Company
and Gartner, the market for BI platforms will remain one
of the fastest growing software markets in most regions,
• Goals: propose, implement & test
– Framework for MDB design based on business
requirements,
– Appropriate benchmark for OLAP servers.
Outline
• Rules for MDB Design
• Turning TPC-H Benchmark into a TPC-H*d: a
Multidimensional Benchmark
– Initial MDB Schema
– Performances Results
• Derived data based optimizations
– Workload Taxonomy
– Performance Results
• Related Work
• Conclusion
# 5
20th
of Dec.
2013
COMAD@Ahmedabad.India
Problem
• Given,
– A relational warehouse schema
– A Workload composed of business queries
W = {Q1, Q2, …, Qn}
where Qi is a parameterized query
• Multidimensional DB Schema
– How to define OLAP cubes?
• Will there be a single cube or multiple cubes? Are there any rules
for definition of virtual cubes?
– Which optimizations are suitable for performance tuning?
• Derived data calculus & refresh?
• Data partitioning and parallel cube building?
# 6# 6
20th
of Dec.
2013
COMAD@Ahmedabad.India
Idea
Map each business query to an OLAP cube
» Obtain an initial MDB schema composed of OLAP
cubes
Run Optimizations
» Derived Data: Derived attributes, Aggregate tables,
» Virtual Cubes
# 7# 7
20th
of Dec.
2013
COMAD@Ahmedabad.India
SQL Stmt Template
SELECT t1.col_i,t2.col_j,…,tn.col_k
aggregate_function(column) as measure_1, …,
aggregate_function(expression) as measure_m
FROM table_1 t1, table_2 t2, …, table_n tn
WHERE ti.col_x operator $query_parameter$
AND ti.col_y = tj.col_z
AND …
GROUP BY t1.col_i, t2.col_j, …, tn.col_k;
# 8# 8
20th
of Dec.
2013
COMAD@Ahmedabad.India
=, < , <=, >=, !=
min, max, sum, avg, count, count-distinct …
Rules for OLAP Cube Design
--Measures’ Definition
• Measures feature aggregate functions, such as min,
max,count,count-distinct,sum,avg, …
• Simple Measure
– Defined over a single attribute,
– Exple: SUM(l_extendedprice),
• Measure expressions
– Defined over multiple attributes,
– Exple: SUM(l_extendedprice*(1 - l_discount))
• Computed Members
– Defined over measures or measure expressions,
– Exple: M1=SUM(l_extendedprice),
M2=COUNT(l_orderkey), CM = M1 / M2
# 9# 9
20th
of Dec.
2013
COMAD@Ahmedabad.India
Rules for OLAP Cube Design
--Fact Table Definition (1)
• All attributes involved in measures and measure expressions
belong to the fact table!
– Exple: Q10 of TPC-H benchmark,
# 10# 10# 10
20th
of Dec.
2013
COMAD@Ahmedabad.India
• Case measurable attributes belong to different tables The
fact table is the join of tables, to which measurable attributes
belong!
– Exple 1: Q9 of TPC-H benchmark, where l_extendedprice,
l_discount and l_qty belong to lineitem, and
ps_supplycost belongs to partsupp.
– The fact table is the join of lineitem and partsupp tables.
Select attributes needed for join with dimension tables (namely,
l_partkey, l_orderkey, l_suppkey), and
measurable attributes (namely l_extendedprice,
l_discount ,l_quantity, ps_supplycost).
# 11
Rules for OLAP Cube Design
--Fact Table Definition over multiple tables (2)
# 11# 11# 11
20th
of Dec.
2013
COMAD@Ahmedabad.India
# 12
Q9 SQL statement
Rules for OLAP Cube Design
--Fact Table Definition over multiple tables (3)
# 12# 12# 12# 12# 12
20th
of Dec.
2013
COMAD@Ahmedabad.India
# 13
Rules for OLAP Cube Design
--Fact Table Definition over multiple tables (4)
# 13# 13# 13# 13
20th
of Dec.
2013
COMAD@Ahmedabad.India
# 14
Rules for OLAP Cube Design
--Fact Table Definition & filters' processing (6)
# 14# 14# 14# 14
20th
of Dec.
2013
COMAD@Ahmedabad.India
• Filters' Processing: the fact table is filtered using non-
multidimensional predicates,
– Extract all filters involving the fact table from the WHERE
clause, such as
• (attr_i operator attr_j), where both attr_i and attr_j
belong to the fact table,
• (attr_k operator $fixed value$), such that attr_k
belongs to the fact table,
• [not] exists (select … from … where attr_k …),
such that attr_k belongs to the fact table,
• attr_k [not] in (list of fixed values), such that
attr_k belongs to the fact table,
– Exple 1: Q10 of TPC-H benchmark
– Exple 2: Q16 of TPC-H benchmark
– Exple 3: Q21 of TPC-H benchmark
# 15
Rules for OLAP Cube Design
--Fact Table Definition & filters' processing (6)
# 15# 15# 15# 15
20th
of Dec.
2013
COMAD@Ahmedabad.India
• Filters' Processing: the fact table is filtered using non-
multidimensional predicates,
– Extract all filters involving the fact table from the WHERE
clause, such as
• (attr_i operator attr_j), where both attr_i and attr_j
belong to the fact table,
• (attr_k operator $fixed value$), such that attr_k
belongs to the fact table,
• [not] exists (select … from … where attr_k …),
such that attr_k belongs to the fact table,
• attr_k [not] in (list of fixed values), such that
attr_k belongs to the fact table,
– Exple 1: Q10 of TPC-H benchmark
– Exple 2: Q16 of TPC-H benchmark
– Exple 3: Q21 of TPC-H benchmark
# 16
Rules for OLAP Cube Design
--Fact Table Definition & filters' processing (7)
# 16# 16# 16# 16
20th
of Dec.
2013
COMAD@Ahmedabad.India
Q10 SQL statement
# 17
Rules for OLAP Cube Design
--Fact Table Definition & filters' processing (8)
# 17# 17# 17# 17
20th
of Dec.
2013
COMAD@Ahmedabad.India
Q16 SQL statement
# 18
Rules for OLAP Cube Design
--Fact Table Definition & filters' processing (9)
# 18# 18# 18# 18
20th
of Dec.
2013
COMAD@Ahmedabad.India
Q21 SQL statement
# 19
Rules for OLAP Cube Design
--Dimensions' Definition (1)
# 19# 19# 19# 19
20th
of Dec.
2013
COMAD@Ahmedabad.India
First, consider all attributes in the SELECT, WHERE and
GROUP BY clauses,
– discard measurable attributes, which figure out in measures,
– discard attributes which figure out in the WHERE clause, and
are used for joining tables or filtering the fact table,
– Compose time dimension along well known time hierarchies,
• Year, quarter, month
– Compose geography dimension along well known locations'
hierarchies,
• Region, nation, city, district
# 20
Rules for OLAP Cube Design
--Dimensions' Definition (2)
# 20# 20# 20# 20
20th
of Dec.
2013
COMAD@Ahmedabad.India
– Exple: Q10 of TPC-H benchmark, all highlighted attributes are
considered for dimensions' mount!
• Time dimension o_orderdate requires order_year and
order_quarter levels
# 21
Rules for OLAP Cube Design
--Dimensions' Definition (3)
# 21# 21# 21# 21
20th
of Dec.
2013
COMAD@Ahmedabad.India
Second, find out hierarchical relations, i.e., one-to-many
relationships, and re-organize attributes along hierarchies to
form dimensions’ hierarchies,
– Example: Q10 of TPC-H benchmark
• Each customer can be related to at most one nation, but a 
nation may be related to many customers,
customer_dim: n_name (customer_nation) > c_custkey,
c_name, c_acctbal, c_address, c_phone, c_comment,
• order_dim: order_year > order_quarter
# 22
Rules for OLAP Cube Design
--Dimensions' Definition (4)
# 22# 22# 22# 22
20th
of Dec.
2013
COMAD@Ahmedabad.India
Third, distinguish levels from properties. Properties are in
functional dependency with levels,
– Example: Q10 of TPC-H benchmark
• For customer_dim, c_custkey is the level, and all of
c_name, c_acctbal, c_address, c_phone, c_comment
attributes are properties of c_custkey level.
# 23
Rules for OLAP Cube Design
--Dimensions' Definition & Filters' processing (5)
# 23# 23# 23# 23
20th
of Dec.
2013
COMAD@Ahmedabad.India
• Filters Processing: not all tuples in the dimension table should
be considered, so we have to extract filters defined over
dimension tables from the WHERE clause not useful for
multidimensional design,
– Exple 1: Q12 of TPC-H Benchmark. The OLAP cube C12 counts
the nber of urgent and high priorities orders (hig line count), and
the nber of not urgent and not high priorities orders (low line
count) by line_ship_mode, line_receipt_year over orders facts,
and considering only lines such that commit_date < receipt_date
and ship_date < commit_date.
# 24
Rules for OLAP Cube Design
--Dimensions' Definition & Filters' processing (6)
# 24# 24# 24# 24
20th
of Dec.
2013
COMAD@Ahmedabad.India
Q12 SQL statement
# 25
TPC-H*d
--Summary
# 25# 25# 25# 25
20th
of Dec.
2013
COMAD@Ahmedabad.India
●
Multi-dimensional design of TPC-H benchmark
– Minimal changes to TPC-H relational DB schema
– Each SQL statement is mapped into an OLAP cube
●
TPC-H*d Workload
– 23 MDX statements for OLAP cubes' run
– 23 MDX statements for OLAP queries' run
# 26
TPC-H*d
--Screenshots of C10 and Q10 Pivot Tables
# 26# 26# 26# 26
20th
of Dec.
2013
COMAD@Ahmedabad.India
# 27
Performance Results
--Software and Hardware Technologies
# 27# 27# 27# 27
20th
of Dec.
2013
COMAD@Ahmedabad.India
French Grid Platform G5K
• Sophia site
• Suno nodes, 32 GB of memory, each CPU is Intel Xeon E5520,
• 2.27 GHz, with 2 CPUs per node and 4 cores per CPU
Relational DBMS
Mysql 5.1
Jpivot OLAP client
Servlet container
Mondrian ROLAP Server
Mondrian-3.5.0
# 28
Performance Results
--TPC-H*d for SF=10
# 28# 28# 28# 28
20th
of Dec.
2013
COMAD@Ahmedabad.India
●
Over 22 business queries: 14 perform as Q1, 4 perform as
Q10, 2 perform as Q11, 2 perform as Q9
●
The system under test was unable to build big cubes related
to business queries: Q3, Q9, Q10, Q13, Q18 and Q20, either
for memory leaks or systems constraints (max crossjoin size:
2,147,483,647),
Query
workloadd
Cube-Query workload
cube query
Q1 2,147.33 2,777.49 0.29
Q10 7,100.24 n/a -
Q11 2,558.21 3,020.27 1,604.1
Q9 n/a n/a n/a
# 29
Optimizations based on Derived Data
--Aggregate Tables (1)
# 29# 29# 29# 29
20th
of Dec.
2013
COMAD@Ahmedabad.India
• An aggregate table (a.k.a. Materialized view) summarizes
large number of detail rows into information that has a
coarser granularity, and so fewer rows.
– Allows faster query processing,
– Requires refresh: incremental refresh or a total rebuild.
# 30
Optimizations based on Derived Data
--Derived Attributes (1)
# 30# 30# 30# 30
20th
of Dec.
2013
COMAD@Ahmedabad.India
• Alter warehouse schema and calculate attributes,
• It should allow gain in performance, CPU and I/O, (some joins
are no more processed),
• Choose attributes which are not stale after data refresh or
refresh cost is not important,
• Exple of Q10
– Add o_sumlostrevenue for each order,
– This avoids join of LineItem and Orders relations. It saves CPU and I/O.
# 31
Optimizations based on Derived Data
--Derived Attributes vs. OLAP Cube design (1)
# 31# 31# 31# 31
20th
of Dec.
2013
COMAD@Ahmedabad.India
CUSTOMER
ORDERS
LINEITEM
TIME
NATION
CUSTOMER
ORDERS
TIME
NATION
C10, with o_lost_revenue
derived attribute
Original C10
# 32
Workload Taxonomy
# 32# 32# 32# 32
20th
of Dec.
2013
COMAD@Ahmedabad.India
• 2 variables
– Cube dimensionality: size of the cross of dimensions’ sizes
– Cube density :  ratio of the size of the materialized view and the size
of the cross of dimensions’ sizes
• Exple 1: Q4 of TPC-H benchmark
– Cube dimensionality: order_years (7) × nbr_quarters (4) ×
order_priorities (5) × 1 measure (count orders) = 140
• Cube size is SF independent
– Cube density: for SF=0.96 (more than 96% of the cells are not empty)
• dense cube
# 33
Workload Taxonomy (2)
# 33# 33# 33# 33
20th
of Dec.
2013
COMAD@Ahmedabad.India
• Exple 2: Q10 of TPC-H benchmark,
– order_year (7) × nbr_quarters (4) × line_return_flag (3) × customer (SF ×
150,000: selected by 25 nation) × 1 measure = 12,600,000 × SF
• SF dependent
– with restriction return_flag=R (returned parts), order_year (7) ×
nbr_quarters (4) × line_return_flag (1) × customer (SF × 150,000: selected
by 25 nation) × 1 measure = 4,200,000 × SF
– Cube density: for SF=0.12
• sparse cube
# 34
Workload Taxonomy
--Recommendations
# 34# 34# 34# 34
20th
of Dec.
2013
COMAD@Ahmedabad.India
Features TPC-H Business Questions (OLAP Cube)
Medium/high dimensionality
•Dense cube
•Result is not % TPC-H Scale
Factor independent
Build Aggregate Tables
Q1, Q3, Q4, Q5, Q6, Q7, Q8, Q12, Q13,
Q14, Q16, Q19, Q22
13 business questions
•High dimensionality
•Sparse cube, few results,
lots of empty cells
Build Aggregate Tables
Q15, Q18
2 business questions
•High dimensionality
•Result % of Scale Factor
Add Derived Attributes
Q2, Q9, Q10, Q11, Q17, Q20, Q21
7 business questions
# 35
Performnce Results
--TPC-H*d for SF=10 with derived data (ms)
# 35# 35# 35# 35
20th
of Dec.
2013
COMAD@Ahmedabad.India
Query
workload
Cube-Query workload
cube query
1.10 1.39 0.21
2.28 27.38 0.78
329.01 n/a -
2738.46 2723.85 1585.21
n/a n/a n/a
Q1
Q21
Q10
Q11
Q9
●
Response times of business queries of both workloads, for which
aggregate tables were built were improved.
●
The impact of derived attributes is mitigated. Performance results
show good improvements for Q10 and Q21, and small impact on
Q11 (saved operations are not complex).
Query
workload
Cube-Query workload
cube query
2,147.33 2,777.49 0.29
578.09 855.46 0.15
7,100.24 n/a -
2,558.21 3,020.27 1,604.1
n/a n/a n/a
# 36
Related Work
# 36# 36# 36# 36
20th
of Dec.
2013
COMAD@Ahmedabad.India
• Methods for MDB design
– Not generalized, not tested, no empirical tests conducted by Niemi et al.
2011, Romero et al. 2006, Nair et al. 2007
• Variants of TPC-H benchmark
– Star Schema Benchmark (SSB) by O'Neil et al. 2009
• Turning the schema of TPC-H benchmark into a star schema(one fact
table and many dimensions)
• Almost half TPC-H benchmark,
• SQL workload, No OLAP cubes
– MS Analysis Services Test by La Brie et al. 2002
• Turning Q4 into an OLAP cube (MDX stmt)
• Performance measurements
# 37
Conclusion
# 37# 37# 37# 37
20th
of Dec.
2013
COMAD@Ahmedabad.India
• We provided a framework for MDB design
– Tested for the TPC-H benchmark,
– TPC-H is the most prominent DSS benchmark
• Thorough experimentations
– Two workloads types : 
• query-workload
• cube-then-query workload,
– Open source ROLAP server
– Different warehouse volumes SF=1,10
# 38
Future Work
# 38# 38# 38# 38
20th
of Dec.
2013
COMAD@Ahmedabad.India
• Test framework with TPC-DS workload
– 7 data marts
– A hundred of business queries
• Investigate other optimization methods for OLAP over Big Data
– Approximate query processing through data synopsis calculus,
# 39
COMAD’2013@Ahmedabad.India
20th
of December, 2013
Thank you for your
Attention
Q & A
Multidimensional Database Design via Schema
Transformation
Turning TPC-H into the TPC-H*d Multidimensional Benchmark
Alfredo Cuzzocrea & Rim Moussa
# 40
TPC-H*d
--DB Schema
# 40# 40# 40# 40
20th
of Dec.
2013
COMAD@Ahmedabad.India
# 41
Optimizations based on Derived Data
--Q10 -Derived Attribute
# 41# 41# 41# 41
20th
of Dec.
2013
COMAD@Ahmedabad.India
# 42
Performance Results
--Performance of Derived Data Calculus (ms)
# 42# 42# 42# 42
20th
of Dec.
2013
COMAD@Ahmedabad.India
Single DB Backend
ps_isminimum
(PartSupp, Supplier, Nation,
Region are replicated )
862.4
ps_excess_YYYY
(PartSupp, Time are replicated
and LineItem is fragmented
into 4 fragments)
18,195.48
l_profit
(LineItem is fragmented into 4
fragments)
4,377.51
agg_c1 343.91
agg_c15 10,904.00

More Related Content

What's hot

Solving PostgreSQL wicked problems
Solving PostgreSQL wicked problemsSolving PostgreSQL wicked problems
Solving PostgreSQL wicked problemsAlexander Korotkov
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesDatabricks
 
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016DataStax
 
Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Cloudera, Inc.
 
SQL大量発行処理をいかにして高速化するか
SQL大量発行処理をいかにして高速化するかSQL大量発行処理をいかにして高速化するか
SQL大量発行処理をいかにして高速化するかShogo Wakayama
 
Airflow Best Practises & Roadmap to Airflow 2.0
Airflow Best Practises & Roadmap to Airflow 2.0Airflow Best Practises & Roadmap to Airflow 2.0
Airflow Best Practises & Roadmap to Airflow 2.0Kaxil Naik
 
Modern Data Stack in Motion
Modern Data Stack in MotionModern Data Stack in Motion
Modern Data Stack in MotionAlluxio, Inc.
 
Fargate起動歴1日の男が語る運用の勘どころ
Fargate起動歴1日の男が語る運用の勘どころFargate起動歴1日の男が語る運用の勘どころ
Fargate起動歴1日の男が語る運用の勘どころYuto Komai
 
Amazon Aurora - Auroraの止まらない進化とその中身
Amazon Aurora - Auroraの止まらない進化とその中身Amazon Aurora - Auroraの止まらない進化とその中身
Amazon Aurora - Auroraの止まらない進化とその中身Amazon Web Services Japan
 
YugaByte DB Internals - Storage Engine and Transactions
YugaByte DB Internals - Storage Engine and Transactions YugaByte DB Internals - Storage Engine and Transactions
YugaByte DB Internals - Storage Engine and Transactions Yugabyte
 
RocksDB Performance and Reliability Practices
RocksDB Performance and Reliability PracticesRocksDB Performance and Reliability Practices
RocksDB Performance and Reliability PracticesYoshinori Matsunobu
 
When Apache Spark Meets TiDB with Xiaoyu Ma
When Apache Spark Meets TiDB with Xiaoyu MaWhen Apache Spark Meets TiDB with Xiaoyu Ma
When Apache Spark Meets TiDB with Xiaoyu MaDatabricks
 
Where狙いのキー、order by狙いのキー
Where狙いのキー、order by狙いのキーWhere狙いのキー、order by狙いのキー
Where狙いのキー、order by狙いのキーyoku0825
 
PostgreSQLアンチパターン
PostgreSQLアンチパターンPostgreSQLアンチパターン
PostgreSQLアンチパターンSoudai Sone
 
PostgreSQLのパラレル化に向けた取り組み@第30回(仮名)PostgreSQL勉強会
PostgreSQLのパラレル化に向けた取り組み@第30回(仮名)PostgreSQL勉強会PostgreSQLのパラレル化に向けた取り組み@第30回(仮名)PostgreSQL勉強会
PostgreSQLのパラレル化に向けた取り組み@第30回(仮名)PostgreSQL勉強会Shigeru Hanada
 
エスイーが要件定義でやるべきたったひとつのこと
エスイーが要件定義でやるべきたったひとつのことエスイーが要件定義でやるべきたったひとつのこと
エスイーが要件定義でやるべきたったひとつのことYoshitaka Kawashima
 
Best Practices for Building Robust Data Platform with Apache Spark and Delta
Best Practices for Building Robust Data Platform with Apache Spark and DeltaBest Practices for Building Robust Data Platform with Apache Spark and Delta
Best Practices for Building Robust Data Platform with Apache Spark and DeltaDatabricks
 
シンプルでシステマチックな Oracle Database, Exadata 性能分析
シンプルでシステマチックな Oracle Database, Exadata 性能分析シンプルでシステマチックな Oracle Database, Exadata 性能分析
シンプルでシステマチックな Oracle Database, Exadata 性能分析Yohei Azekatsu
 
A critique of ansi sql isolation levels 解説公開用
A critique of ansi sql isolation levels 解説公開用A critique of ansi sql isolation levels 解説公開用
A critique of ansi sql isolation levels 解説公開用Takashi Kambayashi
 

What's hot (20)

Solving PostgreSQL wicked problems
Solving PostgreSQL wicked problemsSolving PostgreSQL wicked problems
Solving PostgreSQL wicked problems
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016
 
Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0
 
SQL大量発行処理をいかにして高速化するか
SQL大量発行処理をいかにして高速化するかSQL大量発行処理をいかにして高速化するか
SQL大量発行処理をいかにして高速化するか
 
Airflow Best Practises & Roadmap to Airflow 2.0
Airflow Best Practises & Roadmap to Airflow 2.0Airflow Best Practises & Roadmap to Airflow 2.0
Airflow Best Practises & Roadmap to Airflow 2.0
 
使ってみませんか?pg_hint_plan
使ってみませんか?pg_hint_plan使ってみませんか?pg_hint_plan
使ってみませんか?pg_hint_plan
 
Modern Data Stack in Motion
Modern Data Stack in MotionModern Data Stack in Motion
Modern Data Stack in Motion
 
Fargate起動歴1日の男が語る運用の勘どころ
Fargate起動歴1日の男が語る運用の勘どころFargate起動歴1日の男が語る運用の勘どころ
Fargate起動歴1日の男が語る運用の勘どころ
 
Amazon Aurora - Auroraの止まらない進化とその中身
Amazon Aurora - Auroraの止まらない進化とその中身Amazon Aurora - Auroraの止まらない進化とその中身
Amazon Aurora - Auroraの止まらない進化とその中身
 
YugaByte DB Internals - Storage Engine and Transactions
YugaByte DB Internals - Storage Engine and Transactions YugaByte DB Internals - Storage Engine and Transactions
YugaByte DB Internals - Storage Engine and Transactions
 
RocksDB Performance and Reliability Practices
RocksDB Performance and Reliability PracticesRocksDB Performance and Reliability Practices
RocksDB Performance and Reliability Practices
 
When Apache Spark Meets TiDB with Xiaoyu Ma
When Apache Spark Meets TiDB with Xiaoyu MaWhen Apache Spark Meets TiDB with Xiaoyu Ma
When Apache Spark Meets TiDB with Xiaoyu Ma
 
Where狙いのキー、order by狙いのキー
Where狙いのキー、order by狙いのキーWhere狙いのキー、order by狙いのキー
Where狙いのキー、order by狙いのキー
 
PostgreSQLアンチパターン
PostgreSQLアンチパターンPostgreSQLアンチパターン
PostgreSQLアンチパターン
 
PostgreSQLのパラレル化に向けた取り組み@第30回(仮名)PostgreSQL勉強会
PostgreSQLのパラレル化に向けた取り組み@第30回(仮名)PostgreSQL勉強会PostgreSQLのパラレル化に向けた取り組み@第30回(仮名)PostgreSQL勉強会
PostgreSQLのパラレル化に向けた取り組み@第30回(仮名)PostgreSQL勉強会
 
エスイーが要件定義でやるべきたったひとつのこと
エスイーが要件定義でやるべきたったひとつのことエスイーが要件定義でやるべきたったひとつのこと
エスイーが要件定義でやるべきたったひとつのこと
 
Best Practices for Building Robust Data Platform with Apache Spark and Delta
Best Practices for Building Robust Data Platform with Apache Spark and DeltaBest Practices for Building Robust Data Platform with Apache Spark and Delta
Best Practices for Building Robust Data Platform with Apache Spark and Delta
 
シンプルでシステマチックな Oracle Database, Exadata 性能分析
シンプルでシステマチックな Oracle Database, Exadata 性能分析シンプルでシステマチックな Oracle Database, Exadata 性能分析
シンプルでシステマチックな Oracle Database, Exadata 性能分析
 
A critique of ansi sql isolation levels 解説公開用
A critique of ansi sql isolation levels 解説公開用A critique of ansi sql isolation levels 解説公開用
A critique of ansi sql isolation levels 解説公開用
 

Similar to Multidimensional DB design, revolving TPC-H benchmark into OLAP bench

Using SQL-MapReduce for Advanced Analytics
Using SQL-MapReduce for Advanced AnalyticsUsing SQL-MapReduce for Advanced Analytics
Using SQL-MapReduce for Advanced AnalyticsTeradata Aster
 
Automation of MultiDimensional DB Design (poster)
Automation of MultiDimensional DB Design (poster)Automation of MultiDimensional DB Design (poster)
Automation of MultiDimensional DB Design (poster)Rim Moussa
 
Under the hood of the Altalis Platform
Under the hood of the Altalis PlatformUnder the hood of the Altalis Platform
Under the hood of the Altalis PlatformSafe Software
 
Adaptive Query Optimization
Adaptive Query OptimizationAdaptive Query Optimization
Adaptive Query OptimizationAnju Garg
 
Presentation interpreting execution plans for sql statements
Presentation    interpreting execution plans for sql statementsPresentation    interpreting execution plans for sql statements
Presentation interpreting execution plans for sql statementsxKinAnx
 
Big Data, Bigger Analytics
Big Data, Bigger AnalyticsBig Data, Bigger Analytics
Big Data, Bigger AnalyticsItzhak Kameli
 
MLSD18. Feature Engineering
MLSD18. Feature EngineeringMLSD18. Feature Engineering
MLSD18. Feature EngineeringBigML, Inc
 
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALADATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALASaikiran Panjala
 
Distributing Queries the Citus Way | PostgresConf US 2018 | Marco Slot
Distributing Queries the Citus Way | PostgresConf US 2018 | Marco SlotDistributing Queries the Citus Way | PostgresConf US 2018 | Marco Slot
Distributing Queries the Citus Way | PostgresConf US 2018 | Marco SlotCitus Data
 
Presentación Oracle Database Migración consideraciones 10g/11g/12c
Presentación Oracle Database Migración consideraciones 10g/11g/12cPresentación Oracle Database Migración consideraciones 10g/11g/12c
Presentación Oracle Database Migración consideraciones 10g/11g/12cRonald Francisco Vargas Quesada
 
1Spatial: Cardiff FME World Tour: Spinning Web and Business Data into Gold
1Spatial: Cardiff FME World Tour:  Spinning Web and Business Data into Gold1Spatial: Cardiff FME World Tour:  Spinning Web and Business Data into Gold
1Spatial: Cardiff FME World Tour: Spinning Web and Business Data into Gold1Spatial
 
Keynote IDEAS2013 - Peter Boncz
Keynote IDEAS2013 - Peter BonczKeynote IDEAS2013 - Peter Boncz
Keynote IDEAS2013 - Peter BonczIoan Toma
 
Keynote IDEAS 2013 - Peter Boncz
Keynote IDEAS 2013 - Peter BonczKeynote IDEAS 2013 - Peter Boncz
Keynote IDEAS 2013 - Peter BonczLDBC council
 
Grill at bigdata-cloud conf
Grill at bigdata-cloud confGrill at bigdata-cloud conf
Grill at bigdata-cloud confamarsri
 
SQL Query Optimization: Why Is It So Hard to Get Right?
SQL Query Optimization: Why Is It So Hard to Get Right?SQL Query Optimization: Why Is It So Hard to Get Right?
SQL Query Optimization: Why Is It So Hard to Get Right?Brent Ozar
 
Democratization of NOSQL Document-Database over Relational Database Comparati...
Democratization of NOSQL Document-Database over Relational Database Comparati...Democratization of NOSQL Document-Database over Relational Database Comparati...
Democratization of NOSQL Document-Database over Relational Database Comparati...IRJET Journal
 
Domain Driven Design Tactical Patterns
Domain Driven Design Tactical PatternsDomain Driven Design Tactical Patterns
Domain Driven Design Tactical PatternsRobert Alexe
 
Netflix Machine Learning Infra for Recommendations - 2018
Netflix Machine Learning Infra for Recommendations - 2018Netflix Machine Learning Infra for Recommendations - 2018
Netflix Machine Learning Infra for Recommendations - 2018Karthik Murugesan
 

Similar to Multidimensional DB design, revolving TPC-H benchmark into OLAP bench (20)

Using SQL-MapReduce for Advanced Analytics
Using SQL-MapReduce for Advanced AnalyticsUsing SQL-MapReduce for Advanced Analytics
Using SQL-MapReduce for Advanced Analytics
 
Automation of MultiDimensional DB Design (poster)
Automation of MultiDimensional DB Design (poster)Automation of MultiDimensional DB Design (poster)
Automation of MultiDimensional DB Design (poster)
 
Under the hood of the Altalis Platform
Under the hood of the Altalis PlatformUnder the hood of the Altalis Platform
Under the hood of the Altalis Platform
 
Adaptive Query Optimization
Adaptive Query OptimizationAdaptive Query Optimization
Adaptive Query Optimization
 
Presentation interpreting execution plans for sql statements
Presentation    interpreting execution plans for sql statementsPresentation    interpreting execution plans for sql statements
Presentation interpreting execution plans for sql statements
 
Big Data, Bigger Analytics
Big Data, Bigger AnalyticsBig Data, Bigger Analytics
Big Data, Bigger Analytics
 
seminar100326a.pdf
seminar100326a.pdfseminar100326a.pdf
seminar100326a.pdf
 
MLSD18. Feature Engineering
MLSD18. Feature EngineeringMLSD18. Feature Engineering
MLSD18. Feature Engineering
 
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALADATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
 
Distributing Queries the Citus Way | PostgresConf US 2018 | Marco Slot
Distributing Queries the Citus Way | PostgresConf US 2018 | Marco SlotDistributing Queries the Citus Way | PostgresConf US 2018 | Marco Slot
Distributing Queries the Citus Way | PostgresConf US 2018 | Marco Slot
 
Presentación Oracle Database Migración consideraciones 10g/11g/12c
Presentación Oracle Database Migración consideraciones 10g/11g/12cPresentación Oracle Database Migración consideraciones 10g/11g/12c
Presentación Oracle Database Migración consideraciones 10g/11g/12c
 
1Spatial: Cardiff FME World Tour: Spinning Web and Business Data into Gold
1Spatial: Cardiff FME World Tour:  Spinning Web and Business Data into Gold1Spatial: Cardiff FME World Tour:  Spinning Web and Business Data into Gold
1Spatial: Cardiff FME World Tour: Spinning Web and Business Data into Gold
 
Keynote IDEAS2013 - Peter Boncz
Keynote IDEAS2013 - Peter BonczKeynote IDEAS2013 - Peter Boncz
Keynote IDEAS2013 - Peter Boncz
 
Keynote IDEAS 2013 - Peter Boncz
Keynote IDEAS 2013 - Peter BonczKeynote IDEAS 2013 - Peter Boncz
Keynote IDEAS 2013 - Peter Boncz
 
Project_Report
Project_ReportProject_Report
Project_Report
 
Grill at bigdata-cloud conf
Grill at bigdata-cloud confGrill at bigdata-cloud conf
Grill at bigdata-cloud conf
 
SQL Query Optimization: Why Is It So Hard to Get Right?
SQL Query Optimization: Why Is It So Hard to Get Right?SQL Query Optimization: Why Is It So Hard to Get Right?
SQL Query Optimization: Why Is It So Hard to Get Right?
 
Democratization of NOSQL Document-Database over Relational Database Comparati...
Democratization of NOSQL Document-Database over Relational Database Comparati...Democratization of NOSQL Document-Database over Relational Database Comparati...
Democratization of NOSQL Document-Database over Relational Database Comparati...
 
Domain Driven Design Tactical Patterns
Domain Driven Design Tactical PatternsDomain Driven Design Tactical Patterns
Domain Driven Design Tactical Patterns
 
Netflix Machine Learning Infra for Recommendations - 2018
Netflix Machine Learning Infra for Recommendations - 2018Netflix Machine Learning Infra for Recommendations - 2018
Netflix Machine Learning Infra for Recommendations - 2018
 

More from Rim Moussa

polystore_NYC_inrae_sysinfo2021-1.pdf
polystore_NYC_inrae_sysinfo2021-1.pdfpolystore_NYC_inrae_sysinfo2021-1.pdf
polystore_NYC_inrae_sysinfo2021-1.pdfRim Moussa
 
Big Data Projects
Big Data ProjectsBig Data Projects
Big Data ProjectsRim Moussa
 
ER 2016 Tutorial
ER 2016 TutorialER 2016 Tutorial
ER 2016 TutorialRim Moussa
 
Ismis2014 dbaas expert
Ismis2014 dbaas expertIsmis2014 dbaas expert
Ismis2014 dbaas expertRim Moussa
 
Parallel Sequence Generator
Parallel Sequence GeneratorParallel Sequence Generator
Parallel Sequence GeneratorRim Moussa
 
Hadoop ensma poitiers
Hadoop ensma poitiersHadoop ensma poitiers
Hadoop ensma poitiersRim Moussa
 
TPC-H analytics' scenarios and performances on Hadoop data clouds
TPC-H analytics' scenarios and performances on Hadoop data cloudsTPC-H analytics' scenarios and performances on Hadoop data clouds
TPC-H analytics' scenarios and performances on Hadoop data cloudsRim Moussa
 
Benchmarking data warehouse systems in the cloud: new requirements & new metrics
Benchmarking data warehouse systems in the cloud: new requirements & new metricsBenchmarking data warehouse systems in the cloud: new requirements & new metrics
Benchmarking data warehouse systems in the cloud: new requirements & new metricsRim Moussa
 
highly available distributed databases (poster)
highly available distributed databases (poster)highly available distributed databases (poster)
highly available distributed databases (poster)Rim Moussa
 

More from Rim Moussa (14)

polystore_NYC_inrae_sysinfo2021-1.pdf
polystore_NYC_inrae_sysinfo2021-1.pdfpolystore_NYC_inrae_sysinfo2021-1.pdf
polystore_NYC_inrae_sysinfo2021-1.pdf
 
Big Data Projects
Big Data ProjectsBig Data Projects
Big Data Projects
 
ISNCC 2017
ISNCC 2017ISNCC 2017
ISNCC 2017
 
EMR AWS Demo
EMR AWS DemoEMR AWS Demo
EMR AWS Demo
 
ER 2016 Tutorial
ER 2016 TutorialER 2016 Tutorial
ER 2016 Tutorial
 
BICOD-2017
BICOD-2017BICOD-2017
BICOD-2017
 
Asd 2015
Asd 2015Asd 2015
Asd 2015
 
Ismis2014 dbaas expert
Ismis2014 dbaas expertIsmis2014 dbaas expert
Ismis2014 dbaas expert
 
Parallel Sequence Generator
Parallel Sequence GeneratorParallel Sequence Generator
Parallel Sequence Generator
 
Hadoop ensma poitiers
Hadoop ensma poitiersHadoop ensma poitiers
Hadoop ensma poitiers
 
TPC-H analytics' scenarios and performances on Hadoop data clouds
TPC-H analytics' scenarios and performances on Hadoop data cloudsTPC-H analytics' scenarios and performances on Hadoop data clouds
TPC-H analytics' scenarios and performances on Hadoop data clouds
 
Benchmarking data warehouse systems in the cloud: new requirements & new metrics
Benchmarking data warehouse systems in the cloud: new requirements & new metricsBenchmarking data warehouse systems in the cloud: new requirements & new metrics
Benchmarking data warehouse systems in the cloud: new requirements & new metrics
 
highly available distributed databases (poster)
highly available distributed databases (poster)highly available distributed databases (poster)
highly available distributed databases (poster)
 
parallel OLAP
parallel OLAPparallel OLAP
parallel OLAP
 

Recently uploaded

Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...shivangimorya083
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknowmakika9823
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxTanveerAhmed817946
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationBoston Institute of Analytics
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 

Recently uploaded (20)

Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptx
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 

Multidimensional DB design, revolving TPC-H benchmark into OLAP bench

  • 1. # 1 19th International Conference on Data Management COMAD’2013 @ Ahmedabad, India 20th December 2013 Multidimensional Database Design via Schema Transformation Turning TPC-H into the TPC-H*d Multidimensional Benchmark Alfredo Cuzzocrea∗ and Rim Moussa‡ *cuzzocrea@si.deis.unical.it ICAR-CNR & University of Calabria, Italy ‡rim.moussa@esti.rnu.tn LaTICE, University of Tunis, Tunisia
  • 2. Context # 2 20th of Dec. 2013 COMAD@Ahmedabad.India Data Warehouse Systems Multidimensional Databases OLAP Technologies Visual Analytics: BI Dashboards, OLAP cubes, pivots tables,
  • 3. Motivations & Goals # 3 20th of Dec. 2013 COMAD@Ahmedabad.India • Motivations  – BI analysts require Visual OLAP technologies – According to market watchers, such as Pringle & Company and Gartner, the market for BI platforms will remain one of the fastest growing software markets in most regions, • Goals: propose, implement & test – Framework for MDB design based on business requirements, – Appropriate benchmark for OLAP servers benchmarking.
  • 4. Motivations & Goals # 4 20th of Dec. 2013 COMAD@Ahmedabad.India • Motivations  – BI analysts require Visual OLAP technologies – According to market watchers, such as Pringle & Company and Gartner, the market for BI platforms will remain one of the fastest growing software markets in most regions, • Goals: propose, implement & test – Framework for MDB design based on business requirements, – Appropriate benchmark for OLAP servers.
  • 5. Outline • Rules for MDB Design • Turning TPC-H Benchmark into a TPC-H*d: a Multidimensional Benchmark – Initial MDB Schema – Performances Results • Derived data based optimizations – Workload Taxonomy – Performance Results • Related Work • Conclusion # 5 20th of Dec. 2013 COMAD@Ahmedabad.India
  • 6. Problem • Given, – A relational warehouse schema – A Workload composed of business queries W = {Q1, Q2, …, Qn} where Qi is a parameterized query • Multidimensional DB Schema – How to define OLAP cubes? • Will there be a single cube or multiple cubes? Are there any rules for definition of virtual cubes? – Which optimizations are suitable for performance tuning? • Derived data calculus & refresh? • Data partitioning and parallel cube building? # 6# 6 20th of Dec. 2013 COMAD@Ahmedabad.India
  • 7. Idea Map each business query to an OLAP cube » Obtain an initial MDB schema composed of OLAP cubes Run Optimizations » Derived Data: Derived attributes, Aggregate tables, » Virtual Cubes # 7# 7 20th of Dec. 2013 COMAD@Ahmedabad.India
  • 8. SQL Stmt Template SELECT t1.col_i,t2.col_j,…,tn.col_k aggregate_function(column) as measure_1, …, aggregate_function(expression) as measure_m FROM table_1 t1, table_2 t2, …, table_n tn WHERE ti.col_x operator $query_parameter$ AND ti.col_y = tj.col_z AND … GROUP BY t1.col_i, t2.col_j, …, tn.col_k; # 8# 8 20th of Dec. 2013 COMAD@Ahmedabad.India =, < , <=, >=, != min, max, sum, avg, count, count-distinct …
  • 9. Rules for OLAP Cube Design --Measures’ Definition • Measures feature aggregate functions, such as min, max,count,count-distinct,sum,avg, … • Simple Measure – Defined over a single attribute, – Exple: SUM(l_extendedprice), • Measure expressions – Defined over multiple attributes, – Exple: SUM(l_extendedprice*(1 - l_discount)) • Computed Members – Defined over measures or measure expressions, – Exple: M1=SUM(l_extendedprice), M2=COUNT(l_orderkey), CM = M1 / M2 # 9# 9 20th of Dec. 2013 COMAD@Ahmedabad.India
  • 10. Rules for OLAP Cube Design --Fact Table Definition (1) • All attributes involved in measures and measure expressions belong to the fact table! – Exple: Q10 of TPC-H benchmark, # 10# 10# 10 20th of Dec. 2013 COMAD@Ahmedabad.India
  • 11. • Case measurable attributes belong to different tables The fact table is the join of tables, to which measurable attributes belong! – Exple 1: Q9 of TPC-H benchmark, where l_extendedprice, l_discount and l_qty belong to lineitem, and ps_supplycost belongs to partsupp. – The fact table is the join of lineitem and partsupp tables. Select attributes needed for join with dimension tables (namely, l_partkey, l_orderkey, l_suppkey), and measurable attributes (namely l_extendedprice, l_discount ,l_quantity, ps_supplycost). # 11 Rules for OLAP Cube Design --Fact Table Definition over multiple tables (2) # 11# 11# 11 20th of Dec. 2013 COMAD@Ahmedabad.India
  • 12. # 12 Q9 SQL statement Rules for OLAP Cube Design --Fact Table Definition over multiple tables (3) # 12# 12# 12# 12# 12 20th of Dec. 2013 COMAD@Ahmedabad.India
  • 13. # 13 Rules for OLAP Cube Design --Fact Table Definition over multiple tables (4) # 13# 13# 13# 13 20th of Dec. 2013 COMAD@Ahmedabad.India
  • 14. # 14 Rules for OLAP Cube Design --Fact Table Definition & filters' processing (6) # 14# 14# 14# 14 20th of Dec. 2013 COMAD@Ahmedabad.India • Filters' Processing: the fact table is filtered using non- multidimensional predicates, – Extract all filters involving the fact table from the WHERE clause, such as • (attr_i operator attr_j), where both attr_i and attr_j belong to the fact table, • (attr_k operator $fixed value$), such that attr_k belongs to the fact table, • [not] exists (select … from … where attr_k …), such that attr_k belongs to the fact table, • attr_k [not] in (list of fixed values), such that attr_k belongs to the fact table, – Exple 1: Q10 of TPC-H benchmark – Exple 2: Q16 of TPC-H benchmark – Exple 3: Q21 of TPC-H benchmark
  • 15. # 15 Rules for OLAP Cube Design --Fact Table Definition & filters' processing (6) # 15# 15# 15# 15 20th of Dec. 2013 COMAD@Ahmedabad.India • Filters' Processing: the fact table is filtered using non- multidimensional predicates, – Extract all filters involving the fact table from the WHERE clause, such as • (attr_i operator attr_j), where both attr_i and attr_j belong to the fact table, • (attr_k operator $fixed value$), such that attr_k belongs to the fact table, • [not] exists (select … from … where attr_k …), such that attr_k belongs to the fact table, • attr_k [not] in (list of fixed values), such that attr_k belongs to the fact table, – Exple 1: Q10 of TPC-H benchmark – Exple 2: Q16 of TPC-H benchmark – Exple 3: Q21 of TPC-H benchmark
  • 16. # 16 Rules for OLAP Cube Design --Fact Table Definition & filters' processing (7) # 16# 16# 16# 16 20th of Dec. 2013 COMAD@Ahmedabad.India Q10 SQL statement
  • 17. # 17 Rules for OLAP Cube Design --Fact Table Definition & filters' processing (8) # 17# 17# 17# 17 20th of Dec. 2013 COMAD@Ahmedabad.India Q16 SQL statement
  • 18. # 18 Rules for OLAP Cube Design --Fact Table Definition & filters' processing (9) # 18# 18# 18# 18 20th of Dec. 2013 COMAD@Ahmedabad.India Q21 SQL statement
  • 19. # 19 Rules for OLAP Cube Design --Dimensions' Definition (1) # 19# 19# 19# 19 20th of Dec. 2013 COMAD@Ahmedabad.India First, consider all attributes in the SELECT, WHERE and GROUP BY clauses, – discard measurable attributes, which figure out in measures, – discard attributes which figure out in the WHERE clause, and are used for joining tables or filtering the fact table, – Compose time dimension along well known time hierarchies, • Year, quarter, month – Compose geography dimension along well known locations' hierarchies, • Region, nation, city, district
  • 20. # 20 Rules for OLAP Cube Design --Dimensions' Definition (2) # 20# 20# 20# 20 20th of Dec. 2013 COMAD@Ahmedabad.India – Exple: Q10 of TPC-H benchmark, all highlighted attributes are considered for dimensions' mount! • Time dimension o_orderdate requires order_year and order_quarter levels
  • 21. # 21 Rules for OLAP Cube Design --Dimensions' Definition (3) # 21# 21# 21# 21 20th of Dec. 2013 COMAD@Ahmedabad.India Second, find out hierarchical relations, i.e., one-to-many relationships, and re-organize attributes along hierarchies to form dimensions’ hierarchies, – Example: Q10 of TPC-H benchmark • Each customer can be related to at most one nation, but a  nation may be related to many customers, customer_dim: n_name (customer_nation) > c_custkey, c_name, c_acctbal, c_address, c_phone, c_comment, • order_dim: order_year > order_quarter
  • 22. # 22 Rules for OLAP Cube Design --Dimensions' Definition (4) # 22# 22# 22# 22 20th of Dec. 2013 COMAD@Ahmedabad.India Third, distinguish levels from properties. Properties are in functional dependency with levels, – Example: Q10 of TPC-H benchmark • For customer_dim, c_custkey is the level, and all of c_name, c_acctbal, c_address, c_phone, c_comment attributes are properties of c_custkey level.
  • 23. # 23 Rules for OLAP Cube Design --Dimensions' Definition & Filters' processing (5) # 23# 23# 23# 23 20th of Dec. 2013 COMAD@Ahmedabad.India • Filters Processing: not all tuples in the dimension table should be considered, so we have to extract filters defined over dimension tables from the WHERE clause not useful for multidimensional design, – Exple 1: Q12 of TPC-H Benchmark. The OLAP cube C12 counts the nber of urgent and high priorities orders (hig line count), and the nber of not urgent and not high priorities orders (low line count) by line_ship_mode, line_receipt_year over orders facts, and considering only lines such that commit_date < receipt_date and ship_date < commit_date.
  • 24. # 24 Rules for OLAP Cube Design --Dimensions' Definition & Filters' processing (6) # 24# 24# 24# 24 20th of Dec. 2013 COMAD@Ahmedabad.India Q12 SQL statement
  • 25. # 25 TPC-H*d --Summary # 25# 25# 25# 25 20th of Dec. 2013 COMAD@Ahmedabad.India ● Multi-dimensional design of TPC-H benchmark – Minimal changes to TPC-H relational DB schema – Each SQL statement is mapped into an OLAP cube ● TPC-H*d Workload – 23 MDX statements for OLAP cubes' run – 23 MDX statements for OLAP queries' run
  • 26. # 26 TPC-H*d --Screenshots of C10 and Q10 Pivot Tables # 26# 26# 26# 26 20th of Dec. 2013 COMAD@Ahmedabad.India
  • 27. # 27 Performance Results --Software and Hardware Technologies # 27# 27# 27# 27 20th of Dec. 2013 COMAD@Ahmedabad.India French Grid Platform G5K • Sophia site • Suno nodes, 32 GB of memory, each CPU is Intel Xeon E5520, • 2.27 GHz, with 2 CPUs per node and 4 cores per CPU Relational DBMS Mysql 5.1 Jpivot OLAP client Servlet container Mondrian ROLAP Server Mondrian-3.5.0
  • 28. # 28 Performance Results --TPC-H*d for SF=10 # 28# 28# 28# 28 20th of Dec. 2013 COMAD@Ahmedabad.India ● Over 22 business queries: 14 perform as Q1, 4 perform as Q10, 2 perform as Q11, 2 perform as Q9 ● The system under test was unable to build big cubes related to business queries: Q3, Q9, Q10, Q13, Q18 and Q20, either for memory leaks or systems constraints (max crossjoin size: 2,147,483,647), Query workloadd Cube-Query workload cube query Q1 2,147.33 2,777.49 0.29 Q10 7,100.24 n/a - Q11 2,558.21 3,020.27 1,604.1 Q9 n/a n/a n/a
  • 29. # 29 Optimizations based on Derived Data --Aggregate Tables (1) # 29# 29# 29# 29 20th of Dec. 2013 COMAD@Ahmedabad.India • An aggregate table (a.k.a. Materialized view) summarizes large number of detail rows into information that has a coarser granularity, and so fewer rows. – Allows faster query processing, – Requires refresh: incremental refresh or a total rebuild.
  • 30. # 30 Optimizations based on Derived Data --Derived Attributes (1) # 30# 30# 30# 30 20th of Dec. 2013 COMAD@Ahmedabad.India • Alter warehouse schema and calculate attributes, • It should allow gain in performance, CPU and I/O, (some joins are no more processed), • Choose attributes which are not stale after data refresh or refresh cost is not important, • Exple of Q10 – Add o_sumlostrevenue for each order, – This avoids join of LineItem and Orders relations. It saves CPU and I/O.
  • 31. # 31 Optimizations based on Derived Data --Derived Attributes vs. OLAP Cube design (1) # 31# 31# 31# 31 20th of Dec. 2013 COMAD@Ahmedabad.India CUSTOMER ORDERS LINEITEM TIME NATION CUSTOMER ORDERS TIME NATION C10, with o_lost_revenue derived attribute Original C10
  • 32. # 32 Workload Taxonomy # 32# 32# 32# 32 20th of Dec. 2013 COMAD@Ahmedabad.India • 2 variables – Cube dimensionality: size of the cross of dimensions’ sizes – Cube density :  ratio of the size of the materialized view and the size of the cross of dimensions’ sizes • Exple 1: Q4 of TPC-H benchmark – Cube dimensionality: order_years (7) × nbr_quarters (4) × order_priorities (5) × 1 measure (count orders) = 140 • Cube size is SF independent – Cube density: for SF=0.96 (more than 96% of the cells are not empty) • dense cube
  • 33. # 33 Workload Taxonomy (2) # 33# 33# 33# 33 20th of Dec. 2013 COMAD@Ahmedabad.India • Exple 2: Q10 of TPC-H benchmark, – order_year (7) × nbr_quarters (4) × line_return_flag (3) × customer (SF × 150,000: selected by 25 nation) × 1 measure = 12,600,000 × SF • SF dependent – with restriction return_flag=R (returned parts), order_year (7) × nbr_quarters (4) × line_return_flag (1) × customer (SF × 150,000: selected by 25 nation) × 1 measure = 4,200,000 × SF – Cube density: for SF=0.12 • sparse cube
  • 34. # 34 Workload Taxonomy --Recommendations # 34# 34# 34# 34 20th of Dec. 2013 COMAD@Ahmedabad.India Features TPC-H Business Questions (OLAP Cube) Medium/high dimensionality •Dense cube •Result is not % TPC-H Scale Factor independent Build Aggregate Tables Q1, Q3, Q4, Q5, Q6, Q7, Q8, Q12, Q13, Q14, Q16, Q19, Q22 13 business questions •High dimensionality •Sparse cube, few results, lots of empty cells Build Aggregate Tables Q15, Q18 2 business questions •High dimensionality •Result % of Scale Factor Add Derived Attributes Q2, Q9, Q10, Q11, Q17, Q20, Q21 7 business questions
  • 35. # 35 Performnce Results --TPC-H*d for SF=10 with derived data (ms) # 35# 35# 35# 35 20th of Dec. 2013 COMAD@Ahmedabad.India Query workload Cube-Query workload cube query 1.10 1.39 0.21 2.28 27.38 0.78 329.01 n/a - 2738.46 2723.85 1585.21 n/a n/a n/a Q1 Q21 Q10 Q11 Q9 ● Response times of business queries of both workloads, for which aggregate tables were built were improved. ● The impact of derived attributes is mitigated. Performance results show good improvements for Q10 and Q21, and small impact on Q11 (saved operations are not complex). Query workload Cube-Query workload cube query 2,147.33 2,777.49 0.29 578.09 855.46 0.15 7,100.24 n/a - 2,558.21 3,020.27 1,604.1 n/a n/a n/a
  • 36. # 36 Related Work # 36# 36# 36# 36 20th of Dec. 2013 COMAD@Ahmedabad.India • Methods for MDB design – Not generalized, not tested, no empirical tests conducted by Niemi et al. 2011, Romero et al. 2006, Nair et al. 2007 • Variants of TPC-H benchmark – Star Schema Benchmark (SSB) by O'Neil et al. 2009 • Turning the schema of TPC-H benchmark into a star schema(one fact table and many dimensions) • Almost half TPC-H benchmark, • SQL workload, No OLAP cubes – MS Analysis Services Test by La Brie et al. 2002 • Turning Q4 into an OLAP cube (MDX stmt) • Performance measurements
  • 37. # 37 Conclusion # 37# 37# 37# 37 20th of Dec. 2013 COMAD@Ahmedabad.India • We provided a framework for MDB design – Tested for the TPC-H benchmark, – TPC-H is the most prominent DSS benchmark • Thorough experimentations – Two workloads types :  • query-workload • cube-then-query workload, – Open source ROLAP server – Different warehouse volumes SF=1,10
  • 38. # 38 Future Work # 38# 38# 38# 38 20th of Dec. 2013 COMAD@Ahmedabad.India • Test framework with TPC-DS workload – 7 data marts – A hundred of business queries • Investigate other optimization methods for OLAP over Big Data – Approximate query processing through data synopsis calculus,
  • 39. # 39 COMAD’2013@Ahmedabad.India 20th of December, 2013 Thank you for your Attention Q & A Multidimensional Database Design via Schema Transformation Turning TPC-H into the TPC-H*d Multidimensional Benchmark Alfredo Cuzzocrea & Rim Moussa
  • 40. # 40 TPC-H*d --DB Schema # 40# 40# 40# 40 20th of Dec. 2013 COMAD@Ahmedabad.India
  • 41. # 41 Optimizations based on Derived Data --Q10 -Derived Attribute # 41# 41# 41# 41 20th of Dec. 2013 COMAD@Ahmedabad.India
  • 42. # 42 Performance Results --Performance of Derived Data Calculus (ms) # 42# 42# 42# 42 20th of Dec. 2013 COMAD@Ahmedabad.India Single DB Backend ps_isminimum (PartSupp, Supplier, Nation, Region are replicated ) 862.4 ps_excess_YYYY (PartSupp, Time are replicated and LineItem is fragmented into 4 fragments) 18,195.48 l_profit (LineItem is fragmented into 4 fragments) 4,377.51 agg_c1 343.91 agg_c15 10,904.00