TPC-H Performance
MPP & Column Store
What is TPC-H?
• The TPC Benchmark™ H (TPC-H) is a decision support benchmark.
It consists of a suite of business oriented ad-hoc queries and
concurrent data modifications. The queries and the data populating
the database have been chosen to have broad industry-wide
relevance while maintaining a sufficient degree of ease of
implementation. This benchmark illustrates decision support
systems that:
  • Examine large volumes of data;
  • Execute queries with a high degree of complexity;
  • Give answers to critical business questions.
• The performance metric reported by TPC-H is called the TPC-H
Composite Query-per-Hour Performance Metric (QphH@Size), and
reflects multiple aspects of the capability of the system to process
queries. These aspects include the selected database size against
which the queries are executed, the query processing power when
queries are submitted by a single stream and the query throughput
when queries are submitted by multiple concurrent users.
Overview
TPC-H Schema overview
TPC-H Performance measurements
Partner engagement
TPC-H where is it today
TPC-H challenges
Looking ahead
Q&A
TPC-H Schema overview: Relationships between columns
TPC-H Schema overview : MPP data distribution
Table     Column     Node 1  Node 2  Node 3
LINEITEM  ORDERKEY   1       2       3
          PARTKEY    6       4       8
          SUPPKEY    3       18      5
ORDERS    ORDERKEY   1       2       3
          CUSTKEY    4       2       9
PARTSUPP  PARTKEY    1       2       3
          SUPPKEY    4       5       6
PART      PARTKEY    1       2       3
CUSTOMER  CUSTKEY    1       2       3
SUPPLIER  SUPPKEY    1..N    1..N    1..N
NATION    NATIONKEY  1..N    1..N    1..N
REGION    REGIONKEY  1..N    1..N    1..N
Legend: Collocated / Over network data movement
Table     Distribution column
LINEITEM  L_ORDERKEY
ORDERS    O_ORDERKEY
PARTSUPP  PS_PARTKEY
PART      P_PARTKEY
CUSTOMER  C_CUSTKEY
SUPPLIER  REPLICATED
NATION    REPLICATED
REGION    REPLICATED
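The distribution choices above can be illustrated with a minimal sketch. It assumes three nodes and plain modulo hashing on the distribution column; the helper names (`node_for`, `distribute`) and the tiny data set are hypothetical and only meant to show why ORDERS x LINEITEM is collocated while a join on PARTKEY needs data movement.

```python
from collections import defaultdict

NUM_NODES = 3  # assumed cluster size, matching the three nodes in the slide


def node_for(key: int, num_nodes: int = NUM_NODES) -> int:
    """Map a distribution-key value to a node (simple modulo hashing)."""
    return key % num_nodes


def distribute(rows, key_column, num_nodes: int = NUM_NODES):
    """Place each row on the node chosen by hashing its distribution column."""
    placement = defaultdict(list)
    for row in rows:
        placement[node_for(row[key_column], num_nodes)].append(row)
    return placement


# Tiny illustrative data set (hypothetical values).
orders = [{"o_orderkey": k, "o_custkey": k * 7 % 5 + 1} for k in range(1, 7)]
lineitem = [{"l_orderkey": k, "l_partkey": k * 3 % 4 + 1} for k in range(1, 7)]

orders_by_node = distribute(orders, "o_orderkey")
lineitem_by_node = distribute(lineitem, "l_orderkey")

# ORDERS x LINEITEM on ORDERKEY: every matching pair lives on the same node.
collocated = all(
    node_for(o["o_orderkey"]) == node_for(l["l_orderkey"])
    for o in orders
    for l in lineitem
    if o["o_orderkey"] == l["l_orderkey"]
)

# LINEITEM x PART on PARTKEY: LINEITEM is placed by ORDERKEY, so rows whose
# PARTKEY hashes to a different node would have to move over the network.
rows_to_move = sum(
    1 for l in lineitem if node_for(l["l_partkey"]) != node_for(l["l_orderkey"])
)
```

Small tables such as NATION and REGION avoid this question entirely because a full copy sits on every node.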
TPC-H Schema : Metrics
• Power:
  • Run order:
    • RF1 (inserts into LINEITEM and ORDERS)
    • 22 read-only queries
    • RF2 (deletes from LINEITEM and ORDERS)
  • Metric:
    • Query-per-hour rate
    • TPC-H Power@Size = 3600 * SF / Geomean(22 queries, RF1, RF2)
    • Geometric mean of the timings of all queries and refresh functions in a run
    • The same relative improvement to any query improves the metric equally
• Throughput:
  • Run order:
    • N concurrent query streams, each running the Power queries with different parameters
    • N RF1 & RF2 pairs, which can run in parallel with the query streams above or after them
  • Metric:
    • Ratio of the total number of queries executed over the length of the measurement interval
    • TPC-H Throughput@Size = (S * 22 * 3600) / Ts * SF, where S is the number of query streams and Ts is the measurement interval in seconds
    • Absolute runtime matters, so optimizing the longest-running query helps
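As a worked example of the two formulas above, the sketch below computes both metrics from timings in seconds. The composite QphH@Size as the geometric mean of Power and Throughput comes from the TPC-H specification rather than from this slide; the function names and the sample timings are hypothetical.

```python
import math


def power_at_size(scale_factor, query_seconds, refresh_seconds):
    """3600 * SF divided by the geometric mean of the 22 query and 2 refresh timings."""
    timings = list(query_seconds) + list(refresh_seconds)
    geomean = math.exp(sum(math.log(t) for t in timings) / len(timings))
    return 3600.0 * scale_factor / geomean


def throughput_at_size(scale_factor, streams, interval_seconds):
    """(S * 22 * 3600) / Ts * SF."""
    return (streams * 22 * 3600.0) / interval_seconds * scale_factor


def qphh_at_size(power, throughput):
    """Composite metric: geometric mean of Power@Size and Throughput@Size."""
    return math.sqrt(power * throughput)


# Hypothetical run at SF 100: every query and refresh function takes 10 seconds,
# and 5 query streams finish in a one-hour measurement interval.
power = power_at_size(100, [10.0] * 22, [10.0, 10.0])
throughput = throughput_at_size(100, 5, 3600.0)
composite = qphh_at_size(power, throughput)

# Halving the runtime of any single query improves Power by the same factor,
# 2 ** (1 / 24), regardless of which query it is.
power_after_tuning = power_at_size(100, [5.0] + [10.0] * 21, [10.0, 10.0])
```

Because Power uses a geometric mean, tuning a short query pays off as much as tuning a long one; Throughput, in contrast, rewards shaving absolute seconds off the longest queries.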
[Figure: Power run: Refresh function 1 (inserts into LINEITEM & ORDERS), then Query stream 00 (14, 2, 9, 20, 6 … 5, 7, 12), then Refresh function 2 (deletes from LINEITEM & ORDERS). Throughput run: Query Stream 01, Query Stream 02 … Query Stream N run in parallel with the refresh streams (N pairs of RF1 & RF2).]
Scale Factor  Number of streams (minimum)
100           5
300           6
1000          7
3000          8
10000         9
30000         10
100000        11
TPC-H Performance measurements
• Invest in tools to analyze plans. Some consider plan analysis an art; breaking the plan down into key metrics helps a lot.
• Capture enough information in the execution plan to unveil performance issues:
  • Estimated vs. actual number of rows, etc.
  • Amount of data spilled per disk
  • Rows touched vs. rows qualified during scan
  • Logical vs. physical reads
  • CPU & memory consumed per plan operator
  • Skew in number of rows processed per thread per operator
• Instrument the code to provide cycles per row for key scenarios:
  • Scan
  • Aggregate
  • Join
[Figure: performance cycle: Set performance goals, Measure performance, Start looking at SMP & MPP plans, Check CPU & IO utilization, Fix performance issues, Repeat]
TPC-H Performance measurements
• Scalability within a single server
  • Vary the number of processors
  • Vary the scale factor: 100 GB, 300 GB
  • Identify queries that don't scale linearly
  • Capture:
    • CPU & IO utilization per query, sampled at least once per second
    • Hot functions and waits, if any
    • CPI, ideally per function
    • Execution plans
  • Get busy crunching the data
• Scalability across multiple servers
  • Vary the number of servers in the system
  • Vary the amount of data per server
  • Capture:
    • CPU, disk & network IO
    • Distributed plans
  • Look for queries that have excessive cross-node traffic
  • Identify suboptimal plans where predicates/aggregates are not pushed down
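One way to identify queries that don't scale linearly is to compare the achieved speedup against the ideal one. The sketch below is only an illustration: the timings, the 0.8 threshold and the function names are all hypothetical.

```python
def scaling_efficiency(baseline_seconds, scaled_seconds, resource_ratio):
    """Fraction of the ideal speedup achieved when resources grow by resource_ratio."""
    speedup = baseline_seconds / scaled_seconds
    return speedup / resource_ratio


def non_linear_queries(baseline, scaled, resource_ratio, threshold=0.8):
    """Names of queries whose scaling efficiency falls below the threshold."""
    return sorted(
        name
        for name in baseline
        if scaling_efficiency(baseline[name], scaled[name], resource_ratio) < threshold
    )


# Hypothetical elapsed times in seconds at 8 cores and at 16 cores.
at_8_cores = {"Q1": 80.0, "Q9": 120.0, "Q18": 100.0}
at_16_cores = {"Q1": 41.0, "Q9": 100.0, "Q18": 70.0}

suspects = non_linear_queries(at_8_cores, at_16_cores, resource_ratio=2.0)
```

The same calculation applies across servers by using the ratio of node counts as `resource_ratio`; the queries it flags are the ones whose plans, hot functions and waits are worth inspecting first.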
[Figure: SMP scaling, data scaling and MPP scaling lead to a more focused performance effort]
Partner engagements
• Can be considered one of the secret sauces for high-performing software
• Partners (HW/infrastructure) tend to have a vested interest in showcasing the performance and scalability of their products.
• Allows software companies to leverage HW expertise and gain access to low-level tools that are not publicly available (through NDA).
• Partners occasionally provide HW for performance benchmarks, prototype evaluation and release publications
• Partners can be a great help with:
  • Providing low-level analysis
  • Collaborating on publications, benchmarks, proofs of concept, etc.
  • Providing HW for performance testing, evaluation and improvement (large-scale experiments are expensive)
Partner engagements
• NVRAM: Random-access memory that retains its information when power is turned off (non-volatile). This is in contrast to dynamic random-access memory (DRAM), which loses its contents without power.
• "Promises":
  • Latency within the same order of magnitude as DRAM
  • Cheaper than SSDs
  • 10+ TB of NVRAM in a 2-socket system within the next 4 years
• Still in prototype phase
• Could eliminate the need for spinning disks or SSDs altogether
• In-memory databases are likely to be early adopters of such technology
• Good reading:
  • http://research.microsoft.com/en-us/events/trios/trios13-final5.pdf
  • http://www.hpl.hp.com/techreports/2013/HPL-2013-78R1.pdf
Partner engagements
Diablo Technologies: SSD in a DRAM slot
DIMM capacities of 200 GB & 400 GB; the technology is rebranded by IBM and is VMware Ready
http://www.diablo-technologies.com/
TPC-H where is it today
• Why do benchmarks?
  • Stimulate technological advancements
• Why TPC-H?
  • It introduces a set of technological challenges whose resolution will significantly improve the performance of the product
• As a benchmark, is it relevant to current DW applications?
  • Gartner Magic Quadrant references:
    "Vectorwise delivered leading 1TB non-clustered TPC Benchmark H (TPC-H) results in 2012"
• Big players are Oracle, Vectorwise, Microsoft, Exasol and ParAccel
• The most significant innovation came from:
  • Kickfire (acquired by Teradata): FPGA-based "Query Processor Module" with an instruction set tuned for database operations
  • ParAccel (acquired by Actian): shared-nothing architecture with a columnar orientation, adaptive compression, memory-centric design
  • Exasol: column-oriented storage and proprietary in-memory compression methods; the database also optimizes itself automatically (creates indexes and stats, distributes tables, etc.)
• So where does it come in handy?
  • Identify system bottlenecks
  • Push performance-focused features into the product
  • The TPC-H schema is heavily used for ETL and virtualization benchmarks
  • Introduces lots of interesting challenges to the DBMS
• What about TPC-DS? It has a more realistic ETL process and a snowflake schema, but no one has published a TPC-DS benchmark yet
TPC-H where is it today
• Number of publications is on the decline

Number of TPC-H publications per year
Year                     1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013
Number of publications      9     1     5    12    31    15    42    31    20    13    15    10    20     5     6

• First cloud-based benchmark? When will we see this?
TPC-H challenges : Aggregation
• Almost all TPC-H queries do aggregation
• Unless there is a sorted index (B-tree) on the group-by column, aggregating in a hash table makes more sense than ordered aggregation
• Correctly sizing the hash table dictates performance
  • If the cardinality estimate (CE) underestimates the number of distinct values, lots of chaining occurs and the hash table can eventually spill to disk.
  • If the CE overestimates, resources are not used optimally
• For a low distinct count, building a hash table per thread (local) and then doing a global aggregation improves performance
• For a small group by on strings, represent the group-by expressions as integers (index in an array) as opposed to using a hash table (reduces cache footprint)
• For a group by on a primary key (C_CUSTKEY) there is no need to include other columns from CUSTOMER in the hash table
• The main benefit of PK/FK constraints is aggregate optimizations
• Queries sensitive to aggregation performance:
  • 1, 3, 4, 10, 13, 18, 20, 21
TPC-H challenges : Aggregation
Q1
• Reduces 6 billion rows to 4
• Sensitive to string matching
• Benefits from doing local aggregation
Q10
• Group by on most CUSTOMER columns
• If a PK on C_CUSTKEY exists, could use C_CUSTKEY for aggregation
• Further optimization: push down of the aggregate on O_CUSTKEY and TOP
Q18
• Group by on L_ORDERKEY results in 1.5 billion rows (4x reduction)
• Local aggregation usually hurts performance
• Hash table for aggregation alone can take 25GB of RAM
TPC-H challenges : Joins
• Select a schema which leverages locality
  Example: ORDERS x LINEITEM on L_ORDERKEY = O_ORDERKEY by hash partitioning on ORDERKEY
• Q5, Q9 and Q18 can spill and perform badly if the correct plan is not picked
• Q9 will cause over-the-network communication on MPP systems unless PARTSUPP, PART and SUPPLIER are replicated, which is not feasible for large scale factors
• TPC-H joins are highly selective, hence efficient Bloom filters are necessary
• Simplistic guide: find the most selective filter/aggregation and start there
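To make the Bloom filter point concrete, here is a minimal sketch of a filter built on the join keys that survive a selective predicate on the build side and applied to the probe side before the join. The sizes, the hash construction and the key values are illustrative assumptions, not a description of any particular engine.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: no false negatives, a tunable rate of false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(f"{seed}:{key}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


# Build side: order keys that survive a selective filter on ORDERS (hypothetical).
qualifying_orderkeys = [3, 17, 42]
bloom = BloomFilter()
for key in qualifying_orderkeys:
    bloom.add(key)

# Probe side: LINEITEM order keys; rows that cannot match are dropped during the scan.
probe_keys = list(range(1, 101))
survivors = [key for key in probe_keys if bloom.might_contain(key)]
```

Only the survivors need to reach the hash join (or the network, in an MPP plan); the occasional false positive is removed by the join itself.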
TPC-H challenges : Expression evaluation
Arithmetic operation performance
• Store decimals as integers and save some bits (19123 vs. 191.23)
• Rebase some of the columns to use fewer bits
• Keep data in the most compact form to best exploit SIMD instructions
Detecting common subexpressions
  sum(l_extendedprice) as sum_base_price,
  sum(l_extendedprice*(1-l_discount)) as sum_disc_price,
  sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
Expression filter push down (Q7, Q19)
• Q7: take the superset or UNION of the filters and push it down to the scan
• Q19: take the union of the individual predicates
Column projection vs. expression evaluation
• Cardinality estimates should help decide whether to project columns A & B or (A * (1 - B)) before a filter on C
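The first two ideas combine naturally on the Q1 expressions. The sketch below assumes prices are stored as integer cents and discount/tax as integer hundredths (so 191.23 is stored as 19123), and computes the shared subexpression once per row; the row values and the function name are hypothetical.

```python
from decimal import Decimal

# (l_extendedprice in cents, l_discount in hundredths, l_tax in hundredths)
rows = [(19123, 5, 8), (10000, 10, 0)]


def q1_sums(rows):
    """Integer-only evaluation of three Q1 sums, rescaled once at the very end."""
    sum_base = sum_disc = sum_charge = 0
    for price, discount, tax in rows:
        disc_price = price * (100 - discount)      # common subexpression, scale 10^4
        sum_base += price                          # scale 10^2
        sum_disc += disc_price
        sum_charge += disc_price * (100 + tax)     # reuses disc_price, scale 10^6
    return (
        Decimal(sum_base) / 100,
        Decimal(sum_disc) / 10**4,
        Decimal(sum_charge) / 10**6,
    )


sum_base_price, sum_disc_price, sum_charge = q1_sums(rows)
```

All per-row work is integer multiplication and addition, which is the form that packs best into SIMD lanes; the conversion back to a decimal happens once per group rather than once per row.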
TPC-H challenges : Correlated subqueries
• Push down predicates into the subquery when applicable
• When subqueries are flattened, batch processing outperforms row-by-row processing
• Buffer overlapped intermediate results
• Partial query reuse
• Challenging for MPP systems (don't redistribute or shuffle the same data twice)
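The difference between row-by-row evaluation and a flattened subquery can be shown on a Q2-style "minimum supply cost per part" condition. This is a toy sketch with hypothetical data: the first function re-evaluates the subquery for every outer row, the second computes the aggregate once per part in a single pass and then joins against it.

```python
# (ps_partkey, ps_suppkey, ps_supplycost in cents); hypothetical rows.
partsupp = [(1, 10, 500), (1, 11, 350), (2, 10, 700), (2, 12, 900)]


def correlated(rows):
    """Row by row: the inner minimum is recomputed for every outer row."""
    result = []
    for partkey, suppkey, cost in rows:
        min_cost = min(c for p, _, c in rows if p == partkey)  # full re-scan
        if cost == min_cost:
            result.append((partkey, suppkey))
    return result


def flattened(rows):
    """Flattened: one grouped pass builds min cost per part, then a join-like filter."""
    min_cost = {}
    for partkey, _, cost in rows:
        min_cost[partkey] = min(cost, min_cost.get(partkey, cost))
    return [(p, s) for p, s, c in rows if c == min_cost[p]]


cheapest_row_by_row = correlated(partsupp)
cheapest_flattened = flattened(partsupp)
```

Both return the same rows, but the flattened form touches the inner table once instead of once per outer row, and on an MPP system the grouped intermediate result can be buffered and reused rather than shuffled again.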
TPC-H challenges : Parallelism and concurrency
• Current two-socket (2P) servers have 48+ cores, 0.5+ TB of RAM and 10+ GB/sec of disk IO bandwidth; this means that the engine needs to provide meaningful scaling within a single box
• Further sub-partitioning data on a single server alleviates single-server scaling problems
• TPC-H queries tend to use lots of workspace memory for joins and aggregations.
• Precise and dynamic memory allocation keeps queries from spilling to disk under high concurrency
TPC-H challenges : Scan performance
• Disk read performance is crucial; validate that the IO subsystem is used efficiently whenever the system is not CPU bound.
• The ability to filter out pages or segments from the scan is crucial
• In-memory scan performance can be increased by decreasing the search scope, and thereby the amount of data that needs to be streamed from main memory to the CPU
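Filtering out whole segments is commonly done by keeping a minimum and maximum value per segment. The sketch below illustrates that general technique under the assumption that the column is roughly ordered (for example a ship date stored as a day number); the `Segment` type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    min_value: int
    max_value: int
    values: list


def build_segments(values, segment_size):
    """Split a column into fixed-size segments and record min/max for each."""
    segments = []
    for start in range(0, len(values), segment_size):
        chunk = values[start:start + segment_size]
        segments.append(Segment(min(chunk), max(chunk), chunk))
    return segments


def scan_less_equal(segments, bound):
    """Evaluate `value <= bound`, skipping segments whose minimum already exceeds it."""
    matches, segments_scanned = [], 0
    for segment in segments:
        if segment.min_value > bound:
            continue  # whole segment eliminated without reading its values
        segments_scanned += 1
        matches.extend(v for v in segment.values if v <= bound)
    return matches, segments_scanned


# Hypothetical column of day numbers, stored in load order.
column = list(range(100))
segments = build_segments(column, segment_size=25)
matches, segments_scanned = scan_less_equal(segments, bound=30)
```

Here only two of the four segments are read; the benefit depends on how well the physical order of the data correlates with the filtered column.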
TPC-H challenges : Scan performance
• Store dictionaries in sorted order or in a BST, which makes it possible to:
  • Compress the filter or predicate and do a numeric comparison, as opposed to decompressing and matching on strings
  • Quickly validate whether the value exists in the segment
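A sorted dictionary lets the engine translate a string predicate into a comparison on integer codes. The following sketch assumes a simple per-segment dictionary and uses Python's `bisect` module for the lookup; the column values and helper names are made up for illustration.

```python
import bisect


def build_dictionary(values):
    """Return the sorted dictionary and the column re-encoded as integer codes."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in values]


def encode_equals(dictionary, literal):
    """Code for `column = literal`, or None when the value is not in the segment."""
    pos = bisect.bisect_left(dictionary, literal)
    if pos < len(dictionary) and dictionary[pos] == literal:
        return pos
    return None


def encode_less_equal(dictionary, literal):
    """Largest code whose value is <= literal (-1 when nothing qualifies)."""
    return bisect.bisect_right(dictionary, literal) - 1


# Hypothetical l_shipmode values for one segment.
column = ["MAIL", "SHIP", "AIR", "MAIL", "RAIL", "AIR"]
dictionary, codes = build_dictionary(column)

mail_code = encode_equals(dictionary, "MAIL")
mail_rows = [row for row, code in enumerate(codes) if code == mail_code]

# A value missing from the dictionary means the whole segment can be skipped.
truck_code = encode_equals(dictionary, "TRUCK")

# Because the dictionary is sorted, range predicates also map to code ranges.
upper_code = encode_less_equal(dictionary, "N")
rows_up_to_n = [row for row, code in enumerate(codes) if code <= upper_code]
```

The scan itself then compares small integers only, and the strings are never decompressed for rows that do not qualify.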
TPC-H challenges : Scan performance
• What do we do for highly selective filters?
  • Implement paged indexes for columns of interest
  • Partition a column into pages and store bitmap indices for each compressed value; the bits reflect which rows have the respective value. Instead of scanning the entire segment for the matching rows, we only read the blocks that have matching values (bits set).
  • http://db.disi.unitn.eu/pages/VLDBProgram/pdf/IMDM/paper2.pdf
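One possible reading of the paged index described above is sketched below: the encoded column is split into pages, and each page keeps one bitmap per dictionary code marking the rows that hold it. This is an interpretation for illustration, not the design from the linked paper; the page size, data and function names are assumptions.

```python
def build_paged_bitmaps(codes, page_size):
    """For every page, map each code to an integer bitmap of row offsets in that page."""
    pages = []
    for start in range(0, len(codes), page_size):
        bitmaps = {}
        for offset, code in enumerate(codes[start:start + page_size]):
            bitmaps[code] = bitmaps.get(code, 0) | (1 << offset)
        pages.append(bitmaps)
    return pages


def lookup(pages, page_size, code):
    """Row ids holding `code`; pages without a bitmap for the code are never read."""
    rows = []
    for page_no, bitmaps in enumerate(pages):
        bits = bitmaps.get(code, 0)
        offset = 0
        while bits:
            if bits & 1:
                rows.append(page_no * page_size + offset)
            bits >>= 1
            offset += 1
    return rows


# Hypothetical dictionary-encoded column, two pages of four rows each.
codes = [0, 1, 0, 2, 2, 2, 1, 0]
pages = build_paged_bitmaps(codes, page_size=4)

rows_with_code_0 = lookup(pages, 4, 0)
rows_with_code_2 = lookup(pages, 4, 2)
rows_with_code_9 = lookup(pages, 4, 9)
```

For a highly selective predicate most pages have no bitmap for the requested code, so the lookup touches only a small part of the segment.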
TPC-H challenges : Intermediate steps in MPP
• In MPP a single SQL statement results in multiple SQL statements that get executed locally on each node
• Some TPC-DS queries can result in 20+ SQL statements that need to be executed locally on each leaf node
• Streaming of data should result in better performance, but there are cases where this strategy fails.
• Placing data on disk after each step allows the query optimizer to reevaluate the plan
TPC-H challenges : Improving join performance for incompatible joins
• Query:
  select count(*) from PART, PARTSUPP, LINEITEM where P_BRAND = 'NIKE'
  and PS_COMMENT like '%bla%' and P_PARTKEY = PS_PARTKEY and
  L_PARTKEY = PS_PARTKEY group by P_BRAND
• Schema:
  • PART distributed on P_PARTKEY
  • PARTSUPP distributed on PS_PARTKEY
  • LINEITEM distributed on L_ORDERKEY
• Create Bloom filter BF1 on PART, push the filter to PARTSUPP and create BF2, replicate the Bloom filter to all leaf nodes, apply it to LINEITEM and only shuffle the qualifying rows
• The optimizer should choose between semi-join reduction and replicating PART x PARTSUPP
• Multiple copies of a set of columns, distributed differently, can improve performance in such cases, but at a high cost.
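The semi-join reduction for this query can be traced on a toy data set. In the sketch below exact Python sets stand in for the Bloom filters BF1 and BF2 (a real Bloom filter would also let a few false positives through), and all rows are hypothetical.

```python
# (p_partkey, p_brand)
part = [(1, "NIKE"), (2, "ACME"), (3, "NIKE"), (4, "NIKE")]
# (ps_partkey, ps_comment)
partsupp = [(1, "bla bla"), (2, "bla"), (3, "ok"), (4, "some bla here")]
# l_partkey of each LINEITEM row; LINEITEM is distributed on L_ORDERKEY,
# so joining on L_PARTKEY requires moving rows between nodes.
lineitem_partkeys = [1, 1, 2, 3, 4, 4, 4, 5]

# BF1: part keys that pass the filter on PART.
bf1 = {partkey for partkey, brand in part if brand == "NIKE"}

# BF2: PARTSUPP keys that pass both BF1 and the PS_COMMENT filter. PART and
# PARTSUPP are both distributed on the part key, so this step is local.
bf2 = {partkey for partkey, comment in partsupp if partkey in bf1 and "bla" in comment}

# BF2 is replicated to every leaf node and applied during the LINEITEM scan;
# only qualifying rows are shuffled.
shuffled_rows = [partkey for partkey in lineitem_partkeys if partkey in bf2]

rows_without_reduction = len(lineitem_partkeys)
rows_with_reduction = len(shuffled_rows)
```

The alternative plan, replicating the result of PART x PARTSUPP to every node, avoids moving LINEITEM at all, which is why the optimizer has to weigh the two options against each other.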
Looking ahead
• SQL to MapReduce jobs? Crunching data in a relational database is always faster than in Hadoop: bring data from Hadoop into a columnar format and perform analytics with efficiently generated code
• Full integration with analytics tools such as SAS, R, Tableau, Excel, etc.
• Support PL/SQL syntax (Oracle compete)
• Eliminate the aggregating node to reduce system cost for a small number of nodes; Exasol does it.
Competitive analysis
TPC-H Q1 analysis: Sec/GB/Thread (lower is better)

System          Size   Threads  Processors   Sec/GB/Thread
Exasol          1TB    240      20           1.4
Exasol          1TB    768      64           1.5
Exasol          3TB    960      80           1.5
MemSQL          83GB   480      40 sockets   46.7
MS SQL Server   10TB   160      8            8.1
Oracle 11c      10TB   512      4            40.7

Assuming all processors have the same speed!
References:
• http://www.tpc.org/tpch/results/tpch_perf_results.asp
• http://www.esg-global.com/lab-reports/memsqle28099s-distributed-in-memory-database/
Appendix
• GMQ 2013
  http://www.gartner.com/technology/reprints.do?id=1-1DU2VD4&ct=130131&st=sb
• GMQ 2014
  http://www.gartner.com/technology/reprints.do?id=1-1M9YEHW&ct=131028&st=sb
TPC-H column store
• Avoid virtual function calls and branching; use templates
• Scan usually dominates the CPU profile
• Vector/batch processing is a must
• When done correctly, the code is very sensitive to branching and data dependencies; exploit instruction-level parallelism when possible
• Use SIMD instructions; leverage existing libraries to encapsulate the complexity of SSE instructions
    #include "vectorclass.h"
    // define and initialize integer vectors a and b
    Vec4i a(10,11,12,13);
    Vec4i b(20,21,22,23);
    // add the two vectors
    Vec4i c = a + b;
• http://www.agner.org/optimize/vectorclass.pdf
TPC-H Plans
• Behold the power of the optimizer
• If the plan is wrong you are doomed…
• Very good read for TPC-H Q8:
  http://www.slideshare.net/GraySystemsLab/pass-summit-2010-keynote-david-dewitt
JSON documents
• Most efficient way to store JSON documents
• Great compression and quick retrieval, ask me how to ….
Q1
Challenges
• Used as a benchmark for computational power
• Arithmetic operation performance
• Aggregating to the same hash buckets
• Common subexpression pattern matching
• Scan performance sensitive
• String matching for aggregation (could do the matching on the compressed format)
select l_returnflag, l_linestatus,
       sum(l_quantity) as sum_qty,
       sum(l_extendedprice) as sum_base_price,
       sum(l_extendedprice*(1-l_discount)) as sum_disc_price,
       sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
       avg(l_quantity) as avg_qty,
       avg(l_extendedprice) as avg_price,
       avg(l_discount) as avg_disc,
       count(*) as count_order
from lineitem
where l_shipdate <= date '1998-12-01' - interval '[DELTA]' day (3)
group by l_returnflag, l_linestatus
order by l_returnflag, l_linestatus;
Q2
Challenges
• Correlated subquery
• Push down of predicates to the correlated subquery
• Highly selective (segment size plays a big role)
• Tricky to generate an optimal plan
• Plan performance varies a lot depending on which tables are partitioned and which are replicated.
select s_acctbal, s_name, n_name, p_partkey,
       p_mfgr, s_address, s_phone, s_comment
from part, supplier, partsupp, nation, region
where p_partkey = ps_partkey
  and s_suppkey = ps_suppkey
  and p_size = [SIZE]
  and p_type like '%[TYPE]'
  and s_nationkey = n_nationkey
  and n_regionkey = r_regionkey
  and r_name = '[REGION]'
  and ps_supplycost = (
      select min(ps_supplycost)
      from partsupp, supplier, nation, region
      where p_partkey = ps_partkey
        and s_suppkey = ps_suppkey
        and s_nationkey = n_nationkey
        and n_regionkey = r_regionkey
        and r_name = '[REGION]')
order by s_acctbal desc, n_name, s_name, p_partkey;
Q3
Challenges
• Collocated join between ORDERS & LINEITEM
• Detect the correlation between l_shipdate and o_orderdate
• Bitmap filters on LINEITEM are necessary
• Replicating (select c_custkey from customer where c_mktsegment = '[SEGMENT]')
select TOP 10 l_orderkey,
       sum(l_extendedprice*(1-l_discount)) as revenue,
       o_orderdate, o_shippriority
from customer, orders, lineitem
where c_mktsegment = '[SEGMENT]'
  and c_custkey = o_custkey
  and l_orderkey = o_orderkey
  and o_orderdate < date '[DATE]'
  and l_shipdate > date '[DATE]'
group by l_orderkey, o_orderdate, o_shippriority
order by revenue desc, o_orderdate;

More Related Content

What's hot

Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2 Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2 Databricks
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at NetflixBrendan Gregg
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Databricks
 
Apache Tez – Present and Future
Apache Tez – Present and FutureApache Tez – Present and Future
Apache Tez – Present and FutureDataWorks Summit
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotFlink Forward
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesDatabricks
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Databricks
 
Kafka High Availability in multi data center setup with floating Observers wi...
Kafka High Availability in multi data center setup with floating Observers wi...Kafka High Availability in multi data center setup with floating Observers wi...
Kafka High Availability in multi data center setup with floating Observers wi...HostedbyConfluent
 
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaHadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaCloudera, Inc.
 
Etsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureEtsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureDan McKinley
 
Apache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS SessionApache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS SessionWes McKinney
 
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016DataStax
 
A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controllerconfluent
 
Kafka 101 and Developer Best Practices
Kafka 101 and Developer Best PracticesKafka 101 and Developer Best Practices
Kafka 101 and Developer Best Practicesconfluent
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilDatabricks
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta LakeDatabricks
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeFlink Forward
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guideRyan Blue
 

What's hot (20)

Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2 Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2
 
Node Labels in YARN
Node Labels in YARNNode Labels in YARN
Node Labels in YARN
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at Netflix
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
 
Apache Tez – Present and Future
Apache Tez – Present and FutureApache Tez – Present and Future
Apache Tez – Present and Future
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
 
Kafka High Availability in multi data center setup with floating Observers wi...
Kafka High Availability in multi data center setup with floating Observers wi...Kafka High Availability in multi data center setup with floating Observers wi...
Kafka High Availability in multi data center setup with floating Observers wi...
 
Spark streaming + kafka 0.10
Spark streaming + kafka 0.10Spark streaming + kafka 0.10
Spark streaming + kafka 0.10
 
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaHadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
 
Etsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureEtsy Activity Feeds Architecture
Etsy Activity Feeds Architecture
 
Apache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS SessionApache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS Session
 
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016
Cassandra at Instagram 2016 (Dikang Gu, Facebook) | Cassandra Summit 2016
 
A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controller
 
Kafka 101 and Developer Best Practices
Kafka 101 and Developer Best PracticesKafka 101 and Developer Best Practices
Kafka 101 and Developer Best Practices
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas Patil
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta Lake
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 

Viewers also liked

PASS Summit 2010 Keynote David DeWitt
PASS Summit 2010 Keynote David DeWittPASS Summit 2010 Keynote David DeWitt
PASS Summit 2010 Keynote David DeWittGraySystemsLab
 
Hug meetup impala 2.5 performance overview
Hug meetup impala 2.5 performance overviewHug meetup impala 2.5 performance overview
Hug meetup impala 2.5 performance overviewMostafa Mokhtar
 
Features of Saint Lucia In Travel + Leisure Travel Guide Plus More For 2016
Features of Saint Lucia In Travel + Leisure Travel Guide Plus More For 2016Features of Saint Lucia In Travel + Leisure Travel Guide Plus More For 2016
Features of Saint Lucia In Travel + Leisure Travel Guide Plus More For 2016Saint Lucia Tourist Board
 
2014 - a Year for Zeal (Part 3)
2014 - a Year for Zeal (Part 3)2014 - a Year for Zeal (Part 3)
2014 - a Year for Zeal (Part 3)Gary V Carter
 
Wholesale Plastic Shopping Bags
Wholesale Plastic Shopping Bags Wholesale Plastic Shopping Bags
Wholesale Plastic Shopping Bags Plastic Bag Source
 
ข้อสอบ O net 56- การงานฯ (มัธยมปลาย)
ข้อสอบ O net 56- การงานฯ (มัธยมปลาย)ข้อสอบ O net 56- การงานฯ (มัธยมปลาย)
ข้อสอบ O net 56- การงานฯ (มัธยมปลาย)Jadsada Surintun
 
Integrated Geophysical and Geochemical Investigations of Saline Water Intrusi...
Integrated Geophysical and Geochemical Investigations of Saline Water Intrusi...Integrated Geophysical and Geochemical Investigations of Saline Water Intrusi...
Integrated Geophysical and Geochemical Investigations of Saline Water Intrusi...David Oyeyemi
 
SocialMedia_Hoyt
SocialMedia_Hoyt SocialMedia_Hoyt
SocialMedia_Hoyt jhoyt88
 
Project highway
Project highwayProject highway
Project highwaygraphic02
 
Rachael melton personal brand ppt
Rachael melton personal brand pptRachael melton personal brand ppt
Rachael melton personal brand pptRachael Melton
 
Am revue hebdo-16022013
Am revue hebdo-16022013Am revue hebdo-16022013
Am revue hebdo-16022013Romuald YONGA
 
Technology and agriculture
Technology and agricultureTechnology and agriculture
Technology and agricultureSohail_Ilyas
 
Bible study - the Bible unpacked (in-depth edition)
Bible study -  the Bible unpacked (in-depth edition)Bible study -  the Bible unpacked (in-depth edition)
Bible study - the Bible unpacked (in-depth edition)Roger Kyaw Swar Phone Maung
 
BitBox MVP Presentation
BitBox MVP PresentationBitBox MVP Presentation
BitBox MVP Presentationamwelch
 
【プレビュー版】コンテンツマーケティング実践ガイド
【プレビュー版】コンテンツマーケティング実践ガイド【プレビュー版】コンテンツマーケティング実践ガイド
【プレビュー版】コンテンツマーケティング実践ガイドワンマーケティング株式会社
 
Octaplex guidelines
Octaplex guidelinesOctaplex guidelines
Octaplex guidelinesEmil Pacheco
 

Viewers also liked (20)

PASS Summit 2010 Keynote David DeWitt
PASS Summit 2010 Keynote David DeWittPASS Summit 2010 Keynote David DeWitt
PASS Summit 2010 Keynote David DeWitt
 
Hug meetup impala 2.5 performance overview
Hug meetup impala 2.5 performance overviewHug meetup impala 2.5 performance overview
Hug meetup impala 2.5 performance overview
 
Features of Saint Lucia In Travel + Leisure Travel Guide Plus More For 2016
Features of Saint Lucia In Travel + Leisure Travel Guide Plus More For 2016Features of Saint Lucia In Travel + Leisure Travel Guide Plus More For 2016
Features of Saint Lucia In Travel + Leisure Travel Guide Plus More For 2016
 
Slide03
Slide03Slide03
Slide03
 
2014 - a Year for Zeal (Part 3)
2014 - a Year for Zeal (Part 3)2014 - a Year for Zeal (Part 3)
2014 - a Year for Zeal (Part 3)
 
Wholesale Plastic Shopping Bags
Wholesale Plastic Shopping Bags Wholesale Plastic Shopping Bags
Wholesale Plastic Shopping Bags
 
Produk Khusus Pria PT. ABE
Produk Khusus Pria PT. ABEProduk Khusus Pria PT. ABE
Produk Khusus Pria PT. ABE
 
ข้อสอบ O net 56- การงานฯ (มัธยมปลาย)
ข้อสอบ O net 56- การงานฯ (มัธยมปลาย)ข้อสอบ O net 56- การงานฯ (มัธยมปลาย)
ข้อสอบ O net 56- การงานฯ (มัธยมปลาย)
 
Integrated Geophysical and Geochemical Investigations of Saline Water Intrusi...
Integrated Geophysical and Geochemical Investigations of Saline Water Intrusi...Integrated Geophysical and Geochemical Investigations of Saline Water Intrusi...
Integrated Geophysical and Geochemical Investigations of Saline Water Intrusi...
 
SocialMedia_Hoyt
SocialMedia_Hoyt SocialMedia_Hoyt
SocialMedia_Hoyt
 
Project highway
Project highwayProject highway
Project highway
 
Rachael melton personal brand ppt
Rachael melton personal brand pptRachael melton personal brand ppt
Rachael melton personal brand ppt
 
Am revue hebdo-16022013
Am revue hebdo-16022013Am revue hebdo-16022013
Am revue hebdo-16022013
 
Technology and agriculture
Technology and agricultureTechnology and agriculture
Technology and agriculture
 
Bible study - the Bible unpacked (in-depth edition)
Bible study -  the Bible unpacked (in-depth edition)Bible study -  the Bible unpacked (in-depth edition)
Bible study - the Bible unpacked (in-depth edition)
 
BitBox MVP Presentation
BitBox MVP PresentationBitBox MVP Presentation
BitBox MVP Presentation
 
【プレビュー版】コンテンツマーケティング実践ガイド
【プレビュー版】コンテンツマーケティング実践ガイド【プレビュー版】コンテンツマーケティング実践ガイド
【プレビュー版】コンテンツマーケティング実践ガイド
 
Produk Kesehatan Organ Intim Wanita
Produk Kesehatan Organ Intim WanitaProduk Kesehatan Organ Intim Wanita
Produk Kesehatan Organ Intim Wanita
 
Prince
PrincePrince
Prince
 
Octaplex guidelines
Octaplex guidelinesOctaplex guidelines
Octaplex guidelines
 

TPC-H Column Store and MPP systems

  • 1. TPC-H Performance MPP & Column Store
  • 2. What is TPC-H
      • The TPC Benchmark™ H (TPC-H) is a decision support benchmark. It consists of a suite of business-oriented ad-hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance while maintaining a sufficient degree of ease of implementation. This benchmark illustrates decision support systems that:
          • Examine large volumes of data;
          • Execute queries with a high degree of complexity;
          • Give answers to critical business questions.
      • The performance metric reported by TPC-H is called the TPC-H Composite Query-per-Hour Performance Metric (QphH@Size), and reflects multiple aspects of the capability of the system to process queries. These aspects include the selected database size against which the queries are executed, the query processing power when queries are submitted by a single stream, and the query throughput when queries are submitted by multiple concurrent users.
  • 3. Overview: TPC-H Schema overview; TPC-H Performance measurements; Partner engagement; TPC-H where is it today; TPC-H challenges; Looking ahead; Q&A
  • 4. TPC-H Schema overview: Relationships between columns
  • 5. TPC-H Schema overview: MPP data distribution
      Placement of key values across nodes:
      Table      Column      Node 1   Node 2   Node 3
      LINEITEM   ORDERKEY    1        2        3
                 PARTKEY     6        4        8
                 SUPPKEY     3        18       5
      ORDERS     ORDERKEY    1        2        3
                 CUSTKEY     4        2        9
      PARTSUPP   PARTKEY     1        2        3
                 SUPPKEY     4        5        6
      PART       PARTKEY     1        2        3
      CUSTOMER   CUSTKEY     1        2        3
      SUPPLIER   SUPPKEY     1..N     1..N     1..N
      NATION     NATIONKEY   1..N     1..N     1..N
      REGION     REGIONKEY   1..N     1..N     1..N
      Legend: Collocated; Over network data movement
      Distribution column per table:
      Table      Distribution column
      LINEITEM   L_ORDERKEY
      ORDERS     O_ORDERKEY
      PARTSUPP   PS_PARTKEY
      PART       P_PARTKEY
      CUSTOMER   C_CUSTKEY
      SUPPLIER   REPLICATED
      NATION     REPLICATED
      REGION     REPLICATED
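The distribution table on slide 5 can be turned into a small check for whether a join needs over-network data movement. This is only a sketch: the `DISTRIBUTION` mapping is copied from the slide, while `node_for` (a modulo placement rule) and `join_is_collocated` are hypothetical helpers, and the check assumes both tables use the same hash function and node count.

```python
# Distribution columns copied from the slide; None marks a replicated table.
DISTRIBUTION = {
    "LINEITEM": "L_ORDERKEY",
    "ORDERS": "O_ORDERKEY",
    "PARTSUPP": "PS_PARTKEY",
    "PART": "P_PARTKEY",
    "CUSTOMER": "C_CUSTKEY",
    "SUPPLIER": None,
    "NATION": None,
    "REGION": None,
}


def node_for(key, num_nodes):
    """Hypothetical placement rule: integer key modulo the node count."""
    return key % num_nodes


def join_is_collocated(left_table, left_col, right_table, right_col):
    """Return True when an equi-join can run without moving rows between nodes.

    Assumption: a join is local if either side is replicated, or if both
    join columns are the distribution columns of their tables.
    """
    left_dist = DISTRIBUTION[left_table]
    right_dist = DISTRIBUTION[right_table]
    if left_dist is None or right_dist is None:
        return True  # a full copy of the replicated table sits on every node
    return left_col == left_dist and right_col == right_dist


if __name__ == "__main__":
    print(join_is_collocated("LINEITEM", "L_ORDERKEY", "ORDERS", "O_ORDERKEY"))
    print(join_is_collocated("LINEITEM", "L_PARTKEY", "PART", "P_PARTKEY"))
```

Under these assumptions LINEITEM x ORDERS on ORDERKEY stays local, while LINEITEM x PART on PARTKEY needs a shuffle because LINEITEM is distributed on L_ORDERKEY.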
  • 6. TPC-H Schema: Metrics
      • Power:
          • Run order:
              • RF1 (inserts into LINEITEM and ORDERS)
              • 22 read-only queries
              • RF2 (deletes from LINEITEM & ORDERS)
          • Metric:
              • Query-per-hour rate
              • TPC-H Power@Size = 3600 * SF / Geomean(22 queries, RF1, RF2)
              • Geometric mean of all query results in a run
              • A performance improvement to any query improves the metric equally
      • Throughput:
          • Run order:
              • N concurrent Power query streams with different parameters
              • N RF1 & RF2 streams; these can be run in parallel with the concurrent streams above, or after them
          • Metric:
              • Ratio of the total number of queries executed over the length of the measurement interval
              • TPC-H Throughput@Size = (S * 22 * 3600) / Ts * SF
              • Absolute runtime matters; optimizing the longest-running query helps
      • [Diagram: Power run = Refresh function 1 (inserts into LINEITEM & ORDERS), query stream 00 (queries in order 14, 2, 9, 20, 6 … 5, 7, 12), Refresh function 2 (deletes from LINEITEM & ORDERS). Throughput run = query streams 01 … N run in parallel with refresh streams of N pairs of RF1 & RF2.]
      • Number of streams per scale factor:
          Scale Factor   Number of streams
          100            5
          300            6
          1000           7
          3000           8
          10000          9
          30000          10
          100000         11
  • 7. Outline: TPC-H Schema overview; TPC-H Performance measurements; Partner engagement; TPC-H where is it today; TPC-H challenges; Looking ahead; Q&A
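The two formulas on slide 6 can be checked with a short calculation. The sketch below follows the slide's formulas as written; the function names and the sample timings are assumptions made for illustration, not published results.

```python
import math


def power_at_size(scale_factor, query_seconds, rf1_seconds, rf2_seconds):
    """Power@Size = 3600 * SF / geomean(22 query times, RF1, RF2)."""
    if len(query_seconds) != 22:
        raise ValueError("the power run has exactly 22 queries")
    timings = list(query_seconds) + [rf1_seconds, rf2_seconds]
    # Geometric mean computed through logarithms to avoid overflow.
    geomean = math.exp(sum(math.log(t) for t in timings) / len(timings))
    return 3600.0 * scale_factor / geomean


def throughput_at_size(scale_factor, streams, interval_seconds):
    """Throughput@Size = (S * 22 * 3600) / Ts * SF."""
    return (streams * 22 * 3600.0) / interval_seconds * scale_factor


if __name__ == "__main__":
    # Made-up timings: every query and refresh function takes 10 seconds.
    baseline = power_at_size(100, [10.0] * 22, 10.0, 10.0)
    # Halving any single query time has the same effect on the metric.
    faster_q1 = power_at_size(100, [5.0] + [10.0] * 21, 10.0, 10.0)
    print(baseline, faster_q1 / baseline)
    print(throughput_at_size(100, 5, 3600.0))
```

Because Power uses a geometric mean over 24 timings, halving any one of them multiplies the metric by 2^(1/24), which is why every query counts equally. Throughput depends only on the total elapsed time Ts, so the longest-running queries dominate.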
  • 8. TPC-H Performance measurements
      • Invest in tools to analyze plans. Some consider plan analysis an art; breaking the plan down into key metrics helps a lot.
      • Capture enough information in the execution plan to unveil performance issues:
          • Estimated vs. actual number of rows, etc.
          • Amount of data spilled per disk
          • Rows touched vs. rows qualified during scan
          • Logical vs. physical reads
          • CPU & memory consumed per plan operator
          • Skew in number of rows processed per thread per operator
      • Instrument the code to provide cycles per row for key scenarios:
          • Scan
          • Aggregate
          • Join
      • [Cycle diagram: Set performance goals → Measure performance → Start looking at SMP & MPP plans → Check CPU & IO utilization → Fix performance issues → Repeat]
  • 9. TPC-H Performance measurements
      • Scalability within a single server
          • Vary the number of processors
          • Vary the scale factor: 100G, 300G
          • Identify queries that don't have linear scaling
          • Capture:
              • CPU & IO utilization per query with at least a 1-second sampling rate
              • Hot functions and waits, if any
              • CPI, ideally per function
              • Execution plans
          • Get busy crunching the data
      • Scalability across multiple servers
          • Vary the number of servers in the system
          • Vary the amount of data per server
          • Capture:
              • CPU, disk & network IO
              • Distributed plans
          • Look for queries that have excessive cross-node traffic
          • Identify suboptimal plans where predicates/aggregates are not pushed down
      • [Diagram labels: More focused performance effort; MPP scaling; Data scaling; SMP scaling]
  • 10. Outline: TPC-H Schema overview; TPC-H Performance measurements; Partner engagement; TPC-H where is it today; TPC-H challenges; Looking ahead; Q&A
  • 11. Partner engagements
      • Can be considered one of the secret sauces for high-performing software
      • Partners (HW/infrastructure) tend to have a vested interest in showcasing the performance and scalability of their products
      • Allows software companies to leverage HW expertise and gain access to low-level tools that are not publicly available (through NDA)
      • Partners occasionally provide HW for performance benchmarks, prototype evaluation and release publications
      • Partners can be a great help with:
          • Providing low-level analysis
          • Collaborating on publications, benchmarks, proofs of concept, etc.
          • Providing HW for performance testing, evaluation and improvement (large-scale experiments are expensive)
  • 12. Partner engagements
      • NVRAM: random-access memory that retains its information when power is turned off (non-volatile). This is in contrast to dynamic random-access memory (DRAM)
      • "Promises":
          • Latency within the same order of magnitude as DRAM
          • Cheaper than SSDs
          • +10TB of NVRAM in a 2-socket system within the next 4 years
      • Still in prototype phase
      • Could eliminate the need for spinning disks or SSDs altogether
      • In-memory databases are likely to be early adopters of such technology
      • Good reading:
          • http://research.microsoft.com/en-us/events/trios/trios13-final5.pdf
          • http://www.hpl.hp.com/techreports/2013/HPL-2013-78R1.pdf
  • 13. Partner engagements: Diablo Technologies, SSD in a DRAM slot. http://www.diablo-technologies.com/
  • 14. Partner engagements: Diablo Technologies, SSD in a DRAM slot. DIMM capacity of 200GB & 400GB; the technology is rebranded by IBM and is VMware Ready. http://www.diablo-technologies.com/
  • 15. Outline: TPC-H Schema overview; TPC-H Performance measurements; Partner engagement; TPC-H where is it today; TPC-H challenges; Looking ahead; Q&A
  • 16. TPC-H where is it today
      • Why do benchmarks?
          • Stimulate technological advancements
      • Why TPC-H?
          • It introduces a set of technological challenges whose resolution will significantly improve the performance of the product
      • As a benchmark, is it relevant to current DW applications?
          • Gartner Magic Quadrant references: "Vectorwise delivered leading 1TB non-clustered TPC Benchmark H (TPC-H) results in 2012"
          • Big players are Oracle, Vectorwise, Microsoft, Exasol and ParAccel
          • Most significant innovation came from:
              • Kickfire (acquired by Teradata): FPGA-based "Query Processor Module" with an instruction set tuned for database operations
              • ParAccel (acquired by Actian): shared-nothing architecture with a columnar orientation, adaptive compression, memory-centric design
              • Exasol: column-oriented storage and proprietary in-memory compression methods; the database also has automatic self-optimization (creates indexes and stats, distributes tables, etc.)
      • So where does it come in handy?
          • Identify system bottlenecks
          • Push performance-focused features into the product
          • The TPC-H schema is heavily used for ETL and virtualization benchmarks
          • Introduces lots of interesting challenges to the DBMS
      • What about TPC-DS? It has a more realistic ETL process and a snowflake schema, but no one has published a TPC-DS benchmark yet
  • 17. TPC-H where is it today
      • Number of publications is on the decline
      • [Chart: Number of TPC-H publications per year]
          Year   Number of publications
          1999   9
          2000   1
          2001   5
          2002   12
          2003   31
          2004   15
          2005   42
          2006   31
          2007   20
          2008   13
          2009   15
          2010   10
          2011   20
          2012   5
          2013   6
      • First cloud-based benchmark? When will we see this?
  • 18. Outline: TPC-H Schema overview; TPC-H Performance measurements; Partner engagement; TPC-H where is it today; TPC-H challenges; Looking ahead; Q&A
  • 19. TPC-H challenges: Aggregation
      • Almost all TPC-H queries do aggregation
      • Unless there is a sorted index (B-tree) on the group-by column, aggregating in a hash table makes more sense than ordered aggregation
      • Correctly sizing the hash table dictates performance
          • If cardinality estimation (CE) underestimates the number of distinct values, lots of chaining occurs and the hash table (HT) can eventually spill to disk
          • If CE overestimates, resources are not used optimally
      • For a low distinct count, building a hash table per thread (local) and then doing a global aggregation improves performance
      • For small group-bys on strings, represent the group-by expressions as integers (index in an array) as opposed to using a hash table (reduces cache footprint)
      • For a group-by on a primary key (C_CUSTKEY) there is no need to include other columns from CUSTOMER in the hash table
      • The main benefit from PK/FK constraints is aggregate optimizations
      • Queries sensitive to aggregation performance: 1, 3, 4, 10, 13, 18, 20, 21
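The local-then-global aggregation mentioned on slide 19 can be sketched as follows. This is an illustrative outline and not the deck's implementation: the partitioning into per-thread row lists, the function names and the sample rows are all assumptions, and Python threads stand in for engine worker threads.

```python
from concurrent.futures import ThreadPoolExecutor


def local_aggregate(rows):
    """Phase 1: each worker sums into its own private hash table."""
    partial = {}
    for key, value in rows:
        partial[key] = partial.get(key, 0) + value
    return partial


def two_phase_sum(partitions):
    """Phase 2: merge the small per-worker tables into one global result."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(local_aggregate, partitions))
    total = {}
    for partial in partials:
        for key, value in partial.items():
            total[key] = total.get(key, 0) + value
    return total


if __name__ == "__main__":
    # Q1-style keys (l_returnflag, l_linestatus) with a quantity to sum.
    partitions = [
        [(("A", "F"), 10), (("N", "O"), 5), (("A", "F"), 7)],
        [(("N", "O"), 3), (("R", "F"), 4)],
    ]
    print(two_phase_sum(partitions))
```

With few distinct groups the per-worker tables stay tiny and the merge is cheap. With many distinct groups each worker would build a table nearly as large as the final one, so the extra phase mostly adds work.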
  • 20. TPC-H challenges: Aggregation
      • Q1
          • Reduces 6 billion rows to 4
          • Sensitive to string matching
          • Benefits from doing local aggregation
      • Q10
          • Group-by on most CUSTOMER columns
          • If a PK on C_CUSTKEY exists, C_CUSTKEY alone could be used for aggregation
          • Further optimization: push down of the aggregate on O_CUSTKEY and of TOP
      • Q18
          • Group-by on L_ORDERKEY results in 1.5 billion rows (4x reduction)
          • Local aggregation usually hurts performance
          • The hash table for aggregation alone can take 25GB of RAM
  • 21. TPC-H challenges: Joins
      • Select a schema which leverages locality. Example: ORDERS x LINEITEM on L_ORDERKEY = O_ORDERKEY, by hash partitioning on ORDERKEY
      • Q5, Q9 and Q18 can spill and perform badly if the correct plan is not picked
      • Q9 will cause over-the-network communication on MPP systems unless PARTSUPP, PART and SUPPLIER are replicated, which is not feasible for large scale factors
      • TPC-H joins are highly selective, hence efficient Bloom filters are necessary
      • Simplistic guide: find the most selective filter/aggregation; this is where you start
  • 22. TPC-H challenges: Expression evaluation
      • Arithmetic operation performance
          • Store decimals as integers and save some bits: 19123 vs. 191.23
          • Rebase some of the columns to use fewer bits
          • Keep data in the most compact form to best exploit SIMD instructions
      • Detecting common subexpressions
          sum(l_extendedprice) as sum_base_price,
          sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
          sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
      • Expression filter push down (Q7, Q19)
          • Q7: take the superset or UNION of filters and push it down to the scan
          • Q19: take the union of individual predicates
      • Column projection vs. expression evaluation
          • Cardinality estimates should help decide whether to project columns A & B, or to evaluate (A * (1 - B)), before a filter on C
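Two of the ideas on slide 22, storing decimals as integers and reusing a common subexpression, can be combined in one small sketch of the Q1 sums. The scaling by 100 (cents and percent) and the function name `q1_sums` are assumptions chosen for illustration.

```python
def q1_sums(rows):
    """Compute the three Q1 price sums on scaled integers.

    Each row is (extendedprice_cents, discount_percent, tax_percent), so
    191.23 is stored as 19123 and 0.05 as 5. Results carry scales of
    10^2, 10^4 and 10^6 respectively.
    """
    sum_base = sum_disc = sum_charge = 0
    for price, discount, tax in rows:
        # Common subexpression: price * (1 - discount), computed once.
        disc_price = price * (100 - discount)
        sum_base += price
        sum_disc += disc_price
        sum_charge += disc_price * (100 + tax)
    return sum_base, sum_disc, sum_charge


if __name__ == "__main__":
    # 191.23 with a 5% discount and 8% tax.
    base, disc, charge = q1_sums([(19123, 5, 8)])
    print(base / 10**2, disc / 10**4, charge / 10**6)
```

The integer path avoids decimal arithmetic in the inner loop; the division back to a decimal value happens once, when the result is presented.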
  • 23. TPC-H challenges: Correlated subqueries
      • Push down predicates into the subquery when applicable
      • When subqueries are flattened, batch processing outperforms row-by-row processing
      • Buffer overlapped intermediate results
      • Partial query reuse
      • Challenging for MPP systems (don't redistribute or shuffle the same data twice)
  • 24. TPC-H challenges: Parallelism and concurrency
      • Current 2P servers have 48+ cores, 0.5+ TB of RAM and 10+ GB/sec of disk IO bandwidth, so within a single box the engine needs to provide meaningful scaling
      • Further sub-partitioning data on a single server alleviates single-server scaling problems
      • TPC-H queries tend to use lots of workspace memory for joins and aggregations
      • Precise and dynamic memory allocation keeps queries from spilling to disk under high concurrency
  • 25. TPC-H challenges: Scan performance
      • Disk read performance is crucial; validate that the IO subsystem is used efficiently when the system is not CPU bound
      • The ability to filter out pages or segments from the scan is crucial
      • In-memory scan performance can be increased by narrowing the search scope, and thereby the amount of data that needs to be streamed from main memory to the CPU
  • 26. TPC-H challenges: Scan performance
      • Store dictionaries in sorted order or in a BST in order to:
          • Compress the filter or predicate and do a numeric comparison, as opposed to decompressing and matching on strings
          • Quickly validate whether the value exists in the segment
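A sorted dictionary lets a string predicate be translated into an integer comparison once, before the scan. The sketch below assumes a simple dictionary encoding where each code is the position of the value in the sorted dictionary; the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def encode_column(values):
    """Dictionary-encode a column; codes follow the sorted order of values."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in values]


def lookup_code(dictionary, value):
    """Binary search: return the code for value, or None if absent."""
    pos = bisect_left(dictionary, value)
    if pos < len(dictionary) and dictionary[pos] == value:
        return pos
    return None


def count_less_equal(dictionary, codes, bound):
    """Evaluate `column <= bound` using only integer comparisons."""
    # Every code below `limit` maps to a string that is <= bound.
    limit = bisect_right(dictionary, bound)
    return sum(1 for code in codes if code < limit)


if __name__ == "__main__":
    dictionary, codes = encode_column(["R", "A", "N", "A", "N", "N"])
    print(dictionary, codes)
    print(lookup_code(dictionary, "X"))
    print(count_less_equal(dictionary, codes, "N"))
```

When `lookup_code` returns None for an equality predicate, the whole segment can be skipped without touching the encoded rows.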
  • 27. TPC-H challenges: Scan performance
      • What do we do for highly selective filters?
      • Implement paged indexes for columns of interest
      • Partition a column into pages and store a bitmap index for each compressed value; the bits reflect which rows hold the respective value. Instead of scanning the entire segment for matching rows, we only read the blocks that contain matching values, i.e. those with bits set.
      • http://db.disi.unitn.eu/pages/VLDBProgram/pdf/IMDM/paper2.pdf
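The paged bitmap index from slide 27 can be illustrated with a toy version. This is a sketch under simplifying assumptions: Python integers serve as bitmaps, the column is already dictionary-encoded, and the names `build_paged_index` and `matching_rows` are made up for the example.

```python
def build_paged_index(codes, page_size):
    """Split a column of codes into pages and keep one bitmap per value per page."""
    index = []
    for start in range(0, len(codes), page_size):
        bitmaps = {}
        for offset, code in enumerate(codes[start:start + page_size]):
            # Bit `offset` is set when the row at that offset holds `code`.
            bitmaps[code] = bitmaps.get(code, 0) | (1 << offset)
        index.append(bitmaps)
    return index


def matching_rows(index, page_size, code):
    """Return row ids holding `code`, touching only pages that contain it."""
    rows = []
    for page_no, bitmaps in enumerate(index):
        bits = bitmaps.get(code, 0)  # 0 means the page can be skipped
        offset = 0
        while bits:
            if bits & 1:
                rows.append(page_no * page_size + offset)
            bits >>= 1
            offset += 1
    return rows


if __name__ == "__main__":
    index = build_paged_index([0, 1, 0, 2, 1, 1, 0, 0], page_size=4)
    print(matching_rows(index, 4, 2))
    print(matching_rows(index, 4, 1))
```

For a highly selective value most pages have no bitmap for it at all, so the lookup skips them without reading any row data.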
  • 28. TPC-H challenges: Intermediate steps in MPP
      • In MPP, a single SQL statement results in multiple SQL statements that get executed locally on each node
      • Some TPC-DS queries can result in 20+ SQL statements that need to be executed locally on each leaf node
      • Streaming of data should result in better performance, but there are cases when this strategy fails
      • Placing data on disk after each step allows the query optimizer to reevaluate the plan
  • 29. TPC-H challenges: Improving join performance for incompatible joins
      • Query: select count(*) from PART, PARTSUPP, LINEITEM where P_BRAND = 'NIKE' and PS_COMMENT like '%bla%' and P_PARTKEY = PS_PARTKEY and L_PARTKEY = PS_PARTKEY group by P_BRAND
      • Schema:
          • PART distributed on P_PARTKEY
          • PARTSUPP distributed on PS_PARTKEY
          • LINEITEM distributed on L_ORDERKEY
      • Create Bloom filter BF1 on PART, push the filter onto PARTSUPP and create BF2, replicate the Bloom filter to all leaf nodes, apply it to LINEITEM and only shuffle the qualifying rows
      • The optimizer should choose between semi-join reduction and replicating PART x PARTSUPP
      • Multiple copies of a set of columns, distributed differently, can improve performance in such cases, but at a high cost
  • 30. Outline: TPC-H Schema overview; TPC-H Performance measurements; Partner engagement; TPC-H where is it today; TPC-H challenges; Looking ahead; Q&A
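The Bloom-filter chain described on slide 29 (BF1 on PART, BF2 on PARTSUPP, then filtering LINEITEM before the shuffle) can be sketched on in-memory rows. Everything here is illustrative: the `BloomFilter` class, its sizing, the SHA-256 based hashing and the tuple layouts are assumptions, and a real engine would build the filters per node and ship them over the network.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter backed by a Python integer used as a bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means definitely absent; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def rows_to_shuffle(part, partsupp, lineitem):
    """Return the LINEITEM rows that survive the BF1 -> BF2 filter chain.

    part: (p_partkey, p_brand); partsupp: (ps_partkey, ps_comment);
    lineitem: (l_orderkey, l_partkey).
    """
    bf1 = BloomFilter()
    for p_partkey, p_brand in part:
        if p_brand == "NIKE":
            bf1.add(p_partkey)
    bf2 = BloomFilter()
    for ps_partkey, ps_comment in partsupp:
        if "bla" in ps_comment and bf1.might_contain(ps_partkey):
            bf2.add(ps_partkey)
    return [row for row in lineitem if bf2.might_contain(row[1])]


if __name__ == "__main__":
    part = [(1, "NIKE"), (2, "OTHER"), (3, "NIKE")]
    partsupp = [(1, "bla bla"), (2, "bla"), (3, "nothing")]
    lineitem = [(100, 1), (101, 2), (102, 3), (103, 1)]
    print(rows_to_shuffle(part, partsupp, lineitem))
```

A Bloom filter never drops a qualifying row but may let a few non-qualifying rows through, so the join still has to be evaluated after the shuffle; the filter only reduces how much data moves.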
  • 31. Looking ahead
      • SQL to MapReduce jobs? Crunching data in a relational database is always faster than in Hadoop: bring data from Hadoop into columnar format and perform analytics with efficient generated code
      • Full integration with analytics tools such as SAS, R, Tableau, Excel, etc.
      • Support PL/SQL syntax (to compete with Oracle)
      • Eliminate the aggregating node to reduce system cost for a small number of nodes; Exasol does it
  • 32. Competitive analysis
      • [Chart: TPC-H Q1 analysis, Sec/GB/Thread (lower is better)]
          System                                          Sec/GB/Thread
          Exasol 1TB, 240 threads, 20 processors          1.4
          Exasol 1TB, 768 threads, 64 processors          1.5
          Exasol 3TB, 960 threads, 80 processors          1.5
          MemSql 83GB, 480 threads, 40 sockets            46.7
          MS SQL Server 10TB, 160 threads, 8 processors   8.1
          Oracle 11c 10TB, 512 threads, 4 processors      40.7
      • Assuming all processors have the same speed!
      • References:
          • http://www.tpc.org/tpch/results/tpch_perf_results.asp
          • http://www.esg-global.com/lab-reports/memsqle28099s-distributed-in-memory-database/
  • 33. Appendix
      • GMQ 2013: http://www.gartner.com/technology/reprints.do?id=1-1DU2VD4&ct=130131&st=sb
      • GMQ 2014: http://www.gartner.com/technology/reprints.do?id=1-1M9YEHW&ct=131028&st=sb
  • 34. TPC-H column store: Avoid virtual function calls and branching; use templates. Scan usually dominates the CPU profile. Vector/batch processing is a must. If done correctly, the code is very sensitive to branching and data dependencies; exploit instruction-level parallelism when possible. Use SIMD instructions, and leverage existing libraries to encapsulate the complexity of SSE instructions (http://www.agner.org/optimize/vectorclass.pdf):
      #include "vectorclass.h"
      // define and initialize integer vectors a and b
      Vec4i a(10, 11, 12, 13);
      Vec4i b(20, 21, 22, 23);
      // add the two vectors
      Vec4i c = a + b;
  • 35. TPC-H Plans: Behold the power of the optimizer. If the plan is wrong, you are doomed… Very good read for TPC-H Q8: http://www.slideshare.net/GraySystemsLab/pass-summit-2010-keynote-david-dewitt
  • 36. JSON documents: The most efficient way to store JSON documents. Great compression and quick retrieval; ask me how.
  • 37. Q1 challenges: Used as a benchmark for computational power. Arithmetic operation performance. Aggregating to the same hash buckets. Pattern matching of common subexpressions. Sensitive to scan performance. String matching for aggregation (could do matching on the compressed format).
      select l_returnflag, l_linestatus,
        sum(l_quantity) as sum_qty,
        sum(l_extendedprice) as sum_base_price,
        sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
        sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
        avg(l_quantity) as avg_qty,
        avg(l_extendedprice) as avg_price,
        avg(l_discount) as avg_disc,
        count(*) as count_order
      from lineitem
      where l_shipdate <= date '1998-12-01' - interval '[DELTA]' day (3)
      group by l_returnflag, l_linestatus
      order by l_returnflag, l_linestatus;
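As a rough illustration of two of the Q1 points above (aggregating into the same hash buckets, and reusing the common subexpression l_extendedprice * (1 - l_discount)), here is a minimal row-at-a-time sketch. The toy rows, the DELTA value of 90, and the accumulator layout are assumptions for the example; a real column store would process batches rather than single rows.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical toy rows:
# (l_returnflag, l_linestatus, l_quantity, l_extendedprice, l_discount, l_tax, l_shipdate)
lineitem = [
    ("A", "F", 10, 100.0, 0.10, 0.05, date(1995, 1, 1)),
    ("A", "F", 20, 200.0, 0.00, 0.10, date(1996, 6, 1)),
    ("N", "O", 5, 50.0, 0.20, 0.00, date(1998, 8, 1)),
    ("N", "O", 7, 70.0, 0.00, 0.00, date(1998, 12, 31)),  # fails the date predicate
]

DELTA = 90  # assumed substitution value for [DELTA]
cutoff = date(1998, 12, 1) - timedelta(days=DELTA)

# One accumulator per (returnflag, linestatus) bucket:
# [sum_qty, sum_base_price, sum_disc_price, sum_charge, sum_discount, count]
buckets = defaultdict(lambda: [0, 0.0, 0.0, 0.0, 0.0, 0])

for flag, status, qty, price, disc, tax, shipdate in lineitem:
    if shipdate > cutoff:
        continue
    disc_price = price * (1 - disc)  # common subexpression, computed once per row
    acc = buckets[(flag, status)]
    acc[0] += qty
    acc[1] += price
    acc[2] += disc_price
    acc[3] += disc_price * (1 + tax)  # reuses disc_price instead of recomputing it
    acc[4] += disc
    acc[5] += 1

result = {
    key: {
        "sum_qty": acc[0],
        "sum_base_price": acc[1],
        "sum_disc_price": acc[2],
        "sum_charge": acc[3],
        "avg_qty": acc[0] / acc[5],
        "avg_price": acc[1] / acc[5],
        "avg_disc": acc[4] / acc[5],
        "count_order": acc[5],
    }
    for key, acc in sorted(buckets.items())  # order by returnflag, linestatus
}
```

Q1 produces only a handful of groups, so almost every row lands in a bucket that was just touched; that is why the slide singles out same-bucket aggregation as a cost.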
  • 38. Q2 challenges: Correlated subquery. Push down of predicates to the correlated subquery. Highly selective (segment size plays a big role). Tricky to generate the optimal plan. Plan performance varies a lot depending on which tables are partitioned and which are replicated.
      select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment
      from part, supplier, partsupp, nation, region
      where p_partkey = ps_partkey
        and s_suppkey = ps_suppkey
        and p_size = [SIZE]
        and p_type like '%[TYPE]'
        and s_nationkey = n_nationkey
        and n_regionkey = r_regionkey
        and r_name = '[REGION]'
        and ps_supplycost = (
          select min(ps_supplycost)
          from partsupp, supplier, nation, region
          where p_partkey = ps_partkey
            and s_suppkey = ps_suppkey
            and s_nationkey = n_nationkey
            and n_regionkey = r_regionkey
            and r_name = '[REGION]')
      order by s_acctbal desc, n_name, s_name, p_partkey;
  • 39. Q3 challenges: Collocated join between orders and lineitem. Detect the correlation between l_shipdate and o_orderdate. Bitmap filters on lineitem are necessary. Replicate the result of (select c_custkey from customer where c_mktsegment = '[SEGMENT]').
      select TOP 10 l_orderkey,
        sum(l_extendedprice * (1 - l_discount)) as revenue,
        o_orderdate, o_shippriority
      from customer, orders, lineitem
      where c_mktsegment = '[SEGMENT]'
        and c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and o_orderdate < date '[DATE]'
        and l_shipdate > date '[DATE]'
      group by l_orderkey, o_orderdate, o_shippriority
      order by revenue desc, o_orderdate;
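A simplified sketch of the Q3 plan shape described above: orders and lineitem are distributed on the order key so the join stays node-local, the small set of qualifying customer keys is replicated, and a per-node key set stands in for the bitmap filter on lineitem. The `node_for` function, the node count, and the toy rows are assumptions, and the two date predicates are left out to keep the example short.

```python
NUM_NODES = 3


def node_for(key):
    # Stand-in for the system's distribution hash (an assumption).
    return key % NUM_NODES


# Hypothetical toy data.
customer = [(1, "BUILDING"), (2, "AUTOMOBILE"), (3, "BUILDING")]  # (c_custkey, c_mktsegment)
orders = [(10, 1), (11, 2), (12, 3), (13, 1)]                     # (o_orderkey, o_custkey)
lineitem = [                                                      # (l_orderkey, l_extendedprice, l_discount)
    (10, 100.0, 0.1), (10, 50.0, 0.0), (11, 70.0, 0.0), (12, 30.0, 0.5), (13, 20.0, 0.0),
]

# Small result of the customer filter, replicated to every node.
qualifying_customers = {c_custkey for c_custkey, segment in customer if segment == "BUILDING"}

# Distribute both large tables on the order key so matching rows share a node.
orders_by_node = {n: [] for n in range(NUM_NODES)}
lineitem_by_node = {n: [] for n in range(NUM_NODES)}
for row in orders:
    orders_by_node[node_for(row[0])].append(row)
for row in lineitem:
    lineitem_by_node[node_for(row[0])].append(row)

revenue = {}
for node in range(NUM_NODES):
    # Filter built from the local orders; a set stands in for the bitmap filter.
    local_order_keys = {
        o_orderkey
        for o_orderkey, o_custkey in orders_by_node[node]
        if o_custkey in qualifying_customers
    }
    # Local join and partial aggregation: no lineitem row leaves its node.
    for l_orderkey, price, discount in lineitem_by_node[node]:
        if l_orderkey in local_order_keys:
            revenue[l_orderkey] = revenue.get(l_orderkey, 0.0) + price * (1 - discount)

top10 = sorted(revenue.items(), key=lambda kv: kv[1], reverse=True)[:10]
```

Since each order key lives on exactly one node, the per-node partial sums are already final and only the top-10 merge needs to cross the network.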

Editor's Notes

  1. The subcommittee included representatives from Compaq, Data General, Dell, EMC, HP, IBM, Informix, Microsoft, NCR, Oracle, Sequent, SGI, Sun, Sybase, and Unisys
  2. Define main tables, scale factor. Benefits of collocated joins. How tables can be partitioned. Metric: geometric mean, to avoid optimizing individual queries. All tables grow linearly with scale factor except NATION and REGION. Unless you have B-tree indexes there shouldn't be any loop joins (??)
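The geometric-mean remark in note 2 can be illustrated with a small calculation. This is only a sketch with made-up query times, not the full TPC-H power formula: it shows that halving any one query improves the geometric mean by the same factor, whereas an arithmetic mean rewards tuning only the longest query.

```python
import math


def geometric_mean(times):
    # exp of the mean log is numerically safer than multiplying all times.
    return math.exp(sum(math.log(t) for t in times) / len(times))


def arithmetic_mean(times):
    return sum(times) / len(times)


# Hypothetical query times in seconds.
baseline = [1.0, 10.0, 100.0]
tuned_long = [1.0, 10.0, 50.0]    # longest query made 2x faster
tuned_short = [0.5, 10.0, 100.0]  # shortest query made 2x faster

geo_gain_long = geometric_mean(baseline) / geometric_mean(tuned_long)
geo_gain_short = geometric_mean(baseline) / geometric_mean(tuned_short)
arith_gain_long = arithmetic_mean(baseline) / arithmetic_mean(tuned_long)
arith_gain_short = arithmetic_mean(baseline) / arithmetic_mean(tuned_short)
```

With three queries, both geometric gains equal 2 ** (1/3), so every query counts equally toward the metric regardless of how long it runs.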
  3. If all we have is distributed and replicated tables
  4. It doesn't matter how fast the physical operators are if the generated plan is wrong. Plan changes can make or break performance. If CPU utilization is 100% and performance is still not acceptable, look into CPI (cycles per instruction).
  5. Invest in building tools that post-process plans to identify inefficient ones: average number of rows per operator; joins that don't reduce the number of rows; aggregates that don't reduce the number of rows; over- or underestimating; spilling. It pays off to build profiling into the code to get cycles per row for scans, aggregates, filtering, etc.
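A plan post-processing tool of the kind note 5 describes might start out like the sketch below. The plan representation (a list of dicts with `name`, `kind`, `rows_in`, `rows_out`, `rows_estimated`, `spilled`) and the 10x misestimate threshold are hypothetical; a real tool would parse whatever plan format the engine emits.

```python
def find_suspect_operators(plan, misestimate_factor=10.0):
    """Return (operator name, reason) pairs for operators worth a closer look."""
    findings = []
    for op in plan:
        # Joins and aggregates are expected to reduce the row count.
        if op["kind"] in ("join", "aggregate") and op["rows_out"] >= op["rows_in"]:
            findings.append((op["name"], "does not reduce rows"))
        # Flag large gaps between estimated and actual cardinality, in either direction.
        estimated = max(op["rows_estimated"], 1)
        actual = max(op["rows_out"], 1)
        if max(estimated / actual, actual / estimated) >= misestimate_factor:
            findings.append((op["name"], "cardinality misestimate"))
        if op.get("spilled", False):
            findings.append((op["name"], "spilled"))
    return findings


# Hypothetical executed plan with per-operator row counts.
plan = [
    {"name": "scan_lineitem", "kind": "scan", "rows_in": 6000, "rows_out": 6000,
     "rows_estimated": 6000, "spilled": False},
    {"name": "join_orders", "kind": "join", "rows_in": 6000, "rows_out": 6000,
     "rows_estimated": 300, "spilled": True},
    {"name": "agg_revenue", "kind": "aggregate", "rows_in": 6000, "rows_out": 10,
     "rows_estimated": 12, "spilled": False},
]

findings = find_suspect_operators(plan)
average_rows_per_operator = sum(op["rows_out"] for op in plan) / len(plan)
```

Running a check like this over every plan in the benchmark after each build makes plan regressions visible before they show up as timing regressions.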
  6. http://www.oracle.com/us/corporate/features/database-in-memory-option/index.html 3:30. Response: http://www.youtube.com/watch?v=48_oSIkEJlo#t=77 Poking Oracle: http://www.youtube.com/watch?v=48_oSIkEJlo#t=279 NVRAM? Is that on the horizon? Company X calls this a tectonic change and says it will be commodity hardware by 2016, with capacity up to 10TB per 2-socket server. NVRAM in a box exposed as a SAN equivalent: http://www.diablo-technologies.com/
  7. http://www.hpl.hp.com/techreports/2013/HPL-2013-78R1.pdf ; http://www.diablo-technologies.com/
  8. http://www.hpl.hp.com/techreports/2013/HPL-2013-78R1.pdf ; http://www.diablo-technologies.com/
  9. http://www.diablo-technologies.com/ ; http://finance.yahoo.com/news/diablo-technologies-achieves-vmware-ready-190200995.html
  10. Q18 based on 1TB SF
  11. Conjunctions and disjunctions (Q19):
      select sum(l_extendedprice * (1 - l_discount)) as revenue
      from lineitem, part
      where (p_partkey = l_partkey and p_brand = '[BRAND1]' and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') and l_quantity >= [QUANTITY1] and l_quantity <= [QUANTITY1] + 10 and p_size between 1 and 5 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON')
         or (p_partkey = l_partkey and p_brand = '[BRAND2]' and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK') and l_quantity >= [QUANTITY2] and l_quantity <= [QUANTITY2] + 10 and p_size between 1 and 10 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON')
         or (p_partkey = l_partkey and p_brand = '[BRAND3]' and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG') and l_quantity >= [QUANTITY3] and l_quantity <= [QUANTITY3] + 10 and p_size between 1 and 15 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON');
  12. Conjunctions and disjunctions (Q19):
      select sum(l_extendedprice * (1 - l_discount)) as revenue
      from lineitem, part
      where (p_partkey = l_partkey and p_brand = '[BRAND1]' and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') and l_quantity >= [QUANTITY1] and l_quantity <= [QUANTITY1] + 10 and p_size between 1 and 5 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON')
         or (p_partkey = l_partkey and p_brand = '[BRAND2]' and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK') and l_quantity >= [QUANTITY2] and l_quantity <= [QUANTITY2] + 10 and p_size between 1 and 10 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON')
         or (p_partkey = l_partkey and p_brand = '[BRAND3]' and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG') and l_quantity >= [QUANTITY3] and l_quantity <= [QUANTITY3] + 10 and p_size between 1 and 15 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON');
  13. Q2, Q11, Q15, Q17 and Q20.
      select sum(l_extendedprice) / 7.0 as avg_yearly
      from lineitem, part
      where p_partkey = l_partkey
        and p_brand = '[BRAND]'
        and p_container = '[CONTAINER]'
        and l_quantity < (select 0.2 * avg(l_quantity) from lineitem where l_partkey = p_partkey);
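The query in note 13 (TPC-H Q17) has a subquery that is correlated on p_partkey. One common way to handle it, sketched below, is to compute the per-part threshold once with a group-by and then join against it, instead of re-running the subquery for every outer row. The toy rows and the two function names are assumptions for the example; the notes do not say which rewrite any particular engine uses.

```python
from collections import defaultdict

# Hypothetical toy data.
part = [(1, "Brand#23", "MED BOX"), (2, "Brand#23", "MED BOX"), (3, "Brand#11", "LG CASE")]
lineitem = [  # (l_partkey, l_quantity, l_extendedprice)
    (1, 1, 10.0), (1, 10, 100.0), (1, 40, 400.0),
    (2, 2, 20.0), (2, 3, 30.0),
    (3, 1, 5.0),
]


def q17_correlated(brand, container):
    """Naive evaluation: the subquery is re-run for every qualifying lineitem row."""
    total = 0.0
    for p_partkey, p_brand, p_container in part:
        if p_brand != brand or p_container != container:
            continue
        for l_partkey, quantity, price in lineitem:
            if l_partkey != p_partkey:
                continue
            quantities = [q for pk, q, _ in lineitem if pk == p_partkey]
            if quantity < 0.2 * sum(quantities) / len(quantities):
                total += price
    return total / 7.0


def q17_decorrelated(brand, container):
    """Group lineitem by part key once, then join against the thresholds."""
    sums = defaultdict(lambda: [0.0, 0])
    for l_partkey, quantity, _ in lineitem:
        sums[l_partkey][0] += quantity
        sums[l_partkey][1] += 1
    threshold = {pk: 0.2 * total / count for pk, (total, count) in sums.items()}
    wanted = {pk for pk, b, c in part if b == brand and c == container}
    total = sum(
        price
        for l_partkey, quantity, price in lineitem
        if l_partkey in wanted and quantity < threshold[l_partkey]
    )
    return total / 7.0


avg_yearly_naive = q17_correlated("Brand#23", "MED BOX")
avg_yearly_rewritten = q17_decorrelated("Brand#23", "MED BOX")
```

Both functions return the same value; the rewritten form scans lineitem a fixed number of times, while the naive form scans it once per qualifying outer row.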