SlideShare a Scribd company logo
1 of 26
Daniel Abadi -- Yale University
Column-Stores vs. Row-Stores:
How Different Are They
Really?
Daniel Abadi (Yale),
Samuel Madden (MIT),
Nabil Hachem (AvantGarde Consulting)
June 12th
, 2008
Row vs. Column-Stores
Last
Name
First
Name
E-
m
ai
l
Phone
#
Street
Address
Last
Name
First
Name E-mail Phone #
Street
Address
Row-Store Column-Store
− Might read in
unnecessary data
+ Only need to read
in relevant data
+ Easy to add a new record
− Tuple writes might require
multiple seeks
Column-Stores
• Really good for read-mostly data
warehouses
 Lot’s of column scans and aggregations
 Writes tend to be in batch
 [CK85], [SAB+05], [ZBN+05], [HLA+06],
[SBC+07] all verify this
 Top 3 in TPC-H rankings (Exasol, ParAccel,
and Kickfire) are column-stores
 Factor of 5 faster on performance
 Factor of 2 superior on price/performance
Data Warehouse DBMS
Software
• $4.5 billion industry (out of total $16 billion
DBMS software industry)
• Growing 10% annually
Momentum
• Right solution for growing market  $$$$
• Vertica, ParAccel, Kickfire, Calpont,
Infobright, and Exasol new entrants
• Sybase IQ’s profits rapidly increasing
• Yahoo’s world largest (multi-petabyte)
data warehouse is a column-store (from
Mahat Technologies acquisition)
Paper Looks At Key Question
• How much of the buzz around column-
stores just marketing hype?
 Do you really need to buy Sybase
IQ/Vertica/ParAccel?
 How far will your current row-store take you?
 Can you get column-store performance from a row-
store?
 Can you simulate a column-store in a row-store
Paper Methodology
• Comparing row-store vs. column-store is
dangerous/borderline meaningless
• Instead, compare row-store vs. row-store
and column-store vs. column-store
 Simulate a column-store inside of a row-store
 Remove column-oriented features from
column-store until it behaves like a row-store
Simulate Column-Store
Inside Row-Store
Last
Name
First
Name
E-
m
ai
l
Phone
#
Street
Address
Last
Name
First
Name E-mail
1
2
3
1
2
3
1
2
3
Option A:
Vertical
Partitioning
…
Option B:
Index Every
Column
Last Name
Index
First Name Index
Star Schema Benchmark
• Fact table contains 17 columns and
60,000,000 rows
• 4 dimension tables, biggest one has
80,000 rows
• Queries touch 3-4 foreign keys in fact
table, 1-2 numeric columns
SSBM Averages
0.0
50.0
100.0
150.0
200.0
250.0
Time(seconds)
Average 25.7 79.9 221.2
Normal Row-Store
Vertically Partitioned
Row-Store
Row-Store With All
Indexes
What’s Going On?
• Vertically Partitioned Case
 Tuple Sizes
 Horizontal Partitioning
• All Indexes Case
 Tuple Reconstruction
Star Schema Benchmark
• Fact table contains 17 columns and
60,000,000 rows
• 4 dimension tables, biggest one has
80,000 rows
• Queries touch 3-4 foreign keys in fact
table, 1-2 numeric columns
Tuple Size
1
2
3
Column
Data
TID
1
2
3
TID Column
Data
1
2
3
TID Column
Data
Tuple
Header
•Queries touch 3-4 foreign keys in fact table, 1-2 numeric
columns
•Complete fact table takes up ~4 GB (compressed)
•Vertically partitioned tables take up 0.7-1.1 GB
(compressed)
Horizontal Partitioning
• Fact table horizontally partitioned on year
 Year is an element of the ‘Date’ dimension
table
 Most queries in SSBM have a predicate on
year
 Since vertically partitioned tables do not
contain the ‘Date’ foreign key, row-store could
not similarly partition them
What’s Going On?
• Vertically Partitioned Case
 Tuple Sizes
 Horizontal Partitioning
• All Indexes Case
 Tuple Construction
Tuple Construction
• Pretty much all queries require a column
to be extracted (in the SELECT clause)
that has not yet been accessed, e.g.:
 SELECT store_name, SUM(revenue)
FROM Facts, Stores
WHERE fact.store_id = stores.store_id
AND stores.area = “NEW ENGLAND”
GROUP BY store_name
Tuple Construction
• Result of lower part of query plan is a set
of TIDs that passed all predicates
• Need to extract SELECT attributes at
these TIDs
 BUT: index maps value to TID
 You really want to map TID to value (i.e., a
vertical partition)
  Tuple construction is SLOW
So….
• All indexes approach is pretty obviously a
poor way to simulate a column-store
• Problems with vertical partitioning are
NOT fundamental
 Store tuple header in a separate partition
 Allow virtual TIDs
 Allow HP using a foreign key on a different VP
• So can row-stores simulate column-
stores?
Row-Store vs. Column-Store
0.0
5.0
10.0
15.0
20.0
25.0
30.0
Time(seconds)
Average 25.7 11.7 4.4
Row-Store Row-Store (M V) C-Store
Column-Store Experiments
• Start with column-store (C-Store)
• Remove column-store-specific
performance optimizations
• End with column-store that behaves like a
row-store
Compression
• Higher data value locality
in column-stores
 Better ratio  reduced I/O
• Can use schemes like
run-length encoding
 Easy to operate on directly
for improved performance
([AMF06])
Q1
Q1
Q1
Q1
Q1
Q1
Q1
Q2
Q2
Q2
Q2
…
…
Quarter
(Q1, 1, 300)
Quarter
(Q2, 301, 350)
(Q3, 651, 500)
(Q4, 1151, 600)
• Early Materialization:
create rows first. But:
 Poor memory bandwidth
utilization
 Lose opportunity for
vectorized operation
2
1
3
1
2
3
3
3
7
13
42
80
Construct
2
3
3
3
7
13
42
80
Select + Aggregate
2
1
3
1
4
4
4
4
prodID storeIDcustID price
QUERY:
SELECT custID,SUM(price)
FROM table
WHERE (prodID = 4) AND
(storeID = 1) AND
GROUP BY custID
Early vs. Late Materialization
4
4
4
4
Other Column-Store
Optimizations
• Invisible join
 Column-store specific join
 Optimizations for star schemas
 Similar to a semi-join
• Block Processing
Simplified Version of Results
0.0
10.0
20.0
30.0
40.0
50.0
Time(seconds)
Average 4.4 14.9 40.7
Original C-Store
C-Store,No
Compression
C-Store,Early
Materialization
Conclusion
• Might be possible to simulate a row-store
in a column-store, BUT:
 Need better support for vertical partitioning at
the storage layer
 Need support for column-specific
optimizations at the executer level
• Working with HP-Labs to find out
Come Join the Yale DB
Group!

More Related Content

What's hot

introduction to NOSQL Database
introduction to NOSQL Databaseintroduction to NOSQL Database
introduction to NOSQL Databasenehabsairam
 
Transformers in Vision: From Zero to Hero
Transformers in Vision: From Zero to HeroTransformers in Vision: From Zero to Hero
Transformers in Vision: From Zero to HeroBill Liu
 
Transformer in Vision
Transformer in VisionTransformer in Vision
Transformer in VisionSangmin Woo
 
Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...
Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...
Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...Databricks
 
ورشة تضمين الكلمات في التعلم العميق Word embeddings workshop
ورشة تضمين الكلمات في التعلم العميق Word embeddings workshopورشة تضمين الكلمات في التعلم العميق Word embeddings workshop
ورشة تضمين الكلمات في التعلم العميق Word embeddings workshopiwan_rg
 
Transformers In Vision From Zero to Hero (DLI).pptx
Transformers In Vision From Zero to Hero (DLI).pptxTransformers In Vision From Zero to Hero (DLI).pptx
Transformers In Vision From Zero to Hero (DLI).pptxDeep Learning Italia
 
Mongo Nosql CRUD Operations
Mongo Nosql CRUD OperationsMongo Nosql CRUD Operations
Mongo Nosql CRUD Operationsanujaggarwal49
 
Deep Learning Workflows: Training and Inference
Deep Learning Workflows: Training and InferenceDeep Learning Workflows: Training and Inference
Deep Learning Workflows: Training and InferenceNVIDIA
 
Distributed Systems In One Lesson
Distributed Systems In One LessonDistributed Systems In One Lesson
Distributed Systems In One LessonTim Berglund
 
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...Universitat Politècnica de Catalunya
 
A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.Navdeep Charan
 
An Introduction To NoSQL & MongoDB
An Introduction To NoSQL & MongoDBAn Introduction To NoSQL & MongoDB
An Introduction To NoSQL & MongoDBLee Theobald
 
BERT - Part 2 Learning Notes
BERT - Part 2 Learning NotesBERT - Part 2 Learning Notes
BERT - Part 2 Learning NotesSenthil Kumar M
 
PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...Jinwon Lee
 
An introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERTAn introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERTSuman Debnath
 

What's hot (20)

introduction to NOSQL Database
introduction to NOSQL Databaseintroduction to NOSQL Database
introduction to NOSQL Database
 
Transformers in Vision: From Zero to Hero
Transformers in Vision: From Zero to HeroTransformers in Vision: From Zero to Hero
Transformers in Vision: From Zero to Hero
 
Transformer in Vision
Transformer in VisionTransformer in Vision
Transformer in Vision
 
NOsql Presentation.pdf
NOsql Presentation.pdfNOsql Presentation.pdf
NOsql Presentation.pdf
 
Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...
Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...
Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...
 
ورشة تضمين الكلمات في التعلم العميق Word embeddings workshop
ورشة تضمين الكلمات في التعلم العميق Word embeddings workshopورشة تضمين الكلمات في التعلم العميق Word embeddings workshop
ورشة تضمين الكلمات في التعلم العميق Word embeddings workshop
 
Transformers In Vision From Zero to Hero (DLI).pptx
Transformers In Vision From Zero to Hero (DLI).pptxTransformers In Vision From Zero to Hero (DLI).pptx
Transformers In Vision From Zero to Hero (DLI).pptx
 
Lzw compression
Lzw compressionLzw compression
Lzw compression
 
Mongo Nosql CRUD Operations
Mongo Nosql CRUD OperationsMongo Nosql CRUD Operations
Mongo Nosql CRUD Operations
 
Word2Vec
Word2VecWord2Vec
Word2Vec
 
Deep Learning Workflows: Training and Inference
Deep Learning Workflows: Training and InferenceDeep Learning Workflows: Training and Inference
Deep Learning Workflows: Training and Inference
 
Distributed Systems In One Lesson
Distributed Systems In One LessonDistributed Systems In One Lesson
Distributed Systems In One Lesson
 
What is word2vec?
What is word2vec?What is word2vec?
What is word2vec?
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
 
A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.
 
An Introduction To NoSQL & MongoDB
An Introduction To NoSQL & MongoDBAn Introduction To NoSQL & MongoDB
An Introduction To NoSQL & MongoDB
 
BERT - Part 2 Learning Notes
BERT - Part 2 Learning NotesBERT - Part 2 Learning Notes
BERT - Part 2 Learning Notes
 
PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...
 
An introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERTAn introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERT
 

Viewers also liked

Row or Columnar Database
Row or Columnar DatabaseRow or Columnar Database
Row or Columnar DatabaseBiju Nair
 
Introduction to column oriented databases
Introduction to column oriented databasesIntroduction to column oriented databases
Introduction to column oriented databasesArangoDB Database
 
VLDB 2009 Tutorial on Column-Stores
VLDB 2009 Tutorial on Column-StoresVLDB 2009 Tutorial on Column-Stores
VLDB 2009 Tutorial on Column-StoresDaniel Abadi
 
NENUG Apr14 Talk - data modeling for netezza
NENUG Apr14 Talk - data modeling for netezzaNENUG Apr14 Talk - data modeling for netezza
NENUG Apr14 Talk - data modeling for netezzaBiju Nair
 
Netezza workload management
Netezza workload managementNetezza workload management
Netezza workload managementBiju Nair
 
Using Netezza Query Plan to Improve Performace
Using Netezza Query Plan to Improve PerformaceUsing Netezza Query Plan to Improve Performace
Using Netezza Query Plan to Improve PerformaceBiju Nair
 
Netezza fundamentals for developers
Netezza fundamentals for developersNetezza fundamentals for developers
Netezza fundamentals for developersBiju Nair
 
HBase for Architects
HBase for ArchitectsHBase for Architects
HBase for ArchitectsNick Dimiduk
 
Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)alexbaranau
 
Rise of Column Oriented Database
Rise of Column Oriented DatabaseRise of Column Oriented Database
Rise of Column Oriented DatabaseSuvradeep Rudra
 
Hbase: Introduction to column oriented databases
Hbase: Introduction to column oriented databasesHbase: Introduction to column oriented databases
Hbase: Introduction to column oriented databasesLuis Cipriani
 
MonetDB :column-store approach in database
MonetDB :column-store approach in databaseMonetDB :column-store approach in database
MonetDB :column-store approach in databaseNikhil Patteri
 
Efficient transaction processing in sap hana
Efficient transaction processing in sap hanaEfficient transaction processing in sap hana
Efficient transaction processing in sap hanaMysa Vijay
 
Conquering "big data": An introduction to shard query
Conquering "big data": An introduction to shard queryConquering "big data": An introduction to shard query
Conquering "big data": An introduction to shard queryJustin Swanhart
 
Beckman abadi-5min-pres
Beckman abadi-5min-presBeckman abadi-5min-pres
Beckman abadi-5min-presDaniel Abadi
 
Daniel Abadi: VLDB 2009 Panel
Daniel Abadi: VLDB 2009 PanelDaniel Abadi: VLDB 2009 Panel
Daniel Abadi: VLDB 2009 PanelDaniel Abadi
 

Viewers also liked (20)

Intro to column stores
Intro to column storesIntro to column stores
Intro to column stores
 
Row or Columnar Database
Row or Columnar DatabaseRow or Columnar Database
Row or Columnar Database
 
Introduction to column oriented databases
Introduction to column oriented databasesIntroduction to column oriented databases
Introduction to column oriented databases
 
VLDB 2009 Tutorial on Column-Stores
VLDB 2009 Tutorial on Column-StoresVLDB 2009 Tutorial on Column-Stores
VLDB 2009 Tutorial on Column-Stores
 
NENUG Apr14 Talk - data modeling for netezza
NENUG Apr14 Talk - data modeling for netezzaNENUG Apr14 Talk - data modeling for netezza
NENUG Apr14 Talk - data modeling for netezza
 
Netezza workload management
Netezza workload managementNetezza workload management
Netezza workload management
 
Using Netezza Query Plan to Improve Performace
Using Netezza Query Plan to Improve PerformaceUsing Netezza Query Plan to Improve Performace
Using Netezza Query Plan to Improve Performace
 
Netezza fundamentals for developers
Netezza fundamentals for developersNetezza fundamentals for developers
Netezza fundamentals for developers
 
HBase for Architects
HBase for ArchitectsHBase for Architects
HBase for Architects
 
Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)
 
Rise of Column Oriented Database
Rise of Column Oriented DatabaseRise of Column Oriented Database
Rise of Column Oriented Database
 
Hbase: Introduction to column oriented databases
Hbase: Introduction to column oriented databasesHbase: Introduction to column oriented databases
Hbase: Introduction to column oriented databases
 
Data mining & column stores
Data mining & column storesData mining & column stores
Data mining & column stores
 
Session initiation protocol
Session initiation protocolSession initiation protocol
Session initiation protocol
 
MonetDB :column-store approach in database
MonetDB :column-store approach in databaseMonetDB :column-store approach in database
MonetDB :column-store approach in database
 
Efficient transaction processing in sap hana
Efficient transaction processing in sap hanaEfficient transaction processing in sap hana
Efficient transaction processing in sap hana
 
Conquering "big data": An introduction to shard query
Conquering "big data": An introduction to shard queryConquering "big data": An introduction to shard query
Conquering "big data": An introduction to shard query
 
Invisible loading
Invisible loadingInvisible loading
Invisible loading
 
Beckman abadi-5min-pres
Beckman abadi-5min-presBeckman abadi-5min-pres
Beckman abadi-5min-pres
 
Daniel Abadi: VLDB 2009 Panel
Daniel Abadi: VLDB 2009 PanelDaniel Abadi: VLDB 2009 Panel
Daniel Abadi: VLDB 2009 Panel
 

Similar to Column-Stores vs. Row-Stores: How Different are they Really?

2018 02 20_biological_databases_part2_v_upload
2018 02 20_biological_databases_part2_v_upload2018 02 20_biological_databases_part2_v_upload
2018 02 20_biological_databases_part2_v_uploadProf. Wim Van Criekinge
 
BI Knowledge Sharing Session 2
BI Knowledge Sharing Session 2BI Knowledge Sharing Session 2
BI Knowledge Sharing Session 2Kelvin Chan
 
In memory databases presentation
In memory databases presentationIn memory databases presentation
In memory databases presentationMichael Keane
 
Tuning Apache Phoenix/HBase
Tuning Apache Phoenix/HBaseTuning Apache Phoenix/HBase
Tuning Apache Phoenix/HBaseAnil Gupta
 
Sloupcové uložení dat a použití in-memory technologií u řešení Exadata
Sloupcové uložení dat a použití in-memory technologií u řešení ExadataSloupcové uložení dat a použití in-memory technologií u řešení Exadata
Sloupcové uložení dat a použití in-memory technologií u řešení ExadataMarketingArrowECS_CZ
 
SQL Explore 2012 - Michael Zilberstein: ColumnStore
SQL Explore 2012 - Michael Zilberstein: ColumnStoreSQL Explore 2012 - Michael Zilberstein: ColumnStore
SQL Explore 2012 - Michael Zilberstein: ColumnStoresqlserver.co.il
 
Deep Dive: Amazon Redshift (March 2017)
Deep Dive: Amazon Redshift (March 2017)Deep Dive: Amazon Redshift (March 2017)
Deep Dive: Amazon Redshift (March 2017)Julien SIMON
 
Deep Dive Redshift, with a focus on performance
Deep Dive Redshift, with a focus on performanceDeep Dive Redshift, with a focus on performance
Deep Dive Redshift, with a focus on performanceAmazon Web Services
 
FOUNDATION OF DATA SCIENCE SQL QUESTIONS
FOUNDATION OF DATA SCIENCE SQL QUESTIONSFOUNDATION OF DATA SCIENCE SQL QUESTIONS
FOUNDATION OF DATA SCIENCE SQL QUESTIONSHITIKAJAIN4
 
Building better SQL Server Databases
Building better SQL Server DatabasesBuilding better SQL Server Databases
Building better SQL Server DatabasesColdFusionConference
 
IT301-Datawarehousing (1) and its sub topics.pptx
IT301-Datawarehousing (1) and its sub topics.pptxIT301-Datawarehousing (1) and its sub topics.pptx
IT301-Datawarehousing (1) and its sub topics.pptxReneeClintGortifacio
 

Similar to Column-Stores vs. Row-Stores: How Different are they Really? (20)

2018 02 20_biological_databases_part2_v_upload
2018 02 20_biological_databases_part2_v_upload2018 02 20_biological_databases_part2_v_upload
2018 02 20_biological_databases_part2_v_upload
 
2017 biological databasespart2
2017 biological databasespart22017 biological databasespart2
2017 biological databasespart2
 
2016 02 23_biological_databases_part2
2016 02 23_biological_databases_part22016 02 23_biological_databases_part2
2016 02 23_biological_databases_part2
 
Sql rally 2013 columnstore indexes
Sql rally 2013   columnstore indexesSql rally 2013   columnstore indexes
Sql rally 2013 columnstore indexes
 
BI Knowledge Sharing Session 2
BI Knowledge Sharing Session 2BI Knowledge Sharing Session 2
BI Knowledge Sharing Session 2
 
MySql
MySqlMySql
MySql
 
In memory databases presentation
In memory databases presentationIn memory databases presentation
In memory databases presentation
 
Tuning Apache Phoenix/HBase
Tuning Apache Phoenix/HBaseTuning Apache Phoenix/HBase
Tuning Apache Phoenix/HBase
 
Sloupcové uložení dat a použití in-memory technologií u řešení Exadata
Sloupcové uložení dat a použití in-memory technologií u řešení ExadataSloupcové uložení dat a použití in-memory technologií u řešení Exadata
Sloupcové uložení dat a použití in-memory technologií u řešení Exadata
 
Excel Tips 101
Excel Tips 101Excel Tips 101
Excel Tips 101
 
Excel tips
Excel tipsExcel tips
Excel tips
 
SQL Explore 2012 - Michael Zilberstein: ColumnStore
SQL Explore 2012 - Michael Zilberstein: ColumnStoreSQL Explore 2012 - Michael Zilberstein: ColumnStore
SQL Explore 2012 - Michael Zilberstein: ColumnStore
 
Deep Dive: Amazon Redshift (March 2017)
Deep Dive: Amazon Redshift (March 2017)Deep Dive: Amazon Redshift (March 2017)
Deep Dive: Amazon Redshift (March 2017)
 
Deep Dive Redshift, with a focus on performance
Deep Dive Redshift, with a focus on performanceDeep Dive Redshift, with a focus on performance
Deep Dive Redshift, with a focus on performance
 
Tunning overview
Tunning overviewTunning overview
Tunning overview
 
FOUNDATION OF DATA SCIENCE SQL QUESTIONS
FOUNDATION OF DATA SCIENCE SQL QUESTIONSFOUNDATION OF DATA SCIENCE SQL QUESTIONS
FOUNDATION OF DATA SCIENCE SQL QUESTIONS
 
Excel Tips.pptx
Excel Tips.pptxExcel Tips.pptx
Excel Tips.pptx
 
Building better SQL Server Databases
Building better SQL Server DatabasesBuilding better SQL Server Databases
Building better SQL Server Databases
 
Redshift deep dive
Redshift deep diveRedshift deep dive
Redshift deep dive
 
IT301-Datawarehousing (1) and its sub topics.pptx
IT301-Datawarehousing (1) and its sub topics.pptxIT301-Datawarehousing (1) and its sub topics.pptx
IT301-Datawarehousing (1) and its sub topics.pptx
 

More from Daniel Abadi

Leopard: Lightweight Partitioning and Replication for Dynamic Graphs
Leopard: Lightweight Partitioning and Replication  for Dynamic Graphs Leopard: Lightweight Partitioning and Replication  for Dynamic Graphs
Leopard: Lightweight Partitioning and Replication for Dynamic Graphs Daniel Abadi
 
SQL-on-Hadoop Tutorial
SQL-on-Hadoop TutorialSQL-on-Hadoop Tutorial
SQL-on-Hadoop TutorialDaniel Abadi
 
The Power of Determinism in Database Systems
The Power of Determinism in Database SystemsThe Power of Determinism in Database Systems
The Power of Determinism in Database SystemsDaniel Abadi
 
From HadoopDB to Hadapt: A Case Study of Transitioning a VLDB paper into Real...
From HadoopDB to Hadapt: A Case Study of Transitioning a VLDB paper into Real...From HadoopDB to Hadapt: A Case Study of Transitioning a VLDB paper into Real...
From HadoopDB to Hadapt: A Case Study of Transitioning a VLDB paper into Real...Daniel Abadi
 
Shared slides-edbt-keynote-03-19-13
Shared slides-edbt-keynote-03-19-13Shared slides-edbt-keynote-03-19-13
Shared slides-edbt-keynote-03-19-13Daniel Abadi
 
Boston Hadoop Meetup, April 26 2012
Boston Hadoop Meetup, April 26 2012Boston Hadoop Meetup, April 26 2012
Boston Hadoop Meetup, April 26 2012Daniel Abadi
 
Hadoop and Graph Data Management: Challenges and Opportunities
Hadoop and Graph Data Management: Challenges and OpportunitiesHadoop and Graph Data Management: Challenges and Opportunities
Hadoop and Graph Data Management: Challenges and OpportunitiesDaniel Abadi
 
CAP, PACELC, and Determinism
CAP, PACELC, and DeterminismCAP, PACELC, and Determinism
CAP, PACELC, and DeterminismDaniel Abadi
 
Daniel Abadi HadoopWorld 2010
Daniel Abadi HadoopWorld 2010Daniel Abadi HadoopWorld 2010
Daniel Abadi HadoopWorld 2010Daniel Abadi
 

More from Daniel Abadi (9)

Leopard: Lightweight Partitioning and Replication for Dynamic Graphs
Leopard: Lightweight Partitioning and Replication  for Dynamic Graphs Leopard: Lightweight Partitioning and Replication  for Dynamic Graphs
Leopard: Lightweight Partitioning and Replication for Dynamic Graphs
 
SQL-on-Hadoop Tutorial
SQL-on-Hadoop TutorialSQL-on-Hadoop Tutorial
SQL-on-Hadoop Tutorial
 
The Power of Determinism in Database Systems
The Power of Determinism in Database SystemsThe Power of Determinism in Database Systems
The Power of Determinism in Database Systems
 
From HadoopDB to Hadapt: A Case Study of Transitioning a VLDB paper into Real...
From HadoopDB to Hadapt: A Case Study of Transitioning a VLDB paper into Real...From HadoopDB to Hadapt: A Case Study of Transitioning a VLDB paper into Real...
From HadoopDB to Hadapt: A Case Study of Transitioning a VLDB paper into Real...
 
Shared slides-edbt-keynote-03-19-13
Shared slides-edbt-keynote-03-19-13Shared slides-edbt-keynote-03-19-13
Shared slides-edbt-keynote-03-19-13
 
Boston Hadoop Meetup, April 26 2012
Boston Hadoop Meetup, April 26 2012Boston Hadoop Meetup, April 26 2012
Boston Hadoop Meetup, April 26 2012
 
Hadoop and Graph Data Management: Challenges and Opportunities
Hadoop and Graph Data Management: Challenges and OpportunitiesHadoop and Graph Data Management: Challenges and Opportunities
Hadoop and Graph Data Management: Challenges and Opportunities
 
CAP, PACELC, and Determinism
CAP, PACELC, and DeterminismCAP, PACELC, and Determinism
CAP, PACELC, and Determinism
 
Daniel Abadi HadoopWorld 2010
Daniel Abadi HadoopWorld 2010Daniel Abadi HadoopWorld 2010
Daniel Abadi HadoopWorld 2010
 

Recently uploaded

Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 

Recently uploaded (20)

Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 

Column-Stores vs. Row-Stores: How Different are they Really?

  • 1. Daniel Abadi -- Yale University Column-Stores vs. Row-Stores: How Different Are They Really? Daniel Abadi (Yale), Samuel Madden (MIT), Nabil Hachem (AvantGarde Consulting) June 12th , 2008
  • 2. Row vs. Column-Stores Last Name First Name E- m ai l Phone # Street Address Last Name First Name E-mail Phone # Street Address Row-Store Column-Store − Might read in unnecessary data + Only need to read in relevant data + Easy to add a new record − Tuple writes might require multiple seeks
  • 3. Column-Stores • Really good for read-mostly data warehouses  Lot’s of column scans and aggregations  Writes tend to be in batch  [CK85], [SAB+05], [ZBN+05], [HLA+06], [SBC+07] all verify this  Top 3 in TPC-H rankings (Exasol, ParAccel, and Kickfire) are column-stores  Factor of 5 faster on performance  Factor of 2 superior on price/performance
  • 4. Data Warehouse DBMS Software • $4.5 billion industry (out of total $16 billion DBMS software industry) • Growing 10% annually
  • 5. Momentum • Right solution for growing market  $$$$ • Vertica, ParAccel, Kickfire, Calpont, Infobright, and Exasol new entrants • Sybase IQ’s profits rapidly increasing • Yahoo’s world largest (multi-petabyte) data warehouse is a column-store (from Mahat Technologies acquisition)
  • 6. Paper Looks At Key Question • How much of the buzz around column- stores just marketing hype?  Do you really need to buy Sybase IQ/Vertica/ParAccel?  How far will your current row-store take you?  Can you get column-store performance from a row- store?  Can you simulate a column-store in a row-store
  • 7. Paper Methodology • Comparing row-store vs. column-store is dangerous/borderline meaningless • Instead, compare row-store vs. row-store and column-store vs. column-store  Simulate a column-store inside of a row-store  Remove column-oriented features from column-store until it behaves like a row-store
  • 8. Simulate Column-Store Inside Row-Store Last Name First Name E- m ai l Phone # Street Address Last Name First Name E-mail 1 2 3 1 2 3 1 2 3 Option A: Vertical Partitioning … Option B: Index Every Column Last Name Index First Name Index
  • 9. Star Schema Benchmark • Fact table contains 17 columns and 60,000,000 rows • 4 dimension tables, biggest one has 80,000 rows • Queries touch 3-4 foreign keys in fact table, 1-2 numeric columns
  • 10. SSBM Averages 0.0 50.0 100.0 150.0 200.0 250.0 Time(seconds) Average 25.7 79.9 221.2 Normal Row-Store Vertically Partitioned Row-Store Row-Store With All Indexes
  • 11. What’s Going On? • Vertically Partitioned Case  Tuple Sizes  Horizontal Partitioning • All Indexes Case  Tuple Reconstruction
  • 12. Star Schema Benchmark • Fact table contains 17 columns and 60,000,000 rows • 4 dimension tables, biggest one has 80,000 rows • Queries touch 3-4 foreign keys in fact table, 1-2 numeric columns
  • 13. Tuple Size 1 2 3 Column Data TID 1 2 3 TID Column Data 1 2 3 TID Column Data Tuple Header •Queries touch 3-4 foreign keys in fact table, 1-2 numeric columns •Complete fact table takes up ~4 GB (compressed) •Vertically partitioned tables take up 0.7-1.1 GB (compressed)
  • 14. Horizontal Partitioning • Fact table horizontally partitioned on year  Year is an element of the ‘Date’ dimension table  Most queries in SSBM have a predicate on year  Since vertically partitioned tables do not contain the ‘Date’ foreign key, row-store could not similarly partition them
  • 15. What’s Going On? • Vertically Partitioned Case  Tuple Sizes  Horizontal Partitioning • All Indexes Case  Tuple Construction
  • 16. Tuple Construction • Pretty much all queries require a column to be extracted (in the SELECT clause) that has not yet been accessed, e.g.:  SELECT store_name, SUM(revenue) FROM Facts, Stores WHERE fact.store_id = stores.store_id AND stores.area = “NEW ENGLAND” GROUP BY store_name
  • 17. Tuple Construction • Result of lower part of query plan is a set of TIDs that passed all predicates • Need to extract SELECT attributes at these TIDs  BUT: index maps value to TID  You really want to map TID to value (i.e., a vertical partition)   Tuple construction is SLOW
  • 18. So…. • All indexes approach is pretty obviously a poor way to simulate a column-store • Problems with vertical partitioning are NOT fundamental  Store tuple header in a separate partition  Allow virtual TIDs  Allow HP using a foreign key on a different VP • So can row-stores simulate column- stores?
  • 20. Column-Store Experiments • Start with column-store (C-Store) • Remove column-store-specific performance optimizations • End with column-store that behaves like a row-store
  • 21. Compression • Higher data value locality in column-stores  Better ratio  reduced I/O • Can use schemes like run-length encoding  Easy to operate on directly for improved performance ([AMF06]) Q1 Q1 Q1 Q1 Q1 Q1 Q1 Q2 Q2 Q2 Q2 … … Quarter (Q1, 1, 300) Quarter (Q2, 301, 350) (Q3, 651, 500) (Q4, 1151, 600)
  • 22. • Early Materialization: create rows first. But:  Poor memory bandwidth utilization  Lose opportunity for vectorized operation 2 1 3 1 2 3 3 3 7 13 42 80 Construct 2 3 3 3 7 13 42 80 Select + Aggregate 2 1 3 1 4 4 4 4 prodID storeIDcustID price QUERY: SELECT custID,SUM(price) FROM table WHERE (prodID = 4) AND (storeID = 1) AND GROUP BY custID Early vs. Late Materialization 4 4 4 4
  • 23. Other Column-Store Optimizations • Invisible join  Column-store specific join  Optimizations for star schemas  Similar to a semi-join • Block Processing
  • 24. Simplified Version of Results 0.0 10.0 20.0 30.0 40.0 50.0 Time(seconds) Average 4.4 14.9 40.7 Original C-Store C-Store,No Compression C-Store,Early Materialization
  • 25. Conclusion • Might be possible to simulate a row-store in a column-store, BUT:  Need better support for vertical partitioning at the storage layer  Need support for column-specific optimizations at the executer level • Working with HP-Labs to find out
  • 26. Come Join the Yale DB Group!