Slide 1
Unlock Potential
William McKnight
President
McKnight Consulting Group
www.mcknightcg.com
@williammcknight
Analytic Databases Should be Columnar
Copyright © 2021 McKnight Consulting Group Slide 2
William McKnight
President, McKnight Consulting Group
§ Consulted to Pfizer, Scotiabank, Fidelity, TD Ameritrade, Teva Pharmaceuticals, Verizon, and many other Global 1000 companies
§ Frequent keynote speaker and trainer internationally
§ Hundreds of articles, blogs and white papers in publication
§ Focused on delivering business value and solving business problems utilizing proven, streamlined approaches to information management
§ Former Database Engineer, Fortune 50 Information Technology executive and Ernst & Young Entrepreneur of the Year Finalist
§ Owner/consultant: Data strategy and implementation consulting firm
Book: Information Management: Strategies for Gaining a Competitive Advantage with Data (The Savvy Manager’s Guide), by William McKnight
Slide 3
Origins 2005
Slide 4
RDBMS Design over the years
RDBMS design is virtually unchanged, except for parallelism
Hardware, however:
§ Storage capacity has increased tremendously (and become far cheaper)
§ CPU performance has improved
§ Transfer rates and seek times have improved only modestly
Slide 5
Row-Wise DBMS Stores Data in Rows
CustomerID CompanyName ContactFirstName ContactLastName ContactTitle PhoneNumber
1119 m4ii dhamotharan achaiyan solutions architect 91222507176
1120 Aris Doug Johnson Practice Director 206-676-5636
1121 Stolt Offshore MS Ltd Craig Lennox Mr +66 1226 712519
1122 Medtronic, Inc. Mark Kohls Principle Database Administrator 763.516.2557
1123 Beckman Coulter Tim Parsons Business Systems Manager +61 22 996 0963
1124 Banco de Bogotá José Alfredo López Arias Administrador DWH 5713320032
1126 The Boeing Company Mike Roberts Senior Business Process Architect (206)655-7155
1127 IT/1 Consulting Leif B. Soerensen Data Warehouse Consultant +65 26236691
1128 Banco de Bogotá JOSE ALFREDO LOPEZ ARIAS Administrador DWH 5713320032
1133 The HArtford Jimmy Chen Business System Analyst 215-653-2662
1134 CGI Group Terry Petherick Senior Consultant 613-236-2155
1135 Metavante Corporation Ron Kundinger Assistant Vice President 616-577-9227
1138 CP Associates Wilson Mak Consultant 252-92593731
1142 PRSB Ming Long Wu Assistant Administrator 226-2-23931261 ext 719
1143 aft greg tanner cto 303.233.6122
1144 Zamba Solutions Jeff McCall Executive Vice President 602-626-6125
1146 MR Consultancy Mukesh Rughani Mr +66 (0)1379 662219
1147 Intellor Group Robin Martin Project Coordinator 301-202-6766
1148 Banco de Bogotá José Alfredo López Arias Administrador DWH 5713320032
Slide 6
L2 Cache Misses
[Diagram: memory hierarchy, from fastest to slowest: CPU, L1, L2, Memory, Storage]
Slide 7
Data Block Layout
Page Header
Records:
1120 Aris Doug Johnson Practice Director 206-676-5636 doug.johnson@aris.com
1121 Stolt Offshore MS Ltd Craig Lennox Mr +66 1226 71269 craig.lennox@stoltoffshore.com
1122 Medtronic, Inc. Mark Kohls Principle Database Administrator 763.516.2557 mark.kohls@medtronic.com
Row IDs
Page Footer
© McKnight Consulting Group, 2010
Slide 8
Columnar Data Block Layout
Block Header
Records: 1120, 1121, 1122, 1123, 1124, 1125, …
Page Footer
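The two block layouts can be contrasted with a minimal Python sketch. It lays out the same three records from the row-wise data block first row by row and then column by column. The field names and the helpers `row_layout` and `column_layout` are illustrative assumptions, not part of any particular DBMS.

```python
# Three records from the row-wise data block on the earlier slide.
# Field names are illustrative labels, not an actual schema.
FIELDS = ("customer_id", "company", "first_name", "last_name")
RECORDS = [
    (1120, "Aris", "Doug", "Johnson"),
    (1121, "Stolt Offshore MS Ltd", "Craig", "Lennox"),
    (1122, "Medtronic, Inc.", "Mark", "Kohls"),
]


def row_layout(records):
    """Row-wise block: all fields of record 1, then all fields of record 2, ..."""
    return [value for record in records for value in record]


def column_layout(records):
    """Columnar blocks: one list per column, values kept in record order."""
    return {name: [record[i] for record in records] for i, name in enumerate(FIELDS)}


row_block = row_layout(RECORDS)
column_blocks = column_layout(RECORDS)

# Reading only the customer IDs touches one columnar block,
# but has to step over every other field in the row-wise block.
print(column_blocks["customer_id"])   # [1120, 1121, 1122]
print(row_block[:4])                  # first record, all fields together
```

Because values stay in record order within each column, position i in every column block still refers to the same record.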
Slide 9
Traditional databases
Date Store # State Class Sales Category
3/1/21 32 NY A 6 Gen
3/1/21 35 CT A 9 Spec
3/1/21 36 CT C 11 Gen
3/1/21 39 SD D 8 Gen
3/1/21 42 KY A 5 Spec
3/1/21 43 VT C 14 Spec
3/1/21 47 GA A 31 Gen
3/1/21 51 MD A 4 Sub
3/1/21 55 DC D 16 Gen
3/1/21 59 NY B 7 Gen
3/1/21 62 NJ C 9 Spec
Query: calculate the average sales for the “A” stores in “NY”
Traditional approach:
• Data stored by row using data blocks (4K … 32K)
• For queries, select a ‘filter’
- Build a B-tree index for filters
- BUT if the filter is not selective enough, then scan the table
- Go to selected blocks and add up sales numbers
- Randomly distributed data will result in most blocks being read
- Still have to read irrelevant data in each block
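The cost of reading irrelevant data can be illustrated with a small Python sketch over the sample table. It answers the slide's query twice: once scanning whole rows, and once reading only the State, Class and Sales columns. Counting "values read" is a simplifying assumption standing in for block I/O, and the function names are hypothetical.

```python
# Sample rows from the slide: (date, store, state, class, sales, category).
ROWS = [
    ("3/1/21", 32, "NY", "A", 6, "Gen"),
    ("3/1/21", 35, "CT", "A", 9, "Spec"),
    ("3/1/21", 36, "CT", "C", 11, "Gen"),
    ("3/1/21", 39, "SD", "D", 8, "Gen"),
    ("3/1/21", 42, "KY", "A", 5, "Spec"),
    ("3/1/21", 43, "VT", "C", 14, "Spec"),
    ("3/1/21", 47, "GA", "A", 31, "Gen"),
    ("3/1/21", 51, "MD", "A", 4, "Sub"),
    ("3/1/21", 55, "DC", "D", 16, "Gen"),
    ("3/1/21", 59, "NY", "B", 7, "Gen"),
    ("3/1/21", 62, "NJ", "C", 9, "Spec"),
]


def row_store_average(rows, state, store_class):
    """Full scan of a row store: every field of every row is read."""
    values_read = 0
    matching_sales = []
    for row in rows:
        values_read += len(row)          # the whole row comes in with its block
        _, _, row_state, row_class, sales, _ = row
        if row_state == state and row_class == store_class:
            matching_sales.append(sales)
    average = sum(matching_sales) / len(matching_sales) if matching_sales else None
    return average, values_read


def column_store_average(rows, state, store_class):
    """Same query when State, Class and Sales are stored as separate columns."""
    states = [r[2] for r in rows]        # stand-ins for three column files
    classes = [r[3] for r in rows]
    sales = [r[4] for r in rows]
    values_read = len(states) + len(classes) + len(sales)
    matching = [s for st, c, s in zip(states, classes, sales)
                if st == state and c == store_class]
    average = sum(matching) / len(matching) if matching else None
    return average, values_read


print(row_store_average(ROWS, "NY", "A"))     # (6.0, 66)
print(column_store_average(ROWS, "NY", "A"))  # (6.0, 33)
```

Both scans return the same answer. The difference is that the row scan reads all six fields of each row, while the column scan reads only the three columns the query names.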
Slide 10
Mixing Columns in Containers
Slide 11
Vertical Partitioning of Data
Columnar: columns are stored independently
Date Store # State Class Sales Category
3/1/13 32 NY A 6 Gen
3/1/13 35 CT A 9 Spec
3/1/13 36 CT C 11 Gen
3/1/13 39 SD D 8 Gen
3/1/13 42 KY A 5 Spec
3/1/13 43 VT C 14 Spec
3/1/13 47 GA A 31 Gen
3/1/13 51 MD A 4 Sub
3/1/13 55 DC D 16 Gen
3/1/13 59 NY B 7 Gen
3/1/13 62 NJ C 9 Spec
Benefits:
• Consistent data types are easy to compress
• Resulting storage size is typically less than 50% of the size of the raw data
Slide 12
Columnar Compression
• Positional Representation
• Run-Length Encoding
• Dictionary Encoding
• Delta from Median
• NULL suppression and trimming of leading or trailing zeros or blanks
• UTF8 Compression
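As one illustration from the list above, delta-from-median encoding can be sketched in Python as follows. The slides do not specify the exact format a given engine uses, so this is an assumed minimal form: one base value (the median) plus a per-entry difference. The function names are hypothetical.

```python
from statistics import median_low


def delta_from_median_encode(values):
    """Store one base value (the median) plus a small delta per entry."""
    base = median_low(values)            # median_low always returns an element of the data
    return base, [v - base for v in values]


def delta_from_median_decode(base, deltas):
    """Rebuild the original column from the base value and the deltas."""
    return [base + d for d in deltas]


# Sales column from the sample table on the earlier slides.
sales = [6, 9, 11, 8, 5, 14, 31, 4, 16, 7, 9]
base, deltas = delta_from_median_encode(sales)
print(base)     # 9
print(deltas)   # [-3, 0, 2, -1, -4, 5, 22, -5, 7, -2, 0]
assert delta_from_median_decode(base, deltas) == sales
```

The benefit grows when values cluster tightly around the median, because the deltas then fit in fewer bits than the original values.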
Slide 13
Run-Length
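Uncompressed columns (sample rows):
Qtr Store# Sales
Q1 32 6
Q1 35 9
Q1 36 11
Q1 39 8
Q1 42 5
Q1 43 14
Q2 32 31
Q2 35 4
Q2 36 16
Q2 39 7
Q2 42 9
Run-length encoded Qtr column, shown as (Value, StartPosition, EndPosition):
Q1 1 500
Q2 501 999
Q3 1000 1498
Equivalent (Value, StartPosition, Count) triples: (Q1, 1, 500), (Q2, 501, 499), (Q3, 1000, 499)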
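A minimal Python sketch of run-length encoding follows. It produces (value, start position, count) triples with 1-based positions. The run lengths of 500, 499 and 499 rows are an interpretation of the slide's numbers, read here as position ranges 1 to 500, 501 to 999 and 1000 to 1498.

```python
from itertools import groupby


def run_length_encode(column):
    """Return (value, start_position, count) triples with 1-based positions."""
    runs = []
    position = 1
    for value, group in groupby(column):
        count = sum(1 for _ in group)
        runs.append((value, position, count))
        position += count
    return runs


def run_length_decode(runs):
    """Expand the triples back into the original column."""
    return [value for value, _, count in runs for _ in range(count)]


# Qtr column sized like the slide's example: 500 Q1 rows, then 499 Q2, then 499 Q3.
qtr = ["Q1"] * 500 + ["Q2"] * 499 + ["Q3"] * 499
runs = run_length_encode(qtr)
print(runs)  # [('Q1', 1, 500), ('Q2', 501, 499), ('Q3', 1000, 499)]
assert run_length_decode(runs) == qtr
```

Three triples stand in for 1,498 stored values, which is why run-length encoding works best on sorted or low-cardinality columns.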
Slide 14
Dictionary Encoding Example
Columns: original data value, original size* (bytes), compressed value, new size (bytes)
England 30 0 1
England 30 0 1
United States of America 30 1 1
United States of America 30 1 1
Japan 30 2 1
Argentina 30 3 1
Sri Lanka 30 4 1
Japan 30 2 1
United States of America 30 1 1
Totals 270 9
* Fixed length, 30 bytes per value
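The table above can be reproduced with a short Python sketch of dictionary encoding. Codes are assigned in first-seen order, which matches the slide's values. The one-byte code width is an assumption that holds for up to 256 distinct values, and `dictionary_encode` is a hypothetical helper name.

```python
def dictionary_encode(values):
    """Replace each distinct value with a small integer code, in first-seen order."""
    dictionary = {}
    codes = []
    for value in values:
        code = dictionary.setdefault(value, len(dictionary))
        codes.append(code)
    return dictionary, codes


countries = [
    "England", "England",
    "United States of America", "United States of America",
    "Japan", "Argentina", "Sri Lanka", "Japan",
    "United States of America",
]
dictionary, codes = dictionary_encode(countries)

FIXED_WIDTH = 30   # bytes per original value, as stated on the slide
CODE_WIDTH = 1     # assumption: one byte per code, enough for up to 256 distinct values

print(codes)                          # [0, 0, 1, 1, 2, 3, 4, 2, 1]
print(len(countries) * FIXED_WIDTH)   # 270
print(len(codes) * CODE_WIDTH)        # 9
```

These totals count only the codes. The dictionary itself (five entries here) also has to be stored once, so the saving grows with the number of rows per distinct value.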
Slide 15
Compression Applied
Slide 16
Materialization Strategies
Function of ‘projection’
§ Row-stores = removes unneeded columns from the result set
§ Column-stores = decides when to GLUE columns back together into rows
Early Materialization
§ Construct rows before processing
§ Decompress all compressed columns first
Late Materialization
§ Wait until the end of the operation to construct rows
Slide 17
Late Materialization
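SELECT custID, price
FROM Sales
WHERE (prodID = 4) AND (storeID = 1)
prodID column (run-length encoded): (4,1,4)
storeID column: 2, 1, 3, 1
Select prodID = 4 -> position bitmap 1, 1, 1, 1
Select storeID = 1 -> position bitmap 0, 1, 0, 1
AND -> positions 2 and 4 qualify
custID values at qualifying positions: 3, 3
price values at qualifying positions: 13, 80
Construct -> result rows (3, 13) and (3, 80)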
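The late materialization flow on this slide can be sketched in Python. The prodID and storeID columns follow the slide. The slide shows custID and price only at the qualifying positions, so the other entries below are made-up filler and are labeled as such. The helpers `select_rle` and `select_plain` are hypothetical names.

```python
# Columns of the Sales table, one list per column (positions are 1-based on the slide).
# prodID and storeID follow the slide. The custID and price entries at positions
# 1 and 3 are placeholders (assumptions); only positions 2 and 4 appear on the slide.
prod_id_rle = (4, 1, 4)            # (value, start position, count)
store_id = [2, 1, 3, 1]
cust_id = [0, 3, 0, 3]             # positions 1 and 3 are made-up filler
price = [0, 13, 0, 80]             # positions 1 and 3 are made-up filler


def select_rle(run, wanted, length):
    """Evaluate `column = wanted` directly on a run-length-encoded column."""
    value, start, count = run
    return [1 if (value == wanted and start <= pos < start + count) else 0
            for pos in range(1, length + 1)]


def select_plain(column, wanted):
    """Evaluate `column = wanted` on an uncompressed column."""
    return [1 if v == wanted else 0 for v in column]


n = len(store_id)
prod_bitmap = select_rle(prod_id_rle, 4, n)      # [1, 1, 1, 1]
store_bitmap = select_plain(store_id, 1)         # [0, 1, 0, 1]
both = [a & b for a, b in zip(prod_bitmap, store_bitmap)]

# Late materialization: only now fetch custID and price, and only at matching positions.
result = [(cust_id[i], price[i]) for i, bit in enumerate(both) if bit]
print(result)   # [(3, 13), (3, 80)]
```

The prodID predicate is evaluated without decompressing the column, and no row is constructed until the two bitmaps have been combined.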
Slide 18
Workload Splitting
Same data in both structures
Optimizer or user determines which to use
Row-based: [Figure: the customer table from Slide 5, stored by row]
Columnar: [Figure: the same customer table, repeated once per column]
Slide 19
Benchmark
SQLite for row-oriented
DuckDB for columnar
Slide 20
Test 1 : Insert 100000 high cardinality customers
CREATE TABLE customer ( id INTEGER PRIMARY KEY, lastname VARCHAR(20), firstname VARCHAR(30),
  street VARCHAR(30), city VARCHAR(20), state VARCHAR(2), zip VARCHAR(10), country VARCHAR(20),
  phone VARCHAR(10) )
First 10 rows:
0|Jordan|Katherine|5832 Degan St|Freyer|AZ|86285|USA|(691)551-1092
1|Andrade|Sterling|1047 Clark St|Michaels|MT|83750|USA|(665)579-6921
2|Frederick|John|5807 Travis St|Jones|AK|95733|USA|(896)790-5223
3|Binette|Jimmy|8629 Kester St|Booker|LA|62854|USA|(569)203-5537
4|Caswell|Stefanie|4165 Green St|Champagne|TN|11565|USA|(926)189-1496
5|Palmer|Neil|3340 Mohabir St|Callahan|IN|49647|USA|(986)595-8182
6|Silva|Carol|8553 Hamilton St|Lanzi|GA|93518|USA|(238)814-9708
7|Folkers|Robert|1984 Beebe St|Sprenger|OK|06495|USA|(488)334-2533
8|Moultrie|Bernard|14 Armstrong St|Taus|NV|61668|USA|(357)688-8420
9|King|Giuseppe|5864 Goede St|White|TN|26195|USA|(651)345-7210
Row -> 100000 rows inserted in 80920.816 ms Average : 809.208 μs
Columnar -> 100000 rows inserted in 110879.347 ms Average : 1108.793 μs
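The deck does not show the harness that produced these timings. A minimal sketch of how such an insert test could be timed on the row-oriented side, using Python's standard `sqlite3` module, is given below. The in-memory database, the one-statement-per-row loop, the row count of 1000 and the generated values in `make_customer` are all assumptions. A DuckDB run would need the third-party `duckdb` package and is not shown.

```python
import sqlite3
import time

CREATE_CUSTOMER = """
CREATE TABLE customer (
    id INTEGER PRIMARY KEY, lastname VARCHAR(20), firstname VARCHAR(30),
    street VARCHAR(30), city VARCHAR(20), state VARCHAR(2),
    zip VARCHAR(10), country VARCHAR(20), phone VARCHAR(10))
"""


def make_customer(i):
    """Hypothetical row generator; the deck does not show how its data was produced."""
    return (i, f"Last{i}", f"First{i}", f"{i} Main St", f"City{i % 100}",
            "AZ", f"{i % 100000:05d}", "USA", "(555)555-0100")


def time_inserts(conn, n_rows):
    """Insert n_rows one statement at a time and return (total_ms, average_us)."""
    conn.execute(CREATE_CUSTOMER)
    start = time.perf_counter()
    for i in range(n_rows):
        conn.execute("INSERT INTO customer VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
                     make_customer(i))
    conn.commit()
    total_ms = (time.perf_counter() - start) * 1000.0
    return total_ms, total_ms * 1000.0 / n_rows


conn = sqlite3.connect(":memory:")
total_ms, average_us = time_inserts(conn, 1000)
row_count = conn.execute("SELECT count(*) FROM customer").fetchone()[0]
print(f"Row -> {row_count} rows inserted in {total_ms:.3f} ms Average : {average_us:.3f} μs")
```

Absolute numbers from a sketch like this will not match the slide, since they depend on hardware, on-disk versus in-memory storage, and how often the transaction is committed.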
Slide 21
Test 2 : Insert 10000 low cardinality items
CREATE TABLE item ( id INTEGER PRIMARY KEY, name VARCHAR(30), department VARCHAR(30),
  status VARCHAR(1), price DECIMAL(8,2) )
First 10 rows...
0|Bunker|Clothing|B|789.64
1|Creighton|Clothing|B|390.59
2|Cole|Clothing|A|625.07
3|Jantzen|House goods|B|827.39
4|Lopez|Clothing|B|194.08
5|Dery|House goods|B|199.29
6|Flores|Electronics|B|552.61
7|Crigger|Clothing|B|172.15
8|Kidder|Clothing|B|30.97
9|Marion|Clothing|A|228.73
Row -> 10000 rows inserted in 7379.863 ms Average : 737.986 μs
Columnar -> 10000 rows inserted in 7930.747 ms Average : 793.075 μs
Slide 22
Test 3 : Insert 1000000 Narrow Fact Table Data
CREATE TABLE sales ( id INTEGER PRIMARY KEY, customerid INTEGER, itemid INTEGER )
First 10 rows...
0|15340|1000
1|15443|9490
2|37370|1805
3|65986|2084
4|69930|7926
5|89665|3421
6|49097|5176
7|16616|6072
8|39226|5486
9|64665|3398
Row -> 1000000 rows inserted in 519258.013 ms Average : 519.258 μs
Columnar -> 1000000 rows inserted in 535044.747 ms Average : 535.045 μs
Slide 23
Test 4 : Single Table Select
SELECT lastname FROM customer WHERE state='AL';
SELECT lastname FROM customer WHERE state='AK';
SELECT lastname FROM customer WHERE state='AZ';
SELECT lastname FROM customer WHERE state='AR';
SELECT lastname FROM customer WHERE state='CA';
SELECT lastname FROM customer WHERE state='CO';
SELECT lastname FROM customer WHERE state='CT';
SELECT lastname FROM customer WHERE state='DE';
SELECT lastname FROM customer WHERE state='FL';
SELECT lastname FROM customer WHERE state='GA';
Row -> 50 queries in 392.256 ms Average : 7845.130 μs
Columnar -> 50 queries in 165.821 ms Average : 3316.412 μs
Slide 24
Test 5 : Single Table Aggregation
SELECT department, sum(price) FROM item GROUP BY department;
SELECT status, sum(price) FROM item GROUP BY status;
SELECT substring(name, 1, 1), sum(price) FROM item GROUP BY substring(name, 1, 1);
Row -> 3 queries in 7.833 ms Average : 2611.001 μs
Columnar -> 3 queries in 2.115 ms Average : 704.924 μs
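To make the three aggregation queries concrete, the sketch below runs them with Python's standard `sqlite3` module against only the first 10 item rows listed for Test 2. It is a correctness illustration on a tiny sample, not a reproduction of the benchmark. `substr()` replaces `substring()` as a portability assumption.

```python
import sqlite3

# First 10 item rows as listed for Test 2.
ITEMS = [
    (0, "Bunker", "Clothing", "B", 789.64),
    (1, "Creighton", "Clothing", "B", 390.59),
    (2, "Cole", "Clothing", "A", 625.07),
    (3, "Jantzen", "House goods", "B", 827.39),
    (4, "Lopez", "Clothing", "B", 194.08),
    (5, "Dery", "House goods", "B", 199.29),
    (6, "Flores", "Electronics", "B", 552.61),
    (7, "Crigger", "Clothing", "B", 172.15),
    (8, "Kidder", "Clothing", "B", 30.97),
    (9, "Marion", "Clothing", "A", 228.73),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name VARCHAR(30), "
             "department VARCHAR(30), status VARCHAR(1), price DECIMAL(8,2))")
conn.executemany("INSERT INTO item VALUES (?, ?, ?, ?, ?)", ITEMS)

by_department = dict(conn.execute(
    "SELECT department, sum(price) FROM item GROUP BY department"))
by_status = dict(conn.execute(
    "SELECT status, sum(price) FROM item GROUP BY status"))
# substr() is used here because older SQLite builds lack the substring() alias.
by_initial = dict(conn.execute(
    "SELECT substr(name, 1, 1), sum(price) FROM item GROUP BY substr(name, 1, 1)"))

for label, totals in (("department", by_department), ("status", by_status),
                      ("initial", by_initial)):
    print(label, {key: round(total, 2) for key, total in totals.items()})
```

Each query reads only one grouping column plus price, which is the access pattern a column store is built for.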
Slide 25
Test 6 : Analytics Join Aggregation
SELECT department, sum(price) FROM customer c, item i, sales s
  WHERE s.customerid = c.id AND s.itemid = i.id GROUP BY department;
SELECT status, sum(price) FROM customer c, item i, sales s
  WHERE s.customerid = c.id AND s.itemid = i.id GROUP BY status;
SELECT city, sum(price) FROM customer c, item i, sales s
  WHERE s.customerid = c.id AND s.itemid = i.id GROUP BY city;
SELECT state, sum(price) FROM customer c, item i, sales s
  WHERE s.customerid = c.id AND s.itemid = i.id GROUP BY state;
SELECT substring(lastname, 1, 1), sum(price) FROM customer c, item i, sales s
  WHERE s.customerid = c.id AND s.itemid = i.id GROUP BY substring(lastname, 1, 1);
Row -> 5 queries in 8744.340 ms Average : 1748867.941 μs
Columnar -> 5 queries in 1209.917 ms Average : 241983.461 μs
Slide 26
Benchmark Conclusions
Columnar is a little slower to load (about 3% to 37% slower across Tests 1-3), but much faster on queries
2.4x faster on simple single-table selects (Test 4)
3.7x on simple aggregations (Test 5)
7.2x on an analytics query with a 3-table join (Test 6)
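The speedup figures can be checked directly from the elapsed times reported in Tests 4 to 6. A short Python sketch of that arithmetic follows. The dictionary keys are descriptive labels chosen here.

```python
# Total elapsed times in milliseconds, copied from the Test 4-6 result lines:
# (row-oriented, columnar).
TIMINGS_MS = {
    "Test 4 (single table select)": (392.256, 165.821),
    "Test 5 (single table aggregation)": (7.833, 2.115),
    "Test 6 (join aggregation)": (8744.340, 1209.917),
}

speedups = {name: row_ms / columnar_ms
            for name, (row_ms, columnar_ms) in TIMINGS_MS.items()}

for name, ratio in speedups.items():
    print(f"{name}: columnar is {ratio:.2f}x faster")
```

The ratio grows from Test 4 to Test 6, that is, as the queries move from a filtered select to an aggregation over a three-table join.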
Slide 27
Summary: Columnar Databases
§ Is an alternative to row storage
§ Stores each container independently
§ Addresses idle CPUs and disk bottlenecks
§ Is great for compression
§ Is best when there is a lot of data, long rows, and when you can isolate the loads
§ Is great for high column selectivity queries

More Related Content

What's hot

Data Management Meets Human Management - Why Words Matter
Data Management Meets Human Management - Why Words MatterData Management Meets Human Management - Why Words Matter
Data Management Meets Human Management - Why Words MatterDATAVERSITY
 
Necessary Prerequisites to Data Success
Necessary Prerequisites to Data SuccessNecessary Prerequisites to Data Success
Necessary Prerequisites to Data SuccessDATAVERSITY
 
How to Strengthen Enterprise Data Governance with Data Quality
How to Strengthen Enterprise Data Governance with Data QualityHow to Strengthen Enterprise Data Governance with Data Quality
How to Strengthen Enterprise Data Governance with Data QualityDATAVERSITY
 
Data Systems Integration & Business Value PT. 3: Warehousing
Data Systems Integration & Business Value PT. 3: Warehousing Data Systems Integration & Business Value PT. 3: Warehousing
Data Systems Integration & Business Value PT. 3: Warehousing Data Blueprint
 
Business Value Through Reference and Master Data Strategies
Business Value Through Reference and Master Data StrategiesBusiness Value Through Reference and Master Data Strategies
Business Value Through Reference and Master Data StrategiesDATAVERSITY
 
A Modern Approach to DI & MDM
A Modern Approach to DI & MDMA Modern Approach to DI & MDM
A Modern Approach to DI & MDMDATAVERSITY
 
Convincing Stakeholders Data Governance Is Essential
Convincing Stakeholders Data Governance Is EssentialConvincing Stakeholders Data Governance Is Essential
Convincing Stakeholders Data Governance Is EssentialDATAVERSITY
 
DataEd Slides: Unlock Business Value Using Reference and Master Data Manageme...
DataEd Slides: Unlock Business Value Using Reference and Master Data Manageme...DataEd Slides: Unlock Business Value Using Reference and Master Data Manageme...
DataEd Slides: Unlock Business Value Using Reference and Master Data Manageme...DATAVERSITY
 
Data-Ed Online: Approaching Data Quality
Data-Ed Online: Approaching Data QualityData-Ed Online: Approaching Data Quality
Data-Ed Online: Approaching Data QualityDATAVERSITY
 
Data-Ed: Show Me the Money: The Business Value of Data and ROI
Data-Ed: Show Me the Money: The Business Value of Data and ROIData-Ed: Show Me the Money: The Business Value of Data and ROI
Data-Ed: Show Me the Money: The Business Value of Data and ROIData Blueprint
 
The Value of Metadata
The Value of MetadataThe Value of Metadata
The Value of MetadataDATAVERSITY
 
Data-Ed Webinar: Data Quality Success Stories
Data-Ed Webinar: Data Quality Success StoriesData-Ed Webinar: Data Quality Success Stories
Data-Ed Webinar: Data Quality Success StoriesDATAVERSITY
 
Data-Ed Webinar: Data Modeling Fundamentals
Data-Ed Webinar: Data Modeling FundamentalsData-Ed Webinar: Data Modeling Fundamentals
Data-Ed Webinar: Data Modeling FundamentalsDATAVERSITY
 
Data Management is Data Governance
Data Management is Data GovernanceData Management is Data Governance
Data Management is Data GovernanceDATAVERSITY
 
Data-Ed Webinar: Best Practices with the DMM
Data-Ed Webinar: Best Practices with the DMMData-Ed Webinar: Best Practices with the DMM
Data-Ed Webinar: Best Practices with the DMMDATAVERSITY
 
Data-Ed Online: Unlock Business Value through Reference & MDM
Data-Ed Online: Unlock Business Value through Reference & MDMData-Ed Online: Unlock Business Value through Reference & MDM
Data-Ed Online: Unlock Business Value through Reference & MDMDATAVERSITY
 
DAS Slides: Graph Databases — Practical Use Cases
DAS Slides: Graph Databases — Practical Use CasesDAS Slides: Graph Databases — Practical Use Cases
DAS Slides: Graph Databases — Practical Use CasesDATAVERSITY
 
Slides: How to Avoid the 10 Big Data Analytics Blunders — Best Practices for ...
Slides: How to Avoid the 10 Big Data Analytics Blunders — Best Practices for ...Slides: How to Avoid the 10 Big Data Analytics Blunders — Best Practices for ...
Slides: How to Avoid the 10 Big Data Analytics Blunders — Best Practices for ...DATAVERSITY
 
Data Governance and Metadata Management
Data Governance and Metadata ManagementData Governance and Metadata Management
Data Governance and Metadata Management DATAVERSITY
 
DataEd Slides: Growing Practical Data Governance Programs
DataEd Slides: Growing Practical Data Governance ProgramsDataEd Slides: Growing Practical Data Governance Programs
DataEd Slides: Growing Practical Data Governance ProgramsDATAVERSITY
 

What's hot (20)

Data Management Meets Human Management - Why Words Matter
Data Management Meets Human Management - Why Words MatterData Management Meets Human Management - Why Words Matter
Data Management Meets Human Management - Why Words Matter
 
Necessary Prerequisites to Data Success
Necessary Prerequisites to Data SuccessNecessary Prerequisites to Data Success
Necessary Prerequisites to Data Success
 
How to Strengthen Enterprise Data Governance with Data Quality
How to Strengthen Enterprise Data Governance with Data QualityHow to Strengthen Enterprise Data Governance with Data Quality
How to Strengthen Enterprise Data Governance with Data Quality
 
Data Systems Integration & Business Value PT. 3: Warehousing
Data Systems Integration & Business Value PT. 3: Warehousing Data Systems Integration & Business Value PT. 3: Warehousing
Data Systems Integration & Business Value PT. 3: Warehousing
 
Business Value Through Reference and Master Data Strategies
Business Value Through Reference and Master Data StrategiesBusiness Value Through Reference and Master Data Strategies
Business Value Through Reference and Master Data Strategies
 
A Modern Approach to DI & MDM
A Modern Approach to DI & MDMA Modern Approach to DI & MDM
A Modern Approach to DI & MDM
 
Convincing Stakeholders Data Governance Is Essential
Convincing Stakeholders Data Governance Is EssentialConvincing Stakeholders Data Governance Is Essential
Convincing Stakeholders Data Governance Is Essential
 
DataEd Slides: Unlock Business Value Using Reference and Master Data Manageme...
DataEd Slides: Unlock Business Value Using Reference and Master Data Manageme...DataEd Slides: Unlock Business Value Using Reference and Master Data Manageme...
DataEd Slides: Unlock Business Value Using Reference and Master Data Manageme...
 
Data-Ed Online: Approaching Data Quality
Data-Ed Online: Approaching Data QualityData-Ed Online: Approaching Data Quality
Data-Ed Online: Approaching Data Quality
 
Data-Ed: Show Me the Money: The Business Value of Data and ROI
Data-Ed: Show Me the Money: The Business Value of Data and ROIData-Ed: Show Me the Money: The Business Value of Data and ROI
Data-Ed: Show Me the Money: The Business Value of Data and ROI
 
The Value of Metadata
The Value of MetadataThe Value of Metadata
The Value of Metadata
 
Data-Ed Webinar: Data Quality Success Stories
Data-Ed Webinar: Data Quality Success StoriesData-Ed Webinar: Data Quality Success Stories
Data-Ed Webinar: Data Quality Success Stories
 
Data-Ed Webinar: Data Modeling Fundamentals
Data-Ed Webinar: Data Modeling FundamentalsData-Ed Webinar: Data Modeling Fundamentals
Data-Ed Webinar: Data Modeling Fundamentals
 
Data Management is Data Governance
Data Management is Data GovernanceData Management is Data Governance
Data Management is Data Governance
 
Data-Ed Webinar: Best Practices with the DMM
Data-Ed Webinar: Best Practices with the DMMData-Ed Webinar: Best Practices with the DMM
Data-Ed Webinar: Best Practices with the DMM
 
Data-Ed Online: Unlock Business Value through Reference & MDM
Data-Ed Online: Unlock Business Value through Reference & MDMData-Ed Online: Unlock Business Value through Reference & MDM
Data-Ed Online: Unlock Business Value through Reference & MDM
 
DAS Slides: Graph Databases — Practical Use Cases
DAS Slides: Graph Databases — Practical Use CasesDAS Slides: Graph Databases — Practical Use Cases
DAS Slides: Graph Databases — Practical Use Cases
 
Slides: How to Avoid the 10 Big Data Analytics Blunders — Best Practices for ...
Slides: How to Avoid the 10 Big Data Analytics Blunders — Best Practices for ...Slides: How to Avoid the 10 Big Data Analytics Blunders — Best Practices for ...
Slides: How to Avoid the 10 Big Data Analytics Blunders — Best Practices for ...
 
Data Governance and Metadata Management
Data Governance and Metadata ManagementData Governance and Metadata Management
Data Governance and Metadata Management
 
DataEd Slides: Growing Practical Data Governance Programs
DataEd Slides: Growing Practical Data Governance ProgramsDataEd Slides: Growing Practical Data Governance Programs
DataEd Slides: Growing Practical Data Governance Programs
 

Similar to Analytic Platforms Should Be Columnar Orientation

Advanced Analytics: Analytic Platforms Should Be Columnar Orientation
Advanced Analytics: Analytic Platforms Should Be Columnar OrientationAdvanced Analytics: Analytic Platforms Should Be Columnar Orientation
Analytic Platforms Should Be Columnar Orientation

  • 1. Analytic Databases Should be Columnar. William McKnight, President, McKnight Consulting Group. www.mcknightcg.com, @williammcknight. Unlock Potential.
  • 2. Copyright © 2021 McKnight Consulting Group. William McKnight, President, McKnight Consulting Group. Consulted to Pfizer, Scotiabank, Fidelity, TD Ameritrade, Teva Pharmaceuticals, Verizon, and many other Global 1000 companies. Frequent keynote speaker and trainer internationally. Hundreds of articles, blogs and white papers in publication. Focused on delivering business value and solving business problems utilizing proven, streamlined approaches to information management. Former Database Engineer, Fortune 50 Information Technology executive and Ernst & Young Entrepreneur of the Year Finalist. Owner/consultant: data strategy and implementation consulting firm. Book cover shown: William McKnight, Information Management: Strategies for Gaining a Competitive Advantage with Data (The Savvy Manager’s Guide).
  • 3. Origins 2005
  • 4. RDBMS Design over the years
      RDBMS design is virtually unchanged, except for parallelism.
      Hardware, however:
      • Storage capacity has increased tremendously (and become far cheaper)
      • CPU performance has improved
      • Transfer rates and seek times have improved only modestly
  • 5. Row-Wise DBMS Stores Data in Rows
      CustomerID CompanyName ContactFirstName ContactLastName ContactTitle PhoneNumber
      1119 m4ii dhamotharan achaiyan solutions architect 91222507176
      1120 Aris Doug Johnson Practice Director 206-676-5636
      1121 Stolt Offshore MS Ltd Craig Lennox Mr +66 1226 712519
      1122 Medtronic, Inc. Mark Kohls Principle Database Administrator 763.516.2557
      1123 Beckman Coulter Tim Parsons Business Systems Manager +61 22 996 0963
      1124 Banco de Bogotá José Alfredo López Arias Administrador DWH 5713320032
      1126 The Boeing Company Mike Roberts Senior Business Process Architect (206)655-7155
      1127 IT/1 Consulting Leif B. Soerensen Data Warehouse Consultant +65 26236691
      1128 Banco de Bogotá JOSE ALFREDO LOPEZ ARIAS Administrador DWH 5713320032
      1133 The HArtford Jimmy Chen Business System Analyst 215-653-2662
      1134 CGI Group Terry Petherick Senior Consultant 613-236-2155
      1135 Metavante Corporation Ron Kundinger Assistant Vice President 616-577-9227
      1138 CP Associates Wilson Mak Consultant 252-92593731
      1142 PRSB Ming Long Wu Assistant Administrator 226-2-23931261 ext 719
      1143 aft greg tanner cto 303.233.6122
      1144 Zamba Solutions Jeff McCall Executive Vice President 602-626-6125
      1146 MR Consultancy Mukesh Rughani Mr +66 (0)1379 662219
      1147 Intellor Group Robin Martin Project Coordinator 301-202-6766
      1148 Banco de Bogotá José Alfredo López Arias Administrador DWH 5713320032
  • 6. L2 Cache Misses (diagram labels: CPU, L1, L2, Memory, Storage)
  • 7. Data Block Layout (© McKnight Consulting Group, 2010)
      Diagram labels: Page Header, Records, Row IDs, Page Footer
      Records shown:
      1120 Aris Doug Johnson Practice Director 206-676-5636 doug.johnson@aris.com
      1121 Stolt Offshore MS Ltd Craig Lennox Mr +66 1226 71269 craig.lennox@stoltoffshore.com
      1122 Medtronic, Inc. Mark Kohls Principle Database Administrator 763.516.2557 mark.kohls@medtronic.com
  • 8. Columnar Data Block Layout
      Diagram labels: Block Header, Records, Page Footer
      Records shown: 1120 1121 1122 1123 1124 1125 …
  • 9. Traditional databases
      Date    Store #  State  Class  Sales  Category
      3/1/21  32       NY     A      6      Gen
      3/1/21  35       CT     A      9      Spec
      3/1/21  36       CT     C      11     Gen
      3/1/21  39       SD     D      8      Gen
      3/1/21  42       KY     A      5      Spec
      3/1/21  43       VT     C      14     Spec
      3/1/21  47       GA     A      31     Gen
      3/1/21  51       MD     A      4      Sub
      3/1/21  55       DC     D      16     Gen
      3/1/21  59       NY     B      7      Gen
      3/1/21  62       NJ     C      9      Spec
      Calculate the average sales for the “A” stores in “NY”.
      Traditional approach:
      • Data stored by row using data blocks (4K … 32K)
      • For queries, select a ‘filter’
        - Build a B-tree index for filters
        - BUT if the filter is not selective enough, then scan the table
        - Go to selected blocks and add up sales numbers
        - Randomly distributed data will result in most blocks being read
        - Still have to read irrelevant data in each block
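The query on slide 9 can be sketched in Python to show why a row layout reads irrelevant data. This is only an illustration under stated assumptions: the function names and the "values touched" counter are hypothetical stand-ins for block I/O, not anything from the deck, and no index is modeled.

```python
from statistics import mean

# Rows copied from slide 9: (date, store, state, class, sales, category).
ROWS = [
    ("3/1/21", 32, "NY", "A", 6, "Gen"),
    ("3/1/21", 35, "CT", "A", 9, "Spec"),
    ("3/1/21", 36, "CT", "C", 11, "Gen"),
    ("3/1/21", 39, "SD", "D", 8, "Gen"),
    ("3/1/21", 42, "KY", "A", 5, "Spec"),
    ("3/1/21", 43, "VT", "C", 14, "Spec"),
    ("3/1/21", 47, "GA", "A", 31, "Gen"),
    ("3/1/21", 51, "MD", "A", 4, "Sub"),
    ("3/1/21", 55, "DC", "D", 16, "Gen"),
    ("3/1/21", 59, "NY", "B", 7, "Gen"),
    ("3/1/21", 62, "NJ", "C", 9, "Spec"),
]
NAMES = ("date", "store", "state", "class", "sales", "category")


def avg_sales_row_store(rows, state, store_class):
    """Full scan of a row layout: every field of every row is read."""
    touched = 0
    matches = []
    for row in rows:
        touched += len(row)  # the whole row comes along with its block
        if row[2] == state and row[3] == store_class:
            matches.append(row[4])
    return mean(matches), touched


def to_columns(rows):
    """Pivot the rows into one list per column (hypothetical helper)."""
    return {name: [r[i] for r in rows] for i, name in enumerate(NAMES)}


def avg_sales_column_store(cols, state, store_class):
    """Scan only the three columns the query mentions."""
    states, classes, sales = cols["state"], cols["class"], cols["sales"]
    touched = len(states) + len(classes) + len(sales)
    matches = [s for st, c, s in zip(states, classes, sales)
               if st == state and c == store_class]
    return mean(matches), touched


row_result = avg_sales_row_store(ROWS, "NY", "A")
col_result = avg_sales_column_store(to_columns(ROWS), "NY", "A")
```

Both functions return the same average; the second element of each result is the count of values touched, which is the quantity the slide's last bullet is about.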
  • 10. Mixing Columns in Containers
  • 11. Vertical Partitioning of Data
      Columnar: columns are stored independently
      Date    Store #  State  Class  Sales  Category
      3/1/13  32       NY     A      6      Gen
      3/1/13  35       CT     A      9      Spec
      3/1/13  36       CT     C      11     Gen
      3/1/13  39       SD     D      8      Gen
      3/1/13  42       KY     A      5      Spec
      3/1/13  43       VT     C      14     Spec
      3/1/13  47       GA     A      31     Gen
      3/1/13  51       MD     A      4      Sub
      3/1/13  55       DC     D      16     Gen
      3/1/13  59       NY     B      7      Gen
      3/1/13  62       NJ     C      9      Spec
      Benefits:
      • Consistent data types are easy to compress
      • Resulting storage size is typically less than 50% of the size of the raw data
  • 12. Columnar Compression
      • Positional Representation
      • Run-Length Encoding
      • Dictionary Encoding
      • Delta from Median
      • NULL and trim leading or trailing zeros or blanks
      • UTF8 Compression
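  • 13. Run-Length
      Original columns:
      Qtr  Store#  Sales
      Q1   32      6
      Q1   35      9
      Q1   36      11
      Q1   39      8
      Q1   42      5
      Q1   43      14
      Q2   32      31
      Q2   35      4
      Q2   36      16
      Q2   39      7
      Q2   42      9
      Encoded Qtr column, labeled (Value, StartPosition, Count) on the slide:
      Q1   1       500
      Q2   501     999
      Q3   1000    1498
      As listed, the third figure in each triple lines up with an end position (Q1 covers positions 1 to 500, Q2 covers 501 to 999) rather than a count.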
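A minimal sketch of run-length encoding for the Qtr column, using the (Value, StartPosition, Count) form named on slide 13. The function names are hypothetical, and positions are assumed to be 1-based as on the slide.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse consecutive repeats into (value, start_position, count)."""
    triples = []
    position = 1  # 1-based, matching the slide's StartPosition
    for value, run in groupby(column):
        count = sum(1 for _ in run)
        triples.append((value, position, count))
        position += count
    return triples


def rle_decode(triples):
    """Expand the triples back into the original column."""
    column = []
    for value, _start, count in triples:
        column.extend([value] * count)
    return column


# The eleven Qtr values visible in the slide's sample rows.
qtr = ["Q1"] * 6 + ["Q2"] * 5
encoded = rle_encode(qtr)
```

Here `encoded` holds two triples in place of eleven values. The encoding only pays off when equal values sit next to each other, which is why sorted or low-cardinality columns benefit most.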
  • 14. Dictionary Encoding Example
      Original data value         Orig. size*  Compressed value  New size (bytes)
      England                     30           0                 1
      England                     30           0                 1
      United States of America    30           1                 1
      United States of America    30           1                 1
      Japan                       30           2                 1
      Argentina                   30           3                 1
      Sri Lanka                   30           4                 1
      Japan                       30           2                 1
      United States of America    30           1                 1
      Totals                      270                            9
      * Fixed length, 30 bytes per value
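The table on slide 14 can be reproduced with a short dictionary-encoding sketch. The helper names are hypothetical. The 30-byte and 1-byte widths come from the slide; like the slide's totals, the size calculation below counts only the column values and leaves out the storage for the dictionary itself.

```python
def dictionary_encode(values):
    """Assign each distinct value a small integer code, in first-seen order."""
    dictionary = {}
    codes = []
    for value in values:
        code = dictionary.setdefault(value, len(dictionary))
        codes.append(code)
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Map the codes back to the original values."""
    reverse = {code: value for value, code in dictionary.items()}
    return [reverse[code] for code in codes]


countries = [
    "England", "England",
    "United States of America", "United States of America",
    "Japan", "Argentina", "Sri Lanka", "Japan",
    "United States of America",
]
dictionary, codes = dictionary_encode(countries)

ORIGINAL_WIDTH = 30  # bytes per fixed-length value, per the slide
CODE_WIDTH = 1       # bytes per code, per the slide
original_bytes = len(countries) * ORIGINAL_WIDTH
encoded_bytes = len(codes) * CODE_WIDTH  # dictionary storage not counted
```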
  • 15. Compression Applied
  • 16. Materialization Strategies
      Function of ‘projection’:
      • Row-stores: removes unneeded columns from the result set
      • Column-stores: decides when to glue columns back together into rows
      Early Materialization:
      • Construct rows before processing
      • Decompress all compressed columns first
      Late Materialization:
      • Wait until the end of the operation to construct rows
  • 17. Late Materialization
      Query: SELECT custID, price FROM Sales WHERE (prodID = 4) AND (storeID = 1)
      prodID column (run-length encoded): (4,1,4)
      storeID column: 2 1 3 1
      Select prodID = 4 gives bitmap: 1 1 1 1
      Select storeID = 1 gives bitmap: 0 1 0 1
      AND of the two bitmaps: 0 1 0 1
      Values fetched at the matching positions: custID 3 3, price 13 80
      Construct: (3, 13), (3, 80)
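The flow on slide 17 can be sketched as follows. The function names are hypothetical, and the slide text only gives custID and price at the two matching positions, so the values at positions 1 and 3 below are made-up filler, flagged in the comments.

```python
# prodID is run-length encoded as (value, start_position, count), per the slide.
PROD_ID_RLE = [(4, 1, 4)]
STORE_ID = [2, 1, 3, 1]
# Positions 2 and 4 come from the slide; positions 1 and 3 are hypothetical filler.
CUST_ID = [1, 3, 2, 3]
PRICE = [5, 13, 7, 80]


def rle_bitmap(triples, length, predicate):
    """Evaluate a predicate on an RLE column without decompressing it."""
    bitmap = [0] * length
    for value, start, count in triples:
        if predicate(value):
            for pos in range(start - 1, start - 1 + count):
                bitmap[pos] = 1
    return bitmap


def plain_bitmap(column, predicate):
    """Evaluate a predicate on an uncompressed column."""
    return [1 if predicate(v) else 0 for v in column]


def late_materialize():
    """Filter on positions first; build (custID, price) rows only at the end."""
    prod_bits = rle_bitmap(PROD_ID_RLE, len(STORE_ID), lambda v: v == 4)
    store_bits = plain_bitmap(STORE_ID, lambda v: v == 1)
    both = [a & b for a, b in zip(prod_bits, store_bits)]
    positions = [i for i, bit in enumerate(both) if bit]
    rows = [(CUST_ID[i], PRICE[i]) for i in positions]
    return prod_bits, store_bits, both, rows


prod_bits, store_bits, both, rows = late_materialize()
```

Only two custID values and two price values are ever read, and the prodID predicate is applied once per run rather than once per row. Early materialization would instead stitch all four rows together before filtering.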
  • 18. Workload Splitting
      • Same data in both structures
      • Optimizer or user determines which to use
      Row-based:
      CustomerID CompanyName ContactFirstName ContactLastName ContactTitle PhoneNumber
      1119 m4ii dhamotharan achaiyan solutions architect 91222507176
      1120 Aris Doug Johnson Practice Director 206-676-5636
      1121 Stolt Offshore MS Ltd Craig Lennox Mr +66 1226 712519
      1122 Medtronic, Inc. Mark Kohls Principle Database Administrator 763.516.2557
      1123 Beckman Coulter Tim Parsons Business Systems Manager +61 22 996 0963
      1124 Banco de Bogotá José Alfredo López Arias Administrador DWH 5713320032
      1126 The Boeing Company Mike Roberts Senior Business Process Architect (206)655-7155
      1127 IT/1 Consulting Leif B. Soerensen Data Warehouse Consultant +65 26236691
      1128 Banco de Bogotá JOSE ALFREDO LOPEZ ARIAS Administrador DWH 5713320032
      1133 The HArtford Jimmy Chen Business System Analyst 215-653-2662
      1134 CGI Group Terry Petherick Senior Consultant 613-236-2155
      1135 Metavante Corporation Ron Kundinger Assistant Vice President 616-577-9227
      1138 CP Associates Wilson Mak Consultant 252-92593731
      1142 PRSB Ming Long Wu Assistant Administrator 226-2-23931261 ext 719
      1143 aft greg tanner cto 303.233.6122
      1144 Zamba Solutions Jeff McCall Executive Vice President 602-626-6125
      1146 MR Consultancy Mukesh Rughani Mr +66 (0)1379 662219
      1147 Intellor Group Robin Martin Project Coordinator 301-202-6766
      1148 Banco de Bogotá José Alfredo López Arias Administrador DWH 5713320032
      Columnar:
      CustomerID CompanyName ContactFirstName ContactLastName ContactTitle PhoneNumber
      1119 m4ii dhamotharan achaiyan solutions architect 91222507176
      1120 Aris Doug Johnson Practice Director 206-676-5636
      1121 Stolt Offshore MS Ltd Craig Lennox Mr +66 1226 712519
      1122 Medtronic, Inc. Mark Kohls Principle Database Administrator 763.516.2557
      1123 Beckman Coulter Tim Parsons Business Systems Manager +61 22 996 0963
      1124 Banco de Bogotá José Alfredo López Arias Administrador DWH 5713320032
      1126 The Boeing Company Mike Roberts Senior Business Process Architect (206)655-7155
      1127 IT/1 Consulting Leif B. 
Soerensen Data Warehouse Consultant +65 26236691 1128 Banco de Bogotá JOSE ALFREDO LOPEZ ARIAS Administrador DWH 5713320032 1133 The HArtford Jimmy Chen Business System Analyst 215-653-2662 1134 CGI Group Terry Petherick Senior Consultant 613-236-2155 1135 Metavante Corporation Ron Kundinger Assistant Vice President 616-577-9227 1138 CP Associates Wilson Mak Consultant 252-92593731 1142 PRSB Ming Long Wu Assistant Administrator 226-2-23931261 ext 719 1143 aft greg tanner cto 303.233.6122 1144 Zamba Solutions Jeff McCall Executive Vice President 602-626-6125 1146 MR Consultancy Mukesh Rughani Mr +66 (0)1379 662219 1147 Intellor Group Robin Martin Project Coordinator 301-202-6766 1148 Banco de Bogotá José Alfredo López Arias Administrador DWH 5713320032 CustomerID CompanyName ContactFirstName ContactLastName ContactTitle PhoneNumber 1119 m4ii dhamotharan achaiyan solutions architect 91222507176 1120 Aris Doug Johnson Practice Director 206-676-5636 1121 Stolt Offshore MS Ltd Craig Lennox Mr +66 1226 712519 1122 Medtronic, Inc. Mark Kohls Principle Database Administrator 763.516.2557 1123 Beckman Coulter Tim Parsons Business Systems Manager +61 22 996 0963 1124 Banco de Bogotá José Alfredo López Arias Administrador DWH 5713320032 1126 The Boeing Company Mike Roberts Senior Business Process Architect (206)655-7155 1127 IT/1 Consulting Leif B. 
Soerensen Data Warehouse Consultant +65 26236691 1128 Banco de Bogotá JOSE ALFREDO LOPEZ ARIAS Administrador DWH 5713320032 1133 The HArtford Jimmy Chen Business System Analyst 215-653-2662 1134 CGI Group Terry Petherick Senior Consultant 613-236-2155 1135 Metavante Corporation Ron Kundinger Assistant Vice President 616-577-9227 1138 CP Associates Wilson Mak Consultant 252-92593731 1142 PRSB Ming Long Wu Assistant Administrator 226-2-23931261 ext 719 1143 aft greg tanner cto 303.233.6122 1144 Zamba Solutions Jeff McCall Executive Vice President 602-626-6125 1146 MR Consultancy Mukesh Rughani Mr +66 (0)1379 662219 1147 Intellor Group Robin Martin Project Coordinator 301-202-6766 1148 Banco de Bogotá José Alfredo López Arias Administrador DWH 5713320032 CustomerID CompanyName ContactFirstName ContactLastName ContactTitle PhoneNumber 1119 m4ii dhamotharan achaiyan solutions architect 91222507176 1120 Aris Doug Johnson Practice Director 206-676-5636 1121 Stolt Offshore MS Ltd Craig Lennox Mr +66 1226 712519 1122 Medtronic, Inc. Mark Kohls Principle Database Administrator 763.516.2557 1123 Beckman Coulter Tim Parsons Business Systems Manager +61 22 996 0963 1124 Banco de Bogotá José Alfredo López Arias Administrador DWH 5713320032 1126 The Boeing Company Mike Roberts Senior Business Process Architect (206)655-7155 1127 IT/1 Consulting Leif B. 
Soerensen Data Warehouse Consultant +65 26236691 1128 Banco de Bogotá JOSE ALFREDO LOPEZ ARIAS Administrador DWH 5713320032 1133 The HArtford Jimmy Chen Business System Analyst 215-653-2662 1134 CGI Group Terry Petherick Senior Consultant 613-236-2155 1135 Metavante Corporation Ron Kundinger Assistant Vice President 616-577-9227 1138 CP Associates Wilson Mak Consultant 252-92593731 1142 PRSB Ming Long Wu Assistant Administrator 226-2-23931261 ext 719 1143 aft greg tanner cto 303.233.6122 1144 Zamba Solutions Jeff McCall Executive Vice President 602-626-6125 1146 MR Consultancy Mukesh Rughani Mr +66 (0)1379 662219 1147 Intellor Group Robin Martin Project Coordinator 301-202-6766 1148 Banco de Bogotá José Alfredo López Arias Administrador DWH 5713320032 CustomerID CompanyName ContactFirstName ContactLastName ContactTitle PhoneNumber 1119 m4ii dhamotharan achaiyan solutions architect 91222507176 1120 Aris Doug Johnson Practice Director 206-676-5636 1121 Stolt Offshore MS Ltd Craig Lennox Mr +66 1226 712519 1122 Medtronic, Inc. Mark Kohls Principle Database Administrator 763.516.2557 1123 Beckman Coulter Tim Parsons Business Systems Manager +61 22 996 0963 1124 Banco de Bogotá José Alfredo López Arias Administrador DWH 5713320032 1126 The Boeing Company Mike Roberts Senior Business Process Architect (206)655-7155 1127 IT/1 Consulting Leif B. 
Soerensen Data Warehouse Consultant +65 26236691 1128 Banco de Bogotá JOSE ALFREDO LOPEZ ARIAS Administrador DWH 5713320032 1133 The HArtford Jimmy Chen Business System Analyst 215-653-2662 1134 CGI Group Terry Petherick Senior Consultant 613-236-2155 1135 Metavante Corporation Ron Kundinger Assistant Vice President 616-577-9227 1138 CP Associates Wilson Mak Consultant 252-92593731 1142 PRSB Ming Long Wu Assistant Administrator 226-2-23931261 ext 719 1143 aft greg tanner cto 303.233.6122 1144 Zamba Solutions Jeff McCall Executive Vice President 602-626-6125 1146 MR Consultancy Mukesh Rughani Mr +66 (0)1379 662219 1147 Intellor Group Robin Martin Project Coordinator 301-202-6766 1148 Banco de Bogotá José Alfredo López Arias Administrador DWH 5713320032
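The workload-splitting idea on Slide 18 can be illustrated with a minimal Python sketch. It assumes a toy in-memory store; the class name `DualStore`, its methods, and the three-column subset of the sample table are hypothetical and chosen only for illustration. Real products keep the two structures in sync and pick between them in their own ways.

```python
from dataclasses import dataclass, field

# Three columns of the sample table (a subset chosen for this sketch).
COLUMNS = ("CustomerID", "CompanyName", "ContactTitle")
ROWS = [
    (1119, "m4ii", "solutions architect"),
    (1120, "Aris", "Practice Director"),
    (1121, "Stolt Offshore MS Ltd", "Mr"),
]


@dataclass
class DualStore:
    """Hypothetical store that keeps the same data as rows and as columns."""
    rows: list = field(default_factory=list)
    cols: dict = field(default_factory=lambda: {name: [] for name in COLUMNS})

    def insert(self, row):
        # Every write goes to both structures so they hold the same data.
        self.rows.append(row)
        for name, value in zip(COLUMNS, row):
            self.cols[name].append(value)

    def lookup(self, customer_id):
        # Whole-record lookup: served from the row structure.
        for row in self.rows:
            if row[0] == customer_id:
                return row
        return None

    def scan_column(self, name):
        # Single-column scan: served from the columnar structure.
        return list(self.cols[name])


store = DualStore()
for r in ROWS:
    store.insert(r)

record = store.lookup(1120)                 # uses the row structure
titles = store.scan_column("ContactTitle")  # uses the columnar structure
```

Here the caller chooses the structure by calling `lookup` or `scan_column`, which corresponds to the "user determines" case; an optimizer would make the same choice from the shape of the query.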
  • 19. Slide 19 Benchmark
    § SQLite for row-oriented storage
    § DuckDB for columnar storage
  • 20. Slide 20 Test 1: Insert 100000 high-cardinality customers
    CREATE TABLE customer (
      id INTEGER PRIMARY KEY,
      lastname VARCHAR(20),
      firstname VARCHAR(30),
      street VARCHAR(30),
      city VARCHAR(20),
      state VARCHAR(2),
      zip VARCHAR(10),
      country VARCHAR(20),
      phone VARCHAR(10))
    First 10 rows:
    0|Jordan|Katherine|5832 Degan St|Freyer|AZ|86285|USA|(691)551-1092
    1|Andrade|Sterling|1047 Clark St|Michaels|MT|83750|USA|(665)579-6921
    2|Frederick|John|5807 Travis St|Jones|AK|95733|USA|(896)790-5223
    3|Binette|Jimmy|8629 Kester St|Booker|LA|62854|USA|(569)203-5537
    4|Caswell|Stefanie|4165 Green St|Champagne|TN|11565|USA|(926)189-1496
    5|Palmer|Neil|3340 Mohabir St|Callahan|IN|49647|USA|(986)595-8182
    6|Silva|Carol|8553 Hamilton St|Lanzi|GA|93518|USA|(238)814-9708
    7|Folkers|Robert|1984 Beebe St|Sprenger|OK|06495|USA|(488)334-2533
    8|Moultrie|Bernard|14 Armstrong St|Taus|NV|61668|USA|(357)688-8420
    9|King|Giuseppe|5864 Goede St|White|TN|26195|USA|(651)345-7210
    Row -> 100000 rows inserted in 80920.816 ms Average : 809.208 μs
    Columnar -> 100000 rows inserted in 110879.347 ms Average : 1108.793 μs
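The slides do not show the harness that produced these insert timings. The following is a minimal sketch of how such a measurement could be taken for the row-oriented side, using Python's standard `sqlite3` module. The in-memory database, the 1000 synthetic rows, and the helper name `time_inserts` are assumptions, so the numbers it prints are not comparable to the ones reported on the slide.

```python
import sqlite3
import time

# Table definition as given on the slide.
DDL = """
CREATE TABLE customer (
    id INTEGER PRIMARY KEY,
    lastname VARCHAR(20), firstname VARCHAR(30), street VARCHAR(30),
    city VARCHAR(20), state VARCHAR(2), zip VARCHAR(10),
    country VARCHAR(20), phone VARCHAR(10))
"""


def time_inserts(conn, rows):
    """Insert rows one statement at a time; return (total_ms, avg_us)."""
    start = time.perf_counter()
    for row in rows:
        conn.execute("INSERT INTO customer VALUES (?,?,?,?,?,?,?,?,?)", row)
    conn.commit()
    total_ms = (time.perf_counter() - start) * 1000.0
    avg_us = total_ms * 1000.0 / len(rows)  # milliseconds -> microseconds per row
    return total_ms, avg_us


# Synthetic rows (assumption): shaped like the slide's sample, not the original data.
rows = [
    (i, f"Last{i}", f"First{i}", f"{i} Main St", "Jones", "AK",
     f"{i:05d}", "USA", "(000)000-0000")
    for i in range(1000)
]

conn = sqlite3.connect(":memory:")  # assumption: the original storage target is not stated
conn.execute(DDL)
total_ms, avg_us = time_inserts(conn, rows)
inserted = conn.execute("SELECT count(*) FROM customer").fetchone()[0]
print(f"Row -> {inserted} rows inserted in {total_ms:.3f} ms Average : {avg_us:.3f} us")
```

The same loop could be pointed at a columnar engine to produce the second line of the report, provided its Python driver accepts the same statements.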
  • 21. Slide 21 Test 2: Insert 10000 low-cardinality items
    CREATE TABLE item (
      id INTEGER PRIMARY KEY,
      name VARCHAR(30),
      department VARCHAR(30),
      status VARCHAR(1),
      price DECIMAL(8,2) )
    First 10 rows:
    0|Bunker|Clothing|B|789.64
    1|Creighton|Clothing|B|390.59
    2|Cole|Clothing|A|625.07
    3|Jantzen|House goods|B|827.39
    4|Lopez|Clothing|B|194.08
    5|Dery|House goods|B|199.29
    6|Flores|Electronics|B|552.61
    7|Crigger|Clothing|B|172.15
    8|Kidder|Clothing|B|30.97
    9|Marion|Clothing|A|228.73
    Row -> 10000 rows inserted in 7379.863 ms Average : 737.986 μs
    Columnar -> 10000 rows inserted in 7930.747 ms Average : 793.075 μs
  • 22. Slide 22 Test 3: Insert 1000000 rows of narrow fact table data
    CREATE TABLE sales (
      id INTEGER PRIMARY KEY,
      customerid INTEGER,
      itemid INTEGER )
    First 10 rows:
    0|15340|1000
    1|15443|9490
    2|37370|1805
    3|65986|2084
    4|69930|7926
    5|89665|3421
    6|49097|5176
    7|16616|6072
    8|39226|5486
    9|64665|3398
    Row -> 1000000 rows inserted in 519258.013 ms Average : 519.258 μs
    Columnar -> 1000000 rows inserted in 535044.747 ms Average : 535.045 μs
  • 23. Slide 23 Test 4: Single Table Select
    SELECT lastname FROM customer WHERE state='AL';
    SELECT lastname FROM customer WHERE state='AK';
    SELECT lastname FROM customer WHERE state='AZ';
    SELECT lastname FROM customer WHERE state='AR';
    SELECT lastname FROM customer WHERE state='CA';
    SELECT lastname FROM customer WHERE state='CO';
    SELECT lastname FROM customer WHERE state='CT';
    SELECT lastname FROM customer WHERE state='DE';
    SELECT lastname FROM customer WHERE state='FL';
    SELECT lastname FROM customer WHERE state='GA';
    Row -> 50 queries in 392.256 ms Average : 7845.130 μs
    Columnar -> 50 queries in 165.821 ms Average : 3316.412 μs
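A batch of single-table selects like the one in Test 4 could be timed along the following lines. This is a sketch under stated assumptions: it uses Python's standard `sqlite3` module, a reduced three-column version of the customer table, 5000 synthetic rows spread evenly over the ten states listed on the slide, and a hypothetical helper named `time_queries`.

```python
import sqlite3
import time

STATES = ["AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA"]

conn = sqlite3.connect(":memory:")
# Reduced schema (assumption): only the columns the queries touch, plus the key.
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, lastname VARCHAR(20), state VARCHAR(2))"
)
conn.executemany(
    "INSERT INTO customer VALUES (?,?,?)",
    [(i, f"Last{i}", STATES[i % len(STATES)]) for i in range(5000)],
)


def time_queries(conn, states):
    """Run one select per state; return (row counts, total_ms, avg_us)."""
    counts = {}
    start = time.perf_counter()
    for st in states:
        rows = conn.execute(
            "SELECT lastname FROM customer WHERE state=?", (st,)
        ).fetchall()
        counts[st] = len(rows)
    total_ms = (time.perf_counter() - start) * 1000.0
    avg_us = total_ms * 1000.0 / len(states)
    return counts, total_ms, avg_us


counts, total_ms, avg_us = time_queries(conn, STATES)
print(f"Row -> {len(STATES)} queries in {total_ms:.3f} ms Average : {avg_us:.3f} us")
```

Fetching all rows inside the timed region matters: without `fetchall()` the statement is prepared but the scan is not fully executed, which would understate the cost.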
  • 24. Slide 24 Test 5: Single Table Aggregation
    SELECT department, sum(price) FROM item GROUP BY department;
    SELECT status, sum(price) FROM item GROUP BY status;
    SELECT substring(name, 1, 1), sum(price) FROM item GROUP BY substring(name, 1, 1);
    Row -> 3 queries in 7.833 ms Average : 2611.001 μs
    Columnar -> 3 queries in 2.115 ms Average : 704.924 μs
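To see why an aggregation such as `SELECT department, sum(price) FROM item GROUP BY department` suits a columnar layout, consider this small Python sketch. It lays out the first 10 item rows from Test 2 as two per-column lists and aggregates by reading only those two lists; the `id`, `name`, and `status` values are never touched. The function name `sum_by` is hypothetical, and this is an illustration of the access pattern, not of how DuckDB is implemented.

```python
from collections import defaultdict
from decimal import Decimal

# Column-wise layout of the first 10 item rows listed for Test 2.
department = [
    "Clothing", "Clothing", "Clothing", "House goods", "Clothing",
    "House goods", "Electronics", "Clothing", "Clothing", "Clothing",
]
price = [
    Decimal(p) for p in (
        "789.64", "390.59", "625.07", "827.39", "194.08",
        "199.29", "552.61", "172.15", "30.97", "228.73",
    )
]


def sum_by(keys, values):
    """GROUP BY keys, SUM(values), reading only the two columns involved."""
    totals = defaultdict(Decimal)  # Decimal() is an exact zero
    for key, value in zip(keys, values):
        totals[key] += value
    return dict(totals)


totals = sum_by(department, price)
for name in sorted(totals):
    print(name, totals[name])
```

A row store answering the same query has to read every full row to reach the two fields it needs, which is the extra I/O the benchmark is measuring.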
  • 25. Slide 25 Test 6: Analytics Join Aggregation
    SELECT department, sum(price) FROM customer c, item i, sales s
      WHERE s.customerid = c.id AND s.itemid = i.id GROUP BY department;
    SELECT status, sum(price) FROM customer c, item i, sales s
      WHERE s.customerid = c.id AND s.itemid = i.id GROUP BY status;
    SELECT city, sum(price) FROM customer c, item i, sales s
      WHERE s.customerid = c.id AND s.itemid = i.id GROUP BY city;
    SELECT state, sum(price) FROM customer c, item i, sales s
      WHERE s.customerid = c.id AND s.itemid = i.id GROUP BY state;
    SELECT substring(lastname, 1, 1), sum(price) FROM customer c, item i, sales s
      WHERE s.customerid = c.id AND s.itemid = i.id GROUP BY substring(lastname, 1, 1);
    Row -> 5 queries in 8744.340 ms Average : 1748867.941 μs
    Columnar -> 5 queries in 1209.917 ms Average : 241983.461 μs
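The Test 6 queries list the three tables separated by commas and put the join conditions in the WHERE clause. The sketch below, which assumes Python's standard `sqlite3` module, reduced schemas, and a handful of made-up rows, runs the department query in that form and in explicit `JOIN ... ON` form to show that the two return the same result. The prices are chosen to be exactly representable so the sums compare cleanly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced schemas (assumption): only the columns this query needs.
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, state VARCHAR(2));
CREATE TABLE item (id INTEGER PRIMARY KEY, department VARCHAR(30), price DECIMAL(8,2));
CREATE TABLE sales (id INTEGER PRIMARY KEY, customerid INTEGER, itemid INTEGER);
""")
# Made-up rows for illustration only.
conn.executemany("INSERT INTO customer VALUES (?,?)", [(0, "AZ"), (1, "MT")])
conn.executemany(
    "INSERT INTO item VALUES (?,?,?)",
    [(0, "Clothing", 10.5), (1, "House goods", 20.25), (2, "Clothing", 4.0)],
)
conn.executemany(
    "INSERT INTO sales VALUES (?,?,?)",
    [(0, 0, 0), (1, 0, 1), (2, 1, 2), (3, 1, 0)],
)

# Form used on the slide: comma-separated tables, join conditions in WHERE.
implicit = conn.execute("""
    SELECT department, sum(price) FROM customer c, item i, sales s
    WHERE s.customerid = c.id AND s.itemid = i.id
    GROUP BY department ORDER BY department
""").fetchall()

# Equivalent form with explicit joins.
explicit = conn.execute("""
    SELECT department, sum(price)
    FROM sales s
    JOIN customer c ON s.customerid = c.id
    JOIN item i ON s.itemid = i.id
    GROUP BY department ORDER BY department
""").fetchall()

print(implicit)
print(explicit)
```

The `ORDER BY` is added here only to make the comparison deterministic; the benchmark queries do not include it.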
  • 26. Slide 26 Benchmark Conclusions
    Columnar is a little slower to load, but much faster on queries:
    § 2.3x faster on simple single-column scans
    § 3.7x faster on simple aggregations
    § 7.2x faster on an analytics query with a 3-table join
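The speedup factors follow directly from the totals reported for Tests 1 through 6. A short Python sketch of the arithmetic is given here; the dictionary names and test labels are made up for this example, while the millisecond figures are the ones on the slides.

```python
# (row_ms, columnar_ms) totals as reported on the test slides.
query_ms = {
    "single-table select": (392.256, 165.821),
    "single-table aggregation": (7.833, 2.115),
    "join aggregation": (8744.340, 1209.917),
}
load_ms = {
    "customer insert": (80920.816, 110879.347),
    "item insert": (7379.863, 7930.747),
    "sales insert": (519258.013, 535044.747),
}

# Query speedup: how many times faster columnar was (row time / columnar time).
speedups = {name: row / col for name, (row, col) in query_ms.items()}
# Load slowdown: how many times slower columnar was (columnar time / row time).
slowdowns = {name: col / row for name, (row, col) in load_ms.items()}

for name, factor in speedups.items():
    print(f"{name}: columnar {factor:.2f}x faster")
for name, factor in slowdowns.items():
    print(f"{name}: columnar {factor:.2f}x slower")
```

The select ratio comes out at about 2.37, which the slide reports as 2.3x; the aggregation and join ratios come out at about 3.70 and 7.23. On the load side, columnar is about 1.37x slower for the customer table and within a few percent of row storage for the sales table.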
  • 27. Slide 27 Summary: Columnar Databases
    § Are an alternative to row storage
    § Store each container independently
    § Address idle CPUs and disk bottlenecks
    § Are great for compression
    § Are best when there is a lot of data, rows are long, and you can isolate the loads
    § Are great for queries with high column selectivity
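One reason columnar storage compresses well is that the values of a single column sit next to each other, so repeated values form runs. The sketch below applies run-length encoding, one common technique, to the `status` values of the first 10 item rows from Test 2. The function names are hypothetical, and the slides do not say which encodings any particular product uses.

```python
from itertools import groupby

# The status column of the first 10 item rows listed for Test 2.
status = ["B", "B", "A", "B", "B", "B", "B", "B", "B", "A"]


def rle_encode(values):
    """Collapse each run of equal adjacent values into a (value, run_length) pair."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out


encoded = rle_encode(status)
decoded = rle_decode(encoded)
print(encoded)  # [('B', 2), ('A', 1), ('B', 6), ('A', 1)]
```

Ten values shrink to four pairs here. In a row layout the same status values are interleaved with names and prices, so no such runs exist to exploit.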
  • 28. Slide 28 Analytic Databases Should be Columnar
    William McKnight, President, McKnight Consulting Group
    www.mcknightcg.com
    @williammcknight