Wed 1030 mc_knight_william_color

Transcript

  • 1. Unlock Potential. Columnar Databases: Data Does the Twist and Analytics Shout. William McKnight, President, McKnight Consulting Group. Copyright © 2011 McKnight Consulting Group, LLC All Rights Reserved – Confidential and Proprietary
  • 2. William McKnight, www.mcknightcg.com. Helping organizations adopt business-effective information management practices and technologies.
  • 3. Agenda: Row-Wise Design; Columnar Storage; Materialization; Wrap-Up
  • 4. Row-Wise Design. © McKnight Consulting Group, 2010
  • 5. DBMS Design over the years. RDBMS design is virtually unchanged, except for parallelism. Hardware, however: disk capacity has increased tremendously (and become far cheaper); CPU performance has improved too, but transfer rates and seek times have improved only modestly.
  • 6. L2 Cache Misses. Diagram: the memory hierarchy from CPU to L1 cache, L2 cache, memory, and disk.
  • 7. Row-Wise DBMS Stores Data in Rows
    CustomerID | CompanyName | ContactFirstName | ContactLastName | ContactTitle | PhoneNumber
    1119 | m4ii | dhamotharan | achaiyan | solutions architect | 91222507176
    1120 | Aris | Doug | Johnson | Practice Director | 206-676-5636
    1121 | Stolt Offshore MS Ltd | Craig | Lennox | Mr | +66 1226 712519
    1122 | Medtronic, Inc. | Mark | Kohls | Principle Database Administrator | 763.516.2557
    1123 | Beckman Coulter | Tim | Parsons | Business Systems Manager | +61 22 996 0963
    1124 | Banco de Bogotá | José Alfredo | López Arias | Administrador DWH | 5713320032
    1126 | The Boeing Company | Mike | Roberts | Senior Business Process Architect | (206)655-7155
    1127 | IT/1 Consulting | Leif B. | Soerensen | Data Warehouse Consultant | +65 26236691
    1128 | Banco de Bogotá | JOSE ALFREDO | LOPEZ ARIAS | Administrador DWH | 5713320032
    1133 | The HArtford | Jimmy | Chen | Business System Analyst | 215-653-2662
    1134 | CGI Group | Terry | Petherick | Senior Consultant | 613-236-2155
    1135 | Metavante Corporation | Ron | Kundinger | Assistant Vice President | 616-577-9227
    1138 | CP Associates | Wilson | Mak | Consultant | 252-92593731
    1142 | PRSB | Ming Long | Wu | Assistant Administrator | 226-2-23931261 ext 719
    1143 | aft | greg | tanner | cto | 303.233.6122
    1144 | Zamba Solutions | Jeff | McCall | Executive Vice President | 602-626-6125
    1146 | MR Consultancy | Mukesh | Rughani | Mr | +66 (0)1379 662219
    1147 | Intellor Group | Robin | Martin | Project Coordinator | 301-202-6766
    1148 | Banco de Bogotá | José Alfredo | López Arias | Administrador DWH | 5713320032
  • 8. Data Page Layout
    Page Header
    Records:
      1120 | Aris | Doug | Johnson | Practice Director | 206-676-5636 | doug.johnson@aris.com
      1121 | Stolt Offshore MS Ltd | Craig | Lennox | Mr | +66 1226 71269 | craig.lennox@stoltoffshore.com
      1122 | Medtronic, Inc. | Mark | Kohls | Principle Database Administrator | 763.516.2557 | mark.kohls@medtronic.com
    Row IDs
    Page Footer
  • 9. Traditional databases. Query: calculate the average sales for the "A" stores in "NY".
    Traditional approach:
      • Data stored by row using small data pages (4K or 8K)
      • For queries, select a 'filter'
        - Build a B-tree index for filters
        - BUT if the filter is not selective enough, then scan the table
        - Go to selected pages and add up sales numbers
        - Randomly distributed data will result in most pages being read
        - Still have to read irrelevant data in each page
    Date | Store # | State | Class | Sales | Category | …
    3/1/2010 | 32 | NY | A | 6 | Gen
    3/1/2010 | 35 | CT | A | 9 | Spec
    3/1/2010 | 36 | CT | C | 11 | Gen
    3/1/2010 | 39 | SD | D | 8 | Gen
    3/1/2010 | 42 | KY | A | 5 | Spec
    3/1/2010 | 43 | VT | C | 14 | Spec
    3/1/2010 | 47 | GA | A | 31 | Gen
    3/1/2010 | 51 | MD | A | 4 | Sub
    3/1/2010 | 55 | DC | D | 16 | Gen
    3/1/2010 | 59 | NY | B | 7 | Gen
    3/1/2010 | 62 | NJ | C | 9 | Spec
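The row-wise access pattern described on this slide can be sketched in Python. This is an illustrative model only, not any particular DBMS: the rows are the slide's sample table, while `ROWS_PER_PAGE` and the helper `avg_sales_row_store` are assumptions introduced for the example. It shows that a scan touches every column of every row even though the query needs only three of them.

```python
# Rows from the slide's sample table: (date, store, state, class, sales, category)
ROWS = [
    ("3/1/2010", 32, "NY", "A", 6, "Gen"),
    ("3/1/2010", 35, "CT", "A", 9, "Spec"),
    ("3/1/2010", 36, "CT", "C", 11, "Gen"),
    ("3/1/2010", 39, "SD", "D", 8, "Gen"),
    ("3/1/2010", 42, "KY", "A", 5, "Spec"),
    ("3/1/2010", 43, "VT", "C", 14, "Spec"),
    ("3/1/2010", 47, "GA", "A", 31, "Gen"),
    ("3/1/2010", 51, "MD", "A", 4, "Sub"),
    ("3/1/2010", 55, "DC", "D", 16, "Gen"),
    ("3/1/2010", 59, "NY", "B", 7, "Gen"),
    ("3/1/2010", 62, "NJ", "C", 9, "Spec"),
]

ROWS_PER_PAGE = 4  # hypothetical; a real 4K or 8K page holds as many rows as fit


def avg_sales_row_store(rows, state, store_class):
    """Full scan: every page read brings in all columns of every row on it."""
    pages = [rows[i:i + ROWS_PER_PAGE] for i in range(0, len(rows), ROWS_PER_PAGE)]
    pages_read = 0
    values_read = 0
    matches = []
    for page in pages:
        pages_read += 1
        for row in page:
            values_read += len(row)  # the whole row is read, needed or not
            if row[2] == state and row[3] == store_class:
                matches.append(row[4])
    average = sum(matches) / len(matches) if matches else None
    return average, pages_read, values_read


average, pages_read, values_read = avg_sales_row_store(ROWS, "NY", "A")
print(average, pages_read, values_read)
```

Only one sample row is an "A" store in "NY", yet the scan reads all 66 stored values to find it.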
  • 10. Columnar Storage
  • 11. Columnar DBMS Stores Data in Columns
    CustomerID: 1119; 1120; 1121; 1122; 1123; 1124; 1126; 1127; 1128; 1133; 1134
    CompanyName: m4ii; Aris; Stolt Offshore MS Ltd; Medtronic, Inc.; Beckman Coulter; Banco de Bogotá; The Boeing Company; IT/1 Consulting; Banco de Bogotá; The HArtford; CGI Group
    ContactFirstName: dhamotharan; Doug; Craig; Mark; Tim; José Alfredo; Mike; Leif B.; JOSE ALFREDO; Jimmy; Terry
    ContactLastName: achaiyan; Johnson; Lennox; Kohls; Parsons; López Arias; Roberts; Soerensen; LOPEZ ARIAS; Chen; Petherick
    ContactTitle: solutions architect; Practice Director; Mr; Principle Database Administrator; Business Systems Manager; Administrador DWH; Senior Business Process Architect; Data Warehouse Consultant; Administrador DWH; Business System Analyst; Senior Consultant
    PhoneNumber: 91222507176; 206-676-5636; +66 1226 712519; 763.516.2557; +61 22 996 0963; 5713320032; (206)655-7155; +65 26236691; 5713320032; 215-653-2662; 613-236-2155
  • 12. Columnar Data Page Layout. Page Header; Records: 1120, 1121, 1122, 1123, 1124, 1125, …; Page Footer
  • 13. Vertical Partitioning of Data. Columnar: columns are stored independently.
    Date | Store # | State | Class | Sales | Category | …
    3/1/2010 | 32 | NY | A | 6 | Gen
    3/1/2010 | 35 | CT | A | 9 | Spec
    3/1/2010 | 36 | CT | C | 11 | Gen
    3/1/2010 | 39 | SD | D | 8 | Gen
    3/1/2010 | 42 | KY | A | 5 | Spec
    3/1/2010 | 43 | VT | C | 14 | Spec
    3/1/2010 | 47 | GA | A | 31 | Gen
    3/1/2010 | 51 | MD | A | 4 | Sub
    3/1/2010 | 55 | DC | D | 16 | Gen
    3/1/2010 | 59 | NY | B | 7 | Gen
    3/1/2010 | 62 | NJ | C | 9 | Spec
    Benefits:
      • Consistent data types are easy to compress
      • Resulting storage size is typically less than 50% of the size of the raw data
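A minimal Python sketch of the vertical partitioning on this slide, assuming each column is held as its own list. The `COLUMNS` layout and the helper `avg_sales_column_store` are illustrative names, not part of the presentation; the values are the slide's sample table. The "A stores in NY" query reads only the three columns it needs.

```python
# Column-wise copy of the slide's sample table: one list per column.
COLUMNS = {
    "date": ["3/1/2010"] * 11,
    "store": [32, 35, 36, 39, 42, 43, 47, 51, 55, 59, 62],
    "state": ["NY", "CT", "CT", "SD", "KY", "VT", "GA", "MD", "DC", "NY", "NJ"],
    "class": ["A", "A", "C", "D", "A", "C", "A", "A", "D", "B", "C"],
    "sales": [6, 9, 11, 8, 5, 14, 31, 4, 16, 7, 9],
    "category": ["Gen", "Spec", "Gen", "Gen", "Spec", "Spec",
                 "Gen", "Sub", "Gen", "Gen", "Spec"],
}


def avg_sales_column_store(columns, state, store_class):
    """Read only the state, class and sales columns; the rest stay untouched."""
    touched = ("state", "class", "sales")
    values_read = sum(len(columns[name]) for name in touched)
    matches = [
        sales
        for st, cl, sales in zip(columns["state"], columns["class"], columns["sales"])
        if st == state and cl == store_class
    ]
    average = sum(matches) / len(matches) if matches else None
    return average, values_read


total_values = sum(len(col) for col in COLUMNS.values())
average, values_read = avg_sales_column_store(COLUMNS, "NY", "A")
print(average, values_read, total_values)
```

In this toy table the query reads 33 of the 66 stored values; the saving grows with the number of columns the query does not reference.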
  • 14. Columnar Storage Options: Decomposed Storage Model; Positional Representation; Modified B-Tree/Run-Length Encoding; Bitmap
  • 15. Modified B-Tree/Run-Length Encoding
    Source rows (Qtr | Store# | Sales):
      Q1 | 32 | 6
      Q1 | 35 | 9
      Q1 | 36 | 11
      Q1 | 39 | 8
      Q1 | 42 | 5
      Q1 | 43 | 14
      Q2 | 32 | 31
      Q2 | 35 | 4
      Q2 | 36 | 16
      Q2 | 39 | 7
      Q2 | 42 | 9
    Encoded Qtr column: (Q1, 1, 500); (Q2, 501, 999); (Q3, 1000, 1498)
    Encoded Store# column: (32, 1, 1); (35, 2, 2); (36, 3, 3)
    Each entry is labeled on the slide as (Value, StartPosition, Count).
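Run-length encoding of a sorted column into (Value, StartPosition, Count) triples can be sketched as follows. This is a minimal illustration under assumptions: the function names `rle_encode` and `rle_decode` are hypothetical, positions are taken to be 1-based, and the input is only the eleven Qtr values visible in the slide's sample rows rather than the full table the slide's triples describe.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into (value, start_position, count)."""
    triples = []
    position = 1  # assumption: positions are 1-based
    for value, run in groupby(values):
        count = len(list(run))
        triples.append((value, position, count))
        position += count
    return triples


def rle_decode(triples):
    """Expand (value, start_position, count) triples back into the full column."""
    column = []
    for value, _start, count in triples:
        column.extend([value] * count)
    return column


qtr_column = ["Q1"] * 6 + ["Q2"] * 5  # the Qtr values in the slide's sample rows
encoded = rle_encode(qtr_column)
print(encoded)
```

Eleven stored values shrink to two triples here; the benefit depends on the column being sorted or otherwise containing long runs.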
  • 16. Workload Splitting. Diagram: the customer table (CustomerID, CompanyName, ContactFirstName, ContactLastName, ContactTitle, PhoneNumber) shown in both a row-based and a columnar structure. Same data in both structures. Optimizer or user determines which to use.
  • 17. The Value of Performance. "How many MALES are NOT INSURED in CALIFORNIA?"
    RDBMS (row store): 10M rows at 800 bytes/row on 16K pages: 800 bytes x 10M / 16K = 500,000 I/Os
      Processes large amounts of unused data
      Often requires a full table scan
      Sample rows (Gender, State, Insured): (M, NY, Y); (M, CA, Y); (F, CT, N); (M, MA, Y); (M, CA, N)
    Columnar (bitmaps): 10M bits x 3 columns / 8 / 16K = 235 I/Os
      Sample rows (Gender, Insured, State) with their bits for Gender = M, Insured = N, State = CA:
        1 | M | Y | CA | 1 0 1
        2 | M | N | CA | 1 1 1
        3 | F | Y | NY | 0 0 0
        4 | M | N | CA | 1 1 1
      ANDing the three bitmaps leaves 2 matching rows.
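The slide's I/O figures and its bitmap example can be checked with a short Python sketch. The assumption here, stated loudly, is that "16K" means 16,000 bytes per page, since that reading reproduces both of the slide's numbers; the variable names are illustrative, and the three bitmaps are the four sample rows as read from the slide.

```python
import math

PAGE_BYTES = 16_000   # assumption: "16K" read as 16,000 bytes per page
ROW_COUNT = 10_000_000
ROW_BYTES = 800

# Row store: every byte of every row is read during a full table scan.
row_store_ios = math.ceil(ROW_COUNT * ROW_BYTES / PAGE_BYTES)

# Bitmaps: one bit per row for each of the three predicates, 8 bits per byte.
bitmap_ios = math.ceil(ROW_COUNT * 3 / 8 / PAGE_BYTES)

# One bit per sample row for each predicate in the query.
gender_is_m = [1, 1, 0, 1]
insured_is_n = [0, 1, 0, 1]
state_is_ca = [1, 1, 0, 1]

hits = [g & i & s for g, i, s in zip(gender_is_m, insured_is_n, state_is_ca)]
uninsured_ca_males = sum(hits)

print(row_store_ios, bitmap_ios, hits, uninsured_ca_males)
```

The count comes from bitwise ANDs alone; no row is reassembled to answer the question.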
  • 18. Materialization
  • 19. Materialization Strategies
    Function of 'projection':
      Row-stores = removes unneeded columns from the result set
      Column-stores = when to GLUE (reassemble column values into rows)
    Early Materialization: construct rows before processing; decompress all compressed columns first
    Late Materialization: wait until the end of the operation
  • 20. Early Materialization
    Query: SELECT custID, price FROM Sales WHERE (prodID = 4) AND (storeID = 1)
    Columns: prodID, stored run-length encoded as (4,1,4); storeID = 2, 1, 3, 1; custID = 2, 3, 3, 3; price = 7, 13, 42, 80
    Materialize: (4, 2, 2, 7); (4, 1, 3, 13); (4, 3, 3, 42); (4, 1, 3, 80)
    Selection (where): (4, 1, 3, 13); (4, 1, 3, 80)
    Projection (select): (3, 13); (3, 80)
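Early materialization for the slide's query can be sketched in Python as three steps: glue the columns into rows, filter, then project. The column values are as read from the slide's diagram; the function name `early_materialization` is hypothetical, and the run-length-encoded prodID column is shown already decompressed, as the early strategy requires.

```python
# Columns of the Sales table as read from the slide's diagram.
prod_id = [4, 4, 4, 4]    # stored on the slide as the RLE triple (4, 1, 4)
store_id = [2, 1, 3, 1]
cust_id = [2, 3, 3, 3]
price = [7, 13, 42, 80]


def early_materialization(prod_id, store_id, cust_id, price):
    """SELECT custID, price FROM Sales WHERE prodID = 4 AND storeID = 1."""
    # Step 1: construct full rows before any predicate is evaluated.
    rows = list(zip(prod_id, store_id, cust_id, price))
    # Step 2: selection (WHERE) runs over the materialized rows.
    selected = [row for row in rows if row[0] == 4 and row[1] == 1]
    # Step 3: projection (SELECT) drops the columns the result does not need.
    result = [(row[2], row[3]) for row in selected]
    return rows, result


rows, result = early_materialization(prod_id, store_id, cust_id, price)
print(rows)
print(result)
```

All four rows are built even though only two survive the filter, which is the cost the early strategy pays.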
  • 21. Late Materialization
    Query: SELECT custID, price FROM Sales WHERE (prodID = 4) AND (storeID = 1)
    Select prodID = 4 on the prodID column, stored as (4,1,4): bits 1, 1, 1, 1
    Select storeID = 1 on the storeID column (2, 1, 3, 1): bits 0, 1, 0, 1
    AND: combines the two bit strings, keeping positions 2 and 4
    Construct: fetches custID and price at those positions: (3, 13); (3, 80)
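Late materialization for the same query can be sketched as follows: each predicate is evaluated on its own column to produce a bit string, the bit strings are ANDed, and only then are custID and price fetched at the surviving positions. Column values are as read from the slide's diagram; the function name `late_materialization` is hypothetical, and the prodID column is expanded here for simplicity although the slide stores it as the triple (4,1,4).

```python
# Columns of the Sales table as read from the slide's diagram.
prod_id = [4, 4, 4, 4]    # expanded form of the RLE triple (4, 1, 4)
store_id = [2, 1, 3, 1]
cust_id = [2, 3, 3, 3]
price = [7, 13, 42, 80]


def late_materialization(prod_id, store_id, cust_id, price):
    """SELECT custID, price FROM Sales WHERE prodID = 4 AND storeID = 1."""
    # Evaluate each predicate against its own column only.
    prod_bits = [int(value == 4) for value in prod_id]
    store_bits = [int(value == 1) for value in store_id]
    # Combine the predicates position by position.
    combined = [a & b for a, b in zip(prod_bits, store_bits)]
    # Construct result tuples only for positions that passed both predicates.
    positions = [i for i, bit in enumerate(combined) if bit]
    result = [(cust_id[i], price[i]) for i in positions]
    return combined, result


combined, result = late_materialization(prod_id, store_id, cust_id, price)
print(combined)
print(result)
```

The result matches the early strategy, but custID and price are touched only at the two qualifying positions.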
  • 22. Wrap-Up
  • 23. Summary: Column Databases
    Is an alternative to row storage
    Is seeing more adoption among vendors and customers
    Stores each column independently
    Addresses idle CPUs and disk bottlenecks
    Is great for compression
    Is best when there is a lot of data, long rows, and when you can isolate the loads
    Is great for high column selectivity queries
    Takes longer to load
  • 24. Columnar Databases: Data Does the Twist and Analytics Shout. Presented by: William McKnight, President, McKnight Consulting Group LLC, (214) 514-1444, wmcknight@mcknightcg.com, www.mcknightcg.com, Twitter @williammcknight