Turning the tables: The Columnar Alternative

Understand appropriate workloads for Columnar DBMS engines including InfiniDB

Published in: Technology

  1. 1. Turning the Tables: The Columnar Alternative
     SkySQL & MariaDB: Solutions Day
     Calpont Proprietary and Confidential
  2. 2. Agenda
     InfiniDB® Scalable. Fast. Simple. Copyright © 2011 Calpont. All Rights Reserved.
     • Who are we?
     • Columnar database basics
       o Structural differences
     • Understanding workloads
       o Query Vision/Scope (OLTP vs. Analytic)
       o Query Variety (Static vs. Ad-Hoc/Dynamic)
       o Data Volume, Data Structure
     • Putting it all together
  3. 3. Calpont and InfiniDB
     • Calpont Corporation
       o Headquartered in Frisco, TX
       o Team members in California, Colorado, Boston
     • Products
       o InfiniDB Community: initial release Oct 2009, latest release 2.2
       o InfiniDB Enterprise: initial release Feb 2010, latest release 3.6
  4. 4. Introduction to Columnar Databases
     • Columnar Concepts
  5. 5. Row-Oriented vs. Column-Oriented
     Row-oriented: rows stored sequentially.

       Key  Fname     Lname    State  Zip    Phone           Age  Sex
       1    Bugs      Bunny    NY     11217  (718) 938-3235  34   M
       2    Yosemite  Sam      CA     95389  (209) 375-6572  52   M
       3    Daffy     Duck     NY     10013  (212) 227-1810  35   M
       4    Elmer     Fudd     ME     04578  (207) 882-7323  43   M
       5    Witch     Hazel    MA     01970  (978) 744-0991  57   F

     Column-oriented: each column is stored in a separate file. Each column for a given row is at the same offset.

       Key:    1, 2, 3, 4, 5
       Fname:  Bugs, Yosemite, Daffy, Elmer, Witch
       Lname:  Bunny, Sam, Duck, Fudd, Hazel
       State:  NY, CA, NY, ME, MA
       Zip:    11217, 95389, 10013, 04578, 01970
       Phone:  (718) 938-3235, (209) 375-6572, (212) 227-1810, (207) 882-7323, (978) 744-0991
       Age:    34, 52, 35, 43, 57
       Sex:    M, M, M, M, F

     Index (shown three times on the slide):

       Key  RowID
       1    223346757356
       2    223346757123
       3    223346755340
       4    223346894343
       5    223346757120
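The two layouts on this slide can be sketched in a few lines of Python. This is a minimal illustration only: plain lists stand in for files on disk, and the names `ROW_STORE`, `COLUMN_STORE`, and `to_column_store` are hypothetical, not part of InfiniDB.

```python
# Illustrative only: plain Python lists stand in for files on disk.
COLUMN_NAMES = ("Key", "Fname", "Lname", "State", "Zip", "Phone", "Age", "Sex")

# Row-oriented layout: each tuple is one complete row, stored one after another.
ROW_STORE = [
    (1, "Bugs", "Bunny", "NY", "11217", "(718) 938-3235", 34, "M"),
    (2, "Yosemite", "Sam", "CA", "95389", "(209) 375-6572", 52, "M"),
    (3, "Daffy", "Duck", "NY", "10013", "(212) 227-1810", 35, "M"),
    (4, "Elmer", "Fudd", "ME", "04578", "(207) 882-7323", 43, "M"),
    (5, "Witch", "Hazel", "MA", "01970", "(978) 744-0991", 57, "F"),
]


def to_column_store(rows, column_names):
    """Pivot row tuples into one list per column (one 'file' per column)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(column_names)}


COLUMN_STORE = to_column_store(ROW_STORE, COLUMN_NAMES)


def average_age_row_store(rows):
    """Row layout: every full row is visited even though only Age is needed."""
    age_index = COLUMN_NAMES.index("Age")
    return sum(row[age_index] for row in rows) / len(rows)


def average_age_column_store(store):
    """Column layout: only the Age 'file' is read."""
    ages = store["Age"]
    return sum(ages) / len(ages)
```

Both functions return the same average. The difference the slide points at is how much data each layout has to walk through to get there.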
  6. 6. Columnar Implicit Row Identifier
     • Columnar storage has an implicit row identifier: the offset of a value within its column file.
     • Columnar storage avoids per-record and per-field metadata.
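As a rough sketch of the implicit row identifier, the snippet below rebuilds a logical row purely from a shared offset, with no stored row ID. The store contents and the helper names `fetch_row` and `offsets_where` are assumptions made for illustration.

```python
# Hypothetical three-row column store; each list stands in for one column file.
column_store = {
    "Key": [1, 2, 3],
    "Fname": ["Bugs", "Yosemite", "Daffy"],
    "State": ["NY", "CA", "NY"],
}


def fetch_row(store, offset):
    """Rebuild one logical row by reading the same offset in every column."""
    return {name: values[offset] for name, values in store.items()}


def offsets_where(store, column, predicate):
    """Scan a single column and return the offsets whose value matches."""
    return [i for i, value in enumerate(store[column]) if predicate(value)]


# Find matching offsets using one column, then materialize only those rows.
ny_offsets = offsets_where(column_store, "State", lambda state: state == "NY")
ny_rows = [fetch_row(column_store, offset) for offset in ny_offsets]
```

Because the offset is the identifier, nothing extra has to be stored per record to tie the column values back together.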
  7. 7. Single-Row Operation (Insertion)
     (Figure: the five-row example table in row-oriented and column-oriented layouts, with its Key/RowID index.)
     New row: 6  Marvin  Martian  CA  91602  (818) 761-9964  26  M
     • Row-oriented: new row inserted.
     • Column-oriented: value added to each file.
     • Index: new entry with Key 6, RowID 223346757121.
  8. 8. Single-Row Operation (Deletion)
     (Figure: the example table in row-oriented and column-oriented layouts, with its Key/RowID index, including Key 6, RowID 223346757121.)
     Row to delete: 6  Marvin  Martian  CA  91602  (818) 761-9964  26  M
     • Row-oriented: the new row is deleted.
     • Column-oriented: value deleted from each file.
  9. 9. Update Operations
     • Row-oriented: updating 100% of rows means changing 100% of blocks on disk.
     • Column-oriented: update just the blocks needed.
     (Figure: the five-row example table in row-oriented and column-oriented layouts.)
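A back-of-the-envelope model makes the update claim concrete. Everything numeric here is an assumption chosen for illustration (the column byte widths, the 8 KiB block size, the one-million-row table), and the function `blocks_rewritten` is hypothetical. Real engines add compression, headers, and logging that this sketch ignores.

```python
# Assumed fixed byte widths for the example table's columns (not from the slides).
COLUMN_WIDTHS = {
    "Key": 4, "Fname": 16, "Lname": 16, "State": 2,
    "Zip": 5, "Phone": 14, "Age": 1, "Sex": 1,
}


def blocks_rewritten(num_rows, column_widths, updated_column, block_bytes, columnar):
    """Estimate blocks rewritten when one column is updated in every row."""
    if columnar:
        # Only the updated column's file is touched.
        total_bytes = num_rows * column_widths[updated_column]
    else:
        # Every row lives in some block, so every block is touched.
        total_bytes = num_rows * sum(column_widths.values())
    return -(-total_bytes // block_bytes)  # ceiling division


row_blocks = blocks_rewritten(1_000_000, COLUMN_WIDTHS, "State", 8192, columnar=False)
column_blocks = blocks_rewritten(1_000_000, COLUMN_WIDTHS, "State", 8192, columnar=True)
```

Under these assumptions the columnar update rewrites only the blocks of the `State` file, a small fraction of what the row layout rewrites.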
  10. 10. Single-Row Operations
      • Columnar is not efficient for singleton insertions.
      • Columnar is not efficient for singleton deletions.
      • Columnar is efficient for ranged column updates.
      • Columnar is efficient for batched inserts (bulk load).
      • Columnar is efficient for batched partition drops.
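The contrast between singleton and batched inserts can be sketched with a toy write-count model. It assumes one block write per file touched for a singleton insert, equal-width values, and a hypothetical block capacity of `values_per_block` values. It is not a description of how InfiniDB actually schedules I/O.

```python
import math


def singleton_insert_writes(num_rows, num_columns, columnar):
    """Block writes when rows arrive one at a time, one write per touched file."""
    files_touched_per_row = num_columns if columnar else 1
    return num_rows * files_touched_per_row


def batched_insert_writes(num_rows, num_columns, values_per_block, columnar):
    """Block writes when all rows are loaded together and blocks are filled."""
    if columnar:
        # Each column file is filled independently.
        return num_columns * math.ceil(num_rows / values_per_block)
    # A row occupies num_columns value slots in a row-oriented block.
    return math.ceil(num_rows * num_columns / values_per_block)


# 1,000 rows of the 8-column example table, 1,000 values per block (assumed).
singleton_row = singleton_insert_writes(1000, 8, columnar=False)
singleton_col = singleton_insert_writes(1000, 8, columnar=True)
batched_row = batched_insert_writes(1000, 8, 1000, columnar=False)
batched_col = batched_insert_writes(1000, 8, 1000, columnar=True)
```

In this model the columnar penalty for singleton inserts scales with the number of columns, while a batched load costs about the same in either layout.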
  11. 11. Add a New Column
      • Row-oriented: usually requires rebuilding the table.
      • Column-oriented: create another file.
      (Figure: the five-row example table in both layouts, with a new Golf column holding Y, N, Y, Y, N.)
  12. 12. Add a New Column
      • Columnar is very flexible around adding columns.
      • No table rebuild is required with columnar.
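A minimal sketch of why adding a column is cheap in a columnar layout, using the Golf column from the slide. The in-memory structures and both function names are hypothetical stand-ins for on-disk storage.

```python
def add_column_row_store(rows, new_values):
    """Row layout: every stored row is rewritten with the extra field."""
    return [row + (value,) for row, value in zip(rows, new_values)]


def add_column_column_store(store, name, new_values):
    """Column layout: existing columns are untouched; one new 'file' appears."""
    store[name] = list(new_values)
    return store


rows = [(1, "Bugs"), (2, "Yosemite"), (3, "Daffy"), (4, "Elmer"), (5, "Witch")]
store = {
    "Key": [1, 2, 3, 4, 5],
    "Fname": ["Bugs", "Yosemite", "Daffy", "Elmer", "Witch"],
}
golf = ["Y", "N", "Y", "Y", "N"]

rebuilt_rows = add_column_row_store(rows, golf)    # all five rows are new objects
add_column_column_store(store, "Golf", golf)       # Key and Fname lists are reused
```

The row-oriented version produces a fresh copy of every row, which is the "table rebuild" the slide refers to. The columnar version only creates the new list.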
  13. 13. Columnar Basic Differences
      • What we know so far:
        o Columnar is not suited to OLTP-style individual row insertions/deletions.
        o Columnar is slower than a well-tuned index when finding individual rows.
      • But wait, columnar databases actually load faster? How?
        o By avoiding transactional loads in favour of batching.
  14. 14. Workloads
      • Query Vision/Scope (OLTP vs. Analytic)
      • Query Variety (Static vs. Ad-Hoc/Dynamic)
      • Data Volume, Data Structure
  15. 15. Workload: Query Vision/Scope
      (Figure: a Query Vision/Scope axis running from 1 to 10,000,000,000 in powers of 100, labeled "Tree" and "Forest".)
  16. 16. Workload: Query Vision/Scope
      (Figure: the Query Vision/Scope axis from 1 to 10,000,000,000, with OLTP Workloads and Analytic Workloads marked.)
      • General purpose DBMS missed the target (dated database technology generally not optimal).
  17. 17. Where are your workloads?
      (Figure: the Query Vision/Scope axis from 1 to 10,000,000,000, with OLTP Workloads and Analytic Workloads marked.)
      • Most customers do both, and we recommend two engines.
        o May require ETL or asynchronous replication (Tungsten).
      • If your analytic workloads are small, you probably don't need columnar.
      • If your transactional workloads are small, then you don't need a row-oriented engine.
  18. 18. Workload: Query Variety
      (Figure: an axis running from 1 to 10000, from Static Business Requirements to Ad-Hoc/Dynamic Business Requirements.)
      • How many different types of analysis are done?
      • How many dimensions? (How many indexes?)
  19. 19. Workload: Query Variety
      (Figure: an axis running from 1 to 10000, from Static Business Requirements to Ad-Hoc/Dynamic Business Requirements. How many different types of analysis are done? How many dimensions? How many indexes?)
      • If you can easily cover your queries with a couple of indexes and business requirements change slowly, then you may not need a columnar DBMS.
      • If you need more analytics, faster analytics, and faster deployments of new analytics, then a columnar DBMS is a good fit.
  20. 20. Data Volume
      (Figure: a Total Rows Stored axis running from 1 to 10,000,000,000.)
      • Analytics Optimized DBMS (Columnar) + OLTP Optimized DBMS (shards or other).
      • General purpose DBMS can be suitable at small scales.
  21. 21. Data Volume
      (Figure: two axes, Total Rows Stored and Query Vision/Scope, each running from 1 to 10,000,000,000.)
      • Analytics Query + Big Data = Columnar + MPP.
      • Some columnar DBMS also offer MPP (Massively Parallel Processing) to distribute workload to the data nodes.
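The idea of distributing work to data nodes can be sketched as a partial-aggregate pattern: each node aggregates its own partition, and a coordinator combines the partial results. The sketch below runs sequentially in one process. The partitioning and the names `partial_aggregate` and `mpp_average` are assumptions for illustration, not InfiniDB's implementation.

```python
def partial_aggregate(partition):
    """Work done on one data node: aggregate only the locally stored values."""
    return (sum(partition), len(partition))


def mpp_average(partitions):
    """Coordinator: combine per-node partial results into the final answer."""
    partials = [partial_aggregate(partition) for partition in partitions]
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(partial_count for _, partial_count in partials)
    return total / count


# The Age column from the example table, split across three hypothetical nodes.
age_partitions = [[34, 52], [35, 43], [57]]
average_age = mpp_average(age_partitions)
```

Only the small `(sum, count)` pairs travel between nodes, which is what lets this pattern scale as the data volume grows.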
  22. 22. Data Structure
      (Figure: a two-column table, Key and Varchar_8000, whose five rows hold long Lorem ipsum text, shown in row-oriented and column-oriented layouts. Both layouts are labeled "heavy text usage".)
      • Columnar DBMS and Row DBMS I/O will be about the same.
      • Candidate for Sphinx Search or other tool.
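A quick calculation shows why heavy-text tables give columnar little to skip. The byte widths below are assumptions (a 4-byte key, a fully populated 8000-byte text column, and made-up widths for the earlier example table), and `columnar_read_fraction` is a hypothetical helper.

```python
def columnar_read_fraction(column_widths, columns_needed):
    """Fraction of a row's bytes a columnar engine reads for the needed columns."""
    needed = sum(column_widths[name] for name in columns_needed)
    return needed / sum(column_widths.values())


# Heavy-text table: almost all bytes are in the one column the query wants.
text_table = {"Key": 4, "Varchar_8000": 8000}
text_fraction = columnar_read_fraction(text_table, ["Varchar_8000"])

# Narrow analytic table (assumed widths): a query on Age skips nearly everything.
narrow_table = {
    "Key": 4, "Fname": 16, "Lname": 16, "State": 2,
    "Zip": 5, "Phone": 14, "Age": 1, "Sex": 1,
}
age_fraction = columnar_read_fraction(narrow_table, ["Age"])
```

With these widths the text query still reads nearly every byte of the table, so the columnar layout saves almost no I/O, while the `Age` query reads under 2% of it.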
  23. 23. Data Structure: Flexibility
      • Columnar allows for on-line schema modifications.
      • No penalty for infrequently used columns.
      • Sparse columns will compress to virtually nothing.
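To illustrate why a sparse column can shrink to almost nothing, the sketch below applies run-length encoding to a mostly empty column. Run-length encoding is used here only as a simple stand-in: the slides do not say which compression scheme InfiniDB uses, and the function names are hypothetical.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, run_length) pairs back into the original list."""
    return [value for value, length in pairs for _ in range(length)]


# A sparse column: 10,000 rows, only one of which has a value.
sparse_column = [None] * 5000 + ["Y"] + [None] * 4999
encoded = run_length_encode(sparse_column)
```

Here 10,000 stored values reduce to three pairs. A row-oriented layout interleaves the sparse field with other columns, so long runs like these do not line up as neatly.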
  24. 24. Putting it all together
      • Designed for massive, high-performance analytics
      • Designed for ad-hoc flexibility
      • Not suited for OLTP, key-value, or NoSQL workloads
      • Hadoop connectivity and beyond
  25. 25. InfiniDB Product Footprint
