Apr. 19, 2023
  1. C-Store: A Column-oriented DBMS, by New England Database Group
  2. (M.I.T) Current DBMS Gold Standard: store fields in one record contiguously on disk; use B-tree indexing; use small (e.g. 4K) disk blocks; align fields on byte or word boundaries; conventional (row-oriented) query optimizer and executor (technology from 1979); Aries-style transactions.
  3. Terminology, “Row Store”: whole records are stored one after another (figure: Record 2, Record 4, Record 1, Record 3). E.g. DB2, Oracle, Sybase, SQLServer, …
  4. Row Stores are Write Optimized: can insert and delete a record in one physical write; good for on-line transaction processing (OLTP); but not for read-mostly applications such as data warehouses and CRM.
  5. Elephants Have Extended Row Stores: with bitmap indices; better sequential read; integration of “datacube” products; materialized views. But there may be a better idea…
  6. Column Stores
  7. At 100K Feet…: ad-hoc queries read 2 columns out of 20; in a very large warehouse, the fact table is rarely clustered correctly; a column store reads 10% of what a row store reads.
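
The 10% figure follows directly from reading 2 of 20 columns. A minimal sketch of that arithmetic, assuming (for illustration only) that every column has the same fixed width and that a row store must read whole rows; the function name and the row count are hypothetical:

```python
def bytes_read(rows, total_columns, columns_needed, bytes_per_value, column_store):
    """Bytes scanned for a query touching `columns_needed` of `total_columns`.

    Assumption: all columns are the same width and uncompressed.
    A row store reads every column of every row; a column store
    reads only the columns the query asks for.
    """
    columns_touched = columns_needed if column_store else total_columns
    return rows * columns_touched * bytes_per_value


row_bytes = bytes_read(1_000_000, 20, 2, 4, column_store=False)
col_bytes = bytes_read(1_000_000, 20, 2, 4, column_store=True)
print(col_bytes / row_bytes)  # 0.1
```

Compression and sort order can move the real ratio in either direction; the sketch only covers the column-count effect named on the slide.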
  8. C-Store (Column Store) Project: Brandeis/Brown/MIT/UMass-Boston project; usual suspects participating; enough coded to get performance numbers for some queries; complete status later.
  9. We Build on Previous Pioneering Work…: Sybase IQ (early ’90s); Monet (see CIDR ’05 for the most recent description).
  10. C-Store, a Hybrid Store: read-optimized column store; write-optimized column store; tuple mover. Figure annotations: (Much like Monet), (What we have been talking about so far), (Batch rebuilder).
  11. C-Store Technical Ideas: code the columns to save space; no alignment; big disk blocks; only materialized views (perhaps many); focus on sorting, not indexing; automatic physical DBMS design.
  12. C-Store (Column Store) Technical Ideas: innovative redundancy (redundant storage of elements of a table in several overlapping projections in different orders; high availability, K-safety, overlapping projections); Xacts, but no need for Mohan (Dr. C. Mohan? WAL based?): avoid 2PC, no prepare, no locking; data ordered on anything, not just time; column optimizer and executor.
  13. Data Model: projection; segment; storage keys - tuple; join indices - (sid, storage key).
  14. Code the Columns: work hard to shrink space; use extra space for multiple orders; fundamentally easier than in a row store, e.g. RLE works well. Type 1: self order, few distinct values (v, f, n). Type 2: foreign order, few distinct values (v, b). Type 3: self order, many distinct values (delta). Type 4: foreign order, many distinct values (unencoded, or densepack B-tree).
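
The first three encoding types can be sketched in a few lines. This is an illustration, not C-Store's actual on-disk format: it reads (v, f, n) as value, first position, run length and (v, b) as value plus a bitmap of positions, uses 0-based positions, and keeps bitmaps as plain Python lists. All function names are hypothetical.

```python
from itertools import groupby


def rle_encode(sorted_column):
    """Type 1 sketch: (v, f, n) = value, first position, run length."""
    triples = []
    position = 0
    for value, run in groupby(sorted_column):
        n = sum(1 for _ in run)
        triples.append((value, position, n))
        position += n
    return triples


def rle_decode(triples):
    """Expand (v, f, n) triples back into the original column."""
    column = []
    for value, _first, n in triples:
        column.extend([value] * n)
    return column


def bitmap_encode(column):
    """Type 2 sketch: one bitmap per distinct value, bit i set if row i holds it."""
    bitmaps = {}
    for i, value in enumerate(column):
        bitmaps.setdefault(value, [0] * len(column))[i] = 1
    return bitmaps


def delta_encode(sorted_column):
    """Type 3 sketch: first value, then differences between neighbours."""
    if not sorted_column:
        return []
    deltas = [b - a for a, b in zip(sorted_column, sorted_column[1:])]
    return [sorted_column[0]] + deltas


print(rle_encode([4, 4, 4, 7, 7, 9]))    # [(4, 0, 3), (7, 3, 2), (9, 5, 1)]
print(bitmap_encode([0, 1, 0]))          # {0: [1, 0, 1], 1: [0, 1, 0]}
print(delta_encode([1, 4, 7, 7, 20]))    # [1, 3, 3, 0, 13]
```

Run-length encoding pays off only when the column is stored in its own sort order with few distinct values, which is why the slide ties each scheme to an ordering and a cardinality.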
  15. Different Indexing: sequential, few values: RLE encoded, conventional B-tree at the value level. Sequential, many values: delta encoded, conventional B-tree at the block level. Non-sequential, few values: bitmap per value. Non-sequential, many values: conventional Gzip, conventional B-tree at the block level.
  16. No Alignment: densepack columns, e.g. a 5-bit field takes 5 bits; CPU speed is currently going up faster than disk bandwidth, so it is faster to shift data in the CPU than to waste disk bandwidth.
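
Dense packing trades CPU shifts for disk bytes. A minimal sketch of the idea, assuming fixed-width unsigned fields and a little-endian bit layout (both are choices made for this example, not details from the slides); `pack` and `unpack` are hypothetical names:

```python
def pack(values, width):
    """Pack unsigned integers into a byte string using `width` bits each."""
    buffer = 0
    for i, value in enumerate(values):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        buffer |= value << (i * width)
    n_bytes = (len(values) * width + 7) // 8  # round up to whole bytes
    return buffer.to_bytes(n_bytes, "little")


def unpack(data, width, count):
    """Recover `count` fields of `width` bits each by shifting and masking."""
    buffer = int.from_bytes(data, "little")
    mask = (1 << width) - 1
    return [(buffer >> (i * width)) & mask for i in range(count)]


values = [3, 31, 0, 17, 9, 1, 30, 12]
packed = pack(values, width=5)
print(len(packed))                         # 5 bytes instead of 8
print(unpack(packed, 5, len(values)) == values)  # True
```

Eight 5-bit fields occupy 40 bits, so they fit in 5 bytes; byte-aligned storage would need at least 8.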
  17. Big Disk Blocks: tunable; big (minimum size is 64K).
  18. Only Materialized Views: a projection (materialized view) is some number of columns from a fact table, plus columns in a dimension table, with a 1-n join between fact and dimension table; stored in order of a storage key(s); several may be stored, with a permutation, if necessary, to map between them.
  19. Only Materialized Views: the table (as the user specified it and sees it) is not stored; no secondary indexes (they are a one-column sorted MV plus a permutation, if you really want one).
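
To make the "projection plus permutation" idea concrete, here is a small sketch under simplifying assumptions: the logical table is a dict of equal-length Python lists, each projection is sorted on a single column, and the permutation is stored as a plain list of positions. The table contents and function names are invented for the example.

```python
def build_projection(table, columns, sort_key):
    """Return the chosen columns sorted on `sort_key`, plus the row order used.

    order[j] is the position in the logical table of projection row j.
    """
    n_rows = len(table[sort_key])
    order = sorted(range(n_rows), key=lambda i: table[sort_key][i])
    projection = {c: [table[c][i] for i in order] for c in columns}
    return projection, order


def permutation(order_from, order_to):
    """Map each row position in one projection to its position in another."""
    position_in_to = {row: j for j, row in enumerate(order_to)}
    return [position_in_to[row] for row in order_from]


table = {
    "id": [10, 11, 12, 13],
    "price": [5, 2, 9, 2],
    "region": ["W", "E", "N", "E"],
}
by_price, price_order = build_projection(table, ["id", "price"], "price")
by_region, region_order = build_projection(table, ["id", "region"], "region")

print(by_price["id"])                          # [11, 13, 10, 12]
print(permutation(price_order, region_order))  # [0, 1, 3, 2]
```

Row 2 of the price-ordered projection (id 10) sits at position 3 of the region-ordered one, which is the lookup the permutation provides. A one-column projection plus such a permutation plays the role of a secondary index.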
  20. Automatic Physical DBMS Design: accept a “training set” of queries and a space budget; choose the MVs auto-magically; re-optimize periodically based on a log of the interactions.
  21. Innovative Redundancy: hardly any warehouse is recovered by a redo from the log (takes too long); store enough MVs at enough places to ensure K-safety; rebuild dead objects from elsewhere in the network; K-safety is a DBMS-design problem.
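
One way to see K-safety as a design problem is to check a proposed placement by brute force. The sketch below is a deliberate simplification: it only asks whether every column is still stored somewhere after any K sites fail, and ignores the join indices and sort orders a real recovery would also need. The placement format and `is_k_safe` are hypothetical.

```python
from itertools import combinations


def is_k_safe(placement, all_columns, k):
    """True if every column survives the loss of any `k` sites.

    placement maps a site name to a list of projections, each given
    as a tuple of the column names it stores.
    """
    sites = list(placement)
    for failed in combinations(sites, k):
        surviving_columns = set()
        for site in sites:
            if site in failed:
                continue
            for projection_columns in placement[site]:
                surviving_columns.update(projection_columns)
        if not set(all_columns) <= surviving_columns:
            return False
    return True


placement = {
    "site1": [("id", "price")],
    "site2": [("id", "region")],
    "site3": [("price", "region")],
}
columns = {"id", "price", "region"}
print(is_k_safe(placement, columns, k=1))  # True
print(is_k_safe(placement, columns, k=2))  # False
```

With three overlapping two-column projections, any single site can fail without losing a column, but two simultaneous failures always lose one.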
  22. XACTS, No Mohan: undo from a log (that does not need to be persistent); redo by rebuild from elsewhere in the network.
  23. XACTS, No Mohan: snapshot isolation (run queries as of a tunable time in the recent past) to solve read-write conflicts; distributed Xacts without a prepare message (no 2-phase commit).
  24. Updates & Transaction Mechanism: ET - effective time; IV - insertion time (insertion vector), for WS only; DRV - deleted record vector, for each projection; TA - timestamp authority (HWM, high water mark; LWM, low water mark, triggered by tuple mover).
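
A snapshot read at effective time ET can be sketched as a visibility filter over the insertion and deletion timestamps. The representation here is an assumption made for illustration: one insertion epoch and one deletion epoch (or `None`) per record, with ET required to lie between LWM and HWM. `visible_positions` is a hypothetical name.

```python
def visible_positions(insertion_epochs, deletion_epochs, et, lwm, hwm):
    """Positions of records a query sees when run as of effective time `et`.

    A record is visible if it was inserted at or before `et` and was
    not deleted at or before `et`. None means "never deleted".
    """
    if not lwm <= et <= hwm:
        raise ValueError("effective time must lie between LWM and HWM")
    visible = []
    for position, (inserted, deleted) in enumerate(
        zip(insertion_epochs, deletion_epochs)
    ):
        if inserted <= et and (deleted is None or deleted > et):
            visible.append(position)
    return visible


insertion_epochs = [1, 2, 3, 5]
deletion_epochs = [None, 3, None, None]
print(visible_positions(insertion_epochs, deletion_epochs, et=3, lwm=1, hwm=4))  # [0, 2]
print(visible_positions(insertion_epochs, deletion_epochs, et=2, lwm=1, hwm=4))  # [0, 1]
```

Because readers only look at timestamps that are already fixed, they never wait on writers, which is how snapshot isolation avoids read-write conflicts without locking.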
  25. Recovery: K-safety; common case: WS crash, RS OK (recovery SQL; log when tlastmove(S) > tlastmove(Sr)).
  26. Tuple Mover: MOP - merge-out process; emit LWM (the needs of users who want historical access; WS space constraints).
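
The merge-out step can be pictured as a batch merge from the write store (WS) into the read store (RS). The sketch below is an assumption-laden simplification: WS records are `(key, insertion_epoch, deletion_epoch)` tuples, RS is a key-sorted list of `(key, deletion_epoch)` pairs, records inserted at or before the LWM are moved, and records already deleted at or before the LWM are dropped. `merge_out` is a hypothetical name.

```python
import heapq


def merge_out(rs, ws, lwm):
    """Move WS records inserted at or before `lwm` into a new sorted RS.

    Returns (new_rs, remaining_ws). Deletion epochs later than `lwm`
    are carried along so historical queries can still see the record.
    """
    movable = [r for r in ws if r[1] <= lwm]
    remaining_ws = [r for r in ws if r[1] > lwm]
    survivors = sorted(
        ((key, deleted) for key, _inserted, deleted in movable
         if deleted is None or deleted > lwm),
        key=lambda record: record[0],
    )
    # Both inputs are sorted on key, so a linear merge builds the new RS.
    new_rs = list(heapq.merge(rs, survivors, key=lambda record: record[0]))
    return new_rs, remaining_ws


rs = [(1, None), (5, None)]
ws = [(3, 2, None), (4, 2, 3), (2, 6, None), (7, 4, 9)]
new_rs, remaining_ws = merge_out(rs, ws, lwm=5)
print(new_rs)        # [(1, None), (3, None), (5, None), (7, 9)]
print(remaining_ws)  # [(2, 6, None)]
```

Building a fresh RS rather than updating in place matches the "batch rebuilder" description of the tuple mover, and shrinking WS addresses the space constraint named on the slide.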
  27. Column Executor: column operations, not row operations; columns remain coded, if possible; LM - late materialization.
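
Operating on coded columns and materializing late can be sketched together. In this example, a predicate is evaluated once per run of a run-length-encoded column, producing a list of positions; other columns are then read only at those positions. The (value, first position, run length) triples with 0-based positions, the sample data, and the function names are all assumptions for illustration.

```python
def select_positions_rle(triples, predicate):
    """Evaluate a predicate once per run and return the matching positions."""
    positions = []
    for value, first, run_length in triples:
        if predicate(value):
            positions.extend(range(first, first + run_length))
    return positions


def sum_rle(triples):
    """Aggregate directly on the coded column, without decoding it."""
    return sum(value * run_length for value, _first, run_length in triples)


def late_materialize(positions, *columns):
    """Stitch tuples together only for the positions that survived."""
    return [tuple(column[p] for column in columns) for p in positions]


# A projection sorted on region, with region stored as RLE triples.
region_rle = [("E", 0, 2), ("N", 2, 1), ("W", 3, 2)]
ids = [11, 13, 12, 10, 14]
prices = [2, 2, 9, 5, 7]

west = select_positions_rle(region_rle, lambda region: region == "W")
print(west)                                 # [3, 4]
print(late_materialize(west, ids, prices))  # [(10, 5), (14, 7)]
print(sum_rle([(4, 0, 3), (7, 3, 2)]))      # 26
```

The predicate runs three times (once per run) rather than five times (once per row), and only two rows of `ids` and `prices` are ever touched.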
  28. Column Optimizer: chooses MVs on which to run the query (most important task); build in snowflake schemas, which are simple to optimize without exhaustive search.
  29. Column Join Example
  30. Column Join Example
  31. Column Join Example
  32. Current Performance: 100X popular row store in 40% of the space; 10X popular column store in 70% of the space; 7X popular row store in 1/6th of the space; code available with BSD license.
  33. Structure Going Forward: Vertica (very well financed start-up to commercialize C-Store; doing the heavy lifting); university research (funded by Vertica).
  34. Vertica: complete alpha system in December ’05; everything, including DBMS designer; with current performance; looking for early customers to work with (see me if you are interested).
  35. University Research: extension of algorithms to non-snowflake schemas; study of L2 cache performance; study of coding strategies; study of executor options; study of recovery tactics; non-cursor interface; study of optimizer primitives.
  36. Paper list:
      - C-Store: A Column Oriented DBMS. Mike Stonebraker, Daniel Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Sam Madden, Elizabeth O'Neil, Pat O'Neil, Alex Rasin, Nga Tran and Stan Zdonik. VLDB, pages 553-564, 2005.
      - Integrating Compression and Execution in Column-Oriented Database Systems. Daniel J. Abadi, Samuel R. Madden, and Miguel C. Ferreira. Proceedings of SIGMOD, June, 2006, Chicago, USA.
      - Performance Tradeoffs in Read-Optimized Databases. Stavros Harizopoulos, Velen Liang, Daniel Abadi, and Samuel Madden. Proceedings of VLDB, September, 2006, Seoul, Korea.
      - Column-Stores For Wide and Sparse Data. Daniel J. Abadi. Proceedings of CIDR, January, 2007, Asilomar, USA.
      - Materialization Strategies in a Column-Oriented DBMS. Daniel J. Abadi, Daniel S. Myers, David J. DeWitt, and Samuel R. Madden. Proceedings of ICDE, April, 2007, Istanbul, Turkey.
      - Scalable Semantic Web Data Management Using Vertical Partitioning. Daniel J. Abadi, Adam Marcus, Samuel R. Madden, and Kate Hollenbach. Proceedings of VLDB, September, 2007, Vienna, Austria.
      - Column-Stores vs. Row-Stores: How Different Are They Really? Daniel J. Abadi, Samuel R. Madden, Nabil Hachem. In Proceedings of SIGMOD, 2008, Vancouver, Canada.

