Columnar Database and Hadoop



江志伟( Alex Jiang )
2012-12-1
Agenda



1.   Column Advantage
2.   Storage and Process
3.   Hadoop Related
History


    2001 PAX

    Mike Stonebraker, Daniel Abadi, Adam Batkin, Xuedong Chen, Mitch
    Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, …

    C-Store: A Column Oriented DBMS

    D. J. Abadi et al.: Integrating Compression and Execution in
    Column-Oriented Database Systems. In SIGMOD, pages 671–682, 2006.

    D. J. Abadi et al.: Materialization Strategies in a Column-Oriented
    DBMS. In ICDE, pages 466–475, 2007.
File Format


PAX
Columnar storage
(Columnar) compression
PPD vs Index or MV
SerDe
PAX




(Picture from Oracle blog)
Columnar Store vs Row Store

●   IO-1 (basic column store): Every storage block contains
    data from only ONE column.
●   IO-2: Aggressive compression.
●   IO-3: No record-ids.
●   CPU-4: A column executor.
●   CPU-5: Executor runs on compressed data.
●   CPU-6: Executor can process columns that are in key
    sequence or entry sequence.
Columnar Store Advantages
●   Compression
      RLE, bitmap, ...
●   PPD (predicate pushdown)
      reduces IO
●   Late Materialization
      less memory and CPU overhead
●   Block Iteration (Vectorization)
      less CPU overhead
●   Invisible Join
      – block as join key
Compression
●   Run-length Encoding
●   ENCODING DELTAVAL
●   Bit Vector Encoding
●   BLOCK_DICT
       data skew
       compound

●   High Selectivity: gender, age
●   Mid Selectivity: city, category
●   Low Selectivity: item_id, user_id, price, quantity, comment
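
As a rough illustration of two of the encodings listed on this slide, the sketch below run-length encodes a column and builds bit vectors for it. The function names and the sample gender column are made up for the example; real column stores implement these encodings on binary blocks, not Python lists.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse each run of equal consecutive values into (value, run_length)."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


def bit_vectors(values):
    """Build one bit vector per distinct value; bit i is 1 where row i holds it."""
    return {v: [1 if x == v else 0 for x in values] for v in sorted(set(values))}


# Hypothetical column with few distinct values, stored in sorted-ish order.
gender_column = ["F", "F", "F", "M", "M", "F"]

encoded = rle_encode(gender_column)
vectors = bit_vectors(gender_column)
print(encoded)   # [('F', 3), ('M', 2), ('F', 1)]
print(vectors)   # {'F': [1, 1, 1, 0, 0, 1], 'M': [0, 0, 0, 1, 1, 0]}
```

Run-length encoding pays off only when equal values sit next to each other, which is why sort order matters; bit vectors grow with the number of distinct values, so they suit columns with few of them.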
Column File Format




(Picture from Vertica blog)
PPD


Predicate Pushdown
    Continuous IO
    Compound predicates
    Min-max in each minor block
PAX supports PPD, but not efficiently
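
The min-max idea on this slide can be sketched as follows: keep the minimum and maximum of every block, and skip any block whose range cannot satisfy the predicate. This is a minimal sketch, not any particular engine's implementation; `Block`, `make_blocks`, `scan_range`, and the block size of 10 are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Block:
    values: list
    lo: int   # minimum value stored in this block
    hi: int   # maximum value stored in this block


def make_blocks(column, block_size):
    """Split a column into fixed-size blocks and record min/max per block."""
    blocks = []
    for start in range(0, len(column), block_size):
        chunk = column[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def scan_range(blocks, low, high):
    """Evaluate low <= v <= high, skipping blocks whose min/max rule them out.

    Returns the matching values and the number of blocks actually read.
    """
    hits, blocks_read = [], 0
    for block in blocks:
        if block.hi < low or block.lo > high:
            continue  # pruned using the block statistics alone
        blocks_read += 1
        hits.extend(v for v in block.values if low <= v <= high)
    return hits, blocks_read


# Hypothetical sorted column: 100 values in 10 blocks of 10.
price_blocks = make_blocks(list(range(100)), block_size=10)
matches, blocks_read = scan_range(price_blocks, 42, 47)
print(matches, blocks_read)   # [42, 43, 44, 45, 46, 47] 1
```

The pruning is only as good as the data layout: on a sorted column one block is read here, while on randomly ordered data most block ranges would overlap the predicate and little would be skipped.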
PPD




(Picture from Vertica Blog)
Late Materialization

Construct Row
Apply Filter + Projection


Read only the projected columns that are needed (also PPD)
Decode columns first
Wait until processing
Different compression schemes behave differently
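
To make the contrast concrete, here is a small sketch that answers the same query both ways: early materialization builds every row before filtering, while late materialization filters on column positions and builds output tuples only for the survivors. The table, column names, and query are invented for the example.

```python
# Hypothetical table stored column by column.
columns = {
    "user_id": [1, 2, 3, 4, 5],
    "city": ["NYC", "SF", "NYC", "LA", "NYC"],
    "price": [10, 25, 40, 5, 60],
}


def early_materialization(cols, city, min_price):
    """Construct every row first, then apply the filter and the projection."""
    names = list(cols)
    rows = [dict(zip(names, vals)) for vals in zip(*cols.values())]
    return [(r["user_id"], r["price"]) for r in rows
            if r["city"] == city and r["price"] >= min_price]


def late_materialization(cols, city, min_price):
    """Filter one column at a time, carrying only row positions forward."""
    positions = [i for i, c in enumerate(cols["city"]) if c == city]
    positions = [i for i in positions if cols["price"][i] >= min_price]
    # Tuples are built only now, and only for the qualifying positions.
    return [(cols["user_id"][i], cols["price"][i]) for i in positions]


early = early_materialization(columns, "NYC", 40)
late = late_materialization(columns, "NYC", 40)
print(late)   # [(3, 40), (5, 60)]
```

Both functions return the same result; the difference is how much intermediate data is built. The early version creates five full rows, the late version touches `user_id` only for the two positions that pass both predicates.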
Early Materialization




  (Picture from William McKnight)
Late Materialization




 (Picture from William McKnight)
Common Confusion: IO

Selecting more columns brings a column store closer to a row store
IO <5%
   Record-ID
   Row store free space at block tail
   Variable-length fields
   IO access pattern means scalability
   Hardware trend
   Compression rate
Common Confusion: SerDe

Row or PAX SerDe
    CPU cache misses
    No columnar compression
    Block Iteration (construct tuple or row)


Java vs C/C++
   C/C++: direct memory mapping
   Java: fastutil
Index and MV
Reduce IO                 Scalability
Avoid sort                Storage cost
    Index join            Complex design
Lookup                    Hard to maintain
Pre-computation:          High latency
     Join                 Slows down loading
     Group by             Lost details
Query rewrite
Data Modeling

Fat table vs 3NF
Hadoop Related


File Format
  Trevni vs IBM CIF
  Schema Evolution
  Portable File Format
   Bigger Block Size
    IO Pattern
    SerDe network influence
Hadoop Related

Storage Cost
NameNode
    Fewer blocks

   Bigger block size

   Cold data even bigger

   No Intermediate Level

JobTracker
    Each job has fewer map and reduce tasks

DataNode
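
The NameNode points above come down to simple arithmetic: the NameNode tracks every block, so a bigger block size (or a smaller, compressed file) means fewer entries to track. The sketch below only counts blocks for a single file before replication; the 1 TiB file size and the 64 MiB and 256 MiB block sizes are illustrative assumptions, not figures from the talk.

```python
MIB = 1024 ** 2
TIB = 1024 ** 4


def hdfs_block_count(file_bytes, block_bytes):
    """Number of blocks needed to hold a file (ceiling division)."""
    return -(-file_bytes // block_bytes)


# Illustrative numbers only: one 1 TiB file at two block sizes.
blocks_64m = hdfs_block_count(1 * TIB, 64 * MIB)
blocks_256m = hdfs_block_count(1 * TIB, 256 * MIB)
print(blocks_64m, blocks_256m)   # 16384 4096
```

Quadrupling the block size cuts the block count by a factor of four, and the same function shows the effect of compression if `file_bytes` is replaced by the compressed size.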
Hadoop Related

Real Data ingestion
   HBase + Flume
   Balanced data
   Write Avro file format first, then sort-merge

Reduced SerDe memory
    Tuple structure, not row
Batch Update+Delete+Insert
Hadoop Related

MR Performance Boost
  Block Shuffle (3 times faster)

  Skewed data has less overhead

  Fewer maps and bigger spills

  Reduce-side combine

  Light compression codec (Snappy, not LZO)

  Combiner or in-memory combiner deprecated
Hadoop Related

Easier Performance Tuning
  mapred.min.split.size (deprecated)

  mapred.child.java.opts

  mapred.compress.map.output (deprecated)

  io.sort.mb

  io.sort.spill.percent (deprecated)

  io.sort.factor

  mapred.reduce.parallel.copies (deprecated)

  Map and reduce counts are easier to estimate

  Reduce algorithm will change
Hadoop Related

Easy Management
   Fewer partitions, or dynamic partitions

   Integrity constraints and referential integrity

   Statistics make for a simpler query engine

   Cold data merged automatically

   Trojan Layout vs columnar projections

Less Design Complexity
   Map join vs fat table

   Group by + index
Reference
●   http://www.dbms2.com/2011/02/06/columnar-compression-database-storage/
●   http://cs-www.cs.yale.edu/homes/dna/talks/Column_Store_Tutorial_VLDB09.pdf
●   http://www.infoq.com/news/2011/09/nosqlnow-columnar-databases/
●   Dremel: Melnik, Gubarev, Long, Romer, Shivakumar, & Tolton, VLDB 2010
●   Trevni: http://avro.apache.org/docs/current/trevni/spec.html
●   http://www.vertica.com/2011/09/01/the-power-of-projections-part-1/
Thank you!
Q&A

Alex Jiang

gemini5201314 at gmail dot com

http://www.gemini5201314.net
