DeMystifying
Columnar Databases

             June Tong
        jtong@calpont.com
      straycat90@gmail.com
               April 2012





      Calpont Proprietary and Confidential
Agenda

       • What is a columnar database?

       • Why is it better than a row-oriented database?

       • When isn’t it better?

       • What do I need to know to use it?

       • How will I need to change my application code?




InfiniDB® Scalable. Fast. Simple.   2           Copyright © 2011 Calpont. All Rights Reserved.
Who is Calpont?

  • Calpont Corporation
     o Privately held
     o Headquartered in Frisco, TX

  Our Mission
     To provide a scalable data platform that enables analytic
     business decisions as timely as customers and markets dictate.




InfiniDB

  InfiniDB is a columnar MPP MySQL database engine,
  expressly designed for analytic applications
      o InfiniDB Community (single-server)
      o InfiniDB Enterprise
                  Version 2.2 – shared disk
                  Version 3.0 – added shared nothing option




Traditional Row-Oriented Storage

    Rows stored sequentially
      Key    Fname        Lname     State    Zip    Phone            Age   Sex
       1     Bugs         Bunny      NY     11217   (718) 938-3235   34    M
       2     Yosemite     Sam        CA     95389   (209) 375-6572   52    M
       3     Daffy        Duck       NY     10013   (212) 227-1810   35    M
       4     Elmer        Fudd       ME     04578   (207) 882-7323   43    M
       5     Witch        Hazel      MA     01970   (978) 744-0991   57    F




    Provides best performance when most queries
    are for multiple columns of a single row
    (OLTP applications)




Key Lookup in a Row-Oriented Database
    Indexes on high-cardinality columns
    make accessing a single row very fast

    Index on Key
      Key    RowID
       1     0001B008D23A671A
       2     0001B008D23A671B
       3     0001B008D23A671C
       4     0001B008D23A671D
       5     0001B008D23A671E

      WHERE key=4

    Index on Phone
      Phone               RowID
      (207) 882-7323      0001B008D23A671D
      (209) 375-6572      0001B008D23A671B
      (212) 227-1810      0001B008D23A671C
      (718) 938-3235      0001B008D23A671A
      (978) 744-0991      0001B008D23A671E

      Elmer Fudd calls customer service:
      WHERE phone='(207) 882-7323'

    Table
      Key    Fname        Lname     State    Zip    Phone            Age   Sex
       1     Bugs         Bunny      NY     11217   (718) 938-3235   34    M
       2     Yosemite     Sam        CA     95389   (209) 375-6572   52    M
       3     Daffy        Duck       NY     10013   (212) 227-1810   35    M
       4     Elmer        Fudd       ME     04578   (207) 882-7323   43    M
       5     Witch        Hazel      MA     01970   (978) 744-0991   57    F

    but don’t help on analytical queries scanning many rows,
    e.g. What’s the average age of males?
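
    The contrast between the two kinds of query can be sketched in a few lines of Python. This is a toy model, not InfiniDB or MySQL code: the dictionary `phone_index` and the helper names are hypothetical, and a list position stands in for a RowID.

```python
# Toy row store: each row is one tuple, as in the table on this slide.
rows = [
    (1, "Bugs", "Bunny", "NY", "11217", "(718) 938-3235", 34, "M"),
    (2, "Yosemite", "Sam", "CA", "95389", "(209) 375-6572", 52, "M"),
    (3, "Daffy", "Duck", "NY", "10013", "(212) 227-1810", 35, "M"),
    (4, "Elmer", "Fudd", "ME", "04578", "(207) 882-7323", 43, "M"),
    (5, "Witch", "Hazel", "MA", "01970", "(978) 744-0991", 57, "F"),
]
PHONE, AGE, SEX = 5, 6, 7  # field positions within a row tuple

# Hypothetical index: phone value -> list position (stands in for a RowID).
phone_index = {row[PHONE]: pos for pos, row in enumerate(rows)}


def lookup_by_phone(phone):
    """Point lookup: one index probe, then one row fetch."""
    return rows[phone_index[phone]]


def average_age(sex):
    """Analytical query: the phone index is no help, so every row is visited."""
    ages = [row[AGE] for row in rows if row[SEX] == sex]
    return sum(ages) / len(ages)


print(lookup_by_phone("(207) 882-7323")[1:3])  # ('Elmer', 'Fudd')
print(average_age("M"))                        # 41.0
```

    The point lookup touches one row; the average has to walk all of them, and each visited row drags along every column, not just Age and Sex.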



Sequential Scans are Killers

    What if you had 100 million rows, with 100 columns?

    (Figure labels: Sex, Age)

    If the table is 100GB, you have to read 100GB.

    Or build composite indexes on EVERYTHING.



Column-Oriented Storage

     Each column is stored in a separate file
      Key        Fname              Lname   State    Zip    Phone                Age           Sex
       1         Bugs               Bunny    NY     11217   (718) 938-3235       34            M
       2         Yosemite           Sam      CA     95389   (209) 375-6572       52            M
       3         Daffy              Duck     NY     10013   (212) 227-1810       35            M
       4         Elmer              Fudd     ME     04578   (207) 882-7323       43            M
       5         Witch              Hazel    MA     01970   (978) 744-0991       57            F




     Each column value for a given row is at the same offset
     in its file (auto-indexing)
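
     A minimal sketch of that layout, assuming one Python list per column as a stand-in for one file per column. The `columns` dictionary and `fetch_row` helper are illustrative names, not part of InfiniDB.

```python
# Each column lives in its own list, standing in for one file per column.
columns = {
    "Key":   [1, 2, 3, 4, 5],
    "Fname": ["Bugs", "Yosemite", "Daffy", "Elmer", "Witch"],
    "Lname": ["Bunny", "Sam", "Duck", "Fudd", "Hazel"],
    "State": ["NY", "CA", "NY", "ME", "MA"],
    "Age":   [34, 52, 35, 43, 57],
    "Sex":   ["M", "M", "M", "M", "F"],
}


def fetch_row(offset, names):
    """Rebuild part of a row by reading the same offset in each requested column."""
    return {name: columns[name][offset] for name in names}


print(fetch_row(3, ["Fname", "Lname", "State"]))
# {'Fname': 'Elmer', 'Lname': 'Fudd', 'State': 'ME'}
```

     No separate index is needed to line the columns up: the offset itself ties the values of one row together.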




Read Columns, Not Rows

    Only read the files you need
      Key        Fname              Lname   State    Zip    Phone                Age           Sex
       1         Bugs               Bunny    NY     11217   (718) 938-3235       34            M
       2         Yosemite           Sam      CA     95389   (209) 375-6572       52            M
       3         Daffy              Duck     NY     10013   (212) 227-1810       35            M
       4         Elmer              Fudd     ME     04578   (207) 882-7323       43            M
       5         Witch              Hazel    MA     01970   (978) 744-0991       57            F




    Also get improved compression because all data in
    one file is the same data type.
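
    The slides do not say which compression scheme InfiniDB uses. As one illustration of why a single-column file can compress well, here is a sketch of run-length encoding, a simple scheme chosen only for this example, applied to two columns from the sample table.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


sex_column = ["M", "M", "M", "M", "F"]
state_column = ["NY", "CA", "NY", "ME", "MA"]

print(run_length_encode(sex_column))         # [('M', 4), ('F', 1)]
print(len(run_length_encode(state_column)))  # 5: no runs, so nothing is saved
```

    A low-cardinality column such as Sex collapses to a couple of pairs, while a column with no repeated neighbours gains nothing from this particular scheme. In a row layout the Sex values are separated by every other field, so such runs never form.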




I/O Reduction

    So you still have 100 million rows, with 100 columns...

    (Figure labels: Males, Age)

    But you only read 2 columns, instead of 100
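
    As a rough back-of-the-envelope check, reusing the 100GB table from the earlier slide and assuming (purely for this sketch) that all 100 columns are the same width and that compression is ignored:

```python
def bytes_read(table_bytes, total_columns, columns_needed, columnar):
    """Estimate bytes scanned, assuming every column is the same width."""
    if not columnar:
        return table_bytes  # a row store scans whole rows
    return table_bytes * columns_needed // total_columns


TABLE = 100 * 10**9  # 100GB

print(bytes_read(TABLE, 100, 2, columnar=False))  # 100000000000
print(bytes_read(TABLE, 100, 2, columnar=True))   # 2000000000
```

    Under those assumptions the scan drops from 100GB to about 2GB. Real column widths differ, so the actual ratio depends on which columns the query touches.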




Vertical Partitioning

         Columnar databases produce automatic
         vertical partitioning
           1       Bugs             Bunny   Brooklyn      NY        11217   (718) 938-3235
           2       Yosemite         Sam     Wawona        CA        95389   (209) 375-6572
           3       Daffy            Duck    New York      NY        10013   (212) 227-1810
           4       Elmer            Fudd    Wiscasset     ME        04578   (207) 882-7323
            :        :               :       :             :            :      :
            :        :               :       :             :            :      :
            :        :               :       :             :            :      :
            :        :               :       :             :            :      :
            :        :               :       :             :            :      :
            :        :               :       :             :            :      :
            :        :               :       :             :            :      :
            :        :               :       :             :            :      :
            :        :               :       :             :            :      :
          8m       Snoopy           Brown   Springfield   MA        01105   (413) 781-6500




Horizontal Partitioning
         InfiniDB also automatically creates horizontal
         partitions of 8 million rows (default)
           1       Bugs             Bunny   Brooklyn      NY        11217   (718) 938-3235
           2       Yosemite         Sam     Wawona        CA        95389   (209) 375-6572
           3       Daffy            Duck    New York      NY        10013   (212) 227-1810
           4       Elmer            Fudd    Wiscasset     ME        04578   (207) 882-7323
            :        :               :       :             :            :      :
            :        :               :       :             :            :      :
          8m       Snoopy           Brown   Springfield   MA        01105   (413) 781-6500
            :        :               :       :             :            :      :
            :        :               :       :             :            :      :

         Knowing what values are in each partition allows for
         partition elimination at query time
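
         A minimal sketch of partition elimination. It assumes that "knowing what values are in each partition" means recording a minimum and maximum per partition; the slide does not spell out the mechanism, so that is an assumption. The `Extent` class, helper names, and the toy partition size of 8 rows (instead of 8 million) are all illustrative.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    """One horizontal partition of a column, with its value range recorded."""
    values: list
    lo: int
    hi: int


def make_extents(values, rows_per_extent):
    """Split a column into fixed-size partitions and record min/max for each."""
    extents = []
    for start in range(0, len(values), rows_per_extent):
        chunk = values[start:start + rows_per_extent]
        extents.append(Extent(chunk, min(chunk), max(chunk)))
    return extents


def count_in_range(extents, lo, hi):
    """Count values in [lo, hi], skipping extents whose range cannot match."""
    matches = 0
    scanned = 0
    for ext in extents:
        if ext.hi < lo or ext.lo > hi:
            continue  # eliminated without reading its values
        scanned += 1
        matches += sum(lo <= v <= hi for v in ext.values)
    return matches, scanned


keys = list(range(1, 41))                      # 40 ascending key values
extents = make_extents(keys, rows_per_extent=8)

print(len(extents))                     # 5
print(count_in_range(extents, 10, 12))  # (3, 1): 3 matches, 1 extent scanned
```

         Elimination pays off most when the data arrives roughly ordered on the filtered column, as in this example. If every partition spans the full value range, nothing can be skipped.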



Bonus: Easy to Add a New Column

    Row-oriented: Usually requires rebuilding table
    (addition of column shifts every row)
    Key     Fname        Lname      State    Zip    Phone            Age   Sex Golf
     1      Bugs         Bunny       NY     11217   (718) 938-3235   34    M    Y
     2      Yosemite     Sam         CA     95389   (209) 375-6572   52    M    N
     3      Daffy        Duck        NY     10013   (212) 227-1810   35    M    Y
     4      Elmer        Fudd        ME     04578   (207) 882-7323   43    M    Y
     5      Witch        Hazel       MA     01970   (978) 744-0991   57    F    N


    Column-oriented: Just create another file
      Key        Fname              Lname       State       Zip        Phone                Age           Sex          Golf
       1         Bugs               Bunny        NY        11217       (718) 938-3235       34            M             Y
       2         Yosemite           Sam          CA        95389       (209) 375-6572       52            M             N
       3         Daffy              Duck         NY        10013       (212) 227-1810       35            M             Y
       4         Elmer              Fudd         ME        04578       (207) 882-7323       43            M             Y
       5         Witch              Hazel        MA        01970       (978) 744-0991       57            F             N
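
    The difference can be sketched with in-memory stand-ins for the two layouts. The function names are hypothetical, and a real row store may have ways to avoid a full rebuild, so treat this as a picture of the general case only.

```python
def add_column_row_store(rows, new_values):
    """Row layout: every stored row is rewritten with one more field."""
    return [row + (value,) for row, value in zip(rows, new_values)]


def add_column_column_store(columns, name, new_values):
    """Column layout: existing columns are untouched; one new list is added."""
    columns[name] = list(new_values)


rows = [(1, "Bugs"), (2, "Yosemite"), (3, "Daffy")]
columns = {"Key": [1, 2, 3], "Fname": ["Bugs", "Yosemite", "Daffy"]}
golf = ["Y", "N", "Y"]

new_rows = add_column_row_store(rows, golf)
fname_before = columns["Fname"]
add_column_column_store(columns, "Golf", golf)

print(new_rows[0])                       # (1, 'Bugs', 'Y')
print(columns["Fname"] is fname_before)  # True: the old column was not rewritten
```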




Single-Row Operations

      Because of the nature of columnar storage,
      single-row operations can underperform.

       Do not attempt OLTP-style transactions
       on a columnar database.


      More details on individual DML statements follow...



Single-Row Operations: Insert

    Row-oriented: new rows appended to the end
    Key     Fname         Lname     State      Zip     Phone            Age     Sex
     1      Bugs          Bunny      NY       11217    (718) 938-3235   34      M
     2      Yosemite      Sam        CA       95389    (209) 375-6572   52      M
     3      Daffy         Duck       NY       10013    (212) 227-1810   35      M
     4      Elmer         Fudd       ME       04578    (207) 882-7323   43      M
     5      Witch         Hazel      MA       01970    (978) 744-0991   57      F
       6    Marvin        Martian    CA       91602    (818) 761-9964   26      M


     Columnar: new value must be added to each file
      Key        Fname              Lname         State        Zip           Phone                Age           Sex
       1         Bugs               Bunny          NY         11217          (718) 938-3235       34            M
       2         Yosemite           Sam            CA         95389          (209) 375-6572       52            M
       3         Daffy              Duck           NY         10013          (212) 227-1810       35            M
       4         Elmer              Fudd           ME         04578          (207) 882-7323       43            M
       5         Witch              Hazel          MA         01970          (978) 744-0991       57            F
        6        Marvin             Martian           CA      91602          (818) 761-9964       26            M




Insert: Solution

      Do batch inserts and use cpimport, the bulk
      loader, instead.


      CPIMPORT is your friend.
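
      Why batching helps can be sketched with a toy cost model. This is not how cpimport works internally; the `ColumnStore` class and its `file_touches` counter are hypothetical, and the generated rows are made-up filler. The model only counts how many times a column "file" has to be visited.

```python
class ColumnStore:
    """Toy column store that counts how often a column 'file' is touched."""

    def __init__(self, names):
        self.columns = {name: [] for name in names}
        self.file_touches = 0

    def insert_row(self, row):
        # One row: every column file is touched once.
        for name, value in zip(self.columns, row):
            self.columns[name].append(value)
            self.file_touches += 1

    def bulk_load(self, rows):
        # One batch: every column file is touched once for the whole batch.
        for i, name in enumerate(self.columns):
            self.columns[name].extend(row[i] for row in rows)
            self.file_touches += 1


names = ["Key", "Fname", "Lname"]
batch = [(k, f"F{k}", f"L{k}") for k in range(6, 1006)]  # 1000 filler rows

one_by_one = ColumnStore(names)
for row in batch:
    one_by_one.insert_row(row)

batched = ColumnStore(names)
batched.bulk_load(batch)

print(one_by_one.file_touches)  # 3000
print(batched.file_touches)     # 3
```

      Both stores end up holding the same data; only the number of per-file visits differs, and that gap grows with the number of columns.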




Single-Row Operations: Delete

    Row-oriented: row is deleted
    Key     Fname        Lname       State    Zip    Phone            Age   Sex
     1      Bugs         Bunny        NY     11217   (718) 938-3235   34    M
     2      Yosemite     Sam          CA     95389   (209) 375-6572   52    M
     3      Daffy        Duck         NY     10013   (212) 227-1810   35    M
     4      Elmer        Fudd         ME     04578   (207) 882-7323   43    M
     5      Witch        Hazel        MA     01970   (978) 744-0991   57    F



    Columnar: each column value must be deleted from
    its file
    Key         Fname               Lname       State       Zip        Phone               Age           Sex
     1          Bugs                Bunny        NY        11217       (718) 938-3235      34            M
     2          Yosemite            Sam          CA        95389       (209) 375-6572      52            M
     3          Daffy               Duck         NY        10013       (212) 227-1810      35            M
     4          Elmer               Fudd         ME        04578       (207) 882-7323      43            M
     5          Witch               Hazel        MA        01970       (978) 744-0991      57            F




Delete: Solutions

     Do batch deletes.

     Any extents that contain only data that is to be
     deleted can be dropped.

     Otherwise, consider copying desired rows to a new
     table using the bulk loader and dropping the old
     table.




Single-Row Operations: Update

    Row-oriented: value replaced
    Key     Fname        Lname       State    Zip    Phone            Age   Sex
     1      Bugs         Bunny        NY     11217   (718) 852-2352   34    M
     2      Yosemite     Sam          CA     95389   (209) 375-6572   52    M
     3      Daffy        Duck         NY     10013   (212) 227-1810   35    M
     4      Elmer        Fudd         ME     04578   (207) 882-7323   43    M
     5      Witch        Hazel        MA     01970   (978) 744-0991   57    F




    Column-oriented: value replaced
    Key         Fname               Lname       State       Zip        Phone               Age           Sex
     1          Bugs                Bunny        NY        11217       (718) 852-2352      34            M
     2          Yosemite            Sam          CA        95389       (209) 375-6572      52            M
     3          Daffy               Duck         NY        10013       (212) 227-1810      35            M
     4          Elmer               Fudd         ME        04578       (207) 882-7323      43            M
     5          Witch               Hazel        MA        01970       (978) 744-0991      57            F



     Yeah, this one just works.


Architecture – Shared Disk

    (2.2)




                                         or …




                                                       Single Server




Architecture – Shared Nothing

    (3.0 option)




What Do I Need to Change?

    • Uses MySQL front-end
           o Standard SQL for DDL and DML
           o Most MySQL commands will still work


    Exceptions:
    No Cartesian products
    No triggers

    (not a comprehensive list)



InfiniDB Ease of Use

     • Automatic Everything:
            o    Vertical partitioning – eliminate unneeded columns
            o    Horizontal partitioning – eliminate unneeded extents
            o    Improved compression
            o    No indexes – columns are de facto indexes
     • You already know how to use it:
            o Standard SQL
            o Familiar MySQL front-end



Info

     Links:
     www.calpont.com
     www.calpont.com/products/tryinfinidb – 30-day trial of Enterprise Edition
     www.infinidb.org – Community Edition




The end





 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 

Demystifying Columnar Databases

  • 1. DeMystifying Columnar Databases
      June Tong (jtong@calpont.com, straycat90@gmail.com), April 2012
      Calpont Proprietary and Confidential
  • 2. Agenda
      • What is a columnar database?
      • Why is it better than a row-oriented database?
      • When isn’t it better?
      • What do I need to know to use it?
      • How will I need to change my application code?
      InfiniDB® Scalable. Fast. Simple. Copyright © 2011 Calpont. All Rights Reserved.
  • 3. Who is Calpont?
      • Calpont Corporation
          o Privately held
          o Headquartered in Frisco, TX
      Our Mission: To provide a scalable data platform that enables analytic business decisions as timely as customers and markets dictate.
  • 4. InfiniDB
      InfiniDB is a columnar MPP MySQL database engine, expressly designed for analytic applications.
          o InfiniDB Community (single-server)
          o InfiniDB Enterprise
              Version 2.2 – shared disk
              Version 3.0 – added shared nothing option
  • 5. Traditional Row-Oriented Storage
      Rows stored sequentially

      Key  Fname     Lname    State  Zip    Phone           Age  Sex
      1    Bugs      Bunny    NY     11217  (718) 938-3235  34   M
      2    Yosemite  Sam      CA     95389  (209) 375-6572  52   M
      3    Daffy     Duck     NY     10013  (212) 227-1810  35   M
      4    Elmer     Fudd     ME     04578  (207) 882-7323  43   M
      5    Witch     Hazel    MA     01970  (978) 744-0991  57   F

      Provides best performance when most queries are for multiple columns of a single row (OLTP applications)
  • 6. Key Lookup in a Row-Oriented Database
      Indexes on high-cardinality columns make accessing a single row very fast...

      Index on Key (WHERE key=4):
      Key  RowID
      1    0001B008D23A671A
      2    0001B008D23A671B
      3    0001B008D23A671C
      4    0001B008D23A671D
      5    0001B008D23A671E

      Index on Phone (WHERE phone='(207) 882-7323', Elmer Fudd calls customer service):
      Phone           RowID
      (207) 882-7323  0001B008D23A671D
      (209) 375-6572  0001B008D23A671B
      (212) 227-1810  0001B008D23A671C
      (718) 938-3235  0001B008D23A671A
      (978) 744-0991  0001B008D23A671E

      Key  Fname     Lname    State  Zip    Phone           Age  Sex
      1    Bugs      Bunny    NY     11217  (718) 938-3235  34   M
      2    Yosemite  Sam      CA     95389  (209) 375-6572  52   M
      3    Daffy     Duck     NY     10013  (212) 227-1810  35   M
      4    Elmer     Fudd     ME     04578  (207) 882-7323  43   M
      5    Witch     Hazel    MA     01970  (978) 744-0991  57   F

      ...but don’t help on analytical queries scanning many rows, e.g. What’s the average age of males?
  • 7. Sequential Scans are Killers
      What if you had 100 million rows, with 100 columns?
      [Figure labels: Sex, Age]
      If the table is 100GB, you have to read 100GB. Or build composite indexes on EVERYTHING.
  • 8. Column-Oriented Storage
      Each column is stored in a separate file

      Key  Fname     Lname    State  Zip    Phone           Age  Sex
      1    Bugs      Bunny    NY     11217  (718) 938-3235  34   M
      2    Yosemite  Sam      CA     95389  (209) 375-6572  52   M
      3    Daffy     Duck     NY     10013  (212) 227-1810  35   M
      4    Elmer     Fudd     ME     04578  (207) 882-7323  43   M
      5    Witch     Hazel    MA     01970  (978) 744-0991  57   F

      Each column for a given row is at the same offset (auto-indexing)
  • 9. Read Columns, Not Rows
      Only read the files you need

      Key  Fname     Lname    State  Zip    Phone           Age  Sex
      1    Bugs      Bunny    NY     11217  (718) 938-3235  34   M
      2    Yosemite  Sam      CA     95389  (209) 375-6572  52   M
      3    Daffy     Duck     NY     10013  (212) 227-1810  35   M
      4    Elmer     Fudd     ME     04578  (207) 882-7323  43   M
      5    Witch     Hazel    MA     01970  (978) 744-0991  57   F

      Also get improved compression because all data in one file is the same data type.
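The idea on slides 8 and 9 can be sketched in a few lines of Python. This is a toy in-memory model, not InfiniDB's storage format: one list per column stands in for one file per column, and the names `columns`, `average_age_of_males`, and `row_at` are made up for illustration.

```python
# Toy column store: one list per column stands in for one file per column.
columns = {
    "key":   [1, 2, 3, 4, 5],
    "fname": ["Bugs", "Yosemite", "Daffy", "Elmer", "Witch"],
    "lname": ["Bunny", "Sam", "Duck", "Fudd", "Hazel"],
    "state": ["NY", "CA", "NY", "ME", "MA"],
    "age":   [34, 52, 35, 43, 57],
    "sex":   ["M", "M", "M", "M", "F"],
}


def average_age_of_males(cols):
    """Touch only the 'sex' and 'age' columns; the others are never read."""
    ages = [age for sex, age in zip(cols["sex"], cols["age"]) if sex == "M"]
    return sum(ages) / len(ages)


def row_at(cols, offset):
    """Rebuild one row by reading the same offset in every column."""
    return {name: values[offset] for name, values in cols.items()}


print(average_age_of_males(columns))  # 41.0
print(row_at(columns, 3)["lname"])    # Fudd
```

The analytic query reads two of the six lists, while rebuilding a single row has to visit every list, which is the trade-off the later single-row slides describe.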
  • 10. I/O Reduction
      So you still have 100 million rows, with 100 columns...
      [Figure labels: Males, Age]
      But you only read 2 columns, instead of 100
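A back-of-the-envelope version of that saving, as a hedged Python sketch. It assumes every column has the same width, ignores compression, and takes 1 GB as 10^9 bytes; none of these assumptions come from the slides, and `bytes_read` is a hypothetical helper.

```python
def bytes_read(table_bytes, total_columns, columns_needed, columnar):
    """Estimate bytes scanned, assuming every column has the same width."""
    if not columnar:
        return table_bytes  # a row store scans whole rows
    return table_bytes * columns_needed // total_columns


TABLE_BYTES = 100 * 10**9  # the 100GB table from the slides

row_store = bytes_read(TABLE_BYTES, 100, 2, columnar=False)
column_store = bytes_read(TABLE_BYTES, 100, 2, columnar=True)
print(row_store // 10**9, "GB vs", column_store // 10**9, "GB")  # 100 GB vs 2 GB
```

Under these assumptions the two-column query scans about 2% of the table; real ratios depend on the actual column widths and on compression.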
  • 11. Vertical Partitioning
      Columnar databases produce automatic vertical partitioning

      1   Bugs      Bunny  Brooklyn     NY  11217  (718) 938-3235
      2   Yosemite  Sam    Wawona       CA  95389  (209) 375-6572
      3   Daffy     Duck   New York     NY  10013  (212) 227-1810
      4   Elmer     Fudd   Wiscasset    ME  04578  (207) 882-7323
      :   :         :      :            :   :      :
      8m  Snoopy    Brown  Springfield  MA  01105  (413) 781-6500
  • 12. Horizontal Partitioning
      InfiniDB also automatically creates horizontal partitions of 8 million rows (default)

      1   Bugs      Bunny  Brooklyn     NY  11217  (718) 938-3235
      2   Yosemite  Sam    Wawona       CA  95389  (209) 375-6572
      3   Daffy     Duck   New York     NY  10013  (212) 227-1810
      4   Elmer     Fudd   Wiscasset    ME  04578  (207) 882-7323
      :   :         :      :            :   :      :
      8m  Snoopy    Brown  Springfield  MA  01105  (413) 781-6500
      :   :         :      :            :   :      :

      Knowing what values are in each partition allows for partition elimination at query time
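One way to picture partition elimination is a minimal Python sketch in which each partition records the smallest and largest value it holds. The slide only says the engine knows what values are in each partition; the min/max representation, the `Extent` class, and the zip ranges below are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    """Hypothetical per-partition metadata: the min and max value stored."""
    name: str
    min_zip: str
    max_zip: str


def extents_to_scan(extents, wanted_zip):
    """Keep only extents whose [min, max] range could contain the value."""
    return [e.name for e in extents if e.min_zip <= wanted_zip <= e.max_zip]


extents = [
    Extent("extent_0", "01000", "19999"),
    Extent("extent_1", "20000", "59999"),
    Extent("extent_2", "60000", "99999"),
]

print(extents_to_scan(extents, "95389"))  # ['extent_2']
```

Two of the three partitions are skipped without being read. The benefit shrinks when values are spread evenly across partitions, because every range then overlaps the predicate.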
  • 13. Bonus: Easy to Add a New Column
      Row-oriented: Usually requires rebuilding table. Addition of column shifts every row.

      Key  Fname     Lname    State  Zip    Phone           Age  Sex  Golf
      1    Bugs      Bunny    NY     11217  (718) 938-3235  34   M    Y
      2    Yosemite  Sam      CA     95389  (209) 375-6572  52   M    N
      3    Daffy     Duck     NY     10013  (212) 227-1810  35   M    Y
      4    Elmer     Fudd     ME     04578  (207) 882-7323  43   M    Y
      5    Witch     Hazel    MA     01970  (978) 744-0991  57   F    N

      Column-oriented: Just create another file

      Key  Fname     Lname    State  Zip    Phone           Age  Sex  Golf
      1    Bugs      Bunny    NY     11217  (718) 938-3235  34   M    Y
      2    Yosemite  Sam      CA     95389  (209) 375-6572  52   M    N
      3    Daffy     Duck     NY     10013  (212) 227-1810  35   M    Y
      4    Elmer     Fudd     ME     04578  (207) 882-7323  43   M    Y
      5    Witch     Hazel    MA     01970  (978) 744-0991  57   F    N
  • 14. Single-Row Operations
      Because of the nature of columnar storage, single-row operations can underperform. Do not attempt OLTP-style transactions on a columnar database.
      More details on individual DML statements follow...
  • 15. Single-Row Operations: Insert
      Row-oriented: new rows appended to the end

      Key  Fname     Lname    State  Zip    Phone           Age  Sex
      1    Bugs      Bunny    NY     11217  (718) 938-3235  34   M
      2    Yosemite  Sam      CA     95389  (209) 375-6572  52   M
      3    Daffy     Duck     NY     10013  (212) 227-1810  35   M
      4    Elmer     Fudd     ME     04578  (207) 882-7323  43   M
      5    Witch     Hazel    MA     01970  (978) 744-0991  57   F
      6    Marvin    Martian  CA     91602  (818) 761-9964  26   M

      Columnar: new value must be added to each file

      Key  Fname     Lname    State  Zip    Phone           Age  Sex
      1    Bugs      Bunny    NY     11217  (718) 938-3235  34   M
      2    Yosemite  Sam      CA     95389  (209) 375-6572  52   M
      3    Daffy     Duck     NY     10013  (212) 227-1810  35   M
      4    Elmer     Fudd     ME     04578  (207) 882-7323  43   M
      5    Witch     Hazel    MA     01970  (978) 744-0991  57   F
      6    Marvin    Martian  CA     91602  (818) 761-9964  26   M
  • 16. Insert: Solution
      Do batch inserts and use cpimport, the bulk loader, instead. CPIMPORT is your friend.
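Batching usually means collecting rows into a delimited text file and handing the file to the bulk loader in one go. The Python sketch below only prepares such text; the pipe delimiter, the column order, and the helper name `to_bulk_load_text` are assumptions, and the exact cpimport invocation and file format should be checked against the InfiniDB documentation.

```python
import csv
import io


def to_bulk_load_text(rows, delimiter="|"):
    """Serialize many rows into one delimited text blob for a bulk loader."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# The new row from slide 15; a real batch would hold many rows.
new_rows = [
    (6, "Marvin", "Martian", "CA", "91602", "(818) 761-9964", 26, "M"),
]

print(to_bulk_load_text(new_rows), end="")
# 6|Marvin|Martian|CA|91602|(818) 761-9964|26|M
```

Writing the text to a file and loading it once replaces one append per column file per row with one bulk append per column.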
  • 17. Single-Row Operations: Delete
      Row-oriented: row is deleted

      Key  Fname     Lname    State  Zip    Phone           Age  Sex
      1    Bugs      Bunny    NY     11217  (718) 938-3235  34   M
      2    Yosemite  Sam      CA     95389  (209) 375-6572  52   M
      3    Daffy     Duck     NY     10013  (212) 227-1810  35   M
      4    Elmer     Fudd     ME     04578  (207) 882-7323  43   M
      5    Witch     Hazel    MA     01970  (978) 744-0991  57   F

      Columnar: each column must be deleted from its file

      Key  Fname     Lname    State  Zip    Phone           Age  Sex
      1    Bugs      Bunny    NY     11217  (718) 938-3235  34   M
      2    Yosemite  Sam      CA     95389  (209) 375-6572  52   M
      3    Daffy     Duck     NY     10013  (212) 227-1810  35   M
      4    Elmer     Fudd     ME     04578  (207) 882-7323  43   M
      5    Witch     Hazel    MA     01970  (978) 744-0991  57   F
  • 18. Delete: Solutions
      Do batch deletes. Any extents that contain only data that is to be deleted can be dropped.
      Otherwise, consider copying desired rows to a new table using the bulk loader and dropping the old table.
  • 19. Single-Row Operations: Update
      Row-oriented: value replaced

      Key  Fname     Lname    State  Zip    Phone           Age  Sex
      1    Bugs      Bunny    NY     11217  (718) 852-2352  34   M
      2    Yosemite  Sam      CA     95389  (209) 375-6572  52   M
      3    Daffy     Duck     NY     10013  (212) 227-1810  35   M
      4    Elmer     Fudd     ME     04578  (207) 882-7323  43   M
      5    Witch     Hazel    MA     01970  (978) 744-0991  57   F

      Column-oriented: value replaced

      Key  Fname     Lname    State  Zip    Phone           Age  Sex
      1    Bugs      Bunny    NY     11217  (718) 852-2352  34   M
      2    Yosemite  Sam      CA     95389  (209) 375-6572  52   M
      3    Daffy     Duck     NY     10013  (212) 227-1810  35   M
      4    Elmer     Fudd     ME     04578  (207) 882-7323  43   M
      5    Witch     Hazel    MA     01970  (978) 744-0991  57   F

      Yeah, this one just works.
  • 20. Architecture – Shared Disk (2.2) or … Single Server
  • 21. Architecture – Shared Nothing (3.0 option)
  • 22. What Do I Need to Change?
      • Uses MySQL front-end
          o Standard SQL for DDL and DML
          o Most MySQL commands will still work
      Exceptions (not a comprehensive list):
          o No cartesian products
          o No triggers
  • 23. InfiniDB Ease of Use
      • Automatic Everything:
          o Vertical partitioning – eliminate unneeded columns
          o Horizontal partitioning – eliminate unneeded extents
          o Improved compression
          o No indexes – columns are de facto indexes
      • You already know how to use it:
          o Standard SQL
          o Familiar MySQL front-end
  • 24. Info
      Links:
          www.calpont.com
          www.calpont.com/products/tryinfinidb – 30-day trial of Enterprise Edition
          www.infinidb.org – Community Edition
  • 25. The end