Demystifying
Columnar Databases

             June Tong
        jtong@calpont.com
      straycat90@gmail.com
               April 2012





      Calpont Proprietary and Confidential
Agenda

       • What is a columnar database?

       • Why is it better than a row-oriented database?

       • When isn’t it better?

       • What do I need to know to use it?

       • How will I need to change my application code?




InfiniDB® Scalable. Fast. Simple.   2           Copyright © 2011 Calpont. All Rights Reserved.
Who is Calpont?

  • Calpont Corporation
     o Privately held
     o Headquartered in Frisco, TX

                                      Our Mission
                                      To provide a scalable data platform that
                                      enables analytic business decisions as timely
                                      as customers and markets dictate.




InfiniDB

  InfiniDB is a columnar, massively parallel processing (MPP) MySQL
  database engine, expressly designed for analytic applications
      o InfiniDB Community (single-server)
      o InfiniDB Enterprise
                  Version 2.2 – shared disk
                  Version 3.0 – added shared-nothing option




Traditional Row-Oriented Storage

    Rows stored sequentially
      Key    Fname        Lname     State    Zip    Phone            Age   Sex
       1     Bugs         Bunny      NY     11217   (718) 938-3235   34    M
       2     Yosemite     Sam        CA     95389   (209) 375-6572   52    M
       3     Daffy        Duck       NY     10013   (212) 227-1810   35    M
       4     Elmer        Fudd       ME     04578   (207) 882-7323   43    M
       5     Witch        Hazel      MA     01970   (978) 744-0991   57    F




    Provides best performance when most queries
    are for multiple columns of a single row
    (OLTP applications)

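To make "rows stored sequentially" concrete, here is a minimal Python sketch (not InfiniDB or MySQL code) that packs each row into a fixed-width record with the standard `struct` module. The `ROW_FORMAT` string and field widths are assumptions chosen for illustration; real row stores use variable-length records, page headers and slot directories. Because every field of a row sits in one contiguous record, fetching a whole row is a single slice, while any single-column question still has to walk every record.

```python
import struct

# Hypothetical fixed-width layout: key, first name, last name, state, age, sex.
# "<" means standard sizes with no alignment padding (4+10+10+2+4+1 = 31 bytes).
ROW_FORMAT = "<i10s10s2si1s"
ROW_SIZE = struct.calcsize(ROW_FORMAT)

ROWS = [
    (1, "Bugs", "Bunny", "NY", 34, "M"),
    (2, "Yosemite", "Sam", "CA", 52, "M"),
    (3, "Daffy", "Duck", "NY", 35, "M"),
    (4, "Elmer", "Fudd", "ME", 43, "M"),
    (5, "Witch", "Hazel", "MA", 57, "F"),
]


def pack_rows(rows):
    """Store rows back to back: each record occupies ROW_SIZE contiguous bytes."""
    out = bytearray()
    for key, fname, lname, state, age, sex in rows:
        out += struct.pack(ROW_FORMAT, key, fname.encode(), lname.encode(),
                           state.encode(), age, sex.encode())
    return bytes(out)


def read_row(data, index):
    """Fetch one whole row with a single contiguous read at index * ROW_SIZE."""
    record = data[index * ROW_SIZE:(index + 1) * ROW_SIZE]
    key, fname, lname, state, age, sex = struct.unpack(ROW_FORMAT, record)
    # struct pads short strings with NUL bytes; strip them before decoding.
    return (key, fname.rstrip(b"\0").decode(), lname.rstrip(b"\0").decode(),
            state.decode(), age, sex.decode())


table = pack_rows(ROWS)
elmer = read_row(table, 3)  # the fourth record: Elmer Fudd
```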



Key Lookup in a Row-Oriented Database
     Indexes on high-cardinality columns make accessing a single row very fast

     Index on Key
     Key   RowID
      1    0001B008D23A671A
      2    0001B008D23A671B
      3    0001B008D23A671C
      4    0001B008D23A671D
      5    0001B008D23A671E
     WHERE key=4

     Key   Fname      Lname   State   Zip     Phone            Age   Sex
      1    Bugs       Bunny   NY      11217   (718) 938-3235   34    M
      2    Yosemite   Sam     CA      95389   (209) 375-6572   52    M
      3    Daffy      Duck    NY      10013   (212) 227-1810   35    M
      4    Elmer      Fudd    ME      04578   (207) 882-7323   43    M
      5    Witch      Hazel   MA      01970   (978) 744-0991   57    F

     Elmer Fudd calls customer service:
     Index on Phone
     Phone              RowID
     (207) 882-7323     0001B008D23A671D
     (209) 375-6572     0001B008D23A671B
     (212) 227-1810     0001B008D23A671C
     (718) 938-3235     0001B008D23A671A
     (978) 744-0991     0001B008D23A671E
     WHERE phone='(207) 882-7323'

     ...but they don't help on analytical queries scanning many rows,
     e.g. "What's the average age of males?"

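A minimal Python sketch of the two access paths on this slide, using a plain dict as a stand-in for a secondary index (an assumption for brevity; real databases use ordered on-disk structures such as B-trees). The phone lookup touches one index entry and one row; the "average age of males" query gets no help from the phone index, so it has to visit every row.

```python
from typing import Dict, List, NamedTuple, Tuple


class Person(NamedTuple):
    key: int
    fname: str
    lname: str
    phone: str
    age: int
    sex: str


# Row storage: the position in the list plays the role of the RowID.
TABLE: List[Person] = [
    Person(1, "Bugs", "Bunny", "(718) 938-3235", 34, "M"),
    Person(2, "Yosemite", "Sam", "(209) 375-6572", 52, "M"),
    Person(3, "Daffy", "Duck", "(212) 227-1810", 35, "M"),
    Person(4, "Elmer", "Fudd", "(207) 882-7323", 43, "M"),
    Person(5, "Witch", "Hazel", "(978) 744-0991", 57, "F"),
]

# Secondary index on a high-cardinality column: phone -> row position.
PHONE_INDEX: Dict[str, int] = {p.phone: rowid for rowid, p in enumerate(TABLE)}


def lookup_by_phone(phone: str) -> Person:
    """Point lookup: one index probe, then one row fetch."""
    return TABLE[PHONE_INDEX[phone]]


def average_age_of_males() -> Tuple[float, int]:
    """Analytical query: no index helps, so every row is examined.

    Returns (average age, number of rows examined).
    """
    total = count = examined = 0
    for person in TABLE:
        examined += 1
        if person.sex == "M":
            total += person.age
            count += 1
    return total / count, examined
```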


Sequential Scans are Killers

    What if you had 100 million rows, with 100 columns?

    [Figure: a wide table in which the query needs only the Sex and Age columns]

    If the table is 100GB, you have to read 100GB.

    Or build composite indexes on EVERYTHING.



Column-Oriented Storage

     Each column is stored in a separate file
      Key        Fname              Lname   State    Zip    Phone                Age           Sex
       1         Bugs               Bunny    NY     11217   (718) 938-3235       34            M
       2         Yosemite           Sam      CA     95389   (209) 375-6572       52            M
       3         Daffy              Duck     NY     10013   (212) 227-1810       35            M
       4         Elmer              Fudd     ME     04578   (207) 882-7323       43            M
       5         Witch              Hazel    MA     01970   (978) 744-0991       57            F




     Each column's value for a given row is at the same offset
     in that column's file (auto-indexing)

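The "same offset" rule can be sketched in Python with one list per column standing in for one file per column (an assumption for illustration; this is not InfiniDB's on-disk format). Row N is simply position N in every column, so a cell is located by its offset alone and a full row is reassembled by reading that offset from each column.

```python
from typing import Dict, List

# One list per column, standing in for one file per column.
COLUMNS: Dict[str, List] = {
    "key":   [1, 2, 3, 4, 5],
    "fname": ["Bugs", "Yosemite", "Daffy", "Elmer", "Witch"],
    "lname": ["Bunny", "Sam", "Duck", "Fudd", "Hazel"],
    "state": ["NY", "CA", "NY", "ME", "MA"],
    "age":   [34, 52, 35, 43, 57],
    "sex":   ["M", "M", "M", "M", "F"],
}


def get_value(columns: Dict[str, List], column: str, offset: int) -> object:
    """Read a single cell: the offset alone locates it, no separate index needed."""
    return columns[column][offset]


def get_row(columns: Dict[str, List], offset: int) -> Dict[str, object]:
    """Reassemble a full row by reading the same offset from every column."""
    return {name: values[offset] for name, values in columns.items()}
```

Note the trade-off this exposes: reading one column is cheap, but reassembling a whole row touches every column.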



Read Columns, Not Rows

    Only read the files you need
      Key        Fname              Lname   State    Zip    Phone                Age           Sex
       1         Bugs               Bunny    NY     11217   (718) 938-3235       34            M
       2         Yosemite           Sam      CA     95389   (209) 375-6572       52            M
       3         Daffy              Duck     NY     10013   (212) 227-1810       35            M
       4         Elmer              Fudd     ME     04578   (207) 882-7323       43            M
       5         Witch              Hazel    MA     01970   (978) 744-0991       57            F




    You also get better compression, because all the data in
    one file is of the same data type.

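One reason a single-type column compresses well is that low-cardinality columns such as Sex contain long runs of repeated values. Run-length encoding is one simple scheme that exploits this; it is used here purely as an illustration and is not a claim about which compression InfiniDB actually applies.

```python
from itertools import groupby
from typing import List, Tuple


def rle_encode(values: List[str]) -> List[Tuple[str, int]]:
    """Collapse runs of identical adjacent values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs: List[Tuple[str, int]]) -> List[str]:
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


sex_column = ["M", "M", "M", "M", "F"]
encoded = rle_encode(sex_column)  # five values become two pairs
```

On a real 100-million-row column the runs would be far longer than in this five-row toy table, which is where the saving comes from.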



I/O Reduction

    So you still have 100 million rows, with 100 columns...

    [Figure: of the 100 columns, only Sex (to find the males) and Age are read]

    But you only read 2 columns, instead of 100.

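A back-of-the-envelope sketch of the saving, assuming all 100 columns have the same on-disk width and ignoring compression (both simplifying assumptions; real columns vary). Under those assumptions the 100GB table costs about 1GB per column, so the "average age of males" query reads roughly 2GB instead of 100GB.

```python
GB = 10**9


def bytes_read(table_bytes: int, total_columns: int, columns_needed: int,
               columnar: bool) -> int:
    """Estimate bytes scanned by a query that must examine every row.

    Assumes equal-width columns and no compression, so each column file
    holds table_bytes / total_columns bytes.
    """
    if not columnar:
        # A row store reads whole rows, and therefore every column.
        return table_bytes
    return table_bytes * columns_needed // total_columns


row_store = bytes_read(100 * GB, 100, 2, columnar=False)
column_store = bytes_read(100 * GB, 100, 2, columnar=True)
```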



Vertical Partitioning

         Columnar databases produce automatic
         vertical partitioning
           Key   Fname      Lname   City          State   Zip     Phone
           1     Bugs       Bunny   Brooklyn      NY      11217   (718) 938-3235
           2     Yosemite   Sam     Wawona        CA      95389   (209) 375-6572
           3     Daffy      Duck    New York      NY      10013   (212) 227-1810
           4     Elmer      Fudd    Wiscasset     ME      04578   (207) 882-7323
           :     :          :       :             :       :       :
           8m    Snoopy     Brown   Springfield   MA      01105   (413) 781-6500




Horizontal Partitioning
         InfiniDB also automatically creates horizontal
         partitions of 8 million rows (default)

           Key   Fname      Lname   City          State   Zip     Phone
           1     Bugs       Bunny   Brooklyn      NY      11217   (718) 938-3235
           2     Yosemite   Sam     Wawona        CA      95389   (209) 375-6572
           3     Daffy      Duck    New York      NY      10013   (212) 227-1810
           4     Elmer      Fudd    Wiscasset     ME      04578   (207) 882-7323
           :     :          :       :             :       :       :
           8m    Snoopy     Brown   Springfield   MA      01105   (413) 781-6500
           ---------------------- partition boundary ----------------------
           :     :          :       :             :       :       :

         Knowing what values are in each partition allows for
         partition elimination at query time.

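A minimal sketch of partition elimination, assuming each partition keeps the minimum and maximum value of a column (the `Partition` class, the `order_date` column and its YYYYMMDD integer encoding are all hypothetical). A query's range predicate is compared with each partition's range, and any partition that cannot contain a match is skipped without being read. This pays off most when data arrives roughly in order of the filtered column, as dates usually do.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Partition:
    """Hypothetical per-partition metadata: min and max of one column."""
    partition_id: int
    min_value: int
    max_value: int


def partitions_to_scan(partitions: List[Partition], low: int, high: int) -> List[int]:
    """For WHERE col BETWEEN low AND high, keep only partitions whose
    [min_value, max_value] range overlaps [low, high]; the rest are eliminated."""
    return [p.partition_id for p in partitions
            if p.max_value >= low and p.min_value <= high]


# Hypothetical order_date (YYYYMMDD) ranges for four partitions loaded in date order.
PARTITIONS = [
    Partition(1, 20110101, 20110331),
    Partition(2, 20110401, 20110630),
    Partition(3, 20110701, 20110930),
    Partition(4, 20111001, 20111231),
]
```

For example, a filter on mid-May to mid-August needs only partitions 2 and 3; partitions 1 and 4 are never touched.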


Bonus: Easy to Add a New Column

    Row-oriented: usually requires rebuilding the table, because
    adding a column shifts every row
    Key     Fname        Lname      State    Zip    Phone            Age   Sex  Golf
     1      Bugs         Bunny       NY     11217   (718) 938-3235   34    M    Y
     2      Yosemite     Sam         CA     95389   (209) 375-6572   52    M    N
     3      Daffy        Duck        NY     10013   (212) 227-1810   35    M    Y
     4      Elmer        Fudd        ME     04578   (207) 882-7323   43    M    Y
     5      Witch        Hazel       MA     01970   (978) 744-0991   57    F    N


    Column-oriented: Just create another file
      Key        Fname              Lname       State       Zip        Phone                Age           Sex          Golf
       1         Bugs               Bunny        NY        11217       (718) 938-3235       34            M             Y
       2         Yosemite           Sam          CA        95389       (209) 375-6572       52            M             N
       3         Daffy              Duck         NY        10013       (212) 227-1810       35            M             Y
       4         Elmer              Fudd         ME        04578       (207) 882-7323       43            M             Y
       5         Witch              Hazel        MA        01970       (978) 744-0991       57            F             N

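The contrast can be sketched in Python, with tuples standing in for packed rows and one list per column standing in for one file per column (both assumptions for illustration). Adding the Golf column forces the row store to rebuild every record, while the column store just gains one new list and leaves the existing columns untouched.

```python
from typing import Dict, List, Tuple

# Row store: each row is one tuple (Key, Fname).
row_table: List[Tuple] = [(1, "Bugs"), (2, "Yosemite"), (3, "Daffy"),
                          (4, "Elmer"), (5, "Witch")]

# Column store: each column is its own list.
column_table: Dict[str, List] = {
    "key": [1, 2, 3, 4, 5],
    "fname": ["Bugs", "Yosemite", "Daffy", "Elmer", "Witch"],
}

golf = ["Y", "N", "Y", "Y", "N"]


def add_column_row_store(rows: List[Tuple], new_values: List) -> List[Tuple]:
    """Every existing row must be rebuilt with the extra field appended."""
    return [row + (value,) for row, value in zip(rows, new_values)]


def add_column_column_store(columns: Dict[str, List], name: str,
                            new_values: List) -> None:
    """Just create one new column; no existing column is rewritten."""
    columns[name] = list(new_values)
```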



Single-Row Operations

      Because each row is spread across one file per column,
      single-row operations can underperform.

       Do not attempt OLTP-style transactions
       on a columnar database.


      More details on individual DML statements follow...



Single-Row Operations: Insert

    Row-oriented: new rows appended to the end
    Key     Fname         Lname     State      Zip     Phone            Age     Sex
     1      Bugs          Bunny      NY       11217    (718) 938-3235   34      M
     2      Yosemite      Sam        CA       95389    (209) 375-6572   52      M
     3      Daffy         Duck       NY       10013    (212) 227-1810   35      M
     4      Elmer         Fudd       ME       04578    (207) 882-7323   43      M
     5      Witch         Hazel      MA       01970    (978) 744-0991   57      F
     6      Marvin        Martian    CA       91602    (818) 761-9964   26      M


     Columnar: new value must be added to each file
      Key        Fname              Lname         State        Zip           Phone                Age           Sex
       1         Bugs               Bunny          NY         11217          (718) 938-3235       34            M
       2         Yosemite           Sam            CA         95389          (209) 375-6572       52            M
       3         Daffy              Duck           NY         10013          (212) 227-1810       35            M
       4         Elmer              Fudd           ME         04578          (207) 882-7323       43            M
       5         Witch              Hazel          MA         01970          (978) 744-0991       57            F
       6         Marvin             Martian        CA         91602          (818) 761-9964       26            M

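A minimal sketch of why single-row inserts cost more in a column store, again using one Python list per column as a stand-in for one file per column (an assumption). Adding Marvin Martian writes to one place in a row store but to all eight column "files" in a column store; on real storage each of those writes may land in a different disk block.

```python
from typing import Dict, List


def insert_row_columnar(columns: Dict[str, List], row: Dict[str, object]) -> int:
    """Append one value to every column; return how many column 'files' were written."""
    if set(row) != set(columns):
        raise ValueError("row must supply exactly one value per column")
    for name, values in columns.items():
        values.append(row[name])
    return len(columns)


def insert_row_row_store(rows: List[tuple], row: tuple) -> int:
    """Append one record to the end of the single row 'file'; return files written."""
    rows.append(row)
    return 1


# The eight columns from the slide, one list each.
columns = {
    "key": [1, 2, 3, 4, 5],
    "fname": ["Bugs", "Yosemite", "Daffy", "Elmer", "Witch"],
    "lname": ["Bunny", "Sam", "Duck", "Fudd", "Hazel"],
    "state": ["NY", "CA", "NY", "ME", "MA"],
    "zip": ["11217", "95389", "10013", "04578", "01970"],
    "phone": ["(718) 938-3235", "(209) 375-6572", "(212) 227-1810",
              "(207) 882-7323", "(978) 744-0991"],
    "age": [34, 52, 35, 43, 57],
    "sex": ["M", "M", "M", "M", "F"],
}
marvin = {"key": 6, "fname": "Marvin", "lname": "Martian", "state": "CA",
          "zip": "91602", "phone": "(818) 761-9964", "age": 26, "sex": "M"}
```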



Insert: Solution

      Do batch inserts and use cpimport, the bulk
      loader, instead.


      CPIMPORT is your friend.

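The batch approach means collecting many rows into one load file instead of issuing many single-row INSERTs. A minimal Python sketch of preparing such a file, assuming one row per line with fields separated by `|` (commonly documented as cpimport's default delimiter, but check the documentation for your version). The function name and the `mydb`, `customers` and `customers.tbl` names in the comment are hypothetical.

```python
from typing import Iterable, Sequence


def to_bulk_load_text(rows: Iterable[Sequence[object]], delimiter: str = "|") -> str:
    """Render many rows as one delimited text batch for a bulk loader.

    The resulting file could then be loaded in a single pass, e.g.
    (hypothetical database, table and file names):
        cpimport mydb customers customers.tbl
    """
    lines = []
    for row in rows:
        fields = [str(value) for value in row]
        for field in fields:
            # No quoting/escaping is attempted, so reject ambiguous fields.
            if delimiter in field or "\n" in field:
                raise ValueError(f"field {field!r} contains the delimiter or a newline")
        lines.append(delimiter.join(fields))
    return "".join(line + "\n" for line in lines)
```

Writing the returned text to a file and loading it once replaces many separate per-row writes to every column file.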



Single-Row Operations: Delete

    Row-oriented: row is deleted
    Key     Fname        Lname       State    Zip    Phone            Age   Sex
     1      Bugs         Bunny        NY     11217   (718) 938-3235   34    M
     2      Yosemite     Sam          CA     95389   (209) 375-6572   52    M
     3      Daffy        Duck         NY     10013   (212) 227-1810   35    M
     4      Elmer        Fudd         ME     04578   (207) 882-7323   43    M
     5      Witch        Hazel        MA     01970   (978) 744-0991   57    F



    Columnar: the row's value must be deleted from
    each column's file
    Key         Fname               Lname       State       Zip        Phone               Age           Sex
     1          Bugs                Bunny        NY        11217       (718) 938-3235      34            M
     2          Yosemite            Sam          CA        95389       (209) 375-6572      52            M
     3          Daffy               Duck         NY        10013       (212) 227-1810      35            M
     4          Elmer               Fudd         ME        04578       (207) 882-7323      43            M
     5          Witch               Hazel        MA        01970       (978) 744-0991      57            F




Delete: Solutions

     Do batch deletes.

     Any extents that contain only data that is to be
     deleted can be dropped.

     Otherwise, consider copying desired rows to a new
     table using the bulk loader and dropping the old
     table.

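A sketch of how the first two options might be planned for a range delete, assuming each extent records the min and max of the column used in the DELETE's predicate. The `Extent` class, the date column and the planning logic are hypothetical illustrations, not InfiniDB's actual implementation. Extents that lie entirely inside the range can be dropped whole, extents that only partly overlap would need their surviving rows copied out, and the rest are left alone.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Extent:
    """Hypothetical extent metadata: min and max of the column in the DELETE."""
    extent_id: int
    min_value: int
    max_value: int


def plan_range_delete(extents: List[Extent], low: int,
                      high: int) -> Dict[str, List[int]]:
    """Classify extents for DELETE ... WHERE col BETWEEN low AND high.

    drop:      every value lies inside the range, so the whole extent can go
    rewrite:   the extent mixes rows to keep and rows to delete
    untouched: no value can match
    NULL handling is ignored in this sketch.
    """
    plan: Dict[str, List[int]] = {"drop": [], "rewrite": [], "untouched": []}
    for e in extents:
        if e.max_value < low or e.min_value > high:
            plan["untouched"].append(e.extent_id)
        elif low <= e.min_value and e.max_value <= high:
            plan["drop"].append(e.extent_id)
        else:
            plan["rewrite"].append(e.extent_id)
    return plan


# Hypothetical YYYYMMDD date ranges for four extents loaded in date order.
EXTENTS = [
    Extent(1, 20110101, 20110331),
    Extent(2, 20110401, 20110630),
    Extent(3, 20110701, 20110930),
    Extent(4, 20111001, 20111231),
]
```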



Single-Row Operations: Update

    Row-oriented: value replaced
    Key     Fname        Lname       State    Zip    Phone            Age   Sex
     1      Bugs         Bunny        NY     11217   (718) 852-2352   34    M
     2      Yosemite     Sam          CA     95389   (209) 375-6572   52    M
     3      Daffy        Duck         NY     10013   (212) 227-1810   35    M
     4      Elmer        Fudd         ME     04578   (207) 882-7323   43    M
     5      Witch        Hazel        MA     01970   (978) 744-0991   57    F




    Column-oriented: value replaced
    Key         Fname               Lname       State       Zip        Phone               Age           Sex
     1          Bugs                Bunny        NY        11217       (718) 852-2352      34            M
     2          Yosemite            Sam          CA        95389       (209) 375-6572      52            M
     3          Daffy               Duck         NY        10013       (212) 227-1810      35            M
     4          Elmer               Fudd         ME        04578       (207) 882-7323      43            M
     5          Witch               Hazel        MA        01970       (978) 744-0991      57            F



     Yeah, this one just works.

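Updates work because a fixed-width column value lives at a known offset and can be overwritten where it stands, without moving any other row or touching any other column. A minimal sketch, assuming a fixed 14-byte phone column held in one `bytearray` (an illustration, not InfiniDB's storage format; variable-length values would need more machinery):

```python
from typing import List

PHONE_WIDTH = 14  # "(718) 938-3235" is 14 ASCII characters


def build_phone_column(phones: List[str]) -> bytearray:
    """Pack a fixed-width phone column into one contiguous buffer (one 'file')."""
    buf = bytearray()
    for phone in phones:
        encoded = phone.encode("ascii")
        if len(encoded) != PHONE_WIDTH:
            raise ValueError(f"expected {PHONE_WIDTH} bytes, got {phone!r}")
        buf += encoded
    return buf


def read_value(column: bytearray, offset: int) -> str:
    """Read the value for row `offset` directly at offset * width."""
    start = offset * PHONE_WIDTH
    return column[start:start + PHONE_WIDTH].decode("ascii")


def update_in_place(column: bytearray, offset: int, new_value: str) -> None:
    """Overwrite one value at offset * width; nothing else moves."""
    encoded = new_value.encode("ascii")
    if len(encoded) != PHONE_WIDTH:
        raise ValueError("new value must have the same fixed width")
    start = offset * PHONE_WIDTH
    column[start:start + PHONE_WIDTH] = encoded


phones = build_phone_column(["(718) 938-3235", "(209) 375-6572", "(212) 227-1810",
                             "(207) 882-7323", "(978) 744-0991"])
update_in_place(phones, 0, "(718) 852-2352")  # Bugs Bunny's new number
```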

Architecture – Shared Disk

    (2.2)

    [Diagram: shared-disk deployment, or alternatively a single server]




Architecture – Shared Nothing

    (3.0 option)




What Do I Need to Change?

    • Uses MySQL front-end
           o Standard SQL for DDL and DML
           o Most MySQL commands will still work


    Exceptions (not a comprehensive list):
           o No Cartesian products
           o No triggers



InfiniDB Ease of Use

     • Automatic Everything:
            o    Vertical partitioning – eliminate unneeded columns
            o    Horizontal partitioning – eliminate unneeded extents
            o    Improved compression
            o    No indexes – columns are de facto indexes
     • You already know how to use it:
            o Standard SQL
            o Familiar MySQL front-end



Info

     Links:
     www.calpont.com
     www.calpont.com/products/tryinfinidb – 30-day trial of Enterprise Edition
     www.infinidb.org – Community Edition




The end




 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 

Demystifying Columnar Databases

  • 1. DeMystifying Columnar Databases
      June Tong (jtong@calpont.com, straycat90@gmail.com), April 2012
      Calpont Proprietary and Confidential
  • 2. Agenda
      o What is a columnar database?
      o Why is it better than a row-oriented database?
      o When isn’t it better?
      o What do I need to know to use it?
      o How will I need to change my application code?
      InfiniDB® Scalable. Fast. Simple. Copyright © 2011 Calpont. All Rights Reserved.
  • 3. Who is Calpont?
      o Calpont Corporation
          - Privately held
          - Headquartered in Frisco, TX
      o Our mission: to provide a scalable data platform that enables analytic business decisions as timely as customers and markets dictate.
  • 4. InfiniDB
      InfiniDB is a columnar MPP MySQL database engine, expressly designed for analytic applications.
      o InfiniDB Community (single-server)
      o InfiniDB Enterprise
          - Version 2.2 – shared disk
          - Version 3.0 – added shared-nothing option
  • 5. Traditional Row-Oriented Storage
      Rows are stored sequentially:

      Key  Fname     Lname  State  Zip    Phone           Age  Sex
      1    Bugs      Bunny  NY     11217  (718) 938-3235  34   M
      2    Yosemite  Sam    CA     95389  (209) 375-6572  52   M
      3    Daffy     Duck   NY     10013  (212) 227-1810  35   M
      4    Elmer     Fudd   ME     04578  (207) 882-7323  43   M
      5    Witch     Hazel  MA     01970  (978) 744-0991  57   F

      This layout performs best when most queries read multiple columns of a single row (OLTP applications).
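To make the row layout concrete, here is a minimal Python sketch that packs the sample table into one byte string, one fixed-width record after another. The record format (`ROW`) and the helpers `pack_row` and `read_row` are hypothetical illustrations, not InfiniDB or MySQL internals; real row formats also carry headers, null bitmaps and variable-length fields.

```python
import struct

# Hypothetical fixed-width record layout (an assumption for illustration only).
ROW = struct.Struct("<i10s10s2s5s14sBc")  # Key, Fname, Lname, State, Zip, Phone, Age, Sex

PEOPLE = [
    (1, "Bugs", "Bunny", "NY", "11217", "(718) 938-3235", 34, "M"),
    (2, "Yosemite", "Sam", "CA", "95389", "(209) 375-6572", 52, "M"),
    (3, "Daffy", "Duck", "NY", "10013", "(212) 227-1810", 35, "M"),
    (4, "Elmer", "Fudd", "ME", "04578", "(207) 882-7323", 43, "M"),
    (5, "Witch", "Hazel", "MA", "01970", "(978) 744-0991", 57, "F"),
]


def pack_row(row):
    """Encode one row as a single contiguous fixed-width record."""
    key, fname, lname, state, zip_code, phone, age, sex = row
    return ROW.pack(key, fname.encode(), lname.encode(), state.encode(),
                    zip_code.encode(), phone.encode(), age, sex.encode())


def read_row(table, rownum):
    """Fetch every column of one row with a single contiguous slice."""
    start = rownum * ROW.size
    fields = ROW.unpack(table[start:start + ROW.size])
    # Strip struct's null padding from the byte-string fields; ints pass through.
    return tuple(f.rstrip(b"\x00").decode() if isinstance(f, bytes) else f
                 for f in fields)


# Row-oriented storage: whole records written one after another into one "file".
TABLE = b"".join(pack_row(r) for r in PEOPLE)
```

Because all eight fields of a row are adjacent, `read_row(TABLE, 3)` returns Elmer Fudd's complete record from one 47-byte slice, which is exactly the access pattern OLTP workloads need.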
  • 6. Key Lookup in a Row-Oriented Database
      Indexes on high-cardinality columns make accessing a single row very fast. When Elmer Fudd calls customer service, his row is found through an index with WHERE key=4 or WHERE phone='(207) 882-7323'.

      Index on Key:                  Index on Phone:
      Key  RowID                     Phone           RowID
      1    0001B008D23A671A          (207) 882-7323  0001B008D23A671D
      2    0001B008D23A671B          (209) 375-6572  0001B008D23A671B
      3    0001B008D23A671C          (212) 227-1810  0001B008D23A671C
      4    0001B008D23A671D          (718) 938-3235  0001B008D23A671A
      5    0001B008D23A671E          (978) 744-0991  0001B008D23A671E

      Key  Fname     Lname  State  Zip    Phone           Age  Sex
      1    Bugs      Bunny  NY     11217  (718) 938-3235  34   M
      2    Yosemite  Sam    CA     95389  (209) 375-6572  52   M
      3    Daffy     Duck   NY     10013  (212) 227-1810  35   M
      4    Elmer     Fudd   ME     04578  (207) 882-7323  43   M
      5    Witch     Hazel  MA     01970  (978) 744-0991  57   F

      But these indexes don't help analytical queries that scan many rows, e.g. "What's the average age of males?"
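The contrast between a point lookup and an analytic scan can be sketched in a few lines of Python. Here a dict stands in for the B-tree index a real database would use, and list positions stand in for the slide's RowIDs; `KEY_INDEX`, `PHONE_INDEX`, `lookup` and `average_age_of_males` are hypothetical names chosen for this illustration.

```python
# Sample rows from the slide: (Key, Fname, Lname, State, Zip, Phone, Age, Sex)
PEOPLE = [
    (1, "Bugs", "Bunny", "NY", "11217", "(718) 938-3235", 34, "M"),
    (2, "Yosemite", "Sam", "CA", "95389", "(209) 375-6572", 52, "M"),
    (3, "Daffy", "Duck", "NY", "10013", "(212) 227-1810", 35, "M"),
    (4, "Elmer", "Fudd", "ME", "04578", "(207) 882-7323", 43, "M"),
    (5, "Witch", "Hazel", "MA", "01970", "(978) 744-0991", 57, "F"),
]

# Secondary indexes: column value -> row position (a stand-in for a RowID).
KEY_INDEX = {row[0]: pos for pos, row in enumerate(PEOPLE)}
PHONE_INDEX = {row[5]: pos for pos, row in enumerate(PEOPLE)}


def lookup(index, value):
    """Point query (WHERE key=4, WHERE phone=...): one index probe, one row fetch."""
    return PEOPLE[index[value]]


def average_age_of_males():
    """Analytic query: no index narrows the search, so every row is examined."""
    rows_examined = 0
    ages = []
    for row in PEOPLE:
        rows_examined += 1
        if row[7] == "M":
            ages.append(row[6])
    return sum(ages) / len(ages), rows_examined
```

Both lookups touch a single row, while the aggregate has to read all five rows (and, in a row store, every column of each one).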
  • 7. Sequential Scans are Killers
      What if you had 100 million rows with 100 columns?
      [Figure: Sex and Age columns]
      If the table is 100 GB, you have to read all 100 GB. Or build composite indexes on EVERYTHING.
  • 8. Column-Oriented Storage
      Each column is stored in a separate file:

      Key  Fname     Lname  State  Zip    Phone           Age  Sex
      1    Bugs      Bunny  NY     11217  (718) 938-3235  34   M
      2    Yosemite  Sam    CA     95389  (209) 375-6572  52   M
      3    Daffy     Duck   NY     10013  (212) 227-1810  35   M
      4    Elmer     Fudd   ME     04578  (207) 882-7323  43   M
      5    Witch     Hazel  MA     01970  (978) 744-0991  57   F

      A given row's value sits at the same offset in every column file (auto-indexing).
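A minimal Python sketch of "one file per column, same offset per row" follows. It assumes every column is stored as fixed-width, space-padded ASCII; the widths in `WIDTHS` and the helpers `write_columns`, `read_value` and `read_row` are hypothetical and say nothing about InfiniDB's actual on-disk encoding.

```python
import os
import tempfile

# Hypothetical fixed width (in bytes) for each column; an assumption for this sketch.
WIDTHS = {"Key": 4, "Fname": 10, "Lname": 10, "State": 2,
          "Zip": 5, "Phone": 14, "Age": 3, "Sex": 1}
COLUMNS = list(WIDTHS)

PEOPLE = [
    (1, "Bugs", "Bunny", "NY", "11217", "(718) 938-3235", 34, "M"),
    (2, "Yosemite", "Sam", "CA", "95389", "(209) 375-6572", 52, "M"),
    (3, "Daffy", "Duck", "NY", "10013", "(212) 227-1810", 35, "M"),
    (4, "Elmer", "Fudd", "ME", "04578", "(207) 882-7323", 43, "M"),
    (5, "Witch", "Hazel", "MA", "01970", "(978) 744-0991", 57, "F"),
]


def column_path(directory, name):
    return os.path.join(directory, name + ".col")


def write_columns(rows, directory):
    """Write each column to its own file, padding every value to the column width."""
    for c, name in enumerate(COLUMNS):
        with open(column_path(directory, name), "wb") as f:
            for row in rows:
                f.write(str(row[c]).ljust(WIDTHS[name]).encode("ascii"))


def read_value(directory, name, rownum):
    """Row N lives at byte N * width in every file, so a seek replaces an index lookup."""
    width = WIDTHS[name]
    with open(column_path(directory, name), "rb") as f:
        f.seek(rownum * width)
        return f.read(width).decode("ascii").rstrip()


def read_row(directory, rownum):
    """Reassembling a whole row needs one read per column file."""
    return {name: read_value(directory, name, rownum) for name in COLUMNS}


if __name__ == "__main__":
    directory = tempfile.mkdtemp()
    write_columns(PEOPLE, directory)
    print(read_value(directory, "Lname", 3))  # Fudd
```

The same offset arithmetic that makes single-column access cheap is also why reassembling a full row costs one read per column, which the later single-row slides return to.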
  • 9. Read Columns, Not Rows
      Only read the files you need:

      Key  Fname     Lname  State  Zip    Phone           Age  Sex
      1    Bugs      Bunny  NY     11217  (718) 938-3235  34   M
      2    Yosemite  Sam    CA     95389  (209) 375-6572  52   M
      3    Daffy     Duck   NY     10013  (212) 227-1810  35   M
      4    Elmer     Fudd   ME     04578  (207) 882-7323  43   M
      5    Witch     Hazel  MA     01970  (978) 744-0991  57   F

      You also get better compression, because all the data in one file has the same data type.
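The deck does not say which compression scheme InfiniDB uses, so the sketch below swaps in run-length encoding (RLE), one common columnar technique, purely to show why homogeneous files compress better. `rle_encode` and `rle_decode` are hypothetical helpers built on `itertools.groupby`.

```python
from itertools import groupby


def rle_encode(values):
    """Run-length encode a sequence: each run of equal neighbours becomes (value, count)."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(runs):
    """Expand (value, count) pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]


AGE = [34, 52, 35, 43, 57]
SEX = ["M", "M", "M", "M", "F"]

# Column layout: the Sex file holds only Sex values, so repeats are adjacent.
column_runs = rle_encode(SEX)      # [("M", 4), ("F", 1)]

# Row layout: Age and Sex values interleave, so no two neighbours are equal.
row_major = [value for pair in zip(AGE, SEX) for value in pair]
row_runs = rle_encode(row_major)   # 10 runs of length 1: nothing saved
```

With only five rows the saving is small, but on millions of rows a low-cardinality column like Sex collapses into very few runs, while the same values interleaved with other columns do not.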
  • 10. I/O Reduction
      So you still have 100 million rows with 100 columns...
      [Figure: Males, Age]
      But you read only 2 columns instead of 100.
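The I/O saving is simple arithmetic. The sketch below assumes, as a simplification not stated on the slide, that all 100 columns have the same width, so the 100 GB table works out to 10 bytes per stored value; compression is ignored.

```python
ROWS = 100_000_000
COLUMNS = 100
TABLE_BYTES = 100 * 10**9  # the slide's 100 GB table

# Assumption: every column has the same width, so each stored value is 10 bytes.
BYTES_PER_VALUE = TABLE_BYTES // (ROWS * COLUMNS)


def row_store_bytes_read():
    """A row store must read whole rows, i.e. every column, to scan the table."""
    return ROWS * COLUMNS * BYTES_PER_VALUE


def column_store_bytes_read(columns_needed):
    """A column store reads only the files for the columns the query touches."""
    return ROWS * columns_needed * BYTES_PER_VALUE
```

For "average age of males" the query needs only Sex and Age, so `column_store_bytes_read(2)` is 2 GB against 100 GB for the row store: a 50x reduction before any compression is applied.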
  • 11. Vertical Partitioning
      Columnar databases produce automatic vertical partitioning:

      Key  Fname     Lname  City         State  Zip    Phone
      1    Bugs      Bunny  Brooklyn     NY     11217  (718) 938-3235
      2    Yosemite  Sam    Wawona       CA     95389  (209) 375-6572
      3    Daffy     Duck   New York     NY     10013  (212) 227-1810
      4    Elmer     Fudd   Wiscasset    ME     04578  (207) 882-7323
      :    :         :      :            :      :      :
      8m   Snoopy    Brown  Springfield  MA     01105  (413) 781-6500
  • 12. Horizontal Partitioning
      InfiniDB also automatically creates horizontal partitions of 8 million rows (default):

      Key  Fname     Lname  City         State  Zip    Phone
      1    Bugs      Bunny  Brooklyn     NY     11217  (718) 938-3235
      2    Yosemite  Sam    Wawona       CA     95389  (209) 375-6572
      3    Daffy     Duck   New York     NY     10013  (212) 227-1810
      4    Elmer     Fudd   Wiscasset    ME     04578  (207) 882-7323
      :    :         :      :            :      :      :
      8m   Snoopy    Brown  Springfield  MA     01105  (413) 781-6500
      :    :         :      :            :      :      :

      Knowing what values are in each partition allows for partition elimination at query time.
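Partition elimination can be sketched by recording the minimum and maximum value of a column in each partition and skipping any partition whose range cannot match the predicate. The `Extent` class, `build_extent_map` and `extents_to_scan` are hypothetical names for this illustration, and 8 rows per partition stands in for the default 8 million.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    """Hypothetical per-partition summary: first row position and the column's min/max."""
    first_row: int
    min_value: int
    max_value: int


def build_extent_map(values, rows_per_extent):
    """Cut a column into fixed-size horizontal partitions and record each one's range."""
    extents = []
    for start in range(0, len(values), rows_per_extent):
        chunk = values[start:start + rows_per_extent]
        extents.append(Extent(start, min(chunk), max(chunk)))
    return extents


def extents_to_scan(extent_map, low, high):
    """Partition elimination for WHERE col BETWEEN low AND high: keep only extents
    whose [min, max] range overlaps [low, high]; all others are skipped unread."""
    return [e for e in extent_map if e.max_value >= low and e.min_value <= high]
```

For a 40-row column holding 1..40 in load order, `extents_to_scan(build_extent_map(list(range(1, 41)), 8), 20, 23)` leaves a single extent out of five. The technique pays off when data is loaded roughly in order of the filtered column (dates are the classic case); on randomly ordered data most ranges overlap the predicate and little is eliminated.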
  • 13. Bonus: Easy to Add a New Column
      Row-oriented: usually requires rebuilding the table, because adding a column shifts every row.

      Key  Fname     Lname  State  Zip    Phone           Age  Sex  Golf
      1    Bugs      Bunny  NY     11217  (718) 938-3235  34   M    Y
      2    Yosemite  Sam    CA     95389  (209) 375-6572  52   M    N
      3    Daffy     Duck   NY     10013  (212) 227-1810  35   M    Y
      4    Elmer     Fudd   ME     04578  (207) 882-7323  43   M    Y
      5    Witch     Hazel  MA     01970  (978) 744-0991  57   F    N

      Column-oriented: just create another file.

      Key  Fname     Lname  State  Zip    Phone           Age  Sex  Golf
      1    Bugs      Bunny  NY     11217  (718) 938-3235  34   M    Y
      2    Yosemite  Sam    CA     95389  (209) 375-6572  52   M    N
      3    Daffy     Duck   NY     10013  (212) 227-1810  35   M    Y
      4    Elmer     Fudd   ME     04578  (207) 882-7323  43   M    Y
      5    Witch     Hazel  MA     01970  (978) 744-0991  57   F    N
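The difference in work can be sketched with fixed-width byte strings: a row store must rewrite every record so each can grow by the new field, while a column store leaves existing files alone and writes one new one. Both functions are hypothetical illustrations, not how MySQL or InfiniDB implement `ALTER TABLE`.

```python
def add_column_row_store(table, record_size, new_values):
    """Row store: every record grows, so the whole table is rewritten and every
    record after the first moves to a new byte offset."""
    records = [table[i:i + record_size] for i in range(0, len(table), record_size)]
    if len(records) != len(new_values):
        raise ValueError("need exactly one new value per row")
    return b"".join(record + value for record, value in zip(records, new_values))


def add_column_column_store(column_files, name, new_values):
    """Column store: existing column files are left untouched; one new file is added."""
    updated = dict(column_files)
    updated[name] = b"".join(new_values)
    return updated


# The slide's Golf column, one byte per row.
GOLF = [b"Y", b"N", b"Y", b"Y", b"N"]
```

In the row-store version the third record starts at byte 10 instead of byte 8 after the change, so nothing can stay in place; in the column-store version the old column data is shared unchanged.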
  • 14. Single-Row Operations
      Because of the nature of columnar storage, single-row operations can underperform. Do not attempt OLTP-style transactions on a columnar database. More details on individual DML statements follow...
  • 15. Single-Row Operations: Insert
      Row-oriented: the new row is appended to the end.

      Key  Fname     Lname    State  Zip    Phone           Age  Sex
      1    Bugs      Bunny    NY     11217  (718) 938-3235  34   M
      2    Yosemite  Sam      CA     95389  (209) 375-6572  52   M
      3    Daffy     Duck     NY     10013  (212) 227-1810  35   M
      4    Elmer     Fudd     ME     04578  (207) 882-7323  43   M
      5    Witch     Hazel    MA     01970  (978) 744-0991  57   F
      6    Marvin    Martian  CA     91602  (818) 761-9964  26   M

      Columnar: a new value must be added to each column file.

      Key  Fname     Lname    State  Zip    Phone           Age  Sex
      1    Bugs      Bunny    NY     11217  (718) 938-3235  34   M
      2    Yosemite  Sam      CA     95389  (209) 375-6572  52   M
      3    Daffy     Duck     NY     10013  (212) 227-1810  35   M
      4    Elmer     Fudd     ME     04578  (207) 882-7323  43   M
      5    Witch     Hazel    MA     01970  (978) 744-0991  57   F
      6    Marvin    Martian  CA     91602  (818) 761-9964  26   M
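Counting writes makes the asymmetry explicit. In this sketch Python lists stand in for files, and `insert_row_store` and `insert_column_store` are hypothetical helpers that return how many files each single-row insert touches.

```python
COLUMNS = ["Key", "Fname", "Lname", "State", "Zip", "Phone", "Age", "Sex"]


def insert_row_store(table, row):
    """Row store: the new record is appended to the end of a single file."""
    table.append(row)
    return 1  # files written


def insert_column_store(column_files, row):
    """Column store: one value must be appended to every column file."""
    for name, value in zip(COLUMNS, row):
        column_files[name].append(value)
    return len(COLUMNS)  # files written


MARVIN = (6, "Marvin", "Martian", "CA", "91602", "(818) 761-9964", 26, "M")
```

Inserting Marvin Martian costs one write in the row store and eight in the column store; with a hundred columns it would be a hundred.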
  • 16. Insert: Solution
      Do batch inserts, and use cpimport, the bulk loader, instead of single-row INSERT statements. cpimport is your friend.
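Why batching helps can be sketched by transposing a batch of rows into columns and appending each column once. This only illustrates the principle; it is not a description of how cpimport works internally, which the deck does not cover. Lists stand in for files, and `batch_insert_column_store` is a hypothetical helper.

```python
COLUMNS = ["Key", "Fname", "Lname", "State", "Zip", "Phone", "Age", "Sex"]


def batch_insert_column_store(column_files, rows):
    """Transpose the batch once, then do one bulk append per column file.
    The number of file writes depends on the column count, not the batch size."""
    if not rows:
        return 0
    for name, values in zip(COLUMNS, zip(*rows)):
        column_files[name].extend(values)
    return len(COLUMNS)  # file writes
```

Loading 1,000 rows one INSERT at a time would mean 8,000 appends across the eight column files; the batched version does 8.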
  • 17. Single-Row Operations: Delete
      Row-oriented: the row is deleted.

      Key  Fname     Lname  State  Zip    Phone           Age  Sex
      1    Bugs      Bunny  NY     11217  (718) 938-3235  34   M
      2    Yosemite  Sam    CA     95389  (209) 375-6572  52   M
      3    Daffy     Duck   NY     10013  (212) 227-1810  35   M
      4    Elmer     Fudd   ME     04578  (207) 882-7323  43   M
      5    Witch     Hazel  MA     01970  (978) 744-0991  57   F

      Columnar: the row's value must be deleted from each column file.

      Key  Fname     Lname  State  Zip    Phone           Age  Sex
      1    Bugs      Bunny  NY     11217  (718) 938-3235  34   M
      2    Yosemite  Sam    CA     95389  (209) 375-6572  52   M
      3    Daffy     Duck   NY     10013  (212) 227-1810  35   M
      4    Elmer     Fudd   ME     04578  (207) 882-7323  43   M
      5    Witch     Hazel  MA     01970  (978) 744-0991  57   F
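A sketch of the columnar delete pictured on the slide: to keep "row N sits at offset N" true in every file, the value must be removed from all of them. Lists stand in for column files and `delete_row` is a hypothetical helper; real engines may handle deletes differently (for example by marking rows as deleted), and the deck does not say how InfiniDB does it.

```python
PEOPLE_COLUMNS = {
    "Key": [1, 2, 3, 4, 5],
    "Fname": ["Bugs", "Yosemite", "Daffy", "Elmer", "Witch"],
    "Lname": ["Bunny", "Sam", "Duck", "Fudd", "Hazel"],
    "State": ["NY", "CA", "NY", "ME", "MA"],
    "Zip": ["11217", "95389", "10013", "04578", "01970"],
    "Phone": ["(718) 938-3235", "(209) 375-6572", "(212) 227-1810",
              "(207) 882-7323", "(978) 744-0991"],
    "Age": [34, 52, 35, 43, 57],
    "Sex": ["M", "M", "M", "M", "F"],
}


def delete_row(column_files, rownum):
    """Remove position `rownum` from every column file so the files stay aligned;
    returns the number of files touched."""
    for values in column_files.values():
        del values[rownum]  # later values shift down by one slot
    return len(column_files)
```

Deleting Elmer Fudd (position 3) touches all eight files, and Witch Hazel's values move to position 3 in each of them.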
  • 18. Delete: Solutions
      Do batch deletes. Any extent that contains only rows to be deleted can be dropped outright. Otherwise, consider copying the rows you want to keep into a new table with the bulk loader and dropping the old table.
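Whether an extent can be dropped whole can be decided from its min/max range alone: if the range lies entirely inside the delete predicate, every row in it qualifies. The sketch below classifies extents for `DELETE ... WHERE col BETWEEN low AND high`; `Extent` and `plan_delete` are hypothetical names, and the deck does not describe InfiniDB's actual mechanism.

```python
from collections import namedtuple

# Hypothetical extent summary: first row position plus min/max of the filter column.
Extent = namedtuple("Extent", ["first_row", "min_value", "max_value"])


def plan_delete(extent_map, low, high):
    """Classify extents for DELETE ... WHERE col BETWEEN low AND high."""
    drop, rewrite, untouched = [], [], []
    for e in extent_map:
        if low <= e.min_value and e.max_value <= high:
            drop.append(e)        # every row qualifies: drop the whole extent
        elif e.max_value < low or e.min_value > high:
            untouched.append(e)   # no row can qualify: leave it alone
        else:
            rewrite.append(e)     # mixed: surviving rows must be kept
    return drop, rewrite, untouched
```

Deleting values 1 through 12 from extents covering 1–8, 9–16 and 17–24 drops the first extent outright, leaves the third alone, and only the middle one needs row-by-row work (or the copy-and-reload approach).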
  • 19. Single-Row Operations: Update
      Row-oriented: the value is replaced.

      Key  Fname     Lname  State  Zip    Phone           Age  Sex
      1    Bugs      Bunny  NY     11217  (718) 852-2352  34   M
      2    Yosemite  Sam    CA     95389  (209) 375-6572  52   M
      3    Daffy     Duck   NY     10013  (212) 227-1810  35   M
      4    Elmer     Fudd   ME     04578  (207) 882-7323  43   M
      5    Witch     Hazel  MA     01970  (978) 744-0991  57   F

      Column-oriented: the value is replaced.

      Key  Fname     Lname  State  Zip    Phone           Age  Sex
      1    Bugs      Bunny  NY     11217  (718) 852-2352  34   M
      2    Yosemite  Sam    CA     95389  (209) 375-6572  52   M
      3    Daffy     Duck   NY     10013  (212) 227-1810  35   M
      4    Elmer     Fudd   ME     04578  (207) 882-7323  43   M
      5    Witch     Hazel  MA     01970  (978) 744-0991  57   F

      Yeah, this one just works.
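An update "just works" because only one column file changes and the new value fits the old slot. The sketch below assumes fixed-width, uncompressed storage (an assumption; compressed or variable-length columns would need more work) and uses hypothetical helpers `update_value` and `read_value` on a `bytearray` standing in for the Phone column file.

```python
WIDTH = 14  # hypothetical fixed width of the Phone column, in bytes

PHONES = ["(718) 938-3235", "(209) 375-6572", "(212) 227-1810",
          "(207) 882-7323", "(978) 744-0991"]

# The Phone column file: one fixed-width slot per row, in row order.
phone_column = bytearray(b"".join(p.encode("ascii").ljust(WIDTH) for p in PHONES))


def update_value(column, rownum, new_value, width=WIDTH):
    """Overwrite one slot in place: same offset, same size, no other bytes move."""
    encoded = new_value.encode("ascii")
    if len(encoded) > width:
        raise ValueError("value is wider than the column")
    start = rownum * width
    column[start:start + width] = encoded.ljust(width)


def read_value(column, rownum, width=WIDTH):
    """Read one slot back, dropping the space padding."""
    start = rownum * width
    return column[start:start + width].decode("ascii").rstrip()
```

Changing Bugs Bunny's phone to (718) 852-2352 rewrites 14 bytes at offset 0 of one file; every other row and every other column file is untouched.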
  • 20. Architecture – Shared Disk (2.2) or … Single Server
  • 21. Architecture – Shared Nothing (3.0 option)
  • 22. What Do I Need to Change?
      o Uses the MySQL front-end
          - Standard SQL for DDL and DML
          - Most MySQL commands will still work
      o Exceptions (not a comprehensive list):
          - No Cartesian products
          - No triggers
  • 23. InfiniDB Ease of Use
      o Automatic everything:
          - Vertical partitioning – eliminate unneeded columns
          - Horizontal partitioning – eliminate unneeded extents
          - Improved compression
          - No indexes – columns are de facto indexes
      o You already know how to use it:
          - Standard SQL
          - Familiar MySQL front-end
  • 24. Info Links
      o www.calpont.com
      o www.calpont.com/products/tryinfinidb – 30-day trial of Enterprise Edition
      o www.infinidb.org – Community Edition
  • 25. The end