Vertica
Zvika Gutkin
DB Expert
Zvika.gutkin@gmail.com
Agenda

• What is Vertica.

• How does it work.

• How To Use Vertica … (The Right Way ).

• Where It Falls Short.

• Examples …
MPP-Columnar DBMS

10x –100x performance of classic RDBMS.
Linear Scale
SQL
Commodity Hardware
Built-in fault tolerance
10x –100x performance of classic
                 RDBMS
•   Column store architecture
•   High Compression rates
•   Sorted columns
•   Objects Segmentation/Replication.
How Does It Work ?
Tuple Mover
Delete
• Deleted rows are only marked as deleted.
• Stored in delete vector on disk.
• Query merge the ROS and Deleted vector to
  remove deleted records.
• Data is removed asynchronously during
  mergeout.
Projections
•   Physical structure of the table (logical)
•   Stored sorted and compressed
•   Internal maintenance
•   At least one (super) projection.
•   Projection Types:
    –   Super projection
    –   Query specific projection
    –   Pre join projection
    –   Buddy projection
Projections
What‘s Important ….
•   Choose the right columns (General Vs Specific).
•   Choose the right sort order .
•   Choose the right encoding .
•   Choose the right column to partition by .
•   Choose the right column to segment by .
Where It Falls Short …
• Lack of Features .
• Good for specific types of queries .
  – Keep Queries Simple .
  – Use the right columns
  – Use Order By to help optimizer pick the right
    projection
  – Check the join column – Best if both tables order
    by it .
  – Check the join column – best if segmented by it.
Choose the Right sort order
             Example
select
    a11.LP_ACCOUNT_ID AS LP_ACCOUNT_ID,
    count(distinct a11.VS_LP_SESSION_ID) AS Visits,
    (count(distinct a11.VS_LP_SESSION_ID) * 1.0) AS
WJXBFS1
 from lp_15744040.FACT_VISIT_ROOM a11
 group by
    a11.LP_ACCOUNT_ID;
First projection ….
table_name         projection_name      projection_column_name   column_position   sort_position
FACT_VISIT_ROOM    FACT_VISIT_ROOM_bad VS_LP_SESSION_ID          0                 0
FACT_VISIT_ROOM    FACT_VISIT_ROOM_bad LP_ACCOUNT_ID             1                 1
FACT_VISIT_ROOM    FACT_VISIT_ROOM_bad VS_LP_VISITOR_ID          2                 2
FACT_VISIT_ROOM    FACT_VISIT_ROOM_bad VISIT_FROM_DT_TRUNC       3                 3
FACT_VISIT_ROOM    FACT_VISIT_ROOM_bad ACCOUNT_ID                4                 4
FACT_VISIT_ROOM    FACT_VISIT_ROOM_bad ROOM_ID                   5                 5
FACT_VISIT_ROOM    FACT_VISIT_ROOM_bad VISIT_FROM_DT_ACTUAL      6                 6
FACT_VISIT_ROOM    FACT_VISIT_ROOM_bad VISIT_TO_DT_ACTUAL        7                 7
FACT_VISIT_ROOM    FACT_VISIT_ROOM_bad HOT_LEAD_IND              8                 8


Access Path:
+-GROUPBY PIPELINED [Cost: 7M, Rows: 10K] (PATH ID: 1)
| Aggregates: count(DISTINCT a11.VS_LP_SESSION_ID)
| Group By: a11.LP_ACCOUNT_ID
| +---> GROUPBY HASH (SORT OUTPUT) [Cost: 7M, Rows: 10K] (PATH ID: 2)
| | Group By: a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
| | +---> STORAGE ACCESS for a11 [Cost: 5M, Rows: 199M] (PATH ID: 3)
| | | Projection: lp_15744040.FACT_VISIT_ROOM_bad
| | | Materialize: a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
Second projection …
table_name          projection_name        projection_column_name   column_position   sort_position
FACT_VISIT_ROOM     FACT_VISIT_ROOM_fix1   LP_ACCOUNT_ID            0                 0
FACT_VISIT_ROOM     FACT_VISIT_ROOM_fix1   VS_LP_SESSION_ID         1                 1
FACT_VISIT_ROOM     FACT_VISIT_ROOM_fix1   VS_LP_VISITOR_ID         2                 2
FACT_VISIT_ROOM     FACT_VISIT_ROOM_fix1   VISIT_FROM_DT_TRUNC      3                 3
FACT_VISIT_ROOM     FACT_VISIT_ROOM_fix1   ACCOUNT_ID               4                 4
FACT_VISIT_ROOM     FACT_VISIT_ROOM_fix1   ROOM_ID                  5                 5
FACT_VISIT_ROOM     FACT_VISIT_ROOM_fix1   VISIT_FROM_DT_ACTUAL     6                 6
FACT_VISIT_ROOM     FACT_VISIT_ROOM_fix1   VISIT_TO_DT_ACTUAL       7                 7
FACT_VISIT_ROOM     FACT_VISIT_ROOM_fix1   HOT_LEAD_IND             8                 8



  Access Path:
  +-GROUPBY PIPELINED [Cost: 7M, Rows: 10K] (PATH ID: 1)
  | Aggregates: count(DISTINCT a11.VS_LP_SESSION_ID)
  | Group By: a11.LP_ACCOUNT_ID
  | +---> GROUPBY PIPELINED [Cost: 7M, Rows: 10K] (PATH ID: 2)
  | | Group By: a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
  | | +---> STORAGE ACCESS for a11 [Cost: 5M, Rows: 199M] (PATH ID: 3)
  | | | Projection: lp_15744040.FACT_VISIT_ROOM_fix1
  | | | Materialize: a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
Results …
Elapsed Time First projection
GROUPBY HASH (SORT OUTPUT)

Time: First fetch (7 rows): 264527.916 ms. All rows formatted: 264527.978 ms




Elapsed Time Second projection
GROUPBY PIPELINED



Time: First fetch (7 rows): 38913.909 ms. All rows formatted: 38913.965 ms
Join Example
select a12.DT_WEEK AS DT_WEEK,
    a11.LP_ACCOUNT_ID AS LP_ACCOUNT_ID,
    count(distinct a11.VS_LP_SESSION_ID) AS Visits,
    (count(distinct a11.VS_LP_SESSION_ID) * 1.0) AS WJXBFS1
 from zzz.FACT_VISIT a11
    join zzz.DIM_DATE_TIME a12
     on (a11.VISIT_FROM_DT_TRUNC = a12.DATE_TIME_ID)
 where (a11.LP_ACCOUNT_ID in ('57386690')
  and a11.VISIT_FROM_DT_TRUNC between '2011-09-01 15:28:00' and '2011-12-31 12:52:50')
 group by a12.DT_WEEK,
    a11.LP_ACCOUNT_ID

   Filter : LP_ACCOUNT_ID, VISIT_FROM_DT_TRUNC
   Group By : DT_WEEK , LP_ACCOUNT_ID
   Join: VISIT_FROM_DT_TRUNC , DATE_TIME_ID
   Select : DT_WEEK, LP_ACCOUNT_ID, VS_LP_SESSION_ID
Full Explain Plan…
 Access Path:
  +-GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 14M, Rows: 5M (NO STATISTICS)] (PATH ID: 1)
  | Aggregates: count(DISTINCT a11.VS_LP_SESSION_ID)
  | Group By: a12.DT_WEEK, a11.LP_ACCOUNT_ID
  | Execute on: All Nodes
  | +---> GROUPBY HASH (SORT OUTPUT) [Cost: 6M, Rows: 100M (NO STATISTICS)] (PATH ID: 2)
  | | Group By: a12.DT_WEEK, a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
  | | Execute on: All Nodes
  | | +---> JOIN HASH [Cost: 944K, Rows: 372M (NO STATISTICS)] (PATH ID: 3)
  | | | Join Cond: (a11.VISIT_FROM_DT_TRUNC = a12.DATE_TIME_ID)
  | | | Materialize at Output: a11.VS_LP_SESSION_ID, a11.LP_ACCOUNT_ID
  | | | Execute on: All Nodes
  | | | +-- Outer -> STORAGE ACCESS for a11 [Cost: 421K, Rows: 372M (NO STATISTICS)] (PATH ID: 4)
  | | | | Projection: zzz.FACT_VISIT_b0
  | | | | Materialize: a11.VISIT_FROM_DT_TRUNC
  | | | | Filter: (a11.LP_ACCOUNT_ID = '57386690')
  | | | | Filter: ((a11.VISIT_FROM_DT_TRUNC >= '2011-09-01 15:28:00'::timestamp) AND (a11.VISIT_FROM_DT_TRUNC <=
 '2011-12-31 12:52:50'::timestamp))
  | | | | Execute on: All Nodes
  | | | +-- Inner -> STORAGE ACCESS for a12 [Cost: 1K, Rows: 10K (NO STATISTICS)] (PATH ID: 5)
  | | | | Projection: zzz.DIM_DATE_TIME_node0004
  | | | | Materialize: a12.DATE_TIME_ID, a12.DT_WEEK
  | | | | Filter: ((a12.DATE_TIME_ID >= '2011-09-01 15:28:00'::timestamp) AND (a12.DATE_TIME_ID <= '2011-12-31
 12:52:50'::timestamp))
  | | | | Execute on: All Nodes
Explain Plan (substract)…
 Access Path:l
 +-GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 14M, Rows: 5M (NO STATISTICS)] (PATH ID:
 1)
 | Aggregates: count(DISTINCT a11.VS_LP_SESSION_ID)
 | Group By: a12.DT_WEEK, a11.LP_ACCOUNT_ID
 | Execute on: All Nodes
 | +---> GROUPBY HASH (SORT OUTPUT) [Cost: 6M, Rows: 100M (NO STATISTICS)] (PATH ID: 2)
 | | Group By: a12.DT_WEEK, a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
 | | Execute on: All Nodes
 | | +---> JOIN HASH [Cost: 944K, Rows: 372M (NO STATISlTICS)] (PATH ID: 3)
 | | | Join Cond: (a11.VISIT_FROM_DT_TRUNC = a12.DATE_TIME_ID)
 | | | Materialize at Output: a11.VS_LP_SESSION_ID, a11.LP_ACCOUNT_ID
 | | | Execute on: All Nodes




 Time: First fetch (6 rows): 56654.894 ms. All rows formatted: 56654.988 ms
Solution one - Functions
   select week(a11.VISIT_FROM_DT_TRUNC) AS DT_WEEK,
       a11.LP_ACCOUNT_ID AS LP_ACCOUNT_ID,
       count(distinct a11.VS_LP_SESSION_ID) AS Visits,
       (count(distinct a11.VS_LP_SESSION_ID) * 1.0) AS WJXBFS1
    from zzz.FACT_VISIT a11
    where (a11.LP_ACCOUNT_ID in ('57386690')
     and a11.VISIT_FROM_DT_TRUNC between '2011-09-01 15:28:00' and '2011-12-31 12:52:50')
    group by week(a11.VISIT_FROM_DT_TRUNC),
       a11.LP_ACCOUNT_ID;
Access Path:
+-GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 127, Rows: 1 (STALE STATISTICS)] (PATH ID: 1)
| Aggregates: count(DISTINCT a11.VS_LP_SESSION_ID)
| Group By: <SVAR>, a11.LP_ACCOUNT_ID
| Execute on: All Nodes
| +---> GROUPBY HASH (SORT OUTPUT) [Cost: 126, Rows: 1 (STALE STATISTICS)] (PATH ID: 2)
| | Group By: (date_part('week', a11.VISIT_FROM_DT_TRUNC))::int, a11.LP_ACCOUNT_ID,
a11.VS_LP_SESSION_ID
| | Execute on: All Nodes
| | +---> STORAGE ACCESS for a11 [Cost: 125, Rows: 1 (STALE STATISTICS)] (PATH ID: 3)
| | | Projection: zzz.FACT_VISIT_b0
              Time: First fetch (6 rows): 33453.997 ms. All rows formatted: 33454.154 ms
                                        Saved the Join Time
Solution Two- PreJoin Projection
Pros                        Cons
• Eliminate Join overhead   • Not Flexible
• Maintain By Vertica       • Cause Overhead on Load
                            • Need Primary/Foreign Key
                            • Maintenance Restrictions
Solution Two- PreJoin Projection
order by
LP_ACCOUNT_ID,VISIT_FROM_DT_TRUNC,DT_WEEK,HOT_LEAD_IND,DATE_TIME_ID,VS_LP_SESSION_ID


Access Path:
+-GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 12K, Rows: 10K] (PATH ID: 1)
| Aggregates: count(DISTINCT visit_date_time_prejoin8_b0.VS_LP_SESSION_ID)
| Group By: visit_date_time_prejoin8_b0.DT_WEEK,
visit_date_time_prejoin8_b0.LP_ACCOUNT_ID
| Execute on: All Nodes
| +---> GROUPBY HASH (SORT OUTPUT) [Cost: 11K, Rows: 10K] (PATH ID: 2)
| | Group By: visit_date_time_prejoin8_b0.DT_WEEK,
visit_date_time_prejoin8_b0.LP_ACCOUNT_ID, visit_date_time_prejoin8_b0.VS_LP_SESSION_ID
| | Execute on: All Nodes
| | +---> STORAGE ACCESS for <No Alias> [Cost: 8K, Rows: 1M] (PATH ID: 3)
| | | Projection: lp_15744040.visit_date_time_prejoin8_b0

Time: First fetch (6 rows): 35312.331 ms. All rows formatted: 35312.421 ms
                               Saved the Join Time
Solution Two- PreJoin Projection
    Sorted By DT_WEEK, LP_ACCOUNT_ID, VS_LP_SESSION_ID

Access Path:
+-GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 542K, Rows: 10K] (PATH ID: 1)
| Aggregates: count(DISTINCT visit_date_time_prejoin_z6.VS_LP_SESSION_ID)
| Group By: visit_date_time_prejoin_z6.DT_WEEK, visit_date_time_prejoin_z6.LP_ACCOUNT_ID
| Execute on: All Nodes
| +---> GROUPBY PIPELINED [Cost: 542K, Rows: 10K] (PATH ID: 2)
| | Group By: visit_date_time_prejoin_z6.DT_WEEK,
visit_date_time_prejoin_z6.VS_LP_SESSION_ID, visit_date_time_prejoin_z6.LP_ACCOUNT_ID
| | Execute on: All Nodes
| | +---> STORAGE ACCESS for <No Alias> [Cost: 501K, Rows: 15M] (PATH ID: 3)
| | | Projection: lp_15744040.visit_date_time_prejoin_z6
||



Time: First fetch (6 rows): 3680.853 ms. All rows formatted: 3680.969 ms
                   Saved the Join Time and Group by hash Time
Solution Three - Denormalize
select DT_WEEK,
     a11.LP_ACCOUNT_ID AS LP_ACCOUNT_ID,
     count(distinct a11.VS_LP_SESSION_ID) AS Visits,
     (count(distinct a11.VS_LP_SESSION_ID) * 1.0) AS WJXBFS1
  from zzz.FACT_VISIT_Z1 a11
  where (a11.LP_ACCOUNT_ID in ('57386690')
   and a11.VISIT_FROM_DT_TRUNC between '2011-09-01 15:28:00' and '2011-12-31 12:52:50')
  group by DT_WEEK,
     a11.LP_ACCOUNT_ID;
Access Path:
+-GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 3M, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
| Aggregates: count(DISTINCT a11.VS_LP_SESSION_ID)
| Group By: a11.DT_WEEK, a11.LP_ACCOUNT_ID
| Execute on: All Nodes
| +---> GROUPBY HASH (SORT OUTPUT) [Cost: 3M, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
| | Group By: a11.DT_WEEK, a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
| | Execute on: All Nodes
| | +---> STORAGE ACCESS for a11 [Cost: 2M, Rows: 372M (NO STATISTICS)] (PATH ID: 3)
| | | Projection: zzz.FACT_VISIT_Z1_super
Time: First etch (6 rows): 33885.178 ms. All rows formatted: 33885.253 ms
                                 Saved the Join Time
Solution Three - Denormalize
• Changing the projection sort order
Access Path:
+-GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 588K, Rows: 10K] (PATH ID: 1)
| Aggregates: count(DISTINCT a11.VS_LP_SESSION_ID)
| Group By: a11.DT_WEEK, a11.LP_ACCOUNT_ID
| Execute on: All Nodes
| +---> GROUPBY PIPELINED [Cost: 587K, Rows: 10K] (PATH ID: 2)
| | Group By: a11.DT_WEEK, a11.VS_LP_SESSION_ID, a11.LP_ACCOUNT_ID
| | Execute on: All Nodes
| | +---> STORAGE ACCESS for a11 [Cost: 531K, Rows: 20M] (PATH ID: 3)
| | | Projection: zzz.fact_visit_z1_pipe
| | | Materialize: a11.DT_WEEK, a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
| | | Filter: (a11.LP_ACCOUNT_ID = '57386690')
| | | Filter: ((a11.VISIT_FROM_DT_TRUNC >= '2011-09-01 15:28:00'::timestamp)
AND (a11.VISIT_FROM_DT_TRUNC <= '2011-12-31 12:52:50'::timestamp))
| | | Execute on: All Nodes
 Time: First fetch (6 rows): 4313.497 ms. All rows formatted: 4313.600 ms
                Saved the Join Time and Group by hash Time
Let’s sum it up…

  • Keep it simple
 • Keep it sorted.
 • Keep it joinless
Questions ?
Thank You

Vertica mpp columnar dbms

  • 1.
  • 2.
    Agenda • What isVertica. • How does it work. • How To Use Vertica … (The Right Way ). • Where It Falls Short. • Examples …
  • 3.
    MPP-Columnar DBMS 10x –100xperformance of classic RDBMS. Linear Scale SQL Commodity Hardware Built-in fault tolerance
  • 4.
    10x –100x performanceof classic RDBMS • Column store architecture • High Compression rates • Sorted columns • Objects Segmentation/Replication.
  • 5.
  • 6.
  • 7.
    Delete • Deleted rowsare only marked as deleted. • Stored in delete vector on disk. • Query merge the ROS and Deleted vector to remove deleted records. • Data is removed asynchronously during mergeout.
  • 8.
    Projections • Physical structure of the table (logical) • Stored sorted and compressed • Internal maintenance • At least one (super) projection. • Projection Types: – Super projection – Query specific projection – Pre join projection – Buddy projection
  • 9.
  • 10.
    What‘s Important …. • Choose the right columns (General Vs Specific). • Choose the right sort order . • Choose the right encoding . • Choose the right column to partition by . • Choose the right column to segment by .
  • 11.
    Where It FallsShort … • Lack of Features . • Good for specific types of queries . – Keep Queries Simple . – Use the right columns – Use Order By to help optimizer pick the right projection – Check the join column – Best if both tables order by it . – Check the join column – best if segmented by it.
  • 12.
    Choose the Rightsort order Example select a11.LP_ACCOUNT_ID AS LP_ACCOUNT_ID, count(distinct a11.VS_LP_SESSION_ID) AS Visits, (count(distinct a11.VS_LP_SESSION_ID) * 1.0) AS WJXBFS1 from lp_15744040.FACT_VISIT_ROOM a11 group by a11.LP_ACCOUNT_ID;
  • 13.
    First projection …. table_name projection_name projection_column_name column_position sort_position FACT_VISIT_ROOM FACT_VISIT_ROOM_bad VS_LP_SESSION_ID 0 0 FACT_VISIT_ROOM FACT_VISIT_ROOM_bad LP_ACCOUNT_ID 1 1 FACT_VISIT_ROOM FACT_VISIT_ROOM_bad VS_LP_VISITOR_ID 2 2 FACT_VISIT_ROOM FACT_VISIT_ROOM_bad VISIT_FROM_DT_TRUNC 3 3 FACT_VISIT_ROOM FACT_VISIT_ROOM_bad ACCOUNT_ID 4 4 FACT_VISIT_ROOM FACT_VISIT_ROOM_bad ROOM_ID 5 5 FACT_VISIT_ROOM FACT_VISIT_ROOM_bad VISIT_FROM_DT_ACTUAL 6 6 FACT_VISIT_ROOM FACT_VISIT_ROOM_bad VISIT_TO_DT_ACTUAL 7 7 FACT_VISIT_ROOM FACT_VISIT_ROOM_bad HOT_LEAD_IND 8 8 Access Path: +-GROUPBY PIPELINED [Cost: 7M, Rows: 10K] (PATH ID: 1) | Aggregates: count(DISTINCT a11.VS_LP_SESSION_ID) | Group By: a11.LP_ACCOUNT_ID | +---> GROUPBY HASH (SORT OUTPUT) [Cost: 7M, Rows: 10K] (PATH ID: 2) | | Group By: a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID | | +---> STORAGE ACCESS for a11 [Cost: 5M, Rows: 199M] (PATH ID: 3) | | | Projection: lp_15744040.FACT_VISIT_ROOM_bad | | | Materialize: a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
  • 14.
    Second projection … table_name projection_name projection_column_name column_position sort_position FACT_VISIT_ROOM FACT_VISIT_ROOM_fix1 LP_ACCOUNT_ID 0 0 FACT_VISIT_ROOM FACT_VISIT_ROOM_fix1 VS_LP_SESSION_ID 1 1 FACT_VISIT_ROOM FACT_VISIT_ROOM_fix1 VS_LP_VISITOR_ID 2 2 FACT_VISIT_ROOM FACT_VISIT_ROOM_fix1 VISIT_FROM_DT_TRUNC 3 3 FACT_VISIT_ROOM FACT_VISIT_ROOM_fix1 ACCOUNT_ID 4 4 FACT_VISIT_ROOM FACT_VISIT_ROOM_fix1 ROOM_ID 5 5 FACT_VISIT_ROOM FACT_VISIT_ROOM_fix1 VISIT_FROM_DT_ACTUAL 6 6 FACT_VISIT_ROOM FACT_VISIT_ROOM_fix1 VISIT_TO_DT_ACTUAL 7 7 FACT_VISIT_ROOM FACT_VISIT_ROOM_fix1 HOT_LEAD_IND 8 8 Access Path: +-GROUPBY PIPELINED [Cost: 7M, Rows: 10K] (PATH ID: 1) | Aggregates: count(DISTINCT a11.VS_LP_SESSION_ID) | Group By: a11.LP_ACCOUNT_ID | +---> GROUPBY PIPELINED [Cost: 7M, Rows: 10K] (PATH ID: 2) | | Group By: a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID | | +---> STORAGE ACCESS for a11 [Cost: 5M, Rows: 199M] (PATH ID: 3) | | | Projection: lp_15744040.FACT_VISIT_ROOM_fix1 | | | Materialize: a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
  • 15.
    Results … Elapsed TimeFirst projection GROUPBY HASH (SORT OUTPUT) Time: First fetch (7 rows): 264527.916 ms. All rows formatted: 264527.978 ms Elapsed Time Second projection GROUPBY PIPELINED Time: First fetch (7 rows): 38913.909 ms. All rows formatted: 38913.965 ms
  • 16.
    Join Example select a12.DT_WEEKAS DT_WEEK, a11.LP_ACCOUNT_ID AS LP_ACCOUNT_ID, count(distinct a11.VS_LP_SESSION_ID) AS Visits, (count(distinct a11.VS_LP_SESSION_ID) * 1.0) AS WJXBFS1 from zzz.FACT_VISIT a11 join zzz.DIM_DATE_TIME a12 on (a11.VISIT_FROM_DT_TRUNC = a12.DATE_TIME_ID) where (a11.LP_ACCOUNT_ID in ('57386690') and a11.VISIT_FROM_DT_TRUNC between '2011-09-01 15:28:00' and '2011-12-31 12:52:50') group by a12.DT_WEEK, a11.LP_ACCOUNT_ID  Filter : LP_ACCOUNT_ID, VISIT_FROM_DT_TRUNC  Group By : DT_WEEK , LP_ACCOUNT_ID  Join: VISIT_FROM_DT_TRUNC , DATE_TIME_ID  Select : DT_WEEK, LP_ACCOUNT_ID, VS_LP_SESSION_ID
  • 17.
    Full Explain Plan… Access Path: +-GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 14M, Rows: 5M (NO STATISTICS)] (PATH ID: 1) | Aggregates: count(DISTINCT a11.VS_LP_SESSION_ID) | Group By: a12.DT_WEEK, a11.LP_ACCOUNT_ID | Execute on: All Nodes | +---> GROUPBY HASH (SORT OUTPUT) [Cost: 6M, Rows: 100M (NO STATISTICS)] (PATH ID: 2) | | Group By: a12.DT_WEEK, a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID | | Execute on: All Nodes | | +---> JOIN HASH [Cost: 944K, Rows: 372M (NO STATISTICS)] (PATH ID: 3) | | | Join Cond: (a11.VISIT_FROM_DT_TRUNC = a12.DATE_TIME_ID) | | | Materialize at Output: a11.VS_LP_SESSION_ID, a11.LP_ACCOUNT_ID | | | Execute on: All Nodes | | | +-- Outer -> STORAGE ACCESS for a11 [Cost: 421K, Rows: 372M (NO STATISTICS)] (PATH ID: 4) | | | | Projection: zzz.FACT_VISIT_b0 | | | | Materialize: a11.VISIT_FROM_DT_TRUNC | | | | Filter: (a11.LP_ACCOUNT_ID = '57386690') | | | | Filter: ((a11.VISIT_FROM_DT_TRUNC >= '2011-09-01 15:28:00'::timestamp) AND (a11.VISIT_FROM_DT_TRUNC <= '2011-12-31 12:52:50'::timestamp)) | | | | Execute on: All Nodes | | | +-- Inner -> STORAGE ACCESS for a12 [Cost: 1K, Rows: 10K (NO STATISTICS)] (PATH ID: 5) | | | | Projection: zzz.DIM_DATE_TIME_node0004 | | | | Materialize: a12.DATE_TIME_ID, a12.DT_WEEK | | | | Filter: ((a12.DATE_TIME_ID >= '2011-09-01 15:28:00'::timestamp) AND (a12.DATE_TIME_ID <= '2011-12-31 12:52:50'::timestamp)) | | | | Execute on: All Nodes
  • 18.
    Explain Plan (substract)… Access Path:l +-GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 14M, Rows: 5M (NO STATISTICS)] (PATH ID: 1) | Aggregates: count(DISTINCT a11.VS_LP_SESSION_ID) | Group By: a12.DT_WEEK, a11.LP_ACCOUNT_ID | Execute on: All Nodes | +---> GROUPBY HASH (SORT OUTPUT) [Cost: 6M, Rows: 100M (NO STATISTICS)] (PATH ID: 2) | | Group By: a12.DT_WEEK, a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID | | Execute on: All Nodes | | +---> JOIN HASH [Cost: 944K, Rows: 372M (NO STATISlTICS)] (PATH ID: 3) | | | Join Cond: (a11.VISIT_FROM_DT_TRUNC = a12.DATE_TIME_ID) | | | Materialize at Output: a11.VS_LP_SESSION_ID, a11.LP_ACCOUNT_ID | | | Execute on: All Nodes Time: First fetch (6 rows): 56654.894 ms. All rows formatted: 56654.988 ms
  • 19.
    Solution one -Functions select week(a11.VISIT_FROM_DT_TRUNC) AS DT_WEEK, a11.LP_ACCOUNT_ID AS LP_ACCOUNT_ID, count(distinct a11.VS_LP_SESSION_ID) AS Visits, (count(distinct a11.VS_LP_SESSION_ID) * 1.0) AS WJXBFS1 from zzz.FACT_VISIT a11 where (a11.LP_ACCOUNT_ID in ('57386690') and a11.VISIT_FROM_DT_TRUNC between '2011-09-01 15:28:00' and '2011-12-31 12:52:50') group by week(a11.VISIT_FROM_DT_TRUNC), a11.LP_ACCOUNT_ID; Access Path: +-GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 127, Rows: 1 (STALE STATISTICS)] (PATH ID: 1) | Aggregates: count(DISTINCT a11.VS_LP_SESSION_ID) | Group By: <SVAR>, a11.LP_ACCOUNT_ID | Execute on: All Nodes | +---> GROUPBY HASH (SORT OUTPUT) [Cost: 126, Rows: 1 (STALE STATISTICS)] (PATH ID: 2) | | Group By: (date_part('week', a11.VISIT_FROM_DT_TRUNC))::int, a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID | | Execute on: All Nodes | | +---> STORAGE ACCESS for a11 [Cost: 125, Rows: 1 (STALE STATISTICS)] (PATH ID: 3) | | | Projection: zzz.FACT_VISIT_b0 Time: First fetch (6 rows): 33453.997 ms. All rows formatted: 33454.154 ms Saved the Join Time
  • 20.
    Solution Two- PreJoinProjection Pros Cons • Eliminate Join overhead • Not Flexible • Maintain By Vertica • Cause Overhead on Load • Need Primary/Foreign Key • Maintenance Restrictions
  • 21.
    Solution Two- PreJoinProjection order by LP_ACCOUNT_ID,VISIT_FROM_DT_TRUNC,DT_WEEK,HOT_LEAD_IND,DATE_TIME_ID,VS_LP_SESSION_ID Access Path: +-GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 12K, Rows: 10K] (PATH ID: 1) | Aggregates: count(DISTINCT visit_date_time_prejoin8_b0.VS_LP_SESSION_ID) | Group By: visit_date_time_prejoin8_b0.DT_WEEK, visit_date_time_prejoin8_b0.LP_ACCOUNT_ID | Execute on: All Nodes | +---> GROUPBY HASH (SORT OUTPUT) [Cost: 11K, Rows: 10K] (PATH ID: 2) | | Group By: visit_date_time_prejoin8_b0.DT_WEEK, visit_date_time_prejoin8_b0.LP_ACCOUNT_ID, visit_date_time_prejoin8_b0.VS_LP_SESSION_ID | | Execute on: All Nodes | | +---> STORAGE ACCESS for <No Alias> [Cost: 8K, Rows: 1M] (PATH ID: 3) | | | Projection: lp_15744040.visit_date_time_prejoin8_b0 Time: First fetch (6 rows): 35312.331 ms. All rows formatted: 35312.421 ms Saved the Join Time
  • 22.
    Solution Two- PreJoinProjection Sorted By DT_WEEK, LP_ACCOUNT_ID, VS_LP_SESSION_ID Access Path: +-GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 542K, Rows: 10K] (PATH ID: 1) | Aggregates: count(DISTINCT visit_date_time_prejoin_z6.VS_LP_SESSION_ID) | Group By: visit_date_time_prejoin_z6.DT_WEEK, visit_date_time_prejoin_z6.LP_ACCOUNT_ID | Execute on: All Nodes | +---> GROUPBY PIPELINED [Cost: 542K, Rows: 10K] (PATH ID: 2) | | Group By: visit_date_time_prejoin_z6.DT_WEEK, visit_date_time_prejoin_z6.VS_LP_SESSION_ID, visit_date_time_prejoin_z6.LP_ACCOUNT_ID | | Execute on: All Nodes | | +---> STORAGE ACCESS for <No Alias> [Cost: 501K, Rows: 15M] (PATH ID: 3) | | | Projection: lp_15744040.visit_date_time_prejoin_z6 || Time: First fetch (6 rows): 3680.853 ms. All rows formatted: 3680.969 ms Saved the Join Time and Group by hash Time
  • 23.
    Solution Three -Denormalize select DT_WEEK, a11.LP_ACCOUNT_ID AS LP_ACCOUNT_ID, count(distinct a11.VS_LP_SESSION_ID) AS Visits, (count(distinct a11.VS_LP_SESSION_ID) * 1.0) AS WJXBFS1 from zzz.FACT_VISIT_Z1 a11 where (a11.LP_ACCOUNT_ID in ('57386690') and a11.VISIT_FROM_DT_TRUNC between '2011-09-01 15:28:00' and '2011-12-31 12:52:50') group by DT_WEEK, a11.LP_ACCOUNT_ID; Access Path: +-GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 3M, Rows: 10K (NO STATISTICS)] (PATH ID: 1) | Aggregates: count(DISTINCT a11.VS_LP_SESSION_ID) | Group By: a11.DT_WEEK, a11.LP_ACCOUNT_ID | Execute on: All Nodes | +---> GROUPBY HASH (SORT OUTPUT) [Cost: 3M, Rows: 10K (NO STATISTICS)] (PATH ID: 2) | | Group By: a11.DT_WEEK, a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID | | Execute on: All Nodes | | +---> STORAGE ACCESS for a11 [Cost: 2M, Rows: 372M (NO STATISTICS)] (PATH ID: 3) | | | Projection: zzz.FACT_VISIT_Z1_super Time: First etch (6 rows): 33885.178 ms. All rows formatted: 33885.253 ms Saved the Join Time
  • 24.
    Solution Three -Denormalize • Changing the projection sort order Access Path: +-GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 588K, Rows: 10K] (PATH ID: 1) | Aggregates: count(DISTINCT a11.VS_LP_SESSION_ID) | Group By: a11.DT_WEEK, a11.LP_ACCOUNT_ID | Execute on: All Nodes | +---> GROUPBY PIPELINED [Cost: 587K, Rows: 10K] (PATH ID: 2) | | Group By: a11.DT_WEEK, a11.VS_LP_SESSION_ID, a11.LP_ACCOUNT_ID | | Execute on: All Nodes | | +---> STORAGE ACCESS for a11 [Cost: 531K, Rows: 20M] (PATH ID: 3) | | | Projection: zzz.fact_visit_z1_pipe | | | Materialize: a11.DT_WEEK, a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID | | | Filter: (a11.LP_ACCOUNT_ID = '57386690') | | | Filter: ((a11.VISIT_FROM_DT_TRUNC >= '2011-09-01 15:28:00'::timestamp) AND (a11.VISIT_FROM_DT_TRUNC <= '2011-12-31 12:52:50'::timestamp)) | | | Execute on: All Nodes Time: First fetch (6 rows): 4313.497 ms. All rows formatted: 4313.600 ms Saved the Join Time and Group by hash Time
  • 25.
    Let’s sum itup… • Keep it simple • Keep it sorted. • Keep it joinless
  • 26.
  • 27.