Introduction to Vertica (Architecture & More)

LivePersonDev is happy to host this meetup with Zvika Gutkin, an Oracle and Vertica expert DBA at LivePerson and a specialist in BI and Big Data.

At LivePerson, we handle enormous amounts of data. We use Vertica to analyse this data in real time.

In this lecture Zvika will cover the following:
1. Present the architecture of Vertica
2. Compare row stores to column stores
3. Explain how Vertica achieves fast query times
4. Show a few use cases
5. Explain what LivePerson does with Vertica and why we chose it
6. Talk about why we love Vertica, and why we hate it
7. Is Vertica a SQL DB or NoSQL? Is Vertica consistent or eventually consistent?
8. How does Vertica differ from other SQL and NoSQL technologies?

Speaker Notes

  • Transactions are a feature that databases have and file systems don't. ACID (Atomicity, Consistency, Isolation, Durability) guarantees that database transactions are processed reliably, taking the database from one consistent state to another:
    A => the transaction happens completely or not at all.
    C => takes the DB from one consistent state to another.
    I => until commit, the transaction is not visible to other transactions.
    D => once a transaction is committed, it is permanent.
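    A minimal sketch of what these guarantees mean in practice (hypothetical accounts table; Vertica accepts standard BEGIN/COMMIT syntax):

        -- Atomicity: both updates commit together or not at all.
        -- Isolation: other sessions do not see the change until COMMIT.
        -- Durability: once committed, the transfer is permanent.
        BEGIN;
        UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
        UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
        COMMIT;  -- or ROLLBACK; to discard both updates together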
  • Which system can support all of those requirements?
  • Real-time dashboarding – a lot of users with simple set and get => Cassandra.
    Real-time complex analytics – NoSQL doesn't support complex analytics (sessionize, gap filling, event pattern matching, …).
    Billing – short transactions, auditing, a lot of small DMLs => Oracle.
    Blog site – big documents (threads and comments), fast response, a huge number of concurrent users, unstructured data => Couchbase.
    At LivePerson: Cassandra => updated every few seconds: average chat time, how long an agent was logged in, "back in 5", etc. Vertica => how many visits, engagements, and conversions; conversion per type over a period of time; average order value; conversion lift.
  • Minimum requirements: CPU => six- or eight-core CPUs. Memory => 4 GB of memory per physical CPU core. Storage throughput => 20 MB/s per physical core. Network => 1 Gb; 10 Gb networking is highly recommended.
  • WOS – row store. ROS – column store. Moveout – moves data from WOS to ROS. Mergeout – merges ROS containers.
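    The Tuple Mover runs these tasks automatically, but they can also be triggered by hand; a sketch using Vertica's DO_TM_TASK (the table name is hypothetical):

        SELECT DO_TM_TASK('moveout');                   -- flush WOS rows into sorted, compressed ROS containers
        SELECT DO_TM_TASK('mergeout');                  -- merge small ROS containers into larger ones
        SELECT DO_TM_TASK('moveout', 'lp.fact_visit');  -- restrict the task to one table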
  • Super projection – a projection that contains all columns of a logical table.
    Query-specific projection – a subset of columns, sorted for a specific query or class of queries.
    Pre-join projection – stores the result of a join between a fact table and one or more dimension tables.
    Buddy projection – a projection with the same columns and segmentation on different nodes, to provide high availability (HA).
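    For illustration, a hedged sketch of a super projection (names assumed; KSAFE 1 makes Vertica keep buddy copies on other nodes):

        -- Super projection: every column of the logical table, with buddies for HA.
        CREATE PROJECTION lp.fact_visit_super AS
        SELECT * FROM lp.fact_visit
        ORDER BY lp_account_id, visit_from_dt_trunc
        SEGMENTED BY HASH(vs_lp_session_id) ALL NODES KSAFE 1;

    A concrete pre-join projection (visit_date_time_prejoin_z6) appears later in these notes.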
  • Encoding types:
    AUTO (default) – for strings => LZO; for numbers => delta.
    DELTAVAL => data is recorded as a difference from the smallest value in the data block.
    RLE => replaces sequences (runs) of identical values with a single pair that contains the value and the number of occurrences.
    Others… (BLOCK_DICT, …)
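    Encodings are declared per projection column; a small sketch with assumed names, in the same style as the pre-join projection below:

        CREATE PROJECTION lp.fact_visit_enc
        ( lp_account_id       ENCODING RLE,      -- few distinct values per sorted run => RLE
          visit_from_dt_trunc ENCODING DELTAVAL, -- stored as deltas from the block minimum
          vs_lp_session_id )                     -- AUTO default: LZO for strings, delta for numbers
        AS SELECT lp_account_id, visit_from_dt_trunc, vs_lp_session_id
           FROM lp.fact_visit
           ORDER BY lp_account_id, visit_from_dt_trunc
           SEGMENTED BY HASH(vs_lp_session_id) ALL NODES;

    RLE only pays off on columns that are sorted (or very low cardinality), which is why sort order and encoding are chosen together.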
  • Delete performance: all projections must be optimized for deletes – every projection for a table should contain all columns used in the delete WHERE clause.
    Replay delete – data marked for deletion after the current AHM must be preserved through mergeout via a process called "replay delete". Replay delete is slow-running when there are many deletes to be replayed, and it holds a T lock, preventing additional deletes.
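    A hedged sketch of the related housekeeping, using Vertica's AHM and purge functions (assuming all nodes are up, since the AHM cannot advance while a node is down):

        SELECT GET_AHM_EPOCH();               -- where the Ancient History Mark currently stands
        SELECT MAKE_AHM_NOW();                -- advance the AHM to the latest epoch
        SELECT PURGE_TABLE('lp.fact_visit');  -- rewrite ROS containers without the deleted rows (hypothetical table)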
  • Lack of features – procedural language, good repository tables, explain-plan trace, LOBs, VPD. Documentation.
    Keep queries simple – the optimizer is not mature and will not pick the right projections.
    Use the right columns – they need to be in the projection.
    Use ORDER BY – use it even when not needed by the query.
    Check the join column – when both projections are ordered on it, Vertica can use a merge join; when both are segmented on it, it will use a local join. A sketch for verifying this follows.
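    To see which join the optimizer picked, inspect the plan (table names taken from the examples later in this deck):

        -- If both projections are sorted on the join key, expect JOIN MERGE
        -- in the access path instead of JOIN HASH (no hash-table build phase).
        EXPLAIN
        SELECT f.LP_ACCOUNT_ID, d.DT_WEEK
        FROM zzz.FACT_VISIT f
        JOIN zzz.DIM_DATE_TIME d ON f.VISIT_FROM_DT_TRUNC = d.DATE_TIME_ID;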
  • A hash table must be completely built before results can be output to the user.
  • Group By Pipelined uses less memory and runs faster than Group By Hash. It is critical for aggregating large amounts of data or a large number of groups – it can stream an infinite number of tuples.
  • CREATE PROJECTION lp_15744040.visit_date_time_prejoin_z6
    ( DT_WEEK ENCODING RLE,
      LP_ACCOUNT_ID ENCODING RLE,
      VS_LP_SESSION_ID,
      VISIT_FROM_DT_TRUNC ENCODING RLE,
      DATE_TIME_ID )
    AS SELECT DIM_DATE_TIME.DT_WEEK,
              FACT_VISIT.LP_ACCOUNT_ID,
              FACT_VISIT.VS_LP_SESSION_ID,
              FACT_VISIT.VISIT_FROM_DT_TRUNC,
              DIM_DATE_TIME.DATE_TIME_ID
       FROM lp_15744040.FACT_VISIT
       JOIN lp_15744040.DIM_DATE_TIME
         ON FACT_VISIT.VISIT_FROM_DT_TRUNC = DIM_DATE_TIME.DATE_TIME_ID
       ORDER BY DIM_DATE_TIME.DT_WEEK,
                FACT_VISIT.LP_ACCOUNT_ID,
                FACT_VISIT.VS_LP_SESSION_ID
       SEGMENTED BY HASH(FACT_VISIT.VS_LP_SESSION_ID) ALL NODES;
  • Create the fact table with a week column. We can add month, day, and whatever else we want.
  • CREATE PROJECTION zzz.fact_visit_z1_pipe
    ( DT_WEEK ENCODING RLE,
      LP_ACCOUNT_ID ENCODING RLE,
      VS_LP_SESSION_ID,
      VISIT_FROM_DT_TRUNC ENCODING RLE )
    AS SELECT FACT_VISIT.DT_WEEK,
              FACT_VISIT.LP_ACCOUNT_ID,
              FACT_VISIT.VS_LP_SESSION_ID,
              FACT_VISIT.VISIT_FROM_DT_TRUNC
       FROM zzz.FACT_VISIT_Z1 AS FACT_VISIT
       ORDER BY FACT_VISIT.DT_WEEK,
                FACT_VISIT.LP_ACCOUNT_ID,
                FACT_VISIT.VS_LP_SESSION_ID
       SEGMENTED BY HASH(FACT_VISIT.VS_LP_SESSION_ID) ALL NODES;
  • Vertica can handle joins, even large ones, but if you can avoid a join, do so. Denormalize as much as you can.

Transcript

  • 1. Vertica – Zvika Gutkin, DB Expert – Zvika.gutkin@gmail.com
  • 2. Agenda • Vertica VS the world • What is Vertica • How does it work • How To Use Vertica … (The Right Way ) • Where It Falls Short • Drill Down to SQL’s… (Group by & Joins )
  • 3. • 1,000,000 concurrent users • 1,000,000 operations/s • Microsecond read & write latency • Complex analytics queries with seconds of latency • ACID
  • 4. Vertica Oracle Couchbase Cassandra Mongo MySql Exadata
  • 5. Vertica VS the World

                        Vertica                 Oracle                  Cassandra                  Couchbase
    Scale               MPP                     Single server*          MPP                        MPP
    Data model          Relational, structured  Relational, structured  Column store, schema-less  Document, schema-less
    Transaction model   ACID                    ACID                    Eventually consistent      Consistent
    DR                  Application solution    Standby read-only       Active-Active              Active-Active
    Development         SQL…                    SQL…                    Python, Java, CQL…         Python, Java, PHP…
    Best for            Analytics               Generic, OLTP           Write-intensive key-value  Read/write-intensive JSON documents
    CAP                 CP                      N/A                     AP                         CP
  • 6. Use Cases • Real-time dashboarding (5,000 concurrent users, heavy writes and simple fetches) • Real-time complex analytics • Billing • Blog site
  • 7. MPP-Columnar DBMS • 10x –100x performance of classic RDBMS • Linear Scale • SQL • Commodity Hardware • Built-in fault tolerance
  • 8. 10x–100x performance of classic RDBMS – column store architecture: • High compression rates • Sorted columns • Object segmentation/replication
  • 9. Regular table (Create Table …)

    Continent  Country  City      Size   Size type  Population
    Asia       Israel   Tel Aviv  52000  Acres      450000
    N.America  USA      Dallas    385    Sq. miles  1200000
  • 10. Rows Vs Columns – row-store block layout (each row's values span consecutive blocks):
    Block 1: Asia, Israel, Sq. miles, Tel Aviv
    Block 2: 52000, 450000, Asia
    Block 3: Israel, Sq. miles, Jerusalem
    Block 4: 78000, 800000, N.America
    Block 5: Usa, Dallas, Sq. miles, 385
    Block 6: 1200000, Asia, Israel
    Block 7: Haifa, Sq. miles, 63000
    Block 8: 268000, N.America, Usa
    Block 9: New York, Sq. miles, 468, 8200000
  • 11. Rows VS Columns • Conversion table (~2 billion rows a month): Oracle uncompressed => 418 GB; Oracle compressed (manual) => 147 GB; Vertica => 21 GB
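    A hedged way to measure the Vertica side of such a comparison is to sum compressed storage from the v_monitor.projection_storage system table (the table name here is illustrative):

        SELECT projection_name,
               SUM(used_bytes) / (1024*1024*1024) AS used_gb  -- compressed size on disk
        FROM v_monitor.projection_storage
        WHERE anchor_table_name = 'CONVERSION'                -- hypothetical table
        GROUP BY projection_name
        ORDER BY used_gb DESC;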
  • 12. How Does It Work?
  • 13. Tuple Mover
  • 14. Tuple Mover Flow [diagram: rows arrive in the WOS in row format (e.g. Asia, Israel, Tel Aviv, Sq. miles, 52000, 450000); moveout writes them to ROS as sorted, run-length-encoded column values (Asia,2 / N.America,3 / Israel,2 / Usa,1 …); mergeout combines ROS containers, merging the counts (e.g. Asia,23 + Asia,2 => Asia,25; N.America,13 + N.America,3 => N.America,16)]
  • 15. Projections • Physical structure of the table (logical) • Stored sorted and compressed • Internal maintenance • At least one (super) projection • Projection Types: – Super projection – Query specific projection – Pre join projection – Buddy projection
  • 16. Projections
  • 17. How to build my projections? • Use DBD • Choose the right columns (general vs specific) • Choose the right sort order • Choose the right encoding • Choose the right column to partition by • Choose the right column to segment by
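    A sketch pulling those choices together on a hypothetical fact table (all names assumed; the Database Designer would normally propose something similar from sample queries):

        -- Partition by month so old partitions can be dropped cheaply.
        CREATE TABLE lp.fact_visit2 (
            lp_account_id    VARCHAR(64),
            vs_lp_session_id VARCHAR(64),
            visit_from_dt    TIMESTAMP
        )
        PARTITION BY EXTRACT(year FROM visit_from_dt) * 100 + EXTRACT(month FROM visit_from_dt);

        -- Query-specific projection: sorted and RLE-encoded on the grouping columns,
        -- segmented on a high-cardinality key to spread rows evenly across nodes.
        CREATE PROJECTION lp.fact_visit2_q1
        ( lp_account_id ENCODING RLE,
          visit_from_dt ENCODING RLE,
          vs_lp_session_id )
        AS SELECT lp_account_id, visit_from_dt, vs_lp_session_id
           FROM lp.fact_visit2
           ORDER BY lp_account_id, visit_from_dt
           SEGMENTED BY HASH(vs_lp_session_id) ALL NODES;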
  • 18. Rules of thumb (don't tell Tom Kyte) • Avoid "select * …" • Denormalize • Use bulk loads for DMLs (sketch below) • Use merge joins for large joins • Understand Vertica architecture & your data
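    For the bulk-load rule, a hedged example (file path and delimiter assumed):

        -- One COPY instead of many single-row INSERTs; DIRECT writes straight
        -- to ROS, bypassing the WOS, which suits large batch files.
        COPY zzz.FACT_VISIT FROM '/data/visits.csv' DELIMITER ',' DIRECT;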
  • 19. Delete/Update • Deleted rows are only marked as deleted • Stored in a delete vector on disk • Queries merge the ROS and delete vectors to filter out deleted records • Data is removed asynchronously during mergeout
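    A sketch for monitoring how much deleted data is still pending, via the v_monitor.delete_vectors system table:

        SELECT schema_name,
               projection_name,
               SUM(deleted_row_count) AS deleted_rows
        FROM v_monitor.delete_vectors
        GROUP BY schema_name, projection_name
        ORDER BY deleted_rows DESC;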
  • 20. Delete/Update – strata issue [chart comparing delete behavior at 500 MB, 2 GB, and 4 GB]
  • 21. Where It Falls Short … • Lack of Features • Documentation • Good for specific types of queries
  • 22. Let's Dive into SQL Examples 1. Sort Optimization 2. Join Optimization
  • 23. Choose the Right Sort Order – Example

    select a11.LP_ACCOUNT_ID AS LP_ACCOUNT_ID,
           count(distinct a11.VS_LP_SESSION_ID) AS Visits,
           (count(distinct a11.VS_LP_SESSION_ID) * 1.0) AS WJXBFS1
    from lp_15744040.FACT_VISIT_ROOM a11
    group by a11.LP_ACCOUNT_ID;
  • 24. First projection… FACT_VISIT_ROOM_bad, sort order (column_position = sort_position):
    0 VS_LP_SESSION_ID, 1 LP_ACCOUNT_ID, 2 VS_LP_VISITOR_ID, 3 VISIT_FROM_DT_TRUNC, 4 ACCOUNT_ID, 5 ROOM_ID, 6 VISIT_FROM_DT_ACTUAL, 7 VISIT_TO_DT_ACTUAL, 8 HOT_LEAD_IND

    Access Path:
    +-GROUPBY PIPELINED [Cost: 7M, Rows: 10K] (PATH ID: 1)
    |  Aggregates: count(DISTINCT a11.VS_LP_SESSION_ID)
    |  Group By: a11.LP_ACCOUNT_ID
    | +---> GROUPBY HASH (SORT OUTPUT) [Cost: 7M, Rows: 10K] (PATH ID: 2)
    | |      Group By: a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
    | | +---> STORAGE ACCESS for a11 [Cost: 5M, Rows: 199M] (PATH ID: 3)
    | | |      Projection: lp_15744040.FACT_VISIT_ROOM_bad
    | | |      Materialize: a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
  • 25. Second projection… FACT_VISIT_ROOM_fix1, sort order (column_position = sort_position):
    0 LP_ACCOUNT_ID, 1 VS_LP_SESSION_ID, 2 VS_LP_VISITOR_ID, 3 VISIT_FROM_DT_TRUNC, 4 ACCOUNT_ID, 5 ROOM_ID, 6 VISIT_FROM_DT_ACTUAL, 7 VISIT_TO_DT_ACTUAL, 8 HOT_LEAD_IND

    Access Path:
    +-GROUPBY PIPELINED [Cost: 7M, Rows: 10K] (PATH ID: 1)
    |  Aggregates: count(DISTINCT a11.VS_LP_SESSION_ID)
    |  Group By: a11.LP_ACCOUNT_ID
    | +---> GROUPBY PIPELINED [Cost: 7M, Rows: 10K] (PATH ID: 2)
    | |      Group By: a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
    | | +---> STORAGE ACCESS for a11 [Cost: 5M, Rows: 199M] (PATH ID: 3)
    | | |      Projection: lp_15744040.FACT_VISIT_ROOM_fix1
    | | |      Materialize: a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
  • 26. Results…
    First projection (GROUPBY HASH (SORT OUTPUT)): First fetch (7 rows): 264527.916 ms. All rows formatted: 264527.978 ms
    Second projection (GROUPBY PIPELINED): First fetch (7 rows): 38913.909 ms. All rows formatted: 38913.965 ms
  • 27. Group By Hash – input not sorted [diagram: a hash table of (value, count) pairs – A, B, C, D – is built from unsorted input before any result can be returned]
  • 28. Group By Pipe Operator – input sorted [diagram: counts are emitted as soon as each sorted group ends, so results stream without a hash table]
  • 29. Join Example

    select a12.DT_WEEK AS DT_WEEK,
           a11.LP_ACCOUNT_ID AS LP_ACCOUNT_ID,
           count(distinct a11.VS_LP_SESSION_ID) AS Visits,
           (count(distinct a11.VS_LP_SESSION_ID) * 1.0) AS WJXBFS1
    from zzz.FACT_VISIT a11
    join zzz.DIM_DATE_TIME a12
      on (a11.VISIT_FROM_DT_TRUNC = a12.DATE_TIME_ID)
    where (a11.LP_ACCOUNT_ID in ('57386690')
       and a11.VISIT_FROM_DT_TRUNC between '2011-09-01 15:28:00' and '2011-12-31 12:52:50')
    group by a12.DT_WEEK, a11.LP_ACCOUNT_ID;

    Filter:   LP_ACCOUNT_ID, VISIT_FROM_DT_TRUNC
    Group By: DT_WEEK, LP_ACCOUNT_ID
    Join:     VISIT_FROM_DT_TRUNC = DATE_TIME_ID
    Select:   DT_WEEK, LP_ACCOUNT_ID, VS_LP_SESSION_ID
  • 30. Full Explain Plan…

    Access Path:
    +-GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 14M, Rows: 5M (NO STATISTICS)] (PATH ID: 1)
    |  Aggregates: count(DISTINCT a11.VS_LP_SESSION_ID)
    |  Group By: a12.DT_WEEK, a11.LP_ACCOUNT_ID
    |  Execute on: All Nodes
    | +---> GROUPBY HASH (SORT OUTPUT) [Cost: 6M, Rows: 100M (NO STATISTICS)] (PATH ID: 2)
    | |      Group By: a12.DT_WEEK, a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
    | |      Execute on: All Nodes
    | | +---> JOIN HASH [Cost: 944K, Rows: 372M (NO STATISTICS)] (PATH ID: 3)
    | | |      Join Cond: (a11.VISIT_FROM_DT_TRUNC = a12.DATE_TIME_ID)
    | | |      Materialize at Output: a11.VS_LP_SESSION_ID, a11.LP_ACCOUNT_ID
    | | |      Execute on: All Nodes
    | | | +-- Outer -> STORAGE ACCESS for a11 [Cost: 421K, Rows: 372M (NO STATISTICS)] (PATH ID: 4)
    | | | |      Projection: zzz.FACT_VISIT_b0
    | | | |      Materialize: a11.VISIT_FROM_DT_TRUNC
    | | | |      Filter: (a11.LP_ACCOUNT_ID = '57386690')
    | | | |      Filter: ((a11.VISIT_FROM_DT_TRUNC >= '2011-09-01 15:28:00'::timestamp) AND (a11.VISIT_FROM_DT_TRUNC <= '2011-12-31 12:52:50'::timestamp))
    | | | |      Execute on: All Nodes
    | | | +-- Inner -> STORAGE ACCESS for a12 [Cost: 1K, Rows: 10K (NO STATISTICS)] (PATH ID: 5)
    | | | |      Projection: zzz.DIM_DATE_TIME_node0004
    | | | |      Materialize: a12.DATE_TIME_ID, a12.DT_WEEK
    | | | |      Filter: ((a12.DATE_TIME_ID >= '2011-09-01 15:28:00'::timestamp) AND (a12.DATE_TIME_ID <= '2011-12-31 12:52:50'::timestamp))
    | | | |      Execute on: All Nodes
  • 31. Explain Plan (extract)…

    Access Path:
    +-GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 14M, Rows: 5M (NO STATISTICS)] (PATH ID: 1)
    |  Aggregates: count(DISTINCT a11.VS_LP_SESSION_ID)
    |  Group By: a12.DT_WEEK, a11.LP_ACCOUNT_ID
    |  Execute on: All Nodes
    | +---> GROUPBY HASH (SORT OUTPUT) [Cost: 6M, Rows: 100M (NO STATISTICS)] (PATH ID: 2)
    | |      Group By: a12.DT_WEEK, a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
    | |      Execute on: All Nodes
    | | +---> JOIN HASH [Cost: 944K, Rows: 372M (NO STATISTICS)] (PATH ID: 3)
    | | |      Join Cond: (a11.VISIT_FROM_DT_TRUNC = a12.DATE_TIME_ID)
    | | |      Materialize at Output: a11.VS_LP_SESSION_ID, a11.LP_ACCOUNT_ID
    | | |      Execute on: All Nodes

    Time: First fetch (6 rows): 56654.894 ms. All rows formatted: 56654.988 ms
  • 32. Solution One – Functions

    select week(a11.VISIT_FROM_DT_TRUNC) AS DT_WEEK,
           a11.LP_ACCOUNT_ID AS LP_ACCOUNT_ID,
           count(distinct a11.VS_LP_SESSION_ID) AS Visits,
           (count(distinct a11.VS_LP_SESSION_ID) * 1.0) AS WJXBFS1
    from zzz.FACT_VISIT a11
    where (a11.LP_ACCOUNT_ID in ('57386690')
       and a11.VISIT_FROM_DT_TRUNC between '2011-09-01 15:28:00' and '2011-12-31 12:52:50')
    group by week(a11.VISIT_FROM_DT_TRUNC), a11.LP_ACCOUNT_ID;

    Access Path:
    +-GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 127, Rows: 1 (STALE STATISTICS)] (PATH ID: 1)
    |  Aggregates: count(DISTINCT a11.VS_LP_SESSION_ID)
    |  Group By: <SVAR>, a11.LP_ACCOUNT_ID
    |  Execute on: All Nodes
    | +---> GROUPBY HASH (SORT OUTPUT) [Cost: 126, Rows: 1 (STALE STATISTICS)] (PATH ID: 2)
    | |      Group By: (date_part('week', a11.VISIT_FROM_DT_TRUNC))::int, a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
    | |      Execute on: All Nodes
    | | +---> STORAGE ACCESS for a11 [Cost: 125, Rows: 1 (STALE STATISTICS)] (PATH ID: 3)
    | | |      Projection: zzz.FACT_VISIT_b0

    Time: First fetch (6 rows): 33453.997 ms. All rows formatted: 33454.154 ms
    Saved the join time.
  • 33. Solution Two – Pre-Join Projection
    Pros: • Eliminates join overhead • Maintained by Vertica
    Cons: • Not flexible • Causes overhead on load • Needs a primary/foreign key • Maintenance restrictions
  • 34. Solution Two – Pre-Join Projection
    Order by: LP_ACCOUNT_ID, VISIT_FROM_DT_TRUNC, DT_WEEK, HOT_LEAD_IND, DATE_TIME_ID, VS_LP_SESSION_ID

    Access Path:
    +-GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 12K, Rows: 10K] (PATH ID: 1)
    |  Aggregates: count(DISTINCT visit_date_time_prejoin8_b0.VS_LP_SESSION_ID)
    |  Group By: visit_date_time_prejoin8_b0.DT_WEEK, visit_date_time_prejoin8_b0.LP_ACCOUNT_ID
    |  Execute on: All Nodes
    | +---> GROUPBY HASH (SORT OUTPUT) [Cost: 11K, Rows: 10K] (PATH ID: 2)
    | |      Group By: visit_date_time_prejoin8_b0.DT_WEEK, visit_date_time_prejoin8_b0.LP_ACCOUNT_ID, visit_date_time_prejoin8_b0.VS_LP_SESSION_ID
    | |      Execute on: All Nodes
    | | +---> STORAGE ACCESS for <No Alias> [Cost: 8K, Rows: 1M] (PATH ID: 3)
    | | |      Projection: lp_15744040.visit_date_time_prejoin8_b0

    Time: First fetch (6 rows): 35312.331 ms. All rows formatted: 35312.421 ms
    Saved the join time.
  • 35. Solution Two – Pre-Join Projection
    Sorted by: DT_WEEK, LP_ACCOUNT_ID, VS_LP_SESSION_ID

    Access Path:
    +-GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 542K, Rows: 10K] (PATH ID: 1)
    |  Aggregates: count(DISTINCT visit_date_time_prejoin_z6.VS_LP_SESSION_ID)
    |  Group By: visit_date_time_prejoin_z6.DT_WEEK, visit_date_time_prejoin_z6.LP_ACCOUNT_ID
    |  Execute on: All Nodes
    | +---> GROUPBY PIPELINED [Cost: 542K, Rows: 10K] (PATH ID: 2)
    | |      Group By: visit_date_time_prejoin_z6.DT_WEEK, visit_date_time_prejoin_z6.VS_LP_SESSION_ID, visit_date_time_prejoin_z6.LP_ACCOUNT_ID
    | |      Execute on: All Nodes
    | | +---> STORAGE ACCESS for <No Alias> [Cost: 501K, Rows: 15M] (PATH ID: 3)
    | | |      Projection: lp_15744040.visit_date_time_prejoin_z6

    Time: First fetch (6 rows): 3680.853 ms. All rows formatted: 3680.969 ms
    Saved the join time and the Group By Hash time.
  • 36. Solution Three – Denormalize

    select DT_WEEK,
           a11.LP_ACCOUNT_ID AS LP_ACCOUNT_ID,
           count(distinct a11.VS_LP_SESSION_ID) AS Visits,
           (count(distinct a11.VS_LP_SESSION_ID) * 1.0) AS WJXBFS1
    from zzz.FACT_VISIT_Z1 a11
    where (a11.LP_ACCOUNT_ID in ('57386690')
       and a11.VISIT_FROM_DT_TRUNC between '2011-09-01 15:28:00' and '2011-12-31 12:52:50')
    group by DT_WEEK, a11.LP_ACCOUNT_ID;

    Access Path:
    +-GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 3M, Rows: 10K (NO STATISTICS)] (PATH ID: 1)
    |  Aggregates: count(DISTINCT a11.VS_LP_SESSION_ID)
    |  Group By: a11.DT_WEEK, a11.LP_ACCOUNT_ID
    |  Execute on: All Nodes
    | +---> GROUPBY HASH (SORT OUTPUT) [Cost: 3M, Rows: 10K (NO STATISTICS)] (PATH ID: 2)
    | |      Group By: a11.DT_WEEK, a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
    | |      Execute on: All Nodes
    | | +---> STORAGE ACCESS for a11 [Cost: 2M, Rows: 372M (NO STATISTICS)] (PATH ID: 3)
    | | |      Projection: zzz.FACT_VISIT_Z1_super

    Time: First fetch (6 rows): 33885.178 ms. All rows formatted: 33885.253 ms
    Saved the join time.
  • 37. Solution Three – Denormalize • Changing the projection sort order

    Access Path:
    +-GROUPBY PIPELINED (RESEGMENT GROUPS) [Cost: 588K, Rows: 10K] (PATH ID: 1)
    |  Aggregates: count(DISTINCT a11.VS_LP_SESSION_ID)
    |  Group By: a11.DT_WEEK, a11.LP_ACCOUNT_ID
    |  Execute on: All Nodes
    | +---> GROUPBY PIPELINED [Cost: 587K, Rows: 10K] (PATH ID: 2)
    | |      Group By: a11.DT_WEEK, a11.VS_LP_SESSION_ID, a11.LP_ACCOUNT_ID
    | |      Execute on: All Nodes
    | | +---> STORAGE ACCESS for a11 [Cost: 531K, Rows: 20M] (PATH ID: 3)
    | | |      Projection: zzz.fact_visit_z1_pipe
    | | |      Materialize: a11.DT_WEEK, a11.LP_ACCOUNT_ID, a11.VS_LP_SESSION_ID
    | | |      Filter: (a11.LP_ACCOUNT_ID = '57386690')
    | | |      Filter: ((a11.VISIT_FROM_DT_TRUNC >= '2011-09-01 15:28:00'::timestamp) AND (a11.VISIT_FROM_DT_TRUNC <= '2011-12-31 12:52:50'::timestamp))
    | | |      Execute on: All Nodes

    Time: First fetch (6 rows): 4313.497 ms. All rows formatted: 4313.600 ms
    Saved the join time and the Group By Hash time.
  • 38. Let's sum it up… Keep it simple. Keep it sorted. Keep it joinless.
  • 39. Questions?
  • 40. Thank You