This document discusses various ways that window functions can be used to analyze event data. It provides examples and templates for calculating cumulative sums, growth rates, identifying first events, sessionizing events, finding sequence lengths, joining on time intervals, and deduplicating records. Common use cases include analyzing trends over time, identifying changes or transitions, joining related events, and cleaning duplicate data. Templates are provided that can be adapted for different analyses involving partitions, orders, lags, leads and rankings.
7. NULLs
• NULL values are treated as their own group in the partitioning columns.
• They are sorted and ranked according to the NULLS FIRST or NULLS LAST option.
• The default is vendor dependent: PostgreSQL and Oracle, for example, sort and rank NULL values last in ASC ordering and first in DESC ordering, while SQL Server, MySQL and SQLite do the opposite.
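The grouping behaviour of NULLs can be checked with a minimal sketch. It assumes Python's built-in `sqlite3` module backed by SQLite 3.25 or later (the first version with window functions); the `events` rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_id INTEGER, type TEXT)")
# Hypothetical sample rows: two events have no type (NULL).
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "a"), (2, None), (3, "a"), (4, None), (5, "b")],
)

# COUNT(*) OVER (PARTITION BY type) reports the size of each row's partition,
# so rows that share a partition report the same count.
null_rows = conn.execute(
    """
    SELECT event_id,
           type,
           COUNT(*) OVER (PARTITION BY type) AS rows_in_partition
    FROM events
    ORDER BY event_id
    """
).fetchall()

for row in null_rows:
    print(row)
```

Events 2 and 4 both report a partition size of 2, which shows the two NULLs landing in one shared partition rather than in one partition each.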
13. Supported window functions
• Window functions are vendor dependent!
• Most vendors support all the basic window functions, such as avg, min, max, count, sum, lead, lag and rank.
• Beyond these, each vendor supports its own specific set of window functions.
17. Our Business use case
Let's say we have the following events table:
Event_id  time   user_id  type  click_num
1         18:00  1        a     2
2         18:20  1        a     3
3         18:59  1        b     4
4         18:00  2        b     1
21. Calculating Cumulative sum via sum
SELECT event_id,
       time,
       user_id,
       type,
       -- running total of clicks per user, in time order
       sum(click_num) over (PARTITION BY user_id ORDER BY time
                            ROWS UNBOUNDED PRECEDING) AS cumulative_clicks
FROM t
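As a quick check, the cumulative-sum query can be run against the events table from the business use case. This is a sketch that assumes Python's built-in `sqlite3` module with SQLite 3.25 or later; the alias `cumulative_clicks` is a name chosen here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (event_id INTEGER, time TEXT, user_id INTEGER, "
    "type TEXT, click_num INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (1, "18:00", 1, "a", 2),
        (2, "18:20", 1, "a", 3),
        (3, "18:59", 1, "b", 4),
        (4, "18:00", 2, "b", 1),
    ],
)

# Running total of click_num per user, accumulated in time order.
cumulative_rows = conn.execute(
    """
    SELECT event_id,
           user_id,
           SUM(click_num) OVER (PARTITION BY user_id ORDER BY time
                                ROWS UNBOUNDED PRECEDING) AS cumulative_clicks
    FROM t
    ORDER BY event_id
    """
).fetchall()

for row in cumulative_rows:
    print(row)
# (1, 1, 2)
# (2, 1, 5)
# (3, 1, 9)
# (4, 2, 1)
```

The total restarts at user 2 because the partition changes.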
33. Calculate Growth
• Calculate the growth in accounts month over month
• Calculate the growth in order amount per store month over month
• Calculate the growth in order amount per store and opportunity (HARD :P)
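The first growth question can be sketched with LAG, which reads the previous month's value on the current row. The table `monthly_accounts`, its columns and its figures are hypothetical, and growth is assumed here to mean (current - previous) / previous. The sketch uses Python's built-in `sqlite3` module with SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per month with the number of accounts.
conn.execute("CREATE TABLE monthly_accounts (month TEXT, accounts INTEGER)")
conn.executemany(
    "INSERT INTO monthly_accounts VALUES (?, ?)",
    [("2018-01", 100), ("2018-02", 120), ("2018-03", 90)],
)

# LAG(accounts) is the previous month's value; it is NULL for the first month,
# so the first growth value is NULL as well. Multiplying by 1.0 avoids
# integer division.
growth_rows = conn.execute(
    """
    SELECT month,
           accounts,
           (accounts - LAG(accounts) OVER (ORDER BY month)) * 1.0
               / LAG(accounts) OVER (ORDER BY month) AS growth
    FROM monthly_accounts
    ORDER BY month
    """
).fetchall()

for row in growth_rows:
    print(row)
```

For the per-store variants, adding `PARTITION BY store_id` (a hypothetical column) to both window definitions would keep each store's months separate.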
38. Indicate the first event for each user
SELECT event_id,
       time,
       user_id,
       type,
       -- 1 for each user's first event, 0 for every later event
       floor(1 / row_number() over (PARTITION BY user_id ORDER BY time)) AS is_first
FROM t
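The first-event indicator can be tried on the events table with the sketch below. It assumes Python's built-in `sqlite3` module with SQLite 3.25 or later. `floor()` is left out because SQLite already truncates integer division and does not always ship the math functions; the alias `is_first` is a name chosen here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (event_id INTEGER, time TEXT, user_id INTEGER, "
    "type TEXT, click_num INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (1, "18:00", 1, "a", 2),
        (2, "18:20", 1, "a", 3),
        (3, "18:59", 1, "b", 4),
        (4, "18:00", 2, "b", 1),
    ],
)

# row_number() is 1 for a user's earliest event, so the integer division
# 1 / row_number() yields 1 there and 0 for every later event.
first_rows = conn.execute(
    """
    SELECT event_id,
           user_id,
           1 / ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY time) AS is_first
    FROM t
    ORDER BY event_id
    """
).fetchall()

for row in first_rows:
    print(row)
# (1, 1, 1)
# (2, 1, 0)
# (3, 1, 0)
# (4, 2, 1)
```

In engines where `/` returns a decimal, the `floor()` from the template is what turns 0.5, 0.33 and so on into 0. A `CASE WHEN row_number() ... = 1` expression is an engine-independent alternative.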
44. Indicate the first template use-cases
• Our analysis needs to indicate rank without changing granularity (for example, by filtering or by summing a finer granularity).
• We want to consolidate tables that record changes, and the source doesn't support it. For example, a table that records transitions between plans might miss the first plan.
• It is not yet clear whether the goal of our query requires changing granularity.
45. Indicate the first template questions
• Show an indicator on the first segment touch point for the store
• Show an indicator for the first product-change event for the store
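54. Creating session template
SELECT *,
       -- running count of session starts = session number within the user
       SUM(new_session) over (PARTITION BY user_id ORDER BY time) AS session_id
FROM ( SELECT *,
              -- a new session starts on the user's first event, or when the gap
              -- since the previous event (LAG, not LEAD) exceeds <INTERVAL>
              CASE WHEN LAG(time) over (PARTITION BY user_id ORDER BY time) IS NULL
                     OR time - LAG(time) over (PARTITION BY user_id ORDER BY time) > <INTERVAL>
                   THEN 1
                   ELSE 0
              END AS new_session
       FROM t
     ) flagged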
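A runnable sketch of the session template follows, using Python's built-in `sqlite3` module with SQLite 3.25 or later. It assumes a 30-minute `<INTERVAL>` and stores the event time as minutes since midnight in a hypothetical `time_min` column, so the gap is plain integer arithmetic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (event_id INTEGER, time_min INTEGER, user_id INTEGER)")
# 18:00, 18:20 and 18:59 expressed as minutes since midnight.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1080, 1), (2, 1100, 1), (3, 1139, 1), (4, 1080, 2)],
)

session_rows = conn.execute(
    """
    SELECT event_id,
           -- running count of session starts gives the session number
           SUM(new_session) OVER (PARTITION BY user_id ORDER BY time_min) AS session_id
    FROM (
        SELECT *,
               -- first event of a user, or a gap above 30 minutes, starts a session
               CASE WHEN prev_time IS NULL OR time_min - prev_time > 30
                    THEN 1 ELSE 0
               END AS new_session
        FROM (
            SELECT *,
                   LAG(time_min) OVER (PARTITION BY user_id ORDER BY time_min) AS prev_time
            FROM t
        )
    )
    ORDER BY event_id
    """
).fetchall()

for row in session_rows:
    print(row)
# (1, 1)
# (2, 1)
# (3, 2)
# (4, 1)
```

Event 3 opens a second session for user 1 because it arrives 39 minutes after event 2.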
57. Creating session with condition
Let's say we don't want to sessionize events with type b:
Event_id  time   user_id  type  session
1         18:00  1        a     1
2         18:20  1        b
3         18:59  1        a     2
4         18:00  2        a     1
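60. Creating session with condition
SELECT *,
       -- rows that fail the condition fall into their own partition,
       -- so they never split or extend the sessions of the other rows
       SUM(new_session) over (PARTITION BY user_id, condition ORDER BY time) AS session_id
FROM ( SELECT *,
              -- gap is measured against the previous event (LAG, not LEAD)
              CASE WHEN LAG(time) over (PARTITION BY user_id, condition ORDER BY time) IS NULL
                     OR time - LAG(time) over (PARTITION BY user_id, condition ORDER BY time) > <INTERVAL>
                   THEN 1
                   ELSE 0
              END AS new_session
       FROM t
     ) flagged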
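The conditional variant can be sketched the same way to reproduce the expected table, in which type b events get no session. Assumptions: Python's built-in `sqlite3` module with SQLite 3.25 or later, a 30-minute `<INTERVAL>`, a hypothetical `time_min` column holding minutes since midnight, and a hypothetical `sessionized` flag standing in for `condition`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (event_id INTEGER, time_min INTEGER, user_id INTEGER, type TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(1, 1080, 1, "a"), (2, 1100, 1, "b"), (3, 1139, 1, "a"), (4, 1080, 2, "a")],
)

conditional_rows = conn.execute(
    """
    WITH flagged AS (
        -- sessionized is 1 for rows that should take part in sessions
        SELECT *, (type <> 'b') AS sessionized FROM t
    ),
    with_prev AS (
        SELECT *,
               LAG(time_min) OVER (PARTITION BY user_id, sessionized
                                   ORDER BY time_min) AS prev_time
        FROM flagged
    ),
    marked AS (
        SELECT *,
               CASE WHEN prev_time IS NULL OR time_min - prev_time > 30
                    THEN 1 ELSE 0
               END AS new_session
        FROM with_prev
    )
    SELECT event_id,
           -- excluded rows keep a NULL session instead of their partition's count
           CASE WHEN sessionized = 1
                THEN SUM(new_session) OVER (PARTITION BY user_id, sessionized
                                            ORDER BY time_min)
           END AS session
    FROM marked
    ORDER BY event_id
    """
).fetchall()

for row in conditional_rows:
    print(row)
# (1, 1)
# (2, None)
# (3, 2)
# (4, 1)
```

Because event 2 sits in a separate partition, the gap for event 3 is measured from event 1 (59 minutes), which is why it starts session 2.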
61. Creating session with condition template use-cases
• Our analysis needs to indicate rank without changing granularity (for example, by filtering).
65. Finding Series length use-cases
• Our analysis needs to check a sequence of improvement.
66. Finding Series length questions
• Find the longest sequence of months with an increase in lead creation
• Find the longest sequence of months with a decrease in tickets for product x
• Find how the sequence of months with a decrease in tickets for product x changed after a bug was fixed
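One possible way to answer the first question is a gaps-and-islands query: flag each month that increased over the previous one, then group consecutive flagged months. This is a sketch, not necessarily the template from the talk. The table `monthly_leads`, its columns and its numbers are hypothetical, and it runs on Python's built-in `sqlite3` module with SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: leads created per month, months numbered 1..8.
conn.execute("CREATE TABLE monthly_leads (month_idx INTEGER, leads INTEGER)")
conn.executemany(
    "INSERT INTO monthly_leads VALUES (?, ?)",
    [(1, 10), (2, 12), (3, 15), (4, 14), (5, 16), (6, 18), (7, 21), (8, 20)],
)

longest_streak = conn.execute(
    """
    WITH diffs AS (
        -- 1 when the month improved on the previous month, else 0
        SELECT month_idx,
               CASE WHEN leads > LAG(leads) OVER (ORDER BY month_idx)
                    THEN 1 ELSE 0
               END AS is_increase
        FROM monthly_leads
    ),
    grouped AS (
        -- every non-increasing month bumps the counter, so each run of
        -- consecutive increases shares one streak_id
        SELECT *,
               SUM(1 - is_increase) OVER (ORDER BY month_idx) AS streak_id
        FROM diffs
    )
    SELECT MAX(streak_len)
    FROM (
        SELECT streak_id, SUM(is_increase) AS streak_len
        FROM grouped
        GROUP BY streak_id
    )
    """
).fetchone()[0]

print(longest_streak)  # 3 (months 5, 6 and 7)
```

Flipping the comparison to `<` would measure the longest run of decreases instead, as in the ticket questions.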
67. Time decay template use-cases
Our analysis includes a calculation in which importance depends on time:
• Calculating a recommended movie: a movie I saw a week ago is probably more relevant to my interests than a movie I saw 5 years ago.
68. Our Business use case
Let's say we have the following package change table:
Event_id  time    user_id  package
1         1/1/18  1        a
2         1/2/18  1        b
3         1/6/18  1        a
4         1/7/18  1        c
71. Join on time interval
Let's say we have the following package change table:
Event_id  From_time  To_time  user_id  package
1         1/1/18     1/2/18   1        a
2         1/2/18     1/6/18   1        b
3         1/6/18     1/7/18   1        a
4         1/7/18     NULL     1        c
72. Join on time interval
Let's say we have the following package change table, with the open-ended To_time replaced by a far-future date (1/1/99):
Event_id  From_time  To_time  user_id  package
1         1/1/18     1/2/18   1        a
2         1/2/18     1/6/18   1        b
3         1/6/18     1/7/18   1        a
4         1/7/18     1/1/99   1        c
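74. Join on time interval
SELECT event_id,
       user_id,
       package,
       time AS from_time,
       -- the latest row has no next event, so close it with a far-future date
       -- (ISNULL is vendor specific; COALESCE is the standard equivalent)
       ISNULL(LEAD(time) over (PARTITION BY user_id ORDER BY time), '1/1/99') AS to_time
FROM t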
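The interval-building query can be run against the package change table with the sketch below. It assumes Python's built-in `sqlite3` module with SQLite 3.25 or later, swaps `ISNULL` for the portable `COALESCE`, and rewrites the dates as ISO strings on the assumption that 1/2/18 means 2 January 2018, with 2099-01-01 standing in for 1/1/99.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (event_id INTEGER, time TEXT, user_id INTEGER, package TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, "2018-01-01", 1, "a"),
        (2, "2018-01-02", 1, "b"),
        (3, "2018-01-06", 1, "a"),
        (4, "2018-01-07", 1, "c"),
    ],
)

# Each row's to_time is the start of the user's next package; the latest
# package has no successor, so COALESCE closes it with a far-future date.
interval_rows = conn.execute(
    """
    SELECT event_id,
           time AS from_time,
           COALESCE(LEAD(time) OVER (PARTITION BY user_id ORDER BY time),
                    '2099-01-01') AS to_time,
           package
    FROM t
    ORDER BY event_id
    """
).fetchall()

for row in interval_rows:
    print(row)
# (1, '2018-01-01', '2018-01-02', 'a')
# (2, '2018-01-02', '2018-01-06', 'b')
# (3, '2018-01-06', '2018-01-07', 'a')
# (4, '2018-01-07', '2099-01-01', 'c')
```

ISO-formatted strings sort in date order, which is what lets the text column stand in for a real date type here.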
80. Join on time interval template use-cases
• We need to join two events that didn't occur at exactly the same time.
• We want to create "contracts" when we have events only.
• We want to consolidate two "contracts" from different sources.
81. Join on time interval template questions
• Connect orders to the browsing session that resulted in the purchase
• For each month, understand which package the customer is on
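Once every package row carries a from/to interval, another table can be joined on it. The sketch below looks up the package a user was on when each order was placed. The `orders` table and its rows are hypothetical, dates are ISO strings, intervals are treated as half-open (from_time inclusive, to_time exclusive), and the code assumes Python's built-in `sqlite3` module with SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE package_changes (event_id INTEGER, time TEXT,
                                  user_id INTEGER, package TEXT);
    INSERT INTO package_changes VALUES
        (1, '2018-01-01', 1, 'a'),
        (2, '2018-01-02', 1, 'b'),
        (3, '2018-01-06', 1, 'a'),
        (4, '2018-01-07', 1, 'c');

    -- Hypothetical second event stream to join against the intervals.
    CREATE TABLE orders (order_id INTEGER, user_id INTEGER, order_time TEXT);
    INSERT INTO orders VALUES
        (100, 1, '2018-01-03'),
        (101, 1, '2018-01-06'),
        (102, 1, '2018-02-01');
    """
)

orders_with_package = conn.execute(
    """
    WITH intervals AS (
        SELECT user_id,
               package,
               time AS from_time,
               COALESCE(LEAD(time) OVER (PARTITION BY user_id ORDER BY time),
                        '2099-01-01') AS to_time
        FROM package_changes
    )
    SELECT o.order_id, o.order_time, i.package
    FROM orders AS o
    JOIN intervals AS i
      ON i.user_id = o.user_id
     AND o.order_time >= i.from_time   -- interval start is inclusive
     AND o.order_time <  i.to_time     -- interval end is exclusive
    ORDER BY o.order_id
    """
).fetchall()

for row in orders_with_package:
    print(row)
# (100, '2018-01-03', 'b')
# (101, '2018-01-06', 'a')
# (102, '2018-02-01', 'c')
```

The half-open comparison means an order placed exactly on a change date matches only the newer package, so each order joins to one row.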
83. Our Business use case
Let's say we have the following events table:
Event_id  time   user_id  type  click_num
1         18:00  1        a     2
2         18:20  1        a     3
3         18:59  1        b     4
3         18:59  1        b     4
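The duplicated row for event 3 can be removed with ROW_NUMBER. The sketch below is one common way to do it, not necessarily the template from the talk. It assumes that rows count as duplicates only when every column matches, and it uses Python's built-in `sqlite3` module with SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (event_id INTEGER, time TEXT, user_id INTEGER, "
    "type TEXT, click_num INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (1, "18:00", 1, "a", 2),
        (2, "18:20", 1, "a", 3),
        (3, "18:59", 1, "b", 4),
        (3, "18:59", 1, "b", 4),  # exact duplicate of the previous row
    ],
)

# Number the rows inside each group of identical records and keep only the
# first one; the ORDER BY inside the window is arbitrary because the rows in
# a group are identical.
deduped_rows = conn.execute(
    """
    SELECT event_id, time, user_id, type, click_num
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY event_id, time, user_id, type, click_num
                   ORDER BY event_id
               ) AS rn
        FROM t
    )
    WHERE rn = 1
    ORDER BY event_id
    """
).fetchall()

for row in deduped_rows:
    print(row)
# (1, '18:00', 1, 'a', 2)
# (2, '18:20', 1, 'a', 3)
# (3, '18:59', 1, 'b', 4)
```

When duplicates differ in some column, such as a load timestamp, partitioning by the business key alone and ordering by that column decides which copy survives.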
Nice trick: https://blog.jooq.org/2016/10/31/a-little-known-sql-feature-use-logical-windowing-to-aggregate-sliding-ranges/
Not supported in most engines
Can be used for sessionization as well
- Link is https://www.youtube.com/watch?v=mgipNdAgQ3o&t=1070s%3Fstart=20:08&end=24:34