SlideShare a Scribd company logo
1 of 35
“Row Pattern Matching” with
Database 12c MATCH_RECOGNIZE
Beating the Best Pre-12c Solutions
Stew Ashton (stewashton.wordpress.com)
OUGN17
Can you read the following line? If not, please move closer.
It's much better when you can read the code ;)
Agenda
• Who am I?
• Pre-12c solutions compared to row pattern
matching with MATCH_RECOGNIZE
– For all sizes of data
– Thinking in patterns
– Syntax only as needed
• Watch out for “catastrophic backtracking”
• Other things to keep in mind (time permitting)
2
Who am I?
• 1967: first FORTRAN program
• 1981-2015: software engineer / architect
• 2005-present: focus on Oracle DB development
– Contribute to asktom & OTN SQL forum
– Presented at OOW, UKOUG tech, DOAG
– 2015 DEVVY awards finalist.
– Advocate of data-centric application architecture
3
4 pre-12c Solutions
• Analytics
1) Tabibitosan / Fixed Difference
2) "Start of Group"
• Model
3) Bin fitting (fixed size)
4) Bin fitting (fixed number)
4
1) “Fixed Difference” / Tabibitosan
PAGE
1
2
3
5
6
7
10
11
12
42
5
-1
-2
-3
-4
= 0
= 0
= 0
= 1
• Identify and group rows with
consecutive values
• My presentation: print best liked slides
• Math: subtract known consecutives (1, 2,…)
– Consecutive means A = B - 1
– If A = B - 1 (consecutive) then A - 1 = B – 2
Else (not consecutive) A - 1 <> B - 2
– Consecutive becomes equality,
non-consecutive becomes inequality
1) Pre-12c
select min(page) firstpage,
max(page) lastpage,
count(*) cnt
FROM (
SELECT page,
page –
Row_Number() over(order by page)
as grp_id
FROM t
)
GROUP BY grp_id;
PAGE [RN] GRP_ID
1 - 1 0
2 - 2 0
3 - 3 0
5 - 4 1
6 - 5 1
7 - 6 1
10 - 7 3
11 - 8 3
12 - 9 3
42 -10 32 6
FIRSTPAGE LASTPAGE CNT
1 3 3
5 7 3
10 12 3
42 42 1
PAGE GRP_ID
1 0
2 0
3 0
5 1
6 1
7 1
10 3
11 3
12 3
42 32
Pattern and Matching Rows
• PATTERN
– Uninterrupted series of input rows
– Described as list of conditions (≅ “regular expressions”)
PATTERN (A B*)
"A" : 1 row, "B*" : 0 or more rows, as many as possible
• DEFINE (at least one) row condition
[A undefined = TRUE]
B AS page = PREV(page)+1
• Each series that matches the pattern is a “match”
– "A" and "B" identify the rows that meet each condition
– There can be unmatched rows between series
7
Input, Processing, Output
1. Define input
2. Order input
3. Process pattern
4. using defined conditions
5. Output: rows per match
6. Output: columns per row
7. Go where after match?
8
SELECT *
FROM t
MATCH_RECOGNIZE (
ORDER BY page
MEASURES
A.page as firstpage,
LAST(page) as lastpage,
COUNT(*) cnt
ONE ROW PER MATCH
AFTER MATCH SKIP PAST LAST ROW
PATTERN (A B*)
DEFINE B AS page = PREV(page)+1
);
1) Run_Stats comparison
9
For one million rows:
“Latches” are serialization devices: fewer means more scalable
Stat Pre 12c Match_R Pct
Latches 5047 4836 96%
Elapsed Time 1.1164 1.0528 94%
CPU used by this session 1.09 1.04 95%
1) Execution Plans
10
Operation Used-Mem
SELECT STATEMENT
HASH GROUP BY 49M (0)
VIEW
WINDOW SORT 20M (0)
TABLE ACCESS FULL
Operation Used-Mem
SELECT STATEMENT
VIEW
MATCH RECOGNIZE SORT DETERMINISTIC FINITE AUTOMATON 20M (0)
TABLE ACCESS FULL
2) “Start of Group”
• Identify group boundaries, often using LAG()
• 3 steps instead of 2:
1. For each row: if start of group, assign 1
Else assign 0
2. Running total of 1s and 0s produces a group
identifier
3. Group by the group identifier
11
2) Requirement
GROUP_NAME START_TS END_TS
X 2014-01-01 00:00 2014-02-01 00:00
X 2014-03-01 00:00 2014-04-01 00:00
X 2014-04-01 00:00 2014-05-01 00:00
X 2014-06-01 00:00 2014-06-01 01:00
X 2014-06-01 01:00 2014-06-01 02:00
X 2014-06-01 02:00 2014-06-01 03:00
Y 2014-06-01 03:00 2014-06-01 04:00
Y 2014-06-01 04:00 2014-06-01 05:00
Y 2014-07-03 08:00 2014-09-29 17:00
12
Merge contiguous date ranges in same group
1
2
2
3
3
3
1
1
2
X 01-01 00:00 02-01 00:00 1
X 03-01 00:00 04-01 00:00 1
X 04-01 00:00 05-01 00:00 0
X 06-01 00:00 06-01 01:00 1
X 06-01 01:00 06-01 02:00 0
X 06-01 02:00 06-01 03:00 0
Y 06-01 03:00 06-01 04:00 1
Y 06-01 04:00 06-01 05:00 0
Y 07-03 08:00 09-29 17:00 1
X 01-01 00:00 02-01 00:00
X 03-01 00:00 05-01 00:00
X 06-01 00:00 06-01 03:00
Y 06-01 03:00 06-01 05:00
Y 07-03 08:00 09-29 17:00
13
with grp_starts as (
select a.*,
case when start_ts =
lag(end_ts) over(
partition by group_name
order by start_ts
)
then 0 else 1 end grp_start
from t a
), grps as (
select b.*,
sum(grp_start) over(
partition by group_name
order by start_ts
) grp_id
from grp_starts b)
select group_name,
min(start_ts) start_ts,
max(end_ts) end_ts
from grps
group by group_name, grp_id;
2) Match_Recognize
14
SELECT * FROM t
MATCH_RECOGNIZE(
PARTITION BY group_name
ORDER BY start_ts
MEASURES
A.start_ts start_ts,
end_ts end_ts
PATTERN(A B*)
DEFINE B AS start_ts = prev(end_ts)
);
New this time:
• Added PARTITION BY
• ONE ROW PER MATCH
and
SKIP PAST LAST ROW
are the defaults
One solution replaces two methods: simple!
2) Run_Stats comparison
15
For 500,000 rows:
Stat Pre 12c Match_R Pct
Latches 8108 7133 88%
Elapsed Time 1.7551 0.9765 56%
CPU used by this session 1.73 0.96 55%
Operation Used-Mem
SELECT STATEMENT
HASH GROUP BY 22M (0)
VIEW
WINDOW BUFFER 32M (0)
VIEW
WINDOW SORT 27M (0)
TABLE ACCESS FULL
Operation Used-Mem
SELECT STATEMENT
VIEW
MATCH RECOGNIZE SORT DETERMINISTIC FINITE AUTOMATON 27M (0)
TABLE ACCESS FULL
2) Execution Plans
16
2) Matching within a group
17
SELECT * FROM (
SELECT * from t
WHERE group_name = 'X'
)
MATCH_RECOGNIZE
…
);
Filter before MATCH_RECOGNIZE to avoid extra work
2) Predicate pushing
18
Select * from <view> where group_name = 'X'
Operation Name A-Rows Buffers
SELECT STATEMENT 3 4
VIEW 3 4
MATCH RECOGNIZE SORT DETERMINISTIC
FINITE AUTO
3 4
TABLE ACCESS BY INDEX ROWID
BATCHED
T 6 4
INDEX RANGE SCAN TI 6 3
3) “Bin fitting”: fixed size
• Requirement
– Order by study_site
– Put in “bins” with size =
65,000 max
STUDY_SITE CNT STUDY_SITE CNT
1001 3407 1026 137
1002 4323 1028 6005
1004 1623 1029 76
1008 1991 1031 4599
1011 885 1032 1989
1012 11597 1034 3427
1014 1989 1036 879
1015 5282 1038 6485
1017 2841 1039 3
1018 5183 1040 1105
1020 6176 1041 6460
1022 2784 1042 968
1023 25865 1044 471
1024 3734 1045 3360
FIRST_SITE LAST_SITE SUM_CNT
1001 1022 48081
1023 1044 62203
1045 1045 3360
19
SELECT s first_site, MAX(e) last_site, MAX(sm) sum_cnt FROM (
SELECT s, e, cnt, sm FROM t
MODEL
MEASURES (study_site s, study_site e, cnt, cnt sm)
RULES (
sm[ > 1] =
CASE WHEN sm[cv() - 1] + cnt[cv()] > 65000 OR cnt[cv()] > 65000
THEN cnt[cv()]
ELSE sm[cv() - 1] + cnt[cv()]
END,
s[ > 1] =
CASE WHEN sm[cv() - 1] + cnt[cv()] > 65000 OR cnt[cv()] > 65000
THEN s[cv()]
ELSE s[cv() - 1]
END
)
)
GROUP BY s;
• DIMENSION with row_number
orders data and processing
• rn can be used like a subscript
• cv() means current row
• cv()-1 means previous row
DIMENSION BY (row_number() over(order by study_site) rn)
rn
[cv() – 1] [cv()] [cv()]
[cv()]
[cv() – 1] [cv()]
rn
[cv() - 1] [cv()] [cv()]
[cv()]
[cv() – 1]
20
SELECT * FROM t
MATCH_RECOGNIZE (
ORDER BY study_site
MEASURES
FIRST(study_site) first_site,
LAST(study_site) last_site,
SUM(cnt) sum_cnt
PATTERN (A+)
DEFINE A AS SUM(cnt) <= 65000
);
New this time:
• PATTERN
(A+) replaces (A B*)
means 1 or more rows
• Why? In previous
examples I used PREV(),
which returns NULL on
the first row.
One solution replaces 3 methods: simpler!
21
3) Run_Stats comparison
For 150000 rows:
Stat Pre 12c Match_R Pct
Latches 3501 1312 37%
Elapsed Time 2.4036 0.1028 4%
CPU used by this session 2.39 0 0%
22
Id Operation Used-Mem
0 SELECT STATEMENT
1 HASH GROUP BY 1481K (0)
2 VIEW
3 SQL MODEL ORDERED 17M (0)
4 WINDOW SORT 4330K (0)
5 TABLE ACCESS FULL
Id Operation Used-Mem
0 SELECT STATEMENT
1 VIEW
2 MATCH RECOGNIZE SORT DETERMINISTIC FINITE AUTO 4330K(0)
3 TABLE ACCESS FULL
3) Execution Plans
23
Name Val Val BIN1 BIN2 BIN3
1 1 10 10
2 2 9 10 9
3 3 8 10 9 8
4 4 7 10 9 15
5 5 6 10 15 15
6 6 5 15 15 15
7 7 4 19 15 15
8 8 3 19 18 15
9 9 2 19 18 17
10 10 1 19 18 18
4) “Bin fitting”: fixed number
24
• Requirement
– Distribute values in 3
parts as equally as
possible
• “Best fit decreasing”
– Sort values in
decreasing order
– Put each value in least
full “bin”
4) Brilliant pre 12c solution
SELECT bin, Max (bin_value) bin_value
FROM (
SELECT * FROM items
MODEL
DIMENSION BY
(Row_Number() OVER
(ORDER BY item_value DESC) rn)
MEASURES (
item_name,
item_value,
Row_Number() OVER
(ORDER BY item_value DESC) bin,
item_value bin_value,
Row_Number() OVER
(ORDER BY item_value DESC) rn_m,
0 min_bin,
Count(*) OVER () - 3 - 1 n_iters
)
RULES ITERATE(100000)
UNTIL (ITERATION_NUMBER >= n_iters[1]) (
min_bin[1] = Min(rn_m) KEEP (DENSE_RANK
FIRST ORDER BY bin_value)[rn<= 3],
bin[ITERATION_NUMBER + 3 + 1] =
min_bin[1],
bin_value[min_bin[1]] =
bin_value[CV()] +
Nvl(item_value[ITERATION_NUMBER+4], 0))
)
WHERE item_name IS NOT NULL
group by bin;
25
SELECT * from items
MATCH_RECOGNIZE (
ORDER BY item_value desc
MEASURES
sum(bin1.item_value) bin1,
sum(bin2.item_value) bin2,
sum(bin3.item_value) bin3
PATTERN ((bin1|bin2|bin3)+)
DEFINE
bin1 AS count(bin1.*) = 1
OR sum(bin1.item_value)-bin1.item_value
<= least(
sum(bin2.item_value),
sum(bin3.item_value)
),
bin2 AS count(bin2.*) = 1
OR sum(bin2.item_value)-bin2.item_value
<= sum(bin3.item_value)
);
• ()+ = 1 or more of whatever
is inside
• '|' = alternatives,
“preferred in the order
specified”
• Bin1 condition:
• No rows here yet,
• Or this bin least full
• Bin2 condition
• No rows here yet, or
• This bin less full than 3
PATTERN ((bin1|bin2|bin3)+)
bin1 AS count(bin1.*) = 1
OR sum(bin1.item_value)-bin1.item_value
<= least(
sum(bin2.item_value),
sum(bin3.item_value)
),
bin2 AS count(bin2.*) = 1
OR sum(bin2.item_value)-bin2.item_value
<= sum(bin3.item_value)
26
4) Run_Stats comparison
27
For 20,000 rows:
Stat Pre 12c Match_R Pct
Latches 99165 794 .80%
Elapsed Time 58.3251 0.028 .05%
CPU used by this session 56.75 0 .00%
Backtracking
• What happens when there is no match???
• “Greedy” quantifiers * + {2,}
– are not that greedy
– Take all the rows they can, BUT
give rows back if necessary – one at a time
• Regular expression engines will test all possible
combinations to find a match
28
29
123456789
A
AB
ABBB
ABBBB
ABBBBB
ABBBBBB
ABBBBBBB
ABBBBBBBC
ABBBBBBC
ABBBBBC
ABBBBC
ABBBC
ABBC
ABC
AC
Backtracking in action:
pattern(a b* c)
1. Find A
2. Find all the Bs you can
3. At the end, look for a C
4. No C? Backtrack through the Bs
5. Still no C? No Match!
Repeating conditions
select 'match' from (
select level n from dual
connect by level <= 100
)
match_recognize(
pattern(a b* c)
define b as n > prev(n)
, c as n = 0
);
Runs in 0.006 secs
select 'match' from (
select level n from dual
connect by level <= 100
)
match_recognize(
pattern(a b* b* b* c)
define b as n > prev(n)
, c as n = 0
);
Runs in 2.934 secs
30
SELECT * FROM Ticker
MATCH_RECOGNIZE (
PARTITION BY symbol
ORDER BY tstamp
MEASURES FIRST(tstamp) AS start_tstamp,
LAST(tstamp) AS end_tstamp
AFTER MATCH SKIP TO LAST UP
PATTERN (STRT DOWN+ UP+ DOWN+ UP+)
DEFINE DOWN AS price < PREV(price),
UP AS price > PREV(price),
STRT AS price >= nvl(PREV(PRICE),0)
);
Runs in 0.014 seconds
Imprecise Conditions
CREATE TABLE Ticker (
SYMBOL VARCHAR2(10),
tstamp DATE,
price NUMBER
);
insert into ticker
select 'ACME',
sysdate + level/24/60/60,
10000-level
from dual
connect by level <= 5000;
SELECT * FROM Ticker
MATCH_RECOGNIZE (
PARTITION BY symbol
ORDER BY tstamp
MEASURES FIRST(tstamp) AS start_tstamp,
LAST(tstamp) AS end_tstamp
AFTER MATCH SKIP TO LAST UP
PATTERN (STRT DOWN+ UP+ DOWN+ UP+)
DEFINE DOWN AS price < PREV(price),
UP AS price > PREV(price)
);
Runs in 13 seconds
INMEMORY: 7 seconds 31
Keep in Mind
• Backtracking
– Precise conditions
– Test data with no matches
• To debug:
Measures classifier() cl,
match_number() mn
All rows per match with
unmatched rows
• No DISTINCT, no LISTAGG
• MEASURES columns must
have aliases
• “Reluctant quantifier” = ?
= JDBC bind variable
(Fix in 12.2 JDBC driver !)
• “Pattern variables” are
identifiers, not bind
variables
32
Questions?
The really good stuff is tomorrow!
"Meet your Match", 10:30, Bundestag 2
33
1) consecutive
values
2) based on
starting point
3) Bin fitting:
fixed size
4) Bin fitting:
fixed number
Pre 12c Analytic function,
Group by
2 analytics,
Group by
MODEL clause
using “subscript”
Procedural
MODEL clause
Match_
Recognize
C = PREV(C)+1 C1=PREV(C2) SUM(C1) <= X 3 conditions
Elapsed 94% 56% 4% 0.05%
Which row do we mean?
Expression DEFINE
MEASURES
ALL ROWS… ONE ROW…
FIRST(start_ts) First row of match
start_ts current row last row of match
LAST(end_ts) current row last row of match
FINAL
LAST(end_ts)
ORA-62509 last row of match
B.start_ts most recent B row last B row
PREV(), NEXT() Physical offset from referenced row
COUNT(*) from first to current row all rows in match
COUNT(B.*) B rows including current row all B rows
34
Output Row “shape”
Per Match PARTITION BY ORDER BY MEASURES Other input
ONE ROW X Omitted X omitted
ALL ROWS X X X X
35
ORA-00918, anyone?

More Related Content

What's hot

Insersion & Bubble Sort in Algoritm
Insersion & Bubble Sort in AlgoritmInsersion & Bubble Sort in Algoritm
Insersion & Bubble Sort in AlgoritmEhsan Ehrari
 
Intro to tsql unit 10
Intro to tsql   unit 10Intro to tsql   unit 10
Intro to tsql unit 10Syed Asrarali
 
Numeric functions in SQL | Oracle
Numeric functions in SQL | OracleNumeric functions in SQL | Oracle
Numeric functions in SQL | OracleVimal Raj
 
String function in my sql
String function in my sqlString function in my sql
String function in my sqlknowledgemart
 
Multi dimensional array
Multi dimensional arrayMulti dimensional array
Multi dimensional arrayRajendran
 
10. Getting Spatial
10. Getting Spatial10. Getting Spatial
10. Getting SpatialFAO
 
MySQL 5.7 String Functions
MySQL 5.7 String FunctionsMySQL 5.7 String Functions
MySQL 5.7 String FunctionsFrancesco Marino
 
Topological sort
Topological sortTopological sort
Topological sortjabishah
 
Matlab practice
Matlab practiceMatlab practice
Matlab practiceZunAib Ali
 
Programs in array using SWIFT
Programs in array using SWIFTPrograms in array using SWIFT
Programs in array using SWIFTvikram mahendra
 
Introduction to matlab lecture 4 of 4
Introduction to matlab lecture 4 of 4Introduction to matlab lecture 4 of 4
Introduction to matlab lecture 4 of 4Randa Elanwar
 
Linear models
Linear modelsLinear models
Linear modelsFAO
 
Common Table Expressions in MariaDB 10.2 (Percona Live Amsterdam 2016)
Common Table Expressions in MariaDB 10.2 (Percona Live Amsterdam 2016)Common Table Expressions in MariaDB 10.2 (Percona Live Amsterdam 2016)
Common Table Expressions in MariaDB 10.2 (Percona Live Amsterdam 2016)Sergey Petrunya
 

What's hot (20)

Insersion & Bubble Sort in Algoritm
Insersion & Bubble Sort in AlgoritmInsersion & Bubble Sort in Algoritm
Insersion & Bubble Sort in Algoritm
 
Sorting ppt
Sorting pptSorting ppt
Sorting ppt
 
Intro to tsql unit 10
Intro to tsql   unit 10Intro to tsql   unit 10
Intro to tsql unit 10
 
Sorting
SortingSorting
Sorting
 
Numeric functions in SQL | Oracle
Numeric functions in SQL | OracleNumeric functions in SQL | Oracle
Numeric functions in SQL | Oracle
 
String function in my sql
String function in my sqlString function in my sql
String function in my sql
 
Multi dimensional array
Multi dimensional arrayMulti dimensional array
Multi dimensional array
 
10. Getting Spatial
10. Getting Spatial10. Getting Spatial
10. Getting Spatial
 
MySQL 5.7 String Functions
MySQL 5.7 String FunctionsMySQL 5.7 String Functions
MySQL 5.7 String Functions
 
Oracle sql ppt2
Oracle sql ppt2Oracle sql ppt2
Oracle sql ppt2
 
Topological sort
Topological sortTopological sort
Topological sort
 
Programming with matlab session 6
Programming with matlab session 6Programming with matlab session 6
Programming with matlab session 6
 
Matlab practice
Matlab practiceMatlab practice
Matlab practice
 
Introduction to r
Introduction to rIntroduction to r
Introduction to r
 
Datarepresentation2
Datarepresentation2Datarepresentation2
Datarepresentation2
 
Programs in array using SWIFT
Programs in array using SWIFTPrograms in array using SWIFT
Programs in array using SWIFT
 
Introduction to matlab lecture 4 of 4
Introduction to matlab lecture 4 of 4Introduction to matlab lecture 4 of 4
Introduction to matlab lecture 4 of 4
 
Linear models
Linear modelsLinear models
Linear models
 
Common Table Expressions in MariaDB 10.2 (Percona Live Amsterdam 2016)
Common Table Expressions in MariaDB 10.2 (Percona Live Amsterdam 2016)Common Table Expressions in MariaDB 10.2 (Percona Live Amsterdam 2016)
Common Table Expressions in MariaDB 10.2 (Percona Live Amsterdam 2016)
 
Pandas Series
Pandas SeriesPandas Series
Pandas Series
 

Similar to Row Pattern Matching in Oracle Database 12c

SQL: Query optimization in practice
SQL: Query optimization in practiceSQL: Query optimization in practice
SQL: Query optimization in practiceJano Suchal
 
Ten Reasons Why You Should Prefer PostgreSQL to MySQL
Ten Reasons Why You Should Prefer PostgreSQL to MySQLTen Reasons Why You Should Prefer PostgreSQL to MySQL
Ten Reasons Why You Should Prefer PostgreSQL to MySQLanandology
 
Advanced SQL For Data Scientists
Advanced SQL For Data ScientistsAdvanced SQL For Data Scientists
Advanced SQL For Data ScientistsDatabricks
 
Performance improvements in PostgreSQL 9.5 and beyond
Performance improvements in PostgreSQL 9.5 and beyondPerformance improvements in PostgreSQL 9.5 and beyond
Performance improvements in PostgreSQL 9.5 and beyondTomas Vondra
 
PerlApp2Postgresql (2)
PerlApp2Postgresql (2)PerlApp2Postgresql (2)
PerlApp2Postgresql (2)Jerome Eteve
 
TechEvent 2019: Uses of Row Pattern Matching; Kim Berg Hansen - Trivadis
TechEvent 2019: Uses of Row Pattern Matching; Kim Berg Hansen - TrivadisTechEvent 2019: Uses of Row Pattern Matching; Kim Berg Hansen - Trivadis
TechEvent 2019: Uses of Row Pattern Matching; Kim Berg Hansen - TrivadisTrivadis
 
Optimizer features in recent releases of other databases
Optimizer features in recent releases of other databasesOptimizer features in recent releases of other databases
Optimizer features in recent releases of other databasesSergey Petrunya
 
On Beyond (PostgreSQL) Data Types
On Beyond (PostgreSQL) Data TypesOn Beyond (PostgreSQL) Data Types
On Beyond (PostgreSQL) Data TypesJonathan Katz
 
lecture7.ppt
lecture7.pptlecture7.ppt
lecture7.pptEdFeranil
 
MySQL 8.0 NF : Common Table Expressions (CTE)
MySQL 8.0 NF : Common Table Expressions (CTE)MySQL 8.0 NF : Common Table Expressions (CTE)
MySQL 8.0 NF : Common Table Expressions (CTE)I Goo Lee
 
Lecture 02: Preliminaries of Data structure
Lecture 02: Preliminaries of Data structureLecture 02: Preliminaries of Data structure
Lecture 02: Preliminaries of Data structureNurjahan Nipa
 
MariaDB Server 10.3 - Temporale Daten und neues zur DB-Kompatibilität
MariaDB Server 10.3 - Temporale Daten und neues zur DB-KompatibilitätMariaDB Server 10.3 - Temporale Daten und neues zur DB-Kompatibilität
MariaDB Server 10.3 - Temporale Daten und neues zur DB-KompatibilitätMariaDB plc
 
PostgreSQL query planner's internals
PostgreSQL query planner's internalsPostgreSQL query planner's internals
PostgreSQL query planner's internalsAlexey Ermakov
 
Most useful queries
Most useful queriesMost useful queries
Most useful queriesSam Depp
 
A few things about the Oracle optimizer - 2013
A few things about the Oracle optimizer - 2013A few things about the Oracle optimizer - 2013
A few things about the Oracle optimizer - 2013Connor McDonald
 
What's New in MariaDB Server 10.3
What's New in MariaDB Server 10.3What's New in MariaDB Server 10.3
What's New in MariaDB Server 10.3MariaDB plc
 

Similar to Row Pattern Matching in Oracle Database 12c (20)

SQL: Query optimization in practice
SQL: Query optimization in practiceSQL: Query optimization in practice
SQL: Query optimization in practice
 
Ten Reasons Why You Should Prefer PostgreSQL to MySQL
Ten Reasons Why You Should Prefer PostgreSQL to MySQLTen Reasons Why You Should Prefer PostgreSQL to MySQL
Ten Reasons Why You Should Prefer PostgreSQL to MySQL
 
Advanced SQL For Data Scientists
Advanced SQL For Data ScientistsAdvanced SQL For Data Scientists
Advanced SQL For Data Scientists
 
Performance improvements in PostgreSQL 9.5 and beyond
Performance improvements in PostgreSQL 9.5 and beyondPerformance improvements in PostgreSQL 9.5 and beyond
Performance improvements in PostgreSQL 9.5 and beyond
 
PerlApp2Postgresql (2)
PerlApp2Postgresql (2)PerlApp2Postgresql (2)
PerlApp2Postgresql (2)
 
Learn Matlab
Learn MatlabLearn Matlab
Learn Matlab
 
TechEvent 2019: Uses of Row Pattern Matching; Kim Berg Hansen - Trivadis
TechEvent 2019: Uses of Row Pattern Matching; Kim Berg Hansen - TrivadisTechEvent 2019: Uses of Row Pattern Matching; Kim Berg Hansen - Trivadis
TechEvent 2019: Uses of Row Pattern Matching; Kim Berg Hansen - Trivadis
 
Optimizer features in recent releases of other databases
Optimizer features in recent releases of other databasesOptimizer features in recent releases of other databases
Optimizer features in recent releases of other databases
 
On Beyond (PostgreSQL) Data Types
On Beyond (PostgreSQL) Data TypesOn Beyond (PostgreSQL) Data Types
On Beyond (PostgreSQL) Data Types
 
lecture7.ppt
lecture7.pptlecture7.ppt
lecture7.ppt
 
MySQL 8.0 NF : Common Table Expressions (CTE)
MySQL 8.0 NF : Common Table Expressions (CTE)MySQL 8.0 NF : Common Table Expressions (CTE)
MySQL 8.0 NF : Common Table Expressions (CTE)
 
R Programming Homework Help
R Programming Homework HelpR Programming Homework Help
R Programming Homework Help
 
Lecture 02: Preliminaries of Data structure
Lecture 02: Preliminaries of Data structureLecture 02: Preliminaries of Data structure
Lecture 02: Preliminaries of Data structure
 
R for Statistical Computing
R for Statistical ComputingR for Statistical Computing
R for Statistical Computing
 
MariaDB Server 10.3 - Temporale Daten und neues zur DB-Kompatibilität
MariaDB Server 10.3 - Temporale Daten und neues zur DB-KompatibilitätMariaDB Server 10.3 - Temporale Daten und neues zur DB-Kompatibilität
MariaDB Server 10.3 - Temporale Daten und neues zur DB-Kompatibilität
 
PostgreSQL query planner's internals
PostgreSQL query planner's internalsPostgreSQL query planner's internals
PostgreSQL query planner's internals
 
Most useful queries
Most useful queriesMost useful queries
Most useful queries
 
A few things about the Oracle optimizer - 2013
A few things about the Oracle optimizer - 2013A few things about the Oracle optimizer - 2013
A few things about the Oracle optimizer - 2013
 
R Language Introduction
R Language IntroductionR Language Introduction
R Language Introduction
 
What's New in MariaDB Server 10.3
What's New in MariaDB Server 10.3What's New in MariaDB Server 10.3
What's New in MariaDB Server 10.3
 

Recently uploaded

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 

Recently uploaded (20)

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

Row Pattern Matching in Oracle Database 12c

  • 1. “Row Pattern Matching” with Database 12c MATCH_RECOGNIZE Beating the Best Pre-12c Solutions Stew Ashton (stewashton.wordpress.com) OUGN17 Can you read the following line? If not, please move closer. It's much better when you can read the code ;)
  • 2. Agenda • Who am I? • Pre-12c solutions compared to row pattern matching with MATCH_RECOGNIZE – For all sizes of data – Thinking in patterns – Syntax only as needed • Watch out for “catastrophic backtracking” • Other things to keep in mind (time permitting) 2
  • 3. Who am I? • 1967: first FORTRAN program • 1981-2015: software engineer / architect • 2005-present: focus on Oracle DB development – Contribute to asktom & OTN SQL forum – Presented at OOW, UKOUG tech, DOAG – 2015 DEVVY awards finalist. – Advocate of data-centric application architecture 3
  • 4. 4 pre-12c Solutions • Analytics 1) Tabibitosan / Fixed Difference 2) "Start of Group" • Model 3) Bin fitting (fixed size) 4) Bin fitting (fixed number) 4
  • 5. 1) “Fixed Difference” / Tabibitosan PAGE 1 2 3 5 6 7 10 11 12 42 5 -1 -2 -3 -4 = 0 = 0 = 0 = 1 • Identify and group rows with consecutive values • My presentation: print best liked slides • Math: subtract known consecutives (1, 2,…) – Consecutive means A = B - 1 – If A = B - 1 (consecutive) then A - 1 = B – 2 Else (not consecutive) A - 1 <> B - 2 – Consecutive becomes equality, non-consecutive becomes inequality
  • 6. 1) Pre-12c select min(page) firstpage, max(page) lastpage, count(*) cnt FROM ( SELECT page, page – Row_Number() over(order by page) as grp_id FROM t ) GROUP BY grp_id; PAGE [RN] GRP_ID 1 - 1 0 2 - 2 0 3 - 3 0 5 - 4 1 6 - 5 1 7 - 6 1 10 - 7 3 11 - 8 3 12 - 9 3 42 -10 32 6 FIRSTPAGE LASTPAGE CNT 1 3 3 5 7 3 10 12 3 42 42 1 PAGE GRP_ID 1 0 2 0 3 0 5 1 6 1 7 1 10 3 11 3 12 3 42 32
  • 7. Pattern and Matching Rows • PATTERN – Uninterrupted series of input rows – Described as list of conditions (≅ “regular expressions”) PATTERN (A B*) "A" : 1 row, "B*" : 0 or more rows, as many as possible • DEFINE (at least one) row condition [A undefined = TRUE] B AS page = PREV(page)+1 • Each series that matches the pattern is a “match” – "A" and "B" identify the rows that meet each condition – There can be unmatched rows between series 7
  • 8. Input, Processing, Output 1. Define input 2. Order input 3. Process pattern 4. using defined conditions 5. Output: rows per match 6. Output: columns per row 7. Go where after match? 8 SELECT * FROM t MATCH_RECOGNIZE ( ORDER BY page MEASURES A.page as firstpage, LAST(page) as lastpage, COUNT(*) cnt ONE ROW PER MATCH AFTER MATCH SKIP PAST LAST ROW PATTERN (A B*) DEFINE B AS page = PREV(page)+1 );
  • 9. 1) Run_Stats comparison 9 For one million rows: “Latches” are serialization devices: fewer means more scalable Stat Pre 12c Match_R Pct Latches 5047 4836 96% Elapsed Time 1.1164 1.0528 94% CPU used by this session 1.09 1.04 95%
  • 10. 1) Execution Plans 10 Operation Used-Mem SELECT STATEMENT HASH GROUP BY 49M (0) VIEW WINDOW SORT 20M (0) TABLE ACCESS FULL Operation Used-Mem SELECT STATEMENT VIEW MATCH RECOGNIZE SORT DETERMINISTIC FINITE AUTOMATON 20M (0) TABLE ACCESS FULL
  • 11. 2) “Start of Group” • Identify group boundaries, often using LAG() • 3 steps instead of 2: 1. For each row: if start of group, assign 1 Else assign 0 2. Running total of 1s and 0s produces a group identifier 3. Group by the group identifier 11
  • 12. 2) Requirement GROUP_NAME START_TS END_TS X 2014-01-01 00:00 2014-02-01 00:00 X 2014-03-01 00:00 2014-04-01 00:00 X 2014-04-01 00:00 2014-05-01 00:00 X 2014-06-01 00:00 2014-06-01 01:00 X 2014-06-01 01:00 2014-06-01 02:00 X 2014-06-01 02:00 2014-06-01 03:00 Y 2014-06-01 03:00 2014-06-01 04:00 Y 2014-06-01 04:00 2014-06-01 05:00 Y 2014-07-03 08:00 2014-09-29 17:00 12 Merge contiguous date ranges in same group
  • 13. 1 2 2 3 3 3 1 1 2 X 01-01 00:00 02-01 00:00 1 X 03-01 00:00 04-01 00:00 1 X 04-01 00:00 05-01 00:00 0 X 06-01 00:00 06-01 01:00 1 X 06-01 01:00 06-01 02:00 0 X 06-01 02:00 06-01 03:00 0 Y 06-01 03:00 06-01 04:00 1 Y 06-01 04:00 06-01 05:00 0 Y 07-03 08:00 09-29 17:00 1 X 01-01 00:00 02-01 00:00 X 03-01 00:00 05-01 00:00 X 06-01 00:00 06-01 03:00 Y 06-01 03:00 06-01 05:00 Y 07-03 08:00 09-29 17:00 13 with grp_starts as ( select a.*, case when start_ts = lag(end_ts) over( partition by group_name order by start_ts ) then 0 else 1 end grp_start from t a ), grps as ( select b.*, sum(grp_start) over( partition by group_name order by start_ts ) grp_id from grp_starts b) select group_name, min(start_ts) start_ts, max(end_ts) end_ts from grps group by group_name, grp_id;
  • 14. 2) Match_Recognize 14 SELECT * FROM t MATCH_RECOGNIZE( PARTITION BY group_name ORDER BY start_ts MEASURES A.start_ts start_ts, end_ts end_ts PATTERN(A B*) DEFINE B AS start_ts = prev(end_ts) ); New this time: • Added PARTITION BY • ONE ROW PER MATCH and SKIP PAST LAST ROW are the defaults One solution replaces two methods: simple!
  • 15. 2) Run_Stats comparison 15 For 500,000 rows: Stat Pre 12c Match_R Pct Latches 8108 7133 88% Elapsed Time 1.7551 0.9765 56% CPU used by this session 1.73 0.96 55%
  • 16. Operation Used-Mem SELECT STATEMENT HASH GROUP BY 22M (0) VIEW WINDOW BUFFER 32M (0) VIEW WINDOW SORT 27M (0) TABLE ACCESS FULL Operation Used-Mem SELECT STATEMENT VIEW MATCH RECOGNIZE SORT DETERMINISTIC FINITE AUTOMATON 27M (0) TABLE ACCESS FULL 2) Execution Plans 16
  • 17. 2) Matching within a group 17 SELECT * FROM ( SELECT * from t WHERE group_name = 'X' ) MATCH_RECOGNIZE … ); Filter before MATCH_RECOGNIZE to avoid extra work
  • 18. 2) Predicate pushing 18 Select * from <view> where group_name = 'X' Operation Name A-Rows Buffers SELECT STATEMENT 3 4 VIEW 3 4 MATCH RECOGNIZE SORT DETERMINISTIC FINITE AUTO 3 4 TABLE ACCESS BY INDEX ROWID BATCHED T 6 4 INDEX RANGE SCAN TI 6 3
  • 19. 3) “Bin fitting”: fixed size • Requirement – Order by study_site – Put in “bins” with size = 65,000 max STUDY_SITE CNT STUDY_SITE CNT 1001 3407 1026 137 1002 4323 1028 6005 1004 1623 1029 76 1008 1991 1031 4599 1011 885 1032 1989 1012 11597 1034 3427 1014 1989 1036 879 1015 5282 1038 6485 1017 2841 1039 3 1018 5183 1040 1105 1020 6176 1041 6460 1022 2784 1042 968 1023 25865 1044 471 1024 3734 1045 3360 FIRST_SITE LAST_SITE SUM_CNT 1001 1022 48081 1023 1044 62203 1045 1045 3360 19
  • 20. SELECT s first_site, MAX(e) last_site, MAX(sm) sum_cnt FROM ( SELECT s, e, cnt, sm FROM t MODEL MEASURES (study_site s, study_site e, cnt, cnt sm) RULES ( sm[ > 1] = CASE WHEN sm[cv() - 1] + cnt[cv()] > 65000 OR cnt[cv()] > 65000 THEN cnt[cv()] ELSE sm[cv() - 1] + cnt[cv()] END, s[ > 1] = CASE WHEN sm[cv() - 1] + cnt[cv()] > 65000 OR cnt[cv()] > 65000 THEN s[cv()] ELSE s[cv() - 1] END ) ) GROUP BY s; • DIMENSION with row_number orders data and processing • rn can be used like a subscript • cv() means current row • cv()-1 means previous row DIMENSION BY (row_number() over(order by study_site) rn) rn [cv() – 1] [cv()] [cv()] [cv()] [cv() – 1] [cv()] rn [cv() - 1] [cv()] [cv()] [cv()] [cv() – 1] 20
  • 21. SELECT * FROM t MATCH_RECOGNIZE ( ORDER BY study_site MEASURES FIRST(study_site) first_site, LAST(study_site) last_site, SUM(cnt) sum_cnt PATTERN (A+) DEFINE A AS SUM(cnt) <= 65000 ); New this time: • PATTERN (A+) replaces (A B*) means 1 or more rows • Why? In previous examples I used PREV(), which returns NULL on the first row. One solution replaces 3 methods: simpler! 21
  • 22. 3) Run_Stats comparison For 150000 rows: Stat Pre 12c Match_R Pct Latches 3501 1312 37% Elapsed Time 2.4036 0.1028 4% CPU used by this session 2.39 0 0% 22
  • 23. Id Operation Used-Mem 0 SELECT STATEMENT 1 HASH GROUP BY 1481K (0) 2 VIEW 3 SQL MODEL ORDERED 17M (0) 4 WINDOW SORT 4330K (0) 5 TABLE ACCESS FULL Id Operation Used-Mem 0 SELECT STATEMENT 1 VIEW 2 MATCH RECOGNIZE SORT DETERMINISTIC FINITE AUTO 4330K(0) 3 TABLE ACCESS FULL 3) Execution Plans 23
  • 24. Name Val Val BIN1 BIN2 BIN3 1 1 10 10 2 2 9 10 9 3 3 8 10 9 8 4 4 7 10 9 15 5 5 6 10 15 15 6 6 5 15 15 15 7 7 4 19 15 15 8 8 3 19 18 15 9 9 2 19 18 17 10 10 1 19 18 18 4) “Bin fitting”: fixed number 24 • Requirement – Distribute values in 3 parts as equally as possible • “Best fit decreasing” – Sort values in decreasing order – Put each value in least full “bin”
  • 25. 4) Brilliant pre 12c solution SELECT bin, Max (bin_value) bin_value FROM ( SELECT * FROM items MODEL DIMENSION BY (Row_Number() OVER (ORDER BY item_value DESC) rn) MEASURES ( item_name, item_value, Row_Number() OVER (ORDER BY item_value DESC) bin, item_value bin_value, Row_Number() OVER (ORDER BY item_value DESC) rn_m, 0 min_bin, Count(*) OVER () - 3 - 1 n_iters ) RULES ITERATE(100000) UNTIL (ITERATION_NUMBER >= n_iters[1]) ( min_bin[1] = Min(rn_m) KEEP (DENSE_RANK FIRST ORDER BY bin_value)[rn<= 3], bin[ITERATION_NUMBER + 3 + 1] = min_bin[1], bin_value[min_bin[1]] = bin_value[CV()] + Nvl(item_value[ITERATION_NUMBER+4], 0)) ) WHERE item_name IS NOT NULL group by bin; 25
  • 26. SELECT * from items MATCH_RECOGNIZE ( ORDER BY item_value desc MEASURES sum(bin1.item_value) bin1, sum(bin2.item_value) bin2, sum(bin3.item_value) bin3 PATTERN ((bin1|bin2|bin3)+) DEFINE bin1 AS count(bin1.*) = 1 OR sum(bin1.item_value)-bin1.item_value <= least( sum(bin2.item_value), sum(bin3.item_value) ), bin2 AS count(bin2.*) = 1 OR sum(bin2.item_value)-bin2.item_value <= sum(bin3.item_value) ); • ()+ = 1 or more of whatever is inside • '|' = alternatives, “preferred in the order specified” • Bin1 condition: • No rows here yet, • Or this bin least full • Bin2 condition • No rows here yet, or • This bin less full than 3 PATTERN ((bin1|bin2|bin3)+) bin1 AS count(bin1.*) = 1 OR sum(bin1.item_value)-bin1.item_value <= least( sum(bin2.item_value), sum(bin3.item_value) ), bin2 AS count(bin2.*) = 1 OR sum(bin2.item_value)-bin2.item_value <= sum(bin3.item_value) 26
  • 27. 4) Run_Stats comparison 27 For 20,000 rows: Stat Pre 12c Match_R Pct Latches 99165 794 .80% Elapsed Time 58.3251 0.028 .05% CPU used by this session 56.75 0 .00%
  • 28. Backtracking • What happens when there is no match??? • “Greedy” quantifiers * + {2,} – are not that greedy – Take all the rows they can, BUT give rows back if necessary – one at a time • Regular expression engines will test all possible combinations to find a match 28
  • 29. 29 123456789 A AB ABBB ABBBB ABBBBB ABBBBBB ABBBBBBB ABBBBBBBC ABBBBBBC ABBBBBC ABBBBC ABBBC ABBC ABC AC Backtracking in action: pattern(a b* c) 1. Find A 2. Find all the Bs you can 3. At the end, look for a C 4. No C? Backtrack through the Bs 5. Still no C? No Match!
  • 30. Repeating conditions select 'match' from ( select level n from dual connect by level <= 100 ) match_recognize( pattern(a b* c) define b as n > prev(n) , c as n = 0 ); Runs in 0.006 secs select 'match' from ( select level n from dual connect by level <= 100 ) match_recognize( pattern(a b* b* b* c) define b as n > prev(n) , c as n = 0 ); Runs in 2.934 secs 30
  • 31. SELECT * FROM Ticker MATCH_RECOGNIZE ( PARTITION BY symbol ORDER BY tstamp MEASURES FIRST(tstamp) AS start_tstamp, LAST(tstamp) AS end_tstamp AFTER MATCH SKIP TO LAST UP PATTERN (STRT DOWN+ UP+ DOWN+ UP+) DEFINE DOWN AS price < PREV(price), UP AS price > PREV(price), STRT AS price >= nvl(PREV(PRICE),0) ); Runs in 0.014 seconds Imprecise Conditions CREATE TABLE Ticker ( SYMBOL VARCHAR2(10), tstamp DATE, price NUMBER ); insert into ticker select 'ACME', sysdate + level/24/60/60, 10000-level from dual connect by level <= 5000; SELECT * FROM Ticker MATCH_RECOGNIZE ( PARTITION BY symbol ORDER BY tstamp MEASURES FIRST(tstamp) AS start_tstamp, LAST(tstamp) AS end_tstamp AFTER MATCH SKIP TO LAST UP PATTERN (STRT DOWN+ UP+ DOWN+ UP+) DEFINE DOWN AS price < PREV(price), UP AS price > PREV(price) ); Runs in 13 seconds INMEMORY: 7 seconds 31
  • 32. Keep in Mind • Backtracking – Precise conditions – Test data with no matches • To debug: Measures classifier() cl, match_number() mn All rows per match with unmatched rows • No DISTINCT, no LISTAGG • MEASURES columns must have aliases • “Reluctant quantifier” = ? = JDBC bind variable (Fix in 12.2 JDBC driver !) • “Pattern variables” are identifiers, not bind variables 32
  • 33. Questions? The really good stuff is tomorrow! "Meet your Match", 10:30, Bundestag 2 33 1) consecutive values 2) based on starting point 3) Bin fitting: fixed size 4) Bin fitting: fixed number Pre 12c Analytic function, Group by 2 analytics, Group by MODEL clause using “subscript” Procedural MODEL clause Match_ Recognize C = PREV(C)+1 C1=PREV(C2) SUM(C1) <= X 3 conditions Elapsed 94% 56% 4% 0.05%
  • 34. Which row do we mean? Expression DEFINE MEASURES ALL ROWS… ONE ROW… FIRST(start_ts) First row of match start_ts current row last row of match LAST(end_ts) current row last row of match FINAL LAST(end_ts) ORA-62509 last row of match B.start_ts most recent B row last B row PREV(), NEXT() Physical offset from referenced row COUNT(*) from first to current row all rows in match COUNT(B.*) B rows including current row all B rows 34
  • 35. Output Row “shape” Per Match PARTITION BY ORDER BY MEASURES Other input ONE ROW X Omitted X omitted ALL ROWS X X X X 35 ORA-00918, anyone?