SQL Pattern Matching – should I start using it?

BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENF
HAMBURG KOPENHAGEN LAUSANNE MÜNCHEN STUTTGART WIEN ZÜRICH
12c SQL Pattern Matching –
when will I use it?
Andrej Pashchenko
Senior Consultant
Trivadis GmbH

Our company.
12c SQL Pattern Matching – when will I use it? 19.11.2015
Trivadis is a leader in IT consulting, system integration, solution engineering and the delivery of IT services focusing on […] and […] technologies in Switzerland, Germany, Austria and Denmark. Trivadis delivers its services from the following strategic business areas:
Trivadis Services takes over the corresponding operation of your IT systems.
OPERATION

More than 600 IT and subject-matter experts on site with you.
14 Trivadis branches with more than 600 employees.
More than 200 service level agreements.
More than 4,000 training participants.
Research and development budget: CHF 5.0 million.
Financially independent and sustainably profitable.
Experience from more than 1,900 projects per year for more than 800 customers.

About me
Senior Consultant at Trivadis GmbH, Düsseldorf
Focus on Oracle
– Application Development
– Application Performance
– Data Warehousing
22 years of IT experience, 16 of them with Oracle DB
Course instructor for "Oracle 12c New Features für Entwickler"
and "Beyond SQL and PL/SQL"
Blog: http://blog.sqlora.com

Agenda
1. Introduction
2. Find consecutive ranges and gaps
3. Trouble Ticket roundtrip
4. Grouping on fuzzy criteria
5. Merge temporal intervals

Introduction

Introduction
Diagram: analytic SQL features in Oracle – analytic functions, analytic function enhancements, SQL MODEL clause, LISTAGG, NTH_VALUE, PIVOT/UNPIVOT clause, Pattern Matching, Top-N

Introduction
Oracle Database 12c supports SQL pattern matching with the new MATCH_RECOGNIZE clause
pattern matching in a sequence of rows
nothing to do with string patterns (the REGEXP_... functions)
it's a clause, not a function
placed after the table name in the FROM clause
patterns are expressed in regular expression syntax over pattern variables
pattern variables are defined as SQL expressions

Introduction
MATCH_RECOGNIZE
( [ PARTITION BY <cols> ]
[ ORDER BY <cols> ]
[ MEASURES <cols> ]
[ ONE ROW PER MATCH | ALL ROWS PER MATCH ]
[ AFTER MATCH SKIP <option> ]
PATTERN ( <row pattern> )
[ SUBSET <subset list> ]
DEFINE <definition list> )

Introduction
Example: find mappings in the ETL logging table that became faster with every run over a period of four days. Output: start and end date of the period, elapsed time at the beginning and at the end of the period, and the average elapsed time.

Introduction
SELECT etl_date, mapping_name, elapsed
FROM dwh_etl_runs;
...
04-NOV-14 MAP_STG_S_ORDER_ITEM +000000 00:14:54.42738
05-NOV-14 MAP_STG_S_ORDER +000000 00:10:13.44989
05-NOV-14 MAP_STG_S_ASSET +000000 00:14:15.22855
...

Introduction
SELECT *
FROM dwh_etl_runs MATCH_RECOGNIZE (
PARTITION BY mapping_name
ORDER BY etl_date
MEASURES FIRST (etl_date) AS start_date
, LAST (etl_date) AS end_date
, FIRST (elapsed) AS first_elapsed
, LAST (elapsed) AS last_elapsed
, AVG(elapsed) AS avg_elapsed
PATTERN (STRT DOWN{3})
DEFINE DOWN AS elapsed < PREV(elapsed) )
PARTITION BY / ORDER BY: partition and order, as for analytic functions
MEASURES: define measures, which are accessible in the main query
PATTERN: define the search pattern as a regular expression over boolean pattern variables
DEFINE: define the pattern variables
Navigation operators:
▪ PREV, NEXT – physical offset
▪ FIRST, LAST – logical offset
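To make the semantics of PATTERN (STRT DOWN{3}) concrete, here is a minimal Python sketch of what the query computes for one partition (one mapping). It assumes the elapsed times are plain numbers of seconds rather than Oracle INTERVAL values, uses hypothetical sample runs, and mimics the default AFTER MATCH SKIP PAST LAST ROW. STRT has no DEFINE entry, so it matches any row; DOWN compares a row with its physical predecessor, like PREV(elapsed).

```python
from statistics import mean


def find_speedups(runs, downs=3):
    """Sketch of PATTERN (STRT DOWN{3}) for ONE mapping (one partition).

    runs  -- list of (etl_date, elapsed_seconds), already ordered by etl_date
    downs -- number of DOWN rows required after the STRT row
    """
    matches = []
    s = 0
    while s + downs < len(runs):
        window = runs[s:s + downs + 1]  # STRT row followed by the DOWN rows
        # DOWN AS elapsed < PREV(elapsed): every row must beat its predecessor
        if all(window[k][1] < window[k - 1][1] for k in range(1, len(window))):
            elapsed = [e for _, e in window]
            matches.append({
                "start_date": window[0][0],    # FIRST(etl_date)
                "end_date": window[-1][0],     # LAST(etl_date)
                "first_elapsed": elapsed[0],   # FIRST(elapsed)
                "last_elapsed": elapsed[-1],   # LAST(elapsed)
                "avg_elapsed": mean(elapsed),  # AVG(elapsed)
            })
            s += downs + 1  # default AFTER MATCH SKIP PAST LAST ROW
        else:
            s += 1          # no match here: retry from the next row
    return matches


# Hypothetical runs of a single mapping (elapsed in seconds)
runs = [("01-NOV-14", 900), ("02-NOV-14", 880), ("03-NOV-14", 700),
        ("04-NOV-14", 650), ("05-NOV-14", 640), ("06-NOV-14", 700)]
print(find_speedups(runs))
```

The run on 05-NOV-14 is faster again, but it is not part of the match because DOWN{3} asks for exactly three DOWN rows and the search resumes after the last matched row.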

Introduction
PATTERN: a subset of the Perl syntax for regular expressions
– * : 0 or more iterations
– + : 1 or more iterations
– ? : 0 or 1 iterations
– {n} : exactly n iterations (n > 0)
– {n,} : n or more iterations (n >= 0)
– {n,m} : between n and m (inclusive) iterations (0 <= n <= m, 0 < m)
– {,m} : between 0 and m (inclusive) iterations (m > 0)
– ( ) : grouping
– | : alternation
– {- … -} : exclusion
– ^ : before the first row in the partition
– $ : after the last row in the partition
– ? after a quantifier (*?, +?, ??, {n,m}?) : "reluctant" instead of "greedy"
– …
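 
Python's re module uses the same Perl-style quantifiers, so a rough analogy can illustrate them: classify each row by whether DOWN holds, then run a regular expression over the resulting string. This is only an analogy under stated assumptions: MATCH_RECOGNIZE evaluates DEFINE conditions while matching, a row may satisfy several variables, and conditions can depend on the match so far (FIRST, aggregates), which a fixed letter per row cannot express. The sample values are hypothetical.

```python
import re


def classify(elapsed):
    """One letter per row: 'D' if DOWN holds (value < previous value), else 'x'.

    This shortcut only works because DOWN looks at nothing but the row
    and its physical predecessor (PREV).
    """
    return "".join(
        "D" if i > 0 and v < elapsed[i - 1] else "x"
        for i, v in enumerate(elapsed)
    )


rows = classify([15, 14, 12, 11, 13, 10, 9])  # hypothetical elapsed values

# STRT has no DEFINE, so it matches any row: '.' in the Python regex
exact = [m.span() for m in re.finditer(r".D{3}", rows)]     # {n}: exactly 3 DOWN rows
ranged = [m.span() for m in re.finditer(r".D{1,2}", rows)]  # {n,m}: 1 to 2 DOWN rows
greedy = re.match(r".D+", rows).group()                     # +: as many rows as possible
reluctant = re.match(r".D+?", rows).group()                 # +?: as few rows as possible

print(rows, exact, ranged, greedy, reluctant)
```

The greedy and the reluctant quantifier both succeed on the same rows, but they consume different numbers of them, which changes the match boundaries and therefore the measures.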

Introduction
Patterns are everywhere
Financial, Telcos, Retail, Traffic, Automotive, Transport / Logistics
Fraud Detection, Quality of Service, Trouble Ticketing, Price Trends, Buying Patterns, Stock Market, Money Laundering, Sensor Data, Network Activity, Advertising Campaigns, Sessionization, Frequent Flyer Programs, Process Chain, CRM

Introduction
Before 12c, SQL had no efficient way to handle such questions
pre-12c solutions
self-joins, subqueries, (NOT) IN, (NOT) EXISTS
switch to PL/SQL: "do it yourself", often with multiple SQL queries
move some of the logic into pipelined table functions and integrate them into the main query
analytic (window) functions
– ORA-30483: window functions are not allowed here
– cannot be used in the WHERE clause
– cannot be nested
– cannot access the output of analytic functions in other rows
– often lead to nested queries, self-joins, etc.


Find consecutive ranges and gaps

Find Consecutive Ranges / Gaps
SLA, QoS: find the longest period without outage
Table T_GAPS
Find consecutive ranges in the values of column ID
Output: start and end ID of each consecutive range
ID: 1, 2, 3, 5, 6, 10, 11, 12, 14, 20, 21, …
mr_consecutive.sql
Start of Range   End of Range
1                3
5                6
10               12

Pre-12c solution using analytic functions
WITH groups_marked AS (
-- new_grp = 1 if id does not directly follow the previous id (also 1 for the first row)
SELECT id
, CASE
WHEN id != LAG(id,1,id) OVER(ORDER BY id) + 1 THEN 1
ELSE 0
END new_grp
FROM t_gaps)
, sum_grp AS (
-- the running sum of the flags numbers the groups
SELECT id, SUM(new_grp) OVER(ORDER BY id) grp_sum
FROM groups_marked )
SELECT MIN(id) start_of_range
, MAX(id) end_of_range
FROM sum_grp
GROUP BY grp_sum
ORDER BY grp_sum;
mr_consecutive.sql
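The same two-step logic (flag a new group, then number the groups with a running sum) can be sketched in a few lines of Python. This is an illustration of the idea, not how Oracle executes the query; it uses the ID values visible on the slide.

```python
def ranges_lag_sum(ids):
    """Emulate the LAG / running SUM query on a list of unique ids."""
    ids = sorted(ids)                            # ORDER BY id
    groups = {}
    grp_sum = 0
    for i, cur in enumerate(ids):
        prev = ids[i - 1] if i > 0 else cur      # LAG(id, 1, id)
        new_grp = 1 if cur != prev + 1 else 0    # CASE WHEN ... THEN 1 ELSE 0 END
        grp_sum += new_grp                       # SUM(new_grp) OVER (ORDER BY id)
        groups.setdefault(grp_sum, []).append(cur)
    # MIN(id), MAX(id) ... GROUP BY grp_sum ORDER BY grp_sum
    return [(min(g), max(g)) for _, g in sorted(groups.items())]


ids = [1, 2, 3, 5, 6, 10, 11, 12, 14, 20, 21]  # visible values of T_GAPS.ID
print(ranges_lag_sum(ids))
```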

"Tabibitosan" method*
* - https://community.oracle.com/message/3991177#3991177
SELECT MIN(id) start_of_range
, MAX(id) end_of_range
FROM (SELECT id
, id - ROW_NUMBER() OVER(ORDER BY id) distance
FROM t_gaps)
GROUP BY distance
ORDER BY distance;
mr_consecutive.sql
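The Tabibitosan trick works because, inside a run of consecutive IDs, id and ROW_NUMBER() grow in lockstep, so their difference stays constant and can serve as a group key. A minimal Python sketch of that idea, using the ID values visible on the slide:

```python
from itertools import groupby


def ranges_tabibitosan(ids):
    """Group sorted unique ids by id - ROW_NUMBER() OVER (ORDER BY id)."""
    ids = sorted(ids)
    keyed = [(cur - rn, cur) for rn, cur in enumerate(ids, start=1)]  # distance
    result = []
    # distance never decreases for sorted unique ids, so grouping adjacent
    # keys gives the same groups as GROUP BY distance ORDER BY distance
    for _, grp in groupby(keyed, key=lambda t: t[0]):
        members = [cur for _, cur in grp]
        result.append((min(members), max(members)))
    return result


ids = [1, 2, 3, 5, 6, 10, 11, 12, 14, 20, 21]  # visible values of T_GAPS.ID
print(ranges_tabibitosan(ids))
```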

12c solution with MATCH_RECOGNIZE
SELECT *
FROM t_gaps MATCH_RECOGNIZE (
ORDER BY id
MEASURES FIRST(id) start_of_range
, LAST(id) end_of_range
, COUNT(*) cnt
ONE ROW PER MATCH
PATTERN (strt cont*)
DEFINE cont AS id = PREV(id)+1
);
mr_consecutive.sql
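Row by row, PATTERN (strt cont*) behaves like a small state machine: any row opens a match, and cont extends it greedily while id = PREV(id) + 1. A minimal Python sketch, assuming the default AFTER MATCH SKIP PAST LAST ROW and the ID values visible on the slide:

```python
def ranges_match_recognize(ids):
    """Emulate PATTERN (strt cont*), DEFINE cont AS id = PREV(id) + 1."""
    ids = sorted(ids)                   # ORDER BY id
    out = []
    i = 0
    while i < len(ids):
        start = i                       # strt: undefined, so any row starts a match
        while i + 1 < len(ids) and ids[i + 1] == ids[i] + 1:
            i += 1                      # cont*: extend greedily
        # ONE ROW PER MATCH with FIRST(id), LAST(id), COUNT(*)
        out.append((ids[start], ids[i], i - start + 1))
        i += 1                          # resume after the last row of the match
    return out


ids = [1, 2, 3, 5, 6, 10, 11, 12, 14, 20, 21]  # visible values of T_GAPS.ID
print(ranges_match_recognize(ids))
```

Because cont* also accepts zero rows, an isolated ID such as 14 forms a match of its own with cnt = 1.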

Table T_GAPS, numeric column ID with gaps
Find the gaps in the values of column ID
Output: start and end ID of each gap
ID: 1, 2, 3, 5, 6, 10, 11, 12, 14, 20, 21, …
mr_gaps.sql
Start of Gap   End of Gap
4              4
7              9
13             13
15             19

Solutions with analytic functions: LEAD and the "Tabibitosan" method*
* - https://community.oracle.com/message/3991177#3991177
mr_gaps.sql
SELECT start_of_gap, end_of_gap
FROM ( SELECT id + 1 start_of_gap
, LEAD(id) OVER(ORDER BY id) - 1 end_of_gap
, CASE
WHEN id + 1 != LEAD(id) OVER(ORDER BY id) THEN 1
ELSE 0
END is_gap
FROM t_gaps)
WHERE is_gap = 1;
-- alternative with the "Tabibitosan" method; the last group yields end_of_gap = NULL
SELECT MAX(id) + 1 start_of_gap
, LEAD(MIN(id)) OVER (ORDER BY distance) -1 end_of_gap
FROM (SELECT id
, id - ROW_NUMBER() OVER(ORDER BY id) distance
FROM t_gaps)
GROUP BY distance;
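The LEAD-based query pairs every row with its successor and reports a gap whenever the successor is not id + 1. A minimal Python sketch of that pairing, using the ID values visible on the slide:

```python
def gaps_lead(ids):
    """Emulate the LEAD query: compare every id with the next one."""
    ids = sorted(ids)
    gaps = []
    # zip stops before the last row, just as LEAD(id) is NULL there and the
    # CASE expression falls through to 0
    for cur, nxt in zip(ids, ids[1:]):       # nxt plays the role of LEAD(id)
        if cur + 1 != nxt:                   # is_gap = 1
            gaps.append((cur + 1, nxt - 1))  # start_of_gap, end_of_gap
    return gaps


ids = [1, 2, 3, 5, 6, 10, 11, 12, 14, 20, 21]  # visible values of T_GAPS.ID
print(gaps_lead(ids))
```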

12c solution with MATCH_RECOGNIZE
mr_gaps.sql
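SELECT *
FROM t_gaps MATCH_RECOGNIZE (
ORDER BY id
MEASURES PREV(gap.id)+1 start_of_gap
, gap.id - 1 end_of_gap
ONE ROW PER MATCH
-- one gap row per match; resume at that row so it can start the next match
-- (with a greedy gap+, adjacent gaps such as 13 and 15-19 merge into one match
-- and only the last of them is reported)
AFTER MATCH SKIP TO LAST gap
PATTERN (strt gap)
DEFINE gap AS id != PREV(id)+1
);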
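How the skip behavior and the quantifier interact is easiest to see on the sample data, where the gap rows 14 and 20 follow each other. The following Python sketch emulates two variants under our reading of the semantics (an assumption, not Oracle's implementation): a greedy strt gap+ with AFTER MATCH SKIP PAST LAST ROW, and strt gap with AFTER MATCH SKIP TO LAST gap. In both cases the measures are evaluated on the last row mapped to gap, as with ONE ROW PER MATCH.

```python
def gaps_pattern(ids, greedy_gap_plus):
    """Sketch of a MATCH_RECOGNIZE gap search over unique ids.

    greedy_gap_plus=True : PATTERN (strt gap+), AFTER MATCH SKIP PAST LAST ROW
    greedy_gap_plus=False: PATTERN (strt gap),  AFTER MATCH SKIP TO LAST gap
    gap is DEFINEd as id != PREV(id) + 1.
    """
    ids = sorted(ids)

    def is_gap(k):                      # only called for k >= 1
        return ids[k] != ids[k - 1] + 1

    out, s = [], 0
    while s + 1 < len(ids):
        if not is_gap(s + 1):
            s += 1                      # no gap row after strt: no match here
            continue
        last = s + 1                    # last row mapped to gap
        if greedy_gap_plus:
            while last + 1 < len(ids) and is_gap(last + 1):
                last += 1               # gap+ swallows every following gap row
        # PREV(gap.id) + 1 and gap.id - 1, evaluated on the last gap row
        out.append((ids[last - 1] + 1, ids[last] - 1))
        s = last + 1 if greedy_gap_plus else last
    return out


ids = [1, 2, 3, 5, 6, 10, 11, 12, 14, 20, 21]  # visible values of T_GAPS.ID
print(gaps_pattern(ids, greedy_gap_plus=True))   # gap 13..13 is swallowed
print(gaps_pattern(ids, greedy_gap_plus=False))  # every gap is reported
```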


Trouble Ticket roundtrip

Trouble Ticket Roundtrip
ID   Assignee   Date
1    SCOTT      01.02.2015
1    SCOTT      02.02.2015
1    ADAMS      03.02.2015
1    SCOTT      04.02.2015
2    ADAMS      01.02.2015
2    ADAMS      02.02.2015
2    SCOTT      03.02.2015
3    KING       01.02.2015
3    ADAMS      02.02.2015
3    ADAMS      03.02.2015
3    KING       04.02.2015
3    ADAMS      05.02.2015
4    KING       01.02.2015
4    ADAMS      02.02.2015
4    SCOTT      03.02.2015
4    KING       05.02.2015
▪ Find the tickets that went back to the same assignee after having been assigned to someone else

Pre-12c solution using self-joins
mr_trouble_ticket.sql
SELECT DISTINCT t1.ticket_id
, t1.assignee AS first_assignee
, t3.change_date AS last_change
FROM trouble_ticket t1
, trouble_ticket t2
, trouble_ticket t3
WHERE t1.ticket_id = t2.ticket_id
AND t1.assignee != t2.assignee
AND t2.change_date > t1.change_date
AND t3.assignee = t1.assignee
AND t3.ticket_id = t1.ticket_id
AND t3.change_date > t2.change_date
ORDER BY ticket_id

12c solution using the MATCH_RECOGNIZE clause
New:
– AFTER MATCH SKIP (row pattern skip-to): where to start over after a match?
– allows overlapping matches
mr_trouble_ticket.sql
SELECT *
FROM trouble_ticket
MATCH_RECOGNIZE(
PARTITION BY ticket_id
ORDER BY change_date
MEASURES strt.assignee as first_assignee
, LAST(same.change_date) as letzte_bearbeitung
AFTER MATCH SKIP TO FIRST another
PATTERN (strt another+ same+)
DEFINE same AS same.assignee = strt.assignee,
another AS another.assignee != strt.assignee
);
Where to start over after a
match is found?
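For one ticket (one partition), PATTERN (strt another+ same+) can be sketched in Python as two greedy runs: rows with a different assignee, followed by rows with the first assignee again. Because another and same are mutually exclusive, no backtracking is needed. AFTER MATCH SKIP TO FIRST another resumes at the row right after strt, which is also where the search continues when no match is found. This is an illustrative sketch using the slide's sample data, not Oracle's algorithm.

```python
def roundtrips(rows):
    """Sketch of PATTERN (strt another+ same+), AFTER MATCH SKIP TO FIRST another.

    rows -- list of (assignee, change_date) for ONE ticket, ordered by change_date.
    Returns (first_assignee, last change_date of the 'same' run) per match.
    """
    out = []
    for s in range(len(rows)):
        first = rows[s][0]                       # strt.assignee
        i = s + 1
        while i < len(rows) and rows[i][0] != first:
            i += 1                               # another+: run of other assignees
        j = i
        while j < len(rows) and rows[j][0] == first:
            j += 1                               # same+: run of the first assignee
        if i > s + 1 and j > i:                  # both runs need at least one row
            out.append((first, rows[j - 1][1]))  # LAST(same.change_date)
        # matched or not, the next attempt starts at row s + 1
    return out


tickets = {  # sample data from the slide, PARTITION BY ticket_id
    1: [("SCOTT", "01.02.2015"), ("SCOTT", "02.02.2015"),
        ("ADAMS", "03.02.2015"), ("SCOTT", "04.02.2015")],
    2: [("ADAMS", "01.02.2015"), ("ADAMS", "02.02.2015"), ("SCOTT", "03.02.2015")],
    3: [("KING", "01.02.2015"), ("ADAMS", "02.02.2015"), ("ADAMS", "03.02.2015"),
        ("KING", "04.02.2015"), ("ADAMS", "05.02.2015")],
    4: [("KING", "01.02.2015"), ("ADAMS", "02.02.2015"),
        ("SCOTT", "03.02.2015"), ("KING", "05.02.2015")],
}
for ticket_id, ticket_rows in tickets.items():
    print(ticket_id, roundtrips(ticket_rows))
```

Ticket 3 yields two overlapping matches (KING and ADAMS), which is exactly what the skip-to option makes possible.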


Grouping on fuzzy criteria

Grouping on fuzzy criteria
"Sessionization"
– Group rows together where the gap between consecutive timestamps does not exceed a defined threshold (here 10 minutes = 10/24/60 days)
...
PATTERN (STRT SESS+)
DEFINE SESS AS SESS.ins_date - PREV(SESS.ins_date) <= 10/24/60
– Group rows together that lie within a defined interval relative to the first row of the group (here 6 hours = 6/24 days); otherwise start the next group
https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:13946369553642#3478381500346951056
...
PATTERN (A+)
DEFINE A AS ins_date < FIRST(ins_date) + 6/24
Group over running totals
– Split the data into groups of a defined capacity
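The first two groupings differ only in what a new row is compared with: the previous row (PREV) for sessionization, the first row of the group (FIRST) for fixed windows. A minimal Python sketch of both, assuming the default AFTER MATCH SKIP PAST LAST ROW and hypothetical timestamps; Oracle's date arithmetic in days (10/24/60, 6/24) is replaced by timedelta values.

```python
from datetime import datetime, timedelta


def sessions(stamps, max_gap=timedelta(minutes=10)):
    """PATTERN (STRT SESS+), SESS AS ins_date - PREV(ins_date) <= 10/24/60.

    Only runs of at least two rows are returned, because SESS+ needs one SESS row.
    """
    stamps = sorted(stamps)
    out, i = [], 0
    while i < len(stamps):
        j = i
        while j + 1 < len(stamps) and stamps[j + 1] - stamps[j] <= max_gap:
            j += 1                      # compare with the PREVious row
        if j > i:
            out.append((stamps[i], stamps[j]))
        i = j + 1                       # past the match, or simply the next row
    return out


def fixed_windows(stamps, width=timedelta(hours=6)):
    """PATTERN (A+), A AS ins_date < FIRST(ins_date) + 6/24."""
    stamps = sorted(stamps)
    out, i = [], 0
    while i < len(stamps):
        j = i                           # the first row always satisfies A
        while j + 1 < len(stamps) and stamps[j + 1] < stamps[i] + width:
            j += 1                      # compare with the FIRST row of the group
        out.append((stamps[i], stamps[j]))
        i = j + 1
    return out


base = datetime(2015, 11, 19, 8, 0)     # hypothetical insert timestamps
stamps = [base, base + timedelta(minutes=5), base + timedelta(minutes=12),
          base + timedelta(minutes=40), base + timedelta(hours=7)]
print(sessions(stamps))
print(fixed_windows(stamps))
```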

Example schema SH (Sales History)
Task: split the data into groups of fixed capacity
▪ Fit all customers, ordered by age, into groups such that the total sales of every group are at most $200,000

12c solution with the MATCH_RECOGNIZE clause
mr_group_running_total.sql
WITH q AS (SELECT c.cust_id, c.cust_year_of_birth
, SUM(s.amount_sold) cust_amount_sold
FROM customers c JOIN sales s ON s.cust_id = c.cust_id
GROUP BY c.cust_id, c.cust_year_of_birth
)
SELECT *
FROM q
MATCH_RECOGNIZE(
ORDER BY cust_year_of_birth
MEASURES MATCH_NUMBER() gruppe
, SUM(cust_amount_sold) running_sum
, FINAL SUM(cust_amount_sold) final_sum
ALL ROWS PER MATCH
PATTERN (gr*)
DEFINE gr AS SUM(cust_amount_sold)<=200000
);
ALL ROWS PER MATCH: we need all rows of all matches
DEFINE: aggregate function in the pattern variable's condition
MATCH_NUMBER(): function that returns the match number
Aggregates in MEASURES: RUNNING vs. FINAL
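The effect of the aggregate in DEFINE can be sketched in Python: a row joins the current match only while the running sum, including that row, stays within the capacity; otherwise the match ends and the next one starts with that row. The sketch also reproduces MATCH_NUMBER() and the RUNNING vs. FINAL sums. The amounts are hypothetical, and rejecting a single amount above the capacity is a simplification of this sketch, not the behavior of the SQL query.

```python
def capacity_groups(amounts, cap=200_000):
    """Sketch of PATTERN (gr*), DEFINE gr AS SUM(cust_amount_sold) <= 200000,
    ALL ROWS PER MATCH. amounts must already be ordered (here: by year of birth).
    Returns one (match_number, running_sum, final_sum) tuple per row.
    """
    groups, current, total = [], [], 0
    for a in amounts:
        if a > cap:
            raise ValueError(f"amount {a} exceeds capacity {cap}")  # sketch only
        if total + a > cap:           # the running SUM would break the condition,
            groups.append(current)    # so the current match ends here
            current, total = [], 0    # and the next match starts with this row
        current.append(a)
        total += a
    if current:
        groups.append(current)

    rows = []
    for number, grp in enumerate(groups, start=1):   # MATCH_NUMBER()
        running = 0
        for a in grp:
            running += a                             # RUNNING SUM (default)
            rows.append((number, running, sum(grp)))  # FINAL SUM
    return rows


amounts = [50_000, 120_000, 40_000, 90_000, 150_000]  # hypothetical sums per customer
for row in capacity_groups(amounts):
    print(row)
```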


Merge temporal intervals

Temporal version of the SCOTT schema: the data in EMP, DEPT and
JOB have temporal validity (VALID_FROM - VALID_TO)

Task: query the data for one employee by joining four tables with
respect to temporal validity:

WITH joined AS (
SELECT e.empno,
g.valid_from,
LEAST( e.valid_to, d.valid_to, j.valid_to,
NVL(m.valid_to, e.valid_to),
LEAD(g.valid_from - 1, 1, e.valid_to) OVER(
PARTITION BY e.empno ORDER BY g.valid_from )
) AS valid_to,
e.ename, j.job, e.mgr, m.ename AS mgr_ename, e.hiredate,
e.sal, e.comm, e.deptno, d.dname
FROM empv e
INNER JOIN (SELECT valid_from FROM empv
UNION
SELECT valid_from FROM deptv
UNION
SELECT valid_from FROM jobv
UNION
SELECT valid_to + 1 FROM empv
WHERE valid_to != DATE '9999-12-31'
UNION
SELECT valid_to + 1 FROM deptv
WHERE valid_to != DATE '9999-12-31'
UNION
SELECT valid_to + 1 FROM jobv
WHERE valid_to != DATE '9999-12-31') g
ON g.valid_from BETWEEN e.valid_from AND e.valid_to
INNER JOIN deptv d
ON d.deptno = e.deptno AND g.valid_from BETWEEN d.valid_from AND d.valid_to
INNER JOIN jobv j
ON j.jobno = e.jobno AND g.valid_from BETWEEN j.valid_from AND j.valid_to
LEFT JOIN empv m
ON m.empno = e.mgr AND g.valid_from BETWEEN m.valid_from AND m.valid_to )
...
Source: Philipp Salvisberg:
http://www.salvis.com/blog/2012/12/28/joining-temporal-intervals-part-2/

...
SELECT empno, valid_from, valid_to, ename, job, mgr,
mgr_ename, hiredate, sal, comm, deptno, dname
FROM joined
MATCH_RECOGNIZE (
PARTITION BY empno, ename, job, mgr,
mgr_ename, hiredate, sal, comm,
deptno, dname
ORDER BY valid_from
MEASURES FIRST(valid_from) valid_from,
LAST(valid_to) valid_to
PATTERN ( strt nxt* )
DEFINE nxt as valid_from = prev(valid_to) + 1
)
WHERE empno = 7788;
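The merge step collapses adjacent intervals whose attributes are identical: within a partition, a row continues the match when its valid_from is the day after the previous row's valid_to. A minimal Python sketch with hypothetical rows; it assumes inclusive dates and comparable attribute values (Python cannot sort a mix of None and numbers, unlike SQL with NULLs).

```python
from datetime import date, timedelta
from itertools import groupby


def merge_intervals(rows):
    """Sketch of PATTERN (strt nxt*), DEFINE nxt AS valid_from = PREV(valid_to) + 1.

    rows -- (attrs, valid_from, valid_to) tuples with inclusive dates; attrs is a
            tuple of the PARTITION BY columns (empno, ename, job, ...).
    """
    merged = []
    rows = sorted(rows, key=lambda r: (r[0], r[1]))     # PARTITION BY ... ORDER BY valid_from
    for attrs, part in groupby(rows, key=lambda r: r[0]):
        cur_from = cur_to = None
        for _, valid_from, valid_to in part:
            # subtract from valid_from instead of adding to valid_to, so that an
            # open end such as 9999-12-31 never overflows
            if cur_to is not None and valid_from - timedelta(days=1) == cur_to:
                cur_to = valid_to                        # nxt row: extend the match
            else:
                if cur_to is not None:
                    merged.append((attrs, cur_from, cur_to))
                cur_from, cur_to = valid_from, valid_to  # strt row: new match
        merged.append((attrs, cur_from, cur_to))         # FIRST(valid_from), LAST(valid_to)
    return merged


analyst, manager = ("SCOTT", "ANALYST"), ("SCOTT", "MANAGER")  # hypothetical attributes
rows = [(analyst, date(2015, 1, 1), date(2015, 3, 31)),
        (analyst, date(2015, 4, 1), date(2015, 6, 30)),
        (manager, date(2015, 7, 1), date(2015, 7, 31)),
        (analyst, date(2015, 8, 1), date(9999, 12, 31))]
for r in merge_intervals(rows):
    print(r)
```

Because identical attributes go into PARTITION BY, the 2015-08-01 row starts a new match: it has the same attributes as the first two rows but does not directly follow them.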

Conclusion
Very powerful feature
Significantly simplifies many queries (self-joins, semi-joins, anti-joins, nested queries),
in many cases with a performance benefit
Proposed for the ANSI SQL standard since 2007
Requires thinking in patterns
Complicated syntax (at first sight)
But in many cases the code reads like the requirement in "plain English"

Further information...
Database Data Warehousing Guide - SQL for Pattern Matching -
http://docs.oracle.com/database/121/DWHSG/pattern.htm#DWHSG8956
Stewart Ashton's blog - https://stewashton.wordpress.com
Oracle Whitepaper - Patterns everywhere - Find them Fast! -
http://www.oracle.com/ocom/groups/public/@otn/documents/webcontent/1965433.pdf

Trivadis at DOAG 2015
Level 3, right next to the escalator
We look forward to your visit.
Because with Trivadis, you always win.
