# Ranges, ranges everywhere (Oracle SQL)

Presentation at #UKOUG_Tech16 and #DOAG2016 on ranges (dates and otherwise) in tables.

1. Ranges, Ranges Everywhere! Stew Ashton (stewashton.wordpress.com), UKOUG Tech 2016. Can you read the following line? If not, please move closer. It's much better when you can read the code ;)
2. Agenda
   - Defining ranges
   - Relating ranges: gaps, overlaps
   - Range DDL: sensible data
   - Ranges in one table
   - Ranges in two tables
3. Who am I?
   - 36 years in IT
     - Developer, Technical Sales Engineer, Technical Architect
     - Aeronautics, IBM, Finance
     - Mainframe, client-server, Web apps
   - 12 years using Oracle database
     - SQL performance analysis
     - Replace Java with SQL
   - 4 years as in-house "Oracle Development Expert"
   - Conference speaker since 2014
   - Currently independent
4. Questions
5. What is a range?
   - Two values that can be compared
     - Always use the same datatype
     - Comparable datatypes:
       - integer, date (without time)
       - number, datetime, interval, (n)(var)char
       - rowid
   - Range design questions:
     - Is the "end" value part of the range?
     - Are NULLs allowed?
6. Allen’s Interval Algebra: the 13 possible relations between two ranges A and B, drawn on a scale from 1 to 4.

   | Relation | Inverse | A | B | Category |
   |---|---|---|---|---|
   | A precedes B | B preceded by A | 1-2 | 3-4 | Gap |
   | A meets B | B met by A | 1-2 | 2-3 | Meet |
   | A overlaps B | B overlapped by A | 1-3 | 2-4 | "Overlap" |
   | A finished by B | B finishes A | 1-3 | 2-3 | "Overlap" |
   | A contains B | B during A | 1-4 | 2-3 | "Overlap" |
   | A starts B | B started by A | 1-2 | 1-3 | "Overlap" |
   | A and B are equal | | 1-2 | 1-2 | "Overlap" |

7. End value: Inclusive or Exclusive
   - Design must allow ranges to "meet"
   - Discrete quantities can be inclusive
     - [1-3] meets [4-6]: no intermediate integer
     - [Jan. 1-31] meets [Feb. 1-28]: no intermediate date
   - Continuous quantities require exclusive
     - Most ranges are continuous (including dates, really)
8. Votes for Exclusive end values
   - SQL:2011 and Oracle 12c Temporal Validity
     - "Period": date/time range
       - [Closed-Open): includes start time but not end time
   - WIDTH_BUCKET() function
     - Puts values in equiwidth histogram
     - Buckets must touch
     - [Closed-Open): upper boundary value goes in higher bucket
   - Me!
     - Exclusive end values work for every kind of range
     - Except: ROWID ranges must be inclusive
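
The closed-open behaviour of WIDTH_BUCKET can be checked with a small query. This is only a sketch with made-up values: 5 buckets over the interval from 0 to 10, so each bucket is 2 wide.

```sql
-- Illustrative values only: 5 equal-width buckets covering [0, 10).
-- sys.odcinumberlist is used here just to supply a few test numbers.
select column_value                          as val,
       width_bucket(column_value, 0, 10, 5) as bucket
from   table(sys.odcinumberlist(0, 1.99, 2, 9.99, 10))
order  by val;
```

The value 2 is the upper boundary of the first bucket, so it should land in bucket 2; the value 10 is outside the last bucket and should land in the overflow bucket 6.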
9. DDL: make sure data is sensible
   - Start_range < End_range
   - If date without time, CHECK( dte = trunc(dte))
   - If integer, say so
   - Is NULL allowed?
     - If so, what does it mean?
     - Ex. Temporal Validity: NULL end value means "until the end of time"
   - Are overlaps allowed?
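
The checks listed above could be declared roughly as follows. This is a minimal sketch: the table and constraint names are hypothetical, and it assumes date ranges without a time component where a NULL end means "until the end of time". It does not prevent overlaps.

```sql
-- Hypothetical table; names are illustrative, not from the presentation.
create table room_booking (
  room_id    integer not null,
  start_date date    not null,
  end_date   date,   -- assumption: NULL means "until the end of time"
  -- dates must not carry a time component
  constraint room_booking_start_no_time check (start_date = trunc(start_date)),
  constraint room_booking_end_no_time   check (end_date = trunc(end_date)),
  -- the range must not be empty or reversed;
  -- a NULL end_date makes the condition unknown, which a CHECK accepts
  constraint room_booking_start_lt_end  check (start_date < end_date)
);
```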
10. Overlaps avoided by unique constraints

    | Relation | A | B | Unique(start,end) | Unique(start) | Unique(end) |
    |---|---|---|---|---|---|
    | A overlaps B / B overlapped by A | 1-3 | 2-4 | | | |
    | A finished by B / B finishes A | 1-3 | 2-3 | | | Y |
    | A contains B / B during A | 1-4 | 2-3 | | | |
    | A starts B / B started by A | 1-2 | 1-3 | | Y | |
    | A and B are equal | 1-2 | 1-2 | Y | Y | Y |

    No constraint works for "overlaps" or "contains".

11. Avoiding Overlaps: 3 solutions
    1. Triggers
       - Hard to do right, not very scalable
    2. "Refresh on commit" materialized views
       - Not scalable?
    3. Virtual ranges
12. Virtual range: no gaps, no overlaps
    - One column: start value
    - End value is calculated: = next row's start
      - Putting identical value in 2 rows is denormalization
    - Last row has unlimited end
    - Maybe OK for audit trails?

    Physical (table):

    | START_VALUE |
    |---|
    | 16-11-15 08:30 |
    | 16-11-15 09:30 |
    | 16-11-15 18:30 |

    Virtual (view):

    | START_VALUE | END_VALUE |
    |---|---|
    | 16-11-15 08:30 | 16-11-15 09:30 |
    | 16-11-15 09:30 | 16-11-15 18:30 |
    | 16-11-15 18:30 | (null) |

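
A virtual range of this kind could be built with the LEAD analytic function. The sketch below uses hypothetical table and view names; only the start value is stored, and the view derives the end.

```sql
-- Hypothetical names. Only the start value is stored,
-- so neither gaps nor overlaps can be entered.
create table audit_trail (
  start_value date not null primary key
);

-- The end value is derived: it is the next row's start.
-- The last row gets a NULL end, read here as "unlimited".
create or replace view audit_trail_ranges as
select start_value,
       lead(start_value) over (order by start_value) as end_value
from   audit_trail;
```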
13. Semi-Virtual range: no overlaps
    - Start column always used
    - End column optional:
      - If null, use next row's start
      - If not null, use lesser of end column and next row's start
      - Last row can have limited end
    - Or: intermediate row with 'not exists' flag
      - ≅ Change Data Capture format

    With an optional end column:

    | START_VALUE | END_VALUE |
    |---|---|
    | 16-11-15 08:30 | 16-11-15 09:30 |
    | 16-11-15 18:30 | (null) |

    With a 'not exists' flag:

    | START_VALUE | D |
    |---|---|
    | 16-11-15 08:30 | |
    | 16-11-15 09:30 | D |
    | 16-11-15 18:30 | |

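
The rules for the optional end column can be sketched as a view. Table and view names are hypothetical. A CASE expression is used because LEAST returns NULL as soon as one argument is NULL.

```sql
-- Hypothetical names; the end column is optional.
create table semi_virtual (
  start_value date not null primary key,
  end_value   date,
  constraint semi_virtual_start_lt_end check (start_value < end_value)
);

create or replace view semi_virtual_ranges as
select start_value,
       case
         when end_value  is null then next_start  -- no end: use next row's start
         when next_start is null then end_value   -- last row: keep its own end
         else least(end_value, next_start)        -- never run past the next start
       end as end_value
from (
  select start_value,
         end_value,
         lead(start_value) over (order by start_value) as next_start
  from   semi_virtual
);
```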
14. Range-related SQL
    - Why hard?
      - Can't use BETWEEN
      - Inequality joins impact performance
      - With overlaps, 1 value point can be in any number of rows
      - Joining 2 tables with overlaps -> row explosion
      - NULLs have special meanings
    - Common problems
      - Find gaps
      - Intersect: find overlaps
      - Union: packing ranges between gaps
      - Joins
    - Today, ends are exclusive, everything is NOT NULL (unless specified)
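
To illustrate why BETWEEN does not fit exclusive end values, here is a sketch against a hypothetical table `my_ranges(start_n, end_n)` whose columns are NOT NULL. BETWEEN is inclusive at both ends, so the comparisons have to be written out.

```sql
-- Hypothetical table my_ranges(start_n, end_n), exclusive NOT NULL ends.

-- Which ranges contain the value 5?
-- "5 between start_n and end_n" would wrongly include a range that ends at 5.
select *
from   my_ranges
where  start_n <= 5
and    5 < end_n;

-- Which pairs of ranges overlap?
-- Two closed-open ranges overlap when each one starts before the other ends.
-- The rowid comparison just avoids pairing a row with itself
-- and listing each pair twice.
select a.start_n, a.end_n,
       b.start_n as b_start_n, b.end_n as b_end_n
from   my_ranges a
join   my_ranges b
  on   a.start_n < b.end_n
 and   b.start_n < a.end_n
 and   a.rowid   < b.rowid;
```

The second query is an inequality join, which is the kind of join the slide warns about for performance.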
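16. Gaps, ex. free time in calendar

    Table `t`:

    | FROM_TM | TO_TM |
    |---|---|
    | 07:00 | 08:00 |
    | 09:00 | 10:50 |
    | 10:00 | 10:45 |
    | 12:00 | 12:45 |
    | 18:00 | 23:00 |

    Comparing each row's end with the next row's start:

        select to_tm as gap_from,
               lead(from_tm) over(order by from_tm) as gap_to
        from t

    | FROM_TM | GAP_FROM | GAP_TO |
    |---|---|---|
    | 07:00 | 08:00 | 09:00 |
    | 09:00 | 10:50 | 10:00 |
    | 10:00 | 10:45 | 12:00 |
    | 12:00 | 12:45 | 18:00 |
    | 18:00 | 23:00 | |

    Using the running maximum of the end values instead:

    | FROM_TM | GAP_FROM | GAP_TO |
    |---|---|---|
    | 07:00 | 08:00 | 09:00 |
    | 09:00 | 10:50 | 10:00 |
    | 10:00 | 10:50 | 12:00 |
    | 12:00 | 12:45 | 18:00 |
    | 18:00 | 23:00 | |

    Keeping only the rows that are real gaps:

        select * from (
          select max(to_tm) over(order by from_tm) as gap_from,
                 lead(from_tm) over(order by from_tm) as gap_to
          from t
        )
        where gap_from < gap_to;

    | GAP_FROM | GAP_TO |
    |---|---|
    | 08:00 | 09:00 |
    | 10:50 | 12:00 |
    | 12:45 | 18:00 |
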
17. Intersect: finding Overlaps

    Table `t`:

    | Test case | Start | End |
    |---|---|---|
    | 01:precedes | 1 | 2 |
    | 01:precedes | 3 | 4 |
    | 02:meets | 1 | 2 |
    | 02:meets | 2 | 3 |
    | 03:overlaps | 1 | 3 |
    | 03:overlaps | 2 | 4 |
    | 04:finished by | 1 | 3 |
    | 04:finished by | 2 | 3 |
    | 05:contains | 1 | 4 |
    | 05:contains | 2 | 3 |
    | 06:starts | 1 | 2 |
    | 06:starts | 1 | 3 |
    | 07:equals | 1 | 2 |
    | 07:equals | 1 | 2 |

        select test_case, dte, col
        from t
        unpivot (dte for col in (start_date as 1, end_date as -1))

    Example: "A overlaps B" with A = 1-3 and B = 2-4 is split at every boundary into the segments 1-2, 2-3 and 3-4.

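
To try the overlap queries, the test cases need to exist as a table. The statement below is one possible setup, not taken from the slides: it assumes the table is called `t` with columns `test_case`, `start_date` and `end_date` (the names used in the queries), and it stores the boundaries as the small integers shown in the test-case table rather than as real dates.

```sql
-- Assumed setup: column names follow the queries on the slides,
-- values follow the test-case table (integers standing in for dates).
create table t (test_case, start_date, end_date) as
select '01:precedes',    1, 2 from dual union all
select '01:precedes',    3, 4 from dual union all
select '02:meets',       1, 2 from dual union all
select '02:meets',       2, 3 from dual union all
select '03:overlaps',    1, 3 from dual union all
select '03:overlaps',    2, 4 from dual union all
select '04:finished by', 1, 3 from dual union all
select '04:finished by', 2, 3 from dual union all
select '05:contains',    1, 4 from dual union all
select '05:contains',    2, 3 from dual union all
select '06:starts',      1, 2 from dual union all
select '06:starts',      1, 3 from dual union all
select '07:equals',      1, 2 from dual union all
select '07:equals',      1, 2 from dual;
```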
18. Intersect: finding Overlaps

        select test_case, dte, col
        from t
        unpivot (dte for col in (start_date as 1, end_date as -1))

    | Test case | Dte | Col |
    |---|---|---|
    | 01:precedes | 1 | 1 |
    | 01:precedes | 2 | -1 |
    | 01:precedes | 3 | 1 |
    | 01:precedes | 4 | -1 |
    | 02:meets | 1 | 1 |
    | 02:meets | 2 | -1 |
    | 02:meets | 2 | 1 |
    | 02:meets | 3 | -1 |
    | 03:overlaps | 1 | 1 |
    | 03:overlaps | 3 | -1 |
    | 03:overlaps | 2 | 1 |
    | 03:overlaps | 4 | -1 |

    Next step:

        select test_case, dte "Start",
          lead(dte,1,dte) over(
            partition by test_case order by dte, col desc
          ) "End",
          sum(col) over(
            partition by test_case order by dte, col desc
          ) "Rows"
        from t
        unpivot (dte for col in (start_date as 1, end_date as -1))

19. Intersect: finding Overlaps

        select test_case, dte "Start",
          lead(dte,1,dte) over(
            partition by test_case order by dte, col desc
          ) "End",
          sum(col) over(
            partition by test_case order by dte, col desc
          ) "Rows"
        from t
        unpivot (dte for col in (start_date as 1, end_date as -1))

    | Test case | Start | End | Rows | Removed |
    |---|---|---|---|---|
    | 01:precedes | 1 | 2 | 1 | |
    | 01:precedes | 2 | 3 | 0 | |
    | 01:precedes | 3 | 4 | 1 | |
    | 01:precedes | 4 | 4 | 0 | ✖ |
    | 02:meets | 1 | 2 | 1 | |
    | 02:meets | 2 | 2 | 2 | ✖ |
    | 02:meets | 2 | 3 | 1 | |
    | 02:meets | 3 | 3 | 0 | ✖ |
    | 03:overlaps | 1 | 2 | 1 | |
    | 03:overlaps | 2 | 3 | 2 | |
    | 03:overlaps | 3 | 4 | 1 | |
    | 03:overlaps | 4 | 4 | 0 | ✖ |

    Next step: remove the rows where "Start" = "End".

        select * from (
          select test_case, dte "Start",
            lead(dte,1,dte) over(
              partition by test_case order by dte, col desc
            ) "End",
            sum(col) over(
              partition by test_case order by dte, col desc
            ) "Rows"
          from t
          unpivot (dte for col in (start_date as 1, end_date as -1))
        )
        where "Start" < "End";

20. Intersect: finding Overlaps

        select * from (
          select test_case, dte "Start",
            lead(dte,1,dte) over(
              partition by test_case order by dte, col desc
            ) "End",
            sum(col) over(
              partition by test_case order by dte, col desc
            ) "Rows"
          from t
          unpivot (dte for col in (start_date as 1, end_date as -1))
        )
        where "Start" < "End";

    | Test case | Start | End | Rows |
    |---|---|---|---|
    | 01:precedes | 1 | 2 | 1 |
    | 01:precedes | 2 | 3 | 0 |
    | 01:precedes | 3 | 4 | 1 |
    | 02:meets | 1 | 2 | 1 |
    | 02:meets | 2 | 3 | 1 |
    | 03:overlaps | 1 | 2 | 1 |
    | 03:overlaps | 2 | 3 | 2 |
    | 03:overlaps | 3 | 4 | 1 |

    Final step: keep only the segments covered by more than one row.

        select * from (
          select test_case, dte "Start",
            lead(dte,1,dte) over(
              partition by test_case order by dte, col desc
            ) "End",
            sum(col) over(
              partition by test_case order by dte, col desc
            ) "Rows"
          from t
          unpivot (dte for col in (start_date as 1, end_date as -1))
        )
        where "Rows" > 1 and "Start" < "End";

    | Test case | Start | End | Rows |
    |---|---|---|---|
    | 03:overlaps | 2 | 3 | 2 |
    | 04:finished by | 2 | 3 | 2 |
    | 05:contains | 2 | 3 | 2 |
    | 06:starts | 1 | 2 | 2 |
    | 07:equals | 1 | 2 | 2 |

21. Packing Ranges

    Input:

    | Test case | Start | End |
    |---|---|---|
    | 01:precedes | 1 | 2 |
    | 01:precedes | 3 | 4 |
    | 02:meets | 1 | 2 |
    | 02:meets | 2 | 3 |
    | 03:overlaps | 1 | 3 |
    | 03:overlaps | 2 | 4 |
    | 04:finished by | 1 | 3 |
    | 04:finished by | 2 | 3 |
    | 05:contains | 1 | 4 |
    | 05:contains | 2 | 3 |
    | 06:starts | 1 | 2 |
    | 06:starts | 1 | 3 |
    | 07:equals | 1 | 2 |
    | 07:equals | 1 | 2 |

    Packed:

        select * from t
        match_recognize(
          partition by test_case
          order by end_date, start_date
          measures min(start_date) start_date, last(end_date) end_date
          pattern(a* b)
          define a as end_date >= next(start_date)
        );

    | Test case | Start | End |
    |---|---|---|
    | 01:precedes | 1 | 2 |
    | 01:precedes | 3 | 4 |
    | 02:meets | 1 | 3 |
    | 03:overlaps | 1 | 4 |
    | 04:finished by | 1 | 3 |
    | 05:contains | 1 | 4 |
    | 06:starts | 1 | 3 |
    | 07:equals | 1 | 2 |

    With NULL end values:

        select * from t
        match_recognize(
          partition by test_case
          order by end_date, start_date
          measures min(start_date) start_date, last(end_date) end_date
          pattern(a* b)
          define a as end_date >= next(start_date) or end_date is null
        );

    | Test case | Start | End |
    |---|---|---|
    | 01:precedes | 1 | 2 |
    | 01:precedes | 3 | |
    | 02:meets | 1 | |
    | 03:overlaps | 1 | |
    | 04:finished by | 1 | |
    | 05:contains | 1 | |
    | 06:starts | 1 | |
    | 07:equals | 1 | |

22. JOIN: range to range

        create table A(start_n, end_n) as
        select level, level+1 from dual
        connect by level <= 10000;

        create table B as
        select start_n+9995 start_n, end_n+9996 end_n from A;

        select * from A join B
        on (A.start_n <= B.start_n and B.start_n < A.end_n)
        or (B.start_n <= A.start_n and A.start_n < B.end_n);

    - Elapsed: 00:00:13.332
    - Exadata? All data in buffer cache. Elapsed: 00:00:13.332
    - InMemory? Elapsed: 00:00:09.842
23. JOIN: range to range

        --------------------------------------------------------------------------------------------
        | Id  | Operation               | Name  | Starts | E-Rows | A-Rows | A-Time      | Buffers |
        --------------------------------------------------------------------------------------------
        |   0 | SELECT STATEMENT        |       |      1 |        |      1 | 00:00:17.82 |      90 |
        |   1 |  SORT AGGREGATE         |       |      1 |      1 |      1 | 00:00:17.82 |      90 |
        |   2 |   CONCATENATION         |       |      1 |        |     10 | 00:00:00.01 |      90 |
        |   3 |    MERGE JOIN           |       |      1 |     55 |     10 | 00:00:00.01 |      45 |
        |   4 |     SORT JOIN           |       |      1 |  10000 |  10000 | 00:00:00.01 |      24 |
        |   5 |      TABLE ACCESS FULL  | T_NEW |      1 |  10000 |  10000 | 00:00:00.01 |      24 |
        |*  6 |     FILTER              |       |  10000 |        |     10 | 00:00:00.01 |      21 |
        |*  7 |      SORT JOIN          |       |  10000 |  10000 |     55 | 00:00:00.01 |      21 |
        |   8 |       TABLE ACCESS FULL | T_OLD |      1 |  10000 |  10000 | 00:00:00.02 |      21 |
        |   9 |    MERGE JOIN           |       |      1 |     55 |      0 | 00:00:17.80 |      45 |
        |  10 |     SORT JOIN           |       |      1 |  10000 |  10000 | 00:00:00.02 |      24 |
        |  11 |      TABLE ACCESS FULL  | T_NEW |      1 |  10000 |  10000 | 00:00:00.01 |      24 |
        |* 12 |     FILTER              |       |  10000 |        |      0 | 00:00:17.78 |      21 |
        |* 13 |      SORT JOIN          |       |  10000 |  10000 |    99M | 00:01:21.50 |      21 |
        |  14 |       TABLE ACCESS FULL | T_OLD |      1 |  10000 |  10000 | 00:00:00.01 |      21 |
        --------------------------------------------------------------------------------------------

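24. Join, or Sort and Match?

    | Range | Start | End |
    |---|---|---|
    | A | 1 | 4 |
    | B is equal | 1 | 4 |
    | B started by A | 1 | 5 |
    | B during A | 2 | 3 |
    | B finishes A | 3 | 4 |
    | B overlapped by A | 3 | 5 |
    | B met by A | 4 | 5 |
    | B preceded by A | 5 | 6 |
    | another A | 5 | 7 |

    Marks on the slide: ✔ ✖ ? ✔ ✔ ✔ ✔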
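25. Join, or Sort and Match?

    | Range | Start | End |
    |---|---|---|
    | A | 1 | 4 |
    | B is equal | 1 | 4 |
    | B started by A | 1 | 5 |
    | B during A | 2 | 3 |
    | B finishes A | 3 | 4 |
    | B overlapped by A | 3 | 5 |
    | B met by A | 4 | 5 |
    | B preceded by A | 5 | 6 |
    | another A | 5 | 7 |

    Marks on the slide: ✖ ? 3 3 3 3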
26. Sort and match with MATCH_RECOGNIZE

        select A_start_n, A_end_n, B_start_n, B_end_n
        from (
          select 'A' ttype, A.* from A
          union all
          select 'B' ttype, B.* from B
        )
        match_recognize (
          order by start_n, end_n
          measures
            decode(f.ttype,'A',f.start_n, o.start_n) A_start_n,
            decode(f.ttype,'A',f.end_n, o.end_n) A_end_n,
            decode(f.ttype,'B',f.start_n, o.start_n) B_start_n,
            decode(f.ttype,'B',f.end_n, o.end_n) B_end_n
          all rows per match
          after match skip to next row
          pattern ( {-f-} (o|{-x-})+ )
          define o as ttype != f.ttype and start_n < f.end_n,
                 x as start_n < f.end_n
        );

    Elapsed: 00:00:00.063

    Pattern syntax:
    - `{- -}`: exclusion
    - `( )`: grouping
    - `+`: at least one
    - `A | B`: alternation
28. More!
    - Overlapping ranges with priority
    - Data warehouses with date ranges:
      - Trickle feed
    - Impact on foreign keys
    - OLTP
    - Take advantage of MATCH_RECOGNIZE