This presentation can help you apply partitioning when appropriate, and avoid problems when using it. The one-liner is: Simple Works Best. The illustrating demos are on PostgreSQL 12 (maybe 13 by the time of presenting) and show some of the problems and solutions that partitioning can provide. Some of this “experience” is quite old, and the demos run near-identically on Oracle…
These problems are the same on any database.
Ansible is an open-source configuration management and deployment tool, which can be used to manage servers and software installations. This talk will briefly cover Ansible itself, and then explain how Ansible is used to install and configure PostgreSQL on a server. Examples will round up the talk.
Lessons PostgreSQL learned from commercial databases, and didn’t (PGConf APAC)
This is the slide deck used by Illay for his presentation at pgDay Asia 2016, "Lessons PostgreSQL learned from commercial databases, and didn’t". The talk takes you through some of the things that PostgreSQL has done really well, and some things that PostgreSQL can learn from other databases.
JSON is an important data type for transporting data between servers and many modern applications. Postgres has been at the forefront of bringing these capabilities into the hands of database users. The JSONB data type allows for faster operations within PostgreSQL.
At this webinar we will look at:
- How to use JSON from applications
- How to store it in the database
- How to index JSON data
- Tips and tricks to optimize usage
We then close with a review of the roadmap for new PostgreSQL features for JSON and JSON standards compliance.
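As a database-free illustration of the JSON indexing topic, the containment test that JSONB's `@>` operator (and the GIN indexes behind it) accelerates can be sketched in a few lines of Python; `jsonb_contains` is a hypothetical stand-in, not PostgreSQL's implementation:

```python
import json

def jsonb_contains(doc, sub):
    """Rough sketch of JSONB's @> containment: every key/value in
    `sub` must appear (recursively) somewhere in `doc`."""
    if isinstance(sub, dict):
        return isinstance(doc, dict) and all(
            k in doc and jsonb_contains(doc[k], v) for k, v in sub.items()
        )
    if isinstance(sub, list):
        return isinstance(doc, list) and all(
            any(jsonb_contains(d, s) for d in doc) for s in sub
        )
    return doc == sub

row = json.loads('{"sku": "A1", "tags": ["sale", "new"], "dims": {"w": 10}}')
print(jsonb_contains(row, {"tags": ["sale"]}))   # True
print(jsonb_contains(row, {"dims": {"w": 11}}))  # False
```

In PostgreSQL itself, the equivalent predicate would be `data @> '{"tags": ["sale"]}'::jsonb`, typically backed by a GIN index on the column.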
PostgreSQL Enterprise Class Features and Capabilities (PGConf APAC)
These are the slides used by Venkar from Fujitsu for his presentation at pgDay Asia 2016. He spoke about some of the enterprise-class features of the PostgreSQL database.
Query Parallelism in PostgreSQL: What's coming next? (PGConf APAC)
This presentation was delivered by Dilip Kumar (a PostgreSQL contributor) at pgDay Asia 2017. It covers the parallel query features released in v9.6, the infrastructure for parallel query built in previous versions, and the roadmap for parallel query.
This is the presentation used by Umari Shahid of 2ndQuadrant for his presentation at pgDay Asia 2016. It takes you through the usage of the TABLESAMPLE clause of SELECT queries, introduced in PostgreSQL v9.5.
Given on a free DevelopMentor webinar. A high-level overview of big data and the need for Hadoop. Also covers Pig, Hive, YARN, and the future of Hadoop.
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL (Arseny Chernov)
Fast, demo-enabled 60-minute lecture, aligned to the curriculum of the RDBMS/SQL course taught at Singapore University of Technology and Design (SUTD), a collaboration with MIT. More details about this lecture and some photos here: http://bit.ly/sutd-mit-lecture
Faster Data Integration Pipeline Execution using Spark-Jobserver (Databricks)
As you may already know, the open-source Spark Job Server offers a powerful platform for managing Spark jobs, jars, and contexts, turning Spark into a much more convenient and easy-to-use service. The Spark-Jobserver can keep Spark contexts warmed up and readily available for accepting new jobs. At Informatica we are leveraging the Spark-Jobserver offerings to solve the data-visualization use case.
Introduction to Sqoop - Aaron Kimball, Cloudera, Hadoop User Group UK (Skills Matter)
In this talk at the Hadoop User Group UK meeting, Aaron Kimball from Cloudera introduces Sqoop, the open-source SQL-to-Hadoop tool. Sqoop helps users perform efficient imports of data from RDBMS sources into Hadoop's distributed file system, where it can be processed in concert with other data sources. Sqoop also allows users to export Hadoop-generated results back to an RDBMS for use with other data pipelines.
After this session, users will understand how databases and Hadoop fit together, and how to use Sqoop to move data between these systems. The talk will provide suggestions for best practices when integrating Sqoop and Hadoop in your data processing pipelines. We'll also cover some deeper technical details of Sqoop's architecture, and take a look at some upcoming aspects of Sqoop's development roadmap.
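As a rough sketch of how such parallel imports are typically divided up, here is a hypothetical Python helper that splits a numeric key range into per-mapper chunks, similar in spirit to what Sqoop does with a split-by column (the helper name and numbers are illustrative only):

```python
def split_key_range(lo, hi, num_mappers):
    """Split the inclusive key range [lo, hi] into contiguous chunks,
    one per mapper, the way a split-by column is divided for a
    parallel import."""
    total = hi - lo + 1
    base, extra = divmod(total, num_mappers)
    splits, start = [], lo
    for i in range(num_mappers):
        size = base + (1 if i < extra else 0)
        splits.append((start, start + size - 1))
        start += size
    return splits

# 4 mappers each get a WHERE clause like: id >= 1 AND id <= 250
print(split_key_range(1, 1000, 4))  # [(1, 250), (251, 500), (501, 750), (751, 1000)]
```

Each chunk becomes an independent query against the source database, which is why the imports can proceed in parallel.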
This talk covers native compilation technology: what it is, why it is required, and how we can apply it to compile tables and procedures to achieve considerable performance gains with very minimal changes.
Big Data Day LA 2016 / NoSQL track - Apache Kudu: Fast Analytics on Fast Data,... (Data Con LA)
Apache Kudu (incubating) is a new storage engine for the Hadoop ecosystem that enables extremely high-speed analytics without imposing data-visibility latencies. This talk provides an introduction to Kudu, and provides an overview of how, when, and why practitioners use Kudu as a platform for building analytics solutions.
Apache Sqoop: A Data Transfer Tool for Hadoop (Cloudera, Inc.)
Apache Sqoop is a tool designed for efficiently transferring bulk data between Hadoop and structured datastores such as relational databases. This slide deck aims at familiarizing the user with Sqoop and how to use it effectively in real deployments.
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic... (Chicago Hadoop Users Group)
John Leach, Co-Founder and CTO of Splice Machine, with 15+ years of software development and machine learning experience, will discuss how to use HBase co-processors to build an ANSI-99 SQL database with 1) parallelization of SQL execution plans, 2) ACID transactions with snapshot isolation, and 3) consistent secondary indexing.
Transactions are critical in traditional RDBMSs because they ensure reliable updates across multiple rows and tables. Most operational applications require transactions, but even analytics systems use transactions to reliably update secondary indexes after a record insert or update.
In the Hadoop ecosystem, HBase is a key-value store with real-time updates, but it does not have multi-row, multi-table transactions, secondary indexes or a robust query language like SQL. Combining SQL with a full transactional model over HBase opens a whole new set of OLTP and OLAP use cases for Hadoop that were traditionally reserved for RDBMSs like MySQL or Oracle. However, a transactional HBase system has the advantage of scaling out with commodity servers, leading to a 5x-10x cost savings over traditional databases like MySQL or Oracle.
HBase co-processors, introduced in release 0.92, provide a flexible and high-performance framework to extend HBase. In this talk, we show how we used HBase co-processors to support a full ANSI SQL RDBMS without modifying the core HBase source. We will discuss how endpoint transactions are used to serialize SQL execution plans over to regions so that computation is local to where the data is stored. Additionally, we will show how observer co-processors simultaneously support both transactions and secondary indexing.
The talk will also discuss how Splice Machine extended the work of Google Percolator, Yahoo Labs’ OMID, and the University of Waterloo on distributed snapshot isolation for transactions. Lastly, performance benchmarks will be provided, including full TPC-C and TPC-H results that show how Hadoop/HBase can be a replacement of traditional RDBMS solutions.
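The snapshot isolation idea referenced above can be illustrated with a toy multi-version store in Python. This is a minimal sketch of timestamp-based visibility in the spirit of Percolator-style designs, not Splice Machine's actual implementation:

```python
class SnapshotStore:
    """Toy multi-version store: every write is kept with its commit
    timestamp, and a transaction only sees versions committed at or
    before its own start timestamp."""
    def __init__(self):
        self.versions = {}   # key -> list of (commit_ts, value)
        self.ts = 0

    def next_ts(self):
        self.ts += 1
        return self.ts

    def write(self, key, value, commit_ts):
        self.versions.setdefault(key, []).append((commit_ts, value))

    def read(self, key, start_ts):
        visible = [(t, v) for t, v in self.versions.get(key, []) if t <= start_ts]
        return max(visible)[1] if visible else None

store = SnapshotStore()
store.write("balance", 100, commit_ts=store.next_ts())   # commit at ts=1
snapshot_ts = store.next_ts()                            # ts=2: a reader's snapshot
store.write("balance", 50, commit_ts=store.next_ts())    # ts=3: a later commit
print(store.read("balance", snapshot_ts))  # 100 -- the later write is invisible
```

The reader's view stays stable even while concurrent writers commit, which is exactly the property that makes snapshot isolation attractive for a transactional layer over HBase.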
To view the accompanying slide deck: http://www.slideshare.net/ChicagoHUG/
Andrew Ryan describes how Facebook operates Hadoop to provide access as a shared resource between groups.
More information and video at:
http://developer.yahoo.com/blogs/hadoop/posts/2011/02/hug-feb-2011-recap/
Apache Sqoop: Unlocking Hadoop for Your Relational Database (huguk)
Kathleen Ting, Technical Account Manager @ Cloudera and Sqoop Committer
Unlocking data stored in an organization's RDBMS and transferring it to Apache Hadoop is a major concern in the big data industry. Apache Sqoop enables users with information stored in existing SQL tables to use new analytic tools like Apache HBase and Apache Hive. This talk will go over how to deploy and apply Sqoop in your environment as well as transferring data from MySQL, Oracle, PostgreSQL, SQL Server, Netezza, Teradata, and other relational systems. In addition, we'll show you how to keep table data and Hadoop in sync by importing data incrementally as well as how to customize transferred data by calling various database functions.
Citus Architecture: Extending Postgres to Build a Distributed Database (Ozgun Erdogan)
Citus is a distributed database that scales out Postgres. By using the extension APIs, Citus distributes your tables across a cluster of machines and parallelizes SQL queries. This talk describes the Citus architecture by focusing on our learnings in distributed systems. We first describe how Citus leverages PostgreSQL's extension APIs. These APIs are rich enough to store distributed metadata, add new commands to Postgres to help with sharding, parallelize and execute queries in a distributed cluster, and handle automatic failover of machines. Second, we show the architecture of a distributed query planner. We first describe the join order planner and how it chooses between broadcast, co-located, and repartition joins to minimize network I/O. We then show how we map SQL queries into distributed relational algebra, and optimize these plans for parallel execution. Third, we note a primary challenge in distributed systems: no single executor works great for all workloads. We show how Citus chooses between three executors, each one optimized for a different workload: NoSQL, operational analytics, and data warehousing. We then conclude with a demo that shows Citus running on a large cluster.
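To make the sharding idea concrete, here is a toy Python sketch of hash-distributing rows across workers. The node names and the `shard_for` helper are hypothetical; real Citus uses PostgreSQL's hash functions plus shard metadata tables:

```python
import zlib

NUM_SHARDS = 8
WORKERS = ["worker-1", "worker-2", "worker-3"]  # hypothetical node names

def shard_for(distribution_key):
    """Map a distribution-column value to a shard, and that shard to a
    worker node. A stand-in for hash partitioning, not the Citus code."""
    h = zlib.crc32(str(distribution_key).encode())
    shard = h % NUM_SHARDS
    return shard, WORKERS[shard % len(WORKERS)]

# The same tenant id always routes to the same shard, so joins on the
# distribution key can run co-located on a single worker.
print(shard_for(42) == shard_for(42))  # True
```

Routing by a deterministic hash is what lets the planner push co-located joins down to individual workers instead of moving data over the network.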
PostgreSQL as a NoSQL Document Store - The JSON/JSONB data type (Jumping Bean)
Our presentation from PGDay Asia 2016 on the JSON/JSONB data type in Postgres and how you can have the best of both the SQL and NoSQL worlds in one. There is JavaScript in my SQL.
Beyond the DSL - Unlocking the Power of Kafka Streams with the Processor API (A... (confluent)
Kafka Streams is a flexible and powerful framework. The Domain Specific Language (DSL) is an obvious place from which to start, but not all requirements fit the DSL model. Many people are unaware of the Processor API (PAPI), or are intimidated by it because of sinks, sources, edges and stores – oh my! But most of the power of the PAPI can be leveraged simply through the DSL’s #process() method, which lets you attach the general building block Processor interface to your easy-to-use DSL topology, combining the best of both worlds.
In this talk you’ll get a look at the flexibility of the DSL’s process method and the possibilities it opens up. We’ll use real-world use cases, borne from extensive experience in the field with multiple customers, to explore the power of direct write access to the state stores and how to perform range sub-selects. We’ll also see the options that punctuators bring to the table, as well as opportunities for major latency optimisations.
Key takeaways:
* Understanding of how to combine DSL and Processors
* Capabilities and benefits of Processors
* Real-world uses of Processors
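A toy Python stand-in can illustrate the stateful-processor pattern the takeaways describe. This mimics the shape of a Processor with a keyed state store and a punctuate hook; it is a sketch of the pattern, not the actual Kafka Streams API:

```python
class DedupProcessor:
    """Sketch of a Processor-API-style stateful step: a keyed state
    store plus a punctuate() hook, the kind of building block the
    DSL's #process() method lets you attach to a topology."""
    def __init__(self):
        self.store = {}        # stand-in for a Kafka Streams state store
        self.forwarded = []    # stand-in for records forwarded downstream

    def process(self, key, value):
        if self.store.get(key) != value:   # forward only when the value changes
            self.store[key] = value
            self.forwarded.append((key, value))

    def punctuate(self):
        """Periodic callback, e.g. to emit or expire store entries."""
        return dict(self.store)

p = DedupProcessor()
for k, v in [("sensor-1", 20), ("sensor-1", 20), ("sensor-1", 21)]:
    p.process(k, v)
print(p.forwarded)  # [('sensor-1', 20), ('sensor-1', 21)]
```

Direct access to the store (here a plain dict) is what enables the range sub-selects and deduplication tricks the talk explores.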
TokuDB is an ACID/transactional storage engine that makes MySQL even better by increasing performance, adding high compression, and allowing for true schema agility. All of these features are made possible by Tokutek's Fractal Tree indexes.
Scaling Machine Learning Feature Engineering in Apache Spark at Facebook (Databricks)
Machine Learning feature engineering is one of the most critical workloads on Spark at Facebook and serves as a means of improving the quality of each of the prediction models we have in production. Over the last year, we’ve added several features in Spark core/SQL to add first-class support for Feature Injection and Feature Reaping in Spark. Feature Injection is an important prerequisite to (offline) ML training where the base features are injected/aligned with new/experimental features, with the goal of improving model performance over time. From a query engine’s perspective, this can be thought of as a LEFT OUTER join between the base training table and the feature table which, if implemented naively, could get extremely expensive. As part of this work, we added native support for writing indexed/aligned tables in Spark, wherein, if the data in the base table and the injected feature can be aligned during writes, the join itself can be performed inexpensively.
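The benefit of aligned writes can be sketched with a simple merge-based LEFT OUTER join in Python: when both inputs are already sorted on the join key, the join is a single linear pass instead of an expensive shuffle. This is an illustrative sketch under that assumption, not Facebook's implementation:

```python
def aligned_left_join(base_rows, feature_rows):
    """LEFT OUTER join of two tables already sorted ('aligned') on the
    join key, done in one linear merge pass."""
    out, j = [], 0
    for key, base in base_rows:
        # advance the feature cursor; it never moves backwards
        while j < len(feature_rows) and feature_rows[j][0] < key:
            j += 1
        matched = j < len(feature_rows) and feature_rows[j][0] == key
        out.append((key, base, feature_rows[j][1] if matched else None))
    return out

base = [(1, "u1"), (2, "u2"), (3, "u3")]       # base training rows by example id
features = [(1, 0.9), (3, 0.1)]                 # new feature, sparse
print(aligned_left_join(base, features))
# [(1, 'u1', 0.9), (2, 'u2', None), (3, 'u3', 0.1)]
```

Because neither cursor ever revisits a row, the cost is proportional to the input sizes, which is the property that makes injection cheap when writes keep the tables aligned.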
Slides for a talk.
Talk abstract:
In the dark of the night, if you listen carefully enough, you can hear databases cry. But why? As developers, we rarely consider what happens under the hood of widely used abstractions such as databases. As a consequence, we rarely think about the performance of databases. This is especially true of less widespread, but often very useful, NoSQL databases.
In this talk we will take a close look at NoSQL database performance, peek under the hood of the most frequently used features to see how they affect performance and discuss performance issues and bottlenecks inherent to all databases.
Storage and computation are getting cheaper AND easily accessible on demand in the cloud. We now collect and store some really large data sets, e.g. user activity logs, genome sequencing, sensory data, etc. Hadoop and the ecosystem of projects built around it present simple and easy-to-use tools for storing and analyzing such large data collections on commodity hardware.
Topics Covered
* The Hadoop architecture.
* Thinking in MapReduce.
* Run some sample MapReduce Jobs (using Hadoop Streaming).
* Introduce Pig Latin, an easy-to-use data processing language.
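"Thinking in MapReduce" can be illustrated with the classic word count, sketched here in plain Python with explicit map, shuffle, and reduce phases:

```python
from itertools import groupby

def map_phase(line):
    """Map: emit (word, 1) for every word in an input line."""
    return [(word, 1) for word in line.split()]

def reduce_phase(word, counts):
    """Reduce: sum all counts that arrived for one key."""
    return (word, sum(counts))

lines = ["big data big ideas", "data pipelines"]
mapped = [kv for line in lines for kv in map_phase(line)]
shuffled = sorted(mapped)  # the shuffle phase groups equal keys together
result = [reduce_phase(k, [c for _, c in g])
          for k, g in groupby(shuffled, key=lambda kv: kv[0])]
print(result)  # [('big', 2), ('data', 2), ('ideas', 1), ('pipelines', 1)]
```

With Hadoop Streaming, the mapper and reducer would be the same logic reading stdin and writing stdout, with the framework performing the sort/shuffle between them.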
Speaker Profile: Mahesh Reddy is an entrepreneur, chasing dreams. He works on large-scale crawling and extraction of structured data from the web. He is a graduate from IIT Kanpur (2000-05) and previously worked at Yahoo! Labs as a Research Engineer/Tech Lead on search and advertising products.
Beyond the DSL - Unlocking the power of Kafka Streams with the Processor API (confluent)
Technical breakout during Confluent’s streaming event in Munich, presented by Antony Stubbs, Solution Architect at Confluent. This three-day hands-on course focused on how to build, manage, and monitor clusters using industry best-practices developed by the world’s foremost Apache Kafka™ experts. The sessions focused on how Kafka and the Confluent Platform work, how their main subsystems interact, and how to set up, manage, monitor, and tune your cluster.
Persistent Data Structures - partial::Conf (Ivan Vergiliev)
The slides from my talk on Persistent Data Structures at http://partialconf.com/ . The "Implementation" part assumes a bit of prior knowledge on how persistent data structures work, but the rest should be generally accessible.
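For readers without that prior knowledge, the core trick (path copying) can be sketched for a persistent singly linked list in a few lines of Python: an update copies only the nodes on the path to the change and shares the rest between versions.

```python
class Node:
    __slots__ = ("value", "next")
    def __init__(self, value, next=None):
        self.value, self.next = value, next

def update(node, index, value):
    """Return a NEW list version with position `index` changed,
    copying only the nodes before it (path copying)."""
    if index == 0:
        return Node(value, node.next)       # share the untouched tail
    return Node(node.value, update(node.next, index - 1, value))

def to_list(node):
    out = []
    while node:
        out.append(node.value)
        node = node.next
    return out

v1 = Node(1, Node(2, Node(3)))
v2 = update(v1, 1, 99)
print(to_list(v1))  # [1, 2, 3] -- the old version is untouched
print(to_list(v2))  # [1, 99, 3]
print(v1.next.next is v2.next.next)  # True -- the tail is shared
```

The same idea generalizes to trees, where each update copies only an O(log n) path from the root instead of a list prefix.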
Learn from the author of SQLTXPLAIN the fundamentals of SQL Tuning: 1) Diagnostics Collection; 2) Root Cause Analysis (RCA); and 3) Remediation.
SQL Tuning is a complex and intimidating area of knowledge, and it requires years of frequent practice to master. Nevertheless, there are some concepts and practices that are fundamental to success. From a basic understanding of the Cost-Based Optimizer (CBO) and execution plans, to more advanced topics such as plan stability and the caveats of using SQL Profiles and SQL Plan Baselines, this session is full of advice and experience sharing. Learn what works and what doesn't when it comes to SQL Tuning.
Participants of this session will also learn about several free tools (besides SQLTXPLAIN) that can be used to diagnose a SQL statement performing poorly, and some others to improve Execution Plan Stability.
Whether you are a novice DBA, an experienced DBA, or a developer, there will be something new for you in this session. And if this is your first encounter with SQL Tuning, at least you will learn the basic concepts and steps to succeed in your endeavor.
Cloud Migration Paths: Kubernetes, IaaS, or DBaaS (EDB)
Moving to the cloud is hard, and moving Postgres databases to the cloud is even harder. Public cloud or private cloud? Infrastructure as a Service (IaaS), or Platform as a Service (PaaS)? Kubernetes for the application, or for the database and the application? This talk will juxtapose self-managed Kubernetes and container-based database solutions, Postgres deployments on IaaS, and Postgres DBaaS solutions of which EDB’s DBaaS BigAnimal is the latest example.
The 10 Best PostgreSQL Replication Strategies for Your Enterprise (EDB)
This webinar will help you understand the differences between the various replication approaches, recognize the requirements of each strategy, and get a clear picture of what can be achieved with each one. With that, you will hopefully be better placed to work out which kinds of PostgreSQL replication you really need for your system.
- How physical and logical replication work in PostgreSQL
- Differences between synchronous and asynchronous replication
- Advantages, disadvantages and challenges of multi-master replication
- Which replication strategy is better suited to different use cases
Speaker:
Borys Neselovskyi, Regional Sales Engineer DACH, EDB
------------------------------------------------------------
For more #webinars, visit http://bit.ly/EDB-Webinars
Download free #PostgreSQL whitepapers: http://bit.ly/EDB-Whitepapers
Read our #Postgres Blog http://bit.ly/EDB-Blogs
Follow us on Facebook at http://bit.ly/EDB-FB
Follow us on Twitter at http://bit.ly/EDB-Twitter
Follow us on LinkedIn at http://bit.ly/EDB-LinkedIn
Reach us via email at marketing@enterprisedb.com
When looking for alternatives to Oracle in the cloud, making the switch can seem like hard work. We understand that migration involves more than just the database. Compatibility is a key point, especially when you consider the resources you may already have invested in Oracle, such as Oracle-specific application code. This webinar will explore the options and the main considerations when moving from Oracle databases to the cloud.
- A detailed review of the database offerings available in the cloud
- Critical factors to consider when choosing the most suitable cloud offering
- How EDB's experience with PostgreSQL can help with your decision
- A demonstration of EDB's BigAnimal
Presenter:
Sergio Romera, Senior Sales Engineer EMEA, EDB
Databases like PostgreSQL cannot run on Kubernetes. That is the refrain we hear all the time, and at the same time it is the motivation for us at EDB to tear down that wall, once and for all.
In this webinar we will talk about our journey so far in bringing PostgreSQL to Kubernetes. Find out why we believe that benchmarking the storage and the database before going into production leads to a healthier and longer life for a DBMS, even on Kubernetes.
We will share our process and the results obtained so far, and unveil our plans for the future of Cloud Native PostgreSQL.
The Variations of PostgreSQL Replication (EDB)
Physical replication, logical replication, synchronous, asynchronous, multi-master, horizontal scalability, and so on: many terms are associated with database replication. In this talk we will review the fundamental concepts behind each variation of PostgreSQL replication, and in which cases it is best to use one or the other. The presentation includes a practical part with demonstrations, although it will not be a tutorial on how to configure a cluster. The focus is on understanding each variation in order to choose the best one for the use case.
What you will learn:
- How physical replication works in PostgreSQL
- How logical replication works in PostgreSQL
- Differences between synchronous and asynchronous replication
- What multi-master replication is
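For physical streaming replication, the setup boils down to a handful of configuration parameters. A minimal sketch for PostgreSQL 12 and later, with placeholder hostnames and user names:

```ini
# postgresql.conf on the primary
wal_level = replica                  # use 'logical' to also allow logical replication
max_wal_senders = 10                 # WAL sender processes available for standbys
synchronous_standby_names = ''       # name a standby here to make replication synchronous

# postgresql.conf on the standby (plus an empty standby.signal file in its data directory)
primary_conninfo = 'host=primary.example port=5432 user=replicator'
hot_standby = on                     # allow read-only queries on the standby
```

Logical replication instead uses `CREATE PUBLICATION` on the source and `CREATE SUBSCRIPTION` on the target, replicating row changes per table rather than the WAL byte stream.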
NoSQL and Spatial Database Capabilities using PostgreSQL (EDB)
PostgreSQL is an object-relational database system. NoSQL databases, on the other hand, are non-relational and typically document-oriented. Learn how PostgreSQL gives you flexible options to combine NoSQL workloads with relational query power by offering JSON data types. With PostgreSQL, new capabilities can be developed and plugged into the database as required.
Attend this webinar to learn:
- The new features and capabilities in PostgreSQL for new workloads, requiring greater flexibility in the data model
- NoSQL with JSON and Hstore, and their performance and features for enterprises
- Spatial SQL: advanced spatial capabilities with the PostGIS extension
"Why use PgBouncer? It’s a lightweight, easy to configure connection pooler and it does one job well. As you’d expect from a talk on connection pooling, we’ll give a brief summary of connection pooling and why it increases efficiency. We’ll look at when not to use connection pooling, and we’ll demonstrate how to configure PgBouncer and how it works. But. Did you know you can also do this? 1. Scaling PgBouncer PgBouncer is single threaded which means a single instance of PgBouncer isn’t going to do you much good on a multi-threaded and/or multi-CPU machine. We’ll show you how to add more PgBouncer instances so you can use more than one thread for easy scaling. 2. Read-write / read only routing Using different pgBouncer databases you can route read-write traffic to the primary database and route read-only traffic to a number of standby databases. 3. Load balancing When we use multiple PgBouncer instances, load balancing comes for free. Load balancing can be directed to different standbys, and weighted according to ratios of load. 4. Silent failover You can perform silent failover during promotion of a new primary (assuming you have a VIP/DNS etc that always points to the primary). 5. And even: DoS prevention and protection from “badly behaved” applications! By using distinct port numbers you can provide database connections which deal with sudden bursts of incoming traffic in very different ways, which can help prevent the database from becoming swamped during high activity periods. You should leave the presentation wondering if there is anything PgBouncer can’t do."
In this talk I'll discuss how we can combine the power of PostgreSQL with TensorFlow to perform data analysis. By using the pl/python3 procedural language we can integrate machine learning libraries such as TensorFlow with PostgreSQL, opening the door for powerful data analytics combining SQL with AI. Typical use-cases might involve regression analysis to find relationships in an existing dataset and to predict results based on new inputs, or to analyse time series data and extrapolate future data taking into account general trends and seasonal variability whilst ignoring noise. Python is an ideal language for building custom systems to do this kind of work as it gives us access to a rich ecosystem of libraries such as Pandas and Numpy, in addition to TensorFlow itself.
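A minimal sketch of the pl/python3 integration described above, using NumPy as a lightweight stand-in for a heavier TensorFlow model (the function name and signature are invented):

```sql
-- Assumes plpython3u is available and NumPy is installed for the
-- server's Python; this is an illustrative sketch, not the talk's code.
CREATE EXTENSION IF NOT EXISTS plpython3u;

CREATE OR REPLACE FUNCTION linear_fit(xs float8[], ys float8[])
RETURNS float8[] AS $$
    import numpy as np
    # Least-squares fit y = a*x + b; returns [slope, intercept].
    a, b = np.polyfit(xs, ys, 1)
    return [float(a), float(b)]
$$ LANGUAGE plpython3u;

-- SELECT linear_fit(ARRAY[1,2,3], ARRAY[2,4,6]);
```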
Practical Partitioning in Production with Postgres - EDB
Has your table become too large to handle? Have you thought about chopping it up into smaller pieces that are easier to query and maintain? What if it's in constant use? An introduction to the problems that can arise and how PostgreSQL's partitioning features can help, followed by a real-world scenario of partitioning an existing huge table on a live system. We will be looking at the problems caused by having very large tables in your database and how declarative table partitioning in Postgres can help. Also, how to perform dimensioning before but also after creating huge tables, partitioning key selection, the importance of upgrading to get the latest Postgres features and finally we will dive into a real-world scenario of having to partition an existing huge table in use on a production system.
There have been plenty of “explaining EXPLAIN” type talks over the years, which provide a great introduction to it. They often also cover how to identify a few of the more common issues through it. EXPLAIN is a deep topic though, and to do a good introduction talk, you have to skip over a lot of the tricky bits. As such, this talk will not be a good introduction to EXPLAIN, but instead a deeper dive into some of the things most don’t cover. The idea is to start with some of the more complex and unintuitive calculations needed to work out the relationships between operations, rows, threads, loops, timings, buffers, CTEs and subplans. Most popular tools handle at least several of these well, but there are cases where they don’t that are worth being conscious of and alert to. For example, we’ll have a look at whether certain numbers are averaged per-loop or per-thread, or both. We’ll also cover a resulting rounding issue or two to be on the lookout for. Finally, some per-operation timing quirks are worth looking out for where CTEs and subqueries are concerned, for example CTEs that are referenced more than once. As time allows, we can also look at a few rarer issues that can be spotted via EXPLAIN, as well as a few more gotchas that we’ve picked up along the way. This includes things like spotting when the query is JIT, planning, or trigger time dominated, spotting the signs of table and index bloat, issues like lossy bitmap scans or index-only scans fetching from the heap, as well as some things to be aware of when using auto_explain.
Internet of Things is a currently a burgeoning market, and is often associated with specialized data-stores. However PostgreSQL is just as capable at this use-case and can offer some compelling advantages. We’ll explore ways to store IoT data in PostgreSQL covering various ways to store and structure this kind of data. How range types and differing types of indexes can be of use. Also taking a quick look at some extensions designed for this use case. Then looking at powerful SQL features which can really help when analyzing IoT data streams, and how the power of a real SQL database can be a key advantage.
I would like you to join me on our journey from a complex, multi-instance Oracle topology to a single logical database in PostgreSQL. Each technology and architectural decision point will be discussed, describing how we arrived at our destination. Five key areas will be covered: - Target architecture - Migration of database objects (tables, indexes, views, synonyms, etc.) - Migration of database code (packages, functions, procedures, triggers) - Application tier - Migration of data, with minimal downtime during cutover. The target architecture is a BDR cluster, where the physical data model and the stored data differ between the logical standbys and the lead master/shadow master. We will discuss how this allowed for the simplification of the topology, and the benefits this delivered. Before you go there: yes, I know PostgreSQL does not have synonyms, so an alternative approach was needed. There is a significant amount of business logic in the database tier, all of which needed to be translated into database code. We will look at the tools and extensions available to reproduce the functionality in PostgreSQL, look at common non-ISO-standard SQL embedded in the application tier along with JDBC challenges, and finally at some of the data movement tools available. Full disclosure: we are still on the journey, but have learnt a lot on the way.
The proposed talk will go through several questions. The first obvious one is: why would I bother to learn the CLI when I can do whatever I need with a GUI tool?
We'll try to answer why knowing the CLI is a MUST for some people (like Postgres DBAs, for example) whereas it's only a bonus for others (like data scientists, for example).
Then we'll go through the psql 101 basics (how to connect, interactive versus non-interactive mode, how to set up the psql environment to work comfortably, and so on...).
The last part will be about tips and tricks that will make anyone's journey with psql more effective and enjoyable. I'm looking for the "TIL" effect in people's eyes.
EDB 13 - New Enhancements for Security and Usability - APJ (EDB)
Database security is always of paramount importance to all organizations. In this webinar, we will explore the security, usability, and portability updates of the latest version of the EDB database server and tools.
Join us in this webinar to learn:
- The new security features such as SCRAM and the encryption of database passwords and traffic between Failover Manager agents
- Usability updates that automate partitioning, verify backup integrity, and streamline the management of failover and backups
- Portability improvements that simplify running PostgreSQL across on-premise and cloud environments
In this webinar, we will discuss the differences between a physical backup and a logical backup. We will list the advantages and drawbacks, the main considerations, and the tools available for both methods.
- Data loss
- Logical exports
- Standbys
- WALs and recovery
- VM/disk snapshots
- Physical backups
- Conclusion
Come and discover Cloud Native PostgreSQL (CNP), the operator for Kubernetes, directly from the people who designed it and develop it at EDB.
CNP makes it easy to integrate PostgreSQL databases with your applications inside Kubernetes and Red Hat OpenShift Container Platform clusters, thanks to its automated management of the primary/standby architecture, which includes self-healing, failover, switchover, rolling updates, backups, and more.
During the webinar we will cover the following topics:
- DevOps and Cloud Native
- Introduction to Cloud Native PostgreSQL
- Architectures
- Main features
- Usage and configuration examples
- Kubernetes, storage and Postgres
- Demo
- Conclusions
New enhancements for security and usability in EDB 13 (EDB)
EDB 13 enhances our flagship database server and tools. This webinar will explore its security, usability, and portability updates. Join us to learn how EDB 13 can help you improve your PostgreSQL productivity and data protection.
Webinar highlights include:
- New security features such as SCRAM and the encryption of database passwords and traffic between Failover Manager agents
- Usability updates that automate partitioning, verify backup integrity and streamline the management of failover and backups
- Portability improvements that simplify running PostgreSQL across on-premise and cloud environments
The webinar will review a multi-layered framework for PostgreSQL security, with a deeper focus on limiting access to the database and data, as well as securing the data.
Using the popular AAA (Authentication, Authorization, Auditing) framework we will cover:
- Best practices for authentication (trust, certificate, MD5, SCRAM, etc.).
- Advanced approaches, such as password profiles.
- Deep dive of authorization and data access control for roles, database objects (tables, etc), view usage, row-level security, and data redaction.
- Auditing, encryption, and SQL injection attack prevention.
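As a small illustration of the row-level security portion of the authorization deep dive (all table, column, and policy names here are hypothetical):

```sql
-- Row-level security: each database role only sees its own rows.
CREATE TABLE accounts (
    id    bigserial PRIMARY KEY,
    owner text NOT NULL,
    data  text
);

ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;

-- Policy: a user may only see rows whose owner matches their role name.
-- (Note: the table owner and superusers bypass RLS unless FORCE is set.)
CREATE POLICY accounts_owner_policy ON accounts
    USING (owner = current_user);
```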
Note: this session is delivered in German
Speaker:
Borys Neselovskyi, Sales Engineer, EDB
EDB Cloud Native Postgres includes database container images and a Kubernetes Operator that manage the lifecycle of a database from deployment to operations. This Kubernetes Operator for Postgres is written by EDB entirely from scratch in the Go language and relies exclusively on the Kubernetes API.
Attend this webinar to learn about:
- DevOps & Cloud Native
- Overview of Cloud Native Postgres
- Storage for Postgres workloads in Kubernetes
- Using Cloud Native Postgres
- Demo
1. PDVBV
Partitioning
Simple works Best…
Piet de Visser
Simple Oracle DBA
Piet de Visser - PDVBV
Quotes: “The limitation shows the master” (Goethe); “Simplicity is not a luxury, it is a necessity. Unfortunately, ‘complex’ solutions sell better.” (E.W. Dijkstra). (Škofja Loka-Tolmin, golden horn)
2. PDVBV
PostgreSQL
Click to edit Master title style
Logo Cloud
• Portbase
• (dutch gov)
• Shell
• Philips
• ING bank
• Nokia
• Insinger, BNP
• Etihad
• NHS
• BT
• Claritas, Nielsen
• Unilever
• Exxon
• GE
Don’t waste time on Self-Inflation… but hey, this was such a cool idea (from a marketing guy)…
Logos of my major customers over time. If you want your logo here: Hire me.
3. PDVBV
PostgreSQL
What does it look like..
Couldn’t resist… after this changing room, not allowed to take pictures anymore..
For travel pictures from various continents: some other time…
4. PDVBV
PostgreSQL
Agenda ( 45min +/- my “Dev/DBA” preso.. )
Partitioning…
Why ? … I’ve seen too many “failures”
Summary: Design !!
(see final slides. ;-) )
Top-Tip: Keep It Simple.
Discussion: Please… (I miss the live-, in person-events..)
Agenda. No longer allowed when presenting online (c.f. Connor…)
Oh, BTW: I am known for Typos.. Find a typo = get a drink..
5. PDVBV
PostgreSQL
Basics; What + Why Partitioning ?
• Partitioning: Split 1 table into “Many”
• Two Main Advantages:
• 1. Avoid WAL
• 2. Scan less data on Qrys.
• Many more… later.
– Range, List, Hash…
– Tablespaces => location of data.
– Read-only storage tiers
– Later… (next year’s ppt...)
(competitor: paid-for-EE-option…) Two main advantages, will try to illustrate both.
Pitfalls later… other advantages: later. (add coffee break…)
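The “split 1 table into many” from this slide, as a minimal declarative-partitioning sketch (table and partition names are invented, not from the demo scripts):

```sql
-- Range-partitioned parent: rows land in a partition by logdate.
CREATE TABLE measurements (
    id      bigint  NOT NULL,
    logdate date    NOT NULL,
    amt     numeric
) PARTITION BY RANGE (logdate);

-- Each partition is just a small table with a known range of data.
CREATE TABLE measurements_2021 PARTITION OF measurements
    FOR VALUES FROM ('2021-01-01') TO ('2022-01-01');

CREATE TABLE measurements_2022 PARTITION OF measurements
    FOR VALUES FROM ('2022-01-01') TO ('2023-01-01');
```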
6. PDVBV
PostgreSQL
Table and Index. Conventional.
A quick illustration of table and indexes…. Data in the table is randomly spread out,
but the indexes contain ordered lists and pointers to the table-records.
[Diagram: a table with blocks 1-4 and a (global) index on ID pointing into them]
7. PDVBV
PostgreSQL
Partitioned table and (local) index.
A quick illustration of (range) partitions and local indexes…. Partitions are just small tables with
known (ranges) of data.. The database “Knows” those ranges.
[Diagram: partitions 1-4, each a smaller table with its own local index]
Smaller pieces
“known” content
Still “One Table”
Local indexes !
8. PDVBV
PostgreSQL
1st Advantage: Less WAL
• Ins / Upd / Del is “Work…”
– ~ WAL (and vacuum activity)
–Local I/O, streaming, Remote I/O…
• Delete?
–Drop or Truncate is “Much Faster”
• You Can! - Drop Partitions!
• But…
–Only if your partitioning is suitable.
–Only on “drop” or “attach/detach”
Explain deleting old data with drop-partition.
Typical use-case: ingest + removal of data with a limited lifetime in the DB.. You can save half the WAL..
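The drop-versus-delete point can be sketched as follows, assuming a hypothetical table `measurements` range-partitioned by year into `measurements_2021`, `measurements_2022`, etc.:

```sql
-- Deleting a year row-by-row generates WAL for every row,
-- plus follow-up vacuum work:
DELETE FROM measurements WHERE logdate < '2022-01-01';

-- Dropping the partition removes the same data almost instantly,
-- with minimal WAL:
DROP TABLE measurements_2021;

-- Or keep the data around as a standalone table:
ALTER TABLE measurements DETACH PARTITION measurements_2022;
```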
9. PDVBV
PostgreSQL
Drop Partition… (Fast, no WAL)
Instantaneous delete of the “range” inside a partition. Very little effort.
Note: inserts and updates will still generate WAL… and global indexes.. well, just wait.
[Diagram: partitions 1-4 with local indexes; partition 1 is dropped]
pg-# Drop Table PT_1 ;
10. PDVBV
PostgreSQL
Demo time..
• T = Table
• PT = Partitioned table
• Delete from T => WAL
• Delete from PT => still WAL..
• Drop partition => Much More Efficient..
pg=> \i pg_demo_part.sql
pg=> \i pg_demo_part_0.sql
Demo deleting (old) data with drop-partition.
Best use of partitioning IMHO. (Oracle: show problems with global index: demo_part_0a.sql)
11. PDVBV
PostgreSQL
2nd Advantage: (some) Queries Go Faster…
• Scan Less Data
–less blocks, less IO, less Cache
• Typical use-case:
–Queries / Aggregates over 1 or few Partitions.
• Anti-pattern:
–Loop over All Partitions… (later)
• Next slides: show me how..
Ideally, queries scan as little data as possible to return results fast.
Reduce the work…
12. PDVBV
PostgreSQL
Aggregates, FTS over Conventional table
Data can be all over the table..
Hence FTS or inefficient range-scan + rowid-access needed…
[Diagram: one table, blocks 1-4, data spread throughout]
• Data all over the Table..
Select Sum (amt)
Where [range]
Group by ..
• Probably FTS
13. PDVBV
PostgreSQL
Aggregates on Partitions: less data to scan?
Some (most) searches / scans can be limited to just the relevant partitions..
This Will Only Work if we can eliminate sufficient partitions. (Design!!). - note : No Indexes.
[Diagram: partitions 1-4; only the relevant partition needs a scan]
• IF… we know where to look..
• Then… FTS on…
• just 1 Part. ?
• Design !
–Know your data.
–Control your SQL
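Partition elimination from this slide can be checked with EXPLAIN; here is a sketch against a hypothetical table `measurements`, range-partitioned by `logdate` (names invented):

```sql
-- If the WHERE clause constrains the partition key, the planner can
-- prune partitions at plan time:
EXPLAIN (COSTS OFF)
SELECT date_trunc('month', logdate) AS mon, sum(amt)
FROM   measurements
WHERE  logdate >= DATE '2022-01-01'
AND    logdate <  DATE '2022-02-01'
GROUP  BY 1;
-- The plan should show a scan of only the 2022 partition,
-- not a loop over all partitions.
```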
14. PDVBV
PostgreSQL
Demo time..
• T (Table)
• PT (partitioned)
Select Range, SUM(amt)
From T/PT
Where range Between 10000 and 19999
Group by Range;
• pg=> \i pg_demo_part.sql
• pg=> \i pg_demo_part_sum.sql
This is what we will see. In demo.. -- What do we Expect ?
(don’t forget to initiate the data)
15. PDVBV
PostgreSQL
More Queries: Find Specific Records
•Where ID = :n
Find 1 record; Easy, use (local) index.
•Where Active = ’Y’
Find Multiple records, all over…
Index..? But “local” … How many Partitions ?
Global index..? Not yet....
• Anti-pattern:
–Loop over All Partitions…
When you need “Fast” return of a small set, you need an index… Global or Local
But avoid having to loop/scan many partitions…
16. PDVBV
PostgreSQL
Conventional. QRY for 1 record; on PK/UK.
A quick illustration of table and indexes…. Data in the table is randomly spread out,
but the indexes contain ordered lists and pointers to the table-records.
[Diagram: table blocks 1-4 with a (global) index on ID]
ID = 2 ?
PK lookup
17. PDVBV
PostgreSQL
Table, index… QRY for a set; Active=Y
Same situation, different index.
The few active=Y fields can be all over the table (and in all partitions..).
[Diagram: table with scattered active='Y' rows and a (global) index on active]
Active = ‘Y’ ?
Range Scan
18. PDVBV
PostgreSQL
Partitioned table + local index on PK
Searching for the PK or partition key is Easy… Visit 1 local index, and find the record.
CBO can see from the where-clause which (local) index-partition it needs…
[Diagram: lookup Id = 1 goes to exactly one partition and its local index]
PG “knows”:
Only 1 partition…
19. PDVBV
PostgreSQL
LOCAL index, active=Y…
If the SQL does not give us a clue for the partition,
we need to search through every local index… (Partition-Range-All.. looping)
[Diagram: every partition's local index on active must be visited]
Looping over.. 7 x 365 partitions..?
20. PDVBV
PostgreSQL
Demo time..
• T (conventional)
• PT (partitioned)
Select id, active
From T/PT
Where active = ‘Y’;
• Demo the (local) index.
• pg=> \i pg_demo_part.sql
• pg=> \i pg_demo_part_1.sql
This is what we will see. In demo.. -- What do we Expect ?
(don’t forget to initiate the data)
21. PDVBV
PostgreSQL
Soon: Global Indexes; …Problem ?
• Most partitioned-databases… FAILed.
– (old joke: First Attempt In Learning…)
– Some were “saved by hardware”
• Partition by date/time. But…
• PK on integer, varchar or guid.
• PK-Uniqueness enforced by Index…
• Global index… Let me illustrate…
Of the 10 or so partitioned (other) databases I’ve seen, only 2 were a straight-up success.
Some problems could be “hidden in hardware”, and some just failed…
22. PDVBV
PostgreSQL
Partitioned table; Global index; Active=‘Y’
Illustration of GLOBAL indexes… The index is now one single object, pointing to all partitions.
The impact, pro + con, of this will be shown in the next slides..
[Diagram: one global index on active pointing into all partitions]
GLOBAL index,
Points to all Parts
Table still Partitioned..
23. PDVBV
PostgreSQL
Global index; Active=‘Y’
Illustration of GLOBAL indexes…. and the ups and downs; SQL is equally efficient as on a “table”.
But point out the need for a rebuild if you drop 1 partition: 25% of the pointers are gone…
[Diagram: the global index on active serves the query directly]
Potentially effective: no looping.
24. PDVBV
PostgreSQL
Global index; Now drop a Partition…
Illustration of GLOBAL indexes… Can no longer “truncate” the index; the index points to the whole range..
On drop-partition, the WHOLE index will need a rebuild…
[Diagram: after dropping partition 1, the global index still holds its now-dead pointers]
pg-# Drop table PT_1 ;
The challenge of global indexes
25. PDVBV
PostgreSQL
Bonus-Trick: a PK-Key for Partitioning. 1/3
(not saying this is a good idea… YMMV ! )
• Partitions = mostly a “date thing”
– Not always: List-part on Cstmr-ID also happens.
• No Global Indexing
• Only 1 Unique Key
• Hence UK = PK = Partition key.
• (did I say: Up Front Design?)
If no GLOBAL index, and partition on date, then what will be my PK?
Suggestions ?
26. PDVBV
PostgreSQL
• Take a bigint – imagine 2 parts of an integer...
–Date + Sequence: YYYY DDD SSSS nnnnnn
–Date: YYYY DDD SSSS
–Sequence: nnnnnn, cycle at 999,999
• Id = “epoch” (10 digits) + seq (16 digits)
• Id = YYYY DDD SSSSS + seq (18 digits)
• Id = YYYYMMDD HH24MISS + seq (20 digits)
Also check : “GUID as PK” (@franckpachot)
•Lightbulb ?
Bonus-Trick: a PK-Key for Partitioning. 2/3
Artificial PK, order-able, unique on 1M/sec, integer hence small+efficient.
More Suggestions ? DISCUSS!!
27. PDVBV
PostgreSQL
• Two part key (64bit integer)
• Id = YYYY DDD SSSSS 000999 (18 digits, 10 bytes)
• Range partitioning on “YYYY DDD SSSSS 000000”
– EPAS : can automatically create the partitions…
• Limit all Queries on last 30 days:
– Where id > to_number ( to_char ( sysdate – 30 )…. ,
– Hence only limited nr of partitions in each query..
• Discuss ?
Bonus-Trick: a PK-Key for Partitioning. 3/3
Using the “known format” of the ID, we can have automatic (interval-) partitions,
and give each where-clause a 30-day limit. (this slide is the only one that mentions EPAS)
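One possible SQL encoding of the slide's YYYY DDD SSSSS + sequence scheme (a sketch, not the author's exact code; the sequence and function names are invented):

```sql
-- 6-digit sequence, cycling at 999,999 (roughly 1M ids/second headroom).
CREATE SEQUENCE part_id_seq MAXVALUE 999999 CYCLE;

-- id = YYYYDDD (year + day-of-year) shifted left 5 digits for SSSSS
--      (seconds since midnight), then 6 more digits for the sequence:
--      18 digits total, fits a bigint, and sorts in time order.
SELECT to_char(now(), 'YYYYDDD')::bigint * 100000 * 1000000
     + (extract(epoch FROM now())::bigint % 86400) * 1000000
     + nextval('part_id_seq') AS new_id;

-- A "last 30 days" restriction then becomes a range on the id itself,
-- so only a limited number of partitions appear in each query:
-- WHERE id >= to_char(now() - interval '30 days', 'YYYYDDD')::bigint
--             * 100000 * 1000000
```

Since the date is the leading part of the key, range partitioning on the id and partition pruning both work without a separate date column.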
28. PDVBV
PostgreSQL
Summary (the watch of the cstmr)
• Partitioning: Only From Design.
• 1. Less WAL (on drop/attach/detach)
• 2. Faster Queries (need the Partition Key)
• Use(ful) Cases:
– Time Series / Audit data
– Fast Moving data (batch-deletions…)
– List partitioning = Sharding (discuss !)
• Know + Control your Database + App.
In my opinion: For Large sets of fast moving, time-ordered data. Save on Redo, Optimize SQL.
You must understand the limitations! (before digging deeper… )
29. PDVBV
PostgreSQL
Pitfalls; What to Avoid…
• Avoid Global Indexes
–Extra work on drop-partition
• Avoid “Partition Range All”
–Looping, multiplies the work…
• Consequence:
–All Qries Need “The Part-Key”
• Up Front Design!
Two main advantages, will try to illustrate both.
Pitfalls later… other advantages: later.
30. PDVBV
PostgreSQL
Interesting Times Ahead…
• Many Improvements
–(global indexes – soon ?)
• Many Features, Possibilities
–Global Indexes
–List-Partitioning (= Sharding… ?)
–Storage tiers, compression…
• Discuss
–What should be in next year’s ppt...?
Watch this space… Lots of interesting new features + tricks.
Would love to test some of those for Real… But. Beware of over-engineering.
31. PDVBV
PostgreSQL
Don’t Take my word for it…
RTFM: start there!
Test, Play, Test…
@sdjh2000 (Hermann Baer @ vendor)
Simplicity
– In case of doubt: Simplify!
SimpleOracleDba . Blogspot . com
@pdevisser (twitter)
Firefox
literature
Goethe (simplicity)
The majority of times, I have been WRONG. So go see for yourself - but don’t complicate life.
Favorite quote: “Simplicity shows the Master”.
32. PDVBV
PostgreSQL
Quick Q & A (3 min ;-) 3 .. 2 .. 1 .. Zero
• Questions ?
• Reactions ?
• Experiences from the audience ?
• @pdevisser (twitter..)
Question and Answer time. Discussion welcome (what about that Razor?)
Teach me something: Tell me where you do NOT AGREE.
Thank You !
33. PDVBV
He got it …
As Simple as Possible, but not too simple
Simplicity is a Requirement - but Complexity just sells better (EWD).
35. PDVBV
Intermezzo: End of Part-1…
• After the break…
• Q+A, if any
• Bonus Trick PK; Avoid Global Index
• Some Ref-Partitioning, Quirks
• Discussion time…
There is more..
36. PDVBV
Partitioning – P2
Positives and Pitfalls…
Piet de Visser
Simple Oracle DBA
Piet de Visser - PDVBV
Quotes: “The Limitation shows the master” (Goethe); “Simplicity is not a luxury, it is a necessity. Unfortunately, ‘complex’ solutions sell better.” (E.W. Dijkstra). (Škofja Loka - Tolmin, golden horn)
37. PDVBV
• Use-Case: Parent and Child Tables…
– E.g. “Document” and “Properties”
– (big data.. NoSQL ? )
• Note: going back to a Hierarchical data model
– With benefits of “RDBMS” (and less data “in JSON”)
• Discuss ?
– Stricter checking, Better Data Quality!
– BDUF ?
– You need SDUF (Some Design Up Front)
Ref-Partitioning… 1/n
Ref-partitioning can be used on “hierarchies”, for example if your data is “a document”.
But only if you can do some design up front.
38. PDVBV
Ref Partitioning 2/n
Hierarchy of ref-partitioned tables, 3 levels…
*I realize I need better drawing for this… imagine the indexes…
MMT: Parent Table
MMT_CHD
MMT_CHD_CHD
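PostgreSQL has no Oracle-style reference partitioning, so the usual workaround is to carry the partition key down into the child table and partition it by the same ranges. A minimal two-level sketch using the MMT names from the slide (column names and the `part_dt` key are assumptions; PostgreSQL 12 supports foreign keys that reference partitioned tables):

```sql
CREATE TABLE mmt (
    id       bigint NOT NULL,
    part_dt  date   NOT NULL,
    PRIMARY KEY (id, part_dt)          -- the PK must include the partition key
) PARTITION BY RANGE (part_dt);

CREATE TABLE mmt_chd (
    id       bigint NOT NULL,
    mmt_id   bigint NOT NULL,
    part_dt  date   NOT NULL,          -- partition key denormalised from the parent
    PRIMARY KEY (id, part_dt),
    FOREIGN KEY (mmt_id, part_dt) REFERENCES mmt (id, part_dt)
) PARTITION BY RANGE (part_dt);
```

The same pattern repeats for a third level (MMT_CHD_CHD), each level partitioned on the same key so parent and child rows land in matching partitions.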
39. PDVBV
• Demo: SQL > @demo_part_r1
• Global Index came back to haunt us..
– Default indexes (even for partition-key-PK) …. Global
– Default indexes on dependent-tables… Global.
• Check indexes in SQLDeveloper..
• Demo: SQL > @demo_part_r2
• Discuss ?
Ref-Partitioning… 3/3
Instead of stuffing everything in one or several JSON columns, use real tables+columns..
Devs don’t like the limitation of “Design”.
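For comparison: in PostgreSQL every index on a partitioned table is effectively local (one physical index per partition), and a unique index must include the partition key. A sketch against the hypothetical `mmt` table from the slides (assumed range-partitioned on `part_dt`):

```sql
-- OK: the unique index covers the partition key:
CREATE UNIQUE INDEX mmt_uq ON mmt (id, part_dt);

-- Fails, because it does not include the partition key:
-- CREATE UNIQUE INDEX mmt_id_only ON mmt (id);
-- ERROR:  unique constraint on partitioned table must include
--         all partitioning columns
```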
40. PDVBV
Interesting Times Ahead…
• Many Improvements..
–(global indexes – are improving !!)
• Many Other New Features.
–Partial indexing
–Hybrid Partitioned-tbls…. Wow ??!
• Discuss
–What should be in next year’s ppt...
Watch this space… Lots of interesting new features + tricks.
Would love to test some of those for Real… But. Beware of over-engineering.