SlideShare a Scribd company logo
www.postgrespro.ru
B-tree - explore the heart
of PostgreSQL
Anastasia Lubennikova
Objectives
● Inspect internals of B-tree index
● Present new features
● Understand difficulties of development
● Clarify our roadmap
Important notes about indexes
● All indexes are secondary
● Index files are divided into standard-size pages
● Page layout is defined by Access Method
● Indexes store TIDs of heap tuples
● There is no visibility information in indexes
● But we have visibility map and LP_DEAD flag
● VACUUM removes dead tuples
● Update = insert
● except HOT-updates
B+ tree
● Tree is balanced
● Root node and inner nodes contain keys and
pointers to lower level nodes
● Leaf nodes contain keys and pointers to the heap
● All keys are sorted inside the node
Lehman & Yao Algorithm
✔ Right-link pointer to the page's right sibling
✔ "High key" - upper bound on the keys that are
allowed on that page
✔ Assume that we can fit at least three items per
page
✗ Assume that all btree keys are unique
✗ Assume fixed size keys
✗ Assume that in-memory copies of tree pages are
unshared
PostgreSQL B-tree implementation
● Left-link pointer to the page's left sibling
• For backward scan purposes
● Nonunique keys
• Use link field to check tuples equality
• Search must descend to the left subtree
• Tuples in b-tree are not ordered by TID
● Variable-size keys
• #define MaxIndexTuplesPerPage 
((int) ((BLCKSZ - SizeOfPageHeaderData) / 
(MAXALIGN(sizeof(IndexTupleData) + 1) + sizeof(ItemIdData))))
● Pages are shared
• Page-level read locking is required
Meta page
● Page zero of every btree is a meta-data page
typedef struct BTMetaPageData
{
uint32 btm_magic; /* should contain BTREE_MAGIC */
uint32 btm_version; /* should contain BTREE_VERSION */
BlockNumber btm_root; /* current root location */
uint32 btm_level; /* tree level of the root page */
BlockNumber btm_fastroot; /* current "fast" root location */
uint32 btm_fastlevel; /* tree level of the "fast" root page */
} BTMetaPageData;
Page layout
typedef struct BTPageOpaqueData
{
BlockNumber btpo_prev;
BlockNumber btpo_next;
union
{
uint32 level;
TransactionId xact;
} btpo;
uint16 btpo_flags;
BTCycleId btpo_cycleid;
} BTPageOpaqueData;
Unique and Primary Key
B-tree index
● 6.0 Add UNIQUE index capability (Dan McGuirk)
CREATE UNIQUE INDEX ON tbl (a);
● 6.3 Implement SQL92 PRIMARY KEY and
UNIQUE clauses using indexes (Thomas
Lockhart)
CREATE TABLE tbl (int a, PRIMARY KEY(a));
System and TOAST indexes
● System and TOAST indexes
postgres=# select count(*) from pg_indexes where
schemaname='pg_catalog';
count
-------
103
Fast index build
● Uses tuplesort.c to sort given tuples
● Loads tuples into leaf-level pages
● Builds index from the bottom up
● There is a trick for support of high-key
Multicolumn B-tree index
● 6.1 multicolumn btree indexes (Vadim Mikheev)
CREATE INDEX ON tbl (a, b);
Expressional B-tree index
CREATE INDEX people_names
ON people ((first_name || ' ' || last_name));
Partial B-tree index
● 7.2 Enable partial indexes (Martijn van Oosterhout)
CREATE INDEX ON tbl (a) WHERE a%2 = 0;
On-the-fly deletion (microvacuum)
● 8.2 Remove dead index entries before B-Tree page
split (Junji Teramoto)
● Also known as microvacuum
● Mark tuples as dead when reading table.
● Then remove them while performing insertion before
the page split.
HOT-updates
● 8.3 Heap-Only Tuples (HOT) accelerate space reuse
for most UPDATEs and DELETEs (Pavan Deolasee,
with ideas from many others)
HOT-updates
UPDATE tbl SET id = 16 WHERE id = 15;
HOT-updates
UPDATE tbl SET data = K WHERE id = 16;
HOT-updates
UPDATE tbl SET data = K WHERE id = 16;
Index-only scans
● 9.2 index-only scans Allow queries to retrieve data
only from indexes, avoiding heap access (Robert
Haas, Ibrar Ahmed, Heikki Linnakangas, Tom
Lane)
Covering indexes
Index with included columns
Index with included columns
CREATE UNIQUE INDEX ON tbl (a) INCLUDING (b);
CREATE INDEX ON tbl (a) INCLUDING (b);
CREATE TABLE tbl (c1 int, c2 int, c3 int, c4 box);
CREATE UNIQUE INDEX tbl_idx_unique ON tbl
using btree(c1, c2) INCLUDING(c3,c4);
ALTER TABLE tbl add UNIQUE USING INDEX
tbl_idx_unique;
CREATE TABLE tbl (c1 int, c2 int, c3 int, c4 box);
ALTER TABLE tbl add UNIQUE(c1,c2) INCLUDING(c3,c4);
CREATE TABLE tbl(c1 int,c2 int, c3 int, c4 box,
UNIQUE(c1,c2)INCLUDING(c3,c4));
Catalog changes
CREATE TABLE tbl
(c1 int,c2 int, c3 int, c4 box,
PRIMARY KEY(c1,c2)INCLUDING(c3,c4));
pg_class
indexrelid | indnatts | indnkeyatts | indkey | indclass
------------+----------+-------------+---------+----------
tbl_pkey | 4 | 2 | 1 2 3 4 | 1978 1978
pg_constraint
conname | conkey | conincluding
----------+--------+-------------
tbl_pkey | {1,2} | {3,4}
Index with included columns
● Indexes maintenace overhead decreased
• Size is smaller
• Inserts are faster
● Index can contain data with no suitable opclass
● Any column deletion leads to index deletion
● HOT-updates do not work for indexed data
Effective storage of duplicates
Effective storage of duplicates
Effective storage of duplicates
Difficulties of development
● Complicated code
● It's a crucial subsystem that can badly break
everything including system catalog
● Few active developers and experts in this area
● Few instruments for developers
pageinspect
postgres=# select bt.blkno, bt.type, bt.live_items,
bt.free_size, bt.btpo_prev, bt.btpo_next
from generate_series(1,pg_relation_size('idx')/8192 - 1) as n,
lateral bt_page_stats('idx', n::int) as bt;
blkno | type | live_items | free_size | btpo_prev | btpo_next
-------+------+------------+-----------+-----------+-----------
1 | l | 367 | 808 | 0 | 2
2 | l | 235 | 3448 | 1 | 0
3 | r | 2 | 8116 | 0 | 0
pageinspect
postgres=# select * from bt_page_items('idx', 1);
itemoffset | ctid | itemlen | nulls | vars | data
------------+--------+---------+-------+------+-------------------------
1 | (0,1) | 16 | f | f | 01 00 00 00 00 00 00 00
2 | (0,2) | 16 | f | f | 01 00 00 00 01 00 00 00
3 | (0,3) | 16 | f | f | 01 00 00 00 02 00 00 00
4 | (0,4) | 16 | f | f | 01 00 00 00 03 00 00 00
5 | (0,5) | 16 | f | f | 01 00 00 00 04 00 00 00
6 | (0,6) | 16 | f | f | 01 00 00 00 05 00 00 00
7 | (0,7) | 16 | f | f | 01 00 00 00 06 00 00 00
8 | (0,8) | 16 | f | f | 01 00 00 00 07 00 00 00
9 | (0,9) | 16 | f | f | 01 00 00 00 08 00 00 00
10 | (0,10) | 16 | f | f | 01 00 00 00 09 00 00 00
11 | (0,11) | 16 | f | f | 01 00 00 00 0a 00 00 00
pageinspect
postgres=# select blkno, bt.itemoffset, bt.ctid, bt.itemlen, bt.data
from generate_series(1,pg_relation_size('idx')/8192 - 1) as blkno,
lateral bt_page_items('idx', blkno::int) as bt where itemoffset <3;
blkno | itemoffset | ctid | itemlen | data
-------+------------+---------+---------+-------------------------
1 | 1 | (1,141) | 16 | 01 00 00 00 6e 01 00 00
1 | 2 | (0,1) | 16 | 01 00 00 00 00 00 00 00
2 | 1 | (1,141) | 16 | 01 00 00 00 6e 01 00 00
2 | 2 | (1,142) | 16 | 01 00 00 00 6f 01 00 00
3 | 1 | (1,1) | 8 |
3 | 2 | (2,1) | 16 | 01 00 00 00 6e 01 00 00
amcheck
● bt_index_check(index regclass);
● bt_index_parent_check(index regclass);
What do we need?
● Index compression
● Compression of leading columns
● Page compression
● Index-Organized-Tables (primary indexes)
● Users don't want to store data twice
● KNN for B-tree
● Nice task for beginners
● Batch update of indexes
● Indexes on partitioned tables
● pg_pathman
● Global indexes
● Global Partitioned Indexes
www.postgrespro.ru
Thanks for attention!
Any questions?
a.lubennikova@postgrespro.ru

More Related Content

What's hot

Altinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouseAltinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouse
Altinity Ltd
 
Clickhouse at Cloudflare. By Marek Vavrusa
Clickhouse at Cloudflare. By Marek VavrusaClickhouse at Cloudflare. By Marek Vavrusa
Clickhouse at Cloudflare. By Marek Vavrusa
Altinity Ltd
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
Spark Summit
 
[232] 성능어디까지쥐어짜봤니 송태웅
[232] 성능어디까지쥐어짜봤니 송태웅[232] 성능어디까지쥐어짜봤니 송태웅
[232] 성능어디까지쥐어짜봤니 송태웅
NAVER D2
 
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTO
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTOClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTO
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTO
Altinity Ltd
 
Data Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UI
Data Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UIData Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UI
Data Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UI
Altinity Ltd
 
ClickHouse Features for Advanced Users, by Aleksei Milovidov
ClickHouse Features for Advanced Users, by Aleksei MilovidovClickHouse Features for Advanced Users, by Aleksei Milovidov
ClickHouse Features for Advanced Users, by Aleksei Milovidov
Altinity Ltd
 
Cilium - Network security for microservices
Cilium - Network security for microservicesCilium - Network security for microservices
Cilium - Network security for microservices
Thomas Graf
 
Boltdb - an embedded key value database
Boltdb - an embedded key value databaseBoltdb - an embedded key value database
Boltdb - an embedded key value database
Manoj Awasthi
 
OVN - Basics and deep dive
OVN - Basics and deep diveOVN - Basics and deep dive
OVN - Basics and deep dive
Trinath Somanchi
 
PostgreSQL: Joining 1 million tables
PostgreSQL: Joining 1 million tablesPostgreSQL: Joining 1 million tables
PostgreSQL: Joining 1 million tables
Hans-Jürgen Schönig
 
Linux Locking Mechanisms
Linux Locking MechanismsLinux Locking Mechanisms
Linux Locking Mechanisms
Kernel TLV
 
Futex Scaling for Multi-core Systems
Futex Scaling for Multi-core SystemsFutex Scaling for Multi-core Systems
Futex Scaling for Multi-core Systems
Davidlohr Bueso
 
Linux SMEP bypass techniques
Linux SMEP bypass techniquesLinux SMEP bypass techniques
Linux SMEP bypass techniques
Vitaly Nikolenko
 
Coscup2021 open source network os for datacenter
Coscup2021  open source network os for datacenterCoscup2021  open source network os for datacenter
Coscup2021 open source network os for datacenter
Dung-Ru Tsai
 
Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introduction
colorant
 
한컴MDS_NVIDIA Jetson Platform
한컴MDS_NVIDIA Jetson Platform한컴MDS_NVIDIA Jetson Platform
한컴MDS_NVIDIA Jetson Platform
HANCOM MDS
 
Pair programming
Pair programmingPair programming
Pair programming
thehoagie
 
Linux Networking Explained
Linux Networking ExplainedLinux Networking Explained
Linux Networking Explained
Thomas Graf
 
PostgreSQL: Advanced indexing
PostgreSQL: Advanced indexingPostgreSQL: Advanced indexing
PostgreSQL: Advanced indexing
Hans-Jürgen Schönig
 

What's hot (20)

Altinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouseAltinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouse
 
Clickhouse at Cloudflare. By Marek Vavrusa
Clickhouse at Cloudflare. By Marek VavrusaClickhouse at Cloudflare. By Marek Vavrusa
Clickhouse at Cloudflare. By Marek Vavrusa
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
 
[232] 성능어디까지쥐어짜봤니 송태웅
[232] 성능어디까지쥐어짜봤니 송태웅[232] 성능어디까지쥐어짜봤니 송태웅
[232] 성능어디까지쥐어짜봤니 송태웅
 
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTO
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTOClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTO
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTO
 
Data Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UI
Data Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UIData Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UI
Data Warehouses in Kubernetes Visualized: the ClickHouse Kubernetes Operator UI
 
ClickHouse Features for Advanced Users, by Aleksei Milovidov
ClickHouse Features for Advanced Users, by Aleksei MilovidovClickHouse Features for Advanced Users, by Aleksei Milovidov
ClickHouse Features for Advanced Users, by Aleksei Milovidov
 
Cilium - Network security for microservices
Cilium - Network security for microservicesCilium - Network security for microservices
Cilium - Network security for microservices
 
Boltdb - an embedded key value database
Boltdb - an embedded key value databaseBoltdb - an embedded key value database
Boltdb - an embedded key value database
 
OVN - Basics and deep dive
OVN - Basics and deep diveOVN - Basics and deep dive
OVN - Basics and deep dive
 
PostgreSQL: Joining 1 million tables
PostgreSQL: Joining 1 million tablesPostgreSQL: Joining 1 million tables
PostgreSQL: Joining 1 million tables
 
Linux Locking Mechanisms
Linux Locking MechanismsLinux Locking Mechanisms
Linux Locking Mechanisms
 
Futex Scaling for Multi-core Systems
Futex Scaling for Multi-core SystemsFutex Scaling for Multi-core Systems
Futex Scaling for Multi-core Systems
 
Linux SMEP bypass techniques
Linux SMEP bypass techniquesLinux SMEP bypass techniques
Linux SMEP bypass techniques
 
Coscup2021 open source network os for datacenter
Coscup2021  open source network os for datacenterCoscup2021  open source network os for datacenter
Coscup2021 open source network os for datacenter
 
Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introduction
 
한컴MDS_NVIDIA Jetson Platform
한컴MDS_NVIDIA Jetson Platform한컴MDS_NVIDIA Jetson Platform
한컴MDS_NVIDIA Jetson Platform
 
Pair programming
Pair programmingPair programming
Pair programming
 
Linux Networking Explained
Linux Networking ExplainedLinux Networking Explained
Linux Networking Explained
 
PostgreSQL: Advanced indexing
PostgreSQL: Advanced indexingPostgreSQL: Advanced indexing
PostgreSQL: Advanced indexing
 

Viewers also liked

Олег Бартунов, Федор Сигаев, Александр Коротков (PostgreSQL)
Олег Бартунов, Федор Сигаев, Александр Коротков (PostgreSQL)Олег Бартунов, Федор Сигаев, Александр Коротков (PostgreSQL)
Олег Бартунов, Федор Сигаев, Александр Коротков (PostgreSQL)
Ontico
 
SAMag2007 Conference: PostgreSQL 8.3 presentation
SAMag2007 Conference: PostgreSQL 8.3 presentationSAMag2007 Conference: PostgreSQL 8.3 presentation
SAMag2007 Conference: PostgreSQL 8.3 presentationNikolay Samokhvalov
 
demo2.ppt
demo2.pptdemo2.ppt
demo2.ppt
crazyvirtue
 
Instructivo hidrología unamba
Instructivo hidrología unambaInstructivo hidrología unamba
Instructivo hidrología unamba
Jesus Ayerve Tuiro
 
Destroying Router Security
Destroying Router SecurityDestroying Router Security
Destroying Router Security
Iván Sanz de Castro
 
Making Awesome Experiences with Biosensors
Making Awesome Experiences with BiosensorsMaking Awesome Experiences with Biosensors
Making Awesome Experiences with Biosensors
SophiKravitz
 
Andre Silva Campos P (9)
Andre Silva Campos P (9)Andre Silva Campos P (9)
Andre Silva Campos P (9)Josinei Tavares
 
Global warming
Global warmingGlobal warming
Global warming
Sindhuja Sompalli
 
P (1)ele escolheu você
P (1)ele escolheu vocêP (1)ele escolheu você
P (1)ele escolheu você
Josinei Tavares
 
Survival Packet
Survival PacketSurvival Packet
Survival Packetslr1541
 
Архитектура и новые возможности B-tree
Архитектура и новые возможности B-treeАрхитектура и новые возможности B-tree
Архитектура и новые возможности B-tree
Anastasia Lubennikova
 
Historia de los_satelites_de_comunicaciones._bit_134._5c6c417a
Historia de los_satelites_de_comunicaciones._bit_134._5c6c417aHistoria de los_satelites_de_comunicaciones._bit_134._5c6c417a
Historia de los_satelites_de_comunicaciones._bit_134._5c6c417a
Jesus Ayerve Tuiro
 
Page compression. PGCON_2016
Page compression. PGCON_2016Page compression. PGCON_2016
Page compression. PGCON_2016
Anastasia Lubennikova
 
2011-2012 AmLit Syllabus
2011-2012 AmLit Syllabus2011-2012 AmLit Syllabus
2011-2012 AmLit Syllabusslr1541
 
SalesRev - Boost Your Conversions and Generate More Business Without Spending...
SalesRev - Boost Your Conversions and Generate More Business Without Spending...SalesRev - Boost Your Conversions and Generate More Business Without Spending...
SalesRev - Boost Your Conversions and Generate More Business Without Spending...
Keith McGibbon
 
Kpi google analytics
Kpi google analyticsKpi google analytics
Kpi google analyticschamberthomas
 

Viewers also liked (20)

Олег Бартунов, Федор Сигаев, Александр Коротков (PostgreSQL)
Олег Бартунов, Федор Сигаев, Александр Коротков (PostgreSQL)Олег Бартунов, Федор Сигаев, Александр Коротков (PostgreSQL)
Олег Бартунов, Федор Сигаев, Александр Коротков (PostgreSQL)
 
SAMag2007 Conference: PostgreSQL 8.3 presentation
SAMag2007 Conference: PostgreSQL 8.3 presentationSAMag2007 Conference: PostgreSQL 8.3 presentation
SAMag2007 Conference: PostgreSQL 8.3 presentation
 
demo2.ppt
demo2.pptdemo2.ppt
demo2.ppt
 
Instructivo hidrología unamba
Instructivo hidrología unambaInstructivo hidrología unamba
Instructivo hidrología unamba
 
Destroying Router Security
Destroying Router SecurityDestroying Router Security
Destroying Router Security
 
Kpi key
Kpi keyKpi key
Kpi key
 
Making Awesome Experiences with Biosensors
Making Awesome Experiences with BiosensorsMaking Awesome Experiences with Biosensors
Making Awesome Experiences with Biosensors
 
Andre Silva Campos P (9)
Andre Silva Campos P (9)Andre Silva Campos P (9)
Andre Silva Campos P (9)
 
Global warming
Global warmingGlobal warming
Global warming
 
P (1)ele escolheu você
P (1)ele escolheu vocêP (1)ele escolheu você
P (1)ele escolheu você
 
Kpi indicator
Kpi indicatorKpi indicator
Kpi indicator
 
Survival Packet
Survival PacketSurvival Packet
Survival Packet
 
Архитектура и новые возможности B-tree
Архитектура и новые возможности B-treeАрхитектура и новые возможности B-tree
Архитектура и новые возможности B-tree
 
Kpi process
Kpi processKpi process
Kpi process
 
10
1010
10
 
Historia de los_satelites_de_comunicaciones._bit_134._5c6c417a
Historia de los_satelites_de_comunicaciones._bit_134._5c6c417aHistoria de los_satelites_de_comunicaciones._bit_134._5c6c417a
Historia de los_satelites_de_comunicaciones._bit_134._5c6c417a
 
Page compression. PGCON_2016
Page compression. PGCON_2016Page compression. PGCON_2016
Page compression. PGCON_2016
 
2011-2012 AmLit Syllabus
2011-2012 AmLit Syllabus2011-2012 AmLit Syllabus
2011-2012 AmLit Syllabus
 
SalesRev - Boost Your Conversions and Generate More Business Without Spending...
SalesRev - Boost Your Conversions and Generate More Business Without Spending...SalesRev - Boost Your Conversions and Generate More Business Without Spending...
SalesRev - Boost Your Conversions and Generate More Business Without Spending...
 
Kpi google analytics
Kpi google analyticsKpi google analytics
Kpi google analytics
 

Similar to Btree. Explore the heart of PostgreSQL.

PgconfSV compression
PgconfSV compressionPgconfSV compression
PgconfSV compression
Anastasia Lubennikova
 
MySQL innoDB split and merge pages
MySQL innoDB split and merge pagesMySQL innoDB split and merge pages
MySQL innoDB split and merge pages
Marco Tusa
 
Введение в современную PostgreSQL. Часть 2
Введение в современную PostgreSQL. Часть 2Введение в современную PostgreSQL. Часть 2
Введение в современную PostgreSQL. Часть 2
Dzianis Pirshtuk
 
Grokking TechTalk #20: PostgreSQL Internals 101
Grokking TechTalk #20: PostgreSQL Internals 101Grokking TechTalk #20: PostgreSQL Internals 101
Grokking TechTalk #20: PostgreSQL Internals 101
Grokking VN
 
Myth busters - performance tuning 101 2007
Myth busters - performance tuning 101 2007Myth busters - performance tuning 101 2007
Myth busters - performance tuning 101 2007paulguerin
 
Postgres indexes
Postgres indexesPostgres indexes
Postgres indexes
Bartosz Sypytkowski
 
Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...
Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...
Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...
Citus Data
 
How to teach an elephant to rock'n'roll
How to teach an elephant to rock'n'rollHow to teach an elephant to rock'n'roll
How to teach an elephant to rock'n'roll
PGConf APAC
 
Explain this!
Explain this!Explain this!
Explain this!
Fabio Telles Rodriguez
 
In-core compression: how to shrink your database size in several times
In-core compression: how to shrink your database size in several timesIn-core compression: how to shrink your database size in several times
In-core compression: how to shrink your database size in several times
Aleksander Alekseev
 
MySQL 5.7 Tutorial Dutch PHP Conference 2015
MySQL 5.7 Tutorial Dutch PHP Conference 2015MySQL 5.7 Tutorial Dutch PHP Conference 2015
MySQL 5.7 Tutorial Dutch PHP Conference 2015
Dave Stokes
 
MySQL 5.7. Tutorial - Dutch PHP Conference 2015
MySQL 5.7. Tutorial - Dutch PHP Conference 2015MySQL 5.7. Tutorial - Dutch PHP Conference 2015
MySQL 5.7. Tutorial - Dutch PHP Conference 2015
Dave Stokes
 
MariaDB: Engine Independent Table Statistics, including histograms
MariaDB: Engine Independent Table Statistics, including histogramsMariaDB: Engine Independent Table Statistics, including histograms
MariaDB: Engine Independent Table Statistics, including histograms
Sergey Petrunya
 
linkedLists.ppt
linkedLists.pptlinkedLists.ppt
linkedLists.ppt
Sachin266673
 
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
Citus Data
 
Instant add column for inno db in mariadb 10.3+ (fosdem 2018, second draft)
Instant add column for inno db in mariadb 10.3+ (fosdem 2018, second draft)Instant add column for inno db in mariadb 10.3+ (fosdem 2018, second draft)
Instant add column for inno db in mariadb 10.3+ (fosdem 2018, second draft)
Valerii Kravchuk
 
TYPO3 6.1. What's new
TYPO3 6.1. What's newTYPO3 6.1. What's new
TYPO3 6.1. What's new
Rafal Brzeski
 
[db tech showcase Tokyo 2017] C23: Lessons from SQLite4 by SQLite.org - Richa...
[db tech showcase Tokyo 2017] C23: Lessons from SQLite4 by SQLite.org - Richa...[db tech showcase Tokyo 2017] C23: Lessons from SQLite4 by SQLite.org - Richa...
[db tech showcase Tokyo 2017] C23: Lessons from SQLite4 by SQLite.org - Richa...
Insight Technology, Inc.
 
Python SQite3 database Tutorial | SQlite Database
Python SQite3 database Tutorial | SQlite DatabasePython SQite3 database Tutorial | SQlite Database
Python SQite3 database Tutorial | SQlite Database
ElangovanTechNotesET
 

Similar to Btree. Explore the heart of PostgreSQL. (20)

PgconfSV compression
PgconfSV compressionPgconfSV compression
PgconfSV compression
 
MySQL innoDB split and merge pages
MySQL innoDB split and merge pagesMySQL innoDB split and merge pages
MySQL innoDB split and merge pages
 
Введение в современную PostgreSQL. Часть 2
Введение в современную PostgreSQL. Часть 2Введение в современную PostgreSQL. Часть 2
Введение в современную PostgreSQL. Часть 2
 
Grokking TechTalk #20: PostgreSQL Internals 101
Grokking TechTalk #20: PostgreSQL Internals 101Grokking TechTalk #20: PostgreSQL Internals 101
Grokking TechTalk #20: PostgreSQL Internals 101
 
Myth busters - performance tuning 101 2007
Myth busters - performance tuning 101 2007Myth busters - performance tuning 101 2007
Myth busters - performance tuning 101 2007
 
Postgres indexes
Postgres indexesPostgres indexes
Postgres indexes
 
Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...
Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...
Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...
 
How to teach an elephant to rock'n'roll
How to teach an elephant to rock'n'rollHow to teach an elephant to rock'n'roll
How to teach an elephant to rock'n'roll
 
Explain this!
Explain this!Explain this!
Explain this!
 
In-core compression: how to shrink your database size in several times
In-core compression: how to shrink your database size in several timesIn-core compression: how to shrink your database size in several times
In-core compression: how to shrink your database size in several times
 
MySQL 5.7 Tutorial Dutch PHP Conference 2015
MySQL 5.7 Tutorial Dutch PHP Conference 2015MySQL 5.7 Tutorial Dutch PHP Conference 2015
MySQL 5.7 Tutorial Dutch PHP Conference 2015
 
MySQL 5.7. Tutorial - Dutch PHP Conference 2015
MySQL 5.7. Tutorial - Dutch PHP Conference 2015MySQL 5.7. Tutorial - Dutch PHP Conference 2015
MySQL 5.7. Tutorial - Dutch PHP Conference 2015
 
MariaDB: Engine Independent Table Statistics, including histograms
MariaDB: Engine Independent Table Statistics, including histogramsMariaDB: Engine Independent Table Statistics, including histograms
MariaDB: Engine Independent Table Statistics, including histograms
 
linkedLists.ppt
linkedLists.pptlinkedLists.ppt
linkedLists.ppt
 
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
Data Modeling, Normalization, and De-Normalization | PostgresOpen 2019 | Dimi...
 
Instant add column for inno db in mariadb 10.3+ (fosdem 2018, second draft)
Instant add column for inno db in mariadb 10.3+ (fosdem 2018, second draft)Instant add column for inno db in mariadb 10.3+ (fosdem 2018, second draft)
Instant add column for inno db in mariadb 10.3+ (fosdem 2018, second draft)
 
TYPO3 6.1. What's new
TYPO3 6.1. What's newTYPO3 6.1. What's new
TYPO3 6.1. What's new
 
[db tech showcase Tokyo 2017] C23: Lessons from SQLite4 by SQLite.org - Richa...
[db tech showcase Tokyo 2017] C23: Lessons from SQLite4 by SQLite.org - Richa...[db tech showcase Tokyo 2017] C23: Lessons from SQLite4 by SQLite.org - Richa...
[db tech showcase Tokyo 2017] C23: Lessons from SQLite4 by SQLite.org - Richa...
 
Python SQite3 database Tutorial | SQlite Database
Python SQite3 database Tutorial | SQlite DatabasePython SQite3 database Tutorial | SQlite Database
Python SQite3 database Tutorial | SQlite Database
 
Less08 Schema
Less08 SchemaLess08 Schema
Less08 Schema
 

More from Anastasia Lubennikova

Advanced backup methods (Postgres@CERN)
Advanced backup methods (Postgres@CERN)Advanced backup methods (Postgres@CERN)
Advanced backup methods (Postgres@CERN)
Anastasia Lubennikova
 
Hacking PostgreSQL. Локальная память процессов. Контексты памяти.
Hacking PostgreSQL. Локальная память процессов. Контексты памяти.Hacking PostgreSQL. Локальная память процессов. Контексты памяти.
Hacking PostgreSQL. Локальная память процессов. Контексты памяти.
Anastasia Lubennikova
 
Hacking PostgreSQL. Разделяемая память и блокировки.
Hacking PostgreSQL. Разделяемая память и блокировки.Hacking PostgreSQL. Разделяемая память и блокировки.
Hacking PostgreSQL. Разделяемая память и блокировки.
Anastasia Lubennikova
 
Hacking PostgreSQL. Физическое представление данных
Hacking PostgreSQL. Физическое представление данныхHacking PostgreSQL. Физическое представление данных
Hacking PostgreSQL. Физическое представление данных
Anastasia Lubennikova
 
Hacking PostgreSQL. Обзор исходного кода
Hacking PostgreSQL. Обзор исходного кодаHacking PostgreSQL. Обзор исходного кода
Hacking PostgreSQL. Обзор исходного кода
Anastasia Lubennikova
 
Расширения для PostgreSQL
Расширения для PostgreSQLРасширения для PostgreSQL
Расширения для PostgreSQL
Anastasia Lubennikova
 
Hacking PostgreSQL. Обзор архитектуры.
Hacking PostgreSQL. Обзор архитектуры.Hacking PostgreSQL. Обзор архитектуры.
Hacking PostgreSQL. Обзор архитектуры.
Anastasia Lubennikova
 
Indexes don't mean slow inserts.
Indexes don't mean slow inserts.Indexes don't mean slow inserts.
Indexes don't mean slow inserts.
Anastasia Lubennikova
 
Советы для начинающих разработчиков PostgreSQL
Советы для начинающих разработчиков PostgreSQL Советы для начинающих разработчиков PostgreSQL
Советы для начинающих разработчиков PostgreSQL
Anastasia Lubennikova
 

More from Anastasia Lubennikova (9)

Advanced backup methods (Postgres@CERN)
Advanced backup methods (Postgres@CERN)Advanced backup methods (Postgres@CERN)
Advanced backup methods (Postgres@CERN)
 
Hacking PostgreSQL. Локальная память процессов. Контексты памяти.
Hacking PostgreSQL. Локальная память процессов. Контексты памяти.Hacking PostgreSQL. Локальная память процессов. Контексты памяти.
Hacking PostgreSQL. Локальная память процессов. Контексты памяти.
 
Hacking PostgreSQL. Разделяемая память и блокировки.
Hacking PostgreSQL. Разделяемая память и блокировки.Hacking PostgreSQL. Разделяемая память и блокировки.
Hacking PostgreSQL. Разделяемая память и блокировки.
 
Hacking PostgreSQL. Физическое представление данных
Hacking PostgreSQL. Физическое представление данныхHacking PostgreSQL. Физическое представление данных
Hacking PostgreSQL. Физическое представление данных
 
Hacking PostgreSQL. Обзор исходного кода
Hacking PostgreSQL. Обзор исходного кодаHacking PostgreSQL. Обзор исходного кода
Hacking PostgreSQL. Обзор исходного кода
 
Расширения для PostgreSQL
Расширения для PostgreSQLРасширения для PostgreSQL
Расширения для PostgreSQL
 
Hacking PostgreSQL. Обзор архитектуры.
Hacking PostgreSQL. Обзор архитектуры.Hacking PostgreSQL. Обзор архитектуры.
Hacking PostgreSQL. Обзор архитектуры.
 
Indexes don't mean slow inserts.
Indexes don't mean slow inserts.Indexes don't mean slow inserts.
Indexes don't mean slow inserts.
 
Советы для начинающих разработчиков PostgreSQL
Советы для начинающих разработчиков PostgreSQL Советы для начинающих разработчиков PostgreSQL
Советы для начинающих разработчиков PostgreSQL
 

Recently uploaded

Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
Globus
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
Globus
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
Neo4j
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
wottaspaceseo
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Globus
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
Georgi Kodinov
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
Boni García
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
Paco van Beckhoven
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
AMB-Review
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
XfilesPro
 
Enterprise Software Development with No Code Solutions.pptx
Enterprise Software Development with No Code Solutions.pptxEnterprise Software Development with No Code Solutions.pptx
Enterprise Software Development with No Code Solutions.pptx
QuickwayInfoSystems3
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Globus
 
Pro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp BookPro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp Book
abdulrafaychaudhry
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Globus
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Mind IT Systems
 
Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
e20449
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
Juraj Vysvader
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
Donna Lenk
 

Recently uploaded (20)

Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
 
Enterprise Software Development with No Code Solutions.pptx
Enterprise Software Development with No Code Solutions.pptxEnterprise Software Development with No Code Solutions.pptx
Enterprise Software Development with No Code Solutions.pptx
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
 
Pro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp BookPro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp Book
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
 
Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
 

Btree. Explore the heart of PostgreSQL.

  • 1. www.postgrespro.ru B-tree - explore the heart of PostgreSQL Anastasia Lubennikova
  • 2. Objectives ● Inspect internals of B-tree index ● Present new features ● Understand difficulties of development ● Clarify our roadmap
  • 3. Important notes about indexes ● All indexes are secondary ● Index files are divided into standard-size pages ● Page layout is defined by Access Method ● Indexes store TIDs of heap tuples ● There is no visibility information in indexes ● But we have visibility map and LP_DEAD flag ● VACUUM removes dead tuples ● Update = insert ● except HOT-updates
  • 4. B+ tree ● Tree is balanced ● Root node and inner nodes contain keys and pointers to lower level nodes ● Leaf nodes contain keys and pointers to the heap ● All keys are sorted inside the node
  • 5. Lehman & Yao Algorithm ✔ Right-link pointer to the page's right sibling ✔ "High key" - upper bound on the keys that are allowed on that page ✔ Assume that we can fit at least three items per page ✗ Assume that all btree keys are unique ✗ Assume fixed size keys ✗ Assume that in-memory copies of tree pages are unshared
  • 6. PostgreSQL B-tree implementation ● Left-link pointer to the page's left sibling • For backward scan purposes ● Nonunique keys • Use link field to check tuples equality • Search must descend to the left subtree • Tuples in b-tree are not ordered by TID ● Variable-size keys • #define MaxIndexTuplesPerPage ((int) ((BLCKSZ - SizeOfPageHeaderData) / (MAXALIGN(sizeof(IndexTupleData) + 1) + sizeof(ItemIdData)))) ● Pages are shared • Page-level read locking is required
  • 7. Meta page ● Page zero of every btree is a meta-data page typedef struct BTMetaPageData { uint32 btm_magic; /* should contain BTREE_MAGIC */ uint32 btm_version; /* should contain BTREE_VERSION */ BlockNumber btm_root; /* current root location */ uint32 btm_level; /* tree level of the root page */ BlockNumber btm_fastroot; /* current "fast" root location */ uint32 btm_fastlevel; /* tree level of the "fast" root page */ } BTMetaPageData;
  • 8. Page layout typedef struct BTPageOpaqueData { BlockNumber btpo_prev; BlockNumber btpo_next; union { uint32 level; TransactionId xact; } btpo; uint16 btpo_flags; BTCycleId btpo_cycleid; } BTPageOpaqueData;
  • 9. Unique and Primary Key B-tree index ● 6.0 Add UNIQUE index capability (Dan McGuirk) CREATE UNIQUE INDEX ON tbl (a); ● 6.3 Implement SQL92 PRIMARY KEY and UNIQUE clauses using indexes (Thomas Lockhart) CREATE TABLE tbl (int a, PRIMARY KEY(a));
  • 10. System and TOAST indexes ● System and TOAST indexes postgres=# select count(*) from pg_indexes where schemaname='pg_catalog'; count ------- 103
  • 11. Fast index build ● Uses tuplesort.c to sort given tuples ● Loads tuples into leaf-level pages ● Builds index from the bottom up ● There is a trick for support of high-key
  • 12. Multicolumn B-tree index ● 6.1 multicolumn btree indexes (Vadim Mikheev) CREATE INDEX ON tbl (a, b);
  • 13. Expressional B-tree index CREATE INDEX people_names ON people ((first_name || ' ' || last_name));
  • 14. Partial B-tree index ● 7.2 Enable partial indexes (Martijn van Oosterhout) CREATE INDEX ON tbl (a) WHERE a%2 = 0;
  • 15. On-the-fly deletion (microvacuum) ● 8.2 Remove dead index entries before B-Tree page split (Junji Teramoto) ● Also known as microvacuum ● Mark tuples as dead when reading table. ● Then remove them while performing insertion before the page split.
  • 16. HOT-updates ● 8.3 Heap-Only Tuples (HOT) accelerate space reuse for most UPDATEs and DELETEs (Pavan Deolasee, with ideas from many others)
  • 17. HOT-updates UPDATE tbl SET id = 16 WHERE id = 15;
  • 18. HOT-updates UPDATE tbl SET data = K WHERE id = 16;
  • 19. HOT-updates UPDATE tbl SET data = K WHERE id = 16;
  • 20. Index-only scans ● 9.2 index-only scans Allow queries to retrieve data only from indexes, avoiding heap access (Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane)
  • 23. Index with included columns CREATE UNIQUE INDEX ON tbl (a) INCLUDING (b); CREATE INDEX ON tbl (a) INCLUDING (b); CREATE TABLE tbl (c1 int, c2 int, c3 int, c4 box); CREATE UNIQUE INDEX tbl_idx_unique ON tbl using btree(c1, c2) INCLUDING(c3,c4); ALTER TABLE tbl add UNIQUE USING INDEX tbl_idx_unique; CREATE TABLE tbl (c1 int, c2 int, c3 int, c4 box); ALTER TABLE tbl add UNIQUE(c1,c2) INCLUDING(c3,c4); CREATE TABLE tbl(c1 int,c2 int, c3 int, c4 box, UNIQUE(c1,c2)INCLUDING(c3,c4));
  • 24. Catalog changes CREATE TABLE tbl (c1 int,c2 int, c3 int, c4 box, PRIMARY KEY(c1,c2)INCLUDING(c3,c4)); pg_class indexrelid | indnatts | indnkeyatts | indkey | indclass ------------+----------+-------------+---------+---------- tbl_pkey | 4 | 2 | 1 2 3 4 | 1978 1978 pg_constraint conname | conkey | conincluding ----------+--------+------------- tbl_pkey | {1,2} | {3,4}
  • 25. Index with included columns ● Indexes maintenace overhead decreased • Size is smaller • Inserts are faster ● Index can contain data with no suitable opclass ● Any column deletion leads to index deletion ● HOT-updates do not work for indexed data
  • 26. Effective storage of duplicates
  • 27. Effective storage of duplicates
  • 28. Effective storage of duplicates
  • 29. Difficulties of development ● Complicated code ● It's a crucial subsystem that can badly break everything including system catalog ● Few active developers and experts in this area ● Few instruments for developers
  • 30. pageinspect postgres=# select bt.blkno, bt.type, bt.live_items, bt.free_size, bt.btpo_prev, bt.btpo_next from generate_series(1,pg_relation_size('idx')/8192 - 1) as n, lateral bt_page_stats('idx', n::int) as bt; blkno | type | live_items | free_size | btpo_prev | btpo_next -------+------+------------+-----------+-----------+----------- 1 | l | 367 | 808 | 0 | 2 2 | l | 235 | 3448 | 1 | 0 3 | r | 2 | 8116 | 0 | 0
  • 31. pageinspect postgres=# select * from bt_page_items('idx', 1); itemoffset | ctid | itemlen | nulls | vars | data ------------+--------+---------+-------+------+------------------------- 1 | (0,1) | 16 | f | f | 01 00 00 00 00 00 00 00 2 | (0,2) | 16 | f | f | 01 00 00 00 01 00 00 00 3 | (0,3) | 16 | f | f | 01 00 00 00 02 00 00 00 4 | (0,4) | 16 | f | f | 01 00 00 00 03 00 00 00 5 | (0,5) | 16 | f | f | 01 00 00 00 04 00 00 00 6 | (0,6) | 16 | f | f | 01 00 00 00 05 00 00 00 7 | (0,7) | 16 | f | f | 01 00 00 00 06 00 00 00 8 | (0,8) | 16 | f | f | 01 00 00 00 07 00 00 00 9 | (0,9) | 16 | f | f | 01 00 00 00 08 00 00 00 10 | (0,10) | 16 | f | f | 01 00 00 00 09 00 00 00 11 | (0,11) | 16 | f | f | 01 00 00 00 0a 00 00 00
  • 32. pageinspect postgres=# select blkno, bt.itemoffset, bt.ctid, bt.itemlen, bt.data from generate_series(1,pg_relation_size('idx')/8192 - 1) as blkno, lateral bt_page_items('idx', blkno::int) as bt where itemoffset <3; blkno | itemoffset | ctid | itemlen | data -------+------------+---------+---------+------------------------- 1 | 1 | (1,141) | 16 | 01 00 00 00 6e 01 00 00 1 | 2 | (0,1) | 16 | 01 00 00 00 00 00 00 00 2 | 1 | (1,141) | 16 | 01 00 00 00 6e 01 00 00 2 | 2 | (1,142) | 16 | 01 00 00 00 6f 01 00 00 3 | 1 | (1,1) | 8 | 3 | 2 | (2,1) | 16 | 01 00 00 00 6e 01 00 00
  • 33. amcheck ● bt_index_check(index regclass); ● bt_index_parent_check(index regclass);
  • 34. What do we need? ● Index compression ● Compression of leading columns ● Page compression ● Index-Organized-Tables (primary indexes) ● Users don't want to store data twice ● KNN for B-tree ● Nice task for beginners ● Batch update of indexes ● Indexes on partitioned tables ● pg_pathman ● Global indexes ● Global Partitioned Indexes
  • 35. www.postgrespro.ru Thanks for attention! Any questions? a.lubennikova@postgrespro.ru