PostgreSQL Prologue
Stay If You Want To:
Image Source: Himmelfarb et al 2002: 1526 (artist: G. Renee Guzlas). All rights reserved ©. Available via license: CC BY-NC 3.0
- have an intro to postgres
- know the basic components of postgres
- get some idea of how postgres works
- go through the logical and physical layout of postgres
What is PostgreSQL in the first place?
- “PostgreSQL is an object-relational database management system (ORDBMS) based on
POSTGRES, Version 4.2, developed at the University of California at Berkeley Computer
Science Department.” -- PostgreSQL Documentation, By postgresql.org.
- “PostgreSQL (pronounced Post-Gres-Q-L), or postgres for short, is an open source
object-relational-database management system.” -- Learning PostgreSQL 11 (Third
Edition), A beginner's guide to building high-performance PostgreSQL database solutions,
By Salahaldin Juba, Andrey Volkov.
If You Are Wondering...
- we have heard about relational databases; how does this differ from them?
- the definitions call postgres an “Object Relational Database Management System”; what does this imply?
- or, we know Object Oriented Principles; does PostgreSQL adopt OOP paradigms the way an Object Oriented Language does?
Detour - Database
- in simplest words:
- organized collection of valid data where new records can be added or an existing
record can be accessed, modified or removed
Detour - DataBase Management System
- can be seen as gatekeeper of database, basically an interface that:
- offers and controls access to database to read, update or remove data from database
- ensures integrity by imposing given constraints
- ensures concurrency and transactions
- enables remote access to database
- ensures data recovery in case of any kind of failure
Detour - Relational DBMS
- a group of related data can be stored in tabular form, where:
- each property is a column of the table; attribute is the more common term
- every single instance having those properties is a row or tuple
- the definition of a table's attributes is also known as its schema
- two tables can be related through a common attribute, eg: a foreign key
- use when:
- you know your data model well and the data is structured
- the data pattern is fixed
- all of your entities have fixed attributes that are not going to change
- you need immediately consistent, ACID-compliant transactions
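To make the column, row and foreign key vocabulary above concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in relational engine (an assumption for portability, not PostgreSQL itself), and the author/book tables are hypothetical.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in relational engine.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Each attribute is a column; the table definition is the schema of the relation.
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE book ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " author_id INTEGER NOT NULL REFERENCES author(id))"
)

# Each instance is a row (tuple).
conn.execute("INSERT INTO author (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO book (id, title, author_id) VALUES (10, 'Notes', 1)")

# The common attribute (book.author_id -> author.id) relates the two tables.
rows = conn.execute(
    "SELECT book.title, author.name"
    " FROM book JOIN author ON author.id = book.author_id"
).fetchall()

# A row pointing at a missing author violates the constraint and is rejected.
try:
    conn.execute("INSERT INTO book (id, title, author_id) VALUES (11, 'Orphan', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The same CREATE TABLE and JOIN statements would be expected to work in PostgreSQL with minor type changes; only the PRAGMA line is SQLite-specific.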
Detour - Object Relational DBMS
- OOP paradigms such as objects, classes and inheritance are supported in schemas, relations, and even in queries
- supports custom data types and nested data types, as in OOP
- even functions and operators can be overloaded to facilitate polymorphism
Meet “SLONIK”
Source: Daniel Lundin - https://wiki.postgresql.org/images/a/a4/PostgreSQL_logo.3colors.svg
PostgreSQL - Evolution
- evolved from the Ingres project at the University of California, Berkeley, led by Michael Stonebraker
- that’s why it is sometimes called Post-Ingres
Why Use PostgreSQL?
- supports both relational and non-relational (eg: JSON) data types
- good data read/write performance
- multi-version concurrency control (MVCC)
- parallel query execution using multiple cores
- non-blocking index creation (CREATE INDEX CONCURRENTLY)
- partial indexes available (indexing only the rows that match a predicate)
Commercial Break
Who Trusts PostgreSQL?
src: StackShare
PostgreSQL Components
Postgres Built-in Applications
- ships with a number of client and server applications
- uses server/client model
- client and server can reside on different hosts and communicate via TCP/IP or, on the same host, via a Unix-domain socket
- can handle multiple concurrent connections from clients
- the server forks a new process for each client connection
Postgres Client Applications
- frontend application that requests some database action
- psql:
- offers an interactive terminal to write queries and get responses from postgres
- queries can also be read from a file or passed as command-line arguments
- pg_config
- reports the configuration parameters of the installed version
- pgbench
- runs a benchmark by executing a number of dummy transactions from a number of dummy clients
Postgres Client Applications(continued...)
- clusterdb
- re-clusters previously clustered tables in the specified databases
- createdb
- creates a new database
- nothing but a wrapper of the CREATE DATABASE command
- dropdb
- removes the specified database
- nothing but a wrapper of DROP DATABASE command
Postgres Client Applications(continued...)
- createuser
- creates a new user
- just a wrapper of CREATE ROLE command
- dropuser
- removes an existing user
- just a wrapper of DROP ROLE command
- vacuumdb (garbage collector and, optionally, analyzer)
- cleans dead tuples from all (or the specified) tables of a database that the user has permission to vacuum, and optionally generates statistics about the database
- a wrapper of the VACUUM command
- full list here
Postgres Server Applications
- backend application
- postgres
- accepts connection from client applications
- resolves client requests
- manages database files
- initdb
- creates a new pg cluster
Postgres Server Applications(continued...)
- pg_ctl
- initializes, starts, stops, or controls a postgres server
- pg_upgrade
- upgrades a postgres server instance to a new major version
- pg_waldump
- renders the write-ahead log (WAL) in human readable form
- full list here
PostgreSQL Internals
PostgreSQL Forked Process
PostgreSQL Forked Process(continued...)
- follows process per user method
- one client process gets connected to exactly one server process
- the master(postmaster) process spawns a new backend server process each time a new connection
is requested
PostgreSQL Forked Process(continued...)
- master process forks other background processes at start-up
- walwriter
- writes the Write Ahead Log (WAL) from the wal buffer to disk
- any change to data files (table or index) is logged first into the wal buffer
- ensures data integrity
- in case of a system crash, roll-forward (or REDO) is done using the log records
- checkpointer
- periodically records a checkpoint in the wal sequence
- at each checkpoint, flushes all dirty data pages to the data files on disk, so recovery only has to replay the log written after the last checkpoint
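The interplay of WAL, checkpoints and REDO described above can be sketched with a toy key-value store. This is an illustrative model only: the class and method names are hypothetical, and real PostgreSQL logs page-level changes rather than whole key/value pairs.

```python
class TinyWalStore:
    """Toy key-value store that logs every change before applying it."""

    def __init__(self):
        self.wal = []            # durable log: survives a crash in this model
        self.disk = {}           # durable data files
        self.buffers = {}        # in-memory pages: lost on a crash
        self.checkpoint_lsn = 0  # WAL position covered by the last checkpoint

    def write(self, key, value):
        self.wal.append((key, value))  # 1. log the change first
        self.buffers[key] = value      # 2. then modify the in-memory page

    def checkpoint(self):
        self.disk.update(self.buffers)       # flush dirty pages to the data files
        self.checkpoint_lsn = len(self.wal)  # remember how far the log is reflected

    def crash(self):
        self.buffers = {}  # everything not yet flushed is gone

    def recover(self):
        # REDO: replay only the records written after the last checkpoint.
        for key, value in self.wal[self.checkpoint_lsn:]:
            self.disk[key] = value
        self.buffers = dict(self.disk)


store = TinyWalStore()
store.write("a", 1)
store.checkpoint()
store.write("b", 2)  # logged, but never flushed to the data files
store.crash()
store.recover()
```

After recovery the store holds both keys, even though "b" never reached the data files before the crash; only the log record written after the checkpoint had to be replayed.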
PostgreSQL Forked Process(continued...)
- background writer
- writes specific dirty (new or modified) buffers to disk ahead of checkpoints
- causes a net overall increase in I/O load: a repeatedly dirtied page might otherwise be written only once per checkpoint interval, whereas the background writer may write it several times
- autovacuum launcher (and its workers)
- postgres uses a pseudo-deletion method
- when deleted or updated, a tuple is not removed from the physical storage of that table
- these obsolete tuples are only marked as deleted
PostgreSQL Forked Process(continued...)
- vacuuming reclaims the space consumed by dead tuples
- also updates the visibility map (_vm)
- if run with ANALYZE, updates the pg_statistic catalog, which the query planner uses to choose the most effective execution plan
- stats collector
- collects and reports server activity
- counts access to table and index, number of rows of a table, vacuum and analyze stats etc
PostgreSQL Forked Process(continued...)
- logical replication launcher
- doesn’t replicate byte by byte like physical (streaming) replication
- replicates one database at a time, and only committed row changes, not vacuum activity
- works in a publisher-subscriber model
- unlike streaming replication, multi-master is possible
- DDL is not replicated, so tables must be created manually at the subscriber end
- column names must match; column order and the number of columns need not
- can’t stream transactions as they happen, so a big transaction can add overhead
- server processes communicate with each other via semaphores and shared memory to ensure data integrity
PostgreSQL Memory Model
Memory Layout
Memory Layout(continued...)
- shared memory
- accessible to all server and backend processes of the postgres instance
- shared buffer, WAL buffer, CLog buffer etc
- local memory
- allocated and used by a specific process or subsystem
- vacuum buffer, temp buffer, work memory etc.
Memory Layout(continued...)
Shared Buffer
- where data pages are read into and modified
- pages modified here but not yet written to disk are called dirty pages (or dirty blocks); they become part of the data files once written to disk
- shared memory
- can’t be resized without restarting the running postgres server instance
- config parameter:
- shared_buffers: 128MB by default
Memory Layout(continued...)
WAL Buffer
- separate buffer to keep transaction logs
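- wal data is first written to the wal buffer before being written to the wal files on disk
- shared memory
- by default sized at 1/32 of shared_buffers, but not less than 64kB nor more than one WAL segment (typically 16MB)
- config parameter:
- wal_buffers: -1 (automatic) by default, which works out to 4MB with the default shared_buffers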
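As a worked example of the sizing rule in the PostgreSQL documentation (with wal_buffers = -1 the server picks 1/32 of shared_buffers, clamped between 64kB and one WAL segment), here is a small sketch. The function name is hypothetical, and the 16MB segment size is the usual default rather than a fixed value.

```python
KB = 1024
MB = 1024 * KB


def auto_wal_buffers(shared_buffers_bytes, wal_segment_bytes=16 * MB):
    """Size chosen when wal_buffers = -1: 1/32 of shared_buffers, clamped."""
    size = shared_buffers_bytes // 32
    return max(64 * KB, min(size, wal_segment_bytes))


default_size = auto_wal_buffers(128 * MB)  # default shared_buffers
tiny_size = auto_wal_buffers(1 * MB)       # clamped up to the 64kB floor
large_size = auto_wal_buffers(1024 * MB)   # clamped down to one WAL segment
```

With the default shared_buffers of 128MB this gives 4MB, which matches the figure usually quoted as the wal_buffers default.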
Memory Layout(continued...)
CLog Buffer
- contains transaction metadata
- keeps status of transactions
- can tell if a transaction is committed or not
- shared memory
Lock Space
- all locks are stored here
- shared memory
Memory Layout(continued...)
Vacuum Buffer
- local memory: used by autovacuum workers
- total size is at most autovacuum_work_mem times autovacuum_max_workers
- config parameter:
- autovacuum_max_workers: 3 by default
- autovacuum_work_mem: -1 by default, meaning maintenance_work_mem (64MB by default) is used instead; the smallest explicit value is 1MB
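The upper bound on autovacuum memory follows directly from the parameters above. A minimal sketch of the arithmetic, with a hypothetical helper name and the default values quoted on this slide:

```python
MB = 1024 * 1024


def max_autovacuum_memory(autovacuum_work_mem=-1,
                          maintenance_work_mem=64 * MB,
                          autovacuum_max_workers=3):
    """Upper bound on memory used by all autovacuum workers together."""
    # -1 means "fall back to maintenance_work_mem".
    if autovacuum_work_mem == -1:
        per_worker = maintenance_work_mem
    else:
        per_worker = autovacuum_work_mem
    return per_worker * autovacuum_max_workers


default_total = max_autovacuum_memory()
tuned_total = max_autovacuum_memory(autovacuum_work_mem=16 * MB)
```

With all defaults, three workers may each use up to 64MB, so the bound is 192MB; capping autovacuum_work_mem at 16MB lowers it to 48MB.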
Memory Layout(continued...)
Work Memory
- local memory: used by the executor or query workspaces
- memory used when a sort (query example: ORDER BY, DISTINCT, MERGE JOIN) or hash (query example: HASH-JOIN, IN) operation is executed
- config parameter:
- work_mem: 4MB by default
Memory Layout(continued...)
Maintenance Memory
- local memory
- memory allocated for maintenance operations like: CREATE INDEX, VACUUM, REINDEX, or
while adding FOREIGN KEY
- config parameter:
- maintenance_work_mem: 64MB by default
Memory Layout(continued...)
Temp Buffer
- local memory: used by the executor
- space where temporary tables are stored
- config parameter:
- temp_buffers: 8MB for each session by default
Life Of A Query In Postgres
Backend Flowchart
Query Execution Phases
- client gets a connection to transmit a query to the server and to receive the results
- parser stage checks the query for correct syntax and creates a query tree
- traffic cop subsystem determines the query type
- utility query is passed to the utilities subsystem
- rewriter takes the query tree and looks for any rules to apply to it
- planner/optimizer takes the (rewritten) query tree and creates a query plan
- first creates all possible paths leading to the same result
- next the cost for the execution of each path is estimated
- finally the cheapest path is chosen
- executor recursively steps through the plan tree and retrieves rows in the way represented
by the plan
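The planner steps above (enumerate paths, estimate each cost, pick the cheapest) can be sketched with a toy cost model. The three constants mirror PostgreSQL's default seq_page_cost, random_page_cost and cpu_tuple_cost settings; the index scan formula is a deliberate simplification assumed for illustration, not the planner's real estimate.

```python
SEQ_PAGE_COST = 1.0     # default seq_page_cost
RANDOM_PAGE_COST = 4.0  # default random_page_cost
CPU_TUPLE_COST = 0.01   # default cpu_tuple_cost


def seq_scan_cost(pages, rows):
    # Read every page sequentially and process every row.
    return pages * SEQ_PAGE_COST + rows * CPU_TUPLE_COST


def index_scan_cost(matching_rows):
    # Simplifying assumption: one random page fetch per matching row.
    return matching_rows * (RANDOM_PAGE_COST + CPU_TUPLE_COST)


def cheapest_path(pages, rows, matching_rows):
    """Estimate every candidate path and return the cheapest (name, cost)."""
    paths = {
        "seq scan": seq_scan_cost(pages, rows),
        "index scan": index_scan_cost(matching_rows),
    }
    name = min(paths, key=paths.get)
    return name, paths[name]


selective = cheapest_path(pages=1000, rows=100_000, matching_rows=10)
broad = cheapest_path(pages=1000, rows=100_000, matching_rows=50_000)
```

In this model a query matching 10 rows favours the index scan, while one matching half the table favours the sequential scan, which is the kind of trade-off the real planner weighs.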
Where Does All That Data Go?
Logical Layout
Logical Layout(continued...)
database cluster
- collection of databases within the running postgres instance
- mainly resides in data area(eg: $PGDATA - /usr/local/pgsql/data)
- multiple clusters managed by different postgres instances can exist on the same machine
- don’t mix it up with physical database server or node cluster
Logical Layout(continued...)
database object:
- a data structure used to store and refer to data
- tablespaces, tables (heap), functions, views, indexes, etc and even the database itself
- identified by an object identifier (OID), an unsigned 4-byte integer
- the respective oids are stored in the system catalog (pg_catalog) schema
- for instance: when a new database or a new table is created, all its metadata is stored in the pg_catalog.pg_database table or the pg_catalog.pg_class table respectively, and so on
Physical Layout
Physical Layout(continued...)
Files:
- pg_hba.conf
- pg_ident.conf
- PG_VERSION
- postgresql.auto.conf
- postgresql.conf
- postmaster.pid
- postmaster.opts
Directories:
- base
- global
- pg_commit_ts
- pg_dynshmem
- pg_logical
- pg_stat
- pg_tblspc
- pg_wal
Physical Layout(continued...)
pg_hba.conf
- stands for host based authentication
- created when initdb is called
- lives in the data area by default, but can be placed elsewhere
- configuration file to control client authentication
Physical Layout(continued...)
pg_ident.conf:
- configuration file to control postgres user name mapping
- used along with the pg_hba.conf file
- maps the system user name (obtained from an external authentication system like ident or GSSAPI) of a client connecting to the postgres server to a postgres user name
- lives in the data area by default, but can be placed elsewhere
PG_VERSION
- contains the major version number of PostgreSQL
Physical Layout(continued...)
postgresql.auto.conf
- system configuration changes made with the `ALTER SYSTEM SET <confParameter>=<confValue>;` sql command are written here and override postgresql.conf
- an entry is removed again when the parameter is reset (ALTER SYSTEM RESET)
postgresql.conf
- server configuration file
- can stay elsewhere as well
postmaster.opts:
- file containing command-line options used at server start time
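Settings in postgresql.auto.conf are read after postgresql.conf and therefore win when both files set the same parameter. A minimal sketch of that precedence, assuming a simplified "name = value" syntax (the real file format is more permissive) and hypothetical file contents:

```python
def parse_conf(text):
    """Parse a simplified 'name = value' config, ignoring '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, _, value = line.partition("=")
        settings[name.strip()] = value.strip().strip("'")
    return settings


# Hypothetical file contents, for illustration only.
postgresql_conf = "shared_buffers = 128MB   # default\nwork_mem = 4MB\n"
postgresql_auto_conf = "# Do not edit this file manually!\nwork_mem = '16MB'\n"

# Later sources override earlier ones, so the auto file is applied last.
effective = {**parse_conf(postgresql_conf), **parse_conf(postgresql_auto_conf)}
```

Here work_mem ends up as 16MB because the ALTER SYSTEM value overrides the hand-edited one, while shared_buffers keeps its postgresql.conf value.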
Physical Layout(continued...)
postmaster.pid:
- keeps track of the following, each on a separate line
- currently running postgres server instance pid
- path of data area
- server start timestamp in epoch time
- server port number
- unix socket path
- ip or hostname of the first listen_addresses entry
- shared memory segment id
- server status
- file is absent if no server instance is running
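Since each item sits on its own line, the file is easy to read programmatically. A minimal sketch, assuming the line order listed above; every value in SAMPLE is hypothetical, and the field names are chosen here for illustration.

```python
# Hypothetical file contents; real values differ per installation.
SAMPLE = """\
12345
/usr/local/pgsql/data
1589000000
5432
/tmp
localhost
  5432001    163840
ready
"""

FIELDS = [
    "pid", "data_directory", "start_time", "port",
    "socket_directory", "listen_address", "shared_memory", "status",
]


def parse_postmaster_pid(text):
    """Map each line of postmaster.pid to a named field."""
    lines = [line.strip() for line in text.splitlines()]
    info = dict(zip(FIELDS, lines))
    for key in ("pid", "start_time", "port"):
        info[key] = int(info[key])
    return info


info = parse_postmaster_pid(SAMPLE)
```

A monitoring script could, for example, compare info["pid"] with the running process list to detect a stale file left behind by a crash.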
Physical Layout - DB Cluster
Physical Layout - DB Cluster(continued...)
- the base directory contains the databases created in the pg_default tablespace, as subdirectories named after the corresponding database oid
- tables or databases created in a different tablespace are stored under that tablespace's location instead, like the one here (test_db_2 → 16412 is created with its default tablespace set to test_table_spcace → 16410)
Physical Layout - Table Files
- when a table is created, a file named after the table's filenode is created
- max table size is 32TB (with the default 8KB page size)
- divided into 1GB segments by default
- each segment file after the first is named <filenode>.1, <filenode>.2 and so on
- usually the filenode is the same as the oid, unless TRUNCATE, REINDEX, CLUSTER, VACUUM FULL or some forms of ALTER TABLE have been applied to that table
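The naming scheme for segment files can be sketched as below. The helper and the oid/filenode values are hypothetical, the path assumes a table in the pg_default tablespace, and the size limit is approximated as 2^32 pages of 8KB.

```python
GB = 1024 ** 3
SEGMENT_SIZE = 1 * GB  # default segment size
PAGE_SIZE = 8 * 1024   # default page size


def segment_files(db_oid, filenode, table_bytes):
    """Relative paths (under $PGDATA) of a table's segment files."""
    # Ceiling division; even an empty table has its first file.
    count = max(1, -(-table_bytes // SEGMENT_SIZE))
    base = f"base/{db_oid}/{filenode}"
    return [base if i == 0 else f"{base}.{i}" for i in range(count)]


# Hypothetical oid and filenode for a 2.5GB table.
files = segment_files(db_oid=16384, filenode=16385, table_bytes=int(2.5 * GB))

# Pages are addressed with 32-bit block numbers, hence the size limit.
max_table_bytes = 2 ** 32 * PAGE_SIZE
```

The 2.5GB table occupies three files (the bare filenode, then .1 and .2), and the page arithmetic reproduces the 32TB limit.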
Physical Layout - Table Files(continued…)
Figure: table page layout(src)
Physical Layout - Table Files(continued…)
- each table segment contains a number of pages (8KB each)
- each page starts with a page header (24 bytes) followed by item pointers (4 bytes each), and ends with the actual tuples (or items) and the special space; the space between the item pointers and the tuples is called free space
- a tuple cannot span multiple pages; when a row is too large, its large field values are compressed and/or moved out of line into a separate TOAST (The Oversized-Attribute Storage Technique) table associated with the table, named pg_toast_<table oid>
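The page layout above implies a simple free-space calculation: item pointers grow from the front of the page and tuples from the back. A minimal sketch with a hypothetical helper that ignores alignment padding (real tuples are padded, so actual free space is somewhat smaller):

```python
PAGE_SIZE = 8192   # default page size in bytes
PAGE_HEADER = 24   # page header size in bytes
ITEM_POINTER = 4   # size of one item pointer in bytes


def free_space(tuple_sizes, special_space=0):
    """Bytes left between the item pointer array and the tuple data."""
    lower = PAGE_HEADER + ITEM_POINTER * len(tuple_sizes)  # end of item pointers
    upper = PAGE_SIZE - special_space - sum(tuple_sizes)   # start of tuple data
    if upper < lower:
        raise ValueError("tuples do not fit in one page")
    return upper - lower


empty = free_space([])
three = free_space([100, 100, 100])
```

An empty heap page leaves 8168 bytes free; three 100-byte tuples consume 300 bytes of data plus 12 bytes of item pointers.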
Physical Layout - Table Files(continued...)
- a table may have an _fsm (Free Space Map) file and a _vm (Visibility Map) file
- when updating a tuple, postgres doesn’t overwrite it; it creates a new version instead and marks the old one as deleted
- when deleting a tuple, postgres uses a policy of pseudo-deletion: it just marks the existing tuple as deleted
- later the vacuum worker finds those unused spaces, recognises them as free space, and creates (if there is none) or updates the _fsm and _vm files
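The update, delete and vacuum behaviour described above can be modelled with a toy heap. This is an illustrative sketch only: the class is hypothetical, a plain list stands in for the free space map, and real PostgreSQL decides visibility from transaction ids rather than a boolean flag.

```python
class ToyHeap:
    """Toy table: each slot holds [value, dead_flag] or None when reclaimed."""

    def __init__(self):
        self.slots = []       # physical storage, including dead versions
        self.free_slots = []  # stand-in for the free space map

    def insert(self, value):
        if self.free_slots:
            idx = self.free_slots.pop()  # reuse space reclaimed by vacuum
            self.slots[idx] = [value, False]
        else:
            self.slots.append([value, False])
            idx = len(self.slots) - 1
        return idx

    def delete(self, idx):
        self.slots[idx][1] = True  # pseudo-deletion: mark only

    def update(self, idx, value):
        self.delete(idx)           # old version stays on the page
        return self.insert(value)  # new version is written elsewhere

    def vacuum(self):
        reclaimed = 0
        for idx, slot in enumerate(self.slots):
            if slot is not None and slot[1]:
                self.slots[idx] = None
                self.free_slots.append(idx)
                reclaimed += 1
        return reclaimed

    def live_values(self):
        return [s[0] for s in self.slots if s is not None and not s[1]]


heap = ToyHeap()
first = heap.insert("v1")
second = heap.update(first, "v2")  # leaves a dead "v1" behind
size_before_vacuum = len(heap.slots)
reclaimed = heap.vacuum()
third = heap.insert("v3")          # lands in the reclaimed slot
```

The table keeps two physical versions after the update; only vacuum makes the dead slot reusable, which is why the third insert lands in slot 0 instead of growing the table.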
Physical Layout - Table Files(continued...)
- the _fsm file keeps track of the free space in each page that can be reused by new tuples
- the _vm file keeps track of page visibility by storing 2 bits per page
- the first bit is set only when all tuples on the page are visible to all transactions (so the page holds nothing for vacuum to clean up), which lets later vacuums and index-only scans skip the page; the second bit is set when all tuples on the page are frozen
- _vm bits can only be set by vacuum, although any data-modifying operation on the page clears them
- index files don’t have any _vm file
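A toy model of the visibility map, following the PostgreSQL documentation's description of two bits per heap page (all-visible and all-frozen). The class and method names are hypothetical, and the map is kept in memory instead of in a _vm file.

```python
ALL_VISIBLE = 0b01  # first bit: every tuple on the page is visible to all
ALL_FROZEN = 0b10   # second bit: every tuple on the page is frozen


class ToyVisibilityMap:
    """Two bits per heap page, kept in a plain list."""

    def __init__(self, page_count):
        self.bits = [0] * page_count

    def set_by_vacuum(self, page, frozen=False):
        # Only vacuum sets bits.
        self.bits[page] = ALL_VISIBLE | (ALL_FROZEN if frozen else 0)

    def clear_on_modify(self, page):
        # Any data-modifying operation on the page clears its bits.
        self.bits[page] = 0

    def pages_to_scan(self):
        # Pages a later vacuum still has to visit.
        return [p for p, b in enumerate(self.bits) if not b & ALL_VISIBLE]


vm = ToyVisibilityMap(page_count=4)
for page in range(4):
    vm.set_by_vacuum(page)
vm.clear_on_modify(2)  # e.g. an UPDATE touched page 2
remaining = vm.pages_to_scan()
```

After the modification only page 2 needs to be visited again; the other three pages can be skipped.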
Physical Layout - Table Files(Example)
Physical Layout - Table Files(Example - head)
Physical Layout - Table Files(Example - tail)
Physical Layout - Tablespace
- a location on the file system, reached through a symlink, where table files are stored
- pg_global tablespace is used for shared system catalogs
- pg_default tablespace is the default tablespace of the template1 and template0 databases
- different tables of the same database can be kept in different tablespaces
- use case:
- if you are running out of disk space, you can create a tablespace on a different disk and move data to that location
- tablespaces for frequently accessed data can be placed on fast disks like SSDs, and less frequently accessed data can be stored on slower disks like SATA hard drives
- temporary tables can be stored in a separate tablespace
Summary Attempt of:
- https://www.postgresql.org/docs/12/index.html
- https://developer.okta.com/blog/2019/07/19/mysql-vs-postgres
- https://en.wikipedia.org/wiki/PostgreSQL
- Learning PostgreSQL 11 (Third Edition), A beginner's guide to building high-performance PostgreSQL database solutions, By Salahaldin Juba, Andrey Volkov
- https://www.izenda.com/relational-vs-non-relational-databases/
- https://medium.com/@zhenwu93/relational-vs-non-relational-databases-8336870da8bc
- https://www.ibm.com/cloud/blog/new-builders/brief-overview-database-landscape
- https://www.linuxjournal.com/content/postgresql-nosql-database
- https://stackoverflow.com/questions/33621906/difference-between-stream-replication-and-logical-replication
- https://www.postgresql.fastware.com/blog/back-to-basics-with-postgresql-memory-components
- https://severalnines.com/database-blog/architecture-and-tuning-memory-postgresql-databases
- http://www.interdb.jp/pg/pgsql02.html
- http://rachbelaid.com/introduction-to-postgres-physical-storage/
- https://www.postgresql.org/docs/current/storage-page-layout.html
- https://www.postgresql.org/docs/current/storage-file-layout.html
- http://www.interdb.jp/pg/pgsql01.html
- https://blog.dbi-services.com/using-operating-system-users-to-connect-to-postgresql/
- http://etutorials.org/SQL/Postgresql/Part+I+General+PostgreSQL+Use/Chapter+4.+Performance/How+PostgreSQL+Organizes+Data/
- https://www.postgresql.org/docs/12/limits.html
- https://pgdash.io/blog/tablespaces-postgres.html
- Chapter-3, Learning PostgreSQL 11, Third Edition, by Salahaldin Juba, Andrey Volkov
- https://www.postgresql.org/docs/12/manage-ag-tablespaces.html
- https://www.postgresql.org/docs/12/query-path.html
- https://www.postgresql.org/developer/backend/
- https://stackshare.io/postgresql
- https://www.postgresql.org/docs/12/history.html

  • 4. If You Are Wondering... - we heard about relational database, how does this differ from that? - the definitions claimed postgres to be an “Object Relational Database Management System”, what does this imply? - or, we know Object Oriented Principles, does PostgreSQL adapt OOP paradigms like an Object Oriented Language does?
  • 5. Detour - Database - in simplest words: - organized collection of valid data where new records can be added or an existing record can be accessed, modified or removed
  • 6. Detour - DataBase Management System - can be seen as gatekeeper of database, basically an interface that: - offers and controls access to database to read, update or remove data from database - ensures integrity by imposing given constraints - ensures concurrency and transactions - enables remote access to database - ensures data recovery in case of any kind of failure
  • 7. Detour - Relational DBMS - a group of related data is stored in tabular form, where: - each property is a column of the table (attribute is the more common term) - every single instance having those properties is a row or tuple - the relation between the properties of that set is also known as the schema - two different tables can be related through a common attribute, e.g. a foreign key - use when: - you know your data model well and the data is structured - the data pattern is fixed - all of your entities have fixed attributes that are not going to change - you need immediate ACID-compliant transactions
  • 8. Detour - Object Relational DBMS - OOP paradigms such as objects, classes and inheritance are supported in schemas, relations and even queries - supports custom data types and nested data types, as in OOP - even functions and operators can be overloaded to facilitate polymorphism
  • 9. Meet “SLONIK” Source: Daniel Lundin - https://wiki.postgresql.org/images/a/a4/PostgreSQL_logo.3colors.svg
  • 10. PostgreSQL - Evolution - evolved from the Ingres project at the University of California, Berkeley, led by Michael Stonebraker - which is why it is sometimes described as Post-Ingres
  • 12. Why Use PostgreSQL? - can support both relational and non-relational data types - fast data reads and writes - multiversion concurrency control (MVCC) - parallel query execution using multiple cores - non-blocking index creation (CREATE INDEX CONCURRENTLY) - partial indexes available (an index built over only the rows that match a WHERE predicate)
  • 16. Postgres Built-in Applications - ships with a number of client and server applications - uses a client/server model - client and server can reside on different hosts and communicate via TCP/IP or, on the same host, a Unix-domain socket - can handle multiple concurrent connections from clients - the server forks a new backend process for each client connection
  • 17. Postgres Client Applications - frontend applications that request some database action - psql: - offers an interactive terminal to write queries and get responses from postgres - queries can also be read from a file or passed as command line arguments - pg_config - prints the configuration parameters of the installed PostgreSQL version - pgbench - runs a benchmark by executing a number of dummy transactions from a number of dummy clients
  • 18. Postgres Client Applications(continued...) - clusterdb - re-clusters previously clustered tables in the specified databases - createdb - creates a new database - nothing but a wrapper of the CREATE DATABASE command - dropdb - removes the specified database - nothing but a wrapper of the DROP DATABASE command
  • 19. Postgres Client Applications(continued...) - createuser - creates a new user - just a wrapper of the CREATE ROLE command - dropuser - removes an existing user - just a wrapper of the DROP ROLE command - vacuumdb (garbage collector and optionally analyzer) - cleans dead tuples from all (or the specified) tables of a database that the user has permission to vacuum, and optionally generates statistics about the database - a wrapper of the VACUUM command - the full list is in the client applications reference of the PostgreSQL documentation
  • 20. Postgres Server Applications - backend applications - postgres - accepts connections from client applications - resolves client requests - manages database files - initdb - creates a new PostgreSQL database cluster
  • 21. Postgres Server Applications(continued...) - pg_ctl - initializes, starts, stops or otherwise controls a postgres server - pg_upgrade - upgrades a postgres server instance - pg_waldump - renders the write-ahead log (WAL) in human-readable form - the full list is in the server applications reference of the PostgreSQL documentation
  • 24. PostgreSQL Forked Process(continued...) - follows process per user method - one client process gets connected to exactly one server process - the master(postmaster) process spawns a new backend server process each time a new connection is requested
  • 26. PostgreSQL Forked Process(continued...) - the master process forks other background processes at start-up - walwriter - writes the Write Ahead Log (WAL) from the WAL buffer to disk - any change to data files (table or index) is logged into the WAL buffer first - ensures data integrity - in case of a system crash, roll-forward (or REDO) is done using the log records - checkpointer - records a checkpoint in the WAL sequence - flushes all dirty data pages to disk so that the data files reflect the log up to that checkpoint
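The log-first rule and the REDO step described on this slide can be sketched with a toy model. This is only an illustration under simplifying assumptions: `ToyDatabase` and all of its attributes are hypothetical names, the "log" is a Python list, and nothing here mirrors PostgreSQL's real WAL format or recovery code.

```python
class ToyDatabase:
    """Toy model of write-ahead logging; not PostgreSQL's actual implementation."""

    def __init__(self):
        self.wal = []            # stands in for the durable write-ahead log
        self.pages = {}          # stands in for buffered pages (lost in a crash)
        self.disk = {}           # stands in for data files written at checkpoints
        self.checkpoint_pos = 0  # log position covered by the last checkpoint

    def write(self, key, value):
        self.wal.append((key, value))  # 1. record the change in the log first
        self.pages[key] = value        # 2. only then change the buffered page

    def checkpoint(self):
        # Flush the buffered pages so the data files reflect the log so far.
        self.disk = dict(self.pages)
        self.checkpoint_pos = len(self.wal)

    def crash(self):
        # Buffered changes that were never flushed are gone.
        self.pages = {}

    def recover(self):
        # Start from the data files, then roll forward (REDO) every log
        # record written after the last checkpoint.
        self.pages = dict(self.disk)
        for key, value in self.wal[self.checkpoint_pos:]:
            self.pages[key] = value


db = ToyDatabase()
db.write("a", 1)
db.checkpoint()
db.write("b", 2)
db.write("a", 3)
db.crash()
db.recover()
print(db.pages)
```

Because the two writes after the checkpoint were logged before the crash, replaying the tail of the log restores them even though they never reached the data files.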
  • 27. PostgreSQL Forked Process(continued...) - background writer - writes specific dirty (new or modified) buffers - may cause a net increase in I/O load, as a repeatedly dirtied page might otherwise be written only once per checkpoint whereas the bg writer may write it several times - autovacuum launcher - postgres uses a pseudo-deletion method - when deleted or updated, a tuple is not removed from the physical storage of that table - these obsolete tuples are marked as deleted
  • 28. PostgreSQL Forked Process(continued...) - autovacuum reclaims the space consumed by dead tuples - also updates the visibility map (_vm) - if run with ANALYZE, updates the pg_statistic catalog, which the query planner uses to choose the most efficient execution plan - stats collector - collects and reports server activity - counts accesses to tables and indexes, number of rows in a table, vacuum and analyze stats etc
  • 29. PostgreSQL Forked Process(continued...) - logical replication launcher - doesn't replicate byte by byte like physical (streaming) replication - replicates one database at a time and only committed row changes, not vacuum ones - works in a publisher-subscriber model - unlike streaming replication, multi-master is possible - DDL is not handled, so tables must be created manually at the subscriber end - column names must match, but not the order or number of columns - in PostgreSQL 12, on which these slides are based, a transaction is sent only after it commits rather than as it happens, so a big transaction can add overhead - server processes communicate with each other via semaphores and shared memory to ensure data integrity
  • 32. Memory Layout(continued...) - shared memory - accessible to all server processes (backends and background processes) of the running instance - shared buffer, WAL buffer, CLog buffer etc - local memory - allocated and used by a specific process or subsystem - vacuum buffer, temp buffer, work memory etc.
  • 33. Memory Layout(continued...) Shared Buffer - where data pages are read and written - pages that have been modified here but not yet written to disk are called dirty pages or dirty blocks; once written out they become part of the data files - shared memory - can't be resized without restarting the running postgres server instance - config parameter: - shared_buffers: 128MB by default
  • 34. Memory Layout(continued...) WAL Buffer - separate buffer to keep transaction logs - WAL data is first written to the WAL buffer before being written to the WAL files on disk - shared memory - sized automatically by default as 1/32nd of the shared buffer, but not less than 64kB nor more than one WAL segment (typically 16MB) - config parameter: - wal_buffers: -1 (automatic) by default, which gives 4MB with the default shared_buffers of 128MB
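As a rough sketch of how the automatic WAL buffer size follows from the shared buffer size, the function below assumes the documented rule for `wal_buffers = -1`: 1/32 of `shared_buffers`, clamped between 64kB and one WAL segment (assumed here to be the typical 16MB). The function name `auto_wal_buffers` is hypothetical, and the real server works in whole WAL blocks, which this sketch ignores.

```python
KB = 1024
MB = 1024 * KB


def auto_wal_buffers(shared_buffers_bytes, wal_segment_bytes=16 * MB):
    """Approximate the automatic wal_buffers size in bytes (illustrative only)."""
    size = shared_buffers_bytes // 32           # 1/32 of shared_buffers
    size = min(size, wal_segment_bytes)         # never more than one WAL segment
    return max(size, 64 * KB)                   # never less than 64kB


for shared in (1 * MB, 128 * MB, 8 * 1024 * MB):
    print(shared // MB, "MB shared_buffers ->", auto_wal_buffers(shared) // KB, "kB wal_buffers")
```

With the default 128MB of shared buffers this yields the 4MB figure quoted on the slide; very small or very large settings hit the lower and upper clamps instead.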
  • 35. Memory Layout(continued...) CLog Buffer - contains transaction metadata - keeps status of transactions - can tell if a transaction is committed or not - shared memory Lock Space - all locks are stored here - shared memory
  • 36. Memory Layout(continued...) Vacuum Buffer - local memory: used by autovacuum workers - total size can reach autovacuum_work_mem times autovacuum_max_workers - config parameter: - autovacuum_max_workers: 3 by default - autovacuum_work_mem: -1 by default, which means maintenance_work_mem (64MB by default) is used instead; minimum 1MB otherwise
  • 37. Memory Layout(continued...) Work Memory - local memory: used by the executor as query workspace - memory used when a sort (query examples: ORDER BY, DISTINCT, merge join) or hash (query examples: hash join, IN subquery) operation is executed - config parameter: - work_mem: 4MB by default
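The upper bound on autovacuum memory mentioned on this slide is simple arithmetic, sketched below. The function name `autovacuum_memory_ceiling_mb` is hypothetical; the only behaviour assumed is that a value of -1 for autovacuum_work_mem falls back to maintenance_work_mem.

```python
def autovacuum_memory_ceiling_mb(autovacuum_work_mem_mb=-1,
                                 maintenance_work_mem_mb=64,
                                 autovacuum_max_workers=3):
    """Worst-case memory (in MB) if every autovacuum worker uses its full allowance."""
    if autovacuum_work_mem_mb == -1:
        # -1 means "use maintenance_work_mem instead"
        per_worker = maintenance_work_mem_mb
    else:
        per_worker = autovacuum_work_mem_mb
    return per_worker * autovacuum_max_workers


print(autovacuum_memory_ceiling_mb())                          # defaults from the slide
print(autovacuum_memory_ceiling_mb(autovacuum_work_mem_mb=16))  # explicit per-worker limit
```

This is a ceiling rather than a reservation: workers only allocate what a given vacuum run needs.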
  • 38. Memory Layout(continued...) Maintenance Memory - local memory - memory allocated for maintenance operations like: CREATE INDEX, VACUUM, REINDEX, or while adding FOREIGN KEY - config parameter: - maintenance_work_mem: 64MB by default
  • 39. Memory Layout(continued...) Temp Buffer - local memory: used by the executor - space where temporary tables are stored - config parameter: - temp_buffers: 8MB for each session by default
  • 40. Life Of A Query In Postgres
  • 42. Query Execution Phases - client gets a connection to transmit a query to the server and to receive the results - parser stage checks the query for correct syntax and creates a query tree - traffic cop subsystem determines the query type - utility query is passed to the utilities subsystem - rewrite takes the query tree and looks for any rules to apply to the query tree - planner/optimizer takes the (rewritten) query tree and creates a query plan - first creates all possible paths leading to the same result - next the cost for the execution of each path is estimated - finally the cheapest path is chosen - executor recursively steps through the plan tree and retrieves rows in the way represented by the plan
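The planner step above (enumerate paths, estimate a cost for each, keep the cheapest) can be sketched as follows. The sequential scan estimate uses the simple formula of pages times `seq_page_cost` plus rows times `cpu_tuple_cost` with the default values 1.0 and 0.01; the `Path` class, the table sizes, and the cost given to the index scan are hypothetical numbers chosen for illustration, not output from a real planner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    name: str
    total_cost: float


def seq_scan_cost(pages, rows, seq_page_cost=1.0, cpu_tuple_cost=0.01):
    """Estimated cost of reading every page and processing every row."""
    return pages * seq_page_cost + rows * cpu_tuple_cost


def cheapest_path(paths):
    """Pick the candidate with the lowest estimated total cost."""
    return min(paths, key=lambda p: p.total_cost)


# Hypothetical table: 358 pages holding 10000 rows.
candidates = [
    Path("seq scan", seq_scan_cost(pages=358, rows=10000)),
    Path("index scan", 8.3),  # assumed estimate for a highly selective lookup
]
best = cheapest_path(candidates)
print(best.name, best.total_cost)
```

The executor would then walk the plan built from the winning path; a less selective query would raise the index scan estimate and could tip the choice back to the sequential scan.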
  • 43. Where Does All That Data Go?
  • 45. Logical Layout(continued...) database cluster - collection of databases within the running postgres instance - mainly resides in data area(eg: $PGDATA - /usr/local/pgsql/data) - multiple clusters managed by different postgres instances can exist on the same machine - don’t mix it up with physical database server or node cluster
  • 46. Logical Layout(continued...) database object: - a data structure used to store and reference data - tablespaces, tables (heap), functions, views, indexes, etc and even the database itself - identified by an object identifier or OID, an unsigned 4-byte integer - the respective oids are stored in the system catalog (pg_catalog) schema - for instance: when a new database or a new table is created, its metadata is stored in the pg_catalog.pg_database table or the pg_catalog.pg_class table respectively, and so on
  • 48. Physical Layout(continued...) - Files: pg_hba.conf, pg_ident.conf, PG_VERSION, postgresql.auto.conf, postgresql.conf, postmaster.pid, postmaster.opts - Directories: base, global, pg_commit_ts, pg_dynshmem, pg_logical, pg_stat, pg_tblspc, pg_wal
  • 49. Physical Layout(continued...) pg_hba.conf - stands for host based authentication - created when initdb is called - can stay elsewhere as well, default location data area - configuration file to control client authentication
  • 50. Physical Layout(continued...) pg_ident.conf: - configuration file to control postgres user name mapping - used along with the pg_hba.conf file - maps the system user name (obtained from some external authentication system like ident or GSSAPI) of the client trying to connect to the postgres server to a postgres user - can stay elsewhere as well, default location is the data area PG_VERSION - contains the major version number of PostgreSQL
  • 51. Physical Layout(continued...) postgresql.auto.conf - system configuration changed using the `ALTER SYSTEM SET <confParameter>=<confValue>;` sql command is written here - an entry is removed when the parameter is reset postgresql.conf - server configuration file - can stay elsewhere as well postmaster.opts: - file containing the command-line options used at server start time
  • 52. Physical Layout(continued...) postmaster.pid: - lock file that keeps track of the following, each on a separate line - pid of the currently running postgres server instance - path of the data area - server start timestamp in epoch time - server port number - unix socket directory path - first valid listen_addresses entry (ip or hostname) - shared memory segment id - server status - the file is absent if no server instance is running
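Since postmaster.pid stores one value per line in the order listed on this slide, it can be read with a few lines of code. The sketch below parses a made-up sample; the field names in `FIELDS`, the function `parse_postmaster_pid`, and every value in `SAMPLE` are assumptions for illustration, and a real file should be read from the data area of a running instance.

```python
FIELDS = [
    "pid",
    "data_directory",
    "start_time",
    "port",
    "socket_directory",
    "listen_address",
    "shared_memory",
    "status",
]

# Hypothetical contents; real values depend on the running instance.
SAMPLE = """12345
/usr/local/pgsql/data
1700000000
5432
/tmp
localhost
  5432001    196608
ready
"""


def parse_postmaster_pid(text):
    """Map each line of a postmaster.pid file to a named field."""
    values = [line.strip() for line in text.splitlines()]
    info = dict(zip(FIELDS, values))
    # Convert the purely numeric fields when they are present.
    for key in ("pid", "start_time", "port"):
        if key in info:
            info[key] = int(info[key])
    return info


info = parse_postmaster_pid(SAMPLE)
print(info["pid"], info["port"], info["status"])
```

Using `zip` means a shorter file from an older server version simply yields fewer keys instead of raising an error.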
  • 53. Physical Layout - DB Cluster
  • 54. Physical Layout - DB Cluster(continued...) - the base directory contains the databases created on the pg_default tablespace, as subdirectories named after the corresponding database oid - tables or databases created on a different tablespace, like the one here (test_db_2 → 16412 is created with its default tablespace set to test_table_spcace → 16410), are reached through a symbolic link in pg_tblspc named after the tablespace oid
  • 55. Physical Layout - Table Files - when a table is created, a file having the filenode of the table as its filename is created - max table size is 32TB (with 8KB pages) - divided into 1GB sized segments - each segment file from the second one on is named <filenode>.1, <filenode>.2 and so on - the filenode is usually the same as the oid, unless TRUNCATE, REINDEX, CLUSTER, VACUUM FULL or some forms of ALTER TABLE have been applied to that table
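The segment naming rule above can be turned into a small calculation that tells which file holds a given page of a table. This sketch assumes the default 8KB page size and 1GB segment size; `segment_file` and the filenode 16384 are hypothetical.

```python
BLOCK_SIZE = 8192                                  # default page size in bytes
SEGMENT_SIZE = 1024 ** 3                           # 1GB per segment file
BLOCKS_PER_SEGMENT = SEGMENT_SIZE // BLOCK_SIZE    # pages per segment file


def segment_file(filenode, block_number):
    """Return (file name, byte offset) of a page within a table's segment files."""
    segment = block_number // BLOCKS_PER_SEGMENT
    # The first segment has no suffix; later ones are <filenode>.1, <filenode>.2, ...
    name = str(filenode) if segment == 0 else f"{filenode}.{segment}"
    offset = (block_number % BLOCKS_PER_SEGMENT) * BLOCK_SIZE
    return name, offset


print(BLOCKS_PER_SEGMENT)
print(segment_file(16384, 0))
print(segment_file(16384, BLOCKS_PER_SEGMENT))
print(segment_file(16384, BLOCKS_PER_SEGMENT + 1))
```

The 32TB limit follows from the same numbers: a 32-bit block number addresses 2^32 pages of 8KB each.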
  • 56. Physical Layout - Table Files(continued…) Figure: table page layout(src)
  • 57. Physical Layout - Table Files(continued…) - each table segment contains several pages (8KB each) - each page starts with a page header (24 bytes) followed by item pointers (4 bytes each) and ends with the actual tuples (or items) and the special space; the space between the item pointers and the actual items is called free space - a tuple can't span multiple pages, so field values that are too large (roughly beyond 2kB) are compressed and/or moved out of line into an associated TOAST (The Oversized-Attribute Storage Technique) table named pg_toast_<table oid>
  • 58. Physical Layout - Table Files(continued...) - a table may have an _fsm (Free Space Map) file and a _vm (Visibility Map) file - when updating a tuple, postgres doesn't overwrite it; it creates a new one instead and marks the old one as deleted - when deleting a tuple, postgres uses a policy of pseudo-deletion: it just marks the existing tuple as deleted, and the space is reclaimed later - the vacuum worker finds those unused spaces, recognises them as free space and creates (if there's none) or updates the _fsm and _vm files
  • 59. Physical Layout - Table Files(continued...) - the _fsm file keeps track of the free space in each page that can be reused by some other tuple - the _vm file keeps track of which pages contain only tuples visible to all transactions, storing 2 bits per page - the first bit is set only when every tuple on the corresponding page is visible to all transactions (so the page has nothing to vacuum), making the next scan cheaper; the second bit is set when all tuples on the page are frozen - _vm bits can only be set by vacuum, although any data-modifying operation on the page clears them - indexes don't have a _vm file
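Using the sizes given on this slide, the free space left on a page can be estimated with plain arithmetic. This is a simplified sketch: `free_space` is a hypothetical helper, tuple sizes are taken as already including their tuple headers, and the alignment padding that the real storage layer adds is ignored.

```python
PAGE_SIZE = 8192      # bytes per page
PAGE_HEADER = 24      # bytes of page header
ITEM_POINTER = 4      # bytes per item pointer


def free_space(tuple_sizes, special_space=0):
    """Bytes left between the item pointers and the tuples on one page."""
    used = (PAGE_HEADER
            + ITEM_POINTER * len(tuple_sizes)   # one pointer per tuple
            + sum(tuple_sizes)                  # the tuples themselves
            + special_space)                    # empty for ordinary tables
    if used > PAGE_SIZE:
        raise ValueError("tuples do not fit on a single page")
    return PAGE_SIZE - used


print(free_space([]))            # an empty page
print(free_space([100] * 10))    # ten 100-byte tuples
```

Each new tuple therefore costs its own size plus 4 bytes, since the item pointer array grows from the front of the page while tuples fill in from the back.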
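The "2 bits per page" layout of the visibility map can be illustrated with a toy bitmap, where the first bit of each pair means all tuples on the page are visible to everyone and the second means they are all frozen. `ToyVisibilityMap` and its methods are hypothetical names; this packs four pages per byte as an illustration and does not read or write a real _vm file.

```python
ALL_VISIBLE = 0b01   # first bit of the pair
ALL_FROZEN = 0b10    # second bit of the pair


class ToyVisibilityMap:
    """Two status bits per heap page, four pages per byte (illustrative only)."""

    def __init__(self, n_pages):
        self.bits = bytearray((n_pages * 2 + 7) // 8)

    @staticmethod
    def _locate(page):
        # Byte index and bit offset of the page's two-bit slot.
        return divmod(page * 2, 8)

    def set(self, page, flags):
        # In PostgreSQL only vacuum sets these bits.
        byte, shift = self._locate(page)
        self.bits[byte] |= flags << shift

    def clear(self, page):
        # Any data-modifying operation on the page clears both bits.
        byte, shift = self._locate(page)
        self.bits[byte] &= ~(0b11 << shift) & 0xFF

    def get(self, page):
        byte, shift = self._locate(page)
        return (self.bits[byte] >> shift) & 0b11


vm = ToyVisibilityMap(10)
vm.set(5, ALL_VISIBLE)
print(len(vm.bits), vm.get(5), vm.get(4))
vm.clear(5)
print(vm.get(5))
```

A scan or vacuum can skip any page whose all-visible bit is set, which is why the map stays so small relative to the table it describes.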
  • 60. Physical Layout - Table Files(Example)
  • 61. Physical Layout - Table Files(Example - head)
  • 62. Physical Layout - Table Files(Example - tail)
  • 63. Physical Layout - Tablespace - a symlink to some other storage location where table files will be stored - the pg_global tablespace is used for shared system catalogs - the pg_default tablespace is the default tablespace of the template1 and template0 databases - different tables of the same database can be kept in different tablespaces - use cases: - if you are running out of disk space, you can create a tablespace on a different disk and move data to that location - the tablespace for highly accessed data can be placed on fast disks like SSDs, and less accessed data can be stored on slower disks - temporary tables can be stored in a separate tablespace
  • 64. Summary Attempt of: - https://www.postgresql.org/docs/12/index.html - https://developer.okta.com/blog/2019/07/19/mysql-vs-postgres - https://en.wikipedia.org/wiki/PostgreSQL - Learning PostgreSQL 11 (Third Edition), A beginner's guide to building high-performance PostgreSQL database solutions, By Salahaldin Juba, Andrey Volkov - https://www.izenda.com/relational-vs-non-relational-databases/ - https://medium.com/@zhenwu93/relational-vs-non-relational-databases-8336870da8bc - https://www.ibm.com/cloud/blog/new-builders/brief-overview-database-landscape
  • 65. Summary Attempt of: - https://www.linuxjournal.com/content/postgresql-nosql-database - https://stackoverflow.com/questions/33621906/difference-between-stream-replication-and-logical-replication - https://www.postgresql.fastware.com/blog/back-to-basics-with-postgresql-memory-components - https://severalnines.com/database-blog/architecture-and-tuning-memory-postgresql-databases - http://www.interdb.jp/pg/pgsql02.html
  • 66. Summary Attempt of: - http://rachbelaid.com/introduction-to-postgres-physical-storage/ - https://www.postgresql.org/docs/current/storage-page-layout.html - https://www.postgresql.org/docs/current/storage-file-layout.html - http://www.interdb.jp/pg/pgsql01.html - https://blog.dbi-services.com/using-operating-system-users-to-connect-to-postgresql/ - http://etutorials.org/SQL/Postgresql/Part+I+General+PostgreSQL+Use/Chapter+4.+Performance/How+PostgreSQL+Organizes+Data/ - https://www.postgresql.org/docs/12/limits.html
  • 67. Summary Attempt of: - https://pgdash.io/blog/tablespaces-postgres.html - Chapter-3, Learning PostgreSQL 11, Third Edition, by Salahaldin Juba, Andrey Volkov - https://www.postgresql.org/docs/12/manage-ag-tablespaces.html - https://www.postgresql.org/docs/12/query-path.html - https://www.postgresql.org/developer/backend/ - https://stackshare.io/postgresql - https://www.postgresql.org/docs/12/history.html