As the popularity of PostgreSQL continues to soar, many companies are exploring ways of migrating their application database over. At Redgate Software, we recently added PostgreSQL as an optional data store for SQL Monitor, our flagship monitoring application, after nearly 18 years of being backed exclusively by SQL Server. Knowing that others will be taking this journey in the near future, we'd like to discuss what we learned. In this training, we'll discuss the planning that needs to take place before a migration begins, including datatype changes, PostgreSQL configuration modifications, and query differences. This will be a mix of slides and demo from our own learnings, as well as those of some clients we've helped along the way.
8. • Application layer
• What tooling changes are necessary?
• Do you rely on SDK/framework features that are different in PostgreSQL
versions?
• If you have a multi-tenant application, could the migration be used
to improve tenant separation?
• Does your application rely on query hints?
• Why did you have them and how will you approach PostgreSQL with these
queries?
9. • Are partitions a better option in PostgreSQL?
• How will you manage partitions?
• Do you need to normalize some relationships more?
• Should you de-normalize a relationship that's been difficult
to maintain in its current form?
• Aggregated queries and materialized views – or methods
for doing incremental materialized views?
10.
11.
12. • Every non-serverless Postgres instance should be tuned
• Edit postgresql.conf
https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
https://www.youtube.com/watch?v=IFIXpm73qtk
https://postgresqlco.nf/
Helpful Tuning Resources
13. shared_buffers:
• data cache
• at least 25% of total RAM
work_mem:
• max memory for each plan operation
• JOINS, GROUP BY, ORDER BY can all have workers
maintenance_work_mem:
• memory for background tasks like VACUUM and ANALYZE
max_connections:
• exactly as it says
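As an illustrative sketch only (the values below are placeholders for a hypothetical 16 GB server, not recommendations), these settings can be applied with ALTER SYSTEM instead of editing postgresql.conf by hand:

```sql
-- Illustrative values for a hypothetical server with 16 GB RAM
ALTER SYSTEM SET shared_buffers = '4GB';          -- ~25% of total RAM
ALTER SYSTEM SET work_mem = '32MB';               -- per plan operation, per worker
ALTER SYSTEM SET maintenance_work_mem = '512MB';  -- background tasks like VACUUM
ALTER SYSTEM SET max_connections = 200;

-- shared_buffers and max_connections require a restart;
-- the others take effect after:
SELECT pg_reload_conf();
```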
14. wal_compression:
• Off (default)
• When enabled, uses pglz by default
• PG15 can use lz4 or zstd
autovacuum:
• On (default)
• Do not turn off
• Might have to tune individual tables
autoanalyze:
• On (default)
• Do not turn off
• Might have to tune individual tables
15. • SHOW will display the current value
• SET allows settings to be modified (when possible)
• Examples:
• SHOW ALL – page all settings
• SHOW max_connections – configured connection limit
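A quick sketch of SHOW and SET in practice (work_mem is just an example setting):

```sql
SHOW work_mem;                 -- display the current value
SET work_mem = '64MB';         -- session-level change
SET LOCAL work_mem = '64MB';   -- only for the current transaction
RESET work_mem;                -- back to the configured default
```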
16.
17.
18. For the default Read Committed Isolation Level:
• Each query only sees transactions that were committed before it
started (ID less than current transaction)
• At the beginning of a query, PostgreSQL records:
• The transaction ID (xid) of the current command
• All transaction IDs that are in process
• Multi-statement transactions also record any of their own
completed statements from earlier in transaction
19. • In PostgreSQL, transaction IDs are assigned when
the first command/statement is run within a
transaction, not when BEGIN is executed.
• XIDs are stored in-table and on-row
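Since xmin and xmax are hidden system columns, they can be inspected directly (using the deck's actor table as an example):

```sql
-- xmin/xmax are not returned by SELECT *; name them explicitly
SELECT xmin, xmax, * FROM actor LIMIT 5;

-- The current transaction ID can be inspected with:
SELECT txid_current();   -- pg_current_xact_id() in PG13+
```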
20. xmin xmax User Data
48 0 Hello!
INSERT
• xmin = current transaction ID
• xmax = zero (invalid)
21. xmin xmax User Data
48 78 Hello!
DELETE
• xmin is unchanged
• xmax = current transaction ID
Dead Tuple
22. xmin xmax User Data
48 78 Hello!
78 0 Hello World!
UPDATE
Original tuple:
• xmin is unchanged
• xmax = current transaction ID
New tuple:
• xmin = current transaction ID
• xmax = zero
Dead Tuple
23. xmin xmax User Data
48 78 Hello!
78 0 Hello!
UPDATE (with the same value)
Original tuple:
• xmin is unchanged
• xmax = current transaction ID
New tuple:
• xmin = current transaction ID
• xmax = zero
Dead Tuple
32. xmin xmax ID User data…
Sequential Scan
SELECT a, b, c FROM t;
33. xmin xmax ID User data…
Sequential Scan
SELECT a, b, c FROM t
WHERE ID=10;
Index
34. xmin xmax ID User data…
Index Scan
SELECT a, b, c FROM t
WHERE ID=10;
Index
35. • Serves two main purposes:
• Free up space on pages used by dead tuples
• Mark rows as "frozen" if their XID is sufficiently older
than the most recent XID, allowing the XID to be reused
in the future
• Additionally, if all rows on a page are frozen, the page can
be marked frozen to speed up scans
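A manual VACUUM can perform these same tasks on demand (using the deck's bluebox.rental table as an example):

```sql
VACUUM bluebox.rental;                     -- reclaim space from dead tuples
VACUUM (FREEZE) bluebox.rental;            -- aggressively freeze old rows
VACUUM (VERBOSE, ANALYZE) bluebox.rental;  -- also refresh planner statistics
```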
41. "Scale Factor" = Percentage of rows modified before process runs
"Threshold" = Added number of rows to add to the percent calculation.
(this prevents very small tables from taking resources)
autovacuum_vacuum_scale_factor = 0.2 (20%)
autovacuum_vacuum_threshold = 50
autovacuum_analyze_scale_factor = 0.1 (10%)
autovacuum_analyze_threshold = 50
Global Autovacuum and Autoanalyze Settings
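With the defaults above, the trigger point works out to a simple formula, and dead-tuple counts can be checked in pg_stat_user_tables:

```sql
-- autovacuum runs on a table when:
--   dead tuples > autovacuum_vacuum_threshold
--                 + autovacuum_vacuum_scale_factor * reltuples
-- e.g. for a 1,000,000-row table: 50 + 0.2 * 1,000,000 = 200,050 dead rows

SELECT relname, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
```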
42. • Autovacuum triggers at 20% data modification
• Autoanalyze triggers at 10% data modification
• Analogous to SQL Server trace flag 2371, but per table!
43. Vacuum at 5% modification:
ALTER TABLE t SET (autovacuum_vacuum_scale_factor=0.05)
Vacuum threshold added to the percentage:
ALTER TABLE t SET (autovacuum_vacuum_threshold=500)
44. Setting maintenance_work_mem correctly:
• maintenance_work_mem is the amount of RAM that a
maintenance process like autovacuum can use to process pages of
data. If it is not sufficient, vacuuming (and other essential
maintenance processes) will take more time and fall behind.
• If autovacuum isn't completing its work regularly, check this setting!
Default setting = 64MB
53. • Own database objects
• Tables, Functions, Etc.
• Have cluster-level privileges (attributes)
• Granted privileges to database objects
• Can possibly grant privileges to other roles
54. • Semantically the same as roles
• By Convention:
• User = LOGIN
• Group = NOLOGIN
• PostgreSQL 8.2+ CREATE (USER|GROUP) is an
alias
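A minimal sketch of the user/group convention (role names are hypothetical):

```sql
-- A NOLOGIN role acts as a "group"; a LOGIN role acts as a "user"
CREATE ROLE reporting NOLOGIN;
CREATE ROLE alice LOGIN PASSWORD 'changeme';

GRANT reporting TO alice;   -- alice now inherits reporting's privileges
```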
55. • All roles always have membership in PUBLIC role
(different from public schema)
• This is where CONNECT is usually granted, but can
be provided on a role-by-role basis
56. • Standard PostgreSQL functions are typically
installed into the PUBLIC schema
• <=PG14 all roles had CREATE on this schema by
default
• Schemas are a common security boundary
• Search path assumes PUBLIC schema by default
57. • Current best practice is to never create user objects in
the PUBLIC schema
• This will require setting the "search_path" to avoid
schema-qualified object references
• Easily limit what users are allowed to do with
PostgreSQL supplied objects vs. user/application
objects
58. • Object creator = owner
• Initial object access = Principle of Least Privilege
• Unless specifically granted ahead of time, objects are
owned and "accessible" by the creator/superuser only
• Roles can specify default privileges to GRANT for
each object type that they create
61. ALTER DEFAULT PRIVILEGES
GRANT SELECT ON TABLES TO public;
cituscon=> \ddp
Default access privileges
Owner | Schema | Type | Access privileges
----------+--------+-------+---------------------------
postgres | | table | =r/postgres +
| | | postgres=arwdDxt/postgres
62. • Explicit GRANTs can be added each time
• Default privileges can be set by each role
• Very useful for things like migrations with an
"admin" application role
63.
64.
65. SQL Server         PostgreSQL
BIT                    BOOLEAN
VARCHAR/NVARCHAR       TEXT
DATETIME/DATETIME2     TIMESTAMP (kind of)
DATETIMEOFFSET         TIMESTAMP WITH TIME ZONE
UNIQUEIDENTIFIER       UUID (uuid-ossp extension; gen_random_uuid() built in since PG13)
VARBINARY              BYTEA
(no equivalent)        ARRAYS
(no equivalent)        RANGE
(no equivalent)        ENUM
66. • TEXT is suitable for 99% of string data
• Query memory doesn't work the same way,
therefore TEXT doesn't impact the allocated
memory like VARCHAR(MAX) in SQL Server
• Space efficient like VARCHAR
67. • Use TEXT with a length CHECK constraint rather than
VARCHAR(n)
• Allows easy modification to the column length
without costly table modification
• NULL characters can cause issues
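A sketch of the pattern (column names and the limit of 100 are arbitrary):

```sql
CREATE TABLE bluebox.actor (
    actor_id   bigint NOT NULL,
    first_name text NOT NULL CHECK (length(first_name) <= 100),
    last_name  text NOT NULL CHECK (length(last_name) <= 100)
);

-- Raising the limit later means dropping and re-adding the CHECK
-- constraint: a validation scan, but no table rewrite.
```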
69. CREATE TABLE bluebox.film (
film_id bigint NOT NULL,
title TEXT NOT NULL,
overview TEXT NOT NULL,
release_date date,
original_language TEXT,
rating public.mpaa_rating,
popularity real,
vote_count integer,
vote_average real,
budget bigint,
revenue bigint,
runtime integer,
fulltext tsvector GENERATED ALWAYS AS (to_tsvector('english'::regconfig,
((COALESCE(title, ''::text) || ' '::text) || COALESCE(overview,
''::text)))) STORED
);
70. SELECT e'Hello world!\000';
-- ERROR: invalid byte sequence for encoding "UTF8": 0x00
INSERT INTO actor (first_name, last_name)
VALUES (e'Hello world!\000','Smith');
71. • PostgreSQL has DATE, TIME, TIMESTAMP, and
TIMESTAMPTZ
• Most of the time you likely want TIMESTAMPTZ unless
you have a specific need for just DATE or TIME
• Date math is simple
• Use INTERVALs to modify by arbitrary units of time
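A few examples of the date math described above:

```sql
SELECT now() + interval '3 days';

-- Subtracting two dates yields an integer day count
SELECT date '2024-01-31' - date '2024-01-01';   -- 30

-- INTERVALs can combine arbitrary units
SELECT timestamptz '2024-01-01 00:00+00' + interval '1 month 2 hours';
```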
72. CREATE TABLE bluebox.rental (
rental_id INT NOT NULL,
rental_start datetime2,
rental_end datetime2,
inventory_id INT NOT NULL,
customer_id INT NOT NULL,
staff_id INT NOT NULL,
last_update datetime2 DEFAULT now() NOT NULL
);
73. CREATE TABLE bluebox.rental (
rental_id INT NOT NULL,
rental_start timestamptz,
rental_end timestamptz,
inventory_id INT NOT NULL,
customer_id INT NOT NULL,
staff_id INT NOT NULL,
last_update timestamptz DEFAULT now() NOT NULL
);
74. • Currently supports INT, NUMERIC, DATE, and TIMESTAMP(TZ)
• A possible replacement for typical schema design of
start_date/end_date
• Indexable with GIST indexes
• Unique range type operators for comparisons like overlap and
contains
• Does present some (potentially) annoying query modifications that
need to use LOWER/UPPER to grab different ends of the range
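As a sketch, assuming a rental_period tstzrange column in place of start/end dates:

```sql
-- Overlap (&&) finds rentals touching January 2024
SELECT * FROM bluebox.rental
WHERE rental_period && tstzrange('2024-01-01', '2024-02-01');

-- lower()/upper() grab the two ends of the range
SELECT lower(rental_period) AS rental_start,
       upper(rental_period) AS rental_end
FROM bluebox.rental;
```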
75. CREATE TABLE bluebox.rental (
rental_id INT NOT NULL,
rental_start datetime2,
rental_end datetime2,
inventory_id INT NOT NULL,
customer_id INT NOT NULL,
staff_id INT NOT NULL,
last_update datetime2 DEFAULT now() NOT NULL
);
76. CREATE TABLE bluebox.rental (
rental_id INT NOT NULL,
rental_period tstzrange,
inventory_id INT NOT NULL,
customer_id INT NOT NULL,
staff_id INT NOT NULL,
last_update timestamptz DEFAULT now() NOT NULL
);
77. • Any defined data type can have a corresponding
array type
• Indexable with GIN and GIST indexes
• Not useful as a foreign key replacement
78. CREATE TABLE bluebox.film (
film_id bigint NOT NULL,
title varchar(50) NOT NULL,
overview varchar(2000) NOT NULL,
release_date date,
original_language varchar(16),
rating public.mpaa_rating,
popularity real,
vote_count integer,
vote_average real,
budget bigint,
revenue bigint,
runtime integer
);
CREATE TABLE bluebox.genre (
genre_id int NOT NULL,
genre varchar(64) NOT NULL
);
CREATE TABLE bluebox.film_genre (
film_id int NOT NULL,
genre_id int NOT NULL
);
79. CREATE TABLE bluebox.film (
film_id bigint NOT NULL,
title TEXT NOT NULL,
overview TEXT NOT NULL,
release_date date,
original_language TEXT,
genre_ids int[],
rating public.mpaa_rating,
popularity real,
vote_count integer,
vote_average real,
budget bigint,
revenue bigint,
runtime integer,
fulltext tsvector GENERATED ALWAYS AS (to_tsvector('english'::regconfig,
((COALESCE(title, ''::text) || ' '::text) || COALESCE(overview,
''::text)))) STORED
);
80. Name |Value
-----------------+-----------------------
film_id |1726
title |Iron Man
overview |After being held captiv
release_date |2008-05-02
genre_ids |{28,878,12}
original_language|en
rating |PG-13
popularity |63.926
vote_count |24957
vote_average |7.6
budget |140000000
revenue |585174222
runtime |126
SELECT * FROM film;
81. Name |Value
-----------------+-----------------------
film_id |1726
title |Iron Man
overview |After being held captiv
release_date |2008-05-02
genre_ids |{28,878,12}
original_language|en
rating |PG-13
popularity |63.926
vote_count |24957
vote_average |7.6
budget |140000000
revenue |585174222
runtime |126
SELECT * FROM film
WHERE genre_ids @> '{28}';
82. Name |Value
-----------------+-----------------------
film_id |1726
title |Iron Man
overview |After being held captiv
release_date |2008-05-02
genre_ids |{28,878,12}
original_language|en
rating |PG-13
popularity |63.926
vote_count |24957
vote_average |7.6
budget |140000000
revenue |585174222
runtime |126
SELECT * FROM film
WHERE genre_ids && '{28,100}';
85. • Creation defaults to B-tree
• Support for:
• multi-column
• partial (filtered)
• covering indexes
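Sketches of each variation, using the deck's bluebox.rental table (index names are arbitrary):

```sql
-- Multi-column
CREATE INDEX idx_rental_cust_inv
    ON bluebox.rental (customer_id, inventory_id);

-- Partial (filtered): only index open rentals
CREATE INDEX idx_rental_open
    ON bluebox.rental (customer_id)
    WHERE rental_end IS NULL;

-- Covering: INCLUDE columns can be returned without heap fetches
CREATE INDEX idx_rental_cover
    ON bluebox.rental (customer_id)
    INCLUDE (staff_id);
```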
86. • CONCURRENT create, reindex, and drop
• Multiple types supported
• Custom implementations of index types supported
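The CONCURRENTLY variants avoid blocking writes during index maintenance (REINDEX ... CONCURRENTLY requires PG12+):

```sql
CREATE INDEX CONCURRENTLY idx_film_release ON bluebox.film (release_date);
REINDEX INDEX CONCURRENTLY idx_film_release;
DROP INDEX CONCURRENTLY idx_film_release;
```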
87. • PostgreSQL uses HEAP-based storage
• No support for CLUSTERED/INDEX ORGANIZED
tables
• This does impact query plans because of random
page access
• Indexing strategies may change based on this
89. • Classic tree-based indexing
• Default if no index type is specified
• Used for all traditional predicate operators
• Cannot index composite types (e.g. arrays, JSONB)
• PostgreSQL 13+ provides deduplication
90. • Very fast pre-hash of column values
• Can only be used for equality (=) operator
• Don't use prior to PostgreSQL 10
• Before PG10 they were not WAL-logged, so not crash safe or replicated
91. • Generalized Search Tree Index
• Well suited for overlapping data
• Intermediate index pages are typically bitmaps
Common Uses
• Full-text search capabilities
• Geospatial
92. • Generalized INverted Index
• Useful when multiple values point to one row on disk
• Maintenance heavy
Common Uses
• Arrays
• JSONB
• Range types
93. • Block Range INdex
• Stores high/low for each range of pages on disk
• Block defaults to 128 pages
Common Uses
• Data with natural ordering on disk
• Timestamps, Zip Codes, odometer-like readings
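A sketch of a BRIN index on a timestamp column with natural insert ordering (index names are arbitrary):

```sql
-- Suits append-only data where on-disk order tracks the column value
CREATE INDEX idx_rental_start_brin
    ON bluebox.rental USING brin (rental_start);

-- pages_per_range defaults to 128; smaller ranges give finer granularity
-- at the cost of a larger index
CREATE INDEX idx_rental_start_brin_fine
    ON bluebox.rental USING brin (rental_start)
    WITH (pages_per_range = 32);
```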
94.
95. • Don't Repeat Yourself
• "If you can dream it, there is a SQL function for it in
PostgreSQL"
• Operators, Window, Aggregate, Arrays, and more!
96. • Functions can be overloaded
• Written in any supported language or compiled
variant
• Numerous query planner parameters available to
influence function execution
• Three modes: IMMUTABLE/STABLE/VOLATILE
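A minimal sketch of a volatility declaration (the function itself is hypothetical):

```sql
-- IMMUTABLE: same inputs always produce the same output,
-- so the planner may pre-evaluate or fold calls
CREATE FUNCTION full_title(title text, yr int)
RETURNS text
LANGUAGE sql IMMUTABLE
AS $$ SELECT title || ' (' || yr || ')' $$;

SELECT full_title('Iron Man', 2008);   -- Iron Man (2008)
```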
97. SQL Server:
• Scalar
• Inline Table Valued
• Multi-statement Table Valued
• No tuning hints for query planner
• T-SQL or CLR (less common)
Postgres:
• Scalar
• Table/Set Types
• Triggers
• Query planner tuning parameters
• Any procedural language, but typically SQL or PL/pgSQL
98. ROWS – Hint row count for planner (1,000 default)
COST – Estimated execution cost (per row)
PARALLEL – SAFE/RESTRICTED/UNSAFE
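These parameters attach to the function definition; a hedged sketch against the deck's bluebox.rental table:

```sql
CREATE FUNCTION recent_rentals(cust int)
RETURNS SETOF bluebox.rental
LANGUAGE sql STABLE
ROWS 20          -- hint: expect ~20 rows, not the 1,000 default
COST 50          -- estimated cost (per returned row for set-returning functions)
PARALLEL SAFE
AS $$ SELECT * FROM bluebox.rental WHERE customer_id = cust $$;
```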
99. • Introduced in PostgreSQL 11
• Nobody calls them "SPROC"
• Cannot return a result set
• One INOUT parameter possible
• No plan cache improvement (mostly)
100. • Stored procedures can create sub-transactions
• Commit transactions that survive later failure
• Helpful in data processing/ETL routines
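A sketch of the batch-commit pattern (staging_log and the procedure are hypothetical):

```sql
CREATE TABLE staging_log (batch int);   -- hypothetical target table

CREATE PROCEDURE batch_load()
LANGUAGE plpgsql
AS $$
BEGIN
    FOR i IN 1..10 LOOP
        INSERT INTO staging_log (batch) VALUES (i);
        COMMIT;   -- each batch survives even if a later one fails
    END LOOP;
END;
$$;

CALL batch_load();
```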
101. SQL Server
• Complex logic
• Multi-language
• Contained in calling
transaction scope
• With some work, can return
results
Postgres
• Complex logic
• Multi-language
• Can issue COMMIT/ROLLBACK
and keep processing
105. • TEXT instead of VARCHAR
• UTF-8 by default. No need for two separate kinds of
text fields
• ICU in recent versions for better collation control
106. • Use IDENTITY columns rather than SERIAL
• Semantically the same under the covers for now (a
sequence is still created), but more control using
SQL standard IDENTITY
• By default, IDENTITY can be inserted into
• Must use ALWAYS to prevent this
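A sketch of the ALWAYS variant using the deck's genre table:

```sql
CREATE TABLE bluebox.genre (
    genre_id int GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    genre    text NOT NULL
);

-- GENERATED ALWAYS rejects explicit inserts into genre_id;
-- GENERATED BY DEFAULT would allow them
INSERT INTO bluebox.genre (genre) VALUES ('Action');
```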
107. • All collations are case sensitive
• PostgreSQL converts to lower case unless text qualified
• By convention, prefer snake_case rather than
CamelCase
• Using camel case will require quotes – e.g. "CamelCase"
108. • PostgreSQL timestamp resolution is microseconds
• SQL Server's datetime2 resolves to 100 nanoseconds
• Practically, this matters little for most calculations,
but if anything relies on that extra precision (incoming
data), it will need to be handled.
109. • INSERT… ON CONFLICT
• DO UPDATE - Simple, constraint aware method for doing
"upserts"
• DO NOTHING - Error-free inserts when data is duplicated
• MERGE (as of PG15)
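A sketch of both ON CONFLICT forms, assuming a unique or primary-key constraint on genre_id:

```sql
-- Upsert: insert, or update the existing row on conflict
INSERT INTO bluebox.genre (genre_id, genre)
VALUES (28, 'Action')
ON CONFLICT (genre_id)
DO UPDATE SET genre = EXCLUDED.genre;

-- Or silently skip duplicates
INSERT INTO bluebox.genre (genre_id, genre)
VALUES (28, 'Action')
ON CONFLICT DO NOTHING;
```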
110. • PostgreSQL can create prepared statements with
parameters
• There are no named parameters – only positional ($1, $2, …)
• Limited to 65,535 parameters per statement
111. • PostgreSQL executes SQL as… SQL, not a
procedural variant
• To use procedural features, something like
pl/pgSQL needs to be invoked specifically
• Inline SQL variables must be used in an anonymous
code block, but they can't return a table/set of data
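A minimal sketch of an anonymous DO block with a variable (the body is illustrative):

```sql
DO $$
DECLARE
    total bigint;
BEGIN
    SELECT count(*) INTO total FROM bluebox.film;
    RAISE NOTICE 'films: %', total;   -- DO blocks cannot return a result set
END;
$$;
```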
112. • Move towards functions or stored procedures
• Any supported and installed language can be used,
most without pre-compiling (although compiled will
be faster)
117. • pg_dump – by default, creates a SQL script for
recreating the database & inserting data
o You can dump to a binary format, making for easier transport and
management
o If you dump with pg_dump's binary format, you'll need pg_restore to
recover
• pg_dumpall
o Runs pg_dump for every database, plus cluster-wide objects
(roles, tablespaces)
118. • Write Ahead Log (WAL) Archive
o wal_level must be set to replica or logical
o archive_mode must be set to on or always
o Edit the archive_command to supply a path to your WAL
archive
o Must have a base backup
119. • Base Backup
o Pg_basebackup
o All databases are copied to target location & WAL start
point is set
120. • First, run pg_switch_wal to close the current WAL file and move it to the archive
• Stop the service
• Remove all files & subdirectories in your data directory
• Remove all files from pg_wal
• Restore databases from Base Backup
• Change the postgresql.conf file to start recovery, add recovery_target_time
• Create recovery.signal in the cluster data directory
• Start the service
121.
122. • Before/After scripts
• Logfile support
• Data casting
• Error support
• Continuous migrations
• ...and more
• Initially created by Dimitri Fontaine
• CLI application
• Many migration formats
• CSV, DBF, IXF
• SQLite, MySQL, SQL Server
• PostgreSQL to PostgreSQL
• Limited Redshift support
123. • Perl-based migration tool
• Open-source with long-standing usage and support
• Export all major objects and data to SQL
• Migration assessment tooling
https://ora2pg.darold.net/
124. • Wide-ranging conversion tool for AWS hosted
databases
• Converts schema into target database
• Data is usually handled separately with extraction
agents
https://docs.aws.amazon.com/SchemaConversionTool/latest/userguide/CHAP_Welcome.html
125. • Transparent SQL Server -> PostgreSQL migration
• Open-source
• Available in AWS Aurora Postgres
• Stackgres.io
• Not fully ready for complex applications
https://babelfishpg.org/
127. • Managed service for end-to-end migrations
• Primarily intended to migrate into AWS Aurora DB
• Ongoing change replication
• Pricing is complex like many hosted services
https://aws.amazon.com/dms/
130. DevOps Advocate
Microsoft Data Platform MVP
AWS Community Builder
grant@scarydba.com
scarydba.com
@gfritchey
linkedin.com/in/grant-fritchey
Editor's Notes
Presenter: Udara
Steve will cover the intro to database DevOps
Anderson will do the workshop component
Talk about breaks and lunch; after the workshop, if you are available to hang around, we have some drinks available at the George Bar.
The goal of this training is to provide lots of tips and learnings from many years working between different database migrations.
The hope is that you can take away even one bit of information that makes a big difference in your understanding. I attended a similar training years ago for SQL Server, and while there was lots of good information provided, one minor slide at the end of the day talked about a trace flag that solved one of the major problems we were having. While I learned a lot, that one thing made it all worth it.
Hopefully, one thing we share this afternoon will be that kind of help.