NANCY CLI:
a unified way to manage Database Experiments in clouds
Postgres.AI
Nikolay Samokhvalov
twitter: @postgresmen
email: ru@postgresql.org
About me
Postgres experience: 12+ years (database systems: 17+)
Founder and CTO of 3 startups (total 30M+ users), all based on Postgres
Founder of #RuPostgres (1700+ members on Meetup.com, 2nd largest globally)
Re-launched consulting practice in the SF Bay Area http://PostgreSQL.support
Founder of Postgres.AI – the Postgres platform to automate what is not yet automated
Twitter: @postgresmen
Email: ru@postgresql.org
Part 0. Pre-story
Pre-story. Finding the largest tables in a database
How many times did
you google things
like this?
Finding the largest tables, a semi-automated way
postgres_dba – The missing set of useful tools for Postgres https://github.com/NikolayS/postgres_dba
Finding the largest tables, a semi-automated way
Report #2: sizes of the tables in the current database
Installation of postgres_dba
Installation is trivial:
Important: psql version 10 is needed.
(install postgresql-client-10 package, see README)
Server version may be older
(Use ssh tunnel to connect to remote servers, see README)
Part 1. Why do we need
automated DB experiments?
- Let’s do default_statistics_target = 1000!
- Let’s do random_page_cost = 1!
- I’ve heard that setting shared_buffers to ¼ of RAM doesn’t rock anymore,
¾ is much better!
- Let’s add this index here!
- Let’s use partitioning for this table!
- Let’s not allow >100,000 dead tuples in a table!
- Etc, etc, etc...
How do we do performance improvements nowadays?
The Whole Truth
Does it give better (or at
least not worse)
performance for all queries?
Is this value the best for our
database & workload?
Does it give a real gain for
our database & workload?
- Let’s do default_statistics_target = 1000!
- Let’s do random_page_cost = 1!
- I’ve heard that setting shared_buffers to ¼ of RAM
doesn’t rock anymore, ¾ is much better!
- Let’s add this index here!
- Let’s use partitioning for this table!
- Let’s not allow >100,000 dead tuples in a table!
- Etc, etc, etc...
...and more
- postgres_dba shows that the bloat level is 35% for this
1B-rows table. Is it good or bad? How bad? (Or how good?)
- When do I need to add more RAM to my database server to
keep good performance characteristics?
- What will happen when more users come to use my app?
- Will i3.xlarge handle 3,000 UPDATEs per second?
- Etc, etc, etc
How do we make changes now?
Option #1
Oh, we see (from monitoring, pg_stat_statements, pgBadger, etc) that this
query is slow on production. Let’s fix that!
⇒ the problem is already there...
Option #2
DBA: well, this will be slow. Add this index! And we can do even better, let’s use
a partial index here!
⇒ good if the DBA is really experienced and/or has verified the ideas on a DB
clone. But how often is that so? And how many queries were checked?
Towards the better future
Option 1: we’re going to change something. Let’s verify *all* query groups from
pg_stat_statements and see how performance changes – using some
“what-if” API
Option 2: a human or artificial DBA has an idea for an improvement.
Let’s verify it! Again, with the same “what-if” API
⇒ continuous database administration
Bonus: let’s use the same “what-if” API to gather more knowledge for the AI we
are building (to train ML models)
So, why do we need to automate DB experiments?
If we see that our change is good for one or several queries, it doesn’t mean
that it is good for all queries.
Without deep SQL query analysis (analyzing “all” query groups in
pg_stat_statements’ Top-N) we are blind,
and database administration remains black magic.
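To make “Top-N” concrete: a minimal sketch of picking the heaviest query groups and checking how much of the workload they cover. The row shape (plain dicts mimicking pg_stat_statements output) is an assumption for illustration only:

```python
def top_n_by_total_time(rows, n=10):
    """Pick the N query groups with the highest total execution time.

    `rows` mimics pg_stat_statements output: one dict per query group,
    with at least "query" and "total_time" (ms) keys.
    """
    return sorted(rows, key=lambda r: r["total_time"], reverse=True)[:n]

def coverage(rows, top):
    """Share of the whole workload's time covered by the selected groups."""
    total = sum(r["total_time"] for r in rows)
    return sum(r["total_time"] for r in top) / total if total else 0.0
```

If the Top-N covers, say, 99% of total execution time, analyzing those N groups is close to analyzing “all” of the workload.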
Part 2. Existing works and tools
Existing works and solutions
Andy Pavlo, CMU:
● “What is a Self-Driving Database Management System?” great overview
of history and existing works, an entry point to learn what’s done (good
research papers, Microsoft works, etc)
● PelotonDB, ottertune – great research projects
Oracle’s RAT (Real Application Testing)
– Database Replay + SQL Performance Analyzer:
● Real Application Testing, Oracle 18c
● Sample report
● “Oracle Real Application Testing Delivers 224% ROI”
DIY automated pipeline for DB optimization – the basis for Nancy
How to automate database optimization using ecosystem tools and AWS?
Analyze:
● pg_stat_statements
● auto_explain
● pgBadger to parse logs (use JSON output)
● pg_query to group queries better
● pg_stat_kcache to analyze FS-level ops
Configuration:
● annotated.conf, pgtune, pgconfigurator, postgresqlco.nf
● ottertune
Suggested indexes (internal “what-if” API w/o actual execution):
● (useful: pgHero, POWA, HypoPG, dexter, plantuner)
Conduct experiments:
● pgreplay to replay logs (it requires a different log_line_prefix, which you need to handle)
● EC2 spot instances
Machine learning:
● MADlib
pgBadger caveats:
● Query grouping could be implemented better (see pg_query)
● Makes all queries lowercase (hurts "camelCased" names)
● Doesn’t really support plans (auto_explain)
Also, pgreplay and pgBadger are not friends:
they require different log formats
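The lowercasing caveat can be illustrated with a toy normalizer: group queries by stripping literals while leaving identifiers untouched, so "camelCased" names survive. A naive regex sketch (pg_query does this properly via parse-tree fingerprints; this is only an illustration):

```python
import re

def fingerprint(sql: str) -> str:
    """Naive query-group fingerprint: strip literals, keep identifiers as-is.

    Unlike lowercasing the whole query, this leaves "camelCased" column
    names intact, so groups stay meaningful for case-sensitive schemas.
    """
    s = re.sub(r"'(?:[^']|'')*'", "?", sql)   # string literals -> ?
    s = re.sub(r"\b\d+(\.\d+)?\b", "?", s)    # numeric literals -> ?
    s = re.sub(r"\s+", " ", s).strip()        # normalize whitespace
    return s
```

Two queries that differ only in literal values then fall into the same group, while `"userId"` stays `"userId"`.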
Part 3. Postgres.ai basics
Already automated:
● Setup/tune hardware, OS, FS
● Provision Postgres instances
● Create replicas
● High Availability: detect failures and switch to replicas
● Create backups
● Basic monitoring
Little to zero level of automation (can be done with Database Experiments):
● Postgres parameters tuning
● Query analysis and optimization
● Index set optimization
● Detailed monitoring
● Verify optimization ideas
● Benchmarks
● Regression & performance CI-like testing
What is the Database Experiment?
The Database Experiment – a set of actions
to perform deep SQL query analysis
for a specified database
against a specified workload
in a specified environment
with an optional change of the database and environment (called a “delta”).
An experiment may consist of one or more experimental runs.
To analyze the impact of some delta, we need at least two runs, one of them
being a “clean run” (without the delta).
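The clean-run/delta-run comparison boils down to a per-query-group diff. A minimal sketch, assuming each run is summarized as a mapping from query group to total execution time in ms (a simplification of the real pg_stat_statements-based output):

```python
def diff_runs(clean: dict, delta: dict, threshold: float = 0.10):
    """Compare a delta run against a clean run, per query group.

    Returns {group: relative_change}; negative means the delta made the
    group faster. Changes smaller than `threshold` are dropped as noise.
    """
    report = {}
    for group, base in clean.items():
        if group in delta and base > 0:
            change = (delta[group] - base) / base
            if abs(change) >= threshold:
                report[group] = change
    return report
```

A delta that halves one group’s time while slightly slowing another would surface only the significant change, which is exactly the “better for all queries, or just for this one?” question above.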
What is the Database Experiment?
The input of an experimental run:
● Environment
○ Location (on-premise, GCP, or AWS), hardware (CPU, RAM, disks)
○ System (OS, file system)
○ Postgres version
○ Postgres configuration
● Database snapshot. Can be:
○ A dump (plain or in directory format)
○ A physical archive (pg_basebackup or pgBackRest/WAL-E/WAL-G/…)
○ A replica promoted for experiments
○ A synthetic, generated database (“create table as …”, pgbench -i, etc.)
● Workload. Can be:
○ Synthetic (custom SQL), single-threaded
○ Synthetic (custom SQL), multi-threaded (with pgbench)
○ “Real workload” (based on logs)
● [Optional] Delta:
○ Configuration change(s) (e.g.: shared_buffers = 16GB)
○ Some DDL (e.g.: `create index …`). “Undo” DDL (in this case `drop index …`) is required
to enable serialization of experiments
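The input above maps naturally onto a declarative run specification. A hypothetical sketch (the field names are illustrative, not Nancy CLI’s actual options):

```python
# Hypothetical declarative spec for one experimental run; the field
# names below are illustrative, NOT Nancy CLI's actual option names.
REQUIRED = {"environment", "snapshot", "workload"}

def validate_run_spec(spec: dict) -> dict:
    """Require environment, snapshot, and workload; `delta` stays
    optional -- a run without it is a "clean run"."""
    missing = REQUIRED - spec.keys()
    if missing:
        raise ValueError(f"missing required inputs: {sorted(missing)}")
    return spec

run = validate_run_spec({
    "environment": {"location": "aws", "hardware": "i3.large",
                    "postgres_version": 10,
                    "config": {"shared_buffers": "16GB"}},
    "snapshot": {"type": "dump", "format": "directory"},
    "workload": {"type": "real", "source": "postgresql.log.csv"},
    "delta": {"ddl_do": "create index ...",
              "ddl_undo": "drop index ..."},
})
```

Requiring the “undo” DDL alongside the “do” DDL is what lets runs be serialized on a reused database.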
What is the Database Experiment?
The output of an experimental run:
● the contents of the basic pg_stat_* views (e.g. pg_stat_user_tables)
● the contents of pg_stat_statements
● the contents of pg_stat_kcache
● the detailed PostgreSQL log (with auto_explain turned on)
● pgBadger’s extended report in JSON format
● the Postgres config at the time of applying the workload
AI-based cloud-friendly platform to automate database administration
Steve
AI-based expert in capacity planning and
database tuning
Joe
AI-based expert in query optimization and
Postgres indexes
Nancy
AI-based expert in database experiments.
Conducts experiments and presents
results to human and artificial DBAs
Sign up for early access:
http://Postgres.ai
Demo 1
Postgres.AI GUI live demonstration
Postgres.AI architecture (diagram): Metastorage + GUI; the databases being
observed; AWS S3 holding dumps/backups and workloads prepared by
nancy prepare-workload; the Nancy, Steve, and Joe bots; AWS EC2 Docker
machines driven by Nancy CLI.
A human engineer can use:
● GUI
● CLI
● Chat
Part 4. Nancy CLI
Demo 2
Nancy CLI live demonstration (local + AWS)
Meet Nancy CLI (open source)
Nancy CLI: https://github.com/postgres-ai/nancy
● a custom Docker image (Postgres with extensions & tools)
● nancy prepare-workload to convert Postgres logs (currently only .csv)
to a workload binary file
● nancy run to run experiments
● able to run locally (on any machine) or on an EC2 spot instance (low price!),
including i3.*** instances (with NVMe)
● fully automated management of EC2 spots
What’s inside the docker container?
Source: https://github.com/postgres-ai/nancy/tree/master/docker
Image: https://hub.docker.com/r/postgresmen/postgres-with-stuff/
Inside:
● Ubuntu 16.04
● Postgres (now 9.6 or 10)
● postgres_dba (for manual debugging)
● pg_stat_statements enabled
● auto_explain enabled (all queries, with timing)
● pgreplay
● pgBadger
● pg_stat_kcache (soon)
● additional utilities
Part 5. The future of Nancy CLI
Various ways to create an experimental database
● plain-text pg_dump
○ restoration is very slow (only 1 vCPU utilized)
○ “logical” – the physical structure is lost (cannot experiment with bloat, etc.)
○ small (if compressed)
○ “snapshot” only
● pg_dump with either -Fd (“directory”) or -Fc (“custom”):
○ restoration is faster (multiple vCPUs, -j option)
○ “logical” (again: bloat and physical layout are “lost”)
○ small (because compressed)
○ “snapshot” only
● pg_basebackup + WALs, point-in-time recovery (PITR), possibly with help from WAL-E, WAL-G, or pgBackRest
○ less reliable, sometimes there are issues (especially if 3rd-party tools are involved – e.g. WAL-E & WAL-G
don’t support tablespaces, there are occasional bugs, etc.)
○ “physical”: bloat and physical structure are preserved
○ not small – roughly the size of the DB
○ can “walk in time” (PITR)
○ requires a warm-up procedure (the data is not in memory!)
● AWS RDS: create a replica + promote it
○ no Spots :-/
○ Lazy Load is tricky (it looks like the DB is there, but it’s very slow – warm-up is needed)
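For the directory-format option above, the speedup comes from -j parallelism. A sketch that only builds the command lines (the pg_dump/pg_restore flags are real; the wrapper functions are hypothetical):

```python
def pg_dump_cmd(dbname: str, out_dir: str, jobs: int = 4) -> list:
    """pg_dump in directory format (-Fd); -j parallelizes across tables."""
    return ["pg_dump", "-Fd", "-j", str(jobs), "-f", out_dir, dbname]

def pg_restore_cmd(dbname: str, in_dir: str, jobs: int = 4) -> list:
    """Parallel restore of a directory-format dump with -j workers."""
    return ["pg_restore", "-j", str(jobs), "-d", dbname, in_dir]
```

Parallel dump requires the directory format, which is why `-Fd` and `-j` go together here; the restore side accepts either directory or custom archives.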
How can we speed up experimental runs?
● Prepare the EC2 instance(s) in advance and keep them running
● Prepare only the EBS volume(s) (perhaps using an instance of a different
type) and keep them ready. When attached to a new instance, do a warm-up
● Resource reuse:
○ reuse the Docker container
○ reuse the EC2 instance
○ serialize experimental runs (DDL Do/Undo; VACUUM FULL; cleanup)
● Partial database snapshots (dump/restore only the needed tables)
The future development of Nancy CLI
● Speed up DB creation
● Support GCP
● More artifacts delivered: pg_stat_kcache, etc.
● nancy see-report to print the summary + top-30 queries
● nancy compare-reports to print the “diff” for 2+ reports (the summary + numbers for the
top-30 queries, ordered by total time based on the 1st report)
● Postgres 11
● pgbench -i for database initialization
● pgbench to generate multithreaded synthetic workload
● Workload analysis: automatically detect “N+1 SELECT” patterns when running a workload
● Better support for the serialization of experimental runs
● Better support for multiple runs:
○ interval with step
○ gradient descent
● Provide cost estimation (time + money)
● Rewrite in Python or Go
Contributions welcome!
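The cost-estimation item could start as simple arithmetic over the spot price and expected run duration. A minimal sketch (a hypothetical helper; EBS and S3 costs are deliberately ignored):

```python
def estimate_run_cost(duration_hours: float, spot_price_per_hour: float,
                      runs: int = 1) -> float:
    """Rough money estimate for a series of experimental runs on EC2 spots.

    Per-second billing is approximated by fractional hours; storage and
    data-transfer costs are ignored in this sketch.
    """
    return round(runs * duration_hours * spot_price_per_hour, 2)
```

For example, four 90-minute runs at a $0.10/hour spot price would cost roughly $0.60, which is why spot instances make whole series of experiments affordable.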
Thank you!
Nikolay Samokhvalov
ru@postgresql.org
twitter: @postgresmen
Postgres.ai
https://github.com/postgres-ai/nancy