Slides from the Brighton PostgreSQL meetup presentation. An all-around PostgreSQL exploration: the rocky physical layer, the treacherous MVCC swamp and the buffer manager's garden.
PostgreSQL - backup and recovery with large databases, by Federico Campoli
Life on a rollercoaster, backup and recovery with large databases
Dealing with large databases is always a challenge.
The backup and HA procedures evolve as the database installation grows over time.
The talk will cover the problems solved by the DBA in four years of working with large databases, whose size increased from a 1.7 TB single cluster up to 40 TB in a multi-shard environment.
The talk will cover both disaster recovery with pg_dump and high availability with log shipping/streaming replication.
The presentation is based on a real story. The names have been changed to protect the innocent.
The document discusses PostgreSQL and its capabilities. It describes how PostgreSQL was created in 1982 and became open source in 1996. It discusses PostgreSQL's support for large databases, high-performance transactions using MVCC, ACID compliance, and its ability to run on most operating systems. The document also covers PostgreSQL's JSON and NoSQL capabilities and provides performance comparisons of JSON, JSONB and text fields.
This document is an introduction to PostgreSQL presented by Federico Campoli to the Brighton PostgreSQL Users Group. It covers the history and development of PostgreSQL, its features including data types, JSON/JSONB support, and performance comparisons. The presentation includes sections on the history of PostgreSQL, its features and capabilities, NOSQL support using JSON/JSONB, and concludes with a wrap up on PostgreSQL and related projects.
The document discusses backup and recovery strategies in PostgreSQL. It describes logical backups using pg_dump, which takes a snapshot of the database and outputs SQL scripts or custom files. It also describes physical backups using write-ahead logging (WAL) archiving and point-in-time recovery (PITR). With WAL archiving enabled, PostgreSQL archives WAL files, allowing recovery to any point between backups by restoring the backup files and replaying the WAL logs. The document provides steps for performing PITR backups, including starting the backup, copying files, stopping the backup, and recovery by restoring files and using a recovery.conf file.
The document discusses PostgreSQL's physical storage structure. It describes the various directories within the PGDATA directory that stores the database, including the global directory containing shared objects and the critical pg_control file, the base directory containing numeric files for each database, the pg_tblspc directory containing symbolic links to tablespaces, and the pg_xlog directory which contains write-ahead log (WAL) segments that are critical for database writes and recovery. It notes that tablespaces allow spreading database objects across different storage devices to optimize performance.
The document discusses PostgreSQL's internal architecture and components. It describes the data area, which stores data files on disk, and key directories like pg_xlog for write-ahead logs. It explains the buffer cache and clock sweep algorithm for managing memory, and covers the multi-version concurrency control (MVCC) which allows simultaneous transactions. TOAST storage is also summarized, which stores large data values externally.
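The clock-sweep algorithm mentioned above can be sketched in a few lines of Python. This is a toy model for illustration (usage counters, pin flags, a rotating hand); it does not mirror the real shared-buffers implementation.

```python
"""Toy sketch of the clock-sweep buffer replacement algorithm used by
the PostgreSQL buffer manager, simplified for illustration."""

class ClockSweep:
    MAX_USAGE = 5  # PostgreSQL caps the usage count similarly

    def __init__(self, nbuffers):
        self.usage = [0] * nbuffers    # usage count per buffer
        self.pinned = [False] * nbuffers
        self.hand = 0                  # the clock hand

    def touch(self, buf):
        """A buffer access bumps its usage count, up to the cap."""
        self.usage[buf] = min(self.usage[buf] + 1, self.MAX_USAGE)

    def evict(self):
        """Advance the hand, decrementing usage counts, until an
        unpinned buffer with usage 0 is found; evict that one."""
        while True:
            buf = self.hand
            self.hand = (self.hand + 1) % len(self.usage)
            if self.pinned[buf]:
                continue
            if self.usage[buf] == 0:
                return buf
            self.usage[buf] -= 1
```

Frequently touched buffers survive several sweeps of the hand, while cold buffers are reclaimed quickly, which approximates LRU without the locking cost of maintaining an ordered list.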
The document discusses PostgreSQL query planning and tuning. It covers the key stages of query execution including syntax validation, query tree generation, plan estimation, and execution. It describes different plan nodes like sequential scans, index scans, joins, and sorts. It emphasizes using EXPLAIN to view and analyze the execution plan for a query, which can help identify performance issues and opportunities for optimization. EXPLAIN shows the estimated plan while EXPLAIN ANALYZE shows the actual plan after executing the query.
PostgreSQL is one of the finest database systems available.
The talk will cover the history, the basic concepts of PostgreSQL's architecture, and how the community behind "the most advanced open source database" works.
The ninja elephant, scaling the analytics database in Transferwise, by Federico Campoli
Business intelligence and analytics are at the core of any great company, and Transferwise is no exception.
The talk will start with a brief history of the legacy analytics implemented with MySQL and how we scaled up the performance using PostgreSQL. In order to get fresh data from the core MySQL databases in real time we used a modified version of pg_chameleon, which also obfuscated the PII data.
The talk will also cover the challenges and the lessons learned by the developers and analysts when bridging MySQL with PostgreSQL.
pg_chameleon is a lightweight replication system written in Python. The tool connects to the MySQL replication protocol and replicates the data in PostgreSQL.
The talk will cover the tool's history and the logic behind the available functions, and will give an interactive usage example.
The document outlines an introduction to databases presentation using PostgreSQL. It includes an introduction to databases concepts, an overview of PostgreSQL, demonstrations of SQL commands like CREATE TABLE, INSERT, SELECT and JOIN in psql, and discussions of database administration and GUI tools. Exercises are provided for attendees to practice the concepts covered.
Compare and contrast RDF triple stores and NoSQL: are triple stores NoSQL or not?
Talk given 2011-09-08 to the BigData/NoSQL meetup at Bristol University.
Knowing your Garbage Collector / Python Madrid, by fcofdezc
The document discusses garbage collection in Python. It introduces concepts like the heap, mutator, and collector. It then describes the reference counting algorithm used by CPython as well as techniques for handling cycles. It also discusses the mark and sweep algorithm used by PyPy as an alternative approach.
The document discusses analyzing clones and macro co-changes (MCC) in codebases to determine if cloned files tend to change consistently with other cloned files they are related to. The analysis found that cloned files were more likely to co-change with other cloned files they have relations with, rather than other cloned files. When analyzing bug fixes, files that had MCC with their clones were more likely to be modified than files only changing with other cloned files. This suggests clones may need consistent changes, though future work is needed to better define consistent changes.
DataPlotly is a plugin for QGIS that allows creating D3-like plots from spatial data. It is built on top of Plotly, a JavaScript library which offers easy APIs for many languages such as Python, R, MATLAB and NodeJS.
The plugin was created back in 2017 for the upcoming QGIS 3 version: today the plugin has been downloaded more than 50,000 times.
Creating plots is outside the main scope of QGIS, but thanks to the simple Python API it is easy enough to create additional scripts and plugins. Thanks to these APIs, DataPlotly is today a well maintained Python plugin with a growing community of developers, users and testers.
DataPlotly plots are completely interactive so that plot elements are directly linked with map items; therefore the user is able to query map items from the main plot canvas.
Thanks to a crowdfunding campaign launched in March 2019 during the annual QGIS User Conference, the functionality of DataPlotly was extended: a complete refactoring of the code, more plot types and, especially, the creation of plots in the layout composer.
More and more people are using the plugin to analyze data and to create complex output reports (e.g. for the Covid-19 pandemic).
PostgreSQL 9.5 introduces many new capabilities that enhance its functionality as a relational database with NoSQL features. Key additions include full JSON support with CRUD operations and pretty printing, row-level security policies, UPSERT functionality, enhanced password management, improved replication after failovers, foreign table constraints and inheritance, expanded analytics with CUBE and ROLLUP, new versions of PostGIS with additional geospatial tools, and various administrative improvements. These new features further strengthen PostgreSQL's position as the most advanced open source database platform.
The document summarizes the state of the Go programming language in June 2014 and outlines plans for its future development. It discusses growth in Go's user base and communities, highlights from the recently released Go 1.3, and previews features planned for Go 1.4, including canonical import paths, internal packages, moving the standard library, and improvements to tools, runtime performance, and supported platforms. It aims to provide an overview of where Go is currently and what may be coming in the next year.
The document discusses reading and writing files in Python. It provides examples of opening files for reading, writing, and appending. It demonstrates how to read an entire file, individual lines, and loop through lines. It also shows how to write strings to files and close files once writing is complete. Additional topics covered include a template for reading files line by line and examples of counting lines, words, and characters in a file.
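The line-by-line template and the counting example described above fit in one small function; the function name is illustrative, not taken from the slides.

```python
"""Read a text file line by line and count lines, words and
characters, in the spirit of the examples summarised above."""

def count_file(path):
    """Return (lines, words, chars) for a text file; reading line by
    line keeps memory use flat even for very large files."""
    lines = words = chars = 0
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:       # iterating a file yields its lines
            lines += 1
            words += len(line.split())
            chars += len(line)    # includes the trailing newline
    return lines, words, chars
```

For a file containing "hello world" on one line and "foo" on the next, this returns 2 lines, 3 words and 16 characters (newlines included).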
I used these slides for an introductory lecture (90min) to a seminar on SPARQL. This slideset introduces the RDF query language SPARQL from a user's perspective.
Apache Spark continues to grow in popularity due to advanced analytics/machine learning, high-performance processing, real-time streaming and multiple language support. Big Data technology is adding more data processing options to an already long list of legacy databases and file systems. As a result, enterprises continue to look for effective and approachable ways to federate all these data sources to solve business information needs. One under-appreciated feature of Spark is its ability to quickly and powerfully enable federated data access. This presentation will discuss and demonstrate using Spark to query and combine multiple disparate data sources. We will see how to access the various data sources from Spark, normalize them to Spark RDDs and combine them for processing. The demo will show combining sources such as HDFS, JSON files, HBase, Hive and PostgreSQL, and writing the result back to a data mart for analysis. We will also show the use of SparkSQL to access federated data in Spark through the Spark Thrift Server using the Tableau BI tool.
This document provides information about biological databases including:
- Different types of biological databases such as relational, object-oriented, hierarchical, and hybrid systems.
- Common uses of biological databases including annotation searches, homology searches, pattern searches, predictions, and comparisons.
- Examples of database entries in common formats like GenBank, EMBL, and SwissProt that show the layout and key fields.
This document discusses RDSTK, an R package wrapper for the Data Science Toolkit (DSTK) API. It provides functions to interface with DSTK in R, such as street2coordinates() and text2people(). The document also discusses building R packages and includes examples of using DSTK functions. It concludes by acknowledging contributors to the R community and tools that helped develop RDSTK.
Developing PostgreSQL Performance, by Simon Riggs (Fuenteovejuna)
This document discusses PostgreSQL performance improvements over multiple versions from 7.3 to 8.4. It shows graphs demonstrating significant performance gains in peak read-only and read-write transaction rates. It describes the contributors to these gains as improvements to buffer management, locking, caching, and other internal aspects of the database engine. It also outlines ongoing focus areas and potential for further 10-20% gains in transaction processing and data warehouse workloads through continued optimization.
This document describes an experiment to measure the surface tension of liquids at different temperatures using the capillary-rise method. A capillary tube is placed in the liquid, and surface tension makes the liquid rise up the tube until equilibrium is reached between the upward force due to surface tension and the downward force due to the weight of the liquid. By measuring the height reached, the surface tension can be calculated using the given equation. The experimental procedure includes measuring the height
1) The document covers the thermodynamics of combustion, defining the combustion reaction and the requirements for it to occur, such as the intimate mixing of fuel and oxidizer. 2) It explains the types of combustion, fuels and the classification of burners. 3) It describes the factors affecting the thermal efficiency of boilers and the main heat losses.
This document presents a series of thermodynamics problems involving fluids, ideal gases and changes of state. Problem 3.1 asks whether it is possible to transfer energy to an incompressible fluid in the form of work and what the change in internal energy would be. Problem 3.2 asks for the pressure to which water must be compressed for its density to change by 1%, given its properties. Problem 3.3 asks for the derivation of an expression for the isothermal compressibility
Van Ness thermodynamics problems, chapter 1 - Orihuela Contreras Jose, by Soldado Aliado<3
The document presents solved thermodynamics and fluid mechanics problems (1.9). In the first problem, the acceleration of gravity and its units are calculated in a specific unit system. In the second problem, the approximate mass of the weights needed to measure pressures up to 3500 bar with a dead-weight gauge is calculated. In the last problem, the force exerted on a confined gas, its pressure and the work done by the gas as it expands and pushes a piston are calculated.
Topographic interpretation and basic elements of photo interpretation, by Soldado Aliado<3
This document describes the basic elements of topographic and photographic interpretation, including roads, hydrography, vegetation and population centres. It explains how to represent these elements using different colours, and how to classify terrain according to factors such as tones and textures. It also defines terms such as zone, rock and colluvium, and describes the process of creating a topographic profile from the contour lines on a map to illustrate the variations in relief between two points.
The document presents information on the external-combustion Stirling engine. It explains that the Stirling engine was invented by Robert Stirling in 1816 and works through the expansion and contraction of a gas caused by temperature differences. It also briefly describes the history of external combustion engines and gives details on the thermodynamic cycle and operation of the Stirling engine.
This document presents a series of thermodynamics problems involving fluids, ideal gases and changes of state. Problem 3.1 asks whether it is possible to transfer energy to an incompressible fluid in the form of work and how its internal energy changes as the pressure varies. Problem 3.2 asks for the pressure to which water must be compressed for its density to change by 1%, given its properties. Problem 3.3 asks for the derivation of an expression for the isothermal compressibility
This document presents assignments for the technical drawing course. It contains instructions on how to draw lines, letter characters, divide circumferences into equal parts and construct tangent junctions. The document provides 30 variants of the assignments, with multiple problems for students to develop their technical drawing skills following the proper standards and methods.
The document describes internal combustion engines, including the four-stroke Otto and Diesel engines. It explains the operation of each through the four stages of the cycle and the main components such as the engine block, the cylinder head and the crankcase. It also analyses the environmental impacts and possible solutions.
This is an introduction to PostgreSQL that provides a brief overview of PostgreSQL's architecture, features and ecosystem. It was delivered at NYLUG on Nov 24, 2014.
http://www.meetup.com/nylug-meetings/events/180533472/
PGDay UK 2016 -- Performance for queries with grouping, by Alexey Bashtanov
This document summarizes optimization techniques for queries involving grouping and aggregation in PostgreSQL. It discusses avoiding sorts through hash aggregation, count distinct optimization, handling ordered aggregates, summation optimization for data types and zero values, and denormalized data aggregation. Specific techniques covered include increasing work_mem, two-level hash aggregation, extensions for count distinct and hyperloglog, sorting arrays separately, and using MIN to correlate denormalized data in grouping.
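Hash aggregation, the first technique listed above, avoids the sort entirely: groups are accumulated in a hash table in a single pass, which is why raising work_mem can flip a plan from GroupAggregate to HashAggregate. A toy Python illustration (the (key, value) row layout is an assumption for the sketch):

```python
"""One-pass hash aggregation, the idea behind PostgreSQL's
HashAggregate node: no sort, just a hash table of running totals."""
from collections import defaultdict

def hash_aggregate(rows):
    """Equivalent of SELECT key, sum(value) FROM rows GROUP BY key.

    rows is an iterable of (key, value) pairs; the input is consumed
    in a single pass and never sorted, so the cost is O(n) rather
    than the O(n log n) a sort-based GroupAggregate would pay.
    """
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)
```

The trade-off is memory: the hash table must hold one entry per distinct group, so the planner only chooses this strategy when the estimated group count fits within work_mem.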
This document summarizes a presentation about optimizing performance between PostgreSQL and JDBC.
The presenter discusses several strategies for improving query performance such as using prepared statements, avoiding closing statements, setting fetch sizes appropriately, and using batch inserts with COPY for large amounts of data. Some potential issues that can cause performance degradation are also covered, such as parameter type changes invalidating prepared statements and unexpected plan changes after repeated executions.
The presentation includes examples and benchmarks demonstrating the performance impact of different approaches. The overall message is that prepared statements are very important for performance but must be used carefully due to edge cases that can still cause issues.
PostgreSQL has kept up the momentum around JSON with version 9.4 featuring JSONB as demand for working with unstructured data continues to grow. In this talk delivered during Postgres Open 2014, Vibhor Kumar, principal systems engineer at EnterpriseDB, offered some scenarios for working with JSON in PostgreSQL and demonstrated performance metrics. This session also gave some instruction on how to use different operations and explored comparisons to BSON.
Nine Circles of Inferno or Explaining the PostgreSQL VacuumAlexey Lesovsky
The document describes the nine circles of the PostgreSQL vacuum process. Circle I discusses the postmaster process, which initializes shared memory and launches the autovacuum launcher and worker processes. Circle II focuses on the autovacuum launcher, which manages worker processes and determines when to initiate vacuuming for different databases. Circle III returns to the postmaster process and how it launches autovacuum workers. Circle IV discusses what occurs within an autovacuum worker process after it is launched, including initializing, signaling the launcher, scanning relations, and updating databases. Circle V delves into processing a single database by an autovacuum worker.
Este documento clasifica los diferentes tipos de números reales, incluyendo números naturales, enteros, primos, compuestos, racionales e irracionales. Los números naturales son los números 0, 1, 2, 3, etc., mientras que los enteros incluyen también los números negativos. Los números racionales son aquellos que pueden expresarse como fracciones de números enteros, e incluyen al cero.
pg_chameleon MySQL to PostgreSQL replica made easyFederico Campoli
pg_chameleon is a lightweight replication system written in python. The tool can connect to the mysql replication protocol and replicate the data changes in PostgreSQL.
pg_chameleon is a lightweight replication system written in python. The tool can connect to the mysql replication protocol and replicate the data changes in PostgreSQL.
Whether the user needs to setup a permanent replica between MySQL and PostgreSQL or perform an engine migration, pg_chamaleon is the perfect tool for the job.
The talk will cover the history the current implementation and the future releases.
The audience will learn how to setup a replica from MySQL to PostgreSQL in few easy steps. There will be also a coverage on the lessons learned during the tool’s development cycle.
Infrastructure as code might be literally impossible part 2ice799
The document discusses various issues with infrastructure as code including complexities that arise from software licenses, bugs, and inconsistencies across tools and platforms. Specific examples covered include problems with SSL and APT package management on Debian/Ubuntu, Linux networking configuration difficulties, and inconsistencies in Python packaging related to naming conventions for packages containing hyphens, underscores, or periods. Potential causes discussed include legacy code, lack of time for thorough testing and bug fixing, and economic pressures against developing fully working software systems.
Quickly re-publish CSV/TSV files from existing repositories as FAIR Data with just a few mouse clicks!
You select the columns to "project" as Linked Data, and the associated ontology terms. The FAIR Projector Builder will create a FAIR Projector for you: a Triple Pattern Fragment server to provide the Linked Data; a published DCAT Distribution containing metadata about those triples and their source; and an RML model (syntactic and semantic of the triples, to aid in third-party discovery of this novel projection.
(current status - first prototype, not ready for public consumption)
-------
Thanks to the NBDC/DBCLS for sponsoring the hackathon series.
MDW also funded by Ministerio de Economía y Competitividad grant number TIN2014-55993-RM
The document discusses setting up high availability for PostgreSQL using Patroni for automated failover and pgBackRest for backups, providing configuration details for installing and configuring Patroni and pgBackRest on CentOS servers with an etcd distributed consensus store and Google Cloud Storage for backups.
The document provides a summary of the most common code changes needed when migrating Puppet code from version 3 to version 4. It outlines several key differences including:
1) Numbers being treated as numbers rather than strings, requiring file modes to be quoted.
2) Only undefined and false being treated as false, not empty strings.
3) Variable names needing to start with lowercase letters.
4) Other minor changes like hyphens not being allowed in class names and ERB variables requiring @ prefixes.
The document recommends using tools like puppet parser validate, puppet-lint, catalog_diff and catalog_preview to automatically check for issues when migrating code to Puppet 4. Overall,
OSDC 2018 | The Computer science behind a modern distributed data store by Ma...NETWAYS
What we see in the modern data store world is a race between different approaches to achieve a distributed and resilient storage of data. Most applications need a stateful layer which holds the data. There are at least three necessary ingredients which are everything else than trivial to combine and of course even more challenging when heading for an acceptable performance. Over the past years there has been significant progress in respect in both the science and practical implementations of such data stores. In his talk Max Neunhoeffer will introduce the audience to some of the needed ingredients, address the difficulties of their interplay and show four modern approaches of distributed open-source data stores.
Topics are:
– Challenges in developing a distributed, resilient data store
– Consensus, distributed transactions, distributed query optimization and execution
– The inner workings of ArangoDB, Cassandra, Cockroach and RethinkDB
The talk will touch complex and difficult computer science, but will at the same time be accessible to and enjoyable by a wide range of developers.
Vowpal Platypus: Very Fast Multi-Core Machine Learning in Python.Peter Hurford
Vowpal Platypus is a general use, lightweight Python wrapper built on Vowpal Wabbit, that uses online learning to achieve great results. https://github.com/peterhurford/vowpal_platypus
The computer science behind a modern disributed data storeJ On The Beach
What we see in the modern data store world is a race between different approaches to achieve a distributed and resilient storage of data. Every application needs a stateful layer which holds the data. There are at least three necessary components which are everything else than trivial to combine, and, of course, even more challenging when heading for an acceptable performance.
Over the past years there has been significant progress in both the science and practical implementations of such data stores. In his talk Max Neunhoeffer will introduce the audience to some of the needed ingredients, address the difficulties of their interplay and show four modern approaches of distributed open-source data stores (ArangoDB, Cassandra, Cockroach and RethinkDB).
Similar to The hitchhiker's guide to PostgreSQL (8)
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofsAlex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Full-RAG: A modern architecture for hyper-personalization
The hitchhiker's guide to PostgreSQL
1. The hitchhiker’s guide to PostgreSQL
Federico Campoli
6 May 2016
Federico Campoli The hitchhiker’s guide to PostgreSQL 6 May 2016 1 / 43
2. Table of contents
1 Don’t panic!
2 The Ravenous Bugblatter Beast of Traal
3 Time is an illusion. Lunchtime doubly so
4 The Pan Galactic Gargle Blaster
5 Mostly harmless
3. Don’t panic!
4. Don’t panic!
Don’t panic!
Copyright by Kreg Steppe - https://www.flickr.com/photos/spyndle/
6. The Ravenous Bugblatter Beast of Traal
7. The Ravenous Bugblatter Beast of Traal
The Ravenous Bugblatter Beast of Traal
Copyright by Federico Campoli
8. The Ravenous Bugblatter Beast of Traal
The data area
The PostgreSQL data area is the directory where the cluster stores its data on durable storage.
It is referenced by the environment variable $PGDATA on Unix or %PGDATA% on Windows.
Inside the data area there are various folders; we'll take a look at the most important ones.
9. The Ravenous Bugblatter Beast of Traal
The directory base
The default location for a new database created without the TABLESPACE clause.
Inside, there is one folder per database, with a numeric name
An optional folder pgsql_tmp is used for the external sorts
The base location is mapped as pg_default in the pg_tablespace system table
10. The Ravenous Bugblatter Beast of Traal
The directory base
Each database directory contains files with numeric names where PostgreSQL stores the relations' data.
A data file can grow up to a maximum of 1 GB; when the limit is reached a new file is created with a numerical suffix
Each data file is organised in fixed-size pages of 8192 bytes
The data files are called file nodes. Their relationship with the logical relations is stored in the pg_class system table
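The 1 GB segmentation can be sketched as follows. This is a minimal illustration of the naming scheme, assuming the default 8192-byte page and 1 GB segment sizes; it is not PostgreSQL source code.

```python
BLOCK_SIZE = 8192
SEGMENT_SIZE = 1 << 30                           # 1 GB per segment file
BLOCKS_PER_SEGMENT = SEGMENT_SIZE // BLOCK_SIZE  # 131072 blocks

def block_location(filenode: int, block_number: int):
    """Return (file name, byte offset inside that file) for a block."""
    segment, block_in_segment = divmod(block_number, BLOCKS_PER_SEGMENT)
    # The first file has no suffix; the following ones get ".1", ".2", ...
    name = str(filenode) if segment == 0 else f"{filenode}.{segment}"
    return name, block_in_segment * BLOCK_SIZE
```

For example, block 131072 of file node 16384 is the first block of the file named 16384.1.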
11. The Ravenous Bugblatter Beast of Traal
The directory pg_global
The directory pg_global contains the data files used by the relations shared across the cluster.
There is also a small 8 kB file named pg_control. This critical file is where PostgreSQL stores the cluster's vital data, such as the last checkpoint location.
A corrupted pg_control prevents the cluster from starting.
12. The Ravenous Bugblatter Beast of Traal
The directory pg_tblspc
Contains the symbolic links to the tablespaces.
Very useful to spread tables and indices across different physical devices
Combined with logical volume management it can dramatically improve the performance...
or drive the project to a complete failure
An object's tablespace can be safely changed, but this requires an exclusive lock on the affected object
The pg_tablespace system catalogue maps the tablespace names and identifiers
13. The Ravenous Bugblatter Beast of Traal
The directory pg_xlog
The write ahead logs are stored in this directory
It is probably the most important and critical directory in the cluster
Each WAL segment is 16 MB
Each segment contains the records describing the tuples changed in volatile memory
In case of a crash or an unclean shutdown the WALs are replayed to restore the cluster's consistent state
The number of segments is automatically managed by the database
Putting this location on a dedicated, highly reliable device is vital for performance and reliability
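As an illustration of the 16 MB segmentation, the sketch below derives the 24-hex-digit name of the WAL segment file that contains a given byte position (LSN). It assumes the default segment size and the standard naming scheme (timeline, then high and low halves of the segment number):

```python
WAL_SEG_SIZE = 16 * 1024 * 1024   # default 16 MB per segment

def wal_segment_name(timeline: int, lsn: int) -> str:
    """Name of the WAL segment file holding byte position `lsn`."""
    segno = lsn // WAL_SEG_SIZE
    # 8 hex digits each: timeline, segment number high part, low part
    return "%08X%08X%08X" % (timeline, segno // 0x100, segno % 0x100)
```

So the very first segment on timeline 1 is 000000010000000000000000, and the next one ends in ...01.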
14. The Ravenous Bugblatter Beast of Traal
Data pages
Each block is structured in almost the same way for tables and indices.
15. The Ravenous Bugblatter Beast of Traal
Page header
Each page starts with a 24-byte header followed by the tuple pointers. Those are usually 4 bytes each and point to the physical tuples stored at the end of the page.
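The 24-byte header can be unpacked in a few lines; the field layout below follows PostgreSQL's PageHeaderData struct (LSN, checksum, flags, lower/upper/special offsets, page size and version, prune xid). It is a sketch, not a complete page reader:

```python
import struct

# 24 bytes: u64 LSN, five u16 offsets/flags, u16 pagesize/version, u32 prune xid
PAGE_HEADER = struct.Struct("<QHHHHHHI")

def parse_page_header(page: bytes) -> dict:
    (lsn, checksum, flags, lower, upper,
     special, pagesize_version, prune_xid) = PAGE_HEADER.unpack_from(page)
    return {
        "lsn": lsn,
        "lower": lower,   # end of the tuple pointer array
        "upper": upper,   # start of the tuple data, growing from the page's end
        # each tuple pointer is 4 bytes, placed right after the header
        "n_pointers": (lower - PAGE_HEADER.size) // 4,
    }
```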
16. The Ravenous Bugblatter Beast of Traal
The tuples
Now we can finally look at the physical tuples. Each tuple has a 27-byte header; the numbers are the bytes used by the individual values.
The user data can be either the data stream itself or a pointer to the out-of-line data stream.
17. The Ravenous Bugblatter Beast of Traal
TOAST with Marmite please
TOAST, the best thing since sliced bread
TOAST is the acronym for The Oversized-Attribute Storage Technique
An attribute is also known as a field
TOAST can store up to 1 GB per field in the out-of-line storage (and free of charge)
18. The Ravenous Bugblatter Beast of Traal
TOAST with Marmite please
Fixed-length data types like integer, date and timestamp are not TOASTable.
Their data is stored right after the tuple header.
Varlena data types such as character varying without an upper bound, text or bytea are stored either in line or out of line.
The storage technique used depends on the data stream size and on the storage method assigned to the attribute.
Depending on the storage strategy, the data can be stored in external relations and/or compressed with PostgreSQL's fast LZ-family algorithm (pglz).
19. The Ravenous Bugblatter Beast of Traal
TOAST with Marmite please
TOAST permits four storage strategies (shamelessly copied from the online manual).
PLAIN prevents either compression or out-of-line storage; this is the only
possible strategy for columns of non-TOAST-able data types.
EXTENDED allows both compression and out-of-line storage. This is the
default for most TOAST-able data types. Compression will be attempted
first, then out-of-line storage if the row is still too big.
EXTERNAL allows out-of-line storage but not compression. Use of
EXTERNAL will make substring operations on wide text and bytea columns
faster at the penalty of increased storage space.
MAIN allows compression but not out-of-line storage. Actually, out-of-line
storage will still be performed for such columns, but only as a last resort.
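The four strategies can be sketched as a small decision function. This is a rough illustration only: zlib stands in for PostgreSQL's own pglz compressor, and the 2000-byte threshold is just an approximation of the default TOAST trigger size.

```python
import zlib

TOAST_THRESHOLD = 2000   # rough stand-in for the default trigger size

def toast(value: bytes, strategy: str):
    """Return ('plain'|'compressed'|'external', payload) for one attribute."""
    if strategy == "PLAIN" or len(value) <= TOAST_THRESHOLD:
        return "plain", value                  # stored in line, as-is
    if strategy in ("EXTENDED", "MAIN"):
        compressed = zlib.compress(value)      # compression is attempted first
        if len(compressed) <= TOAST_THRESHOLD:
            return "compressed", compressed
        if strategy == "MAIN":
            # MAIN avoids out-of-line storage except as a last resort
            return "external", value
    # EXTERNAL skips compression; EXTENDED falls through when it's not enough
    return "external", value
```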
20. The Ravenous Bugblatter Beast of Traal
TOAST with Marmite please
When the out-of-line storage is used the data is encoded in bytea and, if necessary, split into multiple chunks.
A unique index over chunk_id and chunk_seq prevents duplicate data and speeds up the lookups
Figure: Toast table
21. Time is an illusion. Lunchtime doubly so
22. Time is an illusion. Lunchtime doubly so
Time is an illusion. Lunchtime doubly so
Copyright by Federico Campoli
23. Time is an illusion. Lunchtime doubly so
The magic of the MVCC
t_xmin contains the xid generated at tuple insert
t_xmax contains the xid generated at tuple delete
t_cid contains the internal command id to track the sequence inside the same transaction
24. Time is an illusion. Lunchtime doubly so
The magic of the MVCC
PostgreSQL's consistency is achieved using MVCC, which stands for Multi Version Concurrency Control.
The base logic seems simple.
A 4-byte unsigned integer called xid is incremented by 1 and assigned to the current transaction.
Every committed xid whose value is less than the current xid is considered in the past and is therefore visible to the current transaction.
Every xid whose value is greater than the current xid is in the future and therefore invisible to the current transaction.
The commit status is kept in $PGDATA under the directory pg_clog, where small 8 kB files track the transaction statuses.
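The two visibility rules can be condensed into a small function. This is a deliberately naive sketch of the rule as stated on the slide; it ignores snapshots, subtransactions, hint bits and wraparound:

```python
def is_visible(t_xmin, t_xmax, current_xid, committed):
    """A tuple is visible when inserted by a committed xid in the past
    and not deleted by a committed xid in the past."""
    inserted = t_xmin in committed and t_xmin < current_xid
    deleted = (t_xmax is not None
               and t_xmax in committed and t_xmax < current_xid)
    return inserted and not deleted
```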
25. Time is an illusion. Lunchtime doubly so
The magic of the MVCC
In this model there is no in-place UPDATE. Every time a row is updated, a new version is simply generated with the field t_xmin set to the current XID value.
The old row version is marked dead just by writing the same XID into its t_xmax.
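An UPDATE in this model is therefore just an insert plus a marker on the old version, as in this sketch (a list of dicts stands in for the heap):

```python
def mvcc_update(heap, index, new_data, current_xid):
    """Mark the old version dead (t_xmax) and append the new version."""
    old = heap[index]
    old["t_xmax"] = current_xid              # old row version is now dead
    heap.append({"t_xmin": current_xid,      # new version, invisible to
                 "t_xmax": None,             # transactions in the past
                 "data": new_data})
```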
26. The Pan Galactic Gargle Blaster
27. The Pan Galactic Gargle Blaster
The Pan Galactic Gargle Blaster
Copyright by Federico Campoli
28. The Pan Galactic Gargle Blaster
A little history
Back in the day, when the world was young, PostgreSQL's memory was managed by a simple LRU algorithm.
The 8.x development cycle introduced a powerful algorithm called Adaptive Replacement Cache (ARC), where two self-adapting memory pools managed the most recently used and the most frequently used buffers.
Because of a software patent on the algorithm, shortly after the 8.0 release the buffer manager was replaced by the two-queue (2Q) algorithm.
Release 8.1 adopted the clock sweep memory manager, still in use in the latest version because of its flexibility.
29. The Pan Galactic Gargle Blaster
The clock sweep
The buffer manager's main goal is to keep the most recently used blocks cached in memory and to adapt dynamically to the most frequently used blocks.
To do this a small memory portion is used as a free list for the buffers available for eviction.
Figure: Free list
30. The Pan Galactic Gargle Blaster
The clock sweep
The buffers have a usage counter which increases by one when the buffer is pinned, up to a small limit value.
Figure: Block usage counter
31. The Pan Galactic Gargle Blaster
The clock sweep
Shamelessly copied from the file src/backend/storage/buffer/README
There is a "free list" of buffers that are prime candidates for replacement. In
particular, buffers that are completely free (contain no valid page) are always in
this list.
To choose a victim buffer to recycle when there are no free buffers available, we
use a simple clock-sweep algorithm, which avoids the need to take system-wide
locks during common operations.
32. The Pan Galactic Gargle Blaster
The clock sweep
It works like this:
Each buffer header contains a usage counter, which is incremented (up to a small
limit value) whenever the buffer is pinned. (This requires only the buffer header
spinlock, which would have to be taken anyway to increment the buffer reference
count, so it’s nearly free.)
The "clock hand" is a buffer index, NextVictimBuffer, that moves circularly
through all the available buffers. NextVictimBuffer is protected by the
BufFreelistLock.
33. The Pan Galactic Gargle Blaster
It's bigger on the inside
The algorithm for a process that needs to obtain a victim buffer is:
1 Obtain BufFreelistLock.
2 If buffer free list is nonempty, remove its head buffer. If the buffer is pinned
or has a nonzero usage count, it cannot be used; ignore it and return to the
start of step 2. Otherwise, pin the buffer, release BufFreelistLock, and return
the buffer.
3 Otherwise, select the buffer pointed to by NextVictimBuffer, and circularly
advance NextVictimBuffer for next time.
4 If the selected buffer is pinned or has a nonzero usage count, it cannot be
used. Decrement its usage count (if nonzero) and return to step 3 to
examine the next buffer.
5 Pin the selected buffer, release BufFreelistLock, and return the buffer.
(Note that if the selected buffer is dirty, we will have to write it out before we can
recycle it; if someone else pins the buffer meanwhile we will have to give up and
try another buffer. This however is not a concern of the basic
select-a-victim-buffer algorithm.)
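The select-a-victim loop (steps 3-5 above) can be sketched as a toy clock sweep. The names below (Buffer, clock_sweep, next_victim) are illustrative, not PostgreSQL's, and locking and the free list are left out:

```python
class Buffer:
    """Minimal buffer state for the sweep: usage counter and pin flag."""
    def __init__(self, usage_count=0, pinned=False):
        self.usage_count = usage_count
        self.pinned = pinned

def clock_sweep(buffers, next_victim):
    """Return (victim index, new clock-hand position)."""
    while True:
        idx = next_victim
        buf = buffers[idx]
        # Circularly advance the hand for next time (step 3).
        next_victim = (next_victim + 1) % len(buffers)
        if buf.pinned:
            continue                 # pinned: cannot be used (step 4)
        if buf.usage_count > 0:
            buf.usage_count -= 1     # give it another chance (step 4)
            continue
        return idx, next_victim      # unpinned, usage count 0 (step 5)

pool = [Buffer(2), Buffer(0, pinned=True), Buffer(1), Buffer(0)]
victim, hand = clock_sweep(pool, 0)
print(victim)  # prints 3: first unpinned buffer reached with usage count 0
```

Buffers 0 and 2 merely have their usage counters decremented, buffer 1 is skipped because it is pinned, and buffer 3 is chosen; recently used buffers thus survive several sweeps before eviction.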
34. The Pan Galactic Gargle Blaster
It’s bigger on the inside
35. The Pan Galactic Gargle Blaster
It’s bigger on the inside
Since version 8.3 the buffer manager has a ring buffer strategy.
Operations which require a large amount of buffers in memory, like VACUUM or
sequential scans of large tables, use a dedicated 256 kB ring buffer, small
enough to fit in the processor’s L2 cache.
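A quick back-of-the-envelope check of the slide's claim: with the default 8 kB block size, a 256 kB ring holds 32 buffers, so a bulk operation keeps recycling the same 32 pages instead of flooding shared_buffers. The constant names below are illustrative:

```python
BLCKSZ = 8 * 1024        # default PostgreSQL block size (8 kB)
RING_SIZE = 256 * 1024   # ring buffer size quoted on the slide (256 kB)

# Number of buffers the ring can hold.
ring_buffers = RING_SIZE // BLCKSZ
print(ring_buffers)  # prints 32
```

Thirty-two pages comfortably fit in a typical L2 cache, which is the point of keeping the ring that small.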
36. Mostly harmless
Table of contents
1 Don’t panic!
2 The Ravenous Bugblatter Beast of Traal
3 Time is an illusion. Lunchtime doubly so
4 The Pan Galactic Gargle Blaster
5 Mostly harmless
38. Mostly harmless
The XID wraparound failure
XID is a 4 byte unsigned integer.
Every 4 billion transactions the value wraps around
PostgreSQL uses the modulo-2^31 comparison method
For each value, 2 billion XIDs are in the future and 2 billion in the past
When an XID’s age becomes too close to 2 billion, VACUUM freezes the xmin
value to a hardcoded XID forever in the past
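The modulo-2^31 comparison can be sketched as follows: an XID precedes another when their wrapped 32-bit difference, read as a signed value, is negative. This is a simplified model in the spirit of PostgreSQL's TransactionIdPrecedes; the real code also special-cases reserved XIDs such as the frozen XID, which is omitted here:

```python
def xid_precedes(xid1, xid2):
    """True when xid1 is in xid2's past under modulo-2^31 comparison."""
    # 32-bit wraparound subtraction...
    diff = (xid1 - xid2) & 0xFFFFFFFF
    # ...reinterpreted as a signed 32-bit value.
    if diff >= 0x80000000:
        diff -= 0x100000000
    return diff < 0

print(xid_precedes(100, 200))        # prints True: 100 is in 200's past
print(xid_precedes(2**32 - 10, 5))   # prints True: 5 lies just past the wrap
```

The second call shows why the scheme works: even though 2^32 - 10 is numerically larger than 5, it still counts as older, because the signed difference places it less than 2 billion transactions behind.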
39. Mostly harmless
The XID wraparound failure
If for any reason an XID gets within 10 million transactions of the wraparound
failure, the database starts emitting scary messages
WARNING: database "mydb" must be vacuumed within 5770099 transactions
HINT: To avoid a database shutdown, execute a database-wide VACUUM in "mydb".
If an XID’s age gets within 1 million transactions of the wraparound failure,
the database simply shuts down and can be started only in single-user mode to
perform the VACUUM.
In any case the autovacuum daemon, even if turned off, starts the required
VACUUM long before this catastrophic scenario happens.
41. Mostly harmless
Boring legal stuff
The copyright of all images is owned by the respective authors. The source and
author attribution are provided with a link alongside each image.
The section titles are quotes from Douglas Adams’ The Hitchhiker’s Guide to the
Galaxy. No copyright infringement is intended.
42. Mostly harmless
Contacts and license
Twitter: 4thdoctor scarf
Blog: http://www.pgdba.co.uk
Brighton PostgreSQL Meetup:
http://www.meetup.com/Brighton-PostgreSQL-Meetup/
This document is distributed under the terms of the Creative Commons
43. Mostly harmless
The hitchhiker’s guide to PostgreSQL
Federico Campoli
6 May 2016