This document summarizes the key features and changes in PostgreSQL version 8.4. It notes that over 1600 code updates and more than two dozen major features were added over 9 months of development and 5 CommitFests. Major new features include window functions, common table expressions, array_agg, per-database collations, and improved data types like unsigned integers and CIText. Performance and monitoring improvements include parallel restore, improved hash indexes, pg_stat_user_functions, and pg_stat_statements. The document also summarizes security, stored procedure, and exotic features like SQL/MED, multi-column GIN indexes, and Boyer-Moore string searching. It encourages testing and provides contact information for the
9. SQL Features
● Windowing Functions
● Common Table Expressions
● array_agg
● Per-database Collations
● New data types
– Unsigned Integers
– CIText
● Improved \d commands
● Add columns to existing VIEWs
10. Windowing Functions
● Aggregate over part of the data
– SQL 2008 standard
– Great for BI, OLAP
● Functions:
– row_number()
– rank()
– lead()
– lag()
● More from David Fetter later!
11. Windowing Functions
SELECT
    y,
    m,
    SUM(SUM(people)) OVER (PARTITION BY y ORDER BY m),
    AVG(people)
FROM (
    SELECT
        EXTRACT(YEAR FROM accident_date) AS y,
        EXTRACT(MONTH FROM accident_date) AS m,
        *
    FROM
        accident
) s
GROUP BY y, m;

SELECT
    depname,
    empno,
    salary,
    rank() OVER
        (PARTITION BY depname
         ORDER BY salary)
FROM
    empsalary;
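The slides list lead() and lag() but only demonstrate rank(). The following is a minimal sketch of lag() and row_number() sharing one named window. The sample rows are hypothetical and are supplied inline through a VALUES list so the query does not depend on an existing empsalary table.

```sql
-- Hypothetical sample rows standing in for the empsalary table.
SELECT depname,
       empno,
       salary,
       row_number() OVER w          AS pos,          -- position within the department
       lag(salary)  OVER w          AS prev_salary,  -- NULL for the first row of each partition
       salary - lag(salary) OVER w  AS step          -- gap to the next-lower salary
FROM (VALUES ('develop', 7, 4200),
             ('develop', 8, 6000),
             ('develop', 9, 4500),
             ('sales',   1, 5000),
             ('sales',   3, 4800)) AS empsalary(depname, empno, salary)
WINDOW w AS (PARTITION BY depname ORDER BY salary)
ORDER BY depname, pos;
```

Naming the window once in a WINDOW clause keeps the three window function calls consistent with each other.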
12. Common Table Expressions
● Ability to create "named subqueries" for your query.
● Best use: WITH RECURSIVE
– real recursive queries
– "walk" trees with one query
● more from David Fetter later
13. Common Table Expressions
WITH RECURSIVE subdepartment AS
(
    -- non-recursive term
    SELECT * FROM department WHERE id = 'A'
    UNION ALL
    -- recursive term referring to "subdepartment"
    SELECT d.* FROM department AS d, subdepartment AS sd
    WHERE d.id = sd.parent_department
)
SELECT * FROM subdepartment;
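To try a recursive query without creating any tables, the sketch below supplies a hypothetical four-row department tree inline and walks it downward from 'A', recording the depth of each node. The depth column and the sample rows are assumptions added for illustration. The join direction here (child's parent equals the row already found) goes from a department to its sub-departments, which is the opposite of the join condition in the slide's query.

```sql
WITH RECURSIVE department(id, parent_department) AS (
    -- Hypothetical sample tree: A is the root, B and D sit under A, C sits under B.
    VALUES ('A'::text, NULL::text),
           ('B', 'A'),
           ('C', 'B'),
           ('D', 'A')
),
subtree(id, parent_department, depth) AS (
    -- non-recursive term: the starting row
    SELECT id, parent_department, 0
    FROM department
    WHERE id = 'A'
    UNION ALL
    -- recursive term: children of rows found so far
    SELECT d.id, d.parent_department, s.depth + 1
    FROM department AS d
    JOIN subtree AS s ON d.parent_department = s.id
)
SELECT id, depth
FROM subtree
ORDER BY depth, id;
```

UNION ALL is sufficient because a tree has no cycles. On data that may contain cycles, the query needs its own termination guard.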
14. array_agg
● History:
– added Arrays in 7.4
● array_accum() aggregate example code
– intarray contrib module in 8.0
● only ints, but very fast
● array_agg() in 8.4: all arrays, fast C Code from Robert Haas, new contributor!
SELECT status, array_agg(username) FROM logins GROUP BY status;
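A self-contained variant of the query above, as a sketch: the logins rows are hypothetical and supplied inline, and array_to_string() is added to show one way of turning the aggregated array into a delimited string.

```sql
-- Hypothetical sample rows standing in for the logins table.
SELECT status,
       array_agg(username)                         AS usernames,
       array_to_string(array_agg(username), ', ')  AS username_list
FROM (VALUES ('active', 'alice'),
             ('active', 'bob'),
             ('locked', 'carol')) AS logins(status, username)
GROUP BY status
ORDER BY status;
```

The order of elements inside each array is not guaranteed by this query, so code that depends on it should sort the input explicitly.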
15. Per-Database Collations
● Collations (ordering character sets) used to be
per installation
● Now they are per database
● Someday they will be per column
● Google Summer of Code Project!
CREATE DATABASE mydb
LC_COLLATE 'sv_SE.UTF-8'
LC_CTYPE 'sv_SE.UTF-8'
TEMPLATE template0;
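To check which collation and character classification each database ended up with, the pg_database catalog can be queried. This is a small sketch, assuming a server of version 8.4 or later, where the datcollate and datctype columns exist.

```sql
-- One row per database, with its encoding and locale settings.
SELECT datname,
       pg_encoding_to_char(encoding) AS encoding,
       datcollate,
       datctype
FROM pg_database
ORDER BY datname;
```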
16. New Data Types
● Make migrating from other DBMSes easier
● CIText (in /contrib)
– Case Insensitive Text
– Full CI indexing, comparisons
● Unsigned Integers (in pgFoundry)
– migrate from MySQL, others
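A brief sketch of what CIText changes in practice. It assumes the citext type from /contrib has already been installed in the current database; the app_user table and its rows are hypothetical.

```sql
-- Assumes the citext contrib type is already installed in this database.
CREATE TABLE app_user (
    email citext PRIMARY KEY   -- uniqueness is enforced case-insensitively
);

INSERT INTO app_user VALUES ('Alice@Example.com');

-- Matches the row above even though the case differs.
SELECT email FROM app_user WHERE email = 'alice@example.COM';

-- This would be rejected as a duplicate key, because the values
-- differ only in case:
-- INSERT INTO app_user VALUES ('ALICE@EXAMPLE.COM');
```

The stored value keeps its original case; only comparisons and index lookups ignore it.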
17. Better \d in psql
● \d is now multi-version compatible
– \dt etc. won't error if you connect an 8.4 client to an 8.2 database
● \df for user functions only
– \dfS for system functions
● \ef to edit a function
18. Add columns to VIEWs
● In the bad old days:
– need to add another column to your VIEW?
– have to drop it & recreate it
– have to drop & recreate all dependencies
– enter the World Of Pain
● In 8.4:
– CREATE OR REPLACE VIEW lets you add columns (at the end of the column list)
– Can't rename or modify existing columns, though
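A minimal sketch of adding a column to a view in 8.4. In the released version this is done with CREATE OR REPLACE VIEW, and the new column has to come last. The logins table layout and the active_logins view are assumptions made up for this example:

```sql
-- Hypothetical table and view, for illustration only.
CREATE TABLE logins (username text, status text, last_login timestamp);

CREATE VIEW active_logins AS
    SELECT username, status FROM logins WHERE status = 'active';

-- 8.4: the existing columns keep their names, types and order;
-- the new column is appended at the end.
CREATE OR REPLACE VIEW active_logins AS
    SELECT username, status, last_login FROM logins WHERE status = 'active';
```

Anything that depends on active_logins stays in place, so there is no drop-and-recreate cascade.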
22. Improved Hash Indexes
● Our old hash indexes were slow and useless
● Improved hash indexes are fast!
– use them for ID columns
● or other unique keys
– not completely recovery-safe yet though
● don't switch over production DBs until 8.5
● Google Summer of Code project!
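A minimal sketch of creating and using a hash index. The sessions table and the index name are hypothetical:

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE sessions (session_id text, payload text);

CREATE INDEX sessions_session_id_hash ON sessions USING hash (session_id);

-- Hash indexes only help equality lookups, not ranges or sorting.
SELECT payload FROM sessions WHERE session_id = 'abc123';
```

A hash index cannot be declared UNIQUE, so a unique key still needs a b-tree index to enforce the constraint.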
23. pg_stat_user_functions
● For each of your functions, see
– # of times called
– amount of time spent
– amount of time spent excluding other functions
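A minimal sketch of reading the view. Function tracking is off by default, so track_functions has to be enabled first; the LIMIT of 10 is an arbitrary choice for this example:

```sql
-- 'pl' tracks procedural-language functions only;
-- 'all' also tracks SQL and C functions. Requires superuser.
SET track_functions = 'pl';

-- total_time includes time spent in called functions,
-- self_time excludes it.
SELECT schemaname, funcname, calls, total_time, self_time
FROM pg_stat_user_functions
ORDER BY total_time DESC
LIMIT 10;
```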
26. pg_stat_statements
postgres=# SELECT * FROM pg_stat_statements ORDER BY total_time DESC
LIMIT 3;
-[ RECORD 1 ]------------------------------------------------------------
userid | 10
dbid | 63781
query | UPDATE branches SET bbalance = bbalance + $1 WHERE bid = $2;
calls | 3000
total_time | 20.716706
rows | 3000
-[ RECORD 2 ]------------------------------------------------------------
userid | 10
dbid | 63781
query | UPDATE tellers SET tbalance = tbalance + $1 WHERE tid = $2;
calls | 3000
total_time | 17.1107649999999
rows | 3000
-[ RECORD 3 ]------------------------------------------------------------
userid | 10
dbid | 63781
query | UPDATE accounts SET abalance = abalance + $1 WHERE aid = $2;
calls | 3000
total_time | 0.645601
rows | 3000
27. More DTrace Probes
* Probes to measure query time
query-parse-start (int, char *)
query-parse-done (int, char *)
query-plan-start ()
query-plan-done ()
query-execute-start ()
query-execute-done ()
query-statement-start (int, char *)
query-statement-done (int, char *)
* Probes to measure dirty buffer writes by the backend because
bgwriter is not effective
dirty-buffer-write-start (int, int, int, int)
dirty-buffer-write-done (int, int, int, int)
* Probes to measure physical writes from the shared buffer
buffer-write-start (int, int, int, int)
buffer-write-done (int, int, int, int, int)
* Probes to measure reads of a relation from a particular buffer
block
buffer-read-start (int, int, int, int, int)
buffer-read-done (int, int, int, int, int, int)
* Probes to measure the effectiveness of buffer caching
buffer-hit ()
buffer-miss ()
* Probes to measure I/O time because wal_buffers is too small
wal-buffer-write-start ()
wal-buffer-write-done ()
* Probes to measure checkpoint stats such as running time,
buffers written, xlog files added, removed, recycled, etc
checkpoint-start (int)
checkpoint-done (int, int, int, int, int)
* Probes to measure Idle in Transaction and client/network
time
idle-transaction-start (int, int)
idle-transaction-done ()
* Probes to measure sort time
sort-start (int, int, int, int, int)
sort-done (int, long)
* Probes to determine whether or not the deadlock detector
has found a deadlock
deadlock-found ()
deadlock-notfound (int)
* Probes to measure reads/writes by block numbers and
relations
smgr-read-start (int, int, int, int)
smgr-read-end (int, int, int, int, int, int)
smgr-write-start (int, int, int, int)
smgr-write-end (int, int, int, int, int, int)
28. auto_explain
● misnamed; actually allows you to manually set specific
queries/sessions/functions to output explain plans to the log
postgres=# LOAD 'auto_explain';
postgres=# SET auto_explain.log_min_duration = 0;
postgres=# SELECT count(*)
FROM pg_class, pg_index
WHERE oid = indrelid AND indisunique;
This might produce log output such as:
LOG: duration: 0.986 ms plan:
Aggregate (cost=14.90..14.91 rows=1 width=0)
-> Hash Join (cost=3.91..14.70 rows=81 width=0)
Hash Cond: (pg_class.oid = pg_index.indrelid)
-> Seq Scan on pg_class (cost=0.00..8.27 rows=227 width
-> Hash (cost=2.90..2.90 rows=81 width=4)
-> Seq Scan on pg_index (cost=0.00..2.90 rows=81
29. More Performance Improvements
● Free Space Map is dynamically sized (no more
max_fsm_pages!)
● Visibility Map
– VACUUM only changed pages
– Index-only Scans in 8.5
● Less writing to pgstat file
– plus you can move it
30. Stored Procedures
● Default Parameters
● Variadic Parameters
● New PL/pgSQL Statements
● PL/pythonU OUT Parameters
31. DEFAULT parameters
CREATE OR REPLACE FUNCTION
adder(a int default 40,
b int default 2)
RETURNS int LANGUAGE 'sql'
AS 'select $1 + $2';
SELECT adder();
SELECT adder(1);
SELECT adder(1,2);
32. VARIADIC parameters
CREATE OR REPLACE FUNCTION
adder(VARIADIC v int[])
RETURNS int AS $$
DECLARE s int; i int;
BEGIN
s:=0;
FOR i IN SELECT generate_subscripts(v,1) LOOP
s := s + v[i];
END LOOP;
RETURN s;
END;
$$ LANGUAGE 'plpgsql';
SELECT adder(1);
SELECT adder(1,2,3);
SELECT adder(40,2);
33. New PL/PgSQL Statements
● RETURNS TABLE
– SQL-compliant alias for "SETOF"
● CASE statement
– real switching logic
CASE
WHEN x BETWEEN 0 AND 10 THEN
msg := 'value is between zero and ten';
WHEN x BETWEEN 11 AND 20 THEN
msg := 'value is between eleven and twenty';
END CASE;
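A minimal sketch combining RETURNS TABLE with the CASE statement. The function name describe_range and its parameters are made up for this example. The ELSE branch is added here because a PL/pgSQL CASE statement without one raises an error when no WHEN branch matches:

```sql
-- Hypothetical function, for illustration only.
CREATE OR REPLACE FUNCTION describe_range(lo int, hi int)
RETURNS TABLE (x int, msg text) AS $$
BEGIN
    -- The TABLE columns x and msg behave like OUT parameters.
    FOR x IN SELECT generate_series(lo, hi) LOOP
        CASE
            WHEN x BETWEEN 0 AND 10 THEN
                msg := 'value is between zero and ten';
            WHEN x BETWEEN 11 AND 20 THEN
                msg := 'value is between eleven and twenty';
            ELSE
                msg := 'value is out of range';
        END CASE;
        RETURN NEXT;  -- emits the current (x, msg) pair as one row
    END LOOP;
END;
$$ LANGUAGE plpgsql;

SELECT * FROM describe_range(9, 12);
```

The loop iterates over a query rather than an integer range (lo..hi) on purpose: an integer FOR loop declares its own loop variable, which would hide the output column x.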
34. PL/pythonU OUT Parameters
● You now can use IN, OUT and INOUT
parameters with PL/pythonU functions.
● That's it!
36. SQL/MED
● Foundation for connecting to external servers
– Future of PL/proxy and DBconnect
– Future of DBI-Connect
CREATE FOREIGN DATA WRAPPER pgsql
VALIDATOR postgresql_fdw_validator;
CREATE SERVER foo FOREIGN DATA WRAPPER pgsql
OPTIONS (host 'remotehost', dbname 'remotedb');
CREATE USER MAPPING FOR PUBLIC SERVER foo
OPTIONS (user 'bob', password 'secret');
37. Multi-Column GIN Indexes
● Bad Old Days: to do a single Full Text Search
index over several columns, you had to
concatenate them.
● New Goodness: you can now do a proper
multicolumn index
– and it's faster!
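A minimal sketch of a multicolumn GIN index for full text search. The articles table, the index name and the search terms are hypothetical:

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE articles (title text, body text);

-- One GIN index covering both columns, no concatenation needed.
CREATE INDEX articles_fts ON articles
    USING gin (to_tsvector('english', title),
               to_tsvector('english', body));

-- Each column can be searched with its own condition.
SELECT title FROM articles
WHERE to_tsvector('english', title) @@ to_tsquery('english', 'postgres')
  AND to_tsvector('english', body)  @@ to_tsquery('english', 'index');
```

The expressions in the query have to match the indexed expressions, including the 'english' configuration, for the planner to consider the index.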
41. Refactored SSL by Magnus
● Proper certificate verification
– Choose level, full verification is default
● Control over all key and certificate files
● SSL certificate authentication
– Trusted root certificate
– Map «cn» value of certificate
42. pg_hba Improvements
● "crypt" is gone (insecure)
● «ident sameuser» => «ident»
● New format for options
– name=value for all options
● usermaps for all external methods
– with regexp support
● Parsed on reload
43. Column Permissions
REVOKE SELECT (col1, col2), INSERT (col1, col2)
ON tab1 FROM role2;
● Restrict access to sensitive columns from
unprivileged ROLEs
– more fine-grained security
– no longer need to use VIEWs to do this
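A minimal sketch of the GRANT side of column permissions. The employees table and the clerk role are hypothetical:

```sql
-- Hypothetical table and role, for illustration only.
CREATE TABLE employees (name text, dept text, salary numeric);
CREATE ROLE clerk;

-- clerk may read name and dept, but not salary
GRANT SELECT (name, dept) ON employees TO clerk;

-- clerk may change only the dept column
GRANT UPDATE (dept) ON employees TO clerk;
```

With these grants, SELECT * FROM employees fails for clerk with a permission error, while SELECT name, dept FROM employees works. Note that revoking a column privilege does not override a privilege granted on the whole table, so grant at the column level from the start.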
45. Many Patches == Lots of Testing
● Bug Testing
– can you make 8.4 crash?
● Specification Testing
– do the features do what the docs say they do?
● Performance Testing
– is 8.4 really faster? How much?
● Combinational Testing
– what happens when you put several new features
together?
46. Many Patches == Lots of Testing
1. Take a copy of your production applications
2. Port them to 8.4
3. Report breakage and issues
4. Play with implementing new features
Do It Now!
We're counting on you!
47. Contact Information
● Josh Berkus
– josh@postgresql.org
– http://it.toolbox.com/blogs/database-soup
● Upcoming events
– SCALE 7, Los Angeles, Feb. 20
– pgCon 2009, Ottawa, May 20
This talk is copyright 2009 Josh Berkus, and is licensed under the Creative Commons Attribution License