PostgreSQL comes built-in with a variety of indexes, some of which are further extensible to build powerful new indexing schemes. But what are all these index types? What are some of the special features of these indexes? What are the size & performance tradeoffs? How do I know which ones are appropriate for my application?
Fortunately, this talk aims to answer all of these questions as we explore the whole family of PostgreSQL indexes: B-tree, expression, GiST (of all flavors), GIN and how they are used in theory and practice.
PostgreSQL (or Postgres) began its life in 1986 as POSTGRES, a research project of the University of California at Berkeley.
PostgreSQL isn't just relational, it's object-relational. This gives it some advantages over other open source SQL databases like MySQL, MariaDB and Firebird.
Advanced MySQL Query Tuning - talk at Percona Live and MySQL Meetup tour.
Tuning Queries and Schema/Indexes can significantly increase performance of your application and decrease response times.
This year I will cover new MySQL 5.6 and 5.7 algorithms that have been designed to improve query performance and simplify tuning.
Topics:
1. Group by and order by optimizations
2. MySQL temporary tables and filesort
3. Using covered indexes to optimize your queries
4. Loose and tight index scan in MySQL
5. Using summary tables to optimize your reporting queries
6. New MySQL 5.6 and 5.7 Optimizer features and improvements
This presentation covers all aspects of PostgreSQL administration, including installation, security, file structure, configuration, reporting, backup, daily maintenance, monitoring activity, disk space computations, and disaster recovery. It shows how to control host connectivity, configure the server, find the query being run by each session, and find the disk space used by each database.
Spencer Christensen
There are many aspects to managing an RDBMS. Some of these are handled by an experienced DBA, but there are a good many things that any sys admin should be able to take care of if they know what to look for.
This presentation will cover basics of managing Postgres, including creating database clusters, overview of configuration, and logging. We will also look at tools to help monitor Postgres and keep an eye on what is going on. Some of the tools we will review are:
* pgtop
* pg_top
* pgfouine
* check_postgres.pl.
Check_postgres.pl is a great tool that can plug into your Nagios or Cacti monitoring systems, giving you even better visibility into your databases.
Postgres MVCC - A Developer Centric View of Multi Version Concurrency Control by Reactive.IO
Scaling a data-tier requires multiple concurrent database connections that are all vying for read and write access to the same data. In order to cater to this complex demand, PostgreSQL implements a concurrency method known as Multi Version Concurrency Control, or MVCC. By understanding MVCC, you will be able to take advantage of advanced features such as transactional memory, atomic data isolation, and point in time consistent views.
This presentation will show you how MVCC works at both a theoretical and a practical level. Furthermore, you will learn how to optimize common tasks such as database writes, vacuuming, and index maintenance. Afterwards, you will have a fundamental understanding of how PostgreSQL operates on your data.
Key points discussed:
* MVCC; what is really happening when I write data.
* Vacuuming; why it is needed and what is really going on.
* Transactions; much more than just an undo button.
* Isolation levels; seeing only the data you want to see.
* Locking; ensure writes happen in the order you choose.
* Cursors; how to stream chronologically correct data more efficiently.
SQL examples given during the presentation are available here: http://www.reactive.io/academy/presentations/postgresql/mvcc/mvcc-examples.zip
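As a rough illustration of the "what is really happening when I write data" point, here is a minimal Python sketch of row-version visibility. It is a deliberately simplified model and not PostgreSQL's actual implementation: the `RowVersion` class, the `visible` and `read` helpers, and the idea of a snapshot as a plain set of committed transaction IDs are assumptions made for this example, loosely modeled on PostgreSQL's `xmin` and `xmax` system columns.

```python
from dataclasses import dataclass
from typing import Optional, Set, List


@dataclass
class RowVersion:
    """One stored version of a row (hypothetical, simplified model)."""
    value: str
    xmin: int                   # ID of the transaction that created this version
    xmax: Optional[int] = None  # ID of the transaction that superseded it, if any


def visible(version: RowVersion, committed: Set[int]) -> bool:
    """A version is visible if its creator is committed in the snapshot
    and no committed transaction in the snapshot has superseded it."""
    created = version.xmin in committed
    superseded = version.xmax is not None and version.xmax in committed
    return created and not superseded


def read(versions: List[RowVersion], committed: Set[int]) -> List[str]:
    """Return the values a reader with this snapshot would see."""
    return [v.value for v in versions if visible(v, committed)]


# An UPDATE by transaction 20 does not overwrite the row in place:
# it marks the old version with xmax=20 and appends a new version.
versions = [
    RowVersion("balance=100", xmin=10, xmax=20),
    RowVersion("balance=150", xmin=20),
]

old_snapshot = {10}      # taken before transaction 20 committed
new_snapshot = {10, 20}  # taken after transaction 20 committed

print(read(versions, old_snapshot))  # ['balance=100']
print(read(versions, new_snapshot))  # ['balance=150']
```

In this model the superseded version stays in storage after the update, which is why a cleanup step such as vacuuming is needed once no snapshot can still see it.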
Accelerating Local Search with PostgreSQL (KNN-Search) by Jonathan Katz
KNN-GiST indexes were added in PostgreSQL 9.1 and greatly accelerate some common queries in the geospatial and textual search realms. This presentation will demonstrate the power of KNN-GiST indexes on geospatial and text searching queries, but also their present limitations, through some of my experiments. I will also discuss some of the theory behind KNN (k-nearest neighbor) as well as some of the applications this feature can be applied to.
To see a version of the talk given at PostgresOpen 2011, please visit http://www.youtube.com/watch?v=N-MD08QqGEM
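For readers unfamiliar with k-nearest neighbor search, the following is a minimal Python sketch of the brute-force baseline that an index such as KNN-GiST is meant to avoid: compute the distance from the query point to every row and keep the k smallest. The place names, coordinates, and the `knn_brute_force` helper are made up for illustration and do not come from the talk.

```python
import heapq
import math

# Hypothetical sample data: (name, x, y). Not taken from the talk.
PLACES = [
    ("cafe", 1.0, 1.0),
    ("library", 4.0, 5.0),
    ("museum", 2.0, 2.0),
    ("park", 9.0, 9.0),
]


def knn_brute_force(places, qx, qy, k):
    """Return the names of the k places nearest to (qx, qy), closest first."""
    def distance(place):
        _, x, y = place
        return math.hypot(x - qx, y - qy)  # Euclidean distance to the query point

    # nsmallest inspects every element, so this is a full scan of the data.
    return [name for name, _, _ in heapq.nsmallest(k, places, key=distance)]


print(knn_brute_force(PLACES, 0.0, 0.0, 2))  # ['cafe', 'museum']
```

This scan touches every row, so its cost grows linearly with the size of the table. In PostgreSQL the same question is typically written as an `ORDER BY` on the distance operator `<->` with a `LIMIT`, which a KNN-capable index can answer by walking the index in distance order.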
Replication in PostgreSQL tutorial given at Postgres Conference 2019 by Abbas Butt
The document describes what approaches are available for replication in PostgreSQL and how to use them. For each replication approach that is available in PostgreSQL, the document describes how to configure a two-node cluster using that approach and how to perform failover in case of a failure.
Talk for YOW! by Brendan Gregg. "Systems performance studies the performance of computing systems, including all physical components and the full software stack to help you find performance wins for your application and kernel. However, most of us are not performance or kernel engineers, and have limited time to study this topic. This talk summarizes the topic for everyone, touring six important areas: observability tools, methodologies, benchmarking, profiling, tracing, and tuning. Included are recipes for Linux performance analysis and tuning (using vmstat, mpstat, iostat, etc.), overviews of complex areas including profiling (perf_events) and tracing (ftrace, bcc/BPF, and bpftrace/BPF), advice about what is and isn't important to learn, and case studies to see how it is applied. This talk is aimed at everyone: developers, operations, sysadmins, etc., and in any environment running Linux, bare metal or the cloud."
PostgreSQL is one of the most advanced relational databases. It offers superb replication capabilities. The most important features are: Streaming replication, Point-In-Time-Recovery, advanced monitoring, etc.
This is an introduction to PostgreSQL that provides a brief overview of PostgreSQL's architecture, features and ecosystem. It was delivered at NYLUG on Nov 24, 2014.
http://www.meetup.com/nylug-meetings/events/180533472/
Presentation that I gave as a guest lecture for a summer intensive development course at nod coworking in Dallas, TX. The presentation targets beginning web developers with little to no experience in databases, SQL, or PostgreSQL. I cover the creation of a database, creating records, reading/querying records, updating records, destroying records, joining tables, and a brief introduction to transactions.
The paperback version is available on lulu.com at http://goo.gl/fraa8o
This is the first volume of the PostgreSQL database administration book. The book covers the steps for installing, configuring and administering PostgreSQL 9.3 on Debian Linux. The book covers the logical and physical aspects of PostgreSQL. Two chapters are dedicated to the backup/restore topic.
Efficient Query Processing in Geographic Web Search Engines by Yen-Yu Chen
Geographic web search engines allow users to constrain and order search results in an intuitive manner by focusing a query on a particular geographic region. Geographic search technology, also called local search, has recently received significant interest from major search engine companies. Academic research in this area has focused primarily on techniques for extracting geographic knowledge from the web. In this paper, we study the problem of efficient query processing in scalable geographic search engines. Query processing is a major bottleneck in standard web search engines, and the main reason for the thousands of machines used by the major engines. Geographic search engine query processing is different in that it requires a combination of text and spatial data processing techniques. We propose several algorithms for efficient query processing in geographic search engines, integrate them into an existing web search query processor, and evaluate them on large sets of real data and query traces.
Building a Complex, Real-Time Data Management Application by Jonathan Katz
Congratulations: you've been selected to build an application that will manage whether or not the rooms for PGConf.EU are being occupied by a session!
On the surface, this sounds simple, but we will be managing the rooms of PGConf.EU, so we know that a lot of people will be accessing the system. Therefore, we need to ensure that the system can handle all of the eager users that will be flooding the PGConf.EU website checking to see what availability each of the PGConf.EU rooms has.
To do this, we will explore the following PostgreSQL features:
* Data types and their functionality, such as:
  * Date/Time types
  * Ranges
* Indexes such as:
  * GiST
  * SP-GiST
* Common Table Expressions and Recursion
* Set generating functions and LATERAL queries
* Functions and PL/pgSQL
* Triggers
* Logical decoding and streaming
We will be writing our application primarily with SQL, though we will sneak in a little bit of Python and use Kafka to demonstrate the power of logical decoding.
At the end of the presentation, we will have a working application, and you will be happy knowing that you provided a wonderful user experience for all PGConf.EU attendees made possible by the innovation of PostgreSQL!
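As a rough illustration of why range types suit the room-occupancy problem described above, the check below is the interval-overlap test that an availability query boils down to. It is a Python sketch with made-up session times, and the `overlaps` and `room_is_free` helpers are hypothetical names for this example. In PostgreSQL the equivalent test on range values is the overlap operator `&&`, which a GiST index can accelerate.

```python
from datetime import datetime


def overlaps(a_start, a_end, b_start, b_end):
    """Half-open intervals [start, end) overlap when each one
    starts before the other one ends."""
    return a_start < b_end and b_start < a_end


def room_is_free(sessions, start, end):
    """True if [start, end) collides with none of the existing sessions."""
    return not any(overlaps(s, e, start, end) for s, e in sessions)


# Hypothetical sessions already scheduled in one room.
sessions = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 10, 0)),
    (datetime(2024, 1, 1, 11, 0), datetime(2024, 1, 1, 12, 0)),
]

# Back-to-back is fine with half-open bounds: 10:00-11:00 fits between them.
print(room_is_free(sessions, datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 11, 0)))  # True
# 9:30-10:30 collides with the 9:00-10:00 session.
print(room_is_free(sessions, datetime(2024, 1, 1, 9, 30), datetime(2024, 1, 1, 10, 30)))  # False
```

Treating the end bound as exclusive is the design choice that lets adjacent sessions share a boundary without counting as a conflict.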
Ryan will expand on his popular blog series and drill down into the internals of the database. Ryan will discuss optimizing query performance, best indexing schemes, how to manage clustering (including meta and data nodes), the impact of IFQL on the database, the impact of cardinality on performance, TSI, and other internals that will help you architect better solutions around InfluxDB.
SQL-based databases have been around for decades and they power a wide range of applications. So what exactly do NoSQL databases bring to the table? In this webcast, you'll find out how NoSQL can liberate your development cycle, allow your application to scale and improve your system's uptime.
Nearly every application uses some sort of data storage. Proper data structure can lead to increased performance, reduced application complexity, and ensure data integrity. Foreign keys, indexes, and correct data types truly are your best friends when you respect them and use them for the correct purposes. Structuring data to be normalized and with the correct data types can lead to significant performance increases. Learn how to structure your tables to achieve normalization, performance, and integrity, by building a database from the ground up during this tutorial.
2015-12-05 Alexander Korotkov, Ivan Panchenko - Semi-structured data... by HappyDev
The emergence of a large number of NoSQL DBMSs is driven by the requirements of modern information systems, which most traditional relational databases do not satisfy. One such requirement is support for data whose structure is not defined in advance. However, choosing a NoSQL database for the sake of schemaless data can mean losing a number of advantages that mature SQL solutions provide, namely transactions and fast reading of rows from tables. PostgreSQL, a leading relational DBMS, supported semi-structured data long before the appearance of NoSQL. That support gained a second wind in the latest release in the form of the jsonb data type, which not only supports the JSON standard but also offers performance comparable to, or even exceeding, that of the most popular NoSQL DBMSs.
Similar to Indexing Complex PostgreSQL Data Types (20)
Vectors are the new JSON in PostgreSQL (SCaLE 21x) by Jonathan Katz
Vectors are a centuries-old, well-studied mathematical concept, yet they pose many challenges around efficient storage and retrieval in database systems. The heightened ease-of-use of AI/ML has led to a surge of interest in storing vector data alongside application data, leading to some unique challenges. PostgreSQL has seen this story before with JSON, when JSON became the lingua franca of the web. So how can you use PostgreSQL to manage your vector data, and what challenges should you be aware of?
In this session, we'll review what vectors are, how they are used in applications, and what users are looking for in vector storage and search systems. We'll then see how you can search for vector data in PostgreSQL, including looking at best practices for using pgvector, an extension that adds additional vector search capabilities to PostgreSQL. Finally, we'll review ongoing development in both PostgreSQL and pgvector that will make it easier and more performant to search vector data in PostgreSQL.
There are parallels between storing JSON data in PostgreSQL and storing vectors that are produced from AI/ML systems. This lightning talk briefly covers the similarities in use-cases in storing JSON and vectors in PostgreSQL, shows some of the use-cases developers have for querying vectors in Postgres, and some roadmap items for improving PostgreSQL as a vector database.
This talk explores PostgreSQL 15 enhancements (along with some history) and looks at how they improve developer experience (MERGE and SQL/JSON), optimize support for backups and compression, logical replication improvements, enhanced security and performance, and more.
Build a Complex, Realtime Data Management App with Postgres 14! by Jonathan Katz
Congratulations: you've been selected to build an application that will manage reservations for rooms!
On the surface, this sounds simple, but you are building a system for managing a high traffic reservation web page, so we know that a lot of people will be accessing the system. Therefore, we need to ensure that the system can handle all of the eager users that will be flooding the website checking to see what availability each room has.
Fortunately, PostgreSQL is prepared for this! And even better, we will be using Postgres 14 to make the problem even easier!
We will explore the following PostgreSQL features:
* Data types and their functionality, such as:
  * Date/Time types
  * Ranges / Multiranges
* Indexes such as:
  * GiST
* Common Table Expressions and Recursion (though multiranges will make things easier!)
* Set generating functions and LATERAL queries
* Functions and PL/pgSQL
* Triggers
* Logical decoding and streaming
We will be writing our application primarily with SQL, though we will sneak in a little bit of Python and use Kafka to demonstrate the power of logical decoding.
At the end of the presentation, we will have a working application, and you will be happy knowing that you provided a wonderful user experience for all users made possible by the innovation of PostgreSQL!
Get Your Insecure PostgreSQL Passwords to SCRAM by Jonathan Katz
Passwords: they just seem to work. You connect to your PostgreSQL database and you are prompted for your password. You type in the correct character combination, and presto! you're in, safe and sound.
But what if I told you that all was not as it seemed? What if I told you there was a better, safer way to use passwords with PostgreSQL? What if I told you it was imperative that you upgraded, too?
PostgreSQL 10 introduced SCRAM (Salted Challenge Response Authentication Mechanism), defined in RFC 5802, as a way to securely authenticate passwords. The SCRAM algorithm lets a client and server validate a password without ever sending the password, whether plaintext or a hashed form of it, to each other, using a series of cryptographic methods.
In this talk, we will look at:
* A history of the evolution of password storage and authentication in PostgreSQL
* How SCRAM works with a step-by-step deep dive into the algorithm (and convince you why you need to upgrade!)
* SCRAM channel binding, which helps prevent MITM attacks during authentication
* How to safely set and modify your passwords, as well as how to upgrade to SCRAM-SHA-256 (which we will do live!)
all of which will be explained by some adorable elephants and hippos!
At the end of this talk, you will understand how SCRAM works, how to ensure your PostgreSQL drivers support it, how to upgrade your passwords to use SCRAM-SHA-256, and why you want to tell other PostgreSQL password mechanisms to SCRAM!
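To make the "validate a password without ever sending it" claim more concrete, here is a minimal Python sketch of the key derivation and proof check described in RFC 5802, using SHA-256. It is an illustration under stated assumptions, not PostgreSQL's implementation: it skips password normalization (SASLprep), the server signature, and the real message exchange, and the `auth_message` bytes are a hypothetical stand-in for the concatenated handshake messages. The function names are invented for this example.

```python
import hashlib
import hmac
import os


def derive_keys(password: str, salt: bytes, iterations: int):
    """Derive ClientKey and StoredKey as in RFC 5802 (normalization omitted)."""
    salted = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    client_key = hmac.new(salted, b"Client Key", hashlib.sha256).digest()
    stored_key = hashlib.sha256(client_key).digest()  # what the server keeps
    return client_key, stored_key


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def client_proof(password: str, salt: bytes, iterations: int, auth_message: bytes) -> bytes:
    """ClientProof = ClientKey XOR HMAC(StoredKey, AuthMessage)."""
    client_key, stored_key = derive_keys(password, salt, iterations)
    signature = hmac.new(stored_key, auth_message, hashlib.sha256).digest()
    return xor_bytes(client_key, signature)


def server_verify(stored_key: bytes, auth_message: bytes, proof: bytes) -> bool:
    """Recover ClientKey from the proof and check that it hashes to StoredKey."""
    signature = hmac.new(stored_key, auth_message, hashlib.sha256).digest()
    recovered_client_key = xor_bytes(proof, signature)
    return hmac.compare_digest(hashlib.sha256(recovered_client_key).digest(), stored_key)


# Hypothetical values for the demonstration.
salt = os.urandom(16)
iterations = 4096
auth_message = b"stand-in for the concatenated handshake messages"

# Setup: the server stores only salt, iteration count, and StoredKey.
_, stored_key = derive_keys("correct horse", salt, iterations)

good = client_proof("correct horse", salt, iterations, auth_message)
bad = client_proof("wrong password", salt, iterations, auth_message)
print(server_verify(stored_key, auth_message, good))  # True
print(server_verify(stored_key, auth_message, bad))   # False
```

Only the proof crosses the wire in this sketch, and because it is bound to the per-session `auth_message`, a captured proof is of no use in a later session with different nonces.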
Safely Protect PostgreSQL Passwords - Tell Others to SCRAM by Jonathan Katz
PostgreSQL 10 introduced SCRAM (Salted Challenge Response Authentication Mechanism), defined in RFC 5802, as a way to securely authenticate passwords. The SCRAM algorithm lets a client and server validate a password without ever sending the password, whether plaintext or a hashed form of it, to each other, using a series of cryptographic methods.
At the end of this talk, you will understand how SCRAM works, how to ensure your PostgreSQL drivers support it, how to upgrade your passwords to use SCRAM-SHA-256, and why you want to tell other PostgreSQL password mechanisms to SCRAM!
Operating PostgreSQL at Scale with Kubernetes by Jonathan Katz
The maturation of containerization platforms has changed how people think about creating development environments and has eliminated many inefficiencies for deploying applications. These concepts and technologies have made their way into the PostgreSQL ecosystem as well, and tools such as Docker and Kubernetes have enabled teams to run their own “database-as-a-service” on the infrastructure of their choosing.
All this sounds great, but if you are new to the world of containers, it can be very overwhelming to find a place to start. In this talk, which centers around demos, we will see how you can get PostgreSQL up and running in a containerized environment with some advanced sidecars in only a few steps! We will also see how it extends to a larger production environment with Kubernetes, and what the future holds for PostgreSQL in a containerized world.
We will cover the following:
* Why containers are important and what they mean for PostgreSQL
* Create a development environment with PostgreSQL, pgadmin4, monitoring, and more
* How to use Kubernetes to create your own "database-as-a-service"-like PostgreSQL environment
* Trends in the container world and how they will affect PostgreSQL
At the conclusion of the talk, you will understand the fundamentals of how to use container technologies with PostgreSQL and be on your way to running a containerized PostgreSQL environment at scale!
Using PostgreSQL With Docker & Kubernetes - July 2018 by Jonathan Katz
The maturation of containerization platforms has changed how people think about creating development environments and has eliminated many inefficiencies for deploying applications. These concepts and technologies have made their way into the PostgreSQL ecosystem as well, and tools such as Docker and Kubernetes have enabled teams to run their own “database-as-a-service” on the infrastructure of their choosing.
In this talk, we will cover the following:
- Why containers are important and what they mean for PostgreSQL
- Setting up and managing PostgreSQL along with pgadmin4 and monitoring
- Running PostgreSQL on Kubernetes with a Demo
- Trends in the container world and how they will affect PostgreSQL
An Introduction to Using PostgreSQL with Docker & Kubernetes by Jonathan Katz
The maturation of containerization platforms has changed how people think about creating development environments and has eliminated many inefficiencies for deploying applications. These concepts and technologies have made their way into the PostgreSQL ecosystem as well, and tools such as Docker and Kubernetes have enabled teams to run their own “database-as-a-service” on the infrastructure of their choosing.
In this talk, we will cover the following:
- Why containers are important and what they mean for PostgreSQL
- Setting up and managing a PostgreSQL container
- Extending your setup with a pgadmin4 container
- Container orchestration: What this means, and how to use Kubernetes to leverage database-as-a-service with PostgreSQL
- Trends in the container world and how they will affect PostgreSQL
Developing and Deploying Apps with the Postgres FDW by Jonathan Katz
I couldn't wait to use the Postgres Foreign Data Wrapper (postgres_fdw) in a project; imagine being able to read and write data to many databases all from a single database! I finally found a project where it made sense to use this amazing technology.
I mapped out my architecture and began to code, and realized there were some things that did not work as expected: I could not call remote functions or insert into a table with a serial primary key and have it autoupdate. I found workarounds (which I will share), so the project went on.
We tested the setup, everything seemed to work well, and then we went to deploy to production. And then the real fun began.
Despite the title, I still love the Postgres FDW but wanted to provide some cautionary tales from a hybrid developer/DBA perspective on how to properly use them in your working environment. This talk will cover:
* Basic Postgres FDW setup in a development environment vs. production environment
* Handling some common FDW use cases that you think are trivial but are not
* Working with advanced Postgres constructs such as schemas and sequences with FDWs
* Putting it all together to make sure your production application is safe with your FDWs
* ...and when you really, really need to make a remote call and it is not supported by an FDW, how to do that too!
What's the great thing about a database? Why, it stores data of course! However, one feature that makes a database useful is the different data types that can be stored in it, and the breadth and sophistication of the data types in PostgreSQL is second-to-none, including some novel data types that do not exist in any other database software!
This talk will take an in-depth look at the special data types built right into PostgreSQL version 9.4, including:
* INET types
* UUIDs
* Geometries
* Arrays
* Ranges
* Document-based Data Types:
  * Key-value store (hstore)
  * JSON (text [JSON] & binary [JSONB])
We will also have some cleverly concocted examples to show how all of these data types can work together harmoniously.
Webscale PostgreSQL - JSONB and Horizontal Scaling Strategies by Jonathan Katz
All data is relational and can be represented through relational algebra, right? Perhaps, but there are other ways to represent data, and the PostgreSQL team continues to work on making it easier and more efficient to do so!
With the upcoming 9.4 release, PostgreSQL is introducing the "JSONB" data type, which allows for fast, compressed storage of JSON-formatted data and for quick retrieval. And JSONB comes with all the benefits of PostgreSQL, like its data durability, MVCC, and of course, access to all the other data types and features in PostgreSQL.
How fast is JSONB? How do we access data stored with this type? What can it do with the rest of PostgreSQL? What can't it do? How can we leverage this new data type and make PostgreSQL scale horizontally? Follow along with our presentation as we try to answer these questions.
Securing your Kubernetes cluster: a step-by-step guide to success! by KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... by UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
* See how to accelerate model training and optimize model performance with active learning
* Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
* Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
2. About
• Jonathan S. Katz
  – CTO, VenueBook
  – Co-Organizer, NYC PostgreSQL User Group
  – PGConf NYC 2015
    • Mar 25 - 27, 2015
    • New York Marriott Downtown
    • http://nyc.pgconf.us
  – @jkatz05
3. Quick Overview
• Introductory Talk with demos and fun
• B-trees
• GiST: Generalized Search Trees
• GIN: Generalized Inverted Index
• SP-GiST: Space Partitioned Generalized Search Trees
4. Assumptions
• PostgreSQL 9.3+
  • most will be 9.0+
• PostGIS 2.0+
  • Believe it will work for most available versions
• PostgreSQL 9.4 beta?
20. What We Learned
• Without any data structure around search, we rely on "hope"
• Assumed "unique values" and "equality"
  – would have to scan all rows otherwise
• …and what about:
  – INSERT
  – UPDATE
  – DELETE
21. What We Need
• Need a data structure for search that:
  – allows efficient lookups
  – plays nicely with disk I/O
  – does not take too long for updates
22. B-Trees
• "default" index
• quick traversal to leaf nodes
• leaf nodes pre-sorted
• node size designed to fit in disk block size
  – "degree" of nodes has max-size
• theoretical performance
  – reads: O(log n)
  – writes: O(log n)
  – space: O(n)
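The O(log n) read cost comes from searching keys that are kept in sorted order. As a rough sketch only, the Python below uses a flat sorted list to stand in for the pre-sorted leaf level; it is not PostgreSQL's page layout, and `keys`, `find_equal`, and `find_range` are illustrative names that do not appear in the talk.

```python
import bisect

# A sorted list stands in for the pre-sorted leaf nodes of a B-tree.
keys = [3, 8, 15, 15, 23, 42, 57, 91]

def find_equal(sorted_keys, value):
    """Return the positions of all entries equal to value using binary search."""
    lo = bisect.bisect_left(sorted_keys, value)
    hi = bisect.bisect_right(sorted_keys, value)
    return list(range(lo, hi))

def find_range(sorted_keys, low, high):
    """Return entries with low <= key <= high (the BETWEEN case)."""
    lo = bisect.bisect_left(sorted_keys, low)
    hi = bisect.bisect_right(sorted_keys, high)
    return sorted_keys[lo:hi]

print(find_equal(keys, 15))
print(find_range(keys, 10, 50))
```

Because the entries are ordered, equality and range predicates both reduce to locating one or two boundaries, which is why the same structure serves `=`, `<`, `>`, and `BETWEEN`.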
23. B-Trees and PostgreSQL
• supports
  – <=, <, =, >, >=
  – BETWEEN, IN
  – IS NOT NULL, IS NULL
  – LIKE in specific case of 'plaintext%'
  – ~ in specific case of '^plaintext'
  – ILIKE and ~* if pattern starts with nonalpha characters
• does not support
  • IS NOT DISTINCT FROM
24. B-Trees and PostgreSQL
• data types supported
  – any data type with all the equality operators defined
  – number types
    • integer, numeric, decimal
  – text
    • char, varchar, text
  – date / times
    • timestamptz, timestamp, date, time, timetz, interval
  – arrays, ranges
27. Demo #1 Notes
• Index maintenance
• VACUUM – "cleans up" after writes on table / indexes
  – ANALYZE – keeps statistics up-to-date for planner

VACUUM ANALYZE tablename;

• Good idea to leave autovacuum on
28. Indexing in Production
• CREATE INDEX CONCURRENTLY
• REINDEX
  – corruption, bloat, invalid
• FILLFACTOR
  – 10 – 100
  – default: 90
  – strategy: lower % for heavier write activity
• TABLESPACE
• NULLS LAST, NULLS FIRST
29. Demo #2: Partial Indexes
CREATE INDEX indexname ON tablename (columnname)
WHERE somecondition;
30. Demo #2 Notes
• Partial Indexes are
  – good if known to query limited subset of table
  – take up less space
  – allow for much quicker writes
• Like all good things, do not overuse and saturate your I/O
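The space and write savings follow from indexing only the rows that satisfy the WHERE condition. A minimal sketch, assuming a hypothetical table of `(id, status)` rows where only the 'pending' rows are queried regularly; `build_index` is an illustrative helper, not a PostgreSQL API.

```python
# Hypothetical rows: (id, status). Only 'pending' rows are queried often.
rows = [(1, "done"), (2, "pending"), (3, "done"), (4, "pending"), (5, "done")]

def build_index(table, predicate=lambda row: True):
    """Map id -> row position, indexing only rows that satisfy predicate."""
    return {row[0]: pos for pos, row in enumerate(table) if predicate(row)}

# Full index: one entry per row.
full_index = build_index(rows)

# Partial index: entries only for rows matching the WHERE condition.
partial_index = build_index(rows, lambda row: row[1] == "pending")

print(len(full_index), len(partial_index))
```

Rows that fail the predicate never touch the partial index, so writes to them skip index maintenance entirely; the cost is that the index can only answer queries restricted to the same subset.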
31. Unique Indexes
• only for B-trees
• NULL not unique
• use UNIQUE constraints – automatically create indexes

CREATE TABLE foo (bar int UNIQUE);
-- or
CREATE UNIQUE INDEX foo_bar_idx ON foo (bar);
ALTER TABLE foo ADD CONSTRAINT a_unique
UNIQUE USING INDEX a_idx;
32. Multi-Column Indexes
• Useful for
  – querying two columns that are frequently queried together
  – enforcing UNIQUEness across columns
    • n.b. creating UNIQUE constraint on table creates UNIQUE INDEX
• PostgreSQL supports
  – up to 32 columns
  – B-tree, GiST, GIN
• Be careful of how you choose initial column order!
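Column order matters because a multi-column B-tree sorts entries by the first column, then by the second. The sketch below is an assumption-laden illustration using a sorted list of tuples; the data and the helpers `lookup_by_leading` and `lookup_by_second` are hypothetical.

```python
import bisect

# Index entries sorted by (col1, col2), as in CREATE INDEX ... (col1, col2).
entries = sorted([("US", 555), ("CA", 416), ("US", 212), ("UK", 20), ("CA", 604)])

def lookup_by_leading(index, col1):
    """Binary-search the contiguous run of entries whose first column matches."""
    lo = bisect.bisect_left(index, (col1,))
    hi = bisect.bisect_left(index, (col1, float("inf")))
    return index[lo:hi]

def lookup_by_second(index, col2):
    """There is no ordering on col2 alone, so every entry must be examined."""
    return [entry for entry in index if entry[1] == col2]

print(lookup_by_leading(entries, "US"))
print(lookup_by_second(entries, 416))
```

A predicate on the leading column narrows the search to one contiguous slice, while a predicate on only the trailing column degrades to a scan of the whole index in this model.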
33. Multi-Column Indexes
CREATE INDEX multicolumn_idx ON
tablename (col1, col2);

CREATE UNIQUE INDEX pn_idx ON
phone_numbers (country_code, national_number)
WHERE extension IS NULL;
34. Demo #3 Notes
• Multi-column indexes can be
  – efficient for speed + space
  – inefficient with performance
• Usage depends on your application needs
35. Expression Indexes
• can index on expressions to speed up lookups
  – e.g. case insensitive email addresses
  – can use functions or scalars
    • (x * y) / 100
    • COALESCE(first_name, '') || ' ' || COALESCE(last_name, '')
    • LOWER(email_address)
• tradeoff: slower writes
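The case-insensitive email example can be pictured as an index keyed on the result of the expression rather than on the stored value. This is only a sketch: the `users` rows and the `find_by_email` helper are made up for illustration.

```python
# Hypothetical rows: (id, email_address) with inconsistent casing.
users = [(1, "Alice@Example.com"), (2, "bob@example.com"), (3, "CAROL@EXAMPLE.COM")]

# The "index" stores the result of the expression LOWER(email_address),
# computed once per write instead of once per row per query.
lower_email_index = {}
for user_id, email in users:
    lower_email_index.setdefault(email.lower(), []).append(user_id)

def find_by_email(index, email):
    """Look up using the same expression the index was built on."""
    return index.get(email.lower(), [])

print(find_by_email(lower_email_index, "ALICE@example.COM"))
```

The lookup only works when the query applies the same expression as the index, and every insert or update has to re-evaluate that expression, which is the write tradeoff noted on the slide.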
39. Geometric Data Types
CREATE TABLE points (coord point);

CREATE INDEX points_idx ON points (coord);
ERROR: data type point has no default
operator class for access method "btree"
HINT: You must specify an operator class
for the index or define a default operator
class for the data type.
40. GiST
• "generalized search tree"
• infrastructure that provides template to create arbitrary indexing schemes
  – supports concurrency, logging, searching – only have to define behavior
  – user-defined operator class
    • <<, &<, &>, >>, <<|, &<|, |&>, |>>, @>, <@, ~=, &&
  – have to implement functions in interface
• supports lossless + lossy indexes
• provides support for "nearest-neighbor" queries – "KNN-GiST"

CREATE INDEX points_coord_gist_idx ON points
USING gist(coord);
42. Demo #5 Notes
• GiST indexes on geometric types radically speed up reads
• Writes are slower due to distance calculation
• Index size can be very big
43. PostGIS
• For when you are doing real things with shapes
• (and geographic information systems)
44. PostGIS + Indexes
• B-Tree?
• R-Tree?
  • PostGIS docs do not recommend using just an R-Tree index
• GiST
  • overlaps! containment!
  • uses a combination of GiST + R-Tree
45. PostGIS + GiST
2-D
CREATE INDEX zipcodes_geom_gist_idx ON zipcodes
USING gist(geom);
N-D (PostGIS 2.0+)
CREATE INDEX zipcodes_geom_gist_idx ON zipcodes
USING gist(geom gist_geometry_ops_nd);
46. Example - USA Zipcode Boundaries
• 33,120 rows
• geom: MultiPolygon
• 52MB without indexes
• With geometry GiST + integer B-Tree: 869MB
47. What Zipcode Is My Office In?
• Geocoded Address
• Lat,Long = 40.7356197,-73.9891102
• PostGIS: POINT(-73.9891102 40.7356197)
• 4269 - "SRID" - unique ID for coordinate system definitions
SELECT zcta5ce10 AS zipcode
FROM zipcodes
WHERE ST_Contains(
geom, --MultiPolygon
ST_GeomFromText('POINT(-73.9891102 40.7356197)', 4269)
);
48. What Zipcode Is My Office In?
• No Index
Seq Scan on zipcodes (cost=0.00..15382.00 rows=1 width=6) (actual time=64.780..5153.485 rows=1 loops=1)
  Filter: ((geom && '0101000020AD100000F648DE944D7F52C08CE54CC9285E4440'::geometry) AND _st_contains(geom, '0101000020AD100000F648DE944D7F52C08CE54CC9285E4440'::geometry))
  Rows Removed by Filter: 33119
Total runtime: 5153.505 ms
49. What Zipcode Is My Office In?
• Here's the GiST:
Index Scan using zipcodes_geom_gist on zipcodes (cost=0.28..8.54 rows=1 width=6) (actual time=0.120..0.207 rows=1 loops=1)
  Index Cond: (geom && '0101000020AD100000F648DE944D7F52C08CE54CC9285E4440'::geometry)
  Filter: _st_contains(geom, '0101000020AD100000F648DE944D7F52C08CE54CC9285E4440'::geometry)
  Rows Removed by Filter: 1

Total runtime: 0.235 ms
51. Full Text Search
• PostgreSQL offers full text search with the tsearch2 engine
  – algorithms for performing FTS
  – to_tsvector('english', content) @@ to_tsquery('irish & conference | meeting')
  – provides indexing capabilities for efficient search
52. Test Data Set
• Wikipedia English category titles – all 1,823,644 that I downloaded
53. Full-Text Search: Basics
SELECT title
FROM category
WHERE
to_tsvector('english', title) @@ to_tsquery('united & kingdom');

title
-----
Lists of railway stations in the United Kingdom
Political history of the United Kingdom
Military of the United Kingdom
United Kingdom constitution
Television channels in the United Kingdom
United Kingdom
Roman Catholic secondary schools in the United Kingdom
[results truncated]

QUERY PLAN
------------
Seq Scan on category (cost=0.00..49262.77 rows=46 width=29) (actual time=21.900..16809.890 rows=8810 loops=1)
  Filter: (to_tsvector('english'::regconfig, title) @@ to_tsquery('united & kingdom'::text))
  Rows Removed by Filter: 1814834

Total runtime: 16811.108 ms
54. Full-Text Search + GiST
CREATE INDEX category_title_gist_idx ON category
USING gist(to_tsvector('english', title));

SELECT title
FROM category
WHERE to_tsvector('english', title) @@ to_tsquery('united & kingdom');

QUERY PLAN
-------------
Bitmap Heap Scan on category (cost=4.77..182.47 rows=46 width=29) (actual time=75.517..180.650 rows=8810 loops=1)
  Recheck Cond: (to_tsvector('english'::regconfig, title) @@ to_tsquery('united & kingdom'::text))
  -> Bitmap Index Scan on category_title_gist_idx (cost=0.00..4.76 rows=46 width=0) (actual time=74.687..74.687 rows=8810 loops=1)
       Index Cond: (to_tsvector('english'::regconfig, title) @@ to_tsquery('united & kingdom'::text))

Total runtime: 181.354 ms
55. Full Text Search + GiST
• GiST indexes can produce false positives
  – "documents" represented by fixed length signature
    • words are hashed into single bits and concatenated
  – when false positive occurs, row is returned and checked to see if false match
• Extra validations = performance degradation
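The signature idea can be sketched in a few lines. This is a simplified model, not PostgreSQL's implementation: the 16-bit width, the CRC32 hash, and the names `signature` and `search` are all assumptions chosen for illustration.

```python
import zlib

SIGNATURE_BITS = 16  # assumed width; kept small so collisions are plausible

def signature(words):
    """Hash each word to a single bit and OR the bits together."""
    sig = 0
    for word in words:
        sig |= 1 << (zlib.crc32(word.encode("utf-8")) % SIGNATURE_BITS)
    return sig

def search(documents, query_words):
    """Return (candidates found via signatures, matches after the recheck)."""
    query_sig = signature(query_words)
    candidates = [
        doc for doc in documents
        if (signature(doc.lower().split()) & query_sig) == query_sig
    ]
    # Recheck: a signature hit may be a false positive, so compare real words.
    matches = [
        doc for doc in candidates
        if set(query_words) <= set(doc.lower().split())
    ]
    return candidates, matches

docs = [
    "Military of the United Kingdom",
    "United Kingdom constitution",
    "United States Navy",
]
candidates, matches = search(docs, ["united", "kingdom"])
print(matches)
```

Different words can land on the same bit, so the signature test never misses a real match but may admit extra rows; the recheck step is where the extra validation cost on the slide comes from.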
56. Performance Summary with GiST
• initial index build takes a while => slow writes
• reads are quick
• Table size: 271MB
• Index size: 83MB
57. GIN Index
• "generalized inverted index"
• supports searching within composite data
  – arrays, full-text documents, hstore
• key is stored once and points to composites it is contained in
• like GiST, provides index infrastructure to extend GIN based on behavior
  – supports operators <@, @>, =, &&
• GIN performance ⬄ log(# unique things)
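"Key is stored once and points to composites it is contained in" describes an inverted index. The sketch below models that with a dictionary of posting sets; the `rows` data and the helpers `contains_all` and `overlaps` are hypothetical, and GIN's on-disk structure is more involved than this.

```python
from collections import defaultdict

# Hypothetical rows: row id -> array value.
rows = {1: [1, 2, 3], 2: [2, 3, 4], 3: [3, 4, 5], 4: [7, 8, 9]}

# Each distinct key is stored once and points to every row that contains it.
posting_lists = defaultdict(set)
for row_id, values in rows.items():
    for value in values:
        posting_lists[value].add(row_id)

def contains_all(index, wanted):
    """Rows whose array contains every wanted key (the @> operator)."""
    lists = [index.get(key, set()) for key in wanted]
    return sorted(set.intersection(*lists)) if lists else []

def overlaps(index, wanted):
    """Rows whose array shares at least one key with wanted (the && operator)."""
    return sorted(set().union(*(index.get(key, set()) for key in wanted)))

print(contains_all(posting_lists, [3, 4]))
print(overlaps(posting_lists, [5, 7]))
```

Lookup cost depends on finding each key among the distinct keys, which is the intuition behind the slide's note that performance tracks the number of unique things rather than the number of rows.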
58. Full Text Search + GIN
CREATE INDEX category_title_gin_idx ON category
USING gin(to_tsvector('english', title));

EXPLAIN ANALYZE SELECT title FROM category WHERE to_tsvector('english',
title) @@ to_tsquery('united & kingdom');

QUERY PLAN
-------
Bitmap Heap Scan on category (cost=28.36..206.06 rows=46 width=29) (actual time=8.864..14.674 rows=8810 loops=1)
  Recheck Cond: (to_tsvector('english'::regconfig, title) @@ to_tsquery('united & kingdom'::text))
  -> Bitmap Index Scan on category_title_gin_idx (cost=0.00..28.35 rows=46 width=0) (actual time=7.905..7.905 rows=8810 loops=1)
       Index Cond: (to_tsvector('english'::regconfig, title) @@ to_tsquery('united & kingdom'::text))

Total runtime: 15.157 ms
59. Performance Summary with GIN
• index build was much quicker
• significant speedup from no index
  – (12,000ms => 15ms)
• significant speedup from GiST
  – (181ms => 15ms)
• Table size: 271MB
• Index size:
  • 9.3: 71MB
  • 9.4 beta 1: 40MB
60. What Was Not Discussed
• Word density
  – prior to 9.3, performance issues with greater word density
• Type of text data – phrases vs paragraphs
61. Full Text Search – GiST vs GIN
• Reads
  – overall, GIN should win
• Writes
  – traditionally, GiST has better performance for writes
  – GIN
    • FASTUPDATE
    • 9.4: compression
62. Regular Expression Indexes
• Added in 9.3
• Support for LIKE/ILIKE wildcard indexes in 9.1
  – title LIKE '%ab%e'
• Uses pg_trgm extension + GIN

CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX category_title_regex_idx ON
category USING GIN(title gin_trgm_ops);
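pg_trgm indexes three-character substrings of each value. The Python below approximates the extraction rule described in the pg_trgm documentation (lowercase, alphanumeric words only, two spaces of padding before each word and one after); the `candidate_rows` helper and the sample titles are hypothetical, and the real planner's handling of regular expressions is considerably more elaborate.

```python
import re

def trigrams(text):
    """Approximate pg_trgm: lowercase, split into alphanumeric words,
    pad each word with two leading spaces and one trailing space."""
    result = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result

def candidate_rows(titles, required_trigrams):
    """Titles containing every required trigram; these still need a recheck."""
    return [t for t in titles if required_trigrams <= trigrams(t)]

titles = ["Islands of Scotland", "Peninsula", "Finland"]

# A pattern such as LIKE '%land%' requires the unpadded trigrams of "land".
print(candidate_rows(titles, {"lan", "and"}))
```

The trigram filter only narrows the candidate set; rows that contain the right trigrams in the wrong arrangement are discarded afterwards, which corresponds to a "Rows Removed by Index Recheck" line in a query plan.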
63. Regular Expressions - No Index
EXPLAIN ANALYZE SELECT title FROM category
WHERE title ~ '(([iI]sland(s)?)|([pP]eninsula))$';

QUERY PLAN
----------
Seq Scan on category (cost=0.00..40144.55 rows=182 width=29) (actual time=2.509..4260.792 rows=5878 loops=1)
  Filter: (title ~ '(([iI]sland(s)?)|([pP]eninsula))$'::text)
  Rows Removed by Filter: 1817766

Total runtime: 4261.204 ms
64. Regular Expressions - Indexed
CREATE INDEX category_title_regex_idx ON category
USING gin(title gin_trgm_ops);

EXPLAIN ANALYZE SELECT title FROM category
WHERE title ~ '(([iI]sland(s)?)|([pP]eninsula))$';

QUERY PLAN
-----------
Bitmap Heap Scan on category (cost=197.41..871.77 rows=182 width=29) (actual time=107.445..146.713 rows=5878 loops=1)
  Recheck Cond: (title ~ '(([iI]sland(s)?)|([pP]eninsula))$'::text)
  Rows Removed by Index Recheck: 4712
  -> Bitmap Index Scan on category_title_regex_idx (cost=0.00..197.37 rows=182 width=0) (actual time=106.645..106.645 rows=10590 loops=1)
       Index Cond: (title ~ '(([iI]sland(s)?)|([pP]eninsula))$'::text)

Total runtime: 147.026 ms
65. Range Types
• stores range data
  – 1 to 6
  – 2013-10-29 – 2013-11-2
• easy-to-use operators to check inclusion, overlaps
• built-in types: integers, numerics, dates, timestamps
• extensible
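The inclusion and overlap checks reduce to comparisons on the range bounds. A minimal sketch, assuming non-empty half-open integer ranges `[lower, upper)` like the ones int4range prints; the `IntRange` type and both functions are illustrative, not part of PostgreSQL.

```python
from typing import NamedTuple

class IntRange(NamedTuple):
    """Non-empty half-open integer range [lower, upper)."""
    lower: int
    upper: int

def overlaps(a, b):
    """The && operator: the ranges share at least one point."""
    return a.lower < b.upper and b.lower < a.upper

def contains(a, b):
    """The @> operator: every point of b lies inside a."""
    return a.lower <= b.lower and b.upper <= a.upper

query = IntRange(100, 200)
print(overlaps(query, IntRange(10, 102)))
print(overlaps(query, IntRange(10, 100)))
```

Because the upper bound is exclusive, `[10,100)` and `[100,200)` touch without overlapping, which is why the second check is false.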
66. Range Type Examples
--find all ranges that overlap with [100, 200)

SELECT * FROM ranges WHERE int4range(100, 200) && range;

range
-----------
[10,102)
[13,102)
[18,108)
[32,101)
[34,134)
[37,123)
[43,111)
[46,132)
[48,107)
[results truncated]

QUERY PLAN
-----------
Seq Scan on ranges (cost=0.00..14239.86 rows=7073 width=32) (actual time=0.018..185.411 rows=143 loops
  Filter: ('[100,200)'::int4range && range)
  Rows Removed by Filter: 999857

Total runtime: 185.439 ms
67. Range Types + GiST
CREATE INDEX ranges_range_gist_idx ON ranges USING gist(range);

EXPLAIN ANALYZE SELECT * FROM ranges WHERE
int4range(100, 200) && range;

QUERY PLAN
------------
Bitmap Heap Scan on ranges (cost=5.29..463.10 rows=130 width=13) (actual time=0.120..0.135 rows=144 loops=1)
  Recheck Cond: ('[100,200)'::int4range && range)
  -> Bitmap Index Scan on ranges_range_gist_idx (cost=0.00..5.26 rows=130 width=0) (actual time=0.109..0.109 rows=144 loops=1)
       Index Cond: ('[100,200)'::int4range && range)

Total runtime: 0.168 ms
68. SP-GiST
• space-partitioned generalized search tree
• ideal for non-balanced data structures
  – k-d trees, quad-trees, suffix trees
  – divides search space into partitions of unequal size
• matching partitioning rule = fast search
• traditionally for "in-memory" transactions, converted to play nicely with I/O
69. Range Types: GiST vs SP-GiST
CREATE TABLE ranges AS
SELECT
int4range(
(random()*5)::int,
(random()*5)::int + 5
) AS range
FROM generate_series(1,<N>) x;

SELECT * FROM ranges WHERE range <operator>
int4range(3,6);
70. N = 1,000,000
CREATE INDEX ranges_range_spgist_idx ON ranges USING spgist(range);
ERROR: unexpected spgdoinsert() failure
Fixed in 9.3.2

Operator  GiST Used  GiST Time  SP-GiST Used  SP-GiST Time
=         Yes        121        Yes           37
&&        No         257        No            260
@>        No         223        No            223
<@        Yes        163        Yes           111
<<        Yes        95         Yes           5
>>        Yes        95         Yes           25
&<        No         184        No            185
&>        No         203        No            203
71. Range Types: GiST vs SP-GiST
CREATE TABLE ranges AS
SELECT
int4range(x, x + (random()*5)::int + 5)
AS range
FROM generate_series(1,<N>) x;
72. N = 250,000

Operator  GiST Used  GiST Time  SP-GiST Used  SP-GiST Time
=         Yes        0.5        Yes           0.7
&&        Yes        0.3        Yes           0.3
@>        Yes        0.3        Yes           0.3
<@        Yes        0.06       Yes           0.25
<<        No         40         Yes           0.2
>>        No         60         No            60
&<        Yes        0.3        Yes           0.2
&>        No         74         No            61
74. Integer Arrays
CREATE UNLOGGED TABLE int_arrays AS
SELECT ARRAY[x, x + 1, x + 2] AS data
FROM generate_series(1,1000000) x;

CREATE INDEX int_arrays_data_idx ON
int_arrays (data);

CREATE INDEX int_arrays_data_gin_idx ON
int_arrays USING GIN(data);
75. B-Tree(?) + Integer Arrays
EXPLAIN ANALYZE
SELECT *
FROM int_arrays
WHERE 5432 = ANY (data);

QUERY PLAN
-----------
Seq Scan on int_arrays (cost=0.00..30834.00 rows=5000 width=33) (actual time=1.260..159.197 rows=3 loops=1)
  Filter: (5432 = ANY (data))
  Rows Removed by Filter: 999997

Total runtime: 159.222 ms
76. GIN + Integer Arrays
EXPLAIN ANALYZE
SELECT *
FROM int_arrays
WHERE ARRAY[5432] <@ data;

QUERY PLAN
-----------
Bitmap Heap Scan on int_arrays (cost=70.75..7680.14 rows=5000 width=33) (actual time=0.020..0.021 rows=3 loops=1)
  Recheck Cond: ('{5432}'::integer[] <@ data)
  -> Bitmap Index Scan on int_arrays_data_gin_idx (cost=0.00..69.50 rows=5000 width=0) (actual time=0.014..0.014 rows=3 loops=1)
       Index Cond: ('{5432}'::integer[] <@ data)

Total runtime: 0.045 ms
77. Hash Indexes
• only work with "=" operator
• are still not WAL logged as of 9.4 beta 1
  – not crash safe
  – not replicated
78. btree_gin
CREATE EXTENSION IF NOT EXISTS btree_gin;

CREATE UNLOGGED TABLE numbers AS
SELECT (random() * 2000)::int AS a FROM generate_series(1, 2000000) x;

CREATE INDEX numbers_gin_idx ON numbers USING gin(a);

EXPLAIN ANALYZE SELECT * FROM numbers WHERE a = 1000;

QUERY PLAN
------------
Bitmap Heap Scan on numbers (cost=113.50..9509.26 rows=10000 width=4) (actual time=0.388..1.459 rows=991 loops=1)
  Recheck Cond: (a = 1000)
  -> Bitmap Index Scan on numbers_gin_idx (cost=0.00..111.00 rows=10000 width=0) (actual time=0.232..0.232 rows=991 loops=1)
       Index Cond: (a = 1000)

Total runtime: 1.563 ms
79. btree_gin vs btree
-- btree
SELECT pg_size_pretty(pg_total_relation_size('numbers_idx'));
pg_size_pretty
----------------
43 MB

-- GIN
SELECT pg_size_pretty(pg_total_relation_size('numbers_gin_idx'));
pg_size_pretty
----------------
16 MB

• Only use GIN over btree if you have a lot of duplicate entries
80. hstore - the PostgreSQL Key-Value Store
CREATE EXTENSION IF NOT EXISTS hstore;

CREATE UNLOGGED TABLE keypairs AS
SELECT
(x || ' => ' || (x + (random() * 5)::int))::hstore AS data
FROM generate_series(1,1000000) x;
SELECT pg_size_pretty(pg_relation_size('keypairs'));

SELECT * FROM keypairs WHERE data ? '3';
data
----------
"3"=>"4"

EXPLAIN ANALYZE SELECT * FROM keypairs WHERE data ? '3';
QUERY PLAN
-----------
Seq Scan on keypairs (cost=0.00..19135.06 rows=950 width=32) (actual time=0.065..208.808 rows=1 loops=1)
  Filter: (data ? '3'::text)
  Rows Removed by Filter: 999999

Total runtime: 208.825 ms
81. hstore - the PostgreSQL Key-Value Store
CREATE INDEX keypairs_data_gin_idx ON keypairs
USING gin(data);

EXPLAIN ANALYZE SELECT * FROM keypairs WHERE data ? '3';

QUERY PLAN
-----------
Bitmap Heap Scan on keypairs (cost=27.75..2775.66 rows=1000 width=24) (actual time=0.044..0.045 rows=1 loops=1)
  Recheck Cond: (data ? '3'::text)
  -> Bitmap Index Scan on keypairs_data_gin_idx (cost=0.00..27.50 rows=1000 width=0) (actual time=0.039..0.039 rows=1 loops=1)
       Index Cond: (data ? '3'::text)

Total runtime: 0.071 ms
82. JSONB: Coming in 9.4
INSERT INTO documents
SELECT row_to_json(ROW(x, x + 2, x + 3))::jsonb
FROM generate_series(1,1000000) x;

CREATE INDEX documents_data_gin_idx ON documents
USING gin(data jsonb_path_ops);

SELECT * FROM documents WHERE data @> '{ "f1": 10 }';
data
--------------------------------
{"f1": 10, "f2": 12, "f3": 13}

Execution time: 0.084 ms
83. Awesome vs WTF: A Note On Operator Indexability
EXPLAIN ANALYZE SELECT * FROM documents WHERE data @> '{ "f1": 10 }';

QUERY PLAN
-----------
Bitmap Heap Scan on documents (cost=27.75..3082.65 rows=1000 width=66) (actual time=0.029..0.030 rows=1 loops=1)
  Recheck Cond: (data @> '{"f1": 10}'::jsonb)
  Heap Blocks: exact=1
  -> Bitmap Index Scan on documents_data_gin_idx (cost=0.00..27.50 rows=1000 width=0) (actual time=0.014..0.014 rows=1 loops=1)
       Index Cond: (data @> '{"f1": 10}'::jsonb)

Execution time: 0.084 ms

EXPLAIN ANALYZE SELECT * FROM documents WHERE '{ "f1": 10 }' <@ data;

QUERY PLAN
-----------
Seq Scan on documents (cost=0.00..24846.00 rows=1000 width=66) (actual time=0.015..245.924 rows=1 loops=1)
  Filter: ('{"f1": 10}'::jsonb <@ data)
  Rows Removed by Filter: 999999

Execution time: 245.947 ms
84. For More Information…
• http://www.postgresql.org/docs/current/static/indexes.html
• http://www.postgresql.org/docs/current/static/gist.html
• http://www.postgresql.org/docs/current/static/gin.html
• http://www.postgresql.org/docs/current/static/spgist.html
• GiST + GIN + Full Text Search:
  – http://www.postgresql.org/docs/current/static/textsearch-indexes.html
85. Conclusion
• Postgres has *a lot* of different types of indexes, and variations on each of its engines
• Extensions make use of PostgreSQL indexes
  – PostGIS
• Need to understand where index usage is appropriate in your application