4th of December 2014
Introduction to RDBMS Indexes
With PostgreSQL
by Clément Prévost
Introduction
● You DO need indexes
● How a basic index works
● When indexes are used
● Different index types for different problems
You DO need Indexes
● Disk access is AWFULLY slow
● RAM is limited
1000px by 12500px
From query to data
SQL query → Parse → Rewrite → Plan → Execute → Data
● Inputs to these stages: metadata, rules, ... and statistics, indexes, ...
The planner job
SELECT * FROM product WHERE price = 5000;
QUERY PLAN
-------------------------------------------------------------------------...
Hash Join (cost=131.45..9340.48 rows=1 width=17) ...
Hash Cond: (m.id = c.mid) ...
-> Bitmap Heap Scan on movie m (cost=108.25..9296.06 rows=5654 width=...
Recheck Cond: (year > 2010) ...
-> Bitmap Index Scan on movie_year_idx (cost=0.00..106.83 ...
Index Cond: (year > 2010) ...
-> Hash (cost=23.13..23.13 rows=6 width=4) (actual time=0.314..0.3 ...
...
● Create an optimal execution plan that minimizes disk access
○ Input: a declarative description of the query. Output: an imperative way to retrieve the data.
○ Fetch this page, then that page, then look up this table using these ids, then call function f, then …
○ If the underlying table changes in volume, another plan may become optimal, so the planner must adapt
● Does not execute the query to pick a plan
○ Brute-forcing every possible plan would be too expensive
○ Uses statistics instead: table size, estimated row count, most common column values, ...
● Cost-based planners
○ The most widely used type of planner
○ Each function call, disk access, and sort operation is assigned a cost
○ Compute a cost estimate for each candidate plan and keep the cheapest one
○ PostgreSQL lets the DBA fine-tune these costs to match the database hardware
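The cost comparison described above can be sketched as follows. This is a toy model under stated assumptions, not PostgreSQL's actual cost formula: the function names, the formulas, and the table sizes are invented for illustration. Only the constant names mirror PostgreSQL's `seq_page_cost`, `random_page_cost` and `cpu_tuple_cost` settings with their default values.

```python
# Toy cost-based planner: estimate a cost per candidate plan, keep the cheapest.
# The formulas are deliberately simplified and are NOT PostgreSQL's real cost model.

SEQ_PAGE_COST = 1.0     # cost of reading one page sequentially
RANDOM_PAGE_COST = 4.0  # cost of reading one page at a random position
CPU_TUPLE_COST = 0.01   # cost of processing one row


def seq_scan_cost(table_pages, table_rows):
    """Read every page in order and examine every row."""
    return table_pages * SEQ_PAGE_COST + table_rows * CPU_TUPLE_COST


def index_scan_cost(matching_rows, index_depth=3):
    """Walk down the index, then fetch one random page per matching row (pessimistic)."""
    return (index_depth + matching_rows) * RANDOM_PAGE_COST + matching_rows * CPU_TUPLE_COST


def choose_plan(table_pages, table_rows, selectivity):
    """Return (plan name, cost) of the cheapest plan; selectivity comes from statistics."""
    matching_rows = int(table_rows * selectivity)
    candidates = {
        "seq scan": seq_scan_cost(table_pages, table_rows),
        "index scan": index_scan_cost(matching_rows),
    }
    best = min(candidates, key=candidates.get)
    return best, candidates[best]


# A selective predicate favours the index; one matching half the table does not.
print(choose_plan(table_pages=10_000, table_rows=1_000_000, selectivity=0.0001))
print(choose_plan(table_pages=10_000, table_rows=1_000_000, selectivity=0.5))
```

Changing `RANDOM_PAGE_COST` in this sketch shifts the point at which the index stops being worthwhile, which is the kind of hardware tuning the last bullet refers to.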
The B-Tree index (Leaf nodes)
The B-Tree index (Structure)
Index features (planner PoV)
● Index access by ID
○ Fetch a single data row using the index structure
○ Efficient both computationally and in disk accesses once the table is big enough
○ Equality operators (=, IN, ...)
● Index range scan
○ Fetch a list of data pages based on an index
○ Use the index tree structure to find the first element, then follow the doubly linked list of leaf nodes until the range condition no longer holds
○ May be less efficient than a sequential scan! Keep your statistics up to date!
○ Range operators (<, >, <=, >=, BETWEEN, LIKE under certain circumstances, ...)
● Index-only scan
○ Don’t fetch the data pages if they are not needed
○ When the index already contains the requested column data, fetching the table row is unnecessary
○ Supported by most RDBMSs
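The three access paths above can be imitated with a sorted list standing in for the linked leaf level of a B-Tree. This is a minimal sketch, not how any RDBMS stores its index: the `rows` table, the `price_index` list and the function names are hypothetical, and a dictionary lookup stands in for a data page fetch.

```python
import bisect

# Table "heap": row id -> full row. Reading from it stands in for a data page fetch.
rows = {
    1: {"name": "pen", "price": 2},
    2: {"name": "book", "price": 15},
    3: {"name": "lamp", "price": 40},
    4: {"name": "desk", "price": 150},
    5: {"name": "mug", "price": 15},
}

# Index on price: sorted (key, row id) pairs, like the leaf level of a B-Tree.
price_index = sorted((row["price"], row_id) for row_id, row in rows.items())


def index_lookup(key):
    """Equality access: binary-search to the first match, read while the key matches."""
    pos = bisect.bisect_left(price_index, (key,))
    ids = []
    while pos < len(price_index) and price_index[pos][0] == key:
        ids.append(price_index[pos][1])
        pos += 1
    return [rows[i] for i in ids]  # one table fetch per matching row


def index_range_scan(low, high):
    """Range access: find the first key >= low, then walk the leaves until key > high."""
    pos = bisect.bisect_left(price_index, (low,))
    ids = []
    while pos < len(price_index) and price_index[pos][0] <= high:
        ids.append(price_index[pos][1])
        pos += 1
    return [rows[i] for i in ids]


def index_only_prices(low, high):
    """Index-only access: the index already holds price, so the table is never read."""
    pos = bisect.bisect_left(price_index, (low,))
    prices = []
    while pos < len(price_index) and price_index[pos][0] <= high:
        prices.append(price_index[pos][0])
        pos += 1
    return prices


print(index_lookup(15))
print(index_range_scan(10, 50))
print(index_only_prices(10, 50))
```

A wide range makes `index_range_scan` fetch almost every row one by one, which is why a sequential scan can win in that case.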
Index features (developer PoV)
● Speed up WHERE clauses
● Speed up joins
● May speed up ORDER BY + LIMIT
● May speed up GROUP BY
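Why an index may help ORDER BY + LIMIT can be shown with a small sketch. It assumes the index behaves like an already sorted list; the data and function names are hypothetical.

```python
import heapq

prices = [40, 2, 150, 15, 15, 7, 99]


def top_n_without_index(values, n):
    """No index: every value has to be examined to find the n smallest."""
    return heapq.nsmallest(n, values)


# With an index the keys are already stored in order (built once, maintained on writes).
price_index = sorted(prices)


def top_n_with_index(index, n):
    """Index in key order: read the first n entries and stop."""
    return index[:n]


print(top_n_without_index(prices, 3))
print(top_n_with_index(price_index, 3))
```

Both calls return the same rows; the difference is how many entries have to be read to produce them.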
Beware
● The filter expression must match the index content
○ The planner cannot use an index whose content does not match the query
○ Example: an index on column A may not be used for a query containing WHERE upper(A) = 'B'. The planner would have to call upper on every indexed value to use the index, which may not be the most efficient plan
○ Use an index on the expression if you cannot change the query to match the index content
● An index contains all the distinct values of your column
○ Index size matters! The index is stored on disk pages, just like the indexed data. If reading the index is too costly, it may not be used at all
○ Index usage statistics let you find and drop unused indexes that eat up your disk space
● On insert and update, the RDBMS has to update your index
○ The index has to be up to date at the end of every query; this is the “consistency” part of the RDBMS job
○ Wrapping multiple updates/inserts in a single transaction saves the per-query commit overhead, but the index is still updated for every modified row
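The upper(A) example above can be sketched with two sorted lists, one keyed by the raw value and one keyed by the expression. This is an illustration only: the `names` data, both index lists and the function names are hypothetical, and a real RDBMS stores indexes very differently.

```python
import bisect

names = ["alice", "Bob", "bob", "Carol", "BOB"]

# Plain index on the column: sorted by the raw value.
plain_index = sorted((name, pos) for pos, name in enumerate(names))

# Index on the expression: sorted by upper(name), computed once when the row is written.
upper_index = sorted((name.upper(), pos) for pos, name in enumerate(names))


def find_upper_with_plain_index(target):
    """The plain index is ordered by name, not upper(name): every entry is checked."""
    calls = 0
    hits = []
    for name, pos in plain_index:
        calls += 1  # one upper() call per indexed entry
        if name.upper() == target:
            hits.append(pos)
    return sorted(hits), calls


def find_upper_with_expression_index(target):
    """The expression index is ordered by upper(name): binary search works directly."""
    i = bisect.bisect_left(upper_index, (target,))
    hits = []
    while i < len(upper_index) and upper_index[i][0] == target:
        hits.append(upper_index[i][1])
        i += 1
    return sorted(hits)


print(find_upper_with_plain_index("BOB"))
print(find_upper_with_expression_index("BOB"))
```

Both functions find the same rows, but the first one calls `upper` once per indexed entry, which is the cost the planner weighs against a plain sequential scan.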
Concatenated index
● Efficient on specific queries only
○ Queries whose WHERE clauses or joins reference all index columns
○ The order of the columns in the index matters
● Rule of thumb: if the first column is not used, neither is the index
○ The second column is only reachable through the first column, and so on
○ Consider switching the index column order
● A concatenated index is often bigger than the sum of 2 normal B-Tree indexes
○ Values from the second column are duplicated
○ Check index usage regularly and drop unused concatenated indexes
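The rule of thumb about the first column can be sketched with a list of tuples sorted on both columns. The column names, the data and the function names below are hypothetical, chosen only to illustrate the ordering.

```python
import bisect

# Concatenated index on (last_name, first_name): entries sorted by the whole tuple.
people_index = sorted([
    ("Durand", "Alice"),
    ("Durand", "Zoe"),
    ("Martin", "Bob"),
    ("Martin", "Alice"),
    ("Petit", "Bob"),
])


def find_by_last_name(last_name):
    """Leading column only: matches are contiguous, so binary search applies."""
    start = bisect.bisect_left(people_index, (last_name,))
    result = []
    for entry in people_index[start:]:
        if entry[0] != last_name:
            break
        result.append(entry)
    return result


def find_by_first_name(first_name):
    """Second column only: matches are scattered, so the whole index is scanned."""
    return [entry for entry in people_index if entry[1] == first_name]


print(find_by_last_name("Martin"))
print(find_by_first_name("Bob"))
```

If most queries filter on the first name alone, building the index as (first_name, last_name) makes those matches contiguous instead, which is what switching the column order achieves.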
Common types of indexes
● B-Tree
○ Default on most RDBMSs
○ Fits most use cases
● GIN
○ Generalized Inverted Index
○ Allows indexing values that are not atomic
○ An index structure storing a set of (key, posting list) pairs
● Hash
○ Very fast on the equality operator
○ Only supports the equality operator
● GiST
○ Generalized Search Tree
○ Not an index in itself, but a framework for building index-like data structures
○ Extensible to new data types
○ http://www.postgresql.org/docs/9.3/static/gist-intro.html
● And many more
○ Any data structure that allows the RDBMS to avoid page fetches can be considered an index
○ An index type can be specific to an RDBMS, a table structure, a column type, a business constraint, ...
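The ideas behind the hash and GIN entries above can be sketched with plain dictionaries. This only illustrates the concepts of equality-only lookup and (key, posting list) pairs; it is not PostgreSQL's implementation, and the data and names are hypothetical.

```python
from collections import defaultdict

documents = {
    1: ["postgresql", "index", "btree"],
    2: ["index", "hash"],
    3: ["postgresql", "gin", "index"],
}
authors = {1: "ana", 2: "bob", 3: "ana"}

# Hash-style index: key -> row ids. It answers equality only; keys have no usable order.
author_index = defaultdict(list)
for doc_id, author in authors.items():
    author_index[author].append(doc_id)

# Inverted index (the idea behind GIN): each element of a non-atomic value
# maps to a posting list of the rows that contain it.
tag_index = defaultdict(list)
for doc_id, tags in documents.items():
    for tag in tags:
        tag_index[tag].append(doc_id)


def docs_with_all_tags(*tags):
    """Intersect the posting lists of every requested tag."""
    postings = [set(tag_index.get(tag, [])) for tag in tags]
    return sorted(set.intersection(*postings)) if postings else []


print(author_index["ana"])
print(docs_with_all_tags("postgresql", "index"))
```

A range question such as "authors between a and b" gets no help from `author_index`, which is the limitation noted for hash indexes.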
Key things to remember
● Disk accesses are evil (even on SSD)
● The main goal is to find the most efficient plan
● Thinking like a query planner is the key to a well-indexed table
Thanks!
I would be pleased to answer any questions!
PS: This is how awesome use-the-index-luke.com is:
Questions?
For online questions, please leave a comment on the article.
Join the community!
(in Paris)
Social networks:
● Follow us on Twitter: https://twitter.com/steamlearn
● Like us on Facebook: https://www.facebook.com/steamlearn
SteamLearn is an Inovia initiative: inovia.fr
Want to be in the audience? Join the meetup group!
http://www.meetup.com/Steam-Learn/
References
● http://www.postgresql.org/docs/9.3/static/internals.html
● http://use-the-index-luke.com/sql/table-of-contents
● https://news.ycombinator.com/item?id=702713
● http://www.postgresql.org/docs/9.3/static/indexes-types.html

Steam Learn: Introduction to RDBMS indexes

  • 1. 4th of December 2014 Introduction to RDBMS Indexes With PostgreSQL by Clément Prévost
  • 2. Introduction
      ● You DO need indexes
      ● This is how a basic index works
      ● Indexes are used in these cases
      ● Different index types for different problems
  • 3. You DO need Indexes
      ● Disk access is AWFULLY slow
      ● RAM is limited
      1000px by 12500px
  • 4. From query to data
      [Diagram: SQL Query -> Parse -> Rewrite -> Plan -> Execute -> Data. Inputs shown alongside the stages: metadata, rules, ...; statistics, indexes, ...]
  • 5. The planner job
      SELECT * FROM product WHERE price = 5000;
      QUERY PLAN
      -------------------------------------------------------------------------...
      Hash Join (cost=131.45..9340.48 rows=1 width=17) ...
      Hash Cond: (m.id = c.mid) ...
      -> Bitmap Heap Scan on movie m (cost=108.25..9296.06 rows=5654 width=...
      Recheck Cond: (year > 2010) ...
      -> Bitmap Index Scan on movie_year_idx (cost=0.00..106.83 ...
      Index Cond: (year > 2010) ...
      -> Hash (cost=23.13..23.13 rows=6 width=4) (actual time=0.314..0.3 ...
      ...
  • 6. The planner job
      ● Create an optimal execution plan that minimizes disk access
          ○ Input: declarative description of the query. Output: imperative way to retrieve the data
          ○ Fetch this page then this page, then look up this table using these ids, then call function f, then …
          ○ If the underlying table changes in volume, another plan may become optimal; the planner must adapt
      ● Doesn’t execute the final query
          ○ Brute-forcing every possible plan would be sub-optimal
          ○ Uses statistics to access information such as the table size, estimated row count, most common column values, ...
      ● Cost-based planners
          ○ Most widely used type of planner
          ○ Each function call, disk access and sort operation is associated with a cost
          ○ Compute a cost estimate for each plan and keep the cheapest one
          ○ Allow the DBA to fine-tune costs based on the database hardware (PostgreSQL)
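The cost-based selection described on this slide can be sketched with a toy model. The three constants mirror PostgreSQL's default `seq_page_cost`, `random_page_cost` and `cpu_tuple_cost` settings; the formulas, the function names and the table sizes are simplifying assumptions for illustration, not the planner's actual arithmetic.

```python
# Toy cost model. The constants match PostgreSQL's default settings, but the
# formulas below are simplified assumptions, not the planner's real ones.
SEQ_PAGE_COST = 1.0      # reading the next page in order
RANDOM_PAGE_COST = 4.0   # reading a page at an arbitrary position
CPU_TUPLE_COST = 0.01    # examining one row


def seq_scan_cost(table_pages, table_rows):
    """Read every page in order and examine every row."""
    return table_pages * SEQ_PAGE_COST + table_rows * CPU_TUPLE_COST


def index_scan_cost(matching_rows, tree_height=3):
    """Descend the index, then fetch one random page per matching row (worst case)."""
    descent = tree_height * RANDOM_PAGE_COST
    fetches = matching_rows * (RANDOM_PAGE_COST + CPU_TUPLE_COST)
    return descent + fetches


def choose_plan(table_pages, table_rows, matching_rows):
    """Estimate each candidate plan from statistics and keep the cheapest one."""
    candidates = {
        "seq scan": seq_scan_cost(table_pages, table_rows),
        "index scan": index_scan_cost(matching_rows),
    }
    return min(candidates, key=candidates.get)


if __name__ == "__main__":
    # Hypothetical statistics: 10,000 pages holding 1,000,000 rows.
    print(choose_plan(10_000, 1_000_000, matching_rows=10))
    print(choose_plan(10_000, 1_000_000, matching_rows=500_000))
```

With these made-up statistics a filter matching 10 rows favours the index, while a filter matching half the table favours reading it sequentially, which is why the same query can get a different plan once the table grows or the statistics change.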
  • 7. The B-Tree index (Leaf nodes)
  • 8. The B-Tree index (Structure)
  • 9. Index features (planner PoV)
      ● Index access by ID
          ○ Fetch a single data row using the index structure
          ○ Efficient both computationally and in terms of disk access as soon as the table is big enough
          ○ Equality operators (=, IN, ...)
      ● Index range scan
          ○ Fetch a list of data pages based on an index
          ○ Use the index tree structure to find the first element, then follow the doubly linked list until the condition no longer holds
          ○ May be less efficient than a sequential scan! Keep your stats up to date!
          ○ Range operators (<, >, <=, >=, BETWEEN, LIKE under certain circumstances, ...)
      ● Index only scan
          ○ Don’t fetch the data pages if not needed
          ○ The index already contains the column data, so fetching the table row may be unnecessary
          ○ Supported by most RDBMSs
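The equality access and range scan described on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions: a sorted Python list of `(key, row_id)` pairs stands in for the chain of linked leaf nodes, `bisect` stands in for the tree descent, and the entries are made up.

```python
import bisect

# Assumption: a sorted list of (key, row_id) pairs stands in for the chain of
# B-Tree leaf nodes; a real index keeps them in linked disk pages.
LEAF_ENTRIES = [(5, "r1"), (10, "r7"), (10, "r9"), (20, "r2"), (35, "r4"), (50, "r3")]


def first_position(entries, key):
    """Stand-in for the tree descent: position of the first entry with key >= `key`."""
    return bisect.bisect_left(entries, (key,))


def index_lookup(entries, key):
    """Equality access: descend once, then read entries while the key matches."""
    pos = first_position(entries, key)
    row_ids = []
    while pos < len(entries) and entries[pos][0] == key:
        row_ids.append(entries[pos][1])
        pos += 1
    return row_ids


def index_range_scan(entries, low, high):
    """Range access: find the first entry >= low, then walk forward while key <= high."""
    pos = first_position(entries, low)
    row_ids = []
    while pos < len(entries) and entries[pos][0] <= high:
        row_ids.append(entries[pos][1])
        pos += 1
    return row_ids
```

Both functions return row ids only. In a real table each id still costs a data page fetch unless the query can be answered from the index alone, which is the case the index only scan covers.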
  • 10. Index features (developer PoV)
      ● Speed up WHERE clauses
      ● Speed up joins
      ● May speed up ORDER BY + LIMIT
      ● May speed up GROUP BY
  • 11. Beware
      ● The filter expression must match the index content
          ○ The planner cannot use an index whose content does not match the query
          ○ Ex: an index on column A may not be used by a query containing WHERE upper(A) = 'B'. To use it, the planner would have to call upper on every indexed value, which may not be the most efficient plan
          ○ Use indexes on expressions if you can’t change the query to match the index content
      ● An index contains all the distinct values of your column
          ○ Index size matters! The index is stored on disk pages, just like the indexed data. If the cost of index retrieval is enormous, the index may not be used at all
          ○ Index usage statistics let you find and drop unused indexes that eat up your disk space
      ● On insert and update, the RDBMS has to update your index
          ○ The index has to be up to date at the end of every query; this is the “consistency” part of the RDBMS job
          ○ Wrapping multiple updates/inserts in a single transaction spreads the commit overhead across all of them; the index entries are still maintained for every modified row
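The first warning on this slide can be observed directly. The sketch below assumes an in-memory SQLite database as a stand-in for PostgreSQL so that it runs without a server; the `product` table, its columns and both index names are hypothetical. PostgreSQL accepts the same `CREATE INDEX ... (upper(name))` form, but its `EXPLAIN` output looks different.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL so the sketch runs
# without a server; table, column and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO product (name, price) VALUES (?, ?)",
    [(f"item{i}", i) for i in range(1000)],
)
conn.execute("CREATE INDEX product_name_idx ON product (name)")

QUERY = "SELECT * FROM product WHERE upper(name) = 'ITEM42'"


def plan(sql):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# The existing index stores name, not upper(name), so it cannot serve the filter.
plan_before = plan(QUERY)

# An index on the expression itself matches the filter.
conn.execute("CREATE INDEX product_upper_name_idx ON product (upper(name))")
plan_after = plan(QUERY)

print(plan_before)
print(plan_after)
```

The first plan is a full table scan even though `name` is indexed; the second is a search through `product_upper_name_idx`. The exact wording of the plan text varies between SQLite versions.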
  • 12. Concatenated index
  • 13. Concatenated index
      ● Efficient on specific queries only
          ○ Queries with multiple WHERE conditions or joins referencing all index columns
          ○ Individual column ordering is important
      ● Rule of thumb: if the first column is not used, neither is the index
          ○ The second column is only reachable through the first column, and so on
          ○ Consider switching the index column order
      ● A concatenated index is often bigger than the sum of 2 normal B-Tree indexes
          ○ Values from the second column are duplicated
          ○ Be sure to check index usage often to drop unused concatenated indexes
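The rule of thumb about the first column follows from the sort order of the index entries. Here is a minimal sketch, assuming a sorted list of `(country, city, row_id)` tuples as a stand-in for a concatenated B-Tree index on `(country, city)`; the data and function names are made up.

```python
import bisect

# Assumption: a sorted list of (country, city, row_id) tuples stands in for a
# concatenated B-Tree index on (country, city); the data is made up.
INDEX = sorted([
    ("FR", "Lyon", 3), ("FR", "Paris", 1), ("FR", "Paris", 6),
    ("IT", "Rome", 2), ("UK", "London", 4), ("UK", "York", 5),
])


def lookup_by_prefix(index, *prefix):
    """Use the sort order: jump to the first matching entry, then read forward."""
    pos = bisect.bisect_left(index, prefix)
    row_ids = []
    while pos < len(index) and index[pos][:len(prefix)] == prefix:
        row_ids.append(index[pos][-1])
        pos += 1
    return row_ids


def lookup_by_city_only(index, city):
    """No leading column: entries for one city are scattered, so scan everything."""
    return [row_id for _country, entry_city, row_id in index if entry_city == city]
```

`lookup_by_prefix(INDEX, "FR")` and `lookup_by_prefix(INDEX, "FR", "Paris")` both jump straight to a contiguous run of entries, whereas `lookup_by_city_only` has to visit every entry because cities are only sorted within each country.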
  • 14. Common types of indexes
      ● B-Tree
          ○ Default on most RDBMSs
          ○ Matches most use cases
      ● GIN
          ○ Generalized Inverted Index
          ○ Allows indexing when the values are not atomic
          ○ It is an index structure storing a set of (key, posting list) pairs
      ● Hash
          ○ Ultra-fast on the equality operator
          ○ Only supports the equality operator
      ● GiST
          ○ Generalized Search Tree
          ○ This is not an index, this is a framework to create index-like data structures
          ○ Extensible to new data types
          ○ http://www.postgresql.org/docs/9.3/static/gist-intro.html
      ● And many more
          ○ Any data structure that allows the RDBMS to avoid page fetches is considered an index
          ○ An index type can be specific to an RDBMS, a table structure, a column type, a business constraint, ...
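The "(key, posting list) pairs" description of GIN can be made concrete with a small inverted index. This sketch only illustrates the idea and is not how PostgreSQL implements GIN; the rows, tags and function names are hypothetical.

```python
from collections import defaultdict

# Assumption: made-up rows whose "tags" value is not atomic (a list per row).
ROWS = {
    1: ["postgresql", "index"],
    2: ["index", "btree"],
    3: ["postgresql", "gin", "index"],
}


def build_inverted_index(rows):
    """Map each key to its posting list: the sorted ids of the rows containing it."""
    postings = defaultdict(list)
    for row_id in sorted(rows):
        for key in set(rows[row_id]):
            postings[key].append(row_id)
    return dict(postings)


def rows_containing_all(postings, keys):
    """Intersect posting lists to answer 'contains all of these keys'."""
    lists = [set(postings.get(key, ())) for key in keys]
    return sorted(set.intersection(*lists)) if lists else []
```

A query such as `rows_containing_all(build_inverted_index(ROWS), ["postgresql", "index"])` only touches two posting lists instead of inspecting the tag list of every row.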
  • 15. Key things to remember
      ● Disk accesses are evil (even on SSD)
      ● The main goal is to find the most efficient plan
      ● Thinking like a query planner is the key to a well indexed table
  • 16. Thanks! I would be pleased to answer any questions! PS: This is how awesome use-the-index-luke.com is:
  • 17. Questions? For online questions, please leave a comment on the article.
  • 18. Join the community! (in Paris)
      Social networks:
      ● Follow us on Twitter: https://twitter.com/steamlearn
      ● Like us on Facebook: https://www.facebook.com/steamlearn
      SteamLearn is an Inovia initiative: inovia.fr
      Want to be in the audience? Join the meetup group! http://www.meetup.com/Steam-Learn/
  • 19. References
      ● http://www.postgresql.org/docs/9.3/static/internals.html
      ● http://use-the-index-luke.com/sql/table-of-contents
      ● https://news.ycombinator.com/item?id=702713
      ● http://www.postgresql.org/docs/9.3/static/indexes-types.html