SlideShare a Scribd company logo
SQL Joins
and Query Optimization

How it works


by Brian Gallagher
www.BrianGallagher.com
         G
Why
Wh JOIN?
 Combine data from multiple tables
 Reduces the number of queries needed
 Moves processing burden to database
  server
 Improves data integrity (over combining
  data in
  d t i program code)
                    d )
Types of JOINs
T      f JOIN
   INNER
        the most common, return all rows with matching records
                        ,                             g
        SELECT * FROM T1 INNER JOIN T2 ON T1.fld1 = T2.fld2
   LEFT or LEFT OUTER
        return all rows on the left (first) table and right (second)
        If no matching record on the right side, NULL-values for each field are returned
        SELECT * FROM T1 LEFT OUTER JOIN T2 ON T1 fld1 = T2 fld2  T1.fld1   T2.fld2
   FULL or FULL OUTER
        return all rows on the left (first) table and right (second)
        If no matching record on the left or right side, NULL-values for each field are returned
        S
         SELECT * FROM T1 FULL OU
               C         O         U       OUTER JO JOIN T2 ON T1.fld1 = T2.fld2
                                                                O     . d        . d
   CROSS
        the least common, return all possible row combinations
        SELECT * FROM T1, T2
   RIGHT or RIGHT OUTER
      Don’t use them without a very good reason.
      They do not add any functionality over a LEFT JOIN and make code more confusing
      Works the same as the LEFT and LEFT OUTER JOINs, but the second table is the one
       which all rows are returned from. These two queries are functionally identical:
      SELECT * FROM T1 LEFT JOIN T2 ON T1.fld1 = T2.fld2
      SELECT * FROM T2 RIGHT JOIN T1 ON T1.fld1 = T2.fld2
Sample T t Data
S   l Test D t
   Sample Customer and Job records
INNER JOIN
LEFT (OUTER) JOIN
     (     )
One-To-Many LEFT JOIN
          y
RIGHT (OUTER) JOIN
      (     )
FULL (OUTER) JOIN
     (     )
FULL (OUTER) JOIN
 FULL JOIN is not supported in MS-Access
 Can be emulated using UNION queries


   They
    Th are rarely used
               l     d
CROSS JOINs

The Join you never use…
except every time
Making Big
M ki it Bi
   A CROSS JOIN combines every row on the left
    table with every row in the right table
   Resulting recordset will b th t t l of b th row
    R     lti       d t ill be the total f both
    counts multiplied together
     Leftrow has 10 000 records
                   10,000
     Right row has 30,000 records
     Resulting recordset has 300,000,000 records
   If no JOIN condition is specified, a CROSS JOIN
    will be executed
Joins
J i each record t th other t bl
       h      d to the th table
CROSS JOIN’s resulting set of all
        JOIN s
record combinations
How C
H   Cross J i M k I
          Joins Make Inner J i
                           Joins
   Tables are joined using a CROSS JOIN
   JOIN criteria is evaluated, removing all records
    which d ’t match th “ON” condition
      hi h don’t    t h the         diti
    SELECT *
    FROM Customers C
    INNER JOIN Job J
    ON C.CustID = J.CustID
   The remaining rows are the recordset returned
    for the INNER JOIN
Using “ON” criteria to select records
One-to-Many CROSS JOIN
          y
One to Many: Cross Join to Inner Join
Unpredictable R
U    di t bl Record O d
                  d Orders
   Databases are not required to return records in
    any particular order
   Records may be returned by the order they are
    stored on disk or may be unordered, depending
    on how the query engine handles data
   If you need data in a particular order, use an
              dd t i         ti l      d
    ORDER BY clause
   Some databases may return data presorted by
    Primary key or Clustered index
     DO NOT DEPEND on this behavior
     It is not reliably portable across different databases
     Do not make a program rely on a behavior not
      specified by the program – bad programming style
Indexes

Unique, clustered, etc.
What
Wh t are Indexes
         I d
 Indexes are a pre-sorted list of database
  field values
 This list speeds up queries A LOT
     An index is much smaller than the full
      recordset
     An index tracks only the fields specified when
      created
Indexes and P f
I d       d Performance
   Indexes GREATLY speed up queries on the
    fields being indexed
   Indexes SLIGHTLY slow d
    I d                  l down INSERTS
                                INSERTS,
    UPDATES and DELETES
     The indexes need to be updated anytime a record
      changes
     This adds a small bit of overhead to the system
   The increase in query speed usually greatly
    offsets the slight extra load on INSERTS,
    UPDATES and DELETES
     Most  database use is typically reading (instead of
      writing)
Types of Indexes
T      fI d
   Simple Index
     Indexesthe field specified
     Can have multiple simple indexes
     Speeds up queries using the indexed field
   Composite Index
     Indexescombinations of fields
     Can have multiple composite indexes
     Speeds up queries using the specified
      combination of fi ld
         bi ti     f fields
Simple Indexes
Si l I d
   An Index contains a list of all field values
    and pointers to the record with that value
        p
How Indexes help SELECTs
               p
Composite I d
C     it Indexes
   Useful only when you will be querying multiple
    fields simultaneously
   Most selective (resulting in the fewest records
    matching) fields should be listed first
   Slight performance gain over multiple Simple
    Indexes when used correctly
   No performance gain (possibly even a
    performance loss) when used incorrectly.
Composite Index
   p
Searching I d
S    hi Indexes
 There are high performance algorithms for
              high-performance
  searching pre-sorted data
I d
  Index data stored i t
         d t t d internally as bi
                           ll  binary t
                                      trees
  (typically)
 Searching through
  binary tree much
  faster than reading
  through table
Design Strategy

Make it work
Make it fast
Make it pretty
        p y
Focus on what’s needed
F         h t’     d d
   Make your query as specific as possible
     Makes  joins simpler (and therefore faster)
     Returns no unwanted data
     Increases response time
     Reduces network and server load

 Put as much as possible in the WHERE
  clause
 J i your fi ld properly
  Join     fields      l
Maximum S l ti it / S
M i     Selectivity Specificity
                        ifi it

 If differents parts of the WHERE clause
  will result in smaller data sets than other
  ones, list them the ones that will most
  g
  greatly reduce the data first
          y
 List the others in the same order
Optimizing Queries

Faster is Better
Horizontal P titi i
H i    t l Partitioning
 Avoid extremely “wide” records (records
  with lots of fields)
                     )
 Split data into two tables with a common
  ID field
 All the record data is rarely used in all
  queries,
  queries so it reduces traffic and speeds
  up processing and disk reads
Add I d
    Indexes
   Make sure all fields searched on regularly
    are indexed
Define Clustered Index
D fi Cl t d I d
 The most-likely criteria to be sorted on
  should be defined as your clustered index
                         y
 Clustered index defines the order the
  records will physically be stored on disk
Normalize D t
N    li Data
 No redundant data
 Link related data
 Don’t go crazy
Denormalize D t
D      li Data
   If joining more 4 tables or more in a single
    q y,
    query, consider de-normalizing data
                                   g
    (combining data from external tables back
    into the main table))
     Can  return faster query results
     Risks include having to manage duplicate
      data in database and larger disk usage
Size records to fit database page
size
   SQL Server 7.0, 2000, and 2005 data pages are
    8K (8192 bytes) in size.
     Of  the
          th 8192 available b t i each d t page, only
                       il bl bytes in    h data         l
      8060 bytes are used to store a row. The rest is used
      for overhead.
     If you have a row that was 4031 bytes long, then only
      one row could fit into a data page, and the remaining
      4029 bytes left in the page would be empty
              y              p g               py
          Making each record 1 byte shorter would halve the disk
           access required to read the table
Use Primary K
U aPi       Key
   Make sure you have a primary key defined
   Ideally, it should be a single unique field
     You can define composite keys, but they are
      generally not needed if the database is designed
      p p y
      properly
   Most tables should have a unique TableNameID
    field
     This allows you to identify any row by a single unique
      value
Use
U an IDENTITY C l
              Column
   If there is:
     no Primary Key on the table
     no unique index on the table
   Then
     Add   an IDENTITY (unique value) column to
      the table
     Optionally, index the IDENTITY column if you
      will query it regularly
Move TEXT, NTEXT, and IMAGE
data into table
 These types normally stored outside the
  table (uses a pointer to the data in the
         (      p
  field)
 If these types will be searched frequently
                                  frequently,
  consider moving them into the database
  table s
  table’s storage with:
    sp_tableoption 'tablename', 'text in row', 'on'
      or
    sp_tableoption 'tablename', 'text in row', 'size'
Consider R li ti i advance
C   id Replication in d
   If Replication is to be used, factor the
    decision into the original design of the
                          g         g
    database
Use Built-in Referential I t it
U B ilt i R f       ti l Integrity
 Use foreign keys and validation
  constraints built into the database
 Don’t manage it in the application


   Benefits:
     Faster execution
     Can’t mess it up with application errors
Re-test h
R t t when changing servers
            h   i
   Retest and rebenchmark applications when
    moving to a new server, such as:
     Development
     Staging
     Production
   Problems often caused b
    P bl      ft        d by:
     More rows in test data on servers “closer” to
      production
     Server configurations different
     Tables not indexed the same way
Constraints are F t
C   t i t       Fast
   Constraints on fields are faster than
     Triggers
     Rules
     Defaults
Don’t duplicate ff t
D ’t d li t effort
   Don’t check for the same thing twice
     (duh, but it happens)
     Don’t use a trigger and a constraint to do the
      same thing g
     Same for constraints and defaults
     Same for constraints and rules
Limit
Li it records t b JOIN d
           d to be JOINed
   Use WHERE clause to minimize rows to
    be JOINed.
     Particularly   in the OUTER table of an OUTER
     JOIN
Index F i K
I d Foreign Keys
 Fields in a table that are a foreign key are
  not indexed automatically j
                            y just because
  they are a foreign key.
 Add these indexes manually
Minimize Duplicate
Mi i i D li t JOIN fi ld d t
                   field data

   JOINs are slower when there are few
    different keys in the j
                y         joining table
                                g
JOIN on unique indexed fi ld
          i    i d   d fields
   JOINs will perform fastest when joining
    indexed fields with no duplicate data
                             p
JOIN Numeric Fi ld
     N    i Fields
   JOINs on numeric fields perform much
    faster than JOINs on other datatype fields.
                                    yp
JOIN th exact same d t t
     the    t      datatypes

   JOINs should be to the exact same
    datatype for best p
         yp           performance
     Same type of field
     Same field length
     Same encoding (ASCII vs. Unicode, etc)
Use
U ANSI JOIN syntax
               t
   Improves readability
   Less likely to cause programmer errors
   No Aliases:
    SELECT fname, lname, department
      FROM names INNER JOIN departments ON
      names.employeeid
      names employeeid = departments employeeid
                         departments.employeeid
   Microsoft syntax example:
    SELECT fname, lname, department
      FROM names departments
           names,
      WHERE names.employeeid = departments.employeeid
   Code is more portable between databases
Use Table Aliases
U T bl Ali
   Shortens code
   Makes it easier to follow, especially with long queries
   Identifies which table each field is coming from
   No table aliases:
    SELECT fname, lname, department
      FROM names INNER JOIN departments ON
      names.employeeid = departments.employeeid
   Microsoft syntax example:
    SELECT N.fname, N.lname, D.department
      FROM names N INNER JOIN departments D ON
      N.employeeid = D.employeeid
Don’t
D ’t use SELECT *
   Requires additional parsing on the server to
    extract field names
   Returns unneccesary data (unless you are
    actually using every field)
   Returns duplicate data on JOINs (do you really
    need two copies of the RecordID field?)
   Can cause errors (in some databases) if JOINed
    tables have fields with the same name
Store i separate fil in fil
St    in      t files i filegroup

 For very large joins placing tables to be
  j
  joined in separate p y
              p      physical files within the
  same filegroup can improve performance
 SQL Server can spawn separate threads
  for processing each file
Don’t
D ’t use CROSS JOINs
               JOIN
 Unless actually needed (rarely) do not use
  a CROSS JOIN (returns all combinations
                     (
  of records on both sides of JOIN)
 People sometimes will do a CROSS JOIN
  and then use DISTINCT and GROUP BY
  to eliminate all the duplication
     Don’t   do that.
JOINs
JOIN vs. Subquery
         S b
   Depending on the specifics, either could be faster. Write
    both and test performance to be sure which is best.
   JOIN:
    SELECT a.*
      FROM Table1 a INNER JOIN Table2 b
      ON a.Table1ID = b.Table1ID
      WHERE b.Table2ID = 'SomeValue'
   Subquery:
    SELECT a.* FROM Table1 a WHERE Table1ID IN
           a.
      (SELECT Table1ID FROM Table2 WHERE Table2ID =
      'SomeValue')
Avoid Subqueries unless needed
A id S b     i     l       d d

 Avoid subqueries unless actually required
 Most subqueries can be expressed as
  JOINs
 Subqueries are generally harder to read
  and understand (for humans) and,
  therefore,
  therefore maintain
Use an Indexed View
(Enterprise 2000 and later only)
 An Indexed View maintains an updated
  record of how the tables are joined via a
                               j
  clustered index
 This slows INSERTs, UPDATEs and
              INSERTs
  DELETEs a bit, so consider the tradeoff
  for the faster queries
Use Database’s Performance
    Database s
Optimization Tools
 Most major databases have methods for
  monitoring and optimizing database
             g     p      g
  performance
 Google “databaseName optimizing” for
            databaseName optimizing
  tons of links
Avoid
A id DISTINCT when possible
               h       ibl
   DISTINCT clauses are frequently (and usually
    unintentionally) used to hide an incorrect JOIN
   Properly normalized database will not frequently
    need DISTINCT clauses
   Look for incorrect JOINs or create more explicit
    WHERE clauses to avoid needing DISTINCT
   Of course, use it when appropriate
The End
               (for now)

Google: Query Optimization
for lots more information

More Related Content

What's hot

Sql subquery
Sql  subquerySql  subquery
Sql subquery
Raveena Thakur
 
SQL Queries
SQL QueriesSQL Queries
SQL Queries
Nilt1234
 
SQL practice questions - set 3
SQL practice questions - set 3SQL practice questions - set 3
SQL practice questions - set 3
Mohd Tousif
 
Presentation slides of Sequence Query Language (SQL)
Presentation slides of Sequence Query Language (SQL)Presentation slides of Sequence Query Language (SQL)
Presentation slides of Sequence Query Language (SQL)
Punjab University
 
SQL : introduction
SQL : introductionSQL : introduction
SQL : introduction
Shakila Mahjabin
 
Sql queries presentation
Sql queries presentationSql queries presentation
Sql queries presentation
NITISH KUMAR
 
SQL Joins With Examples | Edureka
SQL Joins With Examples | EdurekaSQL Joins With Examples | Edureka
SQL Joins With Examples | Edureka
Edureka!
 
set operators.pptx
set operators.pptxset operators.pptx
set operators.pptx
Anusha sivakumar
 
Comandos utilizados en sql
Comandos utilizados en sqlComandos utilizados en sql
Comandos utilizados en sqlByron Eras
 
14. Query Optimization in DBMS
14. Query Optimization in DBMS14. Query Optimization in DBMS
14. Query Optimization in DBMSkoolkampus
 
Aggregate functions
Aggregate functionsAggregate functions
Aggregate functions
sinhacp
 
Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...
Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...
Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...
Beat Signer
 
SQL JOINS
SQL JOINSSQL JOINS
SQL JOINS
Swapnali Pawar
 
MySQL: Indexing for Better Performance
MySQL: Indexing for Better PerformanceMySQL: Indexing for Better Performance
MySQL: Indexing for Better Performance
jkeriaki
 
Sql joins
Sql joinsSql joins
Sql joins
Berkeley
 
SQL subquery
SQL subquerySQL subquery
SQL subquery
Vikas Gupta
 
Advanced Sql Training
Advanced Sql TrainingAdvanced Sql Training
Advanced Sql Training
bixxman
 
[APJ] Common Table Expressions (CTEs) in SQL
[APJ] Common Table Expressions (CTEs) in SQL[APJ] Common Table Expressions (CTEs) in SQL
[APJ] Common Table Expressions (CTEs) in SQL
EDB
 
Triggers
TriggersTriggers
Triggers
Pooja Dixit
 

What's hot (20)

Sql subquery
Sql  subquerySql  subquery
Sql subquery
 
SQL Queries
SQL QueriesSQL Queries
SQL Queries
 
SQL practice questions - set 3
SQL practice questions - set 3SQL practice questions - set 3
SQL practice questions - set 3
 
Presentation slides of Sequence Query Language (SQL)
Presentation slides of Sequence Query Language (SQL)Presentation slides of Sequence Query Language (SQL)
Presentation slides of Sequence Query Language (SQL)
 
SQL : introduction
SQL : introductionSQL : introduction
SQL : introduction
 
Sql queries presentation
Sql queries presentationSql queries presentation
Sql queries presentation
 
SQL Joins With Examples | Edureka
SQL Joins With Examples | EdurekaSQL Joins With Examples | Edureka
SQL Joins With Examples | Edureka
 
set operators.pptx
set operators.pptxset operators.pptx
set operators.pptx
 
Trigger
TriggerTrigger
Trigger
 
Comandos utilizados en sql
Comandos utilizados en sqlComandos utilizados en sql
Comandos utilizados en sql
 
14. Query Optimization in DBMS
14. Query Optimization in DBMS14. Query Optimization in DBMS
14. Query Optimization in DBMS
 
Aggregate functions
Aggregate functionsAggregate functions
Aggregate functions
 
Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...
Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...
Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...
 
SQL JOINS
SQL JOINSSQL JOINS
SQL JOINS
 
MySQL: Indexing for Better Performance
MySQL: Indexing for Better PerformanceMySQL: Indexing for Better Performance
MySQL: Indexing for Better Performance
 
Sql joins
Sql joinsSql joins
Sql joins
 
SQL subquery
SQL subquerySQL subquery
SQL subquery
 
Advanced Sql Training
Advanced Sql TrainingAdvanced Sql Training
Advanced Sql Training
 
[APJ] Common Table Expressions (CTEs) in SQL
[APJ] Common Table Expressions (CTEs) in SQL[APJ] Common Table Expressions (CTEs) in SQL
[APJ] Common Table Expressions (CTEs) in SQL
 
Triggers
TriggersTriggers
Triggers
 

Similar to SQL Joins and Query Optimization

Database Performance
Database PerformanceDatabase Performance
Database Performance
Boris Hristov
 
Sap abap
Sap abapSap abap
Sap abap
Jugul Crasta
 
MySQL Indexing
MySQL IndexingMySQL Indexing
MySQL Indexing
BADR
 
Ms sql server tips 1 0
Ms sql server tips 1 0Ms sql server tips 1 0
Ms sql server tips 1 0
Arman Nasrollahi
 
San diegophp
San diegophpSan diegophp
San diegophp
Dave Stokes
 
CS8391-DATA-STRUCTURES.pdf
CS8391-DATA-STRUCTURES.pdfCS8391-DATA-STRUCTURES.pdf
CS8391-DATA-STRUCTURES.pdf
raji175286
 
Physical elements of data
Physical elements of dataPhysical elements of data
Physical elements of dataDimara Hakim
 
Maryna Popova "Deep dive AWS Redshift"
Maryna Popova "Deep dive AWS Redshift"Maryna Popova "Deep dive AWS Redshift"
Maryna Popova "Deep dive AWS Redshift"
Lviv Startup Club
 
Data structure
 Data structure Data structure
Data structure
Shahariar limon
 
SAP ABAP data dictionary
SAP ABAP data dictionarySAP ABAP data dictionary
SAP ABAP data dictionary
Revanth Nagaraju
 
0104 abap dictionary
0104 abap dictionary0104 abap dictionary
0104 abap dictionary
vkyecc1
 
初探AWS 平台上的 NoSQL 雲端資料庫服務
初探AWS 平台上的 NoSQL 雲端資料庫服務初探AWS 平台上的 NoSQL 雲端資料庫服務
初探AWS 平台上的 NoSQL 雲端資料庫服務
Amazon Web Services
 
Etl2
Etl2Etl2
CS 542 Database Index Structures
CS 542 Database Index StructuresCS 542 Database Index Structures
CS 542 Database Index StructuresJ Singh
 
15 Ways to Kill Your Mysql Application Performance
15 Ways to Kill Your Mysql Application Performance15 Ways to Kill Your Mysql Application Performance
15 Ways to Kill Your Mysql Application Performance
guest9912e5
 
Structured Query Language (SQL) _ Edu4Sure Training.pptx
Structured Query Language (SQL) _ Edu4Sure Training.pptxStructured Query Language (SQL) _ Edu4Sure Training.pptx
Structured Query Language (SQL) _ Edu4Sure Training.pptx
Edu4Sure
 
Indy pass writing efficient queries – part 1 - indexing
Indy pass   writing efficient queries – part 1 - indexingIndy pass   writing efficient queries – part 1 - indexing
Indy pass writing efficient queries – part 1 - indexing
eddiew
 

Similar to SQL Joins and Query Optimization (20)

Database Performance
Database PerformanceDatabase Performance
Database Performance
 
Unit08 dbms
Unit08 dbmsUnit08 dbms
Unit08 dbms
 
Sap abap
Sap abapSap abap
Sap abap
 
T-SQL Overview
T-SQL OverviewT-SQL Overview
T-SQL Overview
 
MySQL Indexing
MySQL IndexingMySQL Indexing
MySQL Indexing
 
Ms sql server tips 1 0
Ms sql server tips 1 0Ms sql server tips 1 0
Ms sql server tips 1 0
 
Indexing
IndexingIndexing
Indexing
 
San diegophp
San diegophpSan diegophp
San diegophp
 
CS8391-DATA-STRUCTURES.pdf
CS8391-DATA-STRUCTURES.pdfCS8391-DATA-STRUCTURES.pdf
CS8391-DATA-STRUCTURES.pdf
 
Physical elements of data
Physical elements of dataPhysical elements of data
Physical elements of data
 
Maryna Popova "Deep dive AWS Redshift"
Maryna Popova "Deep dive AWS Redshift"Maryna Popova "Deep dive AWS Redshift"
Maryna Popova "Deep dive AWS Redshift"
 
Data structure
 Data structure Data structure
Data structure
 
SAP ABAP data dictionary
SAP ABAP data dictionarySAP ABAP data dictionary
SAP ABAP data dictionary
 
0104 abap dictionary
0104 abap dictionary0104 abap dictionary
0104 abap dictionary
 
初探AWS 平台上的 NoSQL 雲端資料庫服務
初探AWS 平台上的 NoSQL 雲端資料庫服務初探AWS 平台上的 NoSQL 雲端資料庫服務
初探AWS 平台上的 NoSQL 雲端資料庫服務
 
Etl2
Etl2Etl2
Etl2
 
CS 542 Database Index Structures
CS 542 Database Index StructuresCS 542 Database Index Structures
CS 542 Database Index Structures
 
15 Ways to Kill Your Mysql Application Performance
15 Ways to Kill Your Mysql Application Performance15 Ways to Kill Your Mysql Application Performance
15 Ways to Kill Your Mysql Application Performance
 
Structured Query Language (SQL) _ Edu4Sure Training.pptx
Structured Query Language (SQL) _ Edu4Sure Training.pptxStructured Query Language (SQL) _ Edu4Sure Training.pptx
Structured Query Language (SQL) _ Edu4Sure Training.pptx
 
Indy pass writing efficient queries – part 1 - indexing
Indy pass   writing efficient queries – part 1 - indexingIndy pass   writing efficient queries – part 1 - indexing
Indy pass writing efficient queries – part 1 - indexing
 

Recently uploaded

Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 

Recently uploaded (20)

Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 

SQL Joins and Query Optimization

  • 1. SQL Joins and Query Optimization How it works by Brian Gallagher www.BrianGallagher.com G
  • 2. Why Wh JOIN?  Combine data from multiple tables  Reduces the number of queries needed  Moves processing burden to database server  Improves data integrity (over combining data in d t i program code) d )
  • 3. Types of JOINs T f JOIN  INNER  the most common, return all rows with matching records , g  SELECT * FROM T1 INNER JOIN T2 ON T1.fld1 = T2.fld2  LEFT or LEFT OUTER  return all rows on the left (first) table and right (second)  If no matching record on the right side, NULL-values for each field are returned  SELECT * FROM T1 LEFT OUTER JOIN T2 ON T1 fld1 = T2 fld2 T1.fld1 T2.fld2  FULL or FULL OUTER  return all rows on the left (first) table and right (second)  If no matching record on the left or right side, NULL-values for each field are returned  S SELECT * FROM T1 FULL OU C O U OUTER JO JOIN T2 ON T1.fld1 = T2.fld2 O . d . d  CROSS  the least common, return all possible row combinations  SELECT * FROM T1, T2  RIGHT or RIGHT OUTER  Don’t use them without a very good reason.  They do not add any functionality over a LEFT JOIN and make code more confusing  Works the same as the LEFT and LEFT OUTER JOINs, but the second table is the one which all rows are returned from. These two queries are functionally identical:  SELECT * FROM T1 LEFT JOIN T2 ON T1.fld1 = T2.fld2  SELECT * FROM T2 RIGHT JOIN T1 ON T1.fld1 = T2.fld2
  • 4. Sample T t Data S l Test D t  Sample Customer and Job records
  • 10. FULL (OUTER) JOIN  FULL JOIN is not supported in MS-Access  Can be emulated using UNION queries  They Th are rarely used l d
  • 11. CROSS JOINs The Join you never use… except every time
  • 12. Making Big M ki it Bi  A CROSS JOIN combines every row on the left table with every row in the right table  Resulting recordset will b th t t l of b th row R lti d t ill be the total f both counts multiplied together  Leftrow has 10 000 records 10,000  Right row has 30,000 records  Resulting recordset has 300,000,000 records  If no JOIN condition is specified, a CROSS JOIN will be executed
  • 13. Joins J i each record t th other t bl h d to the th table
  • 14. CROSS JOIN’s resulting set of all JOIN s record combinations
  • 15. How C H Cross J i M k I Joins Make Inner J i Joins  Tables are joined using a CROSS JOIN  JOIN criteria is evaluated, removing all records which d ’t match th “ON” condition hi h don’t t h the diti SELECT * FROM Customers C INNER JOIN Job J ON C.CustID = J.CustID  The remaining rows are the recordset returned for the INNER JOIN
  • 16. Using “ON” criteria to select records
  • 18. One to Many: Cross Join to Inner Join
  • 19. Unpredictable R U di t bl Record O d d Orders  Databases are not required to return records in any particular order  Records may be returned by the order they are stored on disk or may be unordered, depending on how the query engine handles data  If you need data in a particular order, use an dd t i ti l d ORDER BY clause  Some databases may return data presorted by Primary key or Clustered index  DO NOT DEPEND on this behavior  It is not reliably portable across different databases  Do not make a program rely on a behavior not specified by the program – bad programming style
  • 21. What Wh t are Indexes I d  Indexes are a pre-sorted list of database field values  This list speeds up queries A LOT  An index is much smaller than the full recordset  An index tracks only the fields specified when created
  • 22. Indexes and P f I d d Performance  Indexes GREATLY speed up queries on the fields being indexed  Indexes SLIGHTLY slow d I d l down INSERTS INSERTS, UPDATES and DELETES  The indexes need to be updated anytime a record changes  This adds a small bit of overhead to the system  The increase in query speed usually greatly offsets the slight extra load on INSERTS, UPDATES and DELETES  Most database use is typically reading (instead of writing)
  • 23. Types of Indexes T fI d  Simple Index  Indexesthe field specified  Can have multiple simple indexes  Speeds up queries using the indexed field  Composite Index  Indexescombinations of fields  Can have multiple composite indexes  Speeds up queries using the specified combination of fi ld bi ti f fields
  • 24. Simple Indexes Si l I d  An Index contains a list of all field values and pointers to the record with that value p
  • 25. How Indexes help SELECTs p
  • 26. Composite I d C it Indexes  Useful only when you will be querying multiple fields simultaneously  Most selective (resulting in the fewest records matching) fields should be listed first  Slight performance gain over multiple Simple Indexes when used correctly  No performance gain (possibly even a performance loss) when used incorrectly.
  • 28. Searching I d S hi Indexes  There are high performance algorithms for high-performance searching pre-sorted data I d Index data stored i t d t t d internally as bi ll binary t trees (typically)  Searching through binary tree much faster than reading through table
  • 29. Design Strategy Make it work Make it fast Make it pretty p y
  • 30. Focus on what’s needed F h t’ d d  Make your query as specific as possible  Makes joins simpler (and therefore faster)  Returns no unwanted data  Increases response time  Reduces network and server load  Put as much as possible in the WHERE clause  J i your fi ld properly Join fields l
  • 31. Maximum S l ti it / S M i Selectivity Specificity ifi it  If differents parts of the WHERE clause will result in smaller data sets than other ones, list them the ones that will most g greatly reduce the data first y  List the others in the same order
  • 33. Horizontal P titi i H i t l Partitioning  Avoid extremely “wide” records (records with lots of fields) )  Split data into two tables with a common ID field  All the record data is rarely used in all queries, queries so it reduces traffic and speeds up processing and disk reads
  • 34. Add I d Indexes  Make sure all fields searched on regularly are indexed
  • 35. Define Clustered Index D fi Cl t d I d  The most-likely criteria to be sorted on should be defined as your clustered index y  Clustered index defines the order the records will physically be stored on disk
  • 36. Normalize D t N li Data  No redundant data  Link related data  Don’t go crazy
  • 37. Denormalize D t D li Data  If joining more 4 tables or more in a single q y, query, consider de-normalizing data g (combining data from external tables back into the main table))  Can return faster query results  Risks include having to manage duplicate data in database and larger disk usage
  • 38. Size records to fit database page size  SQL Server 7.0, 2000, and 2005 data pages are 8K (8192 bytes) in size.  Of the th 8192 available b t i each d t page, only il bl bytes in h data l 8060 bytes are used to store a row. The rest is used for overhead.  If you have a row that was 4031 bytes long, then only one row could fit into a data page, and the remaining 4029 bytes left in the page would be empty y p g py  Making each record 1 byte shorter would halve the disk access required to read the table
  • 39. Use Primary K U aPi Key  Make sure you have a primary key defined  Ideally, it should be a single unique field  You can define composite keys, but they are generally not needed if the database is designed p p y properly  Most tables should have a unique TableNameID field  This allows you to identify any row by a single unique value
  • 40. Use U an IDENTITY C l Column  If there is:  no Primary Key on the table  no unique index on the table  Then  Add an IDENTITY (unique value) column to the table  Optionally, index the IDENTITY column if you will query it regularly
  • 41. Move TEXT, NTEXT, and IMAGE data into table  These types normally stored outside the table (uses a pointer to the data in the ( p field)  If these types will be searched frequently frequently, consider moving them into the database table s table’s storage with: sp_tableoption 'tablename', 'text in row', 'on' or sp_tableoption 'tablename', 'text in row', 'size'
  • 42. Consider R li ti i advance C id Replication in d  If Replication is to be used, factor the decision into the original design of the g g database
  • 43. Use Built-in Referential I t it U B ilt i R f ti l Integrity  Use foreign keys and validation constraints built into the database  Don’t manage it in the application  Benefits:  Faster execution  Can’t mess it up with application errors
  • 44. Re-test h R t t when changing servers h i  Retest and rebenchmark applications when moving to a new server, such as:  Development  Staging  Production  Problems often caused b P bl ft d by:  More rows in test data on servers “closer” to production  Server configurations different  Tables not indexed the same way
  • 45. Constraints are F t C t i t Fast  Constraints on fields are faster than  Triggers  Rules  Defaults
  • 46. Don’t duplicate ff t D ’t d li t effort  Don’t check for the same thing twice  (duh, but it happens)  Don’t use a trigger and a constraint to do the same thing g  Same for constraints and defaults  Same for constraints and rules
  • 47. Limit Li it records t b JOIN d d to be JOINed  Use WHERE clause to minimize rows to be JOINed.  Particularly in the OUTER table of an OUTER JOIN
  • 48. Index F i K I d Foreign Keys  Fields in a table that are a foreign key are not indexed automatically j y just because they are a foreign key.  Add these indexes manually
  • 49. Minimize Duplicate Mi i i D li t JOIN fi ld d t field data  JOINs are slower when there are few different keys in the j y joining table g
  • 50. JOIN on unique indexed fi ld i i d d fields  JOINs will perform fastest when joining indexed fields with no duplicate data p
  • 51. JOIN Numeric Fi ld N i Fields  JOINs on numeric fields perform much faster than JOINs on other datatype fields. yp
  • 52. JOIN th exact same d t t the t datatypes  JOINs should be to the exact same datatype for best p yp performance  Same type of field  Same field length  Same encoding (ASCII vs. Unicode, etc)
  • 53. Use U ANSI JOIN syntax t  Improves readability  Less likely to cause programmer errors  No Aliases: SELECT fname, lname, department FROM names INNER JOIN departments ON names.employeeid names employeeid = departments employeeid departments.employeeid  Microsoft syntax example: SELECT fname, lname, department FROM names departments names, WHERE names.employeeid = departments.employeeid  Code is more portable between databases
  • 54. Use Table Aliases U T bl Ali  Shortens code  Makes it easier to follow, especially with long queries  Identifies which table each field is coming from  No table aliases: SELECT fname, lname, department FROM names INNER JOIN departments ON names.employeeid = departments.employeeid  Microsoft syntax example: SELECT N.fname, N.lname, D.department FROM names N INNER JOIN departments D ON N.employeeid = D.employeeid
  • 55. Don’t D ’t use SELECT *  Requires additional parsing on the server to extract field names  Returns unneccesary data (unless you are actually using every field)  Returns duplicate data on JOINs (do you really need two copies of the RecordID field?)  Can cause errors (in some databases) if JOINed tables have fields with the same name
  • 56. Store i separate fil in fil St in t files i filegroup  For very large joins placing tables to be j joined in separate p y p physical files within the same filegroup can improve performance  SQL Server can spawn separate threads for processing each file
  • 57. Don’t D ’t use CROSS JOINs JOIN  Unless actually needed (rarely) do not use a CROSS JOIN (returns all combinations ( of records on both sides of JOIN)  People sometimes will do a CROSS JOIN and then use DISTINCT and GROUP BY to eliminate all the duplication  Don’t do that.
  • 58. JOINs JOIN vs. Subquery S b  Depending on the specifics, either could be faster. Write both and test performance to be sure which is best.  JOIN: SELECT a.* FROM Table1 a INNER JOIN Table2 b ON a.Table1ID = b.Table1ID WHERE b.Table2ID = 'SomeValue'  Subquery: SELECT a.* FROM Table1 a WHERE Table1ID IN a. (SELECT Table1ID FROM Table2 WHERE Table2ID = 'SomeValue')
  • 59. Avoid Subqueries unless needed A id S b i l d d  Avoid subqueries unless actually required  Most subqueries can be expressed as JOINs  Subqueries are generally harder to read and understand (for humans) and, therefore, therefore maintain
  • 60. Use an Indexed View (Enterprise 2000 and later only)  An Indexed View maintains an updated record of how the tables are joined via a j clustered index  This slows INSERTs, UPDATEs and INSERTs DELETEs a bit, so consider the tradeoff for the faster queries
  • 61. Use Database’s Performance Database s Optimization Tools  Most major databases have methods for monitoring and optimizing database g p g performance  Google “databaseName optimizing” for databaseName optimizing tons of links
  • 62. Avoid A id DISTINCT when possible h ibl  DISTINCT clauses are frequently (and usually unintentionally) used to hide an incorrect JOIN  Properly normalized database will not frequently need DISTINCT clauses  Look for incorrect JOINs or create more explicit WHERE clauses to avoid needing DISTINCT  Of course, use it when appropriate
  • 63. The End (for now) Google: Query Optimization for lots more information