      Reading EXPLAIN (Čteme EXPLAIN)
         CSPUG, Prague


Tomáš Vondra (tv@fuzzy.cz)

Czech and Slovak PostgreSQL Users Group




             21.6.2011
Agenda




    What are EXPLAIN and EXPLAIN ANALYZE for?
    How does planning work, and how is the "optimal" plan chosen?
    Basic physical operators: scans, joins, ...
    How can you tell that something is wrong?
    Other useful tools.




                               T. Vondra (CSPUG)   Čteme EXPLAIN
What are EXPLAIN and EXPLAIN ANALYZE for?



 SQL is a declarative language
     An SQL query is not a program; it describes the result (logical algebra).
     There are many ways to evaluate a given query (physical algebra).
     Finding the "optimal" way is the database's job.
     Optimal = least demanding on resources (CPU, I/O, memory, ...)
     It depends on the conditions (number of users, work_mem size, ...).

 degrees of freedom
     access strategy (sequential scan, index scan, ...)
     join order
     join strategy (merge join, hash join, nested loop)
     aggregation strategy (plain, hash, sorted)




Tree structure of an execution plan



   SELECT * FROM a JOIN b ON (a.id = b.id) LIMIT 100;




Cost calculation



    I want to compare several candidate solutions and pick the "cheapest" one
    an approach common in (non)linear programming
    the number of rows is estimated from statistics
    the cost of a plan is computed using the "cost" variables
        seq_page_cost = 1.0
        random_page_cost = 4.0
        cpu_tuple_cost = 0.01
        cpu_index_tuple_cost = 0.005
        cpu_operator_cost = 0.0025
        ...
    I compare the costs of the options and pick the one with the lowest cost




Guiding principles



     I/O traditionally dominates: minimize I/O operations
     random I/O is more expensive than sequential I/O
     minimize CPU operations
     do not use too much memory
     minimize the flow of data
     prefer a lower startup cost or a lower total cost (?)


      Cost is roughly time, expressed relative to a sequential read of one page from disk.




Basic physical operators / table access

 sequential scan
     read all rows of the table (and only then filter)
     data (blocks) are read sequentially, each exactly once

 index scan
     find references to the matching rows in the index
     read only the needed blocks from the table (possibly repeatedly)
     a combination of sequential and random I/O

 bitmap index scan
     read the index leaves and build a bitmap of rows from them
     read only those table blocks that have a "1" in the bitmap
     sequential I/O, but with a "startup" cost (building the bitmap)
     several indexes can be combined (OR, AND)
     more flexible than multi-column indexes
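
The bitmap index scan steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PostgreSQL's implementation: the toy table, the dict-based "index" and the helper names (build_index, bitmap_index_scan, bitmap_heap_scan) are all hypothetical, and the bitmap is simplified to a set of (block, offset) positions.

```python
# Hypothetical toy table: a list of blocks, 3 row values per block.
ROWS_PER_BLOCK = 3
values = [110, 7, 130, 42, 110, 9, 130, 5, 1]
table = [values[i:i + ROWS_PER_BLOCK] for i in range(0, len(values), ROWS_PER_BLOCK)]

def build_index(table):
    """Map each value to the (block, offset) positions where it occurs."""
    index = {}
    for block_no, block in enumerate(table):
        for offset, value in enumerate(block):
            index.setdefault(value, []).append((block_no, offset))
    return index

def bitmap_index_scan(index, key):
    """Return the set of row positions matching key (the simplified 'bitmap')."""
    return set(index.get(key, []))

def bitmap_heap_scan(table, bitmap):
    """Visit the needed blocks in physical order, reading each block once."""
    blocks_read = []
    rows = []
    for block_no, offset in sorted(bitmap):
        if not blocks_read or blocks_read[-1] != block_no:
            blocks_read.append(block_no)  # one read per block
        rows.append(table[block_no][offset])
    return rows, blocks_read

index = build_index(table)
# BitmapOr: combine the bitmaps for id = 110 and id = 130.
bitmap = bitmap_index_scan(index, 110) | bitmap_index_scan(index, 130)
rows, blocks_read = bitmap_heap_scan(table, bitmap)
print(rows, blocks_read)
```

Sorting the positions before touching the table is what turns the random I/O of a plain index scan into block-ordered reads; the price is that no row is returned until the whole bitmap has been built.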


Example - creating the table



 a table with 100,000 rows
 CREATE TABLE tab (id INT);

 INSERT INTO tab SELECT * FROM generate_series(1, 100000);

 ANALYZE tab;

 SELECT relpages, reltuples FROM pg_class
                            WHERE relname = 'tab';

  relpages | reltuples
 ----------+-----------
       393 |    100000
 (1 row)




Example - sequential vs. index scan



 sequential scan
 EXPLAIN SELECT * FROM tab WHERE id BETWEEN 1000 AND 2000;

                        QUERY PLAN
 --------------------------------------------------------
  Seq Scan on tab  (cost=0.00..1893.00 rows=927 width=4)
    Filter: ((id >= 1000) AND (id <= 2000))
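
The total cost of 1893.00 in this plan can be reproduced by hand from the "cost" variables and the table statistics (relpages = 393, reltuples = 100000). The following Python sketch is a simplified version of the sequential scan cost arithmetic; the function name seq_scan_cost and its parameters are hypothetical, and the constants are the defaults listed in this deck.

```python
# Default planner cost settings, as listed on the cost calculation slide.
SEQ_PAGE_COST = 1.0
CPU_TUPLE_COST = 0.01
CPU_OPERATOR_COST = 0.0025

def seq_scan_cost(pages, tuples, filter_operators):
    """Estimated total cost of a sequential scan (simplified sketch)."""
    io_cost = pages * SEQ_PAGE_COST        # read every page once
    cpu_cost = tuples * CPU_TUPLE_COST     # process every row
    # Evaluate each filter operator on every row.
    filter_cost = tuples * filter_operators * CPU_OPERATOR_COST
    return io_cost + cpu_cost + filter_cost

# relpages = 393, reltuples = 100000; BETWEEN expands to two comparisons.
total = seq_scan_cost(pages=393, tuples=100000, filter_operators=2)
print(f"{total:.2f}")
```

That is 393 for the page reads, 1000 for the rows and 500 for the two comparisons per row. With no filter at all the same arithmetic gives 1393.00, which is the sequential scan cost that appears in the join examples later in the deck.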



 index scan
 CREATE INDEX idx ON tab (id);
 EXPLAIN ANALYZE SELECT * FROM tab WHERE id BETWEEN 1000 AND 2000;

                              QUERY PLAN
 --------------------------------------------------------------------
  Index Scan using idx on tab  (cost=0.00..39.54 rows=1014 width=4)
                               (actual time=0.108..1.703 rows=1001 loops=1)
    Index Cond: ((id >= 1000) AND (id <= 2000))
  Total runtime: 2.840 ms




Example - bitmap index scan



 bitmap index scan
 EXPLAIN SELECT * FROM tab WHERE (id = 110 OR id = 130);

                                QUERY PLAN
 ------------------------------------------------------------------------
  Bitmap Heap Scan on tab  (cost=8.53..16.14 rows=2 width=4)
    Recheck Cond: ((id = 110) OR (id = 130))
    ->  BitmapOr  (cost=8.53..8.53 rows=2 width=0)
          ->  Bitmap Index Scan on idx  (cost=0.00..4.27 rows=1 width=0)
                Index Cond: (id = 110)
          ->  Bitmap Index Scan on idx  (cost=0.00..4.27 rows=1 width=0)
                Index Cond: (id = 130)




Join strategies




     nested loop
     hash join
     merge join




Nested loop



    very simple: in principle, two nested loops
    slow for larger relations, but produces the first row quickly
    the only join usable for CROSS JOIN and non-equijoin conditions
    mostly seen in OLTP systems (working with small numbers of rows)


    FOR a IN vnejsi_relace          -- outer relation
       FOR b IN vnitrni_relace      -- inner relation
          RETURN (a, b) if it satisfies the JOIN condition
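
The pseudocode above translates almost directly into runnable Python. This is a minimal sketch, not executor code; the function name nested_loop_join and the sample data are hypothetical, and relations are plain lists.

```python
def nested_loop_join(outer, inner, condition):
    """Yield (a, b) for every pair that satisfies the join condition."""
    for a in outer:          # outer relation
        for b in inner:      # inner relation, rescanned for each outer row
            if condition(a, b):
                yield (a, b)

outer = [1, 5, 9]
inner = [2, 5, 7]
# A non-equijoin condition, which hash and merge joins cannot handle.
pairs = list(nested_loop_join(outer, inner, lambda a, b: a < b))
print(pairs)
```

Because the function is a generator, the first matching pair is available as soon as it is found, which mirrors the "produces the first row quickly" property. The inner loop runs once per outer row, which is why the method degrades for two large relations.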




Nested Loop



 -- table names: vnejsi = outer, vnitrni = inner
 CREATE TABLE vnejsi (id INT, val INT UNIQUE);
 CREATE TABLE vnitrni (id INT PRIMARY KEY);

 INSERT INTO vnejsi
 SELECT i, i+1 FROM generate_series(1, 1000) s(i);

 INSERT INTO vnitrni
 SELECT i FROM generate_series(1, 1000) s(i);


 EXPLAIN SELECT 1 FROM vnejsi, vnitrni
                 WHERE vnejsi.id = vnitrni.id
                   AND vnejsi.val = 10;

                                     QUERY PLAN
 -------------------------------------------------------------------------------------
  Nested Loop  (cost=0.00..16.55 rows=1 width=0)
    ->  Index Scan using vnejsi_val_key on vnejsi  (cost=0.00..8.27 rows=1 width=4)
          Index Cond: (val = 10)
    ->  Index Scan using vnitrni_pkey on vnitrni  (cost=0.00..8.27 rows=1 width=4)
          Index Cond: (vnitrni.id = vnejsi.id)
 (5 rows)




Merge Join



    sorts both relations by the join condition (equijoin only)
    then reads row by row and moves forward
    rescans are sometimes needed (duplicates in the outer table)
    very fast for sorted relations, otherwise an expensive startup
    mostly seen in DSS/DWH systems
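
The merge step, including the rescan needed for duplicate keys, can be sketched as follows. This is an illustrative Python version under simple assumptions (relations are lists of bare join keys, and the function name merge_join is hypothetical), not the executor's actual algorithm.

```python
def merge_join(outer, inner):
    """Equijoin of two lists of keys; both inputs are sorted first."""
    outer, inner = sorted(outer), sorted(inner)
    result = []
    i = j = 0
    while i < len(outer) and j < len(inner):
        if outer[i] < inner[j]:
            i += 1
        elif outer[i] > inner[j]:
            j += 1
        else:
            # Matching key: remember where the inner run starts so it can be
            # rescanned for each duplicate on the outer side.
            run_start = j
            key = outer[i]
            while i < len(outer) and outer[i] == key:
                j = run_start
                while j < len(inner) and inner[j] == key:
                    result.append((outer[i], inner[j]))
                    j += 1
                i += 1
    return result

print(merge_join([3, 1, 2, 2], [2, 4, 2, 3]))
```

The two sorted() calls are the expensive startup mentioned above; if both inputs already arrive sorted, only the single forward pass remains.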




Merge Join



 CREATE TABLE vnejsi (id INT);
 CREATE TABLE vnitrni (id INT);

 INSERT INTO vnejsi
 SELECT i FROM generate_series(1, 100000) s(i);

 INSERT INTO vnitrni
 SELECT i FROM generate_series(1, 100000) s(i);


 EXPLAIN SELECT 1 FROM vnejsi JOIN vnitrni USING (id);

                                  QUERY PLAN
 -----------------------------------------------------------------------------
  Merge Join  (cost=19395.64..21395.64 rows=100000 width=0)
    Merge Cond: (vnejsi.id = vnitrni.id)
    ->  Sort  (cost=9697.82..9947.82 rows=100000 width=4)
          Sort Key: vnejsi.id
          ->  Seq Scan on vnejsi  (cost=0.00..1393.00 rows=100000 width=4)
    ->  Sort  (cost=9697.82..9947.82 rows=100000 width=4)
          Sort Key: vnitrni.id
          ->  Seq Scan on vnitrni  (cost=0.00..1393.00 rows=100000 width=4)




Hash Join


     1    read the smaller (inner) relation and build a hash table from it (on the join key)
     2    read the outer table and look up matches in the hash table using the hash key
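
The two steps above map onto a build phase and a probe phase. The following Python sketch illustrates them with a plain dictionary; the function name hash_join, the key argument and the sample rows are hypothetical.

```python
from collections import defaultdict

def hash_join(outer, inner, key):
    """Equijoin: build a hash table on the inner rows, probe with outer rows."""
    # Step 1: build phase, read the (smaller) inner relation once.
    table = defaultdict(list)
    for row in inner:
        table[key(row)].append(row)
    # Step 2: probe phase, read the outer relation once.
    result = []
    for row in outer:
        for match in table.get(key(row), []):
            result.append((row, match))
    return result

outer = [(1, "a"), (2, "b"), (3, "c")]
inner = [(2, "x"), (3, "y"), (3, "z")]
joined = hash_join(outer, inner, key=lambda row: row[0])
print(joined)
```

No sorting is needed, but nothing can be returned until the whole hash table has been built, and that table has to fit in memory.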


 CREATE TABLE vnejsi (id INT);
 CREATE TABLE vnitrni (id INT);

 INSERT INTO vnejsi SELECT i FROM generate_series(1, 100000) s(i);
 INSERT INTO vnitrni SELECT i FROM generate_series(1, 100000) s(i);


 EXPLAIN SELECT 1 FROM vnejsi JOIN vnitrni USING (id);

                                QUERY PLAN
 -------------------------------------------------------------------------
  Hash Join  (cost=2985.00..7029.00 rows=100000 width=0)
    Hash Cond: (vnejsi.id = vnitrni.id)
    ->  Seq Scan on vnejsi  (cost=0.00..1393.00 rows=100000 width=4)
    ->  Hash  (cost=1393.00..1393.00 rows=100000 width=4)
          ->  Seq Scan on vnitrni  (cost=0.00..1393.00 rows=100000 width=4)
 (5 rows)




Hash Join / batches

 What if the hash table does not fit in memory (work_mem)?
     1    split the smaller table into parts so that each part's hash table fits in memory
     2    for each part, build the hash table and join it with the "large" table
     3    less efficient (repeated reads of the outer table)
     4    shows up as "Batches" in the plan
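
The batching idea can be sketched by following the simplified description above literally: build the hash table one part at a time and probe each part with the whole outer relation. This is a toy illustration, not PostgreSQL's executor; the function name batched_hash_join and the max_rows_in_memory parameter (standing in for work_mem) are hypothetical, relations are lists of bare join keys, and the inner relation is split by position, whereas a real implementation would typically assign rows to batches by hash value.

```python
def batched_hash_join(outer, inner, max_rows_in_memory):
    """Join two lists of keys, building the hash table one batch at a time."""
    result = []
    batches = 0
    for start in range(0, len(inner), max_rows_in_memory):
        batch = inner[start:start + max_rows_in_memory]
        # Build a hash table for this batch only: key -> number of occurrences.
        table = {}
        for key in batch:
            table[key] = table.get(key, 0) + 1
        # Probe with the whole outer relation (this is the repeated read).
        for key in outer:
            result.extend([(key, key)] * table.get(key, 0))
        batches += 1
    return result, batches

outer = list(range(1, 11))
inner = list(range(1, 11))
result, batches = batched_hash_join(outer, inner, max_rows_in_memory=3)
print(len(result), batches)
```

Raising max_rows_in_memory to 10 or more brings the batch count down to one, which is the effect of increasing work_mem.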


 EXPLAIN ANALYZE SELECT 1 FROM vnejsi JOIN vnitrni USING (id);

                                    QUERY PLAN
 ------------------------------------------------------------------------------- ...
  Hash Join  (cost=2985.00..7029.00 rows=100000 width=0) (actual time=277.886..792 ...
    Hash Cond: (vnejsi.id = vnitrni.id)
    ->  Seq Scan on vnejsi  (cost=0.00..1393.00 rows=100000 width=4) (actual time= ...
    ->  Hash  (cost=1393.00..1393.00 rows=100000 width=4) (actual time=277.836..27 ...
          Buckets: 8192  Batches: 4  Memory Usage: 589kB
          ->  Seq Scan on vnitrni  (cost=0.00..1393.00 rows=100000 width=4) (actua ...
  Total runtime: 900.664 ms
 (7 rows)




          increase work_mem (the fewer batches, usually the better)


Comparison of join methods


 Nested Loop
     works poorly for two large relations
     ideal for a small outer relation + a fast lookup into the inner one (index scan)
     the only method for non-equijoins :-(

 Merge Join
     ideal for already sorted relations (e.g. CLUSTER + index scan)
     a problem if it needs extra sorting (especially a large on-disk sort)

 Hash Join
     needs no sorting, but has to build a hash table
     needs enough memory, though (work_mem for the hash table)
     if the hash table is too large, it is split into batches (slower)




Sort & Limit




    ORDER BY, but also many others (DISTINCT, GROUP BY, UNION)
    three options
         quicksort (in memory, limited by work_mem)
         merge sort (on disk)
         index scan (a sufficiently correlated index, e.g. CLUSTERED)
    LIMIT says "I only want a few rows, prefer fast-starting plans"
    a small startup time usually means a large total time


   EXPLAIN ANALYZE SELECT * FROM tab ORDER BY id;

    Sort (...) (actual time=446.089..591.714 rows=100000 loops=1)
      Sort Key: id
      Sort Method: external sort  Disk: 1368kB
      ->  Seq Scan on tab (...) (actual time=0.016..129.756 rows=100000 loops=1)




Sort


 in memory
 SET work_mem = '8MB';

 EXPLAIN ANALYZE SELECT * FROM tab ORDER BY id;

                                 QUERY PLAN
 ---------------------------------------------------------------------------
  Sort (...) (actual time=312.709..432.410 rows=100000 loops=1)
    Sort Key: id
    Sort Method: quicksort  Memory: 4392kB
    ->  Seq Scan on tab (...) (actual time=0.020..146.975 rows=100000 loops=1)



 with a well-correlated index
 CREATE INDEX idx ON tab (id);

 EXPLAIN ANALYZE SELECT * FROM tab ORDER BY id;

                                 QUERY PLAN
 ---------------------------------------------------------------------------
  Index Scan using idx on tab  (cost=0.00..2780.26 rows=100000 width=4)
                               (actual time=0.088..162.377 rows=100000 loops=1)
  Total runtime: 272.881 ms




Node types - others



     aggregation (GROUP BY, DISTINCT)
     LIMIT
     table modification (INSERT, UPDATE, DELETE)
     set operations (INTERSECT, EXCEPT)
     subplan (for correlated subselects), initplan (uncorrelated)
     CTE, window functions
     materialization
     row locking
     append (inheritance)
     ...




Wrong estimate of the number of rows (in a relation, or matching a condition).


 What is the problem?
     The planner thinks the table is small, but it is actually large.
     The planner thinks only a few rows match the condition, but actually many do.
     or the other way round ...

 How does it show up?
     an unsuitable table access method is chosen (index vs. sequential scan)
     an unsuitable join method is chosen (nested loop instead of hash/merge join)

 What causes it?
     stale statistics (e.g. right after a load)
     wrong statistics: sometimes the number of distinct values, badly formulated conditions
     conditions on correlated columns (there are no cross-column statistics yet)
     LIMIT usually makes the situation considerably worse (it prefers plans with a cheap start)



Example - seemingly high selectivity

 based on a "race condition": I run the query before the statistics have been recomputed
 CREATE TABLE tab (id INT);
 CREATE INDEX idx ON tab (id);
 INSERT INTO tab SELECT * FROM generate_series(1, 100000);
 ANALYZE tab;

 DELETE FROM tab;
 INSERT INTO tab SELECT 1111 FROM generate_series(1, 100000);



 EXPLAIN ANALYZE SELECT * FROM tab WHERE id = 1111;
                              QUERY PLAN
 ---------------------------------------------------------------------
  Index Scan using idx on tab  (cost=0.00..8.29 rows=1 width=4)
                               (actual time=0.049..166.562 rows=100000 loops=1)
    Index Cond: (id = 1111)
 (3 rows)


 ... wait ....


 EXPLAIN ANALYZE SELECT * FROM tab WHERE id = 1111;
                              QUERY PLAN
 ---------------------------------------------------------------------
  Seq Scan on tab  (cost=0.00..2035.00 rows=100000 width=4)
                   (actual time=0.949..158.568 rows=100000 loops=1)
    Filter: (id = 1111)



Example - correlated columns



 CREATE TABLE tab (a INT, b INT);
 INSERT INTO tab SELECT i, i FROM generate_series(1, 100000) s(i);
 ANALYZE tab;



 EXPLAIN ANALYZE SELECT * FROM tab WHERE a >= 50000 AND b <= 50000;

                            QUERY PLAN
 ----------------------------------------------------------------
  Seq Scan on tab  (cost=0.00..1943.00 rows=25000 width=8)
                   (actual time=26.196..58.715 rows=1 loops=1)
    Filter: ((a >= 50000) AND (b <= 50000))
  Total runtime: 58.762 ms
 (3 rows)
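
The gap between the estimated 25000 rows and the single row actually returned follows from treating the two conditions as independent. The Python sketch below reproduces the arithmetic on the same data; it uses exact counts as a stand-in for the planner's statistics, so it illustrates the independence assumption rather than the planner's real estimation code.

```python
# Same data as the example: a = b = i for i in 1..100000.
rows = [(i, i) for i in range(1, 100001)]
total = len(rows)

matches_a = sum(1 for a, b in rows if a >= 50000)   # rows passing the first condition
matches_b = sum(1 for a, b in rows if b <= 50000)   # rows passing the second condition

# Independence assumption: selectivities multiply, i.e.
# total * (matches_a / total) * (matches_b / total), in integer arithmetic.
estimated = matches_a * matches_b // total

# What actually matches both conditions at once.
actual = sum(1 for a, b in rows if a >= 50000 and b <= 50000)

print(estimated, actual)
```

Each condition alone keeps about half of the table, so the product keeps about a quarter. Because a and b are perfectly correlated, the two halves overlap in exactly one row (i = 50000).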




Other problem areas



 Unsuitable settings of the "cost" variables
     the default values are based on a "typical" system
     which does not necessarily match yours
     e.g. with an SSD, the difference between random and sequential I/O disappears
     with fast disks (15k SAS) it partly does too, though not as markedly
     a small effective_cache_size penalizes indexes

 Black holes
     triggers
     referential integrity (foreign keys without indexes)




EXPLAIN cookbook



 Check the nodes where the row count estimate is off.
     Small differences do not matter; order-of-magnitude differences are a problem.
     If the estimate is bad, the choice of plan cannot be reliable.
     Try updating the statistics, rewriting the conditions, ...

 Look at the nodes with the largest proportional difference between cost and time.
     Are you sure the variables are set reasonably?
     Change the settings (in the session) and watch how the plan and the query performance change.

 When optimizing, focus on the nodes with the highest cost / actual time.
     Where the most time is spent, optimization can gain the most.
     Could you, for example, add an index or increase work_mem?
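
The first rule of the cookbook (look for order-of-magnitude differences between estimated and actual row counts) is easy to automate for text-format plans. The Python sketch below is one possible way to do it; the regular expression, the function name misestimated_nodes and the 10x threshold are assumptions made for this illustration, and it expects the cost and actual parts of a node on a single line, as psql prints them.

```python
import re

# Matches the estimated and actual row counts on one EXPLAIN ANALYZE line.
NODE_RE = re.compile(
    r"^\s*(?:->\s*)?(?P<node>.+?)\s+\(cost=[\d.]+ rows=(?P<est>\d+) width=\d+\)"
    r"\s+\(actual time=[\d.]+ rows=(?P<act>\d+) loops=\d+\)"
)

def misestimated_nodes(plan_text, threshold=10.0):
    """Return (node, estimated, actual, ratio) where the estimate is off by >= threshold."""
    flagged = []
    for line in plan_text.splitlines():
        m = NODE_RE.match(line)
        if not m:
            continue  # Filter, Sort Key, Total runtime, ... lines carry no row counts
        est, act = int(m["est"]), int(m["act"])
        # Use max(..., 1) so a zero row count does not divide by zero.
        ratio = max(est, 1) / max(act, 1)
        if ratio >= threshold or ratio <= 1 / threshold:
            flagged.append((m["node"], est, act, ratio))
    return flagged

plan = """\
Seq Scan on tab  (cost=0.00..1943.00 rows=25000 width=8) (actual time=26.196..58.715 rows=1 loops=1)
  Filter: ((a >= 50000) AND (b <= 50000))
Total runtime: 58.762 ms
"""
for node, est, act, ratio in misestimated_nodes(plan):
    print(f"{node}: estimated {est}, actual {act}")
```

The sample plan is the correlated-columns example from this deck, where the estimate is off by four orders of magnitude. A node such as the earlier index scan (estimated 1014 rows, actual 1001) stays below the threshold and is not reported.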




explain.depesz.com




    http://explain.depesz.com
    an excellent tool for visualizing and analyzing explain plans
    great for sending a plan e.g. to mailing lists (it does not get mangled)

explain.depesz.com




    How long did a given step take (on its own / including its children)?
    How accurate was the row count estimate?
    How many rows were produced?

explain.depesz.com

 Unique  (cost=30938464.86..31166982.10 rows=30468966 width=89) (actual time=249353.521..250273.108 rows=342107 loops=1)
   ->  Sort  (cost=30938464.86..31014637.27 rows=30468966 width=89) (actual time=249353.518..250155.187 rows=342108 loops=1)
         Sort Key: (lower(u.samaccountname[1])), (g.cn[1])
         Sort Method: external merge  Disk: 13176kB
         ->  Append  (cost=0.00..19340392.34 rows=30468966 width=89) (actual time=44.687..242695.135 rows=342108 loops=1)
               ->  Nested Loop  (cost=0.00..19031015.08 rows=30385836 width=89) (actual time=44.685..240132.584 rows=2535 loops=1)
                     Join Filter: ((u.primarygroupid[1] = ANY (tmp_g.primarygrouptoken)) OR (u.gidnumber[1] = ANY (tmp_g.gidnumber)) OR (tmp_g.dn = ANY (u.memberof)) OR (tmp_g.cn[1] = ANY (u.memberof)) OR (tmp_g.dn = ANY (u.groupmembership)) OR (tmp_g.cn[1] = ANY (u.groupmembership)) OR (u.samaccountname[1] = ANY (tmp_g.memberuid)) OR (u.dn = ANY (tmp_g.member)) OR (u.cn[1] = ANY (tmp_g.member)))
                     ->  Nested Loop  (cost=0.00..1421.74 rows=1350 width=986) (actual time=0.054..116.528 rows=1350 loops=1)
                           ->  Nested Loop  (cost=0.00..734.12 rows=1350 width=1023) (actual time=0.038..76.647 rows=1350 loops=1)
                                 ->  Seq Scan on ldap_group_inheritance i  (cost=0.00..46.50 rows=1350 width=166) (actual time=0.015..1.633 rows=1350 loops=1)
                                 ->  Index Scan using ldap_import_groups_dn_key on ldap_import_groups tmp_g  (cost=0.00..0.50 rows=1 width=940) (actual time=0.048..0.049 rows=1 loops=1350)
                                       Index Cond: (tmp_g.dn = i.groupdn)
                           ->  Index Scan using ldap_import_groups_dn_key on ldap_import_groups g  (cost=0.00..0.50 rows=1 width=129) (actual time=0.022..0.026 rows=1 loops=1350)
                                 Index Cond: (g.dn = i.parentdn)
                     ->  Seq Scan on ldap_import_users u  (cost=0.00..3856.30 rows=83130 width=372) (actual time=0.006..26.162 rows=83130 loops=1350)
               ->  Seq Scan on ldap_import_users u  (cost=0.00..4687.60 rows=83130 width=126) (actual time=0.098..2499.336 rows=339573 loops=1)
 Total runtime: 250301.001 ms
 (17 rows)




pgadmin3




    http://www.pgadmin.org/
    a GUI that, among other things, can visualize SQL queries




auto_explain & explanation


 auto_explain
     a kind of complement to log_min_duration_statement
     makes it possible to log EXPLAIN (or EXPLAIN ANALYZE) for long-running queries
     http://developer.postgresql.org/pgdocs/postgres/auto-explain.html

 explanation
     more flexible work with plan information directly in SQL
     http://www.pgxn.org/dist/explanation/doc/explanation.html

    SELECT node_type, strategy, actual_startup_time, actual_total_time
      FROM explanation(
          query    := $$ SELECT * FROM pg_class WHERE relname = 'users' $$,
          analyzed := true
      );


     node_type  | strategy | actual_startup_time | actual_total_time
    ------------+----------+---------------------+-------------------
     Index Scan |          | 00:00:00.000017     | 00:00:00.000017




Links


    Query Execution Techniques in PostgreSQL, Neil Conway, 2007
    http://neilconway.org/talks/executor.pdf

    Čtení prováděcích plánů v PostgreSQL (Reading execution plans in PostgreSQL), Pavel Stěhule, 2008
    http://www.root.cz/clanky/cteni-provadecich-planu-v-postgresql/

    Using EXPLAIN @ wiki
    http://wiki.postgresql.org/wiki/Using_EXPLAIN

    Introduction to VACUUM, ANALYZE, EXPLAIN, and COUNT @ wiki
    http://wiki.postgresql.org/wiki/Introduction_to_VACUUM,_ANALYZE,_EXPLAIN,_and_COUNT

    Explaining EXPLAIN, R. Treat, G. S. Mullane, AndrewSN, Magnifikus, B. Encina,
    N. Conway, 2008
    http://wiki.postgresql.org/images/4/45/Explaining_EXPLAIN.pdf




